@g-guenther g-guenther commented Jun 26, 2025

Adds the field 'FIELDNAME' with attributes for data_type, identifier, and units_identifier to NXobject base class to improve machine-readability of NeXus terms and non-NeXus terms, solving issue #1569. Might also be related to #1335, #1398, and #1440.

@g-guenther g-guenther marked this pull request as draft June 26, 2025 11:39
@g-guenther g-guenther marked this pull request as ready for review June 26, 2025 11:40
Contributor

@mkuehbach mkuehbach left a comment

This PR, if merged, will have far-reaching consequences for the NeXus standard. A few initial thoughts:

The reason NXobject currently contains neither an nx:fieldType FIELDNAME nor an nx:groupType GROUPNAME is that it has so far been assumed that, because NXobject is formulated in NXDL, i.e. uses the grammar defined in nxdl.xsd, you can always add nx:fieldType, nx:groupType, and nx:attributeType instances to any base class. If that were not possible, how would all the fields, groups, and attributes in NeXus connect at all? Note also that before the Autumn Code Camp 2024 edits, NXobject had hardly any content.

Instances using the semantically clarified nameType="partial", such as concepts following the pattern UPPERCASE_lowercase (e.g. FIELDNAME_set), were added to reserve these concept names.

If FIELDNAME were added as proposed here, every instance in every base class would in almost all cases be a specialization, because base classes have supported inheritance since the Autumn Code Camp 2024. Since all base classes inherit from NXobject, all concepts that are nx:fieldType instances would be connected to the proposed FIELDNAME. This PR is therefore at least incomplete, in the sense that one also needs to think carefully about nx:groupType GROUPNAME and nx:attributeType ATTRIBUTENAME.

The idea of adding nx:fieldType FIELDNAME like here proposed is sound.

FIELDNAME has no "type" XML parameter. This is substantiated, but again the documentation needs to be carefully investigated and discussed. In addition, clarification is required on whether "type" and other XML parameters like "dimensions" serve to impose a restriction on the allowed type and dimensions. For example, an nx:fieldType value with type="NX_NUMBER" can be a scalar or a tensor, but if "dimensions" is added the shape is restricted and a scalar may no longer be allowed. Are we 100% sure that, in the aforementioned case where the XML parameter "type" is absent, no default type = NX_CHAR kicks in? That would cause each field in every base class that is not of type NX_CHAR to automatically specialize as well.

When almost everything is specialized I think this reads weird, as special cases should not be the majority IMHO.
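
To make the concern concrete, here is a minimal sketch, assuming a toy model in which a field counts as a specialization of FIELDNAME whenever its declared type differs from FIELDNAME's effective type. The field list, the function name `specializing_fields`, and the assumed NX_CHAR default are illustrative assumptions for this discussion, not statements about nxdl.xsd.

```python
# Toy model, not derived from nxdl.xsd: a field "specializes" FIELDNAME when
# its declared type differs from FIELDNAME's effective type.
ASSUMED_DEFAULT_TYPE = "NX_CHAR"  # the default under question, unconfirmed

# Illustrative fields only; not copied from any NeXus base class.
EXAMPLE_FIELDS = {
    "name": "NX_CHAR",
    "temperature": "NX_FLOAT",
    "counts": "NX_INT",
    "description": "NX_CHAR",
}


def specializing_fields(fields, parent_type=None):
    """Return, sorted, the fields whose type differs from the parent's."""
    effective = parent_type if parent_type is not None else ASSUMED_DEFAULT_TYPE
    return sorted(name for name, ftype in fields.items() if ftype != effective)


print(specializing_fields(EXAMPLE_FIELDS))
# ['counts', 'temperature']
```

If such a default did apply, every non-NX_CHAR field would land in this list, which is the "majority of special cases" situation described above.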

The intention of adding the identifier in this PR is noble, though IMHO it demands edits.

Indeed, identifierNAME exists, but strictly it offers an identifier only for the parent group, not for fields. Groups under NXobject automatically inherit via a chain from NXobject, so for these there is again an identifierNAME; nx:attributeType ATTRIBUTENAME, though, will not.

What would indeed have beneficial and far-reaching consequences is editing this PR such that an nx:attributeType, e.g. identifierNAME, is added to each instance of FIELDNAME. This could then equip each and every concept in NeXus with the possibility envisaged here of assigning a GUPRI (globally unique persistent resource identifier).

There is a catch though: at least in the HDF5 data model there are no attributes of an attribute. NeXus is not HDF5 alone, so for other serialization formats such a sub-graph of attributes could be possible, but not for HDF5, currently the most frequently used serialization container format. Exactly because attributes of attributes do not exist, identifierNAME was designed as a field with attributes that detail the type of the identifier. Also, the concept was named identifierNAME to enable dropping NAME (see nameType=partial), so that each group can again have an identifier.
This achievement is useful for research data management with NeXus, which is why FAIRmat proposed it:
an instance of an nx:groupType NXsample thereby gets an identifier "for free", i.e. implicitly.
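
A minimal sketch of this constraint, assuming a toy in-memory model rather than a real HDF5 library: attributes are flat name-value pairs and cannot carry children, so the identifier lives in a field whose own attribute carries the type. The class names, the `type` attribute label "URL", and the example value are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """A dataset-like node: a value plus flat attributes (no nesting)."""
    value: object
    attrs: dict = field(default_factory=dict)

    def set_attr(self, name, value):
        # An attribute value must be plain data; it cannot own further
        # attributes or nodes, mirroring the HDF5 restriction.
        if isinstance(value, (dict, Field, Group)):
            raise TypeError("attributes cannot have children")
        self.attrs[name] = value


@dataclass
class Group:
    """A group node holding named fields."""
    nx_class: str
    fields: dict = field(default_factory=dict)


sample = Group("NXsample")
ident = Field("https://example.org/samples/42")  # hypothetical identifier
ident.set_attr("type", "URL")  # hypothetical type label
sample.fields["identifier"] = ident  # identifierNAME with NAME dropped
```

Because the identifier is a field, its type can sit one level below it as an attribute; the same nesting is not available once the identifier itself is an attribute.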

What one could indeed think about is adding {FIELDNAME || GROUPNAME}_thinkaboutseveralones, e.g. FIELDNAME_gupri. This would achieve what is desired, with far-reaching consequences.
Each field could then be connected to a persistent identifier, or one could similarly point to other semantic concepts.

One should then also think about the qualifiers in SSSOM that state how precisely (equivalent, narrow, or broad) the concept resolved by the gupri matches FIELDNAME in the NeXus realm. All these qualifiers would blow up NXobject, but with a benefit: exactly as was already exercised for FIELDNAME_set and FIELDNAME_errors, they would allow injecting further connections to semantic artifacts that proliferate through inheritance, as said above.

FYI: @lukaspie (completely by chance and with magic timing, one such PR has already been proposed here; it raises exactly the technical questions that can be beneficial for NeXus users who seek to connect to concepts from "other ontologies"), @sanbrock

@g-guenther
Author

@mkuehbach Many thanks for the feedback! My intention was indeed to establish a connection to other knowledge representations mainly for terms that are not defined in NeXus, as explained in #1569. Since I was reading the newly developed approach of identifiers in NXobject, I thought this would be an appropriate way.

Even if this PR is the wrong approach, I think it would be very useful for the NeXus standard to address the questions that you raised: how to deal with identifiers for all NeXus objects (groups, fields, attributes) in a consistent way. Of course, HDF5 imposes a limit because there are no attributes of attributes. I tried to solve this with the _identifier suffix in @units_identifier, which, if required, could be generalized to an attribute @ATTRIBUTENAME_identifier.

The FIELDNAME field could also help to define the units attribute with NXDL. So far, the units attribute is only defined in a paragraph of the manual.

Are we 100% sure that, in the aforementioned case where the XML parameter "type" is absent, no default type = NX_CHAR kicks in? That would cause each field in every base class that is not of type NX_CHAR to automatically specialize as well.

Maybe you can leave the type explicitly undefined, e.g. an NX_ANYTYPE that is something similar to None in Python. However, predefined NeXus fields would still be specializations as their type will override the NX_ANYTYPE.

What one could indeed think about is adding {FIELDNAME || GROUPNAME}_thinkaboutseveralones, e.g. FIELDNAME_gupri. This would achieve what is desired, with far-reaching consequences. Each field could then be connected to a persistent identifier, or one could similarly point to other semantic concepts.

I chose the attribute to make the connection between the identifier and the field it defines as close (and unambiguous) as possible. However, a separate field would also be possible.

@g-guenther
Author

g-guenther commented Jul 24, 2025

To record comments from the meeting on July 16:

@mkuehbach
Contributor

mkuehbach commented Oct 20, 2025

A few more thoughts, also relevant for issue #1573:

  • We should distinguish identifiers for semantic concepts (T-box) from identifiers for their instances (A-box); both profit from a clean handling of identifiers. An example: NXsample is essentially a class used to model a concept for which there can be instance data. Having an identifier for NXsample can be useful for stating that this concept has some relation to another semantic concept (e.g. the concept for a sample in another ontology), which then applies implicitly to all instances. The value of identifier could then be the resolvable PID, e.g. the purl, of that other ontology's concept. In contrast, the real physical object, call it MySampleA, can use identifier to resolve an ID in a given namespace, be this a sample registered in some global sample registry or in some lab-local database.

  • We observe that an identifier may consist of two parts: a namespace and resolver service, typically given via the first part of a URI, which makes the identifier resolvable, and the actual unique ID part.

  • The current choice with identifierNAME reflects that there are cases where the first part of a URI is provided, like a purl, but there are also cases where it isn't, like ISSN-L or ISBN. Exactly this second case is why an attribute type is currently required for identifierNAME: if just the ID part is given, it is tricky to know the namespace. (Is it some local thingy, or an ISSN-L?)

  • However, this demand for two parts, an explicit namespace and an ID part, makes it tricky to extend identifiers to NeXus fieldType and especially attributeType instances. Specifically, in the HDF5 data model, attributes do not have children.

  • So compressing both parts into one value to avoid the need for a type attribute is problematic.

  • One could use reserved suffixes to resolve the situation. An example: for a fieldType, have temperature_value and temperature_unit, and then attributeTypes for each, most importantly for _unit, to state that this unit is, e.g., equivalent to a specific unit in, e.g., the qudt ontology.
    The reason why this _value, _unit approach is poor, though, is that the best-practice recommendation in HDF5 is that units be attributes of fields, and we would typically double the number of fields if we turned the current units attributes into fields. All fine, but the downside is now that we cannot make type an attribute of identifierNAME.
    Exactly this nesting would be good to have, though, so that NeXus attributeTypes can also have an identifierNAME.

  • Now for fieldTypes that are enums, e.g. emitter_type = "thermionic": I see value in allowing, instead of a string, a PID of a concept defined in another ontology that specifies something equivalent to a thermionic emitter.

  • Let's think about moving all the essence of identifierNAME to attributes. Spelling it out:

  • For groupType:

sample/@NX_class = "NXsample"
sample/@PREFIXidentifier_value = "purl ..."  (maybe drop the _value?)
sample/@PREFIXidentifier_type = "purl ..."
  • For fieldType:
sample/volume = "1."
sample/volume/@PREFIXidentifier_value = "https://w3id.org/emmo#EMMO_f1a51559_aa3d_43a0_9327_918039f0dfed"

Possibly, instead of the literal identifier_value, one might even wish to qualify the extent of semantic mapping similarity,
e.g. `sample/volume/@is_equivalent_to = "https://w3id.org/emmo#EMMO_f1a51559_aa3d_43a0_9327_918039f0dfed"`

  • For attributeType:
sample/volume/@units = "m^3"
sample/volume/@units_identifier_value = "[qudt ...](https://qudt.org/vocab/unit/M)" 
# sample/@units_identifier_type = "qudt"  # optional because identifier_value is descriptive enough to be auto-resolvable given there is an URI with namespace part

To make the concept names less verbose, one could abbreviate identifier to id.
This proposal can cope with the inability of data models like HDF5 to add children to attributes.
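
The flattened layout above can be sketched as a small helper that emits the attribute names for a group, a field, or an attribute. This is only a sketch under assumptions: `PREFIX` is kept as the literal placeholder used above, the function names are hypothetical, and the rule that the type may be omitted when the value is a URI follows the comment on the optional `_type` attribute.

```python
from urllib.parse import urlparse


def is_resolvable(value):
    """True if the value carries its own namespace as an http(s) URI."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def identifier_attrs(value, id_type=None, attribute=None, prefix="PREFIX"):
    """Return flat attribute names and values for one identifier.

    attribute: name of the annotated attribute (e.g. "units"), or None
    when the group or field itself is annotated.
    """
    stem = f"{attribute}_identifier" if attribute else f"{prefix}identifier"
    attrs = {f"{stem}_value": value}
    if id_type is not None:
        attrs[f"{stem}_type"] = id_type
    elif not is_resolvable(value):
        # A bare ID (e.g. an ISBN or ISSN-L) does not reveal its namespace.
        raise ValueError("a bare identifier needs an explicit type")
    return attrs


print(identifier_attrs("https://qudt.org/vocab/unit/M", attribute="units"))
# {'units_identifier_value': 'https://qudt.org/vocab/unit/M'}
```

A bare ID without a type is rejected here on purpose, which is the case that motivated the type attribute of identifierNAME in the first place.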

On the distinction between T-box and A-box, i.e. between the instance MySampleA and sample/@NX_class = NXsample serving as its proxy: this suggests the following for our example fieldType:

sample/name = "resolvable name in the labs database"   # should resolve the A-box
sample/name/@PREFIXidentifier_value = "..."  # should resolve the T-box

Regardless of what will be extended here, best-practice guidance in the manual is required to show users how these different semantic strengthening and cross-referencing options can be used for groups, fields, and attributes.

One more note on enums: here, too, the NXDL.xml could already store PIDs instead of strings like "thermionic". This will likely need discussion, though, as it makes the standard dependent on external definitions. Alternatively, ontology mapping documents could be shipped with NeXus to map, e.g., all values of all NeXus concepts that are enums to exemplar ontologies. The NeXusOntology can assist with that, but it clearly adds an additional level of new responsibilities.
For instance, if "thermionic" is no longer defined in a classical docstring of the fieldType (the current practice), whose job is it to ensure that the semantic mappings are kept up to date, so that enum values like "thermionic" do not remain undefined?
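
A shipped mapping document of the kind mentioned above could be as simple as a lookup table keyed by concept and enum value. The sketch below assumes a hypothetical table with a placeholder PID (no real ontology term is implied) and an invented second enum value; it lists the enum values that have no mapping, which is the maintenance check somebody would have to own.

```python
# Hypothetical mapping document: (concept, enum value) -> PID.
# The PID below is a placeholder, not a real ontology term.
ENUM_PID_MAP = {
    ("emitter_type", "thermionic"): "https://example.org/ontology#ThermionicEmitter",
}


def unmapped_enum_values(enums, pid_map):
    """List (concept, value) pairs that lack a PID in the mapping."""
    return sorted(
        (concept, value)
        for concept, values in enums.items()
        for value in values
        if (concept, value) not in pid_map
    )


# "other_emitter" is an invented enum value used only to show a gap.
enums = {"emitter_type": ["thermionic", "other_emitter"]}
print(unmapped_enum_values(enums, ENUM_PID_MAP))
# [('emitter_type', 'other_emitter')]
```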

@g-guenther
Author

g-guenther commented Oct 29, 2025

Thanks for the comments!

there are also cases where it isn't, like ISSN-L or ISBN. Exactly this second case is why an attribute type is currently required

Yes, thanks - I was not aware of that but I fully agree: a type is required.

sample/volume/@units = "m^3"
sample/volume/@units_identifier_value = "[qudt ...](https://qudt.org/vocab/unit/M)" 
sample[/volume]/@units_identifier_type = "qudt" 

Why not apply this solution to field names and groups as well (without the prefix, as @identifier_value and @identifier_type)? This would (i) streamline the implementation, (ii) be closer to the NeXus data model, in which data are arranged closer to each other the more related they are, and (iii) make the file look cleaner, since identifiers are hidden as attributes and only visible when inspecting a field or group more closely.

Concerning the data model of NeXus, the current approach would handle identifiers differently for different types: the identifier of a group would become a child (of fieldType) of this group, those of fieldTypes would become siblings on the same level of the hierarchy, and attribute identifiers would be flattened into identifier_value and identifier_type as two siblings. Of course this would work and would help a lot to annotate data precisely. However, it would make the implementation more complex (less intuitive) and could introduce some ambiguity, since one has to look at different levels depending on the type that is annotated.

Possibly, instead of the literal identifier_value, one might even wish to qualify the extent of semantic mapping similarity,
e.g. `sample/volume/@is_equivalent_to = "https://w3id.org/emmo#EMMO_f1a51559_aa3d_43a0_9327_918039f0dfed"`

I like this approach a lot. Maybe combine it with an identifier by adding an additional sample/volume/@identifier_relation = "sameAs"? This would allow creating qualified references (using the terminology of the FAIR Data Maturity Model).
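
As a rough sketch of how such a qualified reference could be read back, assuming the attribute names `identifier_value` and `identifier_relation` suggested in this comment and a plain dict standing in for the attributes of `sample/volume` (the function name is hypothetical):

```python
def qualified_reference(path, attrs):
    """Turn flat identifier attributes into a (subject, relation, object) triple.

    Returns None when the node carries no identifier; the relation is None
    when the reference is not qualified.
    """
    target = attrs.get("identifier_value")
    if target is None:
        return None
    return (path, attrs.get("identifier_relation"), target)


volume_attrs = {
    "identifier_value": "https://w3id.org/emmo#EMMO_f1a51559_aa3d_43a0_9327_918039f0dfed",
    "identifier_relation": "sameAs",
}
triple = qualified_reference("sample/volume", volume_attrs)
print(triple[1])
# sameAs
```

Keeping value and relation as two sibling attributes on the annotated node means no nesting below an attribute is needed.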

@mkuehbach
Contributor

mkuehbach commented Nov 4, 2025

AreaB TF 2025/11/04

Identifier is a reserved prefix already

identifierFIELDNAME
identifierFIELDNAMEtype
identifierFIELDNAMEisa
identifierATTRIBUTENAME
identifierATTRIBUTENAMEtype
identifierATTRIBUTENAMEisa

@mkuehbach
Contributor

mkuehbach commented Nov 4, 2025

In that same telco:

@g-guenther concerns raised:

  • Forcing a storage layout where identifiers sit in different relations (direct child or sibling), depending on whether a group, field, or attribute (GFA) is annotated, might be confusing. Flattening to the rescue?

@rettigl:

  • Why annotate individual GFA nodes with such an identifier at all?
  • Definitions in NeXus are placed in the docstring of the concept

@mkuehbach:

  • But annotate where?

@lukaspie:

  • Does NeXus allow extending its concept set via other sources, e.g. ontologies?

@g-guenther:

  • Strategy: avoid copying an ever-increasing number of concepts into NeXus; instead, allow linking to concepts described elsewhere, provided those definitions are persistent.

@paulmillar:

  • Observed two discussion threads:
    i) text fields, i.e. NeXus docstrings, are used for pinning the semantics
    behind a concept; these change (e.g. punctuation), so a PID would be valuable to use instead
    ii) building the ontologies and connecting to the existing body of work;
    suggestion: please do not mix these up
  • Pin identifiers to concepts in other sources, e.g. ontologies, for instance via enums, for the concepts that you wish to add to NeXus via application definitions. Application definitions are the way to extend base classes beyond their own set of concepts, which for base classes should have broad and wide usability.
