Adding FIELDNAME field to NXobject #1570
base: main
This PR, if merged, will have far reaching consequences for the NeXus standard. A few initial thoughts:
The reason why NXobject currently contains neither an nx:fieldType FIELDNAME nor an nx:groupType GROUPNAME is that it has so far been assumed that, since NXobject is formulated in NXDL, i.e. uses the grammar defined in nxdl.xsd, you can always add nx:fieldType, nx:groupType, and nx:attributeType instances to any base class. If that were not possible, how would fields, groups, and attributes in NeXus connect at all? Note also that before the Autumn Code Camp 2024 edits, NXobject had hardly any content.
Instances using the semantically clarified nameType="partial", i.e. concepts following the pattern UPPERCASE_lowercase (e.g. FIELDNAME_set), were added to reserve these concept names.
If FIELDNAME were added as proposed here, every field instance in every base class would in almost all cases be a specialization, because since the Autumn Code Camp 2024 there is inheritance for base classes. Since all base classes inherit from NXobject, all concepts that are nx:fieldType instances would be connected to the FIELDNAME proposed here. This PR is therefore at least incomplete in the sense that one needs to think carefully about nx:groupType GROUPNAME as well as nx:attributeType ATTRIBUTENAME.
The idea of adding nx:fieldType FIELDNAME like here proposed is sound.
FIELDNAME has no "type" XML parameter. This is substantiated, but again the documentation needs to be carefully investigated and discussed. In addition, clarification is required on whether "type" and other XML parameters like "dimensions" serve to impose a restriction on the allowed type and dimensions. For example, an nx:fieldType value with type="NX_NUMBER" can be a scalar or a tensor, but if "dimensions" is added the shape is restricted and a scalar may no longer be allowed. Are we 100% sure that, when the XML parameter "type" is absent, no default type NX_CHAR kicks in? That would cause each field in every base class that is not of type NX_CHAR to automatically become a specialization as well.
When almost everything is specialized, I think this reads weird, as special cases should not be the majority IMHO.
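To get a feeling for how many fields would count as specializations, one could scan a base class for fields that deviate from a default. The following is a minimal sketch, not part of the PR: it assumes that an absent "type" falls back to NX_CHAR (the very point questioned above), and `EXAMPLE_NXDL` is a made-up, heavily shortened base class used only for illustration.

```python
import xml.etree.ElementTree as ET

NXDL_NS = "http://definition.nexusformat.org/nxdl/3.1"
# Assumption under discussion in this thread, not a confirmed rule.
ASSUMED_DEFAULT_TYPE = "NX_CHAR"

# Hypothetical base class, invented for this example only.
EXAMPLE_NXDL = """
<definition xmlns="http://definition.nexusformat.org/nxdl/3.1"
            name="NXexample" type="group" extends="NXobject">
  <field name="description"/>
  <field name="temperature" type="NX_FLOAT"/>
  <field name="value" type="NX_NUMBER">
    <dimensions rank="1"><dim index="1" value="n"/></dimensions>
  </field>
</definition>
"""


def fields_overriding_default(nxdl_text, default_type=ASSUMED_DEFAULT_TYPE):
    """List fields whose type differs from the default or that restrict the shape."""
    root = ET.fromstring(nxdl_text.strip())
    overriding = []
    for field in root.iter(f"{{{NXDL_NS}}}field"):
        declared = field.get("type", default_type)
        restricts_shape = field.find(f"{{{NXDL_NS}}}dimensions") is not None
        if declared != default_type or restricts_shape:
            overriding.append((field.get("name"), declared, restricts_shape))
    return overriding


for name, declared, restricts_shape in fields_overriding_default(EXAMPLE_NXDL):
    print(name, declared, "restricts shape" if restricts_shape else "any shape")
```

Under that assumption, only `description` would not be a specialization in this toy class, which illustrates the concern that special cases could become the majority.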
The intention of adding the identifier in this PR is noble, though IMHO it demands edits.
Indeed, identifierNAME exists, but strictly it offers an identifier only for the parent group, not for fields. Groups under NXobject automatically inherit again via a chain from NXobject, so for these there is again an identifierNAME; an nx:attributeType ATTRIBUTENAME, though, will not.
What would indeed have beneficial and far-reaching consequences is editing this PR such that an nx:attributeType, e.g. identifierNAME, is added to each instance of FIELDNAME. This could then equip each and every concept in NeXus with the possibility envisaged here to assign a GUPRI (globally unique persistent resource identifier).
There is a catch though: at least in the HDF5 data model there are no attributes of an attribute. NeXus is not HDF5 alone, so for other serialization formats such a sub-graph of attributes could be possible, but not for HDF5, currently the most frequently used serialization container format. Exactly because attributes of an attribute do not exist, identifierNAME was designed as a field with attributes that detail the types of attributes. Also, the concept was named identifierNAME to enable dropping NAME (see nameType=partial), so each group can again have an identifier.
This achievement is useful for research data management with NeXus, which is why FAIRmat proposed it: an instance of an nx:groupType NXsample thereby gets an identifier "for free", i.e. implicitly.
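The name matching behind identifierNAME and FIELDNAME_set can be sketched roughly as follows. This is a simplified illustration, not the validation logic of any NeXus tool: it assumes that every run of uppercase letters in a concept name acts as a wildcard that may also be empty, and the helper names `partial_name_regex` and `matches_partial` are made up for this example.

```python
import re

# Assumed set of characters allowed in the replaced part of a name.
WILDCARD = r"[A-Za-z0-9_.]*"


def partial_name_regex(concept_name):
    """Turn a concept name like 'FIELDNAME_set' into a compiled regex."""
    # The capturing group keeps the uppercase runs in the split result.
    parts = re.split(r"([A-Z]+)", concept_name)
    pattern = "".join(
        WILDCARD if part.isupper() else re.escape(part)
        for part in parts
        if part
    )
    return re.compile(pattern)


def matches_partial(concept_name, instance_name):
    """Check whether an instance name fits a partially named concept."""
    return partial_name_regex(concept_name).fullmatch(instance_name) is not None


print(matches_partial("identifierNAME", "identifier"))      # NAME dropped
print(matches_partial("FIELDNAME_set", "temperature_set"))  # suffix kept
print(matches_partial("FIELDNAME_set", "temperature"))      # suffix missing
```

With this reading, a plain `identifier` field in any group matches identifierNAME, which is what gives every group an identifier implicitly.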
What one could indeed think about is adding {FIELDNAME || GROUPNAME}_thinkaboutseveralones, e.g. FIELDNAME_gupri. This would achieve what is desired, with far-reaching consequences.
Each field can then be connected to a persistent identifier or one could similarly point to other semantic concepts.
One should then also think about the qualifiers in SSSOM that state how precisely (equivalent, narrow, or broad) the concept resolved by the gupri matches FIELDNAME in the NeXus realm. All these qualifiers would blow up NXobject, but with a benefit: exactly as was already exercised for FIELDNAME_set and FIELDNAME_errors, they would allow injecting further connections to semantic artifacts that propagate through inheritance as said above.
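As a rough illustration of such qualifiers, a mapping record in the spirit of SSSOM could look like the sketch below. The slot names `subject_id`, `predicate_id`, and `object_id` and the SKOS predicates come from SSSOM and SKOS; everything else, including the `Mapping` class, the `exact_only` helper, and the `EX:` identifiers, is a placeholder invented for this example.

```python
from dataclasses import dataclass

# SKOS mapping predicates commonly used as predicate_id in SSSOM.
ALLOWED_PREDICATES = {
    "skos:exactMatch",
    "skos:closeMatch",
    "skos:narrowMatch",
    "skos:broadMatch",
    "skos:relatedMatch",
}


@dataclass(frozen=True)
class Mapping:
    subject_id: str    # the NeXus concept, e.g. a field in a base class
    predicate_id: str  # how precisely the external concept matches
    object_id: str     # the externally resolved concept (placeholder here)

    def __post_init__(self):
        if self.predicate_id not in ALLOWED_PREDICATES:
            raise ValueError(f"unknown mapping predicate: {self.predicate_id}")


def exact_only(mappings):
    """Keep only mappings that claim full equivalence."""
    return [m for m in mappings if m.predicate_id == "skos:exactMatch"]


mappings = [
    Mapping("NXsample/temperature", "skos:exactMatch", "EX:0000001"),
    Mapping("NXsample/temperature", "skos:broadMatch", "EX:0000002"),
]
print(exact_only(mappings))
```

Whether such a qualifier would live in NXobject next to the gupri or in a separate mapping document is exactly the open question raised here.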
FYI: @lukaspie (by chance and with magic timing, a PR has been proposed here that raises exactly the technical questions that can be beneficial for NeXus users who seek to connect to concepts from "other ontologies"), @sanbrock
|
@mkuehbach Many thanks for the feedback! My intention was indeed to establish a connection to other knowledge representations, mainly for terms that are not defined in NeXus, as explained in #1569. Since I was reading the newly developed approach of identifiers in NXobject, I thought this would be an appropriate way. Even if this PR is the wrong approach, I think it would be very useful for the NeXus standard to address the questions that you raised: how to deal with identifiers with respect to all NeXus objects (groups, fields, attributes) in a consistent way. Of course, there is a limitation in HDF5 due to the fact that there are no attributes of attributes. I tried to solve this with the _identifier suffix in @units_identifier, which, if required, could become a general attribute @ATTRIBUTENAME_identifier. The FIELDNAME field could also help to define the units attribute with NXDL. So far, the units attribute is only defined in a paragraph of the manual.
Maybe you can leave the type explicitly undefined, e.g. an NX_ANYTYPE that is something similar to None in Python. However, predefined NeXus fields would still be specializations as their type will override the NX_ANYTYPE.
I have chosen the attribute to make the connection of the identifier to the field it defines as close (and unambiguous) as possible. However, a separate field would also be possible.
|
To record comments from the meeting on July 16:
|
|
A few more thoughts here also relevant for this issue: #1573:
Possibly, instead of the literal identifier_value one might even wish to qualify the extent of semantic mapping similarity,
To make the concept names less verbose one could abbreviate
On the distinction T-box and A-box, i.e. the instance MySampleA and sample/@NX_class = NXsample, serving as its proxy suggests that one for our example fieldType
Regardless of what will be extended here, best-practice guidance in the manual is required to guide users as to how these different semantic strengthening and cross-referencing options can be used for groups, fields, and attributes.
One more note on enums: here, too, the NXDL.xml could already store PIDs instead of strings such as "thermionic". This will likely need discussion though, as it makes the standard dependent on external definitions. Alternatively, ontology mapping documents could be shipped with NeXus to map e.g. all values in all concepts that are enums of NeXus to exemplar ontologies. The NeXusOntology can assist with that, but it clearly adds an additional level of new responsibilities.
|
Thanks for the comments!
Yes, thanks - I was not aware of that but I fully agree: a type is required.
Why not applying this solution for field names and groups as well (without the prefix as
Concerning the data model of NeXus, the current approach would handle identifiers differently for different types: the identifier of a group would become a child (of fieldType) of this group, those of fieldTypes would become siblings on the same level of hierarchy, and attribute identifiers would be flattened into identifiers_value and identifier_types as two siblings. Of course this would work and would help a lot to annotate data precisely. However, it would make the implementation more complex (less intuitive) and could introduce some ambiguity, since one has to look at different levels depending on the type that is annotated.
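The different levels can be made concrete with a small sketch that computes where a reader would have to look for the identifier of a group, a field, or an attribute. It only mirrors the description above; the function `identifier_location`, the default suffix `identifier`, and the example paths are assumptions made for illustration, not agreed names.

```python
def identifier_location(kind, path, suffix="identifier"):
    """Return the path where the identifier of the given object would live.

    kind: "group", "field", or "attribute" (attribute names start with '@').
    """
    parent, _, name = path.rpartition("/")
    if kind == "group":
        # Child of the group itself: one level below the annotated object.
        return f"{path}/{suffix}"
    if kind == "field":
        # Sibling field on the same level as the annotated field.
        return f"{parent}/{name}_{suffix}"
    if kind == "attribute":
        # Sibling attribute on the same object, e.g. @units_identifier.
        return f"{parent}/{name}_{suffix}"
    raise ValueError(f"unknown kind: {kind}")


print(identifier_location("group", "/entry/sample"))
print(identifier_location("field", "/entry/sample/temperature"))
print(identifier_location("attribute", "/entry/sample/temperature/@units"))
```

The group case resolves one level deeper than the other two, which is the source of the ambiguity mentioned above.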
I like this approach a lot, maybe combine it with an identifier by adding an additional |
|
AreaB TF 2025/11/04: Identifier is a reserved prefix already
|
In that same telco, @g-guenther raised concerns:
|
Adds the field 'FIELDNAME' with attributes for data_type, identifier, and units_identifier to the NXobject base class to improve machine-readability of NeXus terms and non-NeXus terms, solving issue #1569. Might also be related to #1335, #1398, and #1440.