How best to represent DICOM lists with missing elements, in FHIR RDF? #141

Closed
dbooth-boston opened this issue Apr 29, 2024 · 15 comments

@dbooth-boston
Contributor

Copying this issue description from Erich Bremer's email.

From: Erich Bremer
Date: Mon, 25 Mar 2024 11:22:24 -0400

Let me re-state the two problems that I feel need to be resolved as far as
I see in having a DICOM RDF for those not part of the original conversation.

For reference, here is a snippet of DICOM compliant JSON:
{
"00020002": { "vr": "UI", "Value": [ "1.2.840.10008.5.1.4.1.1.12.1"]},
"00020003": {"vr": "UI", "Value":
["1.3.12.2.1107.5.4.3.321890.19960124.162922.29"]},
"00020010": { "vr": "UI", "Value": ["1.2.840.10008.1.2.4.50"]},
"00020012": { "vr": "DS", "Value": [ "999.999"]}, ...

  1. DICOM requires the "vr" property:
    https://dicom.nema.org/medical/dicom/current/output/chtml/part18/sect_F.2.3.html
  2. DICOM handles value multiplicity by putting all values in ordered arrays:
    https://dicom.nema.org/medical/dicom/current/output/chtml/part18/sect_F.2.4.html
  3. If an attribute is present but its value is missing (value length is 0),
    DICOM says to leave off the "Value" property, but the rest must be there.
    https://dicom.nema.org/medical/dicom/current/output/chtml/part18/sect_F.2.5.html
  4. For null entries in the arrays, DICOM says put null for that list
    element as the position itself may/may not be important.
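
The four rules above can be checked mechanically. The following is a minimal Python sketch, assuming the DICOM JSON has already been parsed into a dict; the function name `summarize_attribute` and the sample document are hypothetical and not part of any DICOM library.

```python
import json

def summarize_attribute(attr):
    """Summarize one DICOM JSON attribute according to rules 1-4 above."""
    if "vr" not in attr:
        raise ValueError('DICOM JSON requires the "vr" property')  # rule 1
    values = attr.get("Value")  # rule 3: "Value" is absent when value length is 0
    if values is None:
        return {"vr": attr["vr"], "empty": True, "null_positions": []}
    # Rule 4: nulls stay in the array, so the positions of the others are preserved.
    nulls = [i for i, v in enumerate(values) if v is None]
    return {"vr": attr["vr"], "empty": False, "null_positions": nulls}

# Hypothetical sample: one attribute with a null in position 1, one empty attribute.
doc = json.loads(
    '{"00080008": {"vr": "CS", "Value": ["DERIVED", null, "SINGLE A"]},'
    ' "00080020": {"vr": "DA"}}'
)
summaries = {tag: summarize_attribute(a) for tag, a in doc.items()}
```

The null at index 1 is the case that the rest of this issue is about: the RDF representation has to keep "SINGLE A" at index 2.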

Now to RDF ******
a) It's easy to handle 1+2+3 with RDF using blank nodes and RDF Lists:

[ dcm:00020002 [ dcm:vr "UI" ; dcm:Value ("1.2.840.10008.5.1.4.1.1.12.1") ];
dcm:00020003 [ dcm:vr "UI" ; dcm:Value
("1.3.12.2.1107.5.4.3.321890.19960124.162922.29")];
dcm:00020010 [ dcm:vr "UI" ; dcm:Value ("1.2.840.10008.1.2.4.50")];
dcm:00020012 [ dcm:vr "DS" ; dcm:Value ( "999.999")]; ...
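
As a rough illustration of this mapping, here is a Python sketch that renders one parsed DICOM JSON attribute as Turtle text. It assumes all values are strings and contain no nulls (point 4 is not handled here); `turtle_string` and `attribute_to_turtle` are hypothetical helper names, and the `dcm:` prefix is left undeclared as in the examples above.

```python
def turtle_string(s):
    """Quote a Python string as a Turtle string literal."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def attribute_to_turtle(tag, attr):
    """Render one attribute as: dcm:<tag> [ dcm:vr ... ; dcm:Value ( ... ) ]."""
    parts = ["dcm:vr " + turtle_string(attr["vr"])]
    if "Value" in attr:  # point 3: an empty attribute gets no dcm:Value triple
        items = " ".join(turtle_string(v) for v in attr["Value"])
        parts.append("dcm:Value ( " + items + " )")
    return "dcm:" + tag + " [ " + " ; ".join(parts) + " ]"
```

For example, `attribute_to_turtle("00020012", {"vr": "DS", "Value": ["999.999"]})` gives `dcm:00020012 [ dcm:vr "DS" ; dcm:Value ( "999.999" ) ]`.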

The problem arises with 4 - nulls

For 3, we leave off the dcm:Value triple as "null maps to no triple". The
problem is in the RDF Lists.

( "1" "2" "3") is short-hand for:
_:myList rdf:first "1" ;
rdf:rest [ rdf:first "2" ;
rdf:rest [ rdf:first "3" ;
rdf:rest rdf:nil
]
] .

Following "no triple asserted is null", if I wanted to leave the second
element out of the list, I would just remove rdf:first "2". No triplestore
that I know would complain. I can SPARQL using the long list version and I
can write the SPARQL with an optional {_:SecondPosition rdf:first ?second }
or even a minus {}. In this fashion, the positional information is
preserved as needed by elements like
https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.8.11.7.html#table_C.8-74f
or pixel spacing
https://dicom.innolitics.com/ciods/rt-dose/image-plane/00280030 and would
return an unbound value depending on the data.
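
To make the idea concrete, here is a Python sketch that builds the long-form list ladder as plain (subject, predicate, object) tuples and simply skips the rdf:first triple for a missing value. The function name `list_ladder` and the blank node labels are hypothetical, and values are assumed to be strings without embedded quotes.

```python
def list_ladder(values, prefix="b"):
    """Return triples for an RDF list ladder.

    A None entry keeps its list cell (and its rdf:rest link) but gets no
    rdf:first triple, so later positions do not shift.
    """
    triples = []
    for i, value in enumerate(values):
        cell = f"_:{prefix}{i}"
        if value is not None:
            triples.append((cell, "rdf:first", f'"{value}"'))
        # Link to the next cell, or terminate the list with rdf:nil.
        nxt = f"_:{prefix}{i + 1}" if i + 1 < len(values) else "rdf:nil"
        triples.append((cell, "rdf:rest", nxt))
    return triples

ladder = list_ladder(["1", None, "3"])
```

Here `ladder` has five triples instead of six: cell `_:b1` still links `_:b0` to `_:b2`, so "3" remains in the third position.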

This path falls apart as there is no support (an implicit rdf:first must
always be present) for the missing rdf:first triple in the second position
for the shorthand ( "1" "2" "3"). I can put a variable ( "1" ?second "3" ),
but I cannot say the equivalent of ( "1" optional { ?second} "3") or even
( "1" minus { ?second} "3"). JSON-LD will simply remove the second element
if I say ( "1" null "3") with the thought that null is no triple
asserted. But this stance is not the same as removing the rdf:first
triple, it's removing multiple triples [ rdf:first "2" ; rdf:rest[] ] and
then pointing _:first to _:third which takes out the positional information
and changing the meaning of things as DICOM views their data. RDF List is
a container construct with triples that express the various positions and
relations and the associated positional values. What would be wrong (who
would be put out) if we allow something that is already allowable in the
RDF model? Honestly, I feel like RDF is not following its own rules here
and needs to be fixed even with DICOM out of the picture. The same, I
feel, applies to RDF Sequence containers:

ex:mySequence
rdf:type rdf:Seq ;
rdf:_1 "1" ;
rdf:_2 "2" ;
rdf:_3 "3" .

If I omitted rdf:_2, it should just now be:
ex:mySequence
rdf:type rdf:Seq ;
rdf:_1 "1" ;
rdf:_3 "3" .

If the current logic of RDF lists is applied it would become the below
which is not the same:
ex:mySequence
rdf:type rdf:Seq ;
rdf:_1 "1" ;
rdf:_2 "3" .

Keeping the positional triples in RDF Lists and sequences seems to be an
easier fix than introducing some type of null literal or typed
"null"^^xsd:integer but it seems to be an implicit taboo due to the lack of
the support in the syntactic sugar for lists and how JSON-LD compaction
behaves with nulls.
I appreciate and I think I understand the thinking of the elements as part
of the schema, but it seems to be heading to more complex territory. RDF
would allow us to deviate further from the DICOM JSON through using things
like custom data types to reduce the number of triples:
[ dcm:00020002 ("1.2.840.10008.5.1.4.1.1.12.1"^^dcm:UI);
dcm:00020003 ("1.3.12.2.1107.5.4.3.321890.19960124.162922.29" ^^dcm:UI );
dcm:00020010 ("1.2.840.10008.1.2.4.50" ^^dcm:UI );
dcm:00020012 ( "999.999"^^dcm:DS); ...
and even remove lists where DICOM value multiplicity is always 1 but it
makes things a bit more complicated. I think a DICOM RDF needs to be very
much bi-directional and keeping towards the DICOM JSON modeling makes it
more familiar for the people in the DICOM domain. It reduces the tooling
on both sides. Nothing stops a RDF person from making SPARQL update
transforms to mutate the data back and forth to a different design (perhaps
more performant) but I fear great becomes the enemy of the good. "a little
semantics goes a long way" - Erich

@dbooth-boston
Contributor Author

dbooth-boston commented May 1, 2024

To help drive discussion, I'm listing here some options, with pros/cons. Are there others I should add?

Option 1: Omit rdf:first elements from an RDF list ladder

This DICOM JSON:

"Value": [ "bar", null, "foo" ]

would be this in Turtle:

       [ rdf:first "bar" ;
         rdf:rest [  # Note no rdf:first here
                    rdf:rest [ rdf:first "foo" ;
                               rdf:rest rdf:nil
                             ]
                  ]
       ] .

PROS:

  • Compact

CONS:

  • Cannot use concise Turtle list notation ( "bar" ____ "foo" )
  • Some would consider this a malformed RDF list.

Option 2: Use an RDF Sequence with a missing element

This DICOM JSON:

"Value": [ "bar", null, "foo" ]

would be this in Turtle:

    rdf:type rdf:Seq ;
    rdf:_1 "bar" ;
    ###  Note no rdf:_2 triple here
    rdf:_3 "foo" .

PROS:

  • Compact

CONS:

  • Cannot use concise Turtle list notation ( "bar" ____ "foo" )
  • RDF sequences do not seem to be well accepted by the community
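
A small Python sketch of Option 2, using a dict keyed by membership property in place of real triples. The helper names `seq_members` and `seq_to_list` are hypothetical.

```python
def seq_members(values):
    """Map a list to rdf:_n membership properties, skipping None entries."""
    return {f"rdf:_{i}": v for i, v in enumerate(values, start=1) if v is not None}

def seq_to_list(members):
    """Rebuild the positional list; gaps in the numbering become None again."""
    if not members:
        return []
    size = max(int(prop.rsplit("_", 1)[1]) for prop in members)
    return [members.get(f"rdf:_{i}") for i in range(1, size + 1)]

members = seq_members(["bar", None, "foo"])
```

One limitation of this sketch: a trailing null cannot be recovered, because nothing records the intended length of the sequence.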

Option 3: Use a distinguished fhir:null value to represent null.

This DICOM JSON:

"Value": [ "bar", null, "foo" ]

would be this in Turtle:

( "bar" fhir:null "foo" )

PROS:

  • Compact
  • Concise Turtle list syntax can be used.

CONS:

  • If _myList is expected to be all integers or floats, having a fhir:null value in the list might cause processing difficulties.

Option 4: Use explicit fhir:index values, as in FHIR RDF R4

This DICOM JSON:

"Value": [ "bar", null, "foo" ]

would be this in Turtle:

... [  [ fhir:v "bar" ; fhir:index 0 ] ;
       ### Note no element with fhir:index 1
       [ fhir:v "foo" ; fhir:index 2 ] ;

PROS:

  • Easy to query all elements and their indexes

CONS:

  • Less concise
  • No syntactic sugar for it

@JervenBolleman

JervenBolleman commented May 1, 2024

Option 5: Use a blank node of type fhir:null to represent null.

This DICOM JSON:

"Value": [ "bar", null, "foo" ]

would be this in Turtle:

( "bar" [ a fhir:null ] "foo" )

PROS:

  • Compact
  • Concise Turtle list syntax can be used.
  • Allows noting why the thing is null.

CONS:

  • If _myList is expected to be all integers or floats, having a blank node fhir:null value in the list might cause processing difficulties.

Option 5 example [edit by ericP]

<>
	dcm:00080008 [ dcm:vr "CS"; dcm:Value ( "DERIVED" "PRIMARY" [ a fhir:null ] "SINGLE A" )] ;
	dcm:00080030 [ dcm:vr "TM"; dcm:Value ( "131047" )] .
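
A sketch of how a converter might emit this form, in Python. It assumes string values with no embedded quotes; `value_list_to_turtle` is a hypothetical helper, and the term used for null is a parameter because the class name is still under discussion in this thread.

```python
def value_list_to_turtle(values, null_term="[ a fhir:null ]"):
    """Render a DICOM JSON Value array as a Turtle collection.

    None entries become the given null term, so every element keeps
    its position in the list.
    """
    items = [null_term if v is None else '"' + v + '"' for v in values]
    return "( " + " ".join(items) + " )"

image_type = value_list_to_turtle(["DERIVED", "PRIMARY", None, "SINGLE A"])
```

Passing `null_term="fhir:null"` would give the Option 3 form instead.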

Option 5 expanded with type hierarchy for null flavors [edit by ericP]

PREFIX dinull: <https://dicom.nema.org/MEDICAL/Dicom/current/output/chtml/part20/sect_5.3.2.html>
dinull:UNK rdfs:label "Unknown. A proper value is applicable, but is not known." .
dinull:ASKU rdfs:label "Asked, but not known. Information was sought, but not found (e.g., the patient was asked but did not know)." ;
  rdfs:subClassOf dinull:UNK .

<>
	dcm:00080008 [ dcm:vr "CS"; dcm:Value ( "DERIVED" "PRIMARY" [ a dcm:UNK ] "SINGLE A" )] ;
	dcm:00012345 [ dcm:vr "CS"; dcm:Value ( 1 2 [ a dcm:UNK ] 4 [ a dcm:ASKU ] 6 )] ;
	dcm:00080030 [ dcm:vr "TM"; dcm:Value ( "131047" )] .

Option 5a example [edit by DBooth]

<>
	dcm:00080008 [ dcm:vr "CS"; dcm:Value ( "DERIVED" "PRIMARY" [ a dicom:null ] "SINGLE A" )] ;
	dcm:00080030 [ dcm:vr "TM"; dcm:Value ( "131047" )] .

@ebremer

ebremer commented May 2, 2024

Perhaps something more generic than fhir:null like "rdf:null" as the issue seems to be a general RDF List issue than a FHIR issue?

@JervenBolleman

Perhaps something more generic than fhir:null like "rdf:null" as the issue seems to be a general RDF List issue than a FHIR issue?

I don't think so. The question is what is the meaning of null in dicom and is that meaning consistent. I think here in the lists a null value is an existential variable. A variable that might be known in a different graph and/or inferable. So a blank node is the way to go.

The null issue outside of lists of values is larger.

  • dicom:null means a value is missing but could be known somewhere
  • dicom:null means there is no value (owl:Nothing)
  • dicom:null is a mix of the above
  • dicom:null can be subclassed into a missing value plus the reason it was missing in a given context.
    Also, is dicom:null equal to dicom:null, or should the logic be like a floating-point NaN, where two NaNs are not equal? That would be another reason to go for Option 5.
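
The NaN analogy can be shown in a few lines of Python. This is only an illustration of the comparison behaviour, not a model of RDF semantics: the class `NullNode` is a hypothetical stand-in for a fresh blank node per null, as in Option 5.

```python
class NullNode:
    """Stand-in for a blank node such as [ a fhir:null ] (hypothetical name)."""
    # No __eq__ is defined, so equality falls back to object identity.

nan_a, nan_b = float("nan"), float("nan")
null_a, null_b = NullNode(), NullNode()

nans_equal = nan_a == nan_b        # False: NaN never equals NaN
nulls_equal = null_a == null_b     # False: two separate nodes
same_null = null_a == null_a       # True: a node equals itself
```

A single shared IRI such as fhir:null (Option 3) would behave the other way: every null in every list would be the same term.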

I think, looking at the DICOM null flavours, we have a mix here. dicom:NA is an owl:Nothing-like value (which IMHO means that the triple should not be there at all).

If we are talking in part about null flavors we can do the Option 5 and capture this and allow for reasoning and graph merging to resolve this.

@ebremer

ebremer commented May 2, 2024

Option 6: Just use a plain blank node

This DICOM JSON:

"Value": [ 1, null, 2 ]

would be this in Turtle:

( 1 [] 2 )

Pros

  • Simple.
  • Concise Turtle list syntax can be used.
  • Usable outside the world of DICOM/FHIR

Cons

  • JSON-LD compaction will change all literals in the list to blank nodes (with their values and such) when the blank node(s) are detected

@ebremer

ebremer commented May 2, 2024

@JervenBolleman we discussed the DICOM null flavors during the FHIR RDF call today. @ericprud had found and shared that link at an earlier meeting the other month. We discussed whether a generalized approach could be used, as missing values in an rdf:List have use cases outside of DICOM. My original approach was to just omit an rdf:first when no value is known, but @dbooth-boston explained to me that this is considered not "well-formed" and could cause issues. I lean towards a generalized (non-FHIR, non-DICOM) solution, both because JSON-LD compaction would not be changed for anything less than a general solution and to handle use cases outside of FHIR/DICOM. I do like having something that could indicate the "null flavor" for a FHIR/DICOM-specific solution; it adds detail and clarity.

My first thought was to have the DICOM RDF / JSON-LD match the current DICOM JSON, apart from the context, to make acceptance and tooling easier, but nulls in literal lists are problematic.

@ebremer

ebremer commented May 23, 2024

If we followed DICOM XML rather than the DICOM JSON (giving up on a DICOM JSON / JSON-LD match)

DICOM XML (fragment)

	<DicomAttribute keyword="ImageType" tag="00080008" vr="CS">
		<Value number="1">DERIVED</Value>
		<Value number="2">PRIMARY</Value>
		<Value number="3">SINGLE PLANE</Value>
		<Value number="4">SINGLE A</Value>
	</DicomAttribute>
	<DicomAttribute keyword="SOPClassUID" tag="00080016" vr="UI">
		<Value number="1">1.2.840.10008.5.1.4.1.1.12.1</Value>
	</DicomAttribute>
	<DicomAttribute keyword="SOPInstanceUID" tag="00080018" vr="UI">
		<Value number="1">1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6</Value>
	</DicomAttribute>
	<DicomAttribute keyword="StudyDate" tag="00080020" vr="DA">
		<Value number="1">19970422</Value>
	</DicomAttribute>
	<DicomAttribute keyword="StudyTime" tag="00080030" vr="TM">
		<Value number="1">131047</Value>
	</DicomAttribute>

Option 7

RDF Turtle

<>
	dcm:DicomAttribute [ dcm:keyword "ImageType"; dcm:tag "00080008"; dcm:vr "CS";
		dcm:Value [ dcm:number 1; rdf:value "DERIVED" ];
		dcm:Value [ dcm:number 2; rdf:value "PRIMARY" ];
		dcm:Value [ dcm:number 3; rdf:value "SINGLE PLANE" ];
		dcm:Value [ dcm:number 4; rdf:value "SINGLE A" ]] ;

	dcm:DicomAttribute [ dcm:keyword "SOPClassUID"; dcm:tag "00080016"; dcm:vr "UI";
		dcm:Value [ dcm:number 1; rdf:value "1.2.840.10008.5.1.4.1.1.12.1" ]] ;
		
	dcm:DicomAttribute [ dcm:keyword "SOPInstanceUID"; dcm:tag "00080018"; dcm:vr "UI";
		dcm:Value [ dcm:number 1; rdf:value "1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6" ]] ;

	dcm:DicomAttribute [ dcm:keyword "StudyDate"; dcm:tag "00080020"; dcm:vr "DA";
		dcm:Value [ dcm:number 1; rdf:value "19970422" ]] ;
	
	dcm:DicomAttribute [ dcm:keyword "StudyTime"; dcm:tag "00080030"; dcm:vr "TM";
		dcm:Value [ dcm:number 1; rdf:value "131047" ]] ;
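
A sketch of how Option 7 could be produced from the XML, using Python's standard `xml.etree.ElementTree`. The helper names are hypothetical, and the sample fragment, in which the third value is simply absent, is an assumption about how a missing value would appear; it is not taken from the DICOM XML specification.

```python
import xml.etree.ElementTree as ET

def numbered_values(xml_fragment):
    """Return {tag: {number: text}} for each DicomAttribute in the fragment."""
    # Wrap the fragment so the parser sees a single root element.
    root = ET.fromstring("<root>" + xml_fragment + "</root>")
    result = {}
    for attr in root.iter("DicomAttribute"):
        result[attr.get("tag")] = {
            int(v.get("number")): v.text for v in attr.findall("Value")
        }
    return result

def to_option7_turtle(values):
    """Render one attribute's numbered values as dcm:Value blank nodes."""
    return " ;\n".join(
        f'dcm:Value [ dcm:number {n}; rdf:value "{text}" ]'
        for n, text in sorted(values.items())
    )

# Hypothetical fragment with no Value element for position 3.
fragment = """
<DicomAttribute keyword="ImageType" tag="00080008" vr="CS">
  <Value number="1">DERIVED</Value>
  <Value number="2">PRIMARY</Value>
  <Value number="4">SINGLE A</Value>
</DicomAttribute>
"""
image_type = numbered_values(fragment)["00080008"]
```

Because each value carries its own dcm:number, the gap at position 3 needs no null term at all, at the cost of one extra blank node and triple per value.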

Option 8

reduced scaffolding ( use rdf:List, eliminate dcm:number, and move keyword and tag string values to ontology )

<>
	dcm:00080008 [ dcm:vr "CS"; dcm:Value ( "DERIVED" "PRIMARY" dcm:UNK "SINGLE A" )] ;
	dcm:00080016 [ dcm:vr "UI"; dcm:Value ( "1.2.840.10008.5.1.4.1.1.12.1" )] ;
	dcm:00080018 [ dcm:vr "UI"; dcm:Value ( "1.3.12.2.1107.5.4.3.11…030.6" )] ;
	dcm:00080020 [ dcm:vr "DA"; dcm:Value ( "19970422" )] ;
	dcm:00080030 [ dcm:vr "TM"; dcm:Value ( "131047" )] ;

@ebremer

ebremer commented May 23, 2024

And in my "maybe" example:

Option 9a

<>
	dcm:00080008 ( "DERIVED"^^dcm:CS "PRIMARY"^^dcm:CS "UNK"^^dcm:nullFlavor "SINGLE A"^^dcm:CS ) ;
	dcm:00080016 ( "1.2.840.10008.5.1.4.1.1.12.1"^^dcm:UI ) ;
	dcm:00080018 ( "1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6"^^dcm:UI ) ;
	dcm:00080020 ( "19970422"^^dcm:DA ) ;
	dcm:00080030 ( "131047"^^dcm:TM ) ;

or possibly

Option 9b

<>
	dcm:00080008 ( "DERIVED"^^dcm:CS "PRIMARY"^^dcm:CS dcm:UNK "SINGLE A"^^dcm:CS ) ;
	dcm:00080016 ( "1.2.840.10008.5.1.4.1.1.12.1"^^dcm:UI ) ;
	dcm:00080018 ( "1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6"^^dcm:UI ) ;
	dcm:00080020 ( "19970422"^^dcm:DA ) ;
	dcm:00080030 ( "131047"^^dcm:TM ) ;

Option 9c (with EricP nulls)

<>
	dcm:00080008 ( "DERIVED"^^dcm:CS "PRIMARY"^^dcm:CS  [ a dcm:UNK ] "SINGLE A"^^dcm:CS ) ;
	dcm:00080016 ( "1.2.840.10008.5.1.4.1.1.12.1"^^dcm:UI ) ;
	dcm:00080018 ( "1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6"^^dcm:UI ) ;
	dcm:00080020 ( "19970422"^^dcm:DA ) ;
	dcm:00080030 ( "131047"^^dcm:TM ) ;

@w3c w3c deleted a comment from ebremer May 23, 2024
@w3c w3c deleted a comment from dbooth-boston May 23, 2024
@w3c w3c deleted a comment from ebremer May 23, 2024
@dbooth-boston
Contributor Author

dbooth-boston commented Jun 11, 2024

I'd like to get this decided on our next teleconference (this week). Last week's discussion seemed to favor Option 5, but are there particular variants of option 5 that we should consider? For example, should we be recommending one specific null type? If so, specifically what URI? Or should we recommend a null-flavor type hierarchy? If so, specifically what, and what should the top-level null class be? If we can make options as concrete as possible, it will help facilitate our decision-making.

@ebremer

ebremer commented Jun 13, 2024

@dbooth-boston of all of the DICOM null flavors, "NI" described in the spec as "No information. This is the most general and default null flavor." seems to be a good candidate for the root null flavor value with NA, UNK, ASKU, NAV, NASK, MSK, OTH being subclasses of "NI". I like Option 5 (@ericprud edits), as well, as it's closest to DICOM JSON. I also like Option 9 (with @ericprud nulls) being fairly compact. Further, the DICOM string usage could be mapped safely to more performant datatypes with SHEX/SHACL used to enforce the range/structure of values. A mapping/scripts from 5 to 9 (and 9 to 5) for those who want a few less triples?
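
A sketch of what such a hierarchy could look like as a lookup table, in Python. Placing ASKU under UNK follows the rdfs:subClassOf statement in the earlier Option 5 example; putting every other flavor directly under NI is a simplifying assumption for illustration and does not reproduce the full HL7 NullFlavor hierarchy. The names `PARENT` and `is_flavor_of` are hypothetical.

```python
# child -> parent; NI ("No information") is the root and has no parent.
PARENT = {
    "NA": "NI",
    "UNK": "NI",
    "ASKU": "UNK",
    "NAV": "NI",
    "NASK": "NI",
    "MSK": "NI",
    "OTH": "NI",
}

def is_flavor_of(flavor, ancestor):
    """True if `flavor` is `ancestor` or a (transitive) subclass of it."""
    while flavor is not None:
        if flavor == ancestor:
            return True
        flavor = PARENT.get(flavor)
    return False
```

With NI as the top-level class, a consumer that only cares whether a list element is null can test for NI and ignore the specific flavor.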

@gaurav

gaurav commented Jun 13, 2024

Just wanted to add a link to the HL7 NullFlavor code system, which is where the DICOM null flavor descriptions come from I think: https://terminology.hl7.org/CodeSystem-v3-NullFlavor.html

@gaurav

gaurav commented Jun 13, 2024

Proposed 5B:

<>
	dcm:00080008 [ dcm:vr "CS"; dcm:Value ( "DERIVED" "PRIMARY" [ a fhir:null; fhir:nullFlavor "UNK" ] "SINGLE A" )] ;
	dcm:00080030 [ dcm:vr "TM"; dcm:Value ( "131047" )] .

@ericprud
Member

Per today's provocative, last-second assertion that subtyping xsd:string won't reduce utility of SPARQL operator semantics, here are the relevant operators:

| Operator | Type(A) | Type(B) | Function | Result type |
|---|---|---|---|---|
| A = B | xsd:string | xsd:string | op:numeric-equal(fn:compare(STR(A), STR(B)), 0) | xsd:boolean |
| A != B | xsd:string | xsd:string | fn:not(op:numeric-equal(fn:compare(STR(A), STR(B)), 0)) | xsd:boolean |
| A < B | xsd:string | xsd:string | op:numeric-equal(fn:compare(STR(A), STR(B)), -1) | xsd:boolean |
| A > B | xsd:string | xsd:string | op:numeric-equal(fn:compare(STR(A), STR(B)), 1) | xsd:boolean |
| A <= B | xsd:string | xsd:string | fn:not(op:numeric-equal(fn:compare(STR(A), STR(B)), 1)) | xsd:boolean |
| A >= B | xsd:string | xsd:string | fn:not(op:numeric-equal(fn:compare(STR(A), STR(B)), -1)) | xsd:boolean |

-- SPARQL Operator Mapping

So refining an xsd:string to be e.g. "018M"^^dicom:AgeString would mean you couldn't just rely on the default interpretation of a plain quoted string as an xsd:string, e.g.:

opt 5

?s dcm:01234567 [ dcm:Value ?age ]
FILTER (?age = "018M")

opt 9 with specialized types:

?s dcm:01234567 ?age
FILTER (?age = "018M"^^dicom:AgeString)
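
The practical effect can be modelled in a few lines of Python. This is a simplified stand-in for RDF term comparison, not SPARQL itself: as far as I understand the spec, comparing two literals with different unrecognized datatypes raises a type error, which a FILTER treats as no match. The names `Literal`, `same_term` and `same_lexical` are hypothetical.

```python
from typing import NamedTuple

class Literal(NamedTuple):
    lexical: str
    datatype: str = "xsd:string"  # a plain "..." literal is an xsd:string

def same_term(a, b):
    """Term equality: both lexical form and datatype must match."""
    return a == b

def same_lexical(a, b):
    """Comparison on lexical forms only, like STR(?a) = STR(?b)."""
    return a.lexical == b.lexical

plain = Literal("018M")                     # what FILTER (?age = "018M") supplies
typed = Literal("018M", "dicom:AgeString")  # what Option 9 data would contain
```

So a query written against Option 9 data has to either spell out the datatype, as in the second query above, or compare on `STR(?age)`.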

@ebremer

ebremer commented Jun 20, 2024

Option 10 ( dcm:Null bnode, XSD data types, VR types moved to ontology/SHEX/SHACL, moving Lists up to their properties)

<>
	dcm:00080008 ( "DERIVED" "PRIMARY"  [ a dcm:Null ] "SINGLE A" ) ;
	dcm:00080016 ( "1.2.840.10008.5.1.4.1.1.12.1") ;
	dcm:00080018 ( "1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6" ) ;
	dcm:00080020 ( "1997-04-22"^^xsd:date ) ;
	dcm:00080030 ( "13:10:47"^^xsd:time ) ;

11/21/24: Edited by dbooth to use dcm:Null instead of dcm:null.
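
Option 10 types dates and times with XSD datatypes, whose lexical forms (YYYY-MM-DD and hh:mm:ss) differ from the DICOM DA and TM strings (YYYYMMDD and HHMMSS). A converter therefore has to reformat the values. A minimal Python sketch, assuming a full eight-digit DA and a six-digit TM with no fractional seconds; the function names are hypothetical, and shortened or fractional TM values would need extra handling.

```python
from datetime import datetime

def da_to_xsd_date(da):
    """DICOM DA 'YYYYMMDD' -> xsd:date lexical form 'YYYY-MM-DD'."""
    return datetime.strptime(da, "%Y%m%d").date().isoformat()

def tm_to_xsd_time(tm):
    """DICOM TM 'HHMMSS' (no fraction) -> xsd:time lexical form 'hh:mm:ss'."""
    return datetime.strptime(tm, "%H%M%S").time().isoformat()

study_date = da_to_xsd_date("19970422")
study_time = tm_to_xsd_time("131047")
```

Going through `strptime` also rejects malformed input such as "19971340" with a `ValueError`, rather than silently emitting an invalid typed literal.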

@dbooth-boston
Contributor Author

AGREED: Option 10, though we have not yet decided on namespace for dcm: prefix.

See also: #151
