How best to represent DICOM lists with missing elements, in FHIR RDF? #141

dbooth-boston · 2024-04-29T17:18:03Z

Copying this issue description from Erich Bremer's email.

From: Erich Bremer
Date: Mon, 25 Mar 2024 11:22:24 -0400

Let me re-state the two problems that I feel need to be resolved as far as
I see in having a DICOM RDF for those not part of the original conversation.

For reference, here is a snippet of DICOM compliant JSON:
{
"00020002": { "vr": "UI", "Value": [ "1.2.840.10008.5.1.4.1.1.12.1"]},
"00020003": {"vr": "UI", "Value":
["1.3.12.2.1107.5.4.3.321890.19960124.162922.29"]},
"00020010": { "vr": "UI", "Value": ["1.2.840.10008.1.2.4.50"]},
"00020012": { "vr": "DS", "Value": [ "999.999"]}, ...

1) DICOM requires the "vr" property:
https://dicom.nema.org/medical/dicom/current/output/chtml/part18/sect_F.2.3.html
2) DICOM handles value multiplicity by putting all values in ordered arrays:
https://dicom.nema.org/medical/dicom/current/output/chtml/part18/sect_F.2.4.html
3) If an attribute is present but its value is missing (value length is 0), DICOM
says to leave off the "Value" property, but the rest must be there.
https://dicom.nema.org/medical/dicom/current/output/chtml/part18/sect_F.2.5.html
4) For null entries in the arrays, DICOM says to put null for that list
element, as the position itself may or may not be important.
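
To make the two "missing" cases concrete, here is a minimal Python sketch using only the standard library. The fragment and its values are hypothetical (not taken from a real DICOM object), and `describe` is an illustrative helper name. It shows that an attribute without "Value" and a null inside "Value" are different situations, and that the null keeps its position.

```python
import json

# Hypothetical DICOM JSON fragment; the values are made up for illustration.
DICOM_JSON = """
{
  "00280030": {"vr": "DS", "Value": [0.5, null]},
  "00100020": {"vr": "LO"}
}
"""

def describe(attribute):
    """Return the values of one DICOM JSON attribute, keeping null positions."""
    if "Value" not in attribute:
        return None  # attribute present, but value length is 0
    # JSON null is parsed as Python None and stays at its index.
    return list(attribute["Value"])

dataset = json.loads(DICOM_JSON)
pixel_spacing = describe(dataset["00280030"])  # second position is null
patient_id = describe(dataset["00100020"])     # no "Value" property at all
```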

Now to RDF ******
a) It's easy to handle 1+2+3 with RDF using blank nodes and RDF Lists:

[ dcm:00020002 [ dcm:vr "UI" ; dcm:Value ("1.2.840.10008.5.1.4.1.1.12.1") ];
dcm:00020003 [ dcm:vr "UI" ; dcm:Value
("1.3.12.2.1107.5.4.3.321890.19960124.162922.29")];
dcm:00020010 [ dcm:vr "UI" ; dcm:Value ("1.2.840.10008.1.2.4.50")];
dcm:00020012 [ dcm:vr "DS" ; dcm:Value ( "999.999")]; ...

The problem arises with 4 - nulls

For 3, we leave off the dcm:Value triple as "null maps to no triple". The
problem is in the RDF Lists.

( "1" "2" "3") is short-hand for:
_:myList rdf:first "1" ;
rdf:rest [ rdf:first "2" ;
rdf:rest [ rdf:first "3" ;
rdf:rest rdf:nil
]
] .

Following "no triple asserted is null", if I wanted to leave the second
element out of the list, I would just remove rdf:first "2". No triplestore
that I know would complain. I can SPARQL using the long list version and I
can write the SPARQL with an optional {_:SecondPosition rdf:first ?second }
or even a minus {}. In this fashion, the positional information is
preserved as needed by elements like
https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.8.11.7.html#table_C.8-74f
or pixel spacing
https://dicom.innolitics.com/ciods/rt-dose/image-plane/00280030 and would
return an unbound value depending on the data.
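
As a rough illustration of the approach described above, the following Python sketch models triples as plain tuples (no RDF library is assumed). `to_ladder` and `from_ladder` are hypothetical helper names. A null produces a list cell with rdf:rest but no rdf:first, and walking the rdf:rest links recovers every position.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def to_ladder(values):
    """Build (subject, predicate, object) triples; None yields no rdf:first."""
    nodes = [f"_:n{i}" for i in range(len(values))]
    triples = []
    for i, value in enumerate(values):
        if value is not None:
            triples.append((nodes[i], RDF_FIRST, value))
        # Every cell keeps its rdf:rest link, so the position survives.
        rest = nodes[i + 1] if i + 1 < len(values) else RDF_NIL
        triples.append((nodes[i], RDF_REST, rest))
    head = nodes[0] if nodes else RDF_NIL
    return head, triples

def from_ladder(head, triples):
    """Walk the rdf:rest links, returning None where rdf:first is absent."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    values, node = [], head
    while node != RDF_NIL:
        values.append(first.get(node))  # like an OPTIONAL in SPARQL
        node = rest[node]
    return values

head, triples = to_ladder(["1", None, "3"])
```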

This path falls apart because the shorthand ( "1" "2" "3" ) has no way to
express a missing rdf:first triple in the second position (an rdf:first is
always implicitly present). I can put a variable, ( "1" ?second "3" ),
but I cannot say the equivalent of ( "1" optional { ?second } "3" ) or even
( "1" minus { ?second } "3" ). JSON-LD will simply remove the second element
if I say ( "1" null "3" ), with the thought that null means no triple
asserted. But this is not the same as removing the rdf:first
triple; it removes multiple triples [ rdf:first "2" ; rdf:rest [] ] and
then points _:first to _:third, which takes out the positional information
and changes the meaning of the data as DICOM views it. RDF List is
a container construct with triples that express the various positions and
relations and the associated positional values. What would be wrong (who
would be put out) if we allow something that is already allowable in the
RDF model? Honestly, I feel like RDF is not following its own rules here
and needs to be fixed even with DICOM out of the picture. The same, I
feel, applies to RDF Sequence containers:

ex:mySequence
rdf:type rdf:Seq ;
rdf:_1 "1" ;
rdf:_2 "2" ;
rdf:_3 "3" .

If I omitted rdf:_2, it should just now be:
ex:mySequence
rdf:type rdf:Seq ;
rdf:_1 "1" ;
rdf:_3 "3" .

If the current logic of RDF lists is applied it would become the below
which is not the same:
ex:mySequence
rdf:type rdf:Seq ;
rdf:_1 "1" ;
rdf:_2 "3" .
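
The difference between the two outcomes can be sketched in a few lines of Python. The membership properties are modeled as dictionary keys, and both function names are hypothetical.

```python
def seq_with_gaps(values):
    """Number first, then skip None: rdf:_n keeps the original position."""
    return {f"rdf:_{n}": v for n, v in enumerate(values, start=1) if v is not None}

def seq_compacted(values):
    """Drop None first, then number: later members shift down a position."""
    kept = [v for v in values if v is not None]
    return {f"rdf:_{n}": v for n, v in enumerate(kept, start=1)}

gaps = seq_with_gaps(["1", None, "3"])
shifted = seq_compacted(["1", None, "3"])
```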

Keeping the positional triples in RDF Lists and sequences seems to be an
easier fix than introducing some type of null literal or typed
"null"^^xsd:integer but it seems to be an implicit taboo due to the lack of
the support in the syntactic sugar for lists and how JSON-LD compaction
behaves with nulls.
I appreciate and I think I understand the thinking of the elements as part
of the schema, but it seems to be heading to more complex territory. RDF
would allow us to deviate further from the DICOM JSON through using things
like custom data types to reduce the number of triples:
[ dcm:00020002 ( "1.2.840.10008.5.1.4.1.1.12.1"^^dcm:UI ) ;
dcm:00020003 ( "1.3.12.2.1107.5.4.3.321890.19960124.162922.29"^^dcm:UI ) ;
dcm:00020010 ( "1.2.840.10008.1.2.4.50"^^dcm:UI ) ;
dcm:00020012 ( "999.999"^^dcm:DS ) ; ...
and even remove lists where DICOM value multiplicity is always 1 but it
makes things a bit more complicated. I think a DICOM RDF needs to be very
much bi-directional and keeping towards the DICOM JSON modeling makes it
more familiar for the people in the DICOM domain. It reduces the tooling
on both sides. Nothing stops an RDF person from making SPARQL update
transforms to mutate the data back and forth to a different design (perhaps
more performant) but I fear great becomes the enemy of the good. "a little
semantics goes a long way" - Erich

dbooth-boston · 2024-05-01T19:17:06Z

To help drive discussion, I'm listing here some options, with pros/cons. Are there others I should add?

Option 1: Omit rdf:first elements from an RDF list ladder

This DICOM JSON:

"Value": [ "bar", null, "foo" ]

would be this in Turtle:

       [ rdf:first "bar" ;
         rdf:rest [  # Note no rdf:first here
                    rdf:rest [ rdf:first "foo" ;
                               rdf:rest rdf:nil
                             ]
                  ]
       ] .

PROS:

Compact

CONS:

Cannot use concise Turtle list notation ( "bar" ____ "foo" )
Some would consider this a malformed RDF list.

Option 2: Use an RDF Sequence with a missing element

This DICOM JSON:

"Value": [ "bar", null, "foo" ]

would be this in Turtle:

    rdf:type rdf:Seq ;
    rdf:_1 "bar" ;
    ###  Note no rdf:_2 triple here
    rdf:_3 "foo" .

PROS:

Compact

CONS:

Cannot use concise Turtle list notation ( "bar" ____ "foo" )
RDF sequences do not seem to be well accepted by the community

Option 3: Use a distinguished fhir:null value to represent null.

This DICOM JSON:

"Value": [ "bar", null, "foo" ]

would be this in Turtle:

( "bar" fhir:null "foo" )

PROS:

Compact
Concise Turtle list syntax can be used.

CONS:

If _myList is expected to be all integers or floats, having a fhir:null value in the list might cause processing difficulties.

Option 4: Use explicit fhir:index values, as in FHIR RDF R4

This DICOM JSON:

"Value": [ "bar", null, "foo" ]

would be this in Turtle:

... [  [ fhir:v "bar" ; fhir:index 0 ] ;
       ### Note no element with fhir:index 1
       [ fhir:v "foo" ; fhir:index 2 ] ;

PROS:

Easy to query all elements and their indexes

CONS:

Less concise
No syntactic sugar for it
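
A small Python sketch of the explicit-index idea, with hypothetical helper names. It also surfaces one consequence worth weighing: because nulls leave no entry, the original length has to be known from somewhere else if the array ends in a null.

```python
def to_indexed(values):
    """Emit (value, index) pairs, 0-based as in the example; nulls are skipped."""
    return [(v, i) for i, v in enumerate(values) if v is not None]

def from_indexed(pairs, length):
    """Rebuild the array. `length` must be supplied by the caller, because a
    trailing null leaves no pair from which to infer it."""
    values = [None] * length
    for v, i in pairs:
        values[i] = v
    return values

pairs = to_indexed(["bar", None, "foo"])
```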

JervenBolleman · 2024-05-01T20:34:04Z

Option 5: Use a blank node typed as fhir:null to represent null.

This DICOM JSON:

"Value": [ "bar", null, "foo" ]

would be this in Turtle:

( "bar" [ a fhir:null ] "foo" )

PROS:

Compact
Concise Turtle list syntax can be used.
Allows noting why the thing is null.

CONS:

If _myList is expected to be all integers or floats, having a blank node fhir:null value in the list might cause processing difficulties.

Option 5 example [edit by ericP]

<>
	dcm:00080008 [ dcm:vr "CS"; dcm:Value ( "DERIVED" "PRIMARY" [ a fhir:null ] "SINGLE A" )] ;
	dcm:00080030 [ dcm:vr "TM"; dcm:Value ( "131047" )] .

Option 5 expanded with type hierarchy for null flavors [edit by ericP]

PREFIX dinull: <https://dicom.nema.org/MEDICAL/Dicom/current/output/chtml/part20/sect_5.3.2.html#>
dinull:UNK rdfs:label "Unknown. A proper value is applicable, but is not known." .
dinull:ASKU rdfs:label "Asked, but not known. Information was sought, but not found (e.g., the patient was asked but did not know)." ;
  rdfs:subClassOf dinull:UNK .

<>
	dcm:00080008 [ dcm:vr "CS"; dcm:Value ( "DERIVED" "PRIMARY" [ a dinull:UNK ] "SINGLE A" )] ;
	dcm:00012345 [ dcm:vr "CS"; dcm:Value ( 1 2 [ a dinull:UNK ] 4 [ a dinull:ASKU ] 6 )] ;
	dcm:00080030 [ dcm:vr "TM"; dcm:Value ( "131047" )] .

Option 5a example [edit by DBooth]

<>
	dcm:00080008 [ dcm:vr "CS"; dcm:Value ( "DERIVED" "PRIMARY" [ a dicom:null ] "SINGLE A" )] ;
	dcm:00080030 [ dcm:vr "TM"; dcm:Value ( "131047" )] .

ebremer · 2024-05-02T13:36:04Z

Perhaps something more generic than fhir:null, like "rdf:null", as the issue seems to be more a general RDF List issue than a FHIR issue?

JervenBolleman · 2024-05-02T15:54:10Z

> Perhaps something more generic than fhir:null, like "rdf:null", as the issue seems to be more a general RDF List issue than a FHIR issue?

I don't think so. The question is what null means in DICOM, and whether that meaning is consistent. I think that here, in the lists, a null value is an existential variable: one that might be known in a different graph and/or be inferable. So a blank node is the way to go.

The null issue outside of lists of values is larger.

dicom:null is a value that is missing but could be known somewhere
dicom:null means that there is no value (owl:Nothing)
dicom:null is a mix of the above
dicom:null can be subclassed into a missing value and a reason it was missing in a context.
Also, is dicom:null equal to dicom:null, or should the logic be like floating-point NaN, where the values are not equal? That would be another reason to go for Option 5.

Looking at the DICOM null flavours, I think we have a mix here. dicom:NA is an owl:Nothing-like value (which IMHO means that the triple should not be there at all).

If we are talking in part about null flavors, we can go with Option 5 to capture this, and allow reasoning and graph merging to resolve it.
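
The equality question can be illustrated with a loose Python analogy. `BlankNode` below is a hypothetical stand-in, not an RDF library class. Two nulls written as separate blank nodes share a type but are different nodes, so they do not compare equal to each other. Unlike NaN, though, a blank node is still equal to itself.

```python
nan = float("nan")
nan_equal = (nan == nan)  # IEEE 754: NaN is not equal even to itself

class BlankNode:
    """Stand-in for an RDF blank node: equal only to itself."""
    def __init__(self, rdf_type):
        self.rdf_type = rdf_type

# Two occurrences of [ a fhir:null ] create two distinct nodes.
a = BlankNode("fhir:null")
b = BlankNode("fhir:null")

same_type = (a.rdf_type == b.rdf_type)
same_node = (a == b)  # default Python equality is identity
```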

ebremer · 2024-05-02T18:31:59Z

Option 6: Just use a plain blank node

This DICOM JSON:

"Value": [ 1, null, 2 ]

would be this in Turtle:

( 1 [] 2 )

Pros

Simple.
Concise Turtle list syntax can be used.
Usable outside the world of DICOM/FHIR

Cons

JSON-LD compaction will change all literals in the list to blank nodes (with their values and such) when the blank node(s) are detected

ebremer · 2024-05-02T18:51:23Z

@JervenBolleman we discussed the DICOM null flavors during the FHIR RDF call today. @ericprud had found and shared that link at an earlier meeting the other month. We discussed whether a generalized approach could be used, as missing values in an rdf:List have use cases outside of DICOM. My original approach was to just omit an rdf:first when no value is known, but @dbooth-boston explained to me that this is not considered "well-formed" and could cause issues. I lean towards a generalized (non-FHIR, non-DICOM) solution, both because JSON-LD compaction would not be changed for anything less than a general solution and to handle use cases outside of FHIR/DICOM. I do like having something that could indicate the "null flavor" for a FHIR/DICOM-specific solution; it adds detail and clarity.

My first thought was to have the DICOM RDF / JSON-LD match the current DICOM JSON, save for the context, to make acceptance and tooling easier, but nulls in literal lists are problematic.

ebremer · 2024-05-23T14:16:04Z

If we followed DICOM XML rather than the DICOM JSON (giving up on a DICOM JSON / JSON-LD match)

DICOM XML (fragment)

	<DicomAttribute keyword="ImageType" tag="00080008" vr="CS">
		<Value number="1">DERIVED</Value>
		<Value number="2">PRIMARY</Value>
		<Value number="3">SINGLE PLANE</Value>
		<Value number="4">SINGLE A</Value>
	</DicomAttribute>
	<DicomAttribute keyword="SOPClassUID" tag="00080016" vr="UI">
		<Value number="1">1.2.840.10008.5.1.4.1.1.12.1</Value>
	</DicomAttribute>
	<DicomAttribute keyword="SOPInstanceUID" tag="00080018" vr="UI">
		<Value number="1">1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6</Value>
	</DicomAttribute>
	<DicomAttribute keyword="StudyDate" tag="00080020" vr="DA">
		<Value number="1">19970422</Value>
	</DicomAttribute>
	<DicomAttribute keyword="StudyTime" tag="00080030" vr="TM">
		<Value number="1">131047</Value>
	</DicomAttribute>

Option 7

RDF Turtle

<>
	dcm:DicomAttribute [ dcm:keyword "ImageType"; dcm:tag "00080008"; dcm:vr "CS";
		dcm:Value [ dcm:number 1; rdf:value "DERIVED" ];
		dcm:Value [ dcm:number 2; rdf:value "PRIMARY" ];
		dcm:Value [ dcm:number 3; rdf:value "SINGLE PLANE" ];
		dcm:Value [ dcm:number 4; rdf:value "SINGLE A" ]] ;

	dcm:DicomAttribute [ dcm:keyword "SOPClassUID"; dcm:tag "00080016"; dcm:vr "UI";
		dcm:Value [ dcm:number 1; rdf:value "1.2.840.10008.5.1.4.1.1.12.1" ]] ;
		
	dcm:DicomAttribute [ dcm:keyword "SOPInstanceUID"; dcm:tag "00080018"; dcm:vr "UI";
		dcm:Value [ dcm:number 1; rdf:value "1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6" ]] ;

	dcm:DicomAttribute [ dcm:keyword "StudyDate"; dcm:tag "00080020"; dcm:vr "DA";
		dcm:Value [ dcm:number 1; rdf:value "19970422" ]] ;
	
	dcm:DicomAttribute [ dcm:keyword "StudyTime"; dcm:tag "00080030"; dcm:vr "TM";
		dcm:Value [ dcm:number 1; rdf:value "131047" ]] ;

Option 8

reduced scaffolding ( use rdf:List, eliminate dcm:number, and move keyword and tag string values to ontology )

<>
	dcm:00080008 [ dcm:vr "CS"; dcm:Value ( "DERIVED" "PRIMARY" dcm:UNK "SINGLE A" )] ;
	dcm:00080016 [ dcm:vr "UI"; dcm:Value ( "1.2.840.10008.5.1.4.1.1.12.1" )] ;
	dcm:00080018 [ dcm:vr "UI"; dcm:Value ( "1.3.12.2.1107.5.4.3.11…030.6" )] ;
	dcm:00080020 [ dcm:vr "DA"; dcm:Value ( "19970422" )] ;
	dcm:00080030 [ dcm:vr "TM"; dcm:Value ( "131047" )] ;

ebremer · 2024-05-23T14:40:22Z

And in my "maybe" example:

Option 9a

<>
	dcm:00080008 ( "DERIVED"^^dcm:CS "PRIMARY"^^dcm:CS "UNK"^^dcm:nullFlavor "SINGLE A"^^dcm:CS ) ;
	dcm:00080016 ( "1.2.840.10008.5.1.4.1.1.12.1"^^dcm:UI ) ;
	dcm:00080018 ( "1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6"^^dcm:UI ) ;
	dcm:00080020 ( "19970422"^^dcm:DA ) ;
	dcm:00080030 ( "131047"^^dcm:TM ) ;

or possibly

Option 9b

<>
	dcm:00080008 ( "DERIVED"^^dcm:CS "PRIMARY"^^dcm:CS dcm:UNK "SINGLE A"^^dcm:CS ) ;
	dcm:00080016 ( "1.2.840.10008.5.1.4.1.1.12.1"^^dcm:UI ) ;
	dcm:00080018 ( "1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6"^^dcm:UI ) ;
	dcm:00080020 ( "19970422"^^dcm:DA ) ;
	dcm:00080030 ( "131047"^^dcm:TM ) ;

Option 9c (with EricP nulls)

<>
	dcm:00080008 ( "DERIVED"^^dcm:CS "PRIMARY"^^dcm:CS  [ a dcm:UNK ] "SINGLE A"^^dcm:CS ) ;
	dcm:00080016 ( "1.2.840.10008.5.1.4.1.1.12.1"^^dcm:UI ) ;
	dcm:00080018 ( "1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6"^^dcm:UI ) ;
	dcm:00080020 ( "19970422"^^dcm:DA ) ;
	dcm:00080030 ( "131047"^^dcm:TM ) ;

dbooth-boston · 2024-06-11T15:44:24Z

I'd like to get this decided on our next teleconference (this week). Last week's discussion seemed to favor Option 5, but are there particular variants of option 5 that we should consider? For example, should we be recommending one specific null type? If so, specifically what URI? Or should we recommend a null-flavor type hierarchy? If so, specifically what, and what should the top-level null class be? If we can make options as concrete as possible, it will help facilitate our decision-making.

ebremer · 2024-06-13T11:47:30Z

@dbooth-boston of all of the DICOM null flavors, "NI", described in the spec as "No information. This is the most general and default null flavor.", seems to be a good candidate for the root null flavor, with NA, UNK, ASKU, NAV, NASK, MSK, and OTH being subclasses of "NI". I like Option 5 (with @ericprud's edits) as well, as it's closest to DICOM JSON. I also like Option 9 (with @ericprud's nulls) for being fairly compact. Further, the DICOM string usage could be mapped safely to more performant datatypes, with ShEx/SHACL used to enforce the range/structure of values. Perhaps mapping scripts from 5 to 9 (and 9 to 5) for those who want a few fewer triples?
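
A sketch of how such a hierarchy could be checked in Python. The parent map is an assumption for illustration: only "ASKU under UNK" and "NI as root" come from this thread, and the other flavors are simply placed directly under NI. `is_a` is a hypothetical helper.

```python
# Assumed hierarchy: NI is the root, and ASKU sits under UNK as in the
# Option 5 example. The remaining direct parents are placeholders.
PARENT = {
    "NA": "NI", "UNK": "NI", "NAV": "NI",
    "NASK": "NI", "MSK": "NI", "OTH": "NI",
    "ASKU": "UNK",
}

def is_a(flavor, ancestor):
    """True if `flavor` equals `ancestor` or is a (transitive) subclass of it."""
    while flavor is not None:
        if flavor == ancestor:
            return True
        flavor = PARENT.get(flavor)  # None once we step past the root
    return False
```

With this shape, a query for "any null" only needs to test against the root flavor, while more specific flavors remain distinguishable.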

gaurav · 2024-06-13T15:08:19Z

Just wanted to add a link to the HL7 NullFlavor code system, which is where the DICOM null flavor descriptions come from I think: https://terminology.hl7.org/CodeSystem-v3-NullFlavor.html

gaurav · 2024-06-13T15:18:09Z

Proposed 5B:

<>
	dcm:00080008 [ dcm:vr "CS"; dcm:Value ( "DERIVED" "PRIMARY" [ a fhir:null; fhir:nullFlavor "UNK" ] "SINGLE A" )] ;
	dcm:00080030 [ dcm:vr "TM"; dcm:Value ( "131047" )] .

ericprud · 2024-06-13T16:23:32Z

Per today's provocative, last-second assertion that subtyping xsd:string won't reduce utility of SPARQL operator semantics, here are the relevant operators:

Operator	Type(A)	Type(B)	Function	Result type
A = B	xsd:string	xsd:string	op:numeric-equal(fn:compare(STR(A), STR(B)), 0)	xsd:boolean
A != B	xsd:string	xsd:string	fn:not(op:numeric-equal(fn:compare(STR(A), STR(B)), 0))	xsd:boolean
A < B	xsd:string	xsd:string	op:numeric-equal(fn:compare(STR(A), STR(B)), -1)	xsd:boolean
A > B	xsd:string	xsd:string	op:numeric-equal(fn:compare(STR(A), STR(B)), 1)	xsd:boolean
A <= B	xsd:string	xsd:string	fn:not(op:numeric-equal(fn:compare(STR(A), STR(B)), 1))	xsd:boolean
A >= B	xsd:string	xsd:string	fn:not(op:numeric-equal(fn:compare(STR(A), STR(B)), -1))	xsd:boolean

-- SPARQL Operator Mapping

So refining an xsd:string to be e.g. "018M"^^dicom:AgeString would mean you couldn't just rely on the default interpretation of a plain "..." literal as an xsd:string, e.g.:

opt 5

?s dcm:01234567 [ dcm:Value ?age ]
FILTER (?age = "018M")

opt 9 with specialized types:

?s dcm:01234567 ?age
FILTER (?age = "018M"^^dicom:AgeString)
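
The effect on matching can be modeled in Python by treating a literal as a (lexical form, datatype) pair. This is a simplification: it models RDF term equality only, and the `Literal` class here is a hypothetical stand-in rather than any library's type. In SPARQL itself, comparing literals of different unrecognized datatypes raises a type error rather than returning false, but either way the FILTER does not pass.

```python
from typing import NamedTuple

XSD_STRING = "xsd:string"

class Literal(NamedTuple):
    lexical: str
    datatype: str = XSD_STRING  # a plain "..." literal is an xsd:string

def same_term(a, b):
    """RDF term equality: both lexical form and datatype must match."""
    return a.lexical == b.lexical and a.datatype == b.datatype

plain = Literal("018M")                     # what FILTER (?age = "018M") tests
typed = Literal("018M", "dicom:AgeString")  # what the opt 9 data would hold
```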

ebremer · 2024-06-20T18:12:50Z

Option 10 ( dcm:Null bnode, XSD data types, VR types moved to ontology/SHEX/SHACL, moving Lists up to their properties)

<>
	dcm:00080008 ( "DERIVED" "PRIMARY"  [ a dcm:Null ] "SINGLE A" ) ;
	dcm:00080016 ( "1.2.840.10008.5.1.4.1.1.12.1") ;
	dcm:00080018 ( "1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6" ) ;
	dcm:00080020 ( "1997-04-22"^^xsd:date ) ;
	dcm:00080030 ( "13:10:47"^^xsd:time ) ;

11/21/24: Edited by dbooth to use dcm:Null instead of dcm:null.

dbooth-boston · 2024-11-21T16:40:07Z

AGREED: Option 10, though we have not yet decided on namespace for dcm: prefix.
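
For anyone wanting to experiment, here is a minimal Python sketch of a DICOM JSON to Option 10 conversion. Everything in it is an assumption for illustration: the function names are hypothetical, only the DA and TM value representations are mapped to XSD types, string escaping is not handled, and attributes without a "Value" property are not covered. Since xsd:date and xsd:time require the lexical forms YYYY-MM-DD and hh:mm:ss, the sketch reformats DICOM DA and TM strings accordingly.

```python
def da_to_xsd_date(da):
    """DICOM DA 'YYYYMMDD' to the xsd:date lexical form 'YYYY-MM-DD'."""
    return f"{da[0:4]}-{da[4:6]}-{da[6:8]}"

def tm_to_xsd_time(tm):
    """DICOM TM 'HHMMSS' to 'hh:mm:ss' (shorter or fractional forms not handled)."""
    return f"{tm[0:2]}:{tm[2:4]}:{tm[4:6]}"

def term(value, vr):
    """Serialize one list member as a Turtle term."""
    if value is None:
        return "[ a dcm:Null ]"  # null keeps its position as a typed blank node
    if vr == "DA":
        return f'"{da_to_xsd_date(value)}"^^xsd:date'
    if vr == "TM":
        return f'"{tm_to_xsd_time(value)}"^^xsd:time'
    return f'"{value}"'  # everything else stays a plain string in this sketch

def to_option10(tag, attribute):
    """Serialize one DICOM JSON attribute as a predicate with an RDF list."""
    items = " ".join(term(v, attribute["vr"]) for v in attribute["Value"])
    return f"dcm:{tag} ( {items} ) ;"

image_type = to_option10(
    "00080008", {"vr": "CS", "Value": ["DERIVED", "PRIMARY", None, "SINGLE A"]}
)
```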
