RDF4j driver: Literals with datatype xsd:boolean not converted correctly to java types #1337

Open
JervenBolleman opened this issue Jan 10, 2025 · 10 comments

Comments

@JervenBolleman
Contributor

Starting with a linked data view (in case this is important)

I have this native result

./isql-t 9999 dba dba exec="sparql define input:storage virtrdf:uniparc SELECT ?s (datatype(?o) AS ?oDt) ?o  WHERE { GRAPH ?g {?s <http://purl.uniprot.org/core/obsolete> ?o }};"
Initializing PRNG
...
Driver: 07.20.3234 OpenLink Virtuoso ODBC Driver
s                                                                                 oDt                                       o
VARCHAR                                                                           VARCHAR NOT NULL                          LONG VARCHAR
 _______________________________________________________________________________
http://purl.uniprot.org/uniparc/UPI0000000001#MD56329AB26D04CBB442AE0D418712AC2F2  http://www.w3.org/2001/XMLSchema#boolean  true
http://purl.uniprot.org/uniparc/UPI0000000001#MD5531CD9580E20E9C98B967AC7FFB6DB7A  http://www.w3.org/2001/XMLSchema#boolean  true

Running the same query via the RDF4j driver I get

http://purl.uniprot.org/uniparc/UPI0000000001#MD56329AB26D04CBB442AE0D418712AC2F2 http://www.w3.org/2001/XMLSchema#boolean true (http://www.w3.org/2001/XMLSchema#string)
http://purl.uniprot.org/uniparc/UPI0000000001#MD5531CD9580E20E9C98B967AC7FFB6DB7A http://www.w3.org/2001/XMLSchema#boolean true (http://www.w3.org/2001/XMLSchema#string)

The datatype of the literal has fallen back to the default, xsd:string. This might be because the castValue method expects only '0' and '1' as literal values, not 'true' and 'false'.
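For illustration only, here is a minimal sketch of a conversion that accepts all four lexical forms XML Schema defines for xsd:boolean ("true", "false", "1", "0"). The class name `XsdBooleanLexical` and its `parse` method are hypothetical; this is not the driver's castValue code, which is not shown in this thread.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not part of the Virtuoso RDF4J driver. */
final class XsdBooleanLexical {

    private XsdBooleanLexical() {}

    /**
     * Parses an xsd:boolean lexical form. XML Schema allows exactly four:
     * "true", "false", "1" and "0" (case-sensitive). Surrounding whitespace
     * is trimmed. Anything else yields an empty Optional so the caller can
     * decide how to fall back.
     */
    static Optional<Boolean> parse(String lexical) {
        if (lexical == null) {
            return Optional.empty();
        }
        switch (lexical.trim()) {
            case "true":
            case "1":
                return Optional.of(Boolean.TRUE);
            case "false":
            case "0":
                return Optional.of(Boolean.FALSE);
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Print how each candidate lexical form is interpreted.
        for (String s : new String[] {"true", "false", "1", "0", "TRUE", "yes"}) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```

Returning an empty Optional instead of throwing keeps the fallback decision (for example, keeping the value as a plain string) with the caller.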

@HughWilliams
Collaborator

HughWilliams commented Jan 10, 2025

@JervenBolleman

If the triples are inserted manually as physical triples, does the problem occur when querying via RDF4J, as I doubt the problem would be due to the data coming from Linked Data Views? Also, are the Linked Data Views generated as transient/virtual or physical triples?

If you run the query via the Virtuoso SPARQL endpoint, is the data returned as expected?

Do you have sample RDF4J code to show how the query is being executed in RDF4J?

@JervenBolleman
Contributor Author

JervenBolleman commented Jan 10, 2025

The issue is linked data view related.

e.g., inserting a normal triple

./isql-t 9999 dba dba exec="sparql insert data { graph <http://example.org/normal> { <http://example.org/normal/s> <http://purl.uniprot.org/core/obsolete> true} }";

gives

http://example.org/normal/s	http://www.w3.org/2001/XMLSchema#boolean	true (http://www.w3.org/2001/XMLSchema#boolean)

instead of the above.

void test() {
	try (RepositoryConnection conn = managedVirtuoso.getRepository().getConnection()) {
		TupleQuery tq = conn.prepareTupleQuery(
				"SELECT ?s (datatype(?o) AS ?p) ?o  WHERE { GRAPH ?g {?s <http://purl.uniprot.org/core/obsolete> ?o }}");
		try (TupleQueryResult evaluate = tq.evaluate()) {
			while (evaluate.hasNext()) {
				printRes(evaluate.next());
			}
		}
	}
}

private void printRes(BindingSet next) {
		System.out.print(next.getBinding("s").getValue().stringValue());
		System.out.print("\t");
		System.out.print(next.getBinding("p").getValue().stringValue());
		System.out.print("\t");
		Binding ob = next.getBinding("o");
		if (ob == null) {
			System.err.println("missing");
		} else {
			Value object = ob.getValue();
			System.out.print(object.stringValue());
			if (object.isLiteral()) {
				System.out.print(" (" + ((Literal) object).getDatatype() + ")");
			}
			System.out.println();
		}
	}

The isql command line is correct in both cases. Only when the query runs via RDF4J is the datatype not picked up correctly for the linked data view, which uses an SQL INTEGER-to-boolean mapping as in this comment.

@JervenBolleman
Contributor Author

Also are the Linked Data Views generated as transient/virtual or physical triples ?

I am not sure. The mapped tables are in Virtuoso itself.

@HughWilliams
Collaborator

HughWilliams commented Jan 10, 2025

OK, we are looking into this.

I suspect the Linked Data Views would be virtual triples, i.e., dynamically generated at query time, which they would be by default, as there is a specific additional step required to materialise them as physical triples.

@HughWilliams
Collaborator

HughWilliams commented Jan 10, 2025

@JervenBolleman:

Can you provide the Literal classes & views quad map that was used to generate the Linked Data View that is causing this issue in the RDF4J java types being returned?

@JervenBolleman
Contributor Author

JervenBolleman commented Jan 10, 2025

@HughWilliams Trying to create a minimal example to reproduce. I noticed something odd using just isql, no RDF4j.

CREATE TABLE uniparc_record_entry_false_EMBL (auto_id INTEGER not null, id VARCHAR not null, obsolete INTEGER not null, primary key (auto_id));
INSERT INTO uniparc_record_entry_false_EMBL VALUES (1, 'test1', 1);
GRANT SELECT ON uniparc_record_entry_false_EMBL TO "SPARQL";
SPARQL create iri class <https://sparql.uniprot.org/sql-mapping/embl-cds-iri> "http://purl.uniprot.org/embl-cds/%s.%d" (in id varchar not null, in version integer not null);

CREATE FUNCTION DB.DBA.BOOL_NOT_INT(in b INTEGER) {
	if (b = 1) {
		return 'true';
	} else {
		return 'false';
	}
};

GRANT EXECUTE ON DB.DBA.BOOL_NOT_INT TO "SPARQL";
CREATE FUNCTION DB.DBA.BOOL_NOT_INT_INVERSE (in b VARCHAR) {
	if (b = 'true') {
		return 1;
	} else if (b = '1') {
		return 1;
	} else {
		return 0;
	}
};

GRANT EXECUTE ON DB.DBA.BOOL_NOT_INT_INVERSE TO "SPARQL";
SPARQL CREATE LITERAL CLASS <https://sparql.uniprot.org/sql-mapping/boolNoInt> USING
	  FUNCTION DB.DBA.BOOL_NOT_INT(IN b INTEGER) RETURNS VARCHAR,
	  FUNCTION DB.DBA.BOOL_NOT_INT_INVERSE (IN d VARCHAR) RETURNS INTEGER
	  OPTION (datatype <http://www.w3.org/2001/XMLSchema#boolean>);

SPARQL
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX mapping: <https://sparql.uniprot.org/sql-mapping/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CREATE QUAD STORAGE virtrdf:uniparc
FROM DB.DBA.uniparc_record_entry_true_EMBL AS en0in
{
CREATE <https://sparql.uniprot.org/sql-mapping/uniparc-sql> AS GRAPH <https://sparql.uniprot.org/uniparc> {

#Mapping inactive entries
mapping:embl-cds-iri(en0in.id) up:obsolete mapping:boolNoInt(en0in.obsolete) as mapping:en0inobsolete .
}
};

This returns no results:

./isql-t 9999 dba dba exec="sparql define input:storage virtrdf:uniparc PREFIX xsd:<http://www.w3.org/2001/XMLSchema#> SELECT ?s ?p ?o WHERE {  GRAPH ?g { ?s <http://purl.uniprot.org/core/obsolete> true } }";

This one does:

./isql-t 9999 dba dba exec="sparql define input:storage virtrdf:uniparc PREFIX xsd:<http://www.w3.org/2001/XMLSchema#> SELECT ?s ?p ?o WHERE {  GRAPH ?g { ?s <http://purl.uniprot.org/core/obsolete> ?o } }";

@HughWilliams
Collaborator

HughWilliams commented Jan 10, 2025

Development provided the following script which shows the correct datatype and boolean values via command line isql, /sparql endpoint, and an RDF4J program using doTupleQuery(con,"prefix rv: <https://localhost:8443/schemas/rv/> select ?s ?o datatype(?o) from <https://localhost:8443/rv#> where { ?s rv:valid ?o }");:

create table RV.DBA.Registration (id int primary key, name varchar, valid int) if not exists;
insert soft RV.DBA.Registration values (1, 'Alice', 1);
insert soft RV.DBA.Registration values (3, 'Bob', 1);
insert soft RV.DBA.Registration values (2, 'Joe', 0);

CREATE FUNCTION DB.DBA.BOOL_NOT_INT(in b INTEGER) {
  if (b = 1){
    return 'true';
  } else {
    return 'false';
  }
};

CREATE FUNCTION DB.DBA.BOOL_NOT_INT_INVERSE (in b VARCHAR) {
  if (b = 'true'){
    return 1;
  } else {
    return 0;
  }
};

GRANT EXECUTE ON DB.DBA.BOOL_NOT_INT TO "SPARQL_SELECT";
GRANT EXECUTE ON DB.DBA.BOOL_NOT_INT_INVERSE TO "SPARQL_SELECT";

grant select on "RV"."DBA"."Registration" to SPARQL_SELECT;


SPARQL
prefix rv: <https://localhost:8443/schemas/rv/> 
drop literal class rv:isValid .
drop iri class rv:registration .
create iri class rv:registration "https://^{URIQADefaultHost}^/rv/registration/id/%d#this" (in _id integer not null) . 
create literal class rv:isValid using 
                        function DB.DBA.BOOL_NOT_INT (in b integer) returns any
                        function DB.DBA.BOOL_NOT_INT_INVERSE (in b VARCHAR) returns integer
                        option (bijection, datatype xsd:boolean) .
                        ;


SPARQL
prefix rv: <https://localhost:8443/schemas/rv/> 
prefix aowl: <http://bblfish.net/work/atom-owl/2006-06-06/> 
alter quad storage virtrdf:DefaultQuadStorage 
 from "RV"."DBA"."Registration" as registration_s
 { 
   create rv:qm-registration as graph iri ("https://^{URIQADefaultHost}^/rv#") option (exclusive) 
    { 
      rv:registration (registration_s."id")  a rv:Registration ;
      rv:id registration_s."id" as rv:dba-registration-id ;
      rv:name registration_s."name" as rv:dba-registration-name ;
      rv:valid rv:isValid(registration_s."valid") as rv:dba-registration-valid .

    }
 }
;

The Java/RDF4J program returns proper results:

s                                                |o       |callret-2                                  |
-------------------------------------------------------------------------------------------------------
https://localhost:8443/rv/registration/id/1#this |"true"  |"http://www.w3.org/2001/XMLSchema#boolean" |
https://localhost:8443/rv/registration/id/2#this |"false" |"http://www.w3.org/2001/XMLSchema#boolean" |
https://localhost:8443/rv/registration/id/3#this |"true"  |"http://www.w3.org/2001/XMLSchema#boolean" |

Rows =3

@JervenBolleman
Contributor Author

@HughWilliams is there a comma missing in the literal class definition? With the comma added:

create literal class rv:isValid using 
                        function DB.DBA.BOOL_NOT_INT (in b integer) returns any, # comma added
                        function DB.DBA.BOOL_NOT_INT_INVERSE (in b VARCHAR) returns integer
                        option (bijection, datatype xsd:boolean) .

@imitko
Collaborator

imitko commented Jan 10, 2025

@JervenBolleman

The literal class definition is

create literal class rv:isValid using
                        function DB.DBA.BOOL_NOT_INT (in b integer) returns varchar,
                        function DB.DBA.BOOL_NOT_INT_INVERSE (in b varchar) returns integer
                        option (bijection, datatype xsd:boolean) .
                        ;

Also, the inverse function should accept an integer as input when it is used for literal filtering on a boolean:

CREATE FUNCTION DB.DBA.BOOL_NOT_INT_INVERSE (in b VARCHAR) {
  if (isinteger (b))
    return b;
  if (b = 'true'){
    return 1;
  } else {
    return 0;
  }
};

HTH
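As a side note on the forward/inverse pair above, the following is a hedged Java sketch of a round-trip check. It assumes that `option (bijection)` promises the two functions are exact inverses of each other, which is an assumption here and worth checking against the Virtuoso documentation. The class `BoolMappingRoundTrip` and its methods are hypothetical Java ports of the two Virtuoso/PL functions, written only to show which column values survive the mapping.

```java
/** Hypothetical Java port of the two Virtuoso/PL functions, for illustration only. */
final class BoolMappingRoundTrip {

    private BoolMappingRoundTrip() {}

    // Mirrors DB.DBA.BOOL_NOT_INT: only 1 maps to "true".
    static String boolNotInt(int b) {
        return b == 1 ? "true" : "false";
    }

    // Mirrors the revised DB.DBA.BOOL_NOT_INT_INVERSE: integers pass through,
    // the string "true" maps to 1, everything else maps to 0.
    static int boolNotIntInverse(Object b) {
        if (b instanceof Integer) {
            return (Integer) b;
        }
        return "true".equals(b) ? 1 : 0;
    }

    // A column value round-trips if column -> literal -> column is the identity.
    static boolean roundTrips(int columnValue) {
        return boolNotIntInverse(boolNotInt(columnValue)) == columnValue;
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 1, 2}) {
            System.out.println(v + " round-trips: " + roundTrips(v));
        }
    }
}
```

In this sketch only the column values 0 and 1 map back to themselves. Any other integer becomes 'false' and then 0, so the pair is a bijection only if the column is restricted to 0 and 1.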

@JervenBolleman
Contributor Author

Coming back to this because I am still getting the error :( This time I removed RDF4J from the picture and used just the JDBC connection.

java.sql.Statement statement = ((VirtuosoRepositoryConnection) repository.getConnection()).getQuadStoreConnection().createStatement();
try {
	ResultSet execute = statement.executeQuery("SPARQL define input:storage virtrdf:uniparc SELECT ?s (datatype(?o) AS ?p) ?o  WHERE { GRAPH ?g {?s <http://purl.uniprot.org/core/obsolete> ?o }}");
	while (execute.next()) {
		var s = execute.getObject(1);
		var p = execute.getString(2);
		var o = execute.getObject(3);
		System.out.print(s);
		System.out.print(",");
		System.out.print(s.getClass());
		System.out.print("\t");
		System.out.print(p);
		System.out.print(",");
		System.out.print(p.getClass());
		System.out.print("\t");
		System.out.print(o);
		System.out.print(",");
		System.out.print(o.getClass());
		System.out.println();
	}
} finally {
	statement.close();
}

The output shows that 's' is a VirtuosoExtendedString, that 'p' (the datatype) is a java.lang.String, and that 'o' is a java.lang.String as well.

And this is with the boolean conversion code as given by @imitko.

The raw query log is:

sparql define input:storage virtrdf:uniparc SELECT ?s (datatype(?o) AS ?p) ?o  WHERE { GRAPH ?g {?s <http://purl.uniprot.org/core/obsolete> ?o }}  { 
    time       5.3% fanout         1 input         1 rows
    Subquery 28 
      { 
        Union
          { 
            time        44% fanout         1 input         1 rows
            uniparc_record_entry_false_EMBL         1 rows(t3.uniparc_id$37, t3.hash_id$36, t3.obsolete$35)
            
            
            After code:
            0: s~0$29 :=  := artm t3.uniparc_id$37
            4: s~1$30 :=  := artm t3.hash_id$36
            8: o$31 :=  := artm t3.obsolete$35
            12: BReturn 0
            time       2.2% fanout         0 input         1 rows
            Subquery Select(s~0$29, s~1$30, o$31)
          }
          { 
            time        38% fanout         1 input         1 rows
            uniparc_record_entry_false_PDB         1 rows(t8.uniparc_id$49, t8.hash_id$48, t8.obsolete$47)
            
            
            After code:
            0: s~0$29 :=  := artm t8.uniparc_id$49
            4: s~1$30 :=  := artm t8.hash_id$48
            8: o$31 :=  := artm t8.obsolete$47
            12: BReturn 0
            time      0.66% fanout         0 input         1 rows
            Subquery Select(s~0$29, s~1$30, o$31)
          }
      }
    
    After code:
    0: s$59 := Call __spfi (<c http://purl.uniprot.org/uniparc/UPI%010u#%s>, s~0$29, s~1$30)
    5: o$61 := Call DB.DBA.BOOL_NOT_INT (o$31)
    12: BReturn 0
    time       9.7% fanout         0 input         2 rows
    Select (s$59, <c http://www.w3.org/2001/XMLSchema#boolean>, o$61)
  }
