Apply binary search filter expressions directly on the block metadata of Index Scans #1619

Merged

realHannes
Collaborator

@realHannes realHannes commented Nov 15, 2024

With this PR, filter expressions that can be evaluated via binary search on a sorted input are evaluated directly on the block metadata of an IndexScan. For example, in a query that contains { ?s ?p ?o FILTER (?o > 3) }, only those blocks of the full index scan (sorted by object) are read from disk whose metadata indicates that they might contain values > 3.

Currently this mechanism has the following limitations:

  1. It can only be applied if the IndexScan is a direct child of the FILTER clause.
  2. It can only be applied to logical expressions (AND/OR/NOT) and to relational expressions (greater than, equal to, etc.) between a variable and a constant. Currently, the constant cannot yet be an IRI or a literal.
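
The idea behind the `?o > 3` example can be sketched as follows. This is only an illustration and not the code from this PR: the `BlockMetadata` struct and the function `prefilterGreaterThan` are hypothetical, values are simplified to plain integers, and the blocks are assumed to be sorted and non-overlapping.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for the per-block metadata of an index
// scan: the smallest and the largest object value stored in the block.
struct BlockMetadata {
  int64_t firstValue;
  int64_t lastValue;
};

// Return the blocks that might contain a value `> bound`.
// Assumption: the blocks are sorted and do not overlap, so `lastValue` is
// non-decreasing over the input.
std::vector<BlockMetadata> prefilterGreaterThan(
    const std::vector<BlockMetadata>& blocks, int64_t bound) {
  assert(std::is_sorted(blocks.begin(), blocks.end(),
                        [](const BlockMetadata& a, const BlockMetadata& b) {
                          return a.lastValue < b.lastValue;
                        }));
  // Binary search for the first block whose largest value exceeds `bound`.
  auto firstRelevant = std::partition_point(
      blocks.begin(), blocks.end(), [bound](const BlockMetadata& block) {
        return block.lastValue <= bound;
      });
  // Every block from there on may contain matches. The blocks before it
  // cannot, so they never have to be read from disk.
  return std::vector<BlockMetadata>(firstRelevant, blocks.end());
}
```

Note that the first returned block can still contain values `<= bound`, so under this sketch the actual FILTER still has to run on the blocks that were read.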

realHannes and others added 30 commits April 25, 2024 19:07
Member

@joka921 joka921 left a comment

This is very nice and was quite a lot of work.
Mostly missing are tests that the prefilter is applied in the IndexScan, etc.
A good way to do this (also for manual debugging) is to add the applied prefilters to the runtimeInfo() as a detail. Then they appear in the UI, and this can also be used for testing.

Member

@joka921 joka921 left a comment

Some very small nitpicks remain.
We still have to analyze why the query planner takes so long.

Member

@joka921 joka921 left a comment

There was a tiny misunderstanding, but this is almost ready to merge.

Member

@joka921 joka921 left a comment

Thank you very much.
This is a great milestone for QLever!

@joka921 joka921 changed the title Apply PrefilterExpressionIndex in IndexScan Apply binary search filter expressions directly on the block metadata of Index Scans Dec 2, 2024
@joka921 joka921 merged commit 7680177 into ad-freiburg:master Dec 2, 2024
22 checks passed
realHannes added a commit to realHannes/qlever that referenced this pull request Dec 2, 2024
… of `Index Scan`s (ad-freiburg#1619)

joka921 pushed a commit that referenced this pull request Dec 6, 2024
With this PR, the prefilter expressions implemented in #1619 also apply to literals and IRIs. For example, the following query only extracts the relevant, prefiltered blocks from the `IndexScan`:
```
SELECT * {
?s ?p ?o FILTER (?o >= "hallo" && ?o <= "hello")
}
```
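A range condition like the one above can be prefiltered with two binary searches over the block metadata, one per bound. The following is a minimal sketch under simplifying assumptions, not the implementation from this commit: `StringBlock` and `prefilterRange` are hypothetical names, and plain `std::string` comparison stands in for whatever ordering the engine actually uses for literals and IRIs.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified block metadata: the smallest and the largest
// literal stored in a block.
struct StringBlock {
  std::string first;
  std::string last;
};

// Return the blocks that might contain a value in [lower, upper].
// Assumption: the blocks are sorted and do not overlap.
std::vector<StringBlock> prefilterRange(const std::vector<StringBlock>& blocks,
                                        const std::string& lower,
                                        const std::string& upper) {
  // First block whose largest value is >= lower.
  auto begin = std::partition_point(
      blocks.begin(), blocks.end(),
      [&lower](const StringBlock& block) { return block.last < lower; });
  // First block (at or after `begin`) whose smallest value is > upper.
  auto end = std::partition_point(
      begin, blocks.end(),
      [&upper](const StringBlock& block) { return block.first <= upper; });
  // Only the blocks in [begin, end) can contain matches.
  return std::vector<StringBlock>(begin, end);
}
```

In this sketch, the `&&` of the two comparisons corresponds to intersecting the two block ranges, which is why the second search can start at `begin`.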
hannahbast pushed a commit that referenced this pull request Dec 12, 2024
Since #1619, the size estimate for an index scan always involved one or several copies of the block metadata, which incurred a significant query planning cost for most queries. Now, such a copy is only made for an index scan followed by a `FILTER`, and only the metadata of the blocks that remain after the `FILTER` is copied (in which case the two operations are expensive anyway).
4 participants