Optimize decompression planning #7568

Open

akuzm wants to merge 7 commits into main
Conversation

akuzm (Member) commented Jan 6, 2025

The EquivalenceMember lookup is the most costly part of decompression planning, so share its result between the different places that need it.

Switch batch sorted merge to use the generic pathkey matching code.

Also cache some intermediate data in the CompressionInfo struct.

Disable-check: force-changelog-file
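
As a rough illustration of the caching idea, here is a minimal sketch of memoizing the per-attribute EquivalenceMember lookup in the per-chunk planning struct. All struct, field, and function names here (CompressionInfoSketch, em_cache, em_cache_valid, chunk_natts, find_em_for_attno, lookup_em_for_attno) are illustrative assumptions, not the actual code in this PR.

/*
 * Sketch only: cache the EquivalenceMember found for each uncompressed chunk
 * attribute, so that pathkey pushdown and batch sorted merge planning do not
 * repeat the same expensive equivalence-class scan.
 */
#include "postgres.h"

#include "nodes/pathnodes.h"

typedef struct CompressionInfoSketch
{
	/* ... the per-chunk planning fields already kept here ... */

	/* One cached lookup result per chunk attribute; NULL means "none found". */
	EquivalenceMember **em_cache;
	bool *em_cache_valid;
	int chunk_natts;
} CompressionInfoSketch;

/* Hypothetical expensive lookup that scans the equivalence classes. */
extern EquivalenceMember *find_em_for_attno(PlannerInfo *root,
											CompressionInfoSketch *info,
											AttrNumber attno);

static EquivalenceMember *
lookup_em_for_attno(PlannerInfo *root, CompressionInfoSketch *info, AttrNumber attno)
{
	Assert(attno > 0 && attno <= info->chunk_natts);

	if (!info->em_cache_valid[attno - 1])
	{
		/* First use pays for the scan; later uses hit the cache. */
		info->em_cache[attno - 1] = find_em_for_attno(root, info, attno);
		info->em_cache_valid[attno - 1] = true;
	}

	return info->em_cache[attno - 1];
}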

@@ -212,7 +212,7 @@ decompress_chunk_begin(CustomScanState *node, EState *estate, int eflags)
node->ss.ss_ScanTupleSlot->tts_tupleDescriptor);
}
}
- /* Sort keys should only be present when sorted_merge_append is used */
+ /* Sort keys should only be present when batch sorted merge is used. */
Member

Even without batch sorted merge we might still want to push down the ordering and skip the ordering after decompression.

Member Author

Yes, we do that, but these are the keys for the sorting performed inside the DecompressChunk node itself. The only case where it does that is batch sorted merge; otherwise the sorting is performed by the underlying compressed scan.

codecov bot commented Jan 6, 2025

Codecov Report

Attention: Patch coverage is 96.29630% with 5 lines in your changes missing coverage. Please review.

Project coverage is 82.30%. Comparing base (59f50f2) to head (780258e).
Report is 688 commits behind head on main.

Files with missing lines                            Patch %   Lines
tsl/src/nodes/decompress_chunk/decompress_chunk.c   96.24%    0 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7568      +/-   ##
==========================================
+ Coverage   80.06%   82.30%   +2.24%     
==========================================
  Files         190      238      +48     
  Lines       37181    43706    +6525     
  Branches     9450    10963    +1513     
==========================================
+ Hits        29770    35974    +6204     
- Misses       2997     3402     +405     
+ Partials     4414     4330      -84     


akuzm (Member Author) commented Jan 9, 2025

Tsbench shows speedups of about 10% on planning queries: https://grafana.ops.savannah-dev.timescale.com/d/uP2MnQk4z/query-run-times?orgId=1&var-suite=lazy_decompression&var-query=8e50d2074a29289e8ec3280d4ee535bc

This also uncovers a major planning regression caused by the notorious quadratic equivalence member search in the Postgres sorted plan creation. I think we'll have to live with it for now and fix this upstream. It happens on queries like EXPLAIN SELECT * FROM ht_chunk_compressed_2k ORDER BY time_bucket('1d', time) DESC, device LIMIT 1; The reason is that we now have many per-chunk Sorts under MergeAppend, instead of one big Sort above Append. Each per-chunk Sort requires an equivalence member search (prepare_sort_from_pathkeys), and because the equivalence classes accumulate roughly one member per chunk, the total work grows quadratically with the number of chunks.
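
To make the quadratic growth concrete, here is a small standalone toy model (plain C, not Postgres code, all numbers illustrative): each per-chunk Sort repeats a linear scan over an equivalence-class member list that itself has roughly one member per chunk, so the number of member comparisons grows with the square of the chunk count.

/* Toy model only: N per-chunk Sorts, each scanning ~N equivalence members. */
#include <stdio.h>

int
main(void)
{
	for (int nchunks = 1000; nchunks <= 8000; nchunks *= 2)
	{
		long long comparisons = 0;

		for (int sort = 0; sort < nchunks; sort++)	/* one Sort node per chunk */
			for (int member = 0; member < nchunks; member++)	/* ~one EM per chunk */
				comparisons++;

		printf("chunks=%5d  member comparisons=%lld\n", nchunks, comparisons);
	}
	return 0;
}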

Another regression is in select min(vendor_id + passenger_count + trip_distance + pickup_longitude + pickup_latitude + rate_code + dropoff_longitude + dropoff_latitude + payment_type + fare_amount + extra + mta_tax + tip_amount + tolls_amount + improvement_surcharge + total_amount) from rides where true;
This happens because a sort + limit 1 initplan for min() is now chosen instead of plain aggregation. This is a known problem that will be fixed by #6879 (missing costing for the projection under Sort).

The regression in SELECT * FROM ht_metrics_compressed ORDER BY time_bucket('1d', time) DESC, device LIMIT 1; happens because the parallel plan is no longer chosen. This will also be fixed by #6879.

akuzm (Member Author) commented Jan 9, 2025

I initially didn't want to introduce any plan changes with this PR, but it's split out of #6879, so I had to import one small part from there -- we can now sort above decompression not only by plain columns but also by expressions (e.g. time_bucket), which gives rise to these (arguably more efficient) MergeAppend over per-chunk Sort plans.
