
feat(lists): add support for wildcard lists using a custom Trie #1233

Merged
merged 10 commits into main from feat/wildcard-lists
Nov 17, 2023

Conversation

ThinkChaos
Collaborator

I did trie benchmarks, and a custom trie seemed like the best approach. I recorded the benchmarks in the commit history so we can easily test other implementations in the future, or just keep them for reference.
See details below.

There are two trie implementations (in two commits; only one is in the code at a time): first in feat(lists): add support for wildcard lists using a custom Trie, then in refactor(trie): reduce memory use by implementing a radix trie.
The radix trie uses less memory but is a little slower to search. I think the tradeoff is worth it because the speed penalty is unlikely to be noticeable outside benchmarks.

While the trie's absolute memory usage is higher than the plain string cache's (i.e. for the same data it uses more memory), in practice a wildcard list needs fewer entries. So the OISD big wildcard list ends up using about the same amount of memory as the OISD big plain one, despite the trie being less compact.
I made the benchmarks reflect this, but there's a switch to make them run with the same data for comparison.

After optimizing the trie, it's even better than the plain string cache in a couple of ways:

  • peak memory usage is lower since we don't use separate storage in the factory and the cache (this could also be changed in the plain string implementation)
  • search is much faster
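As a rough illustration of the approach (my own minimal sketch, not blocky's actual trie code), a wildcard trie can store domain labels reversed, from the TLD inward, with a flag marking where a `*.` entry ends; one assumption here is that `*.example.com` also covers the bare `example.com`:

```go
package main

import (
	"fmt"
	"strings"
)

// node is a minimal (non-radix) trie node keyed by domain label,
// storing labels from the TLD inward so "*.example.com" becomes
// the path com -> example with a wildcard flag on the last node.
type node struct {
	children map[string]*node
	wildcard bool // true if a "*.<path so far>" entry ends here
}

func newNode() *node { return &node{children: map[string]*node{}} }

// Insert adds a wildcard entry like "*.example.com".
func (n *node) Insert(entry string) {
	entry = strings.TrimPrefix(entry, "*.")
	labels := strings.Split(entry, ".")
	cur := n
	for i := len(labels) - 1; i >= 0; i-- {
		next, ok := cur.children[labels[i]]
		if !ok {
			next = newNode()
			cur.children[labels[i]] = next
		}
		cur = next
	}
	cur.wildcard = true
}

// Match reports whether host is covered by any inserted wildcard.
// Note: in this sketch "*.example.com" matches "example.com" itself too.
func (n *node) Match(host string) bool {
	labels := strings.Split(host, ".")
	cur := n
	for i := len(labels) - 1; i >= 0; i-- {
		if cur.wildcard {
			return true
		}
		next, ok := cur.children[labels[i]]
		if !ok {
			return false
		}
		cur = next
	}
	return cur.wildcard
}

func main() {
	t := newNode()
	t.Insert("*.example.com")
	fmt.Println(t.Match("ads.example.com")) // true
	fmt.Println(t.Match("example.org"))     // false
}
```

Search cost is bounded by the number of labels in the queried host rather than the number of list entries, which is why lookups stay fast even for very large lists.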

There are also a couple of minor fixes included.

And a quick way to see coverage locally: build: generate coverage.html when running tests.

Benchmarks

The benchmarks use two versions of the OISD big list because that seemed more realistic, as mentioned above.
The string cache benchmark adds all items from the OISD big plain list, while the others (regex and wildcard) add the ones from the wildcard list.
The querying benchmark populates the caches in the same way, but always searches all entries from the plain list.
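The custom heap columns in the output below (`fact_heap_MB`, `peak_heap_MB`) can be produced with `b.ReportMetric` plus `runtime.ReadMemStats`. Here's a rough sketch under my own assumptions; the names and the toy cache are illustrative, not blocky's actual benchmark code:

```go
package main

import (
	"fmt"
	"runtime"
	"testing"
)

// heapMB returns the current live heap size in MiB.
func heapMB() float64 {
	runtime.GC() // settle the heap so readings are comparable
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.HeapAlloc) / (1024 * 1024)
}

// buildCache stands in for the factory under test.
func buildCache(entries []string) map[string]struct{} {
	c := make(map[string]struct{}, len(entries))
	for _, e := range entries {
		c[e] = struct{}{}
	}
	return c
}

func BenchmarkFactory(b *testing.B) {
	entries := []string{"example.com", "example.org"} // a real list in practice
	base := heapMB()
	var cache map[string]struct{}

	b.ResetTimer() // exclude list loading/setup from the timings
	for i := 0; i < b.N; i++ {
		cache = buildCache(entries)
	}
	b.StopTimer()

	// report heap growth attributable to the built cache
	b.ReportMetric(heapMB()-base, "fact_heap_MB")
	_ = cache
}

func main() {
	r := testing.Benchmark(BenchmarkFactory)
	fmt.Println(r.N > 0)
}
```

Custom metrics reported this way show up as extra columns in the standard `go test -bench` output, alongside ns/op, B/op and allocs/op.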

// --- Cache Building ---
//
// Most memory efficient: Wildcard (blocky/trie    radix) because of peak heap
// Fastest:               Wildcard (blocky/trie original)
//
// BenchmarkRegexFactory-8                1     1 232 170 998 ns/op   430.60 fact_heap_MB   430.60 peak_heap_MB   1 792 669 136 B/op   9 826 987 allocs/op
// BenchmarkStringFactory-8               7       159 934 992 ns/op    11.79 fact_heap_MB    26.91 peak_heap_MB      67 613 644 B/op       1 305 allocs/op
// BenchmarkWildcardFactory-8            18        60 091 687 ns/op    16.61 fact_heap_MB    16.61 peak_heap_MB      26 733 498 B/op      92 213 allocs/op (original)
// BenchmarkWildcardFactory-8            16        69 790 156 ns/op    14.89 fact_heap_MB    14.89 peak_heap_MB      27 987 510 B/op      52 902 allocs/op (radix)
// BenchmarkDGHubbleWildcardFactory-8    13        80 772 887 ns/op    23.65 fact_heap_MB    23.65 peak_heap_MB      34 126 104 B/op     301 831 allocs/op
// BenchmarkPorfirionWildcardFactory-8    4       283 443 974 ns/op   183.30 fact_heap_MB   183.30 peak_heap_MB     200 634 492 B/op     811 260 allocs/op
// --- Cache Querying ---
//
// Most memory efficient: Wildcard (blocky/trie radix)
// Fastest:               Wildcard (blocky/trie original)
//
// BenchmarkStringCache-8                 6       204 754 798 ns/op    15.11 cache_heap_MB              0 B/op          0 allocs/op
// BenchmarkWildcardCache-8              14        76 186 334 ns/op    16.61 cache_heap_MB              0 B/op          0 allocs/op (original)
// BenchmarkWildcardCache-8              12        95 316 121 ns/op    14.91 cache_heap_MB              0 B/op          0 allocs/op (radix)
// BenchmarkDGHubbleWildcardCache-8      14        78 111 098 ns/op    23.65 cache_heap_MB              0 B/op          0 allocs/op
// BenchmarkPorfirionWildcardCache-8      4       304 584 455 ns/op   183.30 cache_heap_MB     26 797 744 B/op    305 718 allocs/op

The third-party tries I tested are the dghubble and porfirion implementations that appear in the benchmark names above.
I committed the lists in helpertest/data. The files are large-ish, but we shouldn't need to update them regularly, and even then they're text, so diffs will limit the extra weight. So I think it's fine to have them in the repo.
Alternatively we could use a submodule, but I didn't think it'd be worth the hassle that brings.


Closes #1090


codecov bot commented Nov 12, 2023

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (dc66eff) 93.66% compared to head (79cf5da) 93.71%.
Report is 3 commits behind head on main.

Files Patch % Lines
lists/list_cache.go 60.00% 6 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1233      +/-   ##
==========================================
+ Coverage   93.66%   93.71%   +0.04%     
==========================================
  Files          70       72       +2     
  Lines        5687     5884     +197     
==========================================
+ Hits         5327     5514     +187     
- Misses        279      286       +7     
- Partials       81       84       +3     


@kwitsch
Collaborator

kwitsch commented Nov 12, 2023

This looks great, I'll look into it tomorrow... when I'm less drunk 🫣

@t-e-s-tweb

In my usage it's amazing so far. I use the Hagezi blocklists, where the plain domains number 900k+ while the wildcards are only 280k+. I've also noticed that previously, when blocky was stopped and restarted, processing the lists again took the same CPU time, about 1:30 for me. After this change, a restart's processing time is only 13 seconds.

Amazing improvement so far.

Collaborator

@kwitsch kwitsch left a comment


Code looks clean and performs well (tests & benchmarks).
Great work. 👍

If there are any unformatted files, CI fails with a confusing "read-only file system" error.

This also means we save a bit of time by not checking every single entry with all caches.
ThinkChaos added a commit to ThinkChaos/blocky that referenced this pull request Nov 12, 2023
A couple other Trie implementations were tested but they use more
memory and are slower. See PR 0xERR0R#1233 for details.
@ThinkChaos
Collaborator Author

Thanks to both of you for taking a look!

I updated the PR with a comment tweak because I realized the trie is not a full radix: only terminals are compressed. I think it's fine to keep it like this since memory use is acceptable and we avoid the extra complexity and speed decrease a full radix would bring. So I'll call it a feature.
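To illustrate what terminal-only compression means, here is a minimal sketch under my own assumptions (not the actual blocky trie): interior nodes hold one label each, but the unshared tail of an entry is stored as a single string instead of a chain of single-child nodes, and a compressed tail is expanded one label when a later insert needs to descend into it:

```go
package main

import (
	"fmt"
	"strings"
)

// tnode compresses only terminals: the part of an entry no other entry
// shares is kept as one reversed-label string in `terminals` instead of
// a chain of single-child nodes.
type tnode struct {
	children  map[string]*tnode
	terminals map[string]bool // compressed tails (reversed labels joined by ".")
	end       bool            // an entry's path ends exactly at this node
}

func newTnode() *tnode {
	return &tnode{children: map[string]*tnode{}, terminals: map[string]bool{}}
}

// reverse splits a domain and reverses its labels:
// "a.example.com" -> ["com", "example", "a"].
func reverse(domain string) []string {
	labels := strings.Split(domain, ".")
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return labels
}

func (n *tnode) insert(labels []string) {
	if len(labels) == 0 {
		n.end = true
		return
	}
	head, rest := labels[0], labels[1:]
	if child, ok := n.children[head]; ok {
		child.insert(rest)
		return
	}
	tail := strings.Join(labels, ".")
	for t := range n.terminals {
		// expand a compressed tail that shares its first label with ours
		if t != tail && strings.SplitN(t, ".", 2)[0] == head {
			parts := strings.SplitN(t, ".", 2)
			child := newTnode()
			n.children[head] = child
			if len(parts) == 2 {
				child.terminals[parts[1]] = true
			} else {
				child.end = true
			}
			delete(n.terminals, t)
			child.insert(rest)
			return
		}
	}
	n.terminals[tail] = true
}

func (n *tnode) match(labels []string) bool {
	if n.end {
		return true // a wildcard ends here, covering this whole subtree
	}
	rem := strings.Join(labels, ".")
	for t := range n.terminals {
		if rem == t || strings.HasPrefix(rem, t+".") {
			return true
		}
	}
	if len(labels) == 0 {
		return false
	}
	if child, ok := n.children[labels[0]]; ok {
		return child.match(labels[1:])
	}
	return false
}

func main() {
	root := newTnode()
	root.insert(reverse("example.com"))     // from "*.example.com"
	root.insert(reverse("ads.example.com")) // from "*.ads.example.com"
	fmt.Println(root.match(reverse("x.ads.example.com"))) // true
	fmt.Println(root.match(reverse("other.org")))         // false
}
```

Since most entries in a wildcard list end in a long unshared tail, storing that tail as one string instead of many single-child nodes is where the memory saving comes from.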

I also ran the benchmarks on the bigger versions of the Hagezi lists out of curiosity. Results are pretty similar to OISD.
The second force-push is minor tweaks to the benchmarks I made while doing that (they don't affect results).

For the lists committed in the repo, I thought of another possibility: add a script that downloads the files, and put the directory it downloads to in .gitignore. The script could be run by the Makefile and would do nothing if the files already exist.
If you'd prefer that, I'll make the change. The only downside is that everyone gets different versions of the lists, but benchmarks already aren't comparable from machine to machine anyway.

Hagezi Pro

// Build:
// BenchmarkRegexFactory-8       1   1 670 006 348 ns/op   554.80 fact_heap_MB    554.80 peak_heap_MB    2 317 890 352 B/op   13 068 246 allocs/op
// BenchmarkStringFactory-8      1   2 479 199 024 ns/op    17.59 fact_heap_MB     40.37 peak_heap_MB      104 051 184 B/op        1 528 allocs/op
// BenchmarkWildcardFactory-8   12     101 126 137 ns/op    22.78 fact_heap_MB     22.78 peak_heap_MB       43 899 928 B/op      101 319 allocs/op
//
// Query:
// BenchmarkStringCache-8        3     391 377 809 ns/op    22.78 cache_heap_MB                                      0 B/op            0 allocs/op
// BenchmarkWildcardCache-8     14      86 248 134 ns/op    22.86 cache_heap_MB                                      0 B/op            0 allocs/op

Hagezi Pro++

// Build:
// BenchmarkRegexFactory-8       1   2 273 046 076 ns/op   706.40 fact_heap_MB    706.40 peak_heap_MB    2 949 640 304 B/op   16 842 001 allocs/op
// BenchmarkStringFactory-8      1   3 710 888 341 ns/op    21.35 fact_heap_MB     49.14 peak_heap_MB      127 686 816 B/op        2 060 allocs/op
// BenchmarkWildcardFactory-8    8     132 437 111 ns/op    27.29 fact_heap_MB     27.29 peak_heap_MB       51 432 116 B/op      125 302 allocs/op
//
// Query:
// BenchmarkStringCache-8        2     504 437 782 ns/op    27.79 cache_heap_MB                                      0 B/op            0 allocs/op
// BenchmarkWildcardCache-8     10     112 441 120 ns/op    27.22 cache_heap_MB                                      0 B/op            0 allocs/op

Hagezi Ultimate

// Build:
// BenchmarkRegexFactory-8       1   2 779 399 175 ns/op   890.80 fact_heap_MB    890.80 peak_heap_MB    3 728 279 496 B/op   21 357 076 allocs/op
// BenchmarkStringFactory-8      1   6 609 416 144 ns/op    25.65 fact_heap_MB     58.60 peak_heap_MB      154 634 824 B/op        2 174 allocs/op
// BenchmarkWildcardFactory-8    5     214 580 937 ns/op    52.17 fact_heap_MB     52.17 peak_heap_MB       80 956 003 B/op      154 019 allocs/op
//
// Query:
// BenchmarkStringCache-8        2     614 883 620 ns/op    32.95 cache_heap_MB                                      0 B/op            0 allocs/op
// BenchmarkWildcardCache-8      7     162 815 173 ns/op    52.22 cache_heap_MB                                      0 B/op            0 allocs/op

It seems the trie uses a fair amount of extra memory for this test. I'm not sure why, but it still seems acceptable, so I didn't look into it further.

@kwitsch
Collaborator

kwitsch commented Nov 12, 2023

Maybe we should print a warning message after loading the lists for the first time, if a threshold number of regexes is reached? 🤔

Similar to the one you added to the configuration page.

I'm fairly sure people pay more attention to the logs than the documentation if they have performance issues.

@ThinkChaos
Collaborator Author

ThinkChaos commented Nov 12, 2023

Yeah, I was thinking of tackling that separately because I wanted to refactor how ListCache uses the caches to address a couple of things at the same time:

  • there's no great place to put a warning based on how many regexes are used
  • both the lists and the string caches need rules to detect what kind of entry a string is (/ prefix + suffix for regex, and so on)
  • doing this detection twice is wasteful, especially compiling regexes

I just pushed a minimal patch to get a warning in already, and will tackle the rest separately :)
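The entry-kind detection described in the bullets above could look roughly like this; a hypothetical sketch of the `/` prefix + suffix convention, not blocky's actual code (doing this once in ListCache, instead of in every cache, is what avoids compiling each regex twice):

```go
package main

import (
	"fmt"
	"strings"
)

// entryKind classifies a list entry: "/" prefix and suffix marks a
// regex, a "*." prefix marks a wildcard, anything else is a plain
// domain. Names here are illustrative assumptions.
type entryKind int

const (
	plainEntry entryKind = iota
	wildcardEntry
	regexEntry
)

func classify(entry string) entryKind {
	switch {
	case len(entry) > 1 && strings.HasPrefix(entry, "/") && strings.HasSuffix(entry, "/"):
		return regexEntry
	case strings.HasPrefix(entry, "*."):
		return wildcardEntry
	default:
		return plainEntry
	}
}

func main() {
	fmt.Println(classify(`/ads\d+\..*/`) == regexEntry)   // true
	fmt.Println(classify("*.example.com") == wildcardEntry) // true
	fmt.Println(classify("example.com") == plainEntry)      // true
}
```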

@ThinkChaos
Collaborator Author

Realized I missed a b.ResetTimer() in the factory benchmarks. It doesn't affect the comparison, since the same setup time was added to every implementation. I reran the benchmarks and updated the PR; even the absolute numbers barely changed.
Anyway, letting this sit a bit longer since it's a big change; I'll merge this weekend or when my next PR is ready :)

Owner

@0xERR0R 0xERR0R left a comment


Looks very good 👍

@0xERR0R 0xERR0R added the 🔨 enhancement New feature or request label Nov 17, 2023
@0xERR0R 0xERR0R added this to the v0.23 milestone Nov 17, 2023
@0xERR0R 0xERR0R merged commit b498bc5 into 0xERR0R:main Nov 17, 2023
11 checks passed
@ThinkChaos ThinkChaos deleted the feat/wildcard-lists branch December 12, 2023 15:29