Make parser generic over sink #21

y21 · 2022-02-01T09:15:13Z

It would be nice if the parser were generic over a "sink": a user-supplied handler that the parser calls each time it visits a tag, as in a streaming parser. The sink then decides what to do with each tag it receives.
Sometimes one doesn't need to parse an entire HTML document, or need the other work that tl does by default.
We could provide default implementations, for example a sink that tracks ids and classes in a map so that ID lookups run in constant time. This already exists and can be enabled through ParserOptions::track_ids(), but a sink could be a nicer way to offer it.
AFAICT, parsers like html5ever do this.
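
To make the idea concrete, here is a rough sketch of what a sink-generic parser could look like. Everything in it is hypothetical: `Event`, `Sink`, `parse` and `IdSink` are illustration names, not tl's API, and the scanner is a toy that only splits on `<` and `>` rather than a real HTML parser.

```rust
use std::collections::HashMap;

/// Hypothetical event type; tl's real node types differ.
#[derive(Debug, PartialEq)]
pub enum Event<'a> {
    /// An opening tag with its name and raw attribute text.
    Tag { name: &'a str, attrs: &'a str },
    /// Text between tags.
    Raw(&'a str),
}

/// Hypothetical sink trait: the parser calls `visit` for every event.
pub trait Sink<'a> {
    fn visit(&mut self, event: Event<'a>);
}

/// Toy streaming scanner, generic over the sink (assumption: not tl's parser).
/// It splits on '<' and '>' and skips closing tags.
pub fn parse<'a, S: Sink<'a>>(input: &'a str, sink: &mut S) {
    let mut rest = input;
    while !rest.is_empty() {
        if let Some(stripped) = rest.strip_prefix('<') {
            let end = stripped.find('>').unwrap_or(stripped.len());
            let inner = &stripped[..end];
            rest = stripped.get(end + 1..).unwrap_or("");
            if inner.starts_with('/') {
                continue; // closing tag: nothing to report
            }
            let (name, attrs) = match inner.find(char::is_whitespace) {
                Some(i) => (&inner[..i], inner[i..].trim()),
                None => (inner, ""),
            };
            sink.visit(Event::Tag { name, attrs });
        } else {
            let end = rest.find('<').unwrap_or(rest.len());
            sink.visit(Event::Raw(&rest[..end]));
            rest = &rest[end..];
        }
    }
}

/// Sink that remembers which tag (by visit order) carries each id.
#[derive(Default)]
pub struct IdSink<'a> {
    pub ids: HashMap<&'a str, usize>,
    seen: usize,
}

impl<'a> Sink<'a> for IdSink<'a> {
    fn visit(&mut self, event: Event<'a>) {
        if let Event::Tag { attrs, .. } = event {
            // Naive lookup: only handles a double-quoted `id="..."`
            // and would also match attributes such as `data-id="..."`.
            if let Some(start) = attrs.find("id=\"") {
                let value = &attrs[start + 4..];
                if let Some(end) = value.find('"') {
                    self.ids.insert(&value[..end], self.seen);
                }
            }
            self.seen += 1;
        }
    }
}
```

With this shape, a caller would construct `IdSink::default()`, pass it to `parse`, and then look ids up in `ids` in constant time. A caller who only wants, say, the first `<title>` could write a much smaller sink and do no bookkeeping at all.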


dist1ll · 2022-02-04T02:23:49Z

What do you think of adding a ParserOptions::skip_whitespace()? When parsing, I noticed quite a few Raw(Bytes("\n\n")) nodes that I'd like to ignore.

Or should I rather create a sink in that case?
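
For comparison, this is roughly what the sink route could look like: an adapter that drops whitespace-only raw nodes and forwards everything else to an inner sink. It is only a sketch under assumptions. `Node`, `Sink`, `Collect` and `SkipWhitespace` are made-up names loosely modelled on the `Raw(Bytes(...))` output above, not anything tl exposes.

```rust
/// Hypothetical node kinds (assumption: simplified, not tl's `Node`).
#[derive(Debug, PartialEq)]
pub enum Node<'a> {
    Tag(&'a str),
    Raw(&'a [u8]),
}

/// Hypothetical sink trait: receives each node as the parser produces it.
pub trait Sink<'a> {
    fn visit(&mut self, node: Node<'a>);
}

/// Sink that collects every node it receives.
#[derive(Default)]
pub struct Collect<'a>(pub Vec<Node<'a>>);

impl<'a> Sink<'a> for Collect<'a> {
    fn visit(&mut self, node: Node<'a>) {
        self.0.push(node);
    }
}

/// Adapter that drops raw nodes made up only of ASCII whitespace
/// (including empty ones) and forwards the rest to the inner sink.
pub struct SkipWhitespace<S>(pub S);

impl<'a, S: Sink<'a>> Sink<'a> for SkipWhitespace<S> {
    fn visit(&mut self, node: Node<'a>) {
        if let Node::Raw(bytes) = &node {
            if bytes.iter().all(u8::is_ascii_whitespace) {
                return;
            }
        }
        self.0.visit(node);
    }
}
```

Wrapping any sink as `SkipWhitespace(inner)` keeps the filtering out of the parser itself, which is the main argument for a sink over yet another ParserOptions flag. A flag would still be simpler for users who never write their own sink.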

y21 added the enhancement New feature or request label Feb 1, 2022

y21 mentioned this issue Feb 2, 2022: Convenience functions for traversing DOM? #18 (Open)
