Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

query_selector not selecting children? #22

Open
steamyalt opened this issue Feb 5, 2022 · 5 comments
Open

query_selector not selecting children? #22

steamyalt opened this issue Feb 5, 2022 · 5 comments
Labels
bug Something isn't working enhancement New feature or request hard issue that's especially challenging to fix, requires potentially major (design) changes

Comments

@steamyalt
Copy link

steamyalt commented Feb 5, 2022

I'm pretty new to this so I might be misunderstanding how the query selector works.
I have the following HTML code:

<li class="test">
  <a href="example.com></a>
</li>

I'd now expect the selector li.test a to return the inner a element.
However, it returns nothing.
Rust code:

fn main() {
    let input = r#"<li class="test"><a href="example.com"></a></li>"#;
    let dom = tl::parse(input, tl::ParserOptions::default()).unwrap();
    let query = dom.query_selector("li.test a").unwrap().next();
    assert!(query.is_some()); // Panics!
}

Opening the same HTML code in Firefox and running $$("li.test a") in the console returns a one-element array with the a node.

@y21
Copy link
Owner

y21 commented Feb 6, 2022

You're right, the query selector implementation is incomplete. In particular, .foo .bar exists and is parsed properly:

/// Descendant combinator: .foo .bar
Descendant(Box<Selector<'a>>, Box<Selector<'a>>),

But Selector::matches doesn't handle that case (as well as .foo > .bar) and simply returns false.
_ => false,

I believe this was skipped in the implementation because of complexity at the time, and should be mentioned in the docs. Nodes don't have a reference or a handle to the parent, so implementing .foo > .bar is difficult. Also, the way the query selector works probably needs to be changed in some way to make this work, because right now there's not really a way to return a matched subnode from within the method.
I guess a workaround for now until that's implemented is to run a separate query selector on those matches (or if a query selector is overkill, one could iterate over the children and find the a tag using query.children().all().find(...) with a tag name comparison).


On a more-or-less unrelated note, I found an off-by-one bug in the query selector API for HTML tags while looking into this. We've got lots of tests for all of the different methods of the public API and I'm surprised they didn't catch this. I just published 0.6.1 with a quick fix for this (see #23).

@y21 y21 added bug Something isn't working enhancement New feature or request labels Feb 6, 2022
@steamyalt
Copy link
Author

As a temporary workaround, I currently have the following code:

    let items = dom.query_selector("li.example-class").unwrap();
    let links: Vec<_> = items
        .filter_map(|e| e.get(&parser).and_then(tl::Node::children))
        .filter_map(|e| e.all(&parser)[1].as_tag())
        .map(tl::HTMLTag::attributes)
        .filter_map(|e| e.get("href").flatten().and_then(tl::Bytes::try_as_utf8_str))
        .collect();

What I want to do is find the a tag inside all li tags with the specified class and get the link it's pointing to.
Is there any way to write this more concisely/idiomatically?

@y21 y21 added the hard issue that's especially challenging to fix, requires potentially major (design) changes label Mar 19, 2022
@kelko
Copy link
Contributor

kelko commented Jul 28, 2022

Ran also into the same problem for a fun project of mine. So I took a look at the code, thinking about maybe I can contribute a PR. But I'm not sure the current CSS selector parsing is prepared for this. E.g. the "OR" in the current implementation seems to 'local' while (at least in my understanding) in CSS it's an OR for the whole of the selector list.

So, in my understanding it should be:

a b,c -> (a AND b) OR (c)

But I think the code would parse it as:

a b,c -> (a) AND (b OR c)

Correct me if I'm wrong.

In the end I decided to put an index on top of your tl VDom, which caches all those relationships (https://github.com/kelko/html-streaming-editor/blob/main/src/html/mod.rs) and an own CSS selector parsing (using PEG, https://github.com/kelko/html-streaming-editor/blob/main/src/parsing/mod.rs#L66) & execution (https://github.com/kelko/html-streaming-editor/blob/main/src/css/mod.rs)

I don't say it's pretty, but it seems to do the job - at least for my project. And maybe there is an idea or two in there which might help you as well.

@ThatXliner
Copy link

What's the status on this?

@Aunmag
Copy link

Aunmag commented Dec 7, 2024

Same problem. Wish there was a fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request hard issue that's especially challenging to fix, requires potentially major (design) changes
Projects
None yet
Development

No branches or pull requests

5 participants