Split up scrape_tables #5

Closed
greimel opened this issue Jul 16, 2024 · 2 comments

greimel commented Jul 16, 2024

Hi,

Your code is quite helpful, even for tables that are not completely "well-formed". As of now, though, I can't use your package directly; I have to copy-paste code from it.

If you split up scrape_tables and return intermediate objects, one can re-use a lot of your functionality.

Example

url = "https://www.ssa.gov/oact/NOTES/as120/images/LD_fig5.html"
cell_transform = strip ∘ nodeText

(; result_tables) = scrape_tables_inner(url, cell_transform)

rows = only(result_tables)[3:end]
header = only(result_tables)[2]

DataFrame(TableScraper.Table(rows, header))

Would you be interested in a PR (could include the above snippet as a test)?

[Table(rows, header) for (rows, header) in zip(result_tables, headers)]

EDIT: This would also address #4


greimel commented Jul 16, 2024

It turns out one can already use your package like this:

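using TableScraper, DataFrames
using Cascadia: nodeText  # nodeText extracts the text content of an HTML node

url = "https://www.ssa.gov/oact/NOTES/as120/images/LD_fig5.html"
tbl = only(TableScraper.scrape_tables(url, strip ∘ nodeText))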

rows = tbl.rows[3:end]
header = tbl.rows[2]

df = DataFrame(TableScraper.Table(rows, header))

Would you like to have this example added to the README or tests?

@xiaodaigh (Owner)

Once you have extracted the header, you don't need TableScraper anymore; you can just create a data frame.

But done.
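
As a rough illustration of the point above, here is a minimal sketch of building the data frame directly from a header and a list of rows. The `raw_rows` values are made-up placeholders standing in for scraped cells (for example `tbl.rows`), and the names `raw_rows` and `cells` are assumptions for this example only, not part of TableScraper.

```julia
using DataFrames

# Hypothetical stand-in for scraped rows; real data would come from tbl.rows.
raw_rows = [
    ["Some title row", "", ""],
    ["Year", "Male", "Female"],
    ["row1-a", "row1-b", "row1-c"],
    ["row2-a", "row2-b", "row2-c"],
]

# Same slicing as in the thread: row 2 is the header, the data starts at row 3.
header = raw_rows[2]
rows = raw_rows[3:end]

# hcat puts each row vector into a column, so permute to get one table row per matrix row.
cells = permutedims(reduce(hcat, rows))

# The DataFrame constructor accepts a matrix plus a vector of column names.
df = DataFrame(cells, header)
```

All cells stay strings in this sketch; converting columns to numbers would be a separate step.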
