
Alternative to supplying the original image? #25

Open
joewiz opened this issue Dec 29, 2024 · 1 comment

joewiz commented Dec 29, 2024

Hello! Thanks for this great tool! I see that, since the last time I used the code, @bertsky's PR #23, which I'd been depending on, has been merged, so I can use master, which is great!

I am writing with a request for an enhancement.

In my project I am working with images hosted in an S3 bucket and fronted by a IIIF-compliant image server (namely, Cantaloupe). I do not have the images on the system where I'm running textract2page. I would like to avoid having to download all of the images to my system, just to run textract2page.

The README explains why the image must be passed into the utility:

> because Textract stores coordinates in float ratios, whereas PAGE uses int in pixel indices
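
To illustrate the point the README makes, here is a minimal sketch of the scaling step, assuming only that Textract polygons are lists of `{"X": ..., "Y": ...}` ratios in the range 0 to 1 and that PAGE wants a `points` string of integer pixel pairs. The function name `polygon_to_page_points` and the rounding rule are assumptions for illustration, not the tool's actual code. Note that the only thing needed from the image is its width and height.

```python
def polygon_to_page_points(polygon, width, height):
    """Scale Textract ratio points to a PAGE-style points string.

    `polygon` is a list of {"X": float, "Y": float} dicts with values in 0..1.
    `width` and `height` are the image dimensions in pixels.
    Rounding to the nearest integer is an assumption of this sketch.
    """
    pairs = []
    for point in polygon:
        x = round(point["X"] * width)   # ratio -> pixel column
        y = round(point["Y"] * height)  # ratio -> pixel row
        pairs.append(f"{x},{y}")
    return " ".join(pairs)


# Hypothetical polygon for a 2000 x 3000 pixel image.
example = [
    {"X": 0.1, "Y": 0.2},
    {"X": 0.5, "Y": 0.2},
    {"X": 0.5, "Y": 0.25},
    {"X": 0.1, "Y": 0.25},
]
print(polygon_to_page_points(example, 2000, 3000))
```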

Would there be some way I could pass in the pixel dimensions of the image? I can retrieve these easily via the IIIF API.
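
For context, this is roughly how I would get the dimensions on my side: a sketch that assumes a IIIF Image API endpoint, whose `info.json` document carries `width` and `height` in pixels. The helper names and the example URL in the comment are hypothetical.

```python
import json
from urllib.request import urlopen


def dimensions_from_info(info):
    """Return (width, height) in pixels from a parsed IIIF info.json dict."""
    return int(info["width"]), int(info["height"])


def fetch_iiif_dimensions(image_base_url):
    """Fetch info.json for one image and return its (width, height).

    `image_base_url` is the image's IIIF base URI, e.g. the hypothetical
    https://example.org/iiif/2/some-identifier (without a trailing /info.json).
    """
    url = image_base_url.rstrip("/") + "/info.json"
    with urlopen(url) as response:
        return dimensions_from_info(json.load(response))
```

This needs only a small JSON request per image, rather than a download of the image itself.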

In my reading of the source code, the underlying convert_page function retrieves the image's dimensions here.

How about this: if the utility is passed --width= and --height= flags instead of IMAGE_FILE, it could use the supplied values instead of requiring the image. Or some variation of this idea?
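
To make the proposal concrete, here is a rough sketch of how such options could be resolved. It is a hypothetical stand-in written with `argparse`, not the utility's current command-line code: the parser layout, the argument names `textract_json` and `image_file`, and the helper `resolve_size` are all assumptions. Pillow is imported only in the fallback branch, so no image library is touched when the dimensions are supplied.

```python
import argparse


def build_parser():
    """Hypothetical CLI: the image file becomes optional once a size is given."""
    parser = argparse.ArgumentParser(prog="textract2page-sketch")
    parser.add_argument("textract_json", help="Textract JSON result file")
    parser.add_argument("image_file", nargs="?", help="image (optional if size given)")
    parser.add_argument("--width", type=int, help="image width in pixels")
    parser.add_argument("--height", type=int, help="image height in pixels")
    return parser


def resolve_size(args):
    """Return (width, height), preferring the explicit flags over the image."""
    if args.width is not None and args.height is not None:
        return args.width, args.height
    if args.width is not None or args.height is not None:
        raise ValueError("--width and --height must be given together")
    if args.image_file is None:
        raise ValueError("need either an image file or --width and --height")
    # Fallback: read the size from the image itself (requires Pillow).
    from PIL import Image
    with Image.open(args.image_file) as image:
        return image.size  # Pillow reports (width, height)


args = build_parser().parse_args(["result.json", "--width", "2000", "--height", "3000"])
print(resolve_size(args))
```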

bertsky (Member) commented Jan 6, 2025

Thanks for raising this! Indeed, that's a valid and very realistic use case. Good idea to do it like this.
