Alternative to supplying the original image? #25

joewiz · 2024-12-29T17:15:16Z

Hello! Thanks for this great tool! I see that, since the last time I used the code, @bertsky's PR #23, which I'd been depending on, has been merged, so I can use master. That's great!

I am writing with a request for an enhancement.

In my project I am working with images hosted in an S3 bucket and fronted by a IIIF-compliant image server (namely, Cantaloupe). I do not have the images on the system where I'm running textract2page. I would like to avoid having to download all of the images to my system, just to run textract2page.

The README explains why the image must be passed into the utility:

because Textract stores coordinates in float ratios, whereas PAGE uses int in pixel indices

Would there be some way I could pass in the pixel dimensions of the image? I can retrieve these easily via the IIIF API.
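For illustration, here is a minimal sketch of how the dimensions could be read from a IIIF Image API `info.json` document, which reports the full image size in its `width` and `height` fields. The function names and the example base URL are hypothetical, and none of this is part of textract2page.

```python
import json
from urllib.request import urlopen


def info_url(base_url: str) -> str:
    """Build the info.json URL for a IIIF image base URL (hypothetical helper)."""
    return base_url.rstrip("/") + "/info.json"


def dimensions_from_info(info: dict) -> tuple[int, int]:
    """Extract (width, height) in pixels from a parsed info.json document."""
    return int(info["width"]), int(info["height"])


def fetch_dimensions(base_url: str) -> tuple[int, int]:
    """Download info.json from the image server and return (width, height)."""
    with urlopen(info_url(base_url)) as response:
        return dimensions_from_info(json.load(response))
```

With an image served at, say, `https://example.org/iiif/2/page1` (a made-up URL), `fetch_dimensions` would need only a small JSON request rather than a download of the full image.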

In my reading of the source code, the underlying convert_page function retrieves the image's dimensions here.

How about this: if the utility is passed --width= and --height= flags instead of IMAGE_FILE, it could use these supplied values instead of requiring the image. Or some variation of this idea?
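To show why the two numbers should be enough, here is a rough sketch of the ratio-to-pixel conversion. It assumes Textract's polygon format (a list of `{"X": ..., "Y": ...}` ratios between 0 and 1) and PAGE's `points` string format. The function names are hypothetical, and the rounding choice is my assumption, not necessarily what convert_page does.

```python
def polygon_to_pixels(polygon, width, height):
    """Scale Textract ratio coordinates to integer pixel coordinates.

    polygon: list of dicts with float "X" and "Y" ratios in [0, 1].
    width, height: image size in pixels, supplied by the caller.
    Rounding to the nearest integer is an assumption of this sketch.
    """
    return [(round(p["X"] * width), round(p["Y"] * height)) for p in polygon]


def points_attribute(pixels):
    """Format pixel pairs as a PAGE-style points string: 'x1,y1 x2,y2 ...'."""
    return " ".join(f"{x},{y}" for x, y in pixels)


# Example: a box covering the middle of a 1000x2000 pixel image.
polygon = [
    {"X": 0.25, "Y": 0.5},
    {"X": 0.75, "Y": 0.5},
    {"X": 0.75, "Y": 0.625},
    {"X": 0.25, "Y": 0.625},
]
pixels = polygon_to_pixels(polygon, width=1000, height=2000)
print(points_attribute(pixels))
```

Nothing here touches the image file itself; only its width and height enter the calculation.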


bertsky · 2025-01-06T07:08:00Z

Thanks for raising this! Indeed, that's a valid and very realistic use case. Good idea to do it like this.

joewiz mentioned this issue Jan 3, 2025

Allow conversion without image #26

Open
