Can't read gray jpg2000 file #9

Open
THausherr opened this issue Jan 5, 2017 · 10 comments

THausherr commented Jan 5, 2017

my code:

        ImageReader reader = imageReadersByFormatName.next();
        System.out.println("reader.canReadRaster(): " + reader.canReadRaster());
        ImageInputStream iis = ImageIO.createImageInputStream(new File("x.jp2"));
        reader.setInput(iis, true, true);
        BufferedImage image = reader.read(0);

The output:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 54
	at jj2000.j2k.fileformat.reader.FileFormatReader.getColorModel(FileFormatReader.java:680)
	at com.github.jaiimageio.jpeg2000.impl.J2KReadState.getColorModel(J2KReadState.java:935)
	at com.github.jaiimageio.jpeg2000.impl.J2KReadState.readBufferedImage(J2KReadState.java:343)
	at com.github.jaiimageio.jpeg2000.impl.J2KImageReader.read(J2KImageReader.java:441)
	at javax.imageio.ImageReader.read(ImageReader.java:939)

I'm using version 1.3.0.

x jp2
(file is a renamed JPEG2000)

According to IrfanView, the file is a JPEG2000 - Wavelet, Grayscale.

stain commented Jan 18, 2017

Not sure if this bug is related to these warnings I get from openjpeg-tools for your file:

stain@biggiebuntu:~/Pictures$ j2k_to_image -i 350c3f9c-d395-11e6-880f-bbd8fedda5c2.jp2  -o 350c3f9c-d395-11e6-880f-bbd8fedda5c2.ppm

[WARNING] SOT marker inconsistency in tile 0: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 1: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 2: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 3: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 4: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 5: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 6: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 7: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 8: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 9: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 10: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 11: tile-part index greater (5) than number of tile-parts (5)

stain commented Jan 18, 2017

This is happening inside FileFormatReader as it is accessing the maps array - which was populated by readComponentMappingBox(12)

Debugging for your file I get that it has read the arrays:

  • comps: [0, 0, 28784]
  • type: [1, -1, -128]
  • maps: [0, 54, -35]

So something does not look right there. Back in getColorModel() the lut variable only holds [[-1], [-1], [-1]], so an index of 54, and even less so -35, is not going to work.

Any idea what could be wrong? To the untrained eye, the reader seems to think the file is not truly grayscale but has 3 components, and not the usual RGB but 0, 0 and 28784 (aka 0x7070), if that makes any sense. So either the depth/component information in the file header is wrong, something breaks while reading the COMPONENT_MAPPING_BOX, or something changed in the JPEG2000 spec that is not reflected in this aged reference implementation copied by Sun.

stain commented Jan 18, 2017

If I convert your picture with j2k_to_image I seem to get a fully black picture.. is that what is intended?

THausherr commented Jan 18, 2017

According to IrfanView it is fully white.

It makes sense that it is white: in the PDF that contains the problem file (file 001131 from the digitalcorpora site) a shape is created and then that image is rendered with this shape as clipping path. This appears white with Adobe Reader.

stain added a commit that referenced this issue Jan 18, 2017

stain commented Jan 18, 2017

j2k_dump says there is only one comp:

image {
  x0=0, y0=0, x1=627, y1=807
  numcomps=1
  comp 0 {
    dx=1, dy=1
    prec=8
    sgnd=0
  }
}

which is consistent with a grayscale picture. So something goes wrong early on: readComponentMappingBox() is called with length 12 (which, divided by 4, makes 3 components) instead of length 4, so it reads too far ahead and picks up two extra bogus components.

stain commented Jan 18, 2017

No, the length is still 12, but the second and third component mappings are bogus in the file.

I think something is wrong with that map as it is read from the file. If I replace it with the map [0,1,2] (as would happen if there was no map), then the image reads fine and, in fact, when output as PNG is 100% white.

stain commented Jan 18, 2017

Checking with Python (or a hex editor) you will find your file says:

>>> import struct
>>> j = open("350c3f9c-d395-11e6-880f-bbd8fedda5c2.jp2").read(1024)
>>> j.find("\x63\x6d\x61\x70") # COMPONENT_MAPPING_BOX
136
>>> j[135]
'\x14'
>>> length = 0x14
>>> length
20
>>> j[135:135+length]
'\x14cmap\x00\x00\x01\x00\x00\x00\xff6pp\x80\xdd\x00\x00\x00'
>>> cmap = j[135:135+length][8:]
>>> cmap
'\x00\x00\x00\xff6pp\x80\xdd\x00\x00\x00'
>>> len(cmap)
12
>>> struct.unpack(">HBB", cmap[0:4])
(0, 0, 255)
>>> struct.unpack(">HBB", cmap[4:8])
(13936, 112, 128)
>>> struct.unpack(">HBB", cmap[8:])
(56576, 0, 0)

(I unpacked according to section 1.5.3.5 in [ISO/IEC 15444-1:2002 T.800](http://www.itu.int/rec/T-REC-T.800-200208-S/en), which defines CMP, MTYP and PCOL:)

CMP
This field specifies the index of component from the codestream that is mapped to this channel
(either directly or through a palette). This field is encoded as a 2-byte big endian unsigned integer.

So this means component index 13936 and 56576 for the second and third channel ... does that make sense?

MTYP
This field specifies how this channel is generated from the actual components in the file. This field is encoded as a 1-byte unsigned integer.

Only the values 0 and 1 are defined for MTYP, so the value 112 for the second channel is way out (or is that supported by an extension?).

PCOL
This field specifies the index component from the palette that is used to map the actual component
from the codestream. This field is encoded as a 1-byte unsigned integer. If the value of the MTYP
field for this channel is 0, then the value of this field shall be 0.

Yet it is written as 255 for the first channel which has MTYP 0..?

I'm afraid I'm not an expert in JPEG 2000 and get a bit confused..
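
For comparison, here is a minimal Java sketch of the same unpacking. It assumes the usual JP2 box layout: a 4-byte big-endian length, then the 4-byte type, then the payload. Under that assumption the 12-byte payload starts immediately after `cmap`, three bytes earlier than the `[8:]` slice above, because that slice began at the last byte of the length field. The bytes are copied from the dump above; `CmapDump` and `parse` are hypothetical names, not part of jai-imageio.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of jai-imageio: decodes a cmap payload
// into unsigned {CMP, MTYP, PCOL} triples.
class CmapDump {
    static int[][] parse(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload); // ByteBuffer is big-endian by default
        int[][] entries = new int[payload.length / 4][];
        for (int i = 0; i < entries.length; i++) {
            int cmp = buf.getShort() & 0xFFFF; // 2-byte unsigned component index
            int mtyp = buf.get() & 0xFF;       // 1-byte unsigned mapping type
            int pcol = buf.get() & 0xFF;       // 1-byte unsigned palette column
            entries[i] = new int[] {cmp, mtyp, pcol};
        }
        return entries;
    }

    public static void main(String[] args) {
        // Assumption: these are the 12 bytes that directly follow "cmap" in the dump.
        byte[] payload = {
            0x00, 0x00, 0x01, 0x00,
            0x00, 0x00, (byte) 0xFF, 0x36,
            0x70, 0x70, (byte) 0x80, (byte) 0xDD
        };
        for (int[] e : parse(payload)) {
            System.out.println("CMP=" + e[0] + " MTYP=" + e[1] + " PCOL=" + e[2]);
        }
    }
}
```

Read this way, the entries come out as (0, 1, 0), (0, 255, 54) and (28784, 128, 221). Taken as signed values, those are the comps, type and maps arrays seen in the debugger earlier in the thread. The second and third entries still look invalid with either alignment.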

stain commented Jan 18, 2017

What is confusing is that the reference implementation uses signed shorts where the spec says unsigned shorts, and the same goes for bytes. So there could be multiple sign errors going unnoticed somewhere.
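
To illustrate the signedness point: Java's `byte` and `short` are signed, so a raw 0xDD byte reads as -35 unless it is masked or read through an unsigned accessor. Below is a small sketch using the standard `DataInputStream`. The class name `UnsignedReadDemo` is made up for illustration, and this is not how FileFormatReader itself is written.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustration only; the class name is made up and this is not library code.
class UnsignedReadDemo {
    // Reads a 2-byte value followed by a 1-byte value using Java's signed types.
    static int[] readSigned(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int first = in.readShort();
            int second = in.readByte();
            return new int[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the same two fields as unsigned values, as the spec describes them.
    static int[] readUnsigned(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int first = in.readUnsignedShort();
            int second = in.readUnsignedByte();
            return new int[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xDD, 0x00, (byte) 0xDD};
        int[] s = readSigned(data);
        int[] u = readUnsigned(data);
        System.out.println("signed:   " + s[0] + ", " + s[1]);
        System.out.println("unsigned: " + u[0] + ", " + u[1]);
    }
}
```

With these three bytes the signed reads give -8960 and -35, while the unsigned reads give 56576 and 221.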

stain commented Jan 19, 2017

Best bet is that the JP2 file you sent was written with C code for 3 channels, but as just 1 component is used, the mapping for channels 2 and 3 was written with uninitialized (e.g. ~random) data. This reference implementation parses according to spec, which says the number of components is defined by the size of the cmap box: in this case 12 bytes, aka 3 components.

(The spec does not say what to do if there's a mismatch)
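
Since the spec is silent on the mismatch, one lenient option for a reader would be to sanity-check each cmap entry against the number of codestream components and fall back to an identity mapping when an entry is implausible. The sketch below shows that idea only. `CmapSanity` and its methods are hypothetical names, the fallback policy is an assumption, palette sizes are not checked, and this is not necessarily what the commit referenced in this issue does.

```java
import java.util.Arrays;

// Hypothetical sketch of a lenient policy; not part of jai-imageio.
class CmapSanity {
    // entry is an unsigned {CMP, MTYP, PCOL} triple.
    static boolean isPlausible(int[] entry, int numComps) {
        int cmp = entry[0];
        int mtyp = entry[1];
        int pcol = entry[2];
        if (cmp < 0 || cmp >= numComps) {
            return false; // points at a component the codestream does not have
        }
        if (mtyp != 0 && mtyp != 1) {
            return false; // only 0 (direct) and 1 (palette) are defined
        }
        return mtyp == 1 || pcol == 0; // PCOL shall be 0 when MTYP is 0
    }

    // Returns the component index per channel, or the identity mapping
    // over numComps if any entry fails the plausibility check.
    static int[] componentsOrIdentity(int[][] entries, int numComps) {
        int[] comps = new int[entries.length];
        for (int i = 0; i < entries.length; i++) {
            if (!isPlausible(entries[i], numComps)) {
                int[] identity = new int[numComps];
                for (int c = 0; c < numComps; c++) {
                    identity[c] = c;
                }
                return identity;
            }
            comps[i] = entries[i][0];
        }
        return comps;
    }

    public static void main(String[] args) {
        // Values as seen in the debugger for this file (unsigned), one component.
        int[][] entries = {{0, 1, 0}, {0, 255, 54}, {28784, 128, 221}};
        System.out.println(Arrays.toString(componentsOrIdentity(entries, 1)));
    }
}
```

For this file the second entry fails the check, so the sketch falls back to the single-component identity mapping `[0]`.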

@THausherr

Another one:
PDFJS-11306 jp2
(file is a renamed JPEG2000)
