
[External] Please fix white level scaling with HDR image processing (WIC and Direct2D related) #4860

benstevens48 opened this issue Nov 8, 2024 · 1 comment

I know this is external, but I have nowhere else to post it. I would like to request that Microsoft (most notably the Windows Imaging Component team) urgently review how it handles white level scaling for HDR images. With the 24H2 update, WIC added support for HDR AVIF images, but the white level scaling is wrong (or at least missing essential metadata), and I hope this can be addressed before it becomes too much of a mess to ever sort out.

Historically, for SDR images, WIC has left it up to the application to apply any color profile stored in the image. However, with formats like HEIC and AVIF it now does this conversion during the decode process. There is nothing wrong with this per se (other than a certain lack of control, especially when it comes to transcoding images). For HDR AVIF images it produces a 64-bit floating-point pixel format, which we are left to assume is in the scRGB color space (no color space information at all is provided). But most problematic is that it gives us no indication of which pixel value represents diffuse white. In my opinion, the only sensible value for this is (1,1,1). Indeed, the JPEG XR specification (https://www.itu.int/rec/T-REC-T.832-201906-I, page 181) says 'The scRGB perfect diffuse white point is specified by all three colour channels set to a value of 1.0.' Based on exporting a 32-bit floating-point TIFF from Lightroom, the value (1,1,1) is also used there to represent diffuse white. Using this value makes perfect sense: we can seamlessly combine with SDR images after applying a color transform to map to a common space (and the reference scRGB color profile will map (1,1,1) to an XYZ luminance value of 1, matching what happens for an SDR image, e.g. with an sRGB profile). If we want to export the image to SDR, or show it on an SDR display, we need to know the value representing diffuse white; using (1,1,1) for this means we do not have to do any extra scaling, non-HDR-aware software will display the image reasonably well, and if we can make the floating point (1,1,1) = diffuse white assumption then no additional metadata is required.
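For concreteness, here is a minimal sketch (error handling elided, file name hypothetical) of decoding an HDR AVIF through WIC on 24H2, showing where the missing information bites: the pixel format tells you that you have floats, and nothing else.

```cpp
#include <wincodec.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Minimal decode sketch. Assumes COM is already initialized and the 24H2
// AVIF codec is present; all error handling is elided for brevity.
void InspectHdrAvif() {
    ComPtr<IWICImagingFactory> factory;
    CoCreateInstance(CLSID_WICImagingFactory, nullptr, CLSCTX_INPROC_SERVER,
                     IID_PPV_ARGS(&factory));

    ComPtr<IWICBitmapDecoder> decoder;
    factory->CreateDecoderFromFilename(L"photo_hdr.avif", nullptr, GENERIC_READ,
                                       WICDecodeMetadataCacheOnDemand, &decoder);

    ComPtr<IWICBitmapFrameDecode> frame;
    decoder->GetFrame(0, &frame);

    WICPixelFormatGUID fmt;
    frame->GetPixelFormat(&fmt);
    // Per the behavior described above, fmt is a 64bpp floating-point format
    // (presumably GUID_WICPixelFormat64bppRGBAHalf). Nothing here says which
    // color space the floats are in, nor which pixel value is diffuse white.
}
```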

However, it seems that Microsoft has stuck with the mindset that a floating-point pixel value of (1,1,1) represents a fixed luminance of 80 nits, going against the quote from the JPEG XR specification, and this approach causes many, many problems. I am not 100% sure this is the exact mindset, but one thing I do know is that a PQ-encoded HDR AVIF decoded by WIC is treated this way. ITU-R BT.2408 (https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BT.2408-7-2023-PDF-E.pdf) defines HDR reference white/diffuse white/graphics white as 203 nits (equivalently, about 58% of the way along the PQ curve), and that is definitely not being mapped to (1,1,1) - from testing, it looks more like 80 nits is being mapped to (1,1,1). This means that if we want to combine with any SDR graphics, apply effects (many of which work best in the range [0,1]), convert to SDR, etc., we are doing so with the wrong white level (given that we have no way of knowing which pixel value represents diffuse white, other than by assuming it's (1,1,1)).
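To make the mismatch numerical: if WIC really maps 80 nits to 1.0, then BT.2408 reference white lands at 203/80 = 2.5375 in the decoded floats, and an application that wants the (1,1,1) = diffuse white convention has to rescale by 80/203 itself. A hedged sketch, assuming the frame has already been converted to a 128bpp float RGBA buffer (the function and buffer layout here are my own illustration, not a WIC API):

```cpp
#include <cstddef>

// Correction assuming WIC's apparent convention (1.0 == 80 nits) and the
// BT.2408 reference white of 203 nits. 'pixels' is assumed to be a 128bpp
// float RGBA buffer, e.g. obtained via IWICFormatConverter.
void RescaleToDiffuseWhite(float* pixels, std::size_t pixelCount) {
    constexpr float kWicUnitNits  = 80.0f;  // apparent WIC mapping of 1.0
    constexpr float kRefWhiteNits = 203.0f; // BT.2408 HDR reference white
    const float scale = kWicUnitNits / kRefWhiteNits; // ~0.394: 2.5375 -> 1.0

    for (std::size_t i = 0; i < pixelCount; ++i) {
        pixels[4 * i + 0] *= scale; // R
        pixels[4 * i + 1] *= scale; // G
        pixels[4 * i + 2] *= scale; // B (alpha left untouched)
    }
}
```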

In my opinion, by far the best solution to this problem is to treat floating point (1,1,1) as diffuse white. When it comes to display, I think it's fine that floating point (1,1,1) nominally represents the fixed luminance of 80 nits, since we are able to obtain the SDR white level for the display. An HDR-capable viewer can then apply a scaling of (display SDR white / 80) to map (1,1,1) in the image to the SDR white of the display. This is a behavioral change from the idea that seems to be pushed in some of Microsoft's docs, which is that HDR images represent a fixed display luminance, but in my opinion that doesn't really make sense: if someone in a very bright room creates an image and sends it to someone in a very dark room, the recipient is unlikely to want the display to use the same brightness when viewing it. Mapping between SDR white levels (which can be specified by the user in their display settings) overcomes this.
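For reference, the display's SDR white level is already obtainable via DisplayConfigGetDeviceInfo. A sketch (first active display path only, error handling elided) of deriving the (display SDR white / 80) scale factor described above:

```cpp
#include <windows.h>
#include <vector>

// Query the user's SDR white level for the first active display path and
// derive the scale that maps image (1,1,1), taken as 80-nit scRGB white,
// to the display's SDR white. Error handling elided.
float SdrWhiteScaleForFirstPath() {
    UINT32 pathCount = 0, modeCount = 0;
    GetDisplayConfigBufferSizes(QDC_ONLY_ACTIVE_PATHS, &pathCount, &modeCount);
    std::vector<DISPLAYCONFIG_PATH_INFO> paths(pathCount);
    std::vector<DISPLAYCONFIG_MODE_INFO> modes(modeCount);
    QueryDisplayConfig(QDC_ONLY_ACTIVE_PATHS, &pathCount, paths.data(),
                       &modeCount, modes.data(), nullptr);

    DISPLAYCONFIG_SDR_WHITE_LEVEL white = {};
    white.header.type = DISPLAYCONFIG_DEVICE_INFO_GET_SDR_WHITE_LEVEL;
    white.header.size = sizeof(white);
    white.header.adapterId = paths[0].targetInfo.adapterId;
    white.header.id = paths[0].targetInfo.id;
    DisplayConfigGetDeviceInfo(&white.header);

    // SDRWhiteLevel is in thousandths of 80 nits (1000 == 80 nits).
    const float sdrWhiteNits = white.SDRWhiteLevel / 1000.0f * 80.0f;
    return sdrWhiteNits / 80.0f; // e.g. 240-nit SDR white -> scale of 3.0
}
```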

The alternative (inferior) solution would be to ensure that HDR images decoded by WIC to a floating-point pixel format carry metadata indicating the pixel value that represents SDR white. This could be a single value (in the AVIF case it would be 203/80 = 2.5375), or, if you really want, you could specify the luminance of (1,1,1) and the diffuse white point in nits, as 80 and 203 respectively (although these absolute values don't really have much meaning).

This also has implications for HDR wallpaper. It is unclear to me what pixel value should represent diffuse white in these images.

In general, I think some of the thinking at Microsoft may have come from gaming, which has the unique property that HDR content is produced and consumed on the same display. Images, by contrast, are commonly shared and edited between people on many different devices, and we need to maintain that possibility with HDR images, especially regarding editing. I do not know a lot about HDR video, but I believe the HDR video standards have probably focused on optimizing for display on TVs, not on editing in the way that people edit images.

I know this is perhaps slightly off-topic for Project Reunion, but hopefully this gets seen by the relevant team!


benstevens48 commented Dec 13, 2024

Some more thoughts. I realize that HDR standards, particularly for video, specify the absolute luminance at which content should be displayed. IMO this is the wrong approach: it is much better for everything to be relative to SDR white, and to allow the user to set the brightness of SDR white, which in turn sets the brightness of the image. This approach ensures seamless compatibility with SDR content while still allowing the full luminance range of the display to be used. (The HLG standard actually uses the maximum luminance of the display, instead of an SDR white level, for mapping the white level, which makes no sense, since it means SDR content on a 10,000-nit display in a dark room would be blinding - much better to use a user-defined SDR white level for mapping instead.)

However, given the uncertainty around this, I think the ideal solution would be to let the developer do the color mapping themselves, as for standard image formats (although I don't know whether the color transform has to be done during decode). Maybe we could use the IWICBitmapSourceTransform interface to request a 64bpp unsigned integer pixel format. Then we need some way of getting the color profile information (we can't use the standard WIC color context functions, since that would imply the profile needs to be applied to the standard floating-point decode, where color management has already been applied). Perhaps the metadata query reader could be used to obtain any relevant color info (such as the profile and/or CICP tag), as sketched below. Then the developer could apply the transform.
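A hedged sketch of the kind of thing I mean - note the query path below is purely hypothetical, since I don't know what (if anything) WIC exposes for AVIF CICP/nclx data:

```cpp
#include <wincodec.h>
#include <propidl.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Speculative sketch: probe the frame's metadata query reader for original
// color info, instead of relying on the already-color-managed float decode.
// "/colr" is a placeholder path, not a documented WIC query for AVIF.
void ProbeColorMetadata(IWICBitmapFrameDecode* frame) {
    ComPtr<IWICMetadataQueryReader> query;
    if (SUCCEEDED(frame->GetMetadataQueryReader(&query))) {
        PROPVARIANT value;
        PropVariantInit(&value);
        if (SUCCEEDED(query->GetMetadataByName(L"/colr", &value))) {
            // Hypothetically: inspect CICP primaries/transfer/matrix here
            // and choose the white-level handling accordingly.
        }
        PropVariantClear(&value);
    }
}
```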

If not all of that is possible, then firstly we need to know what a decoded pixel value of 1 represents (e.g. 80 nits or SDR white), either by convention (but how, given that a 32-bit TIFF from Lightroom uses 1 to represent SDR white, not 80 nits, so we need some context?) or via some sort of 'virtual' metadata we can read with the query reader. Secondly, we need to be able to tell the encoded SDR white level if the image has an HDR transfer function like PQ or HLG, for which the standard specifies 203 nits (using the reference display for HLG) - maybe just by reading the original color space info as described above.
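To make the two conventions concrete, here is a sketch of the PQ EOTF (standard SMPTE ST 2084 constants) together with both normalizations; PqToNits(0.58) comes out at roughly 203, matching the reference-white figure above:

```cpp
#include <algorithm>
#include <cmath>

// PQ EOTF with the ST 2084 constants, then the two competing float
// normalizations discussed in this issue.
double PqToNits(double ePrime) {           // ePrime: PQ signal in [0,1]
    const double m1 = 2610.0 / 16384.0;
    const double m2 = 2523.0 / 4096.0 * 128.0;
    const double c1 = 3424.0 / 4096.0;
    const double c2 = 2413.0 / 4096.0 * 32.0;
    const double c3 = 2392.0 / 4096.0 * 32.0;
    const double p = std::pow(ePrime, 1.0 / m2);
    const double y = std::pow(std::max(p - c1, 0.0) / (c2 - c3 * p), 1.0 / m1);
    return 10000.0 * y;                    // absolute luminance in nits
}

// Proposed convention: diffuse white at (1,1,1), i.e. divide by 203 nits.
double PqToFloatDiffuseWhite(double e) { return PqToNits(e) / 203.0; }

// WIC's apparent convention: 1.0 == 80 nits, so reference white lands at ~2.54.
double PqToFloat80Nit(double e) { return PqToNits(e) / 80.0; }
```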

Edit: I just wanted to say that for HLG AVIF images, I think WIC is assuming reference display parameters for decode, which is the correct approach, so please do keep doing this. Using the current display settings instead would introduce a weird dependency on the display at an early stage of the image processing pipeline and would not be good for editing purposes.
