Replies: 1 comment
+1,0000000000000000000000000000000000.
I spent a few hours investigating whether it is possible to create tagged PDFs from BIRT, and I think the answer is "Yes, but it is quite an effort".
"Tagged PDF" means that the PDF contains not only rendering instructions but also a logical, HTML/XML-like document structure. This structure is necessary for blind or visually impaired readers, and it is useful for other purposes, too.
Adding tagged PDF support to BIRT is probably too much effort for a single person or a small company.
My gut feeling tells me that it will cost several months of full-time work.
But it just seems necessary to comply with regulations in the US, in the EU and probably elsewhere in the world.
Technically, it is important to know that it is impossible (as of today; this might change with AI) to take an ordinary PDF like those BIRT creates today and magically turn it into a tagged PDF.
But in BIRT, we have our layout items with unique IDs and with a tree-like structure.
Other reporting engines support this, too.
So it should be doable.
In a tagged PDF, layout content is enclosed between BMC or BDC (begin marked content) and EMC (end marked content). The difference between the two opening operators is that the latter also includes a dictionary with additional information.
One piece of information is the marked-content ID (MCID), which is a number that is unique per page.
The structure tree represents the logical structure of the document. A node in the tree can refer to layout content by referencing pages and marked-content IDs.
Page numbers, page headers and footers etc. are marked as pagination artifacts (i.e. usually unimportant for the reader).
There's much more to it, but that's the basic idea.
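To make the relationship between marked content and the structure tree more concrete, here is a minimal sketch in plain Java. It does not use OpenPDF or BIRT; `StructNode`, `PageContent` and their methods are hypothetical names invented for this illustration. It only shows how a BDC/EMC pair carrying an MCID and a structure node referencing (page, MCID) fit together, and how a page number would be marked as a pagination artifact.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model (not OpenPDF API): one node of the logical structure tree. */
class StructNode {
    final String type;                                   // structure type, e.g. "Document", "P"
    final List<StructNode> kids = new ArrayList<>();
    final List<int[]> contentRefs = new ArrayList<>();   // {pageIndex, mcid} pairs

    StructNode(String type) { this.type = type; }

    StructNode addKid(String kidType) {
        StructNode kid = new StructNode(kidType);
        kids.add(kid);
        return kid;
    }
}

/** Hypothetical model: collects the content stream of one page and hands out MCIDs. */
class PageContent {
    final int pageIndex;
    private final StringBuilder stream = new StringBuilder();
    private int nextMcid = 0;                            // MCIDs only need to be unique per page

    PageContent(int pageIndex) { this.pageIndex = pageIndex; }

    /** Wraps drawing operators in BDC ... EMC and links them to the structure node. */
    void tagged(StructNode node, String drawingOps) {
        int mcid = nextMcid++;
        stream.append("/").append(node.type)
              .append(" <</MCID ").append(mcid).append(">> BDC\n")
              .append(drawingOps).append("\n")
              .append("EMC\n");
        node.contentRefs.add(new int[] { pageIndex, mcid });
    }

    /** Marks content such as a page number as a pagination artifact (no structure node). */
    void artifact(String drawingOps) {
        stream.append("/Artifact <</Type /Pagination>> BDC\n")
              .append(drawingOps).append("\n")
              .append("EMC\n");
    }

    String content() { return stream.toString(); }
}

class MarkedContentDemo {
    public static void main(String[] args) {
        StructNode root = new StructNode("Document");
        StructNode para = root.addKid("P");
        PageContent page = new PageContent(0);
        page.tagged(para, "BT /F1 12 Tf 72 720 Td (Hello) Tj ET");
        page.artifact("BT /F1 9 Tf 300 20 Td (Page 1) Tj ET");
        System.out.print(page.content());
    }
}
```

The point of the sketch is that the MCID is assigned at the moment the content is written, which is why the layout objects and the structure have to be produced together.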
I took a quick look at the source code of OpenPDF, and it seems to support creating tagged PDFs (see the methods beginMarkedContentSequence and endMarkedContentSequence in PdfContentByte.java).
The trick seems to be to create the layout objects and the structure at the same time: the begin... method needs a PdfStructureElement argument.
As we have access to the layout item tree (or could have access to it) when the emitter creates the PDF, in principle the BIRT layout item tree corresponds nicely to the PDF structure tree and thus can be used to build the PDF structure tree.
BIRT already supports a pdf tag property for layout elements, so obviously it was planned to support this - but the properties are unused at the moment.
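As a rough illustration of how the layout item tree could drive the structure tree, here is a small sketch in plain Java. `LayoutItem` is a hypothetical stand-in and not a real BIRT class, and the tag values are assumptions about what the tag property might contain. The sketch assumes that untagged containers are transparent (their children attach to the nearest tagged ancestor) and that items tagged "Artifact" get no structure node at all.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a BIRT layout item; not the real BIRT classes. */
class LayoutItem {
    final long id;
    final String pdfTag;   // assumed tag property value, e.g. "Table"; null if unset
    final List<LayoutItem> children = new ArrayList<>();

    LayoutItem(long id, String pdfTag) {
        this.id = id;
        this.pdfTag = pdfTag;
    }

    LayoutItem add(LayoutItem child) {
        children.add(child);
        return this;
    }
}

class StructureOutline {
    /** Renders the structure tree implied by the layout tree, one node per line. */
    static String outline(LayoutItem root) {
        StringBuilder out = new StringBuilder();
        walk(root, 0, out);
        return out.toString();
    }

    private static void walk(LayoutItem item, int depth, StringBuilder out) {
        if ("Artifact".equals(item.pdfTag)) {
            return; // artifacts (page headers etc.) are not part of the logical structure
        }
        int childDepth = depth;
        if (item.pdfTag != null) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(item.pdfTag).append(" (item ").append(item.id).append(")\n");
            childDepth = depth + 1;
        }
        // Untagged containers are transparent: their children attach to the
        // nearest tagged ancestor.
        for (LayoutItem child : item.children) {
            walk(child, childDepth, out);
        }
    }

    public static void main(String[] args) {
        LayoutItem doc = new LayoutItem(1, "Document")
                .add(new LayoutItem(2, "Artifact"))                       // page header
                .add(new LayoutItem(3, null).add(new LayoutItem(4, "P"))) // untagged wrapper
                .add(new LayoutItem(5, "Table"));
        System.out.print(outline(doc));
    }
}
```

A real emitter would create the structure elements while writing the page content instead of printing an outline, but the traversal would look similar.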
Some things which a tagged PDF requires are not supported by BIRT's object model.
That's why the mapping of the trees is not really as straightforward as one might think at first.
For example, in BIRT we don't have a "caption" property for tables. In practice, the developer will either put a label or dynamic text item right above or below the table, or use a table header/footer row with all cells merged for this. Whatever you choose, it is only a workaround: the semantics of this text being a caption for the table is lost. That's why I think that the PDF tag property alone is not sufficient. But with the help of the names and IDs of the report items, it should somehow be possible to specify "this is a caption for the table with the name 'Employees'".
While it would be possible to add a "caption row" to table/grid items, I think changing the ROM structure would not be a good idea. Adding some new properties should be sufficient for tagged PDF support.
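One way such a property could work is sketched below in plain Java. Everything here is hypothetical: `ReportItem` is not a BIRT class, and `captionFor` is an invented property name that holds the name of the table a label belongs to. The sketch only shows that resolving captions by item name is simple and that dangling references can be reported.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical report item; "captionFor" is an assumed new property, not part of BIRT. */
class ReportItem {
    final String name;        // report item name, e.g. "Employees"; may be null
    final String kind;        // "table", "label", ...
    final String text;        // label text; null for tables
    final String captionFor;  // name of the table this label captions, or null

    ReportItem(String name, String kind, String text, String captionFor) {
        this.name = name;
        this.kind = kind;
        this.text = text;
        this.captionFor = captionFor;
    }
}

class CaptionResolver {
    /** Maps table names to caption texts; unresolvable references are added to problems. */
    static Map<String, String> resolve(List<ReportItem> items, List<String> problems) {
        Map<String, ReportItem> byName = new LinkedHashMap<>();
        for (ReportItem item : items) {
            if (item.name != null) {
                byName.put(item.name, item);
            }
        }
        Map<String, String> captions = new LinkedHashMap<>();
        for (ReportItem item : items) {
            if (item.captionFor == null) {
                continue; // not a caption
            }
            ReportItem target = byName.get(item.captionFor);
            if (target == null || !"table".equals(target.kind)) {
                problems.add("No table named \"" + item.captionFor + "\"");
            } else {
                captions.put(target.name, item.text);
            }
        }
        return captions;
    }

    public static void main(String[] args) {
        List<ReportItem> items = new ArrayList<>();
        items.add(new ReportItem(null, "label", "Employees by department", "Employees"));
        items.add(new ReportItem("Employees", "table", null, null));
        List<String> problems = new ArrayList<>();
        System.out.println(resolve(items, problems) + " " + problems);
    }
}
```

Because the reference goes by name, the caption label can stay wherever the report developer placed it (above or below the table) without any change to the ROM structure.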
I think that in order to create accessible output, there should be a property "accessible" at the report level indicating this intention.
If the "accessible" property is set, warning messages should be displayed in the designer and/or at runtime if it is clear that the report is not accessible, for example if there is no alternative text for an image or if the language is not specified.
Note: It is possible to tell that a report is not accessible, but not the opposite. That's because not all necessary conditions for an accessible document can be checked automatically.
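The kind of check I have in mind could look roughly like the following plain Java sketch. `ReportModel` and its fields are hypothetical simplifications, not BIRT's report design model. Note that an empty warning list only means that no problem was detected; it does not prove that the report is accessible.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified report model; the field names are assumptions. */
class ReportModel {
    boolean accessible;   // the proposed report-level property
    String language;      // e.g. "en-US"; null if unspecified
    final List<String> imageAltTexts = new ArrayList<>(); // one entry per image; null if missing
}

class AccessibilityCheck {
    /** Returns warnings for conditions that certainly make the report inaccessible. */
    static List<String> warnings(ReportModel report) {
        List<String> result = new ArrayList<>();
        if (!report.accessible) {
            return result; // the author did not ask for accessible output
        }
        if (isBlank(report.language)) {
            result.add("Report language is not specified");
        }
        for (int i = 0; i < report.imageAltTexts.size(); i++) {
            if (isBlank(report.imageAltTexts.get(i))) {
                result.add("Image " + (i + 1) + " has no alternative text");
            }
        }
        return result;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    public static void main(String[] args) {
        ReportModel report = new ReportModel();
        report.accessible = true;
        report.imageAltTexts.add("Company logo");
        report.imageAltTexts.add(null);
        for (String warning : warnings(report)) {
            System.out.println("WARNING: " + warning);
        }
    }
}
```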
I'd like to know