Extracting a Tagged PDF Structure Tree as XML

The project has produced a new tool that extracts the Structure Tree from a tagged PDF file as XML. It is a Lua script that requires only texlua, which is distributed with all major TeX distributions.

RELAX NG schemas are also provided to validate the resulting XML.
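
Once a structure tree is available as XML, it can be inspected with any ordinary XML library. The following is a minimal sketch in Python using only the standard library. It assumes nothing about the tool's real output: the sample document and its element names (Document, H1, P) are hypothetical stand-ins chosen for illustration, and the helper names count_tags and outline are invented here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: these element names are illustrative only and are
# NOT taken from the tool's actual output format.
SAMPLE_XML = """<Document>
  <H1>Title</H1>
  <P>First paragraph.</P>
  <P>Second paragraph.</P>
</Document>"""


def count_tags(xml_text):
    """Return a Counter mapping each element name to its number of occurrences."""
    root = ET.fromstring(xml_text)
    return Counter(element.tag for element in root.iter())


def outline(xml_text):
    """Return the element names, indented two spaces per nesting level."""
    lines = []

    def walk(element, depth):
        lines.append("  " * depth + element.tag)
        for child in element:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return lines


if __name__ == "__main__":
    print("\n".join(outline(SAMPLE_XML)))
    print(dict(count_tags(SAMPLE_XML)))
```

To try this on real output, read the extracted XML file into a string and pass it to either function. If the XML uses namespaces, ElementTree reports each tag in the form {namespace-uri}name, so the printed names will carry that prefix.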

A discussion page with full details is available in the Project’s tagging-project repository.

To see the tool in action, validating one of the Project’s example WTPDF files, you may use the form at

https://texlive.net/showtags?doc=mathml-AF-ex2-se

See also: Original Source by the LaTeX project team

Note: The copyright belongs to the blog author and the blog. For the license, please see the linked original source blog.