Class TaggedPdfReaderTool


  • public class TaggedPdfReaderTool
    extends Object
    Converts a tagged PDF document into an XML file.
    Since:
    5.0.2
    • Field Detail

      • reader

        protected PdfReader reader
        The reader object from which the content streams are read.
      • out

        protected PrintWriter out
        The writer object to which the XML will be written.
    • Constructor Detail

      • TaggedPdfReaderTool

        public TaggedPdfReaderTool()
    • Method Detail

      • convertToXml

        public void convertToXml(PdfReader reader,
                                 OutputStream os,
                                 String charset)
                          throws IOException
        Converts the structure tree of a tagged PDF into XML and writes it to the given stream using the given charset.
        Parameters:
        reader - the PdfReader that has access to the PDF file
        os - the OutputStream to which the resulting XML will be written
        charset - the name of the charset used to encode the XML output
        Throws:
        IOException
        Since:
        5.0.5
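
        The charset matters because the XML is produced as text (the out field is a PrintWriter) but ends up on a byte stream. The following is a minimal sketch using only the JDK, not iText's actual code: the class CharsetSketch and its methods are hypothetical names, and wrapping the stream in an OutputStreamWriter is an assumption about how a tool like this would typically honor the charset argument. It shows that the same markup yields different bytes under different charsets.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of iText.
class CharsetSketch {

    // Writes the markup to os, encoded with the named charset.
    // An unknown charset name surfaces as UnsupportedEncodingException,
    // which is a subclass of IOException.
    static void writeXml(String xml, OutputStream os, String charset) throws IOException {
        PrintWriter out = new PrintWriter(new OutputStreamWriter(os, charset));
        out.print(xml);
        out.flush(); // push buffered characters through to the byte stream
    }

    // Returns how many bytes the markup occupies in the given charset.
    static int encodedLength(String xml, String charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            writeXml(xml, bytes, charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        String xml = "<P>caf\u00e9</P>"; // contains one non-ASCII character
        System.out.println("UTF-8 bytes:      " + encodedLength(xml, "UTF-8"));
        System.out.println("ISO-8859-1 bytes: " + encodedLength(xml, "ISO-8859-1"));
    }
}
```

        Passing the charset explicitly, rather than relying on the platform default, keeps the output identical across machines.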
      • convertToXml

        public void convertToXml(PdfReader reader,
                                 OutputStream os)
                          throws IOException
        Converts the structure tree of a tagged PDF into XML and writes it to the given stream. The output is encoded with the platform's default charset.
        Parameters:
        reader - the PdfReader that has access to the PDF file
        os - the OutputStream to which the resulting XML will be written
        Throws:
        IOException
      • inspectChild

        public void inspectChild(PdfObject k)
                          throws IOException
        Inspects a child of a structure element. The child can be an array or a dictionary.
        Parameters:
        k - the child to inspect
        Throws:
        IOException
      • inspectChildArray

        public void inspectChildArray(PdfArray k)
                               throws IOException
        If the child of a structure element is an array, loops over its elements and inspects each one.
        Parameters:
        k - the child array to inspect
        Throws:
        IOException
      • inspectChildDictionary

        public void inspectChildDictionary(PdfDictionary k)
                                    throws IOException
        If the child of a structure element is a dictionary, inspects it; an XML tag may also be written for it.
        Parameters:
        k - the child dictionary to inspect
        Throws:
        IOException
      • inspectChildDictionary

        public void inspectChildDictionary(PdfDictionary k,
                                           boolean inspectAttributes)
                                    throws IOException
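        If the child of a structure element is a dictionary, inspects it; an XML tag may also be written for it.
        Parameters:
        k - the child dictionary to inspect
        inspectAttributes - whether the element's attributes are also inspected and written to the tag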
        Throws:
        IOException
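
        Taken together, the inspectChild, inspectChildArray and inspectChildDictionary methods describe a recursive walk: a child is either an array, whose elements are visited in turn, or a dictionary, which may produce a tag. The following is a minimal sketch of that dispatch, not iText's implementation. Plain List and Map values stand in for PdfArray and PdfDictionary; the class StructureWalkSketch, its helper methods, and the output format are hypothetical. The keys "S" (structure type), "K" (children) and "A" (attributes) follow the entry names PDF uses for structure elements.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the recursive walk; not iText code.
class StructureWalkSketch {
    private final StringBuilder out = new StringBuilder();
    private final boolean inspectAttributes;

    StructureWalkSketch(boolean inspectAttributes) {
        this.inspectAttributes = inspectAttributes;
    }

    // A child is either an array (List) or a dictionary (Map).
    // Anything else, including null, is ignored in this sketch.
    void inspectChild(Object k) {
        if (k instanceof List) {
            inspectChildArray((List<?>) k);
        } else if (k instanceof Map) {
            inspectChildDictionary((Map<?, ?>) k, inspectAttributes);
        }
    }

    // Arrays are handled by visiting every element.
    void inspectChildArray(List<?> k) {
        for (Object element : k) {
            inspectChild(element);
        }
    }

    // Dictionaries with a structure type become a tag around their children.
    void inspectChildDictionary(Map<?, ?> k, boolean withAttributes) {
        Object tag = k.get("S");
        if (tag == null) {
            inspectChild(k.get("K")); // no structure type: just descend
            return;
        }
        out.append('<').append(tag);
        Object attrs = k.get("A");
        if (withAttributes && attrs instanceof Map) {
            // Values are not XML-escaped here; a real tool would need to do that.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) attrs).entrySet()) {
                out.append(' ').append(e.getKey())
                   .append("=\"").append(e.getValue()).append('"');
            }
        }
        out.append('>');
        inspectChild(k.get("K"));
        out.append("</").append(tag).append('>');
    }

    // Builds an insertion-ordered dictionary from alternating keys and values.
    static Map<String, Object> dict(Object... keyValues) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            m.put((String) keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    // A tiny made-up structure tree: a heading and a paragraph containing a span.
    static Object sample() {
        return dict("K", Arrays.asList(
                dict("S", "H1"),
                dict("S", "P", "A", dict("Lang", "en"), "K", dict("S", "Span"))));
    }

    static String render(Object root, boolean inspectAttributes) {
        StructureWalkSketch walker = new StructureWalkSketch(inspectAttributes);
        walker.inspectChild(root);
        return walker.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(sample(), true));
        System.out.println(render(sample(), false));
    }
}
```

        The sketch omits the text content of each element; in the documented class that part is the job of parseTag, which looks up the marked content on the page.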
      • parseTag

        public void parseTag(String tag,
                             PdfObject object,
                             PdfDictionary page)
                      throws IOException
        Searches for a tag in a page.
        Parameters:
        tag - the name of the tag
        object - an identifier to find the marked content
        page - a page dictionary
        Throws:
        IOException