Package com.itextpdf.text.pdf.parser
Class TaggedPdfReaderTool
- java.lang.Object
  - com.itextpdf.text.pdf.parser.TaggedPdfReaderTool
public class TaggedPdfReaderTool extends Object
Converts a tagged PDF document into an XML file.
Since:
- 5.0.2
-
Field Summary
- protected PrintWriter out
  The writer object to which the XML will be written.
- protected PdfReader reader
  The reader object from which the content streams are read.
-
Constructor Summary
- TaggedPdfReaderTool()
-
Method Summary
- void convertToXml(PdfReader reader, OutputStream os)
  Parses the structured content of a tagged PDF and writes it as XML, using the default charset.
- void convertToXml(PdfReader reader, OutputStream os, String charset)
  Parses the structured content of a tagged PDF and writes it as XML, using the given charset.
- void inspectChild(PdfObject k)
  Inspects a child of a structured element.
- void inspectChildArray(PdfArray k)
  If the child of a structured element is an array, we need to loop over the elements.
- void inspectChildDictionary(PdfDictionary k)
  If the child of a structured element is a dictionary, we inspect the child; we may also draw a tag.
- void inspectChildDictionary(PdfDictionary k, boolean inspectAttributes)
  If the child of a structured element is a dictionary, we inspect the child; we may also draw a tag.
- void parseTag(String tag, PdfObject object, PdfDictionary page)
  Searches for a tag in a page.
- protected String xmlName(PdfName name)
-
Field Detail
-
reader
protected PdfReader reader
The reader object from which the content streams are read.
-
out
protected PrintWriter out
The writer object to which the XML will be written.
-
Method Detail
-
convertToXml
public void convertToXml(PdfReader reader, OutputStream os, String charset) throws IOException
Parses the structured content of a tagged PDF and writes the resulting XML to an output stream, encoded with the given charset.
Parameters:
- reader - the PdfReader that has access to the PDF file
- os - the OutputStream to which the resulting XML will be written
- charset - the charset used to encode the data
Throws:
- IOException
Since:
- 5.0.5
-
convertToXml
public void convertToXml(PdfReader reader, OutputStream os) throws IOException
Parses the structured content of a tagged PDF and writes the resulting XML to an output stream. The output is encoded with the default charset.
Parameters:
- reader - the PdfReader that has access to the PDF file
- os - the OutputStream to which the resulting XML will be written
Throws:
- IOException
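How the os and charset parameters relate to the PrintWriter held in the out field can be illustrated with the standard library alone. The sketch below is not iText code: CharsetWriterSketch, openWriter, and encode are hypothetical names, and it assumes that the stream is wrapped in a charset-aware writer, which this page does not confirm.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical helper, not part of iText: shows one way an OutputStream and a
// charset name can be combined into a PrintWriter.
final class CharsetWriterSketch {

    // Wraps the stream so that characters are encoded with the named charset.
    static PrintWriter openWriter(OutputStream os, String charset) {
        return new PrintWriter(new OutputStreamWriter(os, Charset.forName(charset)));
    }

    // Mirrors the two-argument overload: falls back to the JVM default charset.
    static PrintWriter openWriter(OutputStream os) {
        return openWriter(os, Charset.defaultCharset().name());
    }

    // Encodes a piece of text into bytes through the writer, for inspection.
    static byte[] encode(String text, String charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter out = openWriter(buffer, charset);
        out.print(text);
        out.flush(); // push buffered characters through the encoder
        return buffer.toByteArray();
    }
}
```

Because the default charset differs between platforms, passing an explicit charset such as "UTF-8" is the safer choice when the XML output has to be portable.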
-
inspectChild
public void inspectChild(PdfObject k) throws IOException
Inspects a child of a structured element. This can be an array or a dictionary.
Parameters:
- k - the child to inspect
Throws:
- IOException
-
inspectChildArray
public void inspectChildArray(PdfArray k) throws IOException
If the child of a structured element is an array, we need to loop over the elements.
Parameters:
- k - the child array to inspect
Throws:
- IOException
-
inspectChildDictionary
public void inspectChildDictionary(PdfDictionary k) throws IOException
If the child of a structured element is a dictionary, we inspect the child; we may also draw a tag.
Parameters:
- k - the child dictionary to inspect
Throws:
- IOException
-
inspectChildDictionary
public void inspectChildDictionary(PdfDictionary k, boolean inspectAttributes) throws IOException
If the child of a structured element is a dictionary, we inspect the child; we may also draw a tag.
Parameters:
- k - the child dictionary to inspect
Throws:
- IOException
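The way inspectChild, inspectChildArray, and inspectChildDictionary hand work to one another can be sketched without iText. In the example below, a List stands in for PdfArray and a Map for PdfDictionary, and the "S" and "K" keys are modeled on the structure type and kids entries of a PDF structure element. ChildInspectionSketch and result() are hypothetical names, and whether the real class traverses and writes tags in exactly this order is an assumption.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not iText code: a List plays the role of PdfArray
// and a Map plays the role of PdfDictionary.
final class ChildInspectionSketch {
    private final StringWriter buffer = new StringWriter();
    private final PrintWriter out = new PrintWriter(buffer);

    // Dispatches on the kind of child: an array or a dictionary.
    void inspectChild(Object k) {
        if (k instanceof List) {
            inspectChildArray((List<?>) k);
        } else if (k instanceof Map) {
            inspectChildDictionary((Map<?, ?>) k);
        }
        // Any other kind of child is ignored in this sketch.
    }

    // Loops over the elements of an array child.
    void inspectChildArray(List<?> k) {
        for (Object element : k) {
            inspectChild(element);
        }
    }

    // Writes an opening tag named after the "S" entry, descends into the
    // "K" entry if there is one, then writes the closing tag.
    void inspectChildDictionary(Map<?, ?> k) {
        Object type = k.get("S");
        if (type != null) {
            out.print("<" + type + ">");
        }
        Object kids = k.get("K");
        if (kids != null) {
            inspectChild(kids);
        }
        if (type != null) {
            out.print("</" + type + ">");
        }
    }

    // Returns everything written so far.
    String result() {
        out.flush();
        return buffer.toString();
    }
}
```

Given a root map with "S" set to "Document" and "K" set to a list of two maps whose "S" entries are "H1" and "P", the sketch produces `<Document><H1></H1><P></P></Document>`.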
-
parseTag
public void parseTag(String tag, PdfObject object, PdfDictionary page) throws IOException
Searches for a tag in a page.
Parameters:
- tag - the name of the tag
- object - an identifier to find the marked content
- page - a page dictionary
Throws:
- IOException
-
-