The pdf2Data SDK is a native Java (or .NET) application. Its primary function is to extract data from PDF files using predefined extraction rules.

The extracted data is output in either JSON or XML format.



The preferred way to set up iText pdf2Data in Java is to use a build system like Maven or Gradle and download pdf2Data artifacts from the iText Artifactory located at

The groupId is com.itextpdf.pdf2data, and the artifactId is pdf2data.

In Maven, the configuration would look similar to the example below:

	<name>pdf2Data Maven Repository</name>



For .NET, iText pdf2Data is distributed as a NuGet package, which is available at or at iText Artifactory.

You can browse for the desired NuGet package manually or install it with the Install-Package itext7.pdf2data NuGet Package Manager command.

Integrating pdf2Data into your code

As of iText pdf2Data 4.0, the format of extraction templates has changed compared to iText pdf2Data 3.*. See the Migration guide for details.

With the pdf2Data Manager in iText pdf2Data 4.0, you can download templates optimized for use in the pdf2Data SDK, so you can extract data in two lines of code. 

Extraction (Java)

Make sure to load the license file before invoking any pdf2Data code.


The initialization of the Pdf2DataExtractor instance from a processed template should now be done with one function call:

Pdf2DataExtractor extractor = Pdf2DataExtractor.create(new File(P2D_TEMPLATE_PATH));

Parse PDF using the extraction template

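Perform extraction on the input PDF (PDF_PATH is a placeholder for the path of the PDF to parse, not a name defined by the SDK)
ParsingResult result = extractor.recognize(new File(PDF_PATH));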
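Both the create and recognize calls take a File, so a missing or unreadable path only surfaces once the SDK is invoked. The following is a minimal sketch of a pre-flight check that could run before those calls. It uses only the Java standard library; the class name ExtractionInputs and the method problems are hypothetical helpers for illustration and are not part of the pdf2Data SDK.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the pdf2Data SDK): checks that the
// template file and the input PDF can be read before extraction starts.
final class ExtractionInputs {
    private ExtractionInputs() {}

    // Returns one message per problem found; an empty list means both
    // files exist, are regular files, and are readable.
    static List<String> problems(File template, File pdf) {
        List<String> problems = new ArrayList<>();
        if (!template.isFile() || !template.canRead()) {
            problems.add("Cannot read template: " + template.getPath());
        }
        if (!pdf.isFile() || !pdf.canRead()) {
            problems.add("Cannot read input PDF: " + pdf.getPath());
        }
        return problems;
    }
}
```

If the returned list is empty, the two File objects can be passed on to the extractor; otherwise the messages can be logged or shown to the user before any SDK call is made.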

You can use the extracted values directly from the result, or save them in one of two structured formats:

Save to XML
result.saveToXml(new File(RESULT_XML_PATH));
Save to JSON
result.saveToJson(new File(RESULT_JSON_PATH));
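When many PDFs are processed in a batch, it can be convenient to write each result next to its source file. The sketch below derives the XML and JSON output paths from the input PDF path. It uses only the Java standard library; the class name ResultPaths and the method withExtension are hypothetical and are not part of the pdf2Data SDK.

```java
import java.nio.file.Path;

// Hypothetical helper (not part of the pdf2Data SDK): builds an output
// path beside the input PDF by swapping the file extension.
final class ResultPaths {
    private ResultPaths() {}

    // Assumes pdf names a file (not a filesystem root), so getFileName() is non-null.
    static Path withExtension(Path pdf, String extension) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Keep the whole name when there is no extension or the name starts with a dot.
        String base = dot > 0 ? name.substring(0, dot) : name;
        return pdf.resolveSibling(base + "." + extension);
    }
}
```

Under these assumptions, the File arguments for the save calls shown above could be obtained with ResultPaths.withExtension(pdfPath, "xml").toFile() and ResultPaths.withExtension(pdfPath, "json").toFile(), where pdfPath is the Path of the input PDF.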