The pdf2Data SDK is a native Java (or .NET) application. Its primary function is to extract data from PDF files using predefined extraction rules.
The extracted data is output in either JSON or XML format.
The preferred way to set up iText pdf2Data in Java is to use a build system like Maven or Gradle and download pdf2Data artifacts from the iText Artifactory located at https://repo.itextsupport.com/pdf2data/
The groupId is com.itextpdf.pdf2data, and the artifactId is pdf2data.
In Maven, the configuration would look similar to the example below:
<repository>
    <id>pdf2Data</id>
    <name>pdf2Data Maven Repository</name>
    <url>https://repo.itextsupport.com/pdf2data</url>
</repository>
<dependency>
    <groupId>com.itextpdf.pdf2data</groupId>
    <artifactId>pdf2data</artifactId>
    <version>4.0.0</version>
</dependency>
For .NET, iText pdf2Data is distributed as a NuGet package, which is available at NuGet.org or from the iText Artifactory.
You can browse for the desired NuGet package manually or install it with the Install-Package itext7.pdf2data NuGet Package Manager command.
Integrating pdf2Data into your code
As of iText pdf2Data 4.0, the format of extraction templates has changed compared to iText pdf2Data 3.*. Please see the Migration guide to learn more.
With the pdf2Data Manager in iText pdf2Data 4.0, you can download templates optimized for use in the pdf2Data SDK, so you can extract data in two lines of code.
Make sure to load the license file before invoking any code.
The initialization of the Pdf2DataExtractor instance from a processed template should now be done with one function call:
Pdf2DataExtractor extractor = Pdf2DataExtractor.create(new File(P2D_TEMPLATE_PATH));
Parse the PDF using the extraction template:
ParsingResult result = extractor.recognize(new File(PDF_PATH)); // PDF_PATH: path to the PDF to parse, not the template
You can use the extracted values directly from the result or save them in one of two structured formats (JSON or XML).
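Both calls above take file arguments, and a missing or mixed-up path typically only shows up as a runtime failure. As a minimal sketch, a pre-flight check using only the Java standard library might look like the following. The class ExtractionInputs and its validate method are hypothetical helpers for illustration and are not part of the pdf2Data SDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of the pdf2Data SDK. */
final class ExtractionInputs {

    private ExtractionInputs() {
    }

    /**
     * Checks that the template and the PDF are readable regular files
     * and that the two paths do not point at the same file.
     */
    static void validate(Path template, Path pdf) throws IOException {
        requireReadableFile(template, "template");
        requireReadableFile(pdf, "PDF");
        if (Files.isSameFile(template, pdf)) {
            throw new IllegalArgumentException(
                    "Template and PDF paths point at the same file: " + pdf);
        }
    }

    // Rejects directories, missing files, and files that cannot be read.
    private static void requireReadableFile(Path path, String role) {
        if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
            throw new IllegalArgumentException(
                    "The " + role + " is not a readable file: " + path);
        }
    }
}
```

Calling ExtractionInputs.validate(...) with the template path and the PDF path before creating the extractor would turn a wrong path into an immediate IllegalArgumentException with a descriptive message. Whether such a check is worthwhile depends on how the paths reach your code.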