Download pdf-extractor-2.0.1.jar file

Description

This is an optimized version of Apache PDFBox. It extracts the rough structure of a document (pages, blocks of text, and paragraphs, as well as formatting information) and is intended to improve text extraction results for scientific papers. The output can easily be transformed to plain text (toString) or to an XML format (toXML).

You can download the pdf-extractor 2.0.1 jar file on this page.

License

The GNU Affero General Public License, Version 3

Build File

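Add one of the following dependency declarations to your build file to include pdf-extractor-2.0.1.jar in your project. The declarations are, in order, for Maven, Gradle, sbt, Ivy, Grape, and Buildr.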

<dependency>
   <groupId>de.cit-ec.scie</groupId>
   <artifactId>pdf-extractor</artifactId>
   <version>2.0.1</version>
</dependency>

implementation group: 'de.cit-ec.scie', name: 'pdf-extractor', version: '2.0.1'

libraryDependencies += "de.cit-ec.scie" % "pdf-extractor" % "2.0.1"

<dependency org="de.cit-ec.scie" name="pdf-extractor" rev="2.0.1"/>

@Grapes(@Grab(group='de.cit-ec.scie', module='pdf-extractor', version='2.0.1'))

'de.cit-ec.scie:pdf-extractor:jar:2.0.1'

Download

Click one of the following links to download the corresponding file.

pdf-extractor-2.0.1-javadoc.jar
pdf-extractor-2.0.1-sources.jar
pdf-extractor-2.0.1.jar
pdf-extractor-2.0.1.pom
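
The four file names above follow the standard Maven repository layout, where the group ID's dots become directory separators and each file is named artifactId-version[-classifier].extension. The following is a minimal Java sketch of how those paths can be derived from the coordinates. The class name MavenPaths, the helper repositoryPath, and the use of Maven Central as the base URL are assumptions made for illustration; this page does not state where the files are hosted.

```java
public class MavenPaths {
    // Assumption: the artifact lives in a repository with the standard
    // Maven 2 layout; Maven Central is used here only as an example base.
    static final String BASE_URL = "https://repo1.maven.org/maven2/";

    /**
     * Builds the repository-relative path of one artifact file.
     * Pass null as classifier for the main jar or the pom.
     */
    static String repositoryPath(String groupId, String artifactId, String version,
                                 String classifier, String extension) {
        String fileName = artifactId + "-" + version
                + (classifier == null ? "" : "-" + classifier)
                + "." + extension;
        // Dots in the group ID become directory separators.
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/" + fileName;
    }

    public static void main(String[] args) {
        String groupId = "de.cit-ec.scie";
        String artifactId = "pdf-extractor";
        String version = "2.0.1";

        String[] paths = {
            repositoryPath(groupId, artifactId, version, "javadoc", "jar"),
            repositoryPath(groupId, artifactId, version, "sources", "jar"),
            repositoryPath(groupId, artifactId, version, null, "jar"),
            repositoryPath(groupId, artifactId, version, null, "pom"),
        };
        for (String path : paths) {
            System.out.println(BASE_URL + path);
        }
    }
}
```

The sketch only constructs the URLs; it does not check that the files exist at the assumed base URL.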
