Maven Repository - POM file for Web Framework boilerpipe 1.1.0

Summary

Boilerpipe -- Boilerplate Removal and Fulltext Extraction from HTML pages.

The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page. The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.

Declaration

Here is the Maven declaration for boilerpipe. If you use Maven, add the following dependency element to your project's pom.xml to depend on this artifact.

<dependency>
   <groupId>de.l3s.boilerpipe</groupId>
   <artifactId>boilerpipe</artifactId>
   <version>1.1.0</version>
</dependency>
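
As a rough illustration of where the declaration goes, the sketch below places it inside the dependencies element of a consuming project's pom.xml. The com.example / my-crawler coordinates are hypothetical placeholders and not part of boilerpipe. The sketch also assumes the 1.1.0 artifact can be resolved from a repository your build is already configured to use; no repository location is given on this page.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical coordinates of the consuming project (placeholders) -->
  <groupId>com.example</groupId>
  <artifactId>my-crawler</artifactId>
  <version>0.1.0</version>

  <dependencies>
    <!-- boilerpipe coordinates exactly as listed in the declaration above -->
    <dependency>
      <groupId>de.l3s.boilerpipe</groupId>
      <artifactId>boilerpipe</artifactId>
      <version>1.1.0</version>
    </dependency>
  </dependencies>
</project>
```

Note that the boilerpipe POM reproduced under "POM File Source" contains no dependencies element of its own, so Maven will not pull in anything transitively from it; any further libraries needed at runtime would have to be declared alongside this entry.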

License

Name: Apache License 2.0

Packages

The following packages are defined in boilerpipe-1.1.0.jar:

de.l3s.boilerpipe
de.l3s.boilerpipe.conditions
de.l3s.boilerpipe.document
de.l3s.boilerpipe.estimators
de.l3s.boilerpipe.extractors
de.l3s.boilerpipe.filters.english
de.l3s.boilerpipe.filters.heuristics
de.l3s.boilerpipe.filters.simple
de.l3s.boilerpipe.labels
de.l3s.boilerpipe.sax
de.l3s.boilerpipe.util
org.cyberneko.html




POM File Source

Here is the content of the POM file.

<project xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>de.l3s.boilerpipe</groupId>
  <artifactId>boilerpipe</artifactId>
  <packaging>jar</packaging>
  <version>1.1.0</version>
  <url>http://code.google.com/p/boilerpipe/</url>
  <licenses>
    <license>
      <name>Apache License 2.0</name>
    </license>
  </licenses>
  <name>Boilerpipe -- Boilerplate Removal and Fulltext Extraction from HTML pages</name>
  <description>The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.

The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.

Extracting content is very fast (milliseconds), just needs the input document (no global or site-level information required) and is usually quite accurate.

Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0.

The algorithms used by the library are based on (and extending) some concepts of the paper "Boilerplate Detection using Shallow Text Features" by Christian Kohlschütter et al., presented at WSDM 2010 -- The Third ACM International Conference on Web Search and Data Mining New York City, NY USA.
  </description>
  <scm>
    <connection>scm:svn:http://boilerpipe.googlecode.com/svn/trunk/</connection>
    <url>http://code.google.com/p/boilerpipe/source/browse/</url>
  </scm>
  
</project>