de.aitools.aq.textextraction
Class TextExtractor

java.lang.Object
  extended by de.aitools.aq.textextraction.TextExtractor
Direct Known Subclasses:
PsConverter, TikaConverter

public abstract class TextExtractor
extends java.lang.Object

An abstract base class for extracting plain text from arbitrary documents. Derived classes have to implement extract(File, File) and getSupportedMediaTypes().
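
The subclassing contract described above can be sketched as follows. This is a minimal illustration, not code from the library: PlainTextCopier is a hypothetical subclass, and the TextExtractor and TextExtractorException declarations are stand-ins so that the example compiles on its own. The stand-ins assume that TextExtractorException is a checked exception with a (message, cause) constructor, and they use String in place of org.apache.tika.mime.MediaType to avoid the Tika dependency.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Collections;
import java.util.Set;

// Stand-in for de.aitools.aq.textextraction.TextExtractorException.
// Assumption: a checked exception with a (message, cause) constructor.
class TextExtractorException extends Exception {
    TextExtractorException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Stand-in that mirrors the two abstract methods documented on this page.
// Assumption: String replaces org.apache.tika.mime.MediaType.
abstract class TextExtractor {
    public abstract Set<String> getSupportedMediaTypes();

    public abstract void extract(File inputFile, File outputFile)
            throws TextExtractorException;
}

// Hypothetical subclass: "extracts" text by copying the input bytes unchanged.
class PlainTextCopier extends TextExtractor {
    @Override
    public Set<String> getSupportedMediaTypes() {
        return Collections.singleton("text/plain");
    }

    @Override
    public void extract(File inputFile, File outputFile)
            throws TextExtractorException {
        try {
            byte[] content = Files.readAllBytes(inputFile.toPath());
            Files.write(outputFile.toPath(), content);
        } catch (IOException e) {
            // Wrap I/O failures in the exception type declared by extract.
            throw new TextExtractorException("Cannot extract " + inputFile, e);
        }
    }
}
```

A real subclass would extend de.aitools.aq.textextraction.TextExtractor directly and return Tika MediaType values instead of strings.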

Version:
$Id: TextExtractor.java,v 1.1 2011/04/16 02:18:05 trenkman Exp $
Author:
martin.trenkmann@uni-weimar.de

Constructor Summary
TextExtractor()
           
 
Method Summary
 java.io.File extract(java.io.File inputFile)
          Extracts plain text from the given inputFile and stores it in the same directory as the input file.
abstract  void extract(java.io.File inputFile, java.io.File outputFile)
          Extracts plain text from the given inputFile and stores it in outputFile.
abstract  java.util.Set<org.apache.tika.mime.MediaType> getSupportedMediaTypes()
          Returns the MIME types supported by this converter.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TextExtractor

public TextExtractor()

Method Detail

getSupportedMediaTypes

public abstract java.util.Set<org.apache.tika.mime.MediaType> getSupportedMediaTypes()
Returns the MIME types supported by this converter.

Returns:
The set of supported MIME types.

extract

public abstract void extract(java.io.File inputFile,
                             java.io.File outputFile)
                      throws TextExtractorException
Extracts plain text from the given inputFile. The retrieved content will be stored in outputFile.

Parameters:
inputFile - The input file to extract plain text from
outputFile - The path of the plain text file to be created
Throws:
TextExtractorException

extract

public java.io.File extract(java.io.File inputFile)
                     throws TextExtractorException
Extracts plain text from the given inputFile. The retrieved content will be stored in the same directory as the input file.

Parameters:
inputFile - The input file to extract plain text from
Returns:
The path of the created plain text file
Throws:
TextExtractorException
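
The single-argument overload stores its result in the same directory as the input file, but this page does not say how the output file is named. The following sketch shows one way such a path could be derived. OutputPaths and siblingTextFile are hypothetical names, and appending ".txt" to the input file name is an assumption, not documented behaviour.

```java
import java.io.File;

class OutputPaths {
    // Hypothetical helper: returns a file in the same directory as inputFile.
    // Assumption: the output name is the input name with ".txt" appended.
    static File siblingTextFile(File inputFile) {
        // Resolve to an absolute path first so that a parent directory exists
        // even when inputFile was given as a bare file name.
        File parent = inputFile.getAbsoluteFile().getParentFile();
        return new File(parent, inputFile.getName() + ".txt");
    }
}
```

Under this assumption, extract(File) could call extract(inputFile, siblingTextFile(inputFile)) and return the derived file.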