de.aitools.iv.mds.util
Class Preprocessor

java.lang.Object
  extended by de.aitools.iv.mds.util.Preprocessor

public class Preprocessor
extends java.lang.Object

Author:
Anita

Constructor Summary
Preprocessor()
          Creates a new instance of Preprocessor
 
Method Summary
 int getDocumentCount()
          Returns the number of documents.
 int getMaxDimension()
          Returns the largest offset found in any of the document vectors.
 int[][] getOffsets()
          Returns the offsets array of the documents.
 double[][] getValues()
          Returns the values array of the documents.
static java.math.BigInteger[][] readHashFile(java.lang.String file)
          Reads a file of hash values whose indices correspond to those of the vectors.
 void readVectors(java.lang.String tfidfFile)
          Reads a file of vectors.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Preprocessor

public Preprocessor()
Creates a new instance of Preprocessor

Method Detail

readVectors

public void readVectors(java.lang.String tfidfFile)
                 throws java.io.FileNotFoundException,
                        java.io.IOException
Reads a file of vectors. The first line of the file must state the maximum compressed dimension of the vectors and the number of vectors.

Parameters:
tfidfFile - name of the file to read, including its path
Throws:
java.io.FileNotFoundException - if the file cannot be found or opened
java.io.IOException - if an I/O error occurs while reading the file
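The documentation specifies only the header line of the input file. The following is a minimal sketch of how such a file might be parsed into the parallel offsets and values arrays that getOffsets() and getValues() return. Everything beyond the header is an assumption: the sketch supposes that the header lists the maximum dimension before the vector count, and that each following line holds one vector as whitespace-separated offset:value tokens. The class SparseVectorFileSketch and its parse method are hypothetical and are not part of Preprocessor.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical sketch of parsing a sparse vector file. Assumed format (not
 * confirmed by the Preprocessor documentation): a header line
 * "maxDimension vectorCount", then one vector per line written as
 * whitespace-separated "offset:value" tokens.
 */
final class SparseVectorFileSketch {
    final int maxDimension;
    final int[][] offsets;
    final double[][] values;

    private SparseVectorFileSketch(int maxDimension, int[][] offsets, double[][] values) {
        this.maxDimension = maxDimension;
        this.offsets = offsets;
        this.values = values;
    }

    static SparseVectorFileSketch parse(Reader source) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            String header = in.readLine();
            if (header == null) {
                throw new IOException("missing header line");
            }
            // Header (assumed order): maximum compressed dimension, then vector count.
            String[] head = header.trim().split("\\s+");
            if (head.length < 2) {
                throw new IOException("header must contain dimension and count");
            }
            int maxDimension = Integer.parseInt(head[0]);
            int count = Integer.parseInt(head[1]);

            int[][] offsets = new int[count][];
            double[][] values = new double[count][];
            for (int i = 0; i < count; i++) {
                String line = in.readLine();
                if (line == null) {
                    throw new IOException("expected " + count + " vectors, found " + i);
                }
                String trimmed = line.trim();
                String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
                offsets[i] = new int[tokens.length];
                values[i] = new double[tokens.length];
                for (int k = 0; k < tokens.length; k++) {
                    // Each token is assumed to be "offset:value".
                    int colon = tokens[k].indexOf(':');
                    if (colon < 0) {
                        throw new IOException("malformed token: " + tokens[k]);
                    }
                    offsets[i][k] = Integer.parseInt(tokens[k].substring(0, colon));
                    values[i][k] = Double.parseDouble(tokens[k].substring(colon + 1));
                }
            }
            return new SparseVectorFileSketch(maxDimension, offsets, values);
        }
    }
}
```

Under these assumptions, parsing the text "7 2\n0:0.5 3:1.25\n7:2.0\n" yields two vectors: the first with offsets {0, 3} and values {0.5, 1.25}, the second with offset {7} and value {2.0}.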

getOffsets

public int[][] getOffsets()
Returns the offsets array of the documents.

Returns:
two-dimensional array of the vectors' offsets

getValues

public double[][] getValues()
Returns the values array of the documents.

Returns:
two-dimensional array of the vectors' values
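Because offsets and values are returned as separate arrays, they appear to describe sparse vectors in parallel form: for document i, offsets[i][k] would be the position of a non-zero component and values[i][k] its weight. The documentation does not confirm this. Assuming that reading, and additionally assuming zero-based offsets, the hypothetical helper below expands one document into a dense vector whose length is the maximum offset plus one.

```java
/**
 * Illustrative sketch only. Assumes that offsets[k] is the zero-based index
 * of the k-th non-zero component and values[k] is its weight; neither
 * assumption is confirmed by the Preprocessor documentation.
 */
final class SparseToDenseSketch {
    static double[] toDense(int[] offsets, double[] values, int maxOffset) {
        if (offsets.length != values.length) {
            throw new IllegalArgumentException("offsets and values must have the same length");
        }
        // Offsets run from 0 to maxOffset inclusive, hence the + 1.
        double[] dense = new double[maxOffset + 1];
        for (int k = 0; k < offsets.length; k++) {
            // An offset larger than maxOffset throws ArrayIndexOutOfBoundsException.
            dense[offsets[k]] = values[k];
        }
        return dense;
    }
}
```

In this sketch, components that do not appear in the offsets array stay at 0.0, which is what a sparse tf-idf representation would normally imply.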

getDocumentCount

public int getDocumentCount()
Returns the number of documents.

Returns:
number of vectors, i.e., documents

getMaxDimension

public int getMaxDimension()
Returns the largest offset found in any of the document vectors.

Returns:
maximum offset of vectors
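One plausible way to obtain the value that getMaxDimension() reports is a single pass over a jagged offsets array such as the one getOffsets() returns. This is a sketch, not the class's actual implementation: MaxOffsetSketch.maxOffset is a hypothetical name, and returning -1 when there are no offsets at all is a choice made only for this example.

```java
/**
 * Hypothetical sketch: computes the largest offset over all document
 * vectors in a jagged int[][] (one row per vector, rows may differ in length).
 */
final class MaxOffsetSketch {
    static int maxOffset(int[][] offsets) {
        int max = -1; // -1 signals "no offsets at all" in this sketch
        for (int[] vector : offsets) {
            for (int offset : vector) {
                max = Math.max(max, offset);
            }
        }
        return max;
    }
}
```

For example, maxOffset(new int[][] {{0, 3}, {5}, {}}) returns 5; empty rows are simply skipped.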

readHashFile

public static java.math.BigInteger[][] readHashFile(java.lang.String file)
                                             throws java.io.FileNotFoundException,
                                                    java.io.IOException
Reads a file of hash values whose indices correspond to those of the vectors.

Parameters:
file - name of the file to read, including its path
Returns:
two-dimensional array of hash values
Throws:
java.io.FileNotFoundException - if the file cannot be found or opened
java.io.IOException - if an I/O error occurs while reading the file
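The layout of the hash file is not documented. The sketch below assumes that line i of the file holds the whitespace-separated hash values for vector i, written in decimal, and returns them as a BigInteger[][] in the same shape that readHashFile returns. HashFileSketch.readHashes is a hypothetical name, and reading from a java.io.Reader instead of a file name is a choice that keeps the sketch easy to test.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of reading a hash file. Assumed format (not confirmed
 * by the Preprocessor documentation): line i holds the whitespace-separated
 * decimal hash values for vector i.
 */
final class HashFileSketch {
    static BigInteger[][] readHashes(Reader source) throws IOException {
        List<BigInteger[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
                BigInteger[] row = new BigInteger[tokens.length];
                for (int k = 0; k < tokens.length; k++) {
                    // BigInteger(String) parses a decimal integer of arbitrary size.
                    row[k] = new BigInteger(tokens[k]);
                }
                rows.add(row);
            }
        }
        // Row i of the result corresponds to vector i.
        return rows.toArray(new BigInteger[0][]);
    }
}
```

BigInteger is a natural fit here because hash values can exceed the range of long. If the hashes were stored in hexadecimal instead, new BigInteger(token, 16) would parse them.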