de::aitools::aq::invertedindex::usage::GoogleBooks Class Reference


Static Public Member Functions

static final void index (File ngramDir, File indexDir) throws ZipException, IOException
static void main (String[] args)

Detailed Description

A class that demonstrates how to build an inverted index directly from the zip-compressed CSV files of the Google Books N-Gram collection.

Because the original dataset tracks the frequency of n-grams over several years, we want to map each n-gram to its sequence of year/frequency tuples. To do so, the underlying code extracts one n-gram/year/frequency triple from each line, stores the triple in a record, and puts this record into an instance of Indexer.
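
The mapping described above can be sketched as follows. This is a minimal illustration, not the code of GoogleBooks.java: it assumes each line is tab-separated with the n-gram in the first field, the year in the second and the frequency in the third, and it collects the tuples in a plain map because the Indexer API is not documented on this page. The names NgramGroupingSketch, YearFrequency and group are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: groups year/frequency tuples by n-gram.
// It stands in for the record/Indexer step, whose API is not shown here.
class NgramGroupingSketch {

    /** One year/frequency tuple of an n-gram. */
    static final class YearFrequency {
        final int year;
        final long frequency;

        YearFrequency(int year, long frequency) {
            this.year = year;
            this.frequency = frequency;
        }
    }

    /**
     * Assumed line layout: ngram TAB year TAB frequency [TAB further fields].
     * Lines with fewer than three fields are skipped.
     */
    static Map<String, List<YearFrequency>> group(List<String> lines) {
        // LinkedHashMap keeps the n-grams in the order they first appear.
        Map<String, List<YearFrequency>> postings = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length < 3) {
                continue; // not a complete n-gram/year/frequency triple
            }
            int year = Integer.parseInt(fields[1]);
            long frequency = Long.parseLong(fields[2]);
            postings.computeIfAbsent(fields[0], k -> new ArrayList<>())
                    .add(new YearFrequency(year, frequency));
        }
        return postings;
    }
}
```

Calling group on the decompressed lines of one file yields one entry per n-gram, with its tuples in file order. A real indexer would hand each entry to the index instead of holding everything in memory.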

Note: It seems that Google made a mistake in their n-gram encoding. We found that certain n-gram files also contain n-grams of length (n-k), and that these shortened n-grams generate duplicates. You have to make sure that such n-grams are filtered out when parsing the zip-files.
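
One possible way to filter the shortened n-grams is to count the tokens of each n-gram and keep only lines with exactly n tokens. The sketch below is an assumption about how such a filter could look, not the check used in GoogleBooks.java. It assumes tokens are separated by whitespace and that the n-gram is the first tab-separated field; the names NgramLengthFilterSketch, tokenCount and keepExactLength are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: drops lines whose n-gram is shorter (or longer) than n.
class NgramLengthFilterSketch {

    /** Counts the whitespace-separated tokens of an n-gram. */
    static int tokenCount(String ngram) {
        String trimmed = ngram.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Keeps only lines whose first tab-separated field has exactly n tokens. */
    static List<String> keepExactLength(List<String> lines, int n) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            String ngram = tab < 0 ? line : line.substring(0, tab);
            if (tokenCount(ngram) == n) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

The value of n would have to come from the caller, for example from the name of the file being parsed, which this sketch does not attempt to derive.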

Author:
martin.trenkmann@uni-weimar.de
Version:
$Id: GoogleBooks.java,v 1.1 2011/04/10 16:41:25 trenkman Exp $

Definition at line 44 of file GoogleBooks.java.


Member Function Documentation

static final void de::aitools::aq::invertedindex::usage::GoogleBooks::index (File ngramDir, File indexDir) throws ZipException, IOException [inline, static]

static void de::aitools::aq::invertedindex::usage::GoogleBooks::main (String[] args) [inline, static]

Definition at line 89 of file GoogleBooks.java.


The documentation for this class was generated from the following file:

GoogleBooks.java

Generated on Wed May 30 15:07:49 2012 by doxygen 1.6.3