Class GoogleBooks

public class GoogleBooks
extends java.lang.Object

Demonstrates how to build an inverted index directly from the zip-compressed CSV files of the Google Books N-Gram collection.

Because the original dataset tracks the frequency of n-grams over several years, we want to map each n-gram to its sequence of year/frequency tuples. To do so, the underlying code extracts one n-gram/year/frequency triple from each line, stores the data in a record, and puts this record into an instance of Indexer.
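
The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the actual code of this class. The tab-separated line layout (n-gram, year, match count, followed by further columns) is an assumption about the CSV files, and NGramLineSketch, Triple, parse and group are hypothetical names. A plain map stands in for the Indexer, whose API is not shown on this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of GoogleBooks: parses lines and groups them by n-gram.
final class NGramLineSketch {

    // One n-gram/year/frequency triple taken from a single line.
    static final class Triple {
        final String ngram;
        final int year;
        final long frequency;

        Triple(String ngram, int year, long frequency) {
            this.ngram = ngram;
            this.year = year;
            this.frequency = frequency;
        }
    }

    // Assumed layout: ngram TAB year TAB match_count [TAB further columns].
    static Triple parse(String line) {
        String[] cols = line.split("\t");
        if (cols.length < 3) {
            throw new IllegalArgumentException("expected at least 3 columns: " + line);
        }
        return new Triple(cols[0],
                Integer.parseInt(cols[1].trim()),
                Long.parseLong(cols[2].trim()));
    }

    // Maps each n-gram to its sequence of {year, frequency} tuples, in input order.
    // A real implementation would hand each record to the Indexer instead.
    static Map<String, List<long[]>> group(Iterable<String> lines) {
        Map<String, List<long[]>> index = new LinkedHashMap<>();
        for (String line : lines) {
            Triple t = parse(line);
            index.computeIfAbsent(t.ngram, k -> new ArrayList<>())
                 .add(new long[] {t.year, t.frequency});
        }
        return index;
    }
}
```

In this sketch, all lines that share an n-gram end up in one list, which corresponds to the sequence of year/frequency tuples mentioned above.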

Note: It seems that Google made a mistake in their n-gram encoding. We found that certain n-gram files also contain n-grams of length (n-k), and that these shortened n-grams generate duplicates. Make sure that such n-grams are filtered out when parsing the zip files.
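
One possible way to filter out such shortened n-grams is to count the tokens of each n-gram and compare the count with the order n expected for the file. The sketch below assumes that tokens are separated by spaces and that n is known for each file. NGramOrderFilter and hasOrder are hypothetical names and not part of this class.

```java
// Hypothetical helper, not part of GoogleBooks.
final class NGramOrderFilter {

    // Returns true if the n-gram consists of exactly n space-separated tokens.
    static boolean hasOrder(String ngram, int n) {
        String trimmed = ngram.trim();
        if (trimmed.isEmpty()) {
            return n == 0;
        }
        return trimmed.split(" +").length == n;
    }
}
```

Lines whose n-gram fails this check could be skipped before the record is handed to the Indexer.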

$Id:,v 1.1 2011/04/10 16:41:25 trenkman Exp $

Constructor Summary
GoogleBooks()

Method Summary
static void index( ngramDir, indexDir)
static void main(java.lang.String[] args)
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public GoogleBooks()

Method Detail


public static final void index( ngramDir,


public static void main(java.lang.String[] args)