de.aitools.aq.bighashmap.usage
Class GoogleBooks

java.lang.Object
  extended by de.aitools.aq.bighashmap.usage.GoogleBooks

public class GoogleBooks
extends java.lang.Object

A class that demonstrates how to build a BigHashMap directly from the zip-compressed CSV files of the Google Books N-Gram collection.

The original dataset tracks the frequency of each n-gram over several years, whereas here each n-gram is mapped to exactly one frequency value. To do so, the underlying code first accumulates the frequencies given for an n-gram and then writes this total together with the n-gram as one record.
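
The accumulation step can be illustrated with a minimal sketch. It is not the code of this class: the class name NgramFrequencySketch and the method sumFrequencies are hypothetical, and the sketch assumes that each input line has the tab-separated layout "ngram, year, match count, ..." with the match count in the third field. It also keeps all totals in memory, which is only practical for small inputs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of de.aitools.aq.bighashmap.
final class NgramFrequencySketch {

    /**
     * Sums the per-year counts of every n-gram.
     * Assumed line layout: ngram TAB year TAB matchCount [TAB further fields].
     */
    static Map<String, Long> sumFrequencies(Iterable<String> lines) {
        // LinkedHashMap keeps the n-grams in the order of first appearance.
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length < 3) {
                continue; // skip lines that do not match the assumed layout
            }
            String ngram = fields[0];
            long matchCount = Long.parseLong(fields[2]);
            // Add this year's count to the running total of the n-gram.
            totals.merge(ngram, matchCount, Long::sum);
        }
        return totals;
    }
}
```

Each entry of the returned map corresponds to one record in the sense used above: an n-gram paired with a single accumulated frequency.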

It turns out that this task can only be accomplished successfully if the following issues are handled properly.

Version:
$Id: GoogleBooks.java,v 1.4 2011/04/09 23:22:46 trenkman Exp $
Author:
martin.trenkmann@uni-weimar.de

Constructor Summary
GoogleBooks()
           
 
Method Summary
static void createRecordFiles(java.io.File srcDir, java.io.File desDir)
           
static void main(java.lang.String[] args)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GoogleBooks

public GoogleBooks()
Method Detail

createRecordFiles

public static final void createRecordFiles(java.io.File srcDir,
                                           java.io.File desDir)
                                    throws java.util.zip.ZipException,
                                           java.io.IOException
Throws:
java.util.zip.ZipException
java.io.IOException
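
Both exceptions typically arise when zip archives are opened and read. The following sketch shows one way to read the text lines of a zip-compressed file with the standard java.util.zip API. It is an illustration only, not the implementation of createRecordFiles: the class name ZipCsvReaderSketch and the method countLines are hypothetical, and UTF-8 is assumed as the text encoding.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of de.aitools.aq.bighashmap.
final class ZipCsvReaderSketch {

    /** Counts the text lines in all file entries of one zip archive. */
    static long countLines(File zip) throws IOException {
        long lines = 0;
        // Opening a malformed archive raises ZipException,
        // which is a subclass of IOException.
        try (ZipFile zipFile = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue; // only file entries carry data
                }
                // Decompress the entry on the fly and read it line by line.
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(zipFile.getInputStream(entry),
                                              StandardCharsets.UTF_8))) {
                    while (reader.readLine() != null) {
                        lines++;
                    }
                }
            }
        }
        return lines;
    }
}
```

Reading the entries as streams avoids unpacking the archives to disk first, which matches the stated goal of working directly on the compressed files.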

main

public static void main(java.lang.String[] args)