Static Public Member Functions
static final void | index (File ngramDir, File indexDir) throws ZipException, IOException
static void | main (String[] args)
A class that demonstrates how to build an inverted index directly from the zip-compressed CSV files of the Google Books N-Gram collection.
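Reading the zip-compressed files line by line can be sketched with the standard java.util.zip API. This is a minimal illustration, not the code in GoogleBooks.java: the class name ZipLineReader and the method forEachLine are hypothetical, and UTF-8 encoding of the entries is an assumption.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of the documented library.
final class ZipLineReader {

    /**
     * Streams every line of every file entry in the given zip archive to the
     * handler. ZipException is a subclass of IOException, so both are covered
     * by the throws clause.
     */
    static void forEachLine(File zipFile, Consumer<String> handler) throws IOException {
        try (ZipFile zip = new ZipFile(zipFile)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue; // only file entries carry n-gram lines
                }
                // Assumption: entries are UTF-8 encoded text.
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(zip.getInputStream(entry), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        handler.accept(line);
                    }
                }
            }
        }
    }
}
```

Passing a handler instead of returning a list keeps memory use flat, which matters because the n-gram files are large.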
Because the original dataset tracks the frequency of n-grams over several years, we want to map each n-gram to its sequence of year/frequency tuples. To do so, the underlying code extracts one n-gram/year/frequency triple from each line, stores these data in a record, and puts this record into an instance of Indexer.
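The per-line extraction might look like the following sketch. It assumes tab-separated lines whose first three fields are the n-gram, the year, and the match count; any further fields are ignored. The names NgramLineParser, Triple, parseLine, and group are hypothetical, and a plain in-memory map stands in for the Indexer, whose API is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the real code puts its records into an Indexer instance.
final class NgramLineParser {

    /** One n-gram/year/frequency triple taken from a single line. */
    static final class Triple {
        final String ngram;
        final int year;
        final long frequency;

        Triple(String ngram, int year, long frequency) {
            this.ngram = ngram;
            this.year = year;
            this.frequency = frequency;
        }
    }

    /** Parses a line, assuming the layout ngram TAB year TAB frequency [TAB ...]. */
    static Triple parseLine(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        return new Triple(fields[0],
                Integer.parseInt(fields[1].trim()),
                Long.parseLong(fields[2].trim()));
    }

    /** Maps each n-gram to its year/frequency tuples, ordered by year. */
    static Map<String, TreeMap<Integer, Long>> group(Iterable<String> lines) {
        Map<String, TreeMap<Integer, Long>> postings = new LinkedHashMap<>();
        for (String line : lines) {
            Triple t = parseLine(line);
            // In this sketch a repeated (n-gram, year) pair overwrites the earlier count.
            postings.computeIfAbsent(t.ngram, k -> new TreeMap<>()).put(t.year, t.frequency);
        }
        return postings;
    }
}
```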
Note: Google's n-gram encoding appears to contain an error. We found that certain n-gram files also contain n-grams of length n-k, that is, n-grams shorter than the nominal length n of the file, and that these shortened n-grams generate duplicates. Make sure that such n-grams are filtered out when parsing the zip files.
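One way to filter out the shortened n-grams is to count the tokens of each n-gram and keep only lines whose count equals the expected n. The sketch below assumes tokens are separated by whitespace and that the n-gram is the first tab-separated field of a line. NgramLengthFilter and its methods are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the length filter described in the note above.
final class NgramLengthFilter {

    /** Counts the whitespace-separated tokens of an n-gram. */
    static int tokenCount(String ngram) {
        String trimmed = ngram.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Returns the n-gram field of a line, i.e. everything before the first tab. */
    static String ngramOf(String line) {
        int tab = line.indexOf('\t');
        return tab < 0 ? line : line.substring(0, tab);
    }

    /** Keeps only the lines whose n-gram has exactly n tokens. */
    static List<String> keepExpectedLength(List<String> lines, int n) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (tokenCount(ngramOf(line)) == n) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

Here n would be the nominal n-gram length of the file being parsed, for example 3 for a 3-gram file.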
Definition at line 44 of file GoogleBooks.java.
static final void de::aitools::aq::invertedindex::usage::GoogleBooks::index (File ngramDir, File indexDir) throws ZipException, IOException [inline, static]
Definition at line 46 of file GoogleBooks.java.
References de::aitools::aq::invertedindex::core::Configuration::setIndexDirectory(), de::aitools::aq::invertedindex::core::Configuration::setKeySorting(), and de::aitools::aq::invertedindex::core::Configuration::setValueSorting().
static void de::aitools::aq::invertedindex::usage::GoogleBooks::main (String[] args) [inline, static]
Definition at line 89 of file GoogleBooks.java.