de.aitools.ie.stemming.analysis
Class LeastFrequentSequenceAnalysis

java.lang.Object
  extended by de.aitools.ie.stemming.analysis.LeastFrequentSequenceAnalysis

public class LeastFrequentSequenceAnalysis
extends java.lang.Object

This class provides methods to extract subsequences or skipsequences from texts or words. Here, subsequences are runs of consecutive characters of a given length; for example, the word "information" yields the subsequences of length 5 "infor", "nform", "forma", and so on. Skipsequences are subsequences with one character omitted, where the omitted character is marked by a dot; for example, "information" yields the skipsequences "i.for", "in.or", and so on. Neither the first nor the last character of a subsequence is omitted in a skipsequence.
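The following count may help when sizing the returned arrays. It assumes that every interior position of every subsequence produces exactly one skipsequence and that duplicates are kept. The documentation does not say whether duplicates are removed. For a word of length n and a requested length k with 2 ≤ k ≤ n:

```latex
N_{\text{sub}}(n, k) = n - k + 1, \qquad
N_{\text{skip}}(n, k) = (n - k + 1)(k - 2)

% "information": n = 11, k = 5
N_{\text{sub}}(11, 5) = 11 - 5 + 1 = 7, \qquad
N_{\text{skip}}(11, 5) = 7 \cdot 3 = 21
```

The value of 7 agrees with the seven subsequences listed for computeSubSequences("information", 5).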

Author:
apui9892

Constructor Summary
LeastFrequentSequenceAnalysis()
           
 
Method Summary
 java.lang.String[] computeSkipSequences(java.lang.String word, int length)
          Computes and returns the skipsequences of a given length from the given word.
 java.lang.String[] computeSubSequences(java.lang.String word, int length)
          Computes and returns the subsequences of a given length from the given word.
static void main(java.lang.String[] args)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LeastFrequentSequenceAnalysis

public LeastFrequentSequenceAnalysis()
Method Detail

computeSubSequences

public java.lang.String[] computeSubSequences(java.lang.String word,
                                              int length)
Computes and returns the subsequences of a given length from the given word. The return value is an array of subsequences ordered by their position in the original word. Every subsequence has exactly the length given by the length parameter. If word is shorter than length, an empty array is returned.

e.g. computeSubSequences("information", 5) returns
{"infor", "nform", "forma", "ormat", "rmati", "matio", "ation"}

Parameters:
word - The word to compute subsequences from
length - Desired length of the returned subsequences
Returns:
Array of subsequences
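Behaviour of this kind can be reproduced with a sliding window of width length over the word. The following is a minimal sketch and not the library's actual implementation. The class name SubSequenceSketch is hypothetical. Returning an empty array for length < 1 is also an assumption, because the documentation only specifies the case where word is shorter than length.

```java
import java.util.Arrays;

// Hypothetical re-implementation for illustration; not de.aitools code.
public class SubSequenceSketch {

    /** Returns every window of `length` consecutive characters, in order of position. */
    static String[] computeSubSequences(String word, int length) {
        // Assumption: non-positive lengths yield an empty array (undocumented in the original).
        if (length < 1 || word.length() < length) {
            return new String[0];
        }
        String[] result = new String[word.length() - length + 1];
        for (int start = 0; start < result.length; start++) {
            // Window [start, start + length) of the original word.
            result[start] = word.substring(start, start + length);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [infor, nform, forma, ormat, rmati, matio, ation]
        System.out.println(Arrays.toString(computeSubSequences("information", 5)));
    }
}
```

The output matches the documented example for computeSubSequences("information", 5).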

computeSkipSequences

public java.lang.String[] computeSkipSequences(java.lang.String word,
                                               int length)
Computes and returns the skipsequences of a given length from the given word. The return value is an array of skipsequences ordered by their position in the original word. Every skipsequence has exactly the length given by the length parameter, with the omitted character marked by a dot. If word is shorter than length, an empty array is returned.

e.g. computeSkipSequences("information", 5) returns
{"i.for", "in.or", "inf.r", "n.orm", "nf.rm", ...}

Parameters:
word - The word to compute skipsequences from
length - Desired length of the returned skipsequences
Returns:
Array of skipsequences
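One way to produce skipsequences is to take each subsequence in turn and replace each interior character with a dot. The following is a minimal sketch of that approach and is not the library's code. The class name SkipSequenceSketch is hypothetical. The dot placeholder is inferred from the documented example. The ordering is an assumption that agrees with the five documented elements: first by window start, then by omitted position. Duplicates are not removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical re-implementation for illustration; not de.aitools code.
public class SkipSequenceSketch {

    // Placeholder for the omitted character, as seen in the documented example ("i.for").
    static final char PLACEHOLDER = '.';

    static String[] computeSkipSequences(String word, int length) {
        // Assumption: non-positive lengths yield an empty array (undocumented in the original).
        if (length < 1 || word.length() < length) {
            return new String[0];
        }
        List<String> result = new ArrayList<>();
        for (int start = 0; start + length <= word.length(); start++) {
            String window = word.substring(start, start + length);
            // Only interior positions: the first (0) and last (length - 1) are never omitted.
            for (int skip = 1; skip < length - 1; skip++) {
                char[] chars = window.toCharArray();
                chars[skip] = PLACEHOLDER;
                result.add(new String(chars));
            }
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] skips = computeSkipSequences("information", 5);
        System.out.println(skips.length); // prints 21
        // Prints [i.for, in.or, inf.r, n.orm, nf.rm]
        System.out.println(Arrays.toString(Arrays.copyOf(skips, 5)));
    }
}
```

With this approach, lengths of 1 or 2 produce no skipsequences, because those windows have no interior characters.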

main

public static void main(java.lang.String[] args)
                 throws java.net.URISyntaxException,
                        java.io.IOException
Parameters:
args -
Throws:
java.io.IOException
java.net.URISyntaxException