Natural Language Processing Magdalena Wolska
Contents I. Introduction
Objectives
Related Fields 1. Linguistics
Literature Natural Language Processing:
Literature Top-tier natural language processing conferences:
Literature Other relevant natural language processing conferences:
Software Annotation Software:
Software Algorithm Collections:
Chapter NLP:I I. Introduction
Goals of Language Technology 1. Aid humans in writing.
Remarks:
Chapter NLP:I I. Introduction
Examples of NLP Systems Writing Aid: Spelling and Grammar Checking
Examples of NLP Systems Writing Aid: Spelling and Grammar Checking
Examples of NLP Systems Writing Aid: Spelling and Grammar Checking
Remarks:
Examples of NLP Systems Question Answering: IBM Watson at Jeopardy
Examples of NLP Systems Question Answering: IBM Watson at Jeopardy
Remarks:
Examples of NLP Systems Question Answering: IBM Watson at Jeopardy
Examples of NLP Systems Question Answering: IBM Watson at Jeopardy
Examples of NLP Systems Question Answering: IBM Watson at Jeopardy
Examples of NLP Systems Question Answering: IBM Watson at Jeopardy
Examples of NLP Systems Question Answering: Jeopardy Revisited
Examples of NLP Systems Question Answering: Jeopardy Revisited
Chapter NLP:I I. Introduction
NLP Problems State of Affairs: Mostly Solved
NLP Problems State of Affairs: Mostly Solved
NLP Problems State of Affairs: Making Good Progress
NLP Problems State of Affairs: Making Good Progress
NLP Problems State of Affairs: Making Good Progress
NLP Problems State of Affairs: Still Challenging
NLP Problems State of Affairs: Still Challenging
NLP Problems State of Affairs: Still Challenging
NLP Problems State of Affairs: Still Challenging
NLP Problems State of Affairs: Still Challenging
Remarks:
Chapter NLP:I I. Introduction
Challenges for NLP Systems Why is NLP hard?
Challenges for NLP Systems Why is NLP hard?
Challenges for NLP Systems Why is NLP hard?
Challenges for NLP Systems Why is NLP hard?
Challenges for NLP Systems Why is NLP hard?
Challenges for NLP Systems Why is NLP hard?
Challenges for NLP Systems Why is NLP hard?
Challenges for NLP Systems Why is NLP hard?
Challenges for NLP Systems Why is NLP hard?
Challenges for NLP Systems Why is NLP hard?
Challenges for NLP Systems Why is NLP hard?
Challenges for NLP Systems Why is NLP hard?
Chapter NLP:II II. Corpus Linguistics
Empirical Research 1. Quantitative research based on numbers and statistics.
Empirical Research 1. Quantitative research based on numbers and statistics.
Empirical Research 1. Quantitative research based on numbers and statistics.
Empirical Research Research Questions
Empirical Research Research Questions
Empirical Research Research Questions
Empirical Research Empirical Research in NLP
Empirical Research Empirical Research in NLP
Empirical Research Evaluation Measures
Empirical Research Effectiveness
Empirical Research Classification Effectiveness: Instance Types
Empirical Research Classification Effectiveness: Evaluation based on the Instance Types
Empirical Research Classification Effectiveness: Accuracy
Empirical Research Classification Effectiveness: Limitations of Accuracy
Empirical Research Classification Effectiveness: Precision and Recall
Empirical Research Classification Effectiveness: Precision and Recall Implications
Empirical Research Classification Effectiveness: Interplay between Precision and Recall
Empirical Research Classification Effectiveness: F1-Score
Empirical Research Classification Effectiveness: F1-Score Generalization
Empirical Research Classification Effectiveness: F1-Score Issue in Tasks with Boundary Detection
Empirical Research Classification Effectiveness: Other F1-Score Issues
Empirical Research Classification Effectiveness: Micro- and Macro-Averaging
Empirical Research Classification Effectiveness: Confusion Matrix for Micro- and Macro-Averaging
Empirical Research Classification Effectiveness: Computing Micro- and Macro-Averages
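To make the interplay of these measures concrete, here is a minimal, library-free sketch (the label sequences are invented): it derives per-class true positives, false positives, and false negatives from predicted and gold labels, computes F1 per class, and then micro- and macro-averages.

```python
from collections import Counter

def per_class_counts(y_true, y_pred, labels):
    """Count TP, FP, FN separately for each class (one-vs-rest view)."""
    counts = {c: Counter(tp=0, fp=0, fn=0) for c in labels}
    for t, p in zip(y_true, y_pred):
        for c in labels:
            if p == c and t == c:
                counts[c]["tp"] += 1
            elif p == c and t != c:
                counts[c]["fp"] += 1
            elif p != c and t == c:
                counts[c]["fn"] += 1
    return counts

def f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

y_true = ["a", "a", "b", "b", "c", "c"]   # invented toy labels
y_pred = ["a", "b", "b", "b", "c", "a"]
labels = ["a", "b", "c"]
counts = per_class_counts(y_true, y_pred, labels)

# Macro-average: compute F1 per class, then average the class scores.
macro_f1 = sum(f1(**counts[c]) for c in labels) / len(labels)

# Micro-average: pool all TP/FP/FN counts first, then compute F1 once.
pooled = Counter()
for c in labels:
    pooled.update(counts[c])
micro_f1 = f1(**pooled)

print(round(macro_f1, 3), round(micro_f1, 3))
```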
Empirical Research Regression Effectiveness
Empirical Research Regression Effectiveness: Types of Regression Errors
Empirical Research Regression Effectiveness: Computation
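As a companion sketch for regression, the snippet below computes common regression error measures; it assumes mean absolute error, mean squared error, and RMSE are among the measures meant here, and the gold and predicted values are invented.

```python
# Invented gold values and model predictions for a regression task.
gold = [3.0, 1.5, 4.0, 2.0]
pred = [2.5, 1.0, 5.0, 2.0]

errors = [p - g for p, g in zip(pred, gold)]
mae = sum(abs(e) for e in errors) / len(errors)    # mean absolute error
mse = sum(e * e for e in errors) / len(errors)     # mean squared error (penalizes large errors more)
rmse = mse ** 0.5                                  # root mean squared error (same unit as the target)
print(mae, mse, rmse)
```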
Empirical Research Other Measures
Empirical Research Experiments
Empirical Research Datasets
Empirical Research Types of Evaluation: Training, Validation, and Test Set
Empirical Research Types of Evaluation: Cross-Validation
Empirical Research Types of Evaluation: Variations
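A minimal sketch of k-fold cross-validation without any libraries; the "classifier" is just a majority-label baseline standing in for a real model, and the data is invented. The point is only to show how every instance is used once for testing and k−1 times for training.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def majority_baseline(train_labels):
    """Stand-in 'model': always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]

labels = ["pos", "neg", "pos", "pos", "neg", "pos", "neg", "pos"]  # toy data
k = 4
accuracies = []
for fold in k_fold_indices(len(labels), k):
    test = set(fold)
    train_labels = [labels[i] for i in range(len(labels)) if i not in test]
    prediction = majority_baseline(train_labels)
    correct = sum(1 for i in test if labels[i] == prediction)
    accuracies.append(correct / len(test))

print(sum(accuracies) / k)   # average accuracy over the k held-out folds
```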
Empirical Research Training Data
Empirical Research Comparison
Empirical Research Comparison: Upper and Lower Bounds
Empirical Research Comparison: Types of Baselines
Empirical Research Comparison: Exemplary Baselines
Empirical Research Comparison: Implications
Chapter NLP:II II. Corpus Linguistics
Hypothesis Testing Statistics
Hypothesis Testing Statistics: Variables and Scales
Hypothesis Testing Descriptive Statistics
Hypothesis Testing Descriptive Statistics: Central Tendency and its Dispersion
Hypothesis Testing Descriptive Statistics: Normal Distribution
Hypothesis Testing Descriptive Statistics: Standard Scores
Hypothesis Testing Inferential Statistics
Hypothesis Testing Inferential Statistics: Hypotheses
Hypothesis Testing Four Steps of Hypothesis Testing
Hypothesis Testing Effect Size
Hypothesis Testing What Test to Choose
Hypothesis Testing Assumptions
Hypothesis Testing The Student’s t-Test
Hypothesis Testing One-Sample t-Test
Hypothesis Testing Dependent t-Test (aka paired-sample test)
Hypothesis Testing Independent t-Test
Hypothesis Testing The Student’s t-Test: What to do with the t-Score?
Hypothesis Testing Example: One-Tailed One-Sample t-Test
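A one-tailed one-sample t-test computed by hand (the sample values and the hypothesized mean are invented). In practice the critical value is looked up in a t-table or obtained from a statistics library; the value quoted in the comment is the standard table entry for df = 7 at α = 0.05.

```python
import math

sample = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.2, 5.0]   # invented measurements
mu0 = 5.0                                           # hypothesized population mean

n = len(sample)
mean = sum(sample) / n
# Sample standard deviation with Bessel's correction (n - 1).
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
# t = (sample mean - hypothesized mean) / standard error
t = (mean - mu0) / (sd / math.sqrt(n))
df = n - 1
print(f"t = {t:.3f} with {df} degrees of freedom")
# One-tailed test at alpha = 0.05 and df = 7: the critical value from a
# t-table is about 1.895; reject H0 if t exceeds it.
```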
Chapter NLP:II II. Corpus Linguistics
Text Corpora Corpus Linguistics
Text Corpora Corpus Linguistics
Text Corpora Definition 1 (Text Corpus [Butler 2004])
Text Corpora Text as Data
Text Corpora Text as Data
Text Corpora Metadata
Text Corpora Research in Language Use
Text Corpora Research in Language Use
Text Corpora Vocabulary Growth: Heaps’ Law
Text Corpora Vocabulary Growth: Heaps’ Law
Text Corpora Term Frequency: Zipf’s Law
Text Corpora Term Frequency: Zipf’s Law
Text Corpora Term Frequency: Zipf’s Law
Text Corpora Term Frequency: Zipf’s Law
Remarks:
Text Corpora Term Frequency: Zipf’s Law
Text Corpora Term Frequency: Zipf’s Law
Remarks:
Text Corpora Term Frequency: Zipf’s Law
Text Corpora Term Frequency: Zipf’s Law
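Zipf's law can be eyeballed by ranking terms by frequency and checking that frequency × rank stays roughly constant. A minimal sketch; the short string only stands in for a corpus, since the effect becomes visible only on corpora of realistic size.

```python
import re
from collections import Counter

text = "the cat sat on the mat and the dog sat on the cat"   # stand-in for a corpus
tokens = re.findall(r"[a-z]+", text.lower())
freq = Counter(tokens)

# Zipf's law: frequency * rank is roughly constant; on a real corpus the
# product stabilizes, on this tiny example it only hints at the trend.
for rank, (term, f) in enumerate(freq.most_common(), start=1):
    print(f"{rank:>2}  {term:<5}  f={f}  f*rank={f * rank}")
```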
Text Corpora n-grams
Text Corpora n-grams
Text Corpora n-grams
Text Corpora n-gram Corpora
Text Corpora n-gram Corpora
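Extracting word n-grams and their counts is a simple sliding-window construction; a short sketch on an invented token sequence:

```python
from collections import Counter

def ngrams(tokens, n):
    """Slide a window of length n over the token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "to be or not to be".split()
print(Counter(ngrams(tokens, 2)).most_common(3))   # most frequent bigrams
print(Counter(ngrams(tokens, 3)).most_common(3))   # most frequent trigrams
```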
Chapter NLP:II II. Corpus Linguistics
Data Acquisition Data Sources
Data Acquisition Newspapers
Data Acquisition Blogs and Forums
Data Acquisition Social Networks
Data Acquisition Other Sources
Chapter NLP:III III. Text Models
Text Preprocessing Overview
Text Preprocessing Overview
Text Preprocessing Overview
Text Preprocessing Overview
Text Preprocessing Preprocessing Pipeline
Remarks: Annotation is skipped when the annotations are not needed for further processing.
Text Preprocessing Token Normalization
Text Preprocessing Token Normalization: Regular Expressions
Text Preprocessing Token Normalization: Regular Expressions
Text Preprocessing Token Normalization: Regular Expressions
Text Preprocessing Token Normalization: Regular Expressions
Text Preprocessing Token Normalization: Regular Expressions
Text Preprocessing Token Normalization: Regular Expressions
Text Preprocessing Token Normalization: Regular Expressions
Text Preprocessing Token Normalization: Regular Expressions
Text Preprocessing Token Normalization: Regular Expressions
Text Preprocessing Token Normalization: Regular Expressions
Text Preprocessing Token Normalization: Regular Expressions Summary
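A few illustrative normalization steps implemented with Python's re module; the concrete rules (case folding, unifying quotes, collapsing numbers, stripping trailing punctuation) are examples chosen here, not necessarily the ones used in the lecture.

```python
import re

def normalize(token):
    """Apply a few example normalization rules to a single token."""
    token = token.lower()                             # case folding
    token = re.sub(r"[“”]", '"', token)               # unify curly and straight quotes
    token = re.sub(r"\d+([.,]\d+)?", "<num>", token)  # collapse numbers into one symbol
    token = re.sub(r"[^\w<>\"]+$", "", token)         # strip trailing punctuation
    return token

print([normalize(t) for t in ["The", "price:", "3,50", "“Euros”"]])
# -> ['the', 'price', '<num>', '"euros"']
```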
Text Preprocessing Tokenization
Text Preprocessing Tokenization: Special Cases
Remarks:
Text Preprocessing Tokenization: Approaches
Text Preprocessing Tokenization: Rule-based [Jurafsky and Martin, 2007] [Grefenstette, 1999]
Text Preprocessing Tokenization: Rule-based [Jurafsky and Martin, 2007] [Grefenstette, 1999]
Remarks:
Text Preprocessing Problems of Rule-based Tokenization
Text Preprocessing Tokenization: Byte-Pair Encoding
Text Preprocessing Tokenization: Byte-Pair Encoding
Text Preprocessing Tokenization: Byte-Pair Encoding
Text Preprocessing Tokenization: Byte-Pair Encoding
Text Preprocessing Tokenization: Byte-Pair Encoding
Text Preprocessing Tokenization: Byte-Pair Encoding Rule Finding
Text Preprocessing Tokenization: Byte-Pair Encoding Rule Finding
Text Preprocessing Tokenization: Byte-Pair Encoding Rule Finding
Text Preprocessing Tokenization: Byte-Pair Encoding Rule Finding
Text Preprocessing Tokenization: Byte-Pair Encoding Rule Finding
Text Preprocessing Tokenization: Byte-Pair Encoding Rule Finding
Text Preprocessing Tokenization: Byte-Pair Encoding Rule Finding
Text Preprocessing Tokenization: Byte-Pair Encoding Rule Finding
Remarks:
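The merge-rule search of byte-pair encoding can be sketched compactly: start from single characters, repeatedly count adjacent symbol pairs weighted by word frequency, and merge the most frequent pair. The toy vocabulary below mirrors the example commonly used to explain BPE; the code is not tied to any particular implementation.

```python
from collections import Counter

# Toy vocabulary: word split into symbols -> corpus frequency.
vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2,
         ("n", "e", "w", "e", "s", "t"): 6, ("w", "i", "d", "e", "s", "t"): 3}

def most_frequent_pair(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge(vocab, pair):
    """Replace every occurrence of the pair by one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

rules = []
for _ in range(5):                      # learn 5 merge rules
    pair = most_frequent_pair(vocab)
    rules.append(pair)
    vocab = merge(vocab, pair)
print(rules)
```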
Text Preprocessing Tokenization: Token Removal
Text Preprocessing Tokenization: Token Removal
Text Preprocessing Tokenization: Token Removal
Chapter NLP:III III. Text Models
Text Representation Models of Representation
Text Representation Token Representations
Text Representation Document Representation
Text Representation Document Representation: Bag of Words Metaphor
Text Representation Document Representation: Vector Space Model [Salton et al. 1975]
Text Representation Document Representation: Vector Space Model [Salton et al. 1975]
Remarks: DTMs can become very large and very sparse (approx. 95% of elements are zero).
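A minimal bag-of-words document-term matrix built by hand over three invented documents; each row is one document's count vector over the shared vocabulary (real matrices are far larger and, as noted above, very sparse).

```python
from collections import Counter

docs = ["the cat sat on the mat",
        "the dog chased the cat",
        "dogs and cats"]

tokenized = [d.lower().split() for d in docs]
vocabulary = sorted(set(t for doc in tokenized for t in doc))

# Document-term matrix: one count vector per document.
dtm = [[Counter(doc)[term] for term in vocabulary] for doc in tokenized]

print(vocabulary)
for row in dtm:
    print(row)
```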
Text Representation Term Weighting: tf·idf
Text Representation Term Weighting: tf·idf
Text Representation Term Weighting: tf·idf
Text Representation Term Weighting: tf·idf
Text Representation Term Weighting: tf·idf
Text Representation Term Weighting: tf·idf Example
Text Representation Term Weighting: tf·idf Example
Text Representation Term Weighting: tf·idf Example
Text Representation Term Weighting: tf·idf Example
Remarks:
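A worked tf·idf sketch on toy documents using the common tf · log(N/df) weighting; tf·idf variants differ across textbooks, so the exact formula should be read as one of several options. Note how the ubiquitous term "the" receives weight 0.

```python
import math
from collections import Counter

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "and", "the", "dog"]]
N = len(docs)

# Document frequency: in how many documents does each term occur?
df = Counter(term for doc in docs for term in set(doc))

def tfidf(doc):
    """Weight each term by term frequency times inverse document frequency."""
    tf = Counter(doc)
    return {term: tf[term] * math.log(N / df[term]) for term in tf}

for doc in docs:
    print(tfidf(doc))   # "the" occurs in every document, so its weight is 0
```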
Text Representation Vocabulary Pruning
Text Representation Distributional Representations of Words
Text Representation Distributional Representations of Words
Text Representation Word2Vec
Text Representation Sentence Embeddings
Text Representation Sentence Embeddings
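One simple baseline for a sentence embedding, often used as a point of comparison for learned sentence encoders, is to average the word vectors of the tokens; the three-dimensional vectors below are invented stand-ins for vectors from a trained model such as Word2Vec.

```python
# Invented toy word vectors; in practice these come from a trained model.
word_vectors = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "sleep": [0.1, 0.9, 0.3],
}

def sentence_embedding(tokens):
    """Baseline sentence embedding: component-wise average of known word vectors."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    dim = len(next(iter(word_vectors.values())))
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

print(sentence_embedding(["cats", "sleep"]))
```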
Chapter NLP:III III. Text Models
Text Similarity Text can be similar in different ways:
Text Similarity Text can be similar in different ways:
Text Similarity Similarity Measures
Text Similarity String-based Similarity: Hamming Distance
Text Similarity String-based Similarity: Levenshtein Distance
Text Similarity String-based Similarity
Remarks:
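The Levenshtein distance via the standard dynamic-programming recurrence, with unit costs for insertion, deletion, and substitution:

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions to turn a into b."""
    # previous[j] holds the distance between the current prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                 # deletion
                               current[j - 1] + 1,              # insertion
                               previous[j - 1] + (ca != cb)))   # substitution
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))   # 3
```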
Text Similarity Resource-based Similarity: Thesaurus relations
Text Similarity Resource-based Similarity
Text Similarity Vector Distance
Text Similarity Vector Distance
Text Similarity Vector Similarity: Cosine Similarity
Text Similarity Vector Similarity: Cosine Similarity
Text Similarity Vector Similarity: Jaccard Similarity
Text Similarity Vector Similarity: Divergence
Text Similarity Vector Similarity
Remarks: Count vectors can be transformed into probability distributions (cf. Probability Mass Function).
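Cosine and Jaccard similarity computed on two toy count vectors over a small shared vocabulary; Jaccard is taken here over the sets of terms with non-zero counts.

```python
import math

vocab = ["cat", "dog", "mat", "sat"]
u = [2, 0, 1, 1]   # toy count vectors over vocab
v = [1, 1, 0, 1]

# Cosine similarity: dot product normalized by the vector lengths.
dot = sum(a * b for a, b in zip(u, v))
cosine = dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Jaccard similarity over the sets of terms that occur in each vector.
u_terms = {t for t, a in zip(vocab, u) if a > 0}
v_terms = {t for t, b in zip(vocab, v) if b > 0}
jaccard = len(u_terms & v_terms) / len(u_terms | v_terms)

print(round(cosine, 3), round(jaccard, 3))
```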
Text Similarity Word Vector Similarity: Sentence Embeddings
Text Similarity Word Vector Similarity: Sentence Embeddings
Text Similarity Word Vector Similarity: Word Mover Distance
Text Similarity Word Vector Similarity: Word Mover Distance
Text Similarity Word Vector Similarity: Word Mover Distance
Text Similarity Word Vector Similarity
Chapter NLP:III III. Text Models
Text Classification Text Classification Problems
Text Classification Text Classification Problems
Text Classification Text Classification Problems
Text Classification Text Classification Problems
Text Classification Text Classification Problems
Text Classification Text Classification Problems
Text Classification Classification Tasks
Text Classification Classification Tasks
Text Classification Classification Tasks
Remarks: Classification and Regression in NLP
Text Classification Classification Tasks: Classes C
Text Classification Classification Tasks: Objects O
Remarks: Many (non-neural) classification algorithms work for |C| = 2 classes only. Multi-class and multi-label problems are then typically handled by reducing them to several binary problems (e.g., one-vs-rest).
Text Classification Feature space X
Text Classification Feature Engineering
Text Classification Content Features
Text Classification Linguistic Structure Features
Text Classification Task-specific features (a selection)
Text Classification Feature Engineering
Text Classification Representation Learning
Text Classification Representation Learning
Text Classification Feature Space Size
Text Classification Feature Space Size
Text Classification Token Classification
Text Classification Token Classification
Text Classification Token Classification
Text Classification Token Classification
Remarks:
Text Classification Common Classification Algorithms
Text Classification Evaluation
Text Classification Dataset Preparation
Text Classification Dataset Preparation: Negative Instances
Text Classification Dataset Preparation: Negative Instances
Text Classification Dataset Preparation: Mapping of Target Variable Values
Text Classification Dataset Preparation: Balancing Datasets
Text Classification Dataset Preparation: Balancing Datasets
Text Classification Dataset Preparation: Balancing Datasets
Text Classification Dataset Preparation: Balancing Datasets
Text Classification Dataset Preparation: Undersampling vs. Oversampling
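A sketch of random undersampling and oversampling on an invented, imbalanced set of (document, label) pairs; real pipelines often rely on library support (e.g., imbalanced-learn), but the core idea fits in a few lines.

```python
import random
from collections import Counter

random.seed(0)
# Toy imbalanced dataset: (document, label) pairs with 8 neg and 2 pos instances.
data = [("doc%d" % i, "neg") for i in range(8)] + [("doc%d" % i, "pos") for i in range(8, 10)]

by_class = {}
for instance in data:
    by_class.setdefault(instance[1], []).append(instance)
sizes = {c: len(xs) for c, xs in by_class.items()}

# Undersampling: shrink every class to the size of the smallest one.
n_min = min(sizes.values())
undersampled = [x for xs in by_class.values() for x in random.sample(xs, n_min)]

# Oversampling: duplicate instances of small classes up to the size of the largest one.
n_max = max(sizes.values())
oversampled = [x for xs in by_class.values() for x in xs + random.choices(xs, k=n_max - len(xs))]

print(Counter(label for _, label in undersampled))
print(Counter(label for _, label in oversampled))
```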
Text Classification Case Study: Review Sentiment Analysis
Text Classification Review Sentiment Analysis: Review Argumentation
Text Classification Review Sentiment Analysis: Evaluation of Standard Features
Text Classification Review Sentiment Analysis: Specific Feature Types
Text Classification Review Sentiment Analysis: Specific Feature Types
Text Classification Review Sentiment Analysis: Evaluation of the Specific Feature Types
Text Classification Review Sentiment Analysis: Results and Discussion for the Specific Features
Text Classification Review Sentiment Analysis: Impact of Preprocessing
Chapter NLP:III III. Text Models
Language Modeling Definition 1 (Language Model)
Remarks: Modeling P(wn | w1, ..., wn−1) is sometimes called autoregressive or causal language modeling.
Language Modeling Implications from the Definition
Language Modeling Implications from the Definition
Language Modeling Language Model Estimation
Language Modeling Language Model Estimation
Language Modeling Language Model Estimation
Language Modeling Language Model Estimation
Remarks: The condition for Bayes’ theorem is that the predicted events are mutually exclusive. Hence,
Language Modeling Bigram Model
Language Modeling Bigram Model: Example
Language Modeling Bigram Model: Example
Language Modeling Bigram Model: Example
Language Modeling Bigram Model: Example
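A bigram model estimated by maximum likelihood from a three-sentence toy corpus; P(w | wprev) is the bigram count divided by the count of the preceding word, with <s> and </s> marking sentence boundaries as is common.

```python
from collections import Counter

corpus = [["<s>", "i", "like", "cats", "</s>"],
          ["<s>", "i", "like", "dogs", "</s>"],
          ["<s>", "cats", "like", "cats", "</s>"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))

def p(word, prev):
    """Maximum-likelihood bigram probability P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p("like", "i"))      # 2/2 = 1.0
print(p("cats", "like"))   # 2/3
```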
Language Modeling The N-gram Model
Language Modeling Improving the N-gram Model: Smoothing
Language Modeling Denominator Smoothing: Stupid Backoff
Language Modeling Denominator Smoothing: Linear Interpolation
Language Modeling Denominator Smoothing: Linear Interpolation
Language Modeling Numerator Smoothing: Add-one (Laplace) smoothing
Language Modeling Numerator Smoothing: Add-one (Laplace) smoothing
Language Modeling Numerator Smoothing: Add-one (Laplace) smoothing
Language Modeling Numerator Smoothing: Good-Turing smoothing
Language Modeling Numerator Smoothing: Kneser-Ney Smoothing
Language Modeling Numerator Smoothing: Kneser-Ney Smoothing
Language Modeling Numerator Smoothing: Kneser-Ney Smoothing
Language Modeling Evaluation: Perplexity
Remarks: Minimizing perplexity is equivalent to maximizing the probability of the test data.
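Perplexity of a sequence of N tokens is the inverse of its probability under the model, normalized by length: PP = P(w1, ..., wN)^(−1/N). A sketch using invented per-token probabilities, computed in log space:

```python
import math

# Invented per-token probabilities assigned by some language model
# to a test sequence of N = 5 tokens.
token_probs = [0.2, 0.1, 0.25, 0.05, 0.3]

N = len(token_probs)
# Work in log space to avoid underflow on long sequences.
log_prob = sum(math.log(p) for p in token_probs)
perplexity = math.exp(-log_prob / N)
print(perplexity)   # lower is better; equals the N-th root of 1 / P(sequence)
```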
Language Modeling Neural Language Models
Language Modeling Neural Language Models: Feed Forward
Language Modeling Neural Language Models: RNN
Remarks: RNN Notation: t indicates the timestep. W and U are the same at every t.
Language Modeling Conditional Language Modeling
Remarks: Conditional language models are the basis for large language models (LLMs).
Chapter NLP:IV IV. Words
Morphology Overview [Hancox 1996]
Morphology Overview [Hancox 1996]
Morphology Overview [Hancox 1996]
Morphology Overview [Hancox 1996]
Morphology Stemming
Morphology Stemming: Principles [Frakes 1992]
Morphology Stemming: Affix Elimination
Morphology Stemming: Porter Stemmer
Morphology Stemming: Porter Stemmer
Remarks:
Morphology Stemming: Porter Stemmer
Morphology Stemming: Porter Stemmer
Morphology Stemming: Porter Stemmer
Morphology Stemming: Porter Stemmer
Morphology Stemming: Porter Stemmer
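If NLTK is installed, its implementation of the Porter stemmer can be tried directly on a few words (the word list is illustrative):

```python
from nltk.stem import PorterStemmer   # assumes NLTK is installed

stemmer = PorterStemmer()
words = ["caresses", "ponies", "relational", "hopping", "generalization"]
print([stemmer.stem(w) for w in words])
# e.g. "caresses" -> "caress", "ponies" -> "poni", "relational" -> "relat"
```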
Morphology Stemming: Krovetz Stemmer
Morphology Stemming: Stemmer Comparison
Morphology Stemming: Stemmer Comparison
Morphology Stemming: Stemmer Comparison
Morphology Stemming: Character n-grams [McNamee et al. 2004] [McNamee et al. 2008]
Morphology Stemming: Character n-grams [McNamee et al. 2004] [McNamee et al. 2008]
Morphology Lemmatization
Chapter NLP:IV IV. Words
Word Classes Definition
Word Classes Traditional grammar
Word Classes Traditional grammar: Example
Remarks:
Remarks:
Word Classes Tagsets
Word Classes Penn Treebank tagset [upenn]
Word Classes Penn Treebank tagset [upenn]
Word Classes Penn Treebank tagset [upenn]
Word Classes Penn Treebank tagset [upenn]
Word Classes Penn Treebank tagset [upenn]
Word Classes Universal Dependencies tagset [UD]
Word Classes Universal Dependencies tagset [UD]
Word Classes Ambiguities
Remarks:
Word Classes Part-of-Speech Tagging
Word Classes Part-of-Speech Tagging
Word Classes Part-of-Speech Tagging: Maximum Likelihood Estimate
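The maximum-likelihood baseline for tagging fits in a few lines: count, per word, how often each tag was seen in training, and always predict the most frequent one, backing off to a default tag for unknown words. The toy training data is invented; ties fall to the first-seen tag.

```python
from collections import Counter, defaultdict

# Toy training data: (word, tag) pairs; "saw" is deliberately ambiguous.
training = [("the", "DT"), ("dog", "NN"), ("saw", "VBD"),
            ("the", "DT"), ("saw", "NN"), ("rusts", "VBZ")]

counts = defaultdict(Counter)
for word, tag in training:
    counts[word][tag] += 1

def mle_tag(word, default="NN"):
    """Pick the tag most often seen with this word; back off to a default tag."""
    return counts[word].most_common(1)[0][0] if word in counts else default

print([mle_tag(w) for w in ["the", "dog", "saw", "unseen"]])
```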
Word Classes Part-of-Speech Tagging: Brill Tagger [Brill 1992]
Word Classes Part-of-Speech Tagging: Brill Tagger [Brill 1992]
Word Classes Part-of-Speech Tagging: Brill Tagger [Brill 1994]
Word Classes Part-of-Speech Tagging: Brill Tagger [Brill 1994]
Word Classes Part-of-Speech Tagging: Token Classification
Word Classes Part-of-Speech Tagging
Remarks:
Chapter NLP:IV IV. Words
Named Entities Entities
Named Entities Named Entities
Named Entities Named Entities
Remarks: Named entity tagsets vary by corpus and use case.
Named Entities Named Entity Recognition
Named Entities BIO Tagging
Named Entities BIO Tagging
Remarks: Two popular variations of BIO are IO and BIOES.
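A sketch that reads a BIO-tagged token sequence and extracts the labeled entity spans, using the usual B-TYPE / I-TYPE / O convention; the example sentence is invented.

```python
def bio_to_spans(tokens, tags):
    """Collect (entity_type, token_list) spans from a BIO tag sequence."""
    spans, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = (tag[2:], [token])           # start a new entity
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(token)               # continue the current entity
        else:                                      # "O" or an inconsistent I- tag
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return spans

tokens = ["Ada", "Lovelace", "was", "born", "in", "London", "."]
tags   = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))   # [('PER', ['Ada', 'Lovelace']), ('LOC', ['London'])]
```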