Sampler
- class lightning_ir.data.dataset.Sampler
Bases: object
Helper class for sampling subsets of documents from a ranked list.
Methods
__init__()
log_random(documents, sample_size) – Sampling strategy to randomly sample documents with a higher probability to sample documents from the top of the ranking.
random(documents, sample_size) – Sampling strategy to randomly sample sample_size documents.
sample(df, sample_size, sampling_strategy) – Samples a subset of documents from a ranked list given a sampling_strategy.
single_relevant(documents, sample_size) – Sampling strategy to randomly sample a single relevant document.
top(documents, sample_size) – Sampling strategy to select the top sample_size documents of the ranking.
top_and_random(documents, sample_size) – Sampling strategy to take half of the sample_size documents from the top of the ranking and sample the other half randomly.
- static log_random(documents: DataFrame, sample_size: int) -> DataFrame
Sampling strategy to randomly sample documents with a higher probability to sample documents from the top of the ranking.
- Parameters:
documents (pd.DataFrame) – Ranked list of documents
sample_size (int) – Number of documents to sample
- Returns:
Sampled documents
- Return type:
pd.DataFrame
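The following is a minimal sketch of rank-biased sampling in plain pandas, not necessarily how lightning_ir implements log_random. It assumes a 1-based `rank` column and a weight of 1 / log(1 + rank); the column name, the weighting formula, the `seed` argument, and the function name `log_random_sketch` are all illustrative assumptions.

```python
import numpy as np
import pandas as pd


def log_random_sketch(documents: pd.DataFrame, sample_size: int, seed: int = 0) -> pd.DataFrame:
    """Sample rows without replacement, favoring small (better) ranks."""
    # Assumption: "rank" is a 1-based integer column; the weight shrinks
    # logarithmically, so rank 1 is most likely but deep ranks stay reachable.
    weights = 1.0 / np.log1p(documents["rank"])
    # pandas normalizes the weights to sum to 1 internally.
    return documents.sample(n=sample_size, weights=weights, random_state=seed)


# Toy ranked list with 100 documents (hypothetical data).
ranking = pd.DataFrame(
    {"doc_id": [f"d{i}" for i in range(1, 101)], "rank": range(1, 101)}
)
sampled = log_random_sketch(ranking, sample_size=10)
```

Under these assumptions the top-ranked document carries a weight of roughly 1.44 and the 100th roughly 0.22, so the head of the ranking is favored without excluding the tail.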
- static random(documents: DataFrame, sample_size: int) -> DataFrame
Sampling strategy to randomly sample sample_size documents.
- Parameters:
documents (pd.DataFrame) – Ranked list of documents
sample_size (int) – Number of documents to sample
- Returns:
Sampled documents
- Return type:
pd.DataFrame
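Uniform sampling of this kind can be illustrated with pandas alone. The sketch below uses hypothetical toy data and a fixed `random_state` for repeatability; it shows the general technique rather than the library's own code.

```python
import pandas as pd

# Toy ranked list (hypothetical data).
ranking = pd.DataFrame(
    {"doc_id": ["d1", "d2", "d3", "d4", "d5"], "rank": [1, 2, 3, 4, 5]}
)

# Uniform sampling without replacement; random_state only makes the sketch repeatable.
sampled = ranking.sample(n=3, random_state=42)
```

In plain pandas, requesting more rows than the frame holds raises a ValueError unless `replace=True` is passed. Whether Sampler.random guards against that case is not stated here.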
- static sample(df: DataFrame, sample_size: int, sampling_strategy: 'single_relevant' | 'top' | 'random' | 'log_random' | 'top_and_random') -> DataFrame
Samples a subset of documents from a ranked list given a sampling_strategy.
- Parameters:
df (pd.DataFrame) – Ranked list of documents
sample_size (int) – Number of documents to sample
sampling_strategy (str) – Name of the sampling strategy to apply: 'single_relevant', 'top', 'random', 'log_random', or 'top_and_random'
- Returns:
Sampled documents
- Return type:
pd.DataFrame
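Selecting a strategy by name can be pictured as a lookup from string to function. The sketch below is a hypothetical stand-in, not the library's dispatch code: `sample_sketch`, `STRATEGIES`, and the two helper functions are illustrative names, and the `top` helper assumes that strategy keeps the leading rows of the ranking.

```python
from typing import Callable, Dict

import pandas as pd


def take_top(documents: pd.DataFrame, sample_size: int) -> pd.DataFrame:
    # Assumption: "top" keeps the leading rows of an already sorted ranking.
    return documents.head(sample_size)


def take_random(documents: pd.DataFrame, sample_size: int) -> pd.DataFrame:
    # Fixed seed only so the sketch is repeatable.
    return documents.sample(n=sample_size, random_state=0)


# Hypothetical registry mapping strategy names to stand-in implementations.
STRATEGIES: Dict[str, Callable[[pd.DataFrame, int], pd.DataFrame]] = {
    "top": take_top,
    "random": take_random,
}


def sample_sketch(df: pd.DataFrame, sample_size: int, sampling_strategy: str) -> pd.DataFrame:
    """Look up the strategy by name and apply it to the ranked list."""
    try:
        strategy = STRATEGIES[sampling_strategy]
    except KeyError:
        raise ValueError(f"Unknown sampling strategy: {sampling_strategy!r}") from None
    return strategy(df, sample_size)


df = pd.DataFrame({"doc_id": list("abcde"), "rank": [1, 2, 3, 4, 5]})
top_two = sample_sketch(df, 2, "top")
```

Rejecting unknown names with an explicit error keeps a misspelled strategy from silently falling back to a default.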
- static single_relevant(documents: DataFrame, sample_size: int) -> DataFrame
Sampling strategy to randomly sample a single relevant document. The remaining sample_size - 1 documents are non-relevant.
- Parameters:
documents (pd.DataFrame) – Ranked list of documents
sample_size (int) – Number of documents to sample
- Returns:
Sampled documents
- Return type:
pd.DataFrame
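A sketch of the one-relevant-plus-negatives idea, under the assumption that relevance labels live in a numeric `relevance` column where values above 0 mark relevant documents. The column name, the `seed` argument, and the function name are hypothetical; the library may store and read relevance differently.

```python
import pandas as pd


def single_relevant_sketch(documents: pd.DataFrame, sample_size: int, seed: int = 0) -> pd.DataFrame:
    """Return one relevant row followed by up to sample_size - 1 non-relevant rows."""
    # Assumption: a numeric "relevance" column; values > 0 mark relevant documents.
    is_relevant = documents["relevance"] > 0
    # Raises ValueError if the ranking contains no relevant document.
    relevant = documents[is_relevant].sample(n=1, random_state=seed)
    pool = documents[~is_relevant]
    # Cap the request so a short ranking does not raise inside DataFrame.sample.
    n_non_relevant = min(sample_size - 1, len(pool))
    non_relevant = pool.sample(n=n_non_relevant, random_state=seed)
    return pd.concat([relevant, non_relevant])


# Toy ranked list with two relevant documents (hypothetical data).
ranking = pd.DataFrame(
    {
        "doc_id": ["d1", "d2", "d3", "d4", "d5", "d6"],
        "rank": [1, 2, 3, 4, 5, 6],
        "relevance": [0, 2, 0, 1, 0, 0],
    }
)
sampled = single_relevant_sketch(ranking, sample_size=3)
```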
- static top(documents: DataFrame, sample_size: int) -> DataFrame
Sampling strategy to select the top sample_size documents of the ranking.
- Parameters:
documents (pd.DataFrame) – Ranked list of documents
sample_size (int) – Number of documents to sample
- Returns:
Sampled documents
- Return type:
pd.DataFrame
- static top_and_random(documents: DataFrame, sample_size: int) -> DataFrame
Sampling strategy to take half of the sample_size documents from the top of the ranking and sample the other half randomly.
- Parameters:
documents (pd.DataFrame) – Ranked list of documents
sample_size (int) – Number of documents to sample
- Returns:
Sampled documents
- Return type:
pd.DataFrame
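The half-top, half-random split can be sketched in pandas as follows. This is an illustration under stated assumptions, not the library's code: the rows are assumed to be sorted by rank already, an odd sample_size gives the extra slot to the random half, and the function name and `seed` argument are hypothetical.

```python
import pandas as pd


def top_and_random_sketch(documents: pd.DataFrame, sample_size: int, seed: int = 0) -> pd.DataFrame:
    """Keep the first half from the top, draw the second half from the rest."""
    # Assumption: integer division, so an odd sample_size favors the random half.
    top_size = sample_size // 2
    top = documents.head(top_size)
    # Draw from rows below the top block so no document appears twice.
    rest = documents.iloc[top_size:].sample(n=sample_size - top_size, random_state=seed)
    return pd.concat([top, rest])


# Toy ranked list, already sorted by rank (hypothetical data).
ranking = pd.DataFrame(
    {"doc_id": [f"d{i}" for i in range(1, 11)], "rank": range(1, 11)}
)
sampled = top_and_random_sketch(ranking, sample_size=5)
```

Sampling the random half only from rows below the top block is one way to avoid duplicates; sampling from the whole ranking and de-duplicating afterwards would be an alternative with a different distribution.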