Sampler

class lightning_ir.data.dataset.Sampler

Bases: object

Helper class for sampling subsets of documents from a ranked list.

__init__()

Methods

`__init__`()
`log_random`(documents, sample_size)	Sampling strategy to randomly sample documents, with documents from the top of the ranking sampled with a higher probability.
`random`(documents, sample_size)	Sampling strategy to randomly sample `sample_size` documents.
`sample`(df, sample_size, sampling_strategy)	Samples a subset of documents from a ranked list given a sampling_strategy.
`single_relevant`(documents, sample_size)	Sampling strategy to randomly sample a single relevant document.
`top`(documents, sample_size)	Sampling strategy to sample the top `sample_size` documents of the ranking.
`top_and_random`(documents, sample_size)	Sampling strategy to take half of the `sample_size` documents from the top of the ranking and to randomly sample the other half.

static log_random(documents: DataFrame, sample_size: int) → DataFrame

Sampling strategy to randomly sample documents, with documents from the top of the ranking sampled with a higher probability.

Parameters:

documents (pd.DataFrame) – Ranked list of documents
sample_size (int) – Number of documents to sample

Returns:

Sampled documents

Return type:

pd.DataFrame
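The rank-biased idea behind log_random can be illustrated with pandas' weighted sampling. The following is a minimal sketch, not the library's implementation: it assumes a hypothetical 1-based `rank` column and assumes a logarithmic discount of 1 / ln(1 + rank); the helper `log_random_sketch` and the columns `doc_id` and `rank` are illustrative names only.

```python
import numpy as np
import pandas as pd


def log_random_sketch(documents: pd.DataFrame, sample_size: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical log-weighted sampler (not lightning_ir's code).

    Assumes a 1-based integer `rank` column. Each document is weighted by
    1 / ln(1 + rank), so top-ranked documents are more likely to be drawn.
    """
    weights = 1.0 / np.log1p(documents["rank"].to_numpy(dtype=float))
    # Sample without replacement; pandas normalises the weights internally.
    return documents.sample(n=sample_size, weights=weights, random_state=seed)


ranking = pd.DataFrame({"doc_id": [f"d{i}" for i in range(1, 11)], "rank": range(1, 11)})
sampled = log_random_sketch(ranking, sample_size=4)
print(sampled["doc_id"].tolist())
```

Because the weights are normalised, any logarithm base yields the same sampling distribution, so a DCG-style 1 / log2(1 + rank) discount would behave identically.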

static random(documents: DataFrame, sample_size: int) → DataFrame

Sampling strategy to randomly sample sample_size documents.

Parameters:

documents (pd.DataFrame) – Ranked list of documents
sample_size (int) – Number of documents to sample

Returns:

Sampled documents

Return type:

pd.DataFrame
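Uniform sampling ignores the ranking entirely. A minimal sketch, assuming the strategy amounts to sampling rows without replacement; `random_sketch` and the `doc_id`/`rank` columns are hypothetical, and the fixed `seed` is added only to make the example reproducible.

```python
import pandas as pd


def random_sketch(documents: pd.DataFrame, sample_size: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical uniform sampler: every document is equally likely."""
    # Uniform sampling without replacement; the rank plays no role.
    return documents.sample(n=sample_size, random_state=seed)


ranking = pd.DataFrame({"doc_id": [f"d{i}" for i in range(1, 11)], "rank": range(1, 11)})
sampled = random_sketch(ranking, sample_size=3)
print(sampled["doc_id"].tolist())
```

Note that `DataFrame.sample` raises a `ValueError` when `n` exceeds the number of rows and `replace=False`, so a caller of this sketch must ensure `sample_size` is no larger than the ranked list.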

static sample(df: DataFrame, sample_size: int, sampling_strategy: 'single_relevant' | 'top' | 'random' | 'log_random' | 'top_and_random') → DataFrame

Samples a subset of documents from a ranked list given a sampling_strategy.

Parameters:

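df (pd.DataFrame) – Ranked list of documents
sample_size (int) – Number of documents to sample
sampling_strategy ('single_relevant' | 'top' | 'random' | 'log_random' | 'top_and_random') – Name of the sampling strategy to apply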

Returns:

Sampled documents

Return type:

pd.DataFrame
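Selecting a strategy by name can be pictured as a lookup from the `sampling_strategy` string to a function. The sketch below is hypothetical: the registry `STRATEGIES`, the function `sample_sketch`, and the `doc_id`/`rank` columns are illustrative, only two strategies are wired up, and the real `Sampler.sample` may dispatch differently.

```python
from typing import Callable, Dict

import pandas as pd

# Hypothetical registry mapping strategy names to sampling functions.
STRATEGIES: Dict[str, Callable[[pd.DataFrame, int], pd.DataFrame]] = {
    "top": lambda docs, n: docs.head(n),
    "random": lambda docs, n: docs.sample(n=n, random_state=0),
}


def sample_sketch(df: pd.DataFrame, sample_size: int, sampling_strategy: str) -> pd.DataFrame:
    """Dispatch to the strategy named by `sampling_strategy`."""
    try:
        strategy = STRATEGIES[sampling_strategy]
    except KeyError:
        raise ValueError(f"Unknown sampling strategy: {sampling_strategy!r}") from None
    return strategy(df, sample_size)


ranking = pd.DataFrame({"doc_id": [f"d{i}" for i in range(1, 6)], "rank": range(1, 6)})
print(sample_sketch(ranking, 2, "top")["doc_id"].tolist())  # ['d1', 'd2']
```

Against the documented signature, the corresponding call on the real class would be `Sampler.sample(df, 2, "top")`.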

static single_relevant(documents: DataFrame, sample_size: int) → DataFrame

Sampling strategy to randomly sample a single relevant document. The remaining sample_size - 1 documents are non-relevant.

Parameters:

documents (pd.DataFrame) – Ranked list of documents
sample_size (int) – Number of documents to sample

Returns:

Sampled documents

Return type:

pd.DataFrame
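A typical use of this strategy is building one positive plus several negatives per query for contrastive training. A minimal sketch, assuming a hypothetical `relevance` column where values greater than 0 mark relevant documents and 0 marks non-relevant ones; `single_relevant_sketch` is an illustrative name, and capping the negatives at the number available is a design choice of this sketch.

```python
import pandas as pd


def single_relevant_sketch(documents: pd.DataFrame, sample_size: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical sampler: one relevant document plus sample_size - 1 non-relevant ones.

    Assumes a `relevance` column where values > 0 mark relevant documents.
    """
    relevant = documents[documents["relevance"] > 0]
    non_relevant = documents[documents["relevance"] == 0]
    positive = relevant.sample(n=1, random_state=seed)
    # Cap the negatives so a short list does not make DataFrame.sample raise.
    n_negatives = min(sample_size - 1, len(non_relevant))
    negatives = non_relevant.sample(n=n_negatives, random_state=seed)
    # The relevant document comes first, followed by the negatives.
    return pd.concat([positive, negatives])


ranking = pd.DataFrame(
    {
        "doc_id": ["d1", "d2", "d3", "d4", "d5", "d6"],
        "rank": [1, 2, 3, 4, 5, 6],
        "relevance": [0, 2, 0, 1, 0, 0],
    }
)
sampled = single_relevant_sketch(ranking, sample_size=4)
print(sampled["doc_id"].tolist())
```

In this sketch a ranked list without any relevant document makes `relevant.sample(n=1)` raise a `ValueError`.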

static top(documents: DataFrame, sample_size: int) → DataFrame

Sampling strategy to sample the top sample_size documents of the ranking.

Parameters:

documents (pd.DataFrame) – Ranked list of documents
sample_size (int) – Number of documents to sample

Returns:

Sampled documents

Return type:

pd.DataFrame
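Taking the top of a ranking needs no randomness. A minimal sketch, assuming a hypothetical `rank` column where lower values are better; `top_sketch` is illustrative, and sorting first is an assumption made so the result does not depend on the row order of the input.

```python
import pandas as pd


def top_sketch(documents: pd.DataFrame, sample_size: int) -> pd.DataFrame:
    """Hypothetical top-k selection: keep the best-ranked documents."""
    # Sort by rank first so the result does not depend on row order.
    return documents.sort_values("rank").head(sample_size)


ranking = pd.DataFrame({"doc_id": ["d3", "d1", "d2", "d4"], "rank": [3, 1, 2, 4]})
print(top_sketch(ranking, 2)["doc_id"].tolist())  # ['d1', 'd2']
```

If `sample_size` exceeds the list length, `head` simply returns every document.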

static top_and_random(documents: DataFrame, sample_size: int) → DataFrame

Sampling strategy to take half of the sample_size documents from the top of the ranking and to randomly sample the other half.

Parameters:

documents (pd.DataFrame) – Ranked list of documents
sample_size (int) – Number of documents to sample

Returns:

Sampled documents

Return type:

pd.DataFrame
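Mixing the top of the ranking with random documents can be sketched as follows. This is a hypothetical illustration, not the library's code: it assumes a `rank` column where lower is better, draws the random half only from documents outside the top half (so no document appears twice), and gives the extra document to the random half when `sample_size` is odd. `top_and_random_sketch` is an illustrative name.

```python
import pandas as pd


def top_and_random_sketch(documents: pd.DataFrame, sample_size: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical mix: the top half by rank plus a random draw from the remainder."""
    ranked = documents.sort_values("rank")
    top_size = sample_size // 2
    # With an odd sample_size, the random half gets the extra document.
    random_size = sample_size - top_size
    top = ranked.head(top_size)
    # Draw the random half only from documents not already taken.
    rest = ranked.iloc[top_size:]
    return pd.concat([top, rest.sample(n=random_size, random_state=seed)])


ranking = pd.DataFrame({"doc_id": [f"d{i}" for i in range(1, 11)], "rank": range(1, 11)})
sampled = top_and_random_sketch(ranking, sample_size=5)
print(sampled["doc_id"].tolist())
```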