Sampler

class lightning_ir.data.dataset.Sampler

Bases: object

Helper class for sampling subsets of documents from a ranked list.

__init__()

Methods

__init__()

log_random(documents, sample_size)

Sampling strategy to randomly sample documents, giving documents near the top of the ranking a higher probability of being sampled.

random(documents, sample_size)

Sampling strategy to randomly sample sample_size documents.

sample(df, sample_size, sampling_strategy)

Samples a subset of documents from a ranked list given a sampling_strategy.

single_relevant(documents, sample_size)

Sampling strategy to randomly sample a single relevant document.

top(documents, sample_size)

Sampling strategy to sample the top sample_size documents of the ranking.

top_and_random(documents, sample_size)

Sampling strategy to take half of the sample_size documents from the top of the ranking and to sample the other half randomly.

static log_random(documents: DataFrame, sample_size: int) → DataFrame

Sampling strategy to randomly sample documents, giving documents near the top of the ranking a higher probability of being sampled.

Parameters:
  • documents (pd.DataFrame) – Ranked list of documents

  • sample_size (int) – Number of documents to sample

Returns:

Sampled documents

Return type:

pd.DataFrame
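One common way to bias random sampling toward the top of a ranking is to weight each document by the inverse logarithm of its rank and pass those weights to pandas' DataFrame.sample. The sketch below assumes a 1-based integer "rank" column, a hypothetical helper name log_random_sketch, and an extra seed parameter that is not part of the documented signature; the exact weighting lightning_ir uses may differ.

```python
import numpy as np
import pandas as pd


def log_random_sketch(documents: pd.DataFrame, sample_size: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical sketch: sample without replacement, favoring top-ranked documents.

    Assumes a 1-based integer "rank" column; the weighting is illustrative only.
    """
    # Weight = 1 / ln(1 + rank): rank 1 -> ~1.44, rank 10 -> ~0.42
    weights = 1.0 / np.log1p(documents["rank"])
    # DataFrame.sample normalizes the weights to sum to 1; cap n at the list length
    n = min(sample_size, len(documents))
    return documents.sample(n=n, weights=weights, random_state=seed)


ranking = pd.DataFrame({"doc_id": [f"d{i}" for i in range(1, 11)], "rank": range(1, 11)})
print(log_random_sketch(ranking, sample_size=3))
```

A rank of 0 would produce an infinite weight under this scheme, which is why the sketch assumes ranks start at 1.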

static random(documents: DataFrame, sample_size: int) → DataFrame

Sampling strategy to randomly sample sample_size documents.

Parameters:
  • documents (pd.DataFrame) – Ranked list of documents

  • sample_size (int) – Number of documents to sample

Returns:

Sampled documents

Return type:

pd.DataFrame
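Uniform sampling maps directly onto pandas' DataFrame.sample without weights. In this sketch the helper name random_sketch and the seed parameter are hypothetical, and capping sample_size at the list length is a choice made here (pandas raises a ValueError otherwise); whether lightning_ir caps it is not stated above.

```python
import pandas as pd


def random_sketch(documents: pd.DataFrame, sample_size: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical sketch: uniform sampling without replacement."""
    # Cap at the list length so short rankings do not raise a ValueError
    n = min(sample_size, len(documents))
    return documents.sample(n=n, random_state=seed)


ranking = pd.DataFrame({"doc_id": [f"d{i}" for i in range(1, 11)], "rank": range(1, 11)})
print(random_sketch(ranking, sample_size=3))
```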

static sample(df: DataFrame, sample_size: int, sampling_strategy: 'single_relevant' | 'top' | 'random' | 'log_random' | 'top_and_random') → DataFrame

Samples a subset of documents from a ranked list given a sampling_strategy.

Parameters:
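  • df (pd.DataFrame) – Ranked list of documents

  • sample_size (int) – Number of documents to sample

  • sampling_strategy ('single_relevant' | 'top' | 'random' | 'log_random' | 'top_and_random') – Name of the sampling strategy to apply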

Returns:

Sampled documents

Return type:

pd.DataFrame
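Since each strategy shares the signature (documents, sample_size), sample can be thought of as a dispatcher that looks up a strategy by name. The sketch below is an illustration under that assumption, not lightning_ir's code: the registry STRATEGIES, the helpers top_sketch and random_sketch, and the ValueError on an unknown name are all hypothetical, and only two strategies are registered to keep it short.

```python
from typing import Callable, Dict, Literal

import pandas as pd

Strategy = Literal["single_relevant", "top", "random", "log_random", "top_and_random"]


def top_sketch(documents: pd.DataFrame, sample_size: int) -> pd.DataFrame:
    # Keep the first sample_size rows, assuming the frame is already in ranked order
    return documents.head(sample_size)


def random_sketch(documents: pd.DataFrame, sample_size: int) -> pd.DataFrame:
    # Uniform sampling without replacement, capped at the list length
    return documents.sample(n=min(sample_size, len(documents)), random_state=0)


# Hypothetical name -> strategy registry; the remaining strategies would be added the same way
STRATEGIES: Dict[str, Callable[[pd.DataFrame, int], pd.DataFrame]] = {
    "top": top_sketch,
    "random": random_sketch,
}


def sample_sketch(df: pd.DataFrame, sample_size: int, sampling_strategy: Strategy) -> pd.DataFrame:
    try:
        strategy = STRATEGIES[sampling_strategy]
    except KeyError:
        raise ValueError(f"Unsupported sampling strategy: {sampling_strategy!r}") from None
    return strategy(df, sample_size)


ranking = pd.DataFrame({"doc_id": ["d1", "d2", "d3", "d4"], "rank": [1, 2, 3, 4]})
print(sample_sketch(ranking, 2, "top"))
```

A lookup table keeps the dispatcher closed to modification: adding a strategy means registering one more function rather than extending an if/elif chain.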

static single_relevant(documents: DataFrame, sample_size: int) → DataFrame

Sampling strategy to randomly sample a single relevant document. The remaining sample_size - 1 documents are non-relevant.

Parameters:
  • documents (pd.DataFrame) – Ranked list of documents

  • sample_size (int) – Number of documents to sample

Returns:

Sampled documents

Return type:

pd.DataFrame
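The description above amounts to drawing one positive and sample_size - 1 negatives. The sketch assumes a numeric "relevance" column in which values greater than 0 mark relevant documents and missing values (unjudged) count as non-relevant; the helper name single_relevant_sketch, the seed parameter, and the error on a list without relevant documents are hypothetical choices, not documented behavior.

```python
import pandas as pd


def single_relevant_sketch(documents: pd.DataFrame, sample_size: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical sketch: one random relevant document plus sample_size - 1 non-relevant ones.

    Assumes a numeric "relevance" column: > 0 means relevant, NaN (unjudged) is treated as 0.
    """
    relevance = documents["relevance"].fillna(0)
    relevant = documents[relevance > 0]
    non_relevant = documents[relevance <= 0]
    if relevant.empty:
        raise ValueError("The ranked list contains no relevant document.")
    positive = relevant.sample(n=1, random_state=seed)
    # Cap the number of negatives at what is available
    n_neg = max(0, min(sample_size - 1, len(non_relevant)))
    negatives = non_relevant.sample(n=n_neg, random_state=seed)
    # The relevant document comes first, followed by the non-relevant ones
    return pd.concat([positive, negatives])


ranking = pd.DataFrame({"doc_id": ["d1", "d2", "d3", "d4", "d5"], "relevance": [0, 2, None, 0, 1]})
print(single_relevant_sketch(ranking, sample_size=3))
```

Placing the relevant document first is convenient for contrastive training losses that expect the positive at a fixed position, but that ordering is an assumption of this sketch.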

static top(documents: DataFrame, sample_size: int) → DataFrame

Sampling strategy to sample the top sample_size documents of the ranking.

Parameters:
  • documents (pd.DataFrame) – Ranked list of documents

  • sample_size (int) – Number of documents to sample

Returns:

Sampled documents

Return type:

pd.DataFrame
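Assuming top keeps the highest-ranked sample_size documents, as its name suggests, it reduces to a deterministic slice of the ranked list. The helper name top_sketch and the "rank" column are assumptions; sorting first guards against a DataFrame that is not already in ranked order.

```python
import pandas as pd


def top_sketch(documents: pd.DataFrame, sample_size: int) -> pd.DataFrame:
    """Hypothetical sketch: keep the first sample_size documents of the ranking."""
    # Sort by rank first in case the DataFrame is not already in ranked order
    return documents.sort_values("rank").head(sample_size)


ranking = pd.DataFrame({"doc_id": ["d3", "d1", "d2"], "rank": [3, 1, 2]})
print(top_sketch(ranking, sample_size=2))
```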

static top_and_random(documents: DataFrame, sample_size: int) → DataFrame

Sampling strategy to take half of the sample_size documents from the top of the ranking and to sample the other half randomly.

Parameters:
  • documents (pd.DataFrame) – Ranked list of documents

  • sample_size (int) – Number of documents to sample

Returns:

Sampled documents

Return type:

pd.DataFrame
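A minimal sketch of the split, assuming a 1-based "rank" column and that the random half is drawn from the documents below the top half so no document is picked twice. The helper name top_and_random_sketch, the seed parameter, and giving the extra document to the random half when sample_size is odd are all hypothetical choices.

```python
import pandas as pd


def top_and_random_sketch(documents: pd.DataFrame, sample_size: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical sketch: half from the top of the ranking, half at random from the rest."""
    ranked = documents.sort_values("rank")
    top_size = sample_size // 2
    # The random half gets the extra document when sample_size is odd
    random_size = sample_size - top_size
    top = ranked.head(top_size)
    rest = ranked.iloc[top_size:]
    random_part = rest.sample(n=min(random_size, len(rest)), random_state=seed)
    return pd.concat([top, random_part])


ranking = pd.DataFrame({"doc_id": [f"d{i}" for i in range(1, 11)], "rank": range(1, 11)})
print(top_and_random_sketch(ranking, sample_size=5))
```

Mixing a deterministic top slice with random draws keeps hard, highly ranked candidates in every sample while still exposing the model to documents deeper in the list.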