TupleDataset

class lightning_ir.data.dataset.TupleDataset(tuples_dataset: str, targets: Literal['order', 'score'] = 'order', num_docs: int | None = None)

Bases: _IRDataset, IterableDataset

__init__(tuples_dataset: str, targets: Literal['order', 'score'] = 'order', num_docs: int | None = None) → None

Dataset containing tuples of a query and n-documents. Used for fine-tuning models on ranking tasks.

Parameters:
  • tuples_dataset (str) – Path to a file containing tuples, or a valid ir_datasets ID

  • targets (Literal["order", "score"], optional) – The type of target passed to the model during fine-tuning, defaults to “order”

  • num_docs (int | None, optional) – Maximum number of documents per query; None means no limit, defaults to None
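A minimal sketch of how the constructor arguments above might be assembled and checked before building the dataset. The helper names `tuple_dataset_kwargs` and `build_tuple_dataset` are hypothetical, the validation is an illustration rather than Lightning IR's own behavior, and the dataset ID is only an example assumed to be available through ir_datasets.

```python
from typing import Any, Dict, Optional

VALID_TARGETS = ("order", "score")  # the two values documented for `targets`


def tuple_dataset_kwargs(
    tuples_dataset: str,
    targets: str = "order",
    num_docs: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: check the arguments and return them as keyword arguments."""
    if targets not in VALID_TARGETS:
        raise ValueError(f"targets must be one of {VALID_TARGETS}, got {targets!r}")
    if num_docs is not None and num_docs < 1:
        raise ValueError("num_docs must be a positive integer or None")
    return {"tuples_dataset": tuples_dataset, "targets": targets, "num_docs": num_docs}


def build_tuple_dataset(**kwargs: Any):
    """Hypothetical helper: construct the dataset (requires lightning-ir to be installed)."""
    # Imported lazily so the argument check above also works without the library.
    from lightning_ir.data.dataset import TupleDataset

    return TupleDataset(**kwargs)


# Example ID, assumed to exist in ir_datasets; a local file path would also be accepted.
kwargs = tuple_dataset_kwargs("msmarco-passage/train/triples-small", num_docs=2)
print(kwargs)
```

With lightning-ir installed, `build_tuple_dataset(**kwargs)` would forward the same three arguments to the constructor documented above.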

Methods

  • __init__(tuples_dataset[, targets, num_docs]) – Dataset containing tuples of a query and n-documents.

Attributes

  • DASHED_DATASET_MAP – Map of dataset names with dashes to dataset names with slashes.

  • dataset – Dataset name.

  • dataset_id – Dataset id.

  • docs – Documents in the dataset.

  • docs_dataset_id – ID of the dataset containing the documents.

  • ir_dataset – Instance of ir_datasets.Dataset.

  • qrels – Qrels in the dataset.

  • queries – Queries in the dataset.

property DASHED_DATASET_MAP: Dict[str, str]

Map of dataset names with dashes to dataset names with slashes.

Returns:

Dataset map

Return type:

Dict[str, str]
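As a rough illustration of the mapping described above, the sketch below builds a dash-to-slash map from a small hand-written list of IDs. The list and the helper name `dashed_dataset_map` are assumptions for illustration; how the property itself collects dataset names is not specified here.

```python
from typing import Dict, Iterable


def dashed_dataset_map(dataset_names: Iterable[str]) -> Dict[str, str]:
    """Map each name with "/" replaced by "-" back to the original slash-separated name."""
    return {name.replace("/", "-"): name for name in dataset_names}


# Example IDs written by hand for illustration only.
names = ["msmarco-passage/train", "msmarco-passage/dev/small"]
mapping = dashed_dataset_map(names)

print(mapping["msmarco-passage-dev-small"])  # msmarco-passage/dev/small
```

A lookup table is useful here because some names already contain dashes ("msmarco-passage"), so a dashed name cannot be turned back into its slash form by plain string replacement.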

property dataset: str

Dataset name.

Returns:

Dataset name

Return type:

str

property dataset_id: str

Dataset id.

Returns:

Dataset id

Return type:

str

property docs: Docstore | Dict[str, GenericDoc]

Documents in the dataset.

Raises:

ValueError – If no documents are found in the dataset

Returns:

Documents

Return type:

ir_datasets.indices.Docstore | Dict[str, GenericDoc]

property docs_dataset_id: str

ID of the dataset containing the documents.

Returns:

Document dataset id

Return type:

str

property ir_dataset: Dataset | None

Instance of ir_datasets.Dataset.

Returns:

ir_datasets dataset

Return type:

ir_datasets.Dataset | None

property qrels: DataFrame | None

Qrels in the dataset.

Returns:

Qrels

Return type:

pd.DataFrame | None

property queries: Series

Queries in the dataset.

Raises:

ValueError – If no queries are found in the dataset

Returns:

Queries

Return type:

pd.Series