TupleDataset
- class lightning_ir.data.dataset.TupleDataset(tuples_dataset: str, targets: Literal['order', 'score'] = 'order', num_docs: int | None = None)
Bases: _IRDataset, IterableDataset
- __init__(tuples_dataset: str, targets: Literal['order', 'score'] = 'order', num_docs: int | None = None) → None
Dataset of tuples, each pairing a query with n documents. Used for fine-tuning models on ranking tasks.
- Parameters:
tuples_dataset (str) – Path to a file containing tuples, or a valid ir_datasets ID
targets (Literal["order", "score"], optional) – The data type to use as targets for a model during fine-tuning, defaults to “order”
num_docs (int | None, optional) – Maximum number of documents per query, defaults to None
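The parameters above can be illustrated with a minimal sketch. The helper names `tuple_dataset_kwargs` and `load_tuple_dataset` are hypothetical and not part of lightning_ir, the positivity check on `num_docs` is an assumption rather than documented behavior, and `msmarco-passage/train/triples-small` is used only as an example ir_datasets ID.

```python
from typing import Literal, Optional

VALID_TARGETS = ("order", "score")


def tuple_dataset_kwargs(
    tuples_dataset: str,
    targets: Literal["order", "score"] = "order",
    num_docs: Optional[int] = None,
) -> dict:
    """Hypothetical helper: check arguments before passing them to TupleDataset."""
    if targets not in VALID_TARGETS:
        raise ValueError(f"targets must be one of {VALID_TARGETS}, got {targets!r}")
    # Assumption: a non-positive document limit is never meaningful.
    if num_docs is not None and num_docs < 1:
        raise ValueError("num_docs must be a positive integer or None")
    return {"tuples_dataset": tuples_dataset, "targets": targets, "num_docs": num_docs}


def load_tuple_dataset(**kwargs):
    """Hypothetical wrapper: import lightning_ir lazily and build the dataset."""
    from lightning_ir.data.dataset import TupleDataset

    return TupleDataset(**kwargs)


# Example arguments: an ir_datasets ID, order targets, at most 2 documents per query.
kwargs = tuple_dataset_kwargs(
    "msmarco-passage/train/triples-small", targets="order", num_docs=2
)
# With lightning_ir installed, the dataset could then be built as:
# dataset = load_tuple_dataset(**kwargs)
```

Validating the arguments up front keeps a typo such as `targets="rank"` from surfacing only later during fine-tuning.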
Methods
__init__(tuples_dataset[, targets, num_docs]): Dataset containing tuples of a query and n-documents.
Attributes
DASHED_DATASET_MAP: Map of dataset names with dashes to dataset names with slashes.
Dataset name.
Dataset id.
docs: Documents in the dataset.
docs_dataset_id: ID of the dataset containing the documents.
ir_dataset: Instance of ir_datasets.Dataset.
Qrels in the dataset.
Queries in the dataset.
- property DASHED_DATASET_MAP: Dict[str, str]
Map of dataset names with dashes to dataset names with slashes.
- Returns:
Dataset map
- Return type:
Dict[str, str]
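As an illustration of what a dash-to-slash map might look like, the sketch below builds one from a short list of ir_datasets IDs. The function `build_dashed_map` and the list `example_ids` are hypothetical; how lightning_ir actually constructs DASHED_DATASET_MAP is not stated here.

```python
from typing import Dict, Iterable


def build_dashed_map(dataset_ids: Iterable[str]) -> Dict[str, str]:
    """Map each ID with slashes replaced by dashes back to the original ID."""
    return {dataset_id.replace("/", "-"): dataset_id for dataset_id in dataset_ids}


# Example IDs chosen for illustration only.
example_ids = ["msmarco-passage/train/triples-small", "msmarco-passage/dev/small"]
dashed_map = build_dashed_map(example_ids)

print(dashed_map["msmarco-passage-dev-small"])  # prints "msmarco-passage/dev/small"
```

A map like this is useful when a dataset name must appear somewhere slashes are inconvenient, such as in a file name, and still needs to be resolved back to the slash-separated ID.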
- property docs: Docstore | Dict[str, GenericDoc]
Documents in the dataset.
- Raises:
ValueError – If no documents are found in the dataset
- Returns:
Documents
- Return type:
ir_datasets.indices.Docstore | Dict[str, GenericDoc]
- property docs_dataset_id: str
ID of the dataset containing the documents.
- Returns:
Document dataset id
- Return type:
str
- property ir_dataset: Dataset | None
Instance of ir_datasets.Dataset.
- Returns:
ir_datasets dataset
- Return type:
ir_datasets.Dataset | None