RunDataset
- class lightning_ir.data.dataset.RunDataset(run_path_or_id: Path | str, depth: int = -1, sample_size: int = -1, sampling_strategy: 'single_relevant' | 'top' | 'random' | 'log_random' | 'top_and_random' = 'top', targets: 'relevance' | 'subtopic_relevance' | 'rank' | 'score' | None = None, normalize_targets: bool = False, add_docs_not_in_ranking: bool = False)
Bases: _IRDataset, Dataset
- __init__(run_path_or_id: Path | str, depth: int = -1, sample_size: int = -1, sampling_strategy: 'single_relevant' | 'top' | 'random' | 'log_random' | 'top_and_random' = 'top', targets: 'relevance' | 'subtopic_relevance' | 'rank' | 'score' | None = None, normalize_targets: bool = False, add_docs_not_in_ranking: bool = False) → None
Dataset containing a list of queries with a ranked list of documents per query. Subsets of the ranked list can be sampled using different sampling strategies.
- Parameters:
run_path_or_id (Path | str) – Path to a run file or valid ir_datasets id
depth (int, optional) – Depth at which to cut off the ranking. If -1, the full ranking is kept. Defaults to -1
sample_size (int, optional) – The number of documents to sample per query, defaults to -1
sampling_strategy (Literal['single_relevant', 'top', 'random', 'log_random', 'top_and_random'], optional) – The sampling strategy used to select documents, defaults to 'top'
targets (Literal['relevance', 'subtopic_relevance', 'rank', 'score'] | None, optional) – The type of data to use as targets for a model during fine-tuning. If 'relevance', the relevance judgements are parsed from the qrels. Defaults to None
normalize_targets (bool, optional) – Whether to normalize the targets between 0 and 1, defaults to False
add_docs_not_in_ranking (bool, optional) – Whether to add relevant documents that are in the qrels but not in the ranking to a sample, defaults to False
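The interplay of depth, sample_size, and normalize_targets can be illustrated with a small standalone sketch. It does not use or reproduce the lightning_ir implementation: the helper names cut_and_sample and min_max_normalize are hypothetical, only the 'top' and 'random' strategies are covered, treating sample_size = -1 as "keep everything" is an assumption, and min-max scaling is just one plausible way to map targets into the range 0 to 1.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical ranked list for one query: (doc_id, score), best document first.
Ranking = List[Tuple[str, float]]


def cut_and_sample(
    ranking: Ranking,
    depth: int = -1,
    sample_size: int = -1,
    sampling_strategy: str = "top",
    seed: Optional[int] = None,
) -> Ranking:
    """Illustrative only: cut the ranking at `depth`, then sample from it."""
    # depth == -1 keeps the full ranking, as described for the `depth` parameter.
    if depth != -1:
        ranking = ranking[:depth]
    # Assumption: sample_size == -1 (or a size >= the ranking) keeps everything.
    if sample_size == -1 or sample_size >= len(ranking):
        return list(ranking)
    if sampling_strategy == "top":
        # Take the highest-ranked documents.
        return ranking[:sample_size]
    if sampling_strategy == "random":
        # Draw documents uniformly without replacement.
        return random.Random(seed).sample(ranking, sample_size)
    raise ValueError(f"strategy not covered by this sketch: {sampling_strategy}")


def min_max_normalize(targets: List[float]) -> List[float]:
    """Illustrative only: scale targets linearly into [0, 1]."""
    lowest, highest = min(targets), max(targets)
    if highest == lowest:
        # All targets are equal; avoid dividing by zero.
        return [0.0 for _ in targets]
    return [(t - lowest) / (highest - lowest) for t in targets]


ranking = [("d1", 12.0), ("d2", 9.0), ("d3", 7.5), ("d4", 3.0)]
top_two = cut_and_sample(ranking, depth=3, sample_size=2)
scores = min_max_normalize([score for _, score in ranking])
```

With depth=3 the fourth document can never be sampled, regardless of strategy, because the cut-off is applied before sampling in this sketch.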
Methods
__init__(run_path_or_id[, depth, ...]) – Dataset containing a list of queries with a ranked list of documents per query.
Attributes
DASHED_DATASET_MAP – Map of dataset names with dashes to dataset names with slashes.
Dataset name.
Dataset id.
docs – Documents in the dataset.
docs_dataset_id – ID of the dataset containing the documents.
ir_dataset – Instance of ir_datasets.Dataset.
The qrels in the dataset.
Queries in the dataset.
- property DASHED_DATASET_MAP: Dict[str, str]
Map of dataset names with dashes to dataset names with slashes.
- Returns:
Dataset map
- Return type:
Dict[str, str]
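As a rough illustration of what such a map could look like, the sketch below builds a dictionary from dashed names to slash-separated ir_datasets ids. The helper build_dashed_map is hypothetical and the way lightning_ir actually populates the property is not shown here; only the direction of the mapping (dashes to slashes) is taken from the description above.

```python
from typing import Dict, Iterable


def build_dashed_map(dataset_ids: Iterable[str]) -> Dict[str, str]:
    """Illustrative only: map 'a-b-c' style names back to 'a-b/c' style ids."""
    # Replace every slash with a dash to obtain the dashed name used as the key.
    return {dataset_id.replace("/", "-"): dataset_id for dataset_id in dataset_ids}


dashed_map = build_dashed_map(
    ["msmarco-passage/train", "msmarco-passage/trec-dl-2019/judged"]
)
resolved = dashed_map["msmarco-passage-train"]
```

An explicit lookup table is useful because ids such as msmarco-passage already contain dashes, so simply replacing dashes with slashes would not recover the original id.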
- property docs: Docstore | Dict[str, GenericDoc]
Documents in the dataset.
- Raises:
ValueError – If no documents are found in the dataset
- Returns:
Documents
- Return type:
ir_datasets.indices.Docstore | Dict[str, GenericDoc]
- property docs_dataset_id: str
ID of the dataset containing the documents.
- Returns:
Document dataset id
- Return type:
str
- property ir_dataset: Dataset | None
Instance of ir_datasets.Dataset.
- Returns:
ir_datasets dataset
- Return type:
ir_datasets.Dataset | None