DocSample

class lightning_ir.data.data.DocSample(doc_id: str, doc: str)

Bases: object

A sample of document data containing a document and its id.

Parameters:
  • doc_id (str) – ID of the document

  • doc (str) – Document text

__init__(doc_id: str, doc: str) → None
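The constructor takes only the two documented fields. The snippet below is a minimal sketch using a hypothetical stand-in class, assuming DocSample behaves like a plain Python dataclass. The generated-looking __init__ suggests this, but it is an assumption, and the code is not the lightning_ir source.

```python
from dataclasses import dataclass


# Hypothetical stand-in mirroring the documented signature
# DocSample(doc_id: str, doc: str); assumed to be a plain dataclass.
@dataclass
class DocSample:
    doc_id: str  # ID of the document
    doc: str  # document text


sample = DocSample(doc_id="d1", doc="Lightning IR is a framework for neural retrieval.")
print(sample.doc_id)  # d1
print(sample.doc)  # Lightning IR is a framework for neural retrieval.
```

If DocSample is indeed a dataclass, two samples with the same id and text also compare equal, which is convenient when deduplicating documents.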

Methods

__init__(doc_id, doc)

from_ir_dataset_sample(sample[, text_fields])

Create a DocSample from an ir_datasets sample.

Attributes

doc_id

doc

classmethod from_ir_dataset_sample(sample: GenericDoc, text_fields: Sequence[str] | None = None) → DocSample

Create a DocSample from an ir_datasets sample.

Parameters:
  • sample (GenericDoc) – ir_datasets sample

  • text_fields (Sequence[str] | None, optional) – Names of the sample fields from which to build the document text. If None, the sample's default_text() is used. Defaults to None.

Returns:

Document sample built from the ir_datasets sample

Return type:

DocSample
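The conversion described above can be sketched as follows. This is a hypothetical re-implementation, not the lightning_ir source: TitledDoc is an invented stand-in for an ir_datasets document type with a title field, and joining the selected text_fields with a single space is an assumption about how multiple fields are combined.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import NamedTuple, Sequence


class TitledDoc(NamedTuple):
    """Hypothetical stand-in for an ir_datasets document with a title field."""

    doc_id: str
    title: str
    text: str

    def default_text(self) -> str:
        # Assumption: the default text is the title followed by the body.
        return f"{self.title} {self.text}"


@dataclass
class DocSample:
    """Hypothetical stand-in mirroring the documented DocSample fields."""

    doc_id: str
    doc: str

    @classmethod
    def from_ir_dataset_sample(
        cls, sample: TitledDoc, text_fields: Sequence[str] | None = None
    ) -> DocSample:
        if text_fields is None:
            # No fields given: fall back to the sample's default_text().
            return cls(sample.doc_id, sample.default_text())
        # Assumption: the requested fields are read by name and joined with a space.
        return cls(sample.doc_id, " ".join(getattr(sample, field) for field in text_fields))


doc = TitledDoc(doc_id="d1", title="Lightning IR", text="A framework for neural retrieval.")
print(DocSample.from_ir_dataset_sample(doc).doc)  # Lightning IR A framework for neural retrieval.
print(DocSample.from_ir_dataset_sample(doc, text_fields=["text"]).doc)  # A framework for neural retrieval.
```

Making text_fields optional lets a caller select, for example, only the body of a titled document while still getting a sensible default when no fields are given.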