Model Zoo
The table below lists models from the HuggingFace Model Hub that are supported in Lightning IR. For each model, it reports re-ranking effectiveness in terms of nDCG@10 on the officially released run files containing 1,000 passages per query for TREC Deep Learning 2019 and 2020.
Native models were fine-tuned with Lightning IR, and their HuggingFace model cards provide Lightning IR configurations for reproducing the fine-tuning. Non-native models were fine-tuned externally but are supported in Lightning IR for inference.
Reproduction
The following configuration and command can be used to reproduce the results in the table below:
config.yaml

```yaml
trainer:
  logger: false
  enable_checkpointing: false
model:
  class_path: CrossEncoderModule # for cross-encoders
  # class_path: BiEncoderModule # for bi-encoders
  init_args:
    model_name_or_path: {MODEL_NAME}
    evaluation_metrics:
      - nDCG@10
data:
  class_path: LightningIRDataModule
  init_args:
    inference_datasets:
      - class_path: RunDataset
        init_args:
          run_path_or_id: msmarco-passage/trec-dl-2019/judged
      - class_path: RunDataset
        init_args:
          run_path_or_id: msmarco-passage/trec-dl-2020/judged
```
```bash
lightning-ir re_rank --config config.yaml
```
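The same run can also be sketched against the Python API instead of the CLI. This is a minimal, hedged example: it assumes the classes named in the configuration above (`CrossEncoderModule`, `LightningIRDataModule`, `RunDataset`) are importable from `lightning_ir` together with a `LightningIRTrainer` that exposes a `re_rank` method; the model name `webis/monoelectra-base` and the `inference_batch_size` value are example values to substitute.

```python
# Minimal sketch of re-ranking via the Python API instead of the CLI.
# Assumes the classes from the config above are importable from `lightning_ir`
# and that the trainer exposes a `re_rank` method; the model name and batch
# size are example values only.
from lightning_ir import (
    CrossEncoderModule,
    LightningIRDataModule,
    LightningIRTrainer,
    RunDataset,
)

# Cross-encoder to evaluate; swap in any model name from the table below.
module = CrossEncoderModule(
    model_name_or_path="webis/monoelectra-base",  # example model name
    evaluation_metrics=["nDCG@10"],
)

# Official TREC DL 2019/2020 run files as inference datasets.
data_module = LightningIRDataModule(
    inference_datasets=[
        RunDataset(run_path_or_id="msmarco-passage/trec-dl-2019/judged"),
        RunDataset(run_path_or_id="msmarco-passage/trec-dl-2020/judged"),
    ],
    inference_batch_size=4,  # assumed parameter; adjust to available memory
)

trainer = LightningIRTrainer(logger=False, enable_checkpointing=False)
trainer.re_rank(module, data_module)
```

The YAML configuration above maps directly onto these constructor arguments, so switching models only requires changing `model_name_or_path`.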
| Model Name | Native | TREC DL 2019 | TREC DL 2020 |
|---|---|---|---|
| Cross-Encoders | | | |
| | ✅ | 0.751 | 0.769 |
| | ✅ | 0.750 | 0.791 |
| | ❌ | 0.723 | 0.714 |
| | ❌ | 0.720 | 0.728 |
| | ❌ | 0.726 | 0.752 |
| | ❌ | 0.734 | 0.745 |
| | ❌ | 0.737 | 0.759 |
| | ❌ | 0.721 | 0.776 |
| Bi-Encoders | | | |
| | ✅ | 0.711 | 0.714 |
| | ❌ | 0.705 | 0.735 |
| | ✅ | 0.751 | 0.749 |
| | ❌ | 0.732 | 0.746 |
| | ✅ | 0.736 | 0.723 |
| | ❌ | 0.715 | 0.749 |
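The reported nDCG@10 values can also be recomputed outside of Lightning IR from a TREC-style run file of the re-ranked results. The sketch below is an illustration only: it uses the `ir_datasets` and `ir_measures` libraries, whose dataset identifiers (e.g. `msmarco-passage/trec-dl-2019/judged`) and measure names (e.g. `nDCG@10`) the configuration above follows; the run file path is a placeholder.

```python
# Recompute nDCG@10 for a re-ranked run independently of Lightning IR.
import ir_datasets
import ir_measures
from ir_measures import nDCG

# Judged TREC DL 2019 queries and relevance judgments.
dataset = ir_datasets.load("msmarco-passage/trec-dl-2019/judged")

# TREC-style run file holding the re-ranked results (placeholder path).
run = ir_measures.read_trec_run("runs/trec-dl-2019.run")

# Aggregate nDCG@10 over all judged queries.
print(ir_measures.calc_aggregate([nDCG@10], dataset.qrels_iter(), run))
```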