LightningIRTokenizerClassFactory
- class lightning_ir.base.class_factory.LightningIRTokenizerClassFactory(MixinConfig: Type[LightningIRConfig])
Bases: LightningIRClassFactory
Class factory for creating derived LightningIRTokenizer classes from HuggingFace tokenizer classes.
- __init__(MixinConfig: Type[LightningIRConfig]) → None
Creates a new LightningIRClassFactory.
- Parameters:
MixinConfig (Type[LightningIRConfig]) – LightningIRConfig mixin class
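A class factory builds new classes at runtime. The following is a minimal sketch of the idea, not the lightning_ir implementation. It assumes that a derived tokenizer subclasses both a Lightning IR tokenizer base and the HuggingFace backbone tokenizer. Every class below is a hypothetical stand-in, not the real lightning_ir or transformers class. The function `derive_tokenizer_class`, its naming scheme, and the `config_class` attribute are also illustrative only.

```python
from typing import Type


class LightningIRConfig:
    """Hypothetical stand-in for lightning_ir's LightningIRConfig."""


class ColConfig(LightningIRConfig):
    """Hypothetical mixin config for an illustrative model type."""


class LightningIRTokenizer:
    """Hypothetical stand-in for lightning_ir's LightningIRTokenizer."""


class BertTokenizer:
    """Hypothetical stand-in for a HuggingFace backbone tokenizer class."""


def derive_tokenizer_class(MixinConfig: Type[LightningIRConfig], BackboneClass: type) -> type:
    # Camel-case prefix taken from the mixin config's class name, e.g. "Col"
    prefix = MixinConfig.__name__.removesuffix("Config")
    # type(name, bases, namespace) creates a new class at runtime; the Lightning IR
    # base is listed first so its methods take precedence in the MRO
    return type(
        f"{prefix}{BackboneClass.__name__}",
        (LightningIRTokenizer, BackboneClass),
        {"config_class": MixinConfig},
    )


Derived = derive_tokenizer_class(ColConfig, BertTokenizer)
print(Derived.__name__)  # ColBertTokenizer
print(issubclass(Derived, BertTokenizer))  # True
```

Listing the Lightning IR base before the backbone is one reasonable design choice. It lets shared Lightning IR behaviour override backbone methods while still inheriting everything else from the backbone. The real factory may compose or name classes differently.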
Methods
- __init__(MixinConfig) – Creates a new LightningIRClassFactory.
- from_backbone_class(BackboneClass) – Creates a derived LightningIRTokenizer from a transformers.PreTrainedTokenizerBase backbone tokenizer.
- from_backbone_classes(BackboneClasses[, ...]) – Creates derived slow and fast LightningIRTokenizers from a tuple of backbone HuggingFace tokenizer classes.
- from_pretrained(model_name_or_path, *args[, ...]) – Loads a derived LightningIRTokenizer from a pretrained HuggingFace tokenizer.
- get_backbone_config(model_name_or_path) – Grabs the tokenizer configuration class from a checkpoint of a pretrained HuggingFace tokenizer.
- get_backbone_model_type(model_name_or_path, ...) – Grabs the model type from a checkpoint of a pretrained HuggingFace tokenizer.
- get_lightning_ir_config(model_name_or_path) – Grabs the Lightning IR configuration class from a checkpoint of a pretrained Lightning IR model.
- get_lightning_ir_model_type(model_name_or_path) – Grabs the Lightning IR model type from a checkpoint of a pretrained HuggingFace model.
Attributes
Camel case model type of the Lightning IR model.
- from_backbone_class(BackboneClass: Type[PreTrainedTokenizerBase]) → Type[LightningIRTokenizer]
Creates a derived LightningIRTokenizer from a transformers.PreTrainedTokenizerBase backbone tokenizer. If the backbone tokenizer is already a LightningIRTokenizer, it is returned as is.
- Parameters:
BackboneClass (Type[PreTrainedTokenizerBase]) – Backbone tokenizer class
- Returns:
Derived LightningIRTokenizer class
- Return type:
Type[LightningIRTokenizer]
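The "returned as is" rule makes the derivation idempotent. The sketch below illustrates that behaviour only. The classes are hypothetical stand-ins, and the `LightningIR` name prefix is an assumption, not the library's naming scheme.

```python
from typing import Type


class LightningIRTokenizer:
    """Hypothetical stand-in for lightning_ir's LightningIRTokenizer."""


class BertTokenizer:
    """Hypothetical stand-in for a HuggingFace backbone tokenizer class."""


def from_backbone_class(BackboneClass: type) -> Type[LightningIRTokenizer]:
    # Already a Lightning IR tokenizer: return it unchanged rather than
    # wrapping it in a second derived class
    if issubclass(BackboneClass, LightningIRTokenizer):
        return BackboneClass
    # Otherwise derive a new class that combines both bases
    return type(
        f"LightningIR{BackboneClass.__name__}",
        (LightningIRTokenizer, BackboneClass),
        {},
    )


Derived = from_backbone_class(BertTokenizer)
print(Derived.__name__)  # LightningIRBertTokenizer
print(from_backbone_class(Derived) is Derived)  # True
```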
- from_backbone_classes(BackboneClasses: Tuple[Type[PreTrainedTokenizerBase] | None, Type[PreTrainedTokenizerBase] | None], BackboneConfig: Type[PretrainedConfig] | None = None) → Tuple[Type[LightningIRTokenizer] | None, Type[LightningIRTokenizer] | None]
Creates derived slow and fast LightningIRTokenizers from a tuple of backbone HuggingFace tokenizer classes.
- Parameters:
BackboneClasses (Tuple[Type[PreTrainedTokenizerBase] | None, Type[PreTrainedTokenizerBase] | None]) – Slow and fast backbone tokenizer classes
BackboneConfig (Type[PretrainedConfig], optional) – Backbone configuration class, defaults to None
- Returns:
Slow and fast derived LightningIRTokenizer classes
- Return type:
Tuple[Type[LightningIRTokenizer] | None, Type[LightningIRTokenizer] | None]
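Both the parameter and return types allow `None` in either tuple slot, because a backbone may ship only a slow or only a fast tokenizer. The following sketch assumes that a `None` backbone simply maps to a `None` derived class. It omits the optional `BackboneConfig` argument and uses hypothetical stand-in classes throughout.

```python
from typing import Optional, Tuple, Type


class LightningIRTokenizer:
    """Hypothetical stand-in for lightning_ir's LightningIRTokenizer."""


class SlowBackboneTokenizer:
    """Hypothetical slow (pure-Python) backbone tokenizer class."""


def from_backbone_class(BackboneClass: type) -> Type[LightningIRTokenizer]:
    # Simplified single-class derivation
    if issubclass(BackboneClass, LightningIRTokenizer):
        return BackboneClass
    return type(f"LightningIR{BackboneClass.__name__}", (LightningIRTokenizer, BackboneClass), {})


def from_backbone_classes(
    BackboneClasses: Tuple[Optional[type], Optional[type]],
) -> Tuple[Optional[Type[LightningIRTokenizer]], Optional[Type[LightningIRTokenizer]]]:
    slow, fast = BackboneClasses
    # A missing variant (None) is passed through as None instead of raising
    DerivedSlow = from_backbone_class(slow) if slow is not None else None
    DerivedFast = from_backbone_class(fast) if fast is not None else None
    return DerivedSlow, DerivedFast


DerivedSlow, DerivedFast = from_backbone_classes((SlowBackboneTokenizer, None))
print(DerivedSlow.__name__)  # LightningIRSlowBackboneTokenizer
print(DerivedFast)  # None
```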
- from_pretrained(model_name_or_path: str | Path, *args, use_fast: bool = True, **kwargs) → Type[LightningIRTokenizer]
Loads a derived LightningIRTokenizer from a pretrained HuggingFace tokenizer.
- Parameters:
model_name_or_path (str | Path) – Path to the tokenizer or its name
use_fast (bool, optional) – Whether to use the fast or slow tokenizer, defaults to True
- Raises:
ValueError – If use_fast is True and no fast tokenizer is found
ValueError – If use_fast is False and no slow tokenizer is found
- Returns:
Derived LightningIRTokenizer class
- Return type:
Type[LightningIRTokenizer]
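The use_fast switch and the two documented ValueError cases amount to picking one slot of the (slow, fast) pair. The sketch below shows that selection step in isolation. The helper `select_tokenizer_class` is hypothetical, and the error messages are illustrative rather than copied from the library.

```python
from typing import Optional, Tuple


def select_tokenizer_class(
    derived: Tuple[Optional[type], Optional[type]], use_fast: bool = True
) -> type:
    slow, fast = derived
    if use_fast:
        # Corresponds to the first documented ValueError
        if fast is None:
            raise ValueError("No fast tokenizer found.")
        return fast
    # Corresponds to the second documented ValueError
    if slow is None:
        raise ValueError("No slow tokenizer found.")
    return slow


class SlowTokenizer:
    """Hypothetical derived slow tokenizer class."""


print(select_tokenizer_class((SlowTokenizer, None), use_fast=False).__name__)  # SlowTokenizer
try:
    select_tokenizer_class((SlowTokenizer, None))
except ValueError as err:
    print(err)  # No fast tokenizer found.
```

The documented return type is Type[LightningIRTokenizer], which means the method yields a class, not a tokenizer instance. An instance would presumably still be created from that class afterwards.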
- static get_backbone_config(model_name_or_path: str | Path) → PretrainedConfig
Grabs the tokenizer configuration class from a checkpoint of a pretrained HuggingFace tokenizer.
- Parameters:
model_name_or_path (str | Path) – Path to the tokenizer or its name
- Returns:
Configuration class of the backbone tokenizer
- Return type:
PretrainedConfig
- static get_backbone_model_type(model_name_or_path: str | Path, *args, **kwargs) → str
Grabs the model type from a checkpoint of a pretrained HuggingFace tokenizer.
- Parameters:
model_name_or_path (str | Path) – Path to the tokenizer or its name
- Returns:
Model type of the backbone tokenizer
- Return type:
str
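get_backbone_config and get_backbone_model_type work as a pair: first read the model type from the checkpoint, then map it to a configuration class. The sketch below is a local-only illustration under stated assumptions. It assumes the checkpoint is a local directory whose config.json has a "model_type" field, as HuggingFace model checkpoints usually do. `BertConfig` and `CONFIG_REGISTRY` are hypothetical stand-ins. A real implementation would more likely rely on transformers' auto classes (for example `AutoConfig` or `CONFIG_MAPPING`) and also accept Hub model names.

```python
import json
import tempfile
from pathlib import Path
from typing import Union


class BertConfig:
    """Hypothetical stand-in for transformers.BertConfig."""


# Hypothetical registry mapping model types to configuration classes
CONFIG_REGISTRY = {"bert": BertConfig}


def get_backbone_model_type(model_name_or_path: Union[str, Path]) -> str:
    # Assumes a local checkpoint directory that contains a config.json file
    config = json.loads((Path(model_name_or_path) / "config.json").read_text())
    return config["model_type"]


def get_backbone_config(model_name_or_path: Union[str, Path]) -> type:
    # Resolve the configuration class from the model type string
    return CONFIG_REGISTRY[get_backbone_model_type(model_name_or_path)]


with tempfile.TemporaryDirectory() as tmp:
    # Write a minimal config.json to simulate a downloaded checkpoint
    (Path(tmp) / "config.json").write_text(json.dumps({"model_type": "bert"}))
    model_type = get_backbone_model_type(tmp)
    config_class = get_backbone_config(tmp)

print(model_type)  # bert
print(config_class.__name__)  # BertConfig
```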
- static get_lightning_ir_config(model_name_or_path: str | Path) → Type[LightningIRConfig] | None
Grabs the Lightning IR configuration class from a checkpoint of a pretrained Lightning IR model.
- Parameters:
model_name_or_path (str | Path) – Path to the model or its name
- Returns:
Configuration class of the Lightning IR model
- Return type:
Type[LightningIRConfig] | None
- static get_lightning_ir_model_type(model_name_or_path: str | Path) → str | None
Grabs the Lightning IR model type from a checkpoint of a pretrained HuggingFace model.
- Parameters:
model_name_or_path (str | Path) – Path to the model or its name
- Returns:
Model type of the Lightning IR model
- Return type:
str | None
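Both Lightning IR lookups return None as a possible value. This suggests that a checkpoint which is not a Lightning IR model yields None instead of raising an error. The sketch below illustrates that reading. To stay short, it takes an already-parsed config dict rather than a path. The registry, the "bi-encoder" key, and `BiEncoderConfig` are hypothetical stand-ins.

```python
from typing import Dict, Optional, Type


class LightningIRConfig:
    """Hypothetical stand-in for lightning_ir's LightningIRConfig."""


class BiEncoderConfig(LightningIRConfig):
    """Hypothetical Lightning IR configuration class."""


# Hypothetical registry of known Lightning IR model types
LIGHTNING_IR_CONFIGS: Dict[str, Type[LightningIRConfig]] = {"bi-encoder": BiEncoderConfig}


def get_lightning_ir_model_type(config: dict) -> Optional[str]:
    # Plain HuggingFace checkpoints (e.g. model_type "bert") yield None
    model_type = config.get("model_type")
    return model_type if model_type in LIGHTNING_IR_CONFIGS else None


def get_lightning_ir_config(config: dict) -> Optional[Type[LightningIRConfig]]:
    model_type = get_lightning_ir_model_type(config)
    return None if model_type is None else LIGHTNING_IR_CONFIGS[model_type]


print(get_lightning_ir_model_type({"model_type": "bi-encoder"}))  # bi-encoder
print(get_lightning_ir_config({"model_type": "bert"}))  # None
```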