0.7.0
Flexible and easy-to-use simulation and evaluation framework for generative IR
A run of GenIRSim consists of two stages, which the main run function executes sequentially: simulation and evaluation. During simulation (see the simulate function), a user converses with a generative IR system. During evaluation (see the evaluate function), the generated conversation is judged by one or more evaluators. The run is defined by a configuration in JSON format (see the configuration parameter of the above-mentioned functions and the configurations directory for examples). If you create your own user, system, or evaluator classes, make sure to register them using the options.additionalUsers, options.additionalSystems, or options.additionalEvaluators parameters of the above-mentioned functions.
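The registration of custom classes can be pictured as a lookup from class name to class. The following is a minimal sketch under stated assumptions, not GenIRSim's actual code: `resolveClass`, `standardUsers`, and `ShoutingUser` are hypothetical names, and the precedence of additional over standard classes is assumed. Only the rule that `configuration.simulation.user.class` is matched against the keys of `options.additionalUsers` is taken from this documentation.

```javascript
// Hypothetical stand-in for the map of GenIRSim's standard user classes.
const standardUsers = {};

// Hypothetical custom user class; a real one would extend GenIRSim's User.
class ShoutingUser {
  constructor(configuration) {
    this.configuration = configuration;
  }
}

// Looks up a class by name; additional classes are checked first (assumed precedence).
function resolveClass(name, standardClasses, additionalClasses = {}) {
  const found = additionalClasses[name] ?? standardClasses[name];
  if (found === undefined) {
    throw new Error(`Unknown class: ${name}`);
  }
  return found;
}

const configuration = {
  simulation: { user: { class: "ShoutingUser", volume: 11 } }
};
const options = { additionalUsers: { ShoutingUser } };

const UserClass = resolveClass(
  configuration.simulation.user.class, standardUsers, options.additionalUsers);
// The user's part of the configuration is passed to the constructor.
const user = new UserClass(configuration.simulation.user);
```

The key in `additionalUsers` is what the configuration refers to, so the key and the `class` value must match exactly.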
Simulates and evaluates an interaction with a generative information retrieval system.
((Object | string))
The configuration for the simulation and evaluation, either as object or a JSON string
Name | Description |
---|---|
configuration.simulation Object | The configuration for the simulation, see simulate |
configuration.evaluation Object | The configuration for the evaluation, see evaluate |
(Object? = undefined)
Name | Description |
---|---|
options.logCallback function? | The function to consume all LogbookEntry of the simulation and evaluation |
options.additionalUsers Object? | Object that contains non-standard {User} classes as values; if configuration.simulation.user.class is the same as a key of this object, the corresponding class will be instantiated and used |
options.additionalSystems Object? | Object that contains non-standard {System} classes as values; if configuration.simulation.system.class is the same as a key of this object, the corresponding class will be instantiated and used |
options.additionalEvaluators Object? | Object that contains non-standard {Evaluator} classes as values; if configuration.evaluation.evaluators.[evaluatorName].class is the same as a key of this object, the corresponding class will be instantiated and used |
((Object | Array | String)? = undefined)
An object that specifies the values by which to replace template variables ({{variable}}) in the configuration. If the configuration is a string, the replacement happens before JSON parsing, which allows replacing variables with JSON structures. Variables in the configuration for which no value is specified in replacements are ignored. If replacements is an array, this function is executed for each of its elements and the resulting list of evaluation objects is returned. If replacements is a string, it is treated as a tab-separated values file that specifies an array of replacements: the first line specifies the variable name of each column, and the values in each other line are the respective replacements.
(Evaluation | Array): The evaluation object or an array of these if replacements is an array or string; an empty object is returned in case of an error
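To illustrate why replacing variables in a configuration string before JSON parsing allows a variable to stand for a whole JSON structure, here is a minimal sketch. `replaceInString` is a hypothetical helper and not part of GenIRSim, and the `description` property of the topic is an assumption for the example.

```javascript
// Hypothetical helper: replaces {{name}} in a string and leaves
// variables without a replacement value untouched.
function replaceInString(text, replacements) {
  return text.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(replacements, name)
      ? String(replacements[name])
      : match);
}

// {{topic}} is not inside quotes, so it can be replaced by a JSON object.
const configurationString =
  '{"simulation": {"topic": {{topic}}, "maxTurns": {{turns}}}, "note": "{{unset}}"}';

const replacements = {
  topic: '{"description": "history of the bicycle"}',
  turns: 2
};

// Replacement happens on the string, parsing happens afterwards.
const filled = JSON.parse(replaceInString(configurationString, replacements));
```

If the configuration were already parsed, `{{topic}}` could only appear inside a string value, so it could not be replaced by an object in this way.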
Simulates an interaction with a generative information retrieval system.
Use run instead unless you want to skip the evaluation.
(Object)
The configuration for the simulation
Name | Description |
---|---|
configuration.topic Topic | The topic for the simulation |
configuration.user Object | The configuration passed to the user in the constructor |
configuration.user.class string | The name of the user class, either one of the standard classes of GenIRSim or one in additionalUsers |
configuration.system Object | The configuration passed to the system in the constructor |
configuration.system.class string | The name of the system class, either one of the standard classes of GenIRSim or one in additionalSystems |
configuration.maxTurns number? | The maximum number of user turns to simulate (default: 3) |
(Object? = undefined)
Name | Description |
---|---|
options.logCallback function? | The function to consume all LogbookEntry of the simulation |
options.additionalUsers Object? | Object that contains non-standard {User} classes as values; if configuration.user.class is the same as a key of this object, the corresponding class will be instantiated and used |
options.additionalSystems Object? | Object that contains non-standard {System} classes as values; if configuration.system.class is the same as a key of this object, the corresponding class will be instantiated and used |
Simulation: The simulation object
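The turn-taking that a simulation performs can be sketched as a loop over at most `maxTurns` user turns. This is an illustration under several assumptions and not GenIRSim's implementation: `CountingUser`, `EchoSystem`, and `simulateSketch` are hypothetical, the methods are assumed to be asynchronous, `search` is assumed to receive the user turn, and the returned object shape (`topic`, `userTurns`) and the `description` property of the topic are assumed.

```javascript
// Hypothetical stand-ins; real classes would extend GenIRSim's User and System.
class CountingUser {
  async start(topic) {
    this.count = 1;
    return { utterance: `Tell me about ${topic.description}` };
  }
  async followUp(systemResponse) {
    this.count += 1;
    return { utterance: `Follow-up ${this.count} on: ${systemResponse.utterance}` };
  }
}

class EchoSystem {
  async search(userTurn) {
    return { utterance: `You said: ${userTurn.utterance}` };
  }
}

// Starts the conversation, then alternates system response and user follow-up.
async function simulateSketch(topic, user, system, maxTurns = 3) {
  const userTurns = [];
  let userTurn = await user.start(topic);
  for (let turn = 0; turn < maxTurns; turn += 1) {
    userTurn.systemResponse = await system.search(userTurn);
    userTurns.push(userTurn);
    if (turn + 1 < maxTurns) {
      userTurn = await user.followUp(userTurn.systemResponse);
    }
  }
  return { topic, userTurns };
}
```

Each completed user turn carries the system's reply, so an evaluator can later address "the response to turn i" by index.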
Evaluates a simulated interaction with a generative information retrieval system.
(Simulation)
The simulation to evaluate
(Object)
The configuration for the evaluation
Name | Description |
---|---|
configuration.evaluators Object | An object where each value is another configuration object that (1) is passed to the respective evaluator in the constructor and (2) has a property class that is the name of the evaluator class, either one of the standard classes of GenIRSim or one in additionalEvaluators |
(Object? = undefined)
Name | Description |
---|---|
options.logCallback function? | The function to consume all LogbookEntry of the evaluation |
options.additionalEvaluators Object? | Object that contains non-standard {Evaluator} classes as values; if configuration.evaluators.[evaluatorName].class is the same as a key of this object, the corresponding class will be instantiated and used |
Evaluation: The evaluation object
An evaluator to measure some quality score for single turns of a conversation and/or an entire conversation.
Evaluators can be stateful and must not be re-used between conversations.
The method Evaluator#evaluate must always be called first to evaluate each turn, in order, starting with turnIndex = 0, and then to evaluate the entire conversation (leaving the turnIndex undefined).
The constructor of an evaluator must have two parameters: configuration that has to be passed via super(configuration) and is then available via this.configuration.
(Object)
The configuration for the evaluator
Evaluates one specific turn or the entire conversation.
Evaluators can be stateful. This method must always be called first to evaluate each turn, in order, starting with 0, and then to evaluate the entire conversation (leaving the turnIndex undefined). Evaluators must not be re-used to evaluate multiple conversations.
(Simulation)
The simulation to evaluate
(number)
Index of the user's turn (or rather the response to that turn) to be evaluated, starting with 0, or undefined to evaluate the entire conversation
(EvaluationResult | null): The result of the evaluation, with at least the score property, or null if the Evaluator does not evaluate single turns or the complete conversation and that is what was asked
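The calling protocol described above (each turn in order, then the whole conversation with `turnIndex` left undefined) can be sketched as follows. `EvaluatorBase` is a hypothetical stand-in for GenIRSim's Evaluator class, `LengthEvaluator`, `evaluateAll`, and `targetWords` are invented for the example, the second constructor parameter `logbook` is an assumption, and the `userTurns` property of the simulation is assumed.

```javascript
// Hypothetical stand-in for GenIRSim's Evaluator base class.
class EvaluatorBase {
  constructor(configuration) {
    this.configuration = configuration;
  }
}

// Scores each turn by response length in words, capped at 1;
// it does not offer a score for the whole conversation.
class LengthEvaluator extends EvaluatorBase {
  constructor(configuration, logbook) {
    super(configuration);
    this.logbook = logbook; // assumed second parameter
  }

  evaluate(simulation, turnIndex) {
    if (turnIndex === undefined) {
      return null; // asked for the whole conversation, which is not evaluated here
    }
    const response = simulation.userTurns[turnIndex].systemResponse;
    const words = response.utterance.trim().split(/\s+/).length;
    return { score: Math.min(1, words / this.configuration.targetWords) };
  }
}

// Calls the evaluator for every turn in order, then once for the conversation.
function evaluateAll(evaluator, simulation) {
  const perTurn = simulation.userTurns.map(
    (_, turnIndex) => evaluator.evaluate(simulation, turnIndex));
  const overall = evaluator.evaluate(simulation, undefined);
  return { perTurn, overall };
}
```

Because evaluators may keep state, a fresh `LengthEvaluator` would be created for each conversation rather than shared.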
An evaluator that prompts a language model for a score.
(Object)
The configuration for the evaluator
Name | Description |
---|---|
configuration.llm LLMConfiguration | The configuration for the language model to be prompted |
configuration.prompt string | Template for the prompt to evaluate the system response. Variables: |
configuration.requiredKeys Array? | The properties that the language model's response must have (in addition to EVALUATION_RESULT.SCORE) |
(Logbook)
A function that takes log messages
An evaluator that measures the readability of the system response.
A generative information retrieval system.
Systems can be stateful. However, users are not differentiated: the system can assume it is used by exactly one user. A separate system object must be instantiated for each simulated user.
The constructor of a system must have two parameters: configuration that has to be passed via super(configuration) and is then available via this.configuration.
(Object)
The configuration for the system
Generates a response for the user's utterance.
Systems can be stateful. However, users are not differentiated: the system can assume it is used by exactly one user.
SystemResponse: The system's response with at least the utterance set
A blackbox retrieval system that implements a basic chat API.
The API needs to consume a JSON object that has at least the property messages, which is an array of message objects. Each message object has the string property role, which is either assistant or user, and the string property content that contains the message text.
The API produces a JSON object that has at least the property content, which is the message text of the response.
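The request and response shapes of such a chat API can be sketched without any network access. `buildChatRequest` and `parseChatResponse` are hypothetical helper names and not part of GenIRSim; only the `messages`, `role`, and `content` properties come from the description above, and the `userTurns` structure is an assumption.

```javascript
// Builds the request body: the configured request object plus the message history.
function buildChatRequest(request, userTurns) {
  const messages = [];
  for (const turn of userTurns) {
    messages.push({ role: "user", content: turn.utterance });
    if (turn.systemResponse !== undefined) {
      messages.push({ role: "assistant", content: turn.systemResponse.utterance });
    }
  }
  return { ...request, messages };
}

// Turns a response body with at least `content` into a response object that
// has the utterance set and keeps the complete body as `response`.
function parseChatResponse(body) {
  if (typeof body.content !== "string") {
    throw new Error("Response is missing the string property 'content'");
  }
  return { utterance: body.content, response: body };
}
```

A real client would send the result of `buildChatRequest` as JSON to the configured URL and pass the parsed reply to `parseChatResponse`.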
(Object)
The configuration for the system
Name | Description |
---|---|
configuration.url string | The URL of the chat endpoint |
configuration.request string | The object that is sent to the endpoint on each query with the messages added |
(Logbook)
A function that takes log messages
Retrieves results for the user's query.
SystemResponse: The system's response with the utterance set and the complete response of the system as response
A basic generative information retrieval system implemented using an LLM and an Elasticsearch server.
Properties of the SystemResponse objects that GenerativeElasticSystem#search produces are determined by the configuration.generation.message extended with SYSTEM_RESPONSE.RESULTS and (the same as one string) SYSTEM_RESPONSE.RESULTS_PAGE.
(Object)
The configuration for the system
Name | Description |
---|---|
configuration.llm LLMConfiguration | The configuration for the language model employed during retrieval |
configuration.preprocessing Object? | No preprocessing if this property is undefined |
configuration.preprocessing.message string? | Template for the prompt to preprocess the user's utterance (no preprocessing will happen if configuration.preprocessing is undefined). The LLM's response must be formatted as JSON. Variables: |
configuration.preprocessing.requiredKeys Array? | The properties that the preprocessing response must have (none by default) |
configuration.search Object | |
configuration.search.url string | The complete URL of the Elasticsearch server's API endpoint (up to but excluding _search) |
configuration.search.query string | The Elasticsearch query object for retrieving results, but every string in it is treated as a template. Variables are the same as for configuration.preprocessing.message, plus: |
configuration.search._source Object? | An object that specifies which source attributes to include in the response, see the Elasticsearch documentation |
configuration.search.size number | The number of results to retrieve |
configuration.generation Object | |
configuration.generation.message string | Template for the prompt to generate a system response for the user's utterance from the retrieved search results. The LLM's response must be formatted as JSON. Variables are the same as for configuration.search.query, plus: |
configuration.generation.searchResultKeys Array? | The properties of each result that are used to render the result in the generation message |
configuration.generation.requiredKeys Array? | The properties that the generated response must have (in addition to SYSTEM_RESPONSE.UTTERANCE) |
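One possible reading of `configuration.generation.searchResultKeys` is that only the listed properties of each retrieved document are kept before the results are rendered into the generation prompt. The sketch below illustrates that reading. `pickResultProperties` is a hypothetical helper, and how GenIRSim actually renders results is not specified here. The `hits.hits[]._source` layout is that of a standard Elasticsearch search response.

```javascript
// Keeps only the selected keys of each hit's _source
// (all keys if no selection is given).
function pickResultProperties(searchResponse, searchResultKeys) {
  return searchResponse.hits.hits.map((hit) => {
    const keys = searchResultKeys ?? Object.keys(hit._source);
    return Object.fromEntries(
      keys
        .filter((key) => key in hit._source)
        .map((key) => [key, hit._source[key]]));
  });
}

// Example response in the layout returned by the Elasticsearch search API.
const exampleResponse = {
  hits: {
    hits: [
      { _id: "1", _score: 2.1, _source: { title: "Tea", text: "Tea is a drink.", lang: "en" } },
      { _id: "2", _score: 1.4, _source: { title: "Coffee", text: "Coffee is a drink." } }
    ]
  }
};

const rendered = JSON.stringify(
  pickResultProperties(exampleResponse, ["title", "text"]));
```

Restricting the keys keeps the prompt short, which matters when `configuration.search.size` is large.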
(Logbook)
A function that takes log messages
Retrieves results for the user's query.
SystemResponse: The system's response with at least the utterance set
Abstract class for simulators of a user of a generative information retrieval system.
Users can be stateful. Calling User#start is equivalent to starting a new conversation. Simple users might reset at the start of that method, whereas others might have a cross-conversation state. In any case, that method must be called at least once before calling User#followUp.
The constructor of a user must have two parameters: configuration that has to be passed via super(configuration) and is then available via this.configuration.
(Object)
The configuration for the user
Starts a new simulation for the specified topic.
Users can be stateful. Calling this method is equivalent to starting a new conversation. Simple users might reset at the start of this method, whereas others might have a cross-conversation state. In any case, this method must be called at least once before calling User#followUp.
(any)
(Topic): The topic
UserTurn: The turn with at least the utterance set
Follows up on a system response to a previous utterance.
Users can be stateful. The method User#start must be called at least once before calling this method.
(any)
(SystemResponse): The latest response of the system
UserTurn: The turn with at least the utterance set
A basic user model that does not change during conversation and only looks at the latest response for following up on it.
(Object)
The configuration for the user
Name | Description |
---|---|
configuration.llm LLMConfiguration | The configuration for the language model employed during simulation |
configuration.start string | Template for the prompt to simulate the first message for a topic. Variables: |
configuration.followUp string | Template for the prompt to simulate a follow-up message to a system response. Variables: |
(Logbook)
A function that takes log messages
User model for the Touche 25 Retrieval-Augmented Debating task. A client for the corresponding server.
Static methods for filling in text templates.
Replaces occurrences of {{path.to.variable}} in the text with the corresponding values in the context object (e.g., replace with context["path"]["to"]["variable"]).
If the input is not a string but an object or array, it is recursively cloned and occurrences in the contents are replaced. Numbers, booleans, etc. are shallow copied.
(any)
The template string or an object or array structure that
contains template strings (among others)
(Object)
The values of the variables that can be referenced
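The path-based replacement and the recursive cloning described above can be sketched as follows. `lookup` and `renderSketch` are hypothetical names for this illustration and not GenIRSim's implementation. Leaving unresolved variables untouched and converting values with `String` are assumptions.

```javascript
// Resolves "path.to.variable" in the context; undefined if any step is missing.
function lookup(context, path) {
  return path.split(".").reduce(
    (value, key) => (value === null || value === undefined ? undefined : value[key]),
    context);
}

// Fills in strings directly and clones arrays and objects recursively.
function renderSketch(template, context) {
  if (typeof template === "string") {
    return template.replace(/\{\{([^{}]+)\}\}/g, (match, path) => {
      const value = lookup(context, path.trim());
      return value === undefined ? match : String(value);
    });
  }
  if (Array.isArray(template)) {
    return template.map((item) => renderSketch(item, context));
  }
  if (template !== null && typeof template === "object") {
    return Object.fromEntries(
      Object.entries(template).map(
        ([key, value]) => [key, renderSketch(value, context)]));
  }
  return template; // numbers, booleans, null, and undefined are copied as is
}
```

For example, `renderSketch({ q: "about {{topic.description}}" }, { topic: { description: "tea" } })` yields a new object whose `q` is `"about tea"`, and the input object is not modified.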
Converts each row of a tab-separated values text (except the header) to a context object.
(string)
Contents of a tab-separated values file (no quotations); the first line is treated as a header that specifies the keys, and each other line is converted to a context object that holds the respective values
Array: Array of the created context objects
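A conversion from tab-separated values to context objects along these lines could look as follows. `tsvToContexts` is a hypothetical name, and skipping empty lines and filling missing cells with empty strings are assumptions of this sketch.

```javascript
// Converts TSV text (header line plus rows, no quoting) to an array of objects.
function tsvToContexts(tsv) {
  const lines = tsv.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) {
    return [];
  }
  const [header, ...rows] = lines;
  const keys = header.split("\t");
  return rows.map((row) => {
    const values = row.split("\t");
    // Cells missing at the end of a row become empty strings (assumption).
    return Object.fromEntries(
      keys.map((key, index) => [key, values[index] ?? ""]));
  });
}

const contexts = tsvToContexts("topic\tturns\ntea\t2\ncoffee\t3\n");
```

Here `contexts` holds one object per data line, each with the keys `topic` and `turns`. All values are strings because the format carries no type information.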
Object that represents the evaluation of a simulation.
Type: Object
(Object)
: The configuration of the evaluation
(Simulation)
: The simulation that was evaluated
(Array): For each user turn of the simulation, in order, an object where the keys are the names of the configured evaluators (if they evaluated the specific turn of the simulation) and the values are the respective EvaluationResults (and one property, milliseconds, gives the time taken for evaluation in milliseconds)
(Object): An object where the keys are the names of the configured evaluators (if they evaluated the overall simulation) and the values are the respective EvaluationResults (and one property, milliseconds, gives the time taken for evaluation in milliseconds)
(number): Time taken for the evaluation in milliseconds
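As an example of consuming the per-turn evaluations, the following sketch averages the score of each evaluator over all turns it evaluated. `meanTurnScores` is a hypothetical helper and the evaluator names in the usage are invented. The sketch takes the per-turn array directly, so it makes no assumption about the name of the property that holds it.

```javascript
// Averages each evaluator's score over the turns it evaluated,
// skipping the milliseconds entry of each per-turn object.
function meanTurnScores(turnEvaluations) {
  const sums = {};
  const counts = {};
  for (const turnEvaluation of turnEvaluations) {
    for (const [name, result] of Object.entries(turnEvaluation)) {
      if (name === "milliseconds" || result === null) {
        continue;
      }
      sums[name] = (sums[name] ?? 0) + result.score;
      counts[name] = (counts[name] ?? 0) + 1;
    }
  }
  return Object.fromEntries(
    Object.keys(sums).map((name) => [name, sums[name] / counts[name]]));
}

const means = meanTurnScores([
  { readability: { score: 0.5 }, milliseconds: 12 },
  { readability: { score: 1 }, prompted: { score: 0.25 }, milliseconds: 7 }
]);
```

In this usage, `readability` is averaged over two turns and `prompted` over the single turn it evaluated.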
Object returned by Evaluator#evaluate with at least a score.
Type: Object
Constants for EvaluationResult property names.
A large language model.
(LLMConfiguration)
The configuration object
(Logbook)
The logbook to log to
Generates a chat completion.
(Array)
The message history for the completion, use LLM#createSystemMessage, LLM#createUserMessage, and LLM#createAssistantMessage to create these
(string)
Name of the action for which the text is generated, used for logging
string: The completion
Generates a chat completion in JSON format.
(Array)
The message history for the completion, use LLM#createSystemMessage, LLM#createUserMessage, and LLM#createAssistantMessage to create these
(string)
Name of the action for which the text is generated, used for logging
(Array? = [])
Names of properties that the parsed JSON completion must have
(number? = 3)
Maximum number of times to retry the completion (if it cannot be parsed or is missing a required key) before throwing an error
Object: The completion as parsed object
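The retry behaviour for JSON completions can be sketched with a stand-in for the language model call. `jsonWithRetries` and `generate` are hypothetical names for this illustration. Counting one initial attempt plus up to `maxRetries` retries is an assumption about how the limit is interpreted.

```javascript
// Calls generate() until its output parses as JSON and has all required keys.
async function jsonWithRetries(generate, requiredKeys = [], maxRetries = 3) {
  let lastError = new Error("No completion attempted");
  // One initial attempt plus up to maxRetries retries (assumed counting).
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    try {
      const parsed = JSON.parse(await generate());
      const missing = requiredKeys.filter((key) => !(key in parsed));
      if (missing.length === 0) {
        return parsed;
      }
      lastError = new Error(`Missing required keys: ${missing.join(", ")}`);
    } catch (error) {
      lastError = error; // not valid JSON, or not an object that can have keys
    }
  }
  throw lastError;
}
```

Passing the model call as a function keeps the sketch independent of any particular endpoint, and a test can supply a function that returns canned strings.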
Configuration for an LLM.
Properties are url (see below) and all parameters for the chat completion endpoint, which includes the required model, but also optional parameters like options.temperature (see the modelfile parameter of Ollama).
Type: Object
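A configuration of this kind might look like the object below. The URL assumes a local Ollama server with its chat endpoint, the model name is a placeholder, and `toRequest` is a hypothetical helper that shows how `url` could be separated from the endpoint parameters. None of this is confirmed as GenIRSim's internal handling.

```javascript
// Example configuration: url plus parameters for the chat completion endpoint.
const llmConfiguration = {
  url: "http://localhost:11434/api/chat", // assumed local Ollama chat endpoint
  model: "llama3",                         // placeholder model name
  options: { temperature: 0.2 }            // optional endpoint parameter
};

// Splits the configuration into the target URL and the request body.
function toRequest(configuration, messages) {
  const { url, ...parameters } = configuration;
  if (typeof url !== "string" || typeof parameters.model !== "string") {
    throw new Error("LLM configuration needs the strings 'url' and 'model'");
  }
  return { url, body: { ...parameters, messages } };
}

const request = toRequest(
  llmConfiguration, [{ role: "user", content: "Hello" }]);
```

Everything except `url` ends up in the request body, which matches the description that all other properties are endpoint parameters.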
A logbook to log actions specific to one source.
(string)
The source for which to log entries
(string?)
An optional prefix to the action logged
Logs one entry to the logbook.
(string)
The action for which to log
LogbookEntry
:
The logged entry
One entry for the logbook, issued by the source to log for the action.
(string)
The source that produced this entry
(string)
The action for which this entry was produced
Checks whether this entry is a continuation of the previous entry (both belong to the same action).
(LogbookEntry)
The previous entry for the logbook
boolean: Whether it is a continuation of the previous entry
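A log callback can use the continuation check to group entries of the same action. The sketch below uses a hypothetical `EntrySketch` class in place of LogbookEntry. Its `message` property, the `isContinuationOf` method name, and the comparison of both source and action are assumptions for the example.

```javascript
// Hypothetical stand-in for LogbookEntry.
class EntrySketch {
  constructor(source, action, message) {
    this.source = source;
    this.action = action;
    this.message = message; // assumed property for the logged content
  }

  // True if the previous entry has the same source and action (assumed rule).
  isContinuationOf(previousEntry) {
    return previousEntry !== undefined
      && previousEntry.source === this.source
      && previousEntry.action === this.action;
  }
}

// Creates a callback that writes a header only when a new action starts.
function makeLogCallback(lines) {
  let previous;
  return (entry) => {
    if (!entry.isContinuationOf(previous)) {
      lines.push(`[${entry.source}] ${entry.action}`);
    }
    lines.push(`  ${entry.message}`);
    previous = entry;
  };
}
```

A callback built this way could be passed as `options.logCallback`, assuming real entries expose comparable properties.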
Object that represents a completed simulation.
Type: Object
Object that represents a system's response to a user's utterance in the simulated conversation with at least the system's utterance.
Type: Object
(string)
: The utterance of the system
Constants for SystemResponse property names.
Object that represents a topic (or task, information need).
Type: Object
(string): A natural language description of the information task to be accomplished
Object that represents a user's turn in the simulated conversation with at least the user's utterance.
Type: Object
(string): The simulated utterance sent from the user to the system
(SystemResponse): The response sent from the system to the user as a reply
(number?): Time taken for simulation in milliseconds (this property is automatically added by GenIRSim)
Constants for UserTurn property names.
The simulated utterance sent from the user to the system.
The SystemResponse sent from the system to the user as a reply.