GenIRSim

Flexible and easy-to-use simulation and evaluation framework for generative IR


A run of GenIRSim consists of two stages, simulation and evaluation, which the main run function executes sequentially. During simulation (see the simulate function), a user converses with a generative IR system. During evaluation (see the evaluate function), the generated conversation is judged by one or more evaluators. The run is defined by a configuration in JSON format (see the configuration parameter of the above-mentioned functions and the configurations directory for examples). If you create your own user, system, or evaluator classes, make sure to register them using the options.additionalUsers, options.additionalSystems, or options.additionalEvaluators parameters of the above-mentioned functions.
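As a rough illustration of the shape of such a configuration, the sketch below assembles one as a plain JavaScript object. The class names and property names are the ones documented on this page. The URLs, the model name, the prompt texts, the evaluator name "readability", and the measure key are placeholders assumed for illustration and are not taken from the configurations directory.

```javascript
// Sketch of a run configuration. All URLs, the model name, the prompt texts,
// the evaluator name "readability", and the measure key are hypothetical.
const llm = {
  url: "http://localhost:11434/api/chat", // assumed chat endpoint
  model: "example-model"                  // assumed model name
};

const configuration = {
  simulation: {
    topic: { description: "Find out how solar panels work." },
    user: {
      class: "StaticUser",
      llm,
      start: "Write a first question on: {{variables.topic.description}}",
      followUp: "Reply to: {{variables.systemResponse.utterance}}"
    },
    system: {
      class: "BasicChatSystem",
      url: "http://localhost:8080/chat", // assumed endpoint
      request: {} // sent on each query with the messages added; empty here
    },
    maxTurns: 2
  },
  evaluation: {
    evaluators: {
      readability: { class: "ReadabilityEvaluator", measure: "exampleMeasure" }
    }
  }
};

// run accepts the configuration either as an object or as a JSON string.
const configurationJson = JSON.stringify(configuration, null, 2);
console.log(configurationJson);
```

Either `configuration` or `configurationJson` could then be passed as the first argument of run.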

Simulates and evaluates an interaction with a generative information retrieval system.

run(configuration: (Object | string), options: Object?, replacements: (Object | Array | String)?): (Evaluation | Array)
Parameters
configuration ((Object | string)) The configuration for the simulation and evaluation, either as object or a JSON string
Name Description
configuration.simulation Object The configuration for the simulation, see simulate
configuration.evaluation Object The configuration for the evaluation, see evaluate
options (Object? = undefined)
Name Description
options.logCallback function? The function to consume all LogbookEntry of the simulation and evaluation
options.additionalUsers Object? Object that contains non-standard {User} classes as values; if configuration.simulation.user.class is the same as a key of this object, the corresponding class will be instantiated and used
options.additionalSystems Object? Object that contains non-standard {System} classes as values; if configuration.simulation.system.class is the same as a key of this object, the corresponding class will be instantiated and used
options.additionalEvaluators Object? Object that contains non-standard {Evaluator} classes as values; if configuration.evaluation.evaluators.[evaluatorName].class is the same as a key of this object, the corresponding class will be instantiated and used
replacements ((Object | Array | String)? = undefined) An object that specifies the values by which to replace template variables in the configuration ({{variable}}). If the configuration is a string, the replacement happens before JSON parsing, which allows replacing variables with JSON structures. Variables in the configuration for which no value is specified in replacements are ignored. If replacements is an array, this function is executed for each of its elements and the resulting list of evaluation objects is returned. If replacements is a string, it is treated as a tab-separated values file that specifies an array of replacements: the first line gives the variable name for each column, and each further line gives one set of replacement values.
Returns
(Evaluation | Array): The evaluation object or an array of these if replacements is an array or string; an empty object is returned in case of an error
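The effect of replacing variables before JSON parsing can be sketched as follows. The helper fillTemplate is hypothetical and only mimics the behaviour described for the replacements parameter; it is not GenIRSim's implementation. The sketch also assumes that "ignored" means a variable without a value is left in place.

```javascript
// Hypothetical helper that mimics the documented replacement behaviour.
// Assumption: variables without a value stay in the text unchanged.
function fillTemplate(text, replacements) {
  return text.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(replacements, name)
      ? String(replacements[name])
      : match
  );
}

// A configuration given as a string, with variables inside and outside of
// JSON string literals.
const configurationString =
  '{"topic": {"description": "{{description}}"}, ' +
  '"maxTurns": {{maxTurns}}, "user": {{user}}, "note": "{{unknown}}"}';

const filled = fillTemplate(configurationString, {
  description: "How do heat pumps work?",
  maxTurns: 2,
  user: JSON.stringify({ class: "StaticUser" }) // a whole JSON structure
});

// Parsing happens only after the replacement.
const parsed = JSON.parse(filled);
console.log(parsed.maxTurns);   // 2
console.log(parsed.user.class); // StaticUser
console.log(parsed.note);       // {{unknown}}

// An array of replacements yields one filled configuration per element.
const batch = [{ maxTurns: 1 }, { maxTurns: 3 }].map(replacement =>
  fillTemplate('{"maxTurns": {{maxTurns}}}', replacement)
);
console.log(batch);
```

Because the replacement works on the raw string, a variable such as {{user}} can stand for a number, a string fragment, or a complete JSON object.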

Simulates an interaction with a generative information retrieval system.

Use run instead unless you only want to simulate without evaluating.

simulate(configuration: Object, options: Object?): Simulation
Parameters
configuration (Object) The configuration for the simulation
Name Description
configuration.topic Topic The topic for the simulation
configuration.user Object The configuration passed to the user in the constructor
configuration.user.class string The name of the user class, either one of the standard classes of GenIRSim or one in additionalUsers
configuration.system Object The configuration passed to the system in the constructor
configuration.system.class string The name of the system class, either one of the standard classes of GenIRSim or one in additionalSystems
configuration.maxTurns number? The maximum number of user turns to simulate (default: 3)
options (Object? = undefined)
Name Description
options.logCallback function? The function to consume all LogbookEntry of the simulation
options.additionalUsers Object? Object that contains non-standard {User} classes as values; if configuration.user.class is the same as a key of this object, the corresponding class will be instantiated and used
options.additionalSystems Object? Object that contains non-standard {System} classes as values; if configuration.system.class is the same as a key of this object, the corresponding class will be instantiated and used
Returns
Simulation: The simulation object

Evaluates a simulated interaction with a generative information retrieval system.

evaluate(simulation: Simulation, configuration: Object, options: Object?): Evaluation
Parameters
simulation (Simulation) The simulation to evaluate
configuration (Object) The configuration for the evaluation
Name Description
configuration.evaluators Object An object where each value is another configuration object that (1) is passed to the respective evaluator in the constructor and (2) has a property class that is the name of the evaluator class, either one of the standard classes of GenIRSim or one in additionalEvaluators
options (Object? = undefined)
Name Description
options.logCallback function? The function to consume all LogbookEntry of the evaluation
options.additionalEvaluators Object? Object that contains non-standard {Evaluator} classes as values; if configuration.evaluators.[evaluatorName].class is the same as a key of this object, the corresponding class will be instantiated and used
Returns
Evaluation: The evaluation object

Evaluators

An evaluator to measure some quality score for single turns of a conversation and/or an entire conversation.

Evaluators can be stateful and must not be re-used between conversations. The method Evaluator#evaluate must always be called first to evaluate each turn, in order, starting with turnIndex = 0, and then to evaluate the entire conversation (leaving the turnIndex undefined).

The constructor of an evaluator must have two parameters:

  • The configuration that has to be passed via super(configuration) and is then available via this.configuration.
  • A Logbook that can be used to log the initialization process.
new Evaluator(configuration: Object)
Parameters
configuration (Object) The configuration for the evaluator
Instance Members
evaluate(simulation, turnIndex, logbook)
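The calling order and the constructor contract described above can be sketched with a small custom evaluator. This is a sketch under assumptions: the Evaluator class below is a stand-in so that the example runs on its own, LengthEvaluator and its targetLength property are hypothetical, and evaluate is written synchronously for brevity (the page does not state whether GenIRSim expects a promise).

```javascript
// Stand-in for GenIRSim's Evaluator base class so that this sketch runs on
// its own; in a real setup the class would come from the package.
class Evaluator {
  constructor(configuration) {
    this.configuration = configuration;
  }
}

// Hypothetical evaluator: scores a response by how close its length is to
// configuration.targetLength (an assumed property, must be greater than 0).
class LengthEvaluator extends Evaluator {
  constructor(configuration, logbook) {
    super(configuration);
    this.scores = []; // state for ONE conversation, hence no re-use
  }

  evaluate(simulation, turnIndex, logbook) {
    if (turnIndex === undefined) {
      // Overall evaluation: mean of the per-turn scores collected so far.
      const sum = this.scores.reduce((a, b) => a + b, 0);
      return { score: this.scores.length > 0 ? sum / this.scores.length : 0 };
    }
    const length = simulation.turns[turnIndex].systemResponse.utterance.length;
    const target = this.configuration.targetLength;
    const score = Math.min(length, target) / Math.max(length, target);
    this.scores.push(score);
    return { score };
  }
}

// Calls evaluate for each turn in order, then once without a turnIndex.
function evaluateAll(evaluator, simulation) {
  const perTurn = [];
  for (let turnIndex = 0; turnIndex < simulation.turns.length; turnIndex += 1) {
    perTurn.push(evaluator.evaluate(simulation, turnIndex));
  }
  const overall = evaluator.evaluate(simulation, undefined);
  return { perTurn, overall };
}

const exampleSimulation = {
  turns: [
    { utterance: "Hi", systemResponse: { utterance: "1234567890" } },
    { utterance: "More", systemResponse: { utterance: "12345" } }
  ]
};
const lengthResult = evaluateAll(
  new LengthEvaluator({ targetLength: 10 }), exampleSimulation);
console.log(lengthResult);
```

To use such a class in a run, it would be registered via options.additionalEvaluators under the key that the evaluator configuration names in its class property.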

An evaluator that prompts a language model for a score.

new PromptedEvaluator(configuration: Object, log: Logbook)
Parameters
configuration (Object) The configuration for the evaluator
Name Description
configuration.llm LLMConfiguration The configuration for the language model to be prompted
configuration.prompt string Template for the prompt to evaluate the system response. Variables:
  • {{x}}: A property x of the configuration for the evaluator
  • {{variables.simulation}}: The entire Simulation
  • {{variables.userTurn}}: The specific user turn, especially with variables.userTurn.utterance and variables.userTurn.systemResponse.utterance
configuration.requiredKeys Array? The properties that the language model's response must have (in addition to EVALUATION_RESULT.SCORE)
log (Logbook) A function that takes log messages

An evaluator that measures the readability of the system response.

new ReadabilityEvaluator(configuration: Object, log: Logbook)
Parameters
configuration (Object) The configuration for the evaluator
Name Description
configuration.measure string The key of the measure that should be used to calculate the score
log (Logbook) A function that takes log messages

Systems

A generative information retrieval system.

Systems can be stateful. However, users are not differentiated: the system can assume it is used by exactly one user. A separate system object must be instantiated for each simulated user.

The constructor of a system must have two parameters:

  • The configuration that has to be passed via super(configuration) and is then available via this.configuration.
  • A Logbook that can be used to log the initialization process.
new System(configuration: Object)
Parameters
configuration (Object) The configuration for the system
Instance Members
search(userTurn, logbook)
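A custom system following this constructor contract might look like the sketch below. The System class here is a stand-in so that the example is self-contained, and LookupSystem with its answers and fallback properties is hypothetical. The sketch assumes that search returns a SystemResponse-like object with an utterance property and is written synchronously for brevity.

```javascript
// Stand-in for GenIRSim's System base class so that this sketch runs on its
// own; in a real setup the class would come from the package.
class System {
  constructor(configuration) {
    this.configuration = configuration;
  }
}

// Hypothetical system that answers from a fixed lookup table given in
// configuration.answers, with configuration.fallback for unknown input.
class LookupSystem extends System {
  constructor(configuration, logbook) {
    super(configuration);
    this.turnCount = 0; // state for exactly one simulated user
  }

  search(userTurn, logbook) {
    this.turnCount += 1;
    const key = userTurn.utterance.toLowerCase();
    const answer = this.configuration.answers[key] ?? this.configuration.fallback;
    return { utterance: answer };
  }
}

// Registration: the key must equal configuration.simulation.system.class.
const systemOptions = { additionalSystems: { LookupSystem } };

const lookupSystem = new LookupSystem({
  answers: { hello: "Hi there" },
  fallback: "I do not know."
});
console.log(lookupSystem.search({ utterance: "Hello" }).utterance);
console.log(lookupSystem.search({ utterance: "What?" }).utterance);
```

Because each system object serves exactly one user, the counter in turnCount never mixes turns of different conversations.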

A blackbox retrieval system that implements a basic chat API.

The API needs to consume a JSON object that has at least the property messages, which is an array of message objects. Each message object has the string property role, which is either assistant or user, and the string property content that contains the message text.

The API produces a JSON object that has at least the property content, which is the message text of the response.
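The request and response formats described above can be sketched with two helpers. Both helpers are hypothetical and only illustrate the message format; they are not part of BasicChatSystem, and how that class actually assembles the messages is an assumption here.

```javascript
// Hypothetical helper: builds the request body from a base request object,
// the previous turns, and the new user utterance.
function buildChatRequest(request, previousTurns, utterance) {
  const messages = [];
  for (const turn of previousTurns) {
    messages.push({ role: "user", content: turn.utterance });
    messages.push({ role: "assistant", content: turn.systemResponse.utterance });
  }
  messages.push({ role: "user", content: utterance });
  return { ...request, messages };
}

// Hypothetical helper: extracts the message text from the response body.
function readChatResponse(responseBody) {
  const parsed = JSON.parse(responseBody);
  if (typeof parsed.content !== "string") {
    throw new Error("Response lacks the string property 'content'");
  }
  return parsed.content;
}

const chatRequest = buildChatRequest(
  { model: "example-model" }, // assumed additional request property
  [{ utterance: "Hi", systemResponse: { utterance: "Hello" } }],
  "What is generative IR?"
);
console.log(JSON.stringify(chatRequest, null, 2));
console.log(readChatResponse('{"content": "It combines retrieval and generation."}'));
```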

new BasicChatSystem(configuration: Object, log: Logbook)
Parameters
configuration (Object) The configuration for the system
Name Description
configuration.url string The URL of the chat endpoint
configuration.request string The object that is sent to the endpoint on each query with the messages added
log (Logbook) A function that takes log messages
Instance Members
search(userTurn, logbook)

A basic generative information retrieval system implemented using an LLM and an Elasticsearch server.

Properties of the SystemResponse objects that GenerativeElasticSystem#search produces are determined by the configuration.generation.message extended with SYSTEM_RESPONSE.RESULTS and (the same as one string) SYSTEM_RESPONSE.RESULTS_PAGE.

new GenerativeElasticSystem(configuration: Object, log: Logbook)
Parameters
configuration (Object) The configuration for the system
Name Description
configuration.llm LLMConfiguration The configuration for the language model employed during retrieval
configuration.preprocessing Object? No preprocessing if this property is undefined
configuration.preprocessing.message string? Template for the prompt to preprocess the user's utterance (no preprocessing will happen if configuration.preprocessing is undefined). The LLM's response must be formatted as JSON. Variables:
  • {{x}}: A property x of the configuration for the system
  • {{variables.messages}}: The previous exchange between user and system (assistant) rendered as string (templates#joinMessages)
  • {{variables.userTurn}}: The last UserTurn, especially with variables.userTurn.utterance
configuration.preprocessing.requiredKeys Array? The properties that the preprocessing response must have (none by default)
configuration.search Object
configuration.search.url string The complete URL of the Elasticsearch server's API endpoint (up to but excluding _search)
configuration.search.query string The Elasticsearch query object for retrieving results, but every string in it is treated as a template. Variables are the same as for configuration.preprocessing.message, plus:
  • {{variables.preprocessing}}: The parsed output of the preprocessing (if preprocessing was performed)
configuration.search._source Object? An object that specifies which source attributes to include in the response, see the Elasticsearch documentation
configuration.search.size number The number of results to retrieve
configuration.generation Object
configuration.generation.message string Template for the prompt to generate a system response for the user's utterance from the retrieved search results. The LLM's response must be formatted as JSON. Variables are the same as for configuration.search.query, plus:
  • {{variables.results}}: The retrieved results rendered as a string
configuration.generation.searchResultKeys Array? The properties of each result that are used to render the result in the generation message
configuration.generation.requiredKeys Array? The properties that the generated response must have (in addition to SYSTEM_RESPONSE.UTTERANCE)
log (Logbook) A function that takes log messages
Instance Members
search(userTurn, logbook)
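A configuration for this system might be laid out as in the sketch below. The property names follow the parameter list above. The URLs, the model name, the index name, the field names title and text, the prompt texts, and the preprocessing key query are assumptions made for illustration, as is the guess that SYSTEM_RESPONSE.UTTERANCE refers to a key named utterance.

```javascript
// Sketch of a GenerativeElasticSystem configuration. URLs, model name, index
// name, field names, prompt texts, and the key "query" are hypothetical.
const elasticSystemConfiguration = {
  class: "GenerativeElasticSystem",
  llm: {
    url: "http://localhost:11434/api/chat", // assumed chat endpoint
    model: "example-model"                  // assumed model name
  },
  preprocessing: {
    message:
      'Rewrite the message as a keyword query. Answer as JSON with the key ' +
      '"query". Message: {{variables.userTurn.utterance}}',
    requiredKeys: ["query"]
  },
  search: {
    url: "http://localhost:9200/example-index/", // up to but excluding _search
    // Every string inside the query is treated as a template.
    query: { match: { text: "{{variables.preprocessing.query}}" } },
    _source: { includes: ["title", "text"] },
    size: 3
  },
  generation: {
    message:
      'Answer the question using the results. Answer as JSON with the key ' +
      '"utterance".\nResults:\n{{variables.results}}\n' +
      'Question: {{variables.userTurn.utterance}}',
    searchResultKeys: ["title", "text"],
    requiredKeys: []
  }
};

console.log(JSON.stringify(elasticSystemConfiguration, null, 2));
```

The three stages build on each other: the search query can use the parsed preprocessing output, and the generation prompt can additionally use the rendered results.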

Users

Abstract class for simulators of a user of a generative information retrieval system.

Users can be stateful. Calling User#start is equivalent to starting a new conversation. Simple users might reset at the start of that method, whereas others might have a cross-conversation state. In any case, that method must be called at least once before calling User#followUp.

The constructor of a user must have two parameters:

  • The configuration that has to be passed via super(configuration) and is then available via this.configuration.
  • A Logbook that can be used to log the initialization process.
new User(configuration: Object)
Parameters
configuration (Object) The configuration for the user
Instance Members
start(topic, logbook)
followUp(systemResponse, logbook)
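The start and followUp contract can be sketched with a simple scripted user. The User class below is a stand-in so that the example runs on its own, and ScriptedUser with its opening and followUps properties is hypothetical. The sketch assumes that both methods return a UserTurn-like object with an utterance property and is written synchronously for brevity.

```javascript
// Stand-in for GenIRSim's User base class so that this sketch runs on its
// own; in a real setup the class would come from the package.
class User {
  constructor(configuration) {
    this.configuration = configuration;
  }
}

// Hypothetical user that cycles through the fixed follow-up questions in
// configuration.followUps and ignores the content of the system response.
class ScriptedUser extends User {
  constructor(configuration, logbook) {
    super(configuration);
    this.position = undefined; // undefined until start was called
  }

  start(topic, logbook) {
    this.position = 0; // reset: a new conversation begins
    return { utterance: `${this.configuration.opening} ${topic.description}` };
  }

  followUp(systemResponse, logbook) {
    if (this.position === undefined) {
      throw new Error("start must be called before followUp");
    }
    const questions = this.configuration.followUps;
    const utterance = questions[this.position % questions.length];
    this.position += 1;
    return { utterance };
  }
}

const scriptedUser = new ScriptedUser({
  opening: "I want to learn about",
  followUps: ["Why?", "Can you give an example?"]
});
console.log(scriptedUser.start({ description: "tides" }).utterance);
console.log(scriptedUser.followUp({ utterance: "Tides are caused by gravity." }).utterance);
```

This user resets in start, which corresponds to the "simple users" case described above; a user with cross-conversation state would keep some fields untouched there.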

A basic user model that does not change during conversation and only looks at the latest response for following up on it.

new StaticUser(configuration: Object, log: Logbook)
Parameters
configuration (Object) The configuration for the user
Name Description
configuration.llm LLMConfiguration The configuration for the language model employed during simulation
configuration.start string Template for the prompt to simulate the first message for a topic. Variables:
  • {{x}}: A property x of the configuration for the user
  • {{variables.topic}}: The Topic object
configuration.followUp string Template for the prompt to simulate a follow-up message to a system response. Variables:
  • {{x}}: A property x of the configuration for the user
  • {{variables.topic}}: The Topic object
  • {{variables.systemResponse}}: The SystemResponse object of the response to follow-up on
log (Logbook) A function that takes log messages

User model for the Touche 25 Retrieval-Augmented Debating task. A client for the corresponding server.

new Touche25RADUser(configuration: Object, log: Logbook)
Parameters
configuration (Object) The configuration for the user
Name Description
configuration.url string The URL of the chat API
configuration.model string The name of the user model
log (Logbook) A function that takes log messages

Utility

Static methods for filling in text templates.

templates
Static Members
render(text, context, options = undefined)
joinMessages(messages)
joinProperties(object, keys = undefined)
tsv2Contexts(tsv)
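The tab-separated format that run accepts for replacements (first line with variable names, one set of values per further line) can be parsed as in the sketch below. The function parseReplacementsTsv is a hypothetical re-implementation for illustration; whether templates.tsv2Contexts behaves the same way for empty lines or missing cells is not stated on this page.

```javascript
// Hypothetical parser for the documented TSV format. Assumptions: empty
// lines are skipped and missing cells become empty strings.
function parseReplacementsTsv(tsv) {
  const lines = tsv.split(/\r?\n/).filter(line => line.length > 0);
  if (lines.length === 0) {
    return [];
  }
  const names = lines[0].split("\t");
  return lines.slice(1).map(line => {
    const values = line.split("\t");
    const replacement = {};
    names.forEach((name, column) => {
      replacement[name] = values[column] ?? "";
    });
    return replacement;
  });
}

const exampleTsv = "description\tmaxTurns\nHow do tides work?\t2\nWhat is compost?\t3\n";
const tsvReplacements = parseReplacementsTsv(exampleTsv);
console.log(tsvReplacements);
```

All values stay strings here, which fits the use as template replacements: a numeric value such as maxTurns turns into a number only once the filled configuration string is parsed as JSON.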

Types

Evaluation

src/index.js

Object that represents the evaluation of a simulation.

Evaluation

Type: Object

Properties
configuration (Object) : The configuration of the evaluation
simulation (Simulation) : The simulation that was evaluated
userTurnsEvaluations (Array) : For each user turn of the simulation, in order, an object where the keys are the names of the configured evaluators (if they evaluated the specific turn of the simulation) and the values are the respective EvaluationResults (plus one property, milliseconds, that gives the time taken for evaluation in milliseconds)
overallEvaluations (Object) : An object where the keys are the names of the configured evaluators (if they evaluated the overall simulation) and the values are the respective EvaluationResults (plus one property, milliseconds, that gives the time taken for evaluation in milliseconds)
millisecondsEvaluation (number) : Time taken for the evaluation in milliseconds
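As an example of working with this structure, the sketch below averages the per-turn scores of each evaluator. The function meanTurnScores is hypothetical and not part of GenIRSim. It skips every value that is not an object with a numeric score, so that a milliseconds entry is ignored wherever it appears.

```javascript
// Hypothetical helper: mean per-turn score for each evaluator name.
function meanTurnScores(evaluation) {
  const sums = {};
  const counts = {};
  for (const turnEvaluations of evaluation.userTurnsEvaluations) {
    for (const [name, result] of Object.entries(turnEvaluations)) {
      // Skip entries that are not evaluation results (e.g. milliseconds).
      if (result === null || typeof result !== "object"
          || typeof result.score !== "number") {
        continue;
      }
      sums[name] = (sums[name] ?? 0) + result.score;
      counts[name] = (counts[name] ?? 0) + 1;
    }
  }
  const means = {};
  for (const name of Object.keys(sums)) {
    means[name] = sums[name] / counts[name];
  }
  return means;
}

// Made-up evaluation data, for illustration only.
const exampleEvaluation = {
  userTurnsEvaluations: [
    { readability: { score: 0.5, milliseconds: 4 }, milliseconds: 12 },
    { readability: { score: 1 }, relevance: { score: 0.25 } }
  ],
  overallEvaluations: {}
};
const turnMeans = meanTurnScores(exampleEvaluation);
console.log(turnMeans);
```

An evaluator that did not evaluate a turn is absent from that turn's object, so each mean is taken only over the turns the evaluator actually scored.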

EvaluationResult

src/evaluator.js

Object returned by Evaluator#evaluate with at least a score.

EvaluationResult

Type: Object

Properties
score (number) : A number between 0 and 1, with higher values indicating better responses
milliseconds (number?) : Time taken for evaluation in milliseconds (this property is automatically added by GenIRSim)

EVALUATION_RESULT

src/evaluator.js

Constants for EvaluationResult property names.

EVALUATION_RESULT
Static Members
SCORE
EXPLANATION

A large language model.

new LLM(configuration: LLMConfiguration, logbook: Logbook)
Parameters
configuration (LLMConfiguration) The configuration object
logbook (Logbook) The logbook to log to
Instance Members
createAssistantMessage(message)
createSystemMessage(message)
createUserMessage(message)
chat(messages, action)
json(messages, action, requiredKeys = [], maxRetries = 3)
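The idea behind the requiredKeys and maxRetries parameters can be illustrated as follows. This is a sketch under assumptions, not the implementation of LLM#json: the function jsonWithRequiredKeys is hypothetical, the generate callback stands in for a model call and is synchronous for brevity, and maxRetries is interpreted as the number of additional attempts after the first one.

```javascript
// Hypothetical helper: asks generate for text until it parses as JSON and
// contains all required keys, or until the retries are used up.
function jsonWithRequiredKeys(generate, requiredKeys = [], maxRetries = 3) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    try {
      const parsed = JSON.parse(generate(attempt));
      const missing = requiredKeys.filter(key => !(key in parsed));
      if (missing.length === 0) {
        return parsed;
      }
      lastError = new Error(`Missing keys: ${missing.join(", ")}`);
    } catch (error) {
      lastError = error; // not valid JSON (or not an object): try again
    }
  }
  throw lastError;
}

// Stand-in for a language model that needs three attempts.
const cannedResponses = [
  "Sure, here is the score: 0.8",
  '{"score": 0.8}',
  '{"score": 0.8, "explanation": "Clear and short."}'
];
const llmResult = jsonWithRequiredKeys(
  attempt => cannedResponses[attempt], ["score", "explanation"]);
console.log(llmResult);
```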

LLMConfiguration

src/llm.js

Configuration for an LLM.

Properties are url (see below) and all parameters for the chat completion endpoint, which includes the required model, but also optional parameters like options.temperature (see the modelfile parameter of Ollama).

LLMConfiguration

Type: Object

Properties
url (string) : The complete URL of the LLM's chat API endpoint
model (string) : The large language model name as per the API

A logbook to log actions specific to one source.

new Logbook(source: string, callback: function?, prefix: string?)
Parameters
source (string) The source for which to log entries
callback (function?) An optional function to call with each LogbookEntry created on Logbook#log
prefix (string?) An optional prefix to the action logged
Instance Members
log(action, object?)

LogbookEntry

src/logbook.js

One entry for the logbook, issued by the source to log for the action.

new LogbookEntry(source: string, action: string, data: (Object | string)?)
Parameters
source (string) The source that produced this entry
action (string) The action for which this entry was produced
data ((Object | string)?) An optional object or string describing the event that is logged
Instance Members
time
source
action
data
isContinuationOf(previousEntry)
hasContent()
hasTextContent()
getContent()
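A function for options.logCallback might collect or print the entries it receives, as in the sketch below. The helpers are hypothetical, and plain objects with the members source, action, and data stand in for LogbookEntry instances so that the example runs on its own.

```javascript
// Hypothetical helper: creates a callback that stores every entry it gets.
function createCollectingCallback() {
  const entries = [];
  const callback = entry => {
    entries.push(entry);
  };
  return { entries, callback };
}

// Hypothetical helper: renders one entry as a single line of text.
function formatEntry(entry) {
  let data = "";
  if (typeof entry.data === "string") {
    data = entry.data;
  } else if (entry.data !== undefined) {
    data = JSON.stringify(entry.data);
  }
  return `[${entry.source}] ${entry.action}${data ? ": " + data : ""}`;
}

const collector = createCollectingCallback();
// In a real run the callback would be passed as options.logCallback; here it
// is called by hand with stand-in entries.
collector.callback({ source: "user", action: "start", data: "Thinking..." });
collector.callback({ source: "system", action: "search", data: { hits: 3 } });
collector.callback({ source: "controller", action: "done" });
console.log(collector.entries.map(formatEntry).join("\n"));
```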

Simulation

src/index.js

Object that represents a completed simulation.

Simulation

Type: Object

Properties
configuration (Object) : The configuration of the simulation
turns (Array) : List of simulated UserTurns (each one includes the system's response)
milliseconds (number) : Time taken for the simulation in milliseconds

SystemResponse

src/system.js

Object that represents a system's response to a user's utterance in the simulated conversation with at least the system's utterance.

SystemResponse

Type: Object

Properties
utterance (string) : The utterance of the system

SYSTEM_RESPONSE

src/system.js

Constants for SystemResponse property names.

SYSTEM_RESPONSE
Static Members
UTTERANCE
RESULTS
RESULTS_PAGE

Object that represents a topic (or task, information need).

Topic

Type: Object

Properties
description (string) : A natural language description of the information task to be accomplished

UserTurn

src/user.js

Object that represents a user's turn in the simulated conversation with at least the user's utterance.

UserTurn

Type: Object

Properties
utterance (string) : The simulated utterance sent from the user to the system
systemResponse (SystemResponse) : The response sent from the system to the user as a reply
milliseconds (number?) : Time taken for simulation in milliseconds (this property is automatically added by GenIRSim)

USER_TURN

src/user.js

Constants for UserTurn property names.

USER_TURN
Static Members
UTTERANCE
SYSTEM_RESPONSE