
Evaluate re-ranker performance #2313

@paynejd

Description

Re-ranking is already available using a default language model that produces normalized (0..1), consistent scores across the multi-algorithm candidate pool.

This task is to design a straightforward evaluation comparing the performance of our default re-ranking language model (LM) against one or more alternative LMs (e.g., a medically trained LM) on several validation datasets. Decision points:

  • Is our current model good enough that we should make re-ranking available to all users?
  • Do users need the option to enable/disable re-ranking, or can it always be enabled?
  • Does a medically trained model give better results?
  • Is one LM sufficient for all projects to start, or do different projects require different LMs for re-ranking to be useful?
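The comparison above can be sketched as a small evaluation harness: re-rank each validation query's candidate pool with each LM and average standard ranking metrics (NDCG, MRR) per model. Everything below is illustrative, not the project's actual API: the dataset shape, the `default_lm` scorer (a trivial token-overlap stand-in for a real LM call), and all names are assumptions.

```python
import math

# Hypothetical validation data: each query has candidates from the
# multi-algorithm pool, with a binary relevance label (1 = correct match).
VALIDATION_SET = [
    {
        "query": "myocardial infarction",
        "candidates": [
            {"text": "Heart attack", "relevant": 1},
            {"text": "Cardiac arrest", "relevant": 0},
            {"text": "Myocardial infarction, acute", "relevant": 1},
        ],
    },
]

def ndcg_at_k(ranked_labels, k=10):
    """Normalized discounted cumulative gain for one ranked list."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_labels[:k]))
    ideal = sorted(ranked_labels, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def mrr(ranked_labels):
    """Reciprocal rank of the first relevant candidate."""
    for i, rel in enumerate(ranked_labels):
        if rel:
            return 1.0 / (i + 1)
    return 0.0

def evaluate(reranker, dataset):
    """Re-rank each query's candidates with `reranker`, average the metrics."""
    ndcgs, mrrs = [], []
    for item in dataset:
        ranked = sorted(
            item["candidates"],
            key=lambda c: reranker(item["query"], c["text"]),
            reverse=True,
        )
        labels = [c["relevant"] for c in ranked]
        ndcgs.append(ndcg_at_k(labels))
        mrrs.append(mrr(labels))
    n = len(dataset)
    return {"ndcg": sum(ndcgs) / n, "mrr": sum(mrrs) / n}

# Stand-in scorer: in practice this would call the default LM (and, in a
# second run, a medically trained LM), each returning a 0..1 score.
def default_lm(query, text):
    return len(set(query.lower().split()) & set(text.lower().split())) / 10

results = {"default": evaluate(default_lm, VALIDATION_SET)}
```

Running each candidate LM through the same `evaluate` call on the same datasets gives directly comparable per-model numbers, which is enough to answer the "is the medical model better" and "one LM for all projects" questions by evaluating per dataset/project.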


Labels: stage/triaged (AI triage complete: scored and classified), type/feature (New or improved functionality)
