
Evaluate re-ranker performance #2313

@paynejd

Description

Re-ranking is already available using a default language model that produces normalized (0..1), consistent scores across the multi-algorithm candidate pool.

This task is to design a straightforward evaluation comparing the performance of our default re-ranking language model (LM) against one or more alternative LMs (e.g., a medically trained LM) on several validation datasets. Decision points:

  • Is our current model good enough that we should make re-ranking available to all users?
  • Do users need the option to enable/disable re-ranking, or can it always be enabled?
  • Does a medically trained model give better results?
  • Is one LM sufficient for all projects to start, or do different projects require different LMs for re-ranking to be useful?
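The comparison above can be sketched as a small evaluation harness: re-rank each validation query's candidate pool with each LM and average standard ranking metrics (NDCG, MRR) per model. Everything below is illustrative, not the project's actual API: the dataset shape, the `default_lm` scorer (a trivial token-overlap stand-in for a real LM call), and all names are assumptions.

```python
import math

# Hypothetical validation data: each query has candidates from the
# multi-algorithm pool, with a binary relevance label (1 = correct match).
VALIDATION_SET = [
    {
        "query": "myocardial infarction",
        "candidates": [
            {"text": "Heart attack", "relevant": 1},
            {"text": "Cardiac arrest", "relevant": 0},
            {"text": "Myocardial infarction, acute", "relevant": 1},
        ],
    },
]

def ndcg_at_k(ranked_labels, k=10):
    """Normalized discounted cumulative gain for one ranked list."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_labels[:k]))
    ideal = sorted(ranked_labels, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def mrr(ranked_labels):
    """Reciprocal rank of the first relevant candidate."""
    for i, rel in enumerate(ranked_labels):
        if rel:
            return 1.0 / (i + 1)
    return 0.0

def evaluate(reranker, dataset):
    """Re-rank each query's candidates with `reranker`, average the metrics."""
    ndcgs, mrrs = [], []
    for item in dataset:
        ranked = sorted(
            item["candidates"],
            key=lambda c: reranker(item["query"], c["text"]),
            reverse=True,
        )
        labels = [c["relevant"] for c in ranked]
        ndcgs.append(ndcg_at_k(labels))
        mrrs.append(mrr(labels))
    n = len(dataset)
    return {"ndcg": sum(ndcgs) / n, "mrr": sum(mrrs) / n}

# Stand-in scorer: in practice this would call the default LM (and, in a
# second run, a medically trained LM), each returning a 0..1 score.
def default_lm(query, text):
    return len(set(query.lower().split()) & set(text.lower().split())) / 10

results = {"default": evaluate(default_lm, VALIDATION_SET)}
```

Running each candidate LM through the same `evaluate` call on the same datasets gives directly comparable per-model numbers, which is enough to answer the "is the medical model better" and "one LM for all projects" questions by evaluating per dataset/project.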


Labels: stage/triaged (AI triage complete: scored and classified), type/feature (New or improved functionality)
