Description
We currently have many duplicate tests that make unnecessary requests and run unnecessary generations. We should refactor the tests so that mocked inputs and outputs are easier to use, and reduce the number of tests duplicated across the backends.
We should have a few types of tests:
- Conformance tests: for each backend, ensure that we receive the expected response format for a given input (e.g. structured outputs, tool requests).
- Backend unit tests: for each backend, mock these response values and ensure that the backend processes them as expected.
- Std library tests: these resemble regular unit tests; if generation is required, mock that generation.
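As a rough illustration of the backend unit test category, the sketch below mocks a client response and checks that a backend parses it. `ToyBackend`, `client.complete`, and the response shape are hypothetical names chosen for this example, not this project's actual API.

```python
import json
from unittest.mock import Mock


class ToyBackend:
    """Hypothetical backend: wraps a client and parses its raw response."""

    def __init__(self, client):
        self.client = client

    def generate_structured(self, prompt: str) -> dict:
        # In a real backend this call would hit the network.
        raw = self.client.complete(prompt)
        return json.loads(raw["content"])


def test_backend_parses_structured_output():
    # Mock the client so that no request or generation takes place.
    client = Mock()
    client.complete.return_value = {"content": '{"answer": 42}'}

    backend = ToyBackend(client)
    result = backend.generate_structured("What is 6 * 7?")

    # The backend should decode the mocked payload and forward the prompt as-is.
    assert result == {"answer": 42}
    client.complete.assert_called_once_with("What is 6 * 7?")
```

A conformance test would exercise the same parsing path against a real client and assert only on the response format, not on the generated content.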
We should also allow mocking to be disabled so that the full test suite can be run with real requests.
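One possible way to make mocking optional is an environment-variable switch, sketched below. The variable name `USE_REAL_LLM` and the helpers `use_real_requests` and `make_client` are assumptions made for illustration; the project may prefer a pytest option or marker instead.

```python
import os
from unittest.mock import Mock


def use_real_requests(env=None) -> bool:
    """Return True when the (hypothetical) USE_REAL_LLM switch is enabled."""
    env = os.environ if env is None else env
    return env.get("USE_REAL_LLM", "0").lower() in {"1", "true", "yes"}


def make_client(real_factory, env=None):
    """Build a real client when mocking is disabled, otherwise a mocked one."""
    if use_real_requests(env):
        return real_factory()
    # Default path: a mock with a canned response, so no request is sent.
    client = Mock()
    client.complete.return_value = {"content": "mocked response"}
    return client
```

Tests would then obtain their client through `make_client`, so the same test bodies run against mocks by default and against real requests when the switch is set.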
These changes should decouple testing of the standard library and backend functionality from the tests that check real LLM generation. We could then run the backend unit tests as part of the regular test suite and need special testing pipelines only for the conformance tests.