Problem
The LibrEd generator pipeline currently suffers performance bottlenecks from sequential LLM calls in:
- Question classification
- Theory/explanation generation
This leads to long execution times and inefficient resource usage.
Proposed Improvements
1. Async Batching (Partially Implemented)
- Async batching is already implemented for classification
- This significantly reduced classification execution time
- Plan to extend the same approach to theory generation
2. Async Theory Generation
- Convert sequential theory generation → async batches
- Avoid waiting for one LLM response before sending the next
- Use controlled concurrency (semaphores)
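A minimal sketch of how item 2 could look, assuming an async LLM client: `generate_theory` is a hypothetical stand-in for the real call, and `MAX_CONCURRENCY` is an illustrative limit to be tuned. All tasks are launched at once and a semaphore throttles how many are in flight.

```python
import asyncio

MAX_CONCURRENCY = 5  # illustrative cap on in-flight LLM calls

async def generate_theory(question: str) -> str:
    # Placeholder for the actual async LLM request in the pipeline.
    await asyncio.sleep(0.01)
    return f"theory for: {question}"

async def generate_all(questions: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENCY)

    async def bounded(q: str) -> str:
        async with sem:  # at most MAX_CONCURRENCY calls run concurrently
            return await generate_theory(q)

    # Fire all tasks immediately; gather preserves input order.
    return await asyncio.gather(*(bounded(q) for q in questions))

results = asyncio.run(generate_all([f"q{i}" for i in range(10)]))
```

Unlike fixed-size batches, this keeps the concurrency slot pool full at all times, so one slow response never stalls the rest of the batch.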
3. Caching Layer
- Store LLM responses (classification + theory) in SQLite
- Use hash-based lookup to avoid repeated computation
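One possible shape for the caching layer; the table name, key scheme, and stage labels here are assumptions, not LibrEd's actual schema. Hashing the stage together with the prompt keeps classification and theory entries from colliding.

```python
import hashlib
import sqlite3

# Hypothetical hash-keyed SQLite cache for LLM responses.
conn = sqlite3.connect(":memory:")  # use a file path in the real pipeline
conn.execute(
    "CREATE TABLE IF NOT EXISTS llm_cache (key TEXT PRIMARY KEY, response TEXT)"
)

def cache_key(stage: str, prompt: str) -> str:
    # Namespacing by stage means identical prompts in different
    # stages still get distinct cache entries.
    return hashlib.sha256(f"{stage}:{prompt}".encode()).hexdigest()

def get_cached(stage: str, prompt: str):
    row = conn.execute(
        "SELECT response FROM llm_cache WHERE key = ?",
        (cache_key(stage, prompt),),
    ).fetchone()
    return row[0] if row else None

def put_cached(stage: str, prompt: str, response: str) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO llm_cache VALUES (?, ?)",
        (cache_key(stage, prompt), response),
    )
    conn.commit()

put_cached("theory", "What is recursion?", "Recursion is ...")
```

On a cache hit the pipeline skips the LLM call entirely, so re-runs over the same dataset become nearly free.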
4. Retry & Failure Handling
- Add retry mechanism for failed LLM calls
- Handle partial failures gracefully
- Save intermediate results
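A sketch of the retry mechanism, assuming transient failures surface as exceptions from the LLM client; the retry count, delays, and exception handling are placeholders to be adapted to the real client's error types.

```python
import asyncio
import random

async def call_with_retry(coro_fn, *args, retries: int = 3, base_delay: float = 0.01):
    """Retry an async LLM call with exponential backoff and jitter."""
    for attempt in range(retries):
        try:
            return await coro_fn(*args)
        except Exception:
            if attempt == retries - 1:
                raise  # give up after the final attempt
            # Exponential backoff with jitter to avoid retry storms.
            await asyncio.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# Demo: a flaky call that fails twice, then succeeds.
attempts = {"n": 0}

async def flaky(prompt: str) -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient LLM error")
    return f"ok: {prompt}"

result = asyncio.run(call_with_retry(flaky, "q1"))
```

Wrapping each per-item call this way means a single failed request is retried in isolation instead of failing the whole batch, which pairs naturally with saving intermediate results as each item completes.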
5. Performance Metrics & Logging
- Track execution time per pipeline stage
- Log batch-level processing details
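The metrics idea can be as simple as a context manager around each stage; the stage names and logger setup below are illustrative.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

timings = {}  # stage name -> elapsed seconds

@contextmanager
def stage_timer(name: str):
    """Record and log wall-clock time for one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[name] = elapsed
        log.info("stage %s took %.3fs", name, elapsed)

with stage_timer("classification"):
    time.sleep(0.01)  # stand-in for the real classification batch
```

Collecting timings into a dict makes it easy to emit a per-run summary at the end and to spot which stage dominates runtime before and after the async changes.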
Goal
- Reduce total pipeline runtime significantly
- Improve scalability for large datasets
- Make pipeline more robust and production-ready
Note
I have already created a working prototype demonstrating async processing and caching:
[link your repo]
I plan to integrate these improvements directly into the LibrEd codebase.
Would appreciate feedback on this direction before proceeding further.