[Paper][EMNLP 2025] SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs
Updated Aug 27, 2025 - Python
RuleSets of AccessMonitor - the validator of web accessibility practices
Efficient topic-set reduction for IR evaluation using NSGA-II
Single-turn LLM evaluation platform. Run structured evals across 5 different AI providers. Score outputs, track latency, and compare models through a live analytics dashboard.
Jupyter notebooks for the DeepLearning.AI course "LangChain for LLM Application Development"
An MCQ scanner app for students, built with Flutter and Flask. Students photograph a physical MCQ exam paper, and the app turns it into an interactive quiz where they can answer the questions and have an AI evaluate their answers.
Repository for the final thesis in "Interaction Media Design" (Prof. Sofia Pescarin) at University of Bologna, MA "Digital Humanities and Digital Knowledge" (a.y. 2022/2023)
Compare armor sets from the video game Dark Souls across most categories, using the attributes defined by the game's developers.