Introduce tool calling support #116
Merged
orionpapadakis merged 18 commits on Jun 16, 2026
792f9a1 to 76058fa
# Conflicts: # LlamaTornadoCli.java
… extract ToolCallingDemo
…l calls, user message integration, and enhanced response parsing.
…as used only for testing
…-alone test-only tool calling app class
…ll JSON parsing logic
…xtend support for multi-model formats
…adjust `Options` validation and defaults
…g-aware brace counting
…ol_response>` tags
…le and batch parsing scenarios
…ing control encoding
91333d1 to 65414f2
Summary
Adds tool calling to the engine through a model-agnostic `ChatFormat` API: the engine handles prompt encoding and tool-call detection in strings/tokens, while orchestration (LangChain4j `ToolExecutionRequest`s, multi-turn loops) lives in the `quarkus-langchain4j` `gpu-llama3` provider (separate PR). Validated against Qwen3-0.6B (f16) and Llama-3.2-1B-Instruct.
Tool calling
- `ChatFormat` gains the tool methods as defaults (no-op; `supportsToolCalling()` returns `false`), so existing formats are unchanged and families opt in by overriding: `toolSystemPromptSuffix`, `encodeToolCallAssistantTurn` (single + batch), `encodeToolResultTurn`, `extractToolCall`, `extractAllToolCalls`, `getToolAwareStopTokens`.
- `ToolCallExtract` record `(name, argumentsJson, Optional<String> id)`: the hand-off type between engine and caller.
- `ToolCallParserUtils`: stateless parsing of `<tool_call>…</tool_call>` (Qwen3 / Llama 3.2, closed and unclosed), `<|python_tag|>` (Llama 3.1), and raw / fenced JSON fallbacks. Brace counting is string-aware (skips braces inside JSON strings) so arguments containing code/braces aren't truncated.
- `LlamaChatFormat` (3.1 + 3.2; tools injected into the first user message) and `Qwen3ChatFormat` (system message; `<tool_call>` / `<tool_response>` tags).
Complementary features
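The string-aware brace counting described for `ToolCallParserUtils` can be illustrated with a minimal sketch. This is not the PR's code: the class `ToolCallScanSketch` and its method names are hypothetical, and it only covers the `<tool_call>` tag case (closed or unclosed), not `<|python_tag|>` or the fenced JSON fallbacks.

```java
import java.util.Optional;

// Hypothetical sketch; names and structure are assumptions, not the PR's ToolCallParserUtils.
final class ToolCallScanSketch {

    private static final String OPEN_TAG = "<tool_call>";

    /**
     * Returns the first balanced JSON object that starts at or after {@code from},
     * or empty if no '{' is found or the object never closes.
     */
    static Optional<String> firstJsonObject(String text, int from) {
        int start = text.indexOf('{', from);
        if (start < 0) {
            return Optional.empty();
        }
        int depth = 0;
        boolean inString = false; // true while scanning inside a JSON string literal
        boolean escaped = false;  // true right after a backslash inside a string
        for (int i = start; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;      // this character is escaped, ignore it
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;     // unescaped quote closes the string
                }
                continue;                 // braces inside strings are never counted
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return Optional.of(text.substring(start, i + 1));
            }
        }
        return Optional.empty();          // truncated or unbalanced object
    }

    /** Extracts the JSON payload after the first <tool_call> tag; the closing tag is optional. */
    static Optional<String> toolCallJson(String response) {
        int tag = response.indexOf(OPEN_TAG);
        if (tag < 0) {
            return Optional.empty();
        }
        return firstJsonObject(response, tag + OPEN_TAG.length());
    }
}
```

Because the scan stops at the matching closing brace rather than at `</tool_call>`, the same routine handles an unclosed block, and an argument such as `"if (x) { y(); }"` does not end the object early.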
- `ChatFormat.supportsThinking()` / `encodeThinkingControl(boolean)` (default no-op). `Qwen3ChatFormat` primes a pre-closed `<think>\n\n</think>` block to skip reasoning, using the canonical `<think>` / `</think>` token ids (now captured by `Qwen3Tokenizer` before they're stripped from the special-token map). DeepSeek-R1 reports `false` and is never forced off.
- temperature/top-p, with related `Options` validation tidy-up.
Testing
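The opt-in pattern behind `supportsThinking()` / `encodeThinkingControl(boolean)` can be sketched roughly as follows. Everything here is an assumption for illustration: the interface `ThinkingFormatSketch`, the two implementing classes, and the `primeAssistant` helper are hypothetical, and the sketch works on strings where the PR describes canonical token ids.

```java
// Hypothetical sketch of default-method opt-in; not the engine's actual ChatFormat interface.
interface ThinkingFormatSketch {

    /** Formats that cannot toggle reasoning keep the default. */
    default boolean supportsThinking() {
        return false;
    }

    /** Text appended to the assistant prefix; the default is a no-op. */
    default String encodeThinkingControl(boolean enableThinking) {
        return "";
    }
}

/** Opts in: when thinking is disabled, primes a pre-closed think block. */
final class Qwen3LikeFormatSketch implements ThinkingFormatSketch {
    @Override
    public boolean supportsThinking() {
        return true;
    }

    @Override
    public String encodeThinkingControl(boolean enableThinking) {
        return enableThinking ? "" : "<think>\n\n</think>";
    }
}

/** Does not opt in, so reasoning is never forced off (the behaviour described for DeepSeek-R1). */
final class AlwaysReasoningFormatSketch implements ThinkingFormatSketch {
}

final class ThinkingControlDemo {
    /** Builds the assistant prefix, adding thinking control only for formats that support it. */
    static String primeAssistant(ThinkingFormatSketch format, String prefix, boolean enableThinking) {
        if (!format.supportsThinking()) {
            return prefix;
        }
        return prefix + format.encodeThinkingControl(enableThinking);
    }
}
```

Keeping the defaults as no-ops means a format that never overrides them behaves exactly as before, which matches the stated goal of leaving existing formats unchanged.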
- Unit: new `ToolCallParserUtilsTest` (16 cases: tags, `python_tag`, raw/fenced JSON, unclosed blocks, batch calls, brace-in-string, escaped quotes).
- End-to-end via the `quarkus-langchain4j` weather-agent sample (geocoding → forecast tool chain).
0. Environment
https://github.com/quarkiverse/quarkus-langchain4j/pull/2604
Wire the sample's `pom.xml` to the local snapshot, swap `OpenAI` for `gpu-llama3`, and pass the TornadoVM argfile:
Configure the sample's `application.properties`:
Build the gpu-llama3 provider against the engine:
Run the `weather-agent` sample:
Ask for a city's weather; the model emits a tool call, the agent runs geocoding → forecast, and the final answer is grounded in the tool results:
Notes