Introduce tool calling support by orionpapadakis · Pull Request #116 · beehive-lab/GPULlama3.java

orionpapadakis · 2026-05-14T12:15:52Z

Summary

Adds tool calling to the engine through a model-agnostic ChatFormat API: the engine
handles prompt encoding and tool-call detection in strings/tokens, while orchestration
(LangChain4j ToolExecutionRequests, multi-turn loops) lives in the quarkus-langchain4j
gpu-llama3 provider (separate PR). Validated against Qwen3-0.6B (f16) and Llama-3.2-1B-Instruct.

Tool calling

ChatFormat gains the tool methods as defaults (no-op; supportsToolCalling() returns false), so existing formats are unchanged and families opt in by overriding: toolSystemPromptSuffix,
encodeToolCallAssistantTurn (single + batch), encodeToolResultTurn, extractToolCall, extractAllToolCalls, getToolAwareStopTokens.
ToolCallExtract record (name, argumentsJson, Optional<String> id) — the hand-off type between engine and caller.
ToolCallParserUtils — stateless parsing of <tool_call>…</tool_call> (Qwen3 / Llama 3.2, closed and unclosed), <|python_tag|> (Llama 3.1), and raw / fenced JSON fallbacks. Brace counting is
string-aware (skips braces inside JSON strings) so arguments containing code/braces aren't truncated.
LlamaChatFormat (3.1 + 3.2; tools injected into the first user message) and Qwen3ChatFormat (system message; <tool_call> / <tool_response> tags).

Complementary features

Thinking control — ChatFormat.supportsThinking() / encodeThinkingControl(boolean) (default no-op). Qwen3ChatFormat primes a pre-closed <think>\n\n</think> block to skip reasoning, using the
canonical <think>/</think> token ids (now captured by Qwen3Tokenizer before they're stripped from the special-token map). DeepSeek-R1 reports false and is never forced off.
Per-format default temperature / top-p, with related Options validation tidy-up.

Testing

Unit: new ToolCallParserUtilsTest (16 cases — tags, python_tag, raw/fenced JSON, unclosed blocks, batch calls, brace-in-string, escaped quotes).

End-to-end via the quarkus-langchain4j weather-agent sample (geocoding → forecast tool chain).

0. Environment

sdk use java 25.0.2-open
sdk use tornadovm 4.0.1-jdk25-ptx

GPULlama3.java — build the current branch:

  cd ~/GPULlama3.java && ./mvnw clean install -DskipTests -q

quarkus-langchain4j — clone the PR: https://github.com/quarkiverse/quarkus-langchain4j/pull/2604

Wire the sample's pom.xml to the local snapshot, swap OpenAI for gpu-llama3, and pass the TornadoVM argfile:

<quarkus-langchain4j.version>999-SNAPSHOT</quarkus-langchain4j.version>
...
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-gpu-llama3</artifactId>
    <version>${quarkus-langchain4j.version}</version>
</dependency>
...
<configuration>
    <jvmArgs>@${env.TORNADOVM_HOME}/tornado-argfile</jvmArgs>
</configuration>

Configure the sample's application.properties:

quarkus.langchain4j.log-requests=true
quarkus.langchain4j.log-responses=true

quarkus.rest-client.geocoding.url=https://geocoding-api.open-meteo.com
quarkus.rest-client.openmeteo.url=https://api.open-meteo.com

# Qwen3 (tool-calling driver)
quarkus.langchain4j.gpu-llama3.chat-model.model-name=ggml-org/Qwen3-0.6B-GGUF
quarkus.langchain4j.gpu-llama3.chat-model.quantization=f16
# or Llama 3.2:
#quarkus.langchain4j.gpu-llama3.chat-model.model-name=unsloth/Llama-3.2-1B-Instruct-GGUF
#quarkus.langchain4j.gpu-llama3.chat-model.quantization=F16

quarkus.langchain4j.gpu-llama3.chat-model.temperature=0.6
quarkus.langchain4j.gpu-llama3.chat-model.top-p=0.95
quarkus.langchain4j.gpu-llama3.chat-model.max-tokens=2048
quarkus.langchain4j.gpu-llama3.chat-model.device-memory=5GB
quarkus.langchain4j.gpu-llama3.chat-model.enable-thinking=false
quarkus.langchain4j.gpu-llama3.chat-model.prefill-decode=true
quarkus.langchain4j.gpu-llama3.chat-model.prefill-batch-size=32

# tool-call / prompt tracing:
#quarkus.log.category."io.quarkiverse.langchain4j.gpullama3".level=DEBUG

Build the gpu-llama3 provider against the engine:

  cd ~/quarkus-langchain4j && mvn install -pl model-providers/gpu-llama3/runtime,model-providers/gpu-llama3/deployment -am -DskipTests -q

Run the weather-agent sample:

 cd ~/quarkus-langchain4j/samples/weather-agent && mvn quarkus:dev

Ask for a city's weather; the model emits a tool call, the agent runs geocoding → forecast, and the final answer is grounded in the tool results:

curl "http://localhost:8080/weather?city=Chania"

Notes

No behavioural change for non-tool, non-thinking paths — all new ChatFormat surface is default-implemented.

# Conflicts: # LlamaTornadoCli.java

… extract ToolCallingDemo

…ity in GPULlama3

…l calls, user message integration, and enhanced response parsing.

…as used only for testing

…-alone test-only tool calling app class

…l parsing

…ll JSON parsing logic

… formats

…xtend support for multi-model formats

…adjust `Options` validation and defaults

…ed readability

…g-aware brace counting

…ol_response>` tags

…le and batch parsing scenarios

…ing control encoding

stratika

LGTM

mikepapadim requested review from Copilot and mikepapadim May 14, 2026 12:28

Copilot started reviewing on behalf of mikepapadim May 14, 2026 12:29 View session

This comment was marked as outdated.

Sign in to view

orionpapadakis force-pushed the feat/tool-calling branch from 792f9a1 to 76058fa Compare June 11, 2026 15:51

orionpapadakis marked this pull request as ready for review June 15, 2026 20:48

orionpapadakis added 18 commits June 16, 2026 12:51

[tool][WIP] Introduce tool-calling capabilities by GPULlama3.java

ebefefb

# Conflicts: # LlamaTornadoCli.java

[tools] Refactor tool-calling architecture: modularize components and…

cecff71

… extract ToolCallingDemo

[introduce] Add standalone ToolCallingApp for tool-calling functional…

25bfdaa

…ity in GPULlama3

[tools][wip] Add support for Llama 3.2 tool-call injection: batch too…

62c52e6

…l calls, user message integration, and enhanced response parsing.

[tools][wip] Add fix for tool-calling

13f9654

[tools] Remove standalone ToolCallingApp and its references as it w…

3a43dc5

…as used only for testing

[tools] Remove tool-calling classes as redundant after dropping stand…

9647a5e

…-alone test-only tool calling app class

[tools] Add supportsToolCalling implementation and enhance tool cal…

0a1d189

…l parsing

[tools] Extend ToolCallExtract with optional id and unify tool ca…

546ce94

…ll JSON parsing logic

[tools] Add support for batch tool call encoding across multiple chat…

67eda88

… formats

[tools] Unify tool call parsing logic, streamline method names, and e…

9308c91

…xtend support for multi-model formats

Add default temperature and top-p resolution based on model formats, …

8e54d2e

…adjust `Options` validation and defaults

[tools] Simplify section comments in ToolCallParserUtils for improv…

9c43c95

…ed readability

[tools][fix] Enhance JSON parsing in ToolCallParserUtils with strin…

043bb8a

…g-aware brace counting

[tools][fix] Update Qwen3ChatFormat to encode tool results using `<to…

fb2649d

…ol_response>` tags

[tools][test] Add unit tests for ToolCallParserUtils, covering sing…

b934e04

…le and batch parsing scenarios

Add thinking on/off control support

2ba7789

Add support for canonical <think> token tracking and usage in think…

65414f2

…ing control encoding

orionpapadakis force-pushed the feat/tool-calling branch from 91333d1 to 65414f2 Compare June 16, 2026 09:52

stratika approved these changes Jun 16, 2026

View reviewed changes

orionpapadakis merged commit 60cf0f4 into beehive-lab:main Jun 16, 2026
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce tool calling support#116

Introduce tool calling support#116
orionpapadakis merged 18 commits into
beehive-lab:mainfrom
orionpapadakis:feat/tool-calling

orionpapadakis commented May 14, 2026 •

edited

Loading

Uh oh!

This comment was marked as outdated.

Uh oh!

stratika left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

orionpapadakis commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Tool calling

Complementary features

Testing

Uh oh!

This comment was marked as outdated.

Uh oh!

stratika left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

orionpapadakis commented May 14, 2026 •

edited

Loading