158 commits
2496f9c
mtmd : support MiniCPM-V 4.6 (#22529)
tc-mb May 6, 2026
3980e04
llama : add missing call to ggml_backend_load_all() (#22752)
angt May 7, 2026
cfff1fc
sycl : fix test script (#22737)
dogunbound May 7, 2026
e358d75
webui: fix flicker issue on dismiss animation on overlay primitives (…
vignesh191 May 7, 2026
97f06e9
codeowners : add ZenDNN backend codeowner (#22772)
z-vishal May 7, 2026
f4b5a2e
webui: fix ?model= URL param race in router mode (#22771)
ServeurpersoCom May 7, 2026
8e52631
model: Add Mimo v2.5 model support (#22493)
AesSedai May 7, 2026
cc97e45
mtmd: fix whisper audio tail truncation by exposing padded buffer to …
ServeurpersoCom May 7, 2026
68380ae
ggml-cpu: Optimized risc-v cpu q1_0 dot
pl752 May 7, 2026
803627f
llama : remove unnecessary seq_id check during state restore (#22797)
ggerganov May 7, 2026
b9afc19
Write a readme on Multi-GPU usage in llama.cpp (#22729)
gaugarg-nv May 7, 2026
ad09224
sycl: add FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, GATED_DELTA_NET (#…
aicss-genai May 7, 2026
deab41e
tests: add long-sequence cases and fix inputs for gated_delta_net (#2…
Neroued May 7, 2026
093be62
common/chat : preserve media markers for typed-content templates (#22…
aldehir May 7, 2026
ceb7e14
opencl: add opfilter regex for debugging (#22782)
shaofeiqi May 7, 2026
e43431b
llama : fix device state save/load (#22805)
ggerganov May 7, 2026
aaf4a4d
webui: add option for LLM title generation (#22265)
smugman-dot May 7, 2026
05ff59c
CUDA: batch out_prod inner loop with cublasSgemmStridedBatched (#22651)
leonardHONG May 7, 2026
44dbe8c
model: Support sarashina2.2-vision-3b model (#22103)
samuraieng May 7, 2026
6a2a251
sycl : fix script error (#22795)
arthw May 8, 2026
1d72d87
convert : fix RuntimeError when stripping FP8 KV-cache scales (#22818)
pich May 8, 2026
f3e8d14
opencl: add q4_0 MoE GEMM for Adreno (#22731)
shawngu-quic May 8, 2026
3e941b8
ggml: update SCHED_DEBUG output to use ggml_op_desc() (#22825)
max-krasnyansky May 8, 2026
6d57a49
vulkan: fix spv shadowing (#22760)
miyanyan May 8, 2026
a8fd165
CUDA: lower-case PCI bus id, standardize for ggml (#22820)
JohannesGaessler May 8, 2026
9b2925e
webui: Add Import/Export of Settings configuration + improve architec…
allozaur May 8, 2026
58e68df
cuda: fuse snake activation (mul, sin, sqr, mul, add) (#22667)
ServeurpersoCom May 8, 2026
9dcf835
server: (router) expose child model info from router's /v1/models (#2…
ngxson May 8, 2026
29debb3
server: support Vertex AI compatible API (#22545)
ngxson May 8, 2026
5d6f18a
webui: fix LLM title generation for agentic conversations (#22840)
smugman-dot May 8, 2026
f9cd456
common : revert reasoning budget +inf logit bias (#22740)
aldehir May 8, 2026
9f5f0e6
model : support Gemma4_26B_A4B_NVFP4 (#22804)
ynankani May 8, 2026
4995604
common : do not wrap raw strings in schema parser for tagged parsers …
aldehir May 8, 2026
b46812d
Feature hexagon l2 norm (#22816)
pdhinaka May 8, 2026
c5703e0
sycl: support non-contiguous input in PAD op (#22148)
aicss-genai May 9, 2026
6600172
hexagon: add HTP kernel for GGML_OP_GATED_DELTA_NET (#22837)
wyanzhao May 9, 2026
046e284
Add flash attention MMA / Tiles to support MiMo-V2.5 (#22812)
AesSedai May 9, 2026
4a4f819
sycl: Battlemage AOT build via spir64_gen + MMQ subgroup annotations …
aicss-genai May 9, 2026
6048993
sycl: Q5_K reorder MMVQ/dequant + Q8_0 reorder MMVQ path (#22152)
aicss-genai May 9, 2026
fd89556
[SYCL] Add BF16 support to GET_ROWS operation (#21391)
devedse May 9, 2026
e20b839
SYCL: reduce allocation overhead during flash attention (#22732)
sanmai May 9, 2026
5757c4d
cmake : update BoringSSL to 0.20260508.0 (#22839)
cabelo May 9, 2026
00d56b1
docker : upgraded the default intel compute-runtime version (#22567)
WizardlyBump17 May 9, 2026
65d7a8b
devops : updated Nix systems (#22869)
yuannan May 9, 2026
1e5ad35
model : add sarvam_moe architecture support (#20275)
sumitchatterjee13 May 9, 2026
5755a10
model : fix model type check for granite/llama3 and deepseek2/glm4.7 …
CISC May 10, 2026
f3c3e0e
internal AllReduce kernel for CUDA provider (#22299)
scutler-nv May 10, 2026
efbada9
ggml : bump version to 0.11.1 (ggml/1484)
ggerganov May 10, 2026
0b04728
sync : ggml
ggerganov May 10, 2026
2b2babd
ggml-virtgpu : include missing mutex header (#22810)
olliewalsh May 10, 2026
5d5d2e1
vendor : update cpp-httplib to 0.43.4 (#22888)
cabelo May 10, 2026
2e97c5f
backend sampling: support returning post-sampling probs (#22622)
TimNN May 10, 2026
389ff61
server : print warning when HTTP timeout exceeded (#22907)
ggerganov May 10, 2026
7d442ab
[SYCL] Add OP im2col_3d (#22903)
arthw May 11, 2026
8383743
vendor : update cpp-httplib to 0.44.0 (#22919)
cabelo May 11, 2026
f5636f8
convert : add image break token fallback (#22914)
danbev May 11, 2026
8cef820
CUDA: directly include cuda/iterator (#22936)
ORippler May 11, 2026
dd9280a
vulkan: Support asymmetric FA in scalar/mmq/coopmat1 paths (#22589)
jeffbolznv May 11, 2026
7dbb0e9
examples : update args speculative-simple README.md [no ci] (#22938)
danbev May 11, 2026
928b486
ggml-virtgpu: Add a GHA build check (#22943)
kpouget May 11, 2026
68e7ea3
spec : parallel drafting support (#22838)
ggerganov May 11, 2026
ef22b3e
docs: fix metrics endpoint description in server README (#22879)
willjoha May 11, 2026
e936660
Ggml/cuda snake fusion hardening (#22912)
ServeurpersoCom May 11, 2026
8e1f9d0
CUDA: handle OW > 65535 in im2col (2D and 3D) (#22944)
CrispStrobe May 11, 2026
1ec7ba0
opencl: add q4_1 MoE for Adreno (#22856)
shawngu-quic May 11, 2026
da44953
metal : promote mul_mv/mul_mm batch divisors to function constants (#…
guyfischman May 12, 2026
78fbbc2
convert : add split() to LoraTorchTensor in LoRA converter (#22832)
jesus-talavera-ibm May 12, 2026
4178259
mtmd: add MiMo v2.5 vision (#22883)
AesSedai May 12, 2026
fa62042
ci : bump ty to 0.0.35 (#22961)
CISC May 12, 2026
706fbd8
vulkan: Check shared memory size for mmq shaders (#22693)
jeffbolznv May 12, 2026
ef93e98
vulkan: Fix Windows performance regression on Intel GPU BF16 workload…
rillomas May 12, 2026
fde69a3
examples : add llama-eval (#21152)
ggerganov May 12, 2026
89730c8
model-conversion : add causal-convert-mmproj target [no ci] (#22969)
danbev May 12, 2026
239a497
ggml-webgpu: address precision issues for multimodal (#22808)
Constannnnnt May 12, 2026
927dada
ggml-webgpu: Enables running gpt-oss-20b (#22906)
yomaytk May 12, 2026
7bfe120
mtmd, server, common: expose modalities to /v1/models (#22952)
ngxson May 12, 2026
dded58b
webui: Fix Chat Screen Form box disappearing + autoscroll issues on W…
allozaur May 12, 2026
cce09f0
convert : fix Pixtral 12B --mistral-format conversion (3 bugs) (#22981)
fredzillman May 12, 2026
a9883db
opencl: add opt-in Adreno xmem F16xF32 GEMM for prefill (#22755)
happyyzy May 12, 2026
856c3ad
hexagon: eliminate scalar VTCM loads via HVX splat helpers (#22993)
trivikram-reddy1 May 13, 2026
61af07c
ggml-zendnn : adaptive fallback to CPU backend for small batch sizes …
z-sachin May 13, 2026
bcfe63f
llama-eval : enable type check (#22988)
CISC May 13, 2026
634275f
spec : update CLI arguments for better consistency (#22964)
ggerganov May 13, 2026
3796c94
ci: validate model naming convention (#22680)
ngxson May 13, 2026
5d44db6
server, webui: support continue generation on reasoning models (#22727)
ServeurpersoCom May 13, 2026
e75cd5e
download: do not exit() on error (#23008)
ngxson May 13, 2026
ad96bb8
hexagon: add unary tanh op (#22999)
max-krasnyansky May 13, 2026
7e16646
docs : Update OPENVINO.md (#22959)
ravi9 May 13, 2026
46be24d
webui: preserve system message on edit cancel (#22911)
ServeurpersoCom May 13, 2026
2dfeca3
webui: Deduplicate model aliases in data + handle single/multiple ali…
allozaur May 13, 2026
527045b
flush the gpu profile timestamp before the queryset is overflowed (#2…
yomaytk May 13, 2026
1e4579f
opencl: fix crash when warming up MoE on Adreno (#22876)
lhez May 13, 2026
95d469a
server, webui: accept continue_final_message flag for vLLM API compat…
ServeurpersoCom May 13, 2026
ec562eb
opencl: add q5_0 and q5_1 MoE for Adreno (#22985)
shaofeiqi May 13, 2026
7f3f843
Fix for issue #22974. Cast intermediate results to float before addin…
scutler-nv May 13, 2026
4c1c3ac
ggml-webgpu: only use subgroup-matrix path when head dims are divisib…
ArberSephirotheca May 13, 2026
9ed6e19
SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocat…
PMZFX May 14, 2026
320a6a4
fix: Autoscroll detection (#23026)
allozaur May 14, 2026
dbe7901
vulkan: fix matmul integer pipeline selection (#23005)
0cc4m May 14, 2026
42532af
unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr…
Kabir08 May 14, 2026
0f45f1a
docker : revert stable version of intel compute-runtime (#22968)
arthw May 14, 2026
81b0d88
ggml-cpu: Add IME2 Instruction Support for the SpacemiT Backend (#22863)
alex-spacemit May 14, 2026
67b2b7f
logs : reduce (#23021)
ggerganov May 14, 2026
253ba11
webui: Move static build output from repo code to HF Bucket (#22937)
allozaur May 14, 2026
97b658c
contributing: new contributors should not submit trivial fixes (#23045)
am17an May 14, 2026
0c3e4fc
fix: Propagate version tag to WebUI asset download in self-hosted CI …
allozaur May 14, 2026
5ec717d
ggml-webgpu: makes the flash attn vec path subgroup-aware (#23040)
ArberSephirotheca May 14, 2026
834a243
ggml-webgpu: Enable NVIDIA self-hosted CI (#22976)
reeselevine May 14, 2026
d81e63d
CI : support IOT device (IQ9) (#22987)
zhiyuan8 May 14, 2026
3e037f3
HIP: RDNA3 mma FA, faster AMD transpose, tune AMD (#22880)
JohannesGaessler May 14, 2026
5c0e946
ggml-hexagon: cpy: add contiguous fast-path in reshape copy (#23076)
pdhinaka May 14, 2026
7155a49
readme : update bindings (#23063)
KitaitiMakoto May 15, 2026
91e84fe
Support for Codex CLI by skipping unsupported Responses tools (#23041)
SidShaytay May 15, 2026
d528444
webui: preserve partial response on streaming error (#23090)
ServeurpersoCom May 15, 2026
ac33f03
reasoning-budget: clone should do a deep-copy (#23095)
am17an May 15, 2026
d5dc2e0
llama-eval : add AIME 2026 dataset support (#23058)
ggerganov May 15, 2026
769cc93
ci : fix transform of top . entry in release archive (#23080)
CISC May 15, 2026
cc7200b
Refactor: convert_hf_to_gguf.py (#17114)
pwilkin May 15, 2026
18d1717
convert : fix Qwen3 ASR conversion (#23081)
CISC May 15, 2026
8be1786
webui: fix theme from --webui-config-file not applied on first load (…
ServeurpersoCom May 15, 2026
72e60f5
mtmd: add chunks and fix preproc for qwen3a (#23073)
ngxson May 15, 2026
6831fe4
docs: document `usage` object in server timings response (#23110)
julien-c May 15, 2026
cfabeb1
tests: add BF16 non-contig coverage for MUL_MAT permutations (#22689)
ServeurpersoCom May 15, 2026
1348f67
webui: Use lowercase hash for HF checksum check (#23107)
ozars May 15, 2026
49d1701
ci : fix release symlinks (#23119)
CISC May 15, 2026
59778f0
ui: Restructure repo to use `tools/ui` folder and `ui` / `UI` / `llam…
allozaur May 16, 2026
42928bc
model : NvFP4 quantized LM head support (#23046)
ynankani May 16, 2026
1d9f99a
fix: Add build step using build workflow to publish workflow (#23134)
allozaur May 16, 2026
366c5e2
ui: untrack settings sync in props effect to prevent reactive loop (#…
ServeurpersoCom May 16, 2026
1428004
webui : [ChatFormActionAdd][a11y] fix accessibility issues in add men…
vignesh191 May 16, 2026
b81c2cd
ui: Fix handling of MCP resource template parameters (#23117)
kubawoo May 16, 2026
2555826
llama + spec: MTP Support (#22673)
am17an May 16, 2026
18675b6
vendor : update cpp-httplib to 0.45.0 (#23103)
cabelo May 16, 2026
25b1bc9
ui: Correct links in `tools/ui/README.md` [no ci] (#23139)
howlger May 16, 2026
2eb3e6b
ggml: install ggml.pc in <libdir>/pkgconfig (ggml/1480)
robUx4 May 10, 2026
560445b
metal : tighten input-position loop in kernel_conv_transpose_1d (ggml…
CrispStrobe May 10, 2026
e6c37a1
ggml : bump version to 0.12.0 (ggml/1494)
ggerganov May 16, 2026
3a92bc9
sync : ggml
ggerganov May 16, 2026
0253fb2
ui: Add request timeout for MCP tool calls (#23138)
allozaur May 16, 2026
6049906
vulkan: removed duplicate #include <memory> in headers (#23144)
winstonma May 16, 2026
64b38b5
server: skip device enumeration in router mode to avoid creating CUDA…
ServeurpersoCom May 16, 2026
b64739e
server: (router) alloc tmp buffer on heap (#23159)
ngxson May 16, 2026
4f13cb7
webui: support video files as input (#22830)
foldl May 17, 2026
a16cce8
ngram : reduce noisy logs (#23185)
ddh0 May 17, 2026
1a68ec9
server : honor --embd-normalize CLI arg (#23125)
rvernica May 17, 2026
3fbadb0
vulkan: fuse SSM_CONV + BIAS + SILU (#22653)
jeffbolznv May 17, 2026
f4cc787
common : enable streaming JSON argument values (#23173)
aldehir May 17, 2026
7ba22c6
vulkan: Support unaligned tensors for ROPE (#22637)
jeffbolznv May 17, 2026
fcae601
vulkan: add cpy bf16 -> f32 pipelines (#22677)
ServeurpersoCom May 17, 2026
a6d6183
ggml-vulkan/CMakeLists: add a check for SPIRV-Headers (#22009)
jeeb May 17, 2026
39cf5d6
common : delegate assistant continuation to underlying template handl…
aldehir May 17, 2026
3e12fbd
llama: avoid copying logits during prompt decode in MTP (#23198)
am17an May 17, 2026
84c6782
CUDA: Continue directly including cuda/iterator (#23102)
ORippler May 17, 2026
e0de4c2
cmake : do not install conversion script (#23204)
CISC May 17, 2026
8758904
cmake : fix LLAMA_BUILD_UI logic (#23190)
aldehir May 17, 2026
a9291ca
ggml-cpu: add rvv 512b,1024b impls for iq4_xs
taimur-10x Feb 13, 2026
81f13d1
ggml-cpu: refactor; add rvv 512b, 1024b impls for q6_K, i-quants
taimur-10x Feb 14, 2026
3db486a
ggml-cpu: refactor; add 512 and 1024 implementations of tq3_s, iq3_xx…
RehanQasim-dev Feb 24, 2026
20 changes: 13 additions & 7 deletions .devops/intel.Dockerfile
@@ -5,8 +5,15 @@ ARG ONEAPI_VERSION=2025.3.3-0-devel-ubuntu24.04
 FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS build
 
 ARG GGML_SYCL_F16=OFF
+ARG LEVEL_ZERO_VERSION=1.28.2
+ARG LEVEL_ZERO_UBUNTU_VERSION=u24.04
 RUN apt-get update && \
-    apt-get install -y git libssl-dev
+    apt-get install -y git libssl-dev wget ca-certificates && \
+    cd /tmp && \
+    wget -q "https://github.com/oneapi-src/level-zero/releases/download/v${LEVEL_ZERO_VERSION}/level-zero_${LEVEL_ZERO_VERSION}%2B${LEVEL_ZERO_UBUNTU_VERSION}_amd64.deb" -O level-zero.deb && \
+    wget -q "https://github.com/oneapi-src/level-zero/releases/download/v${LEVEL_ZERO_VERSION}/level-zero-devel_${LEVEL_ZERO_VERSION}%2B${LEVEL_ZERO_UBUNTU_VERSION}_amd64.deb" -O level-zero-devel.deb && \
+    apt-get -o Dpkg::Options::="--force-overwrite" install -y ./level-zero.deb ./level-zero-devel.deb && \
+    rm -f /tmp/level-zero.deb /tmp/level-zero-devel.deb
 
 WORKDIR /app
 
@@ -33,11 +40,11 @@ RUN mkdir -p /app/full \
 
 FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS base
 
-ARG IGC_VERSION=v2.30.1
-ARG IGC_VERSION_FULL=2_2.30.1+20950
-ARG COMPUTE_RUNTIME_VERSION=26.09.37435.1
-ARG COMPUTE_RUNTIME_VERSION_FULL=26.09.37435.1-0
-ARG IGDGMM_VERSION=22.9.0
+ARG IGC_VERSION=v2.20.5
+ARG IGC_VERSION_FULL=2_2.20.5+19972
+ARG COMPUTE_RUNTIME_VERSION=25.40.35563.10
+ARG COMPUTE_RUNTIME_VERSION_FULL=25.40.35563.10-0
+ARG IGDGMM_VERSION=22.8.2
 RUN mkdir /tmp/neo/ && cd /tmp/neo/ \
     && wget https://github.com/intel/intel-graphics-compiler/releases/download/$IGC_VERSION/intel-igc-core-${IGC_VERSION_FULL}_amd64.deb \
     && wget https://github.com/intel/intel-graphics-compiler/releases/download/$IGC_VERSION/intel-igc-opencl-${IGC_VERSION_FULL}_amd64.deb \
@@ -109,4 +116,3 @@ WORKDIR /app
 HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
 
 ENTRYPOINT [ "/app/llama-server" ]
-
2 changes: 1 addition & 1 deletion .devops/nix/package.nix
@@ -103,6 +103,7 @@ let
     vulkan-headers
     vulkan-loader
     shaderc
+    spirv-headers
   ];
 in
 
@@ -146,7 +147,6 @@ effectiveStdenv.mkDerivation (finalAttrs: {
       ninja
       pkg-config
       git
-      spirv-headers
     ]
     ++ optionals useCuda [
       cudaPackages.cuda_nvcc
10 changes: 1 addition & 9 deletions .editorconfig
@@ -45,15 +45,7 @@ insert_final_newline = unset
 trim_trailing_whitespace = unset
 insert_final_newline = unset
 
-[tools/server/webui/**]
-indent_style = unset
-indent_size = unset
-end_of_line = unset
-charset = unset
-trim_trailing_whitespace = unset
-insert_final_newline = unset
-
-[tools/server/public/**]
+[tools/ui/**]
 indent_style = unset
 indent_size = unset
 end_of_line = unset
4 changes: 0 additions & 4 deletions .gitattributes

This file was deleted.

5 changes: 2 additions & 3 deletions .github/labeler.yml
@@ -73,11 +73,10 @@ android:
   - changed-files:
     - any-glob-to-any-file:
       - examples/llama.android/**
-server/webui:
+server/ui:
   - changed-files:
     - any-glob-to-any-file:
-      - tools/server/webui/**
-      - tools/server/public/**
+      - tools/ui/**
 server:
   - changed-files:
     - any-glob-to-any-file:
46 changes: 39 additions & 7 deletions .github/workflows/build-and-test-snapdragon.yml
@@ -58,14 +58,45 @@ jobs:
           name: llama-cpp-android-arm64-snapdragon
           path: pkg-snapdragon/llama.cpp
 
+  linux-iot-snapdragon:
+    runs-on: ubuntu-latest
+    container:
+      image: 'ghcr.io/snapdragon-toolchain/arm64-linux:v0.1'
+    defaults:
+      run:
+        shell: bash
+
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          fetch-depth: 0
+          lfs: false
+
+      - name: Build Llama.CPP for Snapdragon Linux IoT
+        id: build_llama_cpp_snapdragon_linux
+        run: |
+          cp docs/backend/snapdragon/CMakeUserPresets.json .
+          cmake --preset arm64-linux-snapdragon-release -B build-snapdragon -DGGML_OPENCL=ON
+          cmake --build build-snapdragon -j $(nproc)
+          cmake --install build-snapdragon --prefix pkg-snapdragon/llama.cpp
+
+      - name: Upload Llama.CPP Snapdragon Linux IoT Build Artifact
+        if: ${{ always() && steps.build_llama_cpp_snapdragon_linux.outcome == 'success' }}
+        uses: actions/upload-artifact@v6
+        with:
+          name: llama-cpp-linux-arm64-snapdragon
+          path: pkg-snapdragon/llama.cpp
+
   test-snapdragon-qdc:
-    name: Test on QDC Android Device (${{ matrix.device }})
-    needs: [android-ndk-snapdragon]
-    runs-on: ubuntu-slim
+    name: Test on QDC Device (${{ matrix.device }})
+    needs: [android-ndk-snapdragon, linux-iot-snapdragon]
+    runs-on: ubuntu-24.04-arm
+    timeout-minutes: 90
     strategy:
       fail-fast: false
       matrix:
-        device: [SM8750, SM8650, SM8850]
+        device: [SM8750, SM8850, QCS9075M]
 
     steps:
       - name: Checkout
@@ -74,11 +105,11 @@ jobs:
       - name: Download build artifact
         uses: actions/download-artifact@v7
         with:
-          name: llama-cpp-android-arm64-snapdragon
+          name: ${{ startsWith(matrix.device, 'QCS') && 'llama-cpp-linux-arm64-snapdragon' || 'llama-cpp-android-arm64-snapdragon' }}
           path: pkg-snapdragon/llama.cpp
 
       - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
         with:
           python-version: '3.x'
           cache: pip
@@ -107,7 +138,8 @@ jobs:
             --test all \
             --pkg-dir pkg-snapdragon/llama.cpp \
             --model-url "https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_0.gguf" \
-            --device ${{ matrix.device }}
+            --device ${{ matrix.device }} \
+            ${{ startsWith(matrix.device, 'QCS') && '--retries 2 --retry-delay 300' || '' }}
         env:
           QDC_API_KEY: ${{ secrets.QDC_API_KEY }}
 
5 changes: 3 additions & 2 deletions .github/workflows/build-cross.yml
@@ -301,16 +301,17 @@ jobs:
export RISCV_ROOT_PATH=${PWD}/spacemit_toolchain
cmake -B build -DLLAMA_OPENSSL=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_OPENMP=OFF \
-DLLAMA_BUILD_EXAMPLES=ON \
-DGGML_CPU_REPACK=OFF \
-DLLAMA_BUILD_TOOLS=ON \
-DLLAMA_BUILD_TESTS=OFF \
-DGGML_CPU_RISCV64_SPACEMIT=ON \
-DGGML_RVV=ON \
-DGGML_RV_ZVFH=ON \
-DGGML_RV_ZFH=ON \
-DGGML_RV_ZICBOP=ON \
-DGGML_RV_ZIHINTPAUSE=ON \
-DRISCV64_SPACEMIT_IME_SPEC=RISCV64_SPACEMIT_IME1 \
-DGGML_RV_ZBA=ON \
-DCMAKE_TOOLCHAIN_FILE=${PWD}/cmake/riscv64-spacemit-linux-gnu-gcc.cmake

cmake --build build --config Release -j $(nproc)
93 changes: 67 additions & 26 deletions .github/workflows/build-self-hosted.yml
@@ -55,7 +55,24 @@ env:
   LLAMA_LOG_TIMESTAMPS: 1
 
 jobs:
+  determine-tag:
+    name: Determine tag name
+    runs-on: ubuntu-slim
+    outputs:
+      tag_name: ${{ steps.tag.outputs.name }}
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          fetch-depth: 0
+      - name: Determine tag name
+        id: tag
+        uses: ./.github/actions/get-tag-name
+        env:
+          BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
+
   ggml-ci-nvidia-cuda:
+    needs: determine-tag
     runs-on: [self-hosted, Linux, NVIDIA]
 
     steps:
@@ -65,11 +82,14 @@ jobs:
 
       - name: Test
         id: ggml-ci
+        env:
+          HF_UI_VERSION: ${{ needs.determine-tag.outputs.tag_name }}
         run: |
           nvidia-smi
           GG_BUILD_CUDA=1 bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
 
   ggml-ci-nvidia-vulkan-cm:
+    needs: determine-tag
     runs-on: [self-hosted, Linux, NVIDIA]
 
     steps:
@@ -79,11 +99,14 @@ jobs:
 
       - name: Test
         id: ggml-ci
+        env:
+          HF_UI_VERSION: ${{ needs.determine-tag.outputs.tag_name }}
         run: |
           vulkaninfo --summary
           GG_BUILD_VULKAN=1 GGML_VK_DISABLE_COOPMAT2=1 bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
 
   ggml-ci-nvidia-vulkan-cm2:
+    needs: determine-tag
     runs-on: [self-hosted, Linux, NVIDIA, COOPMAT2]
 
     steps:
@@ -93,39 +116,40 @@ jobs:
 
       - name: Test
         id: ggml-ci
+        env:
+          HF_UI_VERSION: ${{ needs.determine-tag.outputs.tag_name }}
         run: |
           vulkaninfo --summary
           GG_BUILD_VULKAN=1 bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
 
-  # TODO: investigate slight precision issues in some operations for test-backend-ops on the WebGPU backend.
-  #ggml-ci-nvidia-webgpu:
-    # runs-on: [self-hosted, Linux, NVIDIA]
+  ggml-ci-nvidia-webgpu:
+    runs-on: [self-hosted, Linux, NVIDIA]
 
-    # steps:
-      # - name: Clone
-        # id: checkout
-        # uses: actions/checkout@v6
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v6
 
-      # - name: Dawn Dependency
-        # id: dawn-depends
-        # run: |
-          # DAWN_VERSION="v20260317.182325"
-          # DAWN_OWNER="google"
-          # DAWN_REPO="dawn"
-          # DAWN_ASSET_NAME="Dawn-18eb229ef5f707c1464cc581252e7603c73a3ef0-ubuntu-latest-Release"
-          # echo "Fetching release asset from https://github.com/google/dawn/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.tar.gz"
-          # curl -L -o artifact.tar.gz \
-            # "https://github.com/google/dawn/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.tar.gz"
-          # mkdir dawn
-          # tar -xvf artifact.tar.gz -C dawn --strip-components=1
+      - name: Dawn Dependency
+        id: dawn-depends
+        run: |
+          DAWN_VERSION="v20260317.182325"
+          DAWN_OWNER="google"
+          DAWN_REPO="dawn"
+          DAWN_ASSET_NAME="Dawn-18eb229ef5f707c1464cc581252e7603c73a3ef0-ubuntu-latest-Release"
+          echo "Fetching release asset from https://github.com/google/dawn/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.tar.gz"
+          curl -L -o artifact.tar.gz \
+            "https://github.com/google/dawn/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.tar.gz"
+          mkdir dawn
+          tar -xvf artifact.tar.gz -C dawn --strip-components=1
 
-      # - name: Test
-        # id: ggml-ci
-        # run: |
-          # GG_BUILD_WEBGPU=1 \
-          # GG_BUILD_WEBGPU_DAWN_PREFIX="$GITHUB_WORKSPACE/dawn" \
-          # GG_BUILD_WEBGPU_DAWN_DIR="$GITHUB_WORKSPACE/dawn/lib64/cmake/Dawn" \
-          # bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
+      - name: Test
+        id: ggml-ci
+        run: |
+          GG_BUILD_WEBGPU=1 \
+          GG_BUILD_WEBGPU_DAWN_PREFIX="$GITHUB_WORKSPACE/dawn" \
+          GG_BUILD_WEBGPU_DAWN_DIR="$GITHUB_WORKSPACE/dawn/lib64/cmake/Dawn" \
+          bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
 
   # TODO: provision AMX-compatible machine
   #ggml-ci-cpu-amx:
@@ -172,6 +196,7 @@ jobs:
           # GG_BUILD_ROCM=1 GG_BUILD_AMDGPU_TARGETS="gfx1101" bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
 
   ggml-ci-mac-metal:
+    needs: determine-tag
     runs-on: [self-hosted, macOS, ARM64]
 
     steps:
@@ -181,10 +206,13 @@ jobs:
 
       - name: Test
         id: ggml-ci
+        env:
+          HF_UI_VERSION: ${{ needs.determine-tag.outputs.tag_name }}
         run: |
           GG_BUILD_METAL=1 bash ./ci/run.sh ~/results/llama.cpp ~/mnt/llama.cpp
 
   ggml-ci-mac-webgpu:
+    needs: determine-tag
     runs-on: [self-hosted, macOS, ARM64]
 
     steps:
@@ -207,11 +235,14 @@ jobs:
 
       - name: Test
         id: ggml-ci
+        env:
+          HF_UI_VERSION: ${{ needs.determine-tag.outputs.tag_name }}
         run: |
           GG_BUILD_WEBGPU=1 GG_BUILD_WEBGPU_DAWN_PREFIX="$GITHUB_WORKSPACE/dawn" \
             bash ./ci/run.sh ~/results/llama.cpp ~/mnt/llama.cpp
 
   ggml-ci-mac-vulkan:
+    needs: determine-tag
     runs-on: [self-hosted, macOS, ARM64]
 
     steps:
@@ -221,11 +252,14 @@ jobs:
 
       - name: Test
         id: ggml-ci
+        env:
+          HF_UI_VERSION: ${{ needs.determine-tag.outputs.tag_name }}
         run: |
           vulkaninfo --summary
           GG_BUILD_VULKAN=1 bash ./ci/run.sh ~/results/llama.cpp ~/mnt/llama.cpp
 
   ggml-ci-linux-intel-vulkan:
+    needs: determine-tag
     runs-on: [self-hosted, Linux, Intel]
 
     steps:
@@ -237,11 +271,14 @@ jobs:
 
       - name: Test
         id: ggml-ci
+        env:
+          HF_UI_VERSION: ${{ needs.determine-tag.outputs.tag_name }}
         run: |
           vulkaninfo --summary
           GG_BUILD_VULKAN=1 bash ./ci/run.sh ~/results/llama.cpp ~/mnt/llama.cpp
 
   ggml-ci-win-intel-vulkan:
+    needs: determine-tag
     runs-on: [self-hosted, Windows, X64, Intel]
 
     steps:
@@ -256,13 +293,15 @@ jobs:
           MSYSTEM: UCRT64
           CHERE_INVOKING: 1
           PATH: C:\msys64\ucrt64\bin;C:\msys64\usr\bin;C:\Windows\System32;${{ env.PATH }}
+          HF_UI_VERSION: ${{ needs.determine-tag.outputs.tag_name }}
         run: |
           vulkaninfo --summary
           # Skip python related tests with GG_BUILD_LOW_PERF=1 since Windows MSYS2 UCRT64 currently fails to create
           # a valid python environment for testing
           LLAMA_FATAL_WARNINGS=OFF GG_BUILD_NINJA=1 GG_BUILD_VULKAN=1 GG_BUILD_LOW_PERF=1 ./ci/run.sh ./results/llama.cpp ./mnt/llama.cpp
 
   ggml-ci-intel-openvino-gpu-low-perf:
+    needs: determine-tag
     runs-on: [self-hosted, Linux, Intel, OpenVINO]
 
     concurrency:
@@ -294,6 +333,8 @@ jobs:
 
       - name: Test
         id: ggml-ci
+        env:
+          HF_UI_VERSION: ${{ needs.determine-tag.outputs.tag_name }}
         run: |
           source ./openvino_toolkit/setupvars.sh
           GG_BUILD_OPENVINO=1 GGML_OPENVINO_DEVICE=GPU GG_BUILD_LOW_PERF=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt