
Cross-platform, energy-efficient kernels, runtime, and AI inference engine for mobile devices.
┌─────────────────┐
│ Cactus FFI │ ←── OpenAI compatible C API for integration (tools, RAG, cloud handoff)
└─────────────────┘
│
┌─────────────────┐
│ Cactus Engine │ ←── High-level transformer engine (NPU support, INT4/INT8/FP16/MIXED)
└─────────────────┘
│
┌─────────────────┐
│ Cactus Models │ ←── Implements SOTA models using Cactus Graphs
└─────────────────┘
│
┌─────────────────┐
│ Cactus Graph │ ←── Unified zero-copy computation graph (think NumPy for mobile)
└─────────────────┘
│
┌─────────────────┐
│ Cactus Kernels │ ←── Low-level ARM-specific SIMD operations (think CUDA for mobile)
└─────────────────┘
```cpp
#include "cactus.h"

CactusGraph graph;

// Declare graph inputs with their shapes and storage precisions
auto a = graph.input({2, 3}, Precision::FP16);
auto b = graph.input({3, 4}, Precision::INT8);

// Build the computation; precisions can be mixed freely within one graph
auto x1 = graph.matmul(a, b, false);
auto x2 = graph.transpose(x1);
auto result = graph.matmul(b, x2, true);

// Bind host buffers to the inputs, then run the graph
float a_data[6] = {1.1f, 2.3f, 3.4f, 4.2f, 5.7f, 6.8f};
float b_data[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
graph.set_input(a, a_data, Precision::FP16);
graph.set_input(b, b_data, Precision::INT8);
graph.execute();

// Read back the result buffer, then clear the graph
void* output_data = graph.get_output(result);
graph.hard_reset();
```
```cpp
#include "cactus.h"

cactus_set_pro_key(""); // email founders@cactuscompute.com for optional key

cactus_model_t model = cactus_init(
    "path/to/weight/folder",          // see the CLI section below for generating weights
    "txt/or/md/file/or/dir/with/many" // nullptr if none; Cactus does automatic fast RAG
);
const char* messages = R"([
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "My name is Henry Ndubuaku"}
])";
const char* options = R"({
"max_tokens": 50,
"stop_sequences": ["<|im_end|>"]
})";
char response[4096];
int result = cactus_complete(
    model,            // model handle from cactus_init
    messages,         // JSON array of chat messages
    response,         // buffer to store response JSON
    sizeof(response), // size of response buffer
    options,          // optional: generation options (nullptr for defaults)
    nullptr,          // optional: tools JSON for function calling
    nullptr,          // optional: streaming callback fn(token, id, user_data)
    nullptr           // optional: user data passed to callback
);
```
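The last two parameters enable token streaming. Below is a minimal sketch assuming the `fn(token, id, user_data)` shape noted in the comment above; the exact callback typedef lives in the Cactus headers, so treat the signature and names here as illustrative rather than definitive.

```cpp
#include <cstdio>
#include <string>

// Illustrative callback matching the fn(token, id, user_data) shape described above;
// check the Cactus FFI header for the exact typedef before relying on it.
void on_token(const char* token, int token_id, void* user_data) {
    auto* transcript = static_cast<std::string*>(user_data); // hypothetical accumulator
    transcript->append(token);  // keep the full text for later use
    std::printf("%s", token);   // surface tokens to the UI/log as they arrive
    std::fflush(stdout);
}

// Usage: pass the callback and your user data as the last two arguments
std::string transcript;
int rc = cactus_complete(model, messages, response, sizeof(response),
                         options, nullptr, on_token, &transcript);
```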
Example response from Gemma3-270m:

```jsonc
{
  "success": true,                 // true when generation succeeded locally
  "error": null,                   // specific error message when success = false
  "cloud_handoff": false,          // true when the model is unconfident; simply route to cloud
  "response": "Hi there!",         // null when error is set or cloud_handoff = true
  "function_calls": [],            // parsed tool calls, e.g. [{"name":"set_alarm","arguments":{"hour":"10","minute":"0"}}]
  "confidence": 0.8193,            // how confident the model is in its response
  "time_to_first_token_ms": 45.23, // latency to first token
  "total_time_ms": 163.67,         // total execution time
  "prefill_tps": 1621.89,          // prefill tokens per second
  "decode_tps": 168.42,            // decode tokens per second
  "ram_usage_mb": 245.67,          // current process RAM usage in MB
  "prefill_tokens": 28,
  "decode_tokens": 50,
  "total_tokens": 78
}
```
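One way to consume this result: when `success` is false or `cloud_handoff` is true, forward the same messages to your cloud model of choice. The sketch below is self-contained and uses naive string probes instead of a real JSON parser; the field names are taken from the schema above, and a production app would parse the JSON properly.

```cpp
#include <cstdio>
#include <cstring>

// Sketch only: probe the two routing fields documented above.
bool should_route_to_cloud(const char* response_json) {
    bool success = std::strstr(response_json, "\"success\": true") != nullptr;
    bool handoff = std::strstr(response_json, "\"cloud_handoff\": true") != nullptr;
    return !success || handoff;
}

int main() {
    // Stand-in for the buffer filled by cactus_complete, mirroring the schema above
    const char* response = R"({"success": true, "cloud_handoff": false, "response": "Hi there!"})";
    if (should_route_to_cloud(response)) {
        std::puts("route the same messages to your cloud LLM");
    } else {
        std::puts("use the on-device response");
    }
    return 0;
}
```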
- Models: LFM2-VL-450m & Whisper-Small
- Precision: Cactus smartly blends INT4, INT8 and FP16 across the weights.
- Decode = tokens/sec, P/D = prefill/decode, VLM = 256×256 image, STT = 30s audio
- Cactus Pro: uses the NPU for realtime and large-context workloads (Apple only for now); scores are marked with *
| Device | Short Decode | 4k-P/D | VLM-TTFT | VLM-Dec | STT-TTFT | STT-Dec |
|---|---|---|---|---|---|---|
| Mac M4 Pro | 170 | 989/150 | 0.2s/0.1s* | 168 | 0.9s/0.2s* | 92 |
| Mac M3 Pro | 140 | 890/123 | 0.3s/0.1s* | 149 | 1.5s/0.4s* | 81 |
| iPad/Mac M4 | 134 | 603/106 | 0.3s/0.1s* | 129 | 1.8s/0.3s* | 70 |
| iPad/Mac M3 | 117 | 525/93 | 0.4s/0.1s* | 111 | 2.8s/0.7s* | 61 |
| iPhone 17 Pro | 126 | 428/84 | 0.5s/0.1s* | 120 | 3.0s/0.6s* | 80 |
| iPhone 16 Pro | 106 | 380/81 | 0.6s/0.2s* | 101 | 4.3s/0.7s* | 75 |
| iPhone 15 Pro | 90 | 330/75 | 0.7s/0.3s* | 92 | 4.5s/0.8s* | 70 |
| Galaxy S25 Ultra | 80 | 355/52 | 0.7s | 70 | 3.6s/- | 32 |
| Nothing 3 | 56 | 320/46 | 0.8s | 54 | 4.5s | 55 |
| Pixel 6a | 25 | 108/24 | 2.3s | 25 | 9.6s | 15 |
| Raspberry Pi 5 | 20 | 292/18 | 1.7s | 23 | 15s | 16 |
- Cactus smartly and compactly blends INT4, INT8 and FP16 across a model's weights.
- You can still quantize everything at a single precision, but mixed is optimal.
| Model | Zipped Size | Completion | Tools | Vision | Embed | Speech | Pro |
|---|---|---|---|---|---|---|---|
| google/gemma-3-270m-it | 252MB | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| google/functiongemma-270m-it | 252MB | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| openai/whisper-small | 283MB | ✗ | ✗ | ✗ | ✓ | ✓ | Apple |
| LiquidAI/LFM2-350M | 244MB | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| LiquidAI/LFM2-VL-450M | 448MB | ✓ | ✗ | ✓ | ✓ | ✗ | Apple |
| nomic-ai/nomic-embed-text-v2-moe | 451MB | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Qwen/Qwen3-0.6B | 514MB | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| Qwen/Qwen3-Embedding-0.6B | 514MB | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| LiquidAI/LFM2-700M | 498MB | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| google/gemma-3-1b-it | 642MB | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| LiquidAI/LFM2.5-1.2B-Instruct | 474MB | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| LiquidAI/LFM2-1.2B-RAG | 474MB | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| LiquidAI/LFM2-1.2B-Tool | 474MB | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| openai/whisper-medium | 658MB | ✗ | ✗ | ✗ | ✓ | ✓ | Apple |
| LiquidAI/LFM2.5-VL-1.6B | 954MB | ✓ | ✗ | ✓ | ✓ | ✗ | Apple |
| Qwen/Qwen3-1.7B | 749MB | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
```bash
git clone https://github.com/cactus-compute/cactus && cd cactus && source ./setup
```

- `[model]` is a HuggingFace name from the table above (default: `google/gemma-3-270m-it`).
- Common flags: `--precision INT4|INT8|FP16` (default: INT4), `--token <hf_token>`.
- Always run `source ./setup` in any new terminal.
| Command | Description |
|---|---|
| `cactus run [model]` | Opens playground (auto-downloads model) |
| `cactus download [model]` | Downloads model to `./weights` (see the sketch below) |
| `cactus convert [model] [dir]` | Converts model, supports LoRA merging (`--lora <path>`) |
| `cactus build` | Builds for ARM (`--apple` or `--android`) |
| `cactus test` | Runs tests (`--ios` / `--android`, `--model [name/path]`) |
| `cactus clean` | Removes build artifacts |
| `cactus --help` | Shows all commands and flags |
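To tie the CLI back to the FFI example above: after downloading or converting a model, point `cactus_init` at the resulting weights folder. A minimal sketch; the exact subdirectory name under `./weights` depends on the model you fetched, so the path below is illustrative only.

```cpp
#include "cactus.h"

// Illustrative path: adjust to wherever `cactus download` or `cactus convert`
// placed the weights on your machine.
cactus_model_t model = cactus_init(
    "./weights/gemma-3-270m-it", // hypothetical folder name for the default model
    nullptr                      // no RAG corpus
);
```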