This document describes the GGML (Georgi Gerganov Machine Learning) low-level tensor library bindings available in gollama.cpp.
GGML is the tensor library that powers llama.cpp. While most users will interact with the high-level llama.cpp API (gollama.go), the GGML bindings (goggml.go) provide direct access to low-level tensor operations and backend management for advanced use cases.
Important Note: GGML functions may not be exported in all llama.cpp builds. The library handles missing functions gracefully. Code still compiles and runs when GGML symbols are absent, and calls to unavailable functions return an error instead of panicking.
Use GGML bindings when you need:
- Type Information: Query tensor type sizes, block sizes, or quantization status
- Backend Management: Enumerate available compute backends (CPU, GPU, etc.)
- Memory Management: Direct buffer allocation and management
- Quantization: Access to low-level quantization utilities
- Advanced Integration: Building custom tensor operations or tools
For most LLM inference tasks, use the high-level llama.cpp API in gollama.go.
GGML supports various data types for tensors:

Floating-point types:
- `GGML_TYPE_F32` - 32-bit float (4 bytes)
- `GGML_TYPE_F16` - 16-bit float (2 bytes)
- `GGML_TYPE_F64` - 64-bit float (8 bytes)
- `GGML_TYPE_BF16` - BFloat16 (2 bytes)

Integer types:
- `GGML_TYPE_I8` - 8-bit integer (1 byte)
- `GGML_TYPE_I16` - 16-bit integer (2 bytes)
- `GGML_TYPE_I32` - 32-bit integer (4 bytes)
- `GGML_TYPE_I64` - 64-bit integer (8 bytes)

Quantized types:
- `GGML_TYPE_Q4_0`, `GGML_TYPE_Q4_1` - 4-bit quantization
- `GGML_TYPE_Q5_0`, `GGML_TYPE_Q5_1` - 5-bit quantization
- `GGML_TYPE_Q8_0`, `GGML_TYPE_Q8_1` - 8-bit quantization
- `GGML_TYPE_Q2_K` - 2-bit K-quant
- `GGML_TYPE_Q3_K` - 3-bit K-quant
- `GGML_TYPE_Q4_K` - 4-bit K-quant
- `GGML_TYPE_Q5_K` - 5-bit K-quant
- `GGML_TYPE_Q6_K` - 6-bit K-quant
- `GGML_TYPE_Q8_K` - 8-bit K-quant

Importance quantization (IQ) types:
- `GGML_TYPE_IQ1_S`, `GGML_TYPE_IQ1_M` - 1-bit importance quantization
- `GGML_TYPE_IQ2_XXS`, `GGML_TYPE_IQ2_XS`, `GGML_TYPE_IQ2_S` - 2-bit IQ variants
- `GGML_TYPE_IQ3_XXS`, `GGML_TYPE_IQ3_S` - 3-bit IQ variants
- `GGML_TYPE_IQ4_NL`, `GGML_TYPE_IQ4_XS` - 4-bit IQ variants
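The bit counts in these type names refer only to the quantized values. Each block also stores metadata such as a scale, so the real storage cost per weight is slightly higher.

The following minimal sketch shows that arithmetic. It assumes the block layouts of current ggml sources: Q4_0 and Q8_0 each pack 32 values with one 2-byte fp16 scale, giving 18 and 34 bytes per block. The `typeLayout` table is hypothetical and illustrative only. At runtime, take these numbers from `Ggml_type_size` and `Ggml_blck_size` instead of hard-coding them.

```go
package main

import "fmt"

// typeLayout mirrors the two per-type numbers ggml exposes: bytes per block
// (what Ggml_type_size reports) and elements per block (Ggml_blck_size).
type typeLayout struct {
	name      string
	typeSize  uint64 // bytes per block
	blockSize uint64 // elements per block (1 for non-quantized types)
}

// bitsPerWeight returns the average number of bits used to store one element,
// including per-block metadata such as scales.
func bitsPerWeight(l typeLayout) float64 {
	return float64(l.typeSize*8) / float64(l.blockSize)
}

func main() {
	// Assumed values based on ggml's block definitions; query the library
	// at runtime instead of relying on this table.
	layouts := []typeLayout{
		{name: "f32", typeSize: 4, blockSize: 1},
		{name: "f16", typeSize: 2, blockSize: 1},
		{name: "q8_0", typeSize: 34, blockSize: 32}, // 32 int8 values + fp16 scale
		{name: "q4_0", typeSize: 18, blockSize: 32}, // 32 4-bit values (16 bytes) + fp16 scale
	}
	for _, l := range layouts {
		fmt.Printf("%-5s %6.2f bits per weight\n", l.name, bitsPerWeight(l))
	}
}
```

With these numbers, Q4_0 costs 4.5 bits per weight and Q8_0 costs 8.5. In both cases the fp16 scale adds 16 bits for every 32 weights.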
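`func Ggml_type_size(typ GgmlType) (uint64, error)`

Returns the size in bytes of one block of a GGML type. For non-quantized types a block is a single element, so this is the element size. For quantized types it is the size of a whole block; for example, a 32-element Q4_0 block occupies 18 bytes.

Example:

```go
size, err := gollama.Ggml_type_size(gollama.GGML_TYPE_F32)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("F32 size: %d bytes\n", size) // Output: F32 size: 4 bytes
```

`func Ggml_blck_size(typ GgmlType) (int32, error)`

Returns the block size of a GGML type, that is, the number of elements stored together in one block. It is 1 for non-quantized types and larger for quantized types (32 for Q4_0, for example).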
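Because `Ggml_type_size` counts bytes per block, storing n elements takes `type_size * n / blck_size` bytes. For quantized types, n must be a multiple of the block size. ggml's C function `ggml_row_size` uses the same formula, but this document does not list a Go binding for it.

The minimal sketch below uses `rowSize`, a hypothetical helper. It hard-codes the assumed Q4_0 numbers (18 bytes per 32-element block) rather than querying them.

```go
package main

import (
	"errors"
	"fmt"
)

// rowSize returns the bytes needed for n elements of a type whose blocks
// hold blockSize elements in typeSize bytes. Quantized data can only be
// stored in whole blocks, so n must be a multiple of blockSize.
func rowSize(typeSize, blockSize, n uint64) (uint64, error) {
	if blockSize == 0 {
		return 0, errors.New("block size must be positive")
	}
	if n%blockSize != 0 {
		return 0, fmt.Errorf("%d elements is not a multiple of block size %d", n, blockSize)
	}
	// Same result as typeSize*n/blockSize, but dividing first avoids overflow.
	return n / blockSize * typeSize, nil
}

func main() {
	// F32: 4 bytes per 1-element block. Q4_0: assumed 18 bytes per 32-element block.
	cases := []struct {
		name                string
		typeSize, blockSize uint64
	}{
		{"f32", 4, 1},
		{"q4_0", 18, 32},
	}
	for _, c := range cases {
		size, err := rowSize(c.typeSize, c.blockSize, 4096)
		if err != nil {
			fmt.Println(c.name, "error:", err)
			continue
		}
		fmt.Printf("%s: 4096 elements need %d bytes\n", c.name, size)
	}

	// 100 is not a multiple of 32, so a Q4_0 row of that length is rejected.
	if _, err := rowSize(18, 32, 100); err != nil {
		fmt.Println("q4_0:", err)
	}
}
```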
`func Ggml_type_is_quantized(typ GgmlType) (bool, error)`

Returns whether a GGML type is quantized.

Example:

```go
isQuantized, err := gollama.Ggml_type_is_quantized(gollama.GGML_TYPE_Q4_0)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("Q4_0 is quantized: %v\n", isQuantized) // Output: Q4_0 is quantized: true
```

`func Ggml_type_name(typ GgmlType) (string, error)`

Returns the string name of a GGML type.
Example:
```go
name, err := gollama.Ggml_type_name(gollama.GGML_TYPE_F32)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("Type name: %s\n", name) // Output: Type name: f32
```

`func Ggml_backend_dev_count() (uint64, error)`

Returns the number of available backend devices.
Example:
```go
count, err := gollama.Ggml_backend_dev_count()
if err != nil {
	log.Fatal(err)
}
fmt.Printf("Found %d backend device(s)\n", count)
```

`func Ggml_backend_dev_get(index uint64) (GgmlBackendDevice, error)`

Returns the backend device at the given index (from 0 to count-1).
`func Ggml_backend_dev_name(device GgmlBackendDevice) (string, error)`

Returns the name of a backend device.

Example:

```go
count, _ := gollama.Ggml_backend_dev_count() // error ignored for brevity
for i := uint64(0); i < count; i++ {
	dev, err := gollama.Ggml_backend_dev_get(i)
	if err != nil {
		continue
	}
	name, err := gollama.Ggml_backend_dev_name(dev)
	if err != nil {
		continue
	}
	fmt.Printf("Device %d: %s\n", i, name)
}
```

`func Ggml_backend_dev_description(device GgmlBackendDevice) (string, error)`

Returns the description of a backend device.
`func Ggml_backend_dev_memory(device GgmlBackendDevice) (free uint64, total uint64, err error)`

Returns the memory statistics of a backend device: free and total memory, in bytes. Not every device supports this query, so check `err` before using the values.
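The two values returned by `Ggml_backend_dev_memory` are raw byte counts. The small formatting sketch below uses `formatMemory`, a hypothetical helper that is not part of gollama.cpp. It reports binary units (1 MiB = 1024 × 1024 bytes) and treats a zero total as "no information".

```go
package main

import "fmt"

const mib = 1024 * 1024

// formatMemory renders the free/total byte counts reported for a device as
// MiB, plus the percentage in use.
func formatMemory(free, total uint64) string {
	if total == 0 {
		// Treat a zero total as "no information" rather than dividing by zero.
		return "memory information unavailable"
	}
	if free > total {
		free = total // guard against inconsistent reports before subtracting
	}
	used := 100 * float64(total-free) / float64(total)
	return fmt.Sprintf("%.2f MiB free / %.2f MiB total (%.1f%% used)",
		float64(free)/mib, float64(total)/mib, used)
}

func main() {
	// Hypothetical values; in practice pass the results of
	// gollama.Ggml_backend_dev_memory(dev).
	fmt.Println(formatMemory(512*mib, 2048*mib))
	fmt.Println(formatMemory(0, 0))
}
```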
`func Ggml_backend_cpu_buffer_type() (GgmlBackendBufferType, error)`

Returns the CPU buffer type.

`func Ggml_backend_buffer_free(buffer GgmlBackendBuffer) error`

Frees a backend buffer.

`func Ggml_backend_buffer_get_size(buffer GgmlBackendBuffer) (uint64, error)`

Returns the size of a backend buffer in bytes.

`func Ggml_backend_buffer_is_host(buffer GgmlBackendBuffer) (bool, error)`

Checks whether a buffer is in host memory (RAM).

`func Ggml_backend_buffer_name(buffer GgmlBackendBuffer) (string, error)`

Returns the name of a backend buffer.
Here's a comprehensive example using GGML bindings:

```go
package main

import (
	"fmt"
	"log"

	"github.com/dianlight/gollama.cpp"
)

func main() {
	// Initialize the library
	if err := gollama.Backend_init(); err != nil {
		log.Fatal(err)
	}
	defer gollama.Backend_free()

	// Query type information
	fmt.Println("=== Type Information ===")
	types := []gollama.GgmlType{
		gollama.GGML_TYPE_F32,
		gollama.GGML_TYPE_F16,
		gollama.GGML_TYPE_Q4_0,
		gollama.GGML_TYPE_Q8_0,
	}

	for _, typ := range types {
		// Get type size (bytes per block; equal to the element size for non-quantized types)
		size, err := gollama.Ggml_type_size(typ)
		if err != nil {
			fmt.Printf("Type %s: size unavailable\n", typ.String())
			continue
		}

		// Check if quantized
		isQuant, _ := gollama.Ggml_type_is_quantized(typ)

		// Get type name
		name, _ := gollama.Ggml_type_name(typ)

		fmt.Printf("Type: %-10s | Size: %2d bytes | Quantized: %v | Name: %s\n",
			typ.String(), size, isQuant, name)
	}

	// Enumerate backend devices
	fmt.Println("\n=== Backend Devices ===")
	count, err := gollama.Ggml_backend_dev_count()
	if err != nil {
		fmt.Println("Backend device enumeration not available")
		return
	}
	if count == 0 {
		fmt.Println("No backend devices available")
		return
	}

	for i := uint64(0); i < count; i++ {
		dev, err := gollama.Ggml_backend_dev_get(i)
		if err != nil {
			continue
		}
		name, err := gollama.Ggml_backend_dev_name(dev)
		if err != nil {
			continue
		}
		desc, _ := gollama.Ggml_backend_dev_description(dev)

		fmt.Printf("Device %d: %s\n", i, name)
		if desc != "" {
			fmt.Printf("  Description: %s\n", desc)
		}

		// Try to get memory info (may not be supported)
		free, total, err := gollama.Ggml_backend_dev_memory(dev)
		if err == nil {
			fmt.Printf("  Memory: %.2f MiB free / %.2f MiB total\n",
				float64(free)/(1024*1024),
				float64(total)/(1024*1024))
		}
	}
}
```

Example output (the device list, descriptions, and any memory lines depend on your build and hardware):
```text
=== Type Information ===
Type: f32        | Size:  4 bytes | Quantized: false | Name: f32
Type: f16        | Size:  2 bytes | Quantized: false | Name: f16
Type: q4_0       | Size: 18 bytes | Quantized: true | Name: q4_0
Type: q8_0       | Size: 34 bytes | Quantized: true | Name: q8_0

=== Backend Devices ===
Device 0: CPU
  Description: CPU backend
```
The `GgmlType` enum provides a `String()` method for easy display:

```go
typ := gollama.GGML_TYPE_Q4_0
fmt.Println(typ.String()) // Output: q4_0
```

All GGML functions return an error that should be checked:
```go
size, err := gollama.Ggml_type_size(gollama.GGML_TYPE_F32)
if err != nil {
	// Function not available in this build or library not loaded
	log.Printf("Warning: %v", err)
	return
}
fmt.Printf("F32 size: %d bytes\n", size)
```

The GGML bindings include comprehensive tests in `goggml_test.go`:
```bash
# Run all GGML tests
go test -v -run TestGgml

# Run a specific test
go test -v -run TestGgmlTypeSize

# Run benchmarks
go test -v -bench=BenchmarkGgml
```
- Optional Functions: GGML functions may not be exported in all llama.cpp builds. The library handles this gracefully by returning errors instead of panicking.
- Platform Differences: Some functions may behave differently, or be unavailable, on some platforms.
- Build Variants: Different llama.cpp builds (CPU-only vs. GPU-enabled) may export different GGML symbols.
- Version Compatibility: The GGML API may change between llama.cpp versions. Always use the version of gollama.cpp that matches your llama.cpp build.
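Because any GGML call can fail at runtime when its symbol is missing, it can help to centralize the fallback logic. The minimal sketch below uses Go generics (Go 1.18 or later).

- `valueOr` is a hypothetical helper.
- `typeName` stands in for a binding such as `gollama.Ggml_type_name`, so the example runs without the native library.
- `errUnavailable` is a placeholder. The actual error values returned by gollama.cpp are not specified in this document.

```go
package main

import (
	"errors"
	"fmt"
)

// valueOr returns v when err is nil and fallback otherwise. It suits GGML
// bindings whose symbols may be missing from a given llama.cpp build.
func valueOr[T any](v T, err error, fallback T) T {
	if err != nil {
		return fallback
	}
	return v
}

// errUnavailable is a placeholder; it is not an error value defined by gollama.cpp.
var errUnavailable = errors.New("function not available in this build")

// typeName is a hypothetical stand-in for gollama.Ggml_type_name so the
// sketch runs without the native library.
func typeName(available bool) (string, error) {
	if !available {
		return "", errUnavailable
	}
	return "f32", nil
}

func main() {
	for _, available := range []bool{true, false} {
		name, err := typeName(available)
		if err != nil {
			// Report the problem but keep running with a fallback value.
			fmt.Println("warning:", err)
		}
		fmt.Println("type name:", valueOr(name, err, "unknown"))
	}
}
```

Go only spreads a multi-value return into a call when it is the sole argument. That is why the result and error are assigned first and then passed to `valueOr` together with the fallback.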
- Main README - High-level overview and quick start
- Build Guide - Building from source
- GPU Setup - GPU acceleration configuration
- API Reference - Full Go API documentation
If you encounter issues with GGML bindings:
- Check that your llama.cpp build exports GGML symbols
- Verify you're using a compatible gollama.cpp version
- Report issues at: https://github.com/dianlight/gollama.cpp/issues