TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs
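TurboQuant's exact scheme isn't reproduced here, but a minimal sketch of the asymmetric idea, quantizing K per channel (key activations tend to carry per-channel outliers) and V per token, looks like this in plain PyTorch. The shapes and the 4-bit setting are illustrative assumptions:

```python
import torch

def quantize(x: torch.Tensor, dim: int, bits: int = 4):
    """Uniform asymmetric (min/max) quantization along `dim`."""
    qmax = 2 ** bits - 1
    xmin = x.amin(dim=dim, keepdim=True)
    xmax = x.amax(dim=dim, keepdim=True)
    scale = (xmax - xmin).clamp(min=1e-8) / qmax
    q = ((x - xmin) / scale).round().clamp(0, qmax).to(torch.uint8)
    return q, scale, xmin

def dequantize(q, scale, xmin):
    return q.float() * scale + xmin

# Hypothetical cache shapes: [seq_len, num_heads * head_dim]
k = torch.randn(1024, 4096)
v = torch.randn(1024, 4096)

# Asymmetric treatment: K grouped per channel (reduce over tokens, dim=0)
# because key activations show per-channel outliers; V grouped per token.
k_q, k_s, k_m = quantize(k, dim=0)
v_q, v_s, v_m = quantize(v, dim=1)
print((dequantize(k_q, k_s, k_m) - k).abs().mean())  # reconstruction error
```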
Lightweight Modular AI Routing Engine for Local LLMs — Run specialised experts efficiently on consumer GPUs using smart Mixture-of-Experts routing.
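The engine's real gating logic isn't shown here; as a rough illustration of the routing idea, a learned top-k linear gate over expert logits is the standard Mixture-of-Experts building block. The embedding size and expert count below are made up:

```python
import torch
import torch.nn.functional as F

class Router(torch.nn.Module):
    """Top-k MoE gate: scores experts for an input and picks the best k."""
    def __init__(self, embed_dim: int, num_experts: int, k: int = 1):
        super().__init__()
        self.gate = torch.nn.Linear(embed_dim, num_experts, bias=False)
        self.k = k

    def forward(self, x: torch.Tensor):
        logits = self.gate(x)                       # [batch, num_experts]
        weights, idx = logits.topk(self.k, dim=-1)  # keep top-k experts
        return F.softmax(weights, dim=-1), idx      # mixing weights + ids

router = Router(embed_dim=768, num_experts=4)
w, idx = router(torch.randn(1, 768))
print(idx)  # which specialised expert(s) would serve this request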
RAM-Backed MCP Memory Architecture for Consumer LLM Inference — 900K token context on 16GB VRAM
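The repo's MCP protocol isn't sketched here; the underlying trick, treating pinned system RAM as a backing store for KV pages and faulting them into VRAM on access, can be illustrated with a toy LRU pager (all sizes hypothetical):

```python
import torch

class PagedKV:
    """Toy pager: hot KV pages live in VRAM, cold ones in pinned host RAM."""
    def __init__(self, max_vram_pages: int = 32):
        self.pages: dict[int, torch.Tensor] = {}
        self.hot: list[int] = []
        self.max_vram_pages = max_vram_pages

    def put(self, page_id: int, kv: torch.Tensor) -> None:
        self.pages[page_id] = kv.cuda()
        self.hot.append(page_id)
        if len(self.hot) > self.max_vram_pages:       # evict oldest page
            old = self.hot.pop(0)
            self.pages[old] = self.pages[old].cpu().pin_memory()

    def get(self, page_id: int) -> torch.Tensor:
        kv = self.pages[page_id]
        if not kv.is_cuda:                            # page fault: copy back
            kv = kv.cuda(non_blocking=True)           # fast copy from pinned RAM
            self.pages[page_id] = kv
            self.hot.append(page_id)                  # (re-eviction omitted)
        return kv

cache = PagedKV(max_vram_pages=2)
for i in range(4):                                    # pages 0 and 1 spill to RAM
    cache.put(i, torch.randn(256, 4096))
print(cache.get(0).is_cuda)                           # True after fault-in
```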
Dynamic GPU Layer Swapping: Train large models on consumer GPUs with intelligent memory management
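Ignoring gradients and optimizer state (which the real project must also stage), the core idea, holding only the active layer in VRAM, reduces to a just-in-time device swap. Layer count and width below are arbitrary:

```python
import torch
import torch.nn as nn

# Forward-pass staging only; training additionally requires re-staging
# layers for backward and keeping optimizer state in host RAM.
layers = nn.ModuleList([nn.Linear(4096, 4096) for _ in range(32)]).cpu()

@torch.no_grad()
def forward_swapped(x: torch.Tensor) -> torch.Tensor:
    x = x.cuda()
    for layer in layers:
        layer.cuda()   # stage this layer's weights into VRAM just in time
        x = layer(x)
        layer.cpu()    # release VRAM before staging the next layer
    return x

print(forward_swapped(torch.randn(8, 4096)).shape)
```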
Self-hosted LLM chat client with a streaming terminal UI (Python/Rich) for vLLM servers. Runs Mistral-24B locally on an RTX 4090/3090 as a privacy-focused ChatGPT alternative for homelab and gaming PCs.
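A self-contained sketch of such a client: vLLM exposes an OpenAI-compatible /v1/chat/completions endpoint, so streaming reduces to parsing server-sent-event lines. The port and model name are assumptions:

```python
import json
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # local vLLM server
    json={
        "model": "mistralai/Mistral-Small-24B-Instruct-2501",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    if not line or not line.startswith(b"data: ") or line == b"data: [DONE]":
        continue
    delta = json.loads(line[6:])["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)  # token-by-token UI
```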
Surgical reasoning on consumer silicon. Hybrid SSM + causal memory architecture with entropy-gated System 1/2 dispatch, O(1) inference memory, and continual learning — designed for 16 GB VRAM.
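The dispatch criterion can be sketched independently of the SSM backbone: compute the entropy of the next-token distribution and escalate to the slow path when it crosses a threshold. The threshold value and vocabulary size here are invented:

```python
import torch

def dispatch(logits: torch.Tensor, threshold: float = 2.5) -> str:
    """Entropy gate: confident predictions take the fast System 1 path,
    uncertain ones escalate to deliberate System 2 computation."""
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)
    return "system2" if entropy.item() > threshold else "system1"

print(dispatch(torch.randn(32000)))  # hypothetical 32k-token vocabulary
```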
RAMP: RL-guided Adaptive Mixed-Precision quantization for GGUF models. Data-free sensitivity analysis, evolutionary search, per-tensor type optimization. Produces hardware-optimized GGUF for consumer GPUs.
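RAMP's RL guidance and GGUF type space aren't reproduced here; a toy version of the search, evolving per-tensor bit-widths against a data-free fitness (weight reconstruction error plus a size penalty), shows the shape of the loop:

```python
import random
import torch

# Stand-in tensors and bit-width choices; the real search targets GGUF types.
tensors = {f"blk.{i}.ffn": torch.randn(256, 256) for i in range(4)}
choices = [2, 3, 4, 5, 6, 8]

def quant_err(w: torch.Tensor, bits: int) -> float:
    """Data-free sensitivity proxy: symmetric round-trip quantization error."""
    s = w.abs().max() / (2 ** (bits - 1) - 1)
    return (w - (w / s).round() * s).pow(2).mean().item()

def fitness(assign: dict) -> float:
    err = sum(quant_err(tensors[n], b) for n, b in assign.items())
    size = sum(assign.values())
    return err + 0.01 * size            # accuracy vs. model-size trade-off

pop = [{n: random.choice(choices) for n in tensors} for _ in range(16)]
for _ in range(20):                      # mutate-and-select evolutionary loop
    pop.sort(key=fitness)
    child = dict(pop[0])                 # clone the current best assignment
    child[random.choice(list(child))] = random.choice(choices)  # mutate one
    pop[-1] = child                      # replace the worst individual
print(min(pop, key=fitness))             # best per-tensor bit-width map found
```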
Technical notes on building a local-first AI coding assistant with local LLMs, Ollama, SwiftUI, and consumer GPU constraints.
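On the Ollama side, the assistant's round trips reduce to calls against the local HTTP API; a minimal streaming request (the model tag is just an example of a locally pulled coding model) looks like:

```python
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",   # default local Ollama endpoint
    json={
        "model": "qwen2.5-coder:7b",
        "messages": [{"role": "user", "content": "Explain this Swift closure."}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():           # Ollama streams one JSON object per line
    if not line:
        continue
    chunk = json.loads(line)
    print(chunk["message"]["content"], end="", flush=True)
    if chunk.get("done"):
        break
```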
Tiered GPU memory architecture for consumer AI inference. VRAM as execution cache, system RAM as passive staging layer.
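One way to realize "VRAM as execution cache, RAM as staging": double-buffer weight blocks and prefetch the next one on a side CUDA stream while the current one computes. Everything below (block sizes, the matmul standing in for a layer) is illustrative:

```python
import torch

copy_stream = torch.cuda.Stream()
host_blocks = [torch.randn(2048, 2048).pin_memory() for _ in range(8)]  # staging
dev_buf = [torch.empty(2048, 2048, device="cuda") for _ in range(2)]    # cache
x = torch.randn(64, 2048, device="cuda")

dev_buf[0].copy_(host_blocks[0], non_blocking=True)   # stage first block
for i in range(len(host_blocks)):
    cur, nxt = dev_buf[i % 2], dev_buf[(i + 1) % 2]
    if i + 1 < len(host_blocks):
        # Don't overwrite nxt until the compute that last read it finished.
        copy_stream.wait_stream(torch.cuda.current_stream())
        with torch.cuda.stream(copy_stream):          # overlap copy and compute
            nxt.copy_(host_blocks[i + 1], non_blocking=True)
    x = x @ cur                                       # "layer" runs from VRAM
    torch.cuda.current_stream().wait_stream(copy_stream)
print(x.shape)
```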
GPT-OSS 20B local execution: a lightweight environment for running the model with Python 3.12 and CUDA acceleration. Run GPT-OSS 20B entirely offline, accelerate text generation on the GPU, and enable fast, secure inference on consumer hardware.
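A minimal offline loading sketch with Hugging Face transformers, assuming the openai/gpt-oss-20b weights were downloaded beforehand (HF_HUB_OFFLINE=1 blocks any network fallback):

```python
import os
os.environ["HF_HUB_OFFLINE"] = "1"   # force fully offline loading

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    torch_dtype=torch.bfloat16,      # half-precision to fit consumer VRAM
    device_map="auto",               # spill to CPU RAM if VRAM runs out
)
inputs = tok("Hello,", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```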
A comprehensive, modular framework for fine-tuning Stable Diffusion 3.5 models using LoRA (Low-Rank Adaptation). Create custom AI image generators tailored to your artistic style, objects, or concepts with memory-efficient training on consumer GPUs.
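The memory saving comes from training only the low-rank adapter matrices; attaching them to the SD3.5 transformer with diffusers and peft takes a few lines. Rank, alpha, and the target module names below are common choices, not this framework's settings:

```python
import torch
from diffusers import StableDiffusion3Pipeline
from peft import LoraConfig

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.float16)

# LoRA on the attention projections of the diffusion transformer.
lora = LoraConfig(r=16, lora_alpha=16,
                  target_modules=["to_q", "to_k", "to_v", "to_out.0"])
pipe.transformer.add_adapter(lora)

# Only the injected low-rank A/B matrices require gradients.
trainable = [p for p in pipe.transformer.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable), "trainable parameters")
```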
ismail is a from-scratch Turkish language model implementation designed for low-end hardware, built and trained on a single RTX 5070 (12GB).
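Training from scratch in 12 GB typically leans on bf16 autocast plus gradient accumulation; a generic pattern (not ismail's actual loop, all sizes invented) is:

```python
import torch

model = torch.nn.TransformerEncoderLayer(512, 8, batch_first=True).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
accum = 8                                   # effective batch = 8 micro-batches
for step in range(accum * 4):
    x = torch.randn(4, 128, 512, device="cuda")    # small micro-batch fits VRAM
    with torch.autocast("cuda", dtype=torch.bfloat16):
        loss = model(x).pow(2).mean()              # dummy loss for the sketch
    (loss / accum).backward()                      # scale for accumulation
    if (step + 1) % accum == 0:
        opt.step()
        opt.zero_grad()
```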
PILON (Primitive-Induced Linear Operator Network) explores a compositional weight parameterization for transformer FFN layers. The goal is to replace dense FFN matrices with shared low-rank primitives plus learned composition weights.
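Reading the description literally, each FFN weight is a learned mixture over a shared bank of low-rank primitives; a minimal parameterization (the sizes and the einsum composition are my assumptions) could look like:

```python
import torch
import torch.nn as nn

class PrimitiveFFN(nn.Module):
    """FFN weight composed from shared low-rank primitives U_i @ V_i,
    mixed by learned coefficients alpha; no dense matrix is a parameter."""
    def __init__(self, d: int, hidden: int, n_prim: int = 16, rank: int = 8):
        super().__init__()
        self.U = nn.Parameter(torch.randn(n_prim, d, rank) * 0.02)
        self.V = nn.Parameter(torch.randn(n_prim, rank, hidden) * 0.02)
        self.alpha = nn.Parameter(torch.ones(n_prim) / n_prim)  # composition

    def weight(self) -> torch.Tensor:
        # W = sum_i alpha_i * U_i @ V_i, composed on the fly each forward.
        return torch.einsum("p,pdr,prh->dh", self.alpha, self.U, self.V)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x @ self.weight())

ffn = PrimitiveFFN(d=512, hidden=2048)
print(ffn(torch.randn(2, 512)).shape)   # torch.Size([2, 2048])
```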