This list focuses on the steering of large language models, which generally refers to techniques that influence and control a model's behavior without retraining it from scratch. It intentionally emphasizes techniques that leverage LLM internals and interpretability for inference-time intervention, as opposed to system prompts or agentic-workflow-style control.
Some papers in this list do not explicitly mention steering but are intrinsically connected to it, such as certain knowledge editing techniques.
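To make the scope concrete, below is a minimal sketch of one common inference-time intervention: activation addition with a contrastive steering vector, in the spirit of the Contrastive Activation Addition paper listed further down. The checkpoint name, layer index, prompts, and scaling factor are illustrative assumptions, and the module path (`model.model.layers[...]`) assumes a Llama-style decoder from Hugging Face transformers; other architectures expose their blocks differently.

```python
# Sketch of activation addition at inference time (assumptions noted inline).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative checkpoint, any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

LAYER = 13  # which residual stream to intervene on (hypothetical choice)

def last_token_hidden(prompt):
    """Hidden state of the final token at layer LAYER."""
    ids = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1, :]

# Contrastive steering vector: activation difference between paired prompts.
steer = last_token_hidden("I love talking about weddings.") \
      - last_token_hidden("I hate talking about weddings.")

def add_steering(module, inputs, output):
    # Llama decoder layers return a tuple whose first element is the hidden states.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + 4.0 * steer.to(hidden.dtype)  # the scale is a tunable knob
    return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

handle = model.model.layers[LAYER].register_forward_hook(add_steering)
try:
    ids = tok("Tell me about your day.", return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=40)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # detach the hook so later calls run unsteered
```

The libraries listed at the bottom (nnsight, activation-steering, steering-vectors) wrap this kind of hook-based intervention in higher-level APIs.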
- [arXiv, Anthropic] Persona Vectors: Monitoring and Controlling Character Traits in Language Models
- [arXiv] KV Cache Steering for Inducing Reasoning in Small Language Models
- [EMNLP 2025] AutoSteer: Automating Steering for Safe Multimodal Large Language Models
- [arXiv] InfoSteer: Steering Information Utility in Language Model Post-Training
- [EMNLP 2025] Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs
- [NeurIPS 2025 (Spotlight)] Angular Steering: Behavior Control via Rotation in Activation Space
- [COLM 2025] μKE: Matryoshka Unstructured Knowledge Editing of Large Language Models
- [COLM 2025] One-shot Optimized Steering Vectors Mediate Safety-relevant Behaviors in LLMs
- [ICML 2025] AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
- [NAACL 2025] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
- [ICLR 2025] Programming Refusal with Conditional Activation Steering
- [ICLR 2025] NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals
- [ICLR 2025] Improving Instruction-Following in Language Models through Activation Steering
- [NeurIPS 2024] Stealth edits to large language models
- [COLM 2024] Locating and Editing Factual Associations in Mamba
- [EMNLP 2024] Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
- [ACL 2024] Understanding and Patching Compositional Reasoning in LLMs
- [ACL 2024] Steering Llama 2 via Contrastive Activation Addition
- [CoRR 2023] Representation Engineering: A Top-Down Approach to AI Transparency
- [ICLR 2024] Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
- [ICLR 2024] Function Vectors in Large Language Models
- [ICLR 2024] Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
- [arXiv] Steering Language Models with Activation Engineering
- [NeurIPS 2023] Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
- [EMNLP 2023] Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
- [ICLR 2023] Editing Models with Task Arithmetic
- [ICLR 2023] Mass-Editing Memory in a Transformer
- [NeurIPS 2022] Locating and Editing Factual Associations in GPT
- [EMNLP 2021] Transformer Feed-Forward Layers Are Key-Value Memories
- [COLM 2025] Self-Improving Model Steering
- [CVPR 2025] Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
- [ICLR 2025] Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering
- [ICLR 2025] From Foresight to Forethought: VLM-In-the-Loop Policy Steering via Latent Alignment
- https://github.com/ndif-team/nnsight
- https://github.com/IBM/activation-steering
- https://github.com/uber-research/PPLM
- https://github.com/steering-vectors/steering-vectors
- https://github.com/zepingyu0512/awesome-llm-understanding-mechanism
- https://github.com/ruizheliUOA/Awesome-Interpretability-in-Large-Language-Models
- https://github.com/cooperleong00/Awesome-LLM-Interpretability
- https://github.com/JShollaj/awesome-llm-interpretability
- https://github.com/IAAR-Shanghai/Awesome-Attention-Heads