pauldebdeep9/fin-alignment-lab

FinRLHF: Budget RLHF-style alignment for FinTech (SFT → DPO) on EC2 + S3

This repo provides an end-to-end, reproducible pipeline to align an open Hugging Face LLM for finance/risk-facing assistant behavior under a small budget:

  • SFT (instruction tuning) with QLoRA (LoRA on 4-bit base)
  • Preference dataset generation (chosen/rejected) with a transparent rubric
  • DPO (RLHF-style preference optimization) with TRL
  • Lightweight eval + reporting
  • Optional FastAPI serving
  • Conventional AWS architecture: EC2 (GPU) + S3 (artifacts) + IAM (least privilege)

⚠️ Not financial advice. Outputs are for research/demo purposes.


1) Quickstart (EC2 GPU recommended)

1.1 Create env

python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
pip install -e .

bitsandbytes is installed only on Linux (GPU target). On non-Linux machines, run smoke tests with load_in_4bit: false.
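For a non-Linux smoke run, the relevant toggle might look like the fragment below. This is a sketch only: the section and key names are assumptions based on the note above, not verified against the repo's actual config schema.

```yaml
model:
  name_or_path: Qwen/Qwen2.5-7B-Instruct
  load_in_4bit: false   # bitsandbytes is Linux-only; disable 4-bit off-GPU
```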

1.2 Put HF cache on EBS (recommended)

export HF_HOME=/mnt/ebs/hf
export TRANSFORMERS_CACHE=/mnt/ebs/hf/transformers
export HF_DATASETS_CACHE=/mnt/ebs/hf/datasets
mkdir -p "$HF_HOME"

If you use a gated model:

export HF_TOKEN=...

1.3 Run pipeline (defaults: Qwen2.5-7B-Instruct)

bash scripts/train_sft.sh
bash scripts/make_prefs.sh
bash scripts/train_dpo.sh
bash scripts/run_eval.sh

1.4 Sync artifacts to S3 (optional)

export S3_URI="s3://YOUR_BUCKET/finrlhf"
bash scripts/sync_s3.sh

1.5 Quick smoke test (tiny model)

Runs a minimal end-to-end pipeline (prepare_sft → sft → make_preferences → dpo) for validation.

bash scripts/smoke_test.sh

2) Outputs

  • outputs/sft/ (SFT adapter)
  • outputs/prefs/ (preference JSONL)
  • outputs/dpo/ (DPO adapter)
  • reports/results.json (eval summary)

3) Useful commands

Train SFT:

python -m finrlhf.data.prepare_sft --config configs/sft_qwen25_7b.yaml
python -m finrlhf.train.sft --config configs/sft_qwen25_7b.yaml

Generate preference pairs:

python -m finrlhf.data.make_preferences --config configs/prefs_qwen25_7b.yaml
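The preference step emits chosen/rejected pairs as JSONL. A minimal sketch of one record, assuming the conventional TRL DPOTrainer column names (`prompt`, `chosen`, `rejected`); the repo's actual field names may differ:

```python
import json

# One preference pair in the conventional chosen/rejected JSONL schema.
# Field names follow TRL's DPOTrainer convention; the rubric decides which
# completion is "chosen". Content here is purely illustrative.
record = {
    "prompt": "Explain the liquidity risk of a concentrated corporate bond position.",
    "chosen": "Concentrated positions can be costly to unwind quickly; bid-ask ...",
    "rejected": "Bonds are always liquid, so there is no liquidity risk.",
}

line = json.dumps(record)  # one JSON object per line in the JSONL file
parsed = json.loads(line)
assert set(parsed) == {"prompt", "chosen", "rejected"}
print(line)
```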

Train DPO:

python -m finrlhf.train.dpo --config configs/dpo_qwen25_7b.yaml
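Under the hood, DPO optimizes a sigmoid loss over policy-vs-reference log-probability margins. A scalar sketch of that objective (not the repo's code; per-pair log-probs and beta=0.1 are illustrative):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((logpi_w - logref_w) - (logpi_l - logref_l)))."""
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid)

# If the policy favors the chosen answer more than the reference does,
# the margin is positive and the loss drops below log(2):
loss = dpo_loss(-10.0, -14.0, -11.0, -13.0)  # margin = (+1) - (-1) = 2
assert loss < math.log(2)
```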

Eval:

python -m finrlhf.eval.run_eval --config configs/eval.yaml

Serve:

python -m finrlhf.serve.app --config configs/serve.yaml

4) Notes on cost control

  • QLoRA (4-bit) + small batch + grad accumulation
  • seq_len <= 1024
  • save adapters, not merged full weights
  • prune checkpoints
  • store only key artifacts in S3
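A back-of-envelope calculation for why saving adapters (not merged weights) is cheap. The dimensions below are illustrative assumptions for a ~7B model, not exact Qwen2.5-7B values:

```python
# Each LoRA pair adds A (rank x hidden) + B (hidden x rank) parameters
# per wrapped projection. Dimensions are assumed for illustration.
hidden = 4096          # model width (assumed)
layers = 32            # transformer blocks (assumed)
rank = 16              # LoRA rank
targets_per_layer = 4  # e.g. q/k/v/o projections wrapped with LoRA

lora_params = layers * targets_per_layer * 2 * rank * hidden
full_params = 7e9      # rough full-model parameter count

print(f"adapter params: {lora_params / 1e6:.1f}M "
      f"({100 * lora_params / full_params:.2f}% of full weights)")
```

At these settings the adapter is well under 1% of the full model, which is what makes per-run checkpoints and S3 storage affordable.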

See docs/architecture.md.
