This repo provides an end-to-end, reproducible pipeline to align an open Hugging Face LLM for finance/risk-facing assistant behavior on a small budget:
- SFT (instruction tuning) with QLoRA (LoRA on 4-bit base)
- Preference dataset generation (chosen/rejected) with a transparent rubric
- DPO (RLHF-style preference optimization) with TRL
- Lightweight eval + reporting
- Optional FastAPI serving
- Conventional AWS architecture: EC2 (GPU) + S3 (artifacts) + IAM (least privilege)
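The DPO step above optimizes a simple closed-form preference loss. A minimal sketch of the per-pair objective in pure Python (illustrative only; the actual training uses TRL's `DPOTrainer`, and the log-probabilities and `beta` below are made-up numbers):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    margin = beta * (policy_logratio - ref_logratio)
    # -log(sigmoid(x)) written stably as log(1 + exp(-x))
    return math.log1p(math.exp(-margin))

# When the policy prefers the chosen response more strongly than the
# reference model does, the margin is positive and the loss falls below log(2).
loss = dpo_loss(-10.0, -14.0, -11.0, -13.0)
```

At equal log-ratios the loss sits exactly at log(2); training pushes it down by widening the policy's chosen-vs-rejected gap relative to the frozen reference.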
Setup:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
pip install -e .
```

Note: bitsandbytes is installed only on Linux (the GPU target). On non-Linux machines, run smoke tests with `load_in_4bit: false`.
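A small guard can pick the 4-bit flag automatically instead of editing the config by hand. A sketch using only the standard library (the `load_in_4bit` key follows the note above; the `overrides` dict is a hypothetical name for wherever you apply config overrides):

```python
import importlib.util
import platform

def use_4bit() -> bool:
    """4-bit loading needs bitsandbytes, which this repo installs only on Linux."""
    return platform.system() == "Linux" and importlib.util.find_spec("bitsandbytes") is not None

# Hypothetical config override: enable 4-bit only where it can actually run.
overrides = {"load_in_4bit": use_4bit()}
```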
Point the Hugging Face caches at a large volume:

```bash
export HF_HOME=/mnt/ebs/hf
export TRANSFORMERS_CACHE=/mnt/ebs/hf/transformers
export HF_DATASETS_CACHE=/mnt/ebs/hf/datasets
mkdir -p "$HF_HOME"
```

If you use a gated model:

```bash
export HF_TOKEN=...
```

Run the full pipeline:

```bash
bash scripts/train_sft.sh
bash scripts/make_prefs.sh
bash scripts/train_dpo.sh
bash scripts/run_eval.sh
```

Sync artifacts to S3:

```bash
export S3_URI="s3://YOUR_BUCKET/finrlhf"
bash scripts/sync_s3.sh
```

Smoke test: runs a minimal end-to-end pipeline (prepare_sft -> sft -> make_preferences -> dpo) for validation.

```bash
bash scripts/smoke_test.sh
```

Outputs:
- `outputs/sft/` (SFT adapter)
- `outputs/prefs/` (preference JSONL)
- `outputs/dpo/` (DPO adapter)
- `reports/results.json` (eval summary)
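The preference JSONL holds one pair per line. A minimal reader sketch, assuming the TRL-style `prompt`/`chosen`/`rejected` field names (check your generated file for the exact schema; the sample record below is invented):

```python
import json
from pathlib import Path

def load_preferences(path):
    """Read preference pairs from a JSONL file: one JSON object per non-empty line."""
    pairs = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            pairs.append(json.loads(line))
    return pairs

# Illustrative record in the assumed chosen/rejected format:
sample = {"prompt": "Define VaR.", "chosen": "Value at Risk is ...", "rejected": "idk"}
```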
Train SFT:

```bash
python -m finrlhf.data.prepare_sft --config configs/sft_qwen25_7b.yaml
python -m finrlhf.train.sft --config configs/sft_qwen25_7b.yaml
```

Generate preference pairs:

```bash
python -m finrlhf.data.make_preferences --config configs/prefs_qwen25_7b.yaml
```

Train DPO:

```bash
python -m finrlhf.train.dpo --config configs/dpo_qwen25_7b.yaml
```

Eval:

```bash
python -m finrlhf.eval.run_eval --config configs/eval.yaml
```

Serve:

```bash
python -m finrlhf.serve.app --config configs/serve.yaml
```

Cost-saving tips:
- QLoRA (4-bit) + small batch + grad accumulation
- seq_len <= 1024
- save adapters, not merged full weights
- prune checkpoints
- store only key artifacts in S3
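The "save adapters, not merged full weights" tip is easy to quantify. A back-of-the-envelope sketch in pure Python (the rank, hidden size, layer count, and target modules are illustrative assumptions, not this repo's exact config):

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """A LoRA adapter adds two low-rank factors per target matrix: A (d_in x r) and B (r x d_out)."""
    return rank * (d_in + d_out)

# Illustrative: rank-16 adapters on q/k/v/o projections (4096 x 4096) across 32 layers.
hidden = 4096
adapter_params = 32 * 4 * lora_param_count(hidden, hidden, 16)
adapter_mb = adapter_params * 2 / 1e6   # fp16 bytes -> MB (tens of MB)
full_gb = 7e9 * 2 / 1e9                 # merged fp16 7B checkpoint -> GB (~14 GB)
```

Under these assumptions the adapter is a few hundred times smaller than a merged checkpoint, which is what makes storing every run's artifacts in S3 affordable.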
See docs/architecture.md.