Demonstrates a workflow that involves fetching, processing, storing, analyzing, and reporting on financial data using machine learning techniques within a Snowflake database environment

VIX-Index-Prediction-Model

Lightweight Python + data-engineering project for exploring VIX prediction with a simple baseline model and Snowflake-backed data flow.

Maintainer: OceanicPatterns
Repository: https://github.com/oceanicpatterns/VIX-Index-Prediction-Model

Why This Project Exists

This repository demonstrates a clean, testable workflow for:

  1. Retrieving VIX historical data.
  2. Building a derived volatility feature.
  3. Persisting and reading modeled data through Snowflake.
  4. Training and evaluating a baseline regression model.

It is designed as an educational baseline with production-minded engineering practices (typed contracts, tests, CI, secret hygiene).

Background: VIX in Plain English

The VIX (CBOE Volatility Index) is often called the market's "fear gauge." It reflects expected near-term volatility derived from S&P 500 options pricing.

In this project, the engineered feature is:

VOLATILITY_INDEX = (HIGH - LOW) / CLOSE

This is a simplified signal and not a complete market-volatility model. The goal here is to provide a transparent baseline pipeline, not trading advice.
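
As a minimal sketch of the feature formula above, the snippet below computes VOLATILITY_INDEX per row using only the standard library. The HIGH, LOW, and CLOSE names come from the formula; the DATE and OPEN columns, the function name add_volatility_index, and the sample values are assumptions made for illustration, not the repository's actual code or real market data.

```python
import csv
import io


def add_volatility_index(rows):
    """Return copies of the rows with VOLATILITY_INDEX = (HIGH - LOW) / CLOSE."""
    enriched_rows = []
    for row in rows:
        high = float(row["HIGH"])
        low = float(row["LOW"])
        close = float(row["CLOSE"])
        if close == 0:
            # Guard against division by zero on malformed input.
            raise ValueError("CLOSE must be non-zero to compute VOLATILITY_INDEX")
        enriched_rows.append({**row, "VOLATILITY_INDEX": (high - low) / close})
    return enriched_rows


# Made-up sample values; the header layout is an assumption.
SAMPLE_CSV = """DATE,OPEN,HIGH,LOW,CLOSE
2024-01-02,19.0,22.0,18.0,20.0
2024-01-03,14.5,15.5,14.0,15.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
enriched = add_volatility_index(rows)
for row in enriched:
    print(row["DATE"], round(row["VOLATILITY_INDEX"], 4))
```

Because the range is divided by CLOSE, the feature is a unitless ratio, so days with different index levels can be compared directly.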

Architecture and Data Flow

Workflow

  1. Fetch CSV data from the CBOE endpoint.
  2. Validate the expected input columns.
  3. Compute VOLATILITY_INDEX.
  4. Write and read the data via a Snowflake temp table.
  5. Split the data into train and test sets.
  6. Fit a linear regression.
  7. Report the MSE and a sample prediction.
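
The modeling half of the workflow (steps 2 and 5 to 7) can be sketched in plain Python, with no network or Snowflake access. This is an illustration under stated assumptions, not the repository's implementation: the helper names are hypothetical, the target is assumed to be CLOSE, the split is chronological rather than random, and the data is synthetic.

```python
def validate_columns(header, required=("HIGH", "LOW", "CLOSE")):
    """Step 2: fail fast when an expected input column is absent."""
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(f"input is missing expected columns: {missing}")


def chronological_split(xs, ys, test_fraction=0.25):
    """Step 5: keep the last test_fraction of the observations for testing."""
    cut = int(len(xs) * (1 - test_fraction))
    return xs[:cut], xs[cut:], ys[:cut], ys[cut:]


def fit_line(xs, ys):
    """Step 6: ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has zero variance; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(y_true, y_pred):
    """Step 7: average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic data lying exactly on y = 10 + 20x (assumed target: CLOSE).
validate_columns(["DATE", "OPEN", "HIGH", "LOW", "CLOSE"])
features = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]
targets = [10.0 + 20.0 * x for x in features]

x_train, x_test, y_train, y_test = chronological_split(features, targets)
slope, intercept = fit_line(x_train, y_train)
test_mse = mean_squared_error(y_test, [slope * x + intercept for x in x_test])
sample_prediction = slope * 0.40 + intercept  # 0.40 mirrors VIX_PREDICTION_INPUT

print(f"MSE={test_mse:.6f} sample_prediction={sample_prediction:.2f}")
```

A chronological split is used here because time-ordered data can leak future information into training when it is shuffled; a random split is the other common choice for a simple baseline.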

Design Boundaries

Interactive Demo Playground

Use the static visitor demo to explore the project quickly:

Run locally:

open docs/playground.html

Quick Start

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pytest -q
python ml_vix_model.py

Configuration and Security

Secrets are not committed to source control.

Recommended (Environment Variables)

export SNOWFLAKE_USER="..."
export SNOWFLAKE_PASSWORD="..."
export SNOWFLAKE_ACCOUNT="..."
export SNOWFLAKE_WAREHOUSE="..."
export SNOWFLAKE_DATABASE="..."
export SNOWFLAKE_SCHEMA="..."

Optional:

export SNOWFLAKE_TEMP_TABLE="MASTER_DB.RAW.TEMP_TABLE"
export VIX_PREDICTION_INPUT="0.40"

Local Config Fallback

cp config/snowflake_config.example.ini config/snowflake_config.ini

Then fill in your local credentials in config/snowflake_config.ini (ignored by git).
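
One way to combine the two configuration sources is to read environment variables first and fall back to the local INI file. The sketch below assumes that precedence, and it assumes an INI section named snowflake with lowercase keys; the function name load_snowflake_settings is hypothetical, and the example file in the repository may use a different layout.

```python
import configparser
import os

REQUIRED_KEYS = ("USER", "PASSWORD", "ACCOUNT", "WAREHOUSE", "DATABASE", "SCHEMA")


def load_snowflake_settings(env=None,
                            ini_path="config/snowflake_config.ini",
                            section="snowflake"):  # section name is an assumption
    """Resolve each setting from SNOWFLAKE_* variables, then from the INI file."""
    env = os.environ if env is None else env
    parser = configparser.ConfigParser()
    parser.read(ini_path)  # a missing file is ignored, leaving the parser empty
    file_values = parser[section] if parser.has_section(section) else {}

    settings, missing = {}, []
    for key in REQUIRED_KEYS:
        value = env.get(f"SNOWFLAKE_{key}") or file_values.get(key.lower())
        if value:
            settings[key.lower()] = value
        else:
            missing.append(f"SNOWFLAKE_{key}")
    if missing:
        # Report only the names of missing settings, never any secret values.
        raise RuntimeError("missing Snowflake settings: " + ", ".join(missing))
    return settings


# Usage with an explicit mapping instead of the real environment.
example_env = {f"SNOWFLAKE_{key}": f"demo-{key.lower()}" for key in REQUIRED_KEYS}
settings = load_snowflake_settings(env=example_env, ini_path="does/not/exist.ini")
```

Passing env explicitly keeps the function easy to unit test without touching the real environment or real credentials.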

Run the Model

Preferred (backward-compatible CLI style)

python ml_vix_model.py

Installed console entrypoint

pip install .
run_vix_model

Run Tests

Unit tests (default)

pytest -q

Snowflake integration test (opt-in)

RUN_SNOWFLAKE_INTEGRATION_TESTS=1 pytest -q
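
An opt-in gate like this is typically a skip condition keyed on the environment variable. The sketch below shows the idea with the standard library's unittest, whose TestCase classes pytest also collects. The helper and class names are hypothetical, and the repository's own test may implement the gate differently.

```python
import os
import unittest


def snowflake_tests_enabled(env=None):
    """Return True only when the opt-in flag is set to exactly "1"."""
    env = os.environ if env is None else env
    return env.get("RUN_SNOWFLAKE_INTEGRATION_TESTS") == "1"


@unittest.skipUnless(
    snowflake_tests_enabled(),
    "set RUN_SNOWFLAKE_INTEGRATION_TESTS=1 to run Snowflake integration tests",
)
class SnowflakeRoundTripTest(unittest.TestCase):
    def test_round_trip(self):
        # Placeholder body: a real test would write rows to the temp table
        # and read them back. Connection code is omitted from this sketch.
        self.skipTest("placeholder body in this sketch")
```

With the flag unset, the default pytest -q run reports the class as skipped instead of failing for lack of credentials.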

CI and Quality Gates

On every push/PR:

  1. Secret scan (gitleaks)
  2. Python syntax check
  3. Unit tests

Manual runs are enabled with workflow_dispatch.
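
The exact command behind the syntax-check gate is not shown here. As a rough local equivalent, the sketch below parses source text with the standard library's ast module and collects any syntax errors; the function name syntax_errors and the sample sources are hypothetical.

```python
import ast


def syntax_errors(sources):
    """Map each filename to an error description when its source fails to parse."""
    errors = {}
    for name, text in sources.items():
        try:
            ast.parse(text, filename=name)
        except SyntaxError as exc:
            errors[name] = f"line {exc.lineno}: {exc.msg}"
    return errors


# In-memory sources keep the example self-contained; a real check would read files.
report = syntax_errors({
    "ok.py": "x = 1\n",
    "bad.py": "def f(:\n    pass\n",
})
for name, message in report.items():
    print(f"{name}: {message}")
```

Parsing without executing keeps the check fast and safe to run before the unit tests.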

Project Structure

VIX-Index-Prediction-Model/
  src/
    vix_model/
      __init__.py
      app.py
      modeling.py
      snowflake_io.py
  tests/
    test_modeling.py
    test_snowflake_io.py
  docs/
    index.html
    playground.html
  config/
    snowflake_config.example.ini
  ml_vix_model.py
  snowflake_connection.py
  pytest.ini
  requirements.txt
  setup.py
  LICENSE

Limitations and Next Steps

  • Current model is intentionally simple (single-feature linear regression).
  • No feature store or experiment tracking yet.
  • No advanced time-series modeling in current baseline.

Natural next extensions:

  1. Add richer feature engineering (lags, rolling stats, macro signals).
  2. Add model comparison (LinearRegression, tree models, regularized models).
  3. Add data versioning and experiment logs.
  4. Add model serving or batch prediction output contracts.

This repository is educational and not financial advice.
