An enterprise-grade AI assistant for commission plan management featuring Machine Learning predictions, Retrieval-Augmented Generation (RAG), Model Context Protocol (MCP) PostgreSQL server integration, and Redis-powered chat history. It combines conversational AI, document intelligence, predictive analytics, secure read-only database operations, persistent session management, and API orchestration to streamline commission management workflows.
- Advanced Commission Impact Prediction: Monte Carlo simulation-based forecasting
- Multi-Algorithm Ensemble: Random Forest, XGBoost, LightGBM, Prophet time series
- Plan Optimization Engine: AI-powered recommendations with direct edit capability
- Risk Analysis: Confidence intervals, worst-case scenarios, probability calculations
- Cohort Analysis: Performance tier-based impact assessment
- Automated Feature Engineering: 40+ derived features for accurate predictions
- Persistent conversation memory across browser sessions
- Configurable TTL (default: 30 minutes)
- High-performance chat history storage
- Automatic session cleanup
- ML prediction intent detection ("predict 10% increase", "what if we decrease by 5%")
- Plan optimization intent detection ("optimize my plans", "improve plan performance")
- Advanced parameter extraction from natural language
- Better plan creation intent recognition
- Full schema introspection (all tables, not hardcoded)
- Pagination support for large result sets
- Enhanced tenant filtering (org_id + client_id)
- Better SQL security validation
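A minimal sketch of the kind of SELECT-only validation this layer performs (a hypothetical check, not the exact logic in sql_security.py):

```python
import re

# Statements that must never reach the read-only MCP server.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke)\b",
    re.IGNORECASE,
)

def validate_select_only(sql: str) -> bool:
    """Allow a single SELECT (or CTE) statement and reject anything that mutates data."""
    statement = sql.strip().rstrip(";")
    if ";" in statement:  # multiple statements -> reject
        return False
    if not statement.lower().startswith(("select", "with")):
        return False
    return not FORBIDDEN.search(statement)
```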
- 🔮 Predictive Analytics: Forecast commission impact with Monte Carlo simulation (1000+ simulations)
- 🎯 Plan Optimization: AI-powered recommendations with actionable edit suggestions
- 📈 Multi-Model Ensemble: Random Forest, XGBoost, LightGBM, Prophet for maximum accuracy
- 📊 Risk Analysis: Confidence intervals, best/worst case scenarios, probability calculations
- 👥 Cohort Analysis: Performance tier-based impact assessment
- 🧠 Advanced RAG System: Hybrid retrieval with semantic understanding and document intelligence
- 💬 Persistent Chat Memory: Redis-powered conversation history with configurable TTL
- 🎯 Intent Detection: Automatically routes queries to appropriate handlers (ML, DB, RAG, Plan Creation)
- 🎯 Guided Plan Creation: Three-phase wizard with intelligent validation and NLP-based extraction
- 📊 Dynamic Rule Builder: Complex commission structures with tiered calculations
- 🔄 Real-time API Integration: Seamless backend synchronization
- 📈 Business Logic Engine: Support for multiple calculation types (Flat/Slab/Tiered)
- ✏️ Direct Plan Editing: Apply ML recommendations with one command
- 🔐 SQL Security Layer: Protection against SQL injection with SELECT-only operations
- 🗄️ PostgreSQL Integration: Secure read-only database operations through MCP protocol
- 📱 Redis Session Management: High-performance chat history storage with automatic expiration
- 🤖 ML Model Cache: Tenant-specific model storage for fast predictions
- 🔍 Intelligent Query Builder: Safe and optimized SQL generation
- 📊 Real-time Analytics: Query results with natural language formatting
- 📄 Multi-Format Processing: PDF, DOCX, Excel, CSV, TXT, JSON, Images (OCR)
- 🔍 Semantic Search: FAISS-powered vector similarity search
- ⚡ Smart Chunking: Adaptive document segmentation
- 🎨 Metadata Extraction: Automatic document enrichment
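A minimal sketch of the chunk-embed-search loop behind the semantic search feature, assuming the all-MiniLM-L6-v2 embedding model listed in the configuration below (illustrative only, not the project's rag.py implementation):

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def build_index(chunks: list[str]) -> faiss.IndexFlatIP:
    """Embed document chunks and store them in a cosine-similarity FAISS index."""
    vectors = embedder.encode(chunks, normalize_embeddings=True)
    index = faiss.IndexFlatIP(vectors.shape[1])
    index.add(np.asarray(vectors, dtype="float32"))
    return index

def search(index: faiss.IndexFlatIP, query: str, k: int = 3) -> list[int]:
    """Return the indices of the k most similar chunks for a query."""
    q = embedder.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    return ids[0].tolist()
```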
graph TB
subgraph "Client Layer"
Web[Web Interface]
MCPClient[MCP Client]
end
subgraph "Session & Cache Layer"
Redis[(Redis Cache)]
ChatHistory[Chat History Manager]
MLModels[ML Model Cache]
end
subgraph "Intelligence Engine"
IntentDetection[Intent Detection]
RAG[RAG Pipeline]
MLPredictor[ML Predictor]
PlanOptimizer[Plan Optimizer]
NLPExtractor[NLP Plan Extractor]
end
subgraph "MCP Server Layer"
MCPServer[MCP PostgreSQL Server]
SQLSecurity[SQL Security - SELECT Only]
end
subgraph "Application Core"
Flask[Flask Application]
MLHandler[ML Request Handler]
DBHandler[Database Query Handler]
PlanWizard[Plan Creation Wizard]
PlanEditor[Plan Editor]
end
subgraph "Data Layer"
PostgreSQL[(PostgreSQL DB)]
FAISS[(FAISS Index)]
Docs[(Document Store)]
External[External APIs]
end
Web --> Flask
Flask --> IntentDetection
IntentDetection --> MLHandler
IntentDetection --> DBHandler
IntentDetection --> RAG
IntentDetection --> PlanWizard
MLHandler --> MLPredictor
MLHandler --> PlanOptimizer
PlanOptimizer --> PlanEditor
PlanEditor --> PostgreSQL
NLPExtractor --> PlanWizard
Flask --> ChatHistory
ChatHistory --> Redis
MLPredictor --> MLModels
MLModels --> Redis
DBHandler --> MCPServer
MCPServer --> SQLSecurity
SQLSecurity --> PostgreSQL
PlanWizard --> External
RAG --> FAISS
FAISS --> Docs
commission-dashboard-assistant-RAG/
│
├── 🎯 Core Application Files
│ ├── 📄 app.py # Main Flask application with ML integration
│ ├── 📄 rag.py # RAG pipeline implementation
│ ├── 📄 intent_detection.py # ML/Plan/DB intent detection (ENHANCED)
│ ├── 📄 nlp_plan_builder.py # NLP-based plan field extraction (NEW)
│ └── 📄 requirements.txt # Python dependencies (ML libs added)
│
├── 🤖 ML/Prediction System (NEW!)
│ ├── 📄 ml_advanced_predictor.py # Advanced ML predictor with ensemble models
│ ├── 📄 ml_handler.py # ML request handler for chat interface
│ ├── 📄 plan_editor.py # Apply ML recommendations to plans
│ └── 📄 test_ml_system.py # Comprehensive ML test suite
│
├── 🔌 MCP Integration Files
│ ├── 📄 mcp_client.py # MCP client with Claude SQL generation
│ ├── 📄 improved_mcp_client.py # Enhanced MCP client wrapper
│ ├── 📄 mcp_postgres_server.py # MCP PostgreSQL server (SELECT-only)
│ └── 📄 setup_mcp.py # MCP configuration and setup
│
├── 🗄️ Database & Session Management
│ ├── 📄 final_db_utils.py # Database utilities with tenant filtering
│ ├── 📄 sql_security.py # SQL injection protection
│ ├── 📄 redis_chat_session.py # Redis session management (NEW)
│ └── 📄 test_db.py # Database connection testing
│
├── 🎨 Frontend
│ └── 📁 templates/
│ └── 📄 index.html # Web interface with ML features
│
├── 📊 Data Storage
│ ├── 📁 data/ # Document storage directory
│ │ ├── 📄 README.md
│ │ └── 📄 .gitkeep
│ └── 📁 model/ # ML model cache directory (NEW)
│ ├── 📄 README.md
│ └── 📄 .gitkeep
│
├── 📋 Configuration & Scripts
│ ├── 📄 .env # Environment variables
│ ├── 📄 .gitignore # Git ignore rules
│ ├── 📄 start.bat # Windows startup script
│ └── 📄 README.md # This file
```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

models = {
    'random_forest': RandomForestRegressor(n_estimators=200),
    'gradient_boosting': GradientBoostingRegressor(n_estimators=200),
    'xgboost': XGBRegressor(n_estimators=200),
    'lightgbm': LGBMRegressor(n_estimators=200)
}
# Best model automatically selected based on R² score
```

- Time Features: Seasonality (month_sin, month_cos), quarter, day of week
- Plan Features: Duration, progress percentage, timing
- Employee Features: Historical averages, std dev, performance tiers
- Lag Features: Previous 1/2/3 months performance
- Rolling Features: 3-month rolling averages
- Statistical Features: Z-scores, growth rates
- Cohort Features: Performance tier classification
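A minimal pandas sketch of how a few of these derived features might be computed; the column names (employee_id, month, commission) are assumptions, not the exact schema used by engineer_advanced_features():

```python
import numpy as np
import pandas as pd

def add_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive time, lag and rolling features from a monthly per-employee commission table.

    Expects columns: employee_id, month (datetime), commission.
    """
    df = df.sort_values(["employee_id", "month"]).copy()
    # Cyclical encoding of the month captures seasonality.
    df["month_sin"] = np.sin(2 * np.pi * df["month"].dt.month / 12)
    df["month_cos"] = np.cos(2 * np.pi * df["month"].dt.month / 12)
    by_emp = df.groupby("employee_id")["commission"]
    # Previous 1/2/3 months of commission for each employee.
    for lag in (1, 2, 3):
        df[f"commission_lag_{lag}"] = by_emp.shift(lag)
    # 3-month rolling average of past payouts (excluding the current month).
    df["rolling_3m_commission"] = by_emp.transform(
        lambda s: s.shift(1).rolling(3, min_periods=1).mean()
    )
    # Month-over-month growth rate per employee.
    df["commission_growth"] = by_emp.pct_change()
    return df
```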
Each prediction runs 1,000 Monte Carlo simulations (sketched below) to calculate:
- Expected cost change
- Best case (5th percentile)
- Worst case (95th percentile)
- Probability of 10%+ overrun
- Confidence intervals (95%)

The plan optimizer detects and recommends fixes for:
- ❌ Low attainment (<70%) → Reduce quota
- ❌ High attainment (>120%) → Increase quota
- ❌ Poor tier distribution → Adjust thresholds
- ❌ High commission rates → Optimize rates
- ✅ Add accelerators for top performers
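A minimal sketch of the Monte Carlo risk analysis described above; the fixed 5% noise level and helper name are illustrative assumptions, not the project's actual simulation logic:

```python
import numpy as np

def monte_carlo_risk(predicted_total: float, current_total: float,
                     noise_pct: float = 0.05, n_sims: int = 1000) -> dict:
    """Simulate total commission cost n_sims times and summarize the risk."""
    rng = np.random.default_rng(42)
    # Each simulation perturbs the point prediction by Gaussian noise.
    sims = predicted_total * (1 + rng.normal(0, noise_pct, n_sims))
    return {
        "expected_cost_change": float(sims.mean() - current_total),
        "best_case": float(np.percentile(sims, 5)),    # 5th percentile (lowest cost)
        "worst_case": float(np.percentile(sims, 95)),  # 95th percentile (highest cost)
        "probability_exceed_10pct": float((sims > current_total * 1.10).mean()),
    }

# Example: a +10% change predicted to cost $550,000 vs. $500,000 today.
print(monte_carlo_risk(550_000, 500_000))
```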
graph LR
A[User Query] --> B{Intent Detection}
B -->|Predict| C[ML Predictor]
B -->|Optimize| D[Plan Optimizer]
C --> E[Fetch Tenant Data]
D --> E
E --> F[Feature Engineering]
F --> G[Model Selection]
G --> H{Model Exists?}
H -->|No| I[Train Models]
H -->|Yes| J[Use Cached Model]
I --> K[Random Forest]
I --> L[XGBoost]
I --> M[LightGBM]
I --> N[Gradient Boosting]
K --> O[Select Best Model]
L --> O
M --> O
N --> O
O --> P[Cache Model]
P --> J
J --> Q[Generate Predictions]
Q --> R[Monte Carlo Simulation]
R --> S[Risk Analysis]
S --> T[Cohort Analysis]
T --> U[Format Response]
D --> V[Analyze Performance]
V --> W[Generate Recommendations]
W --> X[Create Edit Actions]
X --> Y[Return to User]
U --> Z[Display Results]
Y --> Z
New ML Endpoints:
- POST /chat: Now handles ML prediction and optimization intents
- POST /api/ml/predict: Direct API for commission predictions
- POST /api/ml/optimize: Direct API for plan optimizations
- POST /api/ml/apply-change: Apply ML recommendation to a plan
Key New Functions:
- handle_ml_request(): Routes ML prediction/optimization requests
- is_ml_prediction_intent(): Detects ML prediction queries
- is_ml_optimization_intent(): Detects plan optimization queries
- extract_ml_params(): Extracts percentage, plan_id, etc. from user message
Core Classes:
TenantMLPredictor: Main ML predictor with tenant isolation
Key Methods:
- predict_commission_impact(): Forecast impact of commission changes
- recommend_plan_optimizations(): Generate actionable recommendations
- train_tenant_models(): Train ensemble models per tenant
- engineer_advanced_features(): Create 40+ ML features
- fetch_tenant_commissions(): Load tenant-specific data with isolation
Core Functions:
- handle_ml_request(): Main entry point for ML queries
- handle_prediction_request(): Process "what if" scenarios
- handle_optimization_request(): Process optimization requests
- format_prediction_response(): Format ML output as markdown
- lookup_plan_id_by_name(): Find plan by name for easier queries
Core Class:
PlanEditor: Applies ML recommendations to database
Key Methods:
- apply_optimization(): Apply specific recommendation to plan
- _adjust_quota(): Modify tier thresholds
- _adjust_commission_rates(): Change commission percentages
- _add_tier(): Add accelerator tiers
- _update_plan_rules(): Commit changes to database
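A minimal sketch of how an edit_action (the structure shown in the API examples below) might be applied to a plan dictionary; the real PlanEditor persists the change to PostgreSQL through _update_plan_rules():

```python
def apply_edit_action(plan: dict, edit_action: dict) -> dict:
    """Apply one ML recommendation, e.g. {'field': 'quota', 'operation': 'multiply', 'value': 0.80}."""
    field, operation, value = edit_action["field"], edit_action["operation"], edit_action["value"]
    if operation == "multiply":
        plan[field] = plan[field] * value
    elif operation == "add":
        plan[field] = plan[field] + value
    elif operation == "set":
        plan[field] = value
    else:
        raise ValueError(f"Unsupported operation: {operation}")
    return plan

plan = {"plan_id": 123, "quota": 100_000}
apply_edit_action(plan, {"field": "quota", "operation": "multiply", "value": 0.80})
# plan["quota"] is now 80,000
```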
Core Functions:
- save_message(): Store chat message with TTL
- get_chat_history(): Retrieve conversation history
- clear_chat_history(): Delete session data
Features:
- Automatic session expiration (configurable TTL)
- Client and session isolation
- JSON serialization for message storage
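A minimal sketch of this pattern using the redis-py client; the key naming is an assumption, but the TTL default mirrors CHAT_HISTORY_TTL from the configuration below:

```python
import json
import os
import redis

TTL_SECONDS = int(os.getenv("CHAT_HISTORY_TTL", 1800))  # default: 30 minutes
r = redis.Redis(host=os.getenv("REDIS_HOST", "localhost"), port=6379, db=0)

def _key(client_id: str, session_id: str) -> str:
    return f"chat:{client_id}:{session_id}"

def save_message(client_id: str, session_id: str, message: dict) -> None:
    """Append a message to the session list and refresh its expiry."""
    key = _key(client_id, session_id)
    r.rpush(key, json.dumps(message))
    r.expire(key, TTL_SECONDS)  # sliding TTL: session lives 30 min past last activity

def get_chat_history(client_id: str, session_id: str) -> list[dict]:
    """Return the full conversation, oldest message first."""
    return [json.loads(m) for m in r.lrange(_key(client_id, session_id), 0, -1)]
```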
Core Function:
extract_plan_struct_from_text(): Extract plan fields from paragraph
Extracted Fields:
- plan_name, plan_period, territory, quota
- commission_structure, tiers, bonus_rules
- sales_target, effective_dates
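A minimal, regex-only sketch of this kind of extraction; the patterns are illustrative and much simpler than the actual nlp_plan_builder.py logic:

```python
import re

def extract_plan_fields(text: str) -> dict:
    """Pull a few plan fields out of a free-text plan description."""
    fields = {}
    name = re.search(r"plan (?:called|named) ['\"]([^'\"]+)['\"]", text, re.IGNORECASE)
    if name:
        fields["plan_name"] = name.group(1)
    quota = re.search(r"quota of ([\d,]+)", text, re.IGNORECASE)
    if quota:
        fields["quota"] = int(quota.group(1).replace(",", ""))
    territory = re.search(r"for the (\w+) territory", text, re.IGNORECASE)
    if territory:
        fields["territory"] = territory.group(1)
    # Tier percentages, e.g. "8% below quota and 12% above quota"
    fields["tiers"] = re.findall(r"(\d+(?:\.\d+)?)%\s*(below|above) quota", text, re.IGNORECASE)
    return fields

text = ("Create a commission plan called 'Q4 Champions' for the North territory "
        "with a quota of 500000, tiered structure: 8% below quota and 12% above quota")
print(extract_plan_fields(text))
```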
New Functions:
- is_ml_prediction_intent(): Detect ML prediction requests
- is_ml_optimization_intent(): Detect optimization requests
- extract_ml_params(): Extract percentage, plan_id, plan_name
- is_plan_creation_intent(): Improved plan creation detection
Supported Intents:
- 🤖 ML Prediction: "predict 10% increase", "what if commission decreases by 5%"
- 🎯 ML Optimization: "optimize my plans", "recommend improvements"
- 📝 Plan Creation: "create plan", "new commission plan"
- 🗄️ Database Query: "show active plans", "list all programs"
- 🧠 RAG Query: "what are our Q2 rates?", "explain our policy"
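A minimal sketch of keyword-based routing and parameter extraction in the spirit of intent_detection.py; the patterns are illustrative, not the module's actual rules:

```python
import re

def is_ml_prediction_intent(message: str) -> bool:
    """Detect 'what if' / predict queries about commission changes."""
    return bool(re.search(r"\b(predict|what if|forecast|impact)\b", message, re.IGNORECASE))

def is_ml_optimization_intent(message: str) -> bool:
    return bool(re.search(r"\b(optimi[sz]e|recommend|improve)\b.*\bplan", message, re.IGNORECASE))

def extract_ml_params(message: str) -> dict:
    """Pull the percentage change, its direction, and an optional plan id."""
    params = {}
    pct = re.search(r"(\d+(?:\.\d+)?)\s*%", message)
    if pct:
        sign = -1 if re.search(r"\b(decrease|reduce|lower)\b", message, re.IGNORECASE) else 1
        params["percentage_change"] = sign * float(pct.group(1))
    plan = re.search(r"\bplan\s+(\d+)\b", message, re.IGNORECASE)
    if plan:
        params["plan_id"] = int(plan.group(1))
    return params

print(extract_ml_params("what if we decrease commission by 5% for plan 123"))
# {'percentage_change': -5.0, 'plan_id': 123}
```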
- Python 3.10+
- PostgreSQL 13+
- Redis 7.0+ (REQUIRED)
- Tesseract OCR (for image processing)
- 8GB+ RAM recommended
- Anthropic API Key (for Claude/MCP)
git clone https://github.com/SNEAKO7/commission-dashboard-assistant-RAG-.git
cd commission-dashboard-assistant-RAG-

# macOS/Linux
python3 -m venv .venv
source .venv/bin/activate
# Windows
python -m venv .venv
.venv\Scripts\activate

pip install -r requirements.txt

New ML Dependencies:
- scikit-learn (Random Forest, Gradient Boosting)
- xgboost (XGBoost models)
- lightgbm (LightGBM models)
- prophet (Time series forecasting)
- scipy, statsmodels (Statistical analysis)
- redis (Chat session management)
# Install Redis
# Ubuntu/Debian
sudo apt-get install redis-server
# macOS
brew install redis
# Windows (using WSL or Docker recommended)
docker run -d -p 6379:6379 redis:7-alpine
# Start Redis server
redis-server
# Test Redis connection
redis-cli ping

# Create database
createdb commission_dashboard
# Run setup script
python setup_mcp.py

Create .env file:
# Flask Configuration
FLASK_SECRET_KEY=your-super-secure-secret-key
FLASK_ENV=development
FLASK_DEBUG=True
# API Configuration
BACKEND_API_BASE_URL=https://localhost:8081
JWT_TOKEN=your-jwt-token-if-needed
# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_NAME=commission_dashboard
DB_USER=your_db_user
DB_PASSWORD=your_db_password
# Redis Configuration (NEW)
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=your_redis_password # Optional
REDIS_DB=0
REDIS_SSL=false
CHAT_HISTORY_TTL=1800 # 30 minutes
# MCP/Claude Configuration (REQUIRED)
ANTHROPIC_API_KEY=your-anthropic-api-key-here
# RAG Configuration
DOCUMENTS_FOLDER=data
MODEL_CACHE_DIR=model
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
CHUNK_SIZE=800
CHUNK_OVERLAP=100
# ML Configuration (NEW)
ML_MODEL_CACHE=true
ML_AUTO_RETRAIN=false
ML_MIN_TRAINING_SAMPLES=100
ML_SIMULATION_COUNT=1000

# Test database connection
python test_db.py
# Test Redis connection
redis-cli ping
# Test ML system
python test_ml_system.py
# Initialize MCP server
python setup_mcp.py

# Copy documents to data folder
cp /path/to/your/documents/* data/

# Windows
start.bat
# Linux/Mac
chmod +x start.sh
./start.sh

# Terminal 1: Start Redis (if not running as service)
redis-server
# Terminal 2: Start MCP PostgreSQL Server
python mcp_postgres_server.py
# Terminal 3: Start Flask App
python app.py

Access at: http://localhost:5000
POST /api/ml/predict
Content-Type: application/json
Authorization: Bearer {jwt_token}
{
"percentage_change": 10.0,
"plan_id": 123 // optional
}

Response:
{
"success": true,
"requested_change_pct": 10.0,
"current_metrics": {
"avg_commission": 5000.00,
"total_commission": 500000.00,
"avg_attainment": 85.5,
"num_participants": 100
},
"ml_predictions": {
"avg_commission": 5500.00,
"total_commission": 550000.00
},
"simulation_analysis": {
"total_commission": {
"mean": 548000.00,
"percentile_5": 520000.00,
"percentile_95": 580000.00,
"std": 15000.00
}
},
"risk_metrics": {
"expected_cost_change": 48000.00,
"worst_case_cost": 580000.00,
"probability_exceed_10pct": 0.15
},
"cohort_analysis": [
{
"cohort": "Q1-Low",
"current_avg": 3000.00,
"predicted_avg": 3300.00,
"change_pct": 10.0
}
],
"model_used": "xgboost",
"model_accuracy": {
"r2": 0.925,
"mae": 250.00,
"mape": 5.2
}
}

POST /api/ml/optimize
Content-Type: application/json
Authorization: Bearer {jwt_token}
{
"plan_id": 123 // optional - if omitted, analyzes all plans
}

Response:
{
"success": true,
"num_plans_analyzed": 5,
"recommendations": [
{
"plan_id": 123,
"plan_name": "Q3 Sales Incentives",
"current_performance": {
"avg_attainment": 65.0,
"avg_commission": 4500.00,
"total_paid": 450000.00
},
"recommendations": [
{
"type": "quota_adjustment",
"priority": "high",
"issue": "Low attainment (65.0%)",
"recommendation": "Reduce quota by 20%",
"expected_impact": "Improve attainment to 80-90% range",
"edit_action": {
"field": "quota",
"operation": "multiply",
"value": 0.80
}
}
],
"priority_score": 9
}
],
"summary": "Analyzed 5 plans. 3 need immediate optimization."
}

POST /api/ml/apply-change
Content-Type: application/json
Authorization: Bearer {jwt_token}
{
"plan_id": 123,
"edit_action": {
"field": "quota",
"operation": "multiply",
"value": 0.80
}
}

Response:
{
"success": true,
"plan_id": 123,
"changes_applied": {
"field": "quota",
"operation": "multiply",
"value": 0.80
},
"new_structure": [...]
}

POST /chat
Content-Type: application/json
Authorization: Bearer {jwt_token}
{
"message": "predict the impact of a 10% commission increase",
"user_id": "user-123",
"session_id": "session-abc"
}

ML Prediction Response:
{
"response": "## 📊 AI Commission Impact Analysis: +10.0%\n\n*Analysis using xgboost model*\n*Model Accuracy: R² = 0.925, Error = 5.2%*\n\n### 📈 Current State\n- **Average Payout**: $5,000.00\n- **Total Annual Cost**: $500,000.00\n- **Average Attainment**: 85.5%\n\n### 🤖 AI Predicted Impact\n- **New Average Payout**: $5,500.00\n- **New Total Cost**: $550,000.00\n- **Cost Change**: +$50,000.00\n\n### ⚠️ Risk Analysis (1,000 Simulations)\n- **Expected Cost**: $548,000.00\n- **Best Case**: $520,000.00\n- **Worst Case**: $580,000.00\n- **Risk of 10%+ Overrun**: 15.0%\n\n### 💡 AI Recommendation\n✅ **Moderate Change (+10.0%)**: Monitor closely for impact...",
"history": [...]
}

GET /chat/history?user_id={user_id}&session_id={session_id}
Authorization: Bearer {jwt_token}

Response:
{
"history": [
{
"sender": "user",
"text": "predict 10% increase"
},
{
"sender": "bot",
"text": "## 📊 AI Commission Impact Analysis..."
}
]
}

GET /_test_api

Enhanced Response:
{
"api_base_url": "https://localhost:8081",
"test_endpoint_status": 200,
"jwt_present": true,
"database_connected": true,
"redis_connected": true,
"ml_system_ready": true,
"ml_models_cached": 3,
"anthropic_api_key_present": true,
"test_response": "OK"
}

User: "Predict the impact of a 10% commission increase"
System:
✅ Intent: ML Prediction
📊 Analyzing impact using XGBoost model...
🎯 Result: +$50,000 annual cost, 15% risk of overrun
User: "What if I decrease commission by 5% for plan 'Q3 Sales Champions'?"
System:
✅ Intent: ML Prediction (plan-specific)
🔍 Found plan 'Q3 Sales Champions' → ID: 456
📊 Analyzing 5% decrease...
🎯 Result: -$25,000 savings, low risk
User: "I want to increase commission by 15% - what will happen?"
System:
✅ Extracted: +15% change
📊 Monte Carlo simulation (1000 runs)...
🎯 Expected cost: +$75,000
⚠️ High risk scenario - 35% chance of 10%+ overrun
User: "Recommend optimizations for my commission plans"
System:
✅ Intent: ML Optimization
🔍 Analyzing 8 plans...
📊 Found 5 high-priority optimization opportunities
🔴 High Priority Plans:
1. Q3 Sales Incentives (ID: 123)
- Low attainment (65%) → Reduce quota by 20%
- Poor tier distribution → Adjust Tier 1 threshold
2. North Region Plan (ID: 456)
- High commission rate (12%) → Reduce by 2%
User: "Analyze plan 123 and suggest improvements"
System:
✅ Intent: ML Optimization (plan-specific)
📊 Analyzing plan 123...
Current Performance:
- Attainment: 65%
- Avg Payout: $4,500
- Total Cost: $450,000
🔧 Recommended Changes:
1. 🔴 Quota Adjustment
- Reduce quota by 20%
- Expected: Improve attainment to 80-90%
- ✏️ Type 'apply change 1 to plan 123' to implement
2. 🟡 Tier Threshold
- Lower Tier 1 by 15%
- Expected: Better distribution
User: "Apply change 1 to plan 123"
System:
✅ Applying quota adjustment...
📝 Updated plan 123
✅ Quota reduced by 20%
📊 New thresholds:
- Tier 1: $0 - $80,000 (was $100,000)
- Tier 2: $80,000 - $160,000 (was $200,000)
1. User: "Optimize my plans"
→ System shows 3 high-priority recommendations
2. User: "Predict impact of recommendation 1 for plan 123"
→ System forecasts outcome of proposed change
3. User: "Apply change 1 to plan 123"
→ System updates plan in database
4. User: "Show plan 123"
→ System displays updated plan with new structure
User: "Create a commission plan called 'Q4 Champions' for the North territory with a quota of 500000, tiered structure: 8% below quota and 12% above quota, valid from Jan 1 to Mar 31"
System:
✅ Intent: Plan Creation (NLP)
🧠 Extracted fields:
- Plan name: Q4 Champions
- Territory: North
- Quota: 500,000
- Tiers: 8% (below), 12% (above)
- Dates: Jan 1 - Mar 31
📋 PLAN SUMMARY:
🏷️ Plan Name: Q4 Champions
📅 Period: Not specified
🌍 Territory: North
🎯 Quota: 500,000
💰 Commission Structure: Not specified
📊 Commission Tiers:
- Below quota: 8%
- Above quota: 12%
✅ Type 'submit' to save this plan
Session 1 (10:00 AM):
User: "What are our Q3 commission rates?"
System: "Q3 rates are 8% for sales under $100K..."
[Session stored in Redis with 30-min TTL]
Session 2 (10:15 AM - same user, new browser):
User: "Can you increase those rates by 2%?"
System: "I understand you want to increase the Q3 rates we discussed earlier..."
[System loaded context from Redis]
- Model Caching: Tenant-specific models cached in memory and Redis (see the sketch after these lists)
- Batch Predictions: Process multiple scenarios in single request
- Feature Precomputation: Advanced features calculated once and reused
- Incremental Training: Update models without full retraining
- Lazy Loading: Models loaded on-demand per tenant
- Pagination: Default 10 results per page with offset support
- Result Caching: Frequently accessed data cached in Redis
- Connection Pooling: Reuse database connections
- Query Optimization: Optimized SQL generation by Claude
- Horizontal Scaling: Deploy multiple Flask instances behind load balancer
- Redis Cluster: Use Redis Cluster for high-availability chat sessions
- ML Model Serving: Dedicated ML prediction service (FastAPI/TensorFlow Serving)
- Database Read Replicas: PostgreSQL read replicas for heavy query load
- CDN: Serve static assets via CDN
- Async Processing: Celery for background ML training jobs
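A minimal sketch of the tenant-scoped model caching strategy mentioned above; the key format and daily Redis expiry are assumptions, not the project's actual ml_advanced_predictor behavior:

```python
import pickle
import redis

r = redis.Redis(host="localhost", port=6379, db=0)
_local_cache: dict[str, object] = {}  # in-process cache, checked first

def cache_key(org_id: int, client_id: int) -> str:
    return f"ml_model:{org_id}:{client_id}"

def save_model(org_id: int, client_id: int, model) -> None:
    key = cache_key(org_id, client_id)
    _local_cache[key] = model
    r.set(key, pickle.dumps(model), ex=86_400)  # refresh in Redis daily

def load_model(org_id: int, client_id: int):
    """Return a cached model, falling back from memory to Redis to None (retrain)."""
    key = cache_key(org_id, client_id)
    if key in _local_cache:
        return _local_cache[key]
    blob = r.get(key)
    if blob is not None:
        _local_cache[key] = pickle.loads(blob)
        return _local_cache[key]
    return None  # caller should train and save_model()
```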
| Issue | Solution |
|---|---|
| ML Predictions Fail | Check Anthropic API key, verify 100+ training samples available |
| Redis Connection Failed | Verify Redis service: redis-cli ping, check REDIS_HOST in .env |
| Chat History Not Loading | Check session TTL settings, verify Redis memory usage |
| Model Training Timeout | Reduce num_simulations, check available CPU cores |
| High Memory Usage | Clear ML model cache: tenant_predictor.tenant_models.clear() |
| Slow Predictions | Enable model caching, reduce simulation count |
| Database Query Timeout | Add pagination, optimize SQL with indices |
| MCP Server Not Starting | Verify Anthropic API key, check Python version 3.10+ |
| OCR Issues | Install Tesseract: brew install tesseract or sudo apt install tesseract-ocr |
# Comprehensive ML test suite
python test_ml_system.py
# Test specific ML features
python -c "
from ml_advanced_predictor import tenant_predictor
result = tenant_predictor.predict_commission_impact(
org_id=94,
client_id=93,
percentage_change=10.0
)
print('Prediction Success:', result['success'])
"
# Test plan optimization
python -c "
from ml_advanced_predictor import tenant_predictor
result = tenant_predictor.recommend_plan_optimizations(
org_id=94,
client_id=93
)
print('Found', result['num_plans_analyzed'], 'plans')
"# Test database connectivity
python test_db.py
# Test Redis connectivity
redis-cli ping
# Test chat history
python -c "
from redis_chat_session import save_message, get_chat_history
save_message('test_client', 'test_session', {'sender': 'user', 'text': 'hello'})
history = get_chat_history('test_client', 'test_session')
print('History:', history)
"# Test full MCP pipeline
python -m pytest tests/ -v
# Test all endpoints
python test_all_endpoints.py

| Model | R² Score | MAE | MAPE | Training Time |
|---|---|---|---|---|
| XGBoost | 0.92-0.95 | $200-300 | 4-6% | 2-5 sec |
| Random Forest | 0.88-0.92 | $250-350 | 5-7% | 3-6 sec |
| LightGBM | 0.90-0.94 | $220-320 | 4.5-6.5% | 1-3 sec |
| Gradient Boosting | 0.89-0.93 | $230-330 | 5-6.5% | 4-7 sec |
Best Model: XGBoost (typically selected automatically)
- emp_avg_commission (18%) - Historical employee average
- rolling_3m_commission (15%) - 3-month rolling average
- plan_avg_commission (12%) - Plan-level average
- sales_amount (10%) - Current sales
- attainment_percentage (8%) - Current attainment
- commission_rate (7%) - Current rate
- plan_progress_pct (6%) - How far into plan period
- emp_std_commission (5%) - Employee volatility
- month_sin (4%) - Seasonality
- days_into_plan (4%) - Plan timing
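These rankings can be read straight off the trained tree ensemble's feature_importances_ attribute; a minimal sketch, assuming a fitted model and the engineered feature names:

```python
import pandas as pd

def top_features(model, feature_names: list[str], n: int = 10) -> pd.Series:
    """Rank features by the fitted model's feature_importances_ attribute."""
    importances = pd.Series(model.feature_importances_, index=feature_names)
    return importances.sort_values(ascending=False).head(n)

# Usage after training, e.g.: print(top_features(best_model, engineered_columns))
```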
- "What if" scenarios: Test commission changes before implementation
- Budget planning: Forecast annual commission costs
- Risk assessment: Understand cost uncertainty and worst-case scenarios
- Impact analysis: See how changes affect different employee cohorts
- Quarterly reviews: Predict Q4 costs based on Q1-Q3 performance
- Individual employee predictions: Models work at aggregate level
- Short-term plans (<3 months history): Need sufficient training data
- Brand new plans: No historical data to learn from
- One-time bonuses: Models expect recurring commission structures
- Low attainment: Plans where <70% of employees hit quota
- High costs: Plans exceeding budget
- Poor distribution: Too many employees in bottom/top tiers
- Plan audits: Regular review of all active plans
- New plan design: Learn from similar historical plans
- Recently launched plans (<2 months): Not enough performance data
- Custom/unique structures: Recommendations based on common patterns
- Non-commission incentives: Stock options, retention bonuses, etc.
R² Score:
- 0.90-1.00: Excellent - Model highly accurate
- 0.80-0.90: Good - Reliable for most decisions
- 0.70-0.80: Fair - Use with caution, verify assumptions
- <0.70: Poor - Insufficient data or plan too complex
Probability of 10%+ Overrun:
- <10%: Low risk
- 10-25%: Moderate risk - monitor closely
- 25-50%: High risk - consider phased rollout
- >50%: Very high risk - reconsider change

Confidence Intervals (5th-95th percentile):
- Narrow range (<20% of mean): High confidence
- Wide range (>40% of mean): High uncertainty
- Model Isolation: Tenant-specific models prevent data leakage
- Input Validation: All ML inputs sanitized and validated
- Rate Limiting: Prevent ML abuse (expensive operations)
- Audit Logging: All ML predictions and plan edits logged
- Permission Checks: Verify user can access org_id/client_id
# Production Redis Configuration
redis-cli CONFIG SET requirepass "your-strong-password"
redis-cli CONFIG SET protected-mode yes
redis-cli CONFIG SET bind "127.0.0.1"
# Enable SSL/TLS
redis-server --tls-port 6380 --port 0 \
--tls-cert-file redis.crt \
--tls-key-file redis.key \
--tls-ca-cert-file ca.crt

- SELECT-Only: MCP server enforces read-only operations
- Tenant Filtering: All queries filtered by org_id + client_id
- SQL Injection Protection: Parameterized queries only
- Connection Encryption: SSL/TLS for database connections
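A minimal sketch of a parameterized, tenant-filtered query in the style these rules require; the table and column names are illustrative:

```python
import psycopg2

# conn = psycopg2.connect(host="localhost", dbname="commission_dashboard", sslmode="require")

def fetch_active_plans(conn, org_id: int, client_id: int, limit: int = 10, offset: int = 0):
    """Read-only query: SELECT-only, scoped to the caller's tenant, paginated."""
    sql = """
        SELECT plan_id, plan_name, quota, status
        FROM commission_plans
        WHERE org_id = %s AND client_id = %s AND status = 'active'
        ORDER BY plan_id
        LIMIT %s OFFSET %s
    """
    with conn.cursor() as cur:
        cur.execute(sql, (org_id, client_id, limit, offset))  # parameters, never string formatting
        return cur.fetchall()
```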
# Production Configuration
FLASK_ENV=production
FLASK_DEBUG=False
LOG_LEVEL=INFO
# Redis Production
REDIS_HOST=redis.production.com
REDIS_PORT=6379
REDIS_PASSWORD=super-secure-redis-password
REDIS_SSL=true
REDIS_CONNECTION_POOL_SIZE=50
# ML Production Settings
ML_MODEL_CACHE=true
ML_AUTO_RETRAIN=true
ML_MIN_TRAINING_SAMPLES=500
ML_SIMULATION_COUNT=1000
# Database Production
DB_HOST=postgres.production.com
DB_SSL_MODE=require
DB_CONNECTION_POOL_SIZE=20
# API Keys (use environment-specific secrets)
ANTHROPIC_API_KEY=sk-ant-production-key
JWT_SECRET_KEY=production-jwt-secret

# Dockerfile
FROM python:3.10-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
tesseract-ocr \
postgresql-client \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Run application
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "--timeout", "120", "app:app"]version: '3.8'
services:
app:
build: .
ports:
- "5000:5000"
environment:
- REDIS_HOST=redis
- DB_HOST=postgres
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
depends_on:
- redis
- postgres
redis:
image: redis:7-alpine
ports:
- "6379:6379"
command: redis-server --requirepass ${REDIS_PASSWORD}
volumes:
- redis_data:/data
postgres:
image: postgres:13
environment:
POSTGRES_DB: commissions
POSTGRES_USER: ${DB_USER}
POSTGRES_PASSWORD: ${DB_PASSWORD}
volumes:
- postgres_data:/var/lib/postgresql/data
volumes:
redis_data:
postgres_data:

- Deep Learning models (LSTM, Transformers) for time series
- Automated hyperparameter tuning (Optuna, Hyperopt)
- A/B testing framework for plan changes
- Causal inference for true impact estimation
- Real-time model retraining on new data
- Model explainability (SHAP values)
- Custom ML models per plan type
- Multi-objective optimization (cost + satisfaction + retention)
- Streaming responses for ML predictions
- Plan comparison tool
- What-if scenario builder (UI)
- Email notifications for plan changes
- Mobile app (React Native)
- GraphQL API
- Audit trail dashboard
- Advanced analytics & reporting
| Feature | Preview |
|---|---|
| ML Prediction | [Add screenshot of ML prediction response] |
| Plan Optimization | [Add screenshot of optimization recommendations] |
| Chat with History | [Add screenshot showing session continuity] |
| NLP Plan Creation | [Add screenshot of natural language plan extraction] |
We welcome contributions! Areas of interest:
- Additional ML models (neural networks, ensemble methods)
- Performance optimizations
- UI/UX improvements
- Test coverage
- Documentation
- Bug fixes
1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-ml-feature)
3. Commit your changes (git commit -m 'Add amazing ML feature')
4. Push to the branch (git push origin feature/amazing-ml-feature)
5. Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Anthropic Claude - AI/ML capabilities and MCP protocol
- Scikit-learn - Machine learning algorithms
- XGBoost - Gradient boosting framework
- LightGBM - Fast gradient boosting
- Prophet - Time series forecasting
- Redis - In-memory data store
- FAISS - Efficient similarity search
- Hugging Face - Transformer models and embeddings
- LangChain - LLM application framework
- Flask - Web framework
- The broader Python open-source community
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
Built with ❤️ for intelligent commission management and powered by advanced ML predictions