Algorithm Recommender System

A research-grade algorithm recommendation system that automatically selects the optimal algorithm for sorting, array searching, graph searching, string searching, and tree searching problems. The system combines algorithmic theory (correctness constraints) with machine learning (data-driven selection) to provide intelligent, explainable recommendations.

🎯 Project Overview

This system addresses the algorithm selection problem: given a problem instance, which algorithm should be used for optimal performance?

Key Innovation

Traditional approaches either:

  1. Use rule-based systems (rigid, don't adapt to data)
  2. Use pure ML (may violate correctness constraints)

Our approach is a hybrid system that:

  • Enforces strict correctness constraints (e.g., never recommends Binary Search for unsorted arrays)
  • Uses ML for optimization among valid algorithms
  • Provides human-readable explanations for all recommendations
  • Includes automatic structure inference to detect input type
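The core decision loop can be sketched in a few lines (hypothetical names; the real recommender classes live in the per-domain modules). Constraints act as a hard filter, and ML scores only rank the survivors:

```python
def recommend(features, constraints, model_scores):
    """Pick the best-scoring algorithm among those that satisfy all constraints."""
    valid = [algo for algo, is_valid in constraints.items() if is_valid(features)]
    if not valid:
        raise ValueError("no correct algorithm for this input")
    # ML only ranks algorithms that are provably correct for the input
    return max(valid, key=lambda algo: model_scores.get(algo, 0.0))

constraints = {
    "linear_search": lambda f: True,            # correct on any array
    "binary_search": lambda f: f["is_sorted"],  # correct only on sorted arrays
}
scores = {"linear_search": 0.2, "binary_search": 0.8}  # e.g. from a classifier

# Unsorted input: binary_search is filtered out despite its higher ML score.
recommend({"is_sorted": False}, constraints, scores)  # -> 'linear_search'
```

The same two-stage shape (filter, then rank) applies in every domain below.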

πŸ“Š Domains Supported

1. Sorting Algorithms

| Algorithm | Best Case | Average | Worst | Space | When to Use |
|---|---|---|---|---|---|
| Insertion Sort | O(n) | O(n²) | O(n²) | O(1) | Small/nearly sorted arrays |
| Selection Sort | O(n²) | O(n²) | O(n²) | O(1) | Very small arrays |
| Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Large arrays, stability needed |
| Quick Sort | O(n log n) | O(n log n) | O(n²) | O(log n) | General purpose |
| Heap Sort | O(n log n) | O(n log n) | O(n log n) | O(1) | Guaranteed performance + in-place |

Features Extracted:

  • Input size (log-normalized)
  • Sortedness measure (inversions ratio)
  • Duplicate ratio
  • Distribution characteristics

2. Array Search Algorithms

| Algorithm | Complexity | Constraint | When to Use |
|---|---|---|---|
| Linear Search | O(n) | None | Unsorted arrays, small arrays |
| Binary Search | O(log n) | Sorted only | Large sorted arrays |
| Jump Search | O(√n) | Sorted only | Sorted arrays, forward-only access |
| Exponential Search | O(log n) | Sorted + Uniform | Large sorted uniform arrays |

Correctness Constraints (STRICT):

  • ❌ Binary/Jump/Exponential search NEVER recommended for unsorted arrays
  • ❌ Exponential search NEVER recommended for non-uniform distributions
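To see why the constraint must be strict, here is a standard binary search (built on the stdlib bisect module) run on an unsorted array: it can silently report a present element as missing, rather than failing loudly.

```python
from bisect import bisect_left

def binary_search(arr, target):
    """Standard binary search; only correct when arr is sorted ascending."""
    i = bisect_left(arr, target)
    return i if i < len(arr) and arr[i] == target else -1

binary_search([1, 2, 3, 4, 5], 3)   # -> 2 (correct: input is sorted)
binary_search([5, 2, 8, 1, 9], 1)   # -> -1, although 1 is present at index 3
```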

3. Graph Search Algorithms

| Algorithm | Complexity | Constraint | When to Use |
|---|---|---|---|
| BFS | O(V + E) | Unweighted only | Shortest path in unweighted graphs |
| Dijkstra | O((V+E) log V) | Non-negative weights | General weighted graphs |
| Bellman-Ford | O(VE) | Handles negatives | Graphs with negative weights |
| A* | O(E) best case | Requires heuristic | When good heuristic available |

Correctness Constraints (STRICT):

  • ❌ BFS NEVER recommended for weighted graphs
  • ❌ Dijkstra NEVER recommended when negative weights exist
  • ❌ A* NEVER recommended without heuristic
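A sketch of how such a filter could look for graphs (illustrative only; the project's feature extractor derives is_weighted and has_negative_weights as described in the methodology section, and the "any weight ≠ 1" test below is a simplifying assumption):

```python
def valid_graph_algorithms(graph, has_heuristic=False):
    """Filter graph-search algorithms by the correctness rules above.

    `graph` is an adjacency list: node -> list of (neighbor, weight).
    """
    weights = [w for edges in graph.values() for _, w in edges]
    is_weighted = any(w != 1 for w in weights)  # simple proxy for "weighted"
    has_negative = any(w < 0 for w in weights)
    valid = ["bellman_ford"]                    # correct on any graph here
    if not has_negative:
        valid.append("dijkstra")
    if not is_weighted:
        valid.append("bfs")                     # unweighted graphs only
    if has_heuristic and not has_negative:
        valid.append("a_star")
    return valid

valid_graph_algorithms({0: [(1, -2.0)], 1: []})
# -> ['bellman_ford']: a negative weight rules out Dijkstra, BFS, and A*
```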

4. String Search Algorithms (NEW)

| Algorithm | Complexity | Constraint | When to Use |
|---|---|---|---|
| Naive String Search | O(nm) | None | Single pattern, short text |
| Trie Search | O(m) per query | Multiple patterns OR prefix queries | Multiple patterns, autocomplete |

Correctness Constraints (STRICT):

  • ❌ Trie search NEVER recommended for single exact pattern (preprocessing overhead not justified)
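A minimal sketch of trie-based multi-pattern matching (not the project's implementation): the patterns are preprocessed into a trie once, then the text is scanned with one trie walk per start position, so adding more patterns does not add more passes over the text.

```python
def build_trie(patterns):
    """Nested-dict trie; the multi-char key '_end' marks a complete pattern."""
    root = {}
    for p in patterns:
        node = root
        for ch in p:
            node = node.setdefault(ch, {})
        node["_end"] = p
    return root

def trie_search(text, patterns):
    """All (start_index, pattern) matches, one trie walk per start position."""
    root = build_trie(patterns)
    hits = []
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.get(ch)  # '_end' can never collide with a single char
            if node is None:
                break
            if "_end" in node:
                hits.append((i, node["_end"]))
    return hits

trie_search("hello world hello", ["hello", "world", "foo"])
# -> [(0, 'hello'), (6, 'world'), (12, 'hello')]
```

The build_trie preprocessing is exactly the overhead that makes trie search a poor fit for a single exact pattern.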

5. Tree Search Algorithms (NEW)

| Algorithm | Complexity | Constraint | When to Use |
|---|---|---|---|
| BST Search | O(h) | None | Any BST, unknown balance |
| Balanced Tree Search | O(log n) | Balanced tree only | AVL/Red-Black trees |

Correctness Constraints (STRICT):

  • ❌ Balanced tree search NEVER recommended for unbalanced trees (loses O(log n) guarantee)

πŸ—οΈ Architecture

algorithm_recommender/
β”œβ”€β”€ sorting/
β”‚   β”œβ”€β”€ algorithms.py      # Sorting algorithm implementations
β”‚   β”œβ”€β”€ features.py        # Feature extraction
β”‚   β”œβ”€β”€ dataset.py         # Training data generation
β”‚   β”œβ”€β”€ recommender.py     # ML-based recommender
β”‚   β”œβ”€β”€ train.py           # Training script
β”‚   └── test.py            # Test script
β”‚
β”œβ”€β”€ searching/
β”‚   β”œβ”€β”€ array_search/
β”‚   β”‚   β”œβ”€β”€ algorithms.py  # Search algorithm implementations
β”‚   β”‚   β”œβ”€β”€ features.py    # Feature extraction with correctness
β”‚   β”‚   β”œβ”€β”€ dataset.py     # Training data (respects constraints)
β”‚   β”‚   β”œβ”€β”€ recommender.py # Recommender with constraint enforcement
β”‚   β”‚   β”œβ”€β”€ train.py       # Training script
β”‚   β”‚   └── test.py        # Correctness-focused tests
β”‚   β”‚
β”‚   β”œβ”€β”€ graph_search/
β”‚   β”‚   β”œβ”€β”€ algorithms.py  # BFS, Dijkstra, Bellman-Ford, A*
β”‚   β”‚   β”œβ”€β”€ features.py    # Graph feature extraction
β”‚   β”‚   β”œβ”€β”€ dataset.py     # Training data generation
β”‚   β”‚   β”œβ”€β”€ recommender.py # Graph search recommender
β”‚   β”‚   β”œβ”€β”€ train.py       # Training script
β”‚   β”‚   └── test.py        # Path correctness tests
β”‚   β”‚
β”‚   β”œβ”€β”€ string_search/     # NEW: String/pattern matching
β”‚   β”‚   β”œβ”€β”€ algorithms.py  # Naive search, Trie search
β”‚   β”‚   β”œβ”€β”€ features.py    # Text/pattern feature extraction
β”‚   β”‚   β”œβ”€β”€ constraints.py # Trie validity constraints
β”‚   β”‚   β”œβ”€β”€ dataset.py     # Training data generation
β”‚   β”‚   β”œβ”€β”€ recommender.py # String search recommender
β”‚   β”‚   β”œβ”€β”€ train.py       # Training script
β”‚   β”‚   └── test.py        # Correctness tests
β”‚   β”‚
β”‚   └── tree_search/       # NEW: Tree-based searching
β”‚       β”œβ”€β”€ algorithms.py  # BST, AVL tree, search functions
β”‚       β”œβ”€β”€ features.py    # Tree feature extraction (balance, height)
β”‚       β”œβ”€β”€ constraints.py # Balance threshold constraints
β”‚       β”œβ”€β”€ dataset.py     # Training data generation
β”‚       β”œβ”€β”€ recommender.py # Tree search recommender
β”‚       β”œβ”€β”€ train.py       # Training script
β”‚       └── test.py        # Balance-aware tests
β”‚
β”œβ”€β”€ models/                 # Saved trained models
β”‚   β”œβ”€β”€ sorting_model.pkl
β”‚   β”œβ”€β”€ array_search_model.pkl
β”‚   β”œβ”€β”€ graph_search_model.pkl
β”‚   β”œβ”€β”€ string_search_model.pkl  # NEW
β”‚   └── tree_search_model.pkl    # NEW
β”‚
β”œβ”€β”€ utils/                  # Shared utilities
β”‚   β”œβ”€β”€ timing.py          # Benchmarking utilities
β”‚   β”œβ”€β”€ statistics.py      # Statistical helpers
β”‚   β”œβ”€β”€ validation.py      # Input validation
β”‚   └── structure_inference.py  # NEW: Auto-detect input type
β”‚
β”œβ”€β”€ web/                    # Web interface
β”‚   β”œβ”€β”€ app.py             # Flask application
β”‚   β”œβ”€β”€ templates/         # HTML templates
β”‚   └── static/            # CSS/JS assets
β”‚
β”œβ”€β”€ unified_interface.py    # Single entry point for all recommendations
β”œβ”€β”€ __init__.py            # Package initialization
└── README.md              # This file

πŸš€ Quick Start

Installation

# Clone or download the project
cd algorithm_recommender

# Install dependencies
pip install numpy scikit-learn

Basic Usage

from algorithm_recommender import AlgorithmRecommender

# Initialize
recommender = AlgorithmRecommender()

# Train models (first time) or load existing models
recommender.train_all()  # Generates training data and trains models
# OR
recommender.load_models()  # Load pre-trained models

# ===== SORTING =====
result = recommender.recommend_sorting([5, 2, 8, 1, 9, 3, 7])
print(f"Algorithm: {result['algorithm']}")
print(f"Why: {result['explanation']}")
# Output: Algorithm: insertion_sort
# Output: Why: Array is small array, highly unsorted β†’ Insertion Sort selected.

# Actually sort the array
sorted_arr, algo, why = recommender.sort([5, 2, 8, 1, 9])
print(sorted_arr)  # [1, 2, 5, 8, 9]

# ===== ARRAY SEARCH =====
result = recommender.recommend_array_search([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(f"Algorithm: {result['algorithm']}")
print(f"Valid options: {result['valid_algorithms']}")
# Output: Algorithm: binary_search
# Output: Valid options: ['linear_search', 'binary_search', 'jump_search', 'exponential_search']

# Search for a value
idx, algo, why = recommender.search_array([1, 2, 3, 4, 5], 3)
print(f"Found at index {idx} using {algo}")

# ===== GRAPH SEARCH =====
graph = {
    0: [(1, 4.0), (2, 1.0)],
    1: [(3, 1.0)],
    2: [(1, 2.0), (3, 5.0)],
    3: []
}
result = recommender.recommend_graph_search(graph)
print(f"Algorithm: {result['algorithm']}")
# Output: Algorithm: dijkstra

# Find shortest path
path, cost, algo, why = recommender.search_graph(graph, start=0, goal=3)
print(f"Path: {path}, Cost: {cost}")
# Output: Path: [0, 2, 1, 3], Cost: 4.0

# ===== STRING SEARCH (NEW) =====
# Single pattern - only naive search valid
result = recommender.recommend_string_search("hello world", ["world"])
print(f"Algorithm: {result['algorithm']}")
# Output: Algorithm: naive_string_search

# Multiple patterns - trie search valid and optimal
result = recommender.recommend_string_search("hello world hello", ["hello", "world", "foo"])
print(f"Algorithm: {result['algorithm']}")
print(f"Valid options: {result['valid_algorithms']}")
# Output: Algorithm: trie_search
# Output: Valid options: ['naive_string_search', 'trie_search']

# ===== TREE SEARCH (NEW) =====
from searching.tree_search.algorithms import BST, AVLTree

# Balanced AVL tree - balanced search valid
avl = AVLTree()
for k in [5, 3, 7, 1, 9, 4, 6]:
    avl.insert(k)
result = recommender.recommend_tree_search(avl)
print(f"Algorithm: {result['algorithm']}")
# Output: Algorithm: balanced_tree_search

# Skewed BST - only bst_search valid
bst = BST()
for k in range(1, 20):  # Sorted insertion creates skew
    bst.insert(k)
result = recommender.recommend_tree_search(bst)
print(f"Algorithm: {result['algorithm']}")
# Output: Algorithm: bst_search

# ===== AUTO-DETECTION (NEW) =====
# The system can automatically detect what kind of problem you have
result = recommender.recommend_auto([5, 2, 8, 1, 9], operation_hint="sort")
print(f"Task: {result['task_type']}, Algorithm: {result['algorithm']}")
# Output: Task: sorting, Algorithm: insertion_sort

πŸ§ͺ Training and Testing

Train All Models

cd algorithm_recommender
python train_all.py  # Trains all 5 models
# OR
python unified_interface.py  # Runs demo which trains if needed

Train Individual Models

# Sorting
python -m sorting.train --num_instances 1000

# Array Search
python -m searching.array_search.train --num_instances 500

# Graph Search
python -m searching.graph_search.train --num_instances 400

# String Search (NEW)
python -m searching.string_search.train --num_instances 300

# Tree Search (NEW)
python -m searching.tree_search.train --num_instances 300

Run Tests

# Sorting tests
python -m sorting.test

# Array search tests (correctness-focused)
python -m searching.array_search.test

# Graph search tests
python -m searching.graph_search.test

# String search tests (NEW)
python -m searching.string_search.test

# Tree search tests (NEW)
python -m searching.tree_search.test

🌐 Web Application

The system includes a professional web interface built with Flask.

Running the Web Application

cd algorithm_recommender

# Option 1: Run directly
python web/app.py

# Option 2: Use the batch file (Windows)
web\run_web.bat

Then open http://localhost:5000 in your browser.

Web Features

  • Task Detection: Automatically detects sorting, array search, graph search, string search, or tree search from natural language
  • Dynamic Input Forms: Different inputs for each task type
  • Feature Visualization: Shows extracted features in real-time
  • Algorithm Recommendation: Displays the recommended algorithm with explanation
  • Confidence Scores: Visual bar chart of algorithm probabilities
  • Algorithm Execution: Execute the recommended algorithm and see results
  • Failure Mode Analysis: Educational feature showing constraint enforcement (see below)

Failure Mode Analysis (Optional Feature)

The web app includes an optional, read-only "Failure Mode Analysis" feature that demonstrates why correctness constraints are essential:

What It Shows:

  1. Algorithm WITHOUT Constraints: What the raw ML model would predict
  2. Why It's Invalid: Explanation of the correctness constraint violated
  3. Corrected Algorithm: The valid algorithm after constraint enforcement

How to Use:

  1. Get a recommendation for any task
  2. In the results section, click "Analyze Constraints" button
  3. View the side-by-side comparison of unconstrained vs constrained choices

Example (Array Search on unsorted array):

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Failure Mode Analysis                                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Input: Array [5, 2, 8, 1, 9] (UNSORTED)                      β”‚
β”‚                                                               β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”             β”‚
β”‚ β”‚ WITHOUT Constraints β”‚  β”‚ WITH Constraints    β”‚             β”‚
β”‚ β”‚ Binary Search ❌    β”‚  β”‚ Linear Search βœ…    β”‚             β”‚
β”‚ β”‚ [INVALID]           β”‚  β”‚ [GUARANTEED CORRECT]β”‚             β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜             β”‚
β”‚                                                               β”‚
β”‚ ⚠️ Why Invalid: Binary Search requires sorted array.         β”‚
β”‚    The input array is NOT sorted, so Binary Search would     β”‚
β”‚    produce incorrect results.                                 β”‚
β”‚                                                               β”‚
β”‚ βœ… Constraint System: Overrode ML choice to Linear Search    β”‚
β”‚    which works correctly on any array.                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

This feature helps users understand:

  • Why pure ML is insufficient for algorithm selection
  • How correctness constraints prevent invalid recommendations
  • The value of the hybrid ML + theory approach

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| / | GET | Main web interface |
| /api/detect-task | POST | Detect task from problem statement |
| /api/recommend/sorting | POST | Get sorting recommendation |
| /api/recommend/array-search | POST | Get array search recommendation |
| /api/recommend/graph-search | POST | Get graph search recommendation |
| /api/recommend/string-search | POST | Get string search recommendation |
| /api/recommend/tree-search | POST | Get tree search recommendation |
| /api/execute/sorting | POST | Execute sorting algorithm |
| /api/execute/array-search | POST | Execute search algorithm |
| /api/execute/graph-search | POST | Execute graph search algorithm |
| /api/failure-mode | POST | Failure mode analysis (unconstrained vs constrained) |
| /api/algorithms | GET | List all supported algorithms |

Legacy Streamlit App

A Streamlit version is also available:

streamlit run app.py

πŸ”¬ Research Methodology

Feature Extraction

Each domain has a dedicated feature extractor that computes relevant characteristics:

Sorting Features:

  • size_log: Normalized log of array size
  • sortedness: 1 - (inversions / max_inversions)
  • duplicate_ratio: Fraction of duplicate elements
  • is_nearly_sorted: Binary indicator for high sortedness
  • is_reverse_sorted: Binary indicator for descending order
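The sortedness feature can be computed directly from the inversion count. An O(n²) reference version for clarity (a merge-based O(n log n) count would be the practical choice for large inputs):

```python
def sortedness(arr):
    """1 - inversions / max_inversions; 1.0 = sorted, 0.0 = reverse sorted."""
    n = len(arr)
    if n < 2:
        return 1.0
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if arr[i] > arr[j]
    )
    return 1.0 - inversions / (n * (n - 1) / 2)

sortedness([1, 2, 3, 4])   # -> 1.0
sortedness([4, 3, 2, 1])   # -> 0.0
sortedness([1, 2, 4, 3])   # -> 0.833... (nearly sorted)
```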

Array Search Features:

  • size_log: Normalized input size
  • is_sorted: Critical for algorithm validity
  • is_uniform: Distribution uniformity measure
  • gap_consistency: Measure of value spacing regularity

Graph Search Features:

  • num_nodes_log, num_edges_log: Size metrics
  • edge_density: E / (V * (V-1))
  • is_weighted: Whether edges have varying weights
  • has_negative_weights: Critical constraint feature
  • has_heuristic: Enables A* selection

Labeling Strategy

Training labels are determined using a hybrid empirical + theoretical approach:

  1. Empirical: Benchmark all valid algorithms on generated instances
  2. Theoretical: Apply domain-knowledge priors to adjust scores
  3. Hybrid Score: (1-Ξ±) * empirical + Ξ± * theoretical

This prevents dominance bias where one algorithm (e.g., QuickSort) always wins empirically but isn't theoretically optimal for all cases.
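The blend is a single convex combination per algorithm. A small illustration (alpha and the scores below are made up to show the mechanism):

```python
def hybrid_scores(empirical, theoretical, alpha=0.5):
    """Score = (1 - alpha) * empirical + alpha * theoretical, per algorithm."""
    return {
        algo: (1 - alpha) * emp + alpha * theoretical.get(algo, 0.0)
        for algo, emp in empirical.items()
    }

# QuickSort wins the raw benchmark, but a theoretical prior favoring
# Insertion Sort on nearly sorted input shifts the training label:
scores = hybrid_scores(
    empirical={"quick_sort": 0.9, "insertion_sort": 0.7},
    theoretical={"quick_sort": 0.1, "insertion_sort": 0.9},
)
max(scores, key=scores.get)   # -> 'insertion_sort'
```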

ML Model

  • Algorithm: RandomForestClassifier
  • Why RandomForest:
    • Handles non-linear relationships
    • Provides feature importance
    • Robust to overfitting
    • Works well with small-medium datasets
  • Hyperparameters: 100 trees, max_depth=12-15, balanced class weights
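In scikit-learn terms, that configuration looks roughly like this (illustrative values within the stated ranges; the toy data exists only to show the feature-vector-to-label shape):

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,         # 100 trees
    max_depth=12,             # within the stated 12-15 range
    class_weight="balanced",  # compensate for uneven algorithm-label counts
    random_state=42,          # fixed seed for reproducibility
)

# Toy fit: feature vectors (e.g. [size_log, is_sorted]) -> best-algorithm labels.
X = [[0.1, 1.0], [0.9, 0.0], [0.2, 1.0], [0.8, 0.0]]
y = ["binary_search", "linear_search", "binary_search", "linear_search"]
model.fit(X, y)
model.predict([[0.15, 1.0]])  # returns one of the trained labels
```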

πŸ“ˆ Evaluation Metrics

During training, we evaluate:

  • Cross-validation accuracy (5-fold)
  • Test accuracy on held-out data
  • Classification report per algorithm
  • Feature importance for interpretability

During testing, we verify:

  • Correctness constraints are never violated
  • Algorithm diversity - different algorithms for different regimes
  • Path/sort correctness - results are actually correct

πŸŽ“ Academic Rigor

This project follows research-level expectations:

  1. Clear separation between rule-based constraints and ML-based optimization
  2. Extensive documentation with docstrings and comments
  3. Reproducibility through random seed control
  4. Theoretical grounding with complexity analysis
  5. Explainability - every recommendation includes justification
  6. Extensibility - easy to add new algorithms

πŸ“š Extending the System

Adding a New Sorting Algorithm

  1. Implement in sorting/algorithms.py:

def tim_sort(arr: List[float]) -> List[float]:
    """Tim Sort implementation."""
    # ... implementation
    return result

SORTING_ALGORITHMS['tim_sort'] = tim_sort

  2. Add theoretical priors in sorting/dataset.py:

PRIORS['tim_sort'] = {
    'is_nearly_sorted': 0.4,
    'size_large': 0.3,
}

  3. Retrain the model.

Adding a New Search Algorithm

Similar process, but ensure you also define correctness constraints in ALGORITHM_CONSTRAINTS.
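The constraint entry could take the same predicate-per-algorithm shape as the filtering logic (hypothetical structure; check the array_search module for the actual ALGORITHM_CONSTRAINTS format). For example, a new interpolation search valid only on sorted, uniformly distributed arrays:

```python
# Hypothetical constraint entries; feature names mirror the array-search
# features described above (is_sorted, is_uniform).
ALGORITHM_CONSTRAINTS = {
    "linear_search": lambda f: True,
    "interpolation_search": lambda f: f["is_sorted"] and f["is_uniform"],
}

features = {"is_sorted": True, "is_uniform": False}
valid = [a for a, ok in ALGORITHM_CONSTRAINTS.items() if ok(features)]
# -> ['linear_search']: interpolation_search excluded on non-uniform data
```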

πŸ”§ Requirements

  • Python 3.8+
  • NumPy
  • scikit-learn

πŸ“„ License

MIT License

πŸ‘₯ Authors

Algorithm Recommender Research Team

πŸ™ Acknowledgments

This project draws inspiration from:

  • Algorithm Selection literature (Rice, 1976)
  • AutoML and meta-learning research
  • Classic algorithms textbooks (CLRS, Sedgewick)
