This repository contains comprehensive implementations of various machine learning algorithms, covering both classification and regression tasks. Each algorithm is implemented with proper data preprocessing, model training, evaluation, and hyperparameter tuning where applicable.
```
ml_algorithm/
├── clustering/                    # Clustering Algorithms (K-Means, Hierarchical, DBSCAN)
├── KNN/                           # K-Nearest Neighbors Classification
├── Navie_bayes/                   # Naive Bayes Classification
├── Random_forest_classification/  # Random Forest Classification
├── logistic_regression/           # Logistic Regression (Binary & Multiclass)
├── support_vecotr_regression/     # Support Vector Regression
├── support_vector_classification/ # Support Vector Classification
├── Linear_regression/             # Linear Regression
├── multiple_linear_regression/    # Multiple Linear Regression
├── Polynomial_regression/         # Polynomial Regression
└── PCA/                           # Principal Component Analysis
```
- Location: `clustering/`
- Technique: Centroid-based clustering
- Features:
  - Elbow method for optimal k selection
  - Silhouette score analysis
  - Automatic elbow detection using KneeLocator
- Dataset: Synthetic blob dataset (150 samples, 2 features, 3 clusters)
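The elbow and silhouette steps can be sketched as follows; the `random_state` and the range of k values tried are assumptions, not the notebook's exact settings:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic blobs matching the dataset described above (random_state assumed)
X, _ = make_blobs(n_samples=150, n_features=2, centers=3, random_state=42)

# Elbow method: inertia drops sharply up to the true k, then flattens
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
            for k in range(2, 7)}

# Silhouette analysis: prefer the k with the highest average silhouette score
sil = {k: silhouette_score(X, KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X))
       for k in range(2, 7)}
best_k = max(sil, key=sil.get)
```

The notebook additionally automates the elbow pick with `KneeLocator` from the `kneed` package, which finds the point of maximum curvature in the inertia curve.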
- Location: `clustering/`
- Technique: Bottom-up (agglomerative) hierarchical clustering
- Features:
  - Dendrogram visualization
  - Ward linkage method
  - PCA for dimensionality reduction
- Dataset: Iris dataset (reduced to 2D using PCA)
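A minimal sketch of this workflow (the choice of 2 clusters mirrors the summary table below; plotting is omitted, but `scipy.cluster.hierarchy.dendrogram(Z)` would draw the dendrogram):

```python
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
X2 = PCA(n_components=2).fit_transform(X)  # reduce 4 features to 2D

# Ward linkage matrix: one merge per row, 149 merges for 150 samples
Z = linkage(X2, method="ward")

# Cut the tree at 2 clusters using the same ward criterion
labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X2)
```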
- Location: `clustering/`
- Technique: Density-based clustering with noise detection
- Features:
  - Automatic cluster detection
  - Outlier/noise identification
  - Handles non-spherical clusters
- Dataset: Synthetic moon-shaped dataset (1000 samples, 2 features)
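A sketch under assumed parameters (the `noise`, `eps`, and `min_samples` values here are illustrative, not necessarily the notebook's):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Moon-shaped data as described above; noise level is an assumption
X, _ = make_moons(n_samples=1000, noise=0.05, random_state=42)
X = StandardScaler().fit_transform(X)  # DBSCAN is distance-based, so scale first

db = DBSCAN(eps=0.3, min_samples=5).fit(X)
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)  # label -1 marks noise
n_noise = int(np.sum(db.labels_ == -1))
```

Note that the cluster count is not specified up front: DBSCAN discovers it from the density structure, which is why it separates the two interleaving moons where K-Means would fail.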
- Location: `KNN/`
- Technique: Instance-based learning algorithm
- Features: Binary classification with k=5 neighbors
- Accuracy: Evaluated using accuracy score
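A minimal version of this classifier, using a hypothetical `make_classification` dataset as a stand-in for the notebook's data:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical binary dataset (shape and random_state are assumptions)
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling matters for KNN since predictions depend on raw distances
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=5).fit(scaler.transform(X_train), y_train)
acc = accuracy_score(y_test, knn.predict(scaler.transform(X_test)))
```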
- Location: `Navie_bayes/`
- Technique: Gaussian Naive Bayes
- Features: Multi-class classification on Iris dataset
- Accuracy: ~96.67%
- Evaluation: Accuracy, classification report, confusion matrix
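The core of this notebook fits in a few lines; the split parameters below are assumptions, so the exact accuracy may differ slightly from the ~96.67% reported:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = GaussianNB().fit(X_train, y_train)   # fits one Gaussian per feature per class
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)        # 3x3 matrix for the three Iris species
```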
- Location: `logistic_regression/`
- Technique: Logistic Regression with various configurations
- Features:
  - Binary classification (91% accuracy)
  - Multiclass classification (One-vs-Rest strategy)
  - Handling imbalanced datasets with class weights
  - ROC curve and AUC score analysis
  - Hyperparameter tuning (GridSearchCV, RandomizedSearchCV)
- Accuracy: 91% (binary), 59% (multiclass), 98.85% (imbalanced)
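The imbalanced-data configuration can be sketched like this; the 90/10 class ratio and the `C` grid are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Imbalanced synthetic data (90/10 ratio assumed)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# class_weight="balanced" upweights the minority class during fitting;
# GridSearchCV tunes the regularization strength C with 5-fold CV
grid = GridSearchCV(
    LogisticRegression(class_weight="balanced", max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    scoring="roc_auc", cv=5,
).fit(X_train, y_train)

# AUC is a better yardstick than accuracy when classes are imbalanced
auc = roc_auc_score(y_test, grid.predict_proba(X_test)[:, 1])
```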
- Location: `Random_forest_classification/`
- Technique: Ensemble learning with multiple decision trees
- Features:
  - Comprehensive data cleaning and preprocessing
  - Feature engineering
  - Multiple model comparison (Logistic Regression, Decision Tree, Random Forest)
  - Hyperparameter tuning with RandomizedSearchCV
- Accuracy: 90.90%
- Best Model: Random Forest with optimized hyperparameters
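The tuning step can be sketched as below; the parameter distributions and synthetic data are assumptions standing in for the notebook's travel dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Hypothetical dataset standing in for the travel data
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# RandomizedSearchCV samples n_iter combinations instead of trying them all
param_dist = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=5, cv=3, random_state=0,
).fit(X_train, y_train)

test_acc = search.score(X_test, y_test)  # best estimator, refit on full training set
```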
- Location: `support_vector_classification/`
- Technique: Support Vector Machine for classification
- Features:
  - Multiple kernel functions (linear, RBF, polynomial, sigmoid)
  - Hyperparameter tuning with GridSearchCV
- Best kernel: RBF
- Accuracy: 90%
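A sketch of the kernel search, with a hypothetical dataset and an assumed `C` grid; a scaling step is included in the pipeline because SVMs are sensitive to feature scale:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Pipeline so the scaler is refit inside each CV fold (no leakage)
pipe = make_pipeline(StandardScaler(), SVC())
grid = GridSearchCV(
    pipe,
    {"svc__kernel": ["linear", "rbf", "poly", "sigmoid"], "svc__C": [0.1, 1, 10]},
    cv=3,
).fit(X_train, y_train)

best_kernel = grid.best_params_["svc__kernel"]
acc = grid.score(X_test, y_test)
```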
- Location: `support_vecotr_regression/`
- Technique: Support Vector Machine for regression
- Features:
  - Categorical feature encoding (Label Encoding, One-Hot Encoding)
  - Multiple kernel support
  - Hyperparameter tuning
- Evaluation: R² score
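A sketch of the encode-then-regress pipeline; the tiny tips-style DataFrame and the SVR parameters are assumptions, not the notebook's actual data or settings:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR

# Hypothetical tips-style data (values repeated to have enough rows)
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88, 15.04, 14.78] * 5,
    "sex": ["Female", "Male"] * 25,
    "smoker": ["No", "Yes"] * 25,
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12, 1.96, 3.23] * 5,
})
X, y = df.drop(columns="tip"), df["tip"]

# Scale the numeric column, one-hot encode the categoricals
pre = ColumnTransformer([
    ("num", StandardScaler(), ["total_bill"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex", "smoker"]),
])
model = Pipeline([("pre", pre), ("svr", SVR(kernel="rbf", C=100))])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model.fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
```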
- Location: `Linear_regression/`
- Technique: Simple linear regression
- Dataset: Height-weight dataset
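In sketch form, with synthetic height-weight data standing in for the notebook's dataset (the slope, intercept, and noise level are invented):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical height (cm) -> weight (kg) data with a known linear trend
rng = np.random.default_rng(0)
height = rng.uniform(150, 190, size=100).reshape(-1, 1)
weight = 0.9 * height.ravel() - 90 + rng.normal(0, 3, size=100)

model = LinearRegression().fit(height, weight)
r2 = model.score(height, weight)  # R² on the training data
```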
- Location: `multiple_linear_regression/`
- Technique: Multiple linear regression
- Dataset: Economic index dataset
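The same idea with several predictors; the column names and coefficients below are hypothetical stand-ins for the economic index data:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical economic-index-style data with a known linear relationship
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "interest_rate": rng.uniform(1, 3, 200),
    "unemployment_rate": rng.uniform(4, 8, 200),
})
df["index_price"] = (1500 - 200 * df["interest_rate"]
                     - 50 * df["unemployment_rate"] + rng.normal(0, 20, 200))

X_train, X_test, y_train, y_test = train_test_split(
    df[["interest_rate", "unemployment_rate"]], df["index_price"],
    test_size=0.2, random_state=1)

model = LinearRegression().fit(X_train, y_train)  # one coefficient per feature
r2 = model.score(X_test, y_test)
```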
- Location: `PCA/`
- Technique: Linear dimensionality reduction
- Features:
  - Reduces 30 features to 2 principal components
  - Variance preservation
  - 2D visualization of high-dimensional data
- Dataset: Breast Cancer Wisconsin dataset (569 samples, 30 features)
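The reduction itself is compact; standardizing first is the usual practice since PCA is variance-driven (whether the notebook scales first is an assumption):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)        # 569 samples, 30 features
X_scaled = StandardScaler().fit_transform(X)      # PCA is sensitive to feature scale

pca = PCA(n_components=2)
X2 = pca.fit_transform(X_scaled)                  # shape (569, 2), ready for a 2D scatter plot
explained = pca.explained_variance_ratio_.sum()   # fraction of variance the 2 PCs preserve
```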
- Location: `Polynomial_regression/`
- Technique: Polynomial regression for non-linear relationships
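A minimal sketch: expand the features to polynomial terms, then fit an ordinary linear model on the expanded features. The quadratic data and degree below are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical non-linear (quadratic) data with mild noise
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 100)).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 - X.ravel() + 2 + rng.normal(0, 0.3, 100)

# degree=2 adds x^2 as a feature; the regression itself stays linear in the coefficients
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
r2 = model.score(X, y)
```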
- Data Preprocessing:
  - Train-test splitting (typically an 80-20 split)
  - Handling missing values (median/mode imputation)
  - Feature encoding (Label Encoding, One-Hot Encoding)
  - Feature scaling (StandardScaler)
  - Feature engineering
- Classification Metrics:
  - Accuracy score
  - Precision, Recall, F1-score
  - Confusion matrix
  - Classification report
  - ROC curve and AUC score
- Regression Metrics:
  - R² score (coefficient of determination)
  - Mean squared error (where applicable)
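A tiny hand-made example showing how these metrics relate (the labels and predictions are invented so the arithmetic is easy to verify):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             mean_squared_error, precision_score, r2_score,
                             recall_score)

# Classification: 8 labels, 6 predicted correctly, 1 false positive, 1 false negative
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

acc = accuracy_score(y_true, y_pred)    # 6/8 = 0.75
prec = precision_score(y_true, y_pred)  # TP=4, FP=1 -> 4/5 = 0.8
rec = recall_score(y_true, y_pred)      # TP=4, FN=1 -> 4/5 = 0.8
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall
cm = confusion_matrix(y_true, y_pred)   # [[TN, FP], [FN, TP]]

# Regression counterparts
y_rtrue = [3.0, 2.5, 4.0]
y_rpred = [2.8, 2.7, 3.9]
mse = mean_squared_error(y_rtrue, y_rpred)  # (0.2² + 0.2² + 0.1²) / 3 = 0.03
r2 = r2_score(y_rtrue, y_rpred)             # 1 - SSE/SST
```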
- Hyperparameter Tuning:
  - GridSearchCV: exhaustive search over a parameter grid
  - RandomizedSearchCV: random sampling from parameter distributions
  - Cross-validation (typically 3-5 folds)
- Model Comparison:
  - Multiple algorithm comparison
  - Performance metrics comparison
  - Best model selection based on evaluation metrics
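A sketch of the comparison pattern: score each candidate on the same cross-validation splits and keep the best. The Iris dataset here is just a convenient stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate models, all evaluated with 5-fold cross-validation
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
best_model = max(scores, key=scores.get)
```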
- Python: Primary programming language
- scikit-learn: Machine learning library
- pandas: Data manipulation and analysis
- numpy: Numerical computing
- matplotlib: Data visualization
- seaborn: Statistical data visualization
- Jupyter Notebooks: Interactive development environment
Install the dependencies:

```
pip install pandas numpy scikit-learn matplotlib seaborn jupyter
```

Then:

- Navigate to the desired algorithm folder
- Open the Jupyter notebook file (`.ipynb`)
- Run all cells to execute the complete workflow

For example:

```
cd KNN
jupyter notebook KNN_algorithm.ipynb
```

- Comprehensive Coverage: Both classification and regression algorithms
- Real-world Applications: Practical datasets and use cases
- Best Practices: Proper data preprocessing, evaluation, and hyperparameter tuning
- Documentation: Each folder contains detailed README explaining the implementation
- Code Quality: Clean, well-structured code with comments
| Algorithm | Task Type | Best Accuracy/R² | Dataset |
|---|---|---|---|
| K-Means | Clustering | k = 3 (elbow method) | Synthetic Blobs |
| Hierarchical | Clustering | 2 clusters | Iris |
| DBSCAN | Clustering | Auto-detected | Synthetic Moons |
| KNN | Classification | Evaluated | Synthetic |
| Naive Bayes | Classification | 96.67% | Iris |
| Logistic Regression | Classification | 91% (binary) | Synthetic |
| Random Forest | Classification | 90.90% | Travel |
| SVC | Classification | 90% | Synthetic |
| SVR | Regression | Evaluated | Tips |
| PCA | Dimensionality Reduction | 2 components | Breast Cancer |
- All implementations use scikit-learn library for consistency
- Random states are set for reproducibility
- Evaluation metrics are comprehensive and appropriate for each task type
- Hyperparameter tuning is performed where applicable to optimize performance
Feel free to explore each algorithm folder for detailed implementation and documentation. Each folder contains:
- A README.md file explaining the technique and implementation
- Jupyter notebook(s) with complete code
- Dataset files (where applicable)
This repository is for educational purposes, demonstrating various machine learning algorithms and their implementations.