From fa255418536bc19a92819c0a66c8c2e1347c55c5 Mon Sep 17 00:00:00 2001 From: shsarv4 <166940544+shsarv4@users.noreply.github.com> Date: Wed, 18 Mar 2026 23:31:31 +0530 Subject: [PATCH 1/8] Update README.md --- Colorize Black & white images [OPEN CV]/models/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Colorize Black & white images [OPEN CV]/models/README.md b/Colorize Black & white images [OPEN CV]/models/README.md index c7c3975..75d2fa9 100644 --- a/Colorize Black & white images [OPEN CV]/models/README.md +++ b/Colorize Black & white images [OPEN CV]/models/README.md @@ -1 +1 @@ -# download the model file from [here](https://drive.google.com/file/d/14YmdCfcMOgfJEBNJEl6Xj1SB-RccgJBO/view?usp=sharing) and add them to this folder in order to run this project. +# download the model file from [https://huggingface.co/spaces/BilalSardar/Black-N-White-To-Color/blob/main/colorization_release_v2.caffemodel](https://huggingface.co/spaces/BilalSardar/Black-N-White-To-Color/blob/main/colorization_release_v2.caffemodel) and add them to this folder in order to run this project. From 4ecc8e3e26af902cedfca46bd7b8b3acaf98f019 Mon Sep 17 00:00:00 2001 From: shsarv4 <166940544+shsarv4@users.noreply.github.com> Date: Wed, 18 Mar 2026 23:34:26 +0530 Subject: [PATCH 2/8] Update README.md --- .../README.md | 508 +++++++----------- 1 file changed, 202 insertions(+), 306 deletions(-) diff --git a/Colorize Black & white images [OPEN CV]/README.md b/Colorize Black & white images [OPEN CV]/README.md index 986ad44..e8f7dac 100644 --- a/Colorize Black & white images [OPEN CV]/README.md +++ b/Colorize Black & white images [OPEN CV]/README.md @@ -1,386 +1,282 @@ -
+
-# Colorize Black white Image +# ๐ŸŽจ Colorize Black & White Images โ€” OpenCV Deep Learning -This Deep Learning Project aims to provide colorizing black & white -images with Python. +[![Python](https://img.shields.io/badge/Python-3.7+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/) +[![OpenCV](https://img.shields.io/badge/OpenCV-DNN-5C3EE8?style=for-the-badge&logo=opencv&logoColor=white)](https://opencv.org/) +[![Caffe](https://img.shields.io/badge/Caffe-Pre--trained%20Model-red?style=for-the-badge)](http://caffe.berkeleyvision.org/) +[![Tkinter](https://img.shields.io/badge/Tkinter-GUI%20App-blue?style=for-the-badge)](https://docs.python.org/3/library/tkinter.html) +[![License](https://img.shields.io/badge/License-MIT-1abc9c?style=for-the-badge)](../LICENSE.md) -In image colorization, we take a black and white image as input and -produce a colored image. We will solve this project with OpenCV deep -neural network. +> Automatically colorizes **black & white images** using a pre-trained deep learning model loaded via **OpenCV DNN** โ€” wrapped in a clean **Tkinter desktop GUI** where you upload an image and get a colorized result instantly. -
- -
- - - -
- -
- -### Lab Color Space: - -Like RGB, Lab is another color space. It is also three channel color -space like RGB where the channels are: - - L channel: This channel represents the Lightness - a channel: This channel represents green-red - b channel: This channel represents blue-yellow - -In this color space, the grayscale part of the image is only encoded in -L channel. Therefore Lab color space is more favorable for our project. +[๐Ÿ”™ Back to Main Repository](https://github.com/shsarv/Machine-Learning-Projects)
-
- -### Problem Statement: +--- -deep learning project colorize black white images with python +## ๐Ÿ“Œ Table of Contents -We can formulate our problem statement as to predict a and b channels, -given an input grayscale image. +- [About the Project](#-about-the-project) +- [How It Works](#-how-it-works) +- [The Science โ€” Lab Color Space](#-the-science--lab-color-space) +- [The Model โ€” Zhang et al. 2016](#-the-model--zhang-et-al-2016) +- [Model Files](#-model-files) +- [Project Structure](#-project-structure) +- [Getting Started](#-getting-started) +- [App Preview](#-app-preview) +- [Tech Stack](#-tech-stack) +- [References & Citation](#-references--citation) -In this deep learning project, we will use OpenCV DNN architecture which -is trained on ImageNet dataset. The neural net is trained with the L -channel of images as input data and a,b channels as target data. +--- -
+## ๐Ÿ”ฌ About the Project -
+Manually colorizing historical black & white photographs is an extremely time-consuming artistic process. This project automates it entirely using a **Convolutional Neural Network** trained to "hallucinate" plausible colors for any grayscale input โ€” from old family photos to historical images. -#### Steps to implement Image Colorization Project: +Rather than training a model from scratch, the project loads **Richard Zhang et al.'s 2016 pre-trained Caffe model** directly through **OpenCV's DNN module**, making inference fast and dependency-light. The entire experience is wrapped in a **Tkinter GUI** where users upload a grayscale image and receive a colorized version with a single click. -For colorizing black and white images we will be using a pre-trained -caffe model, a prototxt file, and a NumPy file. +**What this project covers:** +- Understanding Lab color space and why it is ideal for colorization +- Loading and running a pre-trained Caffe model via OpenCV DNN +- Image preprocessing: RGB โ†’ Lab, extracting the L channel as input +- Post-processing: merging predicted `ab` channels back with `L`, converting Lab โ†’ BGR +- Building a desktop GUI with Tkinter for real-time image upload and display -The prototxt file defines the network and the numpy file stores the -cluster center points in numpy format. +--- -1. Make a directory with name models. - -
+## โš™๏ธ How It Works -
- -``` python -!mkdir models ``` - -
- -
- -download the caffemodel, prototxt file and the NumPy file. - -
- -
- -``` python -!wget https://github.com/richzhang/colorization/blob/caffe/colorization/resources/pts_in_hull.npy?raw=true -O ./pts_in_hull.npy +Input: Grayscale / B&W Image + โ”‚ + โ–ผ + Convert: BGR โ†’ RGB โ†’ Lab + โ”‚ + โ–ผ + Extract L channel (lightness only) + Resize to 224 ร— 224 + โ”‚ + โ–ผ + OpenCV DNN Forward Pass + (Zhang et al. Caffe model) + โ”‚ + โ–ผ + Predict ab channels + (313 quantized color bins โ†’ soft-decoded to ab) + โ”‚ + โ–ผ + Resize predicted ab โ†’ original image size + โ”‚ + โ–ผ + Concatenate: L (original) + ab (predicted) + โ”‚ + โ–ผ + Convert: Lab โ†’ BGR + Clip values to [0, 1], scale to [0, 255] + โ”‚ + โ–ผ + Output: Colorized Image โ†’ Display in GUI / Save ``` -
- -
+--- -``` python -!wget https://raw.githubusercontent.com/richzhang/colorization/caffe/colorization/models/colorization_deploy_v2.prototxt -O ./models/colorization_deploy_v2.prototxt -``` +## ๐ŸŽจ The Science โ€” Lab Color Space +This project uses the **Lab color space** rather than the familiar RGB. Here's why it matters: +| Channel | Represents | Role in This Project | +|---------|-----------|---------------------| +| **L** | Lightness (0 = black, 100 = white) | Input to the model โ€” this IS the grayscale image | +| **a** | Green โ†” Red axis | Predicted by the neural network | +| **b** | Blue โ†” Yellow axis | Predicted by the neural network | -
+**The key insight:** In Lab, grayscale information is *entirely* encoded in the `L` channel. Color information lives only in `a` and `b`. This means the model only needs to learn to predict two channels from one โ€” a much cleaner problem than mapping RGB to RGB. -
- -``` python -!wget http://eecs.berkeley.edu/~rich.zhang/projects/2016_colorization/files/demo_v2/colorization_release_v2.caffemodel -O ./models/colorization_release_v2.caffemodel ``` - -
- -
- -### Import Essential Library - -
- -
- -``` python -import numpy as np -import cv2 as cv -from matplotlib import pyplot as plt -import os.path +Grayscale Image = L channel + โ”‚ + โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” + โ–ผ โ–ผ + Neural Network (kept as-is) + predicts: a, b L + โ”‚ โ”‚ + โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ + โ–ผ + Lab image โ†’ BGR + = Colorized Output ``` -
+--- -
+## ๐Ÿง  The Model โ€” Zhang et al. 2016 -### Read B\&W image and load the caffemodel: +The colorization model is from the landmark 2016 ECCV paper **"Colorful Image Colorization"** by Richard Zhang, Phillip Isola, and Alexei A. Efros (UC Berkeley). -
+**Key design choices in the paper:** -
+| Aspect | Detail | +|--------|--------| +| **Training data** | 1.3M images from ImageNet (Lab converted) | +| **Input** | L channel (grayscale), resized to 224ร—224 | +| **Output** | Predicted `ab` channels over 313 quantized color bins | +| **Loss function** | Multinomial cross-entropy with rebalanced class weights (to prevent desaturated outputs) | +| **Architecture** | Deep CNN with 8 conv blocks, no pooling โ€” uses dilated convolutions to preserve spatial resolution | +| **Color decoding** | Annealed-mean of the 313 bin distribution (avoids washed-out grays from using the mean) | -``` python -frame = cv.imread("new.jpg") +> **Why 313 bins?** The `ab` color space is quantized into 313 bins with a grid size of 10. The model predicts a probability distribution over all 313 possible colors for each pixel, then decodes to a single `ab` value. -numpy_file = np.load('./pts_in_hull.npy') -Caffe_net = cv.dnn.readNetFromCaffe("./models/colorization_deploy_v2.prototxt", "./models/colorization_release_v2.caffemodel") +--- +## ๐Ÿ“ฆ Model Files +Three files are required to run inference. They are **not included** in the repository due to size and must be downloaded separately: -rgb_img = cv.cvtColor(frame, cv.COLOR_BGR2RGB) # this converts it into RGB -plt.imshow(rgb_img) -plt.show() -``` +| File | Description | Download | +|------|-------------|---------| +| `colorization_release_v2.caffemodel` | Pre-trained model weights (~125 MB) | [Berkeley EECS](http://eecs.berkeley.edu/~rich.zhang/projects/2016_colorization/files/demo_v2/colorization_release_v2.caffemodel) | +| `colorization_deploy_v2.prototxt` | Network architecture definition | [richzhang/colorization](https://raw.githubusercontent.com/richzhang/colorization/master/colorization/models/colorization_deploy_v2.prototxt) | +| `pts_in_hull.npy` | 313 cluster center points in ab space | [richzhang/colorization](https://github.com/richzhang/colorization/blob/caffe/colorization/resources/pts_in_hull.npy?raw=true) | -
+Download all three with: -![](input.png) +```bash +mkdir -p models -
+# Caffe model weights (~125 MB) +wget http://eecs.berkeley.edu/~rich.zhang/projects/2016_colorization/files/demo_v2/colorization_release_v2.caffemodel \ + -O ./models/colorization_release_v2.caffemodel -
- -
+# Network prototxt definition +wget https://raw.githubusercontent.com/richzhang/colorization/master/colorization/models/colorization_deploy_v2.prototxt \ + -O ./models/colorization_deploy_v2.prototxt -### Add layers to the caffe model: - -
- -
- -``` python -numpy_file = numpy_file.transpose().reshape(2, 313, 1, 1) -Caffe_net.getLayer(Caffe_net.getLayerId('class8_ab')).blobs = [numpy_file.astype(np.float32)] -Caffe_net.getLayer(Caffe_net.getLayerId('conv8_313_rh')).blobs = [np.full([1, 313], 2.606, np.float32)] +# Cluster centers (ab quantization bins) +wget https://github.com/richzhang/colorization/blob/caffe/colorization/resources/pts_in_hull.npy?raw=true \ + -O ./pts_in_hull.npy ``` -
- -
+--- -### Extract L channel and resize it: +## ๐Ÿ“ Project Structure -
+``` +Colorize Black & white images [OPEN CV]/ +โ”‚ +โ”œโ”€โ”€ ๐Ÿ“‚ models/ +โ”‚ โ”œโ”€โ”€ colorization_release_v2.caffemodel # Pre-trained weights (download separately) +โ”‚ โ””โ”€โ”€ colorization_deploy_v2.prototxt # Network architecture +โ”‚ +โ”œโ”€โ”€ pts_in_hull.npy # 313 ab color bin cluster centers +โ”œโ”€โ”€ colorize.py # Core colorization logic (OpenCV DNN pipeline) +โ”œโ”€โ”€ gui.py # Tkinter GUI application +โ”œโ”€โ”€ new.jpg # Sample test image +โ”œโ”€โ”€ result.png # Sample colorized output +โ”œโ”€โ”€ requirements.txt # Python dependencies +โ””โ”€โ”€ README.md # You are here +``` -
+--- -``` python -input_width = 224 -input_height = 224 +## ๐Ÿš€ Getting Started -rgb_img = (frame[:,:,[2, 1, 0]] * 1.0 / 255).astype(np.float32) -lab_img = cv.cvtColor(rgb_img, cv.COLOR_RGB2Lab) -l_channel = lab_img[:,:,0] +### 1. Clone the repository -l_channel_resize = cv.resize(l_channel, (input_width, input_height)) -l_channel_resize -= 50 +```bash +git clone https://github.com/shsarv/Machine-Learning-Projects.git +cd "Machine-Learning-Projects/Colorize Black & white images [OPEN CV]" ``` -
- -
+### 2. Set up environment -### Predict the ab channel and save the result: +```bash +python -m venv venv +source venv/bin/activate # Linux / macOS +venv\Scripts\activate # Windows -
+pip install -r requirements.txt +``` -
+### 3. Download model files -``` python -Caffe_net.setInput(cv.dnn.blobFromImage(l_channel_resize)) -ab_channel = Caffe_net.forward()[0,:,:,:].transpose((1,2,0)) +Run the wget commands from the [Model Files](#-model-files) section above, or download manually and place them in `./models/`. -(original_height,original_width) = rgb_img.shape[:2] -ab_channel_us = cv.resize(ab_channel, (original_width, original_height)) -lab_output = np.concatenate((l_channel[:,:,np.newaxis],ab_channel_us),axis=2) -bgr_output = np.clip(cv.cvtColor(lab_output, cv.COLOR_Lab2BGR), 0, 1) +### 4. Run the GUI app -cv.imwrite("./result.png", (bgr_output*255).astype(np.uint8)) +```bash +python gui.py ``` -
+This opens the Tkinter desktop window: +- **File โ†’ Upload Image** โ€” select any grayscale or black & white `.jpg` / `.png` +- **File โ†’ Color Image** โ€” run the colorization model and display the result - True - -
+### 5. Run colorization directly (no GUI) -
- -
+```bash +python colorize.py --image new.jpg +# Outputs: result.png in the current directory +``` -### Output +--- -
+## ๐Ÿ–ฅ๏ธ App Preview -
- -``` python -frame1 = cv.imread("result.png") -rgb_img = cv.cvtColor(frame1, cv.COLOR_BGR2RGB) # this converts it into RGB -plt.imshow(rgb_img) -plt.show() +``` +โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” +โ”‚ B&W Image Colorization โ”‚ +โ”‚ File โ–พ โ”‚ +โ”‚ โ”œโ”€โ”€ Upload Image โ”‚ +โ”‚ โ””โ”€โ”€ Color Image โ”‚ +โ”‚ โ”‚ +โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ +โ”‚ โ”‚ โ”‚ โ”‚ โ”‚ โ”‚ +โ”‚ โ”‚ [B&W Input] โ”‚ โ”‚ [Colorized Out] โ”‚ โ”‚ +โ”‚ โ”‚ โ”‚ โ”‚ โ”‚ โ”‚ +โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ +โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ ``` -
+--- -![](output.png) +## ๐Ÿ› ๏ธ Tech Stack -
+| Layer | Technology | +|-------|-----------| +| Language | Python 3.7+ | +| Computer Vision | OpenCV (`cv2.dnn`) | +| Pre-trained Model | Caffe (Zhang et al. 2016) | +| GUI Framework | Tkinter | +| Numerical Computing | NumPy | +| Visualization | Matplotlib | -
+--- -
+## ๐Ÿ“š References & Citation -## Code for GUI: +**Paper behind the model:** -
+```bibtex +@inproceedings{zhang2016colorful, + title = {Colorful Image Colorization}, + author = {Zhang, Richard and Isola, Phillip and Efros, Alexei A}, + booktitle = {ECCV}, + year = {2016} +} +``` -
+- [Colorful Image Colorization โ€” Zhang et al. (2016)](https://arxiv.org/abs/1603.08511) +- [Official Demo & Model โ€” richzhang/colorization](https://github.com/richzhang/colorization) +- [OpenCV DNN colorization sample](https://github.com/opencv/opencv/blob/master/samples/dnn/colorization.py) +- [PyImageSearch Tutorial โ€” Adrian Rosebrock](https://pyimagesearch.com/2019/02/25/black-and-white-image-colorization-with-opencv-and-deep-learning/) -``` python -%%writefile gui.py +--- -import tkinter as tk -from tkinter import * -from tkinter import filedialog -from PIL import Image, ImageTk -import os -import numpy as np -import cv2 as cv -import os.path -import matplotlib -matplotlib.use('Agg') - -import sys -import os - -if os.environ.get('DISPLAY','') == '': - print('no display found. Using :0.0') - os.environ.__setitem__('DISPLAY', ':0.0') - -numpy_file = np.load('./pts_in_hull.npy') -Caffe_net = cv.dnn.readNetFromCaffe("./models/colorization_deploy_v2.prototxt", "./models/colorization_release_v2.caffemodel") -numpy_file = numpy_file.transpose().reshape(2, 313, 1, 1) - -class Window(Frame): - def __init__(self, master=None): - Frame.__init__(self, master) - - self.master = master - self.pos = [] - self.master.title("B&W Image Colorization") - self.pack(fill=BOTH, expand=1) - - menu = Menu(self.master) - self.master.config(menu=menu) - - file = Menu(menu) - file.add_command(label="Upload Image", command=self.uploadImage) - file.add_command(label="Color Image", command=self.color) - menu.add_cascade(label="File", menu=file) - - self.canvas = tk.Canvas(self) - self.canvas.pack(fill=tk.BOTH, expand=True) - self.image = None - self.image2 = None - - label1=Label(self,image=img) - label1.image=img - label1.place(x=400,y=370) - - - - - def uploadImage(self): - filename = filedialog.askopenfilename(initialdir=os.getcwd()) - if not filename: - return - load = Image.open(filename) - - load = load.resize((480, 360), Image.ANTIALIAS) - - if self.image is None: - w, h = load.size - width, height = root.winfo_width(), root.winfo_height() - self.render = ImageTk.PhotoImage(load) - self.image = self.canvas.create_image((w / 2, h / 2), image=self.render) - - else: - self.canvas.delete(self.image3) - w, h = load.size - width, height = root.winfo_screenmmwidth(), root.winfo_screenheight() - - self.render2 = ImageTk.PhotoImage(load) - self.image2 = self.canvas.create_image((w / 2, h / 2), image=self.render2) - - - frame = cv.imread(filename) - - Caffe_net.getLayer(Caffe_net.getLayerId('class8_ab')).blobs = [numpy_file.astype(np.float32)] - Caffe_net.getLayer(Caffe_net.getLayerId('conv8_313_rh')).blobs = [np.full([1, 313], 2.606, np.float32)] - - input_width = 224 - input_height = 224 - - rgb_img = (frame[:,:,[2, 1, 0]] * 1.0 / 255).astype(np.float32) - lab_img = cv.cvtColor(rgb_img, cv.COLOR_RGB2Lab) - l_channel = lab_img[:,:,0] - - l_channel_resize = cv.resize(l_channel, (input_width, input_height)) - l_channel_resize -= 50 - - Caffe_net.setInput(cv.dnn.blobFromImage(l_channel_resize)) - ab_channel = Caffe_net.forward()[0,:,:,:].transpose((1,2,0)) - - (original_height,original_width) = rgb_img.shape[:2] - ab_channel_us = cv.resize(ab_channel, (original_width, original_height)) - lab_output = np.concatenate((l_channel[:,:,np.newaxis],ab_channel_us),axis=2) - bgr_output = np.clip(cv.cvtColor(lab_output, cv.COLOR_Lab2BGR), 0, 1) - - - cv.imwrite("./result.png", (bgr_output*255).astype(np.uint8)) - - def color(self): - - load = Image.open("./result.png") - load = load.resize((480, 360), Image.ANTIALIAS) - - if self.image is None: - w, h = load.size - self.render = ImageTk.PhotoImage(load) - self.image = self.canvas.create_image((w / 2, h/2), image=self.render) - root.geometry("%dx%d" % (w, h)) - else: - w, h = load.size - width, height = root.winfo_screenmmwidth(), root.winfo_screenheight() - - self.render3 = ImageTk.PhotoImage(load) - self.image3 = self.canvas.create_image((w / 2, h / 2), image=self.render3) - self.canvas.move(self.image3, 500, 0) - - -root = tk.Tk() -root.geometry("%dx%d" % (980, 600)) -root.title("B&W Image Colorization GUI") -img = ImageTk.PhotoImage(Image.open("logo2.png")) - -app = Window(root) -app.pack(fill=tk.BOTH, expand=1) -root.mainloop() -``` +
-
+Part of the [Machine Learning Projects](https://github.com/shsarv/Machine-Learning-Projects) collection by [Sarvesh Kumar Sharma](https://github.com/shsarv) - Overwriting gui.py +โญ Star the main repo if this helped you!
- -
\ No newline at end of file From e110c0b37b91fc14be7fcbf07ba842b8a5a07392 Mon Sep 17 00:00:00 2001 From: shsarv4 <166940544+shsarv4@users.noreply.github.com> Date: Wed, 18 Mar 2026 23:40:50 +0530 Subject: [PATCH 3/8] Update README.md --- Distracted Driver Detection/README.md | 1390 ++++--------------------- 1 file changed, 206 insertions(+), 1184 deletions(-) diff --git a/Distracted Driver Detection/README.md b/Distracted Driver Detection/README.md index c60775d..06892fa 100644 --- a/Distracted Driver Detection/README.md +++ b/Distracted Driver Detection/README.md @@ -1,1273 +1,295 @@ -
+
-## Distracted-Driver-Detection +# ๐Ÿš— Distracted Driver Detection โ€” ResNet50 from Scratch - +[![Python](https://img.shields.io/badge/Python-3.7+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/) +[![Keras](https://img.shields.io/badge/Keras-D00000?style=for-the-badge&logo=keras&logoColor=white)](https://keras.io/) +[![TensorFlow](https://img.shields.io/badge/TensorFlow-FF6F00?style=for-the-badge&logo=tensorflow&logoColor=white)](https://www.tensorflow.org/) +[![Dataset](https://img.shields.io/badge/Dataset-State%20Farm%20%7C%20Kaggle-20BEFF?style=for-the-badge&logo=kaggle&logoColor=white)](https://www.kaggle.com/c/state-farm-distracted-driver-detection) +[![Classes](https://img.shields.io/badge/10%20Behavior%20Classes-orange?style=for-the-badge)]() +[![License](https://img.shields.io/badge/License-MIT-1abc9c?style=for-the-badge)](../LICENSE.md) -
- -
- -### Problem Description - -
- -
- -In this competition you are given driver images, each taken in a car -with a driver doing something in the car (texting, eating, talking on -the phone, makeup, reaching behind, etc). Your goal is to predict the -likelihood of what the driver is doing in each picture. - -The 10 classes to predict are as follows,

- -
-
  • c0: safe driving

  • c1: texting - right

  • c2: -talking on the phone - right

  • c3: texting - left

  • -
  • c4: talking on the phone - left

  • c5: operating the -radio

  • c6: drinking

  • c7: reaching behind
  • -
  • c8: hair and makeup

  • c9: talking to passenger
  • -
    -
    - -
    - -
    - -### Summary of Results - -
    - -
    - -Using a 50-layer Residual Network (with the following parameters) the -following scores (losses) were obtained.
  • 10 Epochs
  • -
  • 32 Batch Size
  • Adam Optimizer
  • Glorot Uniform -Initializer
  • -
    **Training Loss** 0.93
    **Validation Loss** 3.79
    -**Holdout Loss** 2.64
    - -**Why the high losses? Simply put - we don't have enough resources to -quickly iterate / hyper-parameter tune the model\!** If more resources -were available (RAM, CPU speed), we could hyper-parameter tune over grid -searches and combat high bias / high variance, which this model -currently suffers. [This is how you'd fix high bias/variance.](#improve) - -
    - -
    - -### Import Dependencies and Define Functions - -
    - -
    - -Let's begin by importing some useful dependencies and defining some key -functions that we'll use throughout the notebook. - -
    - -
    - -``` python -import numpy as np -import pandas as pd -import tensorflow as tf -import matplotlib.pyplot as plt - -from keras import layers -from keras.layers import (Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization, - Flatten, Conv2D, AveragePooling2D, MaxPooling2D, GlobalMaxPooling2D) -from keras.wrappers.scikit_learn import KerasClassifier -from keras.models import Model, load_model, save_model -from keras.preprocessing import image -from keras.utils import layer_utils -from keras.utils.data_utils import get_file -from keras.applications.imagenet_utils import preprocess_input -import pydot -from IPython.display import SVG -from keras.utils.vis_utils import model_to_dot -from keras.utils import plot_model -from resnets_utils import * -from keras.initializers import glorot_uniform -import scipy.misc -from matplotlib.pyplot import imshow - -%matplotlib inline - -import keras.backend as K -K.set_image_data_format('channels_last') -K.set_learning_phase(1) - -from sklearn.model_selection import StratifiedKFold, cross_validate, LeaveOneGroupOut - -from PIL import Image -``` - -
    - -
    - -``` python -def PlotClassFrequency(class_counts): - plt.figure(figsize=(15,4)) - plt.bar(class_counts.index,class_counts) - plt.xlabel('class') - plt.xticks(np.arange(0, 10, 1.0)) - plt.ylabel('count') - plt.title('Number of Images per Class') - plt.show() - -def DescribeImageData(data): - print('Average number of images: ' + str(np.mean(data))) - print("Lowest image count: {}. At: {}".format(data.min(), data.idxmin())) - print("Highest image count: {}. At: {}".format(data.max(), data.idxmax())) - print(data.describe()) - -def CreateImgArray(height, width, channel, data, folder, save_labels = True): - """ - Writes image files found in 'imgs/train' to array of shape - [examples, height, width, channel] - - Arguments: - height -- integer, height in pixels - width -- integer, width in pixels - channel -- integer, number of channels (or dimensions) for image (3 for RGB) - data -- dataframe, containing associated image properties, such as: - subject -> string, alpha-numeric code of participant in image - classname -> string, the class name i.e. 'c0', 'c1', etc. - img -> string, image name - folder -- string, either 'test' or 'train' folder containing the images - save_labels -- bool, True if labels should be saved, or False (just save 'X' images array). - Note: only applies if using train folder - - Returns: - .npy file -- file, contains the associated conversion of images to numerical values for processing - """ - - num_examples = len(data) - X = np.zeros((num_examples,height,width,channel)) - if (folder == 'train') & (save_labels == True): - Y = np.zeros(num_examples) - - for m in range(num_examples): - current_img = data.img[m] - img_path = 'imgs/' + folder + '/' + current_img - img = image.load_img(img_path, target_size=(height, width)) - x = image.img_to_array(img) - x = preprocess_input(x) - X[m] = x - if (folder == 'train') & (save_labels == True): - Y[m] = data.loc[data['img'] == current_img, 'classname'].iloc[0] - - np.save('X_'+ folder + '_' + str(height) + '_' + str(width), X) - if (folder == 'train') & (save_labels == True): - np.save('Y_'+ folder + '_' + str(height) + '_' + str(width), Y) - -def Rescale(X): - return (1/(2*np.max(X))) * X + 0.5 - -def PrintImage(X_scaled, index, Y = None): - plt.imshow(X_scaled[index]) - if Y is not None: - if Y.shape[1] == 1: - print ("y = " + str(np.squeeze(Y[index]))) - else: - print("y = " + str(np.argmax(Y[index]))) - -def LOGO(X, Y, group, model_name, input_shape, classes, init, optimizer, metrics, epochs, batch_size): - logo = LeaveOneGroupOut() - logo.get_n_splits(X, Y, group); - cvscores = np.zeros((26,4)) - subject_id = [] - i = 0 - for train, test in logo.split(X, Y, group): - # Create model - model = model_name(input_shape = input_shape, classes = classes, init = init) - # Compile the model - model.compile(optimizer = optimizer, loss='sparse_categorical_crossentropy', metrics=[metrics]) - # Fit the model - model.fit(X[train], Y[train], epochs = epochs, batch_size = batch_size, verbose = 0) - # Evaluate the model - scores_train = model.evaluate(X[train], Y[train], verbose = 0) - scores_test = model.evaluate(X[test], Y[test], verbose = 0) - # Save to cvscores - cvscores[i] = [scores_train[0], scores_train[1] * 100, scores_test[0], scores_test[1] * 100] - subject_id.append(group.iloc[test[0]]) - # Clear session - K.clear_session() - # Update counter - i += 1 - - return pd.DataFrame(cvscores, index = subject_id, columns=['Train_loss', 'Train_acc','Test_loss', 'Test_acc']) -``` - -
    - -
    - -### Quick EDA - -
    - -
    - -Let's begin by loading the provided dataset 'driver\_imgs\_list' doing a -quick analysis. - -
    - -
    - -``` python -driver_imgs_df = pd.read_csv('driver_imgs_list/driver_imgs_list.csv') -driver_imgs_df.head() -``` - -
    - -``` - subject classname img -0 p002 c0 img_44733.jpg -1 p002 c0 img_72999.jpg -2 p002 c0 img_25094.jpg -3 p002 c0 img_69092.jpg -4 p002 c0 img_92629.jpg -``` - -
    - -
    - -
    - -We can note the number of examples by printing the shape of the -dataframe. Looks like the training set has 22,424 images. - -
    - -
    - -``` python -driver_imgs_df.shape -``` - -
    - - (22424, 3) - -
    - -
    - -
    - -We can plot the number of images per class to see if any classes have a -low number of images. - -
    - -
    - -``` python -class_counts = (driver_imgs_df.classname).value_counts() -PlotClassFrequency(class_counts) -DescribeImageData(class_counts) -``` - -
    - -![](c304b6e1f57c2d464ca2c216a0e5c596439c5b93.png) - -
    - -
    - - Average number of images: 2242.4 - Lowest image count: 1911. At: c8 - Highest image count: 2489. At: c0 - count 10.000000 - mean 2242.400000 - std 175.387951 - min 1911.000000 - 25% 2163.500000 - 50% 2314.500000 - 75% 2325.750000 - max 2489.000000 - Name: classname, dtype: float64 - -
    - -
    - -
    - -Additionally, we can plot the number of images per test subject. It -would be much more helpful to plot the number of images belonging to -each class *per subject*. We could then ensure that the distribution is -somewhat uniform. We did not show this here, and instead just plotted -number of images per subject. - -
    +> Classifies **10 distracted driving behaviors** from dashboard camera images using a **custom ResNet50 implementation built from scratch in Keras** โ€” including manual `convolutional_block` and `identity_block` definitions, `glorot_uniform` initialization, and LOGO cross-validation strategy. -
    - -``` python -subject_counts = (driver_imgs_df.subject).value_counts() -plt.figure(figsize=(15,4)) -plt.bar(subject_counts.index,subject_counts) -plt.xlabel('subject') -plt.ylabel('count') -plt.title('Number of Images per Subject') -plt.show() -DescribeImageData(subject_counts) -``` - -
    - -![](e142a949a0c9691c28ce079e9311e2ddd0a1fad4.png) - -
    - -
    - - Average number of images: 862.461538462 - Lowest image count: 346. At: p072 - Highest image count: 1237. At: p021 - count 26.000000 - mean 862.461538 - std 214.298713 - min 346.000000 - 25% 752.500000 - 50% 823.000000 - 75% 988.250000 - max 1237.000000 - Name: subject, dtype: float64 - -
    - -
    - -
    - -Furthermore, we can check if there are any null image examples. - -
    - -
    - -``` python -pd.isnull(driver_imgs_df).sum() -``` - -
    - - subject 0 - classname 0 - img 0 - dtype: int64 - -
    - -
    - -
    - -### Preprocess Data - -
    - -
    - -The data was provided with the classes in order (from class 0 to class -9). Let's shuffle the data by permutating the 'classname' and 'img' -attributes. - -
    - -
    - -``` python -np.random.seed(0) -myarray = np.random.permutation(driver_imgs_df) -driver_imgs_df = pd.DataFrame(data = myarray, columns=['subject', 'classname', 'img']) -``` - -
    - -
    - -We'll go ahead and apply a dictionary to the 'classname' attribute and -assign the strings to their respective integers. - -
    - -
    - -``` python -d = {'c0': 0, 'c1': 1, 'c2': 2, 'c3': 3, 'c4': 4, 'c5': 5, 'c6': 6, 'c7': 7, 'c8': 8, 'c9': 9} -driver_imgs_df.classname = driver_imgs_df.classname.map(d) -``` +[๐Ÿ”™ Back to Main Repository](https://github.com/shsarv/Machine-Learning-Projects)
    -
    +--- -### Convert Dataframe to Array for Training +## โš ๏ธ Safety Context -
    +> Distracted driving causes thousands of road fatalities annually. Automated in-vehicle behavior classification from dashboard cameras is an active area of road safety AI research. -
    +--- -Let's convert the images into numerical arrays of dimension '64, 64, 3'. -Both the height and width of the images will be 64 pixels, and each -image will have 3 channels (for red, green and blue). The following -function saves the array as a .npy file. +## ๐Ÿ“Œ Table of Contents -
    +- [About the Project](#-about-the-project) +- [How It Works](#-how-it-works) +- [Dataset](#-dataset) +- [Class Definitions](#-class-definitions) +- [Model Architecture](#-model-architecture) +- [Training Analysis & Challenges](#-training-analysis--challenges) +- [Project Structure](#-project-structure) +- [Getting Started](#-getting-started) +- [Tech Stack](#-tech-stack) +- [References](#-references) -
    +--- -``` python -CreateImgArray(64, 64, 3, driver_imgs_df, 'train') -``` +## ๐Ÿ”ฌ About the Project -
    +This project tackles the **State Farm Distracted Driver Detection** Kaggle challenge โ€” classifying driver images into 10 behavior classes. What makes it distinctive is that **ResNet50 is implemented completely from scratch** using the Keras functional API, manually defining every bottleneck block and skip connection rather than using `tf.keras.applications`. -
    +The notebook also demonstrates handling real-world ML challenges: **high bias**, **high variance**, and the **LOGO (Leave-One-Group-Out) cross-validation** strategy needed because multiple images belong to the same driver โ€” random splits would leak the same driver into both train and validation sets. -Let's now load the new image arrays into the environment. Note that this -step is used to save memory so that CreateImgArray does not have to be -executed every time. +**What this project covers:** +- Manual `identity_block` and `convolutional_block` implementations in Keras +- `resnets_utils` helper module for block definitions +- Diagnosing and addressing underfitting (high bias) and overfitting (high variance) +- LOGO cross-validation to prevent driver-level data leakage -
    +--- -
    +## โš™๏ธ How It Works -``` python -X = np.load('X_train_64_64.npy') -X.shape ``` - -
    - - (22424, 64, 64, 3) - -
    - -
    - -
    - -``` python -Y = np.load('Y_train_64_64.npy') -Y.shape -``` - -
    - - (22424,) - -
    - -
    - -
    - -Let's check our new arrays and ensure we compiled everything correctly. -We can see that we do not have any entries in X that contain zero, and Y -contains all the target labels. - -
    - -
    - -``` python -(X == 0).sum() -``` - -
    - -``` -0 -``` - -
    - -
    - -
    - -``` python -PlotClassFrequency(pd.DataFrame(Y)[0].value_counts()) -``` - -
    - -![](a8d3e7336933ca4c99e8cd00289dfbdbfb2dd0b9.png) - -
    - -
    - -
    - -Furthermore, we can print the images from X and the associated class as -a sanity check. Re-scaling the images (between 0 and 1): - -
    - -
    - -``` python -X_scaled = Rescale(X) -``` - -
    - -
    - -``` python -PrintImage(X_scaled, 2, Y = Y.reshape(-1,1)) +Dashboard Camera Image + โ”‚ + โ–ผ + Load + Preprocess + (Normalize pixel values / 255) + โ”‚ + โ–ผ + ResNet50 Forward Pass + (Custom Keras implementation) + โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” + โ”‚ ZeroPadding2D (3,3) โ”‚ + โ”‚ Conv2D(64,7ร—7,s=2) โ†’ BN โ†’ ReLU โ”‚ + โ”‚ MaxPool(3ร—3, s=2) โ”‚ + โ”‚ Stage 2: ConvBlock + IdBlockร—2 โ”‚ + โ”‚ Stage 3: ConvBlock + IdBlockร—3 โ”‚ + โ”‚ Stage 4: ConvBlock + IdBlockร—5 โ”‚ + โ”‚ Stage 5: ConvBlock + IdBlockร—2 โ”‚ + โ”‚ AveragePooling2D(2ร—2) โ”‚ + โ”‚ Flatten โ†’ Dense(10, softmax) โ”‚ + โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ + โ”‚ + โ–ผ + 10-Class Softmax Output โ†’ c0โ€“c9 ``` -
    - - y = 7.0 - -
    - -
    - -![](99927c379ca668ba9b7976499c8b83f0bbc6bce4.png) - -
    - -
    - -
    - -Class of "7" corresponds to a driver "reaching behind", which appears to -be the case shown above. - -
    - -
    - -### Build the Model - -
    - -
    - -We'll use the popular Residual Net with 50 layers. Residual networks are -essential to preventing vanishing gradients when using a rather 'deep' -network (many layers). The identity\_block and convolutional\_block are -defined below. - -
    - -
    +--- -``` python -def identity_block(X, f, filters, stage, block, init): - """ - Implementation of the identity block as defined in Figure 3 - - Arguments: - X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev) - f -- integer, specifying the shape of the middle CONV's window for the main path - filters -- python list of integers, defining the number of filters in the CONV layers of the main path - stage -- integer, used to name the layers, depending on their position in the network - block -- string/character, used to name the layers, depending on their position in the network - - Returns: - X -- output of the identity block, tensor of shape (n_H, n_W, n_C) - """ - - # defining name basis - conv_name_base = 'res' + str(stage) + block + '_branch' - bn_name_base = 'bn' + str(stage) + block + '_branch' - - # Retrieve Filters - F1, F2, F3 = filters - - # Save the input value. You'll need this later to add back to the main path. - X_shortcut = X - - # First component of main path - X = Conv2D(filters = F1, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2a', kernel_initializer = init)(X) - X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X) - X = Activation('relu')(X) - - ### START CODE HERE ### - - # Second component of main path (โ‰ˆ3 lines) - X = Conv2D(filters = F2, kernel_size = (f, f), strides = (1,1), padding = 'same', name = conv_name_base + '2b', kernel_initializer = init)(X) - X = BatchNormalization(axis = 3, name = bn_name_base + '2b')(X) - X = Activation('relu')(X) - - # Third component of main path (โ‰ˆ2 lines) - X = Conv2D(filters = F3, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2c', kernel_initializer = init)(X) - X = BatchNormalization(axis = 3, name = bn_name_base + '2c')(X) - - # Final step: Add shortcut value to main path, and pass it through a RELU activation (โ‰ˆ2 lines) - X = Add()([X,X_shortcut]) - X = Activation('relu')(X) - - ### END CODE HERE ### - - return X -``` +## ๐Ÿ“Š Dataset -
    - -
    +| Property | Details | +|----------|---------| +| **Name** | State Farm Distracted Driver Detection | +| **Source** | [Kaggle Competition](https://www.kaggle.com/c/state-farm-distracted-driver-detection) | +| **Training Images** | 22,424 | +| **Classes** | 10 driving behaviors | +| **Input Shape** | Resized to `64 ร— 64 ร— 3` for training | +| **Metadata** | `driver_imgs_list.csv` โ€” subject ID, classname, filename | +| **Key Challenge** | Multiple images per driver โ†’ LOGO cross-validation required | -``` python -def convolutional_block(X, f, filters, stage, block, init, s = 2): - """ - Implementation of the convolutional block as defined in Figure 4 - - Arguments: - X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev) - f -- integer, specifying the shape of the middle CONV's window for the main path - filters -- python list of integers, defining the number of filters in the CONV layers of the main path - stage -- integer, used to name the layers, depending on their position in the network - block -- string/character, used to name the layers, depending on their position in the network - s -- Integer, specifying the stride to be used - - Returns: - X -- output of the convolutional block, tensor of shape (n_H, n_W, n_C) - """ - - # defining name basis - conv_name_base = 'res' + str(stage) + block + '_branch' - bn_name_base = 'bn' + str(stage) + block + '_branch' - - # Retrieve Filters - F1, F2, F3 = filters - - # Save the input value - X_shortcut = X - - - ##### MAIN PATH ##### - # First component of main path - X = Conv2D(F1, (1, 1), strides = (s,s), name = conv_name_base + '2a', kernel_initializer = init)(X) - X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X) - X = Activation('relu')(X) - - ### START CODE HERE ### - - # Second component of main path (โ‰ˆ3 lines) - X = Conv2D(F2, (f, f), strides = (1,1), padding = 'same', name = conv_name_base + '2b', kernel_initializer = init)(X) - X = BatchNormalization(axis = 3, name = bn_name_base + '2b')(X) - X = Activation('relu')(X) - - # Third component of main path (โ‰ˆ2 lines) - X = Conv2D(F3, (1, 1), strides = (1,1), name = conv_name_base + '2c', kernel_initializer = init)(X) - X = BatchNormalization(axis = 3, name = bn_name_base + '2c')(X) - - ##### SHORTCUT PATH #### (โ‰ˆ2 lines) - X_shortcut = Conv2D(F3, (1, 1), strides = (s,s), name = conv_name_base + '1', kernel_initializer = init)(X_shortcut) - X_shortcut = BatchNormalization(axis = 3, name = bn_name_base + '1')(X_shortcut) - - # Final step: Add shortcut value to main path, and pass it through a RELU activation (โ‰ˆ2 lines) - X = Add()([X,X_shortcut]) - X = Activation('relu')(X) - - ### END CODE HERE ### - - return X -``` +--- -
    +## ๐Ÿšฆ Class Definitions -
    +| Code | Behavior | +|:----:|----------| +| **c0** | โœ… Safe Driving | +| **c1** | ๐Ÿ“ฑ Texting โ€” Right Hand | +| **c2** | ๐Ÿ“ž Phone Call โ€” Right Hand | +| **c3** | ๐Ÿ“ฑ Texting โ€” Left Hand | +| **c4** | ๐Ÿ“ž Phone Call โ€” Left Hand | +| **c5** | ๐ŸŽต Operating Radio | +| **c6** | ๐Ÿฅค Drinking | +| **c7** | ๐Ÿ”™ Reaching Behind | +| **c8** | ๐Ÿ’„ Hair / Makeup | +| **c9** | ๐Ÿ’ฌ Talking to Passenger | -With the two blocks defined, we'll now create the model ResNet50, as -shown below. +--- -
    +## ๐Ÿ—๏ธ Model Architecture -
    +The notebook defines **ResNet50 from scratch** โ€” no pretrained weights, no `tf.keras.applications`: + +```python +from keras.layers import (Input, Add, Dense, Activation, ZeroPadding2D, + BatchNormalization, Flatten, Conv2D, AveragePooling2D, MaxPooling2D) +from keras.models import Model +from keras.initializers import glorot_uniform +from resnets_utils import * -``` python -def ResNet50(input_shape = (64, 64, 3), classes = 10, init = glorot_uniform(seed=0)): +def ResNet50(input_shape=(64, 64, 3), classes=10, init=glorot_uniform(seed=0)): """ - Implementation of the popular ResNet50 the following architecture: - CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> CONVBLOCK -> IDBLOCK*2 -> CONVBLOCK -> IDBLOCK*3 - -> CONVBLOCK -> IDBLOCK*5 -> CONVBLOCK -> IDBLOCK*2 -> AVGPOOL -> TOPLAYER - - Arguments: - input_shape -- shape of the images of the dataset - classes -- integer, number of classes - - Returns: - model -- a Model() instance in Keras + CONV2D -> BATCHNORM -> RELU -> MAXPOOL + -> CONVBLOCK -> IDBLOCK*2 + -> CONVBLOCK -> IDBLOCK*3 + -> CONVBLOCK -> IDBLOCK*5 + -> CONVBLOCK -> IDBLOCK*2 + -> AVGPOOL -> TOPLAYER """ - - # Define the input as a tensor with shape input_shape - X_input = Input(input_shape) - - - # Zero-Padding - X = ZeroPadding2D((3, 3))(X_input) - - # Stage 1 - X = Conv2D(64, (7, 7), strides = (2, 2), name = 'conv1', kernel_initializer = init)(X) - X = BatchNormalization(axis = 3, name = 'bn_conv1')(X) - X = Activation('relu')(X) - X = MaxPooling2D((3, 3), strides=(2, 2))(X) - - # Stage 2 - X = convolutional_block(X, f = 3, filters = [64, 64, 256], stage = 2, block='a', s = 1, init = init) - X = identity_block(X, 3, [64, 64, 256], stage=2, block='b', init = init) - X = identity_block(X, 3, [64, 64, 256], stage=2, block='c', init = init) - - ### START CODE HERE ### - - # Stage 3 (โ‰ˆ4 lines) - X = convolutional_block(X, f = 3, filters = [128,128,512], stage = 3, block='a', s = 2, init = init) - X = identity_block(X, 3, [128,128,512], stage=3, block='b', init = init) - X = identity_block(X, 3, [128,128,512], stage=3, block='c', init = init) - X = identity_block(X, 3, [128,128,512], stage=3, block='d', init = init) - - # Stage 4 (โ‰ˆ6 lines) - X = convolutional_block(X, f = 3, filters = [256, 256, 1024], stage = 4, block='a', s = 2, init = init) - X = identity_block(X, 3, [256, 256, 1024], stage=4, block='b', init = init) - X = identity_block(X, 3, [256, 256, 1024], stage=4, block='c', init = init) - X = identity_block(X, 3, [256, 256, 1024], stage=4, block='d', init = init) - X = identity_block(X, 3, [256, 256, 1024], stage=4, block='e', init = init) - X = identity_block(X, 3, [256, 256, 1024], stage=4, block='f', init = init) - - # Stage 5 (โ‰ˆ3 lines) - X = convolutional_block(X, f = 3, filters = [512, 512, 2048], stage = 5, block='a', s = 2, init = init) - X = identity_block(X, 3, [512, 512, 2048], stage=5, block='b', init = init) - X = identity_block(X, 3, [512, 512, 2048], stage=5, block='c', init = init) - - # AVGPOOL (โ‰ˆ1 line). Use "X = AveragePooling2D(...)(X)" - X = AveragePooling2D(pool_size=(2, 2), name = 'avg_pool')(X) - - ### END CODE HERE ### - - # output layer - X = Flatten()(X) - X = Dense(classes, activation='softmax', name='fc' + str(classes), kernel_initializer = init)(X) - - # Create model - model = Model(inputs = X_input, outputs = X, name='ResNet50') - - return model -``` - -
    - -
    - -### Cross Validation Training (Leave-One-Group-Out) - -
    - -
    - -Let's do some basic transformation on the training / label arrays, and -print the shapes. After, we'll define some key functions for use in our -first CNN model. - -
    - -
    - -``` python -# Normalize image vectors -X_train = X/255 - -# Convert training and test labels to one hot matrices -#Y = convert_to_one_hot(Y.astype(int), 10).T -Y_train = np.expand_dims(Y.astype(int), -1) - -print ("number of training examples = " + str(X_train.shape[0])) -print ("X_train shape: " + str(X_train.shape)) -print ("Y_train shape: " + str(Y_train.shape)) -``` - -
    - - number of training examples = 22424 - X_train shape: (22424, 64, 64, 3) - Y_train shape: (22424, 1) - -
    - -
    - -
    - -Next, let's call our LOGO function that incorporates the Leave One Group -Out cross-validator. This function will allow us to split the data using -the drivers ('subject') as the group, which should help us prevent -overfitting as the model will probably learn too much information off -the type of driver/subject and become biased. - -Below we pass the arguments to the self-defined LOGO function and -execute. The return is a dataframe consistering of the accuracy/loss -scores of the training/dev sets (for each group/driver). - -
    - -
    - -``` python -scores = LOGO(X_train, Y_train, group = driver_imgs_df['subject'], - model_name = ResNet50, input_shape = (64, 64, 3), classes = 10, - init = glorot_uniform(seed=0), optimizer = 'adam', metrics = 'accuracy', - epochs = 2, batch_size = 32) -``` - -
    - -
    - -Plotting the dev set accuracy, we can see that 'p081' had the lowest -accuracy at 8.07%, and 'p002' had the highest accuracy at 71.52%. - -
    - -
    - -``` python -plt.figure(figsize=(15,4)) -plt.bar(scores.index, scores.loc[:,'Test_acc'].sort_values(ascending=False)) -plt.yticks(np.arange(0, 110, 10.0)) -plt.show() -``` - -
    - -![](7e4541a8ba586f1c4e438af2304ade4a1206364d.png) - -
    - -
    - -
    - -Calling 'describe' method, we can note some useful statistics. - -
    - -
    - -``` python -scores.describe() -``` - -
    - -``` - Train_loss Train_acc Test_loss Test_acc -count 26.000000 26.000000 26.000000 26.000000 -mean 4.118791 27.908272 5.293537 21.190364 -std 3.597604 19.144588 4.731039 16.150668 -min 0.722578 8.477557 0.820852 8.070501 -25% 1.849149 11.193114 2.133728 10.137083 -50% 2.545475 25.507787 2.562653 14.259937 -75% 5.299684 39.668163 8.664656 26.789961 -max 14.751674 74.439192 14.553808 71.521739 -``` - -
    - -
    - -
    - -And finally, let's print out the train/dev scores. - -
    - -
    - -``` python -print("Train acc: {:.2f}. Dev. acc: {:.2f}".format(scores['Train_acc'].mean(), scores['Test_acc'].mean())) -print("Train loss: {:.2f}. Dev. loss: {:.2f}".format(scores['Train_loss'].mean(), scores['Test_loss'].mean())) -``` - -
    - - Train acc: 27.91. Dev. acc: 21.19 - Train loss: 4.12. Dev. loss: 5.29 - -
    - -
    - -
    - -We can note that the train accuracy is higher than the dev accuracy, -which is expected. The accuracy is quite low in comparison to our -assumed Bayes accuracy of 100% (using human accuracy as a proxy to -Bayes), and we have some variance (differnce between train and dev) of -about 6.72%. Let's try increasing the number of epochs to 10 and observe -if the train/dev accuracies increase (loss decreases). - -
    - -
    - -``` python -scores = LOGO(X_train, Y_train, group = driver_imgs_df['subject'], - model_name = ResNet50, input_shape = (64, 64, 3), classes = 10, - init = glorot_uniform(seed=0), optimizer = 'adam', metrics = 'accuracy', - epochs = 5, batch_size = 32) ``` -
    +**Block types:** -
    +| Block | Shape Change | Used When | +|-------|-------------|-----------| +| **Identity Block** | Input = Output shape | Deepening without dimension change | +| **Convolutional Block** | Input โ‰  Output shape | When stride changes or filter count increases | -``` python -print("Train acc: {:.2f}. Dev. acc: {:.2f}".format(scores['Train_acc'].mean(), scores['Test_acc'].mean())) -print("Train loss: {:.2f}. Dev. loss: {:.2f}".format(scores['Train_loss'].mean(), scores['Test_loss'].mean())) -``` +**Stage filter configurations:** -
    +| Stage | Filters | Blocks | +|-------|---------|--------| +| Stage 2 | [64, 64, 256] | ConvBlock + IdBlock ร— 2 | +| Stage 3 | [128, 128, 512] | ConvBlock + IdBlock ร— 3 | +| Stage 4 | [256, 256, 1024] | ConvBlock + IdBlock ร— 5 | +| Stage 5 | [512, 512, 2048] | ConvBlock + IdBlock ร— 2 | - Train acc: 37.83. Dev. acc: 25.79 - Train loss: 2.61. Dev. loss: 3.30 +**Training config:** -
    +| Parameter | Value | +|-----------|-------| +| Initializer | `glorot_uniform(seed=0)` | +| Optimizer | Adam | +| Loss | Categorical Cross-Entropy | +| Input Shape | `(64, 64, 3)` | +| Output | Dense(10, softmax) | -
    +--- -
    - - The train and dev accuracy increased -to 37.83% and 25.79%, respectively. We can note that we still have an -underfitting problem (high bias, about 62.17% from 100%), *however, our -variance has increased dramatically between 2 epochs and 5 by about 80% -(12.04% variance)\!* Not only do **we have high bias, but our model also -exhibits high variance**. In order to tackle this, we'll need to address -the high bias first (get as close to Bayes error as possible) and then -deal with the resulting high variance. Note that ALL of the steps below -should be performed with LOGO cross-validation. This way, we can be sure -our estimates of the dev set are in line with the holdout set. - -In order to tackle **high bias**, we can do any of the following: -
  • run more epochs
  • increase the batch size (up to number of -examples)
  • make a deeper network
  • increases the image -size from 64x64 to 128x128, 256x256, etc.
  • GridSearching over -params (batch size, epoch, optimizer and it's parameters, -initializer)
  • +## ๐Ÿ“‰ Training Analysis & Challenges -
    +The notebook provides honest, detailed bias-variance analysis across training runs โ€” a key learning documented in the project: -
    +### Epoch 2 Results +| Set | Accuracy | +|-----|:--------:| +| Train | ~26% | +| Dev | ~13% | -Let's up the epoch count to 10. The assumption is that the train -accuracy will be higher than the previous 5 epoch model, but our -variance will increase. +> High bias (underfitting) โ€” model hasn't converged. High variance โ€” large gap between train/dev. -
    +### Epoch 5 Results +| Set | Accuracy | +|-----|:--------:| +| Train | **37.83%** | +| Dev | **25.79%** | -
    +> Train accuracy improved but **underfitting persists** (~62% away from 100%). Variance increased dramatically (+80% gap between epochs 2โ†’5). The notebook diagnoses this explicitly: -``` python -scores = LOGO(X_train, Y_train, group = driver_imgs_df['subject'], - model_name = ResNet50, input_shape = (64, 64, 3), classes = 10, - init = glorot_uniform(seed=0), optimizer = 'adam', metrics = 'accuracy', - epochs = 10, batch_size = 32) ``` - -
    - -
    - -``` python -print("Train acc: {:.2f}. Dev. acc: {:.2f}".format(scores['Train_acc'].mean(), scores['Test_acc'].mean())) -print("Train loss: {:.2f}. Dev. loss: {:.2f}".format(scores['Train_loss'].mean(), scores['Test_loss'].mean())) +"We still have an underfitting problem (high bias, about 62.17% from 100%), +however, our variance has increased dramatically between 2 and 5 epochs by about 80%." ``` -
    - - Train acc: 86.95. Dev. acc: 40.68 - Train loss: 0.93. Dev. loss: 3.79 - -
    - -
    - -
    +### Prescribed fixes documented in the notebook: -As expected, the training accuracy increased to 86.95%, but the variance -increase from 5 epochs to 10 was about 284% (46.27% variance)\! Thus, we -can conclude that this model suffers from severe high variance. We can -continue on and use the steps above to fix the remaining bias, then we -can use the steps below to reduce the variance. +**To address High Bias (underfitting):** +- Increase epoch count +- Use a bigger/deeper network +- Try different optimizers or learning rate schedules -
    - -
    - -In order to tackle **high variance**, we can do any of the following: -
  • Augment images to increase sample size
  • Regularization
  • -
  • GridSearching over params (batch size, epoch, optimizer and it's -parameters, initializer)
  • Decrease dev set size (allows more -examples to be trained, making model less prone to overfitting)
  • -
  • Investigate classes with low accuracy, and fix them
  • - -
    - -
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    - **Model** - - **Epoch** - - **Train Accuracy** - - **Dev Accuracy** - - **Bias** - - **Variance** -
    - **Model A** - - 2 - - 27.91 - - 21.19 - - 72.09 - - 6.72 -
    - **Model B** - - 5 - - 37.83 - - 25.79 - - 62.17 - - 12.04 -
    - **Model C** - - 10 - - 86.95 - - 40.68 - - 13.06 - - 46.27 -
    +**To address High Variance (overfitting):** +- Apply L2 regularization +- Add dropout layers +- Use data augmentation +- Increase training data volume -
    - -
    - -### Predictions on the Holdout Set +### LOGO Cross-Validation Note -
    - -
    +> Standard random train/val splits cause **data leakage** โ€” the same driver's images appear in both sets, inflating dev accuracy. The notebook flags this and recommends **Leave-One-Group-Out (LOGO)** cross-validation, splitting by `subject` (driver ID) from `driver_imgs_list.csv`. -We'll go ahead and fit the 10 epoch model. +--- -
    +## ๐Ÿ“ Project Structure -
    - -``` python -model = ResNet50(input_shape = (64, 64, 3), classes = 10) -model.compile(optimizer = 'adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) -model.fit(X_train, Y_train, epochs = 10, batch_size = 32) -``` - -
    - - Epoch 1/10 - 22424/22424 [==============================] - 83s 4ms/step - loss: 2.4026 - acc: 0.3128 - Epoch 2/10 - 22424/22424 [==============================] - 76s 3ms/step - loss: 1.8118 - acc: 0.4996 - Epoch 3/10 - 22424/22424 [==============================] - 76s 3ms/step - loss: 1.5023 - acc: 0.6153 - Epoch 4/10 - 22424/22424 [==============================] - 76s 3ms/step - loss: 0.8445 - acc: 0.8483 - Epoch 5/10 - 22424/22424 [==============================] - 76s 3ms/step - loss: 1.2427 - acc: 0.7447 - Epoch 6/10 - 22424/22424 [==============================] - 76s 3ms/step - loss: 0.8930 - acc: 0.8216 - Epoch 7/10 - 22424/22424 [==============================] - 76s 3ms/step - loss: 0.9400 - acc: 0.8144 - Epoch 8/10 - 22424/22424 [==============================] - 76s 3ms/step - loss: 0.7440 - acc: 0.8748 - Epoch 9/10 - 22424/22424 [==============================] - 76s 3ms/step - loss: 1.4076 - acc: 0.6559 - Epoch 10/10 - 22424/22424 [==============================] - 76s 3ms/step - loss: 0.6796 - acc: 0.8135 - -
    - -
    - - - -
    - -
    - -
    - -``` python -save_model(model, 'e10.h5'); ``` - -
    - -
    - -``` python -model = load_model('e10.h5') +Distracted Driver Detection/ +โ”‚ +โ”œโ”€โ”€ ๐Ÿ“‚ dataset/ +โ”‚ โ”œโ”€โ”€ train/ # Training images, organized by class +โ”‚ โ”‚ โ”œโ”€โ”€ c0/ c1/ c2/ ... c9/ +โ”‚ โ””โ”€โ”€ test/ # Unlabeled test images +โ”‚ +โ”œโ”€โ”€ driver_imgs_list.csv # subject, classname, img columns +โ”œโ”€โ”€ resnets_utils.py # identity_block + convolutional_block helpers +โ”œโ”€โ”€ distracted_driver_detection.ipynb # Main notebook +โ”œโ”€โ”€ requirements.txt # Python dependencies +โ””โ”€โ”€ README.md # You are here ``` -
    - -
    - -Let's load the holdout data set from out 'test\_file\_names' csv file -and then create the necessary array. +--- -
    +## ๐Ÿš€ Getting Started -
    +### 1. Clone the repository -``` python -holdout_imgs_df = pd.read_csv('test_file_names.csv') -holdout_imgs_df.rename(columns={"imagename": "img"}, inplace = True) +```bash +git clone https://github.com/shsarv/Machine-Learning-Projects.git +cd "Machine-Learning-Projects/Distracted Driver Detection" ``` -
    - -
    +### 2. Download the dataset from Kaggle -``` python -CreateImgArray(64, 64, 3, holdout_imgs_df, 'test') +```bash +pip install kaggle +kaggle competitions download -c state-farm-distracted-driver-detection +unzip state-farm-distracted-driver-detection.zip -d dataset/ ``` -
    +Or download manually from: [kaggle.com/c/state-farm-distracted-driver-detection/data](https://www.kaggle.com/c/state-farm-distracted-driver-detection/data) -
    +### 3. Set up environment -Again, we'll load the data here instead of having to run CreateImgArray -repeatedly. +```bash +python -m venv venv +source venv/bin/activate # Linux / macOS +venv\Scripts\activate # Windows -
    - -
    - -``` python -X_holdout = np.load('X_test_64_64.npy') -X_holdout.shape +pip install -r requirements.txt ``` -
    - - (79726, 64, 64, 3) - -
    - -
    - -
    - -And now calling predictions on the holdout set, as shown below. MAKE -SURE to clear the memory before this step\! - -
    +### 4. Run the notebook -
    - -``` python -probabilities = model.predict(X_holdout, batch_size = 32) +```bash +jupyter notebook distracted_driver_detection.ipynb ``` -
    +--- -
    +## ๐Ÿ› ๏ธ Tech Stack -If desired (as a sanity check) we can visually check our predictions by -scaling the X\_holdout array and then printing the image. +| Layer | Technology | +|-------|-----------| +| Language | Python 3.7+ | +| Deep Learning | TensorFlow / Keras | +| Model | ResNet50 (from scratch via Keras functional API) | +| Utilities | `resnets_utils.py` (custom block helpers) | +| Data | Pandas, NumPy | +| Visualization | Matplotlib | +| Notebook | Jupyter / Google Colab | -
    +--- -
    +## ๐Ÿ“š References -``` python -X_holdout_scaled = Rescale(X_holdout) -``` +- [State Farm Distracted Driver Detection โ€” Kaggle](https://www.kaggle.com/c/state-farm-distracted-driver-detection) +- He, K., Zhang, X., Ren, S., & Sun, J. (2015). *Deep Residual Learning for Image Recognition.* [arXiv:1512.03385](https://arxiv.org/abs/1512.03385) +- [deeplearning.ai โ€” ResNet50 from scratch (Coursera)](https://www.coursera.org/learn/convolutional-neural-networks) +- [Keras Functional API Documentation](https://keras.io/guides/functional_api/) -
    +--- -
    +
    -``` python -index = 50000 -PrintImage(X_holdout_scaled, index = index, Y = probabilities) -print('y_pred = ' + str(probabilities[index].argmax())) -``` +Part of the [Machine Learning Projects](https://github.com/shsarv/Machine-Learning-Projects) collection by [Sarvesh Kumar Sharma](https://github.com/shsarv) -
    - - y = 9 - y_pred = 9 +โญ Star the main repo if this helped you!
    - - -
    - From 3ebaf3b97621a9b1092e82ca0e5a7e821cdd8ea8 Mon Sep 17 00:00:00 2001 From: shsarv4 <166940544+shsarv4@users.noreply.github.com> Date: Wed, 18 Mar 2026 23:51:36 +0530 Subject: [PATCH 4/8] Update README.md --- Drowsiness detection [OPEN CV]/README.md | 347 ++++++++++++++++++----- 1 file changed, 278 insertions(+), 69 deletions(-) diff --git a/Drowsiness detection [OPEN CV]/README.md b/Drowsiness detection [OPEN CV]/README.md index 1d46eb7..d52d79a 100644 --- a/Drowsiness detection [OPEN CV]/README.md +++ b/Drowsiness detection [OPEN CV]/README.md @@ -1,105 +1,314 @@ -# Driver Drowsiness Detection System +
    -## Introduction +# ๐Ÿ˜ด Driver Drowsiness Detection โ€” OpenCV + Keras CNN -This project focuses on building a Driver Drowsiness Detection System that monitors a driver's eye status using a webcam and alerts them if they appear drowsy. We utilize **OpenCV** for image capture and preprocessing, while a **Convolutional Neural Network (CNN)** model classifies whether the driver's eyes are 'Open' or 'Closed.' If drowsiness is detected, an alarm is triggered to alert the driver. +[![Python](https://img.shields.io/badge/Python-3.7+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/) +[![OpenCV](https://img.shields.io/badge/OpenCV-5C3EE8?style=for-the-badge&logo=opencv&logoColor=white)](https://opencv.org/) +[![Keras](https://img.shields.io/badge/Keras-D00000?style=for-the-badge&logo=keras&logoColor=white)](https://keras.io/) +[![Pygame](https://img.shields.io/badge/Pygame-Alarm-green?style=for-the-badge)](https://www.pygame.org/) +[![Real-Time](https://img.shields.io/badge/Real--Time-Webcam-brightgreen?style=for-the-badge)]() +[![License](https://img.shields.io/badge/License-MIT-1abc9c?style=for-the-badge)](../LICENSE.md) -## Project Overview +> A **real-time driver drowsiness detection system** that uses **Haar Cascade classifiers** to locate the driver's eyes in every webcam frame and a **custom-trained CNN** (`cnnCat2.h5`) to classify each eye as **Open** or **Closed** โ€” sounding a `pygame` alarm when drowsiness is detected. -### Steps in the Detection Process: -1. **Image Capture**: Capture the image using a webcam. -2. **Face Detection**: Detect the face in the captured image and create a Region of Interest (ROI). -3. **Eye Detection**: Detect the eyes from the ROI and feed them into the classifier. -4. **Eye Classification**: The classifier categorizes whether the eyes are open or closed. -5. **Drowsiness Score Calculation**: Calculate a score to determine if the driver is drowsy based on how long their eyes remain closed. +[๐Ÿ”™ Back to Main Repository](https://github.com/shsarv/Machine-Learning-Projects) -## CNN Model +
    -The **Convolutional Neural Network (CNN)** architecture consists of the following layers: -- **Convolutional Layers**: - - 32 nodes, kernel size 3 - - 32 nodes, kernel size 3 - - 64 nodes, kernel size 3 -- **Fully Connected Layers**: - - 128 nodes - - Output layer: 2 nodes (with Softmax activation for classification) +--- -### Activation Function: -- **ReLU**: Used in all layers except the output layer. -- **Softmax**: Used in the output layer to classify the eyes as either 'Open' or 'Closed.' +## โš ๏ธ Safety Context -## Project Prerequisites +> Drowsy driving causes thousands of road fatalities annually. This system provides a real-time, automated alert to combat driver fatigue using a lightweight CNN that runs entirely on a standard webcam feed. -### Required Hardware: -- A webcam for image capture. +--- -### Required Libraries: -Ensure Python (version 3.6 recommended) is installed on your system. Then, install the following libraries using `pip`: +## ๐Ÿ“Œ Table of Contents -```bash -pip install opencv-python -pip install tensorflow -pip install keras -pip install pygame +- [About the Project](#-about-the-project) +- [How It Works](#-how-it-works) +- [CNN Model Architecture](#-cnn-model-architecture) +- [Dataset](#-dataset) +- [Haar Cascade Files](#-haar-cascade-files) +- [Scoring & Alert Logic](#-scoring--alert-logic) +- [Project Structure](#-project-structure) +- [Getting Started](#-getting-started) +- [Tech Stack](#-tech-stack) +- [Known Limitations](#-known-limitations) +- [References](#-references) + +--- + +## ๐Ÿ”ฌ About the Project + +This project detects driver drowsiness through a two-stage pipeline: + +1. **Detection** โ€” OpenCV Haar Cascade classifiers locate the face and each eye (left, right) in every frame +2. **Classification** โ€” A custom-trained Keras CNN (`cnnCat2.h5`) classifies each eye ROI as **Open** or **Closed** + +A running score is incremented each frame when eyes are detected as closed. When the score crosses a threshold, `pygame` plays `alarm.wav` and a "**DROWSY**" warning is overlaid on the video feed. + +**What this project covers:** +- Training a binary CNN classifier on a custom ~7,000-image eye dataset +- Real-time face and eye detection with OpenCV Haar cascades +- Score-based drowsiness logic (accumulate โ†’ threshold โ†’ alarm) +- Alarm playback with `pygame.mixer` + +--- + +## โš™๏ธ How It Works + +``` +Webcam Frame (live stream) + โ”‚ + โ–ผ + Convert BGR โ†’ Grayscale + โ”‚ + โ–ผ + Haar Cascade: Detect Face + (haarcascade_frontalface_alt.xml) + โ”‚ + โ–ผ + Haar Cascade: Detect Eyes from frame + โ”œโ”€โ”€ Left Eye (haarcascade_lefteye_2splits.xml) + โ””โ”€โ”€ Right Eye (haarcascade_righteye_2splits.xml) + โ”‚ + โ–ผ + Crop Eye ROI โ†’ Resize โ†’ Normalize + โ”‚ + โ–ผ + CNN Forward Pass (cnnCat2.h5) + โ†’ Predict: ['Close', 'Open'] + โ†’ rpred / lpred updated per frame + โ”‚ + โ”œโ”€โ”€ Both eyes Open โ†’ score decremented (min 0) + โ”‚ + โ””โ”€โ”€ Eye(s) Closed โ†’ score incremented + โ”‚ + โ””โ”€โ”€ score > threshold + โ”‚ + โ–ผ + ๐Ÿ”” pygame alarm.wav + ๐Ÿ“บ "DROWSY" on screen + ๐ŸŸฅ Red border on frame +``` + +--- + +## ๐Ÿง  CNN Model Architecture + +`model.py` defines and trains the CNN classifier. The trained weights are saved as `models/cnnCat2.h5`. + +``` +Input: Eye ROI image (24 ร— 24 ร— 1, grayscale) + โ”‚ + โ–ผ +Conv2D(32, 3ร—3) โ†’ ReLU โ†’ MaxPool(1,1) +Conv2D(32, 3ร—3) โ†’ ReLU โ†’ MaxPool(1,1) +Conv2D(64, 3ร—3) โ†’ ReLU โ†’ MaxPool(1,1) + โ”‚ + โ–ผ +Flatten +Dense(128) โ†’ ReLU +Dropout(0.5) +Dense(2) โ†’ Softmax + โ”‚ + โ–ผ +Output: ['Close', 'Open'] +``` + +**Training configuration:** + +| Parameter | Value | +|-----------|-------| +| Classes | 2 โ€” `Close` / `Open` | +| Input Size | 24 ร— 24 ร— 1 (grayscale) | +| Optimizer | Adam | +| Loss | Categorical Cross-Entropy | +| Activation (hidden) | ReLU | +| Activation (output) | Softmax | +| Regularization | Dropout (0.5) | + +--- + +## ๐Ÿ“Š Dataset + +| Property | Details | +|----------|---------| +| **Type** | Custom โ€” captured via webcam script | +| **Total Images** | ~7,000 eye images | +| **Classes** | `Open` / `Close` | +| **Conditions** | Various lighting conditions | +| **Cleaning** | Manually cleaned to remove unusable frames | + +The dataset was created by writing a capture script that crops eye regions frame by frame and saves them to disk, labeled by folder (`Open/` or `Closed/`). It was then manually reviewed to remove noisy or ambiguous images. + +> **Want to train on your own data?** Run `model.py` against your own captured eye dataset following the same `Open/Close` folder structure. + +--- + +## ๐Ÿ“‚ Haar Cascade Files + +Three XML classifiers are used from the `haar cascade files/` folder: + +| File | Purpose | +|------|---------| +| `haarcascade_frontalface_alt.xml` | Detects the driver's face bounding box | +| `haarcascade_lefteye_2splits.xml` | Detects the left eye region within the frame | +| `haarcascade_righteye_2splits.xml` | Detects the right eye region within the frame | + +These are pre-trained OpenCV Haar cascades โ€” no training required. They are loaded in `drowsinessdetection.py` as: + +```python +face = cv2.CascadeClassifier('haar cascade files/haarcascade_frontalface_alt.xml') +leye = cv2.CascadeClassifier('haar cascade files/haarcascade_lefteye_2splits.xml') +reye = cv2.CascadeClassifier('haar cascade files/haarcascade_righteye_2splits.xml') ``` -### Other Project Files: -- **Haar Cascade Files**: Located in the "haar cascade files" folder, these XML files are necessary for detecting faces and eyes. -- **Model File**: The "models" folder contains the pre-trained CNN model `cnnCat2.h5`. -- **Alarm Sound**: The audio clip `alarm.wav` will play when drowsiness is detected. -- **Python Files**: - - `Model.py`: The file used to build and train the CNN model. - - `Drowsiness detection.py`: The main file that executes the driver drowsiness detection system. +--- -## How the Algorithm Works +## ๐ŸŽฏ Scoring & Alert Logic -### Step 1 โ€“ Image Capture -The webcam captures images in real-time using `cv2.VideoCapture(0)` and processes each frame. The frames are stored in a variable `frame`. +The system uses a **running score counter** rather than a fixed-frame threshold: -### Step 2 โ€“ Face Detection -The image is converted to grayscale for face detection using a **Haar Cascade Classifier**. The faces are detected using `detectMultiScale()`, and boundary boxes are drawn around the detected faces. +```python +lbl = ['Close', 'Open'] # CNN output labels -### Step 3 โ€“ Eye Detection -Similar to face detection, eyes are detected within the ROI using another cascade classifier. The eye images are extracted and passed to the CNN model for classification. +# Per frame: +if rpred[0] == 0 and lpred[0] == 0: # Both eyes closed + score += 1 + cv2.putText(frame, "Closed", ...) +else: # Eyes open + score -= 1 + cv2.putText(frame, "Open", ...) -### Step 4 โ€“ Eye Classification -The extracted eye images are preprocessed by resizing to 24x24 pixels, normalizing the values, and then passed into the CNN model (`cnnCat2.h5`). The model predicts whether the eyes are open or closed. +score = max(score, 0) # Score never goes negative -### Step 5 โ€“ Drowsiness Detection -A score is calculated based on the status of both eyes. If both eyes are closed for an extended period, the score increases, indicating drowsiness. If the score exceeds a threshold, an alarm is triggered using the **Pygame** library. +if score > 15: # Drowsiness threshold + # Sound alarm + mixer.Sound('alarm.wav').play() + # Draw red border on frame + thicc = min(thicc + 2, 16) + cv2.rectangle(frame, (0,0), (width,height), (0,0,255), thicc) +``` + +| Variable | Value | Meaning | +|----------|:-----:|---------| +| `score` threshold | **15** | Frames of closed eyes before alarm | +| `rpred` / `lpred` | `0` = Closed, `1` = Open | CNN prediction per eye | +| Border thickness `thicc` | Grows up to 16px | Visual urgency indicator | + +--- + +## ๐Ÿ“ Project Structure + +``` +Drowsiness detection [OPEN CV]/ +โ”‚ +โ”œโ”€โ”€ ๐Ÿ“‚ haar cascade files/ +โ”‚ โ”œโ”€โ”€ haarcascade_frontalface_alt.xml # Face detector +โ”‚ โ”œโ”€โ”€ haarcascade_lefteye_2splits.xml # Left eye detector +โ”‚ โ””โ”€โ”€ haarcascade_righteye_2splits.xml # Right eye detector +โ”‚ +โ”œโ”€โ”€ ๐Ÿ“‚ models/ +โ”‚ โ””โ”€โ”€ cnnCat2.h5 # Trained CNN weights (download separately) +โ”‚ +โ”œโ”€โ”€ drowsinessdetection.py # Main script โ€” webcam loop + detection + alarm +โ”œโ”€โ”€ model.py # CNN model definition + training script +โ”œโ”€โ”€ alarm.wav # Alert sound file +โ””โ”€โ”€ README.md # You are here +``` + +> **Note:** `models/cnnCat2.h5` is not included in the repo due to GitHub file size limits. Download it from the Google Drive link in the project or train your own by running `model.py`. -## Execution Instructions +--- -### Running the Detection System +## ๐Ÿš€ Getting Started -1. Open the command prompt and navigate to the directory where the main file `drowsiness detection.py` is located. -2. Run the script using the following command: +### 1. Clone the repository ```bash -python drowsiness detection.py +git clone https://github.com/shsarv/Machine-Learning-Projects.git +cd "Machine-Learning-Projects/Drowsiness detection [OPEN CV]" ``` -The system will access the webcam and start detecting drowsiness. The real-time status will be displayed on the screen. +### 2. Set up environment -## Summary +```bash +python -m venv venv +source venv/bin/activate # Linux / macOS +venv\Scripts\activate # Windows + +pip install -r requirements.txt +``` -This Python project implements a **Driver Drowsiness Detection System** using **OpenCV** and a **CNN model** to detect whether the driverโ€™s eyes are open or closed. When the eyes are detected as closed for a prolonged time, an alert sound is played to prevent potential accidents. This system can be implemented in vehicles or other applications to enhance driver safety. +### 3. Download the trained model -## Future Enhancements +The `cnnCat2.h5` model file must be placed in the `models/` folder. Download it from the link provided in the repository issues/releases, then: -- Improve the detection accuracy by training on a larger dataset. -- Implement real-time monitoring for multiple people. -- Add functionalities to detect other signs of drowsiness like head tilting or yawning. - -## Contributing +```bash +mkdir models +# Place cnnCat2.h5 inside models/ +``` -Feel free to contribute by submitting issues or pull requests. For major changes, please open an issue to discuss the proposed changes before submitting a PR. +Or train your own model from scratch: +```bash +python model.py +# Saves models/cnnCat2.h5 automatically +``` -## Acknowledgments +### 4. Run the detector -- [OpenCV Documentation](https://opencv.org/) +```bash +python drowsinessdetection.py +``` + +- The webcam opens automatically +- Eyes detected as closed โ†’ score increments +- Score exceeds threshold โ†’ **alarm sounds + red border appears** +- Press **`q`** to quit + +--- + +## ๐Ÿ› ๏ธ Tech Stack + +| Layer | Technology | +|-------|-----------| +| Language | Python 3.7+ | +| Computer Vision | OpenCV (`cv2`) | +| Eye Detection | Haar Cascade Classifiers | +| Deep Learning | Keras + TensorFlow backend | +| Model | Custom CNN (`cnnCat2.h5`) | +| Audio Alarm | Pygame (`pygame.mixer`) | +| Numerical Processing | NumPy | + +--- + +## โš ๏ธ Known Limitations + +| Limitation | Detail | +|-----------|--------| +| **Lighting sensitivity** | Haar cascades and CNN accuracy drop under poor or uneven lighting | +| **Glasses / sunglasses** | Frames and tinted lenses obstruct eye detection | +| **Head pose** | Extreme angles may cause Haar cascade face/eye detection to fail | +| **Single eye closure** | If only one eye closes (winking), score increments only partially | +| **No yawn detection** | Fatigue from yawning is not measured โ€” only eye closure | + +--- + +## ๐Ÿ“š References + +- [OpenCV Haar Cascade Documentation](https://docs.opencv.org/4.x/db/d28/tutorial_cascade_classifier.html) - [Keras Documentation](https://keras.io/) -- [TensorFlow Documentation](https://www.tensorflow.org/) +- [Pygame mixer Documentation](https://www.pygame.org/docs/ref/mixer.html) + +--- + +
    + +Part of the [Machine Learning Projects](https://github.com/shsarv/Machine-Learning-Projects) collection by [Sarvesh Kumar Sharma](https://github.com/shsarv) + +โญ Star the main repo if this helped you! ---- \ No newline at end of file +
    From 4c45f0d0f20c0d62bdaa1538cce68c31c9011eff Mon Sep 17 00:00:00 2001 From: shsarv4 <166940544+shsarv4@users.noreply.github.com> Date: Thu, 19 Mar 2026 00:02:10 +0530 Subject: [PATCH 5/8] Update README.md --- .../README.md | 255 ++++++++++++++++++ 1 file changed, 255 insertions(+) diff --git a/Gender and age detection using deep learning/README.md b/Gender and age detection using deep learning/README.md index e69de29..7cf6f24 100644 --- a/Gender and age detection using deep learning/README.md +++ b/Gender and age detection using deep learning/README.md @@ -0,0 +1,255 @@ +
    + +# ๐Ÿง‘โ€๐Ÿคโ€๐Ÿง‘ Gender & Age Detection โ€” OpenCV Deep Learning + +[![Python](https://img.shields.io/badge/Python-3.7+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/) +[![OpenCV](https://img.shields.io/badge/OpenCV-DNN-5C3EE8?style=for-the-badge&logo=opencv&logoColor=white)](https://opencv.org/) +[![Caffe](https://img.shields.io/badge/Caffe-Pre--trained%20Models-red?style=for-the-badge)](http://caffe.berkeleyvision.org/) +[![Dataset](https://img.shields.io/badge/Dataset-Adience-blueviolet?style=for-the-badge)](https://talhassner.github.io/home/projects/Adience/Adience-data.html) +[![License](https://img.shields.io/badge/License-MIT-1abc9c?style=for-the-badge)](../LICENSE.md) + +> Detects **faces** in images or a live webcam feed and predicts each person's **gender** (Male/Female) and **age range** across 8 age buckets โ€” using three pre-trained deep learning models loaded via **OpenCV DNN**. + +[๐Ÿ”™ Back to Main Repository](https://github.com/shsarv/Machine-Learning-Projects) + +
    + +--- + +## ๐Ÿ“Œ Table of Contents + +- [About the Project](#-about-the-project) +- [How It Works](#-how-it-works) +- [The Three Models](#-the-three-models) +- [Age & Gender Classes](#-age--gender-classes) +- [CNN Architecture](#-cnn-architecture) +- [Project Structure](#-project-structure) +- [Getting Started](#-getting-started) +- [Tech Stack](#-tech-stack) +- [References & Citation](#-references--citation) + +--- + +## ๐Ÿ”ฌ About the Project + +This project builds a **real-time gender and age detection system** using three pre-trained models served through OpenCV's DNN module โ€” no model training required. Based on the DataFlair deep learning project, it uses: + +- A **TensorFlow SSD** model for face detection +- A **Caffe CNN** (Levi & Hassner, 2015) for gender classification +- A **Caffe CNN** (Levi & Hassner, 2015) for age prediction + +The script (`gad.py`) accepts a **static image** via `--image` argument or runs on a **live webcam feed**, draws bounding boxes around detected faces, and overlays the predicted gender and age range on each face. + +--- + +## โš™๏ธ How It Works + +``` +Input: Image / Webcam Frame + โ”‚ + โ–ผ + blobFromImage(frame, 1.0, (300ร—300), [104,117,123]) + โ”‚ + โ–ผ + โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” + โ”‚ Face Detection (TensorFlow SSD) โ”‚ + โ”‚ opencv_face_detector_uint8.pb โ”‚ + โ”‚ opencv_face_detector.pbtxt โ”‚ + โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ + โ”‚ + โ–ผ + For each face (confidence > 0.7): + Crop face ROI + 20px padding + blobFromImage(face, 1.0, (227ร—227), MODEL_MEAN_VALUES) + โ”‚ + โ”Œโ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ” + โ–ผ โ–ผ + โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” + โ”‚ Gender โ”‚ โ”‚ Age โ”‚ + โ”‚ Network โ”‚ โ”‚ Network โ”‚ + โ”‚ (Caffe) โ”‚ โ”‚ (Caffe) โ”‚ + โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ + โ”‚ โ”‚ + โ–ผ โ–ผ + Male/Female Age Bucket + โ””โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”˜ + โ–ผ + "Gender: Male Age: (25-32)" + overlaid on bounding box +``` + +**Key preprocessing constant:** +```python +MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746) +``` +> BGR mean values subtracted from every face blob to normalize for illumination variation across the Adience training data. + +--- + +## ๐Ÿง  The Three Models + +| Model | Framework | Files | Purpose | +|-------|-----------|-------|---------| +| **Face Detector** | TensorFlow SSD | `opencv_face_detector_uint8.pb` + `opencv_face_detector.pbtxt` | Detect face bounding boxes | +| **Gender Net** | Caffe (Levi & Hassner) | `gender_net.caffemodel` + `gender_deploy.prototxt` | Classify Male / Female | +| **Age Net** | Caffe (Levi & Hassner) | `age_net.caffemodel` + `age_deploy.prototxt` | Predict one of 8 age ranges | + +```python +faceNet = cv2.dnn.readNet("opencv_face_detector_uint8.pb", "opencv_face_detector.pbtxt") +ageNet = cv2.dnn.readNet("age_net.caffemodel", "age_deploy.prototxt") +genderNet = cv2.dnn.readNet("gender_net.caffemodel", "gender_deploy.prototxt") +``` + +--- + +## ๐Ÿท๏ธ Age & Gender Classes + +**Gender** (2 classes): +```python +genderList = ['Male', 'Female'] +``` + +**Age** (8 buckets): +```python +ageList = ['(0-2)', '(4-6)', '(8-12)', '(15-20)', + '(25-32)', '(38-43)', '(48-53)', '(60-100)'] +``` + +> Age is treated as a **classification problem** over 8 discrete ranges rather than regression โ€” Levi & Hassner (2015) found classification over predefined buckets more robust than direct regression on the Adience benchmark. + +--- + +## ๐Ÿ—๏ธ CNN Architecture + +Both age and gender models share the same architecture โ€” a lightweight CNN similar to CaffeNet/AlexNet, trained on the **Adience dataset**: + +``` +Input: 227 ร— 227 ร— 3 face crop (mean-subtracted) + โ”‚ +Conv1: 96 filters, 7ร—7 kernel โ†’ ReLU โ†’ MaxPool โ†’ LRN +Conv2: 256 filters, 5ร—5 kernel โ†’ ReLU โ†’ MaxPool โ†’ LRN +Conv3: 384 filters, 3ร—3 kernel โ†’ ReLU โ†’ MaxPool + โ”‚ +FC1: 512 nodes โ†’ ReLU โ†’ Dropout +FC2: 512 nodes โ†’ ReLU โ†’ Dropout + โ”‚ +Softmax +โ”œโ”€โ”€ Gender Net output: 2 (Male / Female) +โ””โ”€โ”€ Age Net output: 8 (age range buckets) +``` + +--- + +## ๐Ÿ“ Project Structure + +``` +Gender and age detection using deep learning/ +โ”‚ +โ”œโ”€โ”€ gad.py # Main script โ€” detection pipeline +โ”‚ +โ”œโ”€โ”€ age_net.caffemodel # Age model weights (Caffe, ~44 MB) +โ”œโ”€โ”€ age_deploy.prototxt # Age model architecture +โ”œโ”€โ”€ gender_net.caffemodel # Gender model weights (Caffe, ~44 MB) +โ”œโ”€โ”€ gender_deploy.prototxt # Gender model architecture +โ”œโ”€โ”€ opencv_face_detector_uint8.pb # Face detector weights (TensorFlow) +โ”œโ”€โ”€ opencv_face_detector.pbtxt # Face detector architecture +โ”‚ +โ”œโ”€โ”€ girl1.jpg # Sample test images +โ”œโ”€โ”€ girl2.jpg # โ†‘ +โ”œโ”€โ”€ kid1.jpg # โ†‘ +โ”œโ”€โ”€ man1.jpg # โ†‘ +โ”œโ”€โ”€ minion.jpg # โ†‘ +โ”œโ”€โ”€ woman1.jpg # โ†‘ +โ”œโ”€โ”€ woman3.jpg # โ†‘ +โ”‚ +โ””โ”€โ”€ README.md +``` + +> **Note:** The `.caffemodel` files (~44 MB each) may not be included in the repository due to GitHub's file size limits. If missing, download them from [Tal Hassner's Adience page](https://talhassner.github.io/home/projects/Adience/Adience-data.html) and place them in the project root. + +--- + +## ๐Ÿš€ Getting Started + +### 1. Clone the repository + +```bash +git clone https://github.com/shsarv/Machine-Learning-Projects.git +cd "Machine-Learning-Projects/Gender and age detection using deep learning" +``` + +### 2. Set up environment + +```bash +python -m venv venv +source venv/bin/activate # Linux / macOS +venv\Scripts\activate # Windows + +pip install -r requirements.txt +``` + +### 3. Run on a sample image + +```bash +python gad.py --image girl1.jpg +# Output โ†’ Gender: Female Age: (25-32) years +``` + +Try the included sample images: + +```bash +python gad.py --image man1.jpg +python gad.py --image kid1.jpg +python gad.py --image woman1.jpg +python gad.py --image minion.jpg # ๐Ÿค” +``` + +### 4. Run on live webcam + +```bash +python gad.py +# No --image flag โ†’ defaults to webcam (index 0) +# Press Q to quit +``` + +--- + +## ๐Ÿ› ๏ธ Tech Stack + +| Layer | Technology | +|-------|-----------| +| Language | Python 3.7+ | +| Computer Vision | OpenCV (`cv2.dnn`) | +| Face Detection | TensorFlow SSD (ResNet-10 backbone) | +| Age / Gender Models | Caffe (Levi & Hassner, 2015) | +| Argument Parsing | `argparse` | +| Numerical Processing | NumPy | + +--- + +## ๐Ÿ“š References & Citation + +```bibtex +@inproceedings{Levi2015, + author = {Gil Levi and Tal Hassner}, + title = {Age and Gender Classification Using Convolutional Neural Networks}, + booktitle = {IEEE Workshop on Analysis and Modeling of Faces and Gestures (AMFG), + at the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)}, + year = {2015} +} +``` + +- [Levi & Hassner (2015) โ€” Original Paper & Models](https://talhassner.github.io/home/projects/Adience/Adience-data.html) +- [Adience Benchmark Dataset](https://talhassner.github.io/home/projects/Adience/Adience-data.html) +- [OpenCV DNN Face Detector](https://github.com/opencv/opencv/tree/master/samples/dnn) +- [LearnOpenCV โ€” Age & Gender Classification](https://learnopencv.com/age-gender-classification-using-opencv-deep-learning-c-python/) + +--- + +
    + +Part of the [Machine Learning Projects](https://github.com/shsarv/Machine-Learning-Projects) collection by [Sarvesh Kumar Sharma](https://github.com/shsarv) + +โญ Star the main repo if this helped you! + +
    From 788922517d9338f3a8aa4e63331a06cd7de911b7 Mon Sep 17 00:00:00 2001 From: shsarv4 <166940544+shsarv4@users.noreply.github.com> Date: Thu, 19 Mar 2026 00:05:28 +0530 Subject: [PATCH 6/8] Create README.md --- .../README.md | 218 ++++++++++++++++++ 1 file changed, 218 insertions(+) create mode 100644 Getting Admission in College Prediction/README.md diff --git a/Getting Admission in College Prediction/README.md b/Getting Admission in College Prediction/README.md new file mode 100644 index 0000000..d4278cc --- /dev/null +++ b/Getting Admission in College Prediction/README.md @@ -0,0 +1,218 @@ +
    + +# ๐ŸŽ“ Getting Admission in College Prediction + +[![Python](https://img.shields.io/badge/Python-3.7+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/) +[![scikit-learn](https://img.shields.io/badge/scikit--learn-F7931E?style=for-the-badge&logo=scikit-learn&logoColor=white)](https://scikit-learn.org/) +[![Jupyter](https://img.shields.io/badge/Jupyter-Notebook-F37626?style=for-the-badge&logo=jupyter&logoColor=white)](https://jupyter.org/) +[![Dataset](https://img.shields.io/badge/Dataset-Kaggle-20BEFF?style=for-the-badge&logo=kaggle&logoColor=white)](https://www.kaggle.com/mohansacharya/graduate-admissions) +[![Best Rยฒ](https://img.shields.io/badge/Best%20Rยฒ-0.821-brightgreen?style=for-the-badge)]() +[![License](https://img.shields.io/badge/License-MIT-1abc9c?style=for-the-badge)](../LICENSE.md) + +> Predicts a student's **probability of graduate college admission** (as a continuous value between 0 and 1) from 7 academic and profile features โ€” using a `GridSearchCV`-powered model comparison across 6 regression algorithms. + +[๐Ÿ”™ Back to Main Repository](https://github.com/shsarv/Machine-Learning-Projects) + +
    + +--- + +## ๐Ÿ“Œ Table of Contents + +- [About the Project](#-about-the-project) +- [Dataset](#-dataset) +- [Features](#-features) +- [Methodology](#-methodology) +- [Model Comparison Results](#-model-comparison-results) +- [Final Model Performance](#-final-model-performance) +- [Sample Predictions](#-sample-predictions) +- [Project Structure](#-project-structure) +- [Getting Started](#-getting-started) +- [Tech Stack](#-tech-stack) + +--- + +## ๐Ÿ”ฌ About the Project + +Getting into a good graduate program is one of the most competitive processes for students worldwide. This project builds a **regression model** that predicts the probability of admission based on a student's GRE score, TOEFL score, CGPA, university rating, SOP, LOR, and research experience. + +Six regression algorithms are trained and compared using **GridSearchCV with 5-fold cross-validation** via a custom `find_best_model()` function. The best-performing model is then evaluated on a held-out test set. + +**What this project covers:** +- Exploratory data analysis on 500 graduate applicant profiles +- Custom `find_best_model()` with GridSearchCV across 6 regressors +- Feature importance and correlation analysis +- Linear Regression selected as the final model with **Rยฒ = 0.821** on test set + +--- + +## ๐Ÿ“Š Dataset + +| Property | Details | +|----------|---------| +| **File** | `admission_predict.csv` | +| **Source** | [Kaggle โ€” Graduate Admissions](https://www.kaggle.com/mohansacharya/graduate-admissions) | +| **Rows** | 500 student records | +| **Columns** | 9 (including Serial No. and target) | +| **Task** | Regression โ€” predict `Chance of Admit` โˆˆ [0, 1] | +| **Missing Values** | None | + +--- + +## ๐Ÿ”ฌ Features + +| Column | Type | Range | Description | +|--------|------|:-----:|-------------| +| `GRE Score` | Integer | 290โ€“340 | Graduate Record Examination score | +| `TOEFL Score` | Integer | 92โ€“120 | Test of English as a Foreign Language score | +| `University Rating` | Integer | 1โ€“5 | Prestige rating of undergraduate university | +| `SOP` | Float | 1.0โ€“5.0 | Strength of Statement of Purpose | +| `LOR` | Float | 1.0โ€“5.0 | Strength of Letter of Recommendation | +| `CGPA` | Float | 6.8โ€“9.92 | Undergraduate GPA (out of 10) | +| `Research` | Binary | 0 / 1 | Research experience (0 = No, 1 = Yes) | +| `Chance of Admit` โญ | Float | 0.34โ€“0.97 | **Target variable** โ€” probability of admission | + +> `Serial No.` is dropped before training as it carries no predictive information. + +--- + +## โš™๏ธ Methodology + +``` +Load admission_predict.csv (500 ร— 9) + โ”‚ + โ–ผ +EDA + Correlation Analysis +(heatmap, pairplots, distributions) + โ”‚ + โ–ผ +Drop 'Serial No.' column +Define X (7 features) and y ('Chance of Admit') + โ”‚ + โ–ผ +find_best_model(X, y) +โ””โ”€โ”€ GridSearchCV (cv=5) over 6 models + โ”‚ + โ–ผ +Select best model โ†’ Linear Regression (normalize=True) + โ”‚ + โ–ผ +Train/Test Split (80/20, random_state=5) +โ†’ 400 train samples, 100 test samples + โ”‚ + โ–ผ +Fit LinearRegression(normalize=True) +Evaluate on test set โ†’ Rยฒ = 0.821 + โ”‚ + โ–ผ +Sample Predictions +``` + +--- + +## ๐Ÿ“ˆ Model Comparison Results + +All 6 models evaluated using `GridSearchCV(cv=5)` via the custom `find_best_model()` function: + +| Model | Best Parameters | CV Rยฒ Score | +|-------|----------------|:-----------:| +| **Linear Regression** โœ… | `{'normalize': True}` | **0.8108** | +| Random Forest | `{'n_estimators': 15}` | 0.7689 | +| KNN | `{'n_neighbors': 20}` | 0.7230 | +| SVR | `{'gamma': 'scale'}` | 0.6541 | +| Decision Tree | `{'criterion': 'mse', 'splitter': 'random'}` | 0.5868 | +| Lasso | `{'alpha': 1, 'selection': 'random'}` | 0.2151 | + +> โœ… **Linear Regression** selected as the final model โ€” highest cross-validation Rยฒ score of **0.8108**. + +> Lasso performed poorly (Rยฒ = 0.2151) because L1 regularization shrinks coefficients aggressively, which is harmful here where all 7 features are genuinely correlated with admission probability. + +--- + +## ๐Ÿ† Final Model Performance + +| Metric | Value | +|--------|:-----:| +| Model | `LinearRegression(normalize=True)` | +| 5-Fold Cross-Validation Score | **81.0%** | +| Train samples | 400 | +| Test samples | 100 | +| **Test Rยฒ Score** | **0.8215** | + +--- + +## ๐Ÿ”ฎ Sample Predictions + +```python +# Input: [GRE, TOEFL, Univ Rating, SOP, LOR, CGPA, Research] + +model.predict([[337, 118, 4, 4.5, 4.5, 9.65, 0]]) +# โ†’ Chance of getting into UCLA is 92.855% + +model.predict([[320, 113, 2, 2.0, 2.5, 8.64, 1]]) +# โ†’ Chance of getting into UCLA is 73.627% +``` + +--- + +## ๐Ÿ“ Project Structure + +``` +Getting Admission in College Prediction/ +โ”‚ +โ”œโ”€โ”€ Admission_prediction.ipynb # Main notebook โ€” EDA, model comparison, training +โ”œโ”€โ”€ admission_predict.csv # Dataset (500 student records) +โ”œโ”€โ”€ requirements.txt # Python dependencies +โ””โ”€โ”€ README.md # You are here +``` + +--- + +## ๐Ÿš€ Getting Started + +### 1. Clone the repository + +```bash +git clone https://github.com/shsarv/Machine-Learning-Projects.git +cd "Machine-Learning-Projects/Getting Admission in College Prediction" +``` + +### 2. Set up environment + +```bash +python -m venv venv +source venv/bin/activate # Linux / macOS +venv\Scripts\activate # Windows + +pip install -r requirements.txt +``` + +### 3. Launch the notebook + +```bash +jupyter notebook Admission_prediction.ipynb +``` + +--- + +## ๐Ÿ› ๏ธ Tech Stack + +| Layer | Technology | +|-------|-----------| +| Language | Python 3.7.4 | +| ML Library | scikit-learn | +| Model Selection | `GridSearchCV`, `cross_val_score` | +| Models | `LinearRegression`, `Lasso`, `SVR`, `DecisionTreeRegressor`, `RandomForestRegressor`, `KNeighborsRegressor` | +| Data Processing | Pandas, NumPy | +| Visualization | Matplotlib | +| Notebook | Jupyter | + +--- + +
    + +Part of the [Machine Learning Projects](https://github.com/shsarv/Machine-Learning-Projects) collection by [Sarvesh Kumar Sharma](https://github.com/shsarv) + +โญ Star the main repo if this helped you! + +
    From 93b1a30e8951ef2833e860afc7a8c40a0344e1d3 Mon Sep 17 00:00:00 2001 From: shsarv4 <166940544+shsarv4@users.noreply.github.com> Date: Thu, 19 Mar 2026 00:10:25 +0530 Subject: [PATCH 7/8] Update README.md --- .../README.md | 303 +++++++++++++++++- 1 file changed, 302 insertions(+), 1 deletion(-) diff --git a/Heart Disease Prediction [END 2 END]/README.md b/Heart Disease Prediction [END 2 END]/README.md index bf59832..d82f9b5 100644 --- a/Heart Disease Prediction [END 2 END]/README.md +++ b/Heart Disease Prediction [END 2 END]/README.md @@ -1 +1,302 @@ -Look for Deployed Project At ![https://github.com/shsarv/Cardio-Monitor](https://github.com/shsarv/Cardio-Monitor) \ No newline at end of file +- Look for final Project At **![https://github.com/shsarv/Cardio-Monitor](https://github.com/shsarv/Cardio-Monitor)** + +
    + +# ๐Ÿซ€ Cardio Monitor โ€” Heart Disease Prediction Web App + +[![Python](https://img.shields.io/badge/Python-3.7+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/) +[![Flask](https://img.shields.io/badge/Flask-Web%20App-000000?style=for-the-badge&logo=flask&logoColor=white)](https://flask.palletsprojects.com/) +[![MongoDB](https://img.shields.io/badge/MongoDB-Database-47A248?style=for-the-badge&logo=mongodb&logoColor=white)](https://www.mongodb.com/) +[![scikit-learn](https://img.shields.io/badge/scikit--learn-F7931E?style=for-the-badge&logo=scikit-learn&logoColor=white)](https://scikit-learn.org/) +[![Accuracy](https://img.shields.io/badge/Accuracy-92%25-brightgreen?style=for-the-badge)]() +[![License](https://img.shields.io/badge/License-MIT-1abc9c?style=for-the-badge)](LICENSE) + +> **Cardio Monitor** is a full-stack web application that predicts whether a patient is at risk of developing **heart disease** using a machine learning model with **92% accuracy** โ€” built with Flask, MongoDB, and scikit-learn. Course project for **Big Data Analytics (BCSE0158)**. + +[![Stars](https://img.shields.io/github/stars/shsarv/Cardio-Monitor?style=social)](https://github.com/shsarv/Cardio-Monitor/stargazers) +[![Forks](https://img.shields.io/github/forks/shsarv/Cardio-Monitor?style=social)](https://github.com/shsarv/Cardio-Monitor/forks) + +[๐Ÿ”— Core ML Project](https://github.com/shsarv/Heart-Disease-Prediction)  ยท  [๐Ÿ› Report Bug](https://github.com/shsarv/Cardio-Monitor/issues)  ยท  [โœจ Request Feature](https://github.com/shsarv/Cardio-Monitor/issues) + +
    + +--- + +## โš ๏ธ Medical Disclaimer + +> **This application is for educational and research purposes only.** It does not constitute medical advice. Always consult a qualified cardiologist or medical professional for clinical decisions. + +--- + +## ๐Ÿ“Œ Table of Contents + +- [About the Project](#-about-the-project) +- [How It Works](#-how-it-works) +- [Dataset & Features](#-dataset--features) +- [Model & Performance](#-model--performance) +- [Architecture](#-architecture) +- [Project Structure](#-project-structure) +- [Getting Started](#-getting-started) +- [Future Roadmap](#-future-roadmap) +- [Tech Stack](#-tech-stack) +- [References](#-references) + +--- + +## ๐Ÿ”ฌ About the Project + +Heart disease is the leading cause of death globally. Early detection through continuous monitoring can significantly reduce mortality rates. **Cardio Monitor** combines: + +- A **machine learning classifier** (92% accuracy) trained on the Cleveland Heart Disease dataset +- A **Flask web app** for real-time patient input and prediction +- A **MongoDB** backend for storing patient records and prediction history +- A **visualization module** for EDA and model insights +- A roadmap toward **Apache Spark Streaming** for large-scale real-time data processing + +The core ML research and model building is documented in the companion repository: [shsarv/Heart-Disease-Prediction](https://github.com/shsarv/Heart-Disease-Prediction). + +--- + +## โš™๏ธ How It Works + +``` +Patient Inputs Clinical Data via Web Form + โ”‚ + โ–ผ + Flask (app.py) + routes request to + โ”‚ + โ–ผ + prediction.py + Loads Heart_model1.pkl + Runs model.predict() + โ”‚ + โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ” + โ–ผ โ–ผ + At Risk โค๏ธโ€๐Ÿฉน Not at Risk โœ… + โ”‚ + โ–ผ + Result displayed on web page + Record saved to MongoDB (database.py) +``` + +--- + +## ๐Ÿ“Š Dataset & Features + +| Property | Details | +|----------|---------| +| **File** | `heart.csv` | +| **Source** | Cleveland Heart Disease Dataset (UCI ML Repository) | +| **Samples** | 303 patient records | +| **Task** | Binary classification โ€” Heart Disease (1) / No Heart Disease (0) | + +### Input Features + +| Feature | Description | Range | +|---------|-------------|-------| +| `age` | Age of patient | Years | +| `sex` | Sex | 0 = Female, 1 = Male | +| `cp` | Chest pain type | 0โ€“3 | +| `trestbps` | Resting blood pressure | mm Hg | +| `chol` | Serum cholesterol | mg/dl | +| `fbs` | Fasting blood sugar > 120 mg/dl | 0 / 1 | +| `restecg` | Resting ECG results | 0โ€“2 | +| `thalach` | Maximum heart rate achieved | bpm | +| `exang` | Exercise induced angina | 0 / 1 | +| `oldpeak` | ST depression induced by exercise | Float | +| `slope` | Slope of peak exercise ST segment | 0โ€“2 | +| `ca` | Number of major vessels coloured by fluoroscopy | 0โ€“3 | +| `thal` | Thalassemia | 0โ€“3 | +| `target` โญ | **Heart disease present** | 0 / 1 | + +--- + +## ๐Ÿค– Model & Performance + +| Metric | Value | +|--------|:-----:| +| **Accuracy** | **92%** | +| **Saved Model** | `Heart_model1.pkl` / `heartmodel.pkl` | +| **Algorithm** | scikit-learn classifier (see core project) | +| **Library** | scikit-learn + mlxtend | + +> Two model files are present in the repo: `Heart_model1.pkl` (primary, used by `prediction.py`) and `heartmodel.pkl` (earlier iteration). Both are serialized with `pickle`. + +> For full model building details โ€” EDA, feature selection, algorithm comparison, and evaluation โ€” see the core project: [shsarv/Heart-Disease-Prediction](https://github.com/shsarv/Heart-Disease-Prediction). + +--- + +## ๐Ÿ—๏ธ Architecture + +``` +โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” +โ”‚ Flask Application โ”‚ +โ”‚ (app.py) โ”‚ +โ”‚ โ”‚ +โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ +โ”‚ โ”‚templates/โ”‚ โ”‚prediction โ”‚ โ”‚database โ”‚ โ”‚ +โ”‚ โ”‚ HTML โ”‚ โ”‚ .py โ”‚ โ”‚ .py โ”‚ โ”‚ +โ”‚ โ”‚ pages โ”‚ โ”‚ ML model โ”‚ โ”‚ MongoDB โ”‚ โ”‚ +โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ +โ”‚ โ”‚ +โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ +โ”‚ โ”‚ static/ โ”‚ โ”‚ +โ”‚ โ”‚ CSS ยท JS ยท images โ”‚ โ”‚ +โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ +โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ + โ”‚ โ”‚ + โ–ผ โ–ผ + Heart_model1.pkl MongoDB Atlas + (scikit-learn) (patient records + + predictions) +``` + +--- + +## ๐Ÿ“ Project Structure + +``` +Cardio-Monitor/ +โ”‚ +โ”œโ”€โ”€ ๐Ÿ“‚ heart disease prediction/ # Jupyter notebooks โ€” EDA & model training +โ”œโ”€โ”€ ๐Ÿ“‚ static/ # CSS, JS, images +โ”œโ”€โ”€ ๐Ÿ“‚ templates/ # Jinja2 HTML templates (input form, result pages) +โ”œโ”€โ”€ ๐Ÿ“‚ __pycache__/ +โ”‚ +โ”œโ”€โ”€ app.py # Flask entry point โ€” routes and app config +โ”œโ”€โ”€ prediction.py # Loads Heart_model1.pkl, runs inference +โ”œโ”€โ”€ modelbuild.py # Model training and serialization script +โ”œโ”€โ”€ database.py # MongoDB connection and CRUD operations +โ”œโ”€โ”€ visualization.py # EDA and data visualization utilities +โ”‚ +โ”œโ”€โ”€ Heart_model1.pkl # Primary trained model (pickle) +โ”œโ”€โ”€ heartmodel.pkl # Alternate model iteration (pickle) +โ”œโ”€โ”€ heart.csv # Cleveland Heart Disease dataset +โ”œโ”€โ”€ Input Data.png # Screenshot of the web app input form +โ”‚ +โ”œโ”€โ”€ Procfile # Heroku deployment config +โ”œโ”€โ”€ requirements.txt # Python dependencies +โ”œโ”€โ”€ .gitignore +โ””โ”€โ”€ README.md +``` + +--- + +## ๐Ÿš€ Getting Started + +### Prerequisites + +- Python 3.7+ +- MongoDB (local or [MongoDB Atlas](https://www.mongodb.com/cloud/atlas)) + +### 1. Clone the repository + +```bash +git clone https://github.com/shsarv/Cardio-Monitor.git +cd Cardio-Monitor +``` + +### 2. Set up environment + +```bash +python -m venv venv +source venv/bin/activate # Linux / macOS +venv\Scripts\activate # Windows + +pip install -r requirements.txt +``` + +### 3. Configure MongoDB + +In `database.py`, update your MongoDB connection string: + +```python +# Local MongoDB +client = pymongo.MongoClient("mongodb://localhost:27017/") + +# MongoDB Atlas (cloud) +client = pymongo.MongoClient("mongodb+srv://:@cluster.mongodb.net/") +``` + +### 4. Run the app + +```bash +python app.py +``` + +Navigate to โ†’ **http://127.0.0.1:5000** + +### 5. Deploy to Heroku + +```bash +heroku login +heroku create cardio-monitor-app +git push heroku main +heroku open +``` + +> The `Procfile` already contains: `web: gunicorn app:app` + +--- + +## ๐Ÿ—บ๏ธ Future Roadmap + +| Feature | Status | +|---------|:------:| +| Flask web app with MongoDB | โœ… Done | +| 92% accuracy ML model | โœ… Done | +| Heroku deployment | โœ… Done | +| **Apache Spark Streaming** โ€” real-time patient data ingestion | ๐Ÿ”œ Planned | +| **PySpark MLlib** โ€” large-scale distributed model training | ๐Ÿ”œ Planned | +| **Deep Learning model** (Keras/TensorFlow) | ๐Ÿ”œ Planned | +| Live demo deployment | ๐Ÿ”œ Planned | + +--- + +## ๐Ÿ› ๏ธ Tech Stack + +**Current:** + +| Layer | Technology | +|-------|-----------| +| Language | Python 3.7+ | +| Web Framework | Flask | +| ML Library | scikit-learn, mlxtend | +| Database | MongoDB (PyMongo) | +| Model Serialization | Pickle | +| Frontend | HTML5, CSS3, Bootstrap | +| Deployment | Heroku (Procfile + gunicorn) | +| Notebook | Jupyter | + +**Planned (Future):** + +| Layer | Technology | +|-------|-----------| +| Streaming | Apache Spark Streaming | +| Distributed ML | PySpark MLlib | +| Deep Learning | Keras / TensorFlow (DeepL) | +| Database (scale) | MongoDB Atlas | + +--- + +## ๐Ÿ“š References + +- [Cleveland Heart Disease Dataset โ€” UCI ML Repository](https://archive.ics.uci.edu/ml/datasets/Heart+Disease) +- [Core ML Project โ€” shsarv/Heart-Disease-Prediction](https://github.com/shsarv/Heart-Disease-Prediction) +- [Flask Documentation](https://flask.palletsprojects.com/) +- [PyMongo Documentation](https://pymongo.readthedocs.io/) +- [mlxtend Documentation](https://rasbt.github.io/mlxtend/) +- [Apache Spark Streaming](https://spark.apache.org/streaming/) + +--- + +
    + +**Created by [Sarvesh Kumar Sharma](https://github.com/shsarv)** + +Course Project โ€” Big Data Analytics (BCSE0158) + +โญ Star this repo if you found it helpful! + +
    From 23afd01662bce6d157958c873d9cff84a788ae93 Mon Sep 17 00:00:00 2001 From: shsarv4 <166940544+shsarv4@users.noreply.github.com> Date: Thu, 19 Mar 2026 00:17:03 +0530 Subject: [PATCH 8/8] Create README.md --- Human Activity Detection/README.md | 300 +++++++++++++++++++++++++++++ 1 file changed, 300 insertions(+) create mode 100644 Human Activity Detection/README.md diff --git a/Human Activity Detection/README.md b/Human Activity Detection/README.md new file mode 100644 index 0000000..985071e --- /dev/null +++ b/Human Activity Detection/README.md @@ -0,0 +1,300 @@ +
    + +# ๐Ÿƒ Human Activity Recognition โ€” 2D Pose + LSTM RNN + +[![Python](https://img.shields.io/badge/Python-3.7+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/) +[![TensorFlow](https://img.shields.io/badge/TensorFlow-1.x-FF6F00?style=for-the-badge&logo=tensorflow&logoColor=white)](https://www.tensorflow.org/) +[![LSTM](https://img.shields.io/badge/LSTM-2%20Stacked%20Layers-9B59B6?style=for-the-badge)]() +[![Accuracy](https://img.shields.io/badge/Accuracy->90%25-brightgreen?style=for-the-badge)]() +[![ngrok](https://img.shields.io/badge/Deployed-ngrok-1F8ACB?style=for-the-badge)]() +[![License](https://img.shields.io/badge/License-MIT-1abc9c?style=for-the-badge)](../LICENSE.md) + +> Classifies **6 human activities** from **2D pose time series** (OpenPose keypoints) using a **2-layer stacked LSTM RNN** built in TensorFlow 1.x โ€” achieving **>90% accuracy** in ~7 minutes of training. Deployed via ngrok with a Flask web app and `sample_video.mp4` demo. + +[๐Ÿ”™ Back to Main Repository](https://github.com/shsarv/Machine-Learning-Projects) + +
    + +--- + +## ๐Ÿ“Œ Table of Contents + +- [About the Project](#-about-the-project) +- [Key Idea โ€” Why 2D Pose?](#-key-idea--why-2d-pose) +- [Dataset](#-dataset) +- [LSTM Architecture](#-lstm-architecture) +- [Training Configuration](#-training-configuration) +- [Results & Findings](#-results--findings) +- [Project Structure](#-project-structure) +- [Getting Started](#-getting-started) +- [Tech Stack](#-tech-stack) +- [References](#-references) + +--- + +## ๐Ÿ”ฌ About the Project + +This experiment classifies human activities using **2D pose time series data** and a **stacked LSTM RNN**. Rather than feeding raw RGB images or expensive 3D pose data into the network, it uses **2D (x, y) keypoints** extracted from video frames via OpenPose โ€” a much lighter and more accessible input representation. + +The core research questions: + +- Can **2D pose** match **3D pose** accuracy for activity recognition? (removes need for RGBD cameras) +- Can **2D pose** match **raw RGB image** accuracy? (smaller input = smaller model = better with limited data) +- Does this approach generalize to **animal** behaviour classification for robotics applications? + +The network architecture is based on Guillaume Chevalier's *LSTMs for Human Activity Recognition (2016)*, with key modifications for large class-ordered datasets using **random batch sampling without replacement**. + +--- + +## ๐Ÿง  Key Idea โ€” Why 2D Pose? + +``` +Raw Video Frame (640ร—480 RGB) + โ”‚ + โ–ผ + OpenPose Inference + 18 body keypoints ร— (x, y) coords + โ”‚ + โ–ผ + 36-dimensional feature vector per frame + โ”‚ + โ–ผ (32 frames = 1 time window) + LSTM RNN โ†’ Activity Class +``` + +| Input Type | Pros | Cons | +|------------|------|------| +| Raw RGB images | High information | Large models, lots of data needed | +| 3D pose (RGBD) | Rich spatial info | Needs depth sensors | +| **2D pose (x,y)** โœ… | Lightweight, RGB-only camera, small model | Some spatial ambiguity | + +> Limiting the feature vector to 2D pose keypoints allows for a **smaller LSTM model** that generalises better on limited datasets โ€” particularly relevant for future animal behaviour recognition tasks. + +--- + +## ๐Ÿ“Š Dataset + +| Property | Details | +|----------|---------| +| **Source** | Berkeley Multimodal Human Action Database (MHAD) โ€” 2D poses extracted via OpenPose | +| **Download** | `RNN-HAR-2D-Pose-database.zip` (~19.2 MB, Google Drive) | +| **Subjects** | 12 | +| **Angles** | 4 camera angles | +| **Repetitions** | 5 per subject per action | +| **Total videos** | 1,438 (2 missing from original 1,440) | +| **Total frames** | 211,200 | +| **Training windows** | 22,625 (32 timesteps each, 50% overlap) | +| **Test windows** | 5,751 | +| **Input shape** | `(22625, 32, 36)` โ†’ windows ร— timesteps ร— features | +| **Preprocessing** | โŒ None โ€” raw, unnormalized pose coordinates | + +### Activity Classes (6) + +| Label | Activity | +|-------|----------| +| `JUMPING` | Vertical jumps | +| `JUMPING_JACKS` | Jumping jacks | +| `BOXING` | Boxing motions | +| `WAVING_2HANDS` | Waving with both hands | +| `WAVING_1HAND` | Waving with one hand | +| `CLAPPING_HANDS` | Clapping hands | + +### Data Files + +``` +RNN-HAR-2D-Pose-database/ +โ”œโ”€โ”€ X_train.txt # 22,625 training windows (36 comma-separated floats per row) +โ”œโ”€โ”€ X_test.txt # 5,751 test windows +โ”œโ”€โ”€ Y_train.txt # Training labels (0โ€“5) +โ””โ”€โ”€ Y_test.txt # Test labels (0โ€“5) +``` + +--- + +## ๐Ÿ—๏ธ LSTM Architecture + +``` +Input: (batch_size, 32 timesteps, 36 features) + โ”‚ + โ–ผ + Linear projection: 36 โ†’ 34 (ReLU) + โ”‚ + โ–ผ + โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” + โ”‚ BasicLSTMCell(34, forget_bias=1)โ”‚ โ† Layer 1 + โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค + โ”‚ BasicLSTMCell(34, forget_bias=1)โ”‚ โ† Layer 2 + โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ + tf.contrib.rnn.MultiRNNCell (stacked) + tf.contrib.rnn.static_rnn (many-to-one) + โ”‚ + Last output only + โ”‚ + โ–ผ + Linear: 34 โ†’ 6 + Softmax โ†’ Activity class +``` + +> **Why n_hidden = 34?** Testing across a range of hidden unit counts showed best generalisation when hidden units โ‰ˆ n_input (36). 34 was found to be optimal. + +> **Many-to-one classifier** โ€” only the last LSTM output (timestep 32) is used for classification, not the full sequence output. + +--- + +## โš™๏ธ Training Configuration + +| Parameter | Value | +|-----------|-------| +| Framework | TensorFlow 1.x (`%tensorflow_version 1.x`) | +| Timesteps (`n_steps`) | 32 | +| Input features (`n_input`) | 36 (18 keypoints ร— x, y) | +| Hidden units (`n_hidden`) | 34 | +| Classes (`n_classes`) | 6 | +| Epochs | 300 | +| Batch size | 512 | +| Optimizer | Adam | +| Initial learning rate | 0.005 | +| LR decay | Exponential โ€” `0.96` per 100,000 steps | +| Loss | Softmax cross-entropy + L2 regularization | +| L2 lambda | 0.0015 | +| Batch strategy | Random sampling **without replacement** (prevents class-order bias) | +| Training time | ~7 minutes (Google Colab) | + +**L2 regularization formula:** +```python +l2 = lambda_loss_amount * sum( + tf.nn.l2_loss(tf_var) for tf_var in tf.trainable_variables() +) +cost = tf.reduce_mean(softmax_cross_entropy) + l2 +``` + +**Decayed learning rate:** +```python +learning_rate = init_lr * decay_rate ^ (global_step / decay_steps) +# = 0.005 * 0.96 ^ (global_step / 100000) +``` + +--- + +## ๐Ÿ“ˆ Results & Findings + +| Metric | Value | +|--------|:-----:| +| **Final Accuracy** | **> 90%** | +| Training time | ~7 minutes | + +**Confusion pairs observed:** +- `CLAPPING_HANDS` โ†” `BOXING` โ€” similar upper-body motion pattern +- `JUMPING_JACKS` โ†” `WAVING_2HANDS` โ€” symmetric arm movements + +**Key conclusions:** +- 2D pose achieves >90% accuracy, validating its use over more expensive 3D pose or raw RGB inputs +- Hidden units โ‰ˆ n_input (34 โ‰ˆ 36) gives optimal generalisation +- Random batch sampling without replacement is **critical** โ€” ordered class batches degrade training significantly +- Approach is promising for future animal behaviour estimation with autonomous mobile robots + +--- + +## ๐Ÿ“ Project Structure + +``` +Human Activity Detection/ +โ”‚ +โ”œโ”€โ”€ ๐Ÿ“‚ images/ # Result plots and visualizations +โ”œโ”€โ”€ ๐Ÿ“‚ models/ # Saved LSTM model weights +โ”œโ”€โ”€ ๐Ÿ“‚ src/ # Helper source scripts +โ”œโ”€โ”€ ๐Ÿ“‚ templates/ # HTML templates (Flask app) +โ”‚ +โ”œโ”€โ”€ Human_Activity_Recogination.ipynb # Main notebook โ€” dataset, LSTM, training +โ”œโ”€โ”€ Human_Action_Classification_deployment_with_ngrok.ipynb # Flask + ngrok deployment notebook +โ”œโ”€โ”€ lstm_train.ipynb # Standalone LSTM training notebook +โ”œโ”€โ”€ app.py # Flask web application +โ”œโ”€โ”€ sample_video.mp4 # Sample video for live demo +โ””โ”€โ”€ requirements.txt # Python dependencies +``` + +--- + +## ๐Ÿš€ Getting Started + +### 1. Clone the repository + +```bash +git clone https://github.com/shsarv/Machine-Learning-Projects.git +cd "Machine-Learning-Projects/Human Activity Detection" +``` + +### 2. Set up environment + +```bash +python -m venv venv +source venv/bin/activate # Linux / macOS +venv\Scripts\activate # Windows + +pip install -r requirements.txt +``` + +> โš ๏ธ **TensorFlow 1.x required.** The LSTM uses `tf.contrib.rnn` and `tf.placeholder` APIs from TF1. +> ```bash +> pip install tensorflow==1.15.0 +> ``` + +### 3. Download the dataset + +The dataset is downloaded automatically in the notebook: +```python +!wget -O RNN-HAR-2D-Pose-database.zip \ + https://drive.google.com/u/1/uc?id=1IuZlyNjg6DMQE3iaO1Px6h1yLKgatynt +!unzip RNN-HAR-2D-Pose-database.zip +``` + +### 4. Run on Google Colab (recommended) + +``` +1. Open Human_Activity_Recogination.ipynb in Google Colab +2. Runtime โ†’ Change runtime type โ†’ GPU (optional, speeds training) +3. Run all cells โ€” training completes in ~7 minutes +``` + +### 5. Deploy with ngrok + +``` +Open Human_Action_Classification_deployment_with_ngrok.ipynb +Follow the ngrok setup cells to expose the Flask app publicly +``` + +--- + +## ๐Ÿ› ๏ธ Tech Stack + +| Layer | Technology | +|-------|-----------| +| Language | Python 3.7+ | +| Deep Learning | TensorFlow 1.x (`tf.contrib.rnn`) | +| Model | 2-layer stacked LSTM (`BasicLSTMCell`) | +| Pose Extraction | OpenPose (CMU Perceptual Computing Lab) | +| Data Processing | NumPy | +| Visualization | Matplotlib | +| Web Framework | Flask | +| Deployment | ngrok (tunnel) | +| Notebook | Jupyter / Google Colab | + +--- + +## ๐Ÿ“š References + +- Guillaume Chevalier (2016). *LSTMs for Human Activity Recognition.* [github.com/guillaume-chevalier](https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition) โ€” MIT License +- [Berkeley MHAD Dataset](http://tele-immersion.citris-uc.org/berkeley_mhad) +- [OpenPose โ€” CMU Perceptual Computing Lab](https://github.com/CMU-Perceptual-Computing-Lab/openpose) +- Goodfellow et al. *"It has been observed in practice that when using a larger batch there is a significant degradation in the quality of the model..."* โ€” basis for small batch strategy +- [Andrej Karpathy โ€” The Unreasonable Effectiveness of RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) โ€” referenced for many-to-one classifier design + +--- + +
    + +Part of the [Machine Learning Projects](https://github.com/shsarv/Machine-Learning-Projects) collection by [Sarvesh Kumar Sharma](https://github.com/shsarv) + +โญ Star the main repo if this helped you! + +