Comparative Analysis of Parallel Training Techniques for Deep Learning

This repository contains the source code for my Bachelor's thesis at the University of Piraeus:

"A Comparative Analysis of Data Parallelism and Model Parallelism for Deep Learning-Based Text Classification"
Department of Informatics, University of Piraeus, 2025
[Read thesis]

The project provides a modular framework to train, evaluate, and compare the performance of different deep learning models for text classification under three distinct training paradigms:

  1. Sequential Training: Standard training on a single GPU.
  2. Data-Parallel Training (DDP): Using PyTorch's DistributedDataParallel to train across multiple GPUs.
  3. Model-Parallel Training (MP): Splitting a single model's layers across multiple GPUs.

The primary goal is to analyze the trade-offs in terms of training speed, resource utilization, and model performance between these methods.

Framework Capabilities

  • Multiple Training Modes: Run experiments in sequential, ddp, or mp mode, or directly compare a parallel method against a sequential baseline (compare-ddp, compare-mp).
  • Diverse Model Architectures: Includes implementations for four common text classification models:
    • CNN (Convolutional Neural Network)
    • LSTM (Long Short-Term Memory)
    • GRU (Gated Recurrent Unit)
    • Transformer
  • Comprehensive Comparison: The compare-* modes provide a head-to-head analysis of speedup, throughput, and final accuracy.
  • Experiment Tracking: Automatically saves detailed logs, performance metrics (JSON), and visualizations (PNG plots) for each experiment in the experiments/ directory.
  • Configurable: Easily manage model hyperparameters and training settings through JSON configuration files.
  • Command-Line Interface: A CLI allows for easy configuration and execution of experiments.

Repository Layout

  • main.py - main script to run experiments.
  • parser.py - command-line argument parser for the CLI flags and options.
  • datasets.py - dataset classes (AG News loader and vocabulary handling).
  • core/ - core framework code: config management, trainers, metrics, logging, visualization and utilities.
  • models/ - PyTorch model implementations: cnn.py, lstm.py, gru.py, transformer.py.
  • configs/ - JSON configuration files for models and training.
  • experiments/ - output directory for all experiments.

Implemented Models

The following models are implemented for text classification on the AG News dataset:

  • CNNTextClassifier: A standard CNN with multiple filter sizes for feature extraction.
  • LSTMTextClassifier: A recurrent neural network using LSTM cells.
  • GRUTextClassifier: A recurrent neural network using GRU cells.
  • TransformerTextClassifier: An encoder-only Transformer model that uses a [CLS] token for classification.
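
For illustration, the sketch below shows how an encoder-only classifier with a learnable [CLS] token can be built in PyTorch. The class name, hyperparameters, and layer sizes are placeholder assumptions, not the exact code in models/transformer.py:

import torch
import torch.nn as nn

class TinyTransformerClassifier(nn.Module):
    # Illustrative encoder-only classifier: a learnable [CLS] token is prepended
    # to every sequence and its final hidden state feeds the classification head.
    def __init__(self, vocab_size, num_classes, embed_dim=128, num_heads=4, num_layers=2, max_len=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.pos = nn.Embedding(max_len + 1, embed_dim)              # +1 slot for [CLS]
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):                                    # token_ids: (batch, seq_len)
        batch_size, seq_len = token_ids.shape
        x = self.embed(token_ids)
        cls = self.cls_token.expand(batch_size, -1, -1)              # one [CLS] per sample
        x = torch.cat([cls, x], dim=1)
        x = x + self.pos(torch.arange(seq_len + 1, device=token_ids.device))
        x = self.encoder(x)
        return self.head(x[:, 0])                                    # classify from the [CLS] position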

Parallelism Strategies

Data Parallelism (DDP)

This approach uses torch.nn.parallel.DistributedDataParallel (DDP). The model is replicated on each available GPU, and the dataset is sharded across them. Each GPU processes a unique subset of the data in parallel. Gradients are then synchronized and averaged across all GPUs before the model weights are updated. This is effective for scaling training to large datasets.
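
The core pattern is sketched below in simplified form (this is the standard PyTorch recipe, not a copy of the repository's trainer): initialize a process group, wrap the model in DDP on each rank's GPU, and shard the data with a DistributedSampler.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def ddp_worker(model, dataset, epochs=1):
    # A launcher such as torchrun sets RANK/LOCAL_RANK/WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = model.to(local_rank)
    model = DDP(model, device_ids=[local_rank])            # replicate the model on this GPU

    sampler = DistributedSampler(dataset)                  # shard the dataset across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(epochs):
        sampler.set_epoch(epoch)                           # reshuffle shards each epoch
        for tokens, labels in loader:
            tokens, labels = tokens.to(local_rank), labels.to(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(tokens), labels)
            loss.backward()                                # gradients are all-reduced here
            optimizer.step()

    dist.destroy_process_group()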

Model Parallelism (MP)

This approach is useful when a model is too large to fit on a single GPU. The model's layers are strategically split and placed on different GPUs. The forward and backward passes involve transferring intermediate activations between devices. The specific splits for each model are:

  • CNN: Embedding layer on cuda:0; convolutional and fully-connected layers on cuda:1.
  • LSTM/GRU: Embedding layer on cuda:0; recurrent and fully-connected layers on cuda:1.
  • Transformer: The encoder layers are distributed as evenly as possible across all available GPUs. The embedding and classification head are placed on the first and last GPUs, respectively.
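
A minimal sketch of the two-GPU split described above for the recurrent models, with the embedding on cuda:0 and the remaining layers on cuda:1. The class and dimensions are illustrative assumptions, not the repository's implementations.

import torch
import torch.nn as nn

class TwoGPUTextClassifier(nn.Module):
    # Illustrative model-parallel split: embedding on cuda:0,
    # recurrent layer and classification head on cuda:1.
    def __init__(self, vocab_size, num_classes, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim).to("cuda:0")
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True).to("cuda:1")
        self.head = nn.Linear(hidden_dim, num_classes).to("cuda:1")

    def forward(self, token_ids):
        x = self.embed(token_ids.to("cuda:0"))
        x = x.to("cuda:1")                      # transfer intermediate activations between devices
        _, h = self.rnn(x)                      # h: (1, batch, hidden_dim)
        return self.head(h[-1])

# Note: labels must live on the device of the final output (cuda:1) so the loss
# can be computed there before backward() propagates gradients across both GPUs.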

Setup and Installation

1. Prerequisites

  • Python 3.8+
  • At least one NVIDIA GPU with CUDA support; the model-parallel mode requires two or more GPUs.

2. Clone the Repository

git clone https://github.com/DimGiagias/text-classification-dl-parallelism.git
cd text-classification-dl-parallelism

3. Install Dependencies

It is recommended to use a virtual environment.

pip install -r requirements.txt

4. Download the Dataset

This project uses the AG News Classification Dataset. Download it from HuggingFace or Kaggle and place it in the data/ directory as described in data/README.md.

The expected structure is:

data/
├── train.csv
└── test.csv

Running Experiments

The main.py script is the entry point for all experiments. Use the --mode and --model arguments to define the experiment type.

Basic Usage

python main.py --mode <MODE> --model <MODEL> [OPTIONS]
  • --mode: sequential, ddp, mp, compare-ddp, compare-mp
  • --model: cnn, lstm, gru, transformer

Examples:

1. Run Sequential Training

Train a GRU model on a single GPU.

python main.py --mode sequential --model gru --epochs 10 --batch_size 64

2. Run Data-Parallel (DDP) Training

Train a Transformer model using all available GPUs.

python main.py --mode ddp --model transformer --epochs 10

3. Run Model-Parallel (MP) Training

Train a CNN model split across two GPUs.

python main.py --mode mp --model cnn --epochs 10

4. Compare DDP vs. Sequential

Run a full comparison for the LSTM model, benchmarking DDP against sequential training.

python main.py --mode compare-ddp --model lstm --epochs 20

5. Compare Model-Parallel vs. Sequential

Run a full comparison for the Transformer model, benchmarking MP against sequential training.

python main.py --mode compare-mp --model transformer --epochs 20

Command-Line Options

You can override default configurations using CLI arguments:

  • --epochs <int>: Number of training epochs.
  • --batch_size <int>: Training batch size.
  • --lr <float>: Learning rate.
  • --max_len <int>: Maximum sequence length.
  • --plot-confusion-matrices: Enable saving confusion matrix plots for each validation epoch.
  • --training-config <path>: Path to a custom training config JSON file.
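
For orientation, here is a sketch of how these flags could be declared with argparse; the actual parser.py may differ in details such as defaults and help text.

import argparse

def build_parser():
    # Illustrative only; the real parser.py may define these flags differently.
    parser = argparse.ArgumentParser(description="Parallel text-classification experiments")
    parser.add_argument("--mode", required=True,
                        choices=["sequential", "ddp", "mp", "compare-ddp", "compare-mp"])
    parser.add_argument("--model", required=True,
                        choices=["cnn", "lstm", "gru", "transformer"])
    parser.add_argument("--epochs", type=int, help="Number of training epochs")
    parser.add_argument("--batch_size", type=int, help="Training batch size")
    parser.add_argument("--lr", type=float, help="Learning rate")
    parser.add_argument("--max_len", type=int, help="Maximum sequence length")
    parser.add_argument("--plot-confusion-matrices", action="store_true",
                        help="Save a confusion matrix plot after each validation epoch")
    parser.add_argument("--training-config", type=str,
                        help="Path to a custom training config JSON file")
    return parser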

Configuration

  • Global Training Config: configs/training_config.json defines default training parameters like epochs, learning rate, etc.
  • Model Hyperparameters: Each model has a corresponding JSON file in configs/models/ (e.g., cnn.json) where its architecture (e.g., embedding dimension, number of layers) can be configured.
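
As a rough sketch of how the two configuration layers and CLI overrides could be combined (the key names here are assumptions, not the repository's actual schema):

import json
from pathlib import Path

def load_config(model_name, args):
    # Illustrative merge order: global training config -> model config -> CLI overrides.
    config = json.loads(Path("configs/training_config.json").read_text())
    config.update(json.loads(Path(f"configs/models/{model_name}.json").read_text()))
    for key in ("epochs", "batch_size", "lr", "max_len"):
        value = getattr(args, key, None)
        if value is not None:                 # CLI flags take precedence when given
            config[key] = value
    return config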

Output and Results

All experiment artifacts are saved to a uniquely named directory inside experiments/. For example: experiments/20250927_062231_compare-mp_CNN/.

Each experiment directory contains:

  • Log Files: logs/ contains detailed logs from the training process. For parallel runs, logs are created for each process/rank.
  • Result Summaries (JSON): A JSON file (e.g., CNN_mp_vs_sequential__training_comparison.json) containing a complete record of all metrics, timings, and configurations.
  • Comparison Plots:
    • *_vs_sequential_comparison.png: A 2x2 plot comparing total time, accuracy curves, speedup, and epoch times.
    • *performance_summary.png: A table summarizing the key performance indicators.
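
Speedup in these comparisons is the ratio of sequential to parallel wall-clock time (speedup = T_sequential / T_parallel). Below is a short sketch for pulling it out of a comparison JSON; the field names are hypothetical, so check the generated file for the actual keys.

import json

def print_speedup(path):
    # Field names below are hypothetical; inspect the generated JSON for the real keys.
    with open(path) as f:
        results = json.load(f)
    seq_time = results["sequential"]["total_time_sec"]
    par_time = results["parallel"]["total_time_sec"]
    print(f"Speedup: {seq_time / par_time:.2f}x")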
