This repository contains the source code for my Bachelor's thesis at the University of Piraeus:
"A Comparative Analysis of Data Parallelism and Model Parallelism for Deep Learning-Based Text Classification"
Department of Informatics, University of Piraeus, 2025
[Read thesis]
The project provides a modular framework to train, evaluate, and compare the performance of different deep learning models for text classification under three distinct training paradigms:
- Sequential Training: Standard training on a single GPU.
- Data-Parallel Training (DDP): Using PyTorch's `DistributedDataParallel` to train across multiple GPUs.
- Model-Parallel Training (MP): Splitting a single model's layers across multiple GPUs.
The primary goal is to analyze the trade-offs in terms of training speed, resource utilization, and model performance between these methods.
- Multiple Training Modes: Run experiments in `sequential`, `ddp`, or `mp` mode, or directly compare the parallel methods against a sequential baseline (`compare-ddp`, `compare-mp`).
- Diverse Model Architectures: Includes implementations for four common text classification models: `CNN` (Convolutional Neural Network), `LSTM` (Long Short-Term Memory), `GRU` (Gated Recurrent Unit), and `Transformer`.
- Comprehensive Comparison: The `compare-*` modes provide a head-to-head analysis of speedup, throughput, and final accuracy.
- Experiment Tracking: Automatically saves detailed logs, performance metrics (JSON), and visualizations (PNG plots) for each experiment in the `experiments/` directory.
- Configurable: Easily manage model hyperparameters and training settings through JSON configuration files.
- Command-Line Interface: A CLI allows for easy configuration and execution of experiments.
- `main.py` - main script to run experiments.
- `parser.py` - command-line argument parser for the CLI flags and options.
- `datasets.py` - dataset classes (AG News loader and vocabulary handling).
- `core/` - core framework code: config management, trainers, metrics, logging, visualization, and utilities.
- `models/` - PyTorch model implementations: `cnn.py`, `lstm.py`, `gru.py`, `transformer.py`.
- `configs/` - JSON configuration files for models and training.
- `experiments/` - output directory for all experiments.
The following models are implemented for text classification on the AG News dataset:
- `CNNTextClassifier`: A standard CNN with multiple filter sizes for feature extraction.
- `LSTMTextClassifier`: A recurrent neural network using LSTM cells.
- `GRUTextClassifier`: A recurrent neural network using GRU cells.
- `TransformerTextClassifier`: An encoder-only Transformer model that uses a `[CLS]` token for classification.
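The sketch below illustrates the general shape of such a model, using the multi-filter-size CNN as an example. It is not the repository's exact implementation; the class name, hyperparameter names, and defaults (`embed_dim`, `num_filters`, `kernel_sizes`) are illustrative.

```python
import torch
import torch.nn as nn

class TinyCNNTextClassifier(nn.Module):
    """Minimal sketch of a multi-filter-size CNN text classifier.

    Illustrative only; the repository's CNNTextClassifier may differ in
    layer names, defaults, and details. AG News has four classes.
    """

    def __init__(self, vocab_size, embed_dim=128, num_filters=100,
                 kernel_sizes=(3, 4, 5), num_classes=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1D convolution per filter size, applied along the token dimension.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes]
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                        # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)    # (batch, embed_dim, seq_len)
        # Max-pool each convolution's output over time, then concatenate.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))         # (batch, num_classes)
```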
This approach uses `torch.nn.parallel.DistributedDataParallel` (DDP). The model is replicated on each available GPU, and the dataset is sharded across them. Each GPU processes a unique subset of the data in parallel. Gradients are then synchronized and averaged across all GPUs before the model weights are updated. This is effective for scaling training to large datasets.
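For readers unfamiliar with DDP, the sketch below shows how such a training loop is typically wired up with `torch.distributed`, a `DistributedSampler`, and the `DistributedDataParallel` wrapper. It is a minimal, illustrative example rather than the repository's actual trainer; the function name and hyperparameters are assumptions.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def ddp_train(rank, world_size, model, dataset, epochs=1):
    # Illustrative sketch, not the repo's trainer. Assumes one process per GPU
    # (e.g. launched via torchrun or torch.multiprocessing.spawn) and a dataset
    # that yields (token_ids, label) tensor pairs.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = DDP(model.to(rank), device_ids=[rank])        # replicate the model
    sampler = DistributedSampler(dataset)                 # shard the data per rank
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(epochs):
        sampler.set_epoch(epoch)                          # reshuffle shards each epoch
        for tokens, labels in loader:
            tokens, labels = tokens.to(rank), labels.to(rank)
            optimizer.zero_grad()
            loss = loss_fn(model(tokens), labels)
            loss.backward()                               # gradients all-reduced across GPUs
            optimizer.step()

    dist.destroy_process_group()
```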
This approach is useful when a model is too large to fit on a single GPU. The model's layers are strategically split and placed on different GPUs. The forward and backward passes involve transferring intermediate activations between devices. The specific splits for each model are:
- CNN: Embedding layer on `cuda:0`; convolutional and fully-connected layers on `cuda:1`.
- LSTM/GRU: Embedding layer on `cuda:0`; recurrent and fully-connected layers on `cuda:1`.
- Transformer: The encoder layers are distributed as evenly as possible across all available GPUs. The embedding layer and classification head are placed on the first and last GPUs, respectively.
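As a concrete illustration of the LSTM/GRU split above (embedding on `cuda:0`, recurrent and fully-connected layers on `cuda:1`), here is a minimal sketch; it is not the repository's exact module, and the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class TwoGPULSTMClassifier(nn.Module):
    """Illustrative model-parallel split (not the repo's exact implementation):
    embedding on cuda:0, LSTM and classifier on cuda:1."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_classes=4):
        super().__init__()
        self.dev0, self.dev1 = torch.device("cuda:0"), torch.device("cuda:1")
        self.embedding = nn.Embedding(vocab_size, embed_dim).to(self.dev0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True).to(self.dev1)
        self.fc = nn.Linear(hidden_dim, num_classes).to(self.dev1)

    def forward(self, token_ids):
        x = self.embedding(token_ids.to(self.dev0))
        # Intermediate activations are copied from GPU 0 to GPU 1 here.
        x = x.to(self.dev1)
        _, (h_n, _) = self.lstm(x)
        return self.fc(h_n[-1])   # final hidden state -> class logits
```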
- Python 3.8+
- At least one NVIDIA GPU with CUDA support; the model-parallel mode requires two or more GPUs.
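A quick way to check, before picking a mode, that CUDA is visible and how many GPUs are available:

```python
import torch

print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())   # mp mode needs at least 2
```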
Clone the repository:
```bash
git clone https://github.com/DimGiagias/text-classification-dl-parallelism.git
cd text-classification-dl-parallelism
```
It is recommended to use a virtual environment.
```bash
pip install -r requirements.txt
```
This project uses the AG News Classification Dataset. Please download it from HuggingFace or Kaggle and place it in the `data/` directory as instructed in `data/README.md`.
The expected structure is:
```
data/
├── train.csv
└── test.csv
```
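As an optional sanity check (independent of the framework's own loader in `datasets.py`), you can confirm the files are in place and inspect their column headers:

```python
import csv
from pathlib import Path

# Illustrative check only; the actual parsing is handled by datasets.py.
for name in ("train.csv", "test.csv"):
    path = Path("data") / name
    assert path.exists(), f"Missing {path}; see data/README.md"
    with path.open(newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    print(name, "columns:", header)
```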
The `main.py` script is the entry point for all experiments. Use the `--mode` and `--model` arguments to define the experiment type.
```bash
python main.py --mode <MODE> --model <MODEL> [OPTIONS]
```
- `--mode`: `sequential`, `ddp`, `mp`, `compare-ddp`, `compare-mp`
- `--model`: `cnn`, `lstm`, `gru`, `transformer`
Train a GRU model on a single GPU:
```bash
python main.py --mode sequential --model gru --epochs 10 --batch_size 64
```
Train a Transformer model using all available GPUs:
```bash
python main.py --mode ddp --model transformer --epochs 10
```
Train a CNN model split across two GPUs:
```bash
python main.py --mode mp --model cnn --epochs 10
```
Run a full comparison for the LSTM model, benchmarking DDP against sequential training:
```bash
python main.py --mode compare-ddp --model lstm --epochs 20
```
Run a full comparison for the Transformer model, benchmarking MP against sequential training:
```bash
python main.py --mode compare-mp --model transformer --epochs 20
```
You can override default configurations using CLI arguments:
- `--epochs <int>`: Number of training epochs.
- `--batch_size <int>`: Training batch size.
- `--lr <float>`: Learning rate.
- `--max_len <int>`: Maximum sequence length.
- `--plot-confusion-matrices`: Enable saving confusion matrix plots for each validation epoch.
- `--training-config <path>`: Path to a custom training config JSON file.
- Global Training Config: `configs/training_config.json` defines default training parameters such as epochs and learning rate.
- Model Hyperparameters: Each model has a corresponding JSON file in `configs/models/` (e.g., `cnn.json`) where its architecture (e.g., embedding dimension, number of layers) can be configured.
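As a rough sketch of how such JSON files might be read and combined with a CLI override (the repository's `core/` config management handles this internally and may differ in keys and merge logic):

```python
import json
from pathlib import Path

# Illustrative only: actual keys and precedence are defined by the repository.
training_cfg = json.loads(Path("configs/training_config.json").read_text())
model_cfg = json.loads(Path("configs/models/cnn.json").read_text())

config = {**training_cfg, **model_cfg}   # model settings layered over training defaults
config["epochs"] = 20                    # a flag like --epochs 20 would override this
print(json.dumps(config, indent=2))
```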
All experiment artifacts are saved to a uniquely named directory inside `experiments/`, for example `experiments/20250927_062231_compare-mp_CNN/`.
Each experiment directory contains:
- Log Files: `logs/` contains detailed logs from the training process. For parallel runs, logs are created for each process/rank.
- Result Summaries (JSON): A JSON file (e.g., `CNN_mp_vs_sequential__training_comparison.json`) containing a complete record of all metrics, timings, and configurations.
- Comparison Plots:
  - `*_vs_sequential_comparison.png`: A 2x2 plot comparing total time, accuracy curves, speedup, and epoch times.
  - `*performance_summary.png`: A table summarizing the key performance indicators.
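Because the exact fields in the results JSON depend on the experiment, a simple way to inspect a summary is to load it and list its top-level keys (the path below reuses the example directory from above; substitute your own run):

```python
import json
from pathlib import Path

# Substitute the directory and file name produced by your own run.
summary_path = (Path("experiments") / "20250927_062231_compare-mp_CNN"
                / "CNN_mp_vs_sequential__training_comparison.json")

results = json.loads(summary_path.read_text())
print("Top-level fields:", list(results))
```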