A collection of 55 parallel computing implementations exploring GPU programming with CUDA, shared-memory parallelism with OpenMP, and distributed computing with MPI.
This repository contains practical implementations of parallel algorithms across three dominant paradigms: CUDA for NVIDIA GPU acceleration, OpenMP for multi-core CPU parallelism, and MPI for distributed systems.
- assignments: Coursework implementations of parallel algorithms
- labs: Experimental implementations and exercises
- project: Major parallel computing project
- utils: Shared utility functions and helpers
- CUDA: GPU kernel development, memory management, thread synchronization
- OpenMP: Directive-based parallelization, loop scheduling, reduction operations
- MPI: Point-to-point and collective communication, distributed algorithms
GPU kernels leverage massive parallelism for data-parallel workloads. Includes memory coalescing optimization, shared memory usage, and kernel launch configuration.
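As a minimal sketch of the kind of kernel these implementations cover (names and sizes here are illustrative, not taken from the repository): a vector-add kernel where consecutive threads access consecutive elements, so each warp's global-memory loads coalesce, plus the usual grid/block launch configuration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: coalesced vector add. Thread i handles element i,
// so a warp's 32 loads fall in contiguous memory and coalesce.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Unified memory keeps the sketch short; real code may prefer
    // explicit cudaMalloc/cudaMemcpy for finer control.
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch configuration: round up so every element is covered.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Compile with `nvcc -O3 vec_add.cu -o vec_add` as in the build commands below.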
Multi-core CPU acceleration using pragma directives. Covers parallel for loops, sections, tasks, and synchronization constructs.
Distributed-memory implementations using message passing. Includes both blocking and non-blocking communication patterns.
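A minimal two-rank sketch contrasting the two styles (the payload and tag are illustrative): rank 0 uses a blocking `MPI_Send`, while rank 1 posts a non-blocking `MPI_Irecv`, could overlap computation, and then waits.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = rank * 100, received = -1;

    if (rank == 0) {
        /* Blocking: returns once the send buffer is safe to reuse. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Non-blocking: post the receive, overlap work, then wait. */
        MPI_Request req;
        MPI_Irecv(&received, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        /* ...independent computation could overlap the transfer here... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", received);
    }

    MPI_Finalize();
    return 0;
}
```

Build and run with `mpicc` and `mpirun -np 2` as shown in the build commands below.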
# CUDA implementations
nvcc -O3 cuda_program.cu -o cuda_program
# OpenMP implementations
gcc -fopenmp -O3 openmp_program.c -o openmp_program
# MPI implementations
mpicc -O3 mpi_program.c -o mpi_program
mpirun -np 4 ./mpi_program
- Add performance benchmarking suite
- Document speedup achievements for each implementation
- Add comparison charts between paradigms
- Include visualization of parallel execution
- Add regression tests for correctness