Welcome to the Big Data Analytics lab repository. This repo contains weekly materials including tutorials, exercises, quizzes, homework, and reference solutions.
You can use this repository in two ways:
- Clone (recommended for most students)
  Use this if you only want to access materials and work locally.
- Fork (recommended for reorganizing work)
  Create your own copy of the repository on GitHub. This allows you to:
  - track your progress
  - commit your solutions
  - share your work easily

  You can also pull updates from the original repository (upstream) when new materials are released.
- Session 1 README
  Setup, Python fundamentals, loops/indexing, and CSV basics.
- Session 2 README
  csv.DictReader, key-based data access, and practical data cleaning flow.
- Session 3 README
  Iterators/generators, streaming vs loading, complexity, and intro RAG concepts.
- Session 4 README
  Serial vs multiprocessing basics, process management, and Pool-based parallel image processing.
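
As a rough preview of the Session 2 and Session 3 topics listed above, the sketch below combines `csv.DictReader` with a generator so that rows are cleaned and yielded one at a time rather than loaded all at once. It is not taken from the course materials: the sample data, the column names, and the helper names `stream_rows` and `load_rows` are assumptions made up for illustration.

```python
import csv
import io

# Hypothetical sample data; the course's real datasets are not shown here.
SAMPLE_CSV = """name,city,score
Ada,London,91
Linus,Helsinki,
Grace,New York,88
"""


def stream_rows(handle):
    """Yield one cleaned row at a time instead of loading the whole file."""
    for row in csv.DictReader(handle):
        # Key-based access: columns are looked up by header name, not position.
        if not row["score"]:
            continue  # simple cleaning rule (assumed): skip rows with no score
        yield {"name": row["name"].strip(), "score": int(row["score"])}


def load_rows(handle):
    """Materialize every cleaned row in a list (the 'loading' approach)."""
    return list(stream_rows(handle))


# io.StringIO stands in for an open file so the example is self-contained.
rows = stream_rows(io.StringIO(SAMPLE_CSV))
first = next(rows)  # only the first data row has been parsed at this point
total = first["score"] + sum(r["score"] for r in rows)
```

With a real file you would pass `open(path, newline="")` instead of `io.StringIO(...)`. The streaming version keeps memory use roughly constant, while `load_rows` is convenient when the data is small enough to hold in a list.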
- Complete all tutorials, exercises, quizzes, and homework each week
- Use one public GitHub repository (e.g. bda-homeworks) for your work
- Keep your repository updated regularly
- Discussion forum: Microsoft Teams
- Submission: Share your repository link in the Teams forum with the instructor and class