aayushmanz/Python-For-Data-Science

Python for Data Science

Welcome to my Python for Data Science learning repository.

This repository documents my hands-on progression from fundamental Python programming to data-science-ready code, using Jupyter Notebooks, practice sets, and modular mini-projects.



Repository Structure

Fundamentals of Python/

Core Python programming concepts:

  • Control flow (if-else statements, for loops, while loops, nested loops, loop control mechanisms)
  • Data structures (lists, tuples, sets, frozensets, dictionaries, strings)
  • String operations and practice exercises
  • Functions and arguments
  • Lambda functions, list comprehensions, and dictionary comprehensions
  • Sequence sum patterns
  • Modules and operators
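
As a rough illustration of the comprehension, lambda, and loop-control topics listed above, here is a minimal sketch. The data and variable names are made up for this example and are not taken from the notebooks.

```python
# Illustrative data; none of these names come from the repository's notebooks.
numbers = [1, 2, 3, 4, 5, 6]
words = ["pandas", "numpy", "matplotlib"]

# List comprehension with a filter: squares of the even numbers only.
squares_of_evens = [n * n for n in numbers if n % 2 == 0]

# Dictionary comprehension: map each word to its length.
word_lengths = {word: len(word) for word in words}

# Lambda used as a sort key: shortest word first.
shortest_first = sorted(words, key=lambda word: len(word))

# A frozenset is immutable and hashable, so it can serve as a dictionary key.
tags = {frozenset({"a", "b"}): "pair"}

# Loop control: skip odd numbers, stop at the first multiple of four.
first_multiple_of_four = None
for n in numbers:
    if n % 2:
        continue
    if n % 4 == 0:
        first_multiple_of_four = n
        break

print(squares_of_evens, word_lengths, shortest_first, first_multiple_of_four)
```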

Advance Python/

  • Decorators
  • Namespace and scope management
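
A minimal sketch of both topics, assuming a hypothetical `count_calls` decorator and `make_accumulator` closure invented for this example (neither name comes from the notebooks):

```python
import functools


def count_calls(func):
    """Hypothetical decorator: counts how many times func is called."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def add(a, b):
    return a + b


def make_accumulator():
    total = 0  # lives in the enclosing (non-global) scope

    def accumulate(value):
        nonlocal total  # rebind the enclosing variable instead of creating a local
        total += value
        return total

    return accumulate


accumulate = make_accumulator()
print(add(2, 3), add.calls)             # result and call count so far
print(accumulate(5), accumulate(10))    # running totals kept by the closure
```

Without `nonlocal`, the assignment inside `accumulate` would create a new local variable and raise `UnboundLocalError` on the first call.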

Exception Handling in Python/

  • Classification of Python errors
  • Implementation of try, except, else, and finally blocks
  • Creating and handling custom exceptions
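
The four blocks and a custom exception can be sketched together as follows. `InsufficientFundsError`, `withdraw`, and `safe_withdraw` are hypothetical names chosen for illustration, not code from the notebooks.

```python
class InsufficientFundsError(Exception):
    """Hypothetical custom exception carrying extra context."""

    def __init__(self, balance, amount):
        super().__init__(f"Cannot withdraw {amount}; balance is only {balance}")
        self.balance = balance
        self.amount = amount


def withdraw(balance, amount):
    if amount > balance:
        raise InsufficientFundsError(balance, amount)
    return balance - amount


def safe_withdraw(balance, amount):
    log = []
    try:
        new_balance = withdraw(balance, amount)
    except InsufficientFundsError as exc:
        # Runs only when the try block raised this exception.
        log.append(f"error: {exc}")
        new_balance = balance
    else:
        # Runs only when the try block raised nothing.
        log.append("success")
    finally:
        # Runs in both cases.
        log.append("done")
    return new_balance, log


print(safe_withdraw(100, 30))
print(safe_withdraw(100, 130))
```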

File Handling in Python/

  • Text file operations (read, write, append) and context management using the with statement
  • Binary file operations
  • Serialization and deserialization
  • Pickling and unpickling Python objects
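
A small sketch of the text-file and pickling operations above, written against a temporary directory so it leaves no files behind. The file names and the `record` dictionary are placeholders for this example.

```python
import os
import pickle
import tempfile

record = {"name": "sample", "scores": [91, 78, 85]}  # placeholder data

with tempfile.TemporaryDirectory() as tmp_dir:
    # Text file: write, then append, then read back. Each `with` closes the file.
    text_path = os.path.join(tmp_dir, "notes.txt")
    with open(text_path, "w", encoding="utf-8") as fh:
        fh.write("first line\n")
    with open(text_path, "a", encoding="utf-8") as fh:
        fh.write("second line\n")
    with open(text_path, "r", encoding="utf-8") as fh:
        lines = fh.read().splitlines()

    # Pickling: serialize the object to bytes in a binary file ("wb").
    pickle_path = os.path.join(tmp_dir, "record.pkl")
    with open(pickle_path, "wb") as fh:
        pickle.dump(record, fh)

    # Unpickling: rebuild an equal but separate object from the file ("rb").
    with open(pickle_path, "rb") as fh:
        restored = pickle.load(fh)

print(lines)
print(restored == record, restored is record)
```

Note that `pickle.load` can execute arbitrary code, so it should only be used on files from a trusted source.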

OOPS in Python/

Object-Oriented Programming principles:

  • Classes and objects (Part 1 & 2)
  • Reference variables and user-defined data types
  • Inheritance hierarchies
  • Encapsulation and data hiding
  • Abstraction
  • Polymorphism
  • Object aggregation and the super() function
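
One way to see several of these principles in a single place is the sketch below. The `Shape`, `Rectangle`, `Square`, and `Drawing` classes are hypothetical and exist only for this example.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Abstraction: defines the interface, cannot be instantiated directly."""

    def __init__(self, name):
        self._name = name  # encapsulation by convention (leading underscore)

    @property
    def name(self):
        return self._name

    @abstractmethod
    def area(self):
        ...


class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")  # delegate to the parent initializer
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    def __init__(self, side):
        super().__init__(side, side)
        self._name = "square"


class Drawing:
    """Aggregation: holds shapes that were created elsewhere."""

    def __init__(self, shapes):
        self.shapes = list(shapes)

    def total_area(self):
        # Polymorphism: each shape supplies its own area() implementation.
        return sum(shape.area() for shape in self.shapes)


drawing = Drawing([Rectangle(2, 3), Square(4)])
print(drawing.total_area())
```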

Python Fundamental Questions/

Practice notebooks dedicated to:

  • Fundamental logic and level-1 problem solving
  • List and dictionary manipulation exercises
  • List comprehension practice
  • Decorator practicals
  • OOP practice
  • Exception handling practicals

My Projects/

Functional mini-projects developed during the learning phase:

  • Standard Calculator and Calculator V2
  • ATM System simulation
  • Library Management project
  • DinosaursPedia
  • Google Account Creation & Login simulation

NumPy For Python/

Fundamental array computing:

  • Array creation and attributes
  • Basic indexing and slicing
  • Iteration and array reshaping
  • Array stacking and splitting
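
The topics above can be sketched on one small array. The array contents are arbitrary and chosen only to make the shapes easy to follow.

```python
import numpy as np

# Creation and reshaping: 12 consecutive integers arranged as 3 rows x 4 columns.
grid = np.arange(12).reshape(3, 4)

# Attributes describe the array without touching its data.
print(grid.shape, grid.ndim, grid.size)

# Basic indexing and slicing.
element = grid[1, 2]       # row 1, column 2
second_column = grid[:, 1]  # every row, column 1

# Iterating over a 2-D array yields one row at a time.
row_sums = [int(row.sum()) for row in grid]

# Stacking joins arrays; splitting is the reverse operation.
stacked_rows = np.vstack([grid, grid])  # rows appended below
stacked_cols = np.hstack([grid, grid])  # columns appended to the right
left, right = np.hsplit(grid, 2)        # two equal column blocks

print(element, second_column, row_sums)
print(stacked_rows.shape, stacked_cols.shape, left.shape, right.shape)
```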

NumPy For Practice/

Dedicated notebooks for reinforcing core NumPy concepts.

NumPy Advance/

In-depth exploration of advanced array operations and mathematical computing:

  • Advanced Indexing: Techniques for complex array selection and multi-dimensional slicing.
  • Array Broadcasting: Operational rules, implementation examples, and computational error resolution.
  • Handling Missing Values: Identification, filtering, and management of NaN/null data points within numerical arrays.
  • Plotting Graphs: Integrating array data with visualization operations.
  • Set Functions: Advanced operations including union, intersection, and unique value extraction on arrays.
  • Extra Methods (Part 1 & 2): Comprehensive coverage of specialized NumPy utility functions for extended statistical and mathematical operations.
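
The following sketch touches advanced indexing, broadcasting (including a typical shape error and one way to resolve it), NaN handling, and set functions. All arrays are made-up illustrations rather than data from the notebooks.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# Advanced indexing: a boolean mask, then paired row/column index lists.
evens = matrix[matrix % 2 == 0]
picked = matrix[[0, 1], [2, 0]]  # elements (0, 2) and (1, 0)

# Broadcasting: a length-3 row is stretched across both rows of the matrix.
shifted = matrix + np.array([10, 20, 30])

# A length-2 array does not match the trailing dimension (3), so this fails.
try:
    matrix + np.array([1, 2])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Reshaping to a (2, 1) column makes the shapes compatible.
col_shifted = matrix + np.array([1, 2]).reshape(2, 1)

# Missing values: detect NaN, filter it out, or use NaN-aware reductions.
readings = np.array([1.5, np.nan, 3.0, np.nan, 4.5])
clean = readings[~np.isnan(readings)]
mean_ignoring_nan = np.nanmean(readings)

# Set functions return sorted, de-duplicated results.
a = np.array([1, 2, 2, 3])
b = np.array([3, 4, 4, 5])
union = np.union1d(a, b)
common = np.intersect1d(a, b)
unique_a = np.unique(a)

print(shifted, col_shifted, clean, union, common, sep="\n")
```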

Pandas in Python/

Comprehensive coverage of the Pandas library for data manipulation and analysis, organized into sub-folders:

Series in Pandas/

  • Creation and structural understanding (Part 1 & 2)
  • Indexing and slicing
  • Math methods
  • Plotting with built-in plot methods
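
A brief sketch of Series creation, indexing, and math methods. The `runs` Series and its labels are invented for this example; plotting is left out so that the snippet needs only pandas.

```python
import pandas as pd

# Creation: values plus an explicit label index and a name.
runs = pd.Series([45, 12, 78, 33], index=["m1", "m2", "m3", "m4"], name="runs")

# Indexing and slicing: by label, by position, and by boolean condition.
by_label = runs["m3"]
by_position = runs.iloc[1:3]  # positions 1 and 2
above_forty = runs[runs > 40]

# Math methods.
total = runs.sum()
average = runs.mean()
best_match = runs.idxmax()  # label of the largest value

print(by_label, total, average, best_match)
print(by_position.index.tolist(), above_forty.index.tolist())
```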

DataFrames in Pandas/

  • Introduction, creation, and structural understanding
  • Filtering data and adding columns
  • Editing the index and using Python functionality within Pandas
  • Selecting columns, rows, and combined selections
  • Math methods and statistical operations
  • GroupBy: Aggregation, transformation, and group-level operations (Part 1 & 2), including hands-on exercises
  • Merging, joining, and concatenation of DataFrames
  • Important DataFrame methods (reference notebook)
  • DataFrames deep-dive (CampusX series)
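
To make the GroupBy and merging items concrete, here is a minimal sketch. The `students`, `courses`, and `scores` frames are small hypothetical tables built inline and do not reproduce the repository's CSV files.

```python
import pandas as pd

# Hypothetical tables, built inline for illustration.
students = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "name": ["Asha", "Ben", "Chen", "Dev"],
    "course_id": [10, 10, 20, 30],
})
courses = pd.DataFrame({
    "course_id": [10, 20],
    "course_name": ["Python", "Pandas"],
})

# Merge: a left join keeps every student; course 30 has no match, so it becomes NaN.
merged = students.merge(courses, on="course_id", how="left")

# Filtering rows and adding a derived column.
enrolled = merged[merged["course_name"].notna()].copy()
enrolled["label"] = enrolled["name"] + " (" + enrolled["course_name"] + ")"

# GroupBy aggregation: students per course (NaN keys are dropped by default).
counts = merged.groupby("course_name")["student_id"].count()

# GroupBy transform: returns a value per original row, aligned to the frame.
scores = pd.DataFrame({"team": ["A", "A", "B", "B"], "runs": [10, 30, 20, 40]})
summary = scores.groupby("team")["runs"].agg(["sum", "mean"])
scores["team_mean"] = scores.groupby("team")["runs"].transform("mean")

# Concatenation: stack frames with the same columns and renumber the index.
combined = pd.concat([students.iloc[:2], students.iloc[2:]], ignore_index=True)

print(merged, counts, summary, scores, sep="\n\n")
```

Aggregation collapses each group to one row, while `transform` keeps the original row count, which is why its result can be assigned straight back as a new column.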

Practice in Pandas/

  • Practice 1: YouTube channel analytics
  • Practice 2: Cities by GDP analysis
  • Practice 3: Student academic performance

Database from Kaggle/

A curated collection of real-world CSV datasets used across Pandas and NumPy notebooks:

| Dataset | Description |
| --- | --- |
| batsman_runs_ipl.csv | IPL batsman run statistics |
| bollywood.csv | Bollywood movie data |
| cities_by_gdp.csv | Cities ranked by GDP |
| deliveries.csv | IPL ball-by-ball delivery data |
| diabetes.csv | Diabetes patient health metrics |
| global_top2000.csv | Global top 2000 companies |
| imdb-top-1000.csv | IMDB top 1000 movies |
| ipl-matches.csv | IPL match results |
| kohli_ipl.csv | Virat Kohli IPL performance stats |
| movies.csv | General movies dataset |
| student_performance_finalscore.csv | Student academic performance and final scores |
| subs.csv | Subscriber data |

The datasets/ subfolder contains additional supporting files:

| File | Description |
| --- | --- |
| courses.csv / courses.xlsx | Course catalogue data |
| deliveries.csv | Supplementary delivery data |
| matches.csv | Supplementary match data |
| reg-month1.csv | Registration data for month 1 |
| reg-month2.csv / reg-month2.xlsx | Registration data for month 2 |
| students.csv / students.xlsx | Student registration data |

Tech Stack

| Tool | Purpose |
| --- | --- |
| Python 3 | Core programming language |
| Jupyter Notebook | Interactive development environment |
| NumPy | Numerical computing |
| Pandas | Data manipulation and analysis |
| Matplotlib | Data visualization |
| Git & GitHub | Version control and hosting |

Goal

To build a solid foundation in Python programming for data science, bridging the gap between syntax learned in theory and real-world data analysis projects.


Maintained by Ayush Suthar

About

Python concepts from basics to advanced for Machine Learning learners
