🏠 Dashboard June-08-2026:Morning 9:00AM - 10:00AM Pandas Module introduction
Pandas Module introduction
⏱ 1.1h 📋 Video 9 of 11
Topics Covered
Pandas Introduction, Data Analysis, Cleaning, Filtering, Data Model, Joining,
📝 Class Notes
📚 Day 1 Class Notes

What is pandas?

Your complete beginner's guide — explained from zero, in plain English.

🐍 Python Library 📊 Data Analysis 🧹 Data Cleaning 🔗 Merging & Joining 📁 CSV · Excel · JSON · SQL · Parquet

What is pandas?

 

pandas is a free, open-source Python library created specifically for working with structured data: data that looks like a table with rows and columns, just like an Excel spreadsheet or a database table. The name comes from "Panel Data", a term used in statistics and economics. Wes McKinney started building it in 2008, and it is now one of the most widely used data tools in Python.

💡
Real-world analogy for freshers:
Think of pandas like a super-powered Excel that you control with Python code. In Excel, you click buttons to sort, filter, and calculate. In pandas, you write a few lines of Python and the same work happens — but on millions of rows, in seconds, automatically. And unlike Excel, you can save those steps as a script and reuse them forever.
▶ This is a pandas DataFrame — a table in Python memory
   ID     Name  Department       City  Salary   Join Date
0   1    Alice          HR  Bangalore   60000  2023-01-15
1   2      Bob          IT  Hyderabad   75000  2022-07-23
2   3  Charlie     Finance     Mumbai   62000  2021-11-10
3   4    David          IT       Pune   82000  2023-03-05
4   5      Eve       Sales  Bangalore   55000  2022-01-20
5   6    Frank     Finance      Delhi    NULL  2021-06-18
6   7    Grace          HR  Hyderabad   65000  2023-02-28
7   8    Henry       Sales     Mumbai   58000  2022-09-12
📌 Key Terms: The rows are called records / observations. The columns are called features / fields / attributes. The left-side numbers (0,1,2…) are the index, which is pandas' way of labelling each row. The NULL cell in Frank's Salary is a missing value (pandas displays it as NaN). Missing values are one of the most common real-world data problems pandas helps you solve.
Terminal — Install pandas
pip install pandas
 
 
 
hello_pandas.py
# Step 1 — import the library (always the first line)
import pandas as pd       # "pd" is a universal shortcut everyone uses

# Step 2 — create your first DataFrame from a Python list of dicts
data = [
    {"Name": "Alice",   "Department": "HR",      "Salary": 60000},
    {"Name": "Bob",     "Department": "IT",      "Salary": 75000},
    {"Name": "Charlie", "Department": "Finance", "Salary": 62000},
]

df = pd.DataFrame(data)   # convert list → pandas table

print(df)   # action — shows the table
▶ Output
      Name Department  Salary
0    Alice         HR   60000
1      Bob         IT   75000
2  Charlie    Finance   62000
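The key terms above (columns, index, missing value) can be inspected directly on a DataFrame. A minimal sketch, assuming three rows borrowed from the sample table, with None standing in for Frank's missing salary:

```python
import pandas as pd

# Three rows taken from the sample table; None marks the missing salary.
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Frank"],
        "Department": ["HR", "IT", "Finance"],
        "Salary": [60000, 75000, None],
    }
)

print(list(df.columns))      # the columns (features / fields)
print(list(df.index))        # the index labels on the left side
print(len(df))               # number of rows (records)
print(df["Salary"].isna())   # True where the value is missing
```

Because one salary is missing, pandas stores the Salary column as decimal numbers (float64) and shows the gap as NaN.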

Why should you learn pandas?

 

Before learning how to use pandas, you need to understand why it matters. Every company — from startups to MNCs — generates data every single day. Someone has to clean it, analyse it, and turn it into useful information. That person is you, and pandas is your most important tool.

REASON 01
📊 Data is Everywhere

Every app, website, and business creates data — orders, clicks, transactions, logs. pandas lets you make sense of all of it.

REASON 02
💼 High Demand Jobs

Data Engineer, Data Analyst, Data Scientist: all of these roles use pandas regularly. It appears in a large share of data job descriptions in India.

REASON 03
⚡ 10× Faster than Excel

An Excel worksheet is capped at 1,048,576 rows and slows down well before that. pandas can usually process 10 million rows in seconds, as long as the data fits in memory. One script replaces hours of manual Excel work.

REASON 04
🔗 Gateway to Big Tech

pandas skills carry over directly to PySpark, Machine Learning (scikit-learn), and Data Visualisation (Matplotlib, Seaborn), all of which work with DataFrame-style data. Learn pandas first, and everything else becomes easier.

REASON 05
🧹 Real Data is Messy

In real projects, a large share of the time (often quoted as around 70%) is spent cleaning data: fixing nulls, removing duplicates, correcting formats. pandas is built for exactly this.

REASON 06
🌐 Reads Any File

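CSV, Excel, JSON, SQL databases, Parquet: one line of pandas code reads each of these formats. Some formats need a small helper package installed alongside pandas (for example openpyxl for Excel files and pyarrow for Parquet).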
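To make the "one line per format" idea concrete, here is a small sketch that reads CSV and JSON without needing any files on disk. The text is held in memory with io.StringIO, and the sample rows and variable names are made up for illustration:

```python
import io
import pandas as pd

# In-memory stand-ins for real files (hypothetical sample content)
csv_text = "Name,Department,Salary\nAlice,HR,60000\nBob,IT,75000\n"
json_text = '[{"Name": "Alice", "Salary": 60000}, {"Name": "Bob", "Salary": 75000}]'

from_csv = pd.read_csv(io.StringIO(csv_text))     # one line to read CSV
from_json = pd.read_json(io.StringIO(json_text))  # one line to read JSON

print(from_csv)
print(from_json)

# Writing back: with no file path, to_csv returns the CSV text as a string
csv_out = from_json.to_csv(index=False)
print(csv_out)
```

With real files you would pass a path such as "employees.csv" instead of the StringIO object.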

🎯 Remember this: In your first job interview, if you say "I know pandas and I can clean, analyse and transform data with it" — that immediately makes you stand out from freshers who only know theory.

Who uses pandas in their daily work?

 
Data Engineers
 
Data Analysts
 
Data Scientists
 
BI Developers
 
ML Engineers
 
Database Developers
 
Backend Developers

14 Things pandas can do — explained with code

 

These are the capabilities introduced in today's class. These notes cover Concepts 01 to 06, 08, and 11 to 14. For each one, you will see what it means, how to write it, and what output it gives.

📊
CONCEPT 01
Data Analysis — Explore & Understand Patterns

Data Analysis means looking at your data to discover what it is telling you. Before writing any code that changes data, you first explore it — how many rows? What columns? Any missing values? What is the average salary?

 
 
 
01_data_analysis.py
import pandas as pd

df = pd.read_csv("employees.csv")

print(df.shape)       # → (8, 6) — 8 rows, 6 columns
df.info()             # prints column names, types, non-null counts (it returns None, so no print() needed)
print(df.head(3))    # first 3 rows
print(df.describe()) # count, mean, min, max of numeric cols
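The file employees.csv is not included with these notes. As a stand-in, the sample table shown earlier can be typed in directly. This is only a sketch: the variable name employees is an assumption, and None is used for Frank's missing salary.

```python
import pandas as pd

# The 8-row sample table from these notes, built in memory instead of read from a file
employees = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5, 6, 7, 8],
        "Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Henry"],
        "Department": ["HR", "IT", "Finance", "IT", "Sales", "Finance", "HR", "Sales"],
        "City": ["Bangalore", "Hyderabad", "Mumbai", "Pune",
                 "Bangalore", "Delhi", "Hyderabad", "Mumbai"],
        "Salary": [60000, 75000, 62000, 82000, 55000, None, 65000, 58000],
        "Join Date": ["2023-01-15", "2022-07-23", "2021-11-10", "2023-03-05",
                      "2022-01-20", "2021-06-18", "2023-02-28", "2022-09-12"],
    }
)

print(employees.shape)             # (rows, columns)
print(employees.isna().sum())      # missing values per column
print(employees["Salary"].mean())  # average salary, skipping the missing one
```

Note that mean() ignores missing values by default, so the average is taken over the seven known salaries.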
🔧
CONCEPT 02
Data Transformation — Create & Modify Columns

Transformation means reshaping or creating new data from existing data. This is how you add business logic — calculate tax, classify employees, extract year from a date, etc.

 
 
 
02_transformation.py
# Add a new "Tax" column
df["Tax"] = df["Salary"] * 0.10

# Add "Level" column using a condition
df["Level"] = df["Salary"].apply(lambda x: "Senior" if x > 70000 else "Junior")

# Convert Join Date from text to real date
df["Join Date"] = pd.to_datetime(df["Join Date"])
df["Join Year"] = df["Join Date"].dt.year

print(df[["Name", "Salary", "Tax", "Level", "Join Year"]])
▶ Output (first 4 rows)
      Name  Salary     Tax   Level  Join Year
0    Alice   60000  6000.0  Junior       2023
1      Bob   75000  7500.0  Senior       2022
2  Charlie   62000  6200.0  Junior       2021
3    David   82000  8200.0  Senior       2023
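The "Level" column can also be built without apply. A possible alternative, assuming NumPy is available (it is installed together with pandas), is numpy.where, which checks the whole column at once. The four rows are taken from the output above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David"],
        "Salary": [60000, 75000, 62000, 82000],
    }
)

# Whole-column arithmetic: no loop needed
df["Tax"] = df["Salary"] * 0.10

# np.where(condition, value_if_true, value_if_false) works on the full column
df["Level"] = np.where(df["Salary"] > 70000, "Senior", "Junior")

print(df)
```

On small tables both styles behave the same; the column-at-once style tends to be faster on large ones.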
🧹
CONCEPT 03
Data Cleaning — Handle Nulls & Duplicates

In real-world data, problems are always present — missing values, duplicate entries, wrong formats. Data Cleaning fixes these issues before analysis so your results are accurate.

 
 
 
03_cleaning.py
# 1. Find missing values
print(df.isnull().sum())
# Salary    1  ← Frank has NULL salary

# 2. Fill NULL salary with the average of the same department
dept_avg = df.groupby("Department")["Salary"].transform("mean")
df["Salary"] = df["Salary"].fillna(dept_avg)

# 3. Remove duplicate rows
df = df.drop_duplicates()

# 4. Fix inconsistent text (strip stray spaces)
#    .str.title() is avoided here: it would turn "HR" into "Hr" and "IT" into "It"
df["Department"] = df["Department"].str.strip()

print("Nulls remaining:", df.isnull().sum().sum())
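A missing salary can be filled with either the overall average or the average of the employee's own department, and the two give different numbers. A minimal sketch comparing both on a made-up five-row table (hypothetical data loosely based on the sample, with one duplicate row added on purpose):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Charlie", "David", "Frank", "Bob", "Bob"],
        "Department": ["Finance", "IT", "Finance", "IT", "IT"],
        "Salary": [62000, 82000, None, 75000, 75000],
    }
)

# Remove the repeated Bob row first so it does not distort the averages
df = df.drop_duplicates().reset_index(drop=True)

# Option A: fill with the average of all known salaries
overall_fill = df["Salary"].fillna(df["Salary"].mean())

# Option B: fill with the average of the same department
dept_mean = df.groupby("Department")["Salary"].transform("mean")
dept_fill = df["Salary"].fillna(dept_mean)

print(overall_fill)
print(dept_fill)
```

Which option is right depends on the business question; the department average usually gives a more realistic guess when salaries differ a lot between departments.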
🔍
CONCEPT 04 & 05
Data Filtering & Aggregation

Filtering = picking rows that match a condition (like SQL WHERE). Aggregation = summarising many rows into one number — total, average, count, max.

 
 
 
04_05_filter_agg.py
# FILTERING — get IT department employees with salary > 70000
it_high = df[(df["Department"] == "IT") & (df["Salary"] > 70000)]
print(it_high[["Name", "Salary"]])

# AGGREGATION — average salary per department
dept_stats = df.groupby("Department")["Salary"].agg(
    avg_salary = "mean",
    max_salary = "max",
    headcount   = "count"
).reset_index()
print(dept_stats)
▶ Output — dept_stats
  Department  avg_salary  max_salary  headcount
0    Finance       62000       62000          2
1         HR       62500       65000          2
2         IT       78500       82000          2
3      Sales       56500       58000          2
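Filtering can be written in more than one way. The sketch below, on a small made-up table, shows the bracket style used above next to DataFrame.query, plus isin for matching several values. The brackets around each condition matter because & is evaluated before == and >.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "David", "Eve"],
        "Department": ["HR", "IT", "IT", "Sales"],
        "Salary": [60000, 75000, 82000, 55000],
    }
)

# Style 1: boolean conditions joined with & (each condition in its own brackets)
mask_style = df[(df["Department"] == "IT") & (df["Salary"] > 70000)]

# Style 2: the same filter written as a query string, closer to SQL WHERE
query_style = df.query("Department == 'IT' and Salary > 70000")

print(mask_style.equals(query_style))  # both styles select the same rows

# Match any of several values, like SQL IN
hr_or_sales = df[df["Department"].isin(["HR", "Sales"])]
print(hr_or_sales)
```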
🗂️
CONCEPT 06 & 08
Grouping & Sorting

Grouping = split data into categories and summarise each group (like SQL GROUP BY). Sorting = ordering rows so the highest, lowest, or alphabetical item appears first.

 
 
 
06_08_group_sort.py
# GROUPING — how many employees per city?
city_count = df.groupby("City")["Name"].count()
print(city_count)

# SORTING — top 3 highest paid employees
top3 = df.sort_values("Salary", ascending=False).head(3)
print(top3[["Name", "Department", "Salary"]])
▶ Output — top 3 highest paid
    Name Department  Salary
3  David         IT   82000
1    Bob         IT   75000
6  Grace         HR   65000
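Two shortcuts are worth knowing alongside groupby and sort_values. This sketch uses five rows from the sample table and shows nlargest for "top N" questions, sorting by two columns, and value_counts for counting per category:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
        "City": ["Bangalore", "Hyderabad", "Mumbai", "Pune", "Bangalore"],
        "Department": ["HR", "IT", "Finance", "IT", "Sales"],
        "Salary": [60000, 75000, 62000, 82000, 55000],
    }
)

# Top 3 by salary in one call (same idea as sort_values(...).head(3))
top3 = df.nlargest(3, "Salary")
print(top3[["Name", "Salary"]])

# Sort by department A to Z, then by salary high to low inside each department
by_dept = df.sort_values(["Department", "Salary"], ascending=[True, False])
print(by_dept[["Department", "Name", "Salary"]])

# Count rows per city (similar to groupby("City").count() on one column)
city_count = df["City"].value_counts()
print(city_count)
```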
📁
CONCEPT 11 · 12 · 13 · 14
File Ops · ETL · Time Series · ML Data Prep

These are the professional-level applications of pandas you will use in real jobs:

📁 File Operations — read CSV / Excel / JSON / Parquet / SQL in one line, write results back.

🔄 ETL Development — Extract data from a source, Transform it (clean, join, calculate), Load it to a destination. This is what Data Engineers do every day.

📅 Time Series — analyse trends over time, resample from daily to monthly, compute rolling averages.

🤖 ML Data Preparation — before training any AI/ML model, you must prepare features using pandas — encode categories, scale numbers, split train/test data.

 
 
 
11_14_advanced.py
# ── File Operations ──────────────────────────────────────────
df = pd.read_csv("raw_sales.csv")             # Extract
df.to_excel("clean_sales.xlsx", index=False)  # Save as Excel (needs a helper package such as openpyxl)
df.to_parquet("sales.parquet")                # Save as Parquet (needs pyarrow or fastparquet; common on AWS/cloud)

# ── ETL Pipeline example ──────────────────────────────────────
raw   = pd.read_csv("orders.csv")                      # Extract
clean = raw.dropna().drop_duplicates().copy()          # Transform
clean["Revenue"] = clean["Qty"] * clean["Price"]
clean.to_csv("output/clean_orders.csv", index=False)   # Load (the output/ folder must already exist)

# ── Time Series: monthly revenue trend ───────────────────────
clean["Date"] = pd.to_datetime(clean["Date"])
# "M" means month-end; pandas 2.2 and later prefer the alias "ME"
monthly = clean.resample("M", on="Date")["Revenue"].sum()

# ── ML Prep: encode categorical columns ──────────────────────
# here df is assumed to be the employees table, which has Department and City columns
df_encoded = pd.get_dummies(df, columns=["Department", "City"])
# adds one indicator column per category (Department_HR, Department_IT, …), ready for ML model input
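The time series and ML preparation ideas can be tried on tiny in-memory tables. The sketch below is not the method from the class: for the monthly totals it groups by calendar month with dt.to_period("M") instead of resample, and for the train/test split it uses plain pandas sample and drop. All table contents and variable names (orders, staff, train, test) are made up for illustration.

```python
import pandas as pd

# ── Time series: monthly totals and a rolling average ──
orders = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-10", "2024-03-15"]),
        "Revenue": [100, 200, 150, 250],
    }
)

# Group every order by its calendar month and add up the revenue
monthly = orders.groupby(orders["Date"].dt.to_period("M"))["Revenue"].sum()

# Average of the current month and the month before it (first month has no value)
rolling = monthly.rolling(window=2).mean()
print(monthly)
print(rolling)

# ── ML prep: encode, scale, split ──
staff = pd.DataFrame(
    {
        "Department": ["HR", "IT", "Finance", "IT"],
        "Salary": [60000, 75000, 62000, 82000],
    }
)

# One indicator column per department
encoded = pd.get_dummies(staff, columns=["Department"])

# Min-max scaling: squeeze salaries into the range 0 to 1
low, high = encoded["Salary"].min(), encoded["Salary"].max()
encoded["Salary_scaled"] = (encoded["Salary"] - low) / (high - low)

# Random 75% of rows for training, the remaining rows for testing
train = encoded.sample(frac=0.75, random_state=0)
test = encoded.drop(train.index)
print(len(train), len(test))
```

In real projects the scaling and splitting steps are often done with scikit-learn helpers; plain pandas is used here only to keep the example self-contained.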

Where will you implement this knowledge?

 

This is not a skill you learn and forget. Here is exactly where and how pandas appears in real careers.

Your Career Map with pandas
Data Engineer
Build ETL pipelines. Read from S3/SQL → clean with pandas → load to warehouse.
Data Analyst
Analyse sales, HR, finance data daily. Create pivot reports. Answer business questions.
Data Scientist
Prepare training data for ML models. Feature engineering. Exploratory analysis.
BI Developer
Process data before loading into Power BI / Tableau. Automate report generation.
ML Engineer
Preprocess datasets. Encode categories. Split train/test. Validate data quality.
Backend Developer
Process bulk data imports. Validate CSV uploads. Generate data exports.

Real projects where pandas is used:

🏦
Banking Analytics
Fraud detection, loan analysis, customer segmentation
🛒
E-Commerce
Order analytics, inventory, customer purchase patterns
🏥
Healthcare
Patient records, clinical trial data, drug efficacy reports
🎬
OTT / Movies
Recommendation engines, viewer behaviour analysis
📈
Finance
Stock analysis, portfolio tracking, risk calculation
🛵
Delivery / Logistics
Route optimisation, delay analysis, delivery reports
💼
HR Analytics
Attrition analysis, salary benchmarking, headcount reports
📊
Sales Reporting
Monthly sales trends, top-performing products, region-wise revenue
