Pandas is a Python library for data manipulation and analysis. It provides tools for working with structured, tabular data, much like an Excel spreadsheet but scriptable and better suited to large datasets. Given a spreadsheet with thousands of rows, Pandas lets you filter, sort, and analyze the data quickly and efficiently.
A DataFrame is a 2-dimensional table with rows and columns, like an Excel sheet. Each column can hold different types of data (integers, strings, etc.).
import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Score': [90, 85]}
df = pd.DataFrame(data)
print(df)
You can create a DataFrame by passing a dictionary, list, or other data structures to Pandas.
import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Math': [90, 80]}
df = pd.DataFrame(data)
print(df)
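The example above uses a dictionary. As a minimal sketch of the list-based route, the following builds the same table from a list of rows and from a list of dictionaries. The names `rows`, `records`, `df_from_rows`, and `df_from_records` are illustrative only.

```python
import pandas as pd

# Each inner list is one row; column names are supplied separately.
rows = [['Alice', 90], ['Bob', 80]]
df_from_rows = pd.DataFrame(rows, columns=['Name', 'Math'])

# A list of dictionaries also works; the keys become column names.
records = [{'Name': 'Alice', 'Math': 90}, {'Name': 'Bob', 'Math': 80}]
df_from_records = pd.DataFrame(records)

print(df_from_rows)
print(df_from_records)
```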
A Series is a one-dimensional array-like structure with labels, which can hold any data type (integers, strings, etc.). It's like a single column in a DataFrame.
import pandas as pd
scores = pd.Series([90, 85, 78])
print(scores)
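To illustrate the "labels" part of that description, here is a small sketch in which the Series is given an explicit index. The student names are made up for the example.

```python
import pandas as pd

# Without an index argument the labels default to 0, 1, 2, ...
# Here each score is labelled with a (hypothetical) student name.
scores = pd.Series([90, 85, 78], index=['Alice', 'Bob', 'Cara'])

bob_score = scores['Bob']      # look up by label
first_score = scores.iloc[0]   # look up by position

print(bob_score, first_score)
```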
You can select a column from a DataFrame by using the column name in square brackets.
df['Math']
You can add a new column by assigning a list or Series to a new column name.
df['Science'] = [85, 78]
print(df)
You can remove a column using the drop() method.
df.drop('Science', axis=1, inplace=True)
print(df)
You can filter rows by applying conditions on columns.
df[df['Math'] > 80]
loc is used to access rows and columns by labels, while iloc is used to access rows and columns by integer position.
df.loc[0]   # row with index label 0
df.iloc[0]  # first row by position
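With the default 0, 1, 2, ... index the two calls return the same row, which can hide the difference. The sketch below uses a made-up DataFrame, `df_demo`, whose index labels are 10, 20, 30 so that labels and positions no longer coincide.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Cara'], 'Math': [90, 80, 70]},
    index=[10, 20, 30],
)

by_label = df_demo.loc[10]       # row whose index label is 10
by_position = df_demo.iloc[0]    # first row, whatever its label
same_row = by_label.equals(by_position)

# Both accessors also take a column selector.
math_by_label = df_demo.loc[20, 'Math']   # label 20, column 'Math'
math_by_position = df_demo.iloc[1, 1]     # second row, second column

print(same_row, math_by_label, math_by_position)
```

Here `df_demo.loc[0]` would raise a `KeyError`, because no row carries the label 0.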
You can check for missing values using the isnull() method, which returns a DataFrame of the same shape with True where values are missing and False elsewhere. Example:
df.isnull()
You can fill missing values using the fillna() method.
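# Assign the result back; calling fillna(..., inplace=True) on a selected
# column is deprecated in recent pandas and may not update df.
df['Math'] = df['Math'].fillna(df['Math'].mean())
print(df)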
You can drop rows with missing values using the dropna() method.
df.dropna(inplace=True)
print(df)
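The three missing-value methods can be compared side by side. This is a minimal sketch on a made-up DataFrame, `df_missing`, that returns new objects instead of modifying in place, so the original stays available for comparison.

```python
import numpy as np
import pandas as pd

df_missing = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Cara'],
    'Math': [90.0, np.nan, 70.0],
})

# isnull() gives a True/False mask; summing it counts missing values per column.
missing_counts = df_missing.isnull().sum()

# Replace the missing score with the column mean (NaN is ignored by mean()).
filled = df_missing.assign(
    Math=df_missing['Math'].fillna(df_missing['Math'].mean())
)

# Alternatively, discard every row that contains a missing value.
dropped = df_missing.dropna()

print(missing_counts)
print(filled)
print(dropped)
```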
apply() applies a function along an axis of a DataFrame, or to each value when called on a Series, while map() applies a function (or a dictionary lookup) element-wise on a Series.
def add_ten(x):
    return x + 10

df['Math'] = df['Math'].apply(add_ten)
print(df)
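For the map() side of the comparison, a short sketch follows. The `Grade` column and the `grade_points` lookup table are assumptions made up for this example.

```python
import pandas as pd

df_demo = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Grade': ['A', 'B']})

# map() with a dictionary translates each value through the lookup table.
grade_points = {'A': 4, 'B': 3}
df_demo['Points'] = df_demo['Grade'].map(grade_points)

# map() with a function calls it once per element.
df_demo['Initial'] = df_demo['Name'].map(lambda name: name[0])

print(df_demo)
```

Values that are absent from the dictionary become NaN, which is worth checking for after a dictionary-based map().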
You can merge two DataFrames using the merge() method, similar to SQL joins.
pd.merge(df1, df2, on='StudentID')
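The call above assumes `df1` and `df2` already exist. A self-contained sketch, with made-up student data, shows how the default inner join differs from a left join.

```python
import pandas as pd

df1 = pd.DataFrame({'StudentID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Cara']})
df2 = pd.DataFrame({'StudentID': [2, 3, 4], 'Math': [80, 70, 60]})

# Default how='inner': keep only StudentIDs present in both frames.
inner = pd.merge(df1, df2, on='StudentID')

# how='left': keep every row of df1; unmatched rows get NaN for 'Math'.
left = pd.merge(df1, df2, on='StudentID', how='left')

print(inner)
print(left)
```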
groupby() is used to split the data into groups based on some criteria, apply a function to each group, and combine the results.
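# numeric_only=True skips text columns such as names, which pandas 2.x
# would otherwise refuse to average.
df.groupby('Class').mean(numeric_only=True)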
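The split, apply, combine steps can be seen on a small made-up DataFrame. The `Class` values and scores below are illustrative.

```python
import pandas as pd

df_classes = pd.DataFrame({
    'Class': ['A', 'A', 'B'],
    'Name': ['Alice', 'Bob', 'Cara'],
    'Math': [90, 80, 70],
})

# Split rows by 'Class', average 'Math' within each group,
# and combine the results into a Series indexed by class.
class_means = df_classes.groupby('Class')['Math'].mean()

print(class_means)
```

Selecting the column before aggregating avoids any question about how non-numeric columns are handled.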
You can concatenate two DataFrames using pd.concat().
df = pd.concat([df1, df2])
print(df)
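One detail worth seeing in practice is what happens to the index. In this sketch, with two made-up single-row frames, plain concatenation keeps each frame's original labels, while `ignore_index=True` renumbers the rows.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice'], 'Math': [90]})
df2 = pd.DataFrame({'Name': ['Bob'], 'Math': [80]})

# Rows are stacked vertically; both rows keep their original label 0.
stacked = pd.concat([df1, df2])

# ignore_index=True discards the old labels and numbers rows 0, 1, ...
renumbered = pd.concat([df1, df2], ignore_index=True)

print(stacked)
print(renumbered)
```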
describe() provides a summary of statistics for numerical columns, including count, mean, std, min, and percentiles.
df.describe()
You can sort a DataFrame by a specific column using the sort_values() method.
df.sort_values('Math', inplace=True)
print(df)
You can convert a DataFrame to a NumPy array using the to_numpy() method.
arr = df.to_numpy()
print(arr)
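df.any() checks, column by column, whether at least one element is True (or non-zero), while df.all() checks whether every element is. Called on a DataFrame, they return one result per column; called on a single column (a Series), they return a single boolean.
df['Math'].any()
df['Math'].all()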
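A small sketch on a made-up boolean DataFrame, `df_flags`, shows the per-column behaviour when these methods are called on a whole DataFrame.

```python
import pandas as pd

df_flags = pd.DataFrame({
    'Passed': [True, False],
    'Enrolled': [True, True],
})

any_per_column = df_flags.any()   # one boolean per column
all_per_column = df_flags.all()   # one boolean per column

# Chaining a second call reduces the per-column results to one value.
any_overall = df_flags.any().any()
all_overall = df_flags.all().all()

print(any_per_column)
print(all_per_column)
print(any_overall, all_overall)
```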
You can reset the index of a DataFrame using the reset_index() method.
df.reset_index(drop=True, inplace=True)
print(df)
You can rename columns using the rename() method.
df.rename(columns={'Math': 'Mathematics'}, inplace=True)
print(df)
You can drop duplicates using the drop_duplicates() method.
df.drop_duplicates(inplace=True)
print(df)
pivot_table() is used to create a pivot table, which is a table that summarizes data.
# aggfunc='sum' avoids needing a NumPy import (the original passed np.sum)
pd.pivot_table(df, values='Sales', index=['Product'], columns=['Region'], aggfunc='sum')
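Because the call above depends on a DataFrame with `Product`, `Region`, and `Sales` columns, here is a self-contained sketch with made-up sales figures that shows what the summarized table looks like.

```python
import pandas as pd

df_sales = pd.DataFrame({
    'Product': ['Pen', 'Pen', 'Ink', 'Ink'],
    'Region': ['East', 'West', 'East', 'East'],
    'Sales': [10, 20, 5, 7],
})

# One row per product, one column per region, cells hold summed sales.
table = pd.pivot_table(
    df_sales,
    values='Sales',
    index='Product',
    columns='Region',
    aggfunc='sum',
)

print(table)
```

Combinations that never occur in the data (here, Ink in the West) appear as NaN; the `fill_value` parameter can replace them with a default such as 0.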
You can set a column as the index using the set_index() method.
df.set_index('StudentID', inplace=True)
print(df)
cut() is used to bin continuous data into discrete intervals.
df['Score_Category'] = pd.cut(
    df['Score'],
    bins=[0, 60, 80, 100],
    labels=['Low', 'Medium', 'High'],
)
print(df)
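Where exactly a boundary value lands is easy to get wrong, so this sketch runs the same bins over a few made-up scores that sit on or next to the edges. By default each interval includes its right edge and excludes its left edge.

```python
import pandas as pd

scores = pd.Series([60, 61, 80, 100])

# Intervals are (0, 60], (60, 80], (80, 100].
categories = pd.cut(
    scores,
    bins=[0, 60, 80, 100],
    labels=['Low', 'Medium', 'High'],
)

print(categories.tolist())
```

A score of exactly 0 would fall outside the first interval and become NaN unless `include_lowest=True` is passed.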
You can calculate the rolling mean using the rolling() method followed by mean().
df['Math_Rolling_Mean'] = df['Math'].rolling(3).mean()
print(df)
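A window of 3 needs three values before it can produce a result, so the first two entries come out as NaN. The sketch below, on a made-up Series, shows this along with the `min_periods` parameter that relaxes the requirement.

```python
import pandas as pd

math_scores = pd.Series([90, 80, 70, 60])

# Each result averages the current value and the two before it.
rolling_mean = math_scores.rolling(3).mean()

# min_periods=1 averages over however many values are available so far.
rolling_mean_partial = math_scores.rolling(3, min_periods=1).mean()

print(rolling_mean)
print(rolling_mean_partial)
```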
You can create a copy of a DataFrame using the copy() method. This is useful when you want to make changes to a DataFrame without affecting the original one. Example:
df_copy = df.copy()
print(df_copy)
melt() is used to transform or reshape data, turning columns into rows. It's often used when you need to unpivot a DataFrame, making it longer and narrower.
df_melted = pd.melt(
    df,
    id_vars=['Name'],
    value_vars=['Math', 'Science'],
    var_name='Subject',
    value_name='Score',
)
print(df_melted)
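To make "longer and narrower" concrete, this sketch melts a made-up two-student table, `df_wide`, and checks the resulting shape.

```python
import pandas as pd

df_wide = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Math': [90, 80],
    'Science': [85, 78],
})

# Each (student, subject) pair becomes its own row.
df_long = pd.melt(
    df_wide,
    id_vars=['Name'],
    value_vars=['Math', 'Science'],
    var_name='Subject',
    value_name='Score',
)

print(df_wide.shape)   # 2 rows, 3 columns
print(df_long.shape)   # 4 rows, 3 columns
print(df_long)
```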
You can convert a DataFrame to a dictionary using the to_dict() method.
df_dict = df.to_dict()
print(df_dict)
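The shape of the resulting dictionary depends on the `orient` parameter. As a sketch on a made-up DataFrame, the default layout is keyed by column, while `orient='records'` produces one dictionary per row.

```python
import pandas as pd

df_demo = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Math': [90, 80]})

# Default: {column -> {index label -> value}}
by_column = df_demo.to_dict()

# orient='records': a list with one {column -> value} dictionary per row
by_row = df_demo.to_dict(orient='records')

print(by_column)
print(by_row)
```

The row-oriented form is often the more convenient one when the data is headed for JSON.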
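applymap() is used to apply a function to every element of a DataFrame. It's useful when you want to transform or format all values in the DataFrame. From pandas 2.1 onward, applymap() is deprecated in favor of the equivalent DataFrame.map().
# Assumes every column is numeric; round() fails on text values.
# On pandas 2.1+, use df.map(...) instead of df.applymap(...).
df = df.applymap(lambda x: round(x, 2))
print(df)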