Q1). What is Pandas in Python?

Pandas is a Python library for data manipulation and analysis. It provides tools for working with structured, tabular data, much like a spreadsheet but scriptable and better suited to large datasets. For example, given a table with thousands of rows, Pandas lets you filter, sort, and aggregate the data in a few lines of code.

Q2). What is a DataFrame in Pandas?

A DataFrame is a 2-dimensional table with rows and columns, like an Excel sheet. Each column can hold different types of data (integers, strings, etc.).

For example: if you have data about students' names and their scores, you can store it in a DataFrame with columns 'Name' and 'Score':

import pandas as pd

data = {'Name': ['Alice', 'Bob'], 'Score': [90, 85]}
df = pd.DataFrame(data)
print(df)

Q3). How do you create a DataFrame in Pandas?

You can create a DataFrame by passing a dictionary, a list, or another data structure (such as a NumPy array) to pd.DataFrame().

For example: if you want to store data about students and their scores, you can create a DataFrame like this:

import pandas as pd

data = {'Name': ['Alice', 'Bob'], 'Math': [90, 80]}
df = pd.DataFrame(data)
print(df)

Q4). What is a Series in Pandas?

A Series is a one-dimensional array-like structure with labels, which can hold any data type (integers, strings, etc.). It's like a single column in a DataFrame.

For example: you can store students' scores in a Series. If you don't supply labels, Pandas assigns the default integer index 0, 1, 2, and so on:

import pandas as pd

scores = pd.Series([90, 85, 78])
print(scores)

Q5). How do you select a column from a DataFrame?

You can select a column from a DataFrame by using the column name in square brackets.

For example: if you have a DataFrame with a 'Math' column, you can select it like this:

df['Math']

Q6). How do you add a new column to a DataFrame?

You can add a new column by assigning a list or Series to a new column name. A list must contain exactly one value per row.

For example: if you want to add a 'Science' column to your DataFrame:

df['Science'] = [85, 78]
print(df)

Q7). How do you remove a column from a DataFrame?

You can remove a column using the drop() method.

For example: to remove the 'Science' column:

df.drop('Science', axis=1, inplace=True)
print(df)

Q8). How do you filter rows in a DataFrame?

You can filter rows by applying conditions on columns.

For example: to get all students who scored more than 80 in Math:

df[df['Math'] > 80]
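
Conditions can also be combined. The following is a minimal sketch on a small, made-up table (the names and scores are assumptions for illustration); each condition needs its own parentheses, and & and | are used instead of and and or:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Cara'],
    'Math': [90, 80, 70],
    'Science': [85, 78, 88],
})

# Single condition: Math strictly greater than 80.
high_math = df[df['Math'] > 80]

# Both conditions must hold (element-wise AND).
both = df[(df['Math'] >= 80) & (df['Science'] > 80)]

# Either condition may hold (element-wise OR).
either = df[(df['Math'] > 85) | (df['Science'] > 85)]

print(high_math['Name'].tolist())  # ['Alice']
print(both['Name'].tolist())       # ['Alice']
print(either['Name'].tolist())     # ['Alice', 'Cara']
```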

Q9). What is the difference between loc and iloc in Pandas?

loc is used to access rows and columns by labels, while iloc is used to access rows and columns by integer position.

For example: df.loc[0] gets the row whose index label is 0, and df.iloc[0] gets the first row by position. With the default integer index these are the same row. Example:

df.loc[0]
df.iloc[0]
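
The difference only becomes visible when the index labels are not 0, 1, 2. Here is a small sketch with a hypothetical DataFrame whose index is deliberately set to 10, 20, 30; it also shows that label slices include the end label while position slices exclude the end position:

```python
import pandas as pd

# Hypothetical data; the index labels are deliberately not 0, 1, 2.
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Cara'], 'Math': [90, 80, 70]},
    index=[10, 20, 30],
)

by_label = df.loc[10]      # the row whose index label is 10
by_position = df.iloc[0]   # the first row, whatever its label is

# Label-based slice: both endpoints (10 and 20) are included.
label_slice = df.loc[10:20, 'Math'].tolist()

# Position-based slice: the end position (1) is excluded.
pos_slice = df.iloc[0:1, 1].tolist()

print(by_label['Name'], by_position['Name'])  # Alice Alice
print(label_slice)                            # [90, 80]
print(pos_slice)                              # [90]
```

Note that df.loc[0] would raise a KeyError on this DataFrame, because no row carries the label 0.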

Q10). How do you check for missing values in a DataFrame?

You can check for missing values using the isnull() method, which returns a DataFrame of the same shape with True where values are missing and False elsewhere. Example:

df.isnull()
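
In practice you often want a count rather than the full True/False table. A minimal sketch, assuming a made-up DataFrame with one missing score:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Math score.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Cara'],
    'Math': [90, np.nan, 70],
})

# Number of missing values in each column.
missing_per_column = df.isnull().sum()

# Total number of missing values in the whole DataFrame.
total_missing = int(df.isnull().sum().sum())

print(missing_per_column['Math'])  # 1
print(total_missing)               # 1
```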

Q11). How do you fill missing values in a DataFrame?

You can fill missing values using the fillna() method.

For example: to fill missing values in the 'Math' column with the mean of the column:

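# Assign the result back; calling fillna(..., inplace=True) on a selected
# column is unreliable and deprecated in recent pandas versions.
df['Math'] = df['Math'].fillna(df['Math'].mean())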
print(df)

Q12). How do you drop rows with missing values?

You can drop rows with missing values using the dropna() method.

For example: to remove all rows that contain any missing values:

df.dropna(inplace=True)
print(df)
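
dropna() can also be limited to specific columns. The sketch below uses a hypothetical table with gaps in two different columns and compares the default behaviour with the subset parameter:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing score and one missing name.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None],
    'Math': [90, np.nan, 70],
})

# Default: drop a row if ANY of its values is missing.
no_gaps = df.dropna()

# Only drop rows where 'Math' is missing; gaps elsewhere are kept.
math_present = df.dropna(subset=['Math'])

print(len(no_gaps))       # 1
print(len(math_present))  # 2
```

Both calls return a new DataFrame and leave df unchanged, which is usually easier to reason about than inplace=True.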

Q13). What is the difference between apply() and map() in Pandas?

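apply() works on both DataFrames and Series: on a DataFrame it applies a function along an axis (to each column or each row), and on a Series it applies the function to each value. map() is a Series method that transforms each value element-wise, and it also accepts a dictionary or another Series as a lookup table.

For example: df['Math'].apply(lambda x: x + 10) adds 10 to every value in the 'Math' column, and df['Math'].map(lambda x: x + 10) gives the same result. Example: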

def add_ten(x):
    return x + 10

df['Math'] = df['Math'].apply(add_ten)
print(df)
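
To make the contrast concrete, here is a short sketch on made-up scores. The grade lookup table is an assumption for illustration, not part of any Pandas API:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({'Math': [90, 80], 'Science': [85, 78]})

# Series.map with a dict: each value is looked up in the table.
grades = df['Math'].map({90: 'A', 80: 'B'})

# Series.apply: the function is called once per value.
plus_ten = df['Math'].apply(lambda x: x + 10)

# DataFrame.apply, default axis=0: the function receives each column.
col_max = df.apply(lambda col: col.max())

# DataFrame.apply with axis=1: the function receives each row.
row_total = df.apply(lambda row: row['Math'] + row['Science'], axis=1)

print(grades.tolist())     # ['A', 'B']
print(plus_ten.tolist())   # [100, 90]
print(col_max.tolist())    # [90, 85]
print(row_total.tolist())  # [175, 158]
```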

Q14). How do you merge two DataFrames in Pandas?

You can merge two DataFrames using the merge() method, similar to SQL joins.

For example: if you have two DataFrames, df1 and df2, you can merge them on a common column like this:

pd.merge(df1, df2, on='StudentID')
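
A self-contained sketch of the same call, assuming two small hypothetical tables that share a 'StudentID' column. The how parameter selects the join type, as in SQL; the default is an inner join:

```python
import pandas as pd

# Hypothetical data: student 3 has no score, and score 4 has no student.
df1 = pd.DataFrame({'StudentID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Cara']})
df2 = pd.DataFrame({'StudentID': [1, 2, 4], 'Math': [90, 80, 75]})

# Inner join (default): only IDs present in both tables.
inner = pd.merge(df1, df2, on='StudentID')

# Left join: every row of df1; missing scores become NaN.
left = pd.merge(df1, df2, on='StudentID', how='left')

print(inner['StudentID'].tolist())      # [1, 2]
print(len(left))                        # 3
print(int(left['Math'].isnull().sum())) # 1
```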

Q15). What is the use of groupby() in Pandas?

groupby() is used to split the data into groups based on some criteria, apply a function to each group, and combine the results.

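For example: you can use df.groupby('Class').mean(numeric_only=True) to get the average of every numeric column for each class. Without numeric_only=True, recent pandas versions raise an error if the DataFrame also contains text columns. Example:

df.groupby('Class').mean(numeric_only=True)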
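
A runnable sketch of the split-apply-combine idea, assuming a hypothetical table with 'Class' and 'Score' columns. Selecting the column before aggregating keeps text columns such as 'Name' out of the calculation:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    'Class': ['A', 'A', 'B'],
    'Name': ['Alice', 'Bob', 'Cara'],
    'Score': [90, 80, 70],
})

# Average score per class.
avg = df.groupby('Class')['Score'].mean()

# Several aggregations at once; the result has one column per function.
summary = df.groupby('Class')['Score'].agg(['mean', 'max'])

print(avg['A'], avg['B'])        # 85.0 70.0
print(summary.loc['A', 'max'])   # 90
```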

Q16). How do you concatenate two DataFrames?

You can concatenate two DataFrames using pd.concat().

For example: if you have two DataFrames, df1 and df2, you can concatenate them vertically like this:

df = pd.concat([df1, df2])
print(df)
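
One detail worth knowing: by default pd.concat() keeps each DataFrame's original index, so labels can repeat. A minimal sketch with two made-up one-row tables shows the effect and the ignore_index fix:

```python
import pandas as pd

# Hypothetical data for illustration only.
df1 = pd.DataFrame({'Name': ['Alice'], 'Math': [90]})
df2 = pd.DataFrame({'Name': ['Bob'], 'Math': [80]})

# Default: original index labels are kept, so 0 appears twice.
stacked = pd.concat([df1, df2])

# ignore_index=True builds a fresh 0..n-1 index.
renumbered = pd.concat([df1, df2], ignore_index=True)

print(stacked.index.tolist())     # [0, 0]
print(renumbered.index.tolist())  # [0, 1]
```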

Q17). What is the use of the describe() function in Pandas?

describe() provides summary statistics for numerical columns: count, mean, std, min, max, and the 25th, 50th, and 75th percentiles.

For example: df.describe() gives you a quick overview of the data distribution. Example:

df.describe()

Q18). How do you sort a DataFrame by a specific column?

You can sort a DataFrame by a specific column using the sort_values() method.

For example: df.sort_values('Math') sorts the DataFrame by the 'Math' column in ascending order. Example:

df.sort_values('Math', inplace=True)
print(df)

Q19). How do you convert a DataFrame to a NumPy array?

You can convert a DataFrame to a NumPy array using the to_numpy() method.

For example: df.to_numpy() converts the DataFrame to a 2D NumPy array. Example:

arr = df.to_numpy()
print(arr)

Q20). What is the difference between df.any() and df.all()?

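On a DataFrame, df.any() returns one boolean per column: True if any element in that column is True (or non-zero). df.all() likewise returns True for a column only if every element in it is True (or non-zero). Called on a Series, both return a single boolean.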

For example: df['Math'].any() returns True if any score in 'Math' is non-zero, whereas df['Math'].all() returns True only if all scores are non-zero. Example:

df['Math'].any()
df['Math'].all()
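
The sketch below, using a hypothetical table that contains a zero score, shows the Series form and the per-column DataFrame form side by side:

```python
import pandas as pd

# Hypothetical data: Bob's Math score is 0.
df = pd.DataFrame({'Math': [90, 0], 'Science': [85, 78]})

# On a Series: a single boolean each.
math_any = bool(df['Math'].any())   # at least one non-zero score
math_all = bool(df['Math'].all())   # every score non-zero

# On a DataFrame: one boolean per column.
all_by_column = df.all()

print(math_any, math_all)                      # True False
print(bool(all_by_column['Math']))             # False
print(bool(all_by_column['Science']))          # True
```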

Q21). How do you reset the index of a DataFrame?

You can reset the index of a DataFrame using the reset_index() method.

For example: df.reset_index() replaces the index with the default integer index and moves the old index into a regular column. Passing drop=True discards the old index instead. Example:

df.reset_index(drop=True, inplace=True)
print(df)

Q22). How do you rename columns in a DataFrame?

You can rename columns using the rename() method.

For example: df.rename(columns={'Math': 'Mathematics'}) renames the 'Math' column to 'Mathematics'. Example:

df.rename(columns={'Math': 'Mathematics'}, inplace=True)
print(df)

Q23). How do you drop duplicates from a DataFrame?

You can drop duplicates using the drop_duplicates() method.

For example: df.drop_duplicates() removes duplicate rows from the DataFrame. Example:

df.drop_duplicates(inplace=True)
print(df)

Q24). What is the purpose of the pivot_table() function?

pivot_table() builds a spreadsheet-style pivot table: it groups rows by one or more keys and aggregates a value column for each combination of keys.

For example: if you have a DataFrame with sales data, you can use pivot_table() to summarize sales by product and region. Example:

# aggfunc='sum' avoids the undefined np name and works without importing NumPy.
pd.pivot_table(df, values='Sales', index=['Product'], columns=['Region'], aggfunc='sum')
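
A self-contained sketch with made-up sales figures (the products, regions, and numbers are assumptions for illustration). fill_value=0 replaces the gaps left by product and region combinations that have no rows:

```python
import pandas as pd

# Hypothetical sales data; 'Ink' has no sales in the West.
df = pd.DataFrame({
    'Product': ['Pen', 'Pen', 'Ink', 'Ink'],
    'Region': ['East', 'West', 'East', 'East'],
    'Sales': [10, 20, 5, 7],
})

# Rows are products, columns are regions, cells are summed sales.
table = pd.pivot_table(
    df,
    values='Sales',
    index='Product',
    columns='Region',
    aggfunc='sum',
    fill_value=0,
)

print(table.loc['Ink', 'East'])  # 12
print(table.loc['Ink', 'West'])  # 0
print(table.loc['Pen', 'West'])  # 20
```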

Q25). How do you set a column as the index of a DataFrame?

You can set a column as the index using the set_index() method.

For example: df.set_index('StudentID') sets the 'StudentID' column as the index. Example:

df.set_index('StudentID', inplace=True)
print(df)

Q26). What is the use of the cut() function in Pandas?

cut() is used to bin continuous data into discrete intervals.

For example: if you have students' scores and want to categorize them into 'Low', 'Medium', and 'High' based on their score ranges:

df['Score_Category'] = pd.cut(df['Score'], bins=[0, 60, 80, 100], labels=['Low', 'Medium', 'High'])
print(df)
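
By default each bin excludes its left edge and includes its right edge, so the bins above are (0, 60], (60, 80], and (80, 100]. A small sketch with hypothetical boundary scores shows the consequence: a score of exactly 0 falls outside every bin unless include_lowest=True is passed.

```python
import pandas as pd

# Hypothetical scores chosen to sit on or near the bin edges.
scores = pd.Series([0, 60, 61, 95])
bins = [0, 60, 80, 100]
labels = ['Low', 'Medium', 'High']

# Default: the score 0 is not in (0, 60], so it becomes NaN.
default_cut = pd.cut(scores, bins=bins, labels=labels)

# include_lowest=True makes the first bin include its left edge.
inclusive_cut = pd.cut(scores, bins=bins, labels=labels, include_lowest=True)

print(default_cut.isnull().tolist())  # [True, False, False, False]
print(inclusive_cut.tolist())         # ['Low', 'Low', 'Medium', 'High']
```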

Q27). How do you calculate the rolling mean in Pandas?

You can calculate the rolling mean using the rolling() method followed by mean().

For example: df['Math'].rolling(3).mean() calculates the rolling mean with a window of 3 for the 'Math' column. Example:

df['Math_Rolling_Mean'] = df['Math'].rolling(3).mean()
print(df)
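
With a window of 3, the first two results are NaN because a full window is not yet available. The sketch below, on made-up scores, shows this and how min_periods relaxes the requirement:

```python
import pandas as pd

# Hypothetical scores for illustration only.
scores = pd.Series([90, 80, 70, 60])

# Strict: a result needs all 3 values in the window.
strict = scores.rolling(3).mean()

# Relaxed: a result is produced as soon as 1 value is available.
relaxed = scores.rolling(3, min_periods=1).mean()

print(strict.isnull().tolist())  # [True, True, False, False]
print(relaxed.tolist())          # [90.0, 85.0, 80.0, 70.0]
```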

Q28). How do you create a copy of a DataFrame?

You can create a copy of a DataFrame using the copy() method. This is useful when you want to make changes to a DataFrame without affecting the original one. Example:

df_copy = df.copy()
print(df_copy)

Q29). What is the use of the melt() function in Pandas?

melt() is used to transform or reshape data, turning columns into rows. It's often used when you need to unpivot a DataFrame, making it longer and narrower.

For example: if you have one column per subject and want a single 'Subject' column with the scores in a 'Score' column:

df_melted = pd.melt(df, id_vars=['Name'], value_vars=['Math', 'Science'], var_name='Subject', value_name='Score')
print(df_melted)
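
Here is the same call as a runnable sketch on a hypothetical wide table with two students and two subjects. Each student ends up with one row per subject:

```python
import pandas as pd

# Hypothetical wide-format data: one column per subject.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Math': [90, 80],
    'Science': [85, 78],
})

df_melted = pd.melt(
    df,
    id_vars=['Name'],                 # columns kept as identifiers
    value_vars=['Math', 'Science'],   # columns turned into rows
    var_name='Subject',
    value_name='Score',
)

print(list(df_melted.columns))        # ['Name', 'Subject', 'Score']
print(df_melted['Subject'].tolist())  # ['Math', 'Math', 'Science', 'Science']
print(df_melted['Score'].tolist())    # [90, 80, 85, 78]
```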

Q30). How do you convert a DataFrame to a dictionary?

You can convert a DataFrame to a dictionary using the to_dict() method.

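For example: df.to_dict() converts the DataFrame into a dictionary keyed by column name, where each value is itself a dictionary mapping index labels to cell values. Pass orient='list' to get a list of values per column instead. Example: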

df_dict = df.to_dict()
print(df_dict)
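
The orient parameter controls the shape of the result. A short sketch on a made-up two-row table compares three common choices:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Math': [90, 80]})

# Default (orient='dict'): column -> {index label -> value}.
nested = df.to_dict()

# orient='list': column -> list of values.
as_lists = df.to_dict(orient='list')

# orient='records': one dictionary per row.
as_records = df.to_dict(orient='records')

print(nested)      # {'Name': {0: 'Alice', 1: 'Bob'}, 'Math': {0: 90, 1: 80}}
print(as_lists)    # {'Name': ['Alice', 'Bob'], 'Math': [90, 80]}
print(as_records)  # [{'Name': 'Alice', 'Math': 90}, {'Name': 'Bob', 'Math': 80}]
```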

Q31). What is the use of the applymap() function in Pandas?

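applymap() applies a function to every element of a DataFrame. It's useful when you want to transform or format all values. Note that applymap() is deprecated since pandas 2.1 in favor of DataFrame.map(), which does the same thing.

For example: you can use it to round all float values to two decimal places. This assumes every column is numeric; for plain rounding, df.round(2) is simpler. Example: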

df = df.applymap(lambda x: round(x, 2))
print(df)