Q1). How do you handle missing values in Pandas?

Handling missing values is crucial for data analysis. You can use methods like fillna() to replace missing values or dropna() to remove rows or columns with missing values. Example:

# Filling missing values with the mean of each numeric column
data.fillna(data.mean(numeric_only=True), inplace=True)
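The answer also mentions dropna(); a minimal sketch of both options on a small, hypothetical frame:

```python
import pandas as pd
import numpy as np

# Hypothetical frame with missing values in both columns
data = pd.DataFrame({'Sales': [100.0, np.nan, 200.0],
                     'Region': ['West', 'East', None]})

# Option 1: fill the numeric gap with the column mean
filled = data.fillna({'Sales': data['Sales'].mean()})

# Option 2: drop any row that contains a missing value
dropped = data.dropna()
```

Here filled keeps all three rows with Sales completed to [100, 150, 200], while dropped keeps only the one row that was fully populated.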

Q2). How do you handle duplicate rows in a DataFrame?

To handle duplicate rows, you can use the drop_duplicates() method, which removes duplicate rows based on all or specific columns. Example:

# Removing duplicate rows based on all columns
data.drop_duplicates(inplace=True)

Q3). How do you handle time series data in Pandas?

Pandas provides tools for working with time series data, such as resampling, shifting, and rolling window operations. You can also convert columns to datetime format using pd.to_datetime(). Example:

# Converting a column to datetime and resampling sales data by month
sales_data['Date'] = pd.to_datetime(sales_data['Date'])
monthly_sales = sales_data.resample('M', on='Date')['Sales'].sum()
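The answer also mentions shifting and rolling windows; a sketch on a small, hypothetical daily series:

```python
import pandas as pd

# Hypothetical daily sales series
s = pd.Series([10, 20, 30, 40],
              index=pd.date_range('2024-01-01', periods=4, freq='D'))

prev_day = s.shift(1)                     # previous day's value (NaN for the first day)
rolling_avg = s.rolling(window=2).mean()  # 2-day moving average
```

shift() is useful for day-over-day comparisons, and rolling() for smoothing.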

Q4). How do you merge DataFrames with different shapes?

When merging DataFrames with different shapes, you can specify the type of join: inner (intersection), outer (union), left (left DataFrame's keys), or right (right DataFrame's keys). Example:

# Merging with an outer join to include all records from both DataFrames
merged_data = pd.merge(df1, df2, on='Product_ID', how='outer')
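To see how the join type changes the result, here is a sketch with two tiny, hypothetical DataFrames that share only one key:

```python
import pandas as pd

df1 = pd.DataFrame({'Product_ID': [1, 2], 'Sales': [100, 200]})
df2 = pd.DataFrame({'Product_ID': [2, 3], 'Price': [5, 7]})

inner = pd.merge(df1, df2, on='Product_ID', how='inner')  # only keys present in both
outer = pd.merge(df1, df2, on='Product_ID', how='outer')  # all keys from either side
```

The inner join keeps only Product_ID 2; the outer join keeps all three IDs, filling missing values with NaN.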

Q5). How do you filter a DataFrame based on a condition?

You can filter a DataFrame based on a condition using boolean indexing. This allows you to select rows that meet a specific condition. Example:

# Filtering rows where sales are greater than 1000
high_sales = sales_data[sales_data['Sales'] > 1000]

Q6). What is the difference between .loc and .iloc?

.loc[] is used for label-based indexing: it accesses rows and columns by labels or a boolean array. .iloc[] is used for integer-based indexing: it accesses rows and columns by position. Example:

# Using .loc[] to access rows with specific labels
region_sales = sales_data.loc[sales_data['Region'] == 'West', ['Sales', 'Profit']]

# Using .iloc[] to access specific rows and columns by position
subset = sales_data.iloc[0:5, 1:4]
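The label/position distinction only becomes visible when the index is not the default 0, 1, 2, …; a sketch with a hypothetical non-default index:

```python
import pandas as pd

# Index labels are 10, 20, 30 rather than positions 0, 1, 2
df = pd.DataFrame({'Sales': [100, 150, 200]}, index=[10, 20, 30])

by_label = df.loc[20, 'Sales']      # the row whose index *label* is 20
by_position = df.iloc[2]['Sales']   # the third row by *position* (label 30)
```

.loc[20] returns 150 (the label-20 row), while .iloc[2] returns 200 (the third row).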

Q7). How do you rename columns in a DataFrame?

You can rename columns in a DataFrame using the rename() function, where you pass a dictionary mapping the old column names to the new ones. Example:

# Renaming columns 'Sales' to 'Total_Sales' and 'Profit' to 'Total_Profit'
sales_data.rename(columns={'Sales': 'Total_Sales', 'Profit': 'Total_Profit'}, inplace=True)

Q8). What is the difference between a DataFrame and a Series in Pandas?

A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types, similar to a table in a database or an Excel spreadsheet. A Series is a 1-dimensional labeled array, similar to a single column or row of data. Example:

# Creating a DataFrame
df = pd.DataFrame({'Product': ['A', 'B', 'C'], 'Sales': [100, 150, 200]})

# Creating a Series
sales_series = pd.Series([100, 150, 200], name='Sales')
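The relationship between the two is direct: selecting a single column from a DataFrame returns a Series. A quick sketch:

```python
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'B', 'C'], 'Sales': [100, 150, 200]})

# Each column of a DataFrame is itself a Series
col = df['Sales']
```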

Q9). How do you sort a DataFrame by a column?

You can sort a DataFrame by a column using the sort_values() function, specifying the column to sort by and the sort order. Example:

# Sorting the DataFrame by the 'Sales' column in descending order
sorted_data = sales_data.sort_values(by='Sales', ascending=False)

Q10). How do you filter rows based on multiple conditions?

You can filter rows based on multiple conditions by combining boolean conditions using the & (and) and | (or) operators. Example:

# Filtering rows where 'Sales' are greater than 1000 and 'Region' is 'West'
filtered_data = sales_data[(sales_data['Sales'] > 1000) & (sales_data['Region'] == 'West')]

Q11). How do you apply a function to a DataFrame column?

You can apply a function to a DataFrame column using the apply() method, which allows you to pass a function that will be applied to each element of the column. Example:

# Applying a function to calculate the length of each product name
sales_data['Product_Length'] = sales_data['Product'].apply(len)

Q12). How do you pivot a DataFrame?

You can pivot a DataFrame using the pivot() function, which reshapes the data based on column values, creating a new DataFrame where rows are transformed into columns. Note that pivot() requires each index/column pair to be unique; if there are duplicates, use pivot_table(), which aggregates them. Example:

# Pivoting the DataFrame to show 'Sales' by 'Product' and 'Date'
pivoted_data = sales_data.pivot(index='Date', columns='Product', values='Sales')
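When the same index/column pair occurs more than once, pivot() raises an error; pivot_table() aggregates the duplicates instead. A sketch with hypothetical data containing one duplicate pair:

```python
import pandas as pd

# ('2024-01', 'A') appears twice, so pivot() would raise here
sales = pd.DataFrame({
    'Date': ['2024-01', '2024-01', '2024-02'],
    'Product': ['A', 'A', 'A'],
    'Sales': [100, 50, 200],
})

# pivot_table() resolves duplicates with an aggregation function
pt = pd.pivot_table(sales, index='Date', columns='Product',
                    values='Sales', aggfunc='sum')
```

The two January rows are summed into a single cell (150).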

Q13). How do you group data in Pandas?

You can group data in Pandas using the groupby() function, which allows you to group rows based on column values and then perform aggregate operations on these groups. Example:

# Grouping sales data by 'Product' and calculating the sum of sales for each product
grouped_data = sales_data.groupby('Product')['Sales'].sum()

Q14). How do you read a CSV file into a DataFrame?

You can read a CSV file into a DataFrame using the read_csv() function. This function loads data from a CSV file into a DataFrame, which is a table-like data structure. Example:

# Reading data from a CSV file into a DataFrame
sales_data = pd.read_csv('sales_data.csv')

Q15). How do you save a DataFrame to a CSV file?

You can save a DataFrame to a CSV file using the to_csv() function. This function exports the DataFrame's data to a CSV file, which can be shared or stored. Example:

# Saving the DataFrame to a CSV file (index=False omits the row index)
sales_data.to_csv('sales_data.csv', index=False)

Q16). How do you drop rows or columns from a DataFrame?

You can drop rows or columns from a DataFrame using the drop() method, specifying the axis parameter to indicate whether you're dropping rows (axis=0) or columns (axis=1). Example:

# Dropping the column 'Profit' from the DataFrame
sales_data.drop('Profit', axis=1, inplace=True)

Q17). How do you handle categorical data in Pandas?

You can handle categorical data using the astype('category') method to convert columns to categorical data types. This can save memory and speed up operations. Example:

# Converting the 'Region' column to a categorical data type
sales_data['Region'] = sales_data['Region'].astype('category')

Q18). How do you deal with outliers in a DataFrame?

You can deal with outliers by identifying them using statistical methods like IQR (Interquartile Range) or Z-scores, and then handling them either by removing or adjusting them. Example:

# Identifying outliers using IQR
Q1 = sales_data['Sales'].quantile(0.25)
Q3 = sales_data['Sales'].quantile(0.75)
IQR = Q3 - Q1
outliers = sales_data[(sales_data['Sales'] < (Q1 - 1.5 * IQR)) | (sales_data['Sales'] > (Q3 + 1.5 * IQR))]
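The answer also mentions Z-scores; a sketch of that approach on a hypothetical series with one obvious outlier:

```python
import pandas as pd

# Hypothetical sales values; 1000 is the outlier
sales = pd.Series([100, 110, 90, 105, 95, 1000])

# Z-score: how many standard deviations each value lies from the mean
z = (sales - sales.mean()) / sales.std()
z_outliers = sales[z.abs() > 2]
```

Only the value whose |z| exceeds the chosen threshold (2 here; 3 is also common) is flagged.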

Q19). How do you concatenate DataFrames?

You can concatenate DataFrames using the concat() function, which allows you to combine them along a particular axis (rows or columns). Example:

# Concatenating two DataFrames along rows
combined_data = pd.concat([df1, df2], axis=0)
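The answer notes that concat() also works along columns; a sketch stacking two hypothetical frames side by side:

```python
import pandas as pd

df_a = pd.DataFrame({'Sales': [100, 200]})
df_b = pd.DataFrame({'Profit': [10, 20]})

# axis=1 places the frames side by side, aligned on the index
wide = pd.concat([df_a, df_b], axis=1)
```

Alignment is by index, so mismatched indexes would introduce NaN rows.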

Q20). How do you reshape a DataFrame?

Reshaping a DataFrame can be done using methods like melt() to unpivot the DataFrame or pivot_table() to create a pivot table. Example:

# Melting a DataFrame to long format
melted_data = pd.melt(df, id_vars=['Product'], value_vars=['Q1', 'Q2', 'Q3', 'Q4'], var_name='Quarter', value_name='Sales')
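A concrete, self-contained version of the melt above, using a small hypothetical quarterly frame:

```python
import pandas as pd

# Hypothetical wide-format frame: one column per quarter
df = pd.DataFrame({'Product': ['A', 'B'],
                   'Q1': [100, 150], 'Q2': [110, 160]})

# Each (Product, Quarter) combination becomes its own row
melted = pd.melt(df, id_vars=['Product'], value_vars=['Q1', 'Q2'],
                 var_name='Quarter', value_name='Sales')
```

Two products times two quarters yields four long-format rows.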

Q21). How do you handle date and time in Pandas?

Pandas has robust functionality for handling date and time data using functions like pd.to_datetime() for conversion and datetime properties for extraction. Example:

# Converting a column to datetime and extracting the year
sales_data['Date'] = pd.to_datetime(sales_data['Date'])
sales_data['Year'] = sales_data['Date'].dt.year

Q22). How do you use apply() with multiple arguments?

You can use apply() with multiple arguments by passing a function that accepts multiple parameters, and using the args parameter to provide additional arguments. Example:

# Applying a function with multiple arguments to each value in a column
def calculate_discount(price, discount):
    return price - (price * discount)

# args supplies the second argument; each 'Sales' value becomes the first
sales_data['Discounted_Price'] = sales_data['Sales'].apply(calculate_discount, args=(0.1,))

Q23). How do you handle large datasets with Pandas?

Handling large datasets can be done using techniques such as chunking with read_csv() to process the data in smaller pieces, and using efficient data types. Example:

# Reading a large CSV file in chunks of 10,000 rows
chunk_iter = pd.read_csv('large_data.csv', chunksize=10000)
for chunk in chunk_iter:
    process(chunk)  # process() stands in for your own per-chunk logic
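The answer also mentions efficient data types; a sketch of shrinking a hypothetical frame by downcasting integers and categorizing low-cardinality strings:

```python
import pandas as pd

# Hypothetical frame loaded with default 64-bit / object dtypes
df = pd.DataFrame({'Units': [1, 2, 3], 'Region': ['West', 'East', 'West']})

# Downcast integers to the smallest type that fits the values
df['Units'] = pd.to_numeric(df['Units'], downcast='integer')

# Convert a repetitive string column to the category dtype
df['Region'] = df['Region'].astype('category')
```

On real datasets these conversions can cut memory use substantially, which matters most when combined with chunked reading.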

Q24). How do you perform aggregation in Pandas?

Aggregation in Pandas can be done using functions like groupby() combined with aggregate functions such as sum(), mean(), and count(). Example:

# Aggregating data to get total sales by 'Product'
total_sales = sales_data.groupby('Product')['Sales'].agg('sum')
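agg() also accepts a list, computing several aggregates in one pass. A sketch on a small hypothetical frame:

```python
import pandas as pd

sales_data = pd.DataFrame({'Product': ['A', 'A', 'B'],
                           'Sales': [100, 200, 50]})

# One column per aggregate: sum, mean, and count for each product
stats = sales_data.groupby('Product')['Sales'].agg(['sum', 'mean', 'count'])
```

The result has products as the index and one column per aggregate function.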

Q25). How do you merge DataFrames on multiple columns?

You can merge DataFrames on multiple columns by specifying a list of column names in the on parameter of the merge() function. Example:

# Merging DataFrames on multiple columns
merged_data = pd.merge(df1, df2, on=['Product_ID', 'Region'], how='inner')

Q26). How do you handle duplicate index values?

Handling duplicate index values involves resetting the index using reset_index() or reindexing with a unique index. Example:

# Resetting the index to handle duplicates
cleaned_data = sales_data.reset_index(drop=True)
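Before resetting, you may want to detect which labels are duplicated; a sketch on a hypothetical Series with a repeated index label:

```python
import pandas as pd

# Index label 'a' appears twice
s = pd.Series([1, 2, 3], index=['a', 'a', 'b'])

has_dupes = not s.index.is_unique   # True here
dupe_mask = s.index.duplicated()    # flags repeats after the first occurrence
deduped = s[~dupe_mask]             # keep the first occurrence of each label
```

index.duplicated() keeps the first occurrence by default; pass keep='last' to keep the last instead.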

Q27). How do you handle missing data in a DataFrame?

Handling missing data can be done using methods such as fillna() to replace missing values or dropna() to remove rows or columns with missing values. Example:

# Filling missing values in the 'Sales' column with 0
sales_data['Sales'] = sales_data['Sales'].fillna(0)