Q1). What is database normalization?
Database normalization is the process of organizing data to reduce redundancy and improve data integrity.
For example: in a university database, instead of storing student addresses in every table where student data is mentioned, you store addresses in a separate table and link it to student records. This way, if a student changes their address, you only update it in one place.
Q2). What is an index in a database?
An index is a database structure that improves the speed of data retrieval operations.
For example: in a library database, if you frequently search for books by their titles, creating an index on the title column will make these searches much faster than scanning every book record.
Q3). What is a primary key?
A primary key is a unique identifier for a record in a database table. For instance, in a customer database, the customer ID is often used as a primary key because it uniquely identifies each customer and ensures that no two customers have the same ID.
Q4). What is ACID in database transactions?
ACID stands for Atomicity, Consistency, Isolation, and Durability. These properties ensure that database transactions are processed reliably.
For example: in an online shopping transaction, ACID ensures that either the order is fully completed or not processed at all, maintaining data integrity even in case of failures.
Q5). What is a foreign key?
A foreign key is a field in one table that uniquely identifies a row of another table.
For example: in a school database, the `class_id` in the `students` table is a foreign key that references the `class_id` in the `classes` table, linking students to their respective classes.
Q6). What is a view in a database?
A view is a virtual table based on the result of a query.
For example: you might create a view in an employee database to show only employees in a certain department, which can simplify queries and enhance security by restricting access to certain data.
Q7). What is a stored procedure?
A stored procedure is a precompiled collection of SQL statements stored in the database.
For example: a stored procedure might be used to automate a monthly report generation process, saving time and ensuring consistency.
Q8). What is data warehousing?
Data warehousing involves collecting and managing data from various sources to provide meaningful business insights. For instance, a retail chain might use a data warehouse to consolidate sales data from different stores and analyze trends to make informed business decisions.
Q9). What is denormalization?
Denormalization is the process of intentionally introducing redundancy into a database to improve performance.
For example: in a high-traffic e-commerce site, you might denormalize product details and customer reviews into a single table to speed up read operations and reduce query complexity.
Q10). What is database sharding?
Database sharding involves splitting a large database into smaller, more manageable pieces called shards.
For example: a social media platform might shard its user database by geographical region, so that user data is distributed across servers based on their location, improving performance and scalability.
Q11). What is the difference between OLTP and OLAP?
OLTP (Online Transaction Processing) systems handle real-time transactional data, such as order processing systems, while OLAP (Online Analytical Processing) systems are used for complex queries and analysis, such as business intelligence dashboards. For instance, an e-commerce site uses OLTP for order transactions and OLAP for analyzing sales trends.
Q12). What is database replication?
Database replication involves copying data from one database to another to ensure data availability and redundancy.
For example: a company might replicate its customer database across multiple servers to ensure that if one server fails, the data is still accessible from other servers.
Q13). What is a transaction log?
A transaction log records all changes made to a database, which helps in recovering data after a failure.
For example: if an unexpected shutdown occurs during a transaction, the transaction log allows the database to roll back to a consistent state, ensuring no data is lost.
Q14). What is a database schema?
A database schema defines the structure of a database, including tables, columns, and relationships.
For example: in a library database, the schema specifies how tables for books, authors, and borrowers are related and what information each table contains.
Q15). What is the purpose of database partitioning?
Database partitioning divides a large database into smaller, more manageable pieces to improve performance and manageability.
For example: a company might partition its customer database by region, so that queries can be executed more quickly within smaller datasets.
Q16). What is a data mart?
A data mart is a subset of a data warehouse focused on a specific business area. For instance, a sales data mart might contain detailed sales data for analysis by the sales team, while the overall data warehouse contains data for all departments.
Q17). What are database constraints?
Constraints are rules enforced on database columns to maintain data integrity.
For example: a `NOT NULL` constraint ensures that a column cannot have a NULL value, such as ensuring that every employee record has an employee ID.
Q18). What is the difference between a clustered index and a non-clustered index?
A clustered index sorts and stores the data rows of the table based on the index key, while a non-clustered index is a separate structure that references the data rows.
For example: in a database of employee records, a clustered index on the employee ID will physically sort the table rows by employee ID, while a non-clustered index on the department field will provide a reference to the employee records based on department.
Q19). What is an aggregate function?
Aggregate functions perform calculations on a set of values and return a single value.
For example: the `SUM()` function can be used to calculate the total sales revenue from a sales table, summing up the values in the sales amount column.
Q20). What is a materialized view?
A materialized view is a database object that contains the results of a query. Unlike a regular view, which generates data dynamically, a materialized view stores the data physically.
For example: a reporting application might use a materialized view to store precomputed sales data for faster access.
Q21). What is a deadlock in a database?
A deadlock occurs when two or more transactions are waiting for each other to release resources, causing a standstill.
For example: if Transaction A holds a lock on Table 1 and waits for a lock on Table 2, while Transaction B holds a lock on Table 2 and waits for Table 1, neither transaction can proceed, causing a deadlock.
Q22). What is the difference between a snapshot and a backup?
A snapshot is a point-in-time copy of a database, while a backup is a complete copy of the database that can be restored.
For example: a snapshot might be used to quickly revert to a previous state during development, while a backup provides a way to recover the database in case of a catastrophic failure.
Q23). What is data compression in databases?
Data compression reduces the size of the database by encoding data in a more efficient format.
For example: a database storing text data might use compression to reduce the storage size, making it faster and cheaper to manage large volumes of text.
Q24). What is a rollup in database reporting?
A rollup is a process of aggregating data to a higher level of granularity.
For example: a sales report might roll up daily sales data to monthly totals, providing a summary view of performance over time.
Q25). What is a database cursor?
A cursor is a database object used to retrieve and manipulate rows returned by a query.
For example: in a billing system, a cursor might be used to process each customer record one by one to generate invoices.
Q26). What is a database trigger?
A trigger is a set of instructions that are automatically executed in response to certain events on a table.
For example: you might create a trigger to automatically update a product's inventory count whenever a new order is placed.
Q27). What is data masking?
Data masking involves hiding sensitive data to protect it from unauthorized access.
For example: in a development environment, you might mask customer names and credit card numbers to prevent exposure while allowing developers to work with realistic data.
Q28). What is a database graph?
A database graph is a type of NoSQL database that uses graph structures to represent and store data.
For example: social networks use graph databases to model relationships between users, allowing for efficient queries on connections and interactions.
Q29). What is a hierarchical database?
A hierarchical database organizes data in a tree-like structure, where each record has a single parent and possibly multiple children.
For example: an organization's employee database might use a hierarchical structure to represent the reporting relationships between employees and managers.
Q30). What is a multi-dimensional database?
A multi-dimensional database stores data in a format optimized for complex queries and analysis, often used in OLAP systems.
For example: a sales data warehouse might use a multi-dimensional database to analyze sales data by various dimensions like time, location, and product category.
Q31). What is a distributed database?
A distributed database is a database that is spread across multiple physical locations.
For example: a global e-commerce company might use a distributed database to ensure that customer data is accessible quickly from servers located in different regions around the world.
Q32). What is a NoSQL database and when would you use it?
A NoSQL database is a type of database designed for handling large volumes of unstructured or semi-structured data. It is used when traditional relational databases are not suitable.
For example: a social media platform uses NoSQL databases to manage user-generated content and interactions that don't fit neatly into a table structure.
Q33). What is eventual consistency, and how does it differ from strong consistency?
Eventual consistency means that data will eventually become consistent across all nodes, but there might be temporary discrepancies. Strong consistency ensures that all nodes reflect the same data immediately.
For example: in a distributed e-commerce system, eventual consistency might mean that product availability updates might take some time to reflect across all regions.
Q34). What is a distributed transaction?
A distributed transaction spans multiple databases or systems, ensuring that all parts of the transaction are completed successfully or none at all.
For example: when processing a payment that affects both a user's bank account and an online store's inventory, a distributed transaction ensures both systems are updated consistently.
Q35). What is the CAP theorem?
The CAP theorem states that a distributed database can only guarantee two out of the following three properties: Consistency, Availability, and Partition Tolerance.
For example: in a distributed system, you might have to choose between ensuring all nodes see the same data (consistency) and ensuring the system remains operational even if some nodes are unreachable (partition tolerance).
Q36). What is a database materialized view?
A materialized view is a database object that stores the result of a query physically, unlike a regular view which is computed dynamically.
For example: a reporting system might use a materialized view to store pre-aggregated sales data for faster query performance.
Q37). What is a schema-less database?
A schema-less database, or schema-free database, allows for flexible data structures without a predefined schema.
For example: in a content management system, a schema-less database allows storing various types of content (articles, images, videos) without requiring a rigid data model.
Q38). What are distributed hash tables (DHTs)?
Distributed hash tables are a decentralized way of storing and retrieving data across multiple nodes using a hash function.
For example: in a peer-to-peer file sharing system, DHTs allow efficient data lookups and storage across many participating nodes.
Q39). What is database federation?
Database federation involves integrating multiple databases into a single system so that users can query them as if they were a single database.
For example: a company might federate its HR, sales, and finance databases to provide a unified view of employee performance metrics across departments.
Q40). What is a log-structured merge tree (LSM tree)?
An LSM tree is a data structure used in some NoSQL databases to optimize write and read performance.
For example: in a write-heavy application like a logging system, an LSM tree helps manage high volumes of write operations efficiently by batching them before merging with existing data.
Q41). What is a bloom filter?
A bloom filter is a probabilistic data structure used to test whether an element is a member of a set.
For example: in a search engine, a bloom filter can quickly check if a search term is likely to be in a large dataset, reducing the need for more costly lookups.
Q42). What is data sharding?
Data sharding involves dividing a large database into smaller, more manageable pieces, called shards, which can be distributed across different servers.
For example: a large-scale e-commerce site might shard its customer database by geographic region to improve performance and scalability.
Q43). What is a distributed ledger?
A distributed ledger is a decentralized database that is replicated across multiple nodes, ensuring all participants have access to the same data.
For example: blockchain technology is a type of distributed ledger used in cryptocurrencies to maintain a secure and immutable record of transactions.
Q44). What is eventual consistency in NoSQL databases?
Eventual consistency means that all nodes in a NoSQL database will eventually become consistent, but there may be a delay.
For example: in a distributed online game, player scores might be updated across different servers with some delay, but eventually, all scores will match.
Q45). What is a quorum in distributed databases?
A quorum is the minimum number of nodes required to agree on a read or write operation in a distributed database.
For example: if a distributed database requires a quorum of 3 out of 5 nodes to confirm a transaction, it ensures that the transaction is committed reliably even if some nodes fail.
Q46). What is a distributed database?
A distributed database is a database that is spread across multiple physical locations, either within a single organization or across different locations.
For example: a global social media platform might use a distributed database to ensure fast and reliable access to user data across different regions.