Skip to content

Commit

Permalink
adding DBMS
Browse files Browse the repository at this point in the history
  • Loading branch information
Isha307 committed Aug 8, 2023
1 parent 711290c commit 99b599d
Show file tree
Hide file tree
Showing 7 changed files with 87 additions and 0 deletions.
87 changes: 87 additions & 0 deletions DBMS/Indexing&Query_Optimizing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
## What is Indexing?
Data indexing is a crucial concept in the field of databases and information retrieval.
It refers to the process of organizing and structuring data in a way that allows for efficient and quick data retrieval.
In a database context, indexing involves creating data structures that provide a quick look-up mechanism for locating rows of data in a table.
These data structures typically consist of key-value pairs, where the key is a field or a combination of fields from the table, and the value is a reference or pointer to the actual data row.

Example: Let's assume you have a table called users with columns id, username, and email, and you want to create an index on the username column. Here's how you might do it using SQL for various database systems:

```bash
CREATE INDEX idx_username ON users(username);
```

## What is Database Sharding?
Database sharding is a database technique used to manage and distribute large datasets across multiple database instances or servers.
Sharding is primarily employed to improve scalability, performance, and availability in scenarios where a single database server cannot handle the load or storage requirements of the application.
In a sharded database, data is horizontally partitioned into smaller subsets, called shards.
Each shard contains a portion of the dataset and is stored on a separate database server.

## Query execution plans and cost estimation

Query execution plans and cost estimation are essential concepts in database management systems (DBMS) that play a crucial role in optimizing query performance. They help the DBMS determine the most efficient way to retrieve data based on the query and underlying data distribution.

** Query Execution Plan:
A query execution plan outlines the sequence of steps or operations that the DBMS will perform to execute a given query. It represents the logical and physical operations involved in retrieving data from the database. The goal of generating a query execution plan is to find the most efficient way to retrieve the requested data while minimizing resource usage (such as CPU, memory, and disk I/O).

The query execution plan typically includes operations such as:

Scanning tables or indexes
Joining multiple tables
Filtering rows based on conditions
Sorting data
Aggregating data

** Cost Estimation:
Cost estimation involves determining the expected resource consumption and execution time for each potential query execution plan. The goal is to estimate the cost of different plan alternatives and choose the one that minimizes this cost.

CPU usage
Memory usage
Disk I/O operations
Network traffic

The cost estimation process helps the query optimizer choose the most efficient execution plan by comparing the estimated costs of different alternatives. The optimizer strives to find a plan with the least resource-intensive operations and, consequently, the fastest execution time.

## Query Optimization Techniques

Query optimization techniques are critical for improving the performance of database queries.Let's explore some key query optimization techniques, including join strategies and index selection:

**Join Strategies**:
Joins are operations that combine data from two or more tables based on related columns. Optimizing join operations is essential because they can be resource-intensive, especially when dealing with large datasets. Different join strategies can be employed to optimize query performance:

1. **Nested Loop Join**: This strategy involves looping through each row of one table (outer table) and matching it with rows from the other table (inner table) using an index or a scan. It is efficient when one table is significantly smaller than the other.
```
SELECT o.OrderID, o.OrderDate, o.TotalAmount, c.FirstName, c.LastName
FROM Orders o JOIN Customers c ON o.CustomerID = c.CustomerID;
```

2. **Hash Join**: Hash joins involve hashing join key values from both tables into a hash table and then matching hash values to join the tables. It can be efficient for large join operations.

3. **Sort-Merge Join**: This strategy sorts the rows of both tables based on the join key and then merges the sorted lists. It is effective when both tables are already sorted or when dealing with large data.

**Index Selection**:
Indexes improve query performance by allowing the DBMS to quickly locate rows that satisfy specific conditions. Properly selecting and using indexes is crucial for optimizing query execution:

1. **B-Tree Indexes**: These are commonly used for equality and range queries. They provide efficient data access and support ordered data retrieval.

2. **Bitmap Indexes**: Ideal for low-cardinality columns (columns with few distinct values), such as gender or boolean columns, where a bitmap represents the presence or absence of each value.

3. **Full-Text Indexes**: Used for text-based searches to quickly locate words or phrases in large text documents.

4. **Spatial Indexes**: These are specialized indexes for spatial data, enabling efficient spatial queries like finding nearby locations.

**Table Partitioning**:
Partitioning involves dividing large database tables into smaller, more manageable pieces (partitions) based on a partitioning key. This technique can improve query performance by reducing the amount of data that needs to be scanned.

**Materialized Views**:
Materialized views store the results of a query in a separate structure, allowing for faster data retrieval. They are particularly useful for complex queries that are frequently executed.

**Caching**:
Caching involves storing frequently accessed data in memory to avoid repetitive disk reads. This technique can greatly improve the response time of queries.

**Query Rewriting and Transformation**:
DBMSs may automatically rewrite or transform queries to more efficient forms while maintaining the same result.

**Statistics and Cost Estimation**:
Accurate statistics about data distribution and table sizes are crucial for the query optimizer to estimate the cost of different query plans accurately.

These techniques are just a few examples of the many strategies that database systems use to optimize query performance. The optimal choice of technique depends on factors such as the database schema, data distribution, query patterns, and hardware resources. It's important to monitor query performance and periodically review and adjust optimization strategies as the database and application evolve.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.

0 comments on commit 99b599d

Please sign in to comment.