From 8a427c0b0f005e2ae709d2b61edc32850867f41d Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Sun, 28 Sep 2025 13:00:49 +0800 Subject: [PATCH 01/84] This PR adds a new guide on mastering partitioned tables in TiDB - Query optimization with partition pruning - Performance comparison: Non-Partitioned vs Local Index vs Global Index - Data cleanup efficiency: TTL vs DROP PARTITION - Partition drop performance: Local Index vs Global Index - Strategies to mitigate write hotspot issues with hash/key partitioning - Partition management challenges and best practices - Avoiding read/write hotspots on new partitions - Using PRE_SPLIT_REGIONS, SHARD_ROW_ID_BITS, and region splitting - Converting between partitioned and non-partitioned tables - Batch DML, Pipeline DML, IMPORT INTO, and Online DDL efficiency comparison --- tidb_partitioned_tables_guide.md | 690 +++++++++++++++++++++++++++++++ 1 file changed, 690 insertions(+) create mode 100644 tidb_partitioned_tables_guide.md diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md new file mode 100644 index 0000000000000..6fc54fa10a3e3 --- /dev/null +++ b/tidb_partitioned_tables_guide.md @@ -0,0 +1,690 @@ +# Mastering TiDB Partitioned Tables + +## Introduction + +Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. + +A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like DROP PARTITION. This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. + +Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on AUTO_INCREMENT-style IDs where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. + +While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. + +This document examines partitioned tables in TiDB from multiple angles, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it aims to equip you with the knowledge to make informed decisions about when and how to adopt partitioning strategies in your TiDB environment. 
+ +## Agenda + +- Improving query efficiency + - Partition pruning + - Query performance comparison: Non-Partitioned Table vs. Local Index vs. Global Index +- Facilitating bulk data deletion + - Data cleanup efficiency: TTL vs. Direct Partition Drop + - Partition drop efficiency: Local Index vs Global Index +- Mitigating write hotspot issues +- Partition management challenge + - How to avoid hotspots caused by new range partitions +- Converting between partitioned and non-partitioned tables + +By understanding these aspects, you can make informed decisions on whether and how to implement partitioning in your TiDB environment. + +> **Note:** If you're new to partitioned tables in TiDB, we recommend reviewing the [Partitioned Table User Guide](https://docs.pingcap.com/tidb/stable/partitioned-table) first to better understand key concepts like partition pruning, global vs. local indexes, and partition strategies. + +## Improving Query Efficiency + +### Partition Pruning + +**Partition pruning** is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions may contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead. + +#### Applicable Scenarios + +Partition pruning is most beneficial in scenarios where query predicates match the partitioning strategy. Common use cases include: + +- **Time-series data queries**: When data is partitioned by time ranges (e.g., daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. +- **Multi-tenant or category-based datasets**: Partitioning by tenant ID or category enables queries to focus on a small subset of partitions. +- **Hybrid Transactional/Analytical Processing (HTAP)**: Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing **full table scans** on large datasets. + +For more use cases, please refer to https://docs.pingcap.com/tidb/stable/partition-pruning/ + +### Query Performance on Secondary Index: Non-Partitioned Table vs. Local Index vs. Global Index + +In TiDB, local indexes are the default indexing strategy for partitioned tables. Each partition maintains its own set of indexes, while a Global Index refers to an index that spans all partitions in a partitioned table. Unlike Local Indexes, which are partition-specific and stored separately within each partition, a Global Index maintains a single, unified index across the entire table. This index includes references to all rows, regardless of which partition they belong to, and thus can provide global queries and operations, such as joins or lookups, with faster access. + +#### What Did We Test + +We evaluated query performance across three table configurations in TiDB: +- Non-Partitioned Table +- Partitioned Table with Global Index +- Partitioned Table with Local Index + +#### Test Setup + +- The query **accesses data via a secondary index** and uses IN conditions across multiple values. +- The **partitioned table** had **366 partitions**, defined by **range partitioning on a datetime column**. +- Each matching key could return **multiple rows**, simulating a **high-volume OLTP-style query pattern**. 
+- We also evaluated the **impact of different partition counts** to understand how partition granularity influences latency and index performance. + +#### Schema + +```sql +CREATE TABLE `fa` ( + `id` bigint NOT NULL AUTO_INCREMENT, + `account_id` bigint(20) NOT NULL, + `sid` bigint(20) DEFAULT NULL, + `user_id` bigint NOT NULL, + `yeardate` int NOT NULL, + PRIMARY KEY (`id`,`yeardate`) /*T![clustered_index] CLUSTERED */, + KEY `index_fa_on_sid` (`sid`), + KEY `index_fa_on_account_id` (`account_id`), + KEY `index_fa_on_user_id` (`user_id`) +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin +AUTO_INCREMENT=1284046228560811404 +PARTITION BY RANGE (`yeardate`) +(PARTITION `fa_2024001` VALUES LESS THAN (2024001), +PARTITION `fa_2024002` VALUES LESS THAN (2024002), +PARTITION `fa_2024003` VALUES LESS THAN (2024003), +... +... +PARTITION `fa_2024366` VALUES LESS THAN (2024366)) +``` + +#### SQL + +```sql +SELECT `fa`.* +FROM `fa` +WHERE `fa`.`sid` IN ( + 1696271179344, + 1696317134004, + 1696181972136, + ... + 1696159221765 +); +``` + +- Query filters on secondary index, but does **not include the partition key**. +- Causes **Local Index** to scan across all partitions due to lack of pruning. +- Table lookup tasks are significantly higher for partitioned tables. + +#### Findings + +Data came from a table with **366 range partitions** (e.g., by date). +- The **Average Query Time** was obtained from the statement_summary view. +- The query used a **secondary index** and returned **400 rows**. + +Metrics collected: +- **Average Query Time**: from statement_summary +- **Cop Tasks** (Index Scan + Table Lookup): from execution plan + +#### Test Results + +| Configuration | Average Query Time | Cop task for index range scan | Cop task for table lookup | Total Cop tasks | Key Takeaways | +|---|---|---|---|---|---| +| Non-Partitioned Table | 12.6 ms | 72 | 79 | 151 | Delivering the best performance with the fewest Cop tasks — ideal for most OLTP use cases. | +| Partitioned Table with Local Index | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries will scan all partitions. | +| Partitioned Table with Global Index | 14.8 ms | 69 | 383 | 452 | Improving index scan efficiency, but table lookups can still be expensive if many rows match. 
| + +#### Execution Plan Examples + +**Non-partitioned table** + +```yaml +| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | +| IndexLookUp_7 | 398.73 | 787052.13 | 400 | root | | time:11.5ms, loops:2, index_task: {total_time: 3.34ms, fetch_handle: 3.34ms, build: 600ns, wait: 2.86µs}, table_task: {total_time: 7.55ms, num: 1, concurrency: 5}, next: {wait_index: 3.49ms, wait_table_lookup_build: 492.5µs, wait_table_lookup_resp: 7.05ms} | | 706.7 KB | N/A | +| ├─IndexRangeScan_5(Build) | 398.73 | 90633.86 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:3.16ms, loops:3, cop_task: {num: 72, max: 780.4µs, min: 394.2µs, avg: 566.7µs, p95: 748µs, max_proc_keys: 20, p95_proc_keys: 10, tot_proc: 3.66ms, tot_wait: 18.6ms, copr_cache_hit_ratio: 0.00, build_task_duration: 94µs, max_distsql_concurrency: 15}, rpc_info:{Cop:{num_rpc:72, total_time:40.1ms}}, tikv_task:{proc max:1ms, min:0s, avg: 27.8µs, p80:0s, p95:0s, iters:72, tasks:72}, scan_detail: {total_process_keys: 400, total_process_keys_size: 22800, total_keys: 480, get_snapshot_time: 17.7ms, rocksdb: {key_skipped_count: 400, block: {cache_hit_count: 160}}}, time_detail: {total_process_time: 3.66ms, total_wait_time: 18.6ms, total_kv_read_wall_time: 2ms, tikv_wall_time: 27.4ms} | range:[1696125963161,1696125963161], [1696126443462,1696126443462], ..., keep order:false | N/A | N/A | +| └─TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task: {num: 79, max: 4.98ms, min: 0s, avg: 514.9µs, p95: 3.75ms, max_proc_keys: 10, p95_proc_keys: 5, tot_proc: 15ms, tot_wait: 21.4ms, copr_cache_hit_ratio: 0.00, build_task_duration: 341.2µs, max_distsql_concurrency: 1, max_extra_concurrency: 7, store_batch_num: 62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail: {total_process_keys: 400, total_process_keys_size: 489856, total_keys: 800, get_snapshot_time: 20.8ms, rocksdb: {key_skipped_count: 400, block: {cache_hit_count: 1600}}}, time_detail: {total_process_time: 15ms, total_wait_time: 21.4ms, tikv_wall_time: 10.9ms} | keep order:false | N/A | N/A | +``` + +[Similar detailed execution plans for partitioned tables with global and local indexes would follow...] + +#### How to Create a Global Index on a Partitioned Table in TiDB + +**Option 1: Add via ALTER TABLE** + +```sql +ALTER TABLE +ADD UNIQUE INDEX (col1, col2) GLOBAL; +``` + +- Adds a global index to an existing partitioned table. +- GLOBAL must be explicitly specified. +- You can also use ADD INDEX for non-unique global indexes. + +**Option 2: Define Inline on Table Creation** + +```sql +CREATE TABLE t ( + id BIGINT NOT NULL, + col1 VARCHAR(50), + col2 VARCHAR(50), + -- other columns... + + UNIQUE GLOBAL INDEX idx_col1_col2 (col1, col2) +) +PARTITION BY RANGE (id) ( + PARTITION p0 VALUES LESS THAN (10000), + PARTITION p1 VALUES LESS THAN (20000), + PARTITION pMax VALUES LESS THAN MAXVALUE +); +``` + +#### Summary + +The performance overhead of partitioned tables in TiDB depends significantly on the number of partitions and the type of index used. + +- The more partitions you have, the more severe the potential performance degradation. +- With a smaller number of partitions, the impact may not be as noticeable, but it's still workload-dependent. 
+- For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs triggered. This means more partitions will likely result in more RPCs, leading to higher latency. +- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (i.e., the number of rows requiring table lookups). + +#### Recommendation + +- Avoid partitioned tables unless truly necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. +- If you must use partitioned tables, benchmark both global Index and local Index strategies under your workload. +- Use global indexes when query performance across partitions is critical. +- Choose local indexes only if your main concern is DDL efficiency, such as fast DROP PARTITION, and the performance side effect from the partition table is acceptable. + +## Facilitating Bulk Data Deletion + +### Data Cleanup Efficiency: TTL vs. Direct Partition Drop + +In TiDB, historical data cleanup can be handled either by **TTL (Time-to-Live)** or **manual partition drop**. While both methods serve the same purpose, they differ significantly in performance. Our tests show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. + +#### What's the difference? + +- **TTL**: Automatically removes data based on its age, but may be slower due to the need to scan and clean data over time. +- **Partition Drop**: Deletes an entire partition at once, making it much faster, especially when dealing with large datasets. + +#### What Did We Test + +To compare the performance of TTL and partition drop, we configured TTL to execute every 10 minutes and created a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches were tested under background write loads of 50 and 100 concurrent threads. We measured key metrics such as execution time, system resource utilization, and the total number of rows deleted. + +#### Findings + +**TTL Performance:** +- On a high-write table, TTL runs every 10 minutes. +- With 50 threads, each TTL job took 8–10 minutes, deleting 7–11 million rows. +- With 100 threads, it handled up to 20 million rows, but execution time increased to 15–30 minutes, with greater variance. +- TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. + +**Partition Drop Performance:** +- DROP PARTITION removes an entire data segment instantly, with minimal resource usage. +- DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. + +#### How to Use TTL and Partition Drop in TiDB + +In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), please refer to the official documentation at https://docs.pingcap.com/tidb/stable/time-to-live/. 
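To see how long each TTL job takes and how many rows it removes on your own tables, you can query TiDB's TTL job history. The following is a minimal sketch — the `mysql.tidb_ttl_job_history` table is available in recent TiDB versions, but its exact columns may vary by release:

```sql
-- Recent TTL jobs for the test table: duration and rows removed per run
SELECT table_schema, table_name, create_time, finish_time,
       expired_rows, deleted_rows, status
FROM mysql.tidb_ttl_job_history
WHERE table_name = 'ad_cache'
ORDER BY create_time DESC
LIMIT 10;
```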
+ +**TTL schema** + +```sql +CREATE TABLE `ad_cache` ( + `session` varchar(255) NOT NULL, + `ad_id` varbinary(255) NOT NULL, + `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, + `suffix` bigint(20) NOT NULL, + `expire_time` timestamp NULL DEFAULT NULL, + `data` mediumblob DEFAULT NULL, + `version` int(11) DEFAULT NULL, + `is_delete` tinyint(1) DEFAULT NULL, + PRIMARY KEY (`session`, `ad_id`, `create_time`, `suffix`) +) +ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin +TTL=`expire_time` + INTERVAL 0 DAY TTL_ENABLE='ON' +TTL_JOB_INTERVAL='10m' +``` + +**Drop Partition (Range INTERVAL partitioning)** + +```sql +CREATE TABLE `ad_cache` ( + `session_id` varchar(255) NOT NULL, + `external_id` varbinary(255) NOT NULL, + `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, + `id_suffix` bigint(20) NOT NULL, + `expire_time` timestamp NULL DEFAULT NULL, + `cache_data` mediumblob DEFAULT NULL, + `data_version` int(11) DEFAULT NULL, + `is_deleted` tinyint(1) DEFAULT NULL, + PRIMARY KEY ( + `session_id`, `external_id`, + `create_time`, `id_suffix` + ) NONCLUSTERED +) +SHARD_ROW_ID_BITS=7 +PRE_SPLIT_REGIONS=2 +PARTITION BY RANGE COLUMNS (create_time) +INTERVAL (10 MINUTE) +FIRST PARTITION LESS THAN ('2025-02-19 18:00:00') +... +LAST PARTITION LESS THAN ('2025-02-19 20:00:00') +``` + +It's required to run DDL alter table partition ... to change the FIRST PARTITION and LAST PARTITION periodically. These two DDL statements can drop the old partitions and create new ones. + +```sql +ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}") +ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}") +``` + +#### Recommendation + +For workloads with **large or time-based data cleanup**, prefer using **partitioned tables with DROP PARTITION**. It offers better performance, lower system impact, and simpler management. TTL is still useful for finer-grained or background cleanup but may not be optimal under high write pressure or when deleting large volumes of data quickly. + +### Partition Drop Efficiency: Local Index vs Global Index + +Partition table with Global Index requires synchronous updates to the global index, potentially increasing significant execution time for DDL operations, such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION. In this section, the tests show that DROP PARTITION is much slower when using a **Global Index** compared to a **Local Index**. This should be considered when designing partitioned tables. + +#### What Did We Test + +We created a table with **366 partitions** and tested the DROP PARTITION performance using both **Global Index** and **Local Index**. The total number of rows was **1 billion**. + +| Index Type | Duration (drop partition) | +|---|---| +| Global Index | 1 min 16.02 s | +| Local Index | 0.52 s | + +#### Findings + +Dropping a partition on a table with a Global Index took **76 seconds**, while the same operation with a Local Index took only **0.52 seconds**. The reason is that Global Indexes span all partitions and require more complex updates, while Local Indexes are limited to individual partitions and are easier to handle. 
+ +**Global Index** + +```sql +mysql> alter table A drop partition A_2024363; +Query OK, 0 rows affected (1 min 16.02 sec) +``` + +**Local Index** + +```sql +mysql> alter table A drop partition A_2024363; +Query OK, 0 rows affected (0.52 sec) +``` + +#### Recommendation + +When a partitioned table contains global indexes, performing certain DDL operations such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. + +If you need to drop partitions frequently and minimize the performance impact the the system, it's better to use **Local Indexes** for faster and more efficient operations. + +## Mitigating Write Hotspot Issues + +### Background + +In TiDB, **write hotspots** occur when incoming write traffic is unevenly distributed across Regions. + +This is common when the primary key is **monotonically increasing**—for example, an AUTO_INCREMENT primary key with AUTO_ID_CACHE=1, or secondary index on datetime column with default value set to CURRENT_TIMESTAMP—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: + +- A single Region handling most of the write workload, while other Regions remain idle. +- Higher write latency and reduced throughput. +- Limited performance gains from scaling out TiKV nodes, as the bottleneck remains concentrated on one Region. + +**Partitioned tables** can help mitigate this problem. By applying **hash** or **key** partitioning on the primary key, TiDB can spread inserts across multiple partitions (and therefore multiple Regions), reducing hotspot contention. + +### How It Works + +TiDB stores table data in **Regions**, each covering a continuous range of row keys. + +When the primary key is AUTO_INCREMENT and the secondary indexes on datetime columns are monotonically increasing: + +**Without Partitioning:** +- New rows always have the highest key values and are inserted into the same "last Region." +- That Region is served by one TiKV node at a time, becoming a single write bottleneck. + +**With Hash/Key Partitioning:** +- The table and the secondary indexes are split into multiple partitions using a hash or key function on the primary key or indexed columns. +- Each partition has its own set of Regions, often distributed across different TiKV nodes. +- Inserts are spread across multiple Regions in parallel, improving load distribution and throughput. + +### Use Case + +If a table with an AUTO_INCREMENT primary key experiences heavy bulk inserts and suffers from write hotspot issues, applying **hash** or **key** partitioning on the primary key can help distribute the write load more evenly. + +```sql +CREATE TABLE server_info ( + id bigint NOT NULL AUTO_INCREMENT, + serial_no varchar(100) DEFAULT NULL, + device_name varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL, + device_type varchar(50) DEFAULT NULL, + modified_ts timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, + PRIMARY KEY (id) /*T![clustered_index] CLUSTERED */, + KEY idx_serial_no (serial_no), + KEY idx_modified_ts (modified_ts) +) /*T![auto_id_cache] AUTO_ID_CACHE=1 */ +PARTITION BY KEY (id) PARTITIONS 16; +``` + +### Pros + +- **Balanced Write Load** — Hotspots are spread across multiple partitions, reducing contention and improving insert performance. 
+- **Query Optimization via Partition Pruning** — If queries already filter by the partition key, TiDB can prune unused partitions, scanning less data and improving query speed. + +### Cons + +**Potential Query Performance Drop Without Partition Pruning** + +When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This may significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: + +```sql +select * from server_info where `serial_no` = ? +``` + +**Mitigation**: Add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down DROP PARTITION operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: + +```sql +ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; +``` + +## Partition Management Challenge + +### How to Avoid Hotspots Caused by New Range Partitions + +#### Overview + +New range partitions in a partitioned table can easily lead to hotspot issues in TiDB. This section outlines common scenarios and mitigation strategies to avoid read and write hotspots caused by range partitions. + +#### Common Hotspot Scenarios + +**Read Hotspot** + +When using **range-partitioned tables**, if queries do **not** filter data using the partition key, new empty partitions can easily become read hotspots. + +**Root Cause:** +By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions may be merged into a **single region**. + +**Impact:** +When a query does **not filter by partition key**, TiDB will **scan all partitions** (as seen in the execution plan partition:all). As a result, the single region holding multiple empty partitions will be scanned repeatedly, leading to a **read hotspot**. + +**Write Hotspot** + +When using a time-based field as the partition key, a write hotspot may occur when switching to a new partition: + +**Root Cause:** +In TiDB, any newly created table or partition initially contains only **one region** (data block), which is randomly placed on a single TiKV node. As data begins to be written, this region will eventually **split** into multiple regions, and PD will schedule these new regions to other TiKV nodes. + +However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it may not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. + +**Impact:** +This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn may impact the overall read and write performance of the cluster. + +#### Solutions + +**1. 
NONCLUSTERED Partitioned Table** + +**Pros:** +- When a new partition is created in a **NONCLUSTERED Partitioned Table** configured with SHARD_ROW_ID_BITS and [PRE_SPLIT_REGIONS](https://docs.pingcap.com/tidb/stable/sql-statement-split-region/#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. +- Lower operational overhead. + +**Cons:** +- Queries using **Point Get** or **Table Range Scan** will require **more table lookups**, which can degrade read performance for such query types. + +**Recommendation:** +- Suitable for workloads where write scalability and operational ease are more critical than low-latency reads. + +**Best Practices** + +Create a partitioned table with SHARD_ROW_ID_BITS and PRE_SPLIT_REGIONS to pre-split table regions. The value of PRE_SPLIT_REGIONS must be less than or equal to that of SHARD_ROW_ID_BITS. The number of pre-split Regions for each partition is 2^(PRE_SPLIT_REGIONS). + +```sql +CREATE TABLE employees ( + id INT NOT NULL, + fname VARCHAR(30), + lname VARCHAR(30), + hired DATE NOT NULL DEFAULT '1970-01-01', + separated DATE DEFAULT '9999-12-31', + job_code INT, + store_id INT, + PRIMARY KEY (`id`,`hired`) NONCLUSTERED, + KEY `idx_employees_on_store_id` (`store_id`) +)SHARD_ROW_ID_BITS = 2 PRE_SPLIT_REGIONS=2 +PARTITION BY RANGE ( YEAR(hired) ) ( + PARTITION p0 VALUES LESS THAN (1991), + PARTITION p1 VALUES LESS THAN (1996), + PARTITION p2 VALUES LESS THAN (2001), + PARTITION p3 VALUES LESS THAN (2006) +); +``` + +Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. + +```sql +-- table +ALTER TABLE employees ATTRIBUTES 'merge_option=deny'; +-- partition +ALTER TABLE employees PARTITION `p3` ATTRIBUTES 'merge_option=deny'; +``` + +**Determining split boundaries based on existing business data** + +To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. + +**Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: + +```sql +SELECT MIN(id), MAX(id) FROM employees; +``` + +- If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. +- For **composite primary keys** or **composite indexes**, only the **leftmost column** needs to be considered when deciding split boundaries. +- If the leftmost column is a **string**, take string length and distribution into account to ensure even data spread. + +**Pre-split and scatter regions** + +A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. This helps ensure that data is more evenly distributed across the cluster from the start. 
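To turn this guideline into a concrete `REGIONS` value, first check how many TiKV nodes the cluster has. One way to do this is shown below (a quick sketch; note that TiFlash nodes are also listed as stores, so exclude them from the count if your cluster has any):

```sql
-- List the stores known to PD and count the TiKV ones to size the REGIONS value
SELECT STORE_ID, ADDRESS, STORE_STATE_NAME
FROM INFORMATION_SCHEMA.TIKV_STORE_STATUS;
```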
+ +**Splitting regions for the primary key of all partitions** + +To split regions for the primary key of all partitions in a partitioned table, you can use a command like: + +```sql +SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS ; +``` + +This example will split each partition's primary key range into `` regions between the specified boundary values. + +**Splitting Regions for the secondary index of all partitions.** + +```sql +SPLIT PARTITION TABLE employees INDEX `idx_employees_on_store_id` BETWEEN (1) AND (1000) REGIONS ; +``` + +**(Optional) When adding a new partition, you MUST manually split regions for its primary key and indices.** + +```sql +ALTER TABLE employees ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); + +SHOW TABLE employees PARTITION (p4) regions; + +SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "2006-01-01") AND (100000, "2011-01-01") REGIONS ; + +SPLIT PARTITION TABLE employees PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; + +SHOW TABLE employees PARTITION (p4) regions; +``` + +**2. CLUSTERED Partitioned Table** + +**Pros:** +- Queries using **Point Get** or **Table Range Scan** do **not** need additional lookups, resulting in better **read performance**. + +**Cons:** +- **Manual region splitting** is required when creating new partitions, increasing operational complexity. + +**Recommendation:** +- Ideal when low-latency point queries are important and operational resources are available to manage region splitting. + +**Best Practices** + +Create a CLUSTERED partitioned table. + +```sql +CREATE TABLE employees2 ( + id INT NOT NULL, + fname VARCHAR(30), + lname VARCHAR(30), + hired DATE NOT NULL DEFAULT '1970-01-01', + separated DATE DEFAULT '9999-12-31', + job_code INT, + store_id INT, + PRIMARY KEY (`id`,`hired`) CLUSTERED, + KEY `idx_employees2_on_store_id` (`store_id`) +) +PARTITION BY RANGE ( YEAR(hired) ) ( + PARTITION p0 VALUES LESS THAN (1991), + PARTITION p1 VALUES LESS THAN (1996), + PARTITION p2 VALUES LESS THAN (2001), + PARTITION p3 VALUES LESS THAN (2006) +); +``` + +Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. + +```sql +ALTER TABLE employees2 ATTRIBUTES 'merge_option=deny'; +``` + +**Determining split boundaries based on existing business data** + +To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. + +**Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: + +```sql +SELECT MIN(id), MAX(id) FROM employees; +``` + +- If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. 
+- For **composite primary keys** or **composite indexes**, only the **leftmost column** needs to be considered when deciding split boundaries. +- If the leftmost column is a **string**, take string length and distribution into account to ensure even data spread. + +**Pre-split and scatter regions** + +A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. This helps ensure that data is more evenly distributed across the cluster from the start. + +**Splitting regions for all partitions** + +```sql +SPLIT PARTITION TABLE employees2 BETWEEN (1,"1970-01-01") AND (100000,"9999-12-31") REGIONS ; +``` + +**Splitting regions for the secondary index of all partitions.** + +```sql +SPLIT PARTITION TABLE employees2 INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; +``` + +**(Optional) When adding a new partition, you MUST manually split regions for the specific partition and its indices.** + +```sql +ALTER TABLE employees2 ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); + +show table employees2 PARTITION (p4) regions; + +SPLIT PARTITION TABLE employees2 PARTITION (p4) BETWEEN (1,"2006-01-01") AND (100000,"2011-01-01") REGIONS ; + +SPLIT PARTITION TABLE employees2 PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; + +show table employees2 PARTITION (p4) regions; +``` + +**3. CLUSTERED Non-partitioned Table** + +**Pros:** +- **No hotspot risk from new partitions**. +- Provides **good read performance** for point and range queries. + +**Cons:** +- **Cannot use DROP PARTITION** to clean up large volumes of old data. + +**Recommendation:** +- Best suited for use cases that require stable performance and do not benefit from partition-based data management. + +### Summary Table + +| Approach | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | +|---|---|---|---|---|---| +| NONCLUSTERED Partitioned | Low (with merge_option=deny) | Low (auto pre-split) | Low | Moderate (extra lookups) | Fast (DROP PARTITION) | +| CLUSTERED Partitioned | Medium (manual intervention) | Medium (manual split) | High | High (direct access) | Fast (DROP PARTITION) | +| CLUSTERED Non-partitioned | None | Medium (single table) | Low | High | Slow (DELETE/TTL) | + +## Converting Between Partitioned and Non-Partitioned Tables + +When working with large tables (e.g., 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: + +1. Batch DML: `INSERT INTO ... SELECT ...` +2. Pipeline DML: `INSERT INTO ... SELECT ...` +3. Import into: `IMPORT INTO ... FROM SELECT ...` +4. Online DDL: Direct schema transformation via `ALTER TABLE` + +This section compares the efficiency and implications of both methods in both directions of conversion, and provides best practice recommendations. + +### Method 1: Batch DML INSERT INTO ... SELECT ... + +**By Default** + +```sql +SET tidb_mem_quota_query = 0; +INSERT INTO fa_new SELECT * FROM fa; +-- 120 million rows copied in 1h 52m 47s +``` + +### Method 2: Pipeline DML INSERT INTO ... SELECT... + +```sql +SET tidb_dml_type = "bulk"; +SET tidb_mem_quota_query = 0; +SET tidb_enable_mutation_checker = OFF; +INSERT INTO fa_new SELECT * FROM fa; +-- 120 million rows copied in 58m 42s +``` + +### Method 3: IMPORT INTO ... FROM SELECT ... 
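`IMPORT INTO ... FROM SELECT` writes the query result through TiDB's physical import path instead of the normal transactional write path, which is why it is the fastest option in this comparison. Note that, as a general restriction of the statement (not specific to this test), the target table must be empty before the import starts. The command used in this test is shown below.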
+ +```sql +mysql> import into fa_new from select * from fa with thread=32,disable_precheck; +Query OK, 120000000 rows affected, 1 warning (16 min 49.90 sec) +Records: 120000000, ID: c1d04eec-fb49-49bb-af92-bf3d6e2d3d87 +``` + +### Method 4: Online DDL + +**From partition table to non-partitioned table** + +```sql +SET @@global.tidb_ddl_reorg_worker_cnt = 16; +SET @@global.tidb_ddl_reorg_batch_size = 4096; + +mysql> alter table fa REMOVE PARTITIONING; +-- real 170m12.024s (≈ 2h 50m) +``` + +**From non-partition table to partitioned table** + +```sql +SET @@global.tidb_ddl_reorg_worker_cnt = 16; +SET @@global.tidb_ddl_reorg_batch_size = 4096; +ALTER TABLE fa PARTITION BY RANGE (`yearweek`) +(PARTITION `fa_2024001` VALUES LESS THAN (2024001), +PARTITION `fa_2024002` VALUES LESS THAN (2024002), +... +PARTITION `fa_2024365` VALUES LESS THAN (2024365), +PARTITION `fa_2024366` VALUES LESS THAN (2024366)); + +Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) +``` + +### Findings + +| Method | Time Taken | +|---|---| +| Method 1: Batch DML INSERT INTO ... SELECT | 1h 52m 47s | +| Method 2: Pipeline DML: INSERT INTO ... SELECT ... | 58m 42s | +| Method 3: IMPORT INTO ... FROM SELECT ... | 16m 59s | +| Method 4: Online DDL (From partition table to non-partitioned table) | 2h 50m | +| Method 4: Online DDL (From non-partition table to partitioned table) | 2h 31m | + +### Recommendation + +TiDB offers two approaches for converting tables between partitioned and non-partitioned states: + +- Choose the offline method when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. \ No newline at end of file From aa9e29930f22c2a3c73e981d74ac5988d952fed3 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Sun, 28 Sep 2025 13:13:50 +0800 Subject: [PATCH 02/84] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 6fc54fa10a3e3..88eb0d0df2d0d 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -687,4 +687,4 @@ Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) TiDB offers two approaches for converting tables between partitioned and non-partitioned states: -- Choose the offline method when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. \ No newline at end of file +- Choose an offline method like `IMPORT INTO` when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. 
\ No newline at end of file From d11ba34d53c8b4aa79b0df5452e3e8afb357ad69 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Sun, 28 Sep 2025 13:14:40 +0800 Subject: [PATCH 03/84] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 88eb0d0df2d0d..b48633c114643 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -305,7 +305,7 @@ Query OK, 0 rows affected (0.52 sec) When a partitioned table contains global indexes, performing certain DDL operations such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. -If you need to drop partitions frequently and minimize the performance impact the the system, it's better to use **Local Indexes** for faster and more efficient operations. +If you need to drop partitions frequently and minimize the performance impact on the system, it's better to use **local indexes** for faster and more efficient operations. ## Mitigating Write Hotspot Issues From 2b37ec83fe35f137121e0fc7fe0f576f09e80f38 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Sun, 28 Sep 2025 13:14:51 +0800 Subject: [PATCH 04/84] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index b48633c114643..55558a0b0800f 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -179,7 +179,7 @@ The performance overhead of partitioned tables in TiDB depends significantly on #### Recommendation - Avoid partitioned tables unless truly necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. -- If you must use partitioned tables, benchmark both global Index and local Index strategies under your workload. +- If you must use partitioned tables, benchmark both global index and local index strategies under your workload. - Use global indexes when query performance across partitions is critical. - Choose local indexes only if your main concern is DDL efficiency, such as fast DROP PARTITION, and the performance side effect from the partition table is acceptable. 
From ae8555d8f8b049bb757a0a922fdd939ad9721b1b Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Sun, 28 Sep 2025 13:15:04 +0800 Subject: [PATCH 05/84] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 55558a0b0800f..2e957f04b6e35 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -550,7 +550,7 @@ To avoid hotspots when a new table or partition is created, it is often benefici **Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: ```sql -SELECT MIN(id), MAX(id) FROM employees; +SELECT MIN(id), MAX(id) FROM employees2; ``` - If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. From a1563b98e2f2db762a43a0599a3d742b9f049aea Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Sun, 28 Sep 2025 16:29:25 +0800 Subject: [PATCH 06/84] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 28 +++++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 6fc54fa10a3e3..61b6592e97b24 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -128,12 +128,30 @@ Metrics collected: **Non-partitioned table** ```yaml -| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | -| IndexLookUp_7 | 398.73 | 787052.13 | 400 | root | | time:11.5ms, loops:2, index_task: {total_time: 3.34ms, fetch_handle: 3.34ms, build: 600ns, wait: 2.86µs}, table_task: {total_time: 7.55ms, num: 1, concurrency: 5}, next: {wait_index: 3.49ms, wait_table_lookup_build: 492.5µs, wait_table_lookup_resp: 7.05ms} | | 706.7 KB | N/A | -| ├─IndexRangeScan_5(Build) | 398.73 | 90633.86 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:3.16ms, loops:3, cop_task: {num: 72, max: 780.4µs, min: 394.2µs, avg: 566.7µs, p95: 748µs, max_proc_keys: 20, p95_proc_keys: 10, tot_proc: 3.66ms, tot_wait: 18.6ms, copr_cache_hit_ratio: 0.00, build_task_duration: 94µs, max_distsql_concurrency: 15}, rpc_info:{Cop:{num_rpc:72, total_time:40.1ms}}, tikv_task:{proc max:1ms, min:0s, avg: 27.8µs, p80:0s, p95:0s, iters:72, tasks:72}, scan_detail: {total_process_keys: 400, total_process_keys_size: 22800, total_keys: 480, get_snapshot_time: 17.7ms, rocksdb: {key_skipped_count: 400, block: {cache_hit_count: 160}}}, time_detail: {total_process_time: 3.66ms, total_wait_time: 18.6ms, total_kv_read_wall_time: 2ms, tikv_wall_time: 27.4ms} | range:[1696125963161,1696125963161], [1696126443462,1696126443462], ..., keep order:false | N/A | N/A | -| └─TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task: {num: 79, max: 4.98ms, min: 0s, avg: 514.9µs, p95: 3.75ms, max_proc_keys: 10, p95_proc_keys: 5, tot_proc: 15ms, tot_wait: 21.4ms, copr_cache_hit_ratio: 0.00, build_task_duration: 341.2µs, max_distsql_concurrency: 1, max_extra_concurrency: 7, store_batch_num: 62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, 
p95:0s, iters:79, tasks:79}, scan_detail: {total_process_keys: 400, total_process_keys_size: 489856, total_keys: 800, get_snapshot_time: 20.8ms, rocksdb: {key_skipped_count: 400, block: {cache_hit_count: 1600}}}, time_detail: {total_process_time: 15ms, total_wait_time: 21.4ms, tikv_wall_time: 10.9ms} | keep order:false | N/A | N/A | +| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | +|---------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|----------|------| +| IndexLookUp_7 | 398.73 | 787052.13 | 400 | root | | time:11.5ms, loops:2, index_task:{total_time:3.34ms, fetch_handle:3.34ms, build:600ns, wait:2.86µs}, table_task:{total_time:7.55ms, num:1, concurrency:5}, next:{wait_index:3.49ms, wait_table_lookup_build:492.5µs, wait_table_lookup_resp:7.05ms} | | 706.7 KB | N/A | +| IndexRangeScan_5(Build) | 398.73 | 90633.86 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:3.16ms, loops:3, cop_task:{num:72, max:780.4µs, min:394.2µs, avg:566.7µs, p95:748µs, max_proc_keys:20, p95_proc_keys:10, tot_proc:3.66ms, tot_wait:18.6ms, copr_cache_hit_ratio:0.00, build_task_duration:94µs, max_distsql_concurrency:15}, rpc_info:{Cop:{num_rpc:72, total_time:40.1ms}}, tikv_task:{proc max:1ms, min:0s, avg:27.8µs, p80:0s, p95:0s, iters:72, tasks:72}, scan_detail:{total_process_keys:400, total_process_keys_size:22800, total_keys:480, get_snapshot_time:17.7ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:160}}}, time_detail:{total_process_time:3.66ms, total_wait_time:18.6ms, total_kv_read_wall_time:2ms, tikv_wall_time:27.4ms} | range:[1696125963161,1696125963161], …, [1696317134004,1696317134004], keep order:false | N/A | N/A | +| TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task:{num:79, max:4.98ms, min:0s, avg:514.9µs, p95:3.75ms, max_proc_keys:10, p95_proc_keys:5, tot_proc:15ms, tot_wait:21.4ms, copr_cache_hit_ratio:0.00, build_task_duration:341.2µs, max_distsql_concurrency:1, max_extra_concurrency:7, store_batch_num:62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg:0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:20.8ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1600}}}, time_detail:{total_process_time:15ms, total_wait_time:21.4ms, tikv_wall_time:10.9ms} | keep order:false | N/A | N/A | ``` +**Partition table with global index** +```yaml +| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | +|------------------------|---------|-----------|---------|-----------|-------------------------------------------------|----------------|---------------|----------|------| +| IndexLookUp_8 | 398.73 | 786959.21 | 400 | root | partition:all | time:12.8ms, loops:2, index_task:{total_time:2.71ms, fetch_handle:2.71ms, build:528ns, wait:3.23µs}, table_task:{total_time:9.03ms, num:1, concurrency:5}, next:{wait_index:3.27ms, wait_table_lookup_build:1.49ms, wait_table_lookup_resp:7.53ms} | | 693.9 KB | N/A | +| IndexRangeScan_5(Build)| 398.73 | 102593.43 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid_global(sid, id)| time:2.49ms, loops:3, cop_task:{num:69, max:997µs, min:213.8µs, avg:469.8µs, p95:986.6µs, max_proc_keys:15, p95_proc_keys:10, tot_proc:13.4ms, tot_wait:1.52ms, copr_cache_hit_ratio:0.00, 
build_task_duration:498.4µs, max_distsql_concurrency:15}, rpc_info:{Cop:{num_rpc:69, total_time:31.8ms}}, tikv_task:{proc max:1ms, min:0s, avg:101.4µs, p80:0s, p95:1ms, iters:69, tasks:69}, scan_detail:{total_process_keys:400, total_process_keys_size:31200, total_keys:480, get_snapshot_time:679.9µs, rocksdb:{key_skipped_count:400, block:{cache_hit_count:189, read_count:54, read_byte:347.7 KB, read_time:6.17ms}}}, time_detail:{total_process_time:13.4ms, total_wait_time:1.52ms, total_kv_read_wall_time:7ms, tikv_wall_time:19.3ms} | range:[1696125963161,1696125963161], …, keep order:false, stats:partial[...] | N/A | N/A | +| TableRowIDScan_6(Probe)| 398.73 | 165221.64 | 400 | cop[tikv] | table:fa | time:7.47ms, loops:2, cop_task:{num:383, max:4.07ms, min:0s, avg:488.5µs, p95:2.59ms, max_proc_keys:2, p95_proc_keys:1, tot_proc:203.3ms, tot_wait:429.5ms, copr_cache_hit_ratio:0.00, build_task_duration:1.3ms, max_distsql_concurrency:1, max_extra_concurrency:31, store_batch_num:305}, rpc_info:{Cop:{num_rpc:78, total_time:186.3ms}}, tikv_task:{proc max:3ms, min:0s, avg:517µs, p80:1ms, p95:1ms, iters:383, tasks:383}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:2.99ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1601, read_count:799, read_byte:10.1 MB, read_time:131.6ms}}}, time_detail:{total_process_time:203.3ms, total_suspend_time:6.31ms, total_wait_time:429.5ms, total_kv_read_wall_time:198ms, tikv_wall_time:163ms} | keep order:false, stats:partial[...] | N/A | N/A | +``` + +**Partition table with local index** +```yaml +| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | +|------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|---------|-------| +| IndexLookUp_7 | 398.73 | 784450.63 | 400 | root | partition:all | time:290.8ms, loops:2, index_task:{total_time:103.6ms, fetch_handle:7.74ms, build:133.2µs, wait:95.7ms}, table_task:{total_time:551.1ms, num:217, concurrency:5}, next:{wait_index:179.6ms, wait_table_lookup_build:391µs, wait_table_lookup_resp:109.5ms} | | 4.30 MB | N/A | +| IndexRangeScan_5(Build)| 398.73 | 90633.73 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:10.8ms, loops:800, cop_task:{num:600, max:65.6ms, min:1.02ms, avg:22.2ms, p95:45.1ms, max_proc_keys:5, p95_proc_keys:3, tot_proc:6.81s, tot_wait:4.77s, copr_cache_hit_ratio:0.00, build_task_duration:172.8ms, max_distsql_concurrency:3}, rpc_info:{Cop:{num_rpc:600, total_time:13.3s}}, tikv_task:{proc max:54ms, min:0s, avg:13.9ms, p80:20ms, p95:30ms, iters:600, tasks:600}, scan_detail:{total_process_keys:400, total_process_keys_size:22800, total_keys:29680, get_snapshot_time:2.47s, rocksdb:{key_skipped_count:400, block:{cache_hit_count:117580, read_count:29437, read_byte:104.9 MB, read_time:3.24s}}}, time_detail:{total_process_time:6.81s, total_suspend_time:1.51s, total_wait_time:4.77s, total_kv_read_wall_time:8.31s, tikv_wall_time:13.2s}} | range:[1696125963161,...,1696317134004], keep order:false, stats:partial[...] 
| N/A | N/A | +| TableRowIDScan_6(Probe)| 398.73 | 165221.49 | 400 | cop[tikv] | table:fa | time:514ms, loops:434, cop_task:{num:375, max:31.6ms, min:0s, avg:1.33ms, p95:1.67ms, max_proc_keys:2, p95_proc_keys:2, tot_proc:220.7ms, tot_wait:242.2ms, copr_cache_hit_ratio:0.00, build_task_duration:27.8ms, max_distsql_concurrency:1, max_extra_concurrency:1, store_batch_num:69}, rpc_info:{Cop:{num_rpc:306, total_time:495.5ms}}, tikv_task:{proc max:6ms, min:0s, avg:597.3µs, p80:1ms, p95:1ms, iters:375, tasks:375}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:158.3ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:3197, read_count:803, read_byte:10.2 MB, read_time:113.5ms}}}, time_detail:{total_process_time:220.7ms, total_suspend_time:5.39ms, total_wait_time:242.2ms, total_kv_read_wall_time:224ms, tikv_wall_time:430.5ms}} | keep order:false, stats:partial[...] | N/A | N/A | +``` [Similar detailed execution plans for partitioned tables with global and local indexes would follow...] #### How to Create a Global Index on a Partitioned Table in TiDB @@ -550,7 +568,7 @@ To avoid hotspots when a new table or partition is created, it is often benefici **Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: ```sql -SELECT MIN(id), MAX(id) FROM employees; +SELECT MIN(id), MAX(id) FROM employees2; ``` - If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. From 09c68e6cd2f74165a53fa2ce165274294e3fe1ab Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 29 Sep 2025 10:34:07 +0800 Subject: [PATCH 07/84] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 5912f031ed923..47c8c509855d3 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -634,7 +634,7 @@ When working with large tables (e.g., 120 million rows), transforming between pa 3. Import into: `IMPORT INTO ... FROM SELECT ...` 4. Online DDL: Direct schema transformation via `ALTER TABLE` -This section compares the efficiency and implications of both methods in both directions of conversion, and provides best practice recommendations. +This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. ### Method 1: Batch DML INSERT INTO ... SELECT ... 
From 854262b9e8416ee09861e54f7b45953f2b09c67c Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 29 Sep 2025 10:34:20 +0800 Subject: [PATCH 08/84] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 47c8c509855d3..c50f74e21902a 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -29,7 +29,7 @@ By understanding these aspects, you can make informed decisions on whether and h > **Note:** If you're new to partitioned tables in TiDB, we recommend reviewing the [Partitioned Table User Guide](https://docs.pingcap.com/tidb/stable/partitioned-table) first to better understand key concepts like partition pruning, global vs. local indexes, and partition strategies. -## Improving Query Efficiency +## Improving query efficiency ### Partition Pruning From 4b23f1149da381052667cfc1ef6ac49c8eb20ce2 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 29 Sep 2025 10:35:44 +0800 Subject: [PATCH 09/84] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index c50f74e21902a..459cb4283b6bb 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -646,7 +646,7 @@ INSERT INTO fa_new SELECT * FROM fa; -- 120 million rows copied in 1h 52m 47s ``` -### Method 2: Pipeline DML INSERT INTO ... SELECT... +### Method 2: Pipeline DML INSERT INTO ... SELECT ... ```sql SET tidb_dml_type = "bulk"; From cb32c4eeb8150ac70f8ed27ae16857cb0b732057 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 29 Sep 2025 10:36:22 +0800 Subject: [PATCH 10/84] Update tidb_partitioned_tables_guide.md Co-authored-by: Lilian Lee --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 459cb4283b6bb..9ca0ce9b50555 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -1,4 +1,4 @@ -# Mastering TiDB Partitioned Tables +# Best Practices for Using TiDB Partitioned Tables ## Introduction From d04daf5cf2b2d5352a2d42f11e5987561a74d652 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 29 Sep 2025 20:13:51 +0800 Subject: [PATCH 11/84] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 9ca0ce9b50555..2b660bc005bca 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -43,7 +43,7 @@ Partition pruning is most beneficial in scenarios where query predicates match t - **Multi-tenant or category-based datasets**: Partitioning by tenant ID or category enables queries to focus on a small subset of partitions. 
- **Hybrid Transactional/Analytical Processing (HTAP)**: Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing **full table scans** on large datasets. -For more use cases, please refer to https://docs.pingcap.com/tidb/stable/partition-pruning/ +For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable/partition-pruning/). ### Query Performance on Secondary Index: Non-Partitioned Table vs. Local Index vs. Global Index From e1c059a516483d2f5b67f7c146436bebd737726e Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 29 Sep 2025 20:14:04 +0800 Subject: [PATCH 12/84] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 2b660bc005bca..3a1764541ec83 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -631,7 +631,7 @@ When working with large tables (e.g., 120 million rows), transforming between pa 1. Batch DML: `INSERT INTO ... SELECT ...` 2. Pipeline DML: `INSERT INTO ... SELECT ...` -3. Import into: `IMPORT INTO ... FROM SELECT ...` +3. `IMPORT INTO`: `IMPORT INTO ... FROM SELECT ...` 4. Online DDL: Direct schema transformation via `ALTER TABLE` This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. From ea9c8f047bb4567091d17f2a64d487bfb77d7327 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Tue, 30 Sep 2025 11:37:20 +0800 Subject: [PATCH 13/84] Update tidb_partitioned_tables_guide.md Co-authored-by: Mattias Jonsson --- tidb_partitioned_tables_guide.md | 1 - 1 file changed, 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 3a1764541ec83..8b1b4b7cc81aa 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -77,7 +77,6 @@ CREATE TABLE `fa` ( KEY `index_fa_on_account_id` (`account_id`), KEY `index_fa_on_user_id` (`user_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin -AUTO_INCREMENT=1284046228560811404 PARTITION BY RANGE (`yeardate`) (PARTITION `fa_2024001` VALUES LESS THAN (2024001), PARTITION `fa_2024002` VALUES LESS THAN (2024002), From 84f2eafa4a1cc0ae7ded198bde3a392ff8fad6e8 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Tue, 30 Sep 2025 11:39:52 +0800 Subject: [PATCH 14/84] Update tidb_partitioned_tables_guide.md Co-authored-by: Mattias Jonsson --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 8b1b4b7cc81aa..c26fa1ad297f9 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -626,7 +626,7 @@ show table employees2 PARTITION (p4) regions; ## Converting Between Partitioned and Non-Partitioned Tables -When working with large tables (e.g., 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. 
TiDB supports several main approaches for such transformations: +When working with large tables (e.g. in this example 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: 1. Batch DML: `INSERT INTO ... SELECT ...` 2. Pipeline DML: `INSERT INTO ... SELECT ...` From 3061517b556dd3034262a23998153e4798994268 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Mon, 13 Oct 2025 15:55:09 +0800 Subject: [PATCH 15/84] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 54 ++++++++++++++++++++++++++++---- 1 file changed, 48 insertions(+), 6 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index c26fa1ad297f9..094a81794a6e8 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -47,7 +47,7 @@ For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable ### Query Performance on Secondary Index: Non-Partitioned Table vs. Local Index vs. Global Index -In TiDB, local indexes are the default indexing strategy for partitioned tables. Each partition maintains its own set of indexes, while a Global Index refers to an index that spans all partitions in a partitioned table. Unlike Local Indexes, which are partition-specific and stored separately within each partition, a Global Index maintains a single, unified index across the entire table. This index includes references to all rows, regardless of which partition they belong to, and thus can provide global queries and operations, such as joins or lookups, with faster access. +In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. Global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. #### What Did We Test @@ -71,13 +71,13 @@ CREATE TABLE `fa` ( `account_id` bigint(20) NOT NULL, `sid` bigint(20) DEFAULT NULL, `user_id` bigint NOT NULL, - `yeardate` int NOT NULL, - PRIMARY KEY (`id`,`yeardate`) /*T![clustered_index] CLUSTERED */, + `date` int NOT NULL, + PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */, KEY `index_fa_on_sid` (`sid`), KEY `index_fa_on_account_id` (`account_id`), KEY `index_fa_on_user_id` (`user_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin -PARTITION BY RANGE (`yeardate`) +PARTITION BY RANGE (`date`) (PARTITION `fa_2024001` VALUES LESS THAN (2024001), PARTITION `fa_2024002` VALUES LESS THAN (2024002), PARTITION `fa_2024003` VALUES LESS THAN (2024003), @@ -637,7 +637,48 @@ This section compares the efficiency and implications of these methods in both d ### Method 1: Batch DML INSERT INTO ... SELECT ... 
-**By Default** +#### Table Schema: `fa` +```sql +CREATE TABLE `fa` ( + `id` bigint NOT NULL AUTO_INCREMENT, + `account_id` bigint(20) NOT NULL, + `sid` bigint(20) DEFAULT NULL, + `user_id` bigint NOT NULL, + `date` int NOT NULL, + PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */, + KEY `index_fa_on_sid` (`sid`), + KEY `index_fa_on_account_id` (`account_id`), + KEY `index_fa_on_user_id` (`user_id`) +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin +PARTITION BY RANGE (`date`) +(PARTITION `fa_2024001` VALUES LESS THAN (2024001), +PARTITION `fa_2024002` VALUES LESS THAN (2024002), +PARTITION `fa_2024003` VALUES LESS THAN (2024003), +... +... +PARTITION `fa_2024366` VALUES LESS THAN (2024366)) +``` + + +#### Table Schema: `fa_new` +```sql +CREATE TABLE `fa` ( + `id` bigint NOT NULL AUTO_INCREMENT, + `account_id` bigint(20) NOT NULL, + `sid` bigint(20) DEFAULT NULL, + `user_id` bigint NOT NULL, + `date` int NOT NULL, + PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */, + KEY `index_fa_on_sid` (`sid`), + KEY `index_fa_on_account_id` (`account_id`), + KEY `index_fa_on_user_id` (`user_id`) +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin +); +``` + +#### Description +Supports bi-directional import, suitable for very large datasets. +### Method 1: By Default ```sql SET tidb_mem_quota_query = 0; @@ -645,6 +686,7 @@ INSERT INTO fa_new SELECT * FROM fa; -- 120 million rows copied in 1h 52m 47s ``` + ### Method 2: Pipeline DML INSERT INTO ... SELECT ... ```sql @@ -680,7 +722,7 @@ mysql> alter table fa REMOVE PARTITIONING; ```sql SET @@global.tidb_ddl_reorg_worker_cnt = 16; SET @@global.tidb_ddl_reorg_batch_size = 4096; -ALTER TABLE fa PARTITION BY RANGE (`yearweek`) +ALTER TABLE fa PARTITION BY RANGE (`date`) (PARTITION `fa_2024001` VALUES LESS THAN (2024001), PARTITION `fa_2024002` VALUES LESS THAN (2024002), ... From 1b0893cbf21f963ef823860e1e3c1e237fca970f Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 13 Oct 2025 15:56:16 +0800 Subject: [PATCH 16/84] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 094a81794a6e8..37e3a0aa3a513 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -4,7 +4,7 @@ Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. -A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like DROP PARTITION. This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. 
In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. +A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on AUTO_INCREMENT-style IDs where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. From b3f3cf758b0abaf2e6ec5676c71e1346d2cc9623 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 13 Oct 2025 15:57:40 +0800 Subject: [PATCH 17/84] Update tidb_partitioned_tables_guide.md Co-authored-by: Hangjie Mo --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 37e3a0aa3a513..78f444fc5ffd2 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -218,7 +218,7 @@ To compare the performance of TTL and partition drop, we configured TTL to execu #### Findings **TTL Performance:** -- On a high-write table, TTL runs every 10 minutes. +- On a write-heavy table, TTL runs every 10 minutes. - With 50 threads, each TTL job took 8–10 minutes, deleting 7–11 million rows. - With 100 threads, it handled up to 20 million rows, but execution time increased to 15–30 minutes, with greater variance. - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. From 8a2bf1183165e99d57d03017a67669807335ebef Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 13 Oct 2025 15:57:56 +0800 Subject: [PATCH 18/84] Update tidb_partitioned_tables_guide.md Co-authored-by: Hangjie Mo --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 78f444fc5ffd2..034bcbbf4abd6 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -219,7 +219,7 @@ To compare the performance of TTL and partition drop, we configured TTL to execu **TTL Performance:** - On a write-heavy table, TTL runs every 10 minutes. -- With 50 threads, each TTL job took 8–10 minutes, deleting 7–11 million rows. +- With 50 threads, each TTL job took 8–10 minutes, deleted 7–11 million rows. 
- With 100 threads, it handled up to 20 million rows, but execution time increased to 15–30 minutes, with greater variance. - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. From 7bf8e9d01bb94bc884bafee2d6952d977d5623d3 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 13 Oct 2025 16:50:30 +0800 Subject: [PATCH 19/84] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 034bcbbf4abd6..940e8fcdf5f06 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -746,4 +746,4 @@ Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) TiDB offers two approaches for converting tables between partitioned and non-partitioned states: -- Choose an offline method like `IMPORT INTO` when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. \ No newline at end of file +In this experiment, the table structures have been anonymized. For more detailed information on the usage of [TTL (Time To Live)](/time-to-live.md). \ No newline at end of file From 80289fe5e8374066d7326b5eb1811349c2875a85 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 13 Oct 2025 16:50:51 +0800 Subject: [PATCH 20/84] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 940e8fcdf5f06..e033ede193faf 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -6,7 +6,7 @@ Partitioned tables in TiDB offer a versatile approach to managing large datasets A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. -Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on AUTO_INCREMENT-style IDs where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. 
+Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto_increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. From 5700b84964177dd745e04a094cac194d9420ff68 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 13 Oct 2025 16:51:19 +0800 Subject: [PATCH 21/84] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index e033ede193faf..9f0f4fd21162b 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -307,16 +307,7 @@ Dropping a partition on a table with a Global Index took **76 seconds**, while t **Global Index** ```sql -mysql> alter table A drop partition A_2024363; -Query OK, 0 rows affected (1 min 16.02 sec) -``` - -**Local Index** - -```sql -mysql> alter table A drop partition A_2024363; -Query OK, 0 rows affected (0.52 sec) -``` +ALTER TABLE A DROP PARTITION A_2024363; #### Recommendation From 82b27d4aeacd3f440d439d4494622d7e9620bf2a Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 14:21:47 +0800 Subject: [PATCH 22/84] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 65 ++++++++++++++++---------------- 1 file changed, 32 insertions(+), 33 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 9f0f4fd21162b..e623f6dd60682 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -27,7 +27,7 @@ This document examines partitioned tables in TiDB from multiple angles, includin By understanding these aspects, you can make informed decisions on whether and how to implement partitioning in your TiDB environment. -> **Note:** If you're new to partitioned tables in TiDB, we recommend reviewing the [Partitioned Table User Guide](https://docs.pingcap.com/tidb/stable/partitioned-table) first to better understand key concepts like partition pruning, global vs. local indexes, and partition strategies. +> **Note:** If you're new to partitioned tables in TiDB, we recommend reviewing the [Partitioned Table User Guide](/partitioned-table.md) first to better understand key concepts like partition pruning, global vs. local indexes, and partition strategies. ## Improving query efficiency @@ -83,7 +83,7 @@ PARTITION `fa_2024002` VALUES LESS THAN (2024002), PARTITION `fa_2024003` VALUES LESS THAN (2024003), ... ... 
-PARTITION `fa_2024366` VALUES LESS THAN (2024366)) +PARTITION `fa_2024366` VALUES LESS THAN (2024366)); ``` #### SQL @@ -247,7 +247,7 @@ CREATE TABLE `ad_cache` ( ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin TTL=`expire_time` + INTERVAL 0 DAY TTL_ENABLE='ON' -TTL_JOB_INTERVAL='10m' +TTL_JOB_INTERVAL='10m'; ``` **Drop Partition (Range INTERVAL partitioning)** @@ -273,14 +273,14 @@ PARTITION BY RANGE COLUMNS (create_time) INTERVAL (10 MINUTE) FIRST PARTITION LESS THAN ('2025-02-19 18:00:00') ... -LAST PARTITION LESS THAN ('2025-02-19 20:00:00') +LAST PARTITION LESS THAN ('2025-02-19 20:00:00'); ``` It's required to run DDL alter table partition ... to change the FIRST PARTITION and LAST PARTITION periodically. These two DDL statements can drop the old partitions and create new ones. ```sql -ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}") -ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}") +ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}"); +ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}"); ``` #### Recommendation @@ -308,6 +308,7 @@ Dropping a partition on a table with a Global Index took **76 seconds**, while t ```sql ALTER TABLE A DROP PARTITION A_2024363; +``` #### Recommendation @@ -374,7 +375,7 @@ PARTITION BY KEY (id) PARTITIONS 16; When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This may significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: ```sql -select * from server_info where `serial_no` = ? +select * from server_info where `serial_no` = ?; ``` **Mitigation**: Add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down DROP PARTITION operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: @@ -415,6 +416,15 @@ However, if the initial write traffic to this new partition is **very high**, th **Impact:** This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn may impact the overall read and write performance of the cluster. + +### Summary Table + +| Approach | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | +|---|---|---|---|---|---| +| NONCLUSTERED Partitioned | Low (with merge_option=deny) | Low (auto pre-split) | Low | Moderate (extra lookups) | Fast (DROP PARTITION) | +| CLUSTERED Partitioned | Medium (manual intervention) | Medium (manual split) | High | High (direct access) | Fast (DROP PARTITION) | +| CLUSTERED Non-partitioned | None | Medium (single table) | Low | High | Slow (DELETE/TTL) | + #### Solutions **1. 
NONCLUSTERED Partitioned Table** @@ -485,15 +495,15 @@ A common practice is to split the number of regions to **match** the number of T To split regions for the primary key of all partitions in a partitioned table, you can use a command like: ```sql -SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS ; +SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS ; ``` -This example will split each partition's primary key range into `` regions between the specified boundary values. +This example will split each partition's primary key range into `` regions between the specified boundary values. **Splitting Regions for the secondary index of all partitions.** ```sql -SPLIT PARTITION TABLE employees INDEX `idx_employees_on_store_id` BETWEEN (1) AND (1000) REGIONS ; +SPLIT PARTITION TABLE employees INDEX `idx_employees_on_store_id` BETWEEN (1) AND (1000) REGIONS ; ``` **(Optional) When adding a new partition, you MUST manually split regions for its primary key and indices.** @@ -503,9 +513,9 @@ ALTER TABLE employees ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); SHOW TABLE employees PARTITION (p4) regions; -SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "2006-01-01") AND (100000, "2011-01-01") REGIONS ; +SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "2006-01-01") AND (100000, "2011-01-01") REGIONS ; -SPLIT PARTITION TABLE employees PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; +SPLIT PARTITION TABLE employees PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; SHOW TABLE employees PARTITION (p4) regions; ``` @@ -572,13 +582,13 @@ A common practice is to split the number of regions to **match** the number of T **Splitting regions for all partitions** ```sql -SPLIT PARTITION TABLE employees2 BETWEEN (1,"1970-01-01") AND (100000,"9999-12-31") REGIONS ; +SPLIT PARTITION TABLE employees2 BETWEEN (1,"1970-01-01") AND (100000,"9999-12-31") REGIONS ; ``` **Splitting regions for the secondary index of all partitions.** ```sql -SPLIT PARTITION TABLE employees2 INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; +SPLIT PARTITION TABLE employees2 INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; ``` **(Optional) When adding a new partition, you MUST manually split regions for the specific partition and its indices.** @@ -588,9 +598,9 @@ ALTER TABLE employees2 ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); show table employees2 PARTITION (p4) regions; -SPLIT PARTITION TABLE employees2 PARTITION (p4) BETWEEN (1,"2006-01-01") AND (100000,"2011-01-01") REGIONS ; +SPLIT PARTITION TABLE employees2 PARTITION (p4) BETWEEN (1,"2006-01-01") AND (100000,"2011-01-01") REGIONS ; -SPLIT PARTITION TABLE employees2 PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; +SPLIT PARTITION TABLE employees2 PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; show table employees2 PARTITION (p4) regions; ``` @@ -607,13 +617,6 @@ show table employees2 PARTITION (p4) regions; **Recommendation:** - Best suited for use cases that require stable performance and do not benefit from partition-based data management. 
-### Summary Table - -| Approach | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | -|---|---|---|---|---|---| -| NONCLUSTERED Partitioned | Low (with merge_option=deny) | Low (auto pre-split) | Low | Moderate (extra lookups) | Fast (DROP PARTITION) | -| CLUSTERED Partitioned | Medium (manual intervention) | Medium (manual split) | High | High (direct access) | Fast (DROP PARTITION) | -| CLUSTERED Non-partitioned | None | Medium (single table) | Low | High | Slow (DELETE/TTL) | ## Converting Between Partitioned and Non-Partitioned Tables @@ -626,8 +629,6 @@ When working with large tables (e.g. in this example 120 million rows), transfor This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. -### Method 1: Batch DML INSERT INTO ... SELECT ... - #### Table Schema: `fa` ```sql CREATE TABLE `fa` ( @@ -647,7 +648,7 @@ PARTITION `fa_2024002` VALUES LESS THAN (2024002), PARTITION `fa_2024003` VALUES LESS THAN (2024003), ... ... -PARTITION `fa_2024366` VALUES LESS THAN (2024366)) +PARTITION `fa_2024366` VALUES LESS THAN (2024366)); ``` @@ -663,13 +664,12 @@ CREATE TABLE `fa` ( KEY `index_fa_on_sid` (`sid`), KEY `index_fa_on_account_id` (`account_id`), KEY `index_fa_on_user_id` (`user_id`) -) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin -); +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin; ``` #### Description -Supports bi-directional import, suitable for very large datasets. -### Method 1: By Default +This example shows converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. +### Method 1: Batch DML INSERT INTO ... SELECT ... ```sql SET tidb_mem_quota_query = 0; @@ -703,8 +703,7 @@ Records: 120000000, ID: c1d04eec-fb49-49bb-af92-bf3d6e2d3d87 ```sql SET @@global.tidb_ddl_reorg_worker_cnt = 16; SET @@global.tidb_ddl_reorg_batch_size = 4096; - -mysql> alter table fa REMOVE PARTITIONING; +alter table fa REMOVE PARTITIONING; -- real 170m12.024s (≈ 2h 50m) ``` @@ -737,4 +736,4 @@ Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) TiDB offers two approaches for converting tables between partitioned and non-partitioned states: -In this experiment, the table structures have been anonymized. For more detailed information on the usage of [TTL (Time To Live)](/time-to-live.md). \ No newline at end of file +Choose an offline method like [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. 
\ No newline at end of file From 97fd5c4b7aebabb49038b80e209eda6f2ce0a024 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Tue, 14 Oct 2025 14:22:41 +0800 Subject: [PATCH 23/84] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index e623f6dd60682..55cc03c0303b5 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -225,7 +225,8 @@ To compare the performance of TTL and partition drop, we configured TTL to execu **Partition Drop Performance:** - DROP PARTITION removes an entire data segment instantly, with minimal resource usage. -- DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. +- `ALTER TABLE ... DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. +- `ALTER TABLE ... DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. #### How to Use TTL and Partition Drop in TiDB From 45bfa106ffb619f8b1abd84ac9d17c7ebcc55f90 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Tue, 14 Oct 2025 14:24:16 +0800 Subject: [PATCH 24/84] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 55cc03c0303b5..8852de949d545 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -464,7 +464,7 @@ PARTITION BY RANGE ( YEAR(hired) ) ( ); ``` -Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. +Adding the [merge_option=deny](/table-attributes.md#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. ```sql -- table From ad70a86f43f6b46bb2a2eb73189c62f7bdb41b81 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 14:25:54 +0800 Subject: [PATCH 25/84] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index e623f6dd60682..c11941c16368f 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -375,7 +375,7 @@ PARTITION BY KEY (id) PARTITIONS 16; When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This may significantly increase the total Region count. 
Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: ```sql -select * from server_info where `serial_no` = ?; +SELECT * FROM server_info WHERE `serial_no` = ?; ``` **Mitigation**: Add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down DROP PARTITION operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: From bc770ba865dced7c32cbb40dcdf956518e9c2a27 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Tue, 14 Oct 2025 14:26:17 +0800 Subject: [PATCH 26/84] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 8852de949d545..26a42c3eea601 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -323,7 +323,7 @@ If you need to drop partitions frequently and minimize the performance impact on In TiDB, **write hotspots** occur when incoming write traffic is unevenly distributed across Regions. -This is common when the primary key is **monotonically increasing**—for example, an AUTO_INCREMENT primary key with AUTO_ID_CACHE=1, or secondary index on datetime column with default value set to CURRENT_TIMESTAMP—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: +This is common when the primary key is **monotonically increasing**—for example, an `AUTO_INCREMENT` primary key with `AUTO_ID_CACHE=1`, or secondary index on datetime column with default value set to `CURRENT_TIMESTAMP`—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: - A single Region handling most of the write workload, while other Regions remain idle. - Higher write latency and reduced throughput. From afc19e4c3a038e192475999ab7de01dd84b547c2 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 14:27:36 +0800 Subject: [PATCH 27/84] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index c11941c16368f..75d21f50176e0 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -6,7 +6,7 @@ Partitioned tables in TiDB offer a versatile approach to managing large datasets A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. 
In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. -Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto_increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. +Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. From 9b33c98e418631f69b3911e060eba5fc9d5e6cab Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 14:32:15 +0800 Subject: [PATCH 28/84] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index b27edac4ca73f..caec8b1d36904 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -6,7 +6,7 @@ Partitioned tables in TiDB offer a versatile approach to managing large datasets A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. -Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. +Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/autoincrement.md) where sequential inserts can overload specific TiKV regions. 
Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. @@ -225,8 +225,7 @@ To compare the performance of TTL and partition drop, we configured TTL to execu **Partition Drop Performance:** - DROP PARTITION removes an entire data segment instantly, with minimal resource usage. -- `ALTER TABLE ... DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. -- `ALTER TABLE ... DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. +- DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. #### How to Use TTL and Partition Drop in TiDB @@ -323,7 +322,7 @@ If you need to drop partitions frequently and minimize the performance impact on In TiDB, **write hotspots** occur when incoming write traffic is unevenly distributed across Regions. -This is common when the primary key is **monotonically increasing**—for example, an `AUTO_INCREMENT` primary key with `AUTO_ID_CACHE=1`, or secondary index on datetime column with default value set to `CURRENT_TIMESTAMP`—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: +This is common when the primary key is **monotonically increasing**—for example, an AUTO_INCREMENT primary key with AUTO_ID_CACHE=1, or secondary index on datetime column with default value set to CURRENT_TIMESTAMP—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: - A single Region handling most of the write workload, while other Regions remain idle. - Higher write latency and reduced throughput. @@ -464,7 +463,7 @@ PARTITION BY RANGE ( YEAR(hired) ) ( ); ``` -Adding the [merge_option=deny](/table-attributes.md#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. +Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. ```sql -- table @@ -669,7 +668,7 @@ CREATE TABLE `fa` ( ``` #### Description -This example shows converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. +These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. ### Method 1: Batch DML INSERT INTO ... SELECT ... 
```sql From c16bae130bbe814b95cf8c399a933e6cb34146e4 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 14:32:54 +0800 Subject: [PATCH 29/84] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index caec8b1d36904..7a3ac514198f8 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -6,7 +6,7 @@ Partitioned tables in TiDB offer a versatile approach to managing large datasets A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. -Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/autoincrement.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. +Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. From a724b1f23317ed7756a478704a709f85b3dc0056 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 14:55:53 +0800 Subject: [PATCH 30/84] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 7a3ac514198f8..eeee073f90689 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -162,9 +162,14 @@ ALTER TABLE ADD UNIQUE INDEX (col1, col2) GLOBAL; ``` -- Adds a global index to an existing partitioned table. -- GLOBAL must be explicitly specified. -- You can also use ADD INDEX for non-unique global indexes. + +Adds a global index to an existing partitioned table. + +- The `GLOBAL` keyword must be explicitly specified. +- For non-unique global indexes, use `ADD INDEX` instead of `ADD UNIQUE INDEX`. 
+ - Not supported in v8.5.x + - Available starting from v9.0.0-beta.1 + - Expected to be included in the next LTS release **Option 2: Define Inline on Table Creation** From db96727a470153af5ce8f091a90fa660e3fd9059 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Tue, 14 Oct 2025 15:10:59 +0800 Subject: [PATCH 31/84] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index eeee073f90689..c8d3d63f5f47d 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -15,8 +15,8 @@ This document examines partitioned tables in TiDB from multiple angles, includin ## Agenda - Improving query efficiency - - Partition pruning - - Query performance comparison: Non-Partitioned Table vs. Local Index vs. Global Index + - Partition pruning + - Query performance comparison: Non-Partitioned Table vs. Local Index vs. Global Index - Facilitating bulk data deletion - Data cleanup efficiency: TTL vs. Direct Partition Drop - Partition drop efficiency: Local Index vs Global Index From 515a82e5ef604c7dde0c3f528292ffd893b5437f Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Tue, 14 Oct 2025 15:11:29 +0800 Subject: [PATCH 32/84] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 1 + 1 file changed, 1 insertion(+) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index c8d3d63f5f47d..a6d90d4477ef4 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -52,6 +52,7 @@ In TiDB, local indexes are the default for partitioned tables. Each partition ha #### What Did We Test We evaluated query performance across three table configurations in TiDB: + - Non-Partitioned Table - Partitioned Table with Global Index - Partitioned Table with Local Index From 2875c0e89d85c3cc8b7f796895b69cb14525dc84 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 15:15:35 +0800 Subject: [PATCH 33/84] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index eeee073f90689..abf9dc874ed52 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -39,7 +39,7 @@ By understanding these aspects, you can make informed decisions on whether and h Partition pruning is most beneficial in scenarios where query predicates match the partitioning strategy. Common use cases include: -- **Time-series data queries**: When data is partitioned by time ranges (e.g., daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. +- **Time-series data queries**: When data is partitioned by time ranges (for example, daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. - **Multi-tenant or category-based datasets**: Partitioning by tenant ID or category enables queries to focus on a small subset of partitions. 
- **Hybrid Transactional/Analytical Processing (HTAP)**: Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing **full table scans** on large datasets. @@ -106,7 +106,7 @@ WHERE `fa`.`sid` IN ( #### Findings -Data came from a table with **366 range partitions** (e.g., by date). +Data came from a table with **366 range partitions** (for example, by date). - The **Average Query Time** was obtained from the statement_summary view. - The query used a **secondary index** and returned **400 rows**. @@ -196,7 +196,7 @@ The performance overhead of partitioned tables in TiDB depends significantly on - The more partitions you have, the more severe the potential performance degradation. - With a smaller number of partitions, the impact may not be as noticeable, but it's still workload-dependent. - For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs triggered. This means more partitions will likely result in more RPCs, leading to higher latency. -- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (i.e., the number of rows requiring table lookups). +- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). #### Recommendation @@ -625,7 +625,7 @@ show table employees2 PARTITION (p4) regions; ## Converting Between Partitioned and Non-Partitioned Tables -When working with large tables (e.g. in this example 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: +When working with large tables (for example in this example 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: 1. Batch DML: `INSERT INTO ... SELECT ...` 2. Pipeline DML: `INSERT INTO ... SELECT ...` From 9fa0726678ba924349bc1f3af22408894ebde2fb Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 16:11:55 +0800 Subject: [PATCH 34/84] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 21 +++------------------ 1 file changed, 3 insertions(+), 18 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index efc844d071ac2..6f0d1cc7dc781 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -1,5 +1,5 @@ # Best Practices for Using TiDB Partitioned Tables - +This guide introduces how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. ## Introduction Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. 
By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. @@ -10,24 +10,9 @@ Another frequent scenario is using **hash or key partitioning** to address write While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. -This document examines partitioned tables in TiDB from multiple angles, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it aims to equip you with the knowledge to make informed decisions about when and how to adopt partitioning strategies in your TiDB environment. - -## Agenda - -- Improving query efficiency - - Partition pruning - - Query performance comparison: Non-Partitioned Table vs. Local Index vs. Global Index -- Facilitating bulk data deletion - - Data cleanup efficiency: TTL vs. Direct Partition Drop - - Partition drop efficiency: Local Index vs Global Index -- Mitigating write hotspot issues -- Partition management challenge - - How to avoid hotspots caused by new range partitions -- Converting between partitioned and non-partitioned tables - -By understanding these aspects, you can make informed decisions on whether and how to implement partitioning in your TiDB environment. +This document examines partitioned tables in TiDB from multiple angles, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it provides practical guidance on optimizing partitioned table design and performance tuning in TiDB. -> **Note:** If you're new to partitioned tables in TiDB, we recommend reviewing the [Partitioned Table User Guide](/partitioned-table.md) first to better understand key concepts like partition pruning, global vs. local indexes, and partition strategies. +> **Note:** To get started with the fundamentals, refer to the [Partitioned Table User Guide](/partitioned-table.md), which explains key concepts such as partition pruning, index types, and partitioning methods. ## Improving query efficiency From f9461289e21c95074afdafce6f8071b23675029d Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 16:27:10 +0800 Subject: [PATCH 35/84] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 6f0d1cc7dc781..2be4b97881943 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -210,8 +210,8 @@ To compare the performance of TTL and partition drop, we configured TTL to execu **TTL Performance:** - On a write-heavy table, TTL runs every 10 minutes. -- With 50 threads, each TTL job took 8–10 minutes, deleted 7–11 million rows. -- With 100 threads, it handled up to 20 million rows, but execution time increased to 15–30 minutes, with greater variance. +- With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. +- With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. 
- TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. **Partition Drop Performance:** From bc27e69ebba1f58ff39c0f156bedc4f0e6f244af Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 16:35:56 +0800 Subject: [PATCH 36/84] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 2be4b97881943..0792c1d70684b 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -1,5 +1,7 @@ # Best Practices for Using TiDB Partitioned Tables + This guide introduces how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. + ## Introduction Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. @@ -121,6 +123,7 @@ Metrics collected: ``` **Partition table with global index** + ```yaml | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | |------------------------|---------|-----------|---------|-----------|-------------------------------------------------|----------------|---------------|----------|------| @@ -130,6 +133,7 @@ Metrics collected: ``` **Partition table with local index** + ```yaml | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | |------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|---------|-------| @@ -215,6 +219,7 @@ To compare the performance of TTL and partition drop, we configured TTL to execu - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. **Partition Drop Performance:** + - DROP PARTITION removes an entire data segment instantly, with minimal resource usage. - DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. @@ -328,10 +333,12 @@ TiDB stores table data in **Regions**, each covering a continuous range of row k When the primary key is AUTO_INCREMENT and the secondary indexes on datetime columns are monotonically increasing: **Without Partitioning:** + - New rows always have the highest key values and are inserted into the same "last Region." - That Region is served by one TiKV node at a time, becoming a single write bottleneck. **With Hash/Key Partitioning:** + - The table and the secondary indexes are split into multiple partitions using a hash or key function on the primary key or indexed columns. - Each partition has its own set of Regions, often distributed across different TiKV nodes. - Inserts are spread across multiple Regions in parallel, improving load distribution and throughput. 
@@ -390,9 +397,11 @@ New range partitions in a partitioned table can easily lead to hotspot issues in When using **range-partitioned tables**, if queries do **not** filter data using the partition key, new empty partitions can easily become read hotspots. **Root Cause:** + By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions may be merged into a **single region**. **Impact:** + When a query does **not filter by partition key**, TiDB will **scan all partitions** (as seen in the execution plan partition:all). As a result, the single region holding multiple empty partitions will be scanned repeatedly, leading to a **read hotspot**. **Write Hotspot** @@ -405,6 +414,7 @@ In TiDB, any newly created table or partition initially contains only **one regi However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it may not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. **Impact:** + This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn may impact the overall read and write performance of the cluster. @@ -421,13 +431,16 @@ This imbalance can cause that TiKV node to trigger **flow control**, leading to **1. NONCLUSTERED Partitioned Table** **Pros:** + - When a new partition is created in a **NONCLUSTERED Partitioned Table** configured with SHARD_ROW_ID_BITS and [PRE_SPLIT_REGIONS](https://docs.pingcap.com/tidb/stable/sql-statement-split-region/#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. - Lower operational overhead. **Cons:** + - Queries using **Point Get** or **Table Range Scan** will require **more table lookups**, which can degrade read performance for such query types. **Recommendation:** + - Suitable for workloads where write scalability and operational ease are more critical than low-latency reads. **Best Practices** @@ -514,12 +527,15 @@ SHOW TABLE employees PARTITION (p4) regions; **2. CLUSTERED Partitioned Table** **Pros:** + - Queries using **Point Get** or **Table Range Scan** do **not** need additional lookups, resulting in better **read performance**. **Cons:** + - **Manual region splitting** is required when creating new partitions, increasing operational complexity. **Recommendation:** + - Ideal when low-latency point queries are important and operational resources are available to manage region splitting. **Best Practices** @@ -599,13 +615,16 @@ show table employees2 PARTITION (p4) regions; **3. CLUSTERED Non-partitioned Table** **Pros:** + - **No hotspot risk from new partitions**. - Provides **good read performance** for point and range queries. **Cons:** + - **Cannot use DROP PARTITION** to clean up large volumes of old data. **Recommendation:** + - Best suited for use cases that require stable performance and do not benefit from partition-based data management. 
@@ -621,6 +640,7 @@ When working with large tables (for example in this example 120 million rows), t This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. #### Table Schema: `fa` + ```sql CREATE TABLE `fa` ( `id` bigint NOT NULL AUTO_INCREMENT, @@ -644,6 +664,7 @@ PARTITION `fa_2024366` VALUES LESS THAN (2024366)); #### Table Schema: `fa_new` + ```sql CREATE TABLE `fa` ( `id` bigint NOT NULL AUTO_INCREMENT, @@ -659,7 +680,9 @@ CREATE TABLE `fa` ( ``` #### Description + These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. + ### Method 1: Batch DML INSERT INTO ... SELECT ... ```sql From 831b91e0f72e993280db9b6be3b4696d5a6401de Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 16:43:58 +0800 Subject: [PATCH 37/84] Create tidb-partitioned-tables-guide.md --- tidb-partitioned-tables-guide.md | 753 +++++++++++++++++++++++++++++++ 1 file changed, 753 insertions(+) create mode 100644 tidb-partitioned-tables-guide.md diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md new file mode 100644 index 0000000000000..2abe1d6f8a32f --- /dev/null +++ b/tidb-partitioned-tables-guide.md @@ -0,0 +1,753 @@ +# Best Practices for Using TiDB Partitioned Tables + +This guide introduces how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. + +## Introduction + +Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. + +A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. + +Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. + +While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. 
To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks.

This document examines partitioned tables in TiDB from multiple angles, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it provides practical guidance on optimizing partitioned table design and performance tuning in TiDB.

> **Note:** To get started with the fundamentals, refer to the [Partitioned Table User Guide](/partitioned-table.md), which explains key concepts such as partition pruning, index types, and partitioning methods.

## Improving query efficiency

### Partition Pruning

**Partition pruning** is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions may contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead.

#### Applicable Scenarios

Partition pruning is most beneficial in scenarios where query predicates match the partitioning strategy. Common use cases include:

- **Time-series data queries**: When data is partitioned by time ranges (for example, daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions.
- **Multi-tenant or category-based datasets**: Partitioning by tenant ID or category enables queries to focus on a small subset of partitions.
- **Hybrid Transactional/Analytical Processing (HTAP)**: Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing **full table scans** on large datasets.

For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable/partition-pruning/).

### Query Performance on Secondary Index: Non-Partitioned Table vs. Local Index vs. Global Index

In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. Global indexes can be faster for queries across multiple partitions because local indexes need to do one lookup in each partition separately, while a global index needs only one lookup for the whole table.

#### What Did We Test

We evaluated query performance across three table configurations in TiDB:

- Non-Partitioned Table
- Partitioned Table with Global Index
- Partitioned Table with Local Index

#### Test Setup

- The query **accesses data via a secondary index** and uses IN conditions across multiple values.
- The **partitioned table** had **366 partitions**, defined by **range partitioning on a datetime column**.
- Each matching key could return **multiple rows**, simulating a **high-volume OLTP-style query pattern**.
+ +#### Schema + +```sql +CREATE TABLE `fa` ( + `id` bigint NOT NULL AUTO_INCREMENT, + `account_id` bigint(20) NOT NULL, + `sid` bigint(20) DEFAULT NULL, + `user_id` bigint NOT NULL, + `date` int NOT NULL, + PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */, + KEY `index_fa_on_sid` (`sid`), + KEY `index_fa_on_account_id` (`account_id`), + KEY `index_fa_on_user_id` (`user_id`) +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin +PARTITION BY RANGE (`date`) +(PARTITION `fa_2024001` VALUES LESS THAN (2024001), +PARTITION `fa_2024002` VALUES LESS THAN (2024002), +PARTITION `fa_2024003` VALUES LESS THAN (2024003), +... +... +PARTITION `fa_2024366` VALUES LESS THAN (2024366)); +``` + +#### SQL + +```sql +SELECT `fa`.* +FROM `fa` +WHERE `fa`.`sid` IN ( + 1696271179344, + 1696317134004, + 1696181972136, + ... + 1696159221765 +); +``` + +- Query filters on secondary index, but does **not include the partition key**. +- Causes **Local Index** to scan across all partitions due to lack of pruning. +- Table lookup tasks are significantly higher for partitioned tables. + +#### Findings + +Data came from a table with **366 range partitions** (for example, by date). +- The **Average Query Time** was obtained from the statement_summary view. +- The query used a **secondary index** and returned **400 rows**. + +Metrics collected: +- **Average Query Time**: from statement_summary +- **Cop Tasks** (Index Scan + Table Lookup): from execution plan + +#### Test Results + +| Configuration | Average Query Time | Cop task for index range scan | Cop task for table lookup | Total Cop tasks | Key Takeaways | +|---|---|---|---|---|---| +| Non-Partitioned Table | 12.6 ms | 72 | 79 | 151 | Delivering the best performance with the fewest Cop tasks — ideal for most OLTP use cases. | +| Partitioned Table with Local Index | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries will scan all partitions. | +| Partitioned Table with Global Index | 14.8 ms | 69 | 383 | 452 | Improving index scan efficiency, but table lookups can still be expensive if many rows match. 
| + +#### Execution Plan Examples + +**Non-partitioned table** + +```yaml +| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | +|---------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|----------|------| +| IndexLookUp_7 | 398.73 | 787052.13 | 400 | root | | time:11.5ms, loops:2, index_task:{total_time:3.34ms, fetch_handle:3.34ms, build:600ns, wait:2.86µs}, table_task:{total_time:7.55ms, num:1, concurrency:5}, next:{wait_index:3.49ms, wait_table_lookup_build:492.5µs, wait_table_lookup_resp:7.05ms} | | 706.7 KB | N/A | +| IndexRangeScan_5(Build) | 398.73 | 90633.86 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:3.16ms, loops:3, cop_task:{num:72, max:780.4µs, min:394.2µs, avg:566.7µs, p95:748µs, max_proc_keys:20, p95_proc_keys:10, tot_proc:3.66ms, tot_wait:18.6ms, copr_cache_hit_ratio:0.00, build_task_duration:94µs, max_distsql_concurrency:15}, rpc_info:{Cop:{num_rpc:72, total_time:40.1ms}}, tikv_task:{proc max:1ms, min:0s, avg:27.8µs, p80:0s, p95:0s, iters:72, tasks:72}, scan_detail:{total_process_keys:400, total_process_keys_size:22800, total_keys:480, get_snapshot_time:17.7ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:160}}}, time_detail:{total_process_time:3.66ms, total_wait_time:18.6ms, total_kv_read_wall_time:2ms, tikv_wall_time:27.4ms} | range:[1696125963161,1696125963161], …, [1696317134004,1696317134004], keep order:false | N/A | N/A | +| TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task:{num:79, max:4.98ms, min:0s, avg:514.9µs, p95:3.75ms, max_proc_keys:10, p95_proc_keys:5, tot_proc:15ms, tot_wait:21.4ms, copr_cache_hit_ratio:0.00, build_task_duration:341.2µs, max_distsql_concurrency:1, max_extra_concurrency:7, store_batch_num:62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg:0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:20.8ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1600}}}, time_detail:{total_process_time:15ms, total_wait_time:21.4ms, tikv_wall_time:10.9ms} | keep order:false | N/A | N/A | +``` + +**Partition table with global index** + +```yaml +| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | +|------------------------|---------|-----------|---------|-----------|-------------------------------------------------|----------------|---------------|----------|------| +| IndexLookUp_8 | 398.73 | 786959.21 | 400 | root | partition:all | time:12.8ms, loops:2, index_task:{total_time:2.71ms, fetch_handle:2.71ms, build:528ns, wait:3.23µs}, table_task:{total_time:9.03ms, num:1, concurrency:5}, next:{wait_index:3.27ms, wait_table_lookup_build:1.49ms, wait_table_lookup_resp:7.53ms} | | 693.9 KB | N/A | +| IndexRangeScan_5(Build)| 398.73 | 102593.43 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid_global(sid, id)| time:2.49ms, loops:3, cop_task:{num:69, max:997µs, min:213.8µs, avg:469.8µs, p95:986.6µs, max_proc_keys:15, p95_proc_keys:10, tot_proc:13.4ms, tot_wait:1.52ms, copr_cache_hit_ratio:0.00, build_task_duration:498.4µs, max_distsql_concurrency:15}, rpc_info:{Cop:{num_rpc:69, total_time:31.8ms}}, tikv_task:{proc max:1ms, min:0s, avg:101.4µs, p80:0s, p95:1ms, iters:69, tasks:69}, scan_detail:{total_process_keys:400, total_process_keys_size:31200, 
total_keys:480, get_snapshot_time:679.9µs, rocksdb:{key_skipped_count:400, block:{cache_hit_count:189, read_count:54, read_byte:347.7 KB, read_time:6.17ms}}}, time_detail:{total_process_time:13.4ms, total_wait_time:1.52ms, total_kv_read_wall_time:7ms, tikv_wall_time:19.3ms} | range:[1696125963161,1696125963161], …, keep order:false, stats:partial[...] | N/A | N/A | +| TableRowIDScan_6(Probe)| 398.73 | 165221.64 | 400 | cop[tikv] | table:fa | time:7.47ms, loops:2, cop_task:{num:383, max:4.07ms, min:0s, avg:488.5µs, p95:2.59ms, max_proc_keys:2, p95_proc_keys:1, tot_proc:203.3ms, tot_wait:429.5ms, copr_cache_hit_ratio:0.00, build_task_duration:1.3ms, max_distsql_concurrency:1, max_extra_concurrency:31, store_batch_num:305}, rpc_info:{Cop:{num_rpc:78, total_time:186.3ms}}, tikv_task:{proc max:3ms, min:0s, avg:517µs, p80:1ms, p95:1ms, iters:383, tasks:383}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:2.99ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1601, read_count:799, read_byte:10.1 MB, read_time:131.6ms}}}, time_detail:{total_process_time:203.3ms, total_suspend_time:6.31ms, total_wait_time:429.5ms, total_kv_read_wall_time:198ms, tikv_wall_time:163ms} | keep order:false, stats:partial[...] | N/A | N/A | +``` + +**Partition table with local index** + +```yaml +| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | +|------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|---------|-------| +| IndexLookUp_7 | 398.73 | 784450.63 | 400 | root | partition:all | time:290.8ms, loops:2, index_task:{total_time:103.6ms, fetch_handle:7.74ms, build:133.2µs, wait:95.7ms}, table_task:{total_time:551.1ms, num:217, concurrency:5}, next:{wait_index:179.6ms, wait_table_lookup_build:391µs, wait_table_lookup_resp:109.5ms} | | 4.30 MB | N/A | +| IndexRangeScan_5(Build)| 398.73 | 90633.73 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:10.8ms, loops:800, cop_task:{num:600, max:65.6ms, min:1.02ms, avg:22.2ms, p95:45.1ms, max_proc_keys:5, p95_proc_keys:3, tot_proc:6.81s, tot_wait:4.77s, copr_cache_hit_ratio:0.00, build_task_duration:172.8ms, max_distsql_concurrency:3}, rpc_info:{Cop:{num_rpc:600, total_time:13.3s}}, tikv_task:{proc max:54ms, min:0s, avg:13.9ms, p80:20ms, p95:30ms, iters:600, tasks:600}, scan_detail:{total_process_keys:400, total_process_keys_size:22800, total_keys:29680, get_snapshot_time:2.47s, rocksdb:{key_skipped_count:400, block:{cache_hit_count:117580, read_count:29437, read_byte:104.9 MB, read_time:3.24s}}}, time_detail:{total_process_time:6.81s, total_suspend_time:1.51s, total_wait_time:4.77s, total_kv_read_wall_time:8.31s, tikv_wall_time:13.2s}} | range:[1696125963161,...,1696317134004], keep order:false, stats:partial[...] 
| N/A | N/A | +| TableRowIDScan_6(Probe)| 398.73 | 165221.49 | 400 | cop[tikv] | table:fa | time:514ms, loops:434, cop_task:{num:375, max:31.6ms, min:0s, avg:1.33ms, p95:1.67ms, max_proc_keys:2, p95_proc_keys:2, tot_proc:220.7ms, tot_wait:242.2ms, copr_cache_hit_ratio:0.00, build_task_duration:27.8ms, max_distsql_concurrency:1, max_extra_concurrency:1, store_batch_num:69}, rpc_info:{Cop:{num_rpc:306, total_time:495.5ms}}, tikv_task:{proc max:6ms, min:0s, avg:597.3µs, p80:1ms, p95:1ms, iters:375, tasks:375}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:158.3ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:3197, read_count:803, read_byte:10.2 MB, read_time:113.5ms}}}, time_detail:{total_process_time:220.7ms, total_suspend_time:5.39ms, total_wait_time:242.2ms, total_kv_read_wall_time:224ms, tikv_wall_time:430.5ms}} | keep order:false, stats:partial[...] | N/A | N/A | +``` +[Similar detailed execution plans for partitioned tables with global and local indexes would follow...] + +#### How to Create a Global Index on a Partitioned Table in TiDB + +**Option 1: Add via ALTER TABLE** + +```sql +ALTER TABLE +ADD UNIQUE INDEX (col1, col2) GLOBAL; +``` + + +Adds a global index to an existing partitioned table. + +- The `GLOBAL` keyword must be explicitly specified. +- For non-unique global indexes, use `ADD INDEX` instead of `ADD UNIQUE INDEX`. + - Not supported in v8.5.x + - Available starting from v9.0.0-beta.1 + - Expected to be included in the next LTS release + +**Option 2: Define Inline on Table Creation** + +```sql +CREATE TABLE t ( + id BIGINT NOT NULL, + col1 VARCHAR(50), + col2 VARCHAR(50), + -- other columns... + + UNIQUE GLOBAL INDEX idx_col1_col2 (col1, col2) +) +PARTITION BY RANGE (id) ( + PARTITION p0 VALUES LESS THAN (10000), + PARTITION p1 VALUES LESS THAN (20000), + PARTITION pMax VALUES LESS THAN MAXVALUE +); +``` + +#### Summary + +The performance overhead of partitioned tables in TiDB depends significantly on the number of partitions and the type of index used. + +- The more partitions you have, the more severe the potential performance degradation. +- With a smaller number of partitions, the impact may not be as noticeable, but it's still workload-dependent. +- For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs triggered. This means more partitions will likely result in more RPCs, leading to higher latency. +- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). + +#### Recommendation + +- Avoid partitioned tables unless truly necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. +- If you must use partitioned tables, benchmark both global index and local index strategies under your workload. +- Use global indexes when query performance across partitions is critical. +- Choose local indexes only if your main concern is DDL efficiency, such as fast DROP PARTITION, and the performance side effect from the partition table is acceptable. + +## Facilitating Bulk Data Deletion + +### Data Cleanup Efficiency: TTL vs. Direct Partition Drop + +In TiDB, historical data cleanup can be handled either by **TTL (Time-to-Live)** or **manual partition drop**. 
While both methods serve the same purpose, they differ significantly in performance. Our tests show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. + +#### What's the difference? + +- **TTL**: Automatically removes data based on its age, but may be slower due to the need to scan and clean data over time. +- **Partition Drop**: Deletes an entire partition at once, making it much faster, especially when dealing with large datasets. + +#### What Did We Test + +To compare the performance of TTL and partition drop, we configured TTL to execute every 10 minutes and created a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches were tested under background write loads of 50 and 100 concurrent threads. We measured key metrics such as execution time, system resource utilization, and the total number of rows deleted. + +#### Findings + +**TTL Performance:** +- On a write-heavy table, TTL runs every 10 minutes. +- With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. +- With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. +- TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. + +**Partition Drop Performance:** + +- DROP PARTITION removes an entire data segment instantly, with minimal resource usage. +- DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. + +#### How to Use TTL and Partition Drop in TiDB + +In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), please refer to the official documentation at https://docs.pingcap.com/tidb/stable/time-to-live/. + +**TTL schema** + +```sql +CREATE TABLE `ad_cache` ( + `session` varchar(255) NOT NULL, + `ad_id` varbinary(255) NOT NULL, + `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, + `suffix` bigint(20) NOT NULL, + `expire_time` timestamp NULL DEFAULT NULL, + `data` mediumblob DEFAULT NULL, + `version` int(11) DEFAULT NULL, + `is_delete` tinyint(1) DEFAULT NULL, + PRIMARY KEY (`session`, `ad_id`, `create_time`, `suffix`) +) +ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin +TTL=`expire_time` + INTERVAL 0 DAY TTL_ENABLE='ON' +TTL_JOB_INTERVAL='10m'; +``` + +**Drop Partition (Range INTERVAL partitioning)** + +```sql +CREATE TABLE `ad_cache` ( + `session_id` varchar(255) NOT NULL, + `external_id` varbinary(255) NOT NULL, + `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, + `id_suffix` bigint(20) NOT NULL, + `expire_time` timestamp NULL DEFAULT NULL, + `cache_data` mediumblob DEFAULT NULL, + `data_version` int(11) DEFAULT NULL, + `is_deleted` tinyint(1) DEFAULT NULL, + PRIMARY KEY ( + `session_id`, `external_id`, + `create_time`, `id_suffix` + ) NONCLUSTERED +) +SHARD_ROW_ID_BITS=7 +PRE_SPLIT_REGIONS=2 +PARTITION BY RANGE COLUMNS (create_time) +INTERVAL (10 MINUTE) +FIRST PARTITION LESS THAN ('2025-02-19 18:00:00') +... +LAST PARTITION LESS THAN ('2025-02-19 20:00:00'); +``` + +It's required to run DDL alter table partition ... to change the FIRST PARTITION and LAST PARTITION periodically. These two DDL statements can drop the old partitions and create new ones. 
+ +```sql +ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}"); +ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}"); +``` + +#### Recommendation + +For workloads with **large or time-based data cleanup**, prefer using **partitioned tables with DROP PARTITION**. It offers better performance, lower system impact, and simpler management. TTL is still useful for finer-grained or background cleanup but may not be optimal under high write pressure or when deleting large volumes of data quickly. + +### Partition Drop Efficiency: Local Index vs Global Index + +Partition table with Global Index requires synchronous updates to the global index, potentially increasing significant execution time for DDL operations, such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION. In this section, the tests show that DROP PARTITION is much slower when using a **Global Index** compared to a **Local Index**. This should be considered when designing partitioned tables. + +#### What Did We Test + +We created a table with **366 partitions** and tested the DROP PARTITION performance using both **Global Index** and **Local Index**. The total number of rows was **1 billion**. + +| Index Type | Duration (drop partition) | +|---|---| +| Global Index | 1 min 16.02 s | +| Local Index | 0.52 s | + +#### Findings + +Dropping a partition on a table with a Global Index took **76 seconds**, while the same operation with a Local Index took only **0.52 seconds**. The reason is that Global Indexes span all partitions and require more complex updates, while Local Indexes are limited to individual partitions and are easier to handle. + +**Global Index** + +```sql +ALTER TABLE A DROP PARTITION A_2024363; +``` + +#### Recommendation + +When a partitioned table contains global indexes, performing certain DDL operations such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. + +If you need to drop partitions frequently and minimize the performance impact on the system, it's better to use **local indexes** for faster and more efficient operations. + +## Mitigating Write Hotspot Issues + +### Background + +In TiDB, **write hotspots** occur when incoming write traffic is unevenly distributed across Regions. + +This is common when the primary key is **monotonically increasing**—for example, an AUTO_INCREMENT primary key with AUTO_ID_CACHE=1, or secondary index on datetime column with default value set to CURRENT_TIMESTAMP—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: + +- A single Region handling most of the write workload, while other Regions remain idle. +- Higher write latency and reduced throughput. +- Limited performance gains from scaling out TiKV nodes, as the bottleneck remains concentrated on one Region. + +**Partitioned tables** can help mitigate this problem. By applying **hash** or **key** partitioning on the primary key, TiDB can spread inserts across multiple partitions (and therefore multiple Regions), reducing hotspot contention. + +### How It Works + +TiDB stores table data in **Regions**, each covering a continuous range of row keys. + +When the primary key is AUTO_INCREMENT and the secondary indexes on datetime columns are monotonically increasing: + +**Without Partitioning:** + +- New rows always have the highest key values and are inserted into the same "last Region." 
+- That Region is served by one TiKV node at a time, becoming a single write bottleneck. + +**With Hash/Key Partitioning:** + +- The table and the secondary indexes are split into multiple partitions using a hash or key function on the primary key or indexed columns. +- Each partition has its own set of Regions, often distributed across different TiKV nodes. +- Inserts are spread across multiple Regions in parallel, improving load distribution and throughput. + +### Use Case + +If a table with an AUTO_INCREMENT primary key experiences heavy bulk inserts and suffers from write hotspot issues, applying **hash** or **key** partitioning on the primary key can help distribute the write load more evenly. + +```sql +CREATE TABLE server_info ( + id bigint NOT NULL AUTO_INCREMENT, + serial_no varchar(100) DEFAULT NULL, + device_name varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL, + device_type varchar(50) DEFAULT NULL, + modified_ts timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, + PRIMARY KEY (id) /*T![clustered_index] CLUSTERED */, + KEY idx_serial_no (serial_no), + KEY idx_modified_ts (modified_ts) +) /*T![auto_id_cache] AUTO_ID_CACHE=1 */ +PARTITION BY KEY (id) PARTITIONS 16; +``` + +### Pros + +- **Balanced Write Load** — Hotspots are spread across multiple partitions, reducing contention and improving insert performance. +- **Query Optimization via Partition Pruning** — If queries already filter by the partition key, TiDB can prune unused partitions, scanning less data and improving query speed. + +### Cons + +**Potential Query Performance Drop Without Partition Pruning** + +When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This may significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: + +```sql +SELECT * FROM server_info WHERE `serial_no` = ?; +``` + +**Mitigation**: Add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down DROP PARTITION operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: + +```sql +ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; +``` + +## Partition Management Challenge + +### How to Avoid Hotspots Caused by New Range Partitions + +#### Overview + +New range partitions in a partitioned table can easily lead to hotspot issues in TiDB. This section outlines common scenarios and mitigation strategies to avoid read and write hotspots caused by range partitions. + +#### Common Hotspot Scenarios + +**Read Hotspot** + +When using **range-partitioned tables**, if queries do **not** filter data using the partition key, new empty partitions can easily become read hotspots. + +**Root Cause:** + +By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions may be merged into a **single region**. + +**Impact:** + +When a query does **not filter by partition key**, TiDB will **scan all partitions** (as seen in the execution plan partition:all). 
As a result, the single region holding multiple empty partitions will be scanned repeatedly, leading to a **read hotspot**. + +**Write Hotspot** + +When using a time-based field as the partition key, a write hotspot may occur when switching to a new partition: + +**Root Cause:** +In TiDB, any newly created table or partition initially contains only **one region** (data block), which is randomly placed on a single TiKV node. As data begins to be written, this region will eventually **split** into multiple regions, and PD will schedule these new regions to other TiKV nodes. + +However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it may not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. + +**Impact:** + +This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn may impact the overall read and write performance of the cluster. + + +### Summary Table + +| Approach | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | +|---|---|---|---|---|---| +| NONCLUSTERED Partitioned | Low (with merge_option=deny) | Low (auto pre-split) | Low | Moderate (extra lookups) | Fast (DROP PARTITION) | +| CLUSTERED Partitioned | Medium (manual intervention) | Medium (manual split) | High | High (direct access) | Fast (DROP PARTITION) | +| CLUSTERED Non-partitioned | None | Medium (single table) | Low | High | Slow (DELETE/TTL) | + +#### Solutions + +**1. NONCLUSTERED Partitioned Table** + +**Pros:** + +- When a new partition is created in a **NONCLUSTERED Partitioned Table** configured with SHARD_ROW_ID_BITS and [PRE_SPLIT_REGIONS](https://docs.pingcap.com/tidb/stable/sql-statement-split-region/#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. +- Lower operational overhead. + +**Cons:** + +- Queries using **Point Get** or **Table Range Scan** will require **more table lookups**, which can degrade read performance for such query types. + +**Recommendation:** + +- Suitable for workloads where write scalability and operational ease are more critical than low-latency reads. + +**Best Practices** + +Create a partitioned table with SHARD_ROW_ID_BITS and PRE_SPLIT_REGIONS to pre-split table regions. The value of PRE_SPLIT_REGIONS must be less than or equal to that of SHARD_ROW_ID_BITS. The number of pre-split Regions for each partition is 2^(PRE_SPLIT_REGIONS). 
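For example, with `SHARD_ROW_ID_BITS = 2` and `PRE_SPLIT_REGIONS=2`, as in the table definition below, each new partition starts with 2^2 = 4 pre-split Regions.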
+ +```sql +CREATE TABLE employees ( + id INT NOT NULL, + fname VARCHAR(30), + lname VARCHAR(30), + hired DATE NOT NULL DEFAULT '1970-01-01', + separated DATE DEFAULT '9999-12-31', + job_code INT, + store_id INT, + PRIMARY KEY (`id`,`hired`) NONCLUSTERED, + KEY `idx_employees_on_store_id` (`store_id`) +)SHARD_ROW_ID_BITS = 2 PRE_SPLIT_REGIONS=2 +PARTITION BY RANGE ( YEAR(hired) ) ( + PARTITION p0 VALUES LESS THAN (1991), + PARTITION p1 VALUES LESS THAN (1996), + PARTITION p2 VALUES LESS THAN (2001), + PARTITION p3 VALUES LESS THAN (2006) +); +``` + +Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. + +```sql +-- table +ALTER TABLE employees ATTRIBUTES 'merge_option=deny'; +-- partition +ALTER TABLE employees PARTITION `p3` ATTRIBUTES 'merge_option=deny'; +``` + +**Determining split boundaries based on existing business data** + +To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. + +**Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: + +```sql +SELECT MIN(id), MAX(id) FROM employees; +``` + +- If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. +- For **composite primary keys** or **composite indexes**, only the **leftmost column** needs to be considered when deciding split boundaries. +- If the leftmost column is a **string**, take string length and distribution into account to ensure even data spread. + +**Pre-split and scatter regions** + +A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. This helps ensure that data is more evenly distributed across the cluster from the start. + +**Splitting regions for the primary key of all partitions** + +To split regions for the primary key of all partitions in a partitioned table, you can use a command like: + +```sql +SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS ; +``` + +This example will split each partition's primary key range into `` regions between the specified boundary values. 
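
As a concrete sketch only, assume a cluster with 4 TiKV nodes and follow the common practice above of using twice the node count, so each partition's primary key range is split into 8 Regions. The table name, boundary values, and partition `p0` come from the example above; the Region count of 8 is an assumed value, not a measurement from the test environment.

```sql
-- Assumption: 4 TiKV nodes, so pre-split each partition's primary key range into 8 Regions (twice the node count).
SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS 8;

-- Check how the Regions of one partition are distributed across TiKV stores.
SHOW TABLE employees PARTITION (p0) REGIONS;
```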
+ +**Splitting Regions for the secondary index of all partitions.** + +```sql +SPLIT PARTITION TABLE employees INDEX `idx_employees_on_store_id` BETWEEN (1) AND (1000) REGIONS ; +``` + +**(Optional) When adding a new partition, you MUST manually split regions for its primary key and indices.** + +```sql +ALTER TABLE employees ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); + +SHOW TABLE employees PARTITION (p4) regions; + +SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "2006-01-01") AND (100000, "2011-01-01") REGIONS ; + +SPLIT PARTITION TABLE employees PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; + +SHOW TABLE employees PARTITION (p4) regions; +``` + +**2. CLUSTERED Partitioned Table** + +**Pros:** + +- Queries using **Point Get** or **Table Range Scan** do **not** need additional lookups, resulting in better **read performance**. + +**Cons:** + +- **Manual region splitting** is required when creating new partitions, increasing operational complexity. + +**Recommendation:** + +- Ideal when low-latency point queries are important and operational resources are available to manage region splitting. + +**Best Practices** + +Create a CLUSTERED partitioned table. + +```sql +CREATE TABLE employees2 ( + id INT NOT NULL, + fname VARCHAR(30), + lname VARCHAR(30), + hired DATE NOT NULL DEFAULT '1970-01-01', + separated DATE DEFAULT '9999-12-31', + job_code INT, + store_id INT, + PRIMARY KEY (`id`,`hired`) CLUSTERED, + KEY `idx_employees2_on_store_id` (`store_id`) +) +PARTITION BY RANGE ( YEAR(hired) ) ( + PARTITION p0 VALUES LESS THAN (1991), + PARTITION p1 VALUES LESS THAN (1996), + PARTITION p2 VALUES LESS THAN (2001), + PARTITION p3 VALUES LESS THAN (2006) +); +``` + +Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. + +```sql +ALTER TABLE employees2 ATTRIBUTES 'merge_option=deny'; +``` + +**Determining split boundaries based on existing business data** + +To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. + +**Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: + +```sql +SELECT MIN(id), MAX(id) FROM employees2; +``` + +- If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. +- For **composite primary keys** or **composite indexes**, only the **leftmost column** needs to be considered when deciding split boundaries. +- If the leftmost column is a **string**, take string length and distribution into account to ensure even data spread. + +**Pre-split and scatter regions** + +A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. 
This helps ensure that data is more evenly distributed across the cluster from the start.

**Splitting regions for all partitions**

```sql
SPLIT PARTITION TABLE employees2 BETWEEN (1,"1970-01-01") AND (100000,"9999-12-31") REGIONS ;
```

**Splitting regions for the secondary index of all partitions.**

```sql
SPLIT PARTITION TABLE employees2 INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ;
```

**(Optional) When adding a new partition, you MUST manually split regions for the specific partition and its indices.**

```sql
ALTER TABLE employees2 ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011));

SHOW TABLE employees2 PARTITION (p4) regions;

SPLIT PARTITION TABLE employees2 PARTITION (p4) BETWEEN (1,"2006-01-01") AND (100000,"2011-01-01") REGIONS ;

SPLIT PARTITION TABLE employees2 PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ;

SHOW TABLE employees2 PARTITION (p4) regions;
```

**3. CLUSTERED Non-partitioned Table**

**Pros:**

- **No hotspot risk from new partitions**.
- Provides **good read performance** for point and range queries.

**Cons:**

- **Cannot use DROP PARTITION** to clean up large volumes of old data.

**Recommendation:**

- Best suited for use cases that require stable performance and do not benefit from partition-based data management.


## Converting Between Partitioned and Non-Partitioned Tables

When working with large tables (120 million rows in this example), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations:

1. Batch DML: `INSERT INTO ... SELECT ...`
2. Pipeline DML: `INSERT INTO ... SELECT ...`
3. `IMPORT INTO`: `IMPORT INTO ... FROM SELECT ...`
4. Online DDL: Direct schema transformation via `ALTER TABLE`

This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations.

#### Table Schema: `fa`

```sql
CREATE TABLE `fa` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `account_id` bigint(20) NOT NULL,
  `sid` bigint(20) DEFAULT NULL,
  `user_id` bigint NOT NULL,
  `date` int NOT NULL,
  PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */,
  KEY `index_fa_on_sid` (`sid`),
  KEY `index_fa_on_account_id` (`account_id`),
  KEY `index_fa_on_user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY RANGE (`date`)
(PARTITION `fa_2024001` VALUES LESS THAN (2024001),
PARTITION `fa_2024002` VALUES LESS THAN (2024002),
PARTITION `fa_2024003` VALUES LESS THAN (2024003),
...
...
PARTITION `fa_2024366` VALUES LESS THAN (2024366));
```

#### Table Schema: `fa_new`

```sql
CREATE TABLE `fa_new` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `account_id` bigint(20) NOT NULL,
  `sid` bigint(20) DEFAULT NULL,
  `user_id` bigint NOT NULL,
  `date` int NOT NULL,
  PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */,
  KEY `index_fa_on_sid` (`sid`),
  KEY `index_fa_on_account_id` (`account_id`),
  KEY `index_fa_on_user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
```

#### Description

These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table.

### Method 1: Batch DML INSERT INTO ... SELECT ...
+ +```sql +SET tidb_mem_quota_query = 0; +INSERT INTO fa_new SELECT * FROM fa; +-- 120 million rows copied in 1h 52m 47s +``` + + +### Method 2: Pipeline DML INSERT INTO ... SELECT ... + +```sql +SET tidb_dml_type = "bulk"; +SET tidb_mem_quota_query = 0; +SET tidb_enable_mutation_checker = OFF; +INSERT INTO fa_new SELECT * FROM fa; +-- 120 million rows copied in 58m 42s +``` + +### Method 3: IMPORT INTO ... FROM SELECT ... + +```sql +mysql> import into fa_new from select * from fa with thread=32,disable_precheck; +Query OK, 120000000 rows affected, 1 warning (16 min 49.90 sec) +Records: 120000000, ID: c1d04eec-fb49-49bb-af92-bf3d6e2d3d87 +``` + +### Method 4: Online DDL + +**From partition table to non-partitioned table** + +```sql +SET @@global.tidb_ddl_reorg_worker_cnt = 16; +SET @@global.tidb_ddl_reorg_batch_size = 4096; +alter table fa REMOVE PARTITIONING; +-- real 170m12.024 s (≈ 2 h 50 m) +``` + +**From non-partition table to partitioned table** + +```sql +SET @@global.tidb_ddl_reorg_worker_cnt = 16; +SET @@global.tidb_ddl_reorg_batch_size = 4096; +ALTER TABLE fa PARTITION BY RANGE (`date`) +(PARTITION `fa_2024001` VALUES LESS THAN (2024001), +PARTITION `fa_2024002` VALUES LESS THAN (2024002), +... +PARTITION `fa_2024365` VALUES LESS THAN (2024365), +PARTITION `fa_2024366` VALUES LESS THAN (2024366)); + +Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) +``` + +### Findings + +| Method | Time Taken | +|---|---| +| Method 1: Batch DML INSERT INTO ... SELECT | 1 h 52 m 47 s | +| Method 2: Pipeline DML: INSERT INTO ... SELECT ... | 58 m 42 s | +| Method 3: IMPORT INTO ... FROM SELECT ... | 16 m 59 s | +| Method 4: Online DDL (From partition table to non-partitioned table) | 2 h 50 m | +| Method 4: Online DDL (From non-partition table to partitioned table) | 2 h 31 m | + +### Recommendation + +TiDB offers two approaches for converting tables between partitioned and non-partitioned states: + +Choose an offline method like [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. \ No newline at end of file From 0dc299d9d9ca445d62390cbeb6c7f342c8711fac Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 17:10:41 +0800 Subject: [PATCH 38/84] Update tidb-partitioned-tables-guide.md --- tidb-partitioned-tables-guide.md | 1 - 1 file changed, 1 deletion(-) diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md index 2abe1d6f8a32f..449fb120ef75e 100644 --- a/tidb-partitioned-tables-guide.md +++ b/tidb-partitioned-tables-guide.md @@ -152,7 +152,6 @@ ALTER TABLE ADD UNIQUE INDEX (col1, col2) GLOBAL; ``` - Adds a global index to an existing partitioned table. - The `GLOBAL` keyword must be explicitly specified. 
From 017a1adf7ee4d45894628481145736b160d9dc43 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 17:22:10 +0800 Subject: [PATCH 39/84] Update tidb-partitioned-tables-guide.md --- tidb-partitioned-tables-guide.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md index 449fb120ef75e..6ea473e0d9490 100644 --- a/tidb-partitioned-tables-guide.md +++ b/tidb-partitioned-tables-guide.md @@ -114,7 +114,7 @@ Metrics collected: **Non-partitioned table** -```yaml +``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | |---------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|----------|------| | IndexLookUp_7 | 398.73 | 787052.13 | 400 | root | | time:11.5ms, loops:2, index_task:{total_time:3.34ms, fetch_handle:3.34ms, build:600ns, wait:2.86µs}, table_task:{total_time:7.55ms, num:1, concurrency:5}, next:{wait_index:3.49ms, wait_table_lookup_build:492.5µs, wait_table_lookup_resp:7.05ms} | | 706.7 KB | N/A | @@ -124,7 +124,7 @@ Metrics collected: **Partition table with global index** -```yaml +``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | |------------------------|---------|-----------|---------|-----------|-------------------------------------------------|----------------|---------------|----------|------| | IndexLookUp_8 | 398.73 | 786959.21 | 400 | root | partition:all | time:12.8ms, loops:2, index_task:{total_time:2.71ms, fetch_handle:2.71ms, build:528ns, wait:3.23µs}, table_task:{total_time:9.03ms, num:1, concurrency:5}, next:{wait_index:3.27ms, wait_table_lookup_build:1.49ms, wait_table_lookup_resp:7.53ms} | | 693.9 KB | N/A | @@ -134,7 +134,7 @@ Metrics collected: **Partition table with local index** -```yaml +``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | |------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|---------|-------| | IndexLookUp_7 | 398.73 | 784450.63 | 400 | root | partition:all | time:290.8ms, loops:2, index_task:{total_time:103.6ms, fetch_handle:7.74ms, build:133.2µs, wait:95.7ms}, table_task:{total_time:551.1ms, num:217, concurrency:5}, next:{wait_index:179.6ms, wait_table_lookup_build:391µs, wait_table_lookup_resp:109.5ms} | | 4.30 MB | N/A | @@ -152,6 +152,7 @@ ALTER TABLE ADD UNIQUE INDEX (col1, col2) GLOBAL; ``` + Adds a global index to an existing partitioned table. - The `GLOBAL` keyword must be explicitly specified. From 3517f7fb57d3fbe29d6ac9bb07bb9e7b418946f3 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 17:34:06 +0800 Subject: [PATCH 40/84] Update tidb-partitioned-tables-guide.md --- tidb-partitioned-tables-guide.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md index 6ea473e0d9490..07785d723b8e8 100644 --- a/tidb-partitioned-tables-guide.md +++ b/tidb-partitioned-tables-guide.md @@ -95,11 +95,13 @@ WHERE `fa`.`sid` IN ( #### Findings Data came from a table with **366 range partitions** (for example, by date). -- The **Average Query Time** was obtained from the statement_summary view. + +- The **Average Query Time** was obtained from the `statement_summary` view. 
- The query used a **secondary index** and returned **400 rows**. Metrics collected: -- **Average Query Time**: from statement_summary + +- **Average Query Time**: from `statement_summary` - **Cop Tasks** (Index Scan + Table Lookup): from execution plan #### Test Results From 5013221dbfd85cd3553b5dd8163cc56fbbb15e3d Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 18:26:31 +0800 Subject: [PATCH 41/84] rename the file --- tidb-partitioned-tables-guide.md | 16 +- tidb_partitioned_tables_guide.md | 753 ------------------------------- 2 files changed, 8 insertions(+), 761 deletions(-) delete mode 100644 tidb_partitioned_tables_guide.md diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md index 07785d723b8e8..e9fda6b2982a6 100644 --- a/tidb-partitioned-tables-guide.md +++ b/tidb-partitioned-tables-guide.md @@ -143,6 +143,7 @@ Metrics collected: | IndexRangeScan_5(Build)| 398.73 | 90633.73 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:10.8ms, loops:800, cop_task:{num:600, max:65.6ms, min:1.02ms, avg:22.2ms, p95:45.1ms, max_proc_keys:5, p95_proc_keys:3, tot_proc:6.81s, tot_wait:4.77s, copr_cache_hit_ratio:0.00, build_task_duration:172.8ms, max_distsql_concurrency:3}, rpc_info:{Cop:{num_rpc:600, total_time:13.3s}}, tikv_task:{proc max:54ms, min:0s, avg:13.9ms, p80:20ms, p95:30ms, iters:600, tasks:600}, scan_detail:{total_process_keys:400, total_process_keys_size:22800, total_keys:29680, get_snapshot_time:2.47s, rocksdb:{key_skipped_count:400, block:{cache_hit_count:117580, read_count:29437, read_byte:104.9 MB, read_time:3.24s}}}, time_detail:{total_process_time:6.81s, total_suspend_time:1.51s, total_wait_time:4.77s, total_kv_read_wall_time:8.31s, tikv_wall_time:13.2s}} | range:[1696125963161,...,1696317134004], keep order:false, stats:partial[...] | N/A | N/A | | TableRowIDScan_6(Probe)| 398.73 | 165221.49 | 400 | cop[tikv] | table:fa | time:514ms, loops:434, cop_task:{num:375, max:31.6ms, min:0s, avg:1.33ms, p95:1.67ms, max_proc_keys:2, p95_proc_keys:2, tot_proc:220.7ms, tot_wait:242.2ms, copr_cache_hit_ratio:0.00, build_task_duration:27.8ms, max_distsql_concurrency:1, max_extra_concurrency:1, store_batch_num:69}, rpc_info:{Cop:{num_rpc:306, total_time:495.5ms}}, tikv_task:{proc max:6ms, min:0s, avg:597.3µs, p80:1ms, p95:1ms, iters:375, tasks:375}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:158.3ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:3197, read_count:803, read_byte:10.2 MB, read_time:113.5ms}}}, time_detail:{total_process_time:220.7ms, total_suspend_time:5.39ms, total_wait_time:242.2ms, total_kv_read_wall_time:224ms, tikv_wall_time:430.5ms}} | keep order:false, stats:partial[...] | N/A | N/A | ``` + [Similar detailed execution plans for partitioned tables with global and local indexes would follow...] #### How to Create a Global Index on a Partitioned Table in TiDB @@ -154,14 +155,13 @@ ALTER TABLE ADD UNIQUE INDEX (col1, col2) GLOBAL; ``` - Adds a global index to an existing partitioned table. - The `GLOBAL` keyword must be explicitly specified. - For non-unique global indexes, use `ADD INDEX` instead of `ADD UNIQUE INDEX`. 
- - Not supported in v8.5.x - - Available starting from v9.0.0-beta.1 - - Expected to be included in the next LTS release + - Not supported in v8.5.x + - Available starting from v9.0.0-beta.1 + - Expected to be included in the next LTS release **Option 2: Define Inline on Table Creation** @@ -215,10 +215,10 @@ To compare the performance of TTL and partition drop, we configured TTL to execu #### Findings **TTL Performance:** -- On a write-heavy table, TTL runs every 10 minutes. -- With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. -- With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. -- TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. + - On a write-heavy table, TTL runs every 10 minutes. + - With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. + - With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. + - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. **Partition Drop Performance:** diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md deleted file mode 100644 index 0792c1d70684b..0000000000000 --- a/tidb_partitioned_tables_guide.md +++ /dev/null @@ -1,753 +0,0 @@ -# Best Practices for Using TiDB Partitioned Tables - -This guide introduces how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. - -## Introduction - -Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. - -A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. - -Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. - -While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. 
To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. - -This document examines partitioned tables in TiDB from multiple angles, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it provides practical guidance on optimizing partitioned table design and performance tuning in TiDB. - -> **Note:** To get started with the fundamentals, refer to the [Partitioned Table User Guide](/partitioned-table.md), which explains key concepts such as partition pruning, index types, and partitioning methods. - -## Improving query efficiency - -### Partition Pruning - -**Partition pruning** is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions may contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead. - -#### Applicable Scenarios - -Partition pruning is most beneficial in scenarios where query predicates match the partitioning strategy. Common use cases include: - -- **Time-series data queries**: When data is partitioned by time ranges (for example, daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. -- **Multi-tenant or category-based datasets**: Partitioning by tenant ID or category enables queries to focus on a small subset of partitions. -- **Hybrid Transactional/Analytical Processing (HTAP)**: Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing **full table scans** on large datasets. - -For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable/partition-pruning/). - -### Query Performance on Secondary Index: Non-Partitioned Table vs. Local Index vs. Global Index - -In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. Global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. - -#### What Did We Test - -We evaluated query performance across three table configurations in TiDB: - -- Non-Partitioned Table -- Partitioned Table with Global Index -- Partitioned Table with Local Index - -#### Test Setup - -- The query **accesses data via a secondary index** and uses IN conditions across multiple values. -- The **partitioned table** had **366 partitions**, defined by **range partitioning on a datetime column**. -- Each matching key could return **multiple rows**, simulating a **high-volume OLTP-style query pattern**. -- We also evaluated the **impact of different partition counts** to understand how partition granularity influences latency and index performance. 
- -#### Schema - -```sql -CREATE TABLE `fa` ( - `id` bigint NOT NULL AUTO_INCREMENT, - `account_id` bigint(20) NOT NULL, - `sid` bigint(20) DEFAULT NULL, - `user_id` bigint NOT NULL, - `date` int NOT NULL, - PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */, - KEY `index_fa_on_sid` (`sid`), - KEY `index_fa_on_account_id` (`account_id`), - KEY `index_fa_on_user_id` (`user_id`) -) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin -PARTITION BY RANGE (`date`) -(PARTITION `fa_2024001` VALUES LESS THAN (2024001), -PARTITION `fa_2024002` VALUES LESS THAN (2024002), -PARTITION `fa_2024003` VALUES LESS THAN (2024003), -... -... -PARTITION `fa_2024366` VALUES LESS THAN (2024366)); -``` - -#### SQL - -```sql -SELECT `fa`.* -FROM `fa` -WHERE `fa`.`sid` IN ( - 1696271179344, - 1696317134004, - 1696181972136, - ... - 1696159221765 -); -``` - -- Query filters on secondary index, but does **not include the partition key**. -- Causes **Local Index** to scan across all partitions due to lack of pruning. -- Table lookup tasks are significantly higher for partitioned tables. - -#### Findings - -Data came from a table with **366 range partitions** (for example, by date). -- The **Average Query Time** was obtained from the statement_summary view. -- The query used a **secondary index** and returned **400 rows**. - -Metrics collected: -- **Average Query Time**: from statement_summary -- **Cop Tasks** (Index Scan + Table Lookup): from execution plan - -#### Test Results - -| Configuration | Average Query Time | Cop task for index range scan | Cop task for table lookup | Total Cop tasks | Key Takeaways | -|---|---|---|---|---|---| -| Non-Partitioned Table | 12.6 ms | 72 | 79 | 151 | Delivering the best performance with the fewest Cop tasks — ideal for most OLTP use cases. | -| Partitioned Table with Local Index | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries will scan all partitions. | -| Partitioned Table with Global Index | 14.8 ms | 69 | 383 | 452 | Improving index scan efficiency, but table lookups can still be expensive if many rows match. 
| - -#### Execution Plan Examples - -**Non-partitioned table** - -```yaml -| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | -|---------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|----------|------| -| IndexLookUp_7 | 398.73 | 787052.13 | 400 | root | | time:11.5ms, loops:2, index_task:{total_time:3.34ms, fetch_handle:3.34ms, build:600ns, wait:2.86µs}, table_task:{total_time:7.55ms, num:1, concurrency:5}, next:{wait_index:3.49ms, wait_table_lookup_build:492.5µs, wait_table_lookup_resp:7.05ms} | | 706.7 KB | N/A | -| IndexRangeScan_5(Build) | 398.73 | 90633.86 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:3.16ms, loops:3, cop_task:{num:72, max:780.4µs, min:394.2µs, avg:566.7µs, p95:748µs, max_proc_keys:20, p95_proc_keys:10, tot_proc:3.66ms, tot_wait:18.6ms, copr_cache_hit_ratio:0.00, build_task_duration:94µs, max_distsql_concurrency:15}, rpc_info:{Cop:{num_rpc:72, total_time:40.1ms}}, tikv_task:{proc max:1ms, min:0s, avg:27.8µs, p80:0s, p95:0s, iters:72, tasks:72}, scan_detail:{total_process_keys:400, total_process_keys_size:22800, total_keys:480, get_snapshot_time:17.7ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:160}}}, time_detail:{total_process_time:3.66ms, total_wait_time:18.6ms, total_kv_read_wall_time:2ms, tikv_wall_time:27.4ms} | range:[1696125963161,1696125963161], …, [1696317134004,1696317134004], keep order:false | N/A | N/A | -| TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task:{num:79, max:4.98ms, min:0s, avg:514.9µs, p95:3.75ms, max_proc_keys:10, p95_proc_keys:5, tot_proc:15ms, tot_wait:21.4ms, copr_cache_hit_ratio:0.00, build_task_duration:341.2µs, max_distsql_concurrency:1, max_extra_concurrency:7, store_batch_num:62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg:0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:20.8ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1600}}}, time_detail:{total_process_time:15ms, total_wait_time:21.4ms, tikv_wall_time:10.9ms} | keep order:false | N/A | N/A | -``` - -**Partition table with global index** - -```yaml -| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | -|------------------------|---------|-----------|---------|-----------|-------------------------------------------------|----------------|---------------|----------|------| -| IndexLookUp_8 | 398.73 | 786959.21 | 400 | root | partition:all | time:12.8ms, loops:2, index_task:{total_time:2.71ms, fetch_handle:2.71ms, build:528ns, wait:3.23µs}, table_task:{total_time:9.03ms, num:1, concurrency:5}, next:{wait_index:3.27ms, wait_table_lookup_build:1.49ms, wait_table_lookup_resp:7.53ms} | | 693.9 KB | N/A | -| IndexRangeScan_5(Build)| 398.73 | 102593.43 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid_global(sid, id)| time:2.49ms, loops:3, cop_task:{num:69, max:997µs, min:213.8µs, avg:469.8µs, p95:986.6µs, max_proc_keys:15, p95_proc_keys:10, tot_proc:13.4ms, tot_wait:1.52ms, copr_cache_hit_ratio:0.00, build_task_duration:498.4µs, max_distsql_concurrency:15}, rpc_info:{Cop:{num_rpc:69, total_time:31.8ms}}, tikv_task:{proc max:1ms, min:0s, avg:101.4µs, p80:0s, p95:1ms, iters:69, tasks:69}, scan_detail:{total_process_keys:400, total_process_keys_size:31200, 
total_keys:480, get_snapshot_time:679.9µs, rocksdb:{key_skipped_count:400, block:{cache_hit_count:189, read_count:54, read_byte:347.7 KB, read_time:6.17ms}}}, time_detail:{total_process_time:13.4ms, total_wait_time:1.52ms, total_kv_read_wall_time:7ms, tikv_wall_time:19.3ms} | range:[1696125963161,1696125963161], …, keep order:false, stats:partial[...] | N/A | N/A | -| TableRowIDScan_6(Probe)| 398.73 | 165221.64 | 400 | cop[tikv] | table:fa | time:7.47ms, loops:2, cop_task:{num:383, max:4.07ms, min:0s, avg:488.5µs, p95:2.59ms, max_proc_keys:2, p95_proc_keys:1, tot_proc:203.3ms, tot_wait:429.5ms, copr_cache_hit_ratio:0.00, build_task_duration:1.3ms, max_distsql_concurrency:1, max_extra_concurrency:31, store_batch_num:305}, rpc_info:{Cop:{num_rpc:78, total_time:186.3ms}}, tikv_task:{proc max:3ms, min:0s, avg:517µs, p80:1ms, p95:1ms, iters:383, tasks:383}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:2.99ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1601, read_count:799, read_byte:10.1 MB, read_time:131.6ms}}}, time_detail:{total_process_time:203.3ms, total_suspend_time:6.31ms, total_wait_time:429.5ms, total_kv_read_wall_time:198ms, tikv_wall_time:163ms} | keep order:false, stats:partial[...] | N/A | N/A | -``` - -**Partition table with local index** - -```yaml -| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | -|------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|---------|-------| -| IndexLookUp_7 | 398.73 | 784450.63 | 400 | root | partition:all | time:290.8ms, loops:2, index_task:{total_time:103.6ms, fetch_handle:7.74ms, build:133.2µs, wait:95.7ms}, table_task:{total_time:551.1ms, num:217, concurrency:5}, next:{wait_index:179.6ms, wait_table_lookup_build:391µs, wait_table_lookup_resp:109.5ms} | | 4.30 MB | N/A | -| IndexRangeScan_5(Build)| 398.73 | 90633.73 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:10.8ms, loops:800, cop_task:{num:600, max:65.6ms, min:1.02ms, avg:22.2ms, p95:45.1ms, max_proc_keys:5, p95_proc_keys:3, tot_proc:6.81s, tot_wait:4.77s, copr_cache_hit_ratio:0.00, build_task_duration:172.8ms, max_distsql_concurrency:3}, rpc_info:{Cop:{num_rpc:600, total_time:13.3s}}, tikv_task:{proc max:54ms, min:0s, avg:13.9ms, p80:20ms, p95:30ms, iters:600, tasks:600}, scan_detail:{total_process_keys:400, total_process_keys_size:22800, total_keys:29680, get_snapshot_time:2.47s, rocksdb:{key_skipped_count:400, block:{cache_hit_count:117580, read_count:29437, read_byte:104.9 MB, read_time:3.24s}}}, time_detail:{total_process_time:6.81s, total_suspend_time:1.51s, total_wait_time:4.77s, total_kv_read_wall_time:8.31s, tikv_wall_time:13.2s}} | range:[1696125963161,...,1696317134004], keep order:false, stats:partial[...] 
| N/A | N/A | -| TableRowIDScan_6(Probe)| 398.73 | 165221.49 | 400 | cop[tikv] | table:fa | time:514ms, loops:434, cop_task:{num:375, max:31.6ms, min:0s, avg:1.33ms, p95:1.67ms, max_proc_keys:2, p95_proc_keys:2, tot_proc:220.7ms, tot_wait:242.2ms, copr_cache_hit_ratio:0.00, build_task_duration:27.8ms, max_distsql_concurrency:1, max_extra_concurrency:1, store_batch_num:69}, rpc_info:{Cop:{num_rpc:306, total_time:495.5ms}}, tikv_task:{proc max:6ms, min:0s, avg:597.3µs, p80:1ms, p95:1ms, iters:375, tasks:375}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:158.3ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:3197, read_count:803, read_byte:10.2 MB, read_time:113.5ms}}}, time_detail:{total_process_time:220.7ms, total_suspend_time:5.39ms, total_wait_time:242.2ms, total_kv_read_wall_time:224ms, tikv_wall_time:430.5ms}} | keep order:false, stats:partial[...] | N/A | N/A | -``` -[Similar detailed execution plans for partitioned tables with global and local indexes would follow...] - -#### How to Create a Global Index on a Partitioned Table in TiDB - -**Option 1: Add via ALTER TABLE** - -```sql -ALTER TABLE -ADD UNIQUE INDEX (col1, col2) GLOBAL; -``` - - -Adds a global index to an existing partitioned table. - -- The `GLOBAL` keyword must be explicitly specified. -- For non-unique global indexes, use `ADD INDEX` instead of `ADD UNIQUE INDEX`. - - Not supported in v8.5.x - - Available starting from v9.0.0-beta.1 - - Expected to be included in the next LTS release - -**Option 2: Define Inline on Table Creation** - -```sql -CREATE TABLE t ( - id BIGINT NOT NULL, - col1 VARCHAR(50), - col2 VARCHAR(50), - -- other columns... - - UNIQUE GLOBAL INDEX idx_col1_col2 (col1, col2) -) -PARTITION BY RANGE (id) ( - PARTITION p0 VALUES LESS THAN (10000), - PARTITION p1 VALUES LESS THAN (20000), - PARTITION pMax VALUES LESS THAN MAXVALUE -); -``` - -#### Summary - -The performance overhead of partitioned tables in TiDB depends significantly on the number of partitions and the type of index used. - -- The more partitions you have, the more severe the potential performance degradation. -- With a smaller number of partitions, the impact may not be as noticeable, but it's still workload-dependent. -- For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs triggered. This means more partitions will likely result in more RPCs, leading to higher latency. -- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). - -#### Recommendation - -- Avoid partitioned tables unless truly necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. -- If you must use partitioned tables, benchmark both global index and local index strategies under your workload. -- Use global indexes when query performance across partitions is critical. -- Choose local indexes only if your main concern is DDL efficiency, such as fast DROP PARTITION, and the performance side effect from the partition table is acceptable. - -## Facilitating Bulk Data Deletion - -### Data Cleanup Efficiency: TTL vs. Direct Partition Drop - -In TiDB, historical data cleanup can be handled either by **TTL (Time-to-Live)** or **manual partition drop**. 
While both methods serve the same purpose, they differ significantly in performance. Our tests show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. - -#### What's the difference? - -- **TTL**: Automatically removes data based on its age, but may be slower due to the need to scan and clean data over time. -- **Partition Drop**: Deletes an entire partition at once, making it much faster, especially when dealing with large datasets. - -#### What Did We Test - -To compare the performance of TTL and partition drop, we configured TTL to execute every 10 minutes and created a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches were tested under background write loads of 50 and 100 concurrent threads. We measured key metrics such as execution time, system resource utilization, and the total number of rows deleted. - -#### Findings - -**TTL Performance:** -- On a write-heavy table, TTL runs every 10 minutes. -- With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. -- With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. -- TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. - -**Partition Drop Performance:** - -- DROP PARTITION removes an entire data segment instantly, with minimal resource usage. -- DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. - -#### How to Use TTL and Partition Drop in TiDB - -In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), please refer to the official documentation at https://docs.pingcap.com/tidb/stable/time-to-live/. - -**TTL schema** - -```sql -CREATE TABLE `ad_cache` ( - `session` varchar(255) NOT NULL, - `ad_id` varbinary(255) NOT NULL, - `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, - `suffix` bigint(20) NOT NULL, - `expire_time` timestamp NULL DEFAULT NULL, - `data` mediumblob DEFAULT NULL, - `version` int(11) DEFAULT NULL, - `is_delete` tinyint(1) DEFAULT NULL, - PRIMARY KEY (`session`, `ad_id`, `create_time`, `suffix`) -) -ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin -TTL=`expire_time` + INTERVAL 0 DAY TTL_ENABLE='ON' -TTL_JOB_INTERVAL='10m'; -``` - -**Drop Partition (Range INTERVAL partitioning)** - -```sql -CREATE TABLE `ad_cache` ( - `session_id` varchar(255) NOT NULL, - `external_id` varbinary(255) NOT NULL, - `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, - `id_suffix` bigint(20) NOT NULL, - `expire_time` timestamp NULL DEFAULT NULL, - `cache_data` mediumblob DEFAULT NULL, - `data_version` int(11) DEFAULT NULL, - `is_deleted` tinyint(1) DEFAULT NULL, - PRIMARY KEY ( - `session_id`, `external_id`, - `create_time`, `id_suffix` - ) NONCLUSTERED -) -SHARD_ROW_ID_BITS=7 -PRE_SPLIT_REGIONS=2 -PARTITION BY RANGE COLUMNS (create_time) -INTERVAL (10 MINUTE) -FIRST PARTITION LESS THAN ('2025-02-19 18:00:00') -... -LAST PARTITION LESS THAN ('2025-02-19 20:00:00'); -``` - -It's required to run DDL alter table partition ... to change the FIRST PARTITION and LAST PARTITION periodically. These two DDL statements can drop the old partitions and create new ones. 
- -```sql -ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}"); -ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}"); -``` - -#### Recommendation - -For workloads with **large or time-based data cleanup**, prefer using **partitioned tables with DROP PARTITION**. It offers better performance, lower system impact, and simpler management. TTL is still useful for finer-grained or background cleanup but may not be optimal under high write pressure or when deleting large volumes of data quickly. - -### Partition Drop Efficiency: Local Index vs Global Index - -Partition table with Global Index requires synchronous updates to the global index, potentially increasing significant execution time for DDL operations, such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION. In this section, the tests show that DROP PARTITION is much slower when using a **Global Index** compared to a **Local Index**. This should be considered when designing partitioned tables. - -#### What Did We Test - -We created a table with **366 partitions** and tested the DROP PARTITION performance using both **Global Index** and **Local Index**. The total number of rows was **1 billion**. - -| Index Type | Duration (drop partition) | -|---|---| -| Global Index | 1 min 16.02 s | -| Local Index | 0.52 s | - -#### Findings - -Dropping a partition on a table with a Global Index took **76 seconds**, while the same operation with a Local Index took only **0.52 seconds**. The reason is that Global Indexes span all partitions and require more complex updates, while Local Indexes are limited to individual partitions and are easier to handle. - -**Global Index** - -```sql -ALTER TABLE A DROP PARTITION A_2024363; -``` - -#### Recommendation - -When a partitioned table contains global indexes, performing certain DDL operations such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. - -If you need to drop partitions frequently and minimize the performance impact on the system, it's better to use **local indexes** for faster and more efficient operations. - -## Mitigating Write Hotspot Issues - -### Background - -In TiDB, **write hotspots** occur when incoming write traffic is unevenly distributed across Regions. - -This is common when the primary key is **monotonically increasing**—for example, an AUTO_INCREMENT primary key with AUTO_ID_CACHE=1, or secondary index on datetime column with default value set to CURRENT_TIMESTAMP—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: - -- A single Region handling most of the write workload, while other Regions remain idle. -- Higher write latency and reduced throughput. -- Limited performance gains from scaling out TiKV nodes, as the bottleneck remains concentrated on one Region. - -**Partitioned tables** can help mitigate this problem. By applying **hash** or **key** partitioning on the primary key, TiDB can spread inserts across multiple partitions (and therefore multiple Regions), reducing hotspot contention. - -### How It Works - -TiDB stores table data in **Regions**, each covering a continuous range of row keys. - -When the primary key is AUTO_INCREMENT and the secondary indexes on datetime columns are monotonically increasing: - -**Without Partitioning:** - -- New rows always have the highest key values and are inserted into the same "last Region." 
-- That Region is served by one TiKV node at a time, becoming a single write bottleneck. - -**With Hash/Key Partitioning:** - -- The table and the secondary indexes are split into multiple partitions using a hash or key function on the primary key or indexed columns. -- Each partition has its own set of Regions, often distributed across different TiKV nodes. -- Inserts are spread across multiple Regions in parallel, improving load distribution and throughput. - -### Use Case - -If a table with an AUTO_INCREMENT primary key experiences heavy bulk inserts and suffers from write hotspot issues, applying **hash** or **key** partitioning on the primary key can help distribute the write load more evenly. - -```sql -CREATE TABLE server_info ( - id bigint NOT NULL AUTO_INCREMENT, - serial_no varchar(100) DEFAULT NULL, - device_name varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL, - device_type varchar(50) DEFAULT NULL, - modified_ts timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, - PRIMARY KEY (id) /*T![clustered_index] CLUSTERED */, - KEY idx_serial_no (serial_no), - KEY idx_modified_ts (modified_ts) -) /*T![auto_id_cache] AUTO_ID_CACHE=1 */ -PARTITION BY KEY (id) PARTITIONS 16; -``` - -### Pros - -- **Balanced Write Load** — Hotspots are spread across multiple partitions, reducing contention and improving insert performance. -- **Query Optimization via Partition Pruning** — If queries already filter by the partition key, TiDB can prune unused partitions, scanning less data and improving query speed. - -### Cons - -**Potential Query Performance Drop Without Partition Pruning** - -When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This may significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: - -```sql -SELECT * FROM server_info WHERE `serial_no` = ?; -``` - -**Mitigation**: Add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down DROP PARTITION operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: - -```sql -ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; -``` - -## Partition Management Challenge - -### How to Avoid Hotspots Caused by New Range Partitions - -#### Overview - -New range partitions in a partitioned table can easily lead to hotspot issues in TiDB. This section outlines common scenarios and mitigation strategies to avoid read and write hotspots caused by range partitions. - -#### Common Hotspot Scenarios - -**Read Hotspot** - -When using **range-partitioned tables**, if queries do **not** filter data using the partition key, new empty partitions can easily become read hotspots. - -**Root Cause:** - -By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions may be merged into a **single region**. - -**Impact:** - -When a query does **not filter by partition key**, TiDB will **scan all partitions** (as seen in the execution plan partition:all). 
As a result, the single region holding multiple empty partitions will be scanned repeatedly, leading to a **read hotspot**. - -**Write Hotspot** - -When using a time-based field as the partition key, a write hotspot may occur when switching to a new partition: - -**Root Cause:** -In TiDB, any newly created table or partition initially contains only **one region** (data block), which is randomly placed on a single TiKV node. As data begins to be written, this region will eventually **split** into multiple regions, and PD will schedule these new regions to other TiKV nodes. - -However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it may not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. - -**Impact:** - -This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn may impact the overall read and write performance of the cluster. - - -### Summary Table - -| Approach | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | -|---|---|---|---|---|---| -| NONCLUSTERED Partitioned | Low (with merge_option=deny) | Low (auto pre-split) | Low | Moderate (extra lookups) | Fast (DROP PARTITION) | -| CLUSTERED Partitioned | Medium (manual intervention) | Medium (manual split) | High | High (direct access) | Fast (DROP PARTITION) | -| CLUSTERED Non-partitioned | None | Medium (single table) | Low | High | Slow (DELETE/TTL) | - -#### Solutions - -**1. NONCLUSTERED Partitioned Table** - -**Pros:** - -- When a new partition is created in a **NONCLUSTERED Partitioned Table** configured with SHARD_ROW_ID_BITS and [PRE_SPLIT_REGIONS](https://docs.pingcap.com/tidb/stable/sql-statement-split-region/#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. -- Lower operational overhead. - -**Cons:** - -- Queries using **Point Get** or **Table Range Scan** will require **more table lookups**, which can degrade read performance for such query types. - -**Recommendation:** - -- Suitable for workloads where write scalability and operational ease are more critical than low-latency reads. - -**Best Practices** - -Create a partitioned table with SHARD_ROW_ID_BITS and PRE_SPLIT_REGIONS to pre-split table regions. The value of PRE_SPLIT_REGIONS must be less than or equal to that of SHARD_ROW_ID_BITS. The number of pre-split Regions for each partition is 2^(PRE_SPLIT_REGIONS). 
- -```sql -CREATE TABLE employees ( - id INT NOT NULL, - fname VARCHAR(30), - lname VARCHAR(30), - hired DATE NOT NULL DEFAULT '1970-01-01', - separated DATE DEFAULT '9999-12-31', - job_code INT, - store_id INT, - PRIMARY KEY (`id`,`hired`) NONCLUSTERED, - KEY `idx_employees_on_store_id` (`store_id`) -)SHARD_ROW_ID_BITS = 2 PRE_SPLIT_REGIONS=2 -PARTITION BY RANGE ( YEAR(hired) ) ( - PARTITION p0 VALUES LESS THAN (1991), - PARTITION p1 VALUES LESS THAN (1996), - PARTITION p2 VALUES LESS THAN (2001), - PARTITION p3 VALUES LESS THAN (2006) -); -``` - -Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. - -```sql --- table -ALTER TABLE employees ATTRIBUTES 'merge_option=deny'; --- partition -ALTER TABLE employees PARTITION `p3` ATTRIBUTES 'merge_option=deny'; -``` - -**Determining split boundaries based on existing business data** - -To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. - -**Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: - -```sql -SELECT MIN(id), MAX(id) FROM employees; -``` - -- If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. -- For **composite primary keys** or **composite indexes**, only the **leftmost column** needs to be considered when deciding split boundaries. -- If the leftmost column is a **string**, take string length and distribution into account to ensure even data spread. - -**Pre-split and scatter regions** - -A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. This helps ensure that data is more evenly distributed across the cluster from the start. - -**Splitting regions for the primary key of all partitions** - -To split regions for the primary key of all partitions in a partitioned table, you can use a command like: - -```sql -SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS ; -``` - -This example will split each partition's primary key range into `` regions between the specified boundary values. 
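For illustration, a filled-in version of the statement might look like the following; the region count of 8 is an assumed value (twice a hypothetical 4 TiKV nodes, following the sizing guidance above), not a figure from the original test:

```sql
-- A sketch with assumed values: pre-split the PRIMARY index of every
-- partition into 8 regions (twice a hypothetical 4 TiKV nodes).
-- The boundaries reuse the employees example and should reflect the
-- actual min/max values of your data.
SPLIT PARTITION TABLE employees INDEX `PRIMARY`
    BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS 8;
```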
- -**Splitting Regions for the secondary index of all partitions.** - -```sql -SPLIT PARTITION TABLE employees INDEX `idx_employees_on_store_id` BETWEEN (1) AND (1000) REGIONS ; -``` - -**(Optional) When adding a new partition, you MUST manually split regions for its primary key and indices.** - -```sql -ALTER TABLE employees ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); - -SHOW TABLE employees PARTITION (p4) regions; - -SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "2006-01-01") AND (100000, "2011-01-01") REGIONS ; - -SPLIT PARTITION TABLE employees PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; - -SHOW TABLE employees PARTITION (p4) regions; -``` - -**2. CLUSTERED Partitioned Table** - -**Pros:** - -- Queries using **Point Get** or **Table Range Scan** do **not** need additional lookups, resulting in better **read performance**. - -**Cons:** - -- **Manual region splitting** is required when creating new partitions, increasing operational complexity. - -**Recommendation:** - -- Ideal when low-latency point queries are important and operational resources are available to manage region splitting. - -**Best Practices** - -Create a CLUSTERED partitioned table. - -```sql -CREATE TABLE employees2 ( - id INT NOT NULL, - fname VARCHAR(30), - lname VARCHAR(30), - hired DATE NOT NULL DEFAULT '1970-01-01', - separated DATE DEFAULT '9999-12-31', - job_code INT, - store_id INT, - PRIMARY KEY (`id`,`hired`) CLUSTERED, - KEY `idx_employees2_on_store_id` (`store_id`) -) -PARTITION BY RANGE ( YEAR(hired) ) ( - PARTITION p0 VALUES LESS THAN (1991), - PARTITION p1 VALUES LESS THAN (1996), - PARTITION p2 VALUES LESS THAN (2001), - PARTITION p3 VALUES LESS THAN (2006) -); -``` - -Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. - -```sql -ALTER TABLE employees2 ATTRIBUTES 'merge_option=deny'; -``` - -**Determining split boundaries based on existing business data** - -To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. - -**Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: - -```sql -SELECT MIN(id), MAX(id) FROM employees2; -``` - -- If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. -- For **composite primary keys** or **composite indexes**, only the **leftmost column** needs to be considered when deciding split boundaries. -- If the leftmost column is a **string**, take string length and distribution into account to ensure even data spread. - -**Pre-split and scatter regions** - -A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. 
This helps ensure that data is more evenly distributed across the cluster from the start. - -**Splitting regions for all partitions** - -```sql -SPLIT PARTITION TABLE employees2 BETWEEN (1,"1970-01-01") AND (100000,"9999-12-31") REGIONS ; -``` - -**Splitting regions for the secondary index of all partitions.** - -```sql -SPLIT PARTITION TABLE employees2 INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; -``` - -**(Optional) When adding a new partition, you MUST manually split regions for the specific partition and its indices.** - -```sql -ALTER TABLE employees2 ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); - -show table employees2 PARTITION (p4) regions; - -SPLIT PARTITION TABLE employees2 PARTITION (p4) BETWEEN (1,"2006-01-01") AND (100000,"2011-01-01") REGIONS ; - -SPLIT PARTITION TABLE employees2 PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; - -show table employees2 PARTITION (p4) regions; -``` - -**3. CLUSTERED Non-partitioned Table** - -**Pros:** - -- **No hotspot risk from new partitions**. -- Provides **good read performance** for point and range queries. - -**Cons:** - -- **Cannot use DROP PARTITION** to clean up large volumes of old data. - -**Recommendation:** - -- Best suited for use cases that require stable performance and do not benefit from partition-based data management. - - -## Converting Between Partitioned and Non-Partitioned Tables - -When working with large tables (for example in this example 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: - -1. Batch DML: `INSERT INTO ... SELECT ...` -2. Pipeline DML: `INSERT INTO ... SELECT ...` -3. `IMPORT INTO`: `IMPORT INTO ... FROM SELECT ...` -4. Online DDL: Direct schema transformation via `ALTER TABLE` - -This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. - -#### Table Schema: `fa` - -```sql -CREATE TABLE `fa` ( - `id` bigint NOT NULL AUTO_INCREMENT, - `account_id` bigint(20) NOT NULL, - `sid` bigint(20) DEFAULT NULL, - `user_id` bigint NOT NULL, - `date` int NOT NULL, - PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */, - KEY `index_fa_on_sid` (`sid`), - KEY `index_fa_on_account_id` (`account_id`), - KEY `index_fa_on_user_id` (`user_id`) -) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin -PARTITION BY RANGE (`date`) -(PARTITION `fa_2024001` VALUES LESS THAN (2024001), -PARTITION `fa_2024002` VALUES LESS THAN (2024002), -PARTITION `fa_2024003` VALUES LESS THAN (2024003), -... -... -PARTITION `fa_2024366` VALUES LESS THAN (2024366)); -``` - - -#### Table Schema: `fa_new` - -```sql -CREATE TABLE `fa` ( - `id` bigint NOT NULL AUTO_INCREMENT, - `account_id` bigint(20) NOT NULL, - `sid` bigint(20) DEFAULT NULL, - `user_id` bigint NOT NULL, - `date` int NOT NULL, - PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */, - KEY `index_fa_on_sid` (`sid`), - KEY `index_fa_on_account_id` (`account_id`), - KEY `index_fa_on_user_id` (`user_id`) -) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin; -``` - -#### Description - -These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. - -### Method 1: Batch DML INSERT INTO ... SELECT ... 
- -```sql -SET tidb_mem_quota_query = 0; -INSERT INTO fa_new SELECT * FROM fa; --- 120 million rows copied in 1h 52m 47s -``` - - -### Method 2: Pipeline DML INSERT INTO ... SELECT ... - -```sql -SET tidb_dml_type = "bulk"; -SET tidb_mem_quota_query = 0; -SET tidb_enable_mutation_checker = OFF; -INSERT INTO fa_new SELECT * FROM fa; --- 120 million rows copied in 58m 42s -``` - -### Method 3: IMPORT INTO ... FROM SELECT ... - -```sql -mysql> import into fa_new from select * from fa with thread=32,disable_precheck; -Query OK, 120000000 rows affected, 1 warning (16 min 49.90 sec) -Records: 120000000, ID: c1d04eec-fb49-49bb-af92-bf3d6e2d3d87 -``` - -### Method 4: Online DDL - -**From partition table to non-partitioned table** - -```sql -SET @@global.tidb_ddl_reorg_worker_cnt = 16; -SET @@global.tidb_ddl_reorg_batch_size = 4096; -alter table fa REMOVE PARTITIONING; --- real 170m12.024s (≈ 2h 50m) -``` - -**From non-partition table to partitioned table** - -```sql -SET @@global.tidb_ddl_reorg_worker_cnt = 16; -SET @@global.tidb_ddl_reorg_batch_size = 4096; -ALTER TABLE fa PARTITION BY RANGE (`date`) -(PARTITION `fa_2024001` VALUES LESS THAN (2024001), -PARTITION `fa_2024002` VALUES LESS THAN (2024002), -... -PARTITION `fa_2024365` VALUES LESS THAN (2024365), -PARTITION `fa_2024366` VALUES LESS THAN (2024366)); - -Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) -``` - -### Findings - -| Method | Time Taken | -|---|---| -| Method 1: Batch DML INSERT INTO ... SELECT | 1h 52m 47s | -| Method 2: Pipeline DML: INSERT INTO ... SELECT ... | 58m 42s | -| Method 3: IMPORT INTO ... FROM SELECT ... | 16m 59s | -| Method 4: Online DDL (From partition table to non-partitioned table) | 2h 50m | -| Method 4: Online DDL (From non-partition table to partitioned table) | 2h 31m | - -### Recommendation - -TiDB offers two approaches for converting tables between partitioned and non-partitioned states: - -Choose an offline method like [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. \ No newline at end of file From a294efbe470a322bd30a60e3434ec8b8329405de Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 18:41:53 +0800 Subject: [PATCH 42/84] Update tidb-partitioned-tables-guide.md --- tidb-partitioned-tables-guide.md | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md index e9fda6b2982a6..3836e11fc3ce8 100644 --- a/tidb-partitioned-tables-guide.md +++ b/tidb-partitioned-tables-guide.md @@ -215,10 +215,11 @@ To compare the performance of TTL and partition drop, we configured TTL to execu #### Findings **TTL Performance:** - - On a write-heavy table, TTL runs every 10 minutes. - - With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. - - With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. - - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. + + - On a write-heavy table, TTL runs every 10 minutes. + - With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. + - With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. 
+ - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. **Partition Drop Performance:** @@ -227,7 +228,7 @@ To compare the performance of TTL and partition drop, we configured TTL to execu #### How to Use TTL and Partition Drop in TiDB -In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), please refer to the official documentation at https://docs.pingcap.com/tidb/stable/time-to-live/. +In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), please refer to the official documentation at [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md) . **TTL schema** @@ -419,7 +420,6 @@ However, if the initial write traffic to this new partition is **very high**, th This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn may impact the overall read and write performance of the cluster. - ### Summary Table | Approach | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | @@ -629,7 +629,6 @@ show table employees2 PARTITION (p4) regions; - Best suited for use cases that require stable performance and do not benefit from partition-based data management. - ## Converting Between Partitioned and Non-Partitioned Tables When working with large tables (for example in this example 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: @@ -664,7 +663,6 @@ PARTITION `fa_2024003` VALUES LESS THAN (2024003), PARTITION `fa_2024366` VALUES LESS THAN (2024366)); ``` - #### Table Schema: `fa_new` ```sql @@ -693,7 +691,6 @@ INSERT INTO fa_new SELECT * FROM fa; -- 120 million rows copied in 1h 52m 47s ``` - ### Method 2: Pipeline DML INSERT INTO ... SELECT ... ```sql From 50ac5742efc5e510b9de0a8f3dae423c4f52e98f Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 20:10:17 +0800 Subject: [PATCH 43/84] Update tidb-partitioned-tables-guide.md --- tidb-partitioned-tables-guide.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md index 3836e11fc3ce8..bf25738174702 100644 --- a/tidb-partitioned-tables-guide.md +++ b/tidb-partitioned-tables-guide.md @@ -216,10 +216,10 @@ To compare the performance of TTL and partition drop, we configured TTL to execu **TTL Performance:** - - On a write-heavy table, TTL runs every 10 minutes. - - With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. - - With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. - - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. + - On a write-heavy table, TTL runs every 10 minutes. + - With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. + - With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. + - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. 
**Partition Drop Performance:** @@ -640,7 +640,7 @@ When working with large tables (for example in this example 120 million rows), t This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. -#### Table Schema: `fa` +### Table Schema: `fa` ```sql CREATE TABLE `fa` ( @@ -663,7 +663,7 @@ PARTITION `fa_2024003` VALUES LESS THAN (2024003), PARTITION `fa_2024366` VALUES LESS THAN (2024366)); ``` -#### Table Schema: `fa_new` +### Table Schema: `fa_new` ```sql CREATE TABLE `fa` ( @@ -683,7 +683,7 @@ CREATE TABLE `fa` ( These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. -### Method 1: Batch DML INSERT INTO ... SELECT ... +### Method 1: Batch DML INSERT INTO ... SELECT ```sql SET tidb_mem_quota_query = 0; @@ -691,7 +691,7 @@ INSERT INTO fa_new SELECT * FROM fa; -- 120 million rows copied in 1h 52m 47s ``` -### Method 2: Pipeline DML INSERT INTO ... SELECT ... +### Method 2: Pipeline DML INSERT INTO ... SELECT ```sql SET tidb_dml_type = "bulk"; @@ -701,7 +701,7 @@ INSERT INTO fa_new SELECT * FROM fa; -- 120 million rows copied in 58m 42s ``` -### Method 3: IMPORT INTO ... FROM SELECT ... +### Method 3: IMPORT INTO ... FROM SELECT ```sql mysql> import into fa_new from select * from fa with thread=32,disable_precheck; From 9f2cccfe8d19eba2a67c3dc10a8ebe327619f242 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 20:17:24 +0800 Subject: [PATCH 44/84] Update tidb-partitioned-tables-guide.md --- tidb-partitioned-tables-guide.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md index bf25738174702..62a1813011db9 100644 --- a/tidb-partitioned-tables-guide.md +++ b/tidb-partitioned-tables-guide.md @@ -216,10 +216,10 @@ To compare the performance of TTL and partition drop, we configured TTL to execu **TTL Performance:** - - On a write-heavy table, TTL runs every 10 minutes. - - With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. - - With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. - - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. +- On a write-heavy table, TTL runs every 10 minutes. +- With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. +- With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. +- TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. **Partition Drop Performance:** From cd5c1d58540178b5a53705b584e86ec9ba15bc37 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Wed, 15 Oct 2025 11:01:01 +0800 Subject: [PATCH 45/84] Move TiDB partitioned tables guide to best practices Renamed and relocated 'tidb-partitioned-tables-guide.md' to 'best-practices/tidb-partitioned-tables-guide.md'. Updated TOC.md to reference the new location and added front matter and editorial improvements to the guide for clarity and consistency. 
--- TOC.md | 1 + .../tidb-partitioned-tables-guide.md | 179 +++++++++--------- 2 files changed, 92 insertions(+), 88 deletions(-) rename tidb-partitioned-tables-guide.md => best-practices/tidb-partitioned-tables-guide.md (83%) diff --git a/TOC.md b/TOC.md index 3aba0440ea012..fc608950dc4d9 100644 --- a/TOC.md +++ b/TOC.md @@ -438,6 +438,7 @@ - [Optimize Multi-Column Indexes](/best-practices/multi-column-index-best-practices.md) - [Manage Indexes and Identify Unused Indexes](/best-practices/index-management-best-practices.md) - [Handle Millions of Tables in SaaS Multi-Tenant Scenarios](/best-practices/saas-best-practices.md) + - [Best Practices for Using TiDB Partitioned Tables](/best-practices/tidb-partitioned-tables-guide.md) - [Use UUIDs as Primary Keys](/best-practices/uuid.md) - [Develop Java Applications](/best-practices/java-app-best-practices.md) - [Handle High-Concurrency Writes](/best-practices/high-concurrency-best-practices.md) diff --git a/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-guide.md similarity index 83% rename from tidb-partitioned-tables-guide.md rename to best-practices/tidb-partitioned-tables-guide.md index 62a1813011db9..a80adf0160f14 100644 --- a/tidb-partitioned-tables-guide.md +++ b/best-practices/tidb-partitioned-tables-guide.md @@ -1,9 +1,12 @@ +--- +title: Best Practices for Using TiDB Partitioned Tables +summary: Learn best practices for using TiDB partitioned tables to improve performance, simplify data management, and handle large-scale datasets efficiently. +--- + # Best Practices for Using TiDB Partitioned Tables This guide introduces how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. -## Introduction - Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. @@ -14,15 +17,17 @@ While partitioning offers clear benefits, it also presents **common challenges** This document examines partitioned tables in TiDB from multiple angles, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it provides practical guidance on optimizing partitioned table design and performance tuning in TiDB. 
-> **Note:** To get started with the fundamentals, refer to the [Partitioned Table User Guide](/partitioned-table.md), which explains key concepts such as partition pruning, index types, and partitioning methods. +> **Note:** +> +> To get started with the fundamentals, refer to [Partitioning](/partitioned-table.md), which explains key concepts such as partition pruning, index types, and partitioning methods. -## Improving query efficiency +## Improve query efficiency -### Partition Pruning +### Partition pruning **Partition pruning** is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions may contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead. -#### Applicable Scenarios +#### Applicable scenarios Partition pruning is most beneficial in scenarios where query predicates match the partitioning strategy. Common use cases include: @@ -32,19 +37,19 @@ Partition pruning is most beneficial in scenarios where query predicates match t For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable/partition-pruning/). -### Query Performance on Secondary Index: Non-Partitioned Table vs. Local Index vs. Global Index +### Query performance on secondary indexes: non-partitioned tables vs. local indexes vs. global indexes In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. Global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. -#### What Did We Test +#### What did we test We evaluated query performance across three table configurations in TiDB: -- Non-Partitioned Table -- Partitioned Table with Global Index -- Partitioned Table with Local Index +- Non-partitioned tables +- Partitioned tables with global indexes +- Partitioned tables with local indexes -#### Test Setup +#### Test setup - The query **accesses data via a secondary index** and uses IN conditions across multiple values. - The **partitioned table** had **366 partitions**, defined by **range partitioning on a datetime column**. @@ -94,17 +99,17 @@ WHERE `fa`.`sid` IN ( #### Findings -Data came from a table with **366 range partitions** (for example, by date). +Data comes from a table with **366 range partitions** (for example, by date). -- The **Average Query Time** was obtained from the `statement_summary` view. -- The query used a **secondary index** and returned **400 rows**. +- The **Average Query Time** is obtained from the `statement_summary` view. +- The query uses a **secondary index** and returns **400 rows**. 
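As a side note, a query along the following lines can be used to read the recorded average latency for the test statement from TiDB's statement summary tables; this is only a sketch, and the `LIKE` filter on the normalized statement text is an assumption that may need adjusting to match your digest:

```sql
-- Sketch: fetch execution count and average latency (nanoseconds converted
-- to milliseconds) for the test query from the cluster-wide statement summary.
SELECT DIGEST_TEXT,
       EXEC_COUNT,
       AVG_LATENCY / 1e6 AS avg_latency_ms
FROM INFORMATION_SCHEMA.CLUSTER_STATEMENTS_SUMMARY
WHERE DIGEST_TEXT LIKE '%from `fa` where%sid%in%'
ORDER BY LAST_SEEN DESC;
```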
Metrics collected: - **Average Query Time**: from `statement_summary` - **Cop Tasks** (Index Scan + Table Lookup): from execution plan -#### Test Results +#### Test results | Configuration | Average Query Time | Cop task for index range scan | Cop task for table lookup | Total Cop tasks | Key Takeaways | |---|---|---|---|---|---| @@ -112,7 +117,7 @@ Metrics collected: | Partitioned Table with Local Index | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries will scan all partitions. | | Partitioned Table with Global Index | 14.8 ms | 69 | 383 | 452 | Improving index scan efficiency, but table lookups can still be expensive if many rows match. | -#### Execution Plan Examples +#### Execution plan examples **Non-partitioned table** @@ -124,7 +129,7 @@ Metrics collected: | TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task:{num:79, max:4.98ms, min:0s, avg:514.9µs, p95:3.75ms, max_proc_keys:10, p95_proc_keys:5, tot_proc:15ms, tot_wait:21.4ms, copr_cache_hit_ratio:0.00, build_task_duration:341.2µs, max_distsql_concurrency:1, max_extra_concurrency:7, store_batch_num:62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg:0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:20.8ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1600}}}, time_detail:{total_process_time:15ms, total_wait_time:21.4ms, tikv_wall_time:10.9ms} | keep order:false | N/A | N/A | ``` -**Partition table with global index** +**Partition tables with global indexes** ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -134,7 +139,7 @@ Metrics collected: | TableRowIDScan_6(Probe)| 398.73 | 165221.64 | 400 | cop[tikv] | table:fa | time:7.47ms, loops:2, cop_task:{num:383, max:4.07ms, min:0s, avg:488.5µs, p95:2.59ms, max_proc_keys:2, p95_proc_keys:1, tot_proc:203.3ms, tot_wait:429.5ms, copr_cache_hit_ratio:0.00, build_task_duration:1.3ms, max_distsql_concurrency:1, max_extra_concurrency:31, store_batch_num:305}, rpc_info:{Cop:{num_rpc:78, total_time:186.3ms}}, tikv_task:{proc max:3ms, min:0s, avg:517µs, p80:1ms, p95:1ms, iters:383, tasks:383}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:2.99ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1601, read_count:799, read_byte:10.1 MB, read_time:131.6ms}}}, time_detail:{total_process_time:203.3ms, total_suspend_time:6.31ms, total_wait_time:429.5ms, total_kv_read_wall_time:198ms, tikv_wall_time:163ms} | keep order:false, stats:partial[...] | N/A | N/A | ``` -**Partition table with local index** +**Partition tables with local indexes** ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -146,9 +151,9 @@ Metrics collected: [Similar detailed execution plans for partitioned tables with global and local indexes would follow...] -#### How to Create a Global Index on a Partitioned Table in TiDB +#### How to create a global index on a partitioned table in TiDB -**Option 1: Add via ALTER TABLE** +**Option 1: add via ALTER TABLE** ```sql ALTER TABLE @@ -163,7 +168,7 @@ Adds a global index to an existing partitioned table. 
- Available starting from v9.0.0-beta.1 - Expected to be included in the next LTS release -**Option 2: Define Inline on Table Creation** +**Option 2: Define inline on table creation** ```sql CREATE TABLE t ( @@ -190,27 +195,27 @@ The performance overhead of partitioned tables in TiDB depends significantly on - For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs triggered. This means more partitions will likely result in more RPCs, leading to higher latency. - For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). -#### Recommendation +#### Recommendations - Avoid partitioned tables unless truly necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. - If you must use partitioned tables, benchmark both global index and local index strategies under your workload. - Use global indexes when query performance across partitions is critical. - Choose local indexes only if your main concern is DDL efficiency, such as fast DROP PARTITION, and the performance side effect from the partition table is acceptable. -## Facilitating Bulk Data Deletion +## Facilitate bulk data deletion -### Data Cleanup Efficiency: TTL vs. Direct Partition Drop +### Data cleanup efficiency: TTL vs. direct partition drop -In TiDB, historical data cleanup can be handled either by **TTL (Time-to-Live)** or **manual partition drop**. While both methods serve the same purpose, they differ significantly in performance. Our tests show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. +In TiDB, you can clear up historical data either by **TTL (Time-to-Live)** or **manual partition drop**. While both methods serve the same purpose, they differ significantly in performance. Our tests show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. #### What's the difference? -- **TTL**: Automatically removes data based on its age, but may be slower due to the need to scan and clean data over time. -- **Partition Drop**: Deletes an entire partition at once, making it much faster, especially when dealing with large datasets. +- **TTL**: automatically removes data based on its age, but might be slower due to the need to scan and clean data over time. +- **Partition Drop**: deletes an entire partition at once, making it much faster, especially when dealing with large datasets. -#### What Did We Test +#### What did we test -To compare the performance of TTL and partition drop, we configured TTL to execute every 10 minutes and created a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches were tested under background write loads of 50 and 100 concurrent threads. We measured key metrics such as execution time, system resource utilization, and the total number of rows deleted. +To compare the performance of TTL and partition drop, we configure TTL to execute every 10 minutes and create a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches are tested under background write loads of 50 and 100 concurrent threads. 
We measure key metrics such as execution time, system resource utilization, and the total number of rows deleted. #### Findings @@ -221,14 +226,14 @@ To compare the performance of TTL and partition drop, we configured TTL to execu - With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. -**Partition Drop Performance:** +**Partition drop performance:** - DROP PARTITION removes an entire data segment instantly, with minimal resource usage. - DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. -#### How to Use TTL and Partition Drop in TiDB +#### How to use TTL and partition drop in TiDB -In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), please refer to the official documentation at [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md) . +In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), see [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md) . **TTL schema** @@ -275,33 +280,37 @@ FIRST PARTITION LESS THAN ('2025-02-19 18:00:00') LAST PARTITION LESS THAN ('2025-02-19 20:00:00'); ``` -It's required to run DDL alter table partition ... to change the FIRST PARTITION and LAST PARTITION periodically. These two DDL statements can drop the old partitions and create new ones. +It is required to run `ALTER TABLE PARTITION ...` to change the `FIRST PARTITION` and `LAST PARTITION` periodically. These two DDL statements can drop the old partitions and create new ones. ```sql ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}"); ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}"); ``` -#### Recommendation +#### Recommendations -For workloads with **large or time-based data cleanup**, prefer using **partitioned tables with DROP PARTITION**. It offers better performance, lower system impact, and simpler management. TTL is still useful for finer-grained or background cleanup but may not be optimal under high write pressure or when deleting large volumes of data quickly. +For workloads with **large or time-based data cleanup**, it is recommended to use **partitioned tables with DROP PARTITION**. It offers better performance, lower system impact, and simpler management. -### Partition Drop Efficiency: Local Index vs Global Index +TTL is still useful for finer-grained or background cleanup, but might not be optimal under high write pressure or when deleting large volumes of data quickly. -Partition table with Global Index requires synchronous updates to the global index, potentially increasing significant execution time for DDL operations, such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION. In this section, the tests show that DROP PARTITION is much slower when using a **Global Index** compared to a **Local Index**. This should be considered when designing partitioned tables. +### Partition drop efficiency: local index vs. global index -#### What Did We Test +Partition tables with global indexes require synchronous updates to the global index, potentially increasing significant execution time for DDL operations, such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORG PARTITION`. 
-We created a table with **366 partitions** and tested the DROP PARTITION performance using both **Global Index** and **Local Index**. The total number of rows was **1 billion**. +In this section, the tests show that `DROP PARTITION` is much slower when using a global index compared to a local index**. Take this into consideration when you design partitioned tables. -| Index Type | Duration (drop partition) | -|---|---| -| Global Index | 1 min 16.02 s | -| Local Index | 0.52 s | +#### What did we test + +Create a table with **366 partitions** and test the `DROP PARTITION` performance using both global indexes and local indexes. The total number of rows is **1 billion**. + +| Index Type | Duration (drop partition) | +|--------------|---------------------------| +| Global Index | 1 min 16.02 s | +| Local Index | 0.52 s | #### Findings -Dropping a partition on a table with a Global Index took **76 seconds**, while the same operation with a Local Index took only **0.52 seconds**. The reason is that Global Indexes span all partitions and require more complex updates, while Local Indexes are limited to individual partitions and are easier to handle. +Dropping a partition on a table with a global index takes **76 seconds**, while the same operation with a local index takes only **0.52 seconds**. The reason is that global indexes span all partitions and require more complex updates, while local indexes are limited to individual partitions and are easier to handle. **Global Index** @@ -309,15 +318,13 @@ Dropping a partition on a table with a Global Index took **76 seconds**, while t ALTER TABLE A DROP PARTITION A_2024363; ``` -#### Recommendation - -When a partitioned table contains global indexes, performing certain DDL operations such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. +#### Recommendations -If you need to drop partitions frequently and minimize the performance impact on the system, it's better to use **local indexes** for faster and more efficient operations. +When a partitioned table contains global indexes, performing certain DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORG PARTITION` requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. -## Mitigating Write Hotspot Issues +If you need to drop partitions frequently and minimize the performance impact on the system, it is recommended to use **local indexes** for faster and more efficient operations. -### Background +## Mitigate write hotspot issues In TiDB, **write hotspots** occur when incoming write traffic is unevenly distributed across Regions. @@ -329,7 +336,7 @@ This is common when the primary key is **monotonically increasing**—for exampl **Partitioned tables** can help mitigate this problem. By applying **hash** or **key** partitioning on the primary key, TiDB can spread inserts across multiple partitions (and therefore multiple Regions), reducing hotspot contention. -### How It Works +### How it works TiDB stores table data in **Regions**, each covering a continuous range of row keys. @@ -346,9 +353,9 @@ When the primary key is AUTO_INCREMENT and the secondary indexes on datetime col - Each partition has its own set of Regions, often distributed across different TiKV nodes. - Inserts are spread across multiple Regions in parallel, improving load distribution and throughput. 
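To confirm that inserts are really spread out, you can inspect how each partition's Regions are distributed across TiKV stores. The following is a rough sketch; the table name `orders`, the database name `test`, and the partition name `p0` are hypothetical:

```sql
-- List the Regions that back one partition of a hash/key-partitioned table.
SHOW TABLE orders PARTITION (p0) REGIONS;

-- Count leader Regions of the table per TiKV store to see how evenly they are spread.
SELECT p.store_id, COUNT(*) AS leader_regions
FROM information_schema.tikv_region_status s
JOIN information_schema.tikv_region_peers p ON s.region_id = p.region_id
WHERE s.db_name = 'test'
  AND s.table_name = 'orders'
  AND p.is_leader = 1
GROUP BY p.store_id;
```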
-### Use Case +### Use cases -If a table with an AUTO_INCREMENT primary key experiences heavy bulk inserts and suffers from write hotspot issues, applying **hash** or **key** partitioning on the primary key can help distribute the write load more evenly. +If a table with an [`AUTO_INCREMENT`](/auto-increment.md) primary key experiences heavy bulk inserts and suffers from write hotspot issues, applying **hash** or **key** partitioning on the primary key can help distribute the write load more evenly. ```sql CREATE TABLE server_info ( @@ -379,39 +386,39 @@ When converting a non-partitioned table to a partitioned table, TiDB creates a s SELECT * FROM server_info WHERE `serial_no` = ?; ``` -**Mitigation**: Add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down DROP PARTITION operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: +**Mitigation**: add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down `DROP PARTITION` operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: ```sql ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; ``` -## Partition Management Challenge +## Partition management challenges -### How to Avoid Hotspots Caused by New Range Partitions +### How to avoid Hotspots caused by new range partitions #### Overview New range partitions in a partitioned table can easily lead to hotspot issues in TiDB. This section outlines common scenarios and mitigation strategies to avoid read and write hotspots caused by range partitions. -#### Common Hotspot Scenarios +#### Common hotspot scenarios -**Read Hotspot** +**Read hotspot** When using **range-partitioned tables**, if queries do **not** filter data using the partition key, new empty partitions can easily become read hotspots. -**Root Cause:** +**Root cause:** By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions may be merged into a **single region**. -**Impact:** +**impact:** When a query does **not filter by partition key**, TiDB will **scan all partitions** (as seen in the execution plan partition:all). As a result, the single region holding multiple empty partitions will be scanned repeatedly, leading to a **read hotspot**. -**Write Hotspot** +**Write hotspot** When using a time-based field as the partition key, a write hotspot may occur when switching to a new partition: -**Root Cause:** +**Root cause:** In TiDB, any newly created table or partition initially contains only **one region** (data block), which is randomly placed on a single TiKV node. As data begins to be written, this region will eventually **split** into multiple regions, and PD will schedule these new regions to other TiKV nodes. However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it may not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. 
This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. @@ -430,24 +437,24 @@ This imbalance can cause that TiKV node to trigger **flow control**, leading to #### Solutions -**1. NONCLUSTERED Partitioned Table** +**1. NONCLUSTERED partitioned table** **Pros:** -- When a new partition is created in a **NONCLUSTERED Partitioned Table** configured with SHARD_ROW_ID_BITS and [PRE_SPLIT_REGIONS](https://docs.pingcap.com/tidb/stable/sql-statement-split-region/#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. +- When a new partition is created in a **NONCLUSTERED Partitioned Table** configured with `SHARD_ROW_ID_BITS` and [PRE_SPLIT_REGIONS](/sql-statements//sql-statement-split-region.md#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. - Lower operational overhead. **Cons:** - Queries using **Point Get** or **Table Range Scan** will require **more table lookups**, which can degrade read performance for such query types. -**Recommendation:** +**Recommendation** - Suitable for workloads where write scalability and operational ease are more critical than low-latency reads. -**Best Practices** +**Best practices** -Create a partitioned table with SHARD_ROW_ID_BITS and PRE_SPLIT_REGIONS to pre-split table regions. The value of PRE_SPLIT_REGIONS must be less than or equal to that of SHARD_ROW_ID_BITS. The number of pre-split Regions for each partition is 2^(PRE_SPLIT_REGIONS). +Create a partitioned table with `SHARD_ROW_ID_BITS` and `PRE_SPLIT_REGIONS` to pre-split table regions. The value of `PRE_SPLIT_REGIONS` must be less than or equal to that of `SHARD_ROW_ID_BITS`. The number of pre-split Regions for each partition is `2^(PRE_SPLIT_REGIONS)`. ```sql CREATE TABLE employees ( @@ -469,7 +476,7 @@ PARTITION BY RANGE ( YEAR(hired) ) ( ); ``` -Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. +Adding the [`merge_option=deny`](/table-attributes.md#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. ```sql -- table @@ -536,11 +543,11 @@ SHOW TABLE employees PARTITION (p4) regions; - **Manual region splitting** is required when creating new partitions, increasing operational complexity. -**Recommendation:** +**Recommendation** - Ideal when low-latency point queries are important and operational resources are available to manage region splitting. -**Best Practices** +**Best practices** Create a CLUSTERED partitioned table. @@ -564,13 +571,13 @@ PARTITION BY RANGE ( YEAR(hired) ) ( ); ``` -Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. 
+Adding the [`merge_option=deny`](/table-attributes.md#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. ```sql ALTER TABLE employees2 ATTRIBUTES 'merge_option=deny'; ``` -**Determining split boundaries based on existing business data** +**Determine split boundaries based on existing business data** To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. @@ -588,29 +595,25 @@ SELECT MIN(id), MAX(id) FROM employees2; A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. This helps ensure that data is more evenly distributed across the cluster from the start. -**Splitting regions for all partitions** +**Split Regions for all partitions.** ```sql SPLIT PARTITION TABLE employees2 BETWEEN (1,"1970-01-01") AND (100000,"9999-12-31") REGIONS ; ``` -**Splitting regions for the secondary index of all partitions.** +**Split Regions for the secondary index of all partitions.** ```sql SPLIT PARTITION TABLE employees2 INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; ``` -**(Optional) When adding a new partition, you MUST manually split regions for the specific partition and its indices.** +**(Optional) When adding a new partition, you MUST manually split Regions for the specific partition and its indexes.** ```sql ALTER TABLE employees2 ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); - show table employees2 PARTITION (p4) regions; - SPLIT PARTITION TABLE employees2 PARTITION (p4) BETWEEN (1,"2006-01-01") AND (100000,"2011-01-01") REGIONS ; - SPLIT PARTITION TABLE employees2 PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; - show table employees2 PARTITION (p4) regions; ``` @@ -629,7 +632,7 @@ show table employees2 PARTITION (p4) regions; - Best suited for use cases that require stable performance and do not benefit from partition-based data management. -## Converting Between Partitioned and Non-Partitioned Tables +## Converte between partitioned and non-partitioned tables When working with large tables (for example in this example 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: @@ -640,7 +643,7 @@ When working with large tables (for example in this example 120 million rows), t This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. 
-### Table Schema: `fa` +### Table schema: `fa` ```sql CREATE TABLE `fa` ( @@ -663,7 +666,7 @@ PARTITION `fa_2024003` VALUES LESS THAN (2024003), PARTITION `fa_2024366` VALUES LESS THAN (2024366)); ``` -### Table Schema: `fa_new` +### Table schema: `fa_new` ```sql CREATE TABLE `fa` ( @@ -683,7 +686,7 @@ CREATE TABLE `fa` ( These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. -### Method 1: Batch DML INSERT INTO ... SELECT +### Method 1: Batch DML `INSERT INTO ... SELECT` ```sql SET tidb_mem_quota_query = 0; @@ -691,7 +694,7 @@ INSERT INTO fa_new SELECT * FROM fa; -- 120 million rows copied in 1h 52m 47s ``` -### Method 2: Pipeline DML INSERT INTO ... SELECT +### Method 2: Pipeline DML `INSERT INTO ... SELECT` ```sql SET tidb_dml_type = "bulk"; @@ -701,7 +704,7 @@ INSERT INTO fa_new SELECT * FROM fa; -- 120 million rows copied in 58m 42s ``` -### Method 3: IMPORT INTO ... FROM SELECT +### Method 3: `IMPORT INTO ... FROM SELECT` ```sql mysql> import into fa_new from select * from fa with thread=32,disable_precheck; @@ -745,8 +748,8 @@ Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) | Method 4: Online DDL (From partition table to non-partitioned table) | 2 h 50 m | | Method 4: Online DDL (From non-partition table to partitioned table) | 2 h 31 m | -### Recommendation +### Recommendations TiDB offers two approaches for converting tables between partitioned and non-partitioned states: -Choose an offline method like [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. \ No newline at end of file +Choose an offline method such as [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. From 1c926684e8343d630596e75507181b7d6b3f195a Mon Sep 17 00:00:00 2001 From: houfaxin Date: Wed, 15 Oct 2025 11:30:35 +0800 Subject: [PATCH 46/84] Update tidb-partitioned-tables-guide.md --- best-practices/tidb-partitioned-tables-guide.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-guide.md index a80adf0160f14..ed5efc57c2221 100644 --- a/best-practices/tidb-partitioned-tables-guide.md +++ b/best-practices/tidb-partitioned-tables-guide.md @@ -7,11 +7,11 @@ summary: Learn best practices for using TiDB partitioned tables to improve perfo This guide introduces how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. -Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. +Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. 
By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in Online Analytical Processing (OLAP) workloads with massive datasets. -A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. +A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations such as [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning, such as those lacking partition key filters, might experience degraded performance. In such cases, you can use [**global indexes**](/partitioned-table.md#global-indexes) to mitigate the performance impact by providing a unified index structure across all partitions. -Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. +Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [`AUTO_INCREMENT` style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. 
@@ -149,7 +149,7 @@ Metrics collected: | TableRowIDScan_6(Probe)| 398.73 | 165221.49 | 400 | cop[tikv] | table:fa | time:514ms, loops:434, cop_task:{num:375, max:31.6ms, min:0s, avg:1.33ms, p95:1.67ms, max_proc_keys:2, p95_proc_keys:2, tot_proc:220.7ms, tot_wait:242.2ms, copr_cache_hit_ratio:0.00, build_task_duration:27.8ms, max_distsql_concurrency:1, max_extra_concurrency:1, store_batch_num:69}, rpc_info:{Cop:{num_rpc:306, total_time:495.5ms}}, tikv_task:{proc max:6ms, min:0s, avg:597.3µs, p80:1ms, p95:1ms, iters:375, tasks:375}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:158.3ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:3197, read_count:803, read_byte:10.2 MB, read_time:113.5ms}}}, time_detail:{total_process_time:220.7ms, total_suspend_time:5.39ms, total_wait_time:242.2ms, total_kv_read_wall_time:224ms, tikv_wall_time:430.5ms}} | keep order:false, stats:partial[...] | N/A | N/A | ``` -[Similar detailed execution plans for partitioned tables with global and local indexes would follow...] +The following sections describe similar detailed execution plans for partitioned tables with global and local indexes. #### How to create a global index on a partitioned table in TiDB From 02ffcb97318b9ab2a1b285d5649a986063fd27fd Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Wed, 15 Oct 2025 15:16:18 +0800 Subject: [PATCH 47/84] Update best-practices/tidb-partitioned-tables-guide.md --- best-practices/tidb-partitioned-tables-guide.md | 1 + 1 file changed, 1 insertion(+) diff --git a/best-practices/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-guide.md index ed5efc57c2221..0e7150aa8b2e3 100644 --- a/best-practices/tidb-partitioned-tables-guide.md +++ b/best-practices/tidb-partitioned-tables-guide.md @@ -419,6 +419,7 @@ When a query does **not filter by partition key**, TiDB will **scan all partitio When using a time-based field as the partition key, a write hotspot may occur when switching to a new partition: **Root cause:** + In TiDB, any newly created table or partition initially contains only **one region** (data block), which is randomly placed on a single TiKV node. As data begins to be written, this region will eventually **split** into multiple regions, and PD will schedule these new regions to other TiKV nodes. However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it may not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. 
From fef2247ae94cc5c4a183cbcdb03893f8a923992f Mon Sep 17 00:00:00 2001 From: houfaxin Date: Wed, 15 Oct 2025 17:18:56 +0800 Subject: [PATCH 48/84] Update TOC.md --- TOC.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/TOC.md b/TOC.md index fc608950dc4d9..065047a1034da 100644 --- a/TOC.md +++ b/TOC.md @@ -438,7 +438,7 @@ - [Optimize Multi-Column Indexes](/best-practices/multi-column-index-best-practices.md) - [Manage Indexes and Identify Unused Indexes](/best-practices/index-management-best-practices.md) - [Handle Millions of Tables in SaaS Multi-Tenant Scenarios](/best-practices/saas-best-practices.md) - - [Best Practices for Using TiDB Partitioned Tables](/best-practices/tidb-partitioned-tables-guide.md) + - [Use TiDB Partitioned Tables](/best-practices/tidb-partitioned-tables-guide.md) - [Use UUIDs as Primary Keys](/best-practices/uuid.md) - [Develop Java Applications](/best-practices/java-app-best-practices.md) - [Handle High-Concurrency Writes](/best-practices/high-concurrency-best-practices.md) From b054380a8f71bf59182a8b83741a99c9c53ad284 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Thu, 16 Oct 2025 14:31:44 +0800 Subject: [PATCH 49/84] Update tidb-partitioned-tables-guide.md --- .../tidb-partitioned-tables-guide.md | 111 +++++++++--------- 1 file changed, 58 insertions(+), 53 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-guide.md index 0e7150aa8b2e3..61970d4a7e4c1 100644 --- a/best-practices/tidb-partitioned-tables-guide.md +++ b/best-practices/tidb-partitioned-tables-guide.md @@ -11,7 +11,7 @@ Partitioned tables in TiDB offer a versatile approach to managing large datasets A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations such as [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning, such as those lacking partition key filters, might experience degraded performance. In such cases, you can use [**global indexes**](/partitioned-table.md#global-indexes) to mitigate the performance impact by providing a unified index structure across all partitions. -Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [`AUTO_INCREMENT` style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. +Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [`AUTO_INCREMENT` style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions might suffer performance drawbacks again, a situation where global indexes can help. While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. 
To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. @@ -19,13 +19,13 @@ This document examines partitioned tables in TiDB from multiple angles, includin > **Note:** > -> To get started with the fundamentals, refer to [Partitioning](/partitioned-table.md), which explains key concepts such as partition pruning, index types, and partitioning methods. +> To get started with the fundamentals, see [Partitioning](/partitioned-table.md), which explains key concepts such as partition pruning, index types, and partitioning methods. ## Improve query efficiency ### Partition pruning -**Partition pruning** is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions may contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead. +**Partition pruning** is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions might contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead. #### Applicable scenarios @@ -41,7 +41,7 @@ For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. Global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. -#### What did we test +#### Types of tables to be tested We evaluated query performance across three table configurations in TiDB: @@ -51,13 +51,15 @@ We evaluated query performance across three table configurations in TiDB: #### Test setup -- The query **accesses data via a secondary index** and uses IN conditions across multiple values. -- The **partitioned table** had **366 partitions**, defined by **range partitioning on a datetime column**. -- Each matching key could return **multiple rows**, simulating a **high-volume OLTP-style query pattern**. -- We also evaluated the **impact of different partition counts** to understand how partition granularity influences latency and index performance. +- The query accesses data via a secondary index and uses `IN` conditions across multiple values. +- The partitioned table contains 366 partitions, defined by range partitioning on a datetime column. +- Each matching key returns multiple rows, simulating a high-volume OLTP-style query pattern. +- The **impact of different partition counts** is also evaluated to understand how partition granularity influences latency and index performance. #### Schema +The following schema is used in the example. + ```sql CREATE TABLE `fa` ( `id` bigint NOT NULL AUTO_INCREMENT, @@ -81,6 +83,8 @@ PARTITION `fa_2024366` VALUES LESS THAN (2024366)); #### SQL +The following SQL statement is used in the example. 
+ ```sql SELECT `fa`.* FROM `fa` @@ -93,33 +97,33 @@ WHERE `fa`.`sid` IN ( ); ``` -- Query filters on secondary index, but does **not include the partition key**. -- Causes **Local Index** to scan across all partitions due to lack of pruning. +- Query filters on secondary index, but does not include the partition key. +- Causes local indexes to scan across all partitions due to lack of pruning. - Table lookup tasks are significantly higher for partitioned tables. #### Findings -Data comes from a table with **366 range partitions** (for example, by date). +Data comes from a table with 366 range partitions (for example, by date). - The **Average Query Time** is obtained from the `statement_summary` view. -- The query uses a **secondary index** and returns **400 rows**. +- The query uses a secondary index and returns 400 rows. Metrics collected: - **Average Query Time**: from `statement_summary` -- **Cop Tasks** (Index Scan + Table Lookup): from execution plan +- **Cop Tasks** (Index Scan + Table Lookup): from the execution plan #### Test results | Configuration | Average Query Time | Cop task for index range scan | Cop task for table lookup | Total Cop tasks | Key Takeaways | |---|---|---|---|---|---| -| Non-Partitioned Table | 12.6 ms | 72 | 79 | 151 | Delivering the best performance with the fewest Cop tasks — ideal for most OLTP use cases. | -| Partitioned Table with Local Index | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries will scan all partitions. | -| Partitioned Table with Global Index | 14.8 ms | 69 | 383 | 452 | Improving index scan efficiency, but table lookups can still be expensive if many rows match. | +| Non-partitioned table | 12.6 ms | 72 | 79 | 151 | Provides the best performance with the fewest Cop tasks, which is ideal for most OLTP use cases. | +| Partitioned table with local indexes | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries scan all partitions. | +| Partitioned table with global indexes | 14.8 ms | 69 | 383 | 452 | It improves index scan efficiency, but table lookups can still take long time if many rows match. 
| #### Execution plan examples -**Non-partitioned table** +##### Non-partitioned table ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -129,7 +133,7 @@ Metrics collected: | TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task:{num:79, max:4.98ms, min:0s, avg:514.9µs, p95:3.75ms, max_proc_keys:10, p95_proc_keys:5, tot_proc:15ms, tot_wait:21.4ms, copr_cache_hit_ratio:0.00, build_task_duration:341.2µs, max_distsql_concurrency:1, max_extra_concurrency:7, store_batch_num:62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg:0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:20.8ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1600}}}, time_detail:{total_process_time:15ms, total_wait_time:21.4ms, tikv_wall_time:10.9ms} | keep order:false | N/A | N/A | ``` -**Partition tables with global indexes** +##### Partition tables with global indexes ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -139,7 +143,7 @@ Metrics collected: | TableRowIDScan_6(Probe)| 398.73 | 165221.64 | 400 | cop[tikv] | table:fa | time:7.47ms, loops:2, cop_task:{num:383, max:4.07ms, min:0s, avg:488.5µs, p95:2.59ms, max_proc_keys:2, p95_proc_keys:1, tot_proc:203.3ms, tot_wait:429.5ms, copr_cache_hit_ratio:0.00, build_task_duration:1.3ms, max_distsql_concurrency:1, max_extra_concurrency:31, store_batch_num:305}, rpc_info:{Cop:{num_rpc:78, total_time:186.3ms}}, tikv_task:{proc max:3ms, min:0s, avg:517µs, p80:1ms, p95:1ms, iters:383, tasks:383}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:2.99ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1601, read_count:799, read_byte:10.1 MB, read_time:131.6ms}}}, time_detail:{total_process_time:203.3ms, total_suspend_time:6.31ms, total_wait_time:429.5ms, total_kv_read_wall_time:198ms, tikv_wall_time:163ms} | keep order:false, stats:partial[...] | N/A | N/A | ``` -**Partition tables with local indexes** +##### Partition tables with local indexes ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -151,24 +155,28 @@ Metrics collected: The following sections describe similar detailed execution plans for partitioned tables with global and local indexes. -#### How to create a global index on a partitioned table in TiDB +#### Create a global index on a partitioned table in TiDB + + +You can define inline when creating a table to create a global index. ```sql CREATE TABLE t ( @@ -190,52 +198,50 @@ PARTITION BY RANGE (id) ( The performance overhead of partitioned tables in TiDB depends significantly on the number of partitions and the type of index used. -- The more partitions you have, the more severe the potential performance degradation. -- With a smaller number of partitions, the impact may not be as noticeable, but it's still workload-dependent. +- The more partitions you have, the more severe the potential performance degrades. +- With a smaller number of partitions, the impact might not be as noticeable, but it is still workload-dependent. - For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs triggered. 
This means more partitions will likely result in more RPCs, leading to higher latency. - For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). #### Recommendations -- Avoid partitioned tables unless truly necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. +- Avoid partitioned tables unless necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. - If you must use partitioned tables, benchmark both global index and local index strategies under your workload. - Use global indexes when query performance across partitions is critical. -- Choose local indexes only if your main concern is DDL efficiency, such as fast DROP PARTITION, and the performance side effect from the partition table is acceptable. +- Use local indexes only if your main concern is DDL efficiency (such as fast `DROP PARTITION`) and the performance side effect from the partition table is acceptable. ## Facilitate bulk data deletion -### Data cleanup efficiency: TTL vs. direct partition drop +In TiDB, you can clear up historical data either by TTL (Time-to-Live) or manual partition drop. While both methods serve the same purpose, they differ significantly in performance. The testcases in this section show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. -In TiDB, you can clear up historical data either by **TTL (Time-to-Live)** or **manual partition drop**. While both methods serve the same purpose, they differ significantly in performance. Our tests show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. - -#### What's the difference? +#### Differences between TTL and partition drop - **TTL**: automatically removes data based on its age, but might be slower due to the need to scan and clean data over time. - **Partition Drop**: deletes an entire partition at once, making it much faster, especially when dealing with large datasets. -#### What did we test +#### Test case -To compare the performance of TTL and partition drop, we configure TTL to execute every 10 minutes and create a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches are tested under background write loads of 50 and 100 concurrent threads. We measure key metrics such as execution time, system resource utilization, and the total number of rows deleted. +To compare the performance of TTL and partition drop, the test case in this section configures TTL to execute every 10 minutes and create a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches are tested under background write loads of 50 and 100 concurrent threads. This test case measures key metrics such as execution time, system resource utilization, and the total number of rows deleted. #### Findings **TTL Performance:** - On a write-heavy table, TTL runs every 10 minutes. -- With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. -- With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. 
-- TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. +- With 50 threads, each TTL job takes 8 to 10 minutes, deleted 7 to 11 million rows. +- With 100 threads, it handles up to 20 million rows, but the execution time increases to 15 to 30 minutes, with greater variance. +- TTL jobs impact system performance under high workloads due to extra scanning and deletion activity, reducing overall QPS. **Partition drop performance:** -- DROP PARTITION removes an entire data segment instantly, with minimal resource usage. -- DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. +- `DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. +- `DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. #### How to use TTL and partition drop in TiDB -In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), see [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md) . +In this test case, the table structures have been anonymized. For more detailed information on the usage of TTL, see [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md) . -**TTL schema** +The following is the TTL schema. ```sql CREATE TABLE `ad_cache` ( @@ -254,7 +260,7 @@ TTL=`expire_time` + INTERVAL 0 DAY TTL_ENABLE='ON' TTL_JOB_INTERVAL='10m'; ``` -**Drop Partition (Range INTERVAL partitioning)** +The following is the SQL statement for dropping partitions (Range INTERVAL partitioning). ```sql CREATE TABLE `ad_cache` ( @@ -289,7 +295,7 @@ ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}"); #### Recommendations -For workloads with **large or time-based data cleanup**, it is recommended to use **partitioned tables with DROP PARTITION**. It offers better performance, lower system impact, and simpler management. +For workloads with large or time-based data cleanup, it is recommended to use partitioned tables with DROP PARTITION. It offers better performance, lower system impact, and simpler management. TTL is still useful for finer-grained or background cleanup, but might not be optimal under high write pressure or when deleting large volumes of data quickly. @@ -299,9 +305,9 @@ Partition tables with global indexes require synchronous updates to the global i In this section, the tests show that `DROP PARTITION` is much slower when using a global index compared to a local index**. Take this into consideration when you design partitioned tables. -#### What did we test +#### Test case -Create a table with **366 partitions** and test the `DROP PARTITION` performance using both global indexes and local indexes. The total number of rows is **1 billion**. +This test case creates a table with 366 partitions and tests the `DROP PARTITION` performance using both global indexes and local indexes. The total number of rows is 1 billion. | Index Type | Duration (drop partition) | |--------------|---------------------------| @@ -380,7 +386,7 @@ PARTITION BY KEY (id) PARTITIONS 16; **Potential Query Performance Drop Without Partition Pruning** -When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This may significantly increase the total Region count. 
Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: +When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This might significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: ```sql SELECT * FROM server_info WHERE `serial_no` = ?; @@ -408,7 +414,7 @@ When using **range-partitioned tables**, if queries do **not** filter data using **Root cause:** -By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions may be merged into a **single region**. +By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions might be merged into a **single region**. **impact:** @@ -416,17 +422,16 @@ When a query does **not filter by partition key**, TiDB will **scan all partitio **Write hotspot** -When using a time-based field as the partition key, a write hotspot may occur when switching to a new partition: +When using a time-based field as the partition key, a write hotspot might occur when switching to a new partition: **Root cause:** - In TiDB, any newly created table or partition initially contains only **one region** (data block), which is randomly placed on a single TiKV node. As data begins to be written, this region will eventually **split** into multiple regions, and PD will schedule these new regions to other TiKV nodes. -However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it may not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. +However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it might not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. **Impact:** -This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn may impact the overall read and write performance of the cluster. +This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn might impact the overall read and write performance of the cluster. 
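If you suspect this is happening, you can verify it when traffic switches to a new partition: check whether the partition still has only one Region, and check the write hot Regions reported by PD. The statements below are a sketch only; the table name `t_range` and the partition name `p_new` are hypothetical:

```sql
-- A newly added partition typically starts with a single Region on one TiKV node.
SHOW TABLE t_range PARTITION (p_new) REGIONS;

-- Current write hotspots; if the new partition shows up here with a single hot
-- Region, writes are still concentrated on one node.
SELECT db_name, table_name, type, max_hot_degree, flow_bytes
FROM information_schema.tidb_hot_regions
WHERE type = 'write';
```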
### Summary Table From a7185a51dc7c53f5eb16a87553e6c415ee047c95 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Mon, 20 Oct 2025 14:58:37 +0800 Subject: [PATCH 50/84] Update tidb-partitioned-tables-guide.md --- best-practices/tidb-partitioned-tables-guide.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-guide.md index 61970d4a7e4c1..3d93dd5880c56 100644 --- a/best-practices/tidb-partitioned-tables-guide.md +++ b/best-practices/tidb-partitioned-tables-guide.md @@ -200,7 +200,7 @@ The performance overhead of partitioned tables in TiDB depends significantly on - The more partitions you have, the more severe the potential performance degrades. - With a smaller number of partitions, the impact might not be as noticeable, but it is still workload-dependent. -- For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs triggered. This means more partitions will likely result in more RPCs, leading to higher latency. +- For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs (Remote Procedure Call) triggered. This means more partitions will likely result in more RPCs, leading to higher latency. - For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). #### Recommendations @@ -225,7 +225,7 @@ To compare the performance of TTL and partition drop, the test case in this sect #### Findings -**TTL Performance:** +**TTL performance:** - On a write-heavy table, TTL runs every 10 minutes. - With 50 threads, each TTL job takes 8 to 10 minutes, deleted 7 to 11 million rows. @@ -237,7 +237,7 @@ To compare the performance of TTL and partition drop, the test case in this sect - `DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. - `DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. -#### How to use TTL and partition drop in TiDB +#### Use TTL and partition drop in TiDB In this test case, the table structures have been anonymized. For more detailed information on the usage of TTL, see [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md) . From 3bfa654be2b2a5efa2fd078fbf5a43a987205f56 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Mon, 20 Oct 2025 17:20:25 +0800 Subject: [PATCH 51/84] Update tidb-partitioned-tables-guide.md --- .../tidb-partitioned-tables-guide.md | 71 +++++++++---------- 1 file changed, 35 insertions(+), 36 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-guide.md index 3d93dd5880c56..d157d5598c3c7 100644 --- a/best-practices/tidb-partitioned-tables-guide.md +++ b/best-practices/tidb-partitioned-tables-guide.md @@ -39,7 +39,7 @@ For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable ### Query performance on secondary indexes: non-partitioned tables vs. local indexes vs. global indexes -In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. 
This means it keeps track of all rows across all partitions. Global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. +In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. #### Types of tables to be tested @@ -51,8 +51,7 @@ We evaluated query performance across three table configurations in TiDB: #### Test setup -- The query accesses data via a secondary index and uses `IN` conditions across multiple values. -- The partitioned table contains 366 partitions, defined by range partitioning on a datetime column. +- The **partitioned table** had **365 partitions**, defined by **range partitioning on a date column**. - Each matching key returns multiple rows, simulating a high-volume OLTP-style query pattern. - The **impact of different partition counts** is also evaluated to understand how partition granularity influences latency and index performance. @@ -73,12 +72,12 @@ CREATE TABLE `fa` ( KEY `index_fa_on_user_id` (`user_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin PARTITION BY RANGE (`date`) -(PARTITION `fa_2024001` VALUES LESS THAN (2024001), -PARTITION `fa_2024002` VALUES LESS THAN (2024002), -PARTITION `fa_2024003` VALUES LESS THAN (2024003), +(PARTITION `fa_2024001` VALUES LESS THAN (2025001), +PARTITION `fa_2024002` VALUES LESS THAN (2025002), +PARTITION `fa_2024003` VALUES LESS THAN (2025003), ... ... -PARTITION `fa_2024366` VALUES LESS THAN (2024366)); +PARTITION `fa_2024365` VALUES LESS THAN (2025365)); ``` #### SQL @@ -98,12 +97,12 @@ WHERE `fa`.`sid` IN ( ``` - Query filters on secondary index, but does not include the partition key. -- Causes local indexes to scan across all partitions due to lack of pruning. +- Causes local indexes key lookup for each partition due to lack of pruning. - Table lookup tasks are significantly higher for partitioned tables. #### Findings -Data comes from a table with 366 range partitions (for example, by date). +Data comes from a table with 365 range partitions (for example, by date). - The **Average Query Time** is obtained from the `statement_summary` view. - The query uses a secondary index and returns 400 rows. @@ -157,8 +156,6 @@ The following sections describe similar detailed execution plans for partitioned #### Create a global index on a partitioned table in TiDB - + You can define inline when creating a table to create a global index. ```sql @@ -200,14 +199,14 @@ The performance overhead of partitioned tables in TiDB depends significantly on - The more partitions you have, the more severe the potential performance degrades. - With a smaller number of partitions, the impact might not be as noticeable, but it is still workload-dependent. -- For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs (Remote Procedure Call) triggered. This means more partitions will likely result in more RPCs, leading to higher latency. 
-- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). +- For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of [Remote Procedure Calls (RPCs)](https://docs.pingcap.com/tidb/stable/glossary/#remote-procedure-call-rpc) triggered. This means more partitions will likely result in more RPCs, leading to higher latency. +- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). Note that for very large tables where data is already distributed across many Regions, accessing data through a global index may have similar performance to a non-partitioned table, as both scenarios require multiple cross-Region RPCs. #### Recommendations - Avoid partitioned tables unless necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. -- If you must use partitioned tables, benchmark both global index and local index strategies under your workload. -- Use global indexes when query performance across partitions is critical. +- If you know all queries will make use of good partitioning pruning (matching only a few partitions) then local indexes are good +- If you know critical queries does not have good partitioning pruning (matching many partitions) then Global index is to recommend. - Use local indexes only if your main concern is DDL efficiency (such as fast `DROP PARTITION`) and the performance side effect from the partition table is acceptable. ## Facilitate bulk data deletion @@ -286,7 +285,7 @@ FIRST PARTITION LESS THAN ('2025-02-19 18:00:00') LAST PARTITION LESS THAN ('2025-02-19 20:00:00'); ``` -It is required to run `ALTER TABLE PARTITION ...` to change the `FIRST PARTITION` and `LAST PARTITION` periodically. These two DDL statements can drop the old partitions and create new ones. +You are required to run DDL statements like `ALTER TABLE PARTITION ...` to change the `FIRST PARTITION` and `LAST PARTITION` periodically. These two DDL statements can drop the old partitions and create new ones. ```sql ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}"); @@ -301,13 +300,13 @@ TTL is still useful for finer-grained or background cleanup, but might not be op ### Partition drop efficiency: local index vs. global index -Partition tables with global indexes require synchronous updates to the global index, potentially increasing significant execution time for DDL operations, such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORG PARTITION`. +A partitioned table with a global index requires synchronous updates to the global index, which can significantly increase the execution time for DDL operations, such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORGANIZE PARTITION`. -In this section, the tests show that `DROP PARTITION` is much slower when using a global index compared to a local index**. Take this into consideration when you design partitioned tables. +In this section, the tests show that `DROP PARTITION` is much slower when using a **global index** compared to a **local index**. This should be considered when you design partitioned tables. 
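If you want to verify this difference in your own cluster before committing to a design, a quick sketch is shown below; `sales_local_idx` and `sales_global_idx` stand for two otherwise identical partitioned tables, one with local secondary indexes and one with a global index, and the partition name is a placeholder.

```sql
-- Drop the same partition on both tables.
ALTER TABLE sales_local_idx DROP PARTITION p_2024363;
ALTER TABLE sales_global_idx DROP PARTITION p_2024363;

-- Compare the START_TIME and END_TIME of the two DDL jobs.
ADMIN SHOW DDL JOBS 10;
```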
#### Test case -This test case creates a table with 366 partitions and tests the `DROP PARTITION` performance using both global indexes and local indexes. The total number of rows is 1 billion. +This test case creates a table with 365 partitions and tests the `DROP PARTITION` performance using both global indexes and local indexes. The total number of rows is 1 billion. | Index Type | Duration (drop partition) | |--------------|---------------------------| @@ -316,7 +315,7 @@ This test case creates a table with 366 partitions and tests the `DROP PARTITION #### Findings -Dropping a partition on a table with a global index takes **76 seconds**, while the same operation with a local index takes only **0.52 seconds**. The reason is that global indexes span all partitions and require more complex updates, while local indexes are limited to individual partitions and are easier to handle. +Dropping a partition on a table with a global index takes **76 seconds**, while the same operation with a local index takes only **0.52 seconds**. The reason is that global indexes span all partitions and require more complex updates, while local indexes can just be dropped together with the partition data. **Global Index** @@ -326,7 +325,7 @@ ALTER TABLE A DROP PARTITION A_2024363; #### Recommendations -When a partitioned table contains global indexes, performing certain DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORG PARTITION` requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. +When a partitioned table contains global indexes, performing certain DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORGANIZE PARTITION` requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. If you need to drop partitions frequently and minimize the performance impact on the system, it is recommended to use **local indexes** for faster and more efficient operations. @@ -386,7 +385,7 @@ PARTITION BY KEY (id) PARTITIONS 16; **Potential Query Performance Drop Without Partition Pruning** -When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This might significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: +When converting a non-partitioned table to a partitioned table, TiDB creates a separate Regions for each partition. This might significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. 
Example: ```sql SELECT * FROM server_info WHERE `serial_no` = ?; @@ -525,7 +524,7 @@ This example will split each partition's primary key range into `; ``` -**(Optional) When adding a new partition, you MUST manually split regions for its primary key and indices.** +**(Optional) When adding a new partition, you should manually split regions for its primary key and indices.** ```sql ALTER TABLE employees ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); @@ -664,12 +663,12 @@ CREATE TABLE `fa` ( KEY `index_fa_on_user_id` (`user_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin PARTITION BY RANGE (`date`) -(PARTITION `fa_2024001` VALUES LESS THAN (2024001), -PARTITION `fa_2024002` VALUES LESS THAN (2024002), -PARTITION `fa_2024003` VALUES LESS THAN (2024003), +(PARTITION `fa_2024001` VALUES LESS THAN (2025001), +PARTITION `fa_2024002` VALUES LESS THAN (2025002), +PARTITION `fa_2024003` VALUES LESS THAN (2025003), ... ... -PARTITION `fa_2024366` VALUES LESS THAN (2024366)); +PARTITION `fa_2024365` VALUES LESS THAN (2025365)); ``` ### Table schema: `fa_new` @@ -723,8 +722,8 @@ Records: 120000000, ID: c1d04eec-fb49-49bb-af92-bf3d6e2d3d87 **From partition table to non-partitioned table** ```sql -SET @@global.tidb_ddl_reorg_worker_cnt = 16; -SET @@global.tidb_ddl_reorg_batch_size = 4096; +SET @@global.tidb_ddl_REORGANIZE_worker_cnt = 16; +SET @@global.tidb_ddl_REORGANIZE_batch_size = 4096; alter table fa REMOVE PARTITIONING; -- real 170m12.024 s (≈ 2 h 50 m) ``` @@ -732,14 +731,14 @@ alter table fa REMOVE PARTITIONING; **From non-partition table to partitioned table** ```sql -SET @@global.tidb_ddl_reorg_worker_cnt = 16; -SET @@global.tidb_ddl_reorg_batch_size = 4096; +SET @@global.tidb_ddl_REORGANIZE_worker_cnt = 16; +SET @@global.tidb_ddl_REORGANIZE_batch_size = 4096; ALTER TABLE fa PARTITION BY RANGE (`date`) -(PARTITION `fa_2024001` VALUES LESS THAN (2024001), -PARTITION `fa_2024002` VALUES LESS THAN (2024002), +(PARTITION `fa_2024001` VALUES LESS THAN (2025001), +PARTITION `fa_2024002` VALUES LESS THAN (2025002), ... -PARTITION `fa_2024365` VALUES LESS THAN (2024365), -PARTITION `fa_2024366` VALUES LESS THAN (2024366)); +PARTITION `fa_2024365` VALUES LESS THAN (2025365), +PARTITION `fa_2024365` VALUES LESS THAN (2025365ƒf)); Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) ``` From 2fcc6d7eeeadd25ffcbd2fef545f57da80ff99f2 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 21 Oct 2025 14:41:19 +0800 Subject: [PATCH 52/84] Update tidb-partitioned-tables-guide.md --- .../tidb-partitioned-tables-guide.md | 38 ++++++++++--------- 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-guide.md index d157d5598c3c7..37af510037928 100644 --- a/best-practices/tidb-partitioned-tables-guide.md +++ b/best-practices/tidb-partitioned-tables-guide.md @@ -223,6 +223,7 @@ In TiDB, you can clear up historical data either by TTL (Time-to-Live) or manual To compare the performance of TTL and partition drop, the test case in this section configures TTL to execute every 10 minutes and create a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches are tested under background write loads of 50 and 100 concurrent threads. This test case measures key metrics such as execution time, system resource utilization, and the total number of rows deleted. 
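For the TTL side of the comparison, the per-job execution time and deleted row counts can be read from TiDB's TTL job history. The following is a sketch; `mysql.tidb_ttl_job_history` is available in recent TiDB versions, and the exact column set might differ slightly between versions.

```sql
-- Each row records one TTL job: when it started and finished, and how many rows it deleted.
SELECT table_name, create_time, finish_time, expired_rows, deleted_rows, status
FROM mysql.tidb_ttl_job_history
ORDER BY create_time DESC
LIMIT 10;
```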
#### Findings +> **Note**: The performance benefits described below apply to partitioned tables without global indexes. **TTL performance:** @@ -233,8 +234,8 @@ To compare the performance of TTL and partition drop, the test case in this sect **Partition drop performance:** -- `DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. -- `DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. +- `ALTER TABLE ... DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. +- `ALTER TABLE ... DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. #### Use TTL and partition drop in TiDB @@ -244,14 +245,14 @@ The following is the TTL schema. ```sql CREATE TABLE `ad_cache` ( - `session` varchar(255) NOT NULL, - `ad_id` varbinary(255) NOT NULL, + `session_id` varchar(255) NOT NULL, + `external_id` varbinary(255) NOT NULL, `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, - `suffix` bigint(20) NOT NULL, + `id_suffix` bigint(20) NOT NULL, `expire_time` timestamp NULL DEFAULT NULL, - `data` mediumblob DEFAULT NULL, - `version` int(11) DEFAULT NULL, - `is_delete` tinyint(1) DEFAULT NULL, + `cache_data` mediumblob DEFAULT NULL, + `data_version` int(11) DEFAULT NULL, + `is_deleted` tinyint(1) DEFAULT NULL, PRIMARY KEY (`session`, `ad_id`, `create_time`, `suffix`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin @@ -325,25 +326,26 @@ ALTER TABLE A DROP PARTITION A_2024363; #### Recommendations -When a partitioned table contains global indexes, performing certain DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORGANIZE PARTITION` requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. +When a partitioned table contains global indexes, executing certain DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORGANIZE PARTITION` requires updating the global index entries to reflect the changes. This update must be performed immediately to ensure consistency, which can significantly increase the execution time of these DDL operations. If you need to drop partitions frequently and minimize the performance impact on the system, it is recommended to use **local indexes** for faster and more efficient operations. ## Mitigate write hotspot issues -In TiDB, **write hotspots** occur when incoming write traffic is unevenly distributed across Regions. +In TiDB, **write hotspots** can occur when incoming write traffic is unevenly distributed across Regions. -This is common when the primary key is **monotonically increasing**—for example, an AUTO_INCREMENT primary key with AUTO_ID_CACHE=1, or secondary index on datetime column with default value set to CURRENT_TIMESTAMP—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: +This is common when the primary key is **monotonically increasing**—for example, an `AUTO_INCREMENT` primary key with `AUTO_ID_CACHE=1`, or secondary index on datetime column with default value set to `CURRENT_TIMESTAMP`—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: -- A single Region handling most of the write workload, while other Regions remain idle. 
+- A single [Region](https://docs.pingcap.com/tidb/stable/tidb-storage/#region) handling most of the write workload, while other Regions remain idle. - Higher write latency and reduced throughput. - Limited performance gains from scaling out TiKV nodes, as the bottleneck remains concentrated on one Region. **Partitioned tables** can help mitigate this problem. By applying **hash** or **key** partitioning on the primary key, TiDB can spread inserts across multiple partitions (and therefore multiple Regions), reducing hotspot contention. +> **Note**: This section uses partitioned tables as an example for mitigating read/write hotspots. TiDB also provides other features such as `AUTO_RANDOM` and `SHARD_ROW_ID_BITS` for hotspot mitigation. When using partitioned tables in certain scenarios, you may need to set `merge_option=deny` to maintain partition boundaries. For more details, see [issue #58128](https://github.com/pingcap/tidb/issues/58128). ### How it works -TiDB stores table data in **Regions**, each covering a continuous range of row keys. +TiDB stores table data and indexes in **Regions**, each covering a continuous range of row keys. When the primary key is AUTO_INCREMENT and the secondary indexes on datetime columns are monotonically increasing: @@ -378,20 +380,20 @@ PARTITION BY KEY (id) PARTITIONS 16; ### Pros -- **Balanced Write Load** — Hotspots are spread across multiple partitions, reducing contention and improving insert performance. +- **Balanced Write Load** — Hotspots are spread across multiple partitions, and therefore multiple **Regions**, reducing contention and improving insert performance. - **Query Optimization via Partition Pruning** — If queries already filter by the partition key, TiDB can prune unused partitions, scanning less data and improving query speed. ### Cons **Potential Query Performance Drop Without Partition Pruning** -When converting a non-partitioned table to a partitioned table, TiDB creates a separate Regions for each partition. This might significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: +When converting a non-partitioned table to a partitioned table, TiDB creates separate Regions for each partition. This may significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions or do index lookups in all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: ```sql SELECT * FROM server_info WHERE `serial_no` = ?; ``` -**Mitigation**: add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down `DROP PARTITION` operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: +**Mitigation**: add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down `DROP PARTITION` operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely truncated, making global indexes a feasible solution in these scenarios. 
Example: ```sql ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; @@ -424,7 +426,7 @@ When a query does **not filter by partition key**, TiDB will **scan all partitio When using a time-based field as the partition key, a write hotspot might occur when switching to a new partition: **Root cause:** -In TiDB, any newly created table or partition initially contains only **one region** (data block), which is randomly placed on a single TiKV node. As data begins to be written, this region will eventually **split** into multiple regions, and PD will schedule these new regions to other TiKV nodes. +In TiDB, newly created partitions initially contain only **one region** on a single TiKV node. As writes concentrate on this single region, it must **split** into multiple regions before writes can be distributed across multiple TiKV nodes. This splitting process is the main cause of the temporary write hotspot. However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it might not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. @@ -738,7 +740,7 @@ ALTER TABLE fa PARTITION BY RANGE (`date`) PARTITION `fa_2024002` VALUES LESS THAN (2025002), ... PARTITION `fa_2024365` VALUES LESS THAN (2025365), -PARTITION `fa_2024365` VALUES LESS THAN (2025365ƒf)); +PARTITION `fa_2024365` VALUES LESS THAN (2025365)); Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) ``` From aec934310f39507a0138871882e51ad8564f4156 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Wed, 22 Oct 2025 10:58:49 +0800 Subject: [PATCH 53/84] Rename partitioned tables guide and update references Renamed 'tidb-partitioned-tables-guide.md' to 'tidb-partitioned-tables-best-practices.md' for consistency. Updated TOC and internal note to reflect the new filename and clarified the scope of performance benefits for partitioned tables without global indexes. 
--- TOC.md | 2 +- ...es-guide.md => tidb-partitioned-tables-best-practices.md} | 5 ++++- 2 files changed, 5 insertions(+), 2 deletions(-) rename best-practices/{tidb-partitioned-tables-guide.md => tidb-partitioned-tables-best-practices.md} (99%) diff --git a/TOC.md b/TOC.md index 065047a1034da..731c1d7e33d28 100644 --- a/TOC.md +++ b/TOC.md @@ -438,7 +438,7 @@ - [Optimize Multi-Column Indexes](/best-practices/multi-column-index-best-practices.md) - [Manage Indexes and Identify Unused Indexes](/best-practices/index-management-best-practices.md) - [Handle Millions of Tables in SaaS Multi-Tenant Scenarios](/best-practices/saas-best-practices.md) - - [Use TiDB Partitioned Tables](/best-practices/tidb-partitioned-tables-guide.md) + - [Use TiDB Partitioned Tables](/best-practices/tidb-partitioned-tables-best-practices.md) - [Use UUIDs as Primary Keys](/best-practices/uuid.md) - [Develop Java Applications](/best-practices/java-app-best-practices.md) - [Handle High-Concurrency Writes](/best-practices/high-concurrency-best-practices.md) diff --git a/best-practices/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-best-practices.md similarity index 99% rename from best-practices/tidb-partitioned-tables-guide.md rename to best-practices/tidb-partitioned-tables-best-practices.md index 37af510037928..89667171815f5 100644 --- a/best-practices/tidb-partitioned-tables-guide.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -223,7 +223,10 @@ In TiDB, you can clear up historical data either by TTL (Time-to-Live) or manual To compare the performance of TTL and partition drop, the test case in this section configures TTL to execute every 10 minutes and create a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches are tested under background write loads of 50 and 100 concurrent threads. This test case measures key metrics such as execution time, system resource utilization, and the total number of rows deleted. #### Findings -> **Note**: The performance benefits described below apply to partitioned tables without global indexes. + +> **Note:** +> +> The performance benefits described in this section only apply to partitioned tables without global indexes. **TTL performance:** From 6aec89b9f12c2a1035567ccb7b4e66e8df744726 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Thu, 23 Oct 2025 21:20:28 +0800 Subject: [PATCH 54/84] Update tidb-partitioned-tables-best-practices.md --- best-practices/tidb-partitioned-tables-best-practices.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 89667171815f5..5da892a78f0e8 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -33,7 +33,7 @@ Partition pruning is most beneficial in scenarios where query predicates match t - **Time-series data queries**: When data is partitioned by time ranges (for example, daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. - **Multi-tenant or category-based datasets**: Partitioning by tenant ID or category enables queries to focus on a small subset of partitions. 
-- **Hybrid Transactional/Analytical Processing (HTAP)**: Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing **full table scans** on large datasets. +- **Hybrid Transactional and Analytical Processing (HTAP)**: Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing **full table scans** on large datasets. For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable/partition-pruning/). From 0dd7df8646b459ec9e9666f126829670361c7eae Mon Sep 17 00:00:00 2001 From: houfaxin Date: Mon, 27 Oct 2025 18:53:18 +0800 Subject: [PATCH 55/84] Update tidb-partitioned-tables-best-practices.md --- .../tidb-partitioned-tables-best-practices.md | 23 +++++++++++-------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 5da892a78f0e8..778df083dc81d 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -23,12 +23,15 @@ This document examines partitioned tables in TiDB from multiple angles, includin ## Improve query efficiency +This section describes how to improve query efficiency by the following methods: + +- Partition pruning +- Query performance on secondary indexes + ### Partition pruning **Partition pruning** is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions might contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead. -#### Applicable scenarios - Partition pruning is most beneficial in scenarios where query predicates match the partitioning strategy. Common use cases include: - **Time-series data queries**: When data is partitioned by time ranges (for example, daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. @@ -39,7 +42,7 @@ For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable ### Query performance on secondary indexes: non-partitioned tables vs. local indexes vs. global indexes -In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. +In TiDB, partitioned tables use local indexes by default. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. 
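The difference is easiest to see in a table definition. The following is a minimal sketch with illustrative table, column, and index names: `idx_user_local` is an ordinary (local) index kept separately in each partition, while `uk_id_global` is declared with the `GLOBAL` keyword and is maintained as a single structure across all partitions.

```sql
CREATE TABLE orders (
    id BIGINT NOT NULL,
    order_date DATE NOT NULL,
    user_id BIGINT NOT NULL,
    PRIMARY KEY (id, order_date),
    KEY idx_user_local (user_id),        -- local index: one copy per partition
    UNIQUE KEY uk_id_global (id) GLOBAL  -- global index: one structure for the whole table
)
PARTITION BY RANGE COLUMNS (order_date) (
    PARTITION p202401 VALUES LESS THAN ('2024-02-01'),
    PARTITION p202402 VALUES LESS THAN ('2024-03-01'),
    PARTITION p202403 VALUES LESS THAN ('2024-04-01')
);
```

A lookup on `user_id` probes `idx_user_local` once in every partition that is not pruned, whereas a lookup on `id` through `uk_id_global` touches the index only once.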
#### Types of tables to be tested @@ -166,7 +169,9 @@ You can use `ALTER TABLE` to add a global index to an existing partitioned table ALTER TABLE ADD UNIQUE INDEX (col1, col2) GLOBAL; ``` -**Note:** + +> **Note:** +> > In TiDB v8.5.x and earlier versions, global indexes can only be created on unique columns. Starting from v9.0.0 (currently in beta), global indexes on non-unique columns are supported. This limitation will be removed in the next LTS version. - The `GLOBAL` keyword must be explicitly specified. @@ -289,7 +294,7 @@ FIRST PARTITION LESS THAN ('2025-02-19 18:00:00') LAST PARTITION LESS THAN ('2025-02-19 20:00:00'); ``` -You are required to run DDL statements like `ALTER TABLE PARTITION ...` to change the `FIRST PARTITION` and `LAST PARTITION` periodically. These two DDL statements can drop the old partitions and create new ones. +You need to run DDL statements such as `ALTER TABLE PARTITION ...` to change the `FIRST PARTITION` and `LAST PARTITION` periodically. These two DDL statements can drop the old partitions and create new ones. ```sql ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}"); @@ -404,9 +409,7 @@ ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; ## Partition management challenges -### How to avoid Hotspots caused by new range partitions - -#### Overview +### How to avoid hotspots caused by new range partitions New range partitions in a partitioned table can easily lead to hotspot issues in TiDB. This section outlines common scenarios and mitigation strategies to avoid read and write hotspots caused by range partitions. @@ -437,7 +440,7 @@ However, if the initial write traffic to this new partition is **very high**, th This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn might impact the overall read and write performance of the cluster. -### Summary Table +### Summary | Approach | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | |---|---|---|---|---|---| @@ -642,7 +645,7 @@ show table employees2 PARTITION (p4) regions; - Best suited for use cases that require stable performance and do not benefit from partition-based data management. -## Converte between partitioned and non-partitioned tables +## Convert between partitioned and non-partitioned tables When working with large tables (for example in this example 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: From c3044439ed8a1ced8f0348d05da11d90a7024882 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Fri, 14 Nov 2025 14:07:23 +0800 Subject: [PATCH 56/84] Update tidb-partitioned-tables-best-practices.md --- best-practices/tidb-partitioned-tables-best-practices.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 778df083dc81d..cfa8239d6ef0d 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -218,7 +218,7 @@ The performance overhead of partitioned tables in TiDB depends significantly on In TiDB, you can clear up historical data either by TTL (Time-to-Live) or manual partition drop. 
While both methods serve the same purpose, they differ significantly in performance. The testcases in this section show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. -#### Differences between TTL and partition drop +### Differences between TTL and partition drop - **TTL**: automatically removes data based on its age, but might be slower due to the need to scan and clean data over time. - **Partition Drop**: deletes an entire partition at once, making it much faster, especially when dealing with large datasets. From 84994ad08cd1ef7543de6f5bed08f880615bb9c7 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Fri, 14 Nov 2025 14:13:09 +0800 Subject: [PATCH 57/84] Update tidb-partitioned-tables-best-practices.md --- best-practices/tidb-partitioned-tables-best-practices.md | 1 + 1 file changed, 1 insertion(+) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index cfa8239d6ef0d..b835a1f03f7b9 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -432,6 +432,7 @@ When a query does **not filter by partition key**, TiDB will **scan all partitio When using a time-based field as the partition key, a write hotspot might occur when switching to a new partition: **Root cause:** + In TiDB, newly created partitions initially contain only **one region** on a single TiKV node. As writes concentrate on this single region, it must **split** into multiple regions before writes can be distributed across multiple TiKV nodes. This splitting process is the main cause of the temporary write hotspot. However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it might not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. From ca8bc75fae5a20eac50b82d8b3dd9ca374f7bf22 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Fri, 14 Nov 2025 14:48:11 +0800 Subject: [PATCH 58/84] Apply suggestions from code review Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- .../tidb-partitioned-tables-best-practices.md | 26 +++++++++---------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index b835a1f03f7b9..4bca0ade95116 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -42,7 +42,7 @@ For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable ### Query performance on secondary indexes: non-partitioned tables vs. local indexes vs. global indexes -In TiDB, partitioned tables use local indexes by default. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. 
global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. +In TiDB, partitioned tables use local indexes by default. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. Global indexes can be faster for queries that span multiple partitions because a query using local indexes must perform a lookup in each relevant partition, while a query using a global index only needs to perform a single lookup for the entire table. #### Types of tables to be tested @@ -121,7 +121,7 @@ Metrics collected: |---|---|---|---|---|---| | Non-partitioned table | 12.6 ms | 72 | 79 | 151 | Provides the best performance with the fewest Cop tasks, which is ideal for most OLTP use cases. | | Partitioned table with local indexes | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries scan all partitions. | -| Partitioned table with global indexes | 14.8 ms | 69 | 383 | 452 | It improves index scan efficiency, but table lookups can still take long time if many rows match. | +| Partitioned table with global indexes | 14.8 ms | 69 | 383 | 452 | It improves index scan efficiency, but table lookups can still take a long time if many rows match. | #### Execution plan examples @@ -202,7 +202,7 @@ PARTITION BY RANGE (id) ( The performance overhead of partitioned tables in TiDB depends significantly on the number of partitions and the type of index used. -- The more partitions you have, the more severe the potential performance degrades. +- The more partitions you have, the more severe the potential performance degradation. - With a smaller number of partitions, the impact might not be as noticeable, but it is still workload-dependent. - For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of [Remote Procedure Calls (RPCs)](https://docs.pingcap.com/tidb/stable/glossary/#remote-procedure-call-rpc) triggered. This means more partitions will likely result in more RPCs, leading to higher latency. - For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). Note that for very large tables where data is already distributed across many Regions, accessing data through a global index may have similar performance to a non-partitioned table, as both scenarios require multiple cross-Region RPCs. @@ -210,13 +210,13 @@ The performance overhead of partitioned tables in TiDB depends significantly on #### Recommendations - Avoid partitioned tables unless necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. -- If you know all queries will make use of good partitioning pruning (matching only a few partitions) then local indexes are good -- If you know critical queries does not have good partitioning pruning (matching many partitions) then Global index is to recommend. +- If you know all queries will make use of good partition pruning (matching only a few partitions), then local indexes are a good choice. 
+- If you know critical queries do not have good partition pruning (matching many partitions), then a global index is recommended. - Use local indexes only if your main concern is DDL efficiency (such as fast `DROP PARTITION`) and the performance side effect from the partition table is acceptable. ## Facilitate bulk data deletion -In TiDB, you can clear up historical data either by TTL (Time-to-Live) or manual partition drop. While both methods serve the same purpose, they differ significantly in performance. The testcases in this section show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. +In TiDB, you can clear up historical data either by TTL (Time-to-Live) or manual partition drop. While both methods serve the same purpose, they differ significantly in performance. The test cases in this section show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. ### Differences between TTL and partition drop @@ -236,7 +236,7 @@ To compare the performance of TTL and partition drop, the test case in this sect **TTL performance:** - On a write-heavy table, TTL runs every 10 minutes. -- With 50 threads, each TTL job takes 8 to 10 minutes, deleted 7 to 11 million rows. +- With 50 threads, each TTL job takes 8 to 10 minutes, deleting 7 to 11 million rows. - With 100 threads, it handles up to 20 million rows, but the execution time increases to 15 to 30 minutes, with greater variance. - TTL jobs impact system performance under high workloads due to extra scanning and deletion activity, reducing overall QPS. @@ -261,7 +261,7 @@ CREATE TABLE `ad_cache` ( `cache_data` mediumblob DEFAULT NULL, `data_version` int(11) DEFAULT NULL, `is_deleted` tinyint(1) DEFAULT NULL, - PRIMARY KEY (`session`, `ad_id`, `create_time`, `suffix`) + PRIMARY KEY (`session_id`, `external_id`, `create_time`, `id_suffix`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin TTL=`expire_time` + INTERVAL 0 DAY TTL_ENABLE='ON' @@ -455,7 +455,7 @@ This imbalance can cause that TiKV node to trigger **flow control**, leading to **Pros:** -- When a new partition is created in a **NONCLUSTERED Partitioned Table** configured with `SHARD_ROW_ID_BITS` and [PRE_SPLIT_REGIONS](/sql-statements//sql-statement-split-region.md#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. +- When a new partition is created in a **NONCLUSTERED Partitioned Table** configured with `SHARD_ROW_ID_BITS` and [PRE_SPLIT_REGIONS](/sql-statements/sql-statement-split-region.md#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. - Lower operational overhead. 
**Cons:** @@ -481,7 +481,7 @@ CREATE TABLE employees ( store_id INT, PRIMARY KEY (`id`,`hired`) NONCLUSTERED, KEY `idx_employees_on_store_id` (`store_id`) -)SHARD_ROW_ID_BITS = 2 PRE_SPLIT_REGIONS=2 +) SHARD_ROW_ID_BITS = 2 PRE_SPLIT_REGIONS=2 PARTITION BY RANGE ( YEAR(hired) ) ( PARTITION p0 VALUES LESS THAN (1991), PARTITION p1 VALUES LESS THAN (1996), @@ -542,7 +542,7 @@ SHOW TABLE employees PARTITION (p4) regions; SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "2006-01-01") AND (100000, "2011-01-01") REGIONS ; -SPLIT PARTITION TABLE employees PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; +SPLIT PARTITION TABLE employees PARTITION (p4) INDEX `idx_employees_on_store_id` BETWEEN (1) AND (1000) REGIONS ; SHOW TABLE employees PARTITION (p4) regions; ``` @@ -648,7 +648,7 @@ show table employees2 PARTITION (p4) regions; ## Convert between partitioned and non-partitioned tables -When working with large tables (for example in this example 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: +When working with large tables (for example, a table with 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: 1. Batch DML: `INSERT INTO ... SELECT ...` 2. Pipeline DML: `INSERT INTO ... SELECT ...` @@ -683,7 +683,7 @@ PARTITION `fa_2024365` VALUES LESS THAN (2025365)); ### Table schema: `fa_new` ```sql -CREATE TABLE `fa` ( +CREATE TABLE `fa_new` ( `id` bigint NOT NULL AUTO_INCREMENT, `account_id` bigint(20) NOT NULL, `sid` bigint(20) DEFAULT NULL, From 8d5616e225da385854f9a6cd1154602e6f0c106b Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Fri, 14 Nov 2025 14:48:55 +0800 Subject: [PATCH 59/84] Update best-practices/tidb-partitioned-tables-best-practices.md --- best-practices/tidb-partitioned-tables-best-practices.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 4bca0ade95116..71cb559992dd8 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -747,7 +747,7 @@ ALTER TABLE fa PARTITION BY RANGE (`date`) PARTITION `fa_2024002` VALUES LESS THAN (2025002), ... 
PARTITION `fa_2024365` VALUES LESS THAN (2025365), -PARTITION `fa_2024365` VALUES LESS THAN (2025365)); +PARTITION `fa_2024366` VALUES LESS THAN (2025366)); Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) ``` From d9c6f7b3c680a0ca4f0e56198fe7e6a3f013e217 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Fri, 14 Nov 2025 14:54:19 +0800 Subject: [PATCH 60/84] Apply suggestions from code review Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- best-practices/tidb-partitioned-tables-best-practices.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 71cb559992dd8..44fe31c219e9c 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -180,7 +180,7 @@ ADD UNIQUE INDEX (col1, col2) GLOBAL; - Available starting from v9.0.0-beta.1 - Expected to be included in the next LTS release -You can define inline when creating a table to create a global index. +You can also create a global index inline when you create a table. ```sql CREATE TABLE t ( From 7f16cce13c4ae9273e706a6f94c96a24563048fd Mon Sep 17 00:00:00 2001 From: houfaxin Date: Mon, 17 Nov 2025 10:33:24 +0800 Subject: [PATCH 61/84] Update tidb-partitioned-tables-best-practices.md --- best-practices/tidb-partitioned-tables-best-practices.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 44fe31c219e9c..df19d75504cd6 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -172,7 +172,7 @@ ADD UNIQUE INDEX (col1, col2) GLOBAL; > **Note:** > -> In TiDB v8.5.x and earlier versions, global indexes can only be created on unique columns. Starting from v9.0.0 (currently in beta), global indexes on non-unique columns are supported. This limitation will be removed in the next LTS version. +> In TiDB v8.5.x and earlier versions, global indexes can only be created on unique columns. Starting from v8.5.4 and v9.0.0 (currently in beta), global indexes on non-unique columns are supported. This limitation will be removed in the next LTS version. - The `GLOBAL` keyword must be explicitly specified. - For non-unique global indexes, use `ADD INDEX` instead of `ADD UNIQUE INDEX`. From a9d122ec8579c03a39753c6dc926852412fc26dd Mon Sep 17 00:00:00 2001 From: houfaxin Date: Mon, 17 Nov 2025 10:39:08 +0800 Subject: [PATCH 62/84] Update tidb-partitioned-tables-best-practices.md --- best-practices/tidb-partitioned-tables-best-practices.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index df19d75504cd6..71d49bf15d3d0 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -172,7 +172,7 @@ ADD UNIQUE INDEX (col1, col2) GLOBAL; > **Note:** > -> In TiDB v8.5.x and earlier versions, global indexes can only be created on unique columns. Starting from v8.5.4 and v9.0.0 (currently in beta), global indexes on non-unique columns are supported. This limitation will be removed in the next LTS version. 
+> In TiDB v8.5.3 and earlier versions, global indexes can only be created on unique columns. Starting from v8.5.4 and v9.0.0 (currently in beta), global indexes on non-unique columns are supported. This limitation will be removed in the next LTS version. - The `GLOBAL` keyword must be explicitly specified. - For non-unique global indexes, use `ADD INDEX` instead of `ADD UNIQUE INDEX`. From db576602d4951944dd0fa60af8f584752e29a3cf Mon Sep 17 00:00:00 2001 From: houfaxin Date: Wed, 19 Nov 2025 17:05:31 +0800 Subject: [PATCH 63/84] Update tidb-partitioned-tables-best-practices.md --- .../tidb-partitioned-tables-best-practices.md | 100 +++++++++--------- 1 file changed, 50 insertions(+), 50 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 71d49bf15d3d0..3fbb91f2cd141 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -30,13 +30,13 @@ This section describes how to improve query efficiency by the following methods: ### Partition pruning -**Partition pruning** is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions might contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead. +Partition pruning is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions might contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead. Partition pruning is most beneficial in scenarios where query predicates match the partitioning strategy. Common use cases include: -- **Time-series data queries**: When data is partitioned by time ranges (for example, daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. -- **Multi-tenant or category-based datasets**: Partitioning by tenant ID or category enables queries to focus on a small subset of partitions. -- **Hybrid Transactional and Analytical Processing (HTAP)**: Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing **full table scans** on large datasets. +- Time-series data queries: When data is partitioned by time ranges (for example, daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. +- Multi-tenant or category-based datasets: Partitioning by tenant ID or category enables queries to focus on a small subset of partitions. +- Hybrid Transactional and Analytical Processing (HTAP): Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing full table scans on large datasets. For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable/partition-pruning/). @@ -46,7 +46,7 @@ In TiDB, partitioned tables use local indexes by default. 
Each partition has its #### Types of tables to be tested -We evaluated query performance across three table configurations in TiDB: +The query performance of the following types of tables are evaluated: - Non-partitioned tables - Partitioned tables with global indexes @@ -54,9 +54,9 @@ We evaluated query performance across three table configurations in TiDB: #### Test setup -- The **partitioned table** had **365 partitions**, defined by **range partitioning on a date column**. +- The partitioned table had 365 partitions, defined by the range partitioning on a date column. - Each matching key returns multiple rows, simulating a high-volume OLTP-style query pattern. -- The **impact of different partition counts** is also evaluated to understand how partition granularity influences latency and index performance. +- The impact of different partition counts is also evaluated to understand how partition granularity influences latency and index performance. #### Schema @@ -125,7 +125,7 @@ Metrics collected: #### Execution plan examples -##### Non-partitioned table +The following is an execution plan example for non-partitioned tables: ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -135,7 +135,7 @@ Metrics collected: | TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task:{num:79, max:4.98ms, min:0s, avg:514.9µs, p95:3.75ms, max_proc_keys:10, p95_proc_keys:5, tot_proc:15ms, tot_wait:21.4ms, copr_cache_hit_ratio:0.00, build_task_duration:341.2µs, max_distsql_concurrency:1, max_extra_concurrency:7, store_batch_num:62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg:0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:20.8ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1600}}}, time_detail:{total_process_time:15ms, total_wait_time:21.4ms, tikv_wall_time:10.9ms} | keep order:false | N/A | N/A | ``` -##### Partition tables with global indexes +The following is an execution plan example for partition tables with global indexes ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -145,7 +145,7 @@ Metrics collected: | TableRowIDScan_6(Probe)| 398.73 | 165221.64 | 400 | cop[tikv] | table:fa | time:7.47ms, loops:2, cop_task:{num:383, max:4.07ms, min:0s, avg:488.5µs, p95:2.59ms, max_proc_keys:2, p95_proc_keys:1, tot_proc:203.3ms, tot_wait:429.5ms, copr_cache_hit_ratio:0.00, build_task_duration:1.3ms, max_distsql_concurrency:1, max_extra_concurrency:31, store_batch_num:305}, rpc_info:{Cop:{num_rpc:78, total_time:186.3ms}}, tikv_task:{proc max:3ms, min:0s, avg:517µs, p80:1ms, p95:1ms, iters:383, tasks:383}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:2.99ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1601, read_count:799, read_byte:10.1 MB, read_time:131.6ms}}}, time_detail:{total_process_time:203.3ms, total_suspend_time:6.31ms, total_wait_time:429.5ms, total_kv_read_wall_time:198ms, tikv_wall_time:163ms} | keep order:false, stats:partial[...] 
| N/A | N/A | ``` -##### Partition tables with local indexes +The following is an execution plan example for partition tables with local indexes ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -303,15 +303,15 @@ ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}"); #### Recommendations -For workloads with large or time-based data cleanup, it is recommended to use partitioned tables with DROP PARTITION. It offers better performance, lower system impact, and simpler management. +For workloads with large or time-based data cleanup, it is recommended to use partitioned tables with `DROP PARTITION`. It offers better performance, lower system impact, and simpler management. TTL is still useful for finer-grained or background cleanup, but might not be optimal under high write pressure or when deleting large volumes of data quickly. ### Partition drop efficiency: local index vs. global index -A partitioned table with a global index requires synchronous updates to the global index, which can significantly increase the execution time for DDL operations, such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORGANIZE PARTITION`. +A partitioned table with a global index requires synchronous updates to the global index, which can significantly increase the execution time for DDL operations, such as `DROP PARTITION`, `TRUNCATE PARTITION`, and `REORGANIZE PARTITION`. -In this section, the tests show that `DROP PARTITION` is much slower when using a **global index** compared to a **local index**. This should be considered when you design partitioned tables. +In this section, the tests show that `DROP PARTITION` is much slower when using a global index compared to a local index. Take this into consideration when you design partitioned tables. #### Test case @@ -319,12 +319,12 @@ This test case creates a table with 365 partitions and tests the `DROP PARTITION | Index Type | Duration (drop partition) | |--------------|---------------------------| -| Global Index | 1 min 16.02 s | -| Local Index | 0.52 s | +| Global Index | 76.02 seconds | +| Local Index | 0.52 seconds | #### Findings -Dropping a partition on a table with a global index takes **76 seconds**, while the same operation with a local index takes only **0.52 seconds**. The reason is that global indexes span all partitions and require more complex updates, while local indexes can just be dropped together with the partition data. +Dropping a partition on a table with a global index takes **76.02 seconds**, while the same operation with a local index takes only **0.52 seconds**. The reason is that global indexes span all partitions and require more complex updates, while local indexes can just be dropped together with the partition data. **Global Index** @@ -334,7 +334,7 @@ ALTER TABLE A DROP PARTITION A_2024363; #### Recommendations -When a partitioned table contains global indexes, executing certain DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORGANIZE PARTITION` requires updating the global index entries to reflect the changes. This update must be performed immediately to ensure consistency, which can significantly increase the execution time of these DDL operations. +When a partitioned table contains global indexes, executing certain DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, and `REORGANIZE PARTITION` requires updating the global index entries to reflect the changes. 
This update must be performed immediately to ensure consistency, which can significantly increase the execution time of these DDL operations. If you need to drop partitions frequently and minimize the performance impact on the system, it is recommended to use **local indexes** for faster and more efficient operations. @@ -342,35 +342,38 @@ If you need to drop partitions frequently and minimize the performance impact on In TiDB, **write hotspots** can occur when incoming write traffic is unevenly distributed across Regions. -This is common when the primary key is **monotonically increasing**—for example, an `AUTO_INCREMENT` primary key with `AUTO_ID_CACHE=1`, or secondary index on datetime column with default value set to `CURRENT_TIMESTAMP`—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: +This is common when the primary key is monotonically increasing, for example, an `AUTO_INCREMENT` primary key with `AUTO_ID_CACHE=1`, or secondary index on datetime column with the default value set to `CURRENT_TIMESTAMP`. Because new rows and index entries are always appended to the "rightmost" Region, over time, this can lead to: - A single [Region](https://docs.pingcap.com/tidb/stable/tidb-storage/#region) handling most of the write workload, while other Regions remain idle. - Higher write latency and reduced throughput. - Limited performance gains from scaling out TiKV nodes, as the bottleneck remains concentrated on one Region. -**Partitioned tables** can help mitigate this problem. By applying **hash** or **key** partitioning on the primary key, TiDB can spread inserts across multiple partitions (and therefore multiple Regions), reducing hotspot contention. -> **Note**: This section uses partitioned tables as an example for mitigating read/write hotspots. TiDB also provides other features such as `AUTO_RANDOM` and `SHARD_ROW_ID_BITS` for hotspot mitigation. When using partitioned tables in certain scenarios, you may need to set `merge_option=deny` to maintain partition boundaries. For more details, see [issue #58128](https://github.com/pingcap/tidb/issues/58128). +Partitioned tables can help mitigate this problem. By applying hash or key partitioning on the primary key, TiDB can spread inserts across multiple partitions (and therefore multiple Regions), reducing hotspot contention. + +> **Note:** +> +> This section uses partitioned tables as an example for mitigating read and write hotspots. TiDB also provides other features such as [`AUTO_INCREMENT`](/auto-increment.md) and `SHARD_ROW_ID_BITS` for hotspot mitigation. When using partitioned tables in certain scenarios, you might need to set `merge_option=deny` to maintain partition boundaries. For more details, see [issue #58128](https://github.com/pingcap/tidb/issues/58128). ### How it works TiDB stores table data and indexes in **Regions**, each covering a continuous range of row keys. -When the primary key is AUTO_INCREMENT and the secondary indexes on datetime columns are monotonically increasing: +When the primary key is [`AUTO_INCREMENT`](/auto-increment.md) and the secondary indexes on datetime columns are monotonically increasing: -**Without Partitioning:** +**Without partitioning:** - New rows always have the highest key values and are inserted into the same "last Region." - That Region is served by one TiKV node at a time, becoming a single write bottleneck. 
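If you want to verify this single-Region behavior on a running cluster, one option is to list the Regions that back the table and watch which one keeps receiving the new keys. The following is a minimal sketch that assumes a hypothetical non-partitioned table named `orders` with an `AUTO_INCREMENT` primary key; the statement is the same `SHOW TABLE ... REGIONS` command used elsewhere in this guide:

```sql
-- List the Regions that currently back the table, with their key ranges,
-- leader stores, and recent written bytes. With a monotonically increasing
-- primary key, newly inserted rows keep landing in the Region that holds
-- the highest key range.
SHOW TABLE orders REGIONS;
```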
-**With Hash/Key Partitioning:** +**With hash or key partitioning:** - The table and the secondary indexes are split into multiple partitions using a hash or key function on the primary key or indexed columns. - Each partition has its own set of Regions, often distributed across different TiKV nodes. -- Inserts are spread across multiple Regions in parallel, improving load distribution and throughput. +- Inserts are spread across multiple Regions in parallel, improving workload distribution and throughput. ### Use cases -If a table with an [`AUTO_INCREMENT`](/auto-increment.md) primary key experiences heavy bulk inserts and suffers from write hotspot issues, applying **hash** or **key** partitioning on the primary key can help distribute the write load more evenly. +If a table with an [`AUTO_INCREMENT`](/auto-increment.md) primary key experiences heavy bulk inserts and suffers from write hotspot issues, applying **hash** or **key** partitioning on the primary key can help distribute the write workload more evenly. ```sql CREATE TABLE server_info ( @@ -388,20 +391,20 @@ PARTITION BY KEY (id) PARTITIONS 16; ### Pros -- **Balanced Write Load** — Hotspots are spread across multiple partitions, and therefore multiple **Regions**, reducing contention and improving insert performance. -- **Query Optimization via Partition Pruning** — If queries already filter by the partition key, TiDB can prune unused partitions, scanning less data and improving query speed. +- **Balanced write workload** — Hotspots are spread across multiple partitions, and therefore multiple **Regions**, reducing contention and improving insert performance. +- **Query optimization via partition pruning** — If queries already filter by the partition key, TiDB can prune unused partitions, scanning less data and improving query speed. ### Cons **Potential Query Performance Drop Without Partition Pruning** -When converting a non-partitioned table to a partitioned table, TiDB creates separate Regions for each partition. This may significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions or do index lookups in all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: +When converting a non-partitioned table to a partitioned table, TiDB creates separate Regions for each partition. This may significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions or do index lookups in all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. For example: ```sql SELECT * FROM server_info WHERE `serial_no` = ?; ``` -**Mitigation**: add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down `DROP PARTITION` operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely truncated, making global indexes a feasible solution in these scenarios. Example: +**Mitigation**: add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down `DROP PARTITION` operations, **hash and key partitioned tables do not support DROP PARTITION**. 
In practice, such partitions are rarely truncated, making global indexes a feasible solution in these scenarios. For example: ```sql ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; @@ -409,13 +412,9 @@ ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; ## Partition management challenges -### How to avoid hotspots caused by new range partitions - New range partitions in a partitioned table can easily lead to hotspot issues in TiDB. This section outlines common scenarios and mitigation strategies to avoid read and write hotspots caused by range partitions. -#### Common hotspot scenarios - -**Read hotspot** +### Read hotspots When using **range-partitioned tables**, if queries do **not** filter data using the partition key, new empty partitions can easily become read hotspots. @@ -425,17 +424,17 @@ By default, TiDB creates an empty region for each partition when the table is cr **impact:** -When a query does **not filter by partition key**, TiDB will **scan all partitions** (as seen in the execution plan partition:all). As a result, the single region holding multiple empty partitions will be scanned repeatedly, leading to a **read hotspot**. +When a query does not filter by partition key, TiDB will scan all partitions (as seen in the execution plan `partition:all`). As a result, the single region holding multiple empty partitions will be scanned repeatedly, leading to a **read hotspot**. -**Write hotspot** +### Write hotspots When using a time-based field as the partition key, a write hotspot might occur when switching to a new partition: **Root cause:** -In TiDB, newly created partitions initially contain only **one region** on a single TiKV node. As writes concentrate on this single region, it must **split** into multiple regions before writes can be distributed across multiple TiKV nodes. This splitting process is the main cause of the temporary write hotspot. +In TiDB, newly created partitions initially contain only one region on a single TiKV node. As writes concentrate on this single region, it must split into multiple regions before writes can be distributed across multiple TiKV nodes. This splitting process is the main cause of the temporary write hotspot. -However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it might not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. +However, if the initial write traffic to this new partition is very high, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it might not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. **Impact:** @@ -449,13 +448,13 @@ This imbalance can cause that TiKV node to trigger **flow control**, leading to | CLUSTERED Partitioned | Medium (manual intervention) | Medium (manual split) | High | High (direct access) | Fast (DROP PARTITION) | | CLUSTERED Non-partitioned | None | Medium (single table) | Low | High | Slow (DELETE/TTL) | -#### Solutions +### Solutions -**1. 
NONCLUSTERED partitioned table** +#### 1. Non-clustered partitioned tables **Pros:** -- When a new partition is created in a **NONCLUSTERED Partitioned Table** configured with `SHARD_ROW_ID_BITS` and [PRE_SPLIT_REGIONS](/sql-statements/sql-statement-split-region.md#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. +- When a new partition is created in a non-clustered partitioned table configured with `SHARD_ROW_ID_BITS` and [PRE_SPLIT_REGIONS](/sql-statements/sql-statement-split-region.md#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. - Lower operational overhead. **Cons:** @@ -501,9 +500,9 @@ ALTER TABLE employees PARTITION `p3` ATTRIBUTES 'merge_option=deny'; **Determining split boundaries based on existing business data** -To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. +To avoid hotspots when a new table or partition is created, it is often beneficial to pre-split regions before heavy writes begin. To make pre-splitting effective, configure the lower and upper boundaries for region splitting based on the actual business data distribution. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. -**Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: +Identify the minimum and maximum values from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: ```sql SELECT MIN(id), MAX(id) FROM employees; @@ -547,7 +546,7 @@ SPLIT PARTITION TABLE employees PARTITION (p4) INDEX `idx_employees_on_store_id` SHOW TABLE employees PARTITION (p4) regions; ``` -**2. CLUSTERED Partitioned Table** +#### 2. Clustered partitioned tables **Pros:** @@ -631,7 +630,7 @@ SPLIT PARTITION TABLE employees2 PARTITION (p4) INDEX `idx_employees2_on_store_i show table employees2 PARTITION (p4) regions; ``` -**3. CLUSTERED Non-partitioned Table** +#### 3. Clustered non-partitioned tables **Pros:** @@ -696,8 +695,6 @@ CREATE TABLE `fa_new` ( ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin; ``` -#### Description - These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. ### Method 1: Batch DML `INSERT INTO ... SELECT` @@ -721,14 +718,17 @@ INSERT INTO fa_new SELECT * FROM fa; ### Method 3: `IMPORT INTO ... 
FROM SELECT` ```sql -mysql> import into fa_new from select * from fa with thread=32,disable_precheck; +IMPORT INTO fa_new FROM SELECT * FROM fa WITH thread = 32, disable_precheck; +``` + +``` Query OK, 120000000 rows affected, 1 warning (16 min 49.90 sec) Records: 120000000, ID: c1d04eec-fb49-49bb-af92-bf3d6e2d3d87 ``` ### Method 4: Online DDL -**From partition table to non-partitioned table** +**From a partition table to a non-partitioned table** ```sql SET @@global.tidb_ddl_REORGANIZE_worker_cnt = 16; @@ -737,7 +737,7 @@ alter table fa REMOVE PARTITIONING; -- real 170m12.024 s (≈ 2 h 50 m) ``` -**From non-partition table to partitioned table** +**From a non-partition table to a partitioned table** ```sql SET @@global.tidb_ddl_REORGANIZE_worker_cnt = 16; From ada40d6d4f3946a299efa31460507292b7789ee2 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Thu, 20 Nov 2025 15:17:32 +0800 Subject: [PATCH 64/84] Update tidb-partitioned-tables-best-practices.md --- .../tidb-partitioned-tables-best-practices.md | 255 +++++++----------- 1 file changed, 94 insertions(+), 161 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 3fbb91f2cd141..afd454ecdad69 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -7,9 +7,9 @@ summary: Learn best practices for using TiDB partitioned tables to improve perfo This guide introduces how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. -Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in Online Analytical Processing (OLAP) workloads with massive datasets. +Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage partition pruning to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in Online Analytical Processing (OLAP) workloads with massive datasets. -A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations such as [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning, such as those lacking partition key filters, might experience degraded performance. In such cases, you can use [**global indexes**](/partitioned-table.md#global-indexes) to mitigate the performance impact by providing a unified index structure across all partitions. +A common use case is range partitioning combined with local indexes, which enables efficient historical data cleanup through operations such as [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). 
This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning, such as those lacking partition key filters, might experience degraded performance. In such cases, you can use [**global indexes**](/partitioned-table.md#global-indexes) to mitigate the performance impact by providing a unified index structure across all partitions. Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [`AUTO_INCREMENT` style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions might suffer performance drawbacks again, a situation where global indexes can help. @@ -103,7 +103,13 @@ WHERE `fa`.`sid` IN ( - Causes local indexes key lookup for each partition due to lack of pruning. - Table lookup tasks are significantly higher for partitioned tables. -#### Findings +#### Test results + +| Configuration | Average Query Time | Cop task for index range scan | Cop task for table lookup | Total Cop tasks | Key Takeaways | +|---|---|---|---|---|---| +| Non-partitioned table | 12.6 ms | 72 | 79 | 151 | Provides the best performance with the fewest Cop tasks, which is ideal for most OLTP use cases. | +| Partitioned table with local indexes | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries scan all partitions. | +| Partitioned table with global indexes | 14.8 ms | 69 | 383 | 452 | It improves index scan efficiency, but table lookups can still take a long time if many rows match. | Data comes from a table with 365 range partitions (for example, by date). @@ -115,17 +121,9 @@ Metrics collected: - **Average Query Time**: from `statement_summary` - **Cop Tasks** (Index Scan + Table Lookup): from the execution plan -#### Test results - -| Configuration | Average Query Time | Cop task for index range scan | Cop task for table lookup | Total Cop tasks | Key Takeaways | -|---|---|---|---|---|---| -| Non-partitioned table | 12.6 ms | 72 | 79 | 151 | Provides the best performance with the fewest Cop tasks, which is ideal for most OLTP use cases. | -| Partitioned table with local indexes | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries scan all partitions. | -| Partitioned table with global indexes | 14.8 ms | 69 | 383 | 452 | It improves index scan efficiency, but table lookups can still take a long time if many rows match. 
| - #### Execution plan examples -The following is an execution plan example for non-partitioned tables: +The following is an execution plan example for a non-partitioned table: ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -135,7 +133,7 @@ The following is an execution plan example for non-partitioned tables: | TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task:{num:79, max:4.98ms, min:0s, avg:514.9µs, p95:3.75ms, max_proc_keys:10, p95_proc_keys:5, tot_proc:15ms, tot_wait:21.4ms, copr_cache_hit_ratio:0.00, build_task_duration:341.2µs, max_distsql_concurrency:1, max_extra_concurrency:7, store_batch_num:62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg:0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:20.8ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1600}}}, time_detail:{total_process_time:15ms, total_wait_time:21.4ms, tikv_wall_time:10.9ms} | keep order:false | N/A | N/A | ``` -The following is an execution plan example for partition tables with global indexes +The following is an execution plan example for a partition tables with a global index: ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -145,7 +143,7 @@ The following is an execution plan example for partition tables with global inde | TableRowIDScan_6(Probe)| 398.73 | 165221.64 | 400 | cop[tikv] | table:fa | time:7.47ms, loops:2, cop_task:{num:383, max:4.07ms, min:0s, avg:488.5µs, p95:2.59ms, max_proc_keys:2, p95_proc_keys:1, tot_proc:203.3ms, tot_wait:429.5ms, copr_cache_hit_ratio:0.00, build_task_duration:1.3ms, max_distsql_concurrency:1, max_extra_concurrency:31, store_batch_num:305}, rpc_info:{Cop:{num_rpc:78, total_time:186.3ms}}, tikv_task:{proc max:3ms, min:0s, avg:517µs, p80:1ms, p95:1ms, iters:383, tasks:383}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:2.99ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1601, read_count:799, read_byte:10.1 MB, read_time:131.6ms}}}, time_detail:{total_process_time:203.3ms, total_suspend_time:6.31ms, total_wait_time:429.5ms, total_kv_read_wall_time:198ms, tikv_wall_time:163ms} | keep order:false, stats:partial[...] | N/A | N/A | ``` -The following is an execution plan example for partition tables with local indexes +The following is an execution plan example for a partition table with a local index: ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -161,6 +159,12 @@ The following sections describe similar detailed execution plans for partitioned There are two options for you to create a global index on a partitioned table in TiDB. +> **Note:** +> +> - In TiDB v8.5.3 and earlier versions, global indexes can only be created on unique columns. Starting from v8.5.4, global indexes on non-unique columns are supported. This limitation will be removed in the next LTS version. +> - For non-unique global indexes, use `ADD INDEX` instead of `ADD UNIQUE INDEX`. +> - The `GLOBAL` keyword must be explicitly specified. + ##### Option 1: add via `ALTER TABLE` You can use `ALTER TABLE` to add a global index to an existing partitioned table. 
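For example, assuming the `fa` table used in the earlier query tests and that `sid` is not constrained to be unique, a non-unique global index (supported starting from v8.5.4, as noted above) could be added as follows. This is only a sketch; the index name and column list are illustrative and should be adjusted to your schema:

```sql
-- Non-unique global index: use ADD INDEX (not ADD UNIQUE INDEX) and specify
-- the GLOBAL keyword explicitly so that the index spans all partitions.
ALTER TABLE fa ADD INDEX idx_fa_sid (sid) GLOBAL;
```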
@@ -170,15 +174,7 @@ ALTER TABLE ADD UNIQUE INDEX (col1, col2) GLOBAL; ``` -> **Note:** -> -> In TiDB v8.5.3 and earlier versions, global indexes can only be created on unique columns. Starting from v8.5.4 and v9.0.0 (currently in beta), global indexes on non-unique columns are supported. This limitation will be removed in the next LTS version. - -- The `GLOBAL` keyword must be explicitly specified. -- For non-unique global indexes, use `ADD INDEX` instead of `ADD UNIQUE INDEX`. - - Not supported in v8.5.0 and later versions - - Available starting from v9.0.0-beta.1 - - Expected to be included in the next LTS release +##### Option 2: define inline when creating the table You can also create a global index inline when you create a table. @@ -233,14 +229,14 @@ To compare the performance of TTL and partition drop, the test case in this sect > > The performance benefits described in this section only apply to partitioned tables without global indexes. -**TTL performance:** +The following are findings about the TTL performance: - On a write-heavy table, TTL runs every 10 minutes. - With 50 threads, each TTL job takes 8 to 10 minutes, deleting 7 to 11 million rows. - With 100 threads, it handles up to 20 million rows, but the execution time increases to 15 to 30 minutes, with greater variance. - TTL jobs impact system performance under high workloads due to extra scanning and deletion activity, reducing overall QPS. -**Partition drop performance:** +The following are findings about partition drop performance: - `ALTER TABLE ... DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. - `ALTER TABLE ... DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. @@ -326,7 +322,7 @@ This test case creates a table with 365 partitions and tests the `DROP PARTITION Dropping a partition on a table with a global index takes **76.02 seconds**, while the same operation with a local index takes only **0.52 seconds**. The reason is that global indexes span all partitions and require more complex updates, while local indexes can just be dropped together with the partition data. -**Global Index** +You can use the following SQL statement to drop the partition: ```sql ALTER TABLE A DROP PARTITION A_2024363; @@ -338,14 +334,14 @@ When a partitioned table contains global indexes, executing certain DDL operatio If you need to drop partitions frequently and minimize the performance impact on the system, it is recommended to use **local indexes** for faster and more efficient operations. -## Mitigate write hotspot issues +## Mitigate hotspot issues -In TiDB, **write hotspots** can occur when incoming write traffic is unevenly distributed across Regions. +In TiDB, hotspots can occur when incoming read or write traffic is unevenly distributed across Regions. This is common when the primary key is monotonically increasing, for example, an `AUTO_INCREMENT` primary key with `AUTO_ID_CACHE=1`, or secondary index on datetime column with the default value set to `CURRENT_TIMESTAMP`. Because new rows and index entries are always appended to the "rightmost" Region, over time, this can lead to: - A single [Region](https://docs.pingcap.com/tidb/stable/tidb-storage/#region) handling most of the write workload, while other Regions remain idle. -- Higher write latency and reduced throughput. +- Higher read or write latency and reduced throughput. 
- Limited performance gains from scaling out TiKV nodes, as the bottleneck remains concentrated on one Region. Partitioned tables can help mitigate this problem. By applying hash or key partitioning on the primary key, TiDB can spread inserts across multiple partitions (and therefore multiple Regions), reducing hotspot contention. @@ -396,23 +392,23 @@ PARTITION BY KEY (id) PARTITIONS 16; ### Cons -**Potential Query Performance Drop Without Partition Pruning** +There are some risks when using partition tables. -When converting a non-partitioned table to a partitioned table, TiDB creates separate Regions for each partition. This may significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions or do index lookups in all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. For example: +- When converting a non-partitioned table to a partitioned table, TiDB creates separate Regions for each partition. This might significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions or do index lookups in all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. For example, `serial_no` is not the partition key, which will cause the query performance regression: -```sql -SELECT * FROM server_info WHERE `serial_no` = ?; -``` + ```sql + SELECT * FROM server_info WHERE `serial_no` = ?; + ``` -**Mitigation**: add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down `DROP PARTITION` operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely truncated, making global indexes a feasible solution in these scenarios. For example: +- Add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down `DROP PARTITION` operations, hash and key partitioned tables do not support `DROP PARTITION`. In practice, such partitions are rarely truncated, making global indexes a feasible solution in these scenarios. For example: -```sql -ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; -``` + ```sql + ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; + ``` ## Partition management challenges -New range partitions in a partitioned table can easily lead to hotspot issues in TiDB. This section outlines common scenarios and mitigation strategies to avoid read and write hotspots caused by range partitions. +New range partitions in a partitioned table can easily lead to hotspot issues in TiDB. This section outlines common scenarios and mitigation strategies to avoid read and write hotspots caused by new range partitions. 
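The following sections refer to execution plans that show `partition:all`, which indicates that a query cannot be pruned to specific partitions. You can check this for your own workload with `EXPLAIN`. The following is a minimal sketch, assuming the `employees` range-partitioned table defined later in this section and a filter that does not use the partition key:

```sql
-- Because the filter does not include the partition key (hired), the plan
-- shows partition:all, meaning every partition is accessed.
EXPLAIN SELECT * FROM employees WHERE store_id = 10;
```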
### Read hotspots @@ -442,30 +438,34 @@ This imbalance can cause that TiKV node to trigger **flow control**, leading to ### Summary -| Approach | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | -|---|---|---|---|---|---| -| NONCLUSTERED Partitioned | Low (with merge_option=deny) | Low (auto pre-split) | Low | Moderate (extra lookups) | Fast (DROP PARTITION) | -| CLUSTERED Partitioned | Medium (manual intervention) | Medium (manual split) | High | High (direct access) | Fast (DROP PARTITION) | -| CLUSTERED Non-partitioned | None | Medium (single table) | Low | High | Slow (DELETE/TTL) | +The following show the summary information for non-clustered and clusted partition tbales. -### Solutions +| Type | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | +|---|---|---|---|---|---| +| Non-clustered partitioned table | Low (with merge_option=deny) | Low (auto pre-split) | Low | Moderate (extra lookups) | Fast (DROP PARTITION) | +| Clustered partitioned table | Medium (manual intervention) | Medium (manual split) | High | High (direct access) | Fast (DROP PARTITION) | +| Clustered non-partitioned table | None | Medium (single table) | Low | High | Slow (DELETE/TTL) | -#### 1. Non-clustered partitioned tables +### Solutions for non-clustered partitioned tables -**Pros:** +#### Pros - When a new partition is created in a non-clustered partitioned table configured with `SHARD_ROW_ID_BITS` and [PRE_SPLIT_REGIONS](/sql-statements/sql-statement-split-region.md#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. - Lower operational overhead. -**Cons:** +#### Cons + +Queries using **Point Get** or **Table Range Scan** will require **more table lookups**, which can degrade read performance for such query types. + +#### Recommendation -- Queries using **Point Get** or **Table Range Scan** will require **more table lookups**, which can degrade read performance for such query types. +Suitable for workloads where write scalability and operational ease are more critical than low-latency reads. -**Recommendation** +#### Best practices -- Suitable for workloads where write scalability and operational ease are more critical than low-latency reads. +To address hotspot issues caused by new range partitions, you can perform the following steps. -**Best practices** +##### Step 1. Use `SHARD_ROW_ID_BITS` and `PRE_SPLIT_REGIONS` Create a partitioned table with `SHARD_ROW_ID_BITS` and `PRE_SPLIT_REGIONS` to pre-split table regions. The value of `PRE_SPLIT_REGIONS` must be less than or equal to that of `SHARD_ROW_ID_BITS`. The number of pre-split Regions for each partition is `2^(PRE_SPLIT_REGIONS)`. @@ -489,6 +489,8 @@ PARTITION BY RANGE ( YEAR(hired) ) ( ); ``` +##### Step 2. Add the `merge_option=deny` attribute + Adding the [`merge_option=deny`](/table-attributes.md#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. ```sql @@ -498,7 +500,7 @@ ALTER TABLE employees ATTRIBUTES 'merge_option=deny'; ALTER TABLE employees PARTITION `p3` ATTRIBUTES 'merge_option=deny'; ``` -**Determining split boundaries based on existing business data** +##### Step 3. 
Determine split boundaries based on existing business data To avoid hotspots when a new table or partition is created, it is often beneficial to pre-split regions before heavy writes begin. To make pre-splitting effective, configure the lower and upper boundaries for region splitting based on the actual business data distribution. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. @@ -508,17 +510,17 @@ Identify the minimum and maximum values from existing production data so that in SELECT MIN(id), MAX(id) FROM employees; ``` -- If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. -- For **composite primary keys** or **composite indexes**, only the **leftmost column** needs to be considered when deciding split boundaries. -- If the leftmost column is a **string**, take string length and distribution into account to ensure even data spread. +- If the table is new and has no historical data, estimate the min/max values based on your business logic and expected data range. +- For composite primary keys or composite indexes, only the leftmost column needs to be considered when deciding split boundaries. +- If the leftmost column is a string, take string length and distribution into account to ensure even data spread. -**Pre-split and scatter regions** +##### Step 4. Pre-split and scatter regions -A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. This helps ensure that data is more evenly distributed across the cluster from the start. +A common practice is to split the number of regions to match the number of TiKV nodes, or to be twice the number of TiKV nodes. This helps ensure that data is more evenly distributed across the cluster from the start. -**Splitting regions for the primary key of all partitions** +##### Step 5. Split regions for the primary key and the secondary index of all partitions if needed -To split regions for the primary key of all partitions in a partitioned table, you can use a command like: +To split regions for the primary key of all partitions in a partitioned table, you can use the following SQL statement: ```sql SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS ; @@ -526,13 +528,13 @@ SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (1 This example will split each partition's primary key range into `` regions between the specified boundary values. -**Splitting Regions for the secondary index of all partitions.** +To split regions for the secondary index of all partitions in a partitioned table, you can use the following SQL statement: ```sql SPLIT PARTITION TABLE employees INDEX `idx_employees_on_store_id` BETWEEN (1) AND (1000) REGIONS ; ``` -**(Optional) When adding a new partition, you should manually split regions for its primary key and indices.** +##### Step 6. (Optional) When adding a new partition, you need to manually split regions for its primary key and indices ```sql ALTER TABLE employees ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); @@ -546,117 +548,51 @@ SPLIT PARTITION TABLE employees PARTITION (p4) INDEX `idx_employees_on_store_id` SHOW TABLE employees PARTITION (p4) regions; ``` -#### 2. 
Clustered partitioned tables +### Solutions for clustered partitioned tables -**Pros:** +#### Pros -- Queries using **Point Get** or **Table Range Scan** do **not** need additional lookups, resulting in better **read performance**. +Queries using **Point Get** or **Table Range Scan** do **not** need additional lookups, resulting in better **read performance**. -**Cons:** +#### Cons -- **Manual region splitting** is required when creating new partitions, increasing operational complexity. +Manual region splitting is required when creating new partitions, increasing operational complexity. -**Recommendation** +#### Recommendation -- Ideal when low-latency point queries are important and operational resources are available to manage region splitting. +Ideal when low-latency point queries are important and operational resources are available to manage region splitting. -**Best practices** +#### Best practices -Create a CLUSTERED partitioned table. +To address hotspot issues caused by new range partitions, you can perform the steps described in [Best practices for non-clustered partitioned tables](#best-practices). -```sql -CREATE TABLE employees2 ( - id INT NOT NULL, - fname VARCHAR(30), - lname VARCHAR(30), - hired DATE NOT NULL DEFAULT '1970-01-01', - separated DATE DEFAULT '9999-12-31', - job_code INT, - store_id INT, - PRIMARY KEY (`id`,`hired`) CLUSTERED, - KEY `idx_employees2_on_store_id` (`store_id`) -) -PARTITION BY RANGE ( YEAR(hired) ) ( - PARTITION p0 VALUES LESS THAN (1991), - PARTITION p1 VALUES LESS THAN (1996), - PARTITION p2 VALUES LESS THAN (2001), - PARTITION p3 VALUES LESS THAN (2006) -); -``` - -Adding the [`merge_option=deny`](/table-attributes.md#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. - -```sql -ALTER TABLE employees2 ATTRIBUTES 'merge_option=deny'; -``` - -**Determine split boundaries based on existing business data** - -To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. - -**Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: +### Solutions for clustered non-partitioned tables -```sql -SELECT MIN(id), MAX(id) FROM employees2; -``` - -- If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. -- For **composite primary keys** or **composite indexes**, only the **leftmost column** needs to be considered when deciding split boundaries. -- If the leftmost column is a **string**, take string length and distribution into account to ensure even data spread. +#### Pros -**Pre-split and scatter regions** +- No hotspot risks from new range partitions. +- Provides good read performance for point and range queries. -A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. 
This helps ensure that data is more evenly distributed across the cluster from the start. +#### Cons -**Split Regions for all partitions.** +Cannot use `DROP PARTITION` to clean up large volumes of old data to improve deletion efficiency. -```sql -SPLIT PARTITION TABLE employees2 BETWEEN (1,"1970-01-01") AND (100000,"9999-12-31") REGIONS ; -``` +#### Recommendation -**Split Regions for the secondary index of all partitions.** - -```sql -SPLIT PARTITION TABLE employees2 INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; -``` - -**(Optional) When adding a new partition, you MUST manually split Regions for the specific partition and its indexes.** - -```sql -ALTER TABLE employees2 ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); -show table employees2 PARTITION (p4) regions; -SPLIT PARTITION TABLE employees2 PARTITION (p4) BETWEEN (1,"2006-01-01") AND (100000,"2011-01-01") REGIONS ; -SPLIT PARTITION TABLE employees2 PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; -show table employees2 PARTITION (p4) regions; -``` - -#### 3. Clustered non-partitioned tables - -**Pros:** - -- **No hotspot risk from new partitions**. -- Provides **good read performance** for point and range queries. - -**Cons:** - -- **Cannot use DROP PARTITION** to clean up large volumes of old data. - -**Recommendation:** - -- Best suited for use cases that require stable performance and do not benefit from partition-based data management. +Best suited for use cases that require stable performance and do not benefit from partition-based data management. ## Convert between partitioned and non-partitioned tables When working with large tables (for example, a table with 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: -1. Batch DML: `INSERT INTO ... SELECT ...` -2. Pipeline DML: `INSERT INTO ... SELECT ...` -3. `IMPORT INTO`: `IMPORT INTO ... FROM SELECT ...` -4. Online DDL: Direct schema transformation via `ALTER TABLE` +- Batch DML: `INSERT INTO ... SELECT ...` +- Pipeline DML: `INSERT INTO ... SELECT ...` +- `IMPORT INTO`: `IMPORT INTO ... FROM SELECT ...` +- Online DDL: Direct schema transformation via `ALTER TABLE` This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. -### Table schema: `fa` +### Table schema for a partitioned table: `fa` ```sql CREATE TABLE `fa` ( @@ -679,7 +615,7 @@ PARTITION `fa_2024003` VALUES LESS THAN (2025003), PARTITION `fa_2024365` VALUES LESS THAN (2025365)); ``` -### Table schema: `fa_new` +### Table schema for a non-partitioned table: `fa_new` ```sql CREATE TABLE `fa_new` ( @@ -717,6 +653,7 @@ INSERT INTO fa_new SELECT * FROM fa; ### Method 3: `IMPORT INTO ... 
FROM SELECT` + ```sql IMPORT INTO fa_new FROM SELECT * FROM fa WITH thread = 32, disable_precheck; ``` @@ -728,21 +665,21 @@ Records: 120000000, ID: c1d04eec-fb49-49bb-af92-bf3d6e2d3d87 ### Method 4: Online DDL -**From a partition table to a non-partitioned table** +The following SQL statement converts from a partition table to a non-partitioned table: ```sql SET @@global.tidb_ddl_REORGANIZE_worker_cnt = 16; SET @@global.tidb_ddl_REORGANIZE_batch_size = 4096; -alter table fa REMOVE PARTITIONING; +ALTER TABLE fa REMOVE PARTITIONING; -- real 170m12.024 s (≈ 2 h 50 m) ``` -**From a non-partition table to a partitioned table** +The following SQL statement converts from a non-partition table to a partitioned table: ```sql SET @@global.tidb_ddl_REORGANIZE_worker_cnt = 16; SET @@global.tidb_ddl_REORGANIZE_batch_size = 4096; -ALTER TABLE fa PARTITION BY RANGE (`date`) +ALTER TABLE fa_new PARTITION BY RANGE (`date`) (PARTITION `fa_2024001` VALUES LESS THAN (2025001), PARTITION `fa_2024002` VALUES LESS THAN (2025002), ... @@ -754,6 +691,8 @@ Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) ### Findings +The following table show the time taken by each method. + | Method | Time Taken | |---|---| | Method 1: Batch DML INSERT INTO ... SELECT | 1 h 52 m 47 s | @@ -761,9 +700,3 @@ Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) | Method 3: IMPORT INTO ... FROM SELECT ... | 16 m 59 s | | Method 4: Online DDL (From partition table to non-partitioned table) | 2 h 50 m | | Method 4: Online DDL (From non-partition table to partitioned table) | 2 h 31 m | - -### Recommendations - -TiDB offers two approaches for converting tables between partitioned and non-partitioned states: - -Choose an offline method such as [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. From e8203dd8242c5238a96c9f0f29d48e82ec256607 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Thu, 20 Nov 2025 16:19:09 +0800 Subject: [PATCH 65/84] Update tidb-partitioned-tables-best-practices.md --- .../tidb-partitioned-tables-best-practices.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index afd454ecdad69..cece00b53ba36 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -438,13 +438,13 @@ This imbalance can cause that TiKV node to trigger **flow control**, leading to ### Summary -The following show the summary information for non-clustered and clusted partition tbales. +The following show the summary information for non-clustered and clusted partition tables. 
-| Type | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | -|---|---|---|---|---|---| -| Non-clustered partitioned table | Low (with merge_option=deny) | Low (auto pre-split) | Low | Moderate (extra lookups) | Fast (DROP PARTITION) | -| Clustered partitioned table | Medium (manual intervention) | Medium (manual split) | High | High (direct access) | Fast (DROP PARTITION) | -| Clustered non-partitioned table | None | Medium (single table) | Low | High | Slow (DELETE/TTL) | +| Table Type | Region Pre-splitting | Read performance | Write scalability | Data cleanup via partition | +|---|---|---|---|---| +| Non-clustered partitioned table | Automatic | Lower (more lookups) | High | Supported | +| Clustered partitioned table | Manual | High (fewer lookups) | High (if managed) | Supported | +| Clustered non-partitioned table | N/A | High | Stable | Not supported | ### Solutions for non-clustered partitioned tables From 6b90dd7c4bbb3877a9a6c01d7994f401cab267c8 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Thu, 20 Nov 2025 16:50:21 +0800 Subject: [PATCH 66/84] Update tidb-partitioned-tables-best-practices.md --- .../tidb-partitioned-tables-best-practices.md | 44 +++++++++---------- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index cece00b53ba36..4543c3aaf5529 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -9,11 +9,11 @@ This guide introduces how to use partitioned tables in TiDB to improve performan Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage partition pruning to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in Online Analytical Processing (OLAP) workloads with massive datasets. -A common use case is range partitioning combined with local indexes, which enables efficient historical data cleanup through operations such as [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning, such as those lacking partition key filters, might experience degraded performance. In such cases, you can use [**global indexes**](/partitioned-table.md#global-indexes) to mitigate the performance impact by providing a unified index structure across all partitions. +A common use case is range partitioning combined with local indexes, which enables efficient historical data cleanup through operations such as [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning, such as those lacking partition key filters, might experience degraded performance. 
In such cases, you can use [global indexes](/partitioned-table.md#global-indexes) to mitigate the performance impact by providing a unified index structure across all partitions. -Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [`AUTO_INCREMENT` style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions might suffer performance drawbacks again, a situation where global indexes can help. +Another frequent scenario is using hash or key partitioning to address write hotspot issues, especially in workloads relying on [`AUTO_INCREMENT` style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions might suffer performance drawbacks again, a situation where global indexes can help. -While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. +While partitioning offers clear benefits, it also presents common challenges, such as hotspots caused by newly created range partitions. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. This document examines partitioned tables in TiDB from multiple angles, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it provides practical guidance on optimizing partitioned table design and performance tuning in TiDB. @@ -332,7 +332,7 @@ ALTER TABLE A DROP PARTITION A_2024363; When a partitioned table contains global indexes, executing certain DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, and `REORGANIZE PARTITION` requires updating the global index entries to reflect the changes. This update must be performed immediately to ensure consistency, which can significantly increase the execution time of these DDL operations. -If you need to drop partitions frequently and minimize the performance impact on the system, it is recommended to use **local indexes** for faster and more efficient operations. +If you need to drop partitions frequently and minimize the performance impact on the system, it is recommended to use local indexes for faster and more efficient operations. ## Mitigate hotspot issues @@ -352,7 +352,7 @@ Partitioned tables can help mitigate this problem. By applying hash or key parti ### How it works -TiDB stores table data and indexes in **Regions**, each covering a continuous range of row keys. +TiDB stores table data and indexes in Regions, each covering a continuous range of row keys. 
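To see how a table's data maps onto Regions, you can query the key ranges directly. The following is a minimal sketch, assuming the `server_info` table defined later in this section and the `INFORMATION_SCHEMA.TIKV_REGION_STATUS` view; replace `test` with your own database name:

```sql
-- Show the key range and approximate size of each Region that stores data
-- for the table. Each Region covers one continuous range of row keys.
SELECT REGION_ID, START_KEY, END_KEY, APPROXIMATE_SIZE, APPROXIMATE_KEYS
FROM INFORMATION_SCHEMA.TIKV_REGION_STATUS
WHERE DB_NAME = 'test' AND TABLE_NAME = 'server_info';
```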
When the primary key is [`AUTO_INCREMENT`](/auto-increment.md) and the secondary indexes on datetime columns are monotonically increasing: @@ -369,7 +369,7 @@ When the primary key is [`AUTO_INCREMENT`](/auto-increment.md) and the secondary ### Use cases -If a table with an [`AUTO_INCREMENT`](/auto-increment.md) primary key experiences heavy bulk inserts and suffers from write hotspot issues, applying **hash** or **key** partitioning on the primary key can help distribute the write workload more evenly. +If a table with an [`AUTO_INCREMENT`](/auto-increment.md) primary key experiences heavy bulk inserts and suffers from write hotspot issues, applying hash or key partitioning on the primary key can help distribute the write workload more evenly. ```sql CREATE TABLE server_info ( @@ -387,7 +387,7 @@ PARTITION BY KEY (id) PARTITIONS 16; ### Pros -- **Balanced write workload** — Hotspots are spread across multiple partitions, and therefore multiple **Regions**, reducing contention and improving insert performance. +- **Balanced write workload** — Hotspots are spread across multiple partitions, and therefore multiple Regions, reducing contention and improving insert performance. - **Query optimization via partition pruning** — If queries already filter by the partition key, TiDB can prune unused partitions, scanning less data and improving query speed. ### Cons @@ -400,7 +400,7 @@ There are some risks when using partition tables. SELECT * FROM server_info WHERE `serial_no` = ?; ``` -- Add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down `DROP PARTITION` operations, hash and key partitioned tables do not support `DROP PARTITION`. In practice, such partitions are rarely truncated, making global indexes a feasible solution in these scenarios. For example: +- Add a global index on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down `DROP PARTITION` operations, hash and key partitioned tables do not support `DROP PARTITION`. In practice, such partitions are rarely truncated, making global indexes a feasible solution in these scenarios. For example: ```sql ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; @@ -412,15 +412,15 @@ New range partitions in a partitioned table can easily lead to hotspot issues in ### Read hotspots -When using **range-partitioned tables**, if queries do **not** filter data using the partition key, new empty partitions can easily become read hotspots. +When using range-partitioned tables, if queries do not filter data using the partition key, new empty partitions can easily become read hotspots. **Root cause:** -By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions might be merged into a **single region**. +By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions might be merged into a single region. **impact:** -When a query does not filter by partition key, TiDB will scan all partitions (as seen in the execution plan `partition:all`). As a result, the single region holding multiple empty partitions will be scanned repeatedly, leading to a **read hotspot**. +When a query does not filter by partition key, TiDB will scan all partitions (as seen in the execution plan `partition:all`). 
As a result, the single region holding multiple empty partitions will be scanned repeatedly, leading to a read hotspot. ### Write hotspots @@ -434,7 +434,7 @@ However, if the initial write traffic to this new partition is very high, the Ti **Impact:** -This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn might impact the overall read and write performance of the cluster. +This imbalance can cause that TiKV node to trigger flow control, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn might impact the overall read and write performance of the cluster. ### Summary @@ -450,12 +450,12 @@ The following show the summary information for non-clustered and clusted partiti #### Pros -- When a new partition is created in a non-clustered partitioned table configured with `SHARD_ROW_ID_BITS` and [PRE_SPLIT_REGIONS](/sql-statements/sql-statement-split-region.md#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. +- When a new partition is created in a non-clustered partitioned table configured with `SHARD_ROW_ID_BITS` and [PRE_SPLIT_REGIONS](/sql-statements/sql-statement-split-region.md#pre_split_regions), the regions can be automatically pre-split, significantly reducing manual intervention. - Lower operational overhead. #### Cons -Queries using **Point Get** or **Table Range Scan** will require **more table lookups**, which can degrade read performance for such query types. +Queries using **Point Get** or **Table Range Scan** will require more table lookups, which can degrade read performance for such query types. #### Recommendation @@ -552,7 +552,7 @@ SHOW TABLE employees PARTITION (p4) regions; #### Pros -Queries using **Point Get** or **Table Range Scan** do **not** need additional lookups, resulting in better **read performance**. +Queries using **Point Get** or **Table Range Scan** do not need additional lookups, resulting in better read performance. #### Cons @@ -575,7 +575,7 @@ To address hotspot issues caused by new range partitions, you can perform the st #### Cons -Cannot use `DROP PARTITION` to clean up large volumes of old data to improve deletion efficiency. +You cannot use `DROP PARTITION` to clean up large volumes of old data to improve deletion efficiency. #### Recommendation @@ -694,9 +694,9 @@ Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) The following table show the time taken by each method. | Method | Time Taken | -|---|---| -| Method 1: Batch DML INSERT INTO ... SELECT | 1 h 52 m 47 s | -| Method 2: Pipeline DML: INSERT INTO ... SELECT ... | 58 m 42 s | -| Method 3: IMPORT INTO ... FROM SELECT ... | 16 m 59 s | -| Method 4: Online DDL (From partition table to non-partitioned table) | 2 h 50 m | -| Method 4: Online DDL (From non-partition table to partitioned table) | 2 h 31 m | +|--------|------------| +| Method 1: Batch DML: `INSERT INTO ... SELECT` | 1 h 52 m 47 s | +| Method 2: Pipeline DML: `INSERT INTO ... SELECT ...` | 58 m 42 s | +| Method 3: `IMPORT INTO ... 
FROM SELECT ...` | 16 m 59 s | +| Method 4: Online DDL (From partition table to non-partitioned table) | 2 h 50 m | +| Method 4: Online DDL (From non-partition table to partitioned table) | 2 h 31 m | From 83bbb4effb39aa76b1cb135fefd2753f418c8043 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Fri, 21 Nov 2025 15:15:49 +0800 Subject: [PATCH 67/84] Update tidb-partitioned-tables-best-practices.md --- best-practices/tidb-partitioned-tables-best-practices.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 4543c3aaf5529..3e4003a12c562 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -7,7 +7,7 @@ summary: Learn best practices for using TiDB partitioned tables to improve perfo This guide introduces how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. -Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage partition pruning to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in Online Analytical Processing (OLAP) workloads with massive datasets. +Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage partition pruning to skip irrelevant data during query execution, reducing resource consumption and accelerating performance, particularly in Online Analytical Processing (OLAP) workloads with massive datasets. A common use case is range partitioning combined with local indexes, which enables efficient historical data cleanup through operations such as [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning, such as those lacking partition key filters, might experience degraded performance. In such cases, you can use [global indexes](/partitioned-table.md#global-indexes) to mitigate the performance impact by providing a unified index structure across all partitions. 
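The pattern described in the paragraph above (range partitioning with local indexes for fast historical-data cleanup, plus a global index for queries that do not filter on the partition key) can be summarized in a short sketch. This is a minimal illustration, not taken from the guide: the table, column, and partition names are assumptions; only the statement shapes (`PARTITION BY RANGE ... VALUES LESS THAN`, `ALTER TABLE ... DROP PARTITION`, and `ALTER TABLE ... ADD UNIQUE INDEX ... GLOBAL`) follow the syntax used elsewhere in this document.

```sql
-- Minimal sketch of the pattern above. Table, column, and partition names are
-- illustrative assumptions; only the statement shapes follow the guide.
CREATE TABLE orders (
    id BIGINT NOT NULL,
    created_at DATETIME NOT NULL,
    customer_id BIGINT NOT NULL,
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p2025 VALUES LESS THAN (2026)
);

-- Metadata-level cleanup of obsolete data; fast while only local indexes exist.
ALTER TABLE orders DROP PARTITION p2023;

-- Optional global index for queries that filter only on customer_id.
-- Note: as the guide observes, a global index makes later DROP PARTITION
-- operations noticeably slower.
ALTER TABLE orders ADD UNIQUE INDEX uk_customer_id (customer_id, id, created_at) GLOBAL;
```

As the guide notes elsewhere, once a global index exists, subsequent `DROP PARTITION` operations become significantly slower, so this trade-off is worth weighing per workload.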
From 15d8aa9d20fc2dbc817eff5f78210d3601cbcc09 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Fri, 21 Nov 2025 17:35:09 +0800 Subject: [PATCH 68/84] Update tidb-partitioned-tables-best-practices.md --- best-practices/tidb-partitioned-tables-best-practices.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 3e4003a12c562..beb3ea568899f 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -9,9 +9,9 @@ This guide introduces how to use partitioned tables in TiDB to improve performan Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage partition pruning to skip irrelevant data during query execution, reducing resource consumption and accelerating performance, particularly in Online Analytical Processing (OLAP) workloads with massive datasets. -A common use case is range partitioning combined with local indexes, which enables efficient historical data cleanup through operations such as [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning, such as those lacking partition key filters, might experience degraded performance. In such cases, you can use [global indexes](/partitioned-table.md#global-indexes) to mitigate the performance impact by providing a unified index structure across all partitions. +A common use case is range partitioning combined with local indexes, which enables efficient historical data cleanup through operations such as [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned tables to partitioned tables, queries that cannot benefit from partition pruning, such as those lacking partition key filters, might experience degraded performance. In such cases, you can use [global indexes](/partitioned-table.md#global-indexes) to mitigate the performance impact by providing a unified index structure across all partitions. -Another frequent scenario is using hash or key partitioning to address write hotspot issues, especially in workloads relying on [`AUTO_INCREMENT` style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions might suffer performance drawbacks again, a situation where global indexes can help. +Another scenario is using hash or key partitioning to address write hotspot issues, especially in workloads relying on [`AUTO_INCREMENT` style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. 
Distributing writes across partitions helps balance workload, but similar to range partitioning, queries without partition-pruning conditions might suffer performance drawbacks again, a situation where global indexes can help. While partitioning offers clear benefits, it also presents common challenges, such as hotspots caused by newly created range partitions. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. From d29acfaf833bba5941e9b7b6ecf37f676ffab3da Mon Sep 17 00:00:00 2001 From: houfaxin Date: Tue, 25 Nov 2025 22:30:42 +0800 Subject: [PATCH 69/84] Update tidb-partitioned-tables-best-practices.md --- .../tidb-partitioned-tables-best-practices.md | 28 ++++++++++--------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index beb3ea568899f..89b22b177d258 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -13,9 +13,9 @@ A common use case is range partitioning combined with local indexes, which enabl Another scenario is using hash or key partitioning to address write hotspot issues, especially in workloads relying on [`AUTO_INCREMENT` style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance workload, but similar to range partitioning, queries without partition-pruning conditions might suffer performance drawbacks again, a situation where global indexes can help. -While partitioning offers clear benefits, it also presents common challenges, such as hotspots caused by newly created range partitions. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. +While partitioning offers clear benefits, it also presents common challenges, such as hotspots caused by newly created range partitions. To address this issue, TiDB provides solutions for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. -This document examines partitioned tables in TiDB from multiple angles, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it provides practical guidance on optimizing partitioned table design and performance tuning in TiDB. +This document examines partitioned tables in TiDB from multiple perspectives, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it provides practical guidance on optimizing partitioned table design and performance tuning in TiDB. > **Note:** > @@ -30,13 +30,13 @@ This section describes how to improve query efficiency by the following methods: ### Partition pruning -Partition pruning is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions might contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead. 
+Partition pruning is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes filter conditions of the query and determines which partitions might contain relevant data, and scans only those partitions. This significantly improves query performance by reducing I/O and computation overhead. Partition pruning is most beneficial in scenarios where query predicates match the partitioning strategy. Common use cases include: -- Time-series data queries: When data is partitioned by time ranges (for example, daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. -- Multi-tenant or category-based datasets: Partitioning by tenant ID or category enables queries to focus on a small subset of partitions. -- Hybrid Transactional and Analytical Processing (HTAP): Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing full table scans on large datasets. +- Time-series data queries: when data is partitioned by time ranges (for example, daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. +- Multi-tenant or category-based datasets: partitioning by tenant ID or category enables queries to focus on a small subset of partitions. +- Hybrid Transactional and Analytical Processing (HTAP): especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing full table scans on large datasets. For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable/partition-pruning/). @@ -54,7 +54,7 @@ The query performance of the following types of tables are evaluated: #### Test setup -- The partitioned table had 365 partitions, defined by the range partitioning on a date column. +- The partitioned table has 365 partitions, defined by the range partitioning on a date column. - Each matching key returns multiple rows, simulating a high-volume OLTP-style query pattern. - The impact of different partition counts is also evaluated to understand how partition granularity influences latency and index performance. @@ -105,6 +105,8 @@ WHERE `fa`.`sid` IN ( #### Test results +The following table shows the test results. + | Configuration | Average Query Time | Cop task for index range scan | Cop task for table lookup | Total Cop tasks | Key Takeaways | |---|---|---|---|---|---| | Non-partitioned table | 12.6 ms | 72 | 79 | 151 | Provides the best performance with the fewest Cop tasks, which is ideal for most OLTP use cases. | @@ -116,7 +118,7 @@ Data comes from a table with 365 range partitions (for example, by date). - The **Average Query Time** is obtained from the `statement_summary` view. - The query uses a secondary index and returns 400 rows. -Metrics collected: +The following metrics are collected: - **Average Query Time**: from `statement_summary` - **Cop Tasks** (Index Scan + Table Lookup): from the execution plan @@ -199,15 +201,15 @@ PARTITION BY RANGE (id) ( The performance overhead of partitioned tables in TiDB depends significantly on the number of partitions and the type of index used. - The more partitions you have, the more severe the potential performance degradation. 
-- With a smaller number of partitions, the impact might not be as noticeable, but it is still workload-dependent. +- With a smaller number of partitions, the impact might not be noticeable, but it is still workload-dependent. - For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of [Remote Procedure Calls (RPCs)](https://docs.pingcap.com/tidb/stable/glossary/#remote-procedure-call-rpc) triggered. This means more partitions will likely result in more RPCs, leading to higher latency. -- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). Note that for very large tables where data is already distributed across many Regions, accessing data through a global index may have similar performance to a non-partitioned table, as both scenarios require multiple cross-Region RPCs. +- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). Note that for very large tables where data is already distributed across many Regions, accessing data through a global index might have similar performance to a non-partitioned table, as both scenarios require multiple cross-Region RPCs. #### Recommendations -- Avoid partitioned tables unless necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. -- If you know all queries will make use of good partition pruning (matching only a few partitions), then local indexes are a good choice. -- If you know critical queries do not have good partition pruning (matching many partitions), then a global index is recommended. +- Do not use partitioned tables unless necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. +- If you are sure that all queries will make use of good partition pruning (matching only a few partitions), then local indexes are a good choice. +- If you are sure that critical queries do not have good partition pruning (matching many partitions), then a global index is recommended. - Use local indexes only if your main concern is DDL efficiency (such as fast `DROP PARTITION`) and the performance side effect from the partition table is acceptable. 
## Facilitate bulk data deletion From 2d181ad5d28e4fc36be71035467178826a066d3d Mon Sep 17 00:00:00 2001 From: houfaxin Date: Wed, 26 Nov 2025 17:12:48 +0800 Subject: [PATCH 70/84] Update tidb-partitioned-tables-best-practices.md --- best-practices/tidb-partitioned-tables-best-practices.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 89b22b177d258..f907c7f23e4c3 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -223,7 +223,7 @@ In TiDB, you can clear up historical data either by TTL (Time-to-Live) or manual #### Test case -To compare the performance of TTL and partition drop, the test case in this section configures TTL to execute every 10 minutes and create a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches are tested under background write loads of 50 and 100 concurrent threads. This test case measures key metrics such as execution time, system resource utilization, and the total number of rows deleted. +To compare the performance of TTL and partition drop, the test case in this section configures TTL to execute every 10 minutes and create a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches are tested under background write workloads of 50 and 100 concurrent threads. This test case measures key metrics such as execution time, system resource utilization, and the total number of rows deleted. #### Findings @@ -238,7 +238,7 @@ The following are findings about the TTL performance: - With 100 threads, it handles up to 20 million rows, but the execution time increases to 15 to 30 minutes, with greater variance. - TTL jobs impact system performance under high workloads due to extra scanning and deletion activity, reducing overall QPS. -The following are findings about partition drop performance: +The following are findings about the partition drop performance: - `ALTER TABLE ... DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. - `ALTER TABLE ... DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. From de047818b88c306e3731b6c3d16424cba1433d1d Mon Sep 17 00:00:00 2001 From: houfaxin Date: Fri, 28 Nov 2025 11:24:06 +0800 Subject: [PATCH 71/84] Update tidb-partitioned-tables-best-practices.md --- .../tidb-partitioned-tables-best-practices.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index f907c7f23e4c3..d07ae447cb709 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -216,14 +216,14 @@ The performance overhead of partitioned tables in TiDB depends significantly on In TiDB, you can clear up historical data either by TTL (Time-to-Live) or manual partition drop. While both methods serve the same purpose, they differ significantly in performance. The test cases in this section show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. 
-### Differences between TTL and partition drop +### Differences between TTL and `DROP PARTITION` -- **TTL**: automatically removes data based on its age, but might be slower due to the need to scan and clean data over time. -- **Partition Drop**: deletes an entire partition at once, making it much faster, especially when dealing with large datasets. +- TTL: automatically removes data based on its age, but might be slower due to the need to scan and clean data over time. +- `DROP PARTITION`: deletes an entire partition at once, making it much faster, especially when dealing with large datasets. #### Test case -To compare the performance of TTL and partition drop, the test case in this section configures TTL to execute every 10 minutes and create a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches are tested under background write workloads of 50 and 100 concurrent threads. This test case measures key metrics such as execution time, system resource utilization, and the total number of rows deleted. +To compare the performance of TTL and `DROP PARTITION`, the test case in this section configures TTL to execute every 10 minutes and create a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches are tested under background write workloads of 50 and 100 concurrent threads. This test case measures key metrics such as execution time, system resource utilization, and the total number of rows deleted. #### Findings @@ -238,14 +238,14 @@ The following are findings about the TTL performance: - With 100 threads, it handles up to 20 million rows, but the execution time increases to 15 to 30 minutes, with greater variance. - TTL jobs impact system performance under high workloads due to extra scanning and deletion activity, reducing overall QPS. -The following are findings about the partition drop performance: +The following are findings about the `DROP PARTITION` performance: - `ALTER TABLE ... DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. - `ALTER TABLE ... DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. -#### Use TTL and partition drop in TiDB +#### Use TTL and `DROP PARTITION` in TiDB -In this test case, the table structures have been anonymized. For more detailed information on the usage of TTL, see [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md) . +In this test case, the table structures have been anonymized. For more information about the usage of TTL, see [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md) . The following is the TTL schema. 
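The anonymized TTL schema itself is not reproduced in this part of the patch series. As a rough reference only, a TTL-enabled table in TiDB generally takes the following shape; the table name, columns, 7-day retention, and 10-minute job interval below are assumptions (the job interval mirrors the test setup described above), not the guide's actual schema.

```sql
-- Illustrative sketch of a TTL-enabled table, not the anonymized schema used in
-- the test. Names, the 7-day retention, and the 10-minute job interval are assumed.
CREATE TABLE event_log (
    id BIGINT NOT NULL PRIMARY KEY,
    payload VARCHAR(255),
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
)
TTL = `created_at` + INTERVAL 7 DAY
TTL_ENABLE = 'ON'
TTL_JOB_INTERVAL = '10m';
```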
From 5858c3c9cb47ec8dc7c87b24e628368302ee37d3 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Fri, 28 Nov 2025 16:29:19 +0800 Subject: [PATCH 72/84] Update tidb-partitioned-tables-best-practices.md --- .../tidb-partitioned-tables-best-practices.md | 86 ++++++++----------- 1 file changed, 36 insertions(+), 50 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index d07ae447cb709..72b4ceb9f2a0e 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -315,10 +315,10 @@ In this section, the tests show that `DROP PARTITION` is much slower when using This test case creates a table with 365 partitions and tests the `DROP PARTITION` performance using both global indexes and local indexes. The total number of rows is 1 billion. -| Index Type | Duration (drop partition) | +| Index type | Duration (drop partition) | |--------------|---------------------------| -| Global Index | 76.02 seconds | -| Local Index | 0.52 seconds | +| Global index | 76.02 seconds | +| Local index | 0.52 seconds | #### Findings @@ -338,9 +338,7 @@ If you need to drop partitions frequently and minimize the performance impact on ## Mitigate hotspot issues -In TiDB, hotspots can occur when incoming read or write traffic is unevenly distributed across Regions. - -This is common when the primary key is monotonically increasing, for example, an `AUTO_INCREMENT` primary key with `AUTO_ID_CACHE=1`, or secondary index on datetime column with the default value set to `CURRENT_TIMESTAMP`. Because new rows and index entries are always appended to the "rightmost" Region, over time, this can lead to: +In TiDB, hotspots can occur when incoming read or write traffic is unevenly distributed across Regions. This is common when the primary key is monotonically increasing, for example, an `AUTO_INCREMENT` primary key with `AUTO_ID_CACHE=1`, or a secondary index on the datetime column with the default value set to `CURRENT_TIMESTAMP`. Because new rows and index entries are always appended to the "rightmost" Region, over time, this can lead to: - A single [Region](https://docs.pingcap.com/tidb/stable/tidb-storage/#region) handling most of the write workload, while other Regions remain idle. - Higher read or write latency and reduced throughput. @@ -354,9 +352,7 @@ Partitioned tables can help mitigate this problem. By applying hash or key parti ### How it works -TiDB stores table data and indexes in Regions, each covering a continuous range of row keys. - -When the primary key is [`AUTO_INCREMENT`](/auto-increment.md) and the secondary indexes on datetime columns are monotonically increasing: +TiDB stores table data and indexes in Regions, each covering a continuous range of row keys. When the primary key is [`AUTO_INCREMENT`](/auto-increment.md) and the secondary indexes on datetime columns are monotonically increasing: **Without partitioning:** @@ -389,8 +385,8 @@ PARTITION BY KEY (id) PARTITIONS 16; ### Pros -- **Balanced write workload** — Hotspots are spread across multiple partitions, and therefore multiple Regions, reducing contention and improving insert performance. -- **Query optimization via partition pruning** — If queries already filter by the partition key, TiDB can prune unused partitions, scanning less data and improving query speed. 
+- Balanced write workload: hotspots are spread across multiple partitions, and therefore multiple Regions, reducing contention and improving insert performance. +- Query optimization via partition pruning: if queries already filter by the partition key, TiDB can prune unused partitions, scanning less data and improving query speed. ### Cons @@ -422,27 +418,27 @@ By default, TiDB creates an empty region for each partition when the table is cr **impact:** -When a query does not filter by partition key, TiDB will scan all partitions (as seen in the execution plan `partition:all`). As a result, the single region holding multiple empty partitions will be scanned repeatedly, leading to a read hotspot. +When a query does not filter by partition key, TiDB scans all partitions (as seen in the execution plan `partition:all`). As a result, the single region holding multiple empty partitions is scanned repeatedly, leading to a read hotspot. ### Write hotspots -When using a time-based field as the partition key, a write hotspot might occur when switching to a new partition: +When using a time-based field as the partition key, a write hotspot might occur when switching to a new partition. **Root cause:** In TiDB, newly created partitions initially contain only one region on a single TiKV node. As writes concentrate on this single region, it must split into multiple regions before writes can be distributed across multiple TiKV nodes. This splitting process is the main cause of the temporary write hotspot. -However, if the initial write traffic to this new partition is very high, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it might not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. +However, if the initial write traffic to this new partition is very high, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it might not have enough spare resources (such as I/O capacity and CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. **Impact:** -This imbalance can cause that TiKV node to trigger flow control, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn might impact the overall read and write performance of the cluster. +This imbalance can cause the TiKV node to trigger flow control, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn might impact the overall read and write performance of the cluster. ### Summary -The following show the summary information for non-clustered and clusted partition tables. +The following table shows the summary information for non-clustered and clustered partition tables. 
-| Table Type | Region Pre-splitting | Read performance | Write scalability | Data cleanup via partition | +| Table type | Region pre-splitting | Read performance | Write scalability | Data cleanup via partition | |---|---|---|---|---| | Non-clustered partitioned table | Automatic | Lower (more lookups) | High | Supported | | Clustered partitioned table | Manual | High (fewer lookups) | High (if managed) | Supported | @@ -457,11 +453,11 @@ The following show the summary information for non-clustered and clusted partiti #### Cons -Queries using **Point Get** or **Table Range Scan** will require more table lookups, which can degrade read performance for such query types. +Queries using **Point Get** or **Table Range Scan** require more table lookups, which can degrade read performance for such query types. -#### Recommendation +#### Suitable scenarios -Suitable for workloads where write scalability and operational ease are more critical than low-latency reads. +It is suitable for workloads where write scalability and operational ease are more critical than low-latency reads. #### Best practices @@ -506,13 +502,13 @@ ALTER TABLE employees PARTITION `p3` ATTRIBUTES 'merge_option=deny'; To avoid hotspots when a new table or partition is created, it is often beneficial to pre-split regions before heavy writes begin. To make pre-splitting effective, configure the lower and upper boundaries for region splitting based on the actual business data distribution. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. -Identify the minimum and maximum values from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: +Identify the minimum and maximum values from existing production data so that incoming writes are more likely to target different pre-allocated regions. The following is an example query for the existing data: ```sql SELECT MIN(id), MAX(id) FROM employees; ``` -- If the table is new and has no historical data, estimate the min/max values based on your business logic and expected data range. +- If the table is new and has no historical data, estimate the minimum and maximum values based on your business logic and expected data range. - For composite primary keys or composite indexes, only the leftmost column needs to be considered when deciding split boundaries. - If the leftmost column is a string, take string length and distribution into account to ensure even data spread. @@ -522,21 +518,21 @@ A common practice is to split the number of regions to match the number of TiKV ##### Step 5. Split regions for the primary key and the secondary index of all partitions if needed -To split regions for the primary key of all partitions in a partitioned table, you can use the following SQL statement: +To split regions for the primary key of all partitions in a partitioned table, use the following SQL statement: ```sql SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS ; ``` -This example will split each partition's primary key range into `` regions between the specified boundary values. +This example splits each partition's primary key range into `` regions between the specified boundary values. 
-To split regions for the secondary index of all partitions in a partitioned table, you can use the following SQL statement: +To split regions for the secondary index of all partitions in a partitioned table, use the following SQL statement: ```sql SPLIT PARTITION TABLE employees INDEX `idx_employees_on_store_id` BETWEEN (1) AND (1000) REGIONS ; ``` -##### Step 6. (Optional) When adding a new partition, you need to manually split regions for its primary key and indices +##### Step 6. (Optional) When adding a new partition, you need to manually split regions for its primary key and indexes ```sql ALTER TABLE employees ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); @@ -560,9 +556,9 @@ Queries using **Point Get** or **Table Range Scan** do not need additional looku Manual region splitting is required when creating new partitions, increasing operational complexity. -#### Recommendation +#### Suitable scenarios -Ideal when low-latency point queries are important and operational resources are available to manage region splitting. +It is suitable when low-latency point queries are important and operational resources are available to manage region splitting. #### Best practices @@ -579,18 +575,17 @@ To address hotspot issues caused by new range partitions, you can perform the st You cannot use `DROP PARTITION` to clean up large volumes of old data to improve deletion efficiency. -#### Recommendation +#### Suitable scenarios -Best suited for use cases that require stable performance and do not benefit from partition-based data management. +It is suitable for scenarios that require stable performance and do not benefit from partition-based data management. ## Convert between partitioned and non-partitioned tables When working with large tables (for example, a table with 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: -- Batch DML: `INSERT INTO ... SELECT ...` -- Pipeline DML: `INSERT INTO ... SELECT ...` -- `IMPORT INTO`: `IMPORT INTO ... FROM SELECT ...` -- Online DDL: Direct schema transformation via `ALTER TABLE` +- [Pipelined DML](/pipelined-dml.md): `INSERT INTO ... SELECT ...` +- [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md): `IMPORT INTO ... FROM SELECT ...` +- [Online DDL](/dm/feature-online-ddl.md): Direct schema transformation via `ALTER TABLE` This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. @@ -635,15 +630,7 @@ CREATE TABLE `fa_new` ( These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. -### Method 1: Batch DML `INSERT INTO ... SELECT` - -```sql -SET tidb_mem_quota_query = 0; -INSERT INTO fa_new SELECT * FROM fa; --- 120 million rows copied in 1h 52m 47s -``` - -### Method 2: Pipeline DML `INSERT INTO ... SELECT` +### Method 1: Pipelined DML `INSERT INTO ... SELECT` ```sql SET tidb_dml_type = "bulk"; @@ -653,7 +640,7 @@ INSERT INTO fa_new SELECT * FROM fa; -- 120 million rows copied in 58m 42s ``` -### Method 3: `IMPORT INTO ... FROM SELECT` +### Method 2: `IMPORT INTO ... 
FROM SELECT` ```sql @@ -665,7 +652,7 @@ Query OK, 120000000 rows affected, 1 warning (16 min 49.90 sec) Records: 120000000, ID: c1d04eec-fb49-49bb-af92-bf3d6e2d3d87 ``` -### Method 4: Online DDL +### Method 3: Online DDL The following SQL statement converts from a partition table to a non-partitioned table: @@ -697,8 +684,7 @@ The following table show the time taken by each method. | Method | Time Taken | |--------|------------| -| Method 1: Batch DML: `INSERT INTO ... SELECT` | 1 h 52 m 47 s | -| Method 2: Pipeline DML: `INSERT INTO ... SELECT ...` | 58 m 42 s | -| Method 3: `IMPORT INTO ... FROM SELECT ...` | 16 m 59 s | -| Method 4: Online DDL (From partition table to non-partitioned table) | 2 h 50 m | -| Method 4: Online DDL (From non-partition table to partitioned table) | 2 h 31 m | +| Method 1: Pipelined DML: `INSERT INTO ... SELECT ...` | 58 m 42 s | +| Method 2: `IMPORT INTO ... FROM SELECT ...` | 16 m 59 s | +| Method 3: Online DDL (From partition table to non-partitioned table) | 2 h 50 m | +| Method 3: Online DDL (From non-partition table to partitioned table) | 2 h 31 m | From 83e4658938e6ebdb22f605b83e2fc5e5b8b39a5b Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Wed, 3 Dec 2025 15:17:23 +0800 Subject: [PATCH 73/84] Update best-practices/tidb-partitioned-tables-best-practices.md --- best-practices/tidb-partitioned-tables-best-practices.md | 1 - 1 file changed, 1 deletion(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 72b4ceb9f2a0e..cfec5fa9e60af 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -642,7 +642,6 @@ INSERT INTO fa_new SELECT * FROM fa; ### Method 2: `IMPORT INTO ... FROM SELECT` - ```sql IMPORT INTO fa_new FROM SELECT * FROM fa WITH thread = 32, disable_precheck; ``` From 4d565dadb6989cc3b721a918b6e4ef75183792bd Mon Sep 17 00:00:00 2001 From: houfaxin Date: Thu, 4 Dec 2025 09:26:00 +0800 Subject: [PATCH 74/84] Update tidb-partitioned-tables-best-practices.md --- .../tidb-partitioned-tables-best-practices.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index cfec5fa9e60af..8b02194c09b00 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -9,7 +9,7 @@ This guide introduces how to use partitioned tables in TiDB to improve performan Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage partition pruning to skip irrelevant data during query execution, reducing resource consumption and accelerating performance, particularly in Online Analytical Processing (OLAP) workloads with massive datasets. -A common use case is range partitioning combined with local indexes, which enables efficient historical data cleanup through operations such as [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. 
However, after migrating from non-partitioned tables to partitioned tables, queries that cannot benefit from partition pruning, such as those lacking partition key filters, might experience degraded performance. In such cases, you can use [global indexes](/partitioned-table.md#global-indexes) to mitigate the performance impact by providing a unified index structure across all partitions. +A common use case is range partitioning combined with local indexes, which enables efficient historical data cleanup through operations such as [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method removes obsolete data almost instantly and preserves high query efficiency when filtering by the partition key. However, after migrating from non-partitioned tables to partitioned tables, queries that cannot benefit from partition pruning, such as those lacking partition key filters, might experience degraded performance. In such cases, you can use [global indexes](/partitioned-table.md#global-indexes) to mitigate the performance impact by providing a unified index structure across all partitions. Another scenario is using hash or key partitioning to address write hotspot issues, especially in workloads relying on [`AUTO_INCREMENT` style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance workload, but similar to range partitioning, queries without partition-pruning conditions might suffer performance drawbacks again, a situation where global indexes can help. @@ -38,7 +38,7 @@ Partition pruning is most beneficial in scenarios where query predicates match t - Multi-tenant or category-based datasets: partitioning by tenant ID or category enables queries to focus on a small subset of partitions. - Hybrid Transactional and Analytical Processing (HTAP): especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing full table scans on large datasets. -For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable/partition-pruning/). +For more use cases, see [Partition Pruning](/partition-pruning.md). ### Query performance on secondary indexes: non-partitioned tables vs. local indexes vs. 
global indexes @@ -135,7 +135,7 @@ The following is an execution plan example for a non-partitioned table: | TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task:{num:79, max:4.98ms, min:0s, avg:514.9µs, p95:3.75ms, max_proc_keys:10, p95_proc_keys:5, tot_proc:15ms, tot_wait:21.4ms, copr_cache_hit_ratio:0.00, build_task_duration:341.2µs, max_distsql_concurrency:1, max_extra_concurrency:7, store_batch_num:62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg:0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:20.8ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1600}}}, time_detail:{total_process_time:15ms, total_wait_time:21.4ms, tikv_wall_time:10.9ms} | keep order:false | N/A | N/A | ``` -The following is an execution plan example for a partition tables with a global index: +The following is an execution plan example for a partitioned tables with a global index: ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -305,7 +305,7 @@ For workloads with large or time-based data cleanup, it is recommended to use pa TTL is still useful for finer-grained or background cleanup, but might not be optimal under high write pressure or when deleting large volumes of data quickly. -### Partition drop efficiency: local index vs. global index +### Partition drop efficiency: local indexes vs. global indexes A partitioned table with a global index requires synchronous updates to the global index, which can significantly increase the execution time for DDL operations, such as `DROP PARTITION`, `TRUNCATE PARTITION`, and `REORGANIZE PARTITION`. @@ -390,7 +390,7 @@ PARTITION BY KEY (id) PARTITIONS 16; ### Cons -There are some risks when using partition tables. +There are some risks when using partitioned tables. - When converting a non-partitioned table to a partitioned table, TiDB creates separate Regions for each partition. This might significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions or do index lookups in all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. For example, `serial_no` is not the partition key, which will cause the query performance regression: @@ -436,7 +436,7 @@ This imbalance can cause the TiKV node to trigger flow control, leading to a sha ### Summary -The following table shows the summary information for non-clustered and clustered partition tables. +The following table shows the summary information for non-clustered and clustered partitioned tables. | Table type | Region pre-splitting | Read performance | Write scalability | Data cleanup via partition | |---|---|---|---|---| @@ -448,7 +448,7 @@ The following table shows the summary information for non-clustered and clustere #### Pros -- When a new partition is created in a non-clustered partitioned table configured with `SHARD_ROW_ID_BITS` and [PRE_SPLIT_REGIONS](/sql-statements/sql-statement-split-region.md#pre_split_regions), the regions can be automatically pre-split, significantly reducing manual intervention. 
+- When a new partition is created in a non-clustered partitioned table configured with `SHARD_ROW_ID_BITS` and [`PRE_SPLIT_REGIONS`](/sql-statements/sql-statement-split-region.md#pre_split_regions), the regions can be automatically pre-split, significantly reducing manual intervention. - Lower operational overhead. #### Cons From b6789f50df1d1a9baebe54061bf89c1b4f5b1e23 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Thu, 4 Dec 2025 09:31:22 +0800 Subject: [PATCH 75/84] Update tidb-partitioned-tables-best-practices.md --- best-practices/tidb-partitioned-tables-best-practices.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 8b02194c09b00..5596dfe7acd4b 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -348,7 +348,7 @@ Partitioned tables can help mitigate this problem. By applying hash or key parti > **Note:** > -> This section uses partitioned tables as an example for mitigating read and write hotspots. TiDB also provides other features such as [`AUTO_INCREMENT`](/auto-increment.md) and `SHARD_ROW_ID_BITS` for hotspot mitigation. When using partitioned tables in certain scenarios, you might need to set `merge_option=deny` to maintain partition boundaries. For more details, see [issue #58128](https://github.com/pingcap/tidb/issues/58128). +> This section uses partitioned tables as an example for mitigating read and write hotspots. TiDB also provides other features such as [`AUTO_INCREMENT`](/auto-increment.md) and [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md) for hotspot mitigation. When using partitioned tables in certain scenarios, you might need to set `merge_option=deny` to maintain partition boundaries. For more details, see [issue #58128](https://github.com/pingcap/tidb/issues/58128). ### How it works @@ -448,7 +448,7 @@ The following table shows the summary information for non-clustered and clustere #### Pros -- When a new partition is created in a non-clustered partitioned table configured with `SHARD_ROW_ID_BITS` and [`PRE_SPLIT_REGIONS`](/sql-statements/sql-statement-split-region.md#pre_split_regions), the regions can be automatically pre-split, significantly reducing manual intervention. +- When a new partition is created in a non-clustered partitioned table configured with [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md) and [`PRE_SPLIT_REGIONS`](/sql-statements/sql-statement-split-region.md#pre_split_regions), the regions can be automatically pre-split, significantly reducing manual intervention. - Lower operational overhead. #### Cons @@ -465,7 +465,7 @@ To address hotspot issues caused by new range partitions, you can perform the fo ##### Step 1. Use `SHARD_ROW_ID_BITS` and `PRE_SPLIT_REGIONS` -Create a partitioned table with `SHARD_ROW_ID_BITS` and `PRE_SPLIT_REGIONS` to pre-split table regions. The value of `PRE_SPLIT_REGIONS` must be less than or equal to that of `SHARD_ROW_ID_BITS`. The number of pre-split Regions for each partition is `2^(PRE_SPLIT_REGIONS)`. +Create a partitioned table with [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md) and [`PRE_SPLIT_REGIONS`](/sql-statements/sql-statement-split-region.md#pre_split_regions) to pre-split table regions. The value of `PRE_SPLIT_REGIONS` must be less than or equal to that of `SHARD_ROW_ID_BITS`. The number of pre-split Regions for each partition is `2^(PRE_SPLIT_REGIONS)`. 
```sql CREATE TABLE employees ( From 6c18cc685c85f3bbfdd2a90aa3555ca70e14cec2 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Fri, 5 Dec 2025 08:14:33 +0800 Subject: [PATCH 76/84] Apply suggestions from code review Co-authored-by: Aolin --- .../tidb-partitioned-tables-best-practices.md | 28 +++++++++---------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 5596dfe7acd4b..759e925539024 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -5,17 +5,17 @@ summary: Learn best practices for using TiDB partitioned tables to improve perfo # Best Practices for Using TiDB Partitioned Tables -This guide introduces how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. +This guide describes how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. -Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage partition pruning to skip irrelevant data during query execution, reducing resource consumption and accelerating performance, particularly in Online Analytical Processing (OLAP) workloads with massive datasets. +Partitioned tables in TiDB provide a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage partition pruning to skip irrelevant data during query execution. This reduces resource consumption and improves performance, particularly in Online Analytical Processing (OLAP) workloads with large datasets. -A common use case is range partitioning combined with local indexes, which enables efficient historical data cleanup through operations such as [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method removes obsolete data almost instantly and preserves high query efficiency when filtering by the partition key. However, after migrating from non-partitioned tables to partitioned tables, queries that cannot benefit from partition pruning, such as those lacking partition key filters, might experience degraded performance. In such cases, you can use [global indexes](/partitioned-table.md#global-indexes) to mitigate the performance impact by providing a unified index structure across all partitions. +A common use case is combining [Range partitioning](/partitioned-table.md#range-partitioning) with local indexes to efficiently clean up historical data through operations such as [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method removes obsolete data almost instantly and preserves high query efficiency when filtering by the partition key. However, after migrating from non-partitioned tables to partitioned tables, queries that cannot benefit from partition pruning, such as those lacking partition key filters, might experience degraded performance. In such cases, you can use [global indexes](/partitioned-table.md#global-indexes) to mitigate the performance impact by providing a unified index structure across all partitions. 
-Another scenario is using hash or key partitioning to address write hotspot issues, especially in workloads relying on [`AUTO_INCREMENT` style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance workload, but similar to range partitioning, queries without partition-pruning conditions might suffer performance drawbacks again, a situation where global indexes can help. +Another scenario is using Hash or Key partitioning to address write hotspot issues, especially in workloads that use [`AUTO_INCREMENT`](/auto-increment.md) IDs where sequential inserts can overload specific TiKV Regions. Distributing writes across partitions helps balance workload, but similar to Range partitioning, queries without partition-pruning conditions might suffer performance drawbacks again, a situation where global indexes can help. -While partitioning offers clear benefits, it also presents common challenges, such as hotspots caused by newly created range partitions. To address this issue, TiDB provides solutions for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. +Although partitioning provides clear benefits, it also introduces challenges. For example, newly created Range partitions can create temporary hotspots. To address this issue, TiDB supports automatic or manual Region pre-splitting to balance data distribution and avoid bottlenecks. -This document examines partitioned tables in TiDB from multiple perspectives, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it provides practical guidance on optimizing partitioned table design and performance tuning in TiDB. +This document examines partitioned tables in TiDB from several perspectives, including query optimization, data cleanup, write scalability, and index management. It also provides practical guidance on how to optimize partitioned table design and tune performance in TiDB through detailed scenarios and best practices. > **Note:** > @@ -383,12 +383,12 @@ CREATE TABLE server_info ( PARTITION BY KEY (id) PARTITIONS 16; ``` -### Pros +### Advantages - Balanced write workload: hotspots are spread across multiple partitions, and therefore multiple Regions, reducing contention and improving insert performance. - Query optimization via partition pruning: if queries already filter by the partition key, TiDB can prune unused partitions, scanning less data and improving query speed. -### Cons +### Disadvantages There are some risks when using partitioned tables. @@ -446,12 +446,12 @@ The following table shows the summary information for non-clustered and clustere ### Solutions for non-clustered partitioned tables -#### Pros +#### Advantages - When a new partition is created in a non-clustered partitioned table configured with [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md) and [`PRE_SPLIT_REGIONS`](/sql-statements/sql-statement-split-region.md#pre_split_regions), the regions can be automatically pre-split, significantly reducing manual intervention. - Lower operational overhead. -#### Cons +#### Disadvantages Queries using **Point Get** or **Table Range Scan** require more table lookups, which can degrade read performance for such query types. 
@@ -548,11 +548,11 @@ SHOW TABLE employees PARTITION (p4) regions; ### Solutions for clustered partitioned tables -#### Pros +#### Advantages Queries using **Point Get** or **Table Range Scan** do not need additional lookups, resulting in better read performance. -#### Cons +#### Disadvantages Manual region splitting is required when creating new partitions, increasing operational complexity. @@ -566,12 +566,12 @@ To address hotspot issues caused by new range partitions, you can perform the st ### Solutions for clustered non-partitioned tables -#### Pros +#### Advantages - No hotspot risks from new range partitions. - Provides good read performance for point and range queries. -#### Cons +#### Disadvantages You cannot use `DROP PARTITION` to clean up large volumes of old data to improve deletion efficiency. From 40631a0582964eb29a62573120bcb3c843416ed0 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Tue, 9 Dec 2025 21:57:36 +0800 Subject: [PATCH 77/84] Update best-practices/tidb-partitioned-tables-best-practices.md --- best-practices/tidb-partitioned-tables-best-practices.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 759e925539024..4a8a4c9fc93ca 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -245,7 +245,7 @@ The following are findings about the `DROP PARTITION` performance: #### Use TTL and `DROP PARTITION` in TiDB -In this test case, the table structures have been anonymized. For more information about the usage of TTL, see [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md) . +In this test case, the table structures have been anonymized. For more information about the usage of TTL, see [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md). The following is the TTL schema. From dbeebe8c77e743a5c95121cb4049d298067e0325 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Wed, 10 Dec 2025 15:21:57 +0800 Subject: [PATCH 78/84] Apply suggestions from code review Co-authored-by: Aolin --- .../tidb-partitioned-tables-best-practices.md | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 4a8a4c9fc93ca..1ef43fcfda1df 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -25,24 +25,26 @@ This document examines partitioned tables in TiDB from several perspectives, inc This section describes how to improve query efficiency by the following methods: -- Partition pruning -- Query performance on secondary indexes +- [Partition pruning](#partition-pruning) +- [Query performance on secondary indexes](#query-performance-on-secondary-indexes-non-partitioned-tables-vs-local-indexes-vs-global-indexes) ### Partition pruning -Partition pruning is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes filter conditions of the query and determines which partitions might contain relevant data, and scans only those partitions. This significantly improves query performance by reducing I/O and computation overhead. 
+Partition pruning is an optimization technique that reduces the amount of data TiDB scans when querying partitioned tables. Instead of scanning all partitions, TiDB evaluates the query filter conditions to identify the partitions that might contain matching data and scans only those partitions. This approach reduces I/O and computation overhead, which significantly improves query performance. -Partition pruning is most beneficial in scenarios where query predicates match the partitioning strategy. Common use cases include: +Partition pruning is most effective when query predicates align with the partitioning strategy. Typical use cases include the following: -- Time-series data queries: when data is partitioned by time ranges (for example, daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. +- Time-series data queries: when data is partitioned by time ranges (for example, daily or monthly), queries limited to a specific time window can quickly skip unrelated partitions. - Multi-tenant or category-based datasets: partitioning by tenant ID or category enables queries to focus on a small subset of partitions. -- Hybrid Transactional and Analytical Processing (HTAP): especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing full table scans on large datasets. +- Hybrid Transactional and Analytical Processing (HTAP): especially for Range partitioning, TiDB can apply partition pruning to analytical workloads on TiFlash. This optimization skips irrelevant partitions and avoids full table scans on large datasets. For more use cases, see [Partition Pruning](/partition-pruning.md). ### Query performance on secondary indexes: non-partitioned tables vs. local indexes vs. global indexes -In TiDB, partitioned tables use local indexes by default. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. Global indexes can be faster for queries that span multiple partitions because a query using local indexes must perform a lookup in each relevant partition, while a query using a global index only needs to perform a single lookup for the entire table. +In TiDB, partitioned tables use local indexes by default, where each partition maintains its own set of indexes. In contrast, a global index covers the entire table in one index and tracks rows across all partitions. + +For queries that access data from multiple partitions, global indexes generally provide better performance. This is because a query using local indexes requires separate index lookups in each relevant partition, while a query using a global index performs a single lookup across the entire table. 
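Before relying on pruning for a given workload, you can confirm it in the execution plan. The following sketch assumes an illustrative Range-partitioned table `t` whose partition key is a numeric `pdate` column (the names and values are placeholders, and the exact operator output varies by TiDB version):

```sql
-- The filter uses the partition key, so the access object in the plan lists
-- only the partition(s) that can contain matching rows.
EXPLAIN SELECT * FROM t WHERE pdate = 2025123;

-- The filter does not reference the partition key, so the plan shows
-- partition:all and every partition (and every local index) must be accessed.
EXPLAIN SELECT * FROM t WHERE account_id = 42;
```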
#### Types of tables to be tested From 8c465fefef1dd2e2e84d68414d51593bce43f997 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Tue, 23 Dec 2025 10:04:44 +0800 Subject: [PATCH 79/84] Apply suggestions from code review Co-authored-by: Aolin --- .../tidb-partitioned-tables-best-practices.md | 105 ++++++++++-------- 1 file changed, 59 insertions(+), 46 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 1ef43fcfda1df..890edfba34069 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -40,25 +40,27 @@ Partition pruning is most effective when query predicates align with the partiti For more use cases, see [Partition Pruning](/partition-pruning.md). -### Query performance on secondary indexes: non-partitioned tables vs. local indexes vs. global indexes +### Query performance on secondary indexes: Non-partitioned tables vs. local indexes vs. global indexes In TiDB, partitioned tables use local indexes by default, where each partition maintains its own set of indexes. In contrast, a global index covers the entire table in one index and tracks rows across all partitions. For queries that access data from multiple partitions, global indexes generally provide better performance. This is because a query using local indexes requires separate index lookups in each relevant partition, while a query using a global index performs a single lookup across the entire table. -#### Types of tables to be tested +#### Tested table types -The query performance of the following types of tables are evaluated: +This test compares query performance across the following table configurations: -- Non-partitioned tables -- Partitioned tables with global indexes -- Partitioned tables with local indexes +- Non-partitioned table +- Partitioned table with local indexes +- Partitioned table with global indexes #### Test setup -- The partitioned table has 365 partitions, defined by the range partitioning on a date column. -- Each matching key returns multiple rows, simulating a high-volume OLTP-style query pattern. -- The impact of different partition counts is also evaluated to understand how partition granularity influences latency and index performance. +The test uses the following configuration: + +- The partitioned table contains 365 Range partitions, defined on a `date` column. +- The workload simulates a high-volume OLTP query pattern, where each index key matches multiple rows. +- The test also evaluates different partition counts to measure how partition granularity affects query latency and index efficiency. #### Schema @@ -76,18 +78,18 @@ CREATE TABLE `fa` ( KEY `index_fa_on_account_id` (`account_id`), KEY `index_fa_on_user_id` (`user_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin -PARTITION BY RANGE (`date`) -(PARTITION `fa_2024001` VALUES LESS THAN (2025001), -PARTITION `fa_2024002` VALUES LESS THAN (2025002), -PARTITION `fa_2024003` VALUES LESS THAN (2025003), -... -... -PARTITION `fa_2024365` VALUES LESS THAN (2025365)); +PARTITION BY RANGE (`date`)( + PARTITION `fa_2024001` VALUES LESS THAN (2025001), + PARTITION `fa_2024002` VALUES LESS THAN (2025002), + PARTITION `fa_2024003` VALUES LESS THAN (2025003), + ... + PARTITION `fa_2024365` VALUES LESS THAN (2025365) +); ``` #### SQL -The following SQL statement is used in the example. 
+The following SQL statement filters on the secondary index (`sid`) without including the partition key (`date`): ```sql SELECT `fa`.* @@ -101,33 +103,38 @@ WHERE `fa`.`sid` IN ( ); ``` -- Query filters on secondary index, but does not include the partition key. -- Causes local indexes key lookup for each partition due to lack of pruning. -- Table lookup tasks are significantly higher for partitioned tables. +This query pattern is representative because it: -#### Test results +- Filters on a secondary index without the partition key. +- Triggers a local index lookup for each partition due to lack of pruning. +- Generates significantly more table lookup tasks for partitioned tables. -The following table shows the test results. +#### Test results -| Configuration | Average Query Time | Cop task for index range scan | Cop task for table lookup | Total Cop tasks | Key Takeaways | -|---|---|---|---|---|---| -| Non-partitioned table | 12.6 ms | 72 | 79 | 151 | Provides the best performance with the fewest Cop tasks, which is ideal for most OLTP use cases. | -| Partitioned table with local indexes | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries scan all partitions. | -| Partitioned table with global indexes | 14.8 ms | 69 | 383 | 452 | It improves index scan efficiency, but table lookups can still take a long time if many rows match. | +The following table shows results for a query returning 400 rows from a table with 365 Range partitions. -Data comes from a table with 365 range partitions (for example, by date). +| Configuration | Average query time | Cop tasks (index scan) | Cop tasks (table lookup) | Total Cop tasks | +|---|---|---|---|---| +| Non-partitioned table | 12.6 ms | 72 | 79 | 151 | +| Partitioned table with local indexes | 108 ms | 600 | 375 | 975 | +| Partitioned table with global indexes | 14.8 ms | 69 | 383 | 452 | -- The **Average Query Time** is obtained from the `statement_summary` view. -- The query uses a secondary index and returns 400 rows. +- **Non-partitioned table**: provides the best performance with the fewest tasks. Suitable for most OLTP workloads. +- **Partitioned table with global indexes**: improve index scan efficiency, but table lookups remain expensive when many rows match. +- **Partitioned table with local indexes**: when the query condition does not include the partition key, local index queries scan all partitions. -The following metrics are collected: -- **Average Query Time**: from `statement_summary` -- **Cop Tasks** (Index Scan + Table Lookup): from the execution plan +> **Note:** +> +> - **Average query time** is sourced from the `statement_summary` view. +> - **Cop tasks** metrics are derived from the execution plan. #### Execution plan examples -The following is an execution plan example for a non-partitioned table: +The following examples show the execution plans for each configuration. + +
+Non-partitioned table ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -137,7 +144,10 @@ The following is an execution plan example for a non-partitioned table: | TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task:{num:79, max:4.98ms, min:0s, avg:514.9µs, p95:3.75ms, max_proc_keys:10, p95_proc_keys:5, tot_proc:15ms, tot_wait:21.4ms, copr_cache_hit_ratio:0.00, build_task_duration:341.2µs, max_distsql_concurrency:1, max_extra_concurrency:7, store_batch_num:62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg:0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:20.8ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1600}}}, time_detail:{total_process_time:15ms, total_wait_time:21.4ms, tikv_wall_time:10.9ms} | keep order:false | N/A | N/A | ``` -The following is an execution plan example for a partitioned tables with a global index: +
+ +
+Partitioned table with global indexes ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -147,7 +157,10 @@ The following is an execution plan example for a partitioned tables with a globa | TableRowIDScan_6(Probe)| 398.73 | 165221.64 | 400 | cop[tikv] | table:fa | time:7.47ms, loops:2, cop_task:{num:383, max:4.07ms, min:0s, avg:488.5µs, p95:2.59ms, max_proc_keys:2, p95_proc_keys:1, tot_proc:203.3ms, tot_wait:429.5ms, copr_cache_hit_ratio:0.00, build_task_duration:1.3ms, max_distsql_concurrency:1, max_extra_concurrency:31, store_batch_num:305}, rpc_info:{Cop:{num_rpc:78, total_time:186.3ms}}, tikv_task:{proc max:3ms, min:0s, avg:517µs, p80:1ms, p95:1ms, iters:383, tasks:383}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:2.99ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1601, read_count:799, read_byte:10.1 MB, read_time:131.6ms}}}, time_detail:{total_process_time:203.3ms, total_suspend_time:6.31ms, total_wait_time:429.5ms, total_kv_read_wall_time:198ms, tikv_wall_time:163ms} | keep order:false, stats:partial[...] | N/A | N/A | ``` -The following is an execution plan example for a partition table with a local index: +
+ +
+Partitioned table with local indexes ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -157,30 +170,30 @@ The following is an execution plan example for a partition table with a local in | TableRowIDScan_6(Probe)| 398.73 | 165221.49 | 400 | cop[tikv] | table:fa | time:514ms, loops:434, cop_task:{num:375, max:31.6ms, min:0s, avg:1.33ms, p95:1.67ms, max_proc_keys:2, p95_proc_keys:2, tot_proc:220.7ms, tot_wait:242.2ms, copr_cache_hit_ratio:0.00, build_task_duration:27.8ms, max_distsql_concurrency:1, max_extra_concurrency:1, store_batch_num:69}, rpc_info:{Cop:{num_rpc:306, total_time:495.5ms}}, tikv_task:{proc max:6ms, min:0s, avg:597.3µs, p80:1ms, p95:1ms, iters:375, tasks:375}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:158.3ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:3197, read_count:803, read_byte:10.2 MB, read_time:113.5ms}}}, time_detail:{total_process_time:220.7ms, total_suspend_time:5.39ms, total_wait_time:242.2ms, total_kv_read_wall_time:224ms, tikv_wall_time:430.5ms}} | keep order:false, stats:partial[...] | N/A | N/A | ``` -The following sections describe similar detailed execution plans for partitioned tables with global and local indexes. +
-#### Create a global index on a partitioned table in TiDB +#### Create a global index on a partitioned table -There are two options for you to create a global index on a partitioned table in TiDB. +You can create a global index on a partitioned table using one of the following methods. -> **Note:** +> **Note:** > -> - In TiDB v8.5.3 and earlier versions, global indexes can only be created on unique columns. Starting from v8.5.4, global indexes on non-unique columns are supported. This limitation will be removed in the next LTS version. +> - In TiDB v8.5.3 and earlier versions, you can only create global indexes on unique columns. Starting from v8.5.4, TiDB supports global indexes on non-unique columns. This limitation will be removed in a future LTS version. > - For non-unique global indexes, use `ADD INDEX` instead of `ADD UNIQUE INDEX`. -> - The `GLOBAL` keyword must be explicitly specified. +> - You must explicitly specify the `GLOBAL` keyword. -##### Option 1: add via `ALTER TABLE` +##### Option 1: Use `ALTER TABLE` -You can use `ALTER TABLE` to add a global index to an existing partitioned table. +To add a global index to an existing partitioned table, use `ALTER TABLE`: ```sql ALTER TABLE ADD UNIQUE INDEX (col1, col2) GLOBAL; ``` -##### Option 2: define inline when creating the table +##### Option 2: Define the index at table creation -You can also create a global index inline when you create a table. +To create a global index when creating a table, define the global index inline in the `CREATE TABLE` statement: ```sql CREATE TABLE t ( From 948c66f9ad6916c8d73ad2ce3f2de98d22cbe186 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Tue, 23 Dec 2025 10:05:11 +0800 Subject: [PATCH 80/84] Apply suggestions from code review Co-authored-by: Aolin --- best-practices/tidb-partitioned-tables-best-practices.md | 1 - 1 file changed, 1 deletion(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 890edfba34069..3a58d7c835a41 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -201,7 +201,6 @@ CREATE TABLE t ( col1 VARCHAR(50), col2 VARCHAR(50), -- other columns... - UNIQUE GLOBAL INDEX idx_col1_col2 (col1, col2) ) PARTITION BY RANGE (id) ( From 8c5fc06a1f140a6d05c8c90675de16caa2c398a9 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Tue, 23 Dec 2025 18:04:59 +0800 Subject: [PATCH 81/84] Apply suggestions from code review Co-authored-by: Aolin --- .../tidb-partitioned-tables-best-practices.md | 77 ++++++++++--------- 1 file changed, 40 insertions(+), 37 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 3a58d7c835a41..137442200079a 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -122,8 +122,6 @@ The following table shows results for a query returning 400 rows from a table wi - **Non-partitioned table**: provides the best performance with the fewest tasks. Suitable for most OLTP workloads. - **Partitioned table with global indexes**: improve index scan efficiency, but table lookups remain expensive when many rows match. - **Partitioned table with local indexes**: when the query condition does not include the partition key, local index queries scan all partitions. 
- - > **Note:** > > - **Average query time** is sourced from the `statement_summary` view. @@ -210,34 +208,41 @@ PARTITION BY RANGE (id) ( ); ``` -#### Summary +#### Performance summary -The performance overhead of partitioned tables in TiDB depends significantly on the number of partitions and the type of index used. +The performance overhead of TiDB partitioned tables depends on the number of partitions and the index type. -- The more partitions you have, the more severe the potential performance degradation. -- With a smaller number of partitions, the impact might not be noticeable, but it is still workload-dependent. -- For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of [Remote Procedure Calls (RPCs)](https://docs.pingcap.com/tidb/stable/glossary/#remote-procedure-call-rpc) triggered. This means more partitions will likely result in more RPCs, leading to higher latency. -- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). Note that for very large tables where data is already distributed across many Regions, accessing data through a global index might have similar performance to a non-partitioned table, as both scenarios require multiple cross-Region RPCs. +- **Partition count**: Performance degrades as the number of partitions increases. While the impact might be negligible for a small number of partitions, this varies based on the workload. +- **Local indexes**: if a query does not include an effective partition pruning condition, the number of partitions directly determines the number of [Remote Procedure Calls (RPCs)](https://docs.pingcap.com/tidb/stable/glossary/#remote-procedure-call-rpc). This means more partitions typically lead to more RPCs and higher latency. +- **Global indexes**: the performance depends on both the number of partitions involved and the number of rows that require table lookups. For very large tables where data is distributed across multiple Regions, accessing data through a global index provides performance similar to that of a non-partitioned table, because both scenarios involve multiple cross-Region RPCs. #### Recommendations -- Do not use partitioned tables unless necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. -- If you are sure that all queries will make use of good partition pruning (matching only a few partitions), then local indexes are a good choice. -- If you are sure that critical queries do not have good partition pruning (matching many partitions), then a global index is recommended. -- Use local indexes only if your main concern is DDL efficiency (such as fast `DROP PARTITION`) and the performance side effect from the partition table is acceptable. +Use the following guidelines when you design partitioned tables and indexes in TiDB: + +- Use partitioned tables only when necessary. For most OLTP workloads, a well-indexed, non-partitioned table provides better performance and simpler management. +- Use local indexes when all queries include an effective partition pruning condition that matches a small number of partitions. +- Use global indexes for critical queries that lack effective partition pruning conditions and match a large number of partitions. 
+- Use local indexes only when DDL operation efficiency (such as fast `DROP PARTITION`) is a priority and any potential performance impact is acceptable. ## Facilitate bulk data deletion -In TiDB, you can clear up historical data either by TTL (Time-to-Live) or manual partition drop. While both methods serve the same purpose, they differ significantly in performance. The test cases in this section show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. +In TiDB, you can remove historical data by using [TTL (Time to Live)](/time-to-live.md) or by manually dropping partitions. Although both methods delete data, their performance characteristics differ significantly. The following test results show that dropping partitions is generally faster and consumes fewer resources, making it a better option for large datasets and frequent data purging. ### Differences between TTL and `DROP PARTITION` -- TTL: automatically removes data based on its age, but might be slower due to the need to scan and clean data over time. -- `DROP PARTITION`: deletes an entire partition at once, making it much faster, especially when dealing with large datasets. +- TTL: automatically deletes data based on its age. This method might be slower because it scans and deletes rows incrementally over time. +- `DROP PARTITION`: deletes an entire partition in a single operation. This approach is typically much faster, especially for large datasets. #### Test case -To compare the performance of TTL and `DROP PARTITION`, the test case in this section configures TTL to execute every 10 minutes and create a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches are tested under background write workloads of 50 and 100 concurrent threads. This test case measures key metrics such as execution time, system resource utilization, and the total number of rows deleted. +This test compares the performance of TTL and `DROP PARTITION`. + +- TTL configuration: runs every 10 minutes. +- Partition configuration: drops one partition every 10 minutes. +- Workload: background write workloads with 50 and 100 concurrent threads. + +The test measures execution time, system resource usage, and the total number of rows deleted. #### Findings @@ -247,21 +252,21 @@ To compare the performance of TTL and `DROP PARTITION`, the test case in this se The following are findings about the TTL performance: -- On a write-heavy table, TTL runs every 10 minutes. - With 50 threads, each TTL job takes 8 to 10 minutes, deleting 7 to 11 million rows. -- With 100 threads, it handles up to 20 million rows, but the execution time increases to 15 to 30 minutes, with greater variance. -- TTL jobs impact system performance under high workloads due to extra scanning and deletion activity, reducing overall QPS. +- With 100 threads, TTL handles up to 20 million rows, but execution time increases to 15 to 30 minutes and shows higher variance. +- Under heavy workloads, TTL jobs reduce overall QPS due to additional scanning and deletion overhead. The following are findings about the `DROP PARTITION` performance: -- `ALTER TABLE ... DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. -- `ALTER TABLE ... DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. +- The `ALTER TABLE ... 
DROP PARTITION` statement removes an entire partition almost immediately. +- The operation uses minimal resources because it occurs at the metadata level. +- `DROP PARTITION` is faster and more predictable than TTL, especially for large historical datasets. #### Use TTL and `DROP PARTITION` in TiDB -In this test case, the table structures have been anonymized. For more information about the usage of TTL, see [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md). +The following examples use anonymized table structures. For more information about TTL, see [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md). -The following is the TTL schema. +The following example shows a TTL-enabled table schema: ```sql CREATE TABLE `ad_cache` ( @@ -280,7 +285,7 @@ TTL=`expire_time` + INTERVAL 0 DAY TTL_ENABLE='ON' TTL_JOB_INTERVAL='10m'; ``` -The following is the SQL statement for dropping partitions (Range INTERVAL partitioning). +The following example shows a partitioned table that uses Range INTERVAL partitioning: ```sql CREATE TABLE `ad_cache` ( @@ -306,7 +311,7 @@ FIRST PARTITION LESS THAN ('2025-02-19 18:00:00') LAST PARTITION LESS THAN ('2025-02-19 20:00:00'); ``` -You need to run DDL statements such as `ALTER TABLE PARTITION ...` to change the `FIRST PARTITION` and `LAST PARTITION` periodically. These two DDL statements can drop the old partitions and create new ones. +To update `FIRST PARTITION` and `LAST PARTITION` periodically, run DDL statements similar to the following. These statements drop old partitions and create new ones. ```sql ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}"); @@ -315,30 +320,29 @@ ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}"); #### Recommendations -For workloads with large or time-based data cleanup, it is recommended to use partitioned tables with `DROP PARTITION`. It offers better performance, lower system impact, and simpler management. - -TTL is still useful for finer-grained or background cleanup, but might not be optimal under high write pressure or when deleting large volumes of data quickly. +- Use partitioned tables with `DROP PARTITION` for large-scale or time-based data cleanup. This approach provides better performance, lower system impact, and simpler operational behavior. +- Use TTL for fine-grained or background data cleanup. TTL is less suitable for workloads with high write throughput or rapid deletion of large data volumes. ### Partition drop efficiency: local indexes vs. global indexes -A partitioned table with a global index requires synchronous updates to the global index, which can significantly increase the execution time for DDL operations, such as `DROP PARTITION`, `TRUNCATE PARTITION`, and `REORGANIZE PARTITION`. +For partitioned tables with global indexes, DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, and `REORGANIZE PARTITION` must update global index entries synchronously. These updates can significantly increase DDL execution time. -In this section, the tests show that `DROP PARTITION` is much slower when using a global index compared to a local index. Take this into consideration when you design partitioned tables. +This section shows that `DROP PARTITION` is substantially slower on tables with global indexes than on tables with local indexes. Consider this behavior when you design partitioned tables. 
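To reproduce this comparison on your own schema, the only difference is how the secondary index is declared. The following sketch uses illustrative index and column names on the test table `A`, and assumes a TiDB version that supports non-unique global indexes (v8.5.4 or later, as noted earlier):

```sql
-- Local index (default): each partition stores its own index data, so
-- DROP PARTITION discards the index entries together with the partition.
ALTER TABLE A ADD INDEX idx_account_local (account_id);

-- Global index: a single index spans all partitions, so DROP PARTITION must
-- also remove the dropped partition's entries from it, which is what makes
-- the operation slower.
ALTER TABLE A ADD INDEX idx_account_global (account_id) GLOBAL;
```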
#### Test case -This test case creates a table with 365 partitions and tests the `DROP PARTITION` performance using both global indexes and local indexes. The total number of rows is 1 billion. +This test creates a table with 365 partitions and approximately 1 billion rows. It compares `DROP PARTITION` performance when using global indexes and local indexes. -| Index type | Duration (drop partition) | +| Index type | Drop partition duration | |--------------|---------------------------| | Global index | 76.02 seconds | | Local index | 0.52 seconds | #### Findings -Dropping a partition on a table with a global index takes **76.02 seconds**, while the same operation with a local index takes only **0.52 seconds**. The reason is that global indexes span all partitions and require more complex updates, while local indexes can just be dropped together with the partition data. +Dropping a partition on a table with a global index takes **76.02 seconds**, whereas the same operation on a table with a local index takes only **0.52 seconds**. This difference occurs because global indexes span all partitions and require additional index updates, while local indexes are dropped together with the partition data. -You can use the following SQL statement to drop the partition: +You can use the following SQL statement to drop a partition: ```sql ALTER TABLE A DROP PARTITION A_2024363; @@ -346,9 +350,8 @@ ALTER TABLE A DROP PARTITION A_2024363; #### Recommendations -When a partitioned table contains global indexes, executing certain DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, and `REORGANIZE PARTITION` requires updating the global index entries to reflect the changes. This update must be performed immediately to ensure consistency, which can significantly increase the execution time of these DDL operations. - -If you need to drop partitions frequently and minimize the performance impact on the system, it is recommended to use local indexes for faster and more efficient operations. +- If a partitioned table uses global indexes, expect longer execution times for DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, and `REORGANIZE PARTITION`. +- If you need to drop partitions frequently and minimize performance impact, use local indexes to achieve faster and more efficient partition management. ## Mitigate hotspot issues From c73e95199161a57823aeac22e71a365174291a8d Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Sun, 4 Jan 2026 10:30:14 +0800 Subject: [PATCH 82/84] Update best-practices/tidb-partitioned-tables-best-practices.md Co-authored-by: Aolin --- best-practices/tidb-partitioned-tables-best-practices.md | 1 + 1 file changed, 1 insertion(+) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 137442200079a..afc9030329f1a 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -122,6 +122,7 @@ The following table shows results for a query returning 400 rows from a table wi - **Non-partitioned table**: provides the best performance with the fewest tasks. Suitable for most OLTP workloads. - **Partitioned table with global indexes**: improve index scan efficiency, but table lookups remain expensive when many rows match. - **Partitioned table with local indexes**: when the query condition does not include the partition key, local index queries scan all partitions. 
+ > **Note:** > > - **Average query time** is sourced from the `statement_summary` view. From e5e6d49cc788fb23a008f306e3017f54463f46d8 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Mon, 12 Jan 2026 11:07:39 +0800 Subject: [PATCH 83/84] Apply suggestions from code review Co-authored-by: Aolin --- .../tidb-partitioned-tables-best-practices.md | 186 ++++++++++-------- 1 file changed, 103 insertions(+), 83 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index afc9030329f1a..5495becc7b933 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -356,36 +356,44 @@ ALTER TABLE A DROP PARTITION A_2024363; ## Mitigate hotspot issues -In TiDB, hotspots can occur when incoming read or write traffic is unevenly distributed across Regions. This is common when the primary key is monotonically increasing, for example, an `AUTO_INCREMENT` primary key with `AUTO_ID_CACHE=1`, or a secondary index on the datetime column with the default value set to `CURRENT_TIMESTAMP`. Because new rows and index entries are always appended to the "rightmost" Region, over time, this can lead to: +In TiDB, hotspots occur when read or write traffic is unevenly distributed across [Regions](/tidb-storage.md#region). Hotspots commonly occur when you use: -- A single [Region](https://docs.pingcap.com/tidb/stable/tidb-storage/#region) handling most of the write workload, while other Regions remain idle. -- Higher read or write latency and reduced throughput. -- Limited performance gains from scaling out TiKV nodes, as the bottleneck remains concentrated on one Region. +- A monotonically increasing primary key, such as an `AUTO_INCREMENT` primary key with `AUTO_ID_CACHE=1`. +- A secondary index on a datetime column with a default value of `CURRENT_TIMESTAMP`. -Partitioned tables can help mitigate this problem. By applying hash or key partitioning on the primary key, TiDB can spread inserts across multiple partitions (and therefore multiple Regions), reducing hotspot contention. +TiDB appends new rows and index entries to the "rightmost" Region. Over time, this behavior can lead to the following issues: + +- A single Region handles most of the write workload, while other Regions remain underutilized. +- Read and write latency increases, and overall throughput decreases. +- Adding more TiKV nodes provides little performance improvement because the bottleneck remains on a single Region. + +To mitigate these issues, you can use partitioned tables. By applying Hash or Key partitioning to the primary key, TiDB distributes insert operations across multiple partitions and Regions, reducing hotspot contention on any single Region. > **Note:** > -> This section uses partitioned tables as an example for mitigating read and write hotspots. TiDB also provides other features such as [`AUTO_INCREMENT`](/auto-increment.md) and [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md) for hotspot mitigation. When using partitioned tables in certain scenarios, you might need to set `merge_option=deny` to maintain partition boundaries. For more details, see [issue #58128](https://github.com/pingcap/tidb/issues/58128). +> This section uses partitioned tables as an example for mitigating read and write hotspots. TiDB offers additional features for hotspot mitigation, such as [`AUTO_INCREMENT`](/auto-increment.md) and [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md). 
+> +> When you use partitioned tables in specific scenarios, set `merge_option=deny` to preserve partition boundaries. For more details, see [issue #58128](https://github.com/pingcap/tidb/issues/58128). -### How it works +### How partitioning works -TiDB stores table data and indexes in Regions, each covering a continuous range of row keys. When the primary key is [`AUTO_INCREMENT`](/auto-increment.md) and the secondary indexes on datetime columns are monotonically increasing: +TiDB stores table data and indexes in Regions, where each Region covers a continuous range of row keys. When a table uses an `AUTO_INCREMENT` primary key or a monotonically increasing datetime index, the distribution of the write workload depends on whether the table is partitioned. -**Without partitioning:** +**Non-partitioned tables** -- New rows always have the highest key values and are inserted into the same "last Region." -- That Region is served by one TiKV node at a time, becoming a single write bottleneck. +In a non-partitioned table, new rows always have the largest key values and are written to the same "last" Region. This single Region, served by one TiKV node, can become a write bottleneck. -**With hash or key partitioning:** +**Hash or Key partitioned tables** -- The table and the secondary indexes are split into multiple partitions using a hash or key function on the primary key or indexed columns. -- Each partition has its own set of Regions, often distributed across different TiKV nodes. -- Inserts are spread across multiple Regions in parallel, improving workload distribution and throughput. +- TiDB splits the table and its indexes into multiple partitions by applying a Hash or Key function to the primary key or indexed columns. +- Each partition has its own set of Regions, which are typically distributed across different TiKV nodes. +- Insert operations are distributed across multiple Regions in parallel, improving workload balance and write throughput. -### Use cases +### When to use partitioning -If a table with an [`AUTO_INCREMENT`](/auto-increment.md) primary key experiences heavy bulk inserts and suffers from write hotspot issues, applying hash or key partitioning on the primary key can help distribute the write workload more evenly. +If a table with an [`AUTO_INCREMENT`](/auto-increment.md) primary key receives heavy bulk inserts and experiences write hotspots, apply Hash or Key partitioning to the primary key to distribute the write workload more evenly. + +The following SQL statement creates a table with 16 partitions based on the primary key: ```sql CREATE TABLE server_info ( @@ -402,21 +410,27 @@ PARTITION BY KEY (id) PARTITIONS 16; ``` ### Advantages +### Benefits + +Partitioned tables provide the following benefits: + +- **Balanced write workloads**: hotspots are distributed across multiple partitions and Regions, reducing contention and improving insert performance. +- **Improved query performance through partition pruning**: for queries that filter by the partition key, TiDB skips irrelevant partitions, reducing scanned data and improving query latency. -- Balanced write workload: hotspots are spread across multiple partitions, and therefore multiple Regions, reducing contention and improving insert performance. -- Query optimization via partition pruning: if queries already filter by the partition key, TiDB can prune unused partitions, scanning less data and improving query speed. 
+### Limitations -### Disadvantages +Before you use partitioned tables, consider the following limitations: -There are some risks when using partitioned tables. +- Converting a non-partitioned table to a partitioned table increases the total number of Regions, as TiDB creates separate Regions for each partition. +- Queries that do not filter by the partition key cannot use partition pruning. TiDB must scan all partitions or perform index lookups across all partitions, which increases the number of coprocessor tasks and can degrade performance. -- When converting a non-partitioned table to a partitioned table, TiDB creates separate Regions for each partition. This might significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions or do index lookups in all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. For example, `serial_no` is not the partition key, which will cause the query performance regression: + For example, the following query does not use the partition key (`id`) and might experience performance degradation: ```sql SELECT * FROM server_info WHERE `serial_no` = ?; ``` -- Add a global index on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down `DROP PARTITION` operations, hash and key partitioned tables do not support `DROP PARTITION`. In practice, such partitions are rarely truncated, making global indexes a feasible solution in these scenarios. For example: +- To reduce scan overhead for queries that do not use the partition key, you need to create a global index. Although global indexes can slow down `DROP PARTITION` operations, Hash and Key partitioned tables do not support `DROP PARTITION`. Therefore, global indexes are a practical solution because these partitions are rarely truncated. For example: ```sql ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; @@ -424,66 +438,71 @@ There are some risks when using partitioned tables. ## Partition management challenges -New range partitions in a partitioned table can easily lead to hotspot issues in TiDB. This section outlines common scenarios and mitigation strategies to avoid read and write hotspots caused by new range partitions. +New Range partitions can cause hotspot issues in TiDB. This section describes common scenarios and provides mitigation strategies. ### Read hotspots -When using range-partitioned tables, if queries do not filter data using the partition key, new empty partitions can easily become read hotspots. +In Range-partitioned tables, new empty partitions can become read hotspots if queries do not filter data by the partition key. **Root cause:** -By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions might be merged into a single region. +By default, TiDB creates an empty Region for each partition when you create a table. If no data is written for a period, TiDB might merge Regions for multiple empty partitions into a single Region. -**impact:** +**Impact:** -When a query does not filter by partition key, TiDB scans all partitions (as seen in the execution plan `partition:all`). As a result, the single region holding multiple empty partitions is scanned repeatedly, leading to a read hotspot. 
+When a query does not filter by the partition key, TiDB scans all partitions, which is shown as `partition:all` in the execution plan. As a result, the single Region holding multiple empty partitions is scanned repeatedly, causing a read hotspot. ### Write hotspots -When using a time-based field as the partition key, a write hotspot might occur when switching to a new partition. +Using a time-based column as the partition key might cause write hotspots when traffic shifts to a new partition. **Root cause:** -In TiDB, newly created partitions initially contain only one region on a single TiKV node. As writes concentrate on this single region, it must split into multiple regions before writes can be distributed across multiple TiKV nodes. This splitting process is the main cause of the temporary write hotspot. +In TiDB, newly created partitions initially contain a single Region on one TiKV node. All writes are directed to this single Region until it splits and data redistributes. During this period, the TiKV node must handle both application writes and Region-splitting tasks. -However, if the initial write traffic to this new partition is very high, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it might not have enough spare resources (such as I/O capacity and CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. +If the initial write traffic to the new partition is very high, the TiKV node might not have sufficient resources (such as CPU or I/O capacity) to split and scatter Regions promptly. As a result, writes remain concentrated on the same node longer than expected. **Impact:** -This imbalance can cause the TiKV node to trigger flow control, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn might impact the overall read and write performance of the cluster. +This imbalance can trigger flow control on the TiKV node, leading to a sharp drop in QPS, increased write latency, and high CPU utilization, which can degrade overall cluster performance. -### Summary +### Comparison of partitioned table types -The following table shows the summary information for non-clustered and clustered partitioned tables. 
+The following table compares non-clustered partitioned tables, clustered partitioned tables, and clustered non-partitioned tables: -| Table type | Region pre-splitting | Read performance | Write scalability | Data cleanup via partition | +| Table type | Region pre-splitting | Read performance | Write scalability | Data cleanup by partition | |---|---|---|---|---| -| Non-clustered partitioned table | Automatic | Lower (more lookups) | High | Supported | -| Clustered partitioned table | Manual | High (fewer lookups) | High (if managed) | Supported | +| Non-clustered partitioned table | Automatic | Lower (additional lookups required) | High | Supported | +| Clustered partitioned table | Manual | High (fewer lookups) | High (with manual management) | Supported | | Clustered non-partitioned table | N/A | High | Stable | Not supported | ### Solutions for non-clustered partitioned tables #### Advantages -- When a new partition is created in a non-clustered partitioned table configured with [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md) and [`PRE_SPLIT_REGIONS`](/sql-statements/sql-statement-split-region.md#pre_split_regions), the regions can be automatically pre-split, significantly reducing manual intervention. -- Lower operational overhead. +- When you create a new partition in a non-clustered partitioned table configured with [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md) and [`PRE_SPLIT_REGIONS`](/sql-statements/sql-statement-split-region.md#pre_split_regions), TiDB automatically pre-splits Regions, significantly reducing manual effort. +- Operational overhead is low. #### Disadvantages -Queries using **Point Get** or **Table Range Scan** require more table lookups, which can degrade read performance for such query types. +Queries using **Point Get** or **Table Range Scan** require additional table lookups, which can degrade read performance. #### Suitable scenarios -It is suitable for workloads where write scalability and operational ease are more critical than low-latency reads. +Use non-clustered partitioned tables when write scalability and operational simplicity are more important than low-latency reads. #### Best practices -To address hotspot issues caused by new range partitions, you can perform the following steps. +To mitigate hotspot issues caused by new Range partitions, follow these steps. ##### Step 1. Use `SHARD_ROW_ID_BITS` and `PRE_SPLIT_REGIONS` -Create a partitioned table with [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md) and [`PRE_SPLIT_REGIONS`](/sql-statements/sql-statement-split-region.md#pre_split_regions) to pre-split table regions. The value of `PRE_SPLIT_REGIONS` must be less than or equal to that of `SHARD_ROW_ID_BITS`. The number of pre-split Regions for each partition is `2^(PRE_SPLIT_REGIONS)`. +Create a partitioned table with [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md) and [`PRE_SPLIT_REGIONS`](/sql-statements/sql-statement-split-region.md#pre_split_regions) to pre-split Regions. + +**Requirements:** + +- The value of `PRE_SPLIT_REGIONS` must be less than or equal to `SHARD_ROW_ID_BITS`. +- Each partition is pre-split into `2^(PRE_SPLIT_REGIONS)` Regions. 
```sql CREATE TABLE employees ( @@ -496,7 +515,7 @@ CREATE TABLE employees ( store_id INT, PRIMARY KEY (`id`,`hired`) NONCLUSTERED, KEY `idx_employees_on_store_id` (`store_id`) -) SHARD_ROW_ID_BITS = 2 PRE_SPLIT_REGIONS=2 +) SHARD_ROW_ID_BITS = 2 PRE_SPLIT_REGIONS = 2 PARTITION BY RANGE ( YEAR(hired) ) ( PARTITION p0 VALUES LESS THAN (1991), PARTITION p1 VALUES LESS THAN (1996), @@ -507,50 +526,52 @@ PARTITION BY RANGE ( YEAR(hired) ) ( ##### Step 2. Add the `merge_option=deny` attribute -Adding the [`merge_option=deny`](/table-attributes.md#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. +Add the [`merge_option=deny`](/table-attributes.md#control-the-region-merge-behavior-using-table-attributes) attribute at the table or partition level to prevent empty Regions from being merged. When you drop a partition, TiDB still merges Regions that belong to the dropped partition. ```sql --- table +-- Table level ALTER TABLE employees ATTRIBUTES 'merge_option=deny'; --- partition +-- Partition level ALTER TABLE employees PARTITION `p3` ATTRIBUTES 'merge_option=deny'; ``` -##### Step 3. Determine split boundaries based on existing business data +##### Step 3. Determine split boundaries based on business data -To avoid hotspots when a new table or partition is created, it is often beneficial to pre-split regions before heavy writes begin. To make pre-splitting effective, configure the lower and upper boundaries for region splitting based on the actual business data distribution. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. +To avoid hotspots when you create a table or add a partition, pre-split Regions before heavy writes begin. For effective pre-splitting, configure the lower and upper boundaries for Region splitting based on the actual business data distribution. Avoid setting excessively wide boundaries, as this can prevent data effective data distribution across TiKV nodes, defeating the purpose of pre-splitting. -Identify the minimum and maximum values from existing production data so that incoming writes are more likely to target different pre-allocated regions. The following is an example query for the existing data: +Determine the minimum and maximum values from existing production data so that incoming writes target different pre-allocated Regions. The following query provides an example for retrieving the existing data range: ```sql SELECT MIN(id), MAX(id) FROM employees; ``` -- If the table is new and has no historical data, estimate the minimum and maximum values based on your business logic and expected data range. -- For composite primary keys or composite indexes, only the leftmost column needs to be considered when deciding split boundaries. -- If the leftmost column is a string, take string length and distribution into account to ensure even data spread. +- If the table has no historical data, estimate the minimum and maximum values based on business requirements and expected data ranges. +- For composite primary keys or composite indexes, use only the leftmost column to define split boundaries. +- If the leftmost column is a string, consider its length and value distribution to ensure even data distribution. -##### Step 4. Pre-split and scatter regions +##### Step 4. 
Pre-split and scatter Regions A common practice is to split the number of regions to match the number of TiKV nodes, or to be twice the number of TiKV nodes. This helps ensure that data is more evenly distributed across the cluster from the start. -##### Step 5. Split regions for the primary key and the secondary index of all partitions if needed +##### Step 5. Split Regions for primary and secondary indexes if needed -To split regions for the primary key of all partitions in a partitioned table, use the following SQL statement: +To split Regions for the primary key of all partitions in a partitioned table, use the following SQL statement: ```sql SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS ; ``` -This example splits each partition's primary key range into `` regions between the specified boundary values. +This example splits each partition's primary key range into `` Regions within the specified boundaries. -To split regions for the secondary index of all partitions in a partitioned table, use the following SQL statement: +To split Regions for a secondary index of all partitions in a partitioned table, use the following SQL statement: ```sql SPLIT PARTITION TABLE employees INDEX `idx_employees_on_store_id` BETWEEN (1) AND (1000) REGIONS ; ``` -##### Step 6. (Optional) When adding a new partition, you need to manually split regions for its primary key and indexes +##### (Optional) Step 6. Manually split Regions when adding a new partition + +When you add a partition, you can manually split Regions for its primary key and indexes. ```sql ALTER TABLE employees ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); @@ -568,46 +589,46 @@ SHOW TABLE employees PARTITION (p4) regions; #### Advantages -Queries using **Point Get** or **Table Range Scan** do not need additional lookups, resulting in better read performance. +Queries using **Point Get** or **Table Range Scan** do not require additional lookups, which improves read performance. #### Disadvantages -Manual region splitting is required when creating new partitions, increasing operational complexity. +You must manually split Regions when you create new partitions, which increases operational complexity. #### Suitable scenarios -It is suitable when low-latency point queries are important and operational resources are available to manage region splitting. +Use clustered partitioned tables when low-latency point queries are critical and you can manage manual Region splitting. #### Best practices -To address hotspot issues caused by new range partitions, you can perform the steps described in [Best practices for non-clustered partitioned tables](#best-practices). +To mitigate hotspot issues caused by new Range partitions, follow the steps in [Best practices for non-clustered partitioned tables](#best-practices). ### Solutions for clustered non-partitioned tables #### Advantages -- No hotspot risks from new range partitions. -- Provides good read performance for point and range queries. +- No hotspot risk from new Range partitions. +- Good read performance for point and range queries. #### Disadvantages -You cannot use `DROP PARTITION` to clean up large volumes of old data to improve deletion efficiency. +You cannot use `DROP PARTITION` to efficiently delete large volumes of historical data. #### Suitable scenarios -It is suitable for scenarios that require stable performance and do not benefit from partition-based data management. 
+Use clustered non-partitioned tables when you require stable performance and do not need partition-based data lifecycle management. ## Convert between partitioned and non-partitioned tables -When working with large tables (for example, a table with 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: +For large tables, such as those with 120 million rows, you might need to convert between partitioned and non-partitioned schemas for performance tuning or schema redesign. TiDB supports the following approaches: - [Pipelined DML](/pipelined-dml.md): `INSERT INTO ... SELECT ...` - [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md): `IMPORT INTO ... FROM SELECT ...` -- [Online DDL](/dm/feature-online-ddl.md): Direct schema transformation via `ALTER TABLE` +- [Online DDL](/dm/feature-online-ddl.md): direct schema transformation using `ALTER TABLE` -This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. +This section compares the efficiency and implications of these methods for both conversion directions and provides best practice recommendations. -### Table schema for a partitioned table: `fa` +### Partitioned table schema: `fa` ```sql CREATE TABLE `fa` ( @@ -626,11 +647,10 @@ PARTITION BY RANGE (`date`) PARTITION `fa_2024002` VALUES LESS THAN (2025002), PARTITION `fa_2024003` VALUES LESS THAN (2025003), ... -... PARTITION `fa_2024365` VALUES LESS THAN (2025365)); ``` -### Table schema for a non-partitioned table: `fa_new` +### Non-partitioned table schema: `fa_new` ```sql CREATE TABLE `fa_new` ( @@ -646,7 +666,7 @@ CREATE TABLE `fa_new` ( ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin; ``` -These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. +These examples demonstrate converting a partitioned table to a non-partitioned table. The same methods apply when converting a non-partitioned table to a partitioned table. ### Method 1: Pipelined DML `INSERT INTO ... SELECT` @@ -671,16 +691,16 @@ Records: 120000000, ID: c1d04eec-fb49-49bb-af92-bf3d6e2d3d87 ### Method 3: Online DDL -The following SQL statement converts from a partition table to a non-partitioned table: +The following SQL statement converts a partitioned table to a non-partitioned table: ```sql SET @@global.tidb_ddl_REORGANIZE_worker_cnt = 16; SET @@global.tidb_ddl_REORGANIZE_batch_size = 4096; ALTER TABLE fa REMOVE PARTITIONING; --- real 170m12.024 s (≈ 2 h 50 m) +-- Actual time: 170m 12.024s (approximately 2h 50m) ``` -The following SQL statement converts from a non-partition table to a partitioned table: +The following SQL statement converts a non-partitioned table to a partitioned table: ```sql SET @@global.tidb_ddl_REORGANIZE_worker_cnt = 16; @@ -697,11 +717,11 @@ Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) ### Findings -The following table show the time taken by each method. +The following table shows the time taken by each method for a 120-million-row table: -| Method | Time Taken | +| Method | Time taken | |--------|------------| -| Method 1: Pipelined DML: `INSERT INTO ... SELECT ...` | 58 m 42 s | -| Method 2: `IMPORT INTO ... 
FROM SELECT ...` | 16 m 59 s | -| Method 3: Online DDL (From partition table to non-partitioned table) | 2 h 50 m | -| Method 3: Online DDL (From non-partition table to partitioned table) | 2 h 31 m | +| Method 1: Pipelined DML (`INSERT INTO ... SELECT ...`) | 58m 42s | +| Method 2: `IMPORT INTO ... FROM SELECT ...` | 16m 59s | +| Method 3: Online DDL (from partitioned to non-partitioned table) | 2h 50m | +| Method 3: Online DDL (from non-partitioned to partitioned table) | 2h 31m | From 35e9b00aa0a39d0cb9e1065a1739f1e0567f7806 Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Mon, 19 Jan 2026 19:10:28 +0800 Subject: [PATCH 84/84] Update best-practices/tidb-partitioned-tables-best-practices.md --- best-practices/tidb-partitioned-tables-best-practices.md | 1 - 1 file changed, 1 deletion(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 5495becc7b933..68caafe5b9804 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -409,7 +409,6 @@ CREATE TABLE server_info ( PARTITION BY KEY (id) PARTITIONS 16; ``` -### Advantages ### Benefits Partitioned tables provide the following benefits: