Conversation

@shaoxiqian commented Sep 28, 2025

  • Query optimization with partition pruning
  • Performance comparison: Non-Partitioned vs Local Index vs Global Index
  • Data cleanup efficiency: TTL vs DROP PARTITION
  • Partition drop performance: Local Index vs Global Index
  • Strategies to mitigate write hotspot issues with hash/key partitioning
  • Partition management challenges and best practices
    • Avoiding read/write hotspots on new partitions
    • Using PRE_SPLIT_REGIONS, SHARD_ROW_ID_BITS, and region splitting
  • Converting between partitioned and non-partitioned tables
  • Batch DML, Pipelined DML, IMPORT INTO, and Online DDL efficiency comparison

First-time contributors' checklist

What is changed, added or deleted? (Required)

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v9.0 (TiDB 9.0 versions)
  • v8.5 (TiDB 8.5 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

@ti-chi-bot added the missing-translation-status (This PR does not have translation status info.) label on Sep 28, 2025
@gemini-code-assist (Contributor)

Summary of Changes

Hello @shaoxiqian, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request introduces a comprehensive guide on mastering partitioned tables in TiDB. It covers critical aspects such as optimizing query performance through partition pruning and global indexes, efficiently managing bulk data deletion with DROP PARTITION versus TTL, mitigating write hotspots using hash/key partitioning, and best practices for managing new range partitions. The guide also provides a detailed comparison of methods for converting tables between partitioned and non-partitioned states, offering practical insights and recommendations for effective TiDB environment management.

Highlights

  • Query Optimization: Details how partition pruning enhances query efficiency and compares performance across non-partitioned, local index, and global index configurations, providing insights into when to use each.
  • Bulk Data Deletion: Explores the efficiency of data cleanup using TTL versus direct partition drops, highlighting the significant performance advantages of DROP PARTITION and its implications for local vs. global indexes.
  • Write Hotspot Mitigation: Discusses strategies, particularly hash/key partitioning, to alleviate write hotspot issues caused by monotonically increasing primary keys, improving load distribution and throughput.
  • Partition Management: Addresses common challenges like read and write hotspots in new range partitions and provides solutions using SHARD_ROW_ID_BITS, PRE_SPLIT_REGIONS, and region splitting for both NONCLUSTERED and CLUSTERED tables.
  • Table Conversion Methods: Compares the efficiency of various methods (Batch DML, Pipeline DML, IMPORT INTO, Online DDL) for converting between partitioned and non-partitioned tables, offering recommendations for optimal performance.
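
As a rough, hypothetical sketch of the techniques the guide covers (table, column, and partition names here are invented, not taken from the guide), hotspot mitigation and partition-based cleanup in TiDB look approximately like this:

```sql
-- Spread writes for a NONCLUSTERED table across Regions at creation time
-- using SHARD_ROW_ID_BITS and PRE_SPLIT_REGIONS:
CREATE TABLE logs (
    id BIGINT NOT NULL,
    created_day INT NOT NULL,
    payload VARCHAR(255),
    PRIMARY KEY (id, created_day) NONCLUSTERED
)
SHARD_ROW_ID_BITS = 4
PRE_SPLIT_REGIONS = 4
PARTITION BY RANGE (created_day) (
    PARTITION p2025001 VALUES LESS THAN (2025002),
    PARTITION p2025002 VALUES LESS THAN (2025003)
);

-- Remove an entire day of data in one metadata operation instead of
-- deleting rows one by one (much faster than TTL-style row deletion):
ALTER TABLE logs DROP PARTITION p2025001;
```
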

@ti-chi-bot added the size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.) label on Sep 28, 2025

gemini-code-assist bot left a comment

Code Review

This pull request adds a comprehensive guide on using partitioned tables in TiDB. The document covers various aspects from query optimization and data cleanup to mitigating hotspots and managing partitions. My review focuses on improving clarity, consistency, and adherence to the documentation style guide. I've suggested changes to use sentence case for headings, ensure consistent terminology for 'global index' and 'local index', and improve phrasing for better readability. I also found some minor typos and inconsistencies in code examples that could confuse readers.

@shaoxiqian changed the title from "adds a new guide on mastering partitioned tables in TiDB" to "add a new guide on mastering partitioned tables in TiDB" on Sep 28, 2025
shaoxiqian and others added 6 commits September 28, 2025 13:13
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…p/docs into Mastering-TiDB-Partitioned-Tables
@shaoxiqian (Contributor Author)

/retest

shaoxiqian and others added 4 commits September 29, 2025 10:34
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Lilian Lee <lilin@pingcap.com>
shaoxiqian and others added 2 commits September 29, 2025 20:13
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mattias Jonsson <mjonss@users.noreply.github.com>
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
```

These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table.

Suggested change
These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table.
These examples demonstrate converting a partitioned table to a non-partitioned table. The same methods apply when converting a non-partitioned table to a partitioned table.

PARTITION `fa_2024365` VALUES LESS THAN (2025365));
```

### Table schema for a non-partitioned table: `fa_new`

Suggested change
### Table schema for a non-partitioned table: `fa_new`
### Non-partitioned table schema: `fa_new`

PARTITION `fa_2024002` VALUES LESS THAN (2025002),
PARTITION `fa_2024003` VALUES LESS THAN (2025003),
...
...

Suggested change
...


This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations.

### Table schema for a partitioned table: `fa`

Suggested change
### Table schema for a partitioned table: `fa`
### Partitioned table schema: `fa`

- [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md): `IMPORT INTO ... FROM SELECT ...`
- [Online DDL](/dm/feature-online-ddl.md): Direct schema transformation via `ALTER TABLE`

This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations.

Suggested change
This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations.
This section compares the efficiency and implications of these methods for both conversion directions and provides best practice recommendations.


- [Pipelined DML](/pipelined-dml.md): `INSERT INTO ... SELECT ...`
- [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md): `IMPORT INTO ... FROM SELECT ...`
- [Online DDL](/dm/feature-online-ddl.md): Direct schema transformation via `ALTER TABLE`

Suggested change
- [Online DDL](/dm/feature-online-ddl.md): Direct schema transformation via `ALTER TABLE`
- [Online DDL](/dm/feature-online-ddl.md): direct schema transformation using `ALTER TABLE`


## Convert between partitioned and non-partitioned tables

When working with large tables (for example, a table with 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations:

Suggested change
When working with large tables (for example, a table with 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations:
For large tables, such as those with 120 million rows, you might need to convert between partitioned and non-partitioned schemas for performance tuning or schema redesign. TiDB supports the following approaches:
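
For illustration, and assuming the `fa`/`fa_new` schemas discussed in this PR, two of these conversion methods might be invoked as follows. This is a sketch of the general shape of each statement, not a tuned procedure:

```sql
-- Pipelined DML: stream rows into the target table without accumulating
-- one huge transaction in memory:
SET SESSION tidb_dml_type = "bulk";
INSERT INTO fa_new SELECT * FROM fa;

-- IMPORT INTO ... FROM SELECT: typically faster for full-table copies,
-- but the target table is unavailable for other writes during the import:
IMPORT INTO fa_new FROM SELECT * FROM fa;
```
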


#### Suitable scenarios

It is suitable for scenarios that require stable performance and do not benefit from partition-based data management.

Suggested change
It is suitable for scenarios that require stable performance and do not benefit from partition-based data management.
Use clustered non-partitioned tables when you require stable performance and do not need partition-based data lifecycle management.


#### Disadvantages

You cannot use `DROP PARTITION` to clean up large volumes of old data to improve deletion efficiency.

Suggested change
You cannot use `DROP PARTITION` to clean up large volumes of old data to improve deletion efficiency.
You cannot use `DROP PARTITION` to efficiently delete large volumes of historical data.

Comment on lines 589 to 590
- No hotspot risks from new range partitions.
- Provides good read performance for point and range queries.

Suggested change
- No hotspot risks from new range partitions.
- Provides good read performance for point and range queries.
- No hotspot risk from new Range partitions.
- Good read performance for point and range queries.


#### Best practices

To address hotspot issues caused by new range partitions, you can perform the steps described in [Best practices for non-clustered partitioned tables](#best-practices).

@Oreoxmt, Jan 8, 2026


Suggested change
To address hotspot issues caused by new range partitions, you can perform the steps described in [Best practices for non-clustered partitioned tables](#best-practices).
To mitigate hotspot issues caused by new Range partitions, follow the steps in [Best practices for non-clustered partitioned tables](#best-practices).


#### Suitable scenarios

It is suitable when low-latency point queries are important and operational resources are available to manage region splitting.

Suggested change
It is suitable when low-latency point queries are important and operational resources are available to manage region splitting.
Use clustered partitioned tables when low-latency point queries are critical and you can manage manual Region splitting.


#### Disadvantages

Manual region splitting is required when creating new partitions, increasing operational complexity.

Suggested change
Manual region splitting is required when creating new partitions, increasing operational complexity.
You must manually split Regions when you create new partitions, which increases operational complexity.


#### Advantages

Queries using **Point Get** or **Table Range Scan** do not need additional lookups, resulting in better read performance.

Suggested change
Queries using **Point Get** or **Table Range Scan** do not need additional lookups, resulting in better read performance.
Queries using **Point Get** or **Table Range Scan** do not require additional lookups, which improves read performance.

Comment on lines 529 to 531
- If the table is new and has no historical data, estimate the minimum and maximum values based on your business logic and expected data range.
- For composite primary keys or composite indexes, only the leftmost column needs to be considered when deciding split boundaries.
- If the leftmost column is a string, take string length and distribution into account to ensure even data spread.

Suggested change
- If the table is new and has no historical data, estimate the minimum and maximum values based on your business logic and expected data range.
- For composite primary keys or composite indexes, only the leftmost column needs to be considered when deciding split boundaries.
- If the leftmost column is a string, take string length and distribution into account to ensure even data spread.
- If the table has no historical data, estimate the minimum and maximum values based on business requirements and expected data ranges.
- For composite primary keys or composite indexes, use only the leftmost column to define split boundaries.
- If the leftmost column is a string, consider its length and value distribution to ensure even data distribution.
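
Applied with TiDB's `SPLIT TABLE` statement, the boundary guidance above might look like the following sketch (the index name and boundary values are hypothetical, chosen only to illustrate the leftmost-column rule):

```sql
-- Pre-split Regions by the leftmost primary key column, using estimated
-- minimum and maximum values for that column:
SPLIT TABLE fa BETWEEN (1) AND (120000000) REGIONS 16;

-- For a secondary index, split by the index's leftmost column instead:
SPLIT TABLE fa INDEX idx_day BETWEEN (2025001) AND (2025365) REGIONS 8;
```
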

Co-authored-by: Aolin <aolin.zhang@pingcap.com>
@ti-chi-bot added the lgtm label and removed the needs-1-more-lgtm (Indicates a PR needs 1 more LGTM.) label on Jan 12, 2026

ti-chi-bot bot commented Jan 12, 2026

[LGTM Timeline notifier]

Timeline:

  • 2025-10-03 08:14:25.172266392 +0000 UTC m=+418045.428997792: ✖️🔁 reset by dveeden.
  • 2025-10-14 11:57:15.919784395 +0000 UTC m=+181741.997036955: ✖️🔁 reset by dveeden.
  • 2025-10-15 07:23:04.318583067 +0000 UTC m=+251690.395835627: ☑️ agreed by dveeden.
  • 2026-01-12 03:25:09.775101671 +0000 UTC m=+241553.836966581: ☑️ agreed by Oreoxmt.


hfxsd commented Jan 19, 2026

/approve


ti-chi-bot bot commented Jan 19, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hfxsd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot added the approved label on Jan 19, 2026
@ti-chi-bot merged commit 92729ec into master on Jan 19, 2026
9 checks passed
@ti-chi-bot (Member)

In response to a cherrypick label: new pull request created to branch release-8.5: #22348.


Labels

  • approved
  • area/best-practices: Adds or updates TiDB best practices.
  • lgtm
  • needs-cherry-pick-release-8.5: Should cherry pick this PR to release-8.5 branch.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
  • translation/doing: This PR's assignee is translating this PR.
