best-practices: add a new guide on mastering partitioned tables #21846
Conversation
- Query optimization with partition pruning
- Performance comparison: non-partitioned vs local index vs global index
- Data cleanup efficiency: TTL vs `DROP PARTITION`
- Partition drop performance: local index vs global index
- Strategies to mitigate write hotspot issues with hash/key partitioning
- Partition management challenges and best practices
- Avoiding read/write hotspots on new partitions
- Using `PRE_SPLIT_REGIONS`, `SHARD_ROW_ID_BITS`, and region splitting
- Converting between partitioned and non-partitioned tables
- Batch DML, Pipelined DML, `IMPORT INTO`, and Online DDL efficiency comparison
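As a minimal sketch of the partition-pruning topic listed above (the table, column, and partition names here are hypothetical, not taken from the guide):

```sql
-- Hypothetical Range-partitioned table. The partitioning column must be
-- part of the clustered primary key.
CREATE TABLE orders (
    id BIGINT NOT NULL,
    created DATE NOT NULL,
    PRIMARY KEY (id, created)
)
PARTITION BY RANGE (YEAR(created)) (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025)
);

-- With a predicate on the partitioning column, TiDB can prune partitions;
-- the EXPLAIN output typically shows only partition p2024 being scanned.
EXPLAIN SELECT * FROM orders WHERE created >= '2024-01-01';
```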
Summary of Changes

Hello @shaoxiqian, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a comprehensive guide on mastering partitioned tables in TiDB. It covers critical aspects such as optimizing query performance through partition pruning and global indexes, and efficiently managing bulk data deletion.
Code Review
This pull request adds a comprehensive guide on using partitioned tables in TiDB. The document covers various aspects from query optimization and data cleanup to mitigating hotspots and managing partitions. My review focuses on improving clarity, consistency, and adherence to the documentation style guide. I've suggested changes to use sentence case for headings, ensure consistent terminology for 'global index' and 'local index', and improve phrasing for better readability. I also found some minor typos and inconsistencies in code examples that could confuse readers.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…p/docs into Mastering-TiDB-Partitioned-Tables
/retest
Co-authored-by: Lilian Lee <lilin@pingcap.com>
Co-authored-by: Mattias Jonsson <mjonss@users.noreply.github.com>
Context: `) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;`

Suggested change:

```diff
- These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table.
+ These examples demonstrate converting a partitioned table to a non-partitioned table. The same methods apply when converting a non-partitioned table to a partitioned table.
```
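A hedged sketch of both conversion directions using online DDL (table and column names are illustrative; `ALTER TABLE ... REMOVE PARTITIONING` and `ALTER TABLE ... PARTITION BY` are supported in recent TiDB versions):

```sql
-- Partitioned -> non-partitioned, as an online DDL operation.
ALTER TABLE fa REMOVE PARTITIONING;

-- Non-partitioned -> partitioned: the reverse direction uses the same
-- statement family. The column day_id is a hypothetical partition key.
ALTER TABLE fa_new PARTITION BY RANGE (day_id) (
    PARTITION fa_2024001 VALUES LESS THAN (2024002),
    PARTITION fa_2024002 VALUES LESS THAN (2024003)
);
```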
Context: `PARTITION \`fa_2024365\` VALUES LESS THAN (2025365));`

Suggested change:

```diff
- ### Table schema for a non-partitioned table: `fa_new`
+ ### Non-partitioned table schema: `fa_new`
```
Context:

> PARTITION `fa_2024002` VALUES LESS THAN (2025002),
> PARTITION `fa_2024003` VALUES LESS THAN (2025003),
> ...

Suggested change (remove the duplicated `...` placeholder line):

```diff
- ...
```
Context: "This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations."

Suggested change:

```diff
- ### Table schema for a partitioned table: `fa`
+ ### Partitioned table schema: `fa`
```
Context:

> - [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md): `IMPORT INTO ... FROM SELECT ...`
> - [Online DDL](/dm/feature-online-ddl.md): Direct schema transformation via `ALTER TABLE`

Suggested change:

```diff
- This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations.
+ This section compares the efficiency and implications of these methods for both conversion directions and provides best practice recommendations.
```
Context:

> - [Pipelined DML](/pipelined-dml.md): `INSERT INTO ... SELECT ...`
> - [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md): `IMPORT INTO ... FROM SELECT ...`

Suggested change:

```diff
- - [Online DDL](/dm/feature-online-ddl.md): Direct schema transformation via `ALTER TABLE`
+ - [Online DDL](/dm/feature-online-ddl.md): direct schema transformation using `ALTER TABLE`
```
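A hedged sketch of the `IMPORT INTO ... FROM SELECT` route mentioned above (table names are illustrative; the statement requires a recent TiDB version):

```sql
-- Physically import query results into a pre-created target table.
-- fa and fa_new are hypothetical source and target tables.
IMPORT INTO fa_new FROM SELECT * FROM fa;
```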
Context: `## Convert between partitioned and non-partitioned tables`

Suggested change:

```diff
- When working with large tables (for example, a table with 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations:
+ For large tables, such as those with 120 million rows, you might need to convert between partitioned and non-partitioned schemas for performance tuning or schema redesign. TiDB supports the following approaches:
```
Context: `#### Suitable scenarios`

Suggested change:

```diff
- It is suitable for scenarios that require stable performance and do not benefit from partition-based data management.
+ Use clustered non-partitioned tables when you require stable performance and do not need partition-based data lifecycle management.
```
Context: `#### Disadvantages`

Suggested change:

```diff
- You cannot use `DROP PARTITION` to clean up large volumes of old data to improve deletion efficiency.
+ You cannot use `DROP PARTITION` to efficiently delete large volumes of historical data.
```
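For contrast, a minimal sketch of partition-based cleanup on a partitioned table (table and partition names are hypothetical):

```sql
-- Dropping a whole partition is a metadata-level operation and is far
-- cheaper than deleting the same rows with DELETE.
ALTER TABLE fa DROP PARTITION fa_2024001;
```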
Suggested change:

```diff
- - No hotspot risks from new range partitions.
- - Provides good read performance for point and range queries.
+ - No hotspot risk from new Range partitions.
+ - Good read performance for point and range queries.
```
Context: `#### Best practices`

Suggested change:

```diff
- To address hotspot issues caused by new range partitions, you can perform the steps described in [Best practices for non-clustered partitioned tables](#best-practices).
+ To mitigate hotspot issues caused by new Range partitions, follow the steps in [Best practices for non-clustered partitioned tables](#best-practices).
```
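A hedged sketch of pre-splitting Regions for a newly added partition, using TiDB's `SPLIT REGION` statement (table name, partition name, and boundary values are hypothetical):

```sql
-- Split the new partition's data Regions across the expected key range
-- before traffic arrives, so writes do not all land on a single Region.
SPLIT PARTITION TABLE fa PARTITION (fa_2025001)
    BETWEEN (0) AND (1000000) REGIONS 16;
```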
Context: `#### Suitable scenarios`

Suggested change:

```diff
- It is suitable when low-latency point queries are important and operational resources are available to manage region splitting.
+ Use clustered partitioned tables when low-latency point queries are critical and you can manage manual Region splitting.
```
Context: `#### Disadvantages`

Suggested change:

```diff
- Manual region splitting is required when creating new partitions, increasing operational complexity.
+ You must manually split Regions when you create new partitions, which increases operational complexity.
```
Context: `#### Advantages`

Suggested change:

```diff
- Queries using **Point Get** or **Table Range Scan** do not need additional lookups, resulting in better read performance.
+ Queries using **Point Get** or **Table Range Scan** do not require additional lookups, which improves read performance.
```
Suggested change:

```diff
- - If the table is new and has no historical data, estimate the minimum and maximum values based on your business logic and expected data range.
- - For composite primary keys or composite indexes, only the leftmost column needs to be considered when deciding split boundaries.
- - If the leftmost column is a string, take string length and distribution into account to ensure even data spread.
+ - If the table has no historical data, estimate the minimum and maximum values based on business requirements and expected data ranges.
+ - For composite primary keys or composite indexes, use only the leftmost column to define split boundaries.
+ - If the leftmost column is a string, consider its length and value distribution to ensure even data distribution.
```
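The boundary-estimation steps above can be sketched as follows (table, column, and index names are hypothetical; the split range and Region count are illustrative):

```sql
-- Inspect the value range of the leftmost indexed column to choose
-- split boundaries from existing data.
SELECT MIN(user_id), MAX(user_id) FROM fa;

-- Then split the index Regions evenly across that range.
SPLIT TABLE fa INDEX idx_user BETWEEN (1) AND (10000000) REGIONS 32;
```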
Co-authored-by: Aolin <aolin.zhang@pingcap.com>
/approve
[APPROVAL NOTIFIER] This PR is APPROVED. This pull request has been approved by: hfxsd. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Approvers can indicate their approval by writing `/approve` in a comment.
In response to a cherrypick label: new pull request created to branch.
First-time contributors' checklist
What is changed, added or deleted? (Required)
Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions.
What is the related PR or file link(s)?
Do your changes match any of the following descriptions?