design: builtin schema migration #33863
Conversation
SangJunBak
left a comment
Everything makes sense! Just some edge cases and an alternative
> To avoid data loss and other surprises caused by automatic builtin schema migrations, we introduce the concept of explicit migration instructions.
> A migration instruction instructs the process which builtin collection to migrate at which version, and which mechanism to use.
>
> Migration instructions are kept in a hard-coded list:
How do we handle the cases where we write Mechanism::Evolution for a schema and it's not backwards compatible?
> How do we handle the cases where we write `Mechanism::Evolution` for a schema and it's not backwards compatible?
That would be a bug. The migration by evolution mechanism checks the new schema for backward compatibility and panics if it is not backward compatible.
Thinking of this edge case where we upgrade from 0.148.1 to 0.149.
Hm, yeah this scheme depends on the MIGRATIONS list always being complete and not missing any migrations performed at previous versions. Inserting incompatible migrations retroactively isn't allowed. Inserting compatible migrations is fine, but maybe we still want to disallow them, to not muddy the waters too much. Given that patch releases should be reserved for critical bug fixes, I don't think disallowing builtin schema changes in them is unreasonable.
There is the question of whether we can prevent people from accidentally adding schema migrations retroactively. Your scenario is only possible because semver allows arbitrarily inserting new versions between existing versions. We could go the way of the protobuf migrations and have a BUILTIN_SCHEMA_VERSION: u64 that we increase every time we change a builtin schema. That would leave no way to insert new versions between existing versions and the additional complexity from tracking an additional version seems low enough.
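For illustration, the hard-coded list plus such a builtin schema version counter could look roughly like this (all names and values here are made up, not the design's actual types):

```rust
/// Hypothetical counter that is bumped every time a builtin schema changes,
/// analogous to the protobuf migrations' version counter.
pub const BUILTIN_SCHEMA_VERSION: u64 = 42;

/// Which mechanism to use to migrate a single builtin collection.
pub enum Mechanism {
    /// In-place schema evolution; requires the new schema to be backward
    /// compatible with every schema previously registered with the shard.
    Evolution,
    /// Replace the shard with a newly minted one.
    Replacement,
}

/// One explicit migration instruction, roughly as described in the design:
/// which builtin collection to migrate, at which version, with which mechanism.
pub struct Migration {
    /// Name of the builtin collection (placeholder for the real identifier type).
    pub name: &'static str,
    /// The builtin schema version at which this migration applies.
    pub at_version: u64,
    /// How to perform the migration.
    pub mechanism: Mechanism,
}

/// The hard-coded, append-only list of migrations.
pub static MIGRATIONS: &[Migration] = &[
    // Example entry (made up):
    // Migration { name: "mz_catalog.mz_objects", at_version: 42, mechanism: Mechanism::Evolution },
];
```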
> Inserting compatible migrations is fine, but maybe we still want to disallow them, to not muddy the waters too much. Given that patch releases should be reserved for critical bug fixes, I don't think disallowing builtin schema changes in them is unreasonable.
Among other things, it would be a semver break, since we'd end up removing API surface in a minor release.
> There is the question of whether we can prevent people from accidentally adding schema migrations retroactively.
I think we can do a decent amount of checking here in CI... verify that the list of migrations in the last release is consistent with the current one. I think we'll want similar CI to ensure that we don't prematurely drop migrations, for example.
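As a sketch of what that CI check could look like, assuming the hypothetical `MIGRATIONS` shape from the sketch above and a snapshot of the previous release's list checked in as test data:

```rust
#[cfg(test)]
mod migration_list_tests {
    use super::MIGRATIONS;

    /// Hypothetical snapshot of (name, version) pairs from the last release,
    /// e.g. generated when cutting that release and committed to the repo.
    const PREVIOUS_RELEASE_MIGRATIONS: &[(&str, u64)] = &[
        // ("mz_catalog.mz_objects", 42),
    ];

    /// The current list must start with exactly the previous release's entries,
    /// so nothing can be inserted retroactively. A real check would also have to
    /// allow dropping entries once their versions fall out of the support window.
    #[test]
    fn migrations_list_is_append_only() {
        assert!(MIGRATIONS.len() >= PREVIOUS_RELEASE_MIGRATIONS.len());
        for (current, (name, version)) in MIGRATIONS.iter().zip(PREVIOUS_RELEASE_MIGRATIONS) {
            assert_eq!(current.name, *name);
            assert_eq!(current.at_version, *version);
        }
    }
}
```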
Good point about CI! I think the tests don't even have to look at the migrations list specifically; they can just attempt all possible migration paths, and in Jun's example the migration from 0.148.1 to 0.149.0 would fail. The only wrinkle is that you'd only find the issue in the CI for 0.149.0, and only if you re-run it after cutting 0.148.1. So we might need to add a step when cutting a new patch release that also checks if upgrading from that patch release to all existing higher versions succeeds.
> So we might need to add a step when cutting a new patch release that also checks if upgrading from that patch release to all existing higher versions succeeds.
Good idea to add to the testing plan/suite, and it makes sense to me. Will make a note.
I think that totally works as well, but the cost is very different... testing all possible upgrade paths for N versions is O(2^N), while testing that each version's migration list is a superset of the previous is O(N). Maybe that's fine if we're paying that cost for other reasons!
Well, it's O(2^N) only if we test multi-step upgrades (i.e., when a test scenario can involve more than 2 versions). It's only O(N^2) if each test scenario just upgrades from one version to another. The latter is enough if upgrading from A to B can't be affected by what version we upgraded to A from.
In addition to what Gabor said, I think we can get away with only upgrading to and from the version we are currently testing. E.g. in the CI of version 0.148.1, we test upgrading each smaller supported version to 0.148.1 and test upgrading 0.148.1 to each larger existing version. The reasoning is that we don't need to test upgrading to/from other versions because we have already done that in the CI of those versions. In which case the effort would be O(N).
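A quick sketch of how the set of upgrade scenarios for the version under test could be derived under this scheme (the version numbers are just examples):

```rust
/// Upgrade pairs (from, to) exercised in the CI of `current`: every smaller
/// supported version upgrading into `current`, plus `current` upgrading into
/// every larger version that already exists. O(N) scenarios per version.
fn upgrade_pairs_for<'a>(current: &'a str, all_versions: &'a [&'a str]) -> Vec<(&'a str, &'a str)> {
    let pos = all_versions
        .iter()
        .position(|v| *v == current)
        .expect("current version must be in the list");
    let mut pairs = Vec::new();
    for older in &all_versions[..pos] {
        pairs.push((*older, current));
    }
    for newer in &all_versions[pos + 1..] {
        pairs.push((current, *newer));
    }
    pairs
}

fn main() {
    // Versions assumed sorted by semver; in the CI of 0.148.1 this yields
    // upgrades from 0.147.0 and 0.148.0 into 0.148.1, and from 0.148.1 into
    // the already-cut 0.149.0.
    let versions = ["0.147.0", "0.148.0", "0.148.1", "0.149.0"];
    println!("{:?}", upgrade_pairs_for("0.148.1", &versions));
}
```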
bkirwi
left a comment
Cool!!
> In the subsequent read-only bootstrap phase, the process creates persist read and write handles using the new schema.
> Read handles perform transparent migration of any data updates that flow through them, so dataflow hydration can proceed using the new schema.
> Write handles only require a matching registered shard schema when writing batches, which is something a read-only process doesn't do.
Two qualifiers:
- Right now we require a matching registered schema when creating the handle, though as discussed that can change.
- I seem to recall a case in txn-wal where we write empty batches to advance the frontier, and we'd need to make sure that those use the old schema or work differently. (That seemed sketchy to me at the time, so I feel good about changing it.)
Yep, I was pretending that we've already made the write handle change to not distract the reader unnecessarily, optimistically assuming that it's easy to implement and will be done before the design doc merges.
> I seem to recall a case in txn-wal where we write empty batches to advance the frontier, and we'd need to make sure that those use the old schema or work differently.
Interesting! So far I've only found that for replaced builtin table shards, we tick forward their frontiers. But we explicitly make sure that they don't get inserted into txn-wal while the environment is in read-only mode.
The relevant method is `DataSnapshot::unblock_read` - it's documented to do a CaA in some cases, and if you trace the callers you can see that quite a few methods which "look" read-only (snapshot-and-fetch, etc.) do end up calling it.
My memory is that those read-only-looking methods were sometimes called in read-only mode, but I haven't verified this anytime recently.
Personally I don't think `unblock_read` has to work that way. If this ends up being a real issue and not just a historical one, we can probably discuss options then...
> The general approach to shard replacement matches the existing implementation, but takes care to not needlessly interfere with processes at other versions.
>
> To support schema migration by shard replacement, we need a place to store new shard IDs across restarts, and we will keep using the migration shard for this.
> The migration shard contains entries of the form `(GlobalId, Version) -> ShardId`.
I wonder if we can use the catalog epoch or similar for this? Since being able to abort an upgrade and then upgrade to a smaller version is in scope, if we fail an upgrade from A to C, then successfully upgrade A to B and later B to C, it seems like we might end up with weird state that is partly from the failed early upgrade and partly from the later one.
I think it either needs to be the Mz version, or a new "builtin version" we introduce. In any case, the version needs to identify the schemas of the builtin collections. Otherwise how does a read-only process know if an entry it finds for, e.g. (catalog epoch + 1), was written by a different process with the same builtin schemas, or by one with different schemas?
> if we fail an upgrade from A to C, then successfully upgrade A to B and later B to C, it seems like we might end up with weird state that is partly from the failed early upgrade and partly from the later one.
I think that should work out fine:
1. C starts read-only, mints new shard IDs, populates the migration shard with entries `(<gid>, C, <shard-C>)`. Maybe starts to hydrate but then is aborted.
2. B starts read-only, mints new shard IDs, populates the migration shard with entries `(<gid>, B, <shard-B>)`. Hydrates successfully and promotes.
3. Upon B's promotion the `<shard-B>` shards become the "official" shards, the shards they replace get finalized.
4. C starts read-only again, finds existing `<shard-C>` entries in the migration shard, so doesn't mint new ones. C hydrates successfully and promotes.
5. Upon C's promotion the `<shard-C>` shards become the "official" shards, the `<shard-B>` shards get finalized.
The above assumes that in version C we have to replace the same shards as in version B (i.e. we made two incompatible schema changes). More likely, version C doesn't have to perform any migration at all, or migrates different shards than B.
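To make the read-only bootstrap step concrete, a sketch of the lookup-or-mint logic that makes the restart/abort scenario above work out (all types and names are hypothetical stand-ins, not the actual storage-controller API):

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the real identifier types.
type GlobalId = String;
type Version = String; // e.g. "0.149.0"
type ShardId = String;

/// The migration shard's contents, keyed by (GlobalId, Version) as in the design.
type MigrationShard = BTreeMap<(GlobalId, Version), ShardId>;

/// During read-only bootstrap, reuse an existing entry for this (id, version) if
/// one was written by an earlier (possibly aborted) run at the same version;
/// otherwise mint a fresh shard ID and record it in the migration shard.
fn replacement_shard_for(
    migration_shard: &mut MigrationShard,
    id: GlobalId,
    version: Version,
    mint_shard_id: impl FnOnce() -> ShardId,
) -> ShardId {
    migration_shard
        .entry((id, version))
        .or_insert_with(mint_shard_id)
        .clone()
}
```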
Poking through, the specific field I had in mind was deploy_generation, which increases for every unique deploy. My understanding is that a deploy happens at a specific version, so it should be impossible for the schemas to change while the deploy generation remains the same.
> I think that should work out fine [...]
To be clear: I'm not worried about the schemas, but rather the actual contents of the shard. It seems like during your step 1, I might start writing data to the shard, then stop writing to it until step 4. This means we may end up with a shard whose semantics are difficult to explain. Today, the semantics of a history shard are at least straightforward to explain - it includes all events from a particular deploy after some arbitrary cutoff time. With this approach we'd also include little chunks of data from previous deploys that were otherwise not visible to any client.
OTOH, if the shards belong to a particular deploy, then if that deploy fails they would be ignored and the data in them wouldn't be observed by any future successful deploy.
> To be clear: I'm not worried about the schemas, but rather the actual contents of the shard. It seems like during your step 1, I might start writing data to the shard, then stop writing to it until step 4.
I think this is also mostly fine, at least given how things are currently working, because:
- For builtin storage-collections (including all the histories) we don't write them in the read-only environment, just advance their frontiers. So no previous contents to worry about here.
- For builtin tables, we truncate them when we start up (or emit a correcting diff, not quite sure). So the code deals with previous contents that way.
Note that we have to be able to handle migration restarts at the same version because envd can restart in read-only mode (and does so every time it observes new DDL from the leader env) and when it does it needs to not become confused running the migration again.
> My understanding is that a deploy happens at a specific version, so it should be impossible for the schemas to change while the deploy generation remains the same.
I'm not sure that's true! At least according to this doc the deploy generation is only increased during a leader promotion. So nothing prevents a user from starting an upgrade with deploy generation N + 1, then aborting that, then starting another upgrade with a different Mz version but again using deploy generation N + 1.
All that said, it seems like a good idea to key the migration shard not by deploy generation, but by (Mz version, deploy generation). The reason is that in theory a self-managed user could decide to start two upgrades to the same Mz version but at different deploy generations at the same time. Not saying that this would be a reasonable thing to do, but it is possible. And I imagine in this case we'd want to prevent the two deploys from sharing migration state, for sanity reasons.
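Concretely, the suggested key could look like this (hypothetical names):

```rust
// Hypothetical stand-ins for the real identifier types.
type GlobalId = String;
type MzVersion = String; // e.g. "0.149.0"

/// Keying the migration shard by both the Mz version and the deploy generation
/// keeps two concurrent deploys that target the same Mz version from sharing
/// migration state.
struct MigrationShardKey {
    global_id: GlobalId,
    mz_version: MzVersion,
    deploy_generation: u64,
}
```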
> ## Alternatives
>
> As an alternative to schema evolution, we can consider a migration scheme that creates a new shard and copies over all existing data from the old shard, performing the migration in the process.
> Doing so would enable us to perform arbitrary rewrites of the data, as well as breaking schema changes without loss of historical data.
It seems to me like this would be a natural extension of the approach you propose above. (Just another type of entry in the MIGRATIONS slice.) Happy to leave it for future work.
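E.g., building on the hypothetical `Mechanism` sketch further up in this conversation (again, made-up names rather than the design's actual types):

```rust
/// The alternative could simply become another migration mechanism.
pub enum Mechanism {
    /// In-place schema evolution (backward-compatible changes only).
    Evolution,
    /// Replace the shard with a freshly minted, initially empty one.
    Replacement,
    /// Create a new shard and copy all existing data over from the old one,
    /// applying an arbitrary rewrite in the process (future work).
    RewriteAndCopy,
}
```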
> ### Shard Replacement
>
> The general approach to shard replacement matches the existing implementation, but takes care to not needlessly interfere with processes at other versions.
Something I don't understand:
- For Persist-level migrations, the resulting shard contents will be the contents of the previous leader up until the new version took leadership, then anything we add after that.
- For shard replacements... what's the desired behaviour? This section makes it sound like we'd create the shards even in read-only mode, so shard replacements will include data from the read-only replicas, which is inconsistent. (It will contain logs of data that was never visible to the user.) But if we only create the new shards when taking leadership, why do we even need the migration shard?
Yes, we have to create the replacements in read-only mode because everything in adapter assumes that the builtin storage collections exist. There is also the issue that the entire point of read-only mode is that we can let dataflows hydrate before the cutover. But that means that for builtin collections with replaced shards, you have to write to these shards in read-only mode, or at least tick their frontiers forward, so that dataflows reading from these collections can make progress.
We have special code to tick forward the replacement shards:
materialize/src/storage-controller/src/lib.rs, lines 1173 to 1181 in 7ed8ee0:

    // In read-only mode, we use a special read-only table worker
    // that allows writing to migrated tables and will continually
    // bump their shard upper so that it tracks the txn shard upper.
    // We do this, so that they remain readable at a recent
    // timestamp, which in turn allows dataflows that depend on them
    // to (re-)hydrate.
    //
    // We only want to register migrated tables, though, and leave
    // existing tables out/never write to them in read-only mode.
For builtin tables, we even write out their contents:
materialize/src/adapter/src/coord.rs, lines 2204 to 2208 in 7ed8ee0:

    // When 0dt is enabled, we create new shards for any migrated builtin storage collections.
    // In read-only mode, the migrated builtin tables (which are a subset of migrated builtin
    // storage collections) need to be back-filled so that any dependent dataflow can be
    // hydrated. Additionally, these shards are not registered with the txn-shard, and cannot
    // be registered while in read-only, so they are written to directly.
So, yes, the inconsistency you point out exists. It also exists, regardless of read-only mode, with indexes on retained-history collections. Suppose a new Mz version has an optimizer or rendering change that changes the results of dataflow computations. An index with retained history will show the new results for times before the upgrade.
Got it! I don't love the inconsistency, but it sounds like it's no worse than stuff we're already doing, so I will not worry about it.
bkirwi
left a comment
Some lingering questions about the semantics of various collections, but I think this is a pretty clear step in the right direction - thanks!
SangJunBak
left a comment
LGTM on my end!
ggevay
left a comment
Makes sense to me too!
> * For each object to migrate, perform the migration using the selected mechanism.
>
> Note that merging `Evolution` migrations like this is sound because persist requires that each shard schema is backward compatible with any previous schema registered with the shard.
> Which means schema evolution will succeed even if we skip intermediary schemas.
(In other words, compatibility of schema changes is transitive.)
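To spell out the transitivity with a toy example (this is not persist's actual compatibility check, just an illustration of the shape of the argument):

```rust
/// Toy schema: an ordered list of (column name, nullable) pairs.
struct Schema {
    columns: Vec<(&'static str, bool)>,
}

/// Toy compatibility rule: a new schema may only keep all existing columns
/// unchanged and append nullable columns at the end.
fn backward_compatible(old: &Schema, new: &Schema) -> bool {
    new.columns.len() >= old.columns.len()
        && new.columns[..old.columns.len()] == old.columns[..]
        && new.columns[old.columns.len()..].iter().all(|(_, nullable)| *nullable)
}

fn main() {
    let v1 = Schema { columns: vec![("id", false)] };
    let v2 = Schema { columns: vec![("id", false), ("comment", true)] };
    let v3 = Schema { columns: vec![("id", false), ("comment", true), ("owner", true)] };

    // Each single step is compatible, and so is the direct jump from v1 to v3,
    // i.e. skipping the intermediary v2 schema is fine.
    assert!(backward_compatible(&v1, &v2));
    assert!(backward_compatible(&v2, &v3));
    assert!(backward_compatible(&v1, &v3));
}
```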
Go away with your smart words :D
TFTRs!

Motivation
Proposes a design to fix:
Tips for reviewer
Checklist
If this PR evolves an existing `$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.