-
Notifications
You must be signed in to change notification settings - Fork 7
Control Plane support for initializing the diskless log #480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
3827745 to
79bea61
Compare
Introduces control plane support for initializing diskless logs, enabling migrating topics from classic to diskless. The initialization transfers log offsets and producer state.
79bea61 to
1329854
Compare
| CREATE DOMAIN leader_epoch_t AS INT NOT NULL | ||
| CHECK (VALUE >= 0); | ||
|
|
||
| ALTER TABLE logs ADD COLUMN diskless_start_offset offset_t DEFAULT 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we will have 0 as default here, how we will further distinguish the topics that are in process of migration and those that are created as diskless for the start? Will we use this offset for routing the fetch logic to tiered vs diskless? If so then will 0 mean that there is no tiered data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we will have 0 as default here, how we will further distinguish the topics that are in process of migration and those that are created as diskless for the start?
Once the entry in the logs table is present, it means the migration was successful. All the necessary information transfer are made within the same transaction (producer state, high watermark, ...)
Will we use this offset for routing the fetch logic to tiered vs diskless? If so then will 0 mean that there is no tiered data?
Yes exactly, diskless_start_offset represents the boundary B0 of the design doc. So diskless_start_offset == 0 means no tiered data is present. This field will then be updated after the merge to TS has been performed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But will it be 0 after the migration initialization for tiered partition right away since it's the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I get it now. I thought for a moment that this default will be set for all the diskless + hybrid/migrated topics. But do I now understand correctly that it will be set only for the partitions that were created as diskless from the very beginning?
| UPDATE logs | ||
| SET log_start_offset = l_request.log_start_offset, | ||
| high_watermark = l_request.diskless_start_offset, | ||
| diskless_start_offset = l_request.diskless_start_offset, | ||
| leader_epoch_at_init = l_request.leader_epoch | ||
| WHERE topic_id = l_request.topic_id | ||
| AND partition = l_request.partition; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we actually do updates here? I suspect that if we use diskless_log_start for routing decisions then it could be already observed by one of the nodes so the routing decision might be incorrect in this case
| long logStartOffset = 0; | ||
| long highWatermark = 0; | ||
| long byteSize = 0; | ||
| Long disklessStartOffset = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But for PostgresControlPlane it is 0, right? Should not they be the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, i'll fix this
| INTO l_existing_log | ||
| FROM logs | ||
| WHERE topic_id = l_request.topic_id | ||
| AND partition = l_request.partition; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to lock here with FOR UPDATE?
| FOR l_request IN | ||
| SELECT * | ||
| FROM unnest(arg_requests) | ||
| LOOP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also check here that
l_request.log_start_offset <= l_request.diskless_start_offset
?
Introduces control plane support for initializing diskless logs, enabling migrating topics from classic to diskless. The initialization transfers log offsets and producer state.