-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Add snapsafety features to VMClock device #5564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
bchalios
wants to merge
8
commits into
main
Choose a base branch
from
vmclock-snapsafe
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+1,525
−91
Draft
Changes from all commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
1c5fe1c
vmclock ABI: add snapshot safety features
bchalios 40c6c86
vmclock: add snapshot safety features
bchalios 734f841
vmclock: test snapshot safety features
bchalios 5b6f3c8
vmclock: add support for Aarch64
bchalios 2fa64a6
ci: apply patches before build CI kernels
bchalios 7903cca
test(vmclock): enable VMClock for Aarch64 kernels
bchalios 99d68b0
test(vmclock): enable skipped tests
bchalios e419450
temp: point to CI bucket with VMClock kernels
bchalios File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
60 changes: 60 additions & 0 deletions
60
resources/patches/vmclock/5.10/0001-ptp-vmclock-add-vm-generation-counter.patch
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,60 @@ | ||
| From a46562c571c6d50e7afc3994b33d0ffb61ff7409 Mon Sep 17 00:00:00 2001 | ||
| From: Babis Chalios <[email protected]> | ||
| Date: Tue, 2 Dec 2025 20:11:32 +0000 | ||
| Subject: [PATCH 1/4] ptp: vmclock: add vm generation counter | ||
|
|
||
| Similar to live migration, loading a VM from some saved state (aka | ||
| snapshot) is also an event that calls for clock adjustments in the | ||
| guest. However, guests might want to take more actions as a response to | ||
| such events, e.g. as discarding UUIDs, resetting network connections, | ||
| reseeding entropy pools, etc. These are actions that guests don't | ||
| typically take during live migration, so add a new field in the | ||
| vmclock_abi called vm_generation_counter which informs the guest about | ||
| such events. | ||
|
|
||
| Hypervisor advertises support for vm_generation_counter through the | ||
| VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT flag. Users need to check the | ||
| presence of this bit in vmclock_abi flags field before using this flag. | ||
|
|
||
| Signed-off-by: Babis Chalios <[email protected]> | ||
| Reviewed-by: David Woodhouse <[email protected]> | ||
| --- | ||
| include/uapi/linux/vmclock-abi.h | 15 +++++++++++++++ | ||
| 1 file changed, 15 insertions(+) | ||
|
|
||
| diff --git a/include/uapi/linux/vmclock-abi.h b/include/uapi/linux/vmclock-abi.h | ||
| index d7ca44313bf8..75deb6ae2b27 100644 | ||
| --- a/include/uapi/linux/vmclock-abi.h | ||
| +++ b/include/uapi/linux/vmclock-abi.h | ||
| @@ -119,6 +119,12 @@ struct vmclock_abi { | ||
| * bit again after the update, using the about-to-be-valid fields. | ||
| */ | ||
| #define VMCLOCK_FLAG_TIME_MONOTONIC (1 << 7) | ||
| + /* | ||
| + * If the VM_GEN_COUNTER_PRESENT flag is set, the hypervisor will | ||
| + * bump the vm_generation_counter field every time the guest is | ||
| + * loaded from some save state (restored from a snapshot). | ||
| + */ | ||
| +#define VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT (1 << 8) | ||
|
|
||
| uint8_t pad[2]; | ||
| uint8_t clock_status; | ||
| @@ -183,6 +189,15 @@ struct vmclock_abi { | ||
| uint64_t time_frac_sec; /* (seconds >> 64) */ | ||
| uint64_t time_esterror_picosec; /* (± picoseconds) */ | ||
| uint64_t time_maxerror_picosec; /* (± picoseconds) */ | ||
| + | ||
| + /* | ||
| + * This field changes to another non-repeating value when the guest | ||
| + * has been loaded from a snapshot. In addition to handling a | ||
| + * disruption in time (which will also be signalled through the | ||
| + * disruption_marker field), a guest may wish to discard UUIDs, | ||
| + * reset network connections, reseed entropy, etc. | ||
| + */ | ||
| + uint64_t vm_generation_counter; | ||
| }; | ||
|
|
||
| #endif /* __VMCLOCK_ABI_H__ */ | ||
| -- | ||
| 2.34.1 | ||
|
|
257 changes: 257 additions & 0 deletions
257
resources/patches/vmclock/5.10/0002-ptp-vmclock-support-device-notifications.patch
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,257 @@ | ||
| From d0a6bf47dd6cd2a9ed17dbdc32dd34a6ba0f5b5f Mon Sep 17 00:00:00 2001 | ||
| From: Babis Chalios <[email protected]> | ||
| Date: Tue, 2 Dec 2025 20:11:44 +0000 | ||
| Subject: [PATCH 2/4] ptp: vmclock: support device notifications | ||
|
|
||
| Add optional support for device notifications in VMClock. When | ||
| supported, the hypervisor will send a device notification every time it | ||
| updates the seq_count to a new even value. | ||
|
|
||
| Moreover, add support for poll() in VMClock as a means to propagate this | ||
| notification to user space. poll() will return a POLLIN event to | ||
| listeners every time seq_count changes to a value different than the one | ||
| last seen (since open() or last read()/pread()). This means that when | ||
| poll() returns a POLLIN event, listeners need to use read() to observe | ||
| what has changed and update the reader's view of seq_count. In other | ||
| words, after a poll() returned, all subsequent calls to poll() will | ||
| immediately return with a POLLIN event until the listener calls read(). | ||
|
|
||
| The device advertises support for the notification mechanism by setting | ||
| flag VMCLOCK_FLAG_NOTIFICATION_PRESENT in vmclock_abi flags field. If | ||
| the flag is not present the driver won't setup the ACPI notification | ||
| handler and poll() will always immediately return POLLHUP. | ||
|
|
||
| Signed-off-by: Babis Chalios <[email protected]> | ||
| --- | ||
| drivers/ptp/ptp_vmclock.c | 130 ++++++++++++++++++++++++++++--- | ||
| include/uapi/linux/vmclock-abi.h | 5 ++ | ||
| 2 files changed, 126 insertions(+), 9 deletions(-) | ||
|
|
||
| diff --git a/drivers/ptp/ptp_vmclock.c b/drivers/ptp/ptp_vmclock.c | ||
| index 1ce69eada4b2..4673915c43e7 100644 | ||
| --- a/drivers/ptp/ptp_vmclock.c | ||
| +++ b/drivers/ptp/ptp_vmclock.c | ||
| @@ -5,6 +5,9 @@ | ||
| * Copyright © 2024 Amazon.com, Inc. or its affiliates. | ||
| */ | ||
|
|
||
| +#include "linux/poll.h" | ||
| +#include "linux/types.h" | ||
| +#include "linux/wait.h" | ||
| #include <linux/acpi.h> | ||
| #include <linux/device.h> | ||
| #include <linux/err.h> | ||
| @@ -37,6 +40,7 @@ struct vmclock_state { | ||
| struct resource res; | ||
| struct vmclock_abi *clk; | ||
| struct miscdevice miscdev; | ||
| + wait_queue_head_t disrupt_wait; | ||
| struct ptp_clock_info ptp_clock_info; | ||
| struct ptp_clock *ptp_clock; | ||
| enum clocksource_ids cs_id, sys_cs_id; | ||
| @@ -311,10 +315,15 @@ static const struct ptp_clock_info ptp_vmclock_info = { | ||
| .getcrosststamp = ptp_vmclock_getcrosststamp, | ||
| }; | ||
|
|
||
| +struct vmclock_file_state { | ||
| + struct vmclock_state *st; | ||
| + atomic_t seq; | ||
| +}; | ||
| + | ||
| static int vmclock_miscdev_mmap(struct file *fp, struct vm_area_struct *vma) | ||
| { | ||
| - struct vmclock_state *st = container_of(fp->private_data, | ||
| - struct vmclock_state, miscdev); | ||
| + struct vmclock_file_state *fst = fp->private_data; | ||
| + struct vmclock_state *st = fst->st; | ||
|
|
||
| if ((vma->vm_flags & (VM_READ|VM_WRITE)) != VM_READ) | ||
| return -EROFS; | ||
| @@ -333,11 +342,12 @@ static int vmclock_miscdev_mmap(struct file *fp, struct vm_area_struct *vma) | ||
| static ssize_t vmclock_miscdev_read(struct file *fp, char __user *buf, | ||
| size_t count, loff_t *ppos) | ||
| { | ||
| - struct vmclock_state *st = container_of(fp->private_data, | ||
| - struct vmclock_state, miscdev); | ||
| + struct vmclock_file_state *fst = fp->private_data; | ||
| + struct vmclock_state *st = fst->st; | ||
| + | ||
| ktime_t deadline = ktime_add(ktime_get(), VMCLOCK_MAX_WAIT); | ||
| size_t max_count; | ||
| - int32_t seq; | ||
| + int32_t seq, old_seq; | ||
|
|
||
| if (*ppos >= PAGE_SIZE) | ||
| return 0; | ||
| @@ -346,6 +356,7 @@ static ssize_t vmclock_miscdev_read(struct file *fp, char __user *buf, | ||
| if (count > max_count) | ||
| count = max_count; | ||
|
|
||
| + old_seq = atomic_read(&fst->seq); | ||
| while (1) { | ||
| seq = st->clk->seq_count & ~1ULL; | ||
| virt_rmb(); | ||
| @@ -354,8 +365,16 @@ static ssize_t vmclock_miscdev_read(struct file *fp, char __user *buf, | ||
| return -EFAULT; | ||
|
|
||
| virt_rmb(); | ||
| - if (seq == st->clk->seq_count) | ||
| - break; | ||
| + if (seq == st->clk->seq_count) { | ||
| + /* | ||
| + * Either we updated fst->seq to seq (the latest version we observed) | ||
| + * or someone else did (old_seq == seq), so we can break. | ||
| + */ | ||
| + if (atomic_try_cmpxchg(&fst->seq, &old_seq, seq) || | ||
| + old_seq == seq) { | ||
| + break; | ||
| + } | ||
| + } | ||
|
|
||
| if (ktime_after(ktime_get(), deadline)) | ||
| return -ETIMEDOUT; | ||
| @@ -365,9 +384,57 @@ static ssize_t vmclock_miscdev_read(struct file *fp, char __user *buf, | ||
| return count; | ||
| } | ||
|
|
||
| +static __poll_t vmclock_miscdev_poll(struct file *fp, poll_table *wait) | ||
| +{ | ||
| + struct vmclock_file_state *fst = fp->private_data; | ||
| + struct vmclock_state *st = fst->st; | ||
| + uint32_t seq; | ||
| + | ||
| + /* | ||
| + * Hypervisor will not send us any notifications, so fail immediately | ||
| + * to avoid having caller sleeping for ever. | ||
| + */ | ||
| + if (!(st->clk->flags & VMCLOCK_FLAG_NOTIFICATION_PRESENT)) | ||
| + return POLLHUP; | ||
| + | ||
| + poll_wait(fp, &st->disrupt_wait, wait); | ||
| + | ||
| + seq = st->clk->seq_count; | ||
| + if (atomic_read(&fst->seq) != seq) | ||
| + return POLLIN | POLLRDNORM; | ||
| + | ||
| + return 0; | ||
| +} | ||
| + | ||
| +static int vmclock_miscdev_open(struct inode *inode, struct file *fp) | ||
| +{ | ||
| + struct vmclock_state *st = container_of(fp->private_data, | ||
| + struct vmclock_state, miscdev); | ||
| + struct vmclock_file_state *fst = kzalloc(sizeof(*fst), GFP_KERNEL); | ||
| + | ||
| + if (!fst) | ||
| + return -ENOMEM; | ||
| + | ||
| + fst->st = st; | ||
| + atomic_set(&fst->seq, 0); | ||
| + | ||
| + fp->private_data = fst; | ||
| + | ||
| + return 0; | ||
| +} | ||
| + | ||
| +static int vmclock_miscdev_release(struct inode *inode, struct file *fp) | ||
| +{ | ||
| + kfree(fp->private_data); | ||
| + return 0; | ||
| +} | ||
| + | ||
| static const struct file_operations vmclock_miscdev_fops = { | ||
| - .mmap = vmclock_miscdev_mmap, | ||
| - .read = vmclock_miscdev_read, | ||
| + .open = vmclock_miscdev_open, | ||
| + .release = vmclock_miscdev_release, | ||
| + .mmap = vmclock_miscdev_mmap, | ||
| + .read = vmclock_miscdev_read, | ||
| + .poll = vmclock_miscdev_poll, | ||
| }; | ||
|
|
||
| /* module operations */ | ||
| @@ -413,6 +480,44 @@ static acpi_status vmclock_acpi_resources(struct acpi_resource *ares, void *data | ||
| return AE_ERROR; | ||
| } | ||
|
|
||
| +static void | ||
| +vmclock_acpi_notification_handler(acpi_handle __always_unused handle, | ||
| + u32 __always_unused event, void *dev) | ||
| +{ | ||
| + struct device *device = dev; | ||
| + struct vmclock_state *st = device->driver_data; | ||
| + | ||
| + wake_up_interruptible(&st->disrupt_wait); | ||
| +} | ||
| + | ||
| +static int vmclock_setup_notification(struct device *dev, struct vmclock_state *st) | ||
| +{ | ||
| + struct acpi_device *adev = ACPI_COMPANION(dev); | ||
| + acpi_status status; | ||
| + | ||
| + /* | ||
| + * This should never happen as this function is only called when | ||
| + * has_acpi_companion(dev) is true, but the logic is sufficiently | ||
| + * complex that Coverity can't see the tautology. | ||
| + */ | ||
| + if (!adev) | ||
| + return -ENODEV; | ||
| + | ||
| + /* The device does not support notifications. Nothing else to do */ | ||
| + if (!(st->clk->flags & VMCLOCK_FLAG_NOTIFICATION_PRESENT)) | ||
| + return 0; | ||
| + | ||
| + status = acpi_install_notify_handler(adev->handle, ACPI_DEVICE_NOTIFY, | ||
| + vmclock_acpi_notification_handler, | ||
| + dev); | ||
| + if (ACPI_FAILURE(status)) { | ||
| + dev_err(dev, "failed to install notification handler"); | ||
| + return -ENODEV; | ||
| + } | ||
| + | ||
| + return 0; | ||
| +} | ||
| + | ||
| static int vmclock_probe_acpi(struct device *dev, struct vmclock_state *st) | ||
| { | ||
| struct acpi_device *adev = ACPI_COMPANION(dev); | ||
| @@ -495,6 +600,11 @@ static int vmclock_probe(struct platform_device *pdev) | ||
| goto out; | ||
| } | ||
|
|
||
| + init_waitqueue_head(&st->disrupt_wait); | ||
| + ret = vmclock_setup_notification(dev, st); | ||
| + if (ret) | ||
| + return ret; | ||
| + | ||
| /* If the structure is big enough, it can be mapped to userspace */ | ||
| if (st->clk->size >= PAGE_SIZE) { | ||
| st->miscdev.minor = MISC_DYNAMIC_MINOR; | ||
| @@ -544,6 +654,8 @@ static int vmclock_probe(struct platform_device *pdev) | ||
| goto out; | ||
| } | ||
|
|
||
| + dev->driver_data = st; | ||
| + | ||
| dev_info(dev, "%s: registered %s%s%s\n", st->name, | ||
| st->miscdev.minor ? "miscdev" : "", | ||
| (st->miscdev.minor && st->ptp_clock) ? ", " : "", | ||
| diff --git a/include/uapi/linux/vmclock-abi.h b/include/uapi/linux/vmclock-abi.h | ||
| index 75deb6ae2b27..4b7cd2b8532c 100644 | ||
| --- a/include/uapi/linux/vmclock-abi.h | ||
| +++ b/include/uapi/linux/vmclock-abi.h | ||
| @@ -125,6 +125,11 @@ struct vmclock_abi { | ||
| * loaded from some save state (restored from a snapshot). | ||
| */ | ||
| #define VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT (1 << 8) | ||
| + /* | ||
| + * If the NOTIFICATION_PRESENT flag is set, the hypervisor will send | ||
| + * a notification every time it updates seq_count to a new even number. | ||
| + */ | ||
| +#define VMCLOCK_FLAG_NOTIFICATION_PRESENT (1 << 9) | ||
|
|
||
| uint8_t pad[2]; | ||
| uint8_t clock_status; | ||
| -- | ||
| 2.34.1 | ||
|
|
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have this in our patches on top of this base config, otherwise we risk removing these when we rebase the config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Problem is that our patches apply in the Linux tree and this config doesn't exist inside it. We would need a separate set of patches which apply in the configs, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not following. We're adding patches to the linux tree to support this device on aarch64. Why wouldn't the configs apply to it?
We do this after checking out the tree and applying the patches:
So having it here or in
vmclock.configshould be equivalent, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh right. I had forgotten about the extra configs we're keeping. I though you meant having it as part of the linux kernel source code pathces (the one I just added). Sure, I'll move it there.