buffer allocators #6166
vortex-io/src/write_target.rs
Outdated
use vortex_error::VortexResult;

/// A destination for I/O reads that can be finalized into a [`BufferHandle`].
pub trait WriteTarget: Send + 'static {
write destination
vortex-file/src/open.rs
Outdated
@@ -326,6 +343,22 @@ mod tests {
        self.inner.read_at(offset, length, alignment)
    }
remove?
vortex-io/src/write_target.rs
Outdated
use vortex_error::VortexResult;

/// A destination for I/O reads that can be finalized into a [`BufferHandle`].
pub trait WriteTarget: Send + 'static {
Is this a CPU-only thing?
Signed-off-by: Onur Satici <[email protected]>
029ee85 to 613e21c
    }
}

impl<T: VortexReadAt + Clone> VortexReadAt for AllocatingReadAt<T> {
Normally we would not have this wrapper; we would push the allocators down to the leaf `ReadAt` impls instead. It is here to keep the diff small and to get the device copy working when a device allocator is used.
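To make the trade-off concrete, here is a rough sketch of the wrapper approach, not the code in this PR. `ReadAt`, `Allocator`, `HostAllocator`, and `SliceReader` are hypothetical, simplified stand-ins (the real `VortexReadAt` has a different signature, including an alignment argument), and the body of `AllocatingReadAt::read_at` is an assumption about how such a wrapper might behave.

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for a positional reader.
pub trait ReadAt {
    /// Reads `len` bytes starting at `offset`. Panics if out of bounds.
    fn read_at(&self, offset: usize, len: usize) -> Vec<u8>;
}

/// Hypothetical allocator: returns a zeroed destination of `len` bytes.
/// A device allocator would hand back device memory here instead.
pub trait Allocator {
    fn allocate(&self, len: usize) -> Vec<u8>;
}

pub struct HostAllocator;

impl Allocator for HostAllocator {
    fn allocate(&self, len: usize) -> Vec<u8> {
        vec![0; len]
    }
}

/// A leaf reader that knows nothing about allocators.
#[derive(Clone)]
pub struct SliceReader(pub Arc<[u8]>);

impl ReadAt for SliceReader {
    fn read_at(&self, offset: usize, len: usize) -> Vec<u8> {
        self.0[offset..offset + len].to_vec()
    }
}

/// Wrapper approach: the inner reader stages the bytes in its own buffer,
/// then the wrapper copies them into the allocator's destination.
/// This costs one extra copy but leaves every leaf reader untouched.
pub struct AllocatingReadAt<T> {
    pub inner: T,
    pub alloc: Arc<dyn Allocator>,
}

impl<T: ReadAt + Clone> ReadAt for AllocatingReadAt<T> {
    fn read_at(&self, offset: usize, len: usize) -> Vec<u8> {
        let staged = self.inner.read_at(offset, len);
        let mut dst = self.alloc.allocate(len);
        dst.copy_from_slice(&staged);
        dst
    }
}
```

Pushing the allocator into the leaf would mean `SliceReader::read_at` writes straight into `alloc.allocate(len)`, which removes the staging copy at the price of touching every leaf impl.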
CodSpeed Performance Report
Merging this PR will degrade performance by 33.09%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | WallTime | u8_FoR[10M] | 6.4 µs | 9.6 µs | -33.09% |
| ❌ | Simulation | bench_compare_primitive[(10000, 128)] | 165.9 µs | 184.8 µs | -10.24% |
| ❌ | Simulation | bench_compare_primitive[(10000, 2)] | 162 µs | 180.9 µs | -10.47% |
| ❌ | Simulation | bench_compare_primitive[(10000, 32)] | 162.9 µs | 181.8 µs | -10.42% |
| ❌ | Simulation | bench_compare_primitive[(10000, 4)] | 161.3 µs | 180.3 µs | -10.52% |
| ❌ | Simulation | bench_compare_primitive[(100000, 128)] | 903.7 µs | 1,094.1 µs | -17.4% |
| ❌ | Simulation | bench_compare_primitive[(10000, 8)] | 161.7 µs | 180.6 µs | -10.49% |
| ❌ | Simulation | bench_compare_primitive[(100000, 2)] | 899.4 µs | 1,089.9 µs | -17.47% |
| ❌ | Simulation | bench_compare_primitive[(100000, 2048)] | 1 ms | 1.2 ms | -15.96% |
| ❌ | Simulation | bench_compare_primitive[(100000, 4)] | 899.6 µs | 1,090 µs | -17.47% |
| ❌ | Simulation | bench_compare_primitive[(100000, 32)] | 901.4 µs | 1,091.8 µs | -17.44% |
| ❌ | Simulation | bench_compare_primitive[(100000, 512)] | 961.6 µs | 1,152 µs | -16.53% |
| ❌ | Simulation | bench_compare_primitive[(100000, 8)] | 900 µs | 1,090.4 µs | -17.46% |
| ❌ | Simulation | bench_compare_varbin[(10000, 32)] | 170.7 µs | 190.3 µs | -10.3% |
| ❌ | Simulation | bench_compare_varbin[(10000, 2)] | 166.3 µs | 185.9 µs | -10.53% |
| ❌ | Simulation | bench_compare_varbin[(10000, 4)] | 166.8 µs | 186.4 µs | -10.52% |
| ❌ | Simulation | bench_compare_varbin[(10000, 8)] | 167.2 µs | 186.8 µs | -10.49% |
| ❌ | Simulation | bench_compare_varbin[(100000, 128)] | 921 µs | 1,112 µs | -17.18% |
| ❌ | Simulation | bench_compare_varbin[(100000, 2)] | 904.1 µs | 1,095.2 µs | -17.45% |
| ❌ | Simulation | bench_compare_varbin[(100000, 2048)] | 1.2 ms | 1.4 ms | -13.53% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Footnotes
1. 1219 benchmarks were skipped, so the baseline results were used instead.
Signed-off-by: Joe Isaacs <[email protected]>
This PR introduces buffer allocators to the scan. The caller can provide its own allocator implementation to scan directly into device buffer handles.
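As an illustration only, the allocator idea might look roughly like the sketch below. It does not reproduce this PR's API: `BufferAllocator`, `HostAllocator`, `HostTarget`, and `read_into` are hypothetical names, and `WriteTarget` is a simplified stand-in that finalizes into a `Vec<u8>` rather than a `BufferHandle`. The reader asks the allocator for a destination, fills it, and finalizes it, so a device-backed allocator could return device memory from the same call.

```rust
use std::io;

/// Simplified stand-in for the PR's `WriteTarget`: a destination that I/O
/// fills and that is then finalized into an owned buffer.
/// (Finalizing into `Vec<u8>` instead of `BufferHandle` is an assumption.)
pub trait WriteTarget: Send + 'static {
    /// Mutable view of the bytes the reader should fill.
    fn as_mut_slice(&mut self) -> &mut [u8];
    /// Consumes the target and returns the finished buffer.
    fn finalize(self: Box<Self>) -> io::Result<Vec<u8>>;
}

/// Hypothetical allocator that the caller supplies to the scan.
pub trait BufferAllocator: Send + Sync {
    fn allocate(&self, len: usize) -> io::Result<Box<dyn WriteTarget>>;
}

/// Host-memory target backed by a plain `Vec<u8>`.
pub struct HostTarget(Vec<u8>);

impl WriteTarget for HostTarget {
    fn as_mut_slice(&mut self) -> &mut [u8] {
        &mut self.0
    }
    fn finalize(self: Box<Self>) -> io::Result<Vec<u8>> {
        Ok(self.0)
    }
}

/// Default allocator: zeroed host memory.
pub struct HostAllocator;

impl BufferAllocator for HostAllocator {
    fn allocate(&self, len: usize) -> io::Result<Box<dyn WriteTarget>> {
        Ok(Box::new(HostTarget(vec![0u8; len])))
    }
}

/// Hypothetical read path: copies `len` bytes at `offset` from `src` into
/// whatever destination the allocator provides, then finalizes it.
pub fn read_into(
    src: &[u8],
    offset: usize,
    len: usize,
    alloc: &dyn BufferAllocator,
) -> io::Result<Vec<u8>> {
    let end = offset
        .checked_add(len)
        .filter(|&e| e <= src.len())
        .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "read out of bounds"))?;
    let mut target = alloc.allocate(len)?;
    target.as_mut_slice().copy_from_slice(&src[offset..end]);
    target.finalize()
}
```

With these stand-ins, `read_into(b"hello world", 6, 5, &HostAllocator)` yields the bytes of `"world"`; swapping in a different `BufferAllocator` changes where the bytes land without changing the read path.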
TODO:
example cuda scan using this: