-
Notifications
You must be signed in to change notification settings - Fork 21
Implement S3 interface - retrieving data #229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThis PR refactors the version storage and image handling system from synchronous file-based I/O to asynchronous stream-based operations. Changes include updating the VersionStore trait to add derived file handling, removing metadata-based operations in favor of size queries, deprecating FileChunker, and converting image resize operations to return streams instead of file paths. Changes
Sequence Diagram(s)sequenceDiagram
actor Client
participant Controller as Server<br/>Controller
participant FSUtil as util::fs
participant VersionStore as VersionStore
participant Storage as Storage<br/>(Local/S3)
rect rgb(200, 220, 255)
Note over Client,Storage: Old Flow: Synchronous Path-Based
Client->>Controller: GET /file?resize=<width>x<height>
Controller->>FSUtil: handle_image_resize(path, width, height)
FSUtil->>VersionStore: open_version(hash)
VersionStore->>Storage: open file
Storage-->>VersionStore: file handle
VersionStore-->>FSUtil: ReadSeek
FSUtil->>FSUtil: resize image, save to disk
FSUtil-->>Controller: resized_path
Controller->>Controller: open file from path
Controller-->>Client: streaming response (from file)
end
rect rgb(220, 255, 220)
Note over Client,Storage: New Flow: Asynchronous Stream-Based
Client->>Controller: GET /file?resize=<width>x<height>
Controller->>FSUtil: handle_image_resize(hash, width, height)
activate FSUtil
FSUtil->>VersionStore: get_version_stream(hash)
activate VersionStore
VersionStore->>Storage: stream version bytes
Storage-->>VersionStore: Stream<Bytes>
VersionStore-->>FSUtil: Stream<Bytes>
deactivate VersionStore
FSUtil->>FSUtil: decode from stream,<br/>resize in-memory,<br/>encode to buffer
FSUtil->>VersionStore: store_version_derived(image, buf, path)
VersionStore->>Storage: write derived file
Storage-->>VersionStore: Result<()>
VersionStore-->>FSUtil: ✓
FSUtil->>VersionStore: get_version_derived_stream(path)
VersionStore->>Storage: stream derived bytes
Storage-->>VersionStore: Stream<Bytes>
VersionStore-->>FSUtil: Stream<Bytes>
FSUtil-->>Controller: Pin<Box<Stream>>
deactivate FSUtil
Controller-->>Client: streaming response (from stream)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
oxen-rust/src/lib/src/storage/s3.rs (1)
40-81: Fix bucket region resolution and header parsing ininit_clientTwo critical issues preventing compilation:
On a successful
get_bucket_locationcall, the response is ignored and the defaultregionstring is reused. For buckets in a different region, this causes misconfiguration and repeated redirects/failures. Extract and use the location constraint from the response instead.Header extraction uses
.map(str::to_owned)directly on the result ofheaders().get("x-amz-bucket-region"). This is a type mismatch—HeaderValuemust be converted to UTF‑8 via.to_str()first (which returnsResult<&str, ToStrError>). The correct chain is.and_then(|hv| hv.to_str().ok()).map(|s| s.to_owned()).
🧹 Nitpick comments (6)
oxen-rust/src/lib/src/core/v_latest/index/file_chunker.rs (1)
384-446: Remove commented-out code rather than leaving it in the codebase.The
FileChunkerstruct and its methods are not referenced anywhere else in the repository. Commented-out code should be removed entirely since version control preserves the history if it's needed later. Delete lines 384-446 instead of keeping them as comments.oxen-rust/src/lib/Cargo.toml (1)
71-71: Consider updating to image crate 0.25.9 or documenting the reason for version 0.25.2.Version 0.25.2 has no active security vulnerabilities (patched against RUSTSEC-2019-0014 and RUSTSEC-2020-0073), but version 0.25.9 is available as the latest stable release. If there's a specific reason to stay on 0.25.2, it would be helpful to document it. Otherwise, consider updating to benefit from bug fixes and improvements in the 7 intervening minor releases.
oxen-rust/src/server/src/controllers/versions.rs (1)
172-220: Tar streaming usesget_version_size+StreamReaderappropriatelyThe batch download writer now:
- fetches a stream via
get_version_stream,- derives tar header size via
get_version_size,- adapts the stream with
StreamReaderfortar.append.This is functionally correct; if backend latency becomes an issue, you could later optimize by reusing size metadata instead of issuing a separate size call per file.
oxen-rust/src/lib/src/storage/version_store.rs (1)
94-105: Update trait docs to match new derived + size APIsThe trait surface looks good, but the comments are now misleading:
store_version_derived’s docs still mention adataparameter, while the signature usesderived_imageandimage_buf.get_version_sizeis documented as “Get metadata of a version file” but now returns just the size.get_version_derived_streamis missing any explanation of when/why it should be used versusget_version_stream.Consider tightening these docs so backend implementors know exactly how to implement and use the new methods.
Also applies to: 158-163, 179-187
oxen-rust/src/lib/src/util/fs.rs (1)
94-121: Streaming resize helpers are solid; S3 cache reuse may need follow‑upThe new flow—
handle_image_resize➝detect_image_format_from_version➝resize_cache_image_version_store—correctly returns a stream, usesVersionStoreAPIs, and caches resized files on disk for the local backend viaresize_path.exists()+get_version_derived_stream.For S3, once
get_version_pathand related pieces are implemented, note that cache hits currently depend onresize_path.exists()on the local filesystem, while S3-derived objects are written only to the bucket. That means S3-backed resizes will always recompute and re-upload rather than reuse cached derived objects. You may want a backend-aware existence check (or direct use ofget_version_derived_stream) when enabling S3 resize caching.Also applies to: 1657-1686, 1689-1757
oxen-rust/src/lib/src/storage/s3.rs (1)
204-226: S3 read/stream implementations align with the VersionStore contract
store_version_derivedcorrectly uploads the providedimage_bufunder a key derived fromderived_path.get_version_size,get_version, andget_version_streamusehead_object/get_objectand adapt the body viaByteStreamAdapter, matching the async streaming expectations.get_version_derived_streammirrorsget_version_streamfor derived keys.One thing to keep in mind is that derived keys currently use
derived_path.to_string_lossy()directly, independent ofself.prefix. If you plan to list/prune derived objects later, you may want to standardize these keys under a well-defined prefix.Also applies to: 228-265, 267-287, 288-306
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
oxen-rust/src/lib/Cargo.tomloxen-rust/src/lib/src/core/v_latest/index/file_chunker.rsoxen-rust/src/lib/src/core/v_latest/prune.rsoxen-rust/src/lib/src/storage/local.rsoxen-rust/src/lib/src/storage/s3.rsoxen-rust/src/lib/src/storage/version_store.rsoxen-rust/src/lib/src/util/fs.rsoxen-rust/src/server/src/controllers/file.rsoxen-rust/src/server/src/controllers/versions.rsoxen-rust/src/server/src/controllers/workspaces/files.rs
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: Lilianxr
Repo: Oxen-AI/Oxen PR: 122
File: oxen-rust/src/server/src/controllers/file.rs:116-125
Timestamp: 2025-08-18T16:29:55.897Z
Learning: Range support (HTTP 206 Partial Content) was explicitly deemed unnecessary for the file streaming endpoints in oxen-rust/src/server/src/controllers/file.rs and related controllers. The full streaming approach using version_store.get_version_stream() is sufficient for their requirements.
📚 Learning: 2025-08-18T16:29:55.897Z
Learnt from: Lilianxr
Repo: Oxen-AI/Oxen PR: 122
File: oxen-rust/src/server/src/controllers/file.rs:116-125
Timestamp: 2025-08-18T16:29:55.897Z
Learning: Range support (HTTP 206 Partial Content) was explicitly deemed unnecessary for the file streaming endpoints in oxen-rust/src/server/src/controllers/file.rs and related controllers. The full streaming approach using version_store.get_version_stream() is sufficient for their requirements.
Applied to files:
oxen-rust/src/server/src/controllers/workspaces/files.rsoxen-rust/src/lib/src/core/v_latest/index/file_chunker.rsoxen-rust/src/lib/src/storage/local.rsoxen-rust/src/server/src/controllers/file.rsoxen-rust/src/lib/src/storage/version_store.rsoxen-rust/src/lib/src/storage/s3.rsoxen-rust/src/server/src/controllers/versions.rsoxen-rust/src/lib/src/util/fs.rs
📚 Learning: 2025-10-07T17:02:09.011Z
Learnt from: Lilianxr
Repo: Oxen-AI/Oxen PR: 150
File: oxen-rust/src/server/src/controllers/workspaces/files.rs:356-359
Timestamp: 2025-10-07T17:02:09.011Z
Learning: In Rust server applications handling file uploads, avoid reading entire files into memory using Vec<u8>. Instead, stream files directly to disk using tokio::fs::File, compute hashes incrementally with the Digest trait (e.g., sha2::Sha256), and use streaming decompression (e.g., GzDecoder with AsyncRead) for compressed uploads. This prevents memory exhaustion on large file uploads.
Applied to files:
oxen-rust/src/server/src/controllers/workspaces/files.rsoxen-rust/src/lib/src/core/v_latest/index/file_chunker.rsoxen-rust/src/lib/src/storage/local.rsoxen-rust/src/server/src/controllers/file.rsoxen-rust/src/lib/src/storage/version_store.rsoxen-rust/src/server/src/controllers/versions.rsoxen-rust/src/lib/src/util/fs.rs
📚 Learning: 2025-09-10T22:08:33.965Z
Learnt from: Lilianxr
Repo: Oxen-AI/Oxen PR: 142
File: oxen-rust/src/lib/src/core/v_latest/workspaces/files.rs:348-364
Timestamp: 2025-09-10T22:08:33.965Z
Learning: In the Oxen codebase, the `util::fs::path_relative_to_dir` function properly handles path sanitization to ensure paths stay within the specified workspace boundary, making additional canonicalization checks unnecessary for workspace-scoped operations.
Applied to files:
oxen-rust/src/server/src/controllers/workspaces/files.rs
📚 Learning: 2025-09-16T21:19:11.250Z
Learnt from: Lilianxr
Repo: Oxen-AI/Oxen PR: 146
File: oxen-rust/src/server/src/controllers/versions.rs:129-137
Timestamp: 2025-09-16T21:19:11.250Z
Learning: In oxen-rust/src/server/src/controllers/versions.rs batch_download endpoint, the batch size is limited to 10MB on the client side rather than having server-side payload size validation.
Applied to files:
oxen-rust/src/server/src/controllers/workspaces/files.rsoxen-rust/src/server/src/controllers/file.rsoxen-rust/src/server/src/controllers/versions.rs
📚 Learning: 2025-09-16T23:43:33.942Z
Learnt from: Lilianxr
Repo: Oxen-AI/Oxen PR: 146
File: oxen-rust/src/lib/src/api/client/versions.rs:235-276
Timestamp: 2025-09-16T23:43:33.942Z
Learning: In oxen-rust/src/lib/src/api/client/versions.rs batch_download functionality, the total payload size is limited to 10MB which provides protection against archive bomb attacks and resource exhaustion.
Applied to files:
oxen-rust/src/lib/src/storage/local.rsoxen-rust/src/server/src/controllers/versions.rs
🧬 Code graph analysis (6)
oxen-rust/src/server/src/controllers/workspaces/files.rs (2)
oxen-rust/src/lib/src/util/fs.rs (3)
handle_image_resize(94-121)version_path(152-161)version_path(2130-2153)oxen-rust/src/lib/src/storage/local.rs (1)
version_path(51-53)
oxen-rust/src/lib/src/storage/local.rs (1)
oxen-rust/src/lib/src/storage/version_store.rs (3)
store_version_derived(99-104)get_version_size(162-162)get_version_derived_stream(183-186)
oxen-rust/src/lib/src/core/v_latest/prune.rs (1)
oxen-rust/src/lib/src/model/repository/local_repository.rs (1)
version_store(83-88)
oxen-rust/src/lib/src/storage/version_store.rs (2)
oxen-rust/src/lib/src/storage/local.rs (3)
store_version_derived(124-139)get_version_size(141-145)get_version_derived_stream(166-176)oxen-rust/src/lib/src/storage/s3.rs (3)
store_version_derived(204-226)get_version_size(228-242)get_version_derived_stream(288-306)
oxen-rust/src/lib/src/storage/s3.rs (3)
oxen-rust/src/lib/src/error.rs (1)
basic_str(161-163)oxen-rust/src/lib/src/storage/local.rs (6)
new(37-41)store_version_derived(124-139)get_version_size(141-145)get_version(147-151)get_version_stream(153-164)get_version_derived_stream(166-176)oxen-rust/src/lib/src/storage/version_store.rs (5)
store_version_derived(99-104)get_version_size(162-162)get_version(168-168)get_version_stream(174-177)get_version_derived_stream(183-186)
oxen-rust/src/server/src/controllers/versions.rs (2)
oxen-rust/src/lib/src/util/fs.rs (3)
handle_image_resize(94-121)version_path(152-161)version_path(2130-2153)oxen-rust/src/lib/src/storage/local.rs (1)
version_path(51-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Test Suite / Test Suite (ubuntu-latest)
- GitHub Check: Test Suite / Test Suite (windows-latest)
- GitHub Check: Test Suite / Test Suite (macos-latest)
- GitHub Check: Lint / Cargo check, format, clippy + Ruff
🔇 Additional comments (7)
oxen-rust/src/lib/src/core/v_latest/prune.rs (1)
296-299: Usingget_version_sizehere is appropriateReplacing metadata length with
version_store.get_version_sizekeeps bytes-freed accounting correct across backends while preserving the prior best-effort behavior on errors.oxen-rust/src/server/src/controllers/workspaces/files.rs (1)
136-159: Streaming image resize integration looks correctSwitching to
handle_image_resize(...).await?and returning.streaming(file_stream)keeps headers and MIME type intact while aligning with the new streaming resize API.oxen-rust/src/server/src/controllers/file.rs (1)
178-201: Download resize path correctly migrated to stream-based helperThe image-resize branch now delegates to
handle_image_resize(...).await?and streams the returned body, keeping MIME type andoxen-revision-idwhile avoiding extra file opens. This is consistent with other controllers.oxen-rust/src/server/src/controllers/versions.rs (2)
44-50: Metadata endpoint’s use ofget_version_sizeis consistentSwitching to
repo.version_store()?.get_version_size(&version_id).await?provides a backend-independent size while keeping the JSON response contract unchanged.
96-117: Image resize in versions download now correctly streams dataUsing
util::fs::handle_image_resize(...).await?and.streaming(file_stream)is aligned with the new resize API and keeps headers unchanged; the non-resize path still falls back toget_version_stream.oxen-rust/src/lib/src/storage/local.rs (1)
141-145: Size and derived-stream helpers look correct
get_version_sizeandget_version_derived_streamuse asyncfs::metadata/File::openplusReaderStream, which matches the trait’s expectations and keeps the local backend consistent with the streaming contract.Also applies to: 166-176
oxen-rust/src/lib/src/storage/s3.rs (1)
404-426:ByteStreamAdaptercorrectly adapts S3 bodies to the streaming interfaceWrapping
ByteStreamand mapping its items toResult<Bytes, io::Error>viapoll_nextis an appropriate way to integrate S3 responses with theStream-based VersionStore APIs.
| async fn store_version_derived( | ||
| &self, | ||
| hash: &str, | ||
| ) -> Result<Box<dyn ReadSeek + Send + Sync + 'static>, OxenError> { | ||
| let path = self.version_path(hash); | ||
| let file = std::fs::File::open(&path)?; | ||
| Ok(Box::new(file)) | ||
| derived_image: DynamicImage, | ||
| _image_buf: Vec<u8>, | ||
| derived_path: &Path, | ||
| ) -> Result<(), OxenError> { | ||
| let path = PathBuf::from(derived_path); | ||
| let derived_parent = path.parent().unwrap_or(Path::new("")); | ||
| if !derived_parent.exists() { | ||
| util::fs::create_dir_all(derived_parent)?; | ||
| } | ||
| derived_image.save(derived_path).unwrap(); | ||
| log::debug!("Saved derived version file {derived_path:?}"); | ||
|
|
||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, verify the file exists and check the exact code at the specified lines
cat -n oxen-rust/src/lib/src/storage/local.rs | sed -n '120,145p'Repository: Oxen-AI/Oxen
Length of output: 1006
🏁 Script executed:
# Check the OxenError type definition
rg "enum OxenError|impl.*OxenError" -A 5 --max-count=5Repository: Oxen-AI/Oxen
Length of output: 45859
🏁 Script executed:
# Search for similar error handling patterns with image.save() in the codebase
rg "\.save\(" -B 2 -A 2 --max-count=10Repository: Oxen-AI/Oxen
Length of output: 12987
🏁 Script executed:
# Check how errors are typically converted to OxenError in this file
rg "map_err|OxenError" oxen-rust/src/lib/src/storage/local.rs -B 1 -A 1 | head -50Repository: Oxen-AI/Oxen
Length of output: 1767
Replace unwrap() with proper error handling in store_version_derived
Line 135 contains derived_image.save(derived_path).unwrap(); which will panic on any IO or encoding error (e.g., disk full, permission denied), crashing the process. The method returns Result<(), OxenError> and already propagates errors elsewhere (line 133), so the error should be converted to OxenError:
- derived_image.save(derived_path).unwrap();
+ derived_image
+ .save(derived_path)
+ .map_err(|e| OxenError::basic_str(format!(
+ "Failed to save derived version file {derived_path:?}: {e}"
+ )))?;This pattern is already used elsewhere in the codebase for similar image save operations (e.g., util/fs.rs).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async fn store_version_derived( | |
| &self, | |
| hash: &str, | |
| ) -> Result<Box<dyn ReadSeek + Send + Sync + 'static>, OxenError> { | |
| let path = self.version_path(hash); | |
| let file = std::fs::File::open(&path)?; | |
| Ok(Box::new(file)) | |
| derived_image: DynamicImage, | |
| _image_buf: Vec<u8>, | |
| derived_path: &Path, | |
| ) -> Result<(), OxenError> { | |
| let path = PathBuf::from(derived_path); | |
| let derived_parent = path.parent().unwrap_or(Path::new("")); | |
| if !derived_parent.exists() { | |
| util::fs::create_dir_all(derived_parent)?; | |
| } | |
| derived_image.save(derived_path).unwrap(); | |
| log::debug!("Saved derived version file {derived_path:?}"); | |
| Ok(()) | |
| } | |
| async fn store_version_derived( | |
| &self, | |
| derived_image: DynamicImage, | |
| _image_buf: Vec<u8>, | |
| derived_path: &Path, | |
| ) -> Result<(), OxenError> { | |
| let path = PathBuf::from(derived_path); | |
| let derived_parent = path.parent().unwrap_or(Path::new("")); | |
| if !derived_parent.exists() { | |
| util::fs::create_dir_all(derived_parent)?; | |
| } | |
| derived_image | |
| .save(derived_path) | |
| .map_err(|e| OxenError::basic_str(format!( | |
| "Failed to save derived version file {derived_path:?}: {e}" | |
| )))?; | |
| log::debug!("Saved derived version file {derived_path:?}"); | |
| Ok(()) | |
| } |
🤖 Prompt for AI Agents
In oxen-rust/src/lib/src/storage/local.rs around lines 124 to 139, replace the
panic-inducing unwrap on derived_image.save(derived_path).unwrap() with proper
error propagation: call save and map any image IO/encoding error into an
OxenError (or use the ? operator after converting) so the function returns
Err(OxenError::from(...)) rather than panicking; follow the existing project
pattern for image save error handling (use map_err or a helper in util::fs to
convert the error) and keep the debug log only after a successful save.
Summary by CodeRabbit
New Features
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.