improve runtime performance and binary size of RFC 2822 printer #460
+876
−69
This takes some ideas from @hanna-kruppe in #373 to decrease
code bloat in Jiff's various printers. In this PR, we just take
the first bite: we improve the RFC 2822 printer by using a new
abstraction for writing to uninitialized memory. The design of
this abstraction takes a lot of inspiration from the unstable
std::io::BorrowedBuf from the standard library.
For binary size, I used this program as my benchmark:
I then defined a release-lto Cargo profile that sets lto = "fat" and used
cargo llvm-lines --profile release-lto to measure the number
of LLVM lines emitted. For this particular program, this change
reduces the number of LLVM lines by about 2,000.
For runtime performance, this PR introduces some new RFC 2822 printing
benchmarks. We compare against the chrono and time crates, which also
provide RFC 2822 printers.
We started off better than chrono, but quite a bit worse than time.
But with this PR, we're not only faster than time, but our "create
a new String allocation" API is as fast as (or a hair faster than)
time's "write into caller provided &mut String" API. Which is...
somewhat surprising.
There are a few reasons, from my perspective, for the improvement here.
[...] to be completely monomorphic. We're no longer generic over a
jiff::fmt::Write implementation internally, so there's no reason to
generate multiple copies.
[...] with no code needing to handle expansion generates much tighter code
than what we had. Moreover, we specialize some forms of integer
printing which I think also helps.
[...] uninitialized buffer on the stack and then copy that data to the
provided jiff::fmt::Write implementation once printing is done.
But even with this second write, the code is so much tighter with the
uninitialized buffer and the sizes so small, that this is still a net
win.
[...] alloc feature is enabled, we will
try to get the jiff::fmt::Write implementation as a &mut Vec<u8>.
Then we can expand its capacity as needed and write directly into its
spare capacity instead of writing to a stack buffer and then copying it
to the jiff::fmt::Write implementation generically.
[...] write_int_pad4 that
specializes integer formatting for values in the range 0..=9999. This
lets us do formatting with less work than would be needed to support a
non-padded generic implementation for any integer.
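To make the last two points above concrete, here is a minimal sketch of writing a zero-padded 4-digit integer directly into the spare capacity of a Vec<u8>. It is an illustration only and not Jiff's actual code: the signature given to write_int_pad4 here and the helper push_pad4 are assumptions made up for this example.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for a specialized formatter: writes `n` as exactly
/// four zero-padded ASCII digits into the first four slots of `out`.
/// The signature is an assumption, not Jiff's real API.
fn write_int_pad4(n: u16, out: &mut [MaybeUninit<u8>]) {
    assert!(n <= 9999, "value must be in 0..=9999");
    // One bounds check up front; panics if fewer than 4 slots are available.
    let out = &mut out[..4];
    let digits = [n / 1000, n / 100 % 10, n / 10 % 10, n % 10];
    for (slot, d) in out.iter_mut().zip(digits) {
        slot.write(b'0' + d as u8);
    }
}

/// Hypothetical helper: grows `dst` if needed, formats straight into its
/// spare capacity, then extends the length over the freshly written bytes.
fn push_pad4(dst: &mut Vec<u8>, n: u16) {
    dst.reserve(4);
    write_int_pad4(n, dst.spare_capacity_mut());
    // SAFETY: `reserve(4)` guarantees at least 4 spare slots, and
    // `write_int_pad4` initialized exactly the first 4 of them.
    unsafe { dst.set_len(dst.len() + 4) };
}
```

Because the width is fixed, there is no digit counting and no padding loop, and the only growth check is the single reserve call before writing.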
Another thing that maybe helps is that there are far fewer error branches
in the core printing code.
The main downside here is that we need to futz with uninitialized
memory, which increases the risk of undefined behavior. I've added Miri
tests for the new jiff::fmt::buffer module to CI to help mitigate
this risk. Another mitigation is that the abstraction exposes an entirely
safe API.
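As a rough illustration of what a safe API over uninitialized memory can look like, here is a sketch of a fixed-capacity stack buffer. The StackBuf type and its methods are hypothetical and are not the contents of the jiff::fmt::buffer module; the point is only that the single unsafe block is confined to one place and guarded by an invariant the safe methods maintain.

```rust
use core::mem::MaybeUninit;

/// Hypothetical fixed-capacity buffer backed by uninitialized stack memory.
/// Invariant: the first `len` bytes of `data` are always initialized.
struct StackBuf<const N: usize> {
    data: [MaybeUninit<u8>; N],
    len: usize,
}

impl<const N: usize> StackBuf<N> {
    /// Creates an empty buffer without zeroing its storage.
    fn new() -> Self {
        StackBuf { data: [MaybeUninit::uninit(); N], len: 0 }
    }

    /// Appends `bytes`, panicking if the fixed capacity would be exceeded.
    /// There is no growth path, so callers must size `N` for the worst case.
    fn push_bytes(&mut self, bytes: &[u8]) {
        let end = self.len + bytes.len();
        assert!(end <= N, "buffer capacity exceeded");
        for (slot, &b) in self.data[self.len..end].iter_mut().zip(bytes) {
            slot.write(b);
        }
        self.len = end;
    }

    /// Returns the initialized prefix as a byte slice.
    fn as_bytes(&self) -> &[u8] {
        // SAFETY: `push_bytes` initialized the first `len` bytes, and
        // `MaybeUninit<u8>` has the same layout as `u8`.
        unsafe {
            core::slice::from_raw_parts(self.data.as_ptr().cast::<u8>(), self.len)
        }
    }

    /// Copies the finished output into a caller-provided sink in one write.
    fn copy_to(&self, dst: &mut Vec<u8>) {
        dst.extend_from_slice(self.as_bytes());
    }
}
```

Callers never touch MaybeUninit themselves, so misuse can at worst trigger the capacity panic, and a tool like Miri only has to vouch for the one unsafe block.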
Given the scale of the improvements here, I plan to continue using this
same technique in Jiff's other printers.