improve runtime performance and binary size of RFC 2822 printer #460

BurntSushi · 2025-12-26T01:51:29Z

This takes some ideas from @hanna-kruppe in #373 to decrease
code bloat in Jiff's various printers. In this PR, we just take
the first bite: we improve the RFC 2822 printer by using a new
abstraction for writing to uninitialized memory. The design of
this abstraction takes a lot of inspiration from the unstable
std::io::BorrowedBuf from the standard library.
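To make the idea concrete, here is a minimal sketch of what a safe write cursor over uninitialized memory can look like. This is not the code from this PR: the `UninitBuf` type, its methods, and the `demo` function are hypothetical names chosen for illustration, and the real `jiff::fmt::buffer` module may be structured quite differently.

```rust
use std::mem::MaybeUninit;

/// Hypothetical fixed-capacity write cursor over uninitialized bytes.
/// Invariant: `data[..filled]` is always initialized.
struct UninitBuf<'a> {
    data: &'a mut [MaybeUninit<u8>],
    filled: usize,
}

impl<'a> UninitBuf<'a> {
    fn new(data: &'a mut [MaybeUninit<u8>]) -> Self {
        UninitBuf { data, filled: 0 }
    }

    /// Appends `bytes`. Panics (via slice indexing) if there is not
    /// enough room, so there is no growth path to generate code for.
    fn write_bytes(&mut self, bytes: &[u8]) {
        let end = self.filled + bytes.len();
        let dst = &mut self.data[self.filled..end];
        for (slot, &b) in dst.iter_mut().zip(bytes) {
            *slot = MaybeUninit::new(b);
        }
        self.filled = end;
    }

    /// Returns only the initialized prefix.
    fn as_bytes(&self) -> &[u8] {
        let init = &self.data[..self.filled];
        // SAFETY: every byte in `..filled` was written by `write_bytes`,
        // and `MaybeUninit<u8>` has the same layout as `u8`.
        unsafe {
            std::slice::from_raw_parts(init.as_ptr().cast::<u8>(), init.len())
        }
    }
}

/// Usage example: the backing storage lives on the stack and is never zeroed.
fn demo() -> String {
    let mut storage = [MaybeUninit::<u8>::uninit(); 32];
    let mut buf = UninitBuf::new(&mut storage);
    buf.write_bytes(b"Sat, 13 Jul 2024");
    buf.write_bytes(b" 15:09:59");
    String::from_utf8_lossy(buf.as_bytes()).into_owned()
}
```

The single `unsafe` block is confined to `as_bytes`, and callers only ever see initialized bytes, which is the property that lets an abstraction like this expose an entirely safe API.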

For binary size, I used this program as my benchmark:

use jiff::{
    fmt::{rfc2822, StdIoWrite},
    Timestamp,
};

static PRINTER: rfc2822::DateTimePrinter = rfc2822::DateTimePrinter::new();

fn main() -> anyhow::Result<()> {
    let ts = Timestamp::MAX;

    let mut buf = String::with_capacity(4);
    PRINTER.print_timestamp(&ts, &mut buf)?;
    println!("{buf}, {}", buf.len());

    PRINTER.print_timestamp(&ts, StdIoWrite(std::io::stdout()))?;
    println!();

    Ok(())
}

I then defined a release-lto Cargo profile that sets lto = "fat"
and used cargo llvm-lines --profile release-lto to measure the number
of LLVM lines emitted. For this particular program, this change
reduces the number of LLVM lines by about 2,000.

For runtime performance, this PR introduces some new RFC 2822 printing
benchmarks. We compare against the chrono and time crates, which also
provide RFC 2822 printers.

$ critcmp base x01 -f print
group                             base                                   pr
-----                             ----                                   --
print/rfc2822/buffer/chrono       1.00     62.4±1.01ns        ? ?/sec
print/rfc2822/buffer/jiff         3.47     51.7±0.08ns        ? ?/sec    1.00     14.9±0.06ns        ? ?/sec
print/rfc2822/buffer/time         1.00     22.4±0.23ns        ? ?/sec
print/rfc2822/to_string/chrono    1.00    107.2±0.38ns        ? ?/sec
print/rfc2822/to_string/jiff      3.99     81.8±0.38ns        ? ?/sec    1.00     20.5±0.05ns        ? ?/sec
print/rfc2822/to_string/time      1.00     72.3±0.75ns        ? ?/sec

We started off better than chrono, but quite a bit worse than time.
But with this PR, we're not only faster than time, but our "create
a new String allocation" API is as fast (or a hair faster) than
time's "write into caller provided &mut String" API. Which is...
somewhat surprising.

There are a few reasons, from my perspective, for the improvement here.

- For binary size, this switches the bulk of the RFC 2822 printer
  to be completely monomorphic. We're no longer generic over a
  jiff::fmt::Write implementation internally, so there's no reason to
  generate multiple copies.
- Also for binary size, writing directly to an uninitialized buffer
  with no code needing to handle expansion generates much tighter code
  than what we had. Moreover, we specialize some forms of integer
  printing, which I think also helps.
- For runtime performance, it is now possible for us to write to an
  uninitialized buffer on the stack and then copy that data to the
  provided jiff::fmt::Write implementation once printing is done.
  Even with this second write, the code is so much tighter with the
  uninitialized buffer, and the sizes so small, that this is still a
  net win.
- For runtime performance, when the alloc feature is enabled, we will
  try to get the jiff::fmt::Write implementation as a &mut Vec<u8>.
  Then we can expand its capacity as needed and write directly into its
  spare capacity instead of writing to a stack buffer and then copying it
  to the jiff::fmt::Write implementation generically.
- For runtime performance, we add routines like write_int_pad4 that
  specialize integer formatting for values in the range 0..=9999. This
  lets us do formatting with less work than would be needed to support a
  non-padded generic implementation for any integer.
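As an illustration of the integer specialization mentioned in the list above, here is a rough sketch of a fixed-width, four-digit formatter. The function name `pad4` is hypothetical and this is not the PR's `write_int_pad4`; it only shows why a fixed width and a known range make the work simpler (no loop, no length computation, no sign handling).

```rust
/// Formats `n` as exactly four ASCII digits, zero-padded on the left.
/// Hypothetical sketch: the caller must pass a value in 0..=9999.
fn pad4(n: u16) -> [u8; 4] {
    assert!(n <= 9999, "value out of range for 4-digit padding");
    [
        b'0' + (n / 1000) as u8,     // thousands
        b'0' + (n / 100 % 10) as u8, // hundreds
        b'0' + (n / 10 % 10) as u8,  // tens
        b'0' + (n % 10) as u8,       // ones
    ]
}
```

For example, `pad4(2025)` yields the bytes of `2025` and `pad4(7)` yields the bytes of `0007`. Because the output length is always four, the result can be copied into a buffer at a known position without any bookkeeping.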

Another thing that maybe helps is that there are far fewer error branches
in the core printing code.
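A sketch of the overall shape, under some assumptions: the core formatting routine is non-generic and infallible, and the only generic, fallible step is a single write at the end. Here `std::fmt::Write` stands in for `jiff::fmt::Write`, the function names are hypothetical, and a plain initialized array is used instead of an uninitialized buffer to keep the example short.

```rust
use std::fmt;

/// Hypothetical monomorphic core: formats HH:MM:SS into a fixed-size
/// stack buffer. It is compiled once and has no error paths.
fn format_time_core(hour: u8, minute: u8, second: u8) -> [u8; 8] {
    // Two-digit zero-padded formatting; inputs are assumed to be < 100.
    let two = |n: u8| [b'0' + n / 10, b'0' + n % 10];
    let (h, m, s) = (two(hour), two(minute), two(second));
    [h[0], h[1], b':', m[0], m[1], b':', s[0], s[1]]
}

/// Thin generic shim: this is the only part instantiated per writer
/// type, and it performs exactly one fallible operation.
fn print_time<W: fmt::Write>(
    hour: u8,
    minute: u8,
    second: u8,
    mut wtr: W,
) -> fmt::Result {
    let buf = format_time_core(hour, minute, second);
    // The buffer holds only ASCII for in-range inputs.
    let s = std::str::from_utf8(&buf).map_err(|_| fmt::Error)?;
    wtr.write_str(s)
}
```

With this split, a call such as `print_time(9, 5, 30, &mut out)` on a `String` appends `09:05:30`, and the per-field error checks that a fully generic printer would need after every write collapse into one check at the end.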

The main downside here is that we need to futz with uninitialized
memory which increases the risk of undefined behavior. I've added Miri
tests for the new jiff::fmt::buffer module to CI to help mitigate
this risk. Another mitigation is that the abstraction exposes an entirely
safe API.

Given the scale of the improvements here, I plan to continue using this
same technique in Jiff's other printers.

We're going to be doing some surgery on the RFC 2822 printer. Mostly motivated by decreasing binary size. But we should be able to optimize runtime as well. The printer is actually already pretty slow compared to `time`:

```
$ critcmp base -g '(.*)/(?:humantime|jiff|chrono|time)$'
group            base//chrono                     base//jiff                       base//time
-----            ------------                     ----------                       ----------
parse/rfc2822    3.08     60.1±0.78ns  ? ?/sec    1.00     19.5±0.29ns  ? ?/sec    3.73     72.6±0.83ns  ? ?/sec
print/rfc2822    2.94     65.4±0.57ns  ? ?/sec    2.34     52.1±0.42ns  ? ?/sec    1.00     22.2±0.22ns  ? ?/sec
```

So hopefully we can fix that as well.

We'll use this to overhaul the RFC 2822 printer in a subsequent commit.

The printers can use this method to write directly into the `Vec<u8>`'s spare capacity, instead of needing to write to our new uninitialized buffer and then copy the data via the `jiff::fmt::Write` interface. Using the uninitialized buffer without this is still a dramatic improvement. But this helps a bit more with minimal impact on code size.
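For readers unfamiliar with the technique, the following is a minimal sketch of writing into a `Vec<u8>`'s spare capacity using only standard library APIs. The function name is hypothetical and the actual method added in this commit is not shown here.

```rust
use std::mem::MaybeUninit;

/// Appends `bytes` to `dst` by writing into its spare capacity,
/// without first zero-initializing the new region.
fn append_via_spare_capacity(dst: &mut Vec<u8>, bytes: &[u8]) {
    // Ensure there are at least `bytes.len()` uninitialized slots.
    dst.reserve(bytes.len());
    let spare: &mut [MaybeUninit<u8>] = dst.spare_capacity_mut();
    for (slot, &b) in spare.iter_mut().zip(bytes) {
        *slot = MaybeUninit::new(b);
    }
    let new_len = dst.len() + bytes.len();
    // SAFETY: `reserve` guarantees capacity for `new_len` bytes, and the
    // loop above initialized every byte between the old and new length.
    unsafe { dst.set_len(new_len) };
}
```

A printer that knows an upper bound on its output size can reserve once, format directly into the spare slots, and then set the length, which avoids the intermediate stack buffer and the extra copy.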

…ction This uses the new code added in the previous two commits. There should be no behavior changes here. We're just changing the implementation to write directly into uninitialized data. Whether it's a fixed size buffer on the stack or directly into the spare capacity of a `String` or a `Vec<u8>`. Note that this also adds a new error case to the RFC 2822 printer: when rounding the offset would result in an out-of-bounds offset, we now return an error. Previously, we would print an offset that Jiff would then later fail to parse.
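The new error case can be sketched as follows. RFC 2822 offsets are written as `+HHMM`, so an offset with a seconds component has to be rounded to a whole minute before printing. The bound of ±25:59:59 and the round-half-away-from-zero rule used below are assumptions made for illustration, as is the `format_offset` name; the actual limits and rounding mode in Jiff may differ.

```rust
/// Assumed bound for illustration: offsets limited to ±25:59:59.
const MAX_OFFSET_SECONDS: u32 = 25 * 3600 + 59 * 60 + 59;

/// Formats an offset in seconds as an RFC 2822 style `+HHMM` string,
/// returning an error if rounding to the minute leaves the valid range.
fn format_offset(offset_seconds: i32) -> Result<String, String> {
    let magnitude = offset_seconds.unsigned_abs();
    if magnitude > MAX_OFFSET_SECONDS {
        return Err(format!("offset {offset_seconds}s is out of range"));
    }
    let sign = if offset_seconds < 0 { '-' } else { '+' };
    // Round to the nearest minute, with halves rounding away from zero.
    let minutes = (magnitude + 30) / 60;
    if minutes * 60 > MAX_OFFSET_SECONDS {
        return Err(format!(
            "rounding offset {offset_seconds}s to the nearest minute \
             exceeds the maximum offset"
        ));
    }
    Ok(format!("{}{:02}{:02}", sign, minutes / 60, minutes % 60))
}
```

Under these assumptions, `25:59:59` rounds up to `26:00`, which is past the bound, so the function reports an error instead of printing an offset that a later parse would reject.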

Kind of interesting that this doesn't appear on newer versions of Rust. I think it's technically correct?

BurntSushi force-pushed the ag/buffer-experiment-rfc2822 branch from 227e36f to 9670e8d Compare December 26, 2025 01:58

BurntSushi added 2 commits December 26, 2025 09:45

fmt: add new uninitialized buffer abstraction

f24ba4f

BurntSushi force-pushed the ag/buffer-experiment-rfc2822 branch from 9670e8d to 92554af Compare December 26, 2025 15:16

BurntSushi added 2 commits December 26, 2025 10:22

error: squash warning on older versions of Rust

867ea6d

BurntSushi force-pushed the ag/buffer-experiment-rfc2822 branch from 92554af to 867ea6d Compare December 26, 2025 15:23

BurntSushi merged commit 25c9b33 into master Dec 26, 2025
40 checks passed

BurntSushi deleted the ag/buffer-experiment-rfc2822 branch December 26, 2025 15:29
