@BurntSushi BurntSushi commented Dec 26, 2025

This takes some ideas from @hanna-kruppe in #373 to decrease
code bloat in Jiff's various printers. In this PR, we just take
the first bite: we improve the RFC 2822 printer by using a new
abstraction for writing to uninitialized memory. The design of
this abstraction takes a lot of inspiration from the unstable
std::io::BorrowedBuf from the standard library.

For binary size, I used this program as my benchmark:

use jiff::{
    fmt::{rfc2822, StdIoWrite},
    Timestamp,
};

static PRINTER: rfc2822::DateTimePrinter = rfc2822::DateTimePrinter::new();

fn main() -> anyhow::Result<()> {
    let ts = Timestamp::MAX;

    let mut buf = String::with_capacity(4);
    PRINTER.print_timestamp(&ts, &mut buf)?;
    println!("{buf}, {}", buf.len());

    PRINTER.print_timestamp(&ts, StdIoWrite(std::io::stdout()))?;
    println!();

    Ok(())
}

I then defined a release-lto Cargo profile that sets lto = "fat"
and used cargo llvm-lines --profile release-lto to measure the number
of LLVM lines emitted. For this particular program, this change
reduces the number of LLVM lines by about 2,000.

For runtime performance, this PR introduces some new RFC 2822 printing
benchmarks. We compare against the chrono and time crates, which also
provide RFC 2822 printers.

$ critcmp base x01 -f print
group                             base                                   pr
-----                             ----                                   --
print/rfc2822/buffer/chrono       1.00     62.4±1.01ns        ? ?/sec
print/rfc2822/buffer/jiff         3.47     51.7±0.08ns        ? ?/sec    1.00     14.9±0.06ns        ? ?/sec
print/rfc2822/buffer/time         1.00     22.4±0.23ns        ? ?/sec
print/rfc2822/to_string/chrono    1.00    107.2±0.38ns        ? ?/sec
print/rfc2822/to_string/jiff      3.99     81.8±0.38ns        ? ?/sec    1.00     20.5±0.05ns        ? ?/sec
print/rfc2822/to_string/time      1.00     72.3±0.75ns        ? ?/sec

We started off better than chrono, but quite a bit worse than time.
But with this PR, we're not only faster than time, but our "create
a new String allocation" API is as fast as (or a hair faster than)
time's "write into caller provided &mut String" API. Which is...
somewhat surprising.

There are a few reasons, from my perspective, for the improvement here.

  • For binary size, this switches the bulk of the RFC 2822 printer
    to be completely monomorphic. We're no longer generic over a
    jiff::fmt::Write implementation internally, so there's no reason to
    generate multiple copies.
  • Also for binary size, writing directly to an uninitialized buffer
    with no code needing to handle expansion generates much tighter code
    than what we had. Moreover, we specialize some forms of integer
    printing, which I think also helps.
  • For runtime performance, it is now possible for us to write to an
    uninitialized buffer on the stack and then copy that data to the
    provided jiff::fmt::Write implementation once printing is done.
    But even with this second write, the code is so much tighter with the
    uninitialized buffer, and the sizes so small, that this is still a net
    win.
  • For runtime performance, when the alloc feature is enabled, we will
    try to get the jiff::fmt::Write implementation as a &mut Vec<u8>.
    Then we can expand its capacity as needed and write directly into its
    spare capacity instead of writing to a stack buffer and then copying it
    to the jiff::fmt::Write implementation generically.
  • For runtime performance, we add routines like write_int_pad4 that
    specialize integer formatting for values in the range 0..=9999. This
    lets us do formatting with less work than would be needed to support a
    non-padded generic implementation for any integer.
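As a rough illustration of the last point (not the code in this PR), a routine specialized for four zero-padded digits might look something like the sketch below. Only the name `write_int_pad4` comes from the description above; the signature, the `u16` input type and the fixed `[u8; 4]` output are assumptions made for the example.

```rust
/// Hypothetical stand-in for a `write_int_pad4`-style routine. It formats
/// a value in `0..=9999` as exactly four zero-padded ASCII digits.
///
/// Because the width is fixed, there is no loop over a variable number of
/// digits and no length computation, unlike a generic integer formatter.
fn write_int_pad4(n: u16, out: &mut [u8; 4]) {
    assert!(n <= 9999, "value does not fit in four digits");
    out[0] = b'0' + (n / 1000) as u8;
    out[1] = b'0' + (n / 100 % 10) as u8;
    out[2] = b'0' + (n / 10 % 10) as u8;
    out[3] = b'0' + (n % 10) as u8;
}
```

A year such as 2025 becomes the bytes `2025`, and a small value such as 7 becomes `0007`.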

Another thing that maybe helps is that there are far fewer error branches
in the core printing code.

The main downside here is that we need to futz with uninitialized
memory, which increases the risk of undefined behavior. I've added Miri
tests for the new jiff::fmt::buffer module to CI to help mitigate
this risk. Another mitigation is that the abstraction exposes an entirely
safe API.
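To give a feel for what "an entirely safe API over uninitialized memory" can mean, here is a minimal sketch in the spirit of std::io::BorrowedBuf. It is not the jiff::fmt::buffer module: the `StackBuf` type, its method names and its panic-on-overflow behavior are all hypothetical.

```rust
use core::mem::MaybeUninit;

/// Hypothetical fixed-capacity buffer whose storage starts uninitialized.
/// Callers only ever see safe methods; the one `unsafe` block is internal.
struct StackBuf<const N: usize> {
    data: [MaybeUninit<u8>; N],
    /// Invariant: `data[..len]` has been initialized.
    len: usize,
}

impl<const N: usize> StackBuf<N> {
    fn new() -> Self {
        StackBuf { data: [MaybeUninit::uninit(); N], len: 0 }
    }

    /// Appends `bytes`, panicking if the fixed capacity would be exceeded.
    fn write(&mut self, bytes: &[u8]) {
        let end = self.len + bytes.len();
        assert!(end <= N, "buffer capacity exceeded");
        for (slot, &b) in self.data[self.len..end].iter_mut().zip(bytes) {
            slot.write(b);
        }
        self.len = end;
    }

    /// Returns only the initialized prefix.
    fn as_bytes(&self) -> &[u8] {
        // SAFETY: `write` initialized `data[..len]`, and `MaybeUninit<u8>`
        // has the same layout as `u8`.
        unsafe {
            core::slice::from_raw_parts(self.data.as_ptr().cast::<u8>(), self.len)
        }
    }
}
```

The point of the design is that the initialized length is tracked in one place, so a test suite run under Miri only has to exercise this small surface.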

Given the scale of the improvements here, I plan to continue using this
same technique in Jiff's other printers.

We're going to be doing some surgery on the RFC 2822 printer. Mostly
motivated by decreasing binary size. But we should be able to optimize
runtime as well.

The printer is actually already pretty slow compared to `time`:

```
$ critcmp base -g '(.*)/(?:humantime|jiff|chrono|time)$'
group            base//chrono                           base//jiff                             base//time
-----            ------------                           ----------                             ----------
parse/rfc2822    3.08     60.1±0.78ns        ? ?/sec    1.00     19.5±0.29ns        ? ?/sec    3.73     72.6±0.83ns        ? ?/sec
print/rfc2822    2.94     65.4±0.57ns        ? ?/sec    2.34     52.1±0.42ns        ? ?/sec    1.00     22.2±0.22ns        ? ?/sec
```

So hopefully we can fix that as well.
@BurntSushi BurntSushi force-pushed the ag/buffer-experiment-rfc2822 branch from 227e36f to 9670e8d Compare December 26, 2025 01:58
We'll use this to overhaul the RFC 2822 printer in a
subsequent commit.
The printers can use this method to write directly into the
`Vec<u8>`'s spare capacity, instead of needing to write to our
new uninitialized buffer and then copy the data via the
`jiff::fmt::Write` interface.

Using the uninitialized buffer without this is still a dramatic
improvement. But this helps a bit more with minimal impact on
code size.
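
For readers unfamiliar with the technique, the standard library calls involved are `Vec::reserve`, `Vec::spare_capacity_mut` and `Vec::set_len`. The sketch below is a hypothetical helper, not the method added in this commit, and it copies from a slice only to keep the example small. A real printer would format directly into the spare capacity.

```rust
/// Hypothetical helper: writes `bytes` straight into the spare capacity
/// of `dst` and then bumps the length, with no intermediate buffer.
fn append_into_spare(dst: &mut Vec<u8>, bytes: &[u8]) {
    // Ensure there is room for at least `bytes.len()` more elements.
    dst.reserve(bytes.len());
    let spare = dst.spare_capacity_mut();
    for (slot, &b) in spare.iter_mut().zip(bytes) {
        slot.write(b);
    }
    let new_len = dst.len() + bytes.len();
    // SAFETY: `reserve` guaranteed at least `bytes.len()` spare slots, and
    // the loop above initialized exactly that many of them.
    unsafe { dst.set_len(new_len) };
}
```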
@BurntSushi BurntSushi force-pushed the ag/buffer-experiment-rfc2822 branch from 9670e8d to 92554af Compare December 26, 2025 15:16
…ction

This uses the new code added in the previous two commits.

There should be no behavior changes here. We're just changing
the implementation to write directly into uninitialized data,
whether it's a fixed size buffer on the stack or the spare
capacity of a `String` or a `Vec<u8>`.

Note that this also adds a new error case to the RFC 2822 printer: when
rounding the offset would result in an out-of-bounds offset, we now
return an error. Previously, we would print an offset that Jiff would
then later fail to parse.
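
A sketch of how such an error case can arise, under loudly stated assumptions: the bound of 25:59:59 on an offset's magnitude, rounding half away from zero to the nearest minute, and the function name and error type are all chosen for illustration and are not taken from this commit.

```rust
/// Assumed maximum offset magnitude in seconds (25:59:59).
const MAX_OFFSET_SECONDS: i32 = 25 * 3600 + 59 * 60 + 59;

/// Hypothetical helper: rounds an offset in seconds to whole minutes
/// (half away from zero) and fails if the result leaves the valid range.
fn round_offset_to_minutes(seconds: i32) -> Result<i32, String> {
    let sign = if seconds < 0 { -1 } else { 1 };
    let minutes = (seconds.abs() + 30) / 60;
    if minutes * 60 > MAX_OFFSET_SECONDS {
        return Err(format!("offset {seconds}s is out of bounds after rounding"));
    }
    Ok(sign * minutes)
}
```

Under these assumptions, an offset of 25:59:59 rounds up to 26:00, which is outside the assumed range, so the helper returns an error instead of producing text that could not be parsed back.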
Kind of interesting that this doesn't appear on newer
versions of Rust. I think it's technically correct?
@BurntSushi BurntSushi force-pushed the ag/buffer-experiment-rfc2822 branch from 92554af to 867ea6d Compare December 26, 2025 15:23
@BurntSushi BurntSushi merged commit 25c9b33 into master Dec 26, 2025
40 checks passed
@BurntSushi BurntSushi deleted the ag/buffer-experiment-rfc2822 branch December 26, 2025 15:29