@Ekrekr

@Ekrekr Ekrekr commented Aug 23, 2024

Work in progress: I haven't yet found a way to efficiently fork a new message directly within the contiguous memory space for length-delimited records.

Improves efficiency by avoiding copies of array buffers: encoding writes directly into an expanding array buffer that provides dynamically sized contiguous memory.

TODO: add performance tests and flamegraphs once working.
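For context, here is a minimal sketch of the technique described above. It assumes an ES2024 runtime with resizable ArrayBuffer support (the maxByteLength constructor option and resize()), for example Node.js 20 or later. GrowableWriter and its methods are hypothetical names for illustration, not the protobuf-es BinaryWriter API. A Uint8Array created over a resizable buffer without an explicit length tracks the buffer's length, so the writer never has to allocate a replacement buffer and copy into it itself.

```typescript
// Sketch only: GrowableWriter is a hypothetical name, not the protobuf-es API.
// Requires resizable ArrayBuffer support (ES2024), e.g. Node.js 20+.

// The resizable-buffer members, declared here so the sketch type-checks even
// when the TypeScript `lib` setting predates ES2024.
type ResizableBuffer = ArrayBuffer & {
  readonly maxByteLength: number;
  resize(newByteLength: number): void;
};
type ResizableBufferCtor = new (
  byteLength: number,
  options: { maxByteLength: number },
) => ResizableBuffer;
const ResizableArrayBuffer = ArrayBuffer as unknown as ResizableBufferCtor;

class GrowableWriter {
  private readonly buf: ResizableBuffer;
  // Created without an explicit length, so the view tracks the buffer's
  // current byteLength and stays valid after resize().
  private readonly view: Uint8Array;
  private pos = 0;

  constructor(initialSize = 64, maxSize = 1 << 26) {
    this.buf = new ResizableArrayBuffer(initialSize, { maxByteLength: maxSize });
    this.view = new Uint8Array(this.buf);
  }

  // Makes room for `extra` more bytes, doubling the buffer to amortise resizes.
  private ensure(extra: number): void {
    const needed = this.pos + extra;
    if (needed <= this.buf.byteLength) return;
    if (needed > this.buf.maxByteLength) {
      throw new RangeError("write exceeds maxByteLength");
    }
    this.buf.resize(
      Math.min(Math.max(needed, this.buf.byteLength * 2), this.buf.maxByteLength),
    );
  }

  writeByte(b: number): this {
    this.ensure(1);
    this.view[this.pos++] = b;
    return this;
  }

  writeBytes(bytes: Uint8Array): this {
    this.ensure(bytes.length);
    this.view.set(bytes, this.pos);
    this.pos += bytes.length;
    return this;
  }

  // Copies once at the end, returning a plain fixed-length array.
  finish(): Uint8Array {
    return this.view.slice(0, this.pos);
  }
}

// Usage: the second write grows the 4-byte buffer to 8 bytes in place.
const writer = new GrowableWriter(4);
writer.writeBytes(new Uint8Array([1, 2, 3])).writeBytes(new Uint8Array([4, 5, 6]));
const encoded = writer.finish(); // holds the six bytes 1..6
```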

@CLAassistant

CLAassistant commented Aug 23, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Elias Kassell seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@Ekrekr Ekrekr changed the title Improve Binary Encode Performance (WIP) Improve Binary Encode Performance Aug 23, 2024
Member

@timostamm timostamm left a comment

Thanks for the PR, Elias!

Unfortunately, resizable ArrayBuffers are relatively new. We'll have to wait until they are more widely available before we can use them.
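To illustrate what the availability concern means in practice, here is a sketch of a grow helper that resizes in place when the underlying buffer is resizable, and otherwise falls back to allocate-and-copy, which is all a fixed-size ArrayBuffer allows. grow and supportsResizableArrayBuffer are hypothetical helpers; the feature check only tests whether ArrayBuffer.prototype.resize exists.

```typescript
// Sketch only: grow() and supportsResizableArrayBuffer() are hypothetical helpers.

// Optional view of the ES2024 resizable-buffer members, so this compiles
// regardless of the TypeScript `lib` setting.
type MaybeResizable = ArrayBuffer & {
  readonly resizable?: boolean;
  readonly maxByteLength?: number;
  resize?(newByteLength: number): void;
};

// True if the runtime implements resizable ArrayBuffers.
function supportsResizableArrayBuffer(): boolean {
  return typeof (ArrayBuffer.prototype as MaybeResizable).resize === "function";
}

// Returns a view of at least `minLength` bytes holding the contents of `bytes`.
// Resizable buffer with room: resized in place, no copy.
// Otherwise: a new, larger array is allocated and the old contents are copied.
function grow(bytes: Uint8Array, minLength: number): { bytes: Uint8Array; copied: boolean } {
  if (bytes.length >= minLength) return { bytes, copied: false };
  const buf = bytes.buffer as MaybeResizable;
  const needed = bytes.byteOffset + minLength;
  if (buf.resizable && buf.resize && buf.maxByteLength !== undefined && needed <= buf.maxByteLength) {
    if (needed > buf.byteLength) buf.resize(needed);
    return { bytes: new Uint8Array(buf, bytes.byteOffset, minLength), copied: false };
  }
  // Fallback for fixed-size buffers: double the capacity and copy.
  const next = new Uint8Array(Math.max(minLength, bytes.length * 2));
  next.set(bytes);
  return { bytes: next, copied: true };
}
```

On runtimes without resizable buffers every growth takes the copying path, which is exactly the cost the PR is trying to avoid.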

I left a couple of informative comments.

Comment on lines +195 to +197
// NodeJS strings are by default UTF-8, so we can assume the byte length as the length of
// the string.
const valueBytesLength = value.length;
Member

Unfortunately this won't work. JavaScript strings are UTF-16 (Node.js is no exception), and length returns the number of UTF-16 code units, not the number of UTF-8 bytes.
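A sketch of the difference: length counts UTF-16 code units, while the wire format needs the UTF-8 byte count, and the two diverge for any non-ASCII text. utf8ByteLength is a hypothetical helper that computes the UTF-8 size without allocating; new TextEncoder().encode(s).length gives the same number but has to encode the string to get it.

```typescript
// Hypothetical helper: counts the bytes TextEncoder would produce for `s`,
// without allocating. Lone surrogates count as 3 bytes, because TextEncoder
// replaces them with U+FFFD (EF BF BD).
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) {
      bytes += 1; // ASCII
    } else if (c < 0x800) {
      bytes += 2;
    } else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // A surrogate pair is one code point above U+FFFF: 4 bytes in UTF-8.
        bytes += 4;
        i++;
      } else {
        bytes += 3; // lone high surrogate -> U+FFFD
      }
    } else {
      bytes += 3; // rest of the BMP, including lone surrogates
    }
  }
  return bytes;
}

// "héllo 👋", written with escapes: é is U+00E9, 👋 is the surrogate pair D83D DC4B.
const greeting = "h\u00e9llo \uD83D\uDC4B";
// greeting.length is 8 (UTF-16 code units), but its UTF-8 encoding is 11 bytes.
const greetingBytes = utf8ByteLength(greeting);
```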

if (opts.writeUnknownFields) {
for (const { no, wireType, data } of msg.getUnknown() ?? []) {
writer.tag(no, wireType).raw(data);
writer.tag(no, wireType).bytes(data);
Member

The bytes() method prefixes the data with its length, while raw() writes it unchanged. This change will corrupt data and needs to be reverted.
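To show why, here is a hypothetical minimal writer (not the protobuf-es BinaryWriter) whose raw() and bytes() behave as described in the comment. It assumes, as the comment implies, that stored unknown-field data already holds everything after the tag, including any length prefix, so writing it with bytes() adds a second prefix.

```typescript
// Hypothetical minimal writer for illustration; not the protobuf-es BinaryWriter.
class MiniWriter {
  private readonly out: number[] = [];

  // Unsigned varint: 7 bits per byte, high bit set on all but the last byte.
  uint32(value: number): this {
    let v = value >>> 0;
    while (v > 0x7f) {
      this.out.push((v & 0x7f) | 0x80);
      v >>>= 7;
    }
    this.out.push(v);
    return this;
  }

  // Field tag: (field number << 3) | wire type, as a varint.
  tag(fieldNo: number, wireType: number): this {
    return this.uint32((fieldNo << 3) | wireType);
  }

  // Copies the bytes verbatim, with no prefix.
  raw(data: Uint8Array): this {
    for (let i = 0; i < data.length; i++) this.out.push(data[i]);
    return this;
  }

  // Length-delimited value: varint length prefix, then the bytes.
  bytes(data: Uint8Array): this {
    return this.uint32(data.length).raw(data);
  }

  finish(): Uint8Array {
    return Uint8Array.from(this.out);
  }
}

// Unknown field 5 (wire type 2) whose stored data is already "length 3, then abc".
const stored = new Uint8Array([0x03, 0x61, 0x62, 0x63]);
const viaRaw = new MiniWriter().tag(5, 2).raw(stored).finish();     // 2a 03 61 62 63
const viaBytes = new MiniWriter().tag(5, 2).bytes(stored).finish(); // 2a 04 03 61 62 63
```

A decoder reading viaBytes sees a 4-byte value (03 61 62 63) instead of "abc".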

writeScalarValue(writer, scalarType, item as ScalarValue);
}
writer.join();
writer.tag(field.number, WireType.LengthDelimited);
Member

fork() and join() take everything written between them and write it length-prefixed. The tag therefore needs to come first, then the length prefix, then the data.
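A sketch of the ordering, using a hypothetical minimal writer whose fork()/join() follow the contract described above: join() appends the forked bytes to the outer buffer as a varint length followed by the data. The example encodes a packed repeated field (field 4, wire type 2) holding 1, 2 and 300.

```typescript
// Hypothetical minimal writer; fork()/join() follow the contract in the review.
const WIRE_LENGTH_DELIMITED = 2;

class ForkingWriter {
  private buf: number[] = [];
  private readonly stack: number[][] = [];

  // Unsigned varint: 7 bits per byte, high bit set on all but the last byte.
  uint32(value: number): this {
    let v = value >>> 0;
    while (v > 0x7f) {
      this.buf.push((v & 0x7f) | 0x80);
      v >>>= 7;
    }
    this.buf.push(v);
    return this;
  }

  tag(fieldNo: number, wireType: number): this {
    return this.uint32((fieldNo << 3) | wireType);
  }

  // Everything written after fork() goes into a fresh buffer...
  fork(): this {
    this.stack.push(this.buf);
    this.buf = [];
    return this;
  }

  // ...and join() appends it to the outer buffer as varint length + data.
  join(): this {
    const chunk = this.buf;
    const outer = this.stack.pop();
    if (outer === undefined) throw new Error("join() without fork()");
    this.buf = outer;
    this.uint32(chunk.length);
    for (const b of chunk) this.buf.push(b);
    return this;
  }

  finish(): Uint8Array {
    return Uint8Array.from(this.buf);
  }
}

// Correct: tag first, then fork()/join() produce length prefix + packed values.
const good = new ForkingWriter()
  .tag(4, WIRE_LENGTH_DELIMITED)
  .fork().uint32(1).uint32(2).uint32(300).join()
  .finish(); // 22 04 01 02 ac 02

// Tag written after join(): the length-delimited data precedes its own tag.
const bad = new ForkingWriter()
  .fork().uint32(1).uint32(2).uint32(300).join()
  .tag(4, WIRE_LENGTH_DELIMITED)
  .finish(); // 04 01 02 ac 02 22
```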

break;
}
writer.join();
writer.tag(field.number, WireType.LengthDelimited);
Member

Same as L178.

Comment on lines +144 to +146
// TODO(ekrekr): this is really slow, because it has to allocate a whole new array buffer.
// Instead we should be writing the message directly to the original arraybuffer, then inserting
// the length beforehand.
Member

Yes. Unfortunately, varints have variable length, and the size of the length prefix isn't known until the message has been written, so this isn't straightforward.
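To make the difficulty concrete: the size of the varint length prefix depends on the length itself (1 byte up to 127, 2 bytes from 128, up to 5 bytes for 32-bit values), and that length is only known once the nested message has been written. One possible in-place approach, shown here as an illustration rather than what protobuf-es does, is to write the body first, then shift it right by the prefix size with copyWithin() and fill in the prefix. varintSize and writeLengthDelimitedInPlace are hypothetical helpers.

```typescript
// Sketch only: varintSize() and writeLengthDelimitedInPlace() are hypothetical helpers.

// Bytes needed for an unsigned varint (0 <= n < 2^32): 7 payload bits per byte.
function varintSize(n: number): number {
  let v = n >>> 0;
  let size = 1;
  while (v > 0x7f) {
    v >>>= 7;
    size++;
  }
  return size;
}

// Writes `body` as a length-delimited value starting at `pos` and returns the
// next write position. out.set() stands in for the nested message being
// encoded directly into `out`; the prefix size is only known afterwards, so the
// body is shifted right with copyWithin() (overlap-safe) and the prefix written
// in front. Assumes `out` has room for up to 5 extra bytes after the body.
function writeLengthDelimitedInPlace(out: Uint8Array, pos: number, body: Uint8Array): number {
  out.set(body, pos);
  const len = body.length;
  const prefixSize = varintSize(len);
  out.copyWithin(pos + prefixSize, pos, pos + len);
  let v = len;
  let p = pos;
  while (v > 0x7f) {
    out[p++] = (v & 0x7f) | 0x80;
    v >>>= 7;
  }
  out[p] = v;
  return pos + prefixSize + len;
}

// The prefix grows from 1 to 2 bytes at length 128:
// varintSize(127) === 1, varintSize(128) === 2, varintSize(0xffffffff) === 5
```

The shift is still a copy of the body, so a message nested several levels deep may be moved once per level.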

/**
* Encode UTF-8 text to an existing binary.
*/
encodeInto: (
Member

Adding a required property to the interface is a breaking change, so we'll have to find another approach.
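One non-breaking alternative, sketched with a hypothetical TextEncodingLike interface standing in for the real one, is to declare the new member as optional and feature-check it at the call site, falling back to the existing method plus a copy. The { read, written } result shape mirrors TextEncoder.prototype.encodeInto().

```typescript
// Hypothetical interface standing in for the one the PR extends.
interface TextEncodingLike {
  encodeUtf8(text: string): Uint8Array;
  // Optional member: existing implementations without it still satisfy the
  // interface, so adding it does not break them.
  encodeInto?(text: string, into: Uint8Array): { read: number; written: number };
}

// Encodes `text` as UTF-8 at the start of `into`, returning the byte count.
// Uses encodeInto() when available, else encodeUtf8() plus a copy.
// Both paths throw if `into` is too small for the whole string.
function writeUtf8(enc: TextEncodingLike, text: string, into: Uint8Array): number {
  if (enc.encodeInto) {
    const { read, written } = enc.encodeInto(text, into);
    if (read < text.length) throw new RangeError("destination too small");
    return written;
  }
  const bytes = enc.encodeUtf8(text);
  if (bytes.length > into.length) throw new RangeError("destination too small");
  into.set(bytes);
  return bytes.length;
}
```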
