src: use simdutf for one-byte string UTF-8 write in stringBytes#61696

Open
mertcanaltin wants to merge 1 commit into nodejs:main from mertcanaltin:mert/simdutf-write-utf8

Conversation

@mertcanaltin
Member

This PR uses simdutf::convert_latin1_to_utf8_safe for one-byte strings in the UTF-8 path of StringBytes::Write, instead of V8's WriteUtf8V2.
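
For readers unfamiliar with the conversion, here is a minimal scalar sketch of what a bounded Latin-1 to UTF-8 write does. It is not the simdutf implementation and not the code in this PR; the function name Latin1ToUtf8Bounded is hypothetical and only illustrates the idea that each one-byte code unit becomes one or two UTF-8 bytes and that the write stops before overflowing the destination.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for a bounded Latin-1 -> UTF-8 conversion.
// Writes at most `capacity` bytes to `out`, never emits half of a two-byte
// sequence, and returns the number of bytes written.
std::size_t Latin1ToUtf8Bounded(const std::uint8_t* src, std::size_t len,
                                char* out, std::size_t capacity) {
  std::size_t written = 0;
  for (std::size_t i = 0; i < len; ++i) {
    const std::uint8_t c = src[i];
    if (c < 0x80) {
      // ASCII range: copied through as a single byte.
      if (written + 1 > capacity) break;
      out[written++] = static_cast<char>(c);
    } else {
      // U+0080..U+00FF: encoded as two bytes, 110000xx 10xxxxxx.
      if (written + 2 > capacity) break;
      out[written++] = static_cast<char>(0xC0 | (c >> 6));
      out[written++] = static_cast<char>(0x80 | (c & 0x3F));
    }
  }
  return written;
}
```

Because every input byte is handled independently, this kind of loop is a natural fit for SIMD, which is presumably where the speedup on long strings comes from.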

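buffer.write(str, 'utf8') is about 577% faster (roughly 6.8x the throughput) for len=2048 strings written without offset or length arguments.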

Benchmark results:

➜  node git:(mert/simdutf-write-utf8) ✗ node-benchmark-compare ./result.csv
                                                                                          confidence improvement accuracy (*)    (**)   (***)
buffers/buffer-from.js n=800000 len=100 source='array'                                                    0.57 %       ±1.29%  ±1.76%  ±2.40%
buffers/buffer-from.js n=800000 len=100 source='arraybuffer-middle'                                       0.40 %       ±3.23%  ±4.42%  ±6.03%
buffers/buffer-from.js n=800000 len=100 source='arraybuffer'                                             -0.22 %       ±2.94%  ±4.09%  ±5.70%
buffers/buffer-from.js n=800000 len=100 source='buffer'                                                  -0.20 %       ±2.55%  ±3.51%  ±4.84%
buffers/buffer-from.js n=800000 len=100 source='object'                                                  -1.26 %       ±3.31%  ±4.65%  ±6.60%
buffers/buffer-from.js n=800000 len=100 source='string-base64'                                           -0.62 %       ±1.37%  ±1.88%  ±2.57%
buffers/buffer-from.js n=800000 len=100 source='string-utf8'                                              1.73 %       ±2.03%  ±2.86%  ±4.08%
buffers/buffer-from.js n=800000 len=100 source='string'                                                   1.13 %       ±1.52%  ±2.09%  ±2.86%
buffers/buffer-from.js n=800000 len=100 source='uint16array'                                              0.54 %       ±1.54%  ±2.12%  ±2.89%
buffers/buffer-from.js n=800000 len=100 source='uint8array'                                              -0.18 %       ±1.62%  ±2.23%  ±3.06%
buffers/buffer-from.js n=800000 len=2048 source='array'                                                   0.86 %       ±3.64%  ±5.03%  ±6.96%
buffers/buffer-from.js n=800000 len=2048 source='arraybuffer-middle'                                      0.41 %       ±3.35%  ±4.62%  ±6.36%
buffers/buffer-from.js n=800000 len=2048 source='arraybuffer'                                            -5.09 %      ±10.56% ±15.15% ±22.24%
buffers/buffer-from.js n=800000 len=2048 source='buffer'                                          **      1.53 %       ±0.93%  ±1.28%  ±1.75%
buffers/buffer-from.js n=800000 len=2048 source='object'                                                  0.34 %       ±1.59%  ±2.19%  ±3.01%
buffers/buffer-from.js n=800000 len=2048 source='string-base64'                                          -0.41 %       ±0.88%  ±1.21%  ±1.65%
buffers/buffer-from.js n=800000 len=2048 source='string-utf8'                                    ***      5.89 %       ±1.13%  ±1.57%  ±2.17%
buffers/buffer-from.js n=800000 len=2048 source='string'                                         ***      6.34 %       ±1.27%  ±1.74%  ±2.38%
buffers/buffer-from.js n=800000 len=2048 source='uint16array'                                             0.48 %       ±1.03%  ±1.41%  ±1.93%
buffers/buffer-from.js n=800000 len=2048 source='uint8array'                                              0.78 %       ±1.56%  ±2.16%  ±2.98%
buffers/buffer-write-string-short.js n=1000000 len=1 encoding='ascii'                                     0.85 %       ±1.38%  ±1.89%  ±2.58%
buffers/buffer-write-string-short.js n=1000000 len=1 encoding='latin1'                             *      1.83 %       ±1.66%  ±2.30%  ±3.17%
buffers/buffer-write-string-short.js n=1000000 len=1 encoding='utf8'                               *      1.88 %       ±1.60%  ±2.21%  ±3.04%
buffers/buffer-write-string-short.js n=1000000 len=16 encoding='ascii'                                    0.18 %       ±1.45%  ±1.99%  ±2.74%
buffers/buffer-write-string-short.js n=1000000 len=16 encoding='latin1'                                   7.48 %      ±15.46% ±22.20% ±32.63%
buffers/buffer-write-string-short.js n=1000000 len=16 encoding='utf8'                                     4.34 %       ±7.11% ±10.18% ±14.89%
buffers/buffer-write-string-short.js n=1000000 len=32 encoding='ascii'                                    1.44 %       ±1.82%  ±2.50%  ±3.42%
buffers/buffer-write-string-short.js n=1000000 len=32 encoding='latin1'                                   0.81 %       ±1.28%  ±1.76%  ±2.42%
buffers/buffer-write-string-short.js n=1000000 len=32 encoding='utf8'                                     0.43 %       ±2.93%  ±4.03%  ±5.50%
buffers/buffer-write-string-short.js n=1000000 len=8 encoding='ascii'                                     0.22 %       ±1.10%  ±1.51%  ±2.07%
buffers/buffer-write-string-short.js n=1000000 len=8 encoding='latin1'                                    0.92 %       ±1.86%  ±2.60%  ±3.67%
buffers/buffer-write-string-short.js n=1000000 len=8 encoding='utf8'                                      0.41 %       ±2.33%  ±3.20%  ±4.36%
buffers/buffer-write-string.js n=1000000 len=2048 args='' encoding=''                            ***    577.07 %       ±6.13%  ±8.80% ±12.93%
buffers/buffer-write-string.js n=1000000 len=2048 args='' encoding='ascii'                                1.27 %       ±1.34%  ±1.83%  ±2.50%
buffers/buffer-write-string.js n=1000000 len=2048 args='' encoding='hex'                                 -0.36 %       ±0.59%  ±0.83%  ±1.19%
buffers/buffer-write-string.js n=1000000 len=2048 args='' encoding='latin1'                               0.90 %       ±1.22%  ±1.67%  ±2.30%
buffers/buffer-write-string.js n=1000000 len=2048 args='' encoding='utf16le'                       *     -1.33 %       ±1.09%  ±1.50%  ±2.05%
buffers/buffer-write-string.js n=1000000 len=2048 args='' encoding='utf8'                        ***    573.78 %       ±5.59%  ±7.94% ±11.44%
buffers/buffer-write-string.js n=1000000 len=2048 args='offset' encoding=''                      ***     12.86 %       ±1.85%  ±2.57%  ±3.59%
buffers/buffer-write-string.js n=1000000 len=2048 args='offset' encoding='ascii'                         -2.36 %       ±4.06%  ±5.77%  ±8.35%
buffers/buffer-write-string.js n=1000000 len=2048 args='offset' encoding='hex'                           -0.33 %       ±0.64%  ±0.88%  ±1.21%
buffers/buffer-write-string.js n=1000000 len=2048 args='offset' encoding='latin1'                         0.67 %       ±1.75%  ±2.40%  ±3.27%
buffers/buffer-write-string.js n=1000000 len=2048 args='offset' encoding='utf16le'                       -0.39 %       ±0.97%  ±1.33%  ±1.81%
buffers/buffer-write-string.js n=1000000 len=2048 args='offset' encoding='utf8'                           0.12 %      ±10.16% ±14.56% ±21.33%
buffers/buffer-write-string.js n=1000000 len=2048 args='offset+length' encoding=''               ***      9.47 %       ±1.17%  ±1.60%  ±2.18%
buffers/buffer-write-string.js n=1000000 len=2048 args='offset+length' encoding='ascii'           **      1.49 %       ±1.01%  ±1.38%  ±1.88%
buffers/buffer-write-string.js n=1000000 len=2048 args='offset+length' encoding='hex'                     0.52 %       ±1.73%  ±2.47%  ±3.57%
buffers/buffer-write-string.js n=1000000 len=2048 args='offset+length' encoding='latin1'          **      1.91 %       ±1.15%  ±1.60%  ±2.22%
buffers/buffer-write-string.js n=1000000 len=2048 args='offset+length' encoding='utf16le'                 0.45 %       ±1.07%  ±1.47%  ±2.03%
buffers/buffer-write-string.js n=1000000 len=2048 args='offset+length' encoding='utf8'           ***      8.09 %       ±1.04%  ±1.43%  ±1.96%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 50 comparisons, you can thus expect the following amount of false-positive results:
  2.50 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.50 false positives, when considering a   1% risk acceptance (**, ***),
  0.05 false positives, when considering a 0.1% risk acceptance (***)

@nodejs-github-bot added the buffer, c++, and needs-ci labels Feb 5, 2026
@mertcanaltin
Member Author

@nodejs/performance ❤️

@anonrig added the request-ci label Feb 5, 2026
@anonrig
Member

anonrig commented Feb 5, 2026

@anonrig added the author ready label Feb 5, 2026
@anonrig requested review from addaleax and lemire February 5, 2026 19:18
@github-actions bot removed the request-ci label Feb 5, 2026
@nodejs-github-bot
Collaborator

@anonrig added the commit-queue label Feb 5, 2026
Member

@ChALkeR ChALkeR left a comment

Non-blocking, can go into a follow-up: this logic can likely be improved further, or even unified with #61496.

Member

@gurgunday gurgunday left a comment

lgtm

@codecov

codecov bot commented Feb 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.76%. Comparing base (e7d6728) to head (7f66191).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #61696      +/-   ##
==========================================
+ Coverage   89.74%   89.76%   +0.01%     
==========================================
  Files         674      674              
  Lines      204395   204423      +28     
  Branches    39274    39283       +9     
==========================================
+ Hits       183427   183492      +65     
+ Misses      13275    13222      -53     
- Partials     7693     7709      +16     
Files with missing lines Coverage Δ
src/string_bytes.cc 69.66% <100.00%> (+0.65%) ⬆️

... and 44 files with indirect coverage changes

Labels

author ready PRs that have at least one approval, no pending requests for changes, and a CI started. buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. commit-queue Add this label to land a pull request using GitHub Actions. needs-ci PRs that need a full CI run.

8 participants