Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimized textwidth(::Char) for ASCII #55398

Merged
merged 9 commits into from
Aug 7, 2024
Merged

Conversation

stevengj
Copy link
Member

@stevengj stevengj commented Aug 6, 2024

For the common case of ASCII characters, I found in https://github.com/JuliaLang/julia/pull/55351/files#r1705746669 that textwidth can be sped up by 60% 2.7x or more by adding a fast-path ASCII lookup table hard-coded rule.

The slowdown for non-ASCII characters seems negligible (1% or less). Here, textwidth2 is the new function, and textwidth is the old function (Julia 1.10.4, Apple M1 Pro):

julia> @btime textwidth('x'); @btime textwidth2('x')
  3.167 ns (0 allocations: 0 bytes)
  1.166 ns (0 allocations: 0 bytes)
1

julia> @btime textwidth('α'); @btime textwidth2('α')
  3.208 ns (0 allocations: 0 bytes)
  3.166 ns (0 allocations: 0 bytes)
1

@stevengj stevengj added performance Must go faster unicode Related to unicode characters and encodings labels Aug 6, 2024
@jakobnissen
Copy link
Contributor

Can it be faster if the const lookup table is a Tuple? The reason is that const-ness at the moment only refers to the binding, so a const vector will do a triple load (load the underlying memoryref, then load the pointer, then load the offset) whereas an immutable Tuple might generate better code.

@stevengj
Copy link
Member Author

stevengj commented Aug 6, 2024

@jakobnissen, good catch, a tuple table indeed seems to be faster:

julia> @btime textwidth('x'); @btime textwidth2('x'); @btime textwidth3('x');
  3.166 ns (0 allocations: 0 bytes)
  1.958 ns (0 allocations: 0 bytes)
  1.125 ns (0 allocations: 0 bytes)

(textwidth is the old version, textwidth2 is the array version, and textwidth3 is the tuple version)

@stevengj
Copy link
Member Author

stevengj commented Aug 6, 2024

Another possibility would be to omit the table entirely and replace it with a couple of if statements, since the table is quite simple right now:

(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0)

Update it looks like

b < 0x7f && return Int(b >= 0x20);

is about the same speed as the tuple lookup table, maybe 1% faster, and is simpler, so I'll switch to this. (It omits a fast path for the ASCII U+007f "DEL" character, which has width 0, but having that odd character being slightly slower doesn't seem like a big deal to me).

@stevengj
Copy link
Member Author

stevengj commented Aug 6, 2024

Looks like an unrelated failure in the REPL tests?

@stevengj stevengj added the merge me PR is reviewed. Merge when all tests are passing label Aug 7, 2024
@IanButterworth IanButterworth merged commit b43e247 into master Aug 7, 2024
8 checks passed
@IanButterworth IanButterworth deleted the ascii_textwidth branch August 7, 2024 11:48
@oscardssmith oscardssmith removed the merge me PR is reviewed. Merge when all tests are passing label Aug 7, 2024
lazarusA pushed a commit to lazarusA/julia that referenced this pull request Aug 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster unicode Related to unicode characters and encodings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants