Add Ultimate String Guide #7

Merged
merged 17 commits into from
May 12, 2024
hasufell
Owner

@hasufell hasufell commented May 8, 2024

TODO:

Rendered: https://github.com/hasufell/hasufell.github.io/blob/strings/posts/2024-05-07-ultimate-string-guide.md
(The html color codes in the UTF-8 table do not show)


@Bodigrim this isn't completely finished yet, but I would appreciate a review regardless.

Pinging a couple of other folks if they want to drop some comments: @tomjaguarpaw @mpilgrem @angerman @Ericson2314 @bgamari

@hasufell hasufell marked this pull request as draft May 8, 2024 16:56
@mpilgrem

mpilgrem commented May 8, 2024

@hasufell, looks good! A couple of thoughts:

  • you might acknowledge the existence of the Haddock documentation in Data.Char (which is more elaborate than the Haskell Report) and in Data.String (which does explain the performance considerations);

  • you may recall that one thing I was stuck on was filepaths and FromJSON instances. Could an 'ultimate' guide capture your advice there?


@Bodigrim Bodigrim left a comment

Very nice!

@Bodigrim

Bodigrim commented May 9, 2024

For an ultimate guide it would be great to mention:

  • byteslice aka unpinned ByteString aka sliceable ShortByteString.
  • text-short which is like ShortByteString but for Text (used in aeson IIRC).

@hasufell
Owner Author

hasufell commented May 10, 2024

@Bodigrim now I'm actually wondering what is the advantage of ShortByteString over Bytes?

From what I understand:

  • slightly more space efficient: it carries two fewer Int fields (seems quite minor)
  • since it doesn't do slicing, everything is effectively a copy (tail, init, splitAt, ...), allowing the GC to clean up early

The latter is a bit of a weak argument though, isn't it, given that you can just clone Bytes back to e.g. a ShortByteString?

Did I miss anything?
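
To make the trade-off above concrete, here is a minimal sketch of a sliceable view over a ShortByteString. The `Slice` type and its helpers (`fromShort`, `dropSlice`, `toShort`) are hypothetical names made up for illustration; this is not the actual `Bytes` type from byteslice. It assumes bytestring >= 0.11.3, where `Data.ByteString.Short` exports `take` and `drop`.

```haskell
module Main where

import Data.ByteString.Short (ShortByteString)
import qualified Data.ByteString.Short as SBS

-- Hypothetical sliceable wrapper (illustration only, not byteslice's Bytes).
-- Compared to a bare ShortByteString it carries two extra Int fields.
data Slice = Slice
  { sliceBuf :: !ShortByteString     -- shared backing buffer
  , sliceOff :: {-# UNPACK #-} !Int  -- offset into the buffer
  , sliceLen :: {-# UNPACK #-} !Int  -- number of visible bytes
  }

-- View a whole ShortByteString as a slice.
fromShort :: ShortByteString -> Slice
fromShort b = Slice b 0 (SBS.length b)

-- O(1): only adjusts offset and length, no bytes are copied.
-- The whole backing buffer stays alive as long as the slice does.
dropSlice :: Int -> Slice -> Slice
dropSlice n (Slice b off len) =
  let k = max 0 (min n len)
  in Slice b (off + k) (len - k)

-- O(n): copies the visible bytes into a fresh ShortByteString,
-- after which the original buffer is no longer referenced by the result.
toShort :: Slice -> ShortByteString
toShort (Slice b off len) = SBS.take len (SBS.drop off b)

main :: IO ()
main = do
  let whole = SBS.pack [1 .. 10]
      view  = dropSlice 7 (fromShort whole)  -- no copy
  print (SBS.unpack (toShort view))          -- [8,9,10]
  print (SBS.unpack (SBS.drop 7 whole))      -- [8,9,10], via a copy
```

Both routes yield the same bytes; the difference is when the copy happens and how long the original ten-byte buffer stays reachable.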

@hasufell
Owner Author

I'm also trying to figure out the exact memory overheads of all types. But I'm not very deep in that topic: https://gist.github.com/hasufell/61ca8ef438cc912e7446bcc7b1f25028

@Bodigrim

> @Bodigrim now I'm actually wondering what is the advantage of ShortByteString over Bytes?

On a modern CPU memcpy is extremely fast, so it could be faster to just copy the data than to manipulate two Int fields holding offset and length. There is also a bit less indirection at runtime and a bit less memory pressure (which helps the data fit into the CPU cache).

> I'm also trying to figure out the exact memory overheads of all types. But I'm not very deep in that topic: https://gist.github.com/hasufell/61ca8ef438cc912e7446bcc7b1f25028

I wonder if https://hackage.haskell.org/package/weigh-0.0.17/docs/Weigh.html could be of any help. Also maybe try ghc-vis (but it probably does not account for {-# UNPACK #-}, see below).

For lists we have the list constructor tag (essentially, (:)), then a field with a pointer to the Char, then a field with a pointer to the rest of the list, which is 3 words. A Char takes 2 words: one for the constructor C# and another for the unboxed Char#. Altogether 5 words per character. The empty list object is shared by all lists in the program, so effectively it does not contribute.
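
As a rough worked version of that accounting, the sketch below turns the 5-words-per-character figure into an estimate. The names (`stringWords`, `stringBytes`, etc.) are made up for illustration, the 8-byte word size assumes a 64-bit GHC, and the estimate ignores any sharing of Char closures, so treat it as an upper bound rather than a measurement.

```haskell
module Main where

-- Words per (:) cell: constructor header + pointer to Char + pointer to tail.
wordsPerCons :: Int
wordsPerCons = 3

-- Words per boxed Char: C# constructor header + unboxed Char# payload.
wordsPerChar :: Int
wordsPerChar = 2

-- Assumption: 64-bit platform, so one machine word is 8 bytes.
bytesPerWord :: Int
bytesPerWord = 8

-- Estimated heap words for a fully evaluated String.
-- The shared empty list [] is not counted.
stringWords :: String -> Int
stringWords s = length s * (wordsPerCons + wordsPerChar)

-- The same estimate expressed in bytes.
stringBytes :: String -> Int
stringBytes s = stringWords s * bytesPerWord

main :: IO ()
main = do
  print (stringWords "hello")  -- 25
  print (stringBytes "hello")  -- 200
```

So a five-character String is estimated at 25 words, or 200 bytes on a 64-bit machine, against 5 bytes of ASCII payload.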

For Text bear in mind that {-# UNPACK #-} of a monomorphic strict field kinda "inlines" it, so there is no constructor tag.

@hasufell hasufell marked this pull request as ready for review May 12, 2024 11:52
@hasufell hasufell merged commit 5f7d3f2 into master May 12, 2024