Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Place components in namespaces #17

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cal-pratt
Copy link
Contributor

@cal-pratt cal-pratt commented Jan 24, 2025

From mxmlnkn/rapidgzip#48

edited

  • Moved all of core and rapidgzip to rapidgzip namespace.
  • Redefined BitReader to BitReader64 to not conflict with template name definition now in rapidgzip namespace.
  • Adding using namespace rapidgzip to test and app files.

@cal-pratt
Copy link
Contributor Author

Another question, I noticed you also have a bzip2 namespace in src/indexed_bzip2/bzip2.hpp. Should I leave that alone or move it to indexed_bzip2?

@mxmlnkn
Copy link
Owner

mxmlnkn commented Jan 25, 2025

* For unit tests and tools files, I added `using namespace rapidgzip` and `using namespace rapidgzip::core` to allow for brevity.

That's absolutely fine. Looking at the PR, I feel like the other files would also benefit from this... I am normally prefixing std::, but even to me, the core:: prefixed everywhere still seems overly verbose and distracting to me. But, I guess using namespace is much harder to do inside headers because it would pollute either the global or rapidgzip namespace.

It would probably be the more natural and less intrusive choice to move everything in rapidgzip::core into rapidgzip :S

* Do you have setup steps for running `uncrustify`? Haven't used this tool before!

make beautify-all, but the current code also does not fully match the uncrustify specification because:

  • Some code lines are wrongly formated by uncrustify because of bugs in uncrustify.
  • Some code locations would look worse with the current uncrustify config. I would have to invest some time in either relaxing some options, of which there are thousands and it is a hassle to do this, or adding fmt: off/on comments aroudn some locations such as where I really think that aligning code aids readability, e.g., in the tests.
  • Some code locations simply because I haven't run uncrustify in a while.
  • Formatting changes subtly with each uncrustify 0.x release and therefore an exact uncrustify version to be used should be specified somewhere.
* Do you have a good way to validate the code in non-included ifdefs get covered correctly?

The CI builds with some major macro definitions being toggled, for example the Check-Without-Isal step inside test-cpp.yml. Locally, I either toggle these manually via ccmake and/or have separate build folders for each configuration.

Another question, I noticed you also have a bzip2 namespace in src/indexed_bzip2/bzip2.hpp. Should I leave that alone or move it to indexed_bzip2?

If it doesn't bother your project, it might be better to leave it alone.

Else, I would split the src/indexed_bzip2/ folder to move the bzip2.hpp and BZ2Reader into their own folder and/or namespace, e.g., rapidgzip::bzip2 and leave the BZ2BlockFetcher in that folder and move it into the namespace indexed_bzip2.

The reason for this being that bzip2.hpp and BZ2Reader are also used by rapidgzip in my attempt to extend the architecture to be sufficiently generic to support gzip, bzip2, lz4, and others. indexed_bzip2 is kinda the legacy lightweight library for bzip2-only support, and currently it is also a less memory-intensive when used.

src/CMakeLists.txt Outdated Show resolved Hide resolved
src/core/AlignedAllocator.hpp Outdated Show resolved Hide resolved
@cal-pratt
Copy link
Contributor Author

It would probably be the more natural and less intrusive choice to move everything in rapidgzip::core into rapidgzip :S

Was thinking about that yesterday as well! I think that would simplify things greatly for most files.

If it doesn't bother your project, it might be better to leave it alone.

It doesn't bother me, and that explanation makes sense to me. I'll leave the file structure alone in this PR, but I want to change directory structure/include-paths in a follow up to prevent file name conflicts.

Will get to this early next week! :)

@cal-pratt cal-pratt force-pushed the cal-pratt/namespaces branch from fa4ad7e to ec493b1 Compare January 27, 2025 21:19
@cal-pratt
Copy link
Contributor Author

@mxmlnkn I cleaned up the PR to have core included in rapidgzip namespace. Only a few spots needed additional prefixing now. One thing I did have to change was the naming of the BitReader alias, which would conflict being in both namespaces. So for that change I edited it to: using BitReader64 = BitReader<false, uint64_t>;. I think that also makes it a bit easier to follow along in the code. I found myself questioning a lot what the size of the reader was in many places.

@cal-pratt cal-pratt marked this pull request as ready for review January 27, 2025 22:21
@cal-pratt
Copy link
Contributor Author

Alternatively, would using GzipBitReader = BitReader<false, uint64_t>; and BzipBitReader = BitReader<true, uint64_t>; make more sense?

@mxmlnkn
Copy link
Owner

mxmlnkn commented Jan 27, 2025

@mxmlnkn I cleaned up the PR to have core included in rapidgzip namespace. Only a few spots needed additional prefixing now. One thing I did have to change was the naming of the BitReader alias, which would conflict being in both namespaces. So for that change I edited it to: using BitReader64 = BitReader<false, uint64_t>;. I think that also makes it a bit easier to follow along in the code. I found myself questioning a lot what the size of the reader was in many places.

Hm, I see the need to have a different name, but I think that the size of the internal buffer should not be encoded in the name as it may change for performance reasons and is not even fully true because it also has a larger kilo-byte-sized byte buffer. The other template argument is much more meaningful, at least to me, i.e., BitReaderLSB vs BitReaderMSB or much better gzip::BitReader vs. bzip2::BitReader because this naming is more teleological.
bzip2::BitReader actually already exists and is correctly used as such in src/rapidgzip/chunkdecoding/Bzip2Chunk.hpp. At least BitReader64 is unique enough that it can be renamed easily automatically.

As for the commits. Personally, I would implement this in two commits:

[refactor] Move operator<< for std::set into TestHelpers.hpp
[API] Move BitReader shorthand into gzip namespace
[API] Add indexed_bzip2 namespace
[API] Add rapidgzip namespace

If this is too much style feedback, I can also merge it and do the touch-ups.

src/core/FileUtils.hpp Show resolved Hide resolved
@@ -327,4 +328,5 @@ class FetchMultiStream :
private:
const size_t m_memorySizePerStream;
};
} // FetchingStrategy
} // namespace FetchingStrategy
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The style enforced by uncrustify should be two spaces before end-of-line comments. But, it's fine, I can also change this after merging the next time I try to iron out the uncrustify config.

@@ -153,3 +154,4 @@ class StreamedResults
Values m_results;
std::atomic<bool> m_finalized = false;
};
} // namespace rapidgzip
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline at the end of the file. I can fix this later myself if you want. Also applies to some other files.

src/rapidgzip/ParallelGzipReader.hpp Show resolved Hide resolved
@cal-pratt cal-pratt force-pushed the cal-pratt/namespaces branch from e4d693f to fd92808 Compare January 28, 2025 00:28
@cal-pratt
Copy link
Contributor Author

The other template argument is much more meaningful, at least to me, i.e., BitReaderLSB vs BitReaderMSB or much better gzip::BitReader vs. bzip2::BitReader because this naming is more teleologica

Added it as rapidgzip::gzip::BitReader

Missing newline at the end of the file. I can fix this later myself if you want. Also applies to some other files.

Should be fixed.. not sure what my IDE is up to 🤷

The empty line needs to be kept to separate STL includes from rapidgzip ones.

Added back

This [using BitReader = rapidgzip::BitReader;] is part of the API and should not be deleted.

Added back and referenced gzip::BitReader

This operator<< would be more apt inside src/core/TestHelpers.hpp because it is only intended for debugging output and does not define any stable output format.

It turns out this cannot be easily done. Didn't dive too deeply into it, but it appears to be needed by some of common.hpp, and it seems like import order will matter if common is included before TestHelpers. I added it back to the top of testCLI just put into the namespace.

As for the commits. Personally, I would implement this in two commits.
If this is too much style feedback, I can also merge it and do the touch-ups.

I'm fine either way. Want to get your thoughts on rapidgzip::gzip::BitReader and if that is what you intended first before I go into rebasing. For right now I just squashed all my commits together.

@cal-pratt cal-pratt force-pushed the cal-pratt/namespaces branch from fd92808 to 0406aa5 Compare January 28, 2025 00:38
@cal-pratt cal-pratt force-pushed the cal-pratt/namespaces branch from 0406aa5 to ce9e396 Compare January 28, 2025 00:43
Copy link
Owner

@mxmlnkn mxmlnkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine either way. Want to get your thoughts on rapidgzip::gzip::BitReader and if that is what you intended first before I go into rebasing. For right now I just squashed all my commits together.

Looks good 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants