-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(clp-s): json to irv2 #657
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe pull request introduces a new command-line option Changes
Sequence DiagramsequenceDiagram
participant CLI as Command Line Interface
participant Parser as CommandLineArguments
participant Generator as JSON to IR Generator
participant Serializer as Serializer
CLI->>Parser: Parse JsonToIr command
Parser->>Generator: Validate and prepare options
Generator->>Serializer: Initialize serialization
Serializer->>Generator: Process JSON files
Generator-->>CLI: Return conversion status
Possibly Related PRs
Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (4)
components/core/src/clp_s/CommandLineArguments.cpp (2)
758-764
: Fix typographical errors in option descriptions.There are minor typos in the option descriptions:
- Line 758: "before ir generation fails" should be "before IR generation fails".
- Line 764: "befroe" should be "before".
Apply this diff to correct the typos:
- "Maximum allowed size (B) for a single document before ir generation fails." + "Maximum allowed size (B) for a single document before IR generation fails." ... - "Maximum allowed size (B) for an in memory IR buffer befroe being written to file." + "Maximum allowed size (B) for an in-memory IR buffer before being written to file."
747-747
: Consider renaming "Compression options" to "JSON to IR options".For clarity, rename the header of the options from "Compression options" to "JSON to IR options", as these options are specific to the
JsonToIr
command.Apply this diff to rename the options group:
- po::options_description compression_options("Compression options"); + po::options_description compression_options("JSON to IR options");components/core/src/clp_s/CommandLineArguments.hpp (2)
29-30
: Consider documenting the command character choiceThe character 'r' for
JsonToIr
might not be immediately intuitive to users. Consider adding a comment explaining the rationale for this choice, or consider a more descriptive character if available.
202-203
: Use named constants for magic numbersConsider replacing the magic numbers with named constants to improve code readability and maintainability:
- The encoding type value of 8 should be a named constant with documentation explaining its significance
- The buffer size of 512MB could use a named constant similar to other size constants in the file
+ // Default encoding type for IR conversion + static constexpr int DEFAULT_ENCODING_TYPE = 8; + // Default maximum buffer size for IR conversion (512MB) + static constexpr size_t DEFAULT_MAX_IR_BUFFER_SIZE = 512ULL * 1024 * 1024; - int m_encoding_type{8}; - size_t m_max_ir_buffer_size{512ULL * 1024 * 1024}; + int m_encoding_type{DEFAULT_ENCODING_TYPE}; + size_t m_max_ir_buffer_size{DEFAULT_MAX_IR_BUFFER_SIZE};
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
components/core/src/clp_s/CommandLineArguments.cpp
(4 hunks)components/core/src/clp_s/CommandLineArguments.hpp
(4 hunks)components/core/src/clp_s/JsonParser.hpp
(1 hunks)components/core/src/clp_s/clp-s.cpp
(6 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
components/core/src/clp_s/JsonParser.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/clp-s.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/CommandLineArguments.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/CommandLineArguments.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
🪛 cppcheck (2.10-2)
components/core/src/clp_s/clp-s.cpp
[error] 191-191: Exception thrown in function declared not to throw exceptions.
(throwInNoexceptFunction)
🔇 Additional comments (4)
components/core/src/clp_s/clp-s.cpp (1)
191-191
: Verify exception safety in functions to prevent throwing exceptions fromnoexcept
functions.Static analysis has flagged that an exception may be thrown in a function that should not throw exceptions. Please verify that
serialize_msgpack_map
does not throw exceptions, or ensure it is not declared withnoexcept
.Run the following script to check if
serialize_msgpack_map
is declarednoexcept
:🧰 Tools
🪛 cppcheck (2.10-2)
[error] 191-191: Exception thrown in function declared not to throw exceptions.
(throwInNoexceptFunction)
components/core/src/clp_s/JsonParser.hpp (1)
55-62
: StructJsonToIrParserOption
added successfully.The new structure
JsonToIrParserOption
is well-defined and follows appropriate coding standards.components/core/src/clp_s/CommandLineArguments.hpp (2)
69-72
: LGTM! Modern C++ practices well appliedThe getter methods follow modern C++ best practices with
[[nodiscard]]
attribute and trailing return types. The const-qualification ensures thread safety.
178-179
: LGTM! Consistent with existing usage methodsThe method declaration follows the established pattern of other usage printing methods in the class.
in_file.open(path, std::ifstream::in); | ||
|
||
std::filesystem::path input_path{path}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for input file opening failure.
The code does not check if the input file fails to open, which may lead to unexpected behaviour when in_file.open
fails.
Apply this diff to handle file opening errors:
in_file.open(path, std::ifstream::in);
+ if (false == in_file.is_open()) {
+ SPDLOG_ERROR("Failed to open input file: {}", path);
+ return false;
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
in_file.open(path, std::ifstream::in); | |
std::filesystem::path input_path{path}; | |
in_file.open(path, std::ifstream::in); | |
if (false == in_file.is_open()) { | |
SPDLOG_ERROR("Failed to open input file: {}", path); | |
return false; | |
} | |
std::filesystem::path input_path{path}; |
po::value<size_t>(&m_max_ir_buffer_size)->value_name("BUFFER_SIZE")-> | ||
default_value(m_max_ir_buffer_size), | ||
"Maximum allowed size (B) for an in memory IR buffer befroe being written to file." | ||
)( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add input validation for maximum IR buffer size.
The max-ir-buffer-size
should be validated to ensure it is greater than zero to prevent potential division by zero or other unexpected behaviours.
Apply this diff to add input validation:
+ if (0 == m_max_ir_buffer_size) {
+ throw std::invalid_argument("max-ir-buffer-size must be greater than zero.");
+ }
Committable suggestion skipped: line range outside the PR's diff.
default_value(m_encoding_type), | ||
"4 (four byte encoding) or 8 (eight byte encoding)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add input validation for encoding type.
The encoding-type
parameter should be validated to ensure that it is either 4
or 8
, preventing unexpected behaviour or errors when invalid values are provided.
Apply this diff to add input validation:
+ if (m_encoding_type != 4 && m_encoding_type != 8) {
+ throw std::invalid_argument("encoding-type must be either 4 or 8.");
+ }
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments about using IR v2 APIs
for (auto& path : all_file_paths) { | ||
bool success; | ||
if (option.encoding == 4) { | ||
success = run_serializer<int32_t>(option, path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically we should only generate 4-byte encoded IR stream (despite we do support 8-byte encoding). Correct me if I'm wrong @kirkrodrigues
} | ||
auto& serializer{result.value()}; | ||
std::vector<int8_t> ir_buf; | ||
flush_and_clear_serializer_buffer(serializer, ir_buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This helper is supposed to be used for unit test only. The IR serializer already has an internal IR buffer; using an extra ir_buf
here will introduce unnecessary memory copy. We should use the serializer's internal IR buffer directly.
auto const view{serializer.get_ir_buf_view()}; | ||
byte_buf.insert(byte_buf.cend(), view.begin(), view.end()); | ||
serializer.clear_ir_buf(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how to access the serializer's internal IR buffer.
Description
This PR:
Validation performed
Generated IR V2 format for all 5 JSON public datasets
ex) ./clp-s r elasticsearch_ir elasticsearch/
Summary by CodeRabbit
Release Notes
New Features
Improvements