Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Create database connection on need instead of keeping long-living connection #57

Merged
merged 6 commits into from
Jan 17, 2025

Conversation

sitaowang1998
Copy link
Collaborator

@sitaowang1998 sitaowang1998 commented Jan 17, 2025

Description

MySqlMetadataStorage and MySqlDataStorage connects to database after it is created and keeps the connection alive to reduce the latency in database operations. However, database connection is a scarce resource. Keeping long-living connections significantly restricts scalability of the system. Thus, we now create database connection on need instead of keeping long-living connection.

Validation performed

  • GitHub workflows pass
  • Unit tests pass in dev container
  • Integration tests pass in dev container

Summary by CodeRabbit

Release Notes: Storage Connection Management Update

  • New Features

    • Introduced MySqlConnection class for robust database connection management.
    • Enhanced connection handling with improved error management.
  • Changes

    • Renamed MySQL-related files to use consistent "MySql" naming convention.
    • Removed explicit connect() and close() methods from storage classes.
    • Updated storage classes to require connection URL during instantiation.
  • Improvements

    • Simplified database connection initialization process.
    • Improved resource management for database connections.
    • Enhanced error handling for storage operations.
  • Compatibility

    • Requires updating code that interacts with storage classes to provide connection URL.

Copy link

coderabbitai bot commented Jan 17, 2025

Walkthrough

This pull request introduces a comprehensive refactoring of MySQL storage connection management across the Spider project. The changes primarily focus on introducing a new MySqlConnection class that encapsulates database connection logic, removing explicit connect and close methods from storage interfaces, and standardizing the connection initialization process. The modifications affect multiple components, including storage classes, client drivers, and scheduler implementations, with a consistent approach to handling database connections through constructor-based initialization.

Changes

File Change Summary
src/spider/CMakeLists.txt Renamed MySQL-related files from "Mysql" to "MySql", added MySqlConnection.cpp and MySqlConnection.hpp
src/spider/storage/DataStorage.hpp Removed connect() and close() virtual methods
src/spider/storage/MetadataStorage.hpp Removed connect() and close() virtual methods
src/spider/storage/MySqlConnection.{cpp,hpp} New implementation for MySQL connection management with RAII principles
src/spider/storage/MySqlStorage.{cpp,hpp} Refactored connection handling, added URL-based constructor, removed direct connection management
Multiple client files (Driver.cpp, scheduler.cpp, task_executor.cpp, worker.cpp) Updated storage object instantiation to use URL-based constructors

Possibly related PRs

Finishing Touches

  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (12)
src/spider/storage/MySqlConnection.hpp (5)

18-23: Ensure consistent use of const qualifiers.

In the create method, the parameter std::string const& url uses a const qualifier after the type, whereas in the rest of the code, the const qualifier is placed before the type (e.g., MySqlConnection(MySqlConnection const&)). For consistency and readability, it's recommended to place the const qualifier before the type in all cases.

Apply this diff to standardize the const placement:

-static auto create(std::string const& url) -> std::variant<MySqlConnection, StorageErr>;
+static auto create(const std::string& url) -> std::variant<MySqlConnection, StorageErr>;

21-25: Consider deleting move operations for clarity.

Currently, the copy constructor and copy assignment operator are deleted, but the move constructor and move assignment operator are defaulted. Since MySqlConnection manages a unique resource (std::unique_ptr<sql::Connection>), allowing move semantics is acceptable. However, to prevent accidental misuse and to make the class non-copyable and non-movable, consider deleting the move constructor and move assignment operator as well.

If intentional, please disregard this suggestion.


29-30: Implement const correctness in operator overloads.

The operator* and operator-> methods allow access to the underlying sql::Connection. To promote const correctness and allow access to const member functions of sql::Connection, consider adding const versions of these operators.

Apply this diff to add const versions:

 auto operator*() -> sql::Connection&;
+auto operator*() const -> const sql::Connection&;
 auto operator->() -> sql::Connection*;
+auto operator->() const -> const sql::Connection*;

33-35: Initialize member variables in the constructor initializer list.

In the private constructor, m_connection is initialized using an initializer list. The default member initialization = nullptr on line 35 is redundant since m_connection is assigned a new value. Remove the default initialization for clarity.

Apply this diff to remove the redundant initialization:

 std::unique_ptr<sql::Connection> m_connection = nullptr;
+std::unique_ptr<sql::Connection> m_connection;

15-37: Add documentation for the MySqlConnection class and its methods.

To improve maintainability and readability, consider adding documentation comments to the MySqlConnection class and its methods. This will help other developers understand the purpose and usage of the class.

src/spider/storage/MySqlConnection.cpp (1)

52-58: Check for null pointers before dereferencing.

In the operator* and operator-> methods, there is an implicit assumption that m_connection is not null. To prevent potential null pointer dereferences, consider adding assertions or checks to ensure m_connection is valid before accessing it.

Apply this diff to add checks:

 auto MySqlConnection::operator*() -> sql::Connection& {
+    assert(m_connection && "m_connection is null");
     return *m_connection;
 }

 auto MySqlConnection::operator->() -> sql::Connection* {
+    assert(m_connection && "m_connection is null");
     return m_connection.get();
 }
src/spider/storage/MySqlStorage.hpp (5)

27-28: Provide meaningful default constructors or delete them with rationale.

The default constructor for MySqlMetadataStorage is explicitly deleted. Deleting default constructors can be acceptable, but consider providing a comment explaining why the default constructor is deleted to aid future maintainability.


76-79: Declare add_task and fetch_full_task as static or consider design alternatives.

The methods add_task and fetch_full_task are declared as static, which may not align with object-oriented design principles if they manipulate class member data. Evaluate whether these methods should be static or whether they should be member functions.


78-79: Consistency in method declarations between classes.

In MySqlMetadataStorage, there are private static methods add_task and fetch_full_task, but similar methods are not present in MySqlDataStorage. For consistency and maintainability, consider reviewing and aligning the design of both classes.


84-85: Same as earlier: Provide meaningful default constructors or delete them with rationale.

The default constructor for MySqlDataStorage is also explicitly deleted. As with MySqlMetadataStorage, consider providing a comment explaining the deletion.


121-121: Consider adding access modifiers for member variables for clarity.

The member variable m_url in MySqlDataStorage is declared without an explicit access modifier. While it's currently under private, explicitly specifying private improves clarity.

Apply this diff to add the access modifier:

+private:
     std::string m_url;
src/spider/storage/MySqlStorage.cpp (1)

Line range hint 386-452: Consider adding exception handling within add_task to ensure exception safety.

The add_task function performs multiple database operations that may throw sql::SQLException. Currently, it relies on the caller to handle exceptions. For better encapsulation and robustness, consider wrapping the body of add_task in a try-catch block to handle exceptions internally.

Apply this diff to include exception handling within add_task:

 void MySqlMetadataStorage::add_task(MySqlConnection& conn, sql::bytes job_id, Task const& task) {
+   try {
       // Add task
       std::unique_ptr<sql::PreparedStatement> task_statement(
           conn->prepareStatement("INSERT INTO `tasks` (`id`, `job_id`, `func_name`, `state`, "
                                  "`timeout`, `max_retry`) VALUES (?, ?, ?, ?, ?, ?)")
       );
       // ... [rest of the function body]
+   } catch (sql::SQLException& e) {
+       conn->rollback();
+       throw; // Rethrow the exception to be handled by the caller if necessary
+   }
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between becbf55 and 876aace.

📒 Files selected for processing (12)
  • src/spider/CMakeLists.txt (2 hunks)
  • src/spider/client/Driver.cpp (3 hunks)
  • src/spider/scheduler/scheduler.cpp (2 hunks)
  • src/spider/storage/DataStorage.hpp (0 hunks)
  • src/spider/storage/MetadataStorage.hpp (0 hunks)
  • src/spider/storage/MySqlConnection.cpp (1 hunks)
  • src/spider/storage/MySqlConnection.hpp (1 hunks)
  • src/spider/storage/MySqlStorage.cpp (58 hunks)
  • src/spider/storage/MySqlStorage.hpp (4 hunks)
  • src/spider/worker/task_executor.cpp (2 hunks)
  • src/spider/worker/worker.cpp (2 hunks)
  • tests/storage/StorageTestHelper.hpp (3 hunks)
💤 Files with no reviewable changes (2)
  • src/spider/storage/MetadataStorage.hpp
  • src/spider/storage/DataStorage.hpp
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: non-storage-unit-tests (ubuntu-24.04)
  • GitHub Check: non-storage-unit-tests (ubuntu-22.04)
  • GitHub Check: lint
🔇 Additional comments (8)
src/spider/storage/MySqlConnection.cpp (2)

20-24: Improve URL validation and error messages.

The regular expression for parsing the JDBC URL may not cover all valid scenarios and could lead to false negatives. Additionally, the error message "Invalid url" lacks detail. Consider enhancing the URL validation to cover more cases and providing more informative error messages.

Would you like me to suggest an improved regular expression or validation approach?


28-37: Securely handle sensitive information in exceptions and logs.

When an exception occurs, the error message e.what() could contain sensitive information, such as database connection details. Ensure that sensitive information is not exposed in logs or error messages.

[security_issue]
Consider sanitizing error messages before returning or logging them.

src/spider/storage/MySqlStorage.cpp (1)

269-273: Connection handling updated to create connections on-demand.

The changes appropriately modify the connection handling by creating a new MySqlConnection for each database operation in initialize(), aligning with the PR objectives to avoid long-lived connections.

tests/storage/StorageTestHelper.hpp (1)

14-14: Update storage constructors to accept cStorageUrl parameter.

The updates correctly modify the constructors to include cStorageUrl, conforming to the new requirement that storage classes are initialized with the storage URL.

Also applies to: 27-27, 35-35, 44-44, 46-46

src/spider/client/Driver.cpp (1)

17-17: Simplify storage initialization and error handling in Driver constructor.

The changes appropriately adjust the constructors to pass storage_url directly to the storage classes and streamline error handling by declaring err as a constant. This enhances code clarity and aligns with the updated connection management strategy.

Also applies to: 26-27, 29-29, 51-52, 54-54

src/spider/worker/task_executor.cpp (1)

127-129: Consider adding error handling for storage initialization.

The storage objects are initialized with the storage_url parameter, but there's no explicit error handling for potential connection failures. This could lead to runtime issues if the database connection cannot be established.

Run this script to check if error handling exists in the MySqlStorage implementation:

src/spider/scheduler/scheduler.cpp (1)

169-174: LGTM! Well-structured error handling.

The initialization of storage objects includes proper error handling through the StorageErr mechanism, which is a good practice for managing database connection failures.

src/spider/CMakeLists.txt (1)

3-4: LGTM! Consistent naming convention and proper integration.

The addition of MySqlConnection files and the renaming from Mysql to MySql maintains consistency across the codebase. The new connection management class is properly integrated into the build system.

Also applies to: 27-28

Comment on lines 45 to 50
MySqlConnection::~MySqlConnection() {
if (m_connection) {
m_connection->close();
m_connection.reset();
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ensure proper exception handling in the destructor.

The destructor calls m_connection->close(), which may throw an exception. Throwing exceptions from a destructor can lead to undefined behaviour. Consider wrapping the close call in a try-catch block to handle any potential exceptions.

Apply this diff to handle exceptions:

 void MySqlConnection::~MySqlConnection() {
     if (m_connection) {
+        try {
             m_connection->close();
+        } catch (const sql::SQLException& e) {
+            // Log the exception or handle it appropriately
+        }
         m_connection.reset();
     }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
MySqlConnection::~MySqlConnection() {
if (m_connection) {
m_connection->close();
m_connection.reset();
}
}
MySqlConnection::~MySqlConnection() {
if (m_connection) {
try {
m_connection->close();
} catch (const sql::SQLException& e) {
// Log the exception or handle it appropriately
}
m_connection.reset();
}
}

Comment on lines +25 to +37
bool const credential = match[2].matched && match[4].matched;
std::unique_ptr<sql::Connection> conn;
try {
sql::Driver* driver = sql::mariadb::get_driver_instance();
if (credential) {
conn = std::unique_ptr<sql::Connection>(
driver->connect(sql::SQLString(url), match[2].str(), match[4].str())
);
} else {
conn = std::unique_ptr<sql::Connection>(
driver->connect(sql::SQLString(url), sql::Properties{})
);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle cases where only username or password is provided.

The current logic checks for both username and password (match[2].matched && match[4].matched). If only a username or only a password is provided in the URL, the connection may fail or behave unexpectedly. Consider handling cases where only one credential is provided.

Apply this diff to adjust the credential handling:

-bool const credential = match[2].matched && match[4].matched;
+bool const has_user = match[2].matched;
+bool const has_password = match[4].matched;

Then update the connection logic accordingly to handle all combinations.

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 1655 to 1665
std::variant<MySqlConnection, StorageErr> conn_result = MySqlConnection::create(m_url);
if (std::holds_alternative<StorageErr>(conn_result)) {
return std::get<StorageErr>(conn_result);
}
auto& conn = std::get<MySqlConnection>(conn_result);
try {
std::variant<MySqlConnection, StorageErr> conn_result = MySqlConnection::create(m_url);
if (std::holds_alternative<StorageErr>(conn_result)) {
return std::get<StorageErr>(conn_result);
}
auto& conn = std::get<MySqlConnection>(conn_result);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove redundant connection initialization in MySqlDataStorage::initialize().

The connection initialization code is duplicated between lines 1655 and 1665. This redundancy can lead to confusion and potential errors. Please remove the duplicate code to maintain clarity and prevent unnecessary operations.

Apply this diff to eliminate the redundancy:

 auto MySqlDataStorage::initialize() -> StorageErr {
     std::variant<MySqlConnection, StorageErr> conn_result = MySqlConnection::create(m_url);
     if (std::holds_alternative<StorageErr>(conn_result)) {
         return std::get<StorageErr>(conn_result);
     }
     auto& conn = std::get<MySqlConnection>(conn_result);
-    std::variant<MySqlConnection, StorageErr> conn_result = MySqlConnection::create(m_url);
-    if (std::holds_alternative<StorageErr>(conn_result)) {
-        return std::get<StorageErr>(conn_result);
-    }
-    auto& conn = std::get<MySqlConnection>(conn_result);
     // Need to initialize metadata storage first so that foreign constraint is not violated
     for (char const* create_table_str : cCreateStorage) {
         std::unique_ptr<sql::Statement> statement(conn->createStatement());
         statement->executeUpdate(create_table_str);
     }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std::variant<MySqlConnection, StorageErr> conn_result = MySqlConnection::create(m_url);
if (std::holds_alternative<StorageErr>(conn_result)) {
return std::get<StorageErr>(conn_result);
}
auto& conn = std::get<MySqlConnection>(conn_result);
try {
std::variant<MySqlConnection, StorageErr> conn_result = MySqlConnection::create(m_url);
if (std::holds_alternative<StorageErr>(conn_result)) {
return std::get<StorageErr>(conn_result);
}
auto& conn = std::get<MySqlConnection>(conn_result);
std::variant<MySqlConnection, StorageErr> conn_result = MySqlConnection::create(m_url);
if (std::holds_alternative<StorageErr>(conn_result)) {
return std::get<StorageErr>(conn_result);
}
auto& conn = std::get<MySqlConnection>(conn_result);

Comment on lines +363 to +365
= std::make_shared<spider::core::MySqlMetadataStorage>(storage_url);
std::shared_ptr<spider::core::DataStorage> const data_store
= std::make_shared<spider::core::MySqlDataStorage>();
err = data_store->connect(storage_url);
if (!err.success()) {
spdlog::error("Cannot connect to data storage: {}", err.description);
return cStorageConnectionErr;
}
= std::make_shared<spider::core::MySqlDataStorage>(storage_url);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Add initialization error handling after storage creation

The storage classes require explicit initialization after construction. Add error checking similar to the pattern in scheduler.cpp:

  • After creating the storage objects, call initialize() on both stores
  • Check the returned StorageErr and handle any connection failures

Example locations requiring error handling:

  • src/spider/worker/worker.cpp:363-365: Add initialization and error checking for both metadata_store and data_store
🔗 Analysis chain

Consider adding error handling for storage initialization.

While error handling exists for driver addition, there's no explicit error handling for potential connection failures during storage initialization.

Run this script to verify the error handling pattern across the codebase:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for consistent error handling patterns
rg -A 5 "std::make_shared<.*MySql.*Storage>" | grep -B 5 "if.*err.*success"

Length of output: 1393


Script:

#!/bin/bash
# Check storage class implementations
ast-grep --pattern 'class MySql(Meta)?DataStorage {
  $$$
}'

# Also check for initialize method implementations
ast-grep --pattern 'StorageErr $_(MySql(Meta)?DataStorage::initialize)() {
  $$$
}'

Length of output: 148


Script:

#!/bin/bash
# Find storage implementation files
fd -e cpp -e hpp -e h "MySql.*Storage"

# Search for constructor and initialization patterns
rg -A 5 "MySql(Meta)?DataStorage::(MySql|initialize)" --type cpp

Length of output: 711

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/spider/storage/MySqlStorage.hpp (1)

Line range hint 27-122: Document connection lifecycle management.

Consider adding documentation that describes:

  1. Connection lifecycle management
  2. Thread-safety requirements
  3. Best practices for connection sharing/pooling
  4. Error handling expectations

This will help users of these classes properly manage database connections and avoid potential issues.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 876aace and 051b973.

📒 Files selected for processing (1)
  • src/spider/storage/MySqlStorage.hpp (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: lint
  • GitHub Check: non-storage-unit-tests (ubuntu-24.04)
  • GitHub Check: non-storage-unit-tests (ubuntu-22.04)
🔇 Additional comments (4)
src/spider/storage/MySqlStorage.hpp (4)

6-6: LGTM! Required includes added for new connection management.

The addition of necessary headers supports the refactoring of connection management.

Also applies to: 22-22


27-28: LGTM! Constructor changes enforce proper initialization.

The explicit constructor with URL parameter and deletion of the default constructor ensure proper initialization of the storage object. The m_url member appropriately replaces the connection pointer for the new connection management strategy.

Also applies to: 76-76


85-86: LGTM! Consistent implementation with MySqlMetadataStorage.

The changes to MySqlDataStorage mirror those in MySqlMetadataStorage, maintaining consistency in the connection management approach across storage classes.

Also applies to: 122-122


78-80: Consider thread-safety implications of shared connections.

The static methods accepting a MySqlConnection reference might lead to thread-safety issues if the same connection is shared across multiple threads. Consider documenting thread-safety requirements or adding synchronization mechanisms.

Let's verify the thread-safety implications:

@sitaowang1998
Copy link
Collaborator Author

@coderabbitai review

Copy link

coderabbitai bot commented Jan 17, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
src/spider/storage/MySqlConnection.cpp (1)

25-37: 🛠️ Refactor suggestion

Handle cases where only username or password is provided

The current logic at line 25 checks for both username and password (match[2].matched && match[4].matched). If only a username or only a password is provided in the JDBC URL, the connection may fail or behave unexpectedly. Consider updating the logic to handle cases where only one of the credentials is provided.

You can adjust the credential handling as follows:

-bool const credential = match[2].matched && match[4].matched;
+bool const has_user = match[2].matched;
+bool const has_password = match[4].matched;

Then update the connection logic accordingly to handle all credential combinations.

src/spider/storage/MySqlStorage.cpp (1)

1655-1671: 🛠️ Refactor suggestion

Consolidate storage initialization to avoid redundancy

The MySqlDataStorage::initialize() method duplicates the storage initialization code from MySqlMetadataStorage::initialize(). To prevent redundancy and potential maintenance issues, consider invoking the metadata storage initialization from within the data storage initialization.

You can modify MySqlDataStorage::initialize() as follows:

auto MySqlDataStorage::initialize() -> StorageErr {
    // Initialize metadata storage first
    MySqlMetadataStorage metadata_storage(m_url);
    StorageErr err = metadata_storage.initialize();
    if (err.has_error()) {
        return err;
    }
    // Additional data storage initialization if necessary
    return StorageErr{};
}
🧹 Nitpick comments (4)
src/spider/storage/MySqlConnection.cpp (2)

39-42: Catch exceptions by constant reference

In the create method, it is preferable to catch exceptions by constant reference to avoid unnecessary copying and to adhere to best practices.

Update the catch block as follows:

-} catch (sql::SQLException& e) {
+} catch (const sql::SQLException& e) {

50-51: Log exceptions in the destructor for debugging

While ignoring exceptions in destructors prevents potential termination during stack unwinding, logging the exception details can aid in debugging connection closure issues.

Modify the catch block to log the exception:

} catch (const sql::SQLException& e) {
-    // Ignore
+    spdlog::error("Exception caught in MySqlConnection destructor: {}", e.what());
}

Ensure that the appropriate logging headers are included.

src/spider/storage/MySqlStorage.cpp (2)

382-386: Limit the scope of prepared statements to prevent resource retention

In the add_task function, prepared statements inside loops can retain resources longer than necessary. Limiting their scope ensures resources are released promptly.

Enclose prepared statements within their own scope:

for (std::uint64_t i = 0; i < task.get_num_inputs(); ++i) {
    {
        // Existing code
        std::unique_ptr<sql::PreparedStatement> input_statement(/* ... */);
        // Use input_statement
    } // input_statement is destroyed here
}

Line range hint 1466-1514: Optimize queries to fetch only necessary task details

The get_child_tasks and get_parent_tasks functions retrieve full task details, which may be inefficient for large datasets if only specific fields are needed. Consider fetching only required fields to enhance performance.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 051b973 and ca4902a.

📒 Files selected for processing (3)
  • src/spider/storage/MySqlConnection.cpp (1 hunks)
  • src/spider/storage/MySqlConnection.hpp (1 hunks)
  • src/spider/storage/MySqlStorage.cpp (58 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/spider/storage/MySqlConnection.hpp
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: lint

Comment on lines +269 to +284
std::variant<MySqlConnection, StorageErr> conn_result = MySqlConnection::create(m_url);
if (std::holds_alternative<StorageErr>(conn_result)) {
return std::get<StorageErr>(conn_result);
}
auto& conn = std::get<MySqlConnection>(conn_result);
try {
for (char const* create_table_str : cCreateStorage) {
std::unique_ptr<sql::Statement> statement(m_conn->createStatement());
std::unique_ptr<sql::Statement> statement(conn->createStatement());
statement->executeUpdate(create_table_str);
}
} catch (sql::SQLException& e) {
m_conn->rollback();
conn->rollback();
return StorageErr{StorageErrType::OtherErr, e.what()};
}

m_conn->commit();
conn->commit();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Wrap connection initialization in a try-catch block

In MySqlMetadataStorage::initialize(), exceptions during connection creation are not currently caught. Wrapping the connection initialization in a try-catch block ensures that any exceptions are properly handled, preventing unexpected behaviour.

Apply this change to handle exceptions during connection initialization:

 auto MySqlMetadataStorage::initialize() -> StorageErr {
+    try {
     std::variant<MySqlConnection, StorageErr> conn_result = MySqlConnection::create(m_url);
     if (std::holds_alternative<StorageErr>(conn_result)) {
         return std::get<StorageErr>(conn_result);
     }
     auto& conn = std::get<MySqlConnection>(conn_result);
     try {
         for (char const* create_table_str : cCreateStorage) {
             std::unique_ptr<sql::Statement> statement(conn->createStatement());
             statement->executeUpdate(create_table_str);
         }
-    } catch (sql::SQLException& e) {
+    } catch (const sql::SQLException& e) {
         conn->rollback();
         return StorageErr{StorageErrType::OtherErr, e.what()};
     }
     conn->commit();
     return StorageErr{};
+    } catch (const sql::SQLException& e) {
+        return StorageErr{StorageErrType::ConnectionErr, e.what()};
+    }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std::variant<MySqlConnection, StorageErr> conn_result = MySqlConnection::create(m_url);
if (std::holds_alternative<StorageErr>(conn_result)) {
return std::get<StorageErr>(conn_result);
}
auto& conn = std::get<MySqlConnection>(conn_result);
try {
for (char const* create_table_str : cCreateStorage) {
std::unique_ptr<sql::Statement> statement(m_conn->createStatement());
std::unique_ptr<sql::Statement> statement(conn->createStatement());
statement->executeUpdate(create_table_str);
}
} catch (sql::SQLException& e) {
m_conn->rollback();
conn->rollback();
return StorageErr{StorageErrType::OtherErr, e.what()};
}
m_conn->commit();
conn->commit();
try {
std::variant<MySqlConnection, StorageErr> conn_result = MySqlConnection::create(m_url);
if (std::holds_alternative<StorageErr>(conn_result)) {
return std::get<StorageErr>(conn_result);
}
auto& conn = std::get<MySqlConnection>(conn_result);
try {
for (char const* create_table_str : cCreateStorage) {
std::unique_ptr<sql::Statement> statement(conn->createStatement());
statement->executeUpdate(create_table_str);
}
} catch (const sql::SQLException& e) {
conn->rollback();
return StorageErr{StorageErrType::OtherErr, e.what()};
}
conn->commit();
return StorageErr{};
} catch (const sql::SQLException& e) {
return StorageErr{StorageErrType::ConnectionErr, e.what()};
}

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/spider/storage/MySqlConnection.cpp (1)

26-38: ⚠️ Potential issue

Handle cases where only username or password is provided.

The logic currently checks if both username and password are provided (match[2].matched && match[4].matched). If only one is present, the connection may fail. Consider handling cases where only the username or only the password is provided to improve flexibility and robustness.

Apply this diff to adjust the credential handling:

-bool const credential = match[2].matched && match[4].matched;
+bool const has_user = match[2].matched;
+bool const has_password = match[4].matched;

Then update the connection logic accordingly to handle all combinations of user credentials.

🧹 Nitpick comments (3)
src/spider/worker/task_executor.cpp (1)

126-128: Consider adding error handling for storage initialization.

While the connection handling has been moved into the constructors, it would be beneficial to catch and handle any potential connection errors at this level to provide more specific error codes and logging.

 std::shared_ptr<spider::core::MetadataStorage> const metadata_store
-        = std::make_shared<spider::core::MySqlMetadataStorage>(storage_url);
+        = nullptr;
 std::shared_ptr<spider::core::DataStorage> const data_store
-        = std::make_shared<spider::core::MySqlDataStorage>(storage_url);
+        = nullptr;
+try {
+    metadata_store = std::make_shared<spider::core::MySqlMetadataStorage>(storage_url);
+    data_store = std::make_shared<spider::core::MySqlDataStorage>(storage_url);
+} catch (const std::exception& e) {
+    spdlog::error("Failed to initialize storage: {}", e.what());
+    return cStorageErr;
+}
src/spider/storage/MySqlConnection.cpp (2)

21-25: Improve URL parsing robustness.

The current regular expression may not handle all valid JDBC URLs, especially if the URL includes additional query parameters or lacks user credentials. Consider using a more robust URL parsing method or a dedicated URI parsing library to handle various URL formats and edge cases.


24-24: Enhance error message for invalid URL.

Including the invalid URL in the error message can help users debug issues more effectively.

Apply this diff to improve the error message:

-        return StorageErr{StorageErrType::OtherErr, "Invalid url"};
+        return StorageErr{StorageErrType::OtherErr, "Invalid URL: " + url};
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca4902a and 5a3a2fa.

📒 Files selected for processing (2)
  • src/spider/storage/MySqlConnection.cpp (1 hunks)
  • src/spider/worker/task_executor.cpp (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: non-storage-unit-tests (ubuntu-24.04)
  • GitHub Check: lint
  • GitHub Check: non-storage-unit-tests (ubuntu-22.04)
🔇 Additional comments (1)
src/spider/worker/task_executor.cpp (1)

27-27: LGTM! Header inclusion is appropriate.

The inclusion of MySqlStorage.hpp is necessary for the MySQL storage implementations and follows the changes in storage initialization.

@sitaowang1998 sitaowang1998 merged commit f6e82ce into y-scope:main Jan 17, 2025
5 checks passed
@sitaowang1998 sitaowang1998 deleted the storage_connection branch January 17, 2025 03:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant