Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimised SQL query for exporting members #22005

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

vershwal
Copy link
Member

@vershwal vershwal commented Jan 14, 2025

ref https://linear.app/ghost/issue/ONC-699/lever-member-export-unresponsive

  • Split large SQL queries into smaller, focused queries to improve performance and reduce database load.
  • Shifted aggregation logic from database to in-memory processing for improved query efficiency and faster execution.
  • Added logging to identify performance bottlenecks and measure execution time for each step in production environment as things are pretty fast in local setup and staging.

@Ghost-Slimer
Copy link

Deployed to staging with ID: 3190

How does this work?

Copy link

coderabbitai bot commented Jan 15, 2025

Walkthrough

The changes introduce a new filtering mechanism for member data retrieval in the Ghost CMS export service. A variable hasFilter determines if filtering is necessary, leading to a minimal query that fetches member IDs based on the filter criteria. The overall structure of the code has been modified to enhance modularity in data retrieval while retaining the existing mapping of results.

Changes

File Change Summary
ghost/core/core/server/services/members/exporter/query.js Introduced conditional filtering for member data retrieval based on the presence of a filter, search term, or limit. Added logic for minimal query to fetch member IDs. Modified product and label retrieval to use knex for querying. Retained existing query structure for member details and result mapping.

Sequence Diagram

sequenceDiagram
    participant Exporter as Member Exporter
    participant DB as Database
    
    Exporter->>+DB: Check for Filters
    alt Has Filters
        Exporter->>+DB: Fetch Filtered Member IDs
        DB-->>-Exporter: Filtered Member IDs
        Exporter->>+DB: Fetch Member Details
    else No Filters
        Exporter->>+DB: Fetch All Members
    end
    
    DB-->>-Exporter: Member Details
    Exporter->>Exporter: Process Tiers and Labels
Loading

Poem

🐰 In the garden of data, filters bloom bright,
Hopping through queries, a wondrous sight!
With conditions now set, we fetch with delight,
Members and tiers dance in the soft moonlight.
A code transformation, a joyful spree! 🌟

Finishing Touches

  • 📝 Generate Docstrings (Beta)

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
ghost/core/core/server/services/members/exporter/query.js (1)

84-100: Optimize performance by using lookup maps for products and labels

Using .find() within a loop can lead to performance issues, especially with large datasets. Consider creating a lookup map for allProducts and allLabels to improve the efficiency of your code.

Example Implementation:

Create lookup maps outside the loop:

const productMap = new Map(allProducts.map(product => [product.id, product]));
const labelMap = new Map(allLabels.map(label => [label.id, label]));

Refactor the tier and label mapping:

const tierDetails = tierIds.map((id) => {
    const tier = productMap.get(id);
    if (tier) {
        return {
            name: tier.get('name')
        };
    }
}).filter(Boolean);

const labelDetails = labelIds.map((id) => {
    const label = labelMap.get(id);
    if (label) {
        return {
            name: label.get('name')
        };
    }
}).filter(Boolean);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ca066c and f0ae3d6.

📒 Files selected for processing (1)
  • ghost/core/core/server/services/members/exporter/query.js (1 hunks)

ghost/core/core/server/services/members/exporter/query.js Outdated Show resolved Hide resolved
ghost/core/core/server/services/members/exporter/query.js Outdated Show resolved Hide resolved
ghost/core/core/server/services/members/exporter/query.js Outdated Show resolved Hide resolved
ghost/core/core/server/services/members/exporter/query.js Outdated Show resolved Hide resolved
@Ghost-Slimer
Copy link

Deployed to staging with ID: 3190

How does this work?

1 similar comment
@Ghost-Slimer
Copy link

Deployed to staging with ID: 3190

How does this work?

@vershwal vershwal force-pushed the exportMemberSQLOptimisation branch from ba67fcd to ad16d4e Compare January 16, 2025 06:08
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
ghost/core/core/server/services/members/exporter/query.js (2)

34-44: Optimize product and label retrieval for filtered cases.

When hasFilter is true, consider fetching only the required products and labels based on the filtered member IDs to improve performance with large datasets.

    const allProducts = await knex('products')
+       .modify((query) => {
+           if (hasFilter) {
+               query.whereIn('id', function() {
+                   this.select('product_id')
+                       .from('members_products')
+                       .whereIn('member_id', ids)
+                       .distinct();
+               });
+           }
+       })
        .select('id', 'name')
        .then(rows => rows.reduce((acc, product) => {
            acc[product.id] = product.name;
            return acc;
        }, {}));

    const allLabels = await knex('labels')
+       .modify((query) => {
+           if (hasFilter) {
+               query.whereIn('id', function() {
+                   this.select('label_id')
+                       .from('members_labels')
+                       .whereIn('member_id', ids)
+                       .distinct();
+               });
+           }
+       })
        .select('id', 'name')
        .then(rows => rows.reduce((acc, label) => {
            acc[label.id] = label.name;
            return acc;
        }, {}));

Line range hint 13-23: Remove commented-out code.

The commented-out code block appears to be from the old implementation and should be removed to maintain code cleanliness. If needed, this code can be referenced from version control history.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba67fcd and 24d3fed.

📒 Files selected for processing (1)
  • ghost/core/core/server/services/members/exporter/query.js (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: Regression tests (Node 20.11.1, sqlite3)
  • GitHub Check: Regression tests (Node 20.11.1, mysql8)
  • GitHub Check: Ghost-CLI tests
  • GitHub Check: Database tests (Node 20.11.1, sqlite3)
  • GitHub Check: Database tests (Node 20.11.1, mysql8)
  • GitHub Check: Database tests (Node 18.12.1, mysql8)
  • GitHub Check: Unit tests (Node 20.11.1)
🔇 Additional comments (3)
ghost/core/core/server/services/members/exporter/query.js (3)

46-52: LGTM! Clean and efficient member query implementation.

The query is well-structured with appropriate field selection and filter handling.


54-70: Ensure database compatibility of GROUP_CONCAT function.

The use of GROUP_CONCAT in the queries may not be compatible across all database systems, particularly PostgreSQL which uses STRING_AGG instead.


89-114: LGTM! Robust data mapping with proper null handling.

The implementation efficiently maps related data using appropriate data structures and includes proper null checks with the ? operator. The use of Map and Set for lookups is a good performance optimization.

@Ghost-Slimer
Copy link

Deployed to staging with ID: 3190

How does this work?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
ghost/core/core/server/services/members/exporter/query.js (2)

Line range hint 22-34: Remove commented code

The commented code block appears to be obsolete and should be removed to maintain code cleanliness.

🧰 Tools
🪛 eslint

[error] 149-149: Operator '/' must be spaced.

(space-infix-ops)


149-149: Fix operator spacing

Add spaces around the division operator.

-logging.info('[MembersExporter] Total time taken: ' + (Date.now() - start)/1000 + 's');
+logging.info('[MembersExporter] Total time taken: ' + (Date.now() - start) / 1000 + 's');
🧰 Tools
🪛 eslint

[error] 149-149: Operator '/' must be spaced.

(space-infix-ops)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 24d3fed and 344236a.

📒 Files selected for processing (1)
  • ghost/core/core/server/services/members/exporter/query.js (2 hunks)
🧰 Additional context used
🪛 eslint
ghost/core/core/server/services/members/exporter/query.js

[error] 149-149: Operator '/' must be spaced.

(space-infix-ops)

⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Unit tests (Node 20.11.1)
  • GitHub Check: Unit tests (Node 18.12.1)
  • GitHub Check: Regression tests (Node 20.11.1, sqlite3)
  • GitHub Check: Regression tests (Node 20.11.1, mysql8)
  • GitHub Check: Database tests (Node 20.11.1, sqlite3)
  • GitHub Check: Database tests (Node 20.11.1, mysql8)
  • GitHub Check: Lint
  • GitHub Check: i18n
  • GitHub Check: Database tests (Node 18.12.1, mysql8)
  • GitHub Check: Ghost-CLI tests
🔇 Additional comments (10)
ghost/core/core/server/services/members/exporter/query.js (10)

4-4: LGTM! Good addition of logging module

Adding structured logging will help with monitoring and debugging the export process.


40-44: LGTM! Efficient data structure transformation

Converting products to a lookup map will provide O(1) access time during processing.


56-62: LGTM! Clean and efficient query structure

The query is well-organized with proper filtering conditions.


68-75: Ensure database compatibility of GROUP_CONCAT function

The use of GROUP_CONCAT may not be compatible across all database systems. PostgreSQL uses STRING_AGG instead.


81-88: Ensure database compatibility of GROUP_CONCAT function for labels

Similar to tiers, the GROUP_CONCAT usage here may not be compatible with PostgreSQL.


94-101: LGTM! Good use of MIN aggregation

Using MIN to get a single customer ID per member is a clean approach.


107-113: LGTM! Clean subscription query

The query efficiently fetches distinct subscribed members.


119-122: LGTM! Efficient use of Map and Set

Using Map and Set data structures provides optimal lookup performance during processing.


133-138: ⚠️ Potential issue

Handle potential undefined label names

Similar to products, accessing allLabels[id] needs null checking.

const labelDetails = labelIds.map((id) => {
+   const name = allLabels[id];
+   if (!name) {
+       return null;
+   }
    return {
-       name: allLabels[id]
+       name
    };
-});
+}).filter(Boolean);

Likely invalid or redundant comment.


125-130: ⚠️ Potential issue

Handle potential undefined product names

When accessing allProducts[id], there's no check for undefined values which could cause issues if a product ID doesn't exist.

const tierDetails = tierIds.map((id) => {
+   const name = allProducts[id];
+   if (!name) {
+       return null;
+   }
    return {
-       name: allProducts[id]
+       name
    };
-});
+}).filter(Boolean);

Likely invalid or redundant comment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
ghost/core/core/server/services/members/exporter/query.js (3)

Line range hint 23-35: Remove commented-out code.

The commented-out code block appears to be from the old implementation and should be removed to maintain code cleanliness.


94-101: Document the use of MIN for stripe_customer_id.

Please add a comment explaining why MIN(customer_id) is used to select the Stripe customer ID. This will help future maintainers understand the business logic behind this choice.


147-151: Overall implementation feedback.

The changes introduce good performance logging and efficient data structures. However, please address:

  1. Database compatibility issues with GROUP_CONCAT
  2. Null checks for product and label lookups
  3. Remove commented-out code
  4. Document the Stripe customer ID selection logic

The performance logging will be valuable for monitoring and optimization.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 344236a and adbf5ce.

📒 Files selected for processing (1)
  • ghost/core/core/server/services/members/exporter/query.js (2 hunks)
🔇 Additional comments (8)
ghost/core/core/server/services/members/exporter/query.js (8)

4-4: LGTM! Good addition of performance logging.

The addition of logging will help track performance metrics across different stages of the export process.

Also applies to: 8-8


40-50: LGTM! Efficient data structure transformation.

Converting products and labels to lookup objects is an efficient approach for the subsequent data processing.


56-62: LGTM! Clean and efficient member data retrieval.

The query is well-structured and only fetches the required columns with appropriate filtering.


68-75: Ensure database compatibility of GROUP_CONCAT function

The use of GROUP_CONCAT may not be compatible across all database systems. For example, PostgreSQL uses STRING_AGG instead.

Consider using database-agnostic methods or handle database-specific functions conditionally:

const isPostgres = knex.client.config.client === 'pg';
query.select(
    'member_id',
    isPostgres
        ? knex.raw('STRING_AGG(product_id::text, \',\') as tiers')
        : knex.raw('GROUP_CONCAT(product_id) as tiers')
);

81-88: Ensure database compatibility of GROUP_CONCAT function for labels

Similar to tiers, the use of GROUP_CONCAT for labels needs database-specific handling.


107-113: LGTM! Clean subscription status retrieval.

The query efficiently retrieves distinct member subscriptions.


119-123: LGTM! Efficient use of Maps and Sets.

Using Maps for lookups and Sets for membership tests is a performant approach.


125-139: ⚠️ Potential issue

Add null checks for product and label lookups.

The current implementation might throw errors if allProducts[id] or allLabels[id] is undefined.

Apply this fix:

 const tierDetails = tierIds.map((id) => {
+    const name = allProducts[id];
+    if (!name) {
+        return null;
+    }
     return {
-        name: allProducts[id]
+        name
     };
-});
+}).filter(Boolean);

 const labelDetails = labelIds.map((id) => {
+    const name = allLabels[id];
+    if (!name) {
+        return null;
+    }
     return {
-        name: allLabels[id]
+        name
     };
-});
+}).filter(Boolean);

Likely invalid or redundant comment.

@Ghost-Slimer
Copy link

Deployed to staging with ID: 3190

How does this work?

@vershwal vershwal changed the title Initial commit Optimised SQL query for exporting members Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants