feat: add runDiscourse extraction function #395

Behzad-rabiei · 2024-10-09T12:50:31Z

Summary by CodeRabbit

New Features
- Introduced a new environment variable for the Discourse extraction URL.
- Added a method that invokes the extraction application for a platform (currently only Discourse).
Improvements
- Enhanced error handling and validation for platform connections.
- Updated logic for Google and Notion platform connections to improve user profile handling.
Refactor
- Renamed and restructured the Discourse forum creation function to focus on data extraction.

coderabbitai · 2024-10-09T12:50:42Z

Walkthrough

This pull request adds a new environment variable, DISCOURSE_EXTRACTION_URL, to the configuration schema. In platform.controller.ts, several platform connection methods were modified to improve error handling and validation. In core.service.ts, an existing function was renamed and repurposed to extract data from Discourse using the new environment variable, and platform.service.ts gains a new method that invokes the extraction application based on the platform name.

Changes

File Path	Change Summary
src/config/index.ts	Added new environment variable `DISCOURSE_EXTRACTION_URL` to `envVarsSchema` and updated exported configuration object to include `discourse: { extractionURL: envVars.DISCOURSE_EXTRACTION_URL }`.
src/controllers/platform.controller.ts	Modified `createPlatform` to call `platformService.callExtractionApp(platform)`. Updated `connectPlatform` for Google scopes handling, enhanced error handling in platform connection callbacks, and improved user profile retrieval in `connectGoogleCallback`. Updated `deletePlatform` for handling deletions and refined permission handling in `requestAccess`.
src/services/discourse/core.service.ts	Renamed `createDiscourseForum` to `runDiscourseExtraction`, changed parameter from `endpoint` to `platformId`, updated fetch URL to use `config.discourse.extractionURL`, and modified request headers and error messages accordingly.
src/services/platform.service.ts	Added new method `callExtractionApp` to invoke extraction applications based on platform names, specifically calling `runDiscourseExtraction` for `PlatformNames.Discourse`.

Possibly related PRs

feat: add discourse/heatmap APIs #387: The addition of the DISCOURSE_EXTRACTION_URL environment variable and the discourse property in the configuration object is directly related to the changes in the src/services/discourse/core.service.ts, where the runDiscourseExtraction function now uses config.discourse.extractionURL for its fetch request.
390 discourse platform + violation detection module #392: The modifications to the createPlatform validation schema in src/validations/platform.validation.ts to include conditional validation for the Discourse platform are relevant as they align with the new configuration capabilities introduced in the main PR, which supports the Discourse extraction URL.

🐰 In the garden where ideas bloom,
A new URL brings forth the room.
With platforms connected, we leap and bound,
Extracting knowledge from the ground.
So hop along, let’s make a start,
For every change is a work of art! 🌼

coderabbitai

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (5)

src/services/discourse/core.service.ts (1)
Line range hint 22-42: Address content type mismatch and improve error handling.

There are a few issues in the implementation of runDiscourseExtraction:

Content type mismatch: The Content-Type header is set to 'application/json', but the data is sent as URLSearchParams, which fetch serializes as form-encoded text (key=value&...). A server that parses the body as JSON will likely fail to read it.

Error handling: The current implementation logs the error but doesn't provide specific error information in the thrown ApiError.

Please consider the following improvements:

Fix the content type mismatch:
-      body: new URLSearchParams(data),
-      headers: { 'Content-Type': 'application/json' },
+      body: JSON.stringify(data),
+      headers: { 'Content-Type': 'application/json' },
Improve error handling:
   } catch (error) {
-    logger.error(error, 'Failed to run discourse extraction discourse');
-    throw new ApiError(590, 'Failed to run discourse extraction discourse');
+    const errorMessage = error instanceof Error ? error.message : 'Unknown error';
+    logger.error({ error }, `Failed to run discourse extraction: ${errorMessage}`);
+    throw new ApiError(590, `Failed to run discourse extraction: ${errorMessage}`);
   }
These changes will ensure consistency in the request format and provide more informative error messages.
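The two suggestions above can be combined into a single function. This is a minimal sketch under stated assumptions, not the PR's code: the JSON payload shape `{ platformId }`, the POST method, and the non-2xx status check are assumptions; `ApiError` is a hypothetical stand-in for the project's class; logging is omitted; and `fetch` is injected through a small structural `FetchLike` type so the function can be exercised without a network.

```typescript
// Hypothetical stand-in for the project's ApiError class.
class ApiError extends Error {
  readonly statusCode: number;
  constructor(statusCode: number, message: string) {
    super(message);
    this.name = 'ApiError';
    this.statusCode = statusCode;
  }
}

// Request options whose body format matches the declared Content-Type.
interface JsonPostInit {
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Structural type: a global fetch (Node 18+) should satisfy it, and so does a test stub.
type FetchLike = (url: string, init: JsonPostInit) => Promise<{ ok: boolean; status: number }>;

// Assumed payload shape { platformId }; adjust to what the extraction app actually expects.
function buildExtractionRequest(platformId: string): JsonPostInit {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ platformId }),
  };
}

// Convert an unknown thrown value into a readable message.
function describeError(error: unknown): string {
  return error instanceof Error ? error.message : 'Unknown error';
}

async function runDiscourseExtraction(
  extractionURL: string, // e.g. config.discourse.extractionURL
  platformId: string,
  fetchFn: FetchLike,
): Promise<void> {
  try {
    const response = await fetchFn(extractionURL, buildExtractionRequest(platformId));
    // Assumption: treat any non-2xx response as a failed extraction.
    if (!response.ok) {
      throw new Error(`extraction app responded with HTTP ${response.status}`);
    }
  } catch (error) {
    throw new ApiError(590, `Failed to run discourse extraction: ${describeError(error)}`);
  }
}
```

In a service this might be called as `runDiscourseExtraction(config.discourse.extractionURL, platformId, fetch)`. Injecting `fetch` is a testability choice made for this sketch; the PR's function takes only `platformId`.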
src/controllers/platform.controller.ts (4)
Line range hint 44-44: Security concern: Avoid setting 'userId' from untrusted sources

Setting req.session.userId = req.query.userId; directly from a query parameter lets the caller choose which user ID the session is bound to. This can lead to impersonation, for example connecting a platform account to someone else's user record. It's safer to retrieve the userId from the authenticated session or a secure token rather than from user-controlled input.

To fix this issue, remove the assignment from the query parameter and ensure userId is obtained from a trusted source. Here's the suggested change:
- req.session.userId = req.query.userId;
Line range hint 78-80: Ensure 'userId' is obtained from a secure source

In the connectGoogleCallback function, userId is retrieved from req.session.userId, which was previously set from a query parameter. Since query parameters can be manipulated, this may lead to security issues.

Ensure that userId in the session is set from a trusted source. Remove any assignments from query parameters to req.session.userId.
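One way to follow this advice, where an authenticated user is available when the OAuth flow starts, is to derive the user ID from that user and ignore any `userId` in the query string. This is a minimal sketch: the `AuthenticatedRequest` shape (with `user` populated by authentication middleware and a server-side `session` such as express-session provides) and the `bindSessionUser` helper are hypothetical simplifications, not the project's actual types.

```typescript
// Hypothetical, simplified request shape: `user` is set by authentication middleware,
// `session` is a server-side session store, and `query` is untrusted client input.
interface AuthenticatedRequest {
  user?: { id: string };
  query: Record<string, unknown>;
  session: { userId?: string };
}

// Store the user ID for the OAuth callback, taking it only from the authenticated user.
function bindSessionUser(req: AuthenticatedRequest): string {
  if (!req.user) {
    throw new Error('Unauthenticated request: cannot start platform connection');
  }
  // req.query.userId is deliberately ignored because the client controls it.
  req.session.userId = req.user.id;
  return req.session.userId;
}
```

The callback can then read `req.session.userId` knowing it was never copied from client input.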

Line range hint 131-131: Security concern: Missing state parameter validation in 'connectGithubCallback'

The connectGithubCallback function does not validate a state parameter to protect against Cross-Site Request Forgery (CSRF) attacks. Including and verifying a state parameter is a standard security practice in OAuth flows to ensure the integrity of the request.

Consider modifying the connectGithubCallback function to include state parameter generation and validation similar to other platform callbacks.
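The state check follows a common pattern: generate an unguessable value before redirecting, store it server-side, and require the callback to return the same value exactly once. The sketch below assumes a Node.js runtime (`node:crypto`) and a hypothetical `OAuthSession` object standing in for whatever session the other platform callbacks use; `issueState` and `consumeState` are illustrative names, not existing project functions.

```typescript
import { randomBytes, timingSafeEqual } from 'node:crypto';

// Hypothetical session shape; in Express this could be req.session.
interface OAuthSession {
  oauthState?: string;
}

// Before redirecting: create an unguessable state value and remember it.
function issueState(session: OAuthSession): string {
  const state = randomBytes(32).toString('hex'); // 64 hex characters
  session.oauthState = state;
  return state; // appended as &state=... to the authorization URL
}

// In the callback: compare the returned state with the stored one, then discard it.
function consumeState(session: OAuthSession, received: unknown): boolean {
  const expected = session.oauthState;
  delete session.oauthState; // each state value is single-use
  if (typeof expected !== 'string' || typeof received !== 'string') {
    return false;
  }
  const a = Buffer.from(expected);
  const b = Buffer.from(received);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The callback would reject the request (for example with HTTP 403) whenever `consumeState(req.session, req.query.state)` returns false.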

Line range hint 159-159: Typo: Duplicate 'await' in function call

There's a duplicated await in the following line:
const installation = await await githubService.coreService.getInstallationDetails(appAccessToken, installationId);
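This is not a syntax error, and it does not change behavior: the first await already yields the resolved value, and awaiting a non-promise value simply returns it. The second await is redundant, though, and should be removed.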

Apply this diff to fix the typo:
- const installation = await await githubService.coreService.getInstallationDetails(appAccessToken, installationId);
+ const installation = await githubService.coreService.getInstallationDetails(appAccessToken, installationId);
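The redundancy can be checked with a small sketch. `getInstallationDetails` below is a hypothetical stub, not the real GitHub service method; both wrappers resolve to the same object.

```typescript
// Hypothetical stub standing in for githubService.coreService.getInstallationDetails.
async function getInstallationDetails(): Promise<{ id: number }> {
  return { id: 42 };
}

// Redundant form: the outer await receives a plain object and returns it unchanged.
async function withDoubleAwait(): Promise<{ id: number }> {
  return await await getInstallationDetails();
}

// Equivalent, idiomatic form.
async function withSingleAwait(): Promise<{ id: number }> {
  return await getInstallationDetails();
}
```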

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 394e687 and 863e941.

📒 Files selected for processing (4)

src/config/index.ts (2 hunks)
src/controllers/platform.controller.ts (1 hunks)
src/services/discourse/core.service.ts (3 hunks)
src/services/platform.service.ts (2 hunks)

🧰 Additional context used

🔇 Additional comments (8)

src/services/discourse/core.service.ts (2)
48-48: LGTM: Export of new function.

The addition of runDiscourseExtraction to the default export is correct and consistent with the function renaming.

5-6: Verify the usage of the Types import.

The Types import from 'mongoose' has been added, but it's not used in the visible parts of the code. Please ensure it's necessary for other parts of the file or remove it if unused.

✅ Verification successful

Remove the unused Types import.

The Types import from 'mongoose' is not used in src/services/discourse/core.service.ts and can be safely removed.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Check if Types from mongoose is used in the file.
rg --type typescript -w "Types" src/services/discourse/core.service.ts
Length of output: 105

Script:
#!/bin/bash
# Description: Check if Types from mongoose is used in the file.
rg --type ts -w "Types" src/services/discourse/core.service.ts
Length of output: 96
src/config/index.ts (3)
58-58: LGTM: New environment variable added correctly.

The new DISCOURSE_EXTRACTION_URL environment variable has been added to the envVarsSchema following the existing pattern. It's correctly defined as a required string with a clear description.

158-160: LGTM: New configuration property added correctly.

The new discourse property has been added to the exported configuration object following the existing structure. The extractionURL field correctly maps to the new environment variable, and the formatting is consistent with the rest of the file.
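The combined effect of these two changes (a required variable mapped onto `config.discourse.extractionURL`) can be approximated without a schema library. This dependency-free sketch is an assumption: the real file validates through `envVarsSchema`, whose library and options are not shown here, and the URL format check and the `loadDiscourseConfig` helper are illustrative extras not claimed by the PR.

```typescript
interface DiscourseConfig {
  discourse: { extractionURL: string };
}

// Read and validate the variable, then map it onto the config shape used by core.service.ts.
function loadDiscourseConfig(env: Record<string, string | undefined>): DiscourseConfig {
  const raw = env.DISCOURSE_EXTRACTION_URL;
  if (raw === undefined || raw.trim() === '') {
    throw new Error('Config validation error: DISCOURSE_EXTRACTION_URL is required');
  }
  // Extra check (not in the PR): the URL constructor throws on malformed input.
  try {
    new URL(raw);
  } catch {
    throw new Error('Config validation error: DISCOURSE_EXTRACTION_URL is not a valid URL');
  }
  return { discourse: { extractionURL: raw } };
}
```

At startup this would typically be called as `loadDiscourseConfig(process.env)`, failing fast if the variable is missing.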

58-58: Summary: Discourse extraction URL configuration added successfully.

The changes to this file are minimal and focused, adding only the necessary new configuration for the Discourse extraction URL. The additions are consistent with the existing code style and structure, and no potential issues or side effects are apparent from these changes.

To ensure the new configuration is used correctly, its usage across the codebase was verified with the scripts executed below.

Also applies to: 158-160

✅ Verification successful

Summary: Discourse extraction URL configuration verified successfully.

The DISCOURSE_EXTRACTION_URL has been correctly added to src/config/index.ts and is properly utilized in src/services/discourse/core.service.ts. No issues were found with the implementation or usage of this configuration.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify the usage of the new DISCOURSE_EXTRACTION_URL configuration

# Check if the new environment variable is used in the codebase
echo "Checking usage of DISCOURSE_EXTRACTION_URL:"
rg "DISCOURSE_EXTRACTION_URL" --type ts

# Check if the new configuration property is accessed in the codebase
echo "\nChecking usage of config.discourse.extractionURL:"
rg "config\.discourse\.extractionURL" --type ts
Length of output: 593
src/services/platform.service.ts (1)

268-268: LGTM: Proper export of the new function

The callExtractionApp function is correctly added to the default export object, making it accessible to other modules that import this service. This integration is clean and follows the existing pattern in the file.
src/controllers/platform.controller.ts (2)
Line range hint 44-54: Validate scopes and handle empty or invalid scopes array

When aggregating requested scopes for Google OAuth:
let requestedScopes = req.query.scopes as string[];
let aggregatedScopes: string[] = [];
requestedScopes.forEach((scope) => {
  if (google.scopes[scope]) {
    aggregatedScopes = [...aggregatedScopes, ...google.scopes[scope]];
  }
});
Ensure that requestedScopes is properly validated, and handle cases where it might be empty or contain invalid scope values to prevent errors during the OAuth process.

To check if all scope values are valid and handle potential issues, consider adding validation and default scopes if necessary.
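One possible shape for that validation is sketched below. It is not the project's code: `ScopeMap` is a hypothetical stand-in for `google.scopes` (assumed to map a scope name to an array of OAuth scope strings, as the loop above implies), and throwing on empty or unknown input is a design choice. Falling back to a default scope set would be an alternative.

```typescript
// Hypothetical stand-in for google.scopes: a friendly scope name maps to OAuth scope strings.
type ScopeMap = Record<string, string[]>;

// Normalize req.query.scopes, reject empty or unknown input, and de-duplicate the result.
function aggregateScopes(raw: unknown, scopeMap: ScopeMap): string[] {
  // Depending on the query parser, ?scopes=a may arrive as a string and ?scopes=a&scopes=b as an array.
  const requested: unknown[] = Array.isArray(raw) ? raw : typeof raw === 'string' ? [raw] : [];
  if (requested.length === 0) {
    throw new Error('At least one scope must be requested');
  }
  // Own-property check so inherited keys such as "constructor" are not treated as scopes.
  const invalid = requested.filter(
    (s) => typeof s !== 'string' || !Object.prototype.hasOwnProperty.call(scopeMap, s),
  );
  if (invalid.length > 0) {
    throw new Error(`Unknown scopes: ${invalid.map(String).join(', ')}`);
  }
  const aggregated = new Set<string>();
  for (const name of requested as string[]) {
    for (const scopeUrl of scopeMap[name]) {
      aggregated.add(scopeUrl);
    }
  }
  return Array.from(aggregated);
}
```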

Line range hint 180-184: Verify the implementation of 'getPropertyHandler' for Discourse

The getProperties function now includes handling for PlatformNames.Discourse. Ensure that discourseService.coreService.getPropertyHandler(req); is properly implemented and that it correctly handles the request.

To verify that getPropertyHandler exists and is correctly implemented in discourseService:

✅ Verification successful

Verification Successful: The getPropertyHandler method is properly implemented in discourseService.coreService and correctly handles the request.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the existence of 'getPropertyHandler' in 'discourseService.coreService'

# Test: Search for the 'getPropertyHandler' method in 'discourseService.coreService.ts'
ast-grep --lang typescript --pattern 'getPropertyHandler($_)' src/services/discourse/core.service.ts
Length of output: 102

Script:
#!/bin/bash
# Description: Search for 'getPropertyHandler' across the entire TypeScript codebase to locate its implementation

rg 'getPropertyHandler' --type typescript
Length of output: 76

Script:
#!/bin/bash
# Description: Search for 'getPropertyHandler' across all TypeScript files in the codebase

rg 'getPropertyHandler' --type ts
Length of output: 570

Script:
#!/bin/bash
# Description: Extract the implementation of 'getPropertyHandler' from 'src/services/discourse/core.service.ts'

ast-grep --lang typescript --pattern 'async function getPropertyHandler($_) { $$$ }' src/services/discourse/core.service.ts
Length of output: 753

coderabbitai · 2024-10-09T12:58:07Z

src/services/discourse/core.service.ts

+ * run discourse extraction
+ * @param {Strin} platformId
 * @returns {Promise<IDiscordUser>}
 */


⚠️ Potential issue

Fix the types in the function comment.

The function comment states that the return type is Promise<IDiscordUser>, but the function actually returns Promise<void>. The @param type also contains a typo: {Strin} should be {string}.

Please update the comment as follows:

 /**
  * run discourse extraction
- * @param {Strin} platformId
- * @returns {Promise<IDiscordUser>}
+ * @param {string} platformId
+ * @returns {Promise<void>}
  */


coderabbitai · 2024-10-09T12:58:07Z

src/services/platform.service.ts

+/**
+ * Call extraction app for the given platform
+ * @param {HydratedDocument<IPlatform>} platform
+ * @returns {Promise<Void>}
+ */
+const callExtractionApp = async (platform: HydratedDocument<IPlatform>): Promise<void> => {
+  switch (platform.name) {
+    case PlatformNames.Discourse: {
+      await discourseService.coreService.runDiscourseExtraction(platform.id as string);
+      return;
+    }
+    default: {
+      return;
+    }
+  }
+};


🛠️ Refactor suggestion

Enhance error handling, logging, and type safety in callExtractionApp

The new callExtractionApp function looks good overall, but consider the following improvements:

Error handling: Wrap the discourseService.coreService.runDiscourseExtraction call in a try-catch block to handle potential errors.

Logging: Add logging statements to track the function's execution and any errors that occur.

Type safety: Use a type guard to ensure platform.id is a string before passing it to runDiscourseExtraction.

Here's a suggested implementation incorporating these improvements:

import { logger } from '../utils/logger'; // Assume a logger utility exists

const callExtractionApp = async (platform: HydratedDocument<IPlatform>): Promise<void> => {
  logger.info(`Calling extraction app for platform: ${platform.name}`);
  switch (platform.name) {
    case PlatformNames.Discourse: {
      if (typeof platform.id !== 'string') {
        throw new Error(`Invalid platform id type for Discourse: ${typeof platform.id}`);
      }
      try {
        await discourseService.coreService.runDiscourseExtraction(platform.id);
        logger.info(`Discourse extraction completed for platform id: ${platform.id}`);
      } catch (error) {
        logger.error(`Error during Discourse extraction: ${error}`);
        throw error; // Re-throw or handle as appropriate
      }
      return;
    }
    default: {
      logger.warn(`No extraction process defined for platform: ${platform.name}`);
      return;
    }
  }
};

This implementation adds error handling, logging, and ensures type safety for the platform id.

coderabbitai · 2024-10-09T12:58:07Z

src/controllers/platform.controller.ts

@@ -25,7 +25,7 @@ const logger = parentLogger.child({ module: 'PlatformController' });
 const createPlatform = catchAsync(async function (req: IAuthRequest, res: Response) {
  const community = req.community;
  const platform = await platformService.managePlatformConnection(community?.id, req.body);
-  // await airflowService.triggerDag(platform);
+  await platformService.callExtractionApp(platform);


🛠️ Refactor suggestion

Ensure all dependencies are updated after changing the extraction call

The call to platformService.callExtractionApp(platform); replaces the commented-out call to airflowService.triggerDag(platform);. Make sure that any remaining dependencies on airflowService are updated or removed if they are no longer needed, to prevent orphaned code and confusion.

feat: add runDiscourse extraction function

863e941

coderabbitai bot reviewed Oct 9, 2024


Behzad-rabiei merged commit f9f9394 into main Oct 9, 2024
13 checks passed

This was referenced Oct 9, 2024

390 discourse platform + violation detection module #396

Merged

fix: fix the body issue #397

Merged

This was referenced Nov 11, 2024

chore: delete no needed code #401

Merged

Add temporal #407

Merged
