feat: slackbot command for scraping top-pages #470

Merged: 47 commits merged into main from SITES-23344-1 on Oct 8, 2024

Conversation

@ssilare-adobe (Contributor) commented Aug 27, 2024

Relates to #469

JIRA (a subpart of this issue) : H1: auto-detect

Description

This PR addresses point 1 of the wiki page "SEO | Title, Description and H1 Tags Optimisation", which includes:

  • Developing a scrape command handler that accepts user input from the Slack bot.
  • The worker scrapes the data for the top 200 pages and stores it in AWS S3.
  • Command pattern: @spacecat-dev run scrape {baseURL}
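
To make the command pattern concrete, here is a minimal sketch of how a message such as `@spacecat-dev run scrape {baseURL}` could be turned into a base URL. The helper name `parseScrapeCommand`, the returned shape, and the choice to normalize to the URL origin are assumptions for illustration; they are not taken from run-scrape.js.

```javascript
// Hypothetical helper: extracts the base URL from a "run scrape" message.
// The name parseScrapeCommand and the { baseURL } result shape are assumptions.
function parseScrapeCommand(text) {
  // Drop a leading mention such as "@spacecat-dev" or Slack's "<@U123ABC>" form.
  const withoutMention = text.trim().replace(/^(<@[^>]+>|@\S+)\s+/, '');
  const match = /^run scrape\s+(\S+)$/i.exec(withoutMention);
  if (!match) return null;

  // Slack may wrap links as <https://example.com> or <https://example.com|example.com>.
  let candidate = match[1].replace(/^<([^|>]+)(\|[^>]*)?>$/, '$1');
  // Assume https when the user omits the scheme.
  if (!/^https?:\/\//i.test(candidate)) candidate = `https://${candidate}`;

  try {
    // Normalizing to the origin is an assumption of this sketch.
    return { baseURL: new URL(candidate).origin };
  } catch {
    return null; // not a parseable URL
  }
}
```

Returning `null` for anything that does not match lets the caller reply with a usage hint instead of queueing a scrape for a malformed URL.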

Tasks done:

  • Implemented a scrape command handler
  • Moved the isValidDateInterval(startDate, endDate) function to date-utils.js
  • Wrote unit tests for the command handler & date-utils.js
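
For readers unfamiliar with the helper, the sketch below shows one plausible shape for `isValidDateInterval(startDate, endDate)`. It assumes both arguments are `YYYY-MM-DD` strings and that the start must be strictly before the end; the actual rules in date-utils.js may differ, and `parseIsoDate` is a hypothetical internal helper.

```javascript
// Sketch only: the real date-utils.js may apply different rules.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Hypothetical helper: returns a Date for a valid YYYY-MM-DD string, else null.
function parseIsoDate(value) {
  if (typeof value !== 'string' || !ISO_DATE.test(value)) return null;
  const date = new Date(`${value}T00:00:00Z`);
  // Reject impossible dates such as 2024-02-31, whether the engine
  // reports them as invalid or silently rolls them into the next month.
  if (Number.isNaN(date.getTime()) || date.toISOString().slice(0, 10) !== value) {
    return null;
  }
  return date;
}

// Assumption: an interval is valid when both dates parse and start < end.
function isValidDateInterval(startDate, endDate) {
  const start = parseIsoDate(startDate);
  const end = parseIsoDate(endDate);
  return start !== null && end !== null && start < end;
}
```

Keeping the check in a standalone module like this makes it straightforward to unit test without the Slack command context.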

Please ensure your pull request adheres to the following guidelines:

  • make sure to link the related issues in this description. Or if there's no issue created, make sure you
    describe here the problem you're solving.
  • when merging / squashing, make sure the fixed issue references are visible in the commits, for easy compilation of release notes

If the PR is changing the API specification:

  • make sure you add a "Not implemented yet" note to the endpoint description, if the implementation is not ready
    yet. Ideally, return a 501 status code with a message explaining the feature is not implemented yet.
  • make sure you add at least one example of the request and response.

If the PR is changing the API implementation or an entity exposed through the API:

  • make sure you update the API specification and the examples to reflect the changes.

If the PR is introducing a new audit type:

  • make sure you update the API specification with the type, schema of the audit result and an example

Related Issues

Thanks for contributing!

This PR will trigger a minor release when merged.

@ssilare-adobe changed the title from "feat: implementing spacecat run scrape {siteId} command handler for scraping top 200 pages" to "feat: implementing spacecat run scrape {baseUrl} command handler for scraping top 200 pages" on Aug 28, 2024
@ssilare-adobe changed the title from "feat: implementing spacecat run scrape {baseUrl} command handler for scraping top 200 pages" to "feat: slackbot command for scraping top-pages" on Aug 28, 2024
@ssilare-adobe changed the title from "feat: slackbot command for scraping top-pages" to "feat: slackbot command handler for scraping top-pages" on Aug 28, 2024
@ssilare-adobe changed the title from "feat: slackbot command handler for scraping top-pages" to "feat: slackbot command for scraping top-pages" on Aug 28, 2024
@solaris007 (Member) left a comment

looks great - just minor comments

src/support/slack/commands/run-scrape.js (outdated)

if (!admins.includes(user)) {
await say(':error: Only selected SpaceCat fluid team members can run scraper.');
// return;
Member

should this be uncommented?

Contributor Author

yes, I kept it for testing, removing it now

Member

@ssilare-adobe still commented...

Contributor Author

wanted to test, sorry
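
The point of this thread is that, with `return` commented out, a non-admin still falls through to the scrape after seeing the error message. A minimal sketch of the guarded flow follows; `createScrapeGuard` and `runScrape` are hypothetical names introduced for illustration, while `admins`, `user`, and `say` mirror the snippet under review.

```javascript
// Hypothetical factory: wraps the admin check from the reviewed snippet.
// createScrapeGuard and runScrape are illustrative names, not from the PR.
function createScrapeGuard(admins) {
  return async function handle(user, say, runScrape) {
    if (!admins.includes(user)) {
      await say(':error: Only selected SpaceCat fluid team members can run scraper.');
      return false; // stop here so the scrape is never triggered
    }
    await runScrape();
    return true;
  };
}
```

A unit test can pass stubbed `say` and `runScrape` functions and assert that `runScrape` is not called for a user outside `admins`.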

@solaris007 (Member) left a comment

found wrong processing type

src/support/utils.js (outdated)
@ssilare-adobe (Contributor Author)

@solaris007 can we merge this PR?

@tripodsan

@solaris007, is it required that spacecat-contributors are added as reviewers? IMO this only generates additional notifications. There is no way that all 30 contributors are going to sign off on a PR, and the individuals who are interested in new PRs have subscribed to the repo anyway.

@dipratap removed the request for review from a team on September 17, 2024 13:38
@blefebvre (Contributor)

Hi @sahil9001, could you please integrate the latest from main into this branch? The push earlier today took ci "back in time" to when the branch was last updated.

@ssilare-adobe (Contributor Author)

> Hi @sahil9001, could you please integrate the latest from main into this branch? The push earlier today took ci "back in time" to when the branch was last updated.

Thanks, I have merged the main branch.

@ssilare-adobe merged commit 0a7b37b into main on Oct 8, 2024
4 checks passed
@ssilare-adobe deleted the SITES-23344-1 branch on October 8, 2024 09:28
solaris007 pushed a commit that referenced this pull request Oct 8, 2024
# [1.69.0](v1.68.4...v1.69.0) (2024-10-08)

### Features

* slackbot command for scraping top-pages  ([#470](#470)) ([0a7b37b](0a7b37b))
@solaris007 (Member)

🎉 This PR is included in version 1.69.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

5 participants