Feature: Add a feature to only crawl the given list of urls #12

Open
indrajithi opened this issue Jun 15, 2024 · 5 comments
Labels: enhancement, good first issue

Comments

indrajithi (Collaborator) commented Jun 15, 2024

  • Accept an argument from the user, something like url_list.
  • Crawl only the URLs provided by the user in this list and nothing else (see the sketch below).
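A rough usage sketch of what is being proposed (the import path, the Spider constructor signature, and the start() call here are assumptions for illustration, not the project's confirmed API; url_list is the parameter name proposed above):

```python
# Hypothetical usage of the proposed url_list argument. The import path,
# constructor signature, and start() method are assumptions, not confirmed API.
from tiny_web_crawler import Spider

spider = Spider(url_list=["https://example.com/a", "https://example.com/b"])
spider.start()  # crawls exactly the two URLs above -- no root URL, no hops
```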
indrajithi added the enhancement and good first issue labels on Jun 15, 2024
lodenrogue commented:

Wouldn't that be set by the Spider.max_links value?

indrajithi (Collaborator, Author) commented:

@lodenrogue
max_links is basically the maximum number of hops the crawler will make. Let us say we start from github.com as the root URL. In the first crawl we fetch all the links on github.com and then recursively crawl the links we fetched until the max_links count is reached.

Eg: Say we found three links from the root URL: [URL1, URL2, URL3]
If max_links is set to 2, we will only crawl [URL1, URL2] and fetch the links in those.

With this feature, we expect the crawler to fetch only the URLs provided by the user and nothing more. The list of URLs to crawl will be a custom set provided by the user as input. There will be no root-URL-based crawling and no hops.
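A minimal sketch of the current root-URL behaviour described above (illustrative only, not the library's actual implementation; fetch_links stands in for whatever function extracts the links from a page):

```python
# Illustrative sketch of root-URL crawling bounded by max_links; not the
# library's real code. fetch_links(url) is a stand-in that returns the
# links found on a page.
def crawl_from_root(root_url, max_links, fetch_links):
    visited = set()
    queue = [root_url]
    while queue and len(visited) < max_links:
        url = queue.pop(0)
        if url in visited:
            continue
        visited.add(url)                # crawl this page
        queue.extend(fetch_links(url))  # hop to the links discovered on it
    return visited
```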

faisalalisayyed commented:

For example, with url_list = [URL1, URL2, URL3], we would loop through this url_list and fetch each link, but there would be no root URL. If I am getting it right, I would love to solve it; please assign this issue to me.
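If that reading is right, a minimal sketch of the loop could look like this (crawl_url_list and fetch_links are placeholder names for illustration, not existing functions in the project):

```python
# Placeholder sketch of the proposed url_list-only mode: fetch each URL the
# user supplied and nothing else -- no root URL and no recursive hops.
def crawl_url_list(url_list, fetch_links):
    results = {}
    for url in url_list:
        results[url] = fetch_links(url)  # only the links found on this page
    return results
```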

indrajithi (Collaborator, Author) commented:


Hi @C0DE-SLAYER,

Please let us know whether you are working on this.

faisalalisayyed commented:

@indrajithi Yes, I will open a PR today.
