Feature: Add a feature to only crawl the given list of urls #12

Open
indrajithi opened this issue Jun 15, 2024 · 5 comments
Labels: enhancement, good first issue

Comments

indrajithi (Collaborator) commented Jun 15, 2024

  • Accept an argument from the user, something like url_list.
  • Crawl only the URLs provided by the user in this list and nothing else (see the sketch below).
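A rough usage sketch of what is being proposed (the import path, the Spider constructor signature, and the start() call here are assumptions for illustration, not the project's confirmed API; url_list is the parameter name proposed above):

```python
# Hypothetical usage of the proposed url_list argument. The import path,
# constructor signature, and start() method are assumptions, not confirmed API.
from tiny_web_crawler import Spider

spider = Spider(url_list=["https://example.com/a", "https://example.com/b"])
spider.start()  # crawls exactly the two URLs above -- no root URL, no hops
```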
indrajithi added the enhancement and good first issue labels on Jun 15, 2024
lodenrogue commented:

Wouldn't that be set by the Spider.max_links value?

indrajithi (Collaborator, Author) commented:

@lodenrogue
max_links is basically the maximum number of hops the crawler will make. Let us say we start from github.com as the root URL. In the first crawl we fetch all the links on github.com and then recursively crawl the links we fetched until the max_links count is reached.

Eg: Say we found three links from the root URL: [URL1, URL2, URL3]
If max_links is set to 2, we will only crawl [URL1, URL2] and fetch the links in those.

With this feature, we expect the crawler to fetch only the URLs provided by the user and nothing more. The list of URLs to crawl will be a custom set provided by the user as input. There will be no root-URL-based crawling and no hops.
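A minimal sketch of the current root-URL behaviour described above (illustrative only, not the library's actual implementation; fetch_links stands in for whatever function extracts the links from a page):

```python
# Illustrative sketch of root-URL crawling bounded by max_links; not the
# library's real code. fetch_links(url) is a stand-in that returns the
# links found on a page.
def crawl_from_root(root_url, max_links, fetch_links):
    visited = set()
    queue = [root_url]
    while queue and len(visited) < max_links:
        url = queue.pop(0)
        if url in visited:
            continue
        visited.add(url)                # crawl this page
        queue.extend(fetch_links(url))  # hop to the links discovered on it
    return visited
```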

faisalalisayyed commented:

For example, with url_list = [URL1, URL2, URL3], we would loop through this url_list and fetch each link, but there would be no root URL. If I am getting it right, I would love to solve it; please assign this issue to me.
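If that reading is right, a minimal sketch of the loop could look like this (crawl_url_list and fetch_links are placeholder names for illustration, not existing functions in the project):

```python
# Placeholder sketch of the proposed url_list-only mode: fetch each URL the
# user supplied and nothing else -- no root URL and no recursive hops.
def crawl_url_list(url_list, fetch_links):
    results = {}
    for url in url_list:
        results[url] = fetch_links(url)  # only the links found on this page
    return results
```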

indrajithi (Collaborator, Author) commented:


Hi @C0DE-SLAYER,

Please let us know whether you are working on this.

faisalalisayyed commented:

@indrajithi Yes, I will open a PR today.
