change regex rule #371

Open: wants to merge 1 commit into master

tatsuhirochiba
Contributor

This PR addresses issue #370.

Signed-off-by: Tatsuhiro Chiba [email protected]

@nadgowdas
Contributor

@tatsuhirochiba Do we still need to update the exclude dirs in the code itself?

@tatsuhirochiba
Contributor Author

@nadgowdas Sorry, my mistake. I retested, and the regex rule generated by fnmatch works fine, so we do not need to change it.
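For context on why no change is needed: Python's `fnmatch.translate` converts a shell-style glob into a regular expression, and its `*` also matches `/`, so a glob such as `/proc/*` already covers every path below `/proc`. A minimal sketch, assuming a hypothetical `EXCLUDE_DIRS` list and hypothetical helpers `build_exclude_regex` and `is_excluded`; the crawler's actual rule and default directories are not shown in this thread.

```python
import fnmatch
import re

# Hypothetical exclude list for illustration; the crawler's real defaults may differ.
EXCLUDE_DIRS = ["/boot", "/dev", "/proc", "/sys", "/mnt", "/tmp", "/var/cache"]


def build_exclude_regex(dirs):
    """Compile one regex matching each dir itself and everything beneath it."""
    patterns = []
    for d in dirs:
        # fnmatch.translate returns a self-contained, end-anchored regex string,
        # so the pieces can be safely joined with '|'.
        patterns.append(fnmatch.translate(d))
        patterns.append(fnmatch.translate(d + "/*"))  # '*' also crosses '/'
    return re.compile("|".join(patterns))


EXCLUDE_RE = build_exclude_regex(EXCLUDE_DIRS)


def is_excluded(path):
    """Hypothetical helper: True if path is an excluded dir or lies under one."""
    return EXCLUDE_RE.match(path) is not None
```

Because each translated pattern is anchored at the end, `/tmp` excludes `/tmp` and `/tmp/x` but not an unrelated sibling such as `/tmpfiles`.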

This is not directly related to this PR, but we may need a plugin reload feature (e.g. `plugin_reload=True` in crawler.conf) that works without restarting the crawler daemon, plus a function to load the exclude dirs from a file, since currently we cannot change the rule without restarting the daemon.
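Loading the exclude dirs from a file and re-reading it when it changes could look roughly like this. A minimal sketch; `ExcludeDirsFile`, the one-directory-per-line format, and `#` comments are all assumptions, not existing crawler features. It polls the file's modification time on each call rather than using a filesystem-notification API.

```python
import os


class ExcludeDirsFile:
    """Hypothetical helper: re-read an exclude-dir file whenever its mtime changes.

    Assumed file format: one directory per line; blank lines and lines
    starting with '#' are ignored.
    """

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._dirs = []

    def _load(self):
        with open(self.path) as f:
            stripped = (line.strip() for line in f)
            return [entry for entry in stripped if entry and not entry.startswith("#")]

    def get(self):
        """Return the current list, re-reading the file only if it was modified."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            self._dirs = self._load()
            self._mtime = mtime
        return list(self._dirs)
```

A crawl loop would call `get()` once per cycle, so edits to the file take effect on the next crawl without restarting the daemon. Edits that land within the filesystem's timestamp granularity can be missed, so this is a simplification.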

@nadgowdas
Contributor

@tatsuhirochiba Yes, we need to support CRUD operations on the exclude list.
Sorry, I didn't follow how you proposed implementing that above. Could you explain?
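One possible reading of "CRUD on the exclude list" is an in-memory list that another thread (for example, a hypothetical admin endpoint or file watcher) can modify while the crawl loop takes a fresh copy on each cycle. A minimal sketch; `ExcludeList` and its method names are hypothetical and not part of the crawler.

```python
import threading


class ExcludeList:
    """Hypothetical thread-safe exclude-dir list with create/read/update/delete."""

    def __init__(self, dirs=()):
        self._lock = threading.Lock()
        # dict.fromkeys keeps first-seen order while dropping duplicates.
        self._dirs = list(dict.fromkeys(dirs))

    def add(self, d):
        """Create: append d unless it is already present."""
        with self._lock:
            if d not in self._dirs:
                self._dirs.append(d)

    def snapshot(self):
        """Read: return a copy so callers never observe a half-applied update."""
        with self._lock:
            return list(self._dirs)

    def set_all(self, dirs):
        """Update: replace the whole list in one step."""
        new_dirs = list(dict.fromkeys(dirs))
        with self._lock:
            self._dirs = new_dirs

    def remove(self, d):
        """Delete: drop d if present; a missing entry is ignored."""
        with self._lock:
            if d in self._dirs:
                self._dirs.remove(d)
```

The crawl loop would call `snapshot()` at the start of each crawl, so a change made mid-crawl applies from the next cycle onward.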

@nadgowdas
Contributor

@sahilsuneja1 do you have any thoughts on the above? Do we need to implement that feature in the next release?

One way, I think, is to run `docker inspect` and read the container's mount map, which tells us which directories inside the container are mounted from outside. Could we add those to the exclude list?
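`docker inspect <container>` prints a JSON array whose entries carry a `Mounts` list, and each mount's `Destination` is its path inside the container. A minimal sketch of the proposal, assuming the docker CLI is installed and the caller may talk to the daemon; `mount_destinations`, `docker_inspect` and `merge_exclude_dirs` are hypothetical helpers. Parsing is kept separate from the subprocess call so it can be checked without a running daemon.

```python
import json
import subprocess


def mount_destinations(inspect_output):
    """Return the in-container paths of all mounts in `docker inspect` JSON output."""
    dests = []
    for container in json.loads(inspect_output):
        # "Mounts" may be missing or null for containers without mounts.
        for mount in container.get("Mounts") or []:
            dests.append(mount["Destination"])
    return dests


def docker_inspect(container_id):
    """Run `docker inspect` and return its stdout (raises on a non-zero exit)."""
    result = subprocess.run(["docker", "inspect", container_id],
                            check=True, capture_output=True, text=True)
    return result.stdout


def merge_exclude_dirs(exclude_dirs, extra):
    """Append dirs from extra that are not already excluded, preserving order."""
    merged = list(exclude_dirs)
    for d in extra:
        if d not in merged:
            merged.append(d)
    return merged
```

Usage against a live container would be `merge_exclude_dirs(exclude_dirs, mount_destinations(docker_inspect(container_id)))`, computed per container, since each container can have a different mount map.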

@sahilsuneja1
Contributor

I'm not entirely clear on this, but @tatsuhirochiba has confirmed that this PR is not required, since fnmatch worked for him.

@nadgowdas
Contributor

@sahilsuneja1 @tatsuhirochiba sorry to piggyback on this issue, but the real problem we want to solve is how to extend the exclude list in the crawler; that's what I was getting at above.

@sahilsuneja1
Contributor

Hmm, exclude_dirs could be passed in at runtime from crawler.conf, which would avoid any direct change to the code. Or are you referring to changing exclude_dirs dynamically while the crawler is running, instead of restarting the crawler?
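Supplying exclude_dirs through a config file could look roughly like this, using the standard-library `configparser`. The `[file_crawler]` section, the comma-separated `exclude_dirs` key, the fallback defaults, and `load_exclude_dirs` are all assumptions for illustration; the real crawler.conf layout may differ.

```python
import configparser

# Hypothetical crawler.conf fragment; section and key names are assumptions.
SAMPLE_CONF = """
[file_crawler]
exclude_dirs = /boot, /dev, /proc, /sys, /mnt, /tmp
"""

# Hypothetical defaults used when the config does not set exclude_dirs.
DEFAULT_EXCLUDE_DIRS = ["/proc", "/sys"]


def load_exclude_dirs(conf_text, section="file_crawler"):
    """Parse exclude_dirs from config text, falling back to the defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # fallback also covers a missing section, not just a missing key.
    raw = parser.get(section, "exclude_dirs", fallback=None)
    if raw is None:
        return list(DEFAULT_EXCLUDE_DIRS)
    return [d.strip() for d in raw.split(",") if d.strip()]
```

This only moves the list out of the code; the crawler still reads it once at startup unless something calls `load_exclude_dirs` again, which is the dynamic-update question raised here.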

@sahilsuneja1
Contributor

@tatsuhirochiba we should close this, right?
