Python 3 compatible #21

Open
threatlead opened this issue Jan 4, 2016 · 3 comments

Comments

@threatlead

Suggestions:

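# 1. Import StringIO from its Python 2 location, falling back to Python 3's io module:
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO

# 2. Make pypdf2 the default PDF library:
def __init__(self, patterns_ini=None, ..., library='pypdf2', ...):
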
@armbues
Owner

armbues commented Jan 20, 2016

The default PDF library was switched to pdfminer because of its better parsing performance. In a head-to-head test, it parsed considerably more text from a set of reports than pypdf2 and therefore also generated more IOCs.

One option would be to check the Python version at runtime and change the default PDF library accordingly.
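
A minimal sketch of such a runtime check, assuming pdfminer stays the default on Python 2 and pypdf2 is used on Python 3. The helper name `default_pdf_library` is hypothetical and not part of the project:

```python
import sys


def default_pdf_library(version_info=None):
    """Return the name of a default PDF library for the given Python version.

    Hypothetical helper for illustration; the mapping below is an assumption
    based on this thread, not the project's actual behavior.
    """
    if version_info is None:
        version_info = sys.version_info
    # Major version 2 keeps pdfminer; anything newer falls back to pypdf2.
    if version_info[0] < 3:
        return 'pdfminer'
    return 'pypdf2'
```

The result could then be used as the default value of the `library` argument instead of a hard-coded string.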

@bernardyim

For anyone having issues with pdfminer on Python 3, consider using pdfminer.six, a fork that is compatible with Python 3:
https://github.com/pdfminer/pdfminer.six

Also, as a totally unrelated side note (no idea where to put this), you might want to pass the re.IGNORECASE flag to re.compile at parser.py line 133, so that indicators typed in all caps are also caught:
ind_regex = re.compile(ind_pattern, flags=re.IGNORECASE)
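
To illustrate the difference the flag makes, here is a small sketch. The pattern below is a hypothetical MD5-style regex written for this example, not one taken from the project's patterns file:

```python
import re

# Hypothetical indicator pattern (lowercase hex only), for illustration.
ind_pattern = r'\b([a-f0-9]{32})\b'

case_sensitive = re.compile(ind_pattern)
case_insensitive = re.compile(ind_pattern, flags=re.IGNORECASE)

# A hash typed in all caps, as it might appear in a report.
text = 'MD5: D41D8CD98F00B204E9800998ECF8427E'

matches_sensitive = case_sensitive.findall(text)      # misses the hash
matches_insensitive = case_insensitive.findall(text)  # finds the hash
```

Without the flag, the upper-case hash is skipped entirely; with it, the same pattern matches regardless of case.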

@fhightower

As far as IGNORECASE support is concerned, this is handled with #34.
