Doesn't handle binary logfile content #63
Comments
I was about to say that
isn't true in case of nginx default error logs at least, it warns, while reading the year as IP (expecting it in the first column). That's why it didn't anonymize your IP, it just skips the line.
There is no 'clean/standard' way to do this yet as far as I know. If nginx devs happen to read this: It's best to have the anonymization as a second process and not in nginx itself, so we can have x days of full logs for debugging/security/whatever and then just anonymize later for archiving.
The way I see it, HTTP logs are only useful for rough statistics anyway, so we really don't need the errors. Dropping a few bits still gives us a little country-level accuracy while respecting privacy. If we refuse to terrorize people via Google Analytics or similar, we will only have basic stats anyhow. For now I just delete the logs regularly.
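To illustrate what "dropping a few bits" means, here is a minimal sketch using Python's standard `ipaddress` module. The function name `mask_ip` and the mask widths (12 bits for IPv4, 84 for IPv6) are illustrative assumptions, not taken from this thread or from the project's code.

```python
import ipaddress


def mask_ip(address: str, v4_bits: int = 12, v6_bits: int = 84) -> str:
    """Zero the lowest bits of an IP address (hypothetical helper).

    v4_bits / v6_bits are illustrative defaults: the number of trailing
    bits to drop for IPv4 and IPv6 respectively.
    """
    ip = ipaddress.ip_address(address)
    drop = v4_bits if ip.version == 4 else v6_bits
    prefix = ip.max_prefixlen - drop
    # strict=False lets ip_network() accept an address with host bits set
    # and returns the enclosing network, i.e. the address with those bits zeroed.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(mask_ip("192.168.100.200"))  # 192.168.96.0
    print(mask_ip("2001:db8::1"))      # 2001:db8::
```

The more bits are dropped, the coarser the remaining location information becomes, so the widths are a trade-off between privacy and statistical usefulness.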
Hey @jplitza Thanks for reporting this issue. Could you please provide an example file that leads to this exception?
Well as I said, I encountered this in nginx's error.log. Something like
Note that is just less' rendition of the actual
Here's that line as a single file: anonip_issue_63.log
My minimal workaround for this problem is now using
When processing a logfile that contains binary parts, the following exception gets thrown:
While obviously processing purely binary content isn't the target of this project, this issue arose while anonymizing an nginx error.log which contained the following line:
Note that there's even an IP address in that line that needs to be anonymized!
So maybe the file shouldn't be read as UTF-8, or as string at all for that matter, but as bytes?
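A minimal sketch of what the bytes-based idea could look like, assuming the goal is to mask IPv4 addresses wherever they appear in a line while passing all other bytes through untouched. The names `IPV4_RE`, `_mask` and `anonymize_line` and the fixed `/20` prefix are hypothetical and chosen for illustration. This is not the project's actual implementation.

```python
import ipaddress
import re

# Bytes pattern: four dot-separated groups of 1-3 ASCII digits that are not
# part of a longer run of digits or dots.
IPV4_RE = re.compile(rb"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def _mask(match):
    """Replace one candidate address with its /20 network address."""
    text = match.group(0).decode("ascii")
    try:
        net = ipaddress.ip_network(f"{text}/20", strict=False)
    except ValueError:
        # Not a valid address (e.g. 999.1.1.1): leave the bytes unchanged.
        return match.group(0)
    return str(net.network_address).encode("ascii")


def anonymize_line(line: bytes) -> bytes:
    """Mask IPv4 addresses in a raw log line without decoding it as text."""
    return IPV4_RE.sub(_mask, line)


if __name__ == "__main__":
    import sys

    # Read and write raw bytes so invalid UTF-8 never raises a decode error.
    for raw in sys.stdin.buffer:
        sys.stdout.buffer.write(anonymize_line(raw))
```

Because the line is never decoded as a whole, byte sequences that are invalid UTF-8 cannot trigger a decode error. An alternative under the same assumption would be to keep text mode but open the file with `errors="surrogateescape"`, which round-trips undecodable bytes.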