Define privacy policy #5

Open
matteodelabre opened this issue Sep 23, 2020 · 4 comments
@matteodelabre
Member

The published repository is served by nginx, which logs every access to its files. With the default nginx log settings, the following information is collected:

  • Complete IP address.
  • Date and time of access.
  • HTTP verb, version and response code.
  • User agent (including browser name & version, OS name & version).
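For reference, the fields listed above can be pulled out of a single access-log line with a short parser. This is only a minimal sketch in Python, assuming the server uses nginx's predefined `combined` log format; `parse_line` is a hypothetical helper, and the sample line (address, path and user agent) is made up for illustration.

```python
import re

# nginx's predefined "combined" log format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<remote>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<code>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one access-log line as a dict, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The request line holds the HTTP verb, the path and the HTTP version.
    parts = fields.pop("request").split(" ")
    if len(parts) == 3:
        fields["method"], fields["path"], fields["version"] = parts
    return fields


# Made-up sample line using a documentation-only address (RFC 5737).
sample = ('203.0.113.7 - - [23/Sep/2020:10:15:00 +0000] '
          '"GET /stable/Packages.gz HTTP/1.1" 200 5120 "-" "Opkg"')
entry = parse_line(sample)
```

Every value comes back as a string, so anything that needs numbers or dates has to convert them explicitly.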

We should probably define a retention period for this information, and maybe reduce the scope of what is collected (e.g. anonymise IP addresses after a given time). Once anonymised, the logs could be used to publish stats about which packages are used.
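One possible way to anonymise an address is to keep only its network prefix and zero the rest. The sketch below uses Python's standard `ipaddress` module; the helper name `anonymise_ip` and the prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions chosen for illustration, not something decided in this issue, and whether such truncation is sufficient for a given policy would still need to be checked.

```python
import ipaddress


def anonymise_ip(address, v4_prefix=24, v6_prefix=48):
    """Zero the host bits of an address so it no longer points at one machine.

    The default prefix lengths are assumptions, not project policy.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us build the enclosing network from a host address.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymise_ip("203.0.113.77"))           # prints 203.0.113.0
print(anonymise_ip("2001:db8:abcd:1234::1"))  # prints 2001:db8:abcd::
```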

@Eeems
Member

Eeems commented Sep 23, 2020

Is this a prerequisite for toltec-dev/toltec#15?

@matteodelabre
Member Author

I think it is. I’ll add it to the milestone. Thanks!

@LinusCDE
Member

LinusCDE commented Sep 24, 2020

You could probably use fluentd to forward the logs into a database. From there it would be easier to auto-delete entries, or to never store the IP in the database at all.

Here is a fluentd.conf I made and used: https://gist.github.com/LinusCDE/9ba8b79f115272dcbe2371cacb815288

There is also a docker-compose file you can use to spin up a database along with a simple interface for browsing it.
The elasticsearch part can be removed, though you can also go down the rabbit hole of pairing it with Kibana and get a lot of statistics very easily.

The cool thing about fluentd is that it accepts input from a lot of sources (Docker natively supports it as a logging driver) and spits out nicely formatted JSON per log entry that can be sent basically anywhere.

Here is a sample entry from my MongoDB that fluentd put there (it was an nginx log entry with server_name added, and fluentd added machine_id):

{
    _id: ObjectId('5ec42b813395ae000fde7593'),
    remote: 'xxx.xxx.xxx.xxx',  # IP removed
    host: '-',
    user: '-',
    method: 'POST',
    path: '/api/v4/jobs/request',
    code: '204',
    size: '0',
    referer: '-',
    agent: 'gitlab-runner 12.10.1 (12-10-stable; go1.13.8; linux/amd64)',
    machine_id: 'ozelo',
    time: ISODate('2020-05-19T18:54:33.000Z')
}

Whether nginx skips logging the IP, fluentd strips it, or some client connected to the database removes it periodically is up to you.
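To illustrate the third option (periodic removal), here is a rough Python sketch that works on entries shaped like the sample above, held in memory as plain dicts. The function name `scrub_old_entries` and the 30-day window are assumptions for the example; a real job would read from and write back to whatever database is chosen.

```python
from datetime import datetime, timedelta, timezone


def scrub_old_entries(entries, now, keep_ip_for=timedelta(days=30)):
    """Return copies of the entries, dropping `remote` once an entry is older than keep_ip_for.

    The 30-day default is an assumption, not an agreed retention time.
    """
    cutoff = now - keep_ip_for
    scrubbed = []
    for entry in entries:
        entry = dict(entry)  # copy so the caller's data is left untouched
        if entry["time"] < cutoff:
            entry.pop("remote", None)
        scrubbed.append(entry)
    return scrubbed


# Made-up entries using documentation-only addresses (RFC 5737).
now = datetime(2021, 1, 15, tzinfo=timezone.utc)
entries = [
    {"remote": "203.0.113.7", "path": "/a",
     "time": datetime(2020, 9, 23, tzinfo=timezone.utc)},
    {"remote": "203.0.113.8", "path": "/b",
     "time": datetime(2021, 1, 10, tzinfo=timezone.utc)},
]
result = scrub_old_entries(entries, now)
```

Here the first entry loses its `remote` field because it is older than the window, while the second one keeps it.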

One could probably also use a Grafana server to show statistics on the data in MongoDB (or whatever backend you choose).
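As a rough idea of what such statistics could look like without any dashboard, the Python sketch below counts successful requests per path from entries shaped like the sample above. The helper name `count_downloads` and the package file names are hypothetical; note that `code` is a string, as in the stored entry.

```python
from collections import Counter


def count_downloads(entries):
    """Count successful (2xx) requests per path, ignoring who made them."""
    return Counter(
        entry["path"] for entry in entries if entry["code"].startswith("2")
    )


# Made-up entries; the package paths are hypothetical.
entries = [
    {"path": "/stable/foo_1.0-1.ipk", "code": "200"},
    {"path": "/stable/bar_2.3-1.ipk", "code": "200"},
    {"path": "/stable/foo_1.0-1.ipk", "code": "200"},
    {"path": "/stable/missing.ipk", "code": "404"},
]
downloads = count_downloads(entries)
```

Since only the path and the status code are read, this kind of count still works after the IP has been removed from the entries.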

If you need help with the fluentd, MongoDB or Grafana setup, feel free to ask.

@matteodelabre matteodelabre self-assigned this Jan 15, 2021
@LinusCDE LinusCDE transferred this issue from toltec-dev/toltec Jan 15, 2021
@matteodelabre
Member Author

Here’s the relevant section of the GDPR on whether user consent must be obtained before collecting and processing user information. In particular, consent is not required when the processing is necessary for compliance with a legal obligation or for “legitimate interests”. I would say that keeping a log containing IP addresses and user agents, at least for a set amount of time, is necessary for security purposes. French law actually mandates that such logs be kept for one year (not sure about other countries).
