
NIP-76 Relay Read Permissions #1497

Open · wants to merge 10 commits into `master`
Conversation

@vitorpamplona (Collaborator) commented Sep 13, 2024

This adds two special tags to authorize certain keys to download events.

This is similar to NIP-70, but in the opposite direction (read instead of write).

I need something like this to protect health data, but maybe NIP-29 (@fiatjaf @staab) could also use this to tell the relay which events each user can download.

Read here

@fiatjaf (Member) commented Sep 13, 2024

I like this; it's hard to believe bloom filters are so powerful, though.

@vitorpamplona (Collaborator, Author)

@Giszmo, check this one out.

@AsaiToshiya (Collaborator)

What do you think about broadcasting events? Can only the author publish an event to relays, similar to NIP-70?

@vitorpamplona (Collaborator, Author)

Only if it has NIP-70's `-` tag; otherwise it should be treated like any other event.

@fiatjaf (Member) commented Sep 14, 2024

On the other hand, putting access control information inside the event sounds wrong.

Given that this will require relay cooperation, wouldn't it be better to make access to these events based on a relay policy that is specified through other means, outside the event?

@kehiy (Contributor) commented Sep 14, 2024

I think NIP-09, NIP-70, and NIP-76 are cases that will be hard to see working as intended in practice. They can only work if we run a super-limited relay for heavily trusted people; otherwise, there will be huge indexers that index everything, or people will simply rebroadcast stuff to these indexers, old relays, or bad relays (since they won't be detected).

What do you think?

@vitorpamplona (Collaborator, Author)

> On the other hand, putting access control information inside the event sounds wrong.
>
> Given that this will require relay cooperation, wouldn't it be better to make access to these events based on a relay policy that is specified through other means, outside the event?

It depends on how much variance there is between events. If the use case can use a global policy, then sure. But if each event takes a new set of receivers, then this is a requirement.

On bloom filters: we need to use them more. They're extremely easy to code and important for the privacy of large groups' member lists.

@vitorpamplona (Collaborator, Author)

> I think NIP-09, NIP-70, and NIP-76 are cases that will be hard to see working as intended in practice. They can only work if we run a super-limited relay for heavily trusted people; otherwise, there will be huge indexers that index everything, or people will simply rebroadcast stuff to these indexers, old relays, or bad relays (since they won't be detected).
>
> What do you think?

Not really much anyone can do about this. But the DM relays have been keeping their stuff quite well. Health data as well (there are no public kind 82s around).

@Giszmo (Member) commented Sep 14, 2024

So I'm against seeing broad use of this, as I consider any group chat with more than two members public anyway. But I do see a use case where you instruct the relay to serve content only to your follows. The nice thing about putting the permission into the event is that relays violating this NIP could easily be identified, but then... so what?

On the technical side, leaving the parameters to the author is the most flexible and most secure thing to do.

I'm not sure what the point is of mixing `prp` and `rp`, or of using multiple `prp` tags; I think it would make sense to limit it to either n `rp` tags or one `prp` tag.

`prp` can of course be gamed: if the filter is, say, "all my follows", an attacker might be able to reconstruct it and roll his pubkey accordingly.

Also, if the bits and rounds become standard, there might be a point in brute-forcing a set of pubkeys that always qualify.

As much as I love probabilistic filters for other use cases, I don't see them as a good fit for access control, at least not without further countermeasures against brute forcing.

Edit: A countermeasure against brute forcing would be to add some salt:

["prp", "<bits>:<rounds>:<base64>:<salt>"]

and use it when hashing: `sha256(value || salt || index)`.
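A minimal Python sketch of how such a salted filter could be built and checked, assuming the `sha256(value || salt || index)` scheme above with one digest per round reduced modulo the bit count. The digest-to-bit-position mapping, index encoding, and bit order here are illustrative assumptions, not taken from the NIP text:

```python
import hashlib

def bit_positions(pubkey_hex: str, salt: bytes, bits: int, rounds: int):
    """Yield one bit position per hashing round: sha256(value || salt || index)."""
    value = bytes.fromhex(pubkey_hex)
    for index in range(rounds):
        digest = hashlib.sha256(value + salt + bytes([index])).digest()
        yield int.from_bytes(digest, "big") % bits

def build_filter(pubkeys, salt: bytes, bits: int, rounds: int) -> bytes:
    """Set the bits for every authorized pubkey."""
    filt = bytearray((bits + 7) // 8)
    for pk in pubkeys:
        for pos in bit_positions(pk, salt, bits, rounds):
            filt[pos // 8] |= 1 << (pos % 8)
    return bytes(filt)

def maybe_member(filt: bytes, pubkey_hex: str, salt: bytes, bits: int, rounds: int) -> bool:
    """True means the key *may* be in the set; False means it definitely is not."""
    return all(filt[pos // 8] & (1 << (pos % 8))
               for pos in bit_positions(pubkey_hex, salt, bits, rounds))
```

Note that since the salt travels inside the tag it is public; it does not stop anyone from testing specific pubkeys against a published filter, it only defeats pubkeys precomputed to pass many filters at once.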

@vitorpamplona (Collaborator, Author)

The goal of this PR is not to make things suddenly private (that's for encryption to do) but to hide information that doesn't need to be completely public. It's something to be used together with an encryption or other access control frameworks. The goal is purely to reduce the amount of data that can be queried.

DMs, for instance, are already encrypted, so in theory they could just be out there without this. However, hiding them away makes it even harder to assess the total number of messages and other metadata-level information.

My photos in social media are not private, but I want to reduce the amount of people that can get access to them all.

@Giszmo (Member) commented Sep 14, 2024

Absent the use of salt, I assume default bits and rounds would emerge, and you could then easily create accounts that fit all those filters. Playing with rounds and bits to avoid this is the wrong fix; adding salt might be the right one, and it would not add much complexity.

@vitorpamplona (Collaborator, Author)

Yeah, I really like the salt idea.

> I'm not sure what's the point of mixing prp and rp and using multiple prp

There is no need for it, but we can't block clients from creating an event that includes many of them. So I tried to provide some guidance on what should happen if many are found (OR).

76.md Outdated
Comment on lines 85 to 89
```json
["prp", "100:10:QGKCgBEBAAhIAApO"]
```

It includes keys `ca29c211f1c72d5b6622268ff43d2288ea2b2cb5b9aa196ff9f1704fc914b71b` and `460c25e682fda7832b52d1f22d3d22b3176d972f60dcdc3212ed8c92ef85065c`
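For reference, a small Python sketch of how a client or relay might parse this tag value. The `<bits>:<rounds>:<base64>` layout is taken from the example above; the optional fourth `<salt>` part is the variant proposed in this thread, and treating the salt as base64 is an assumption:

```python
import base64

def parse_prp(value: str):
    """Split "<bits>:<rounds>:<base64>[:<salt>]" into its components."""
    parts = value.split(":")
    bits, rounds = int(parts[0]), int(parts[1])
    filter_bytes = base64.b64decode(parts[2])
    salt = base64.b64decode(parts[3]) if len(parts) > 3 else b""
    return bits, rounds, filter_bytes, salt
```

One wrinkle: `QGKCgBEBAAhIAApO` decodes to 12 bytes (96 bits), slightly fewer than the declared 100, so a parser has to decide how to treat declared sizes that don't match the payload.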
@Giszmo (Member) Sep 15, 2024

Looking at this, and under the assumption of salt getting added, I wonder what's even the point of `rp`, apart from superstition.

Let's say you want to encode just one key and spend the same number of bytes: what are the odds of a successful brute force or random collision?

[
  ["pr", "ca29c211f1c72d5b6622268ff43d2288ea2b2cb5b9aa196ff9f1704fc914b71b"],
  ["prp", "200:121:QGKCgBEBAAhIAApOQGKCgBEBAAhIAApO:saltsaltsaltsaltsaltsa"]
]

This example doesn't really work; it's just to illustrate the idea. The giant salt should totally stop interactive brute forcing.

Edit: According to this calculator, an attacker would have to probe the relay with 319784383802483100000000000000000000000000 pubkeys to maybe find a match.

Edit 2: Well, you can include `pr` for computational simplicity (`pr` entries can be part of the SQL query, while `prp` entries require an expensive check), but for security and space savings, `prp` alone is good.

@vitorpamplona (Collaborator, Author)

Yeah, I added `rp` for use cases where certainty is absolutely needed, but I am not sure if it will be used.

@dadofsambonzuki
This could also be used to show check-in information (i.e., to a Place) only to certain people.

@jooray commented Sep 15, 2024

Just a note about the calculation at the end (I think it is wrong, but let me do the math in the morning).

> The filter below has 100 bits and uses 10 rounds of hashing, which should be capable of handling up to 10,000,000 keys without producing any false positives.

A thing to note about this: the false-positive rate goes up quite fast as you insert keys into the filter. And I think the probability of at least one false positive across 10,000,000 queries of a bloom filter with two members, 100 bits, and 10 rounds is higher than implied.

Do you by any chance have a note on how you arrived at the conclusion that it is low? (My probability is 33%, but again, let me check in the morning.)

Comment on lines +84 to +90
The filter below has 100 bits and uses 10 rounds of hashing, which should be capable of handling up to 10,000,000 keys without producing any false positives.

```json
["prp", "100:10:AAAkAQANcYQFCQoB:hZkZYqqdxcE="]
```

It includes keys `ca29c211f1c72d5b6622268ff43d2288ea2b2cb5b9aa196ff9f1704fc914b71b` and `460c25e682fda7832b52d1f22d3d22b3176d972f60dcdc3212ed8c92ef85065c`
@Giszmo (Member) Sep 15, 2024

As @jooray mentions, the math must be off here. The wording is wrong, too.

A filter by itself never "has false positives"; it has a false-positive rate (FPR).

So what the NIP author meant was: 10 million npubs not colliding randomly with a bloom filter that has 2 elements, 100 bits, and 10 rounds.

With the chosen parameters, the FPR is on the order of 1e-8 (1 in 26 million), so yes, with 10 million accounts it's more likely than not that none of them collides with any specific such filter, but there is no guarantee of it being collision-free.

Suggested change

Before:

> The filter below has 100 bits and uses 10 rounds of hashing, which should be capable of handling up to 10,000,000 keys without producing any false positives.

After:

> The filter below has 100 bits and uses 10 rounds of hashing, which would achieve a false positive rate of 1 in 26 million.

(The `prp` example stays the same, and "It includes keys" becomes "It includes only the keys".)

That said, 10 rounds is not a good choice of parameters for 2 elements. I'm not an expert on this and have only played around with https://hur.st/bloomfilter/, but that tool suggests that with the same filter size but 35 rounds you would get an FPR of 1 in 27 billion.

Edit: As base64 encodes 6 bits per character, 100 bits is a bad choice, too. You can get 2 more bits for free. This might not sound like much, but it brings the FPR down to 1 in 44 billion.
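These figures can be checked against the standard Bloom-filter approximation FPR ≈ (1 − e^(−kn/m))^k, where m is the bit count, k the number of rounds, and n the number of inserted keys. A quick sketch (the printed ratios are approximations, not exact values):

```python
import math

def bloom_fpr(m_bits: int, k_rounds: int, n_keys: int) -> float:
    """Standard approximation of a Bloom filter's false-positive rate."""
    return (1 - math.exp(-k_rounds * n_keys / m_bits)) ** k_rounds

# ~1 in 26 million, ~1 in 27 billion, ~1 in 44 billion respectively
for m, k in [(100, 10), (100, 35), (102, 35)]:
    print(f"m={m}, k={k}: 1 in {1 / bloom_fpr(m, k, 2):,.0f}")
```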

@vitorpamplona (Collaborator, Author) commented Sep 15, 2024

> Do you by any chance have a note how you arrived at the conclusion that it is low?

I just generated 10,000,000 keys and tried all of them against the filter. I ran this test 10 times without getting a single incorrect result. So that's where it comes from. :)

But yes, the false-positive rate grows as you add keys. Because of the way we spec'ed it, though, as the number of keys grows you can also grow the size or rounds of the filter when creating the event, meaning the writer can adjust the filter to match the probability it wants out of the REQ calls.

It would be nice to have a simpler equation designed for this use case, though. Or something that keeps the probability stable but automatically readjusts the variables to match it.
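One standard way to keep the probability stable, sketched here as an assumption rather than anything in the PR: pick a target FPR p, then use the textbook sizing formulas m = −n·ln(p) / (ln 2)² and k = (m/n)·ln 2:

```python
import math

def size_for_target(n_keys: int, target_fpr: float):
    """Choose filter bits (m) and hashing rounds (k) for a target false-positive rate."""
    m = math.ceil(-n_keys * math.log(target_fpr) / (math.log(2) ** 2))
    k = max(1, round(m / n_keys * math.log(2)))
    return m, k

# e.g. two receivers at a 1-in-a-billion target FPR:
bits, rounds = size_for_target(2, 1e-9)
```

A writer could recompute `bits` and `rounds` on every event, so the probability seen by REQ callers stays constant as the receiver list grows.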

@Giszmo (Member) commented Sep 15, 2024

I'm not eager to actually try it out, but ChatGPT shares our concerns about the expected zero collisions in the test: https://chatgpt.com/share/66e76256-66a8-8002-bd17-d4a43c13f373

I'm not sure about the concatenation issue but you might want to look into that, too.
