[BUG] Garbage Collector removing relation still valid #1471

Open
gign0766 opened this issue Aug 19, 2024 · 17 comments
@gign0766

Describe the bug
When the same relation is present twice, one copy valid and the other expired, the garbage collector invalidates the relation even though a valid copy still exists.

To Reproduce
Steps to reproduce the behavior:

Set up Permify with a PostgreSQL database and configure the garbage collector as follows:

database:
  engine: postgres
  uri: 
  garbage_collection:
     enabled: true
     interval: 10m  # how often the collector runs
     window: 5m     # age threshold: data older than this is eligible for collection
     timeout: 5m    # maximum duration of a single collection run

Add a relation, then add the same relation (for the same tenant) a bit later; see the sketch below the reproduction steps.

For example:
[screenshot]

The permission check works until the garbage collector runs; after that, the permission is denied.
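
A minimal sketch of this reproduction with the Go SDK could look as follows. The endpoint, client setup, import paths, the Data.Write call, and the timings are assumptions for illustration based on permify-go's public API, not taken from the report:

// Reproduction sketch (assumptions: permify-go import paths, gRPC port 3478,
// the Data.Write call, and the spacing between writes).
package main

import (
	"context"
	"log"
	"time"

	permifyPayload "github.com/Permify/permify-go/generated/base/v1" // path assumed
	permify "github.com/Permify/permify-go/v1"                       // path assumed
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	client, err := permify.NewClient(
		permify.Config{Endpoint: "localhost:3478"}, // assumed gRPC endpoint
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatal(err)
	}

	// The tuple mirrors the check shown later in the thread.
	tuple := &permifyPayload.Tuple{
		Entity:   &permifyPayload.Entity{Type: "organisation", Id: "1"},
		Relation: "member",
		Subject:  &permifyPayload.Subject{Type: "user", Id: "b56661f8-7be6-4342-a4c0-918ee04e5983"},
	}

	// Write the same tuple twice, spaced apart, so that the first copy ages
	// past the garbage-collection window while the second is still valid.
	for i := 0; i < 2; i++ {
		if _, err := client.Data.Write(context.Background(), &permifyPayload.DataWriteRequest{
			TenantId: "4849b21a-df59-45b2-ab1e-e05bb1db8f28",
			Metadata: &permifyPayload.DataWriteRequestMetadata{},
			Tuples:   []*permifyPayload.Tuple{tuple},
		}); err != nil {
			log.Fatal(err)
		}
		time.Sleep(time.Minute) // illustrative spacing between the two writes
	}
}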

Expected behavior
The expired relation should be removed from the database, but the permission check should still be allowed since there is a valid relation.

Additional context
The Permify instance runs via the Helm chart, as does the Postgres instance.

Environment:

  • Permify: Helm chart permify v0.3.5
  • Postgres: Bitnami postgresql chart 12.5
@gign0766 added the bug label Aug 19, 2024
@tolgaOzen assigned ucatbas and unassigned EgeAytin and tolgaOzen Aug 21, 2024
@ucatbas (Contributor) commented Aug 22, 2024

Hi @gign0766, I tried to reproduce the issue but couldn't replicate it. Everything seems to be working as expected on my end. I'll update the database tests shortly to include this scenario. Also, can you provide the specific permission check requests you used to ensure I don't miss any details?

@gign0766 (Author)

Hi,
I couldn't reproduce the problem outside of the Helm version. When I tried to recreate it in a Docker environment, everything worked as expected.

The permission check was:
POST http://localhost:13476/v1/tenants/4849b21a-df59-45b2-ab1e-e05bb1db8f28/permissions/check

{
  "metadata": {
    "schema_version": "",
    "snap_token": "",
    "depth": 20
  },
  "entity": {
    "type": "organisation",
    "id": "1"
  },
  "permission": "is_member",
  "subject": {
    "type": "user",
    "id": "b56661f8-7be6-4342-a4c0-918ee04e5983"
  }
}

The test was made through the Go SDK:

// client, ctx, tenantId and tuple come from the surrounding application code.
cr, err := client.Permission.Check(ctx, &permifyPayload.PermissionCheckRequest{
	TenantId: tenantId,
	Metadata: &permifyPayload.PermissionCheckRequestMetadata{
		SnapToken:     "",
		SchemaVersion: "",
		Depth:         20,
	},
	Entity:     tuple.Entity,
	Permission: tuple.Relation,
	Subject:    tuple.Subject,
})
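
The result was then interpreted along these lines (a sketch continuing the snippet above; the CheckResult constant name is an assumption based on the generated base/v1 package):

// Sketch: interpreting the check response (enum name assumed from base/v1).
if err != nil {
	log.Fatalf("permission check failed: %v", err)
}
if cr.GetCan() == permifyPayload.CheckResult_CHECK_RESULT_ALLOWED {
	log.Println("allowed") // expected while a valid relation exists
} else {
	log.Println("denied") // observed after the garbage collector runs
}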

and with the following schema:

entity user {}

entity organisation {
    relation admin @user
    relation member @user
    relation billing @user
    relation guest @user

    permission is_member = admin or member or billing or guest
}

// Billing
entity billing {
    relation admin @organisation#admin
    relation billing @organisation#billing

	permission is_member = admin or billing

    permission full_access = is_member
    permission read_only = is_member
}

// Project
entity project {
    relation admin @organisation#admin
    relation member @organisation#member

    permission is_member = admin or member


    permission full_access = admin
    permission read_only = is_member
}

entity instance {
    relation project @project

    permission full_access = project.full_access
    permission read_only = project.read_only
}

@gign0766 (Author)

To add some information about the environment:
It's a Kubernetes cluster with 3 worker nodes.
The Permify Helm release uses the default replica count (3), so each worker runs one Permify instance.

@gign0766 (Author)

Hello,

I just noticed there is a distributed configuration for the cache system.
Could this be the cause of the problem?

I'll try to configure it on the cluster and keep you informed of the result.

@ucatbas (Contributor) commented Aug 27, 2024

Hi, thank you @gign0766. I couldn't test with your setup yet. This week I will be updating the Helm charts, the related migration scripts, and the k8s-related issues. I will test both cases!

@tolgaOzen (Member)

Hi @gign0766, the issue might be related to the version of Postgres you're using. Permify supports Postgres 13.8 and above.

cc: @ucatbas

@gign0766 (Author)

Hi @tolgaOzen
My bad, I forgot to mention that 12.5 was the Bitnami Helm chart version.
The Postgres version is 15.3.
Image: docker.io/bitnami/postgresql:15.3.0-debian-11-r17

So far, since I've activated the distributed system, the problem hasn't shown up again.

@gign0766 (Author)

Hi,

I'm adding some information to the ticket.

I'm working in a test environment; Permify is unchanged, with the same Helm chart versions. Last night permission checks were working, and today the schema no longer seems to be available.

In my application I use permify-go over gRPC to make the requests, and I also reproduced the error with a gRPC request from Insomnia.

Extract from Permify db - table relation_tuples
[screenshot]

Extract from Permify db - table schema_definitions
image

Organization schema entity

entity organization {
	relation admin @user 
	relation member @user 
	relation billing @user 
	relation guest @user 


	permission OrganizationMember = (((admin or member) or billing) or guest)
	permission OrganizationIAMFullAccess = admin
	permission OrganizationIAMReadOnly = ((admin or member) or billing)
	permission OrganizationSettingsFullAccess = admin
	permission OrganizationSettingsReadOnly = ((admin or member) or billing)
	permission OrganizationBillingFullAccess = (admin or billing)
	permission OrganizationBillingReadOnly = (admin or billing)
	permission OrganizationProjectManagerFullAccess = admin
	permission OrganizationProjectManagerReadOnly = ((admin or member) or guest)
} 

Extract from Permify db - table tenants
[screenshot]

Screenshot of Insomnia gRPC call on Permission Check route
[screenshot]

@tolgaOzen (Member)

Hi @gign0766, could you share your Permify config file while keeping sensitive information hidden?

@gign0766 (Author)

Hi @tolgaOzen

Of course, here is the config:

# The logger section sets the logging level for the service.
logger:
  level: debug

#  The database section specifies the database engine and connection settings,
# including the URI for the database, whether or not to auto-migrate the database,
# and connection pool settings.
database:
  engine: postgres
  uri: postgres_url
  garbage_collection:
    enabled: true
    interval: 10m
    window: 5m
    timeout: 5m

distributed:
  # Indicates whether the distributed mode is enabled or not
  enabled: true

  # The address of the distributed service.
  # Using a Kubernetes DNS name suggests this service runs in a Kubernetes cluster
  # under the 'default' namespace and is named 'permify'
  address: service_address

  # The port on which the service is exposed
  port: "5000"

@tolgaOzen (Member)

Can you share any logs? The configuration looks correct. Also, can I get some information about the Postgres you're using? Are you deploying it within Helm?

@tolgaOzen (Member)

Could the issue be related to Bitnami Postgres? Are you creating a volume for persistence?

@gign0766 (Author)

We don't have any logs from Permify; logging doesn't work correctly:
[screenshot]

For Postgres we use the Bitnami postgresql chart version 12.5.9 (corresponding to Postgres 15.3).
The database is deployed with a volume to persist the data.

It's probably not related to Postgres, because when we disable garbage collection (garbage_collection.enabled: false), the problem no longer appears.

@tolgaOzen (Member)

I'm trying to understand the issue, but I don't think it's related to the garbage collector because the garbage collector removes old data but doesn't delete the schema. The schema not found error you're receiving typically occurs when the schema doesn't exist. I'm looking into it in more detail to better understand the problem. I'll update you. In the meantime, could you also try with Permify v1.1.1?

@gign0766 (Author)

@tolgaOzen This is also what I understood from reading the Permify code. But the bug only seems to appear when garbage collection is enabled.

I will update the Permify chart to the latest version.

@gign0766 (Author) commented Sep 24, 2024

There was an error in the latest version of the Helm chart; I have submitted a PR to fix the problem on the Permify Helm chart repo.

@gign0766 (Author)

I've updated to version 0.3.7 of the Helm chart. Now the logs work properly.

I'll update you if I notice the bug again!
