Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Optimize SQL Registry proto() method #4425

Closed

Conversation

EXPEbdodla
Copy link
Contributor

@EXPEbdodla EXPEbdodla commented Aug 20, 2024

What this PR does / why we need it:

Problem Summary: We are using SQL Registry with 150+ projects and ~600 feature views at the moment. They are going to grow in future.
1 . Starting up a Remote registry service is taking longer than expected due to proto() call during Registry initialization. This has to refresh all 150 projects metadata on start up.
2 . Refreshing the cache on a regular interval with cache_mode = thread taking longer even though most of the projects object (like entities, FVs, ODFVs) doesn't change so often.

Fixes:

  1. Running the proto() method using ThreadPoolExecutor to run in parallel with 5 default workers. This doesn't fully solve the problem. We may need to look into the lazy caching options which reduces the start up times and load only the projects for which it's serving the calls rather than loading all projects into the cache. Along with that we may need to control the size of cache_registry_proto object with some eviction policy for the cache.
  2. Leveraging the previously cached RegistryProto if the last_updated_timestamp of the project hasn't changed in the feast_metadata table. Reduces the number of database calls we make.
  3. Adding last_updated_timestamp column to ProjectMetadata table
  4. Added Indexes to the tables
  5. _maybe_init_project_metadata is called multiple times (Ex: In get, list and apply). Proto() method depends on list calls (~10 calls per project. With 150 projects, it just makes the function call 1500 times which is not needed)
  6. _get_all_projects() method is using individual tables to get the projects list instead of using feast_metadata table
  7. Added get_all_projects(), get_project_metadata() and delete_project() methods to SQL Registry.
  8. Updated test case for ODFV backward compatibility issue
  9. Added tests cases to ProjectMetadata object
  10. Added integration tests for the changes
  11. Switched sqlite fixture scope to "function". sqlite is not thread safe so added thread_pool_executor_worker_count configuration. If its 0, doesn't use ThreadPoolExecutor.

Questions:

  1. It make sense to refresh the proto() with all projects for remote registry service if SQL registry is the proxy. If users pass the project_name as part of Repo Config, proto() can only refresh the project passed. I'm not sure how to make the distinction when registry is a Proxy registry for Registry Service vs Regular Registry. Appreciate any thoughts on this.

Which issue(s) this PR fixes:

Misc

Bhargav Dodla added 2 commits August 19, 2024 18:34
Signed-off-by: Bhargav Dodla <[email protected]>
@franciscojavierarceo
Copy link
Member

This is great! Would you mind adding maybe a test or two? Since we're changing the function names and quite a bit of the code it'd be helpful.

Separately, it would be super cool to do a blood post to hear about your experience scaling Feast, if you'd be open to it!

@EXPEbdodla
Copy link
Contributor Author

This is great! Would you mind adding maybe a test or two? Since we're changing the function names and quite a bit of the code it'd be helpful.

Separately, it would be super cool to do a blood post to hear about your experience scaling Feast, if you'd be open to it!

@franciscojavierarceo Sorry. I would have marked this as a Draft PR. Added tests, integration tests for the changes.

I assume you are asking for blog post. It's in our plan to publish a blog. We will share the details when we have a Draft ready.

Copy link
Member

@franciscojavierarceo franciscojavierarceo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this! /lgtm

@franciscojavierarceo franciscojavierarceo enabled auto-merge (squash) August 20, 2024 23:47
@tokoko
Copy link
Collaborator

tokoko commented Aug 22, 2024

@EXPEbdodla I think this definitely needs to be broken up into several PRs.

  • Let's handle the new project methods in the registry first. I'm not really comfortable with adding new methods in a single implementation only. What's blocking us from changing BaseRegistry right from the start?
  • Now there's also authorization to think of as well. I'm particularly not sure about how delete_project can fit in there, right now we define permissions on FeastObject-level. Trying to fit an endpoint like delete_project might be a headache.

@tokoko tokoko disabled auto-merge August 22, 2024 11:54
@franciscojavierarceo
Copy link
Member

  • Now there's also authorization to think of as well. I'm particularly not sure about how delete_project can fit in there, right now we define permissions on FeastObject-level. Trying to fit an endpoint like delete_project might be a headache.

An endpoint like delete_project is likely quite valuable but should be configurable by the users on who has the right access to execute it. I don't think we have to be prescriptive but I would think this is configurable by our new RBAC functionality, right?

@EXPEbdodla
Copy link
Contributor Author

EXPEbdodla commented Aug 22, 2024

Let's handle the new project methods in the registry first. I'm not really comfortable with adding new methods in a single implementation only. What's blocking us from changing BaseRegistry right from the start?

@tokoko I haven't used the other registry's so I'm cautious in touching them at the moment. Do we need to do that before merging this PR?

@EXPEbdodla
Copy link
Contributor Author

@tokoko @franciscojavierarceo Created PR #4441 to add methods to base registry. I'll update this PR to cover the tests cases fully along with the changes for this.

@EXPEbdodla
Copy link
Contributor Author

Closing this. This is split into multiple PRs already. Thanks everyone for inputs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants