Open Cluster Management (OCM) streamlines multi-cluster workload management through APIs that align with SIG-Multicluster standards. Beyond traditional workload orchestration, OCM enables scalable AI training and inference across distributed environments.
As machine learning (ML) expands across clusters, data privacy becomes a critical concern. ML models rely on vast datasets, making it essential to safeguard sensitive information across clusters without compromising model performance.
This project integrates Federated Learning (FL) into OCM, enabling privacy-preserving, collaborative model training without transferring raw data between clusters. Instead, training occurs locally where the data resides, ensuring compliance, enhancing efficiency, and reducing bandwidth and storage costs.
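To make the "train locally, share only the model" idea concrete, here is a minimal sketch of one federated round. It assumes a FedAvg-style weighted average and a toy linear model; the proposal does not specify an aggregation strategy, so `local_update`, `federated_average`, and `run_round` are hypothetical names chosen for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Weights = List[float]              # [w, b] for the toy model y = w * x + b
Sample = Tuple[float, float]       # (x, y)


def local_update(weights: Weights, data: Sequence[Sample], lr: float = 0.01) -> Weights:
    """One gradient-descent step on a cluster's private data (mean squared error)."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return [w - lr * grad_w, b - lr * grad_b]


def federated_average(updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its local sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def run_round(global_weights: Weights, cluster_data: Dict[str, Sequence[Sample]]) -> Weights:
    """Simulate one round: each cluster trains locally, only weights are aggregated."""
    updates = [
        (local_update(global_weights, data), len(data))
        for data in cluster_data.values()
    ]
    return federated_average(updates)


if __name__ == "__main__":
    # Assumed toy datasets; in a real deployment these never leave their clusters.
    datasets = {
        "cluster1": [(1.0, 2.0), (2.0, 4.0)],
        "cluster2": [(3.0, 6.0)],
    }
    weights = [0.0, 0.0]
    for _ in range(5):
        weights = run_round(weights, datasets)
    print(weights)
```

Only the weight vectors and sample counts cross cluster boundaries in this sketch; the `(x, y)` samples stay inside `local_update`.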
By leveraging OCM’s Placement, ManifestWork, and other APIs, we standardize FL workflows and integrate frameworks like Flower and OpenFL through a unified interface. This approach harnesses OCM’s capabilities to deliver scalable, cost-efficient, and privacy-preserving AI solutions in multi-cluster environments.
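As a rough illustration of how ManifestWork could carry FL workloads, the sketch below builds one ManifestWork per selected cluster, each wrapping a client Deployment. It is not the prototype's actual controller logic. The list of cluster names stands in for the result of a Placement decision, and the image name, `SERVER_ADDRESS` variable, and function names are assumptions made for this example.

```python
from typing import Any, Dict, List


def fl_client_deployment(image: str, server_address: str) -> Dict[str, Any]:
    """Plain Kubernetes Deployment for a hypothetical FL client container."""
    labels = {"app": "fl-client"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "fl-client", "namespace": "default"},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "client",
                            "image": image,
                            # Assumed env var; real frameworks define their own settings.
                            "env": [{"name": "SERVER_ADDRESS", "value": server_address}],
                        }
                    ]
                },
            },
        },
    }


def manifestwork_for(cluster: str, manifests: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap manifests in a ManifestWork placed in the managed cluster's hub namespace."""
    return {
        "apiVersion": "work.open-cluster-management.io/v1",
        "kind": "ManifestWork",
        "metadata": {"name": "fl-client", "namespace": cluster},
        "spec": {"workload": {"manifests": manifests}},
    }


def plan_round(selected_clusters: List[str], image: str, server_address: str) -> List[Dict[str, Any]]:
    """Build one ManifestWork per cluster name taken from a Placement decision."""
    deployment = fl_client_deployment(image, server_address)
    return [manifestwork_for(name, [deployment]) for name in selected_clusters]


if __name__ == "__main__":
    works = plan_round(["cluster1", "cluster2"], "example.com/fl-client:latest", "fl-server:8080")
    for work in works:
        print(work["metadata"]["namespace"], work["kind"])
```

The resulting dictionaries could be serialized to YAML or JSON and applied on the hub; how the prototype actually creates them is not described in this proposal.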
Expected Outcome
Comprehensive Documentation:
Define the scenarios addressed by the prototype, highlighting its purpose and value.
Provide an intuitive and architectural comparison between Federated Learning (FL) and OCM, mapping FL terminology to OCM APIs to showcase OCM’s native support for FL.
Illustrate the complete Federated Learning workflow within Open Cluster Management.
Extended Prototype (or CRD) Support:
Enable persistence of the aggregated model in AWS S3 (currently only a native PVC is supported).
Add support for additional Federated Learning frameworks such as OpenFL (currently only Flower is supported). This requires understanding how OpenFL works, containerizing it, and integrating it into the prototype.
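One possible shape for the storage extension is a small interface with interchangeable backends, sketched below. This is an assumption about design, not a description of the existing prototype. `ModelStore`, `DirectoryStore`, and `S3Store` are hypothetical names, and the S3 backend expects an injected client exposing `put_object(Bucket=..., Key=..., Body=...)`, as a boto3 S3 client does.

```python
import os
import tempfile
from typing import Any, Protocol


class ModelStore(Protocol):
    """Anything that can persist the aggregated model for a given round."""

    def save(self, round_id: int, payload: bytes) -> str: ...


class DirectoryStore:
    """Writes models under a directory, for example a path where a PVC is mounted."""

    def __init__(self, root: str) -> None:
        self.root = root

    def save(self, round_id: int, payload: bytes) -> str:
        os.makedirs(self.root, exist_ok=True)
        path = os.path.join(self.root, f"round-{round_id}.bin")
        with open(path, "wb") as handle:
            handle.write(payload)
        return path


class S3Store:
    """Uploads models through an injected S3-style client (assumed boto3-compatible)."""

    def __init__(self, client: Any, bucket: str, prefix: str = "models") -> None:
        self.client = client
        self.bucket = bucket
        self.prefix = prefix

    def save(self, round_id: int, payload: bytes) -> str:
        key = f"{self.prefix}/round-{round_id}.bin"
        self.client.put_object(Bucket=self.bucket, Key=key, Body=payload)
        return f"s3://{self.bucket}/{key}"


def persist(store: ModelStore, round_id: int, payload: bytes) -> str:
    """The aggregator depends only on the interface, not on a concrete backend."""
    return store.save(round_id, payload)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(persist(DirectoryStore(tmp), 1, b"serialized-model"))
```

Injecting the client keeps credentials and region configuration outside the store, and lets the S3 path be exercised in tests with a stub object.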
Recommended Skills
Golang, Kubernetes, Federated Learning, Open Cluster Management, Scheduling
Mentor(s)
Meng Yan (@yanmxa, [email protected]) - primary
Qing Hao (@haoqing0110, [email protected])
References
Open Cluster Management
Federated Framework - Flower
Federated Framework - OpenFL
Placement concept
ManifestWork concept
Federated Learning Controller for Open Cluster Management
Implementing a controller
Generating CRDs
Discussion
Feel free to raise your questions here. You can also reach out to us in the Slack channel.