Synthesized Scientific Data Kit (SDK) is a comprehensive framework for generative modelling of structured data (tabular, time-series and event-based data). The SDK helps you create compliant, statistics-preserving data snapshots for BI/Analytics and ML/AI applications. Right-size your data with AI-supported data transformations.
This delivery bundles the SDK with a Jupyter notebook, which provides an easy platform to start working with synthetic data.
Available on the GCP Cloud Marketplace: https://console.cloud.google.com/marketplace/product/synthesized-marketplace-public/synthesized-sdk-notebook-byol
To install Synthesized SDK Jupyter Notebook to a Google Kubernetes Engine cluster via Google Cloud Marketplace, follow the on-screen instructions.
You need the following tools in your development environment: gcloud, kubectl, docker, git, and helm.
Configure gcloud as a Docker credential helper:
gcloud auth configure-docker
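Before going further, it can help to confirm that the command-line tools are on your PATH. The following is a minimal sketch; the helper name missing_tools is hypothetical, and the tool list is an assumption inferred from the commands used in this guide.

```shell
# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "${tool}" >/dev/null 2>&1; then
      printf '%s\n' "${tool}"
    fi
  done
}

# Tool list inferred from the commands in this guide (an assumption).
missing="$(missing_tools gcloud kubectl docker git helm)"
if [ -n "${missing}" ]; then
  echo "Missing tools:" >&2
  echo "${missing}" >&2
else
  echo "All tools found."
fi
```

Any name it prints needs to be installed before the remaining steps will work.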
Create a new cluster from the command line:
export CLUSTER=sdk-cluster
export ZONE=us-west1-a
gcloud container clusters create "${CLUSTER}" --zone "${ZONE}"
Configure kubectl to connect to the new cluster:
gcloud container clusters get-credentials "${CLUSTER}" --zone "${ZONE}"
Clone this repo, as well as its associated tools repo:
git clone --recursive https://github.com/synthesized-io/sdk-on-gcp.git
An Application resource is a collection of individual Kubernetes components, such as Services, Deployments, and so on, that you can manage as a group.
To set up your cluster to understand Application resources, run the following command:
kubectl apply -f "https://raw.githubusercontent.com/GoogleCloudPlatform/marketplace-k8s-app-tools/master/crd/app-crd.yaml"
You need to run this command once.
The Application resource is defined by the Kubernetes SIG-apps community. You can find the source code at github.com/kubernetes-sigs/application.
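To check whether the cluster already understands Application resources, one option is to look up the CRD by name. This is a sketch: the function name app_crd_installed is hypothetical, and it assumes the CRD is registered as applications.app.k8s.io, the name used by the kubernetes-sigs/application project.

```shell
# Succeed if the Application CRD is registered in the current cluster.
# Assumes the CRD name applications.app.k8s.io.
app_crd_installed() {
  kubectl get crd applications.app.k8s.io >/dev/null 2>&1
}
```

For example, `app_crd_installed || echo "Application CRD not found"` reports when the kubectl apply command above still needs to be run.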
Choose an instance name and namespace for the app. In most cases, you can use the default namespace.
export APP_INSTANCE_NAME=sdk-jupyter-server
export NAMESPACE=default
Set up the image tag. Example:
export TAG="2.7.6"
Configure the container images:
export IMAGE_REGISTRY="gcr.io/synthesized-marketplace-public/sdk-jupyter-server"
Configure the Synthesized licence key:
export SYNTHESIZED_KEY=[YOUR KEY]
(Optional) Set computation resources limit:
export RESOURCES_LIMITS_CPU=1
export RESOURCES_LIMITS_MEMORY=1Gi
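The variables exported above are consumed by the later helm and kubectl commands, and an empty value tends to fail in confusing ways. As a sketch, a small guard (the name require_vars is hypothetical) can stop early when one is missing:

```shell
# Return non-zero and name the first variable that is unset or empty.
require_vars() {
  for name in "$@"; do
    # Indirect lookup of the variable called ${name} (POSIX-compatible).
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "Required variable ${name} is not set" >&2
      return 1
    fi
  done
  return 0
}
```

For example: `require_vars APP_INSTANCE_NAME NAMESPACE TAG IMAGE_REGISTRY SYNTHESIZED_KEY || echo "fix the variables above first"`.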
If you use a different namespace than the default, create a new namespace by running the following command:
kubectl create namespace "${NAMESPACE}"
To create the Service Account and ClusterRoleBinding:
export SDK_SERVICE_ACCOUNT="${APP_INSTANCE_NAME}-serviceaccount"
kubectl create serviceaccount "${SDK_SERVICE_ACCOUNT}" --namespace "${NAMESPACE}"
kubectl create clusterrole "${SDK_SERVICE_ACCOUNT}-role" --verb=get,list,watch --resource=services,nodes,pods,namespaces
kubectl create clusterrolebinding "${SDK_SERVICE_ACCOUNT}-rule" --clusterrole="${SDK_SERVICE_ACCOUNT}-role" --serviceaccount="${NAMESPACE}:${SDK_SERVICE_ACCOUNT}"
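To confirm that the binding took effect, the service account's permissions can be queried with kubectl auth can-i while impersonating it. This is a sketch: the function names are hypothetical, and it assumes your own account is allowed to impersonate service accounts (cluster administrators usually are).

```shell
# Build the user name Kubernetes assigns to a service account.
sa_subject() {
  printf 'system:serviceaccount:%s:%s' "$1" "$2"
}

# Ask the API server whether the service account may list pods in all
# namespaces. kubectl prints "yes" or "no".
check_sa_can_list_pods() {
  kubectl auth can-i list pods --all-namespaces --as "$(sa_subject "$1" "$2")"
}
```

For example: `check_sa_can_list_pods "${NAMESPACE}" "${SDK_SERVICE_ACCOUNT}"`.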
Use helm template to expand the template. We recommend that you save the expanded manifest file for future updates to your app.
helm template chart/sdk-jupyter-server \
--name-template "${APP_INSTANCE_NAME}" \
--namespace "${NAMESPACE}" \
--set envRenderSecret.SYNTHESIZED_KEY="${SYNTHESIZED_KEY}" \
--set image.repository="${IMAGE_REGISTRY}" \
--set image.tag="${TAG}" \
--set resources.limits.cpu="${RESOURCES_LIMITS_CPU}" \
--set resources.limits.memory="${RESOURCES_LIMITS_MEMORY}" \
> "${APP_INSTANCE_NAME}_manifest.yaml"
To apply the manifest to your Kubernetes cluster, use kubectl:
kubectl apply -f "${APP_INSTANCE_NAME}_manifest.yaml" --namespace "${NAMESPACE}"
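To review what the expanded manifest contains (for this apply or a future update), a quick summary of its resource kinds can be useful. The sketch below assumes each resource declares kind: at the start of a line, which is how helm template normally renders manifests; the function name manifest_kinds is hypothetical.

```shell
# Print the distinct top-level "kind:" values found in a manifest file.
manifest_kinds() {
  awk '/^kind:/ {print $2}' "$1" | sort -u
}
```

For example: `manifest_kinds "${APP_INSTANCE_NAME}_manifest.yaml"`. Because the licence key is passed to the chart, the saved manifest may contain it, so it is safer to treat the file as sensitive.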
To get the Cloud Console URL for your app, run the following command:
echo "https://console.cloud.google.com/kubernetes/application/${ZONE}/${CLUSTER}/${NAMESPACE}/${APP_INSTANCE_NAME}"
To view the app, open the URL in your browser.
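The app can take a short while to start. One way to block until it is ready is kubectl wait, sketched below. It assumes the chart creates at least one Deployment, and it waits on every Deployment in the namespace, which may include unrelated workloads if you use a shared namespace. The function name wait_for_app is hypothetical.

```shell
# Wait until all Deployments in the namespace report the Available condition.
# $1: namespace, $2: timeout in seconds (defaults to 300).
wait_for_app() {
  kubectl wait --for=condition=Available deployment --all \
    --namespace "$1" --timeout="${2:-300}s"
}
```

For example: `wait_for_app "${NAMESPACE}" 120`.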
You can expose the Jupyter web server port on your local machine:
kubectl port-forward \
--namespace "${NAMESPACE}" \
svc/${APP_INSTANCE_NAME}-service \
8888:8888
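While the port-forward command is running, the server should answer on http://localhost:8888. A minimal reachability check, run from a second terminal, might look like the sketch below; the function name jupyter_reachable is hypothetical, and it only confirms that some HTTP server responds on the port.

```shell
# Succeed if an HTTP server answers on the forwarded local port.
# $1: local port (defaults to 8888). Redirects, such as to a login page,
# are followed.
jupyter_reachable() {
  curl --silent --fail --location --output /dev/null "http://localhost:${1:-8888}/"
}
```

For example: `jupyter_reachable && echo "Jupyter is up"`.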
- Open the demo notebook or create a new notebook. The password for the demo notebook is synthesized123.
- Inside the notebook, if you did not set your licence key earlier, set it by adding the following at the top:
import os; os.environ["SYNTHESIZED_KEY"] = "<INSERT_KEY_HERE>"
- Now synthesized can be imported into the notebook, data can be loaded, a synthesizer trained, and synthetic data generated, as explained in the quickstart guide in Synthesized’s docs.
- After data has been generated, it can be saved to a permanent location using standard Python libraries and functions. To save files to GCP, for example, follow instructions here.
This is a single-instance version of SDK Jupyter Notebook. It is not intended to be scaled out with its current configuration.
At the moment, the application does not support exporting Prometheus metrics and does not have any exporter.
- In the Cloud Console, open Kubernetes Applications.
- From the list of apps, choose your app installation.
- On the Application Details page, click Delete.
Set your installation name and Kubernetes namespace:
export APP_INSTANCE_NAME=sdk-jupyter-server
export NAMESPACE=default
NOTE: We recommend using a kubectl version that is the same as the version of your cluster. Using the same version for kubectl and the cluster helps to avoid unforeseen issues.
Run kubectl on the expanded manifest file:
kubectl delete -f "${APP_INSTANCE_NAME}_manifest.yaml" --namespace "${NAMESPACE}"
If you don't have the expanded manifest file, delete the resources by using types and a label:
kubectl delete application,deployment,secret,service,backendconfig \
--namespace "${NAMESPACE}" \
--selector name="${APP_INSTANCE_NAME}"
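The delete commands above do not cover the Service Account, ClusterRole, and ClusterRoleBinding that were created by hand during installation. A sketch of removing them, assuming the same naming scheme as in the installation step (the function name cleanup_rbac is hypothetical):

```shell
# Delete the RBAC objects created during installation.
# $1: app instance name, $2: namespace.
cleanup_rbac() {
  sa="${1}-serviceaccount"
  kubectl delete clusterrolebinding "${sa}-rule" --ignore-not-found
  kubectl delete clusterrole "${sa}-role" --ignore-not-found
  kubectl delete serviceaccount "${sa}" --namespace "$2" --ignore-not-found
}
```

For example: `cleanup_rbac "${APP_INSTANCE_NAME}" "${NAMESPACE}"`. If the cluster was created only for this app, it can also be removed with `gcloud container clusters delete "${CLUSTER}" --zone "${ZONE}"`.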