
Objective

  • This pipeline takes the JSON data in FHIR standard format from our "raw" ADLS container and converts it to Parquet. Because Parquet is a compressed, columnar file format, the converted data is much faster to import and work with. We store the Parquet output in our "processed" container in ADLS under a folder called "Observation". (A minimal sketch of this conversion appears after this list.)

  • We plan to eventually load this data into the Dedicated SQL Pool in a table called [fhir].[ObservationMain]. We need to extract the data needed for the table, clean it, and write it back to ADLS. The second activity in our pipeline handles all of this in a single Synapse Spark notebook.

  • Now that the data is prepared and cleaned, we are ready to load it into our Dedicated Pool, but we need to create the tables first. We have a script activity that runs against our Dedicated Pool to create these artifacts for us. (A sketch of what this DDL might look like also appears after this list.)

    Note: Make sure your Dedicated Pool is running prior to executing this pipeline. You can see this in the SQL Pools tab under the Manage Hub.


  • We are now all set up, with data ready to go and a table to load it into, and we'll use a Copy activity to perform the load.
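To make the first two activities concrete, here is a minimal PySpark sketch of the conversion and cleanup steps. The paths, the storage account placeholder, and the column names are all assumptions for illustration; the Synapse Spark notebook in this pipeline contains the real logic.

```python
# Hedged sketch, not the repo's actual notebook code. Assumes the Spark
# session can authenticate to the ADLS Gen2 account (e.g., via the
# workspace's linked service or a configured service principal).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Assumed paths; substitute your storage account name.
raw_path = "abfss://raw@<StorageName>.dfs.core.windows.net/Observation/"
processed_path = "abfss://processed@<StorageName>.dfs.core.windows.net/Observation/"

# Activity 1: read the FHIR Observation JSON and rewrite it as Parquet.
observations = spark.read.json(raw_path)
observations.write.mode("overwrite").parquet(processed_path)

# Activity 2 (illustrative): flatten a few nested FHIR fields into the
# tabular shape a SQL table expects. These column names are hypothetical,
# not [fhir].[ObservationMain]'s real schema.
cleaned = observations.select(
    F.col("id").alias("ObservationId"),
    F.col("status").alias("Status"),
    F.col("subject.reference").alias("PatientReference"),
    F.col("effectiveDateTime").alias("EffectiveDateTime"),
)
cleaned.write.mode("overwrite").parquet(processed_path + "clean/")
```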
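Similarly, the script activity's table creation can be pictured as the following hypothetical DDL, shown here executed from Python via pyodbc. The column definitions and distribution choice are illustrative assumptions; the real DDL lives in the pipeline's script activity.

```python
# Hypothetical illustration only: columns, types, and distribution are
# assumptions, not the pipeline's actual schema.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<ServerName>.sql.azuresynapse.net;"
    "DATABASE=<DatabaseName>;"
    "UID=<user>;PWD=<password>"
)
conn.autocommit = True
cursor = conn.cursor()

# Create the schema if needed, then a columnstore table for the load.
cursor.execute(
    "IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = 'fhir') "
    "EXEC('CREATE SCHEMA fhir')"
)
cursor.execute(
    """
    CREATE TABLE [fhir].[ObservationMain]
    (
        ObservationId     NVARCHAR(64) NOT NULL,
        Status            NVARCHAR(32),
        PatientReference  NVARCHAR(128),
        EffectiveDateTime DATETIME2
    )
    WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
    """
)
```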

STEP 1: Parameter Setup

Prior to running the observations pipeline (FHIR_Pipeline4Observation_Spark_OC), you will need to set the pipeline parameters to the artifact names you chose during deployment. Go to the Integrate hub, expand the Observation folder, and select the pipeline to open it.


Once the pipeline opens, click somewhere on the canvas (open space or background) so that none of the activities is highlighted or selected, then select the Parameters tab in the bottom pane to view the pipeline-level parameters.


Change the default value for each of the following five parameters to what you chose during deployment:

  • StorageName - This is the name of your Synapse workspace ADLS account
  • DatabaseName - This is the name of your database in Synapse Dedicated SQL Pool
  • ServerName - This is the name of your Synapse Dedicated SQL Pool
  • SparkPoolName - This is the name of your Synapse Spark Pool
  • DatasetSize - This is either "1tb" or "30tb" depending on which size dataset you want to use

STEP 2: Execute Pipeline

  • Click the Debug button to kick off the pipeline run. (If you prefer to trigger the run programmatically, see the sketch below.)

    Note: Make sure your Dedicated Pool is running prior to executing this pipeline. You can see this in the SQL Pools tab under the Manage Hub.

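As an alternative to the Debug button, the run can be kicked off with the azure-synapse-artifacts Python SDK, passing the same five parameters from STEP 1. The workspace endpoint and parameter values below are placeholders you would substitute.

```python
# Hedged sketch: triggers the pipeline run via the azure-synapse-artifacts
# SDK. The endpoint and all parameter values are placeholders.
from azure.identity import DefaultAzureCredential
from azure.synapse.artifacts import ArtifactsClient

client = ArtifactsClient(
    credential=DefaultAzureCredential(),
    endpoint="https://<workspace-name>.dev.azuresynapse.net",
)

# Same five pipeline-level parameters set in STEP 1.
run = client.pipeline.create_pipeline_run(
    "FHIR_Pipeline4Observation_Spark_OC",
    parameters={
        "StorageName": "<your ADLS account name>",
        "DatabaseName": "<your dedicated pool database>",
        "ServerName": "<your dedicated SQL pool>",
        "SparkPoolName": "<your Spark pool>",
        "DatasetSize": "1tb",  # or "30tb"
    },
)
print(f"Started pipeline run: {run.run_id}")
```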

Congratulations on completing Exercise 02. You are now ready to move on to Exercise 03 - Patients.