# Installation and deployment instructions (using Postgres as an example)

Below are the instructions for connecting a Postgres server. The installation steps are the same for connecting any kind of server; different servers require different configurations in the .yaml or DAG files. See https://docs.open-metadata.org/integrations/connectors for your configuration.

# Goal: To run Postgres metadata ingestion and quality tests with OpenMetadata using the Airflow scheduler

Note: This procedure does not support Windows, because Windows does not implement `signal.SIGALRM`. **It is highly recommended to use WSL 2 if you are on Windows**.

## Requirements:
See the "Requirements" section of https://docs.open-metadata.org/overview/run-openmetadata-with-prefect

## Installation:
1. Clone the OpenMetadata GitHub repo:
`git clone https://github.com/open-metadata/OpenMetadata.git`

2. `cd` to ~/.../openmetadata/docker/metadata

3. Start the OpenMetadata containers. This will allow you to run OpenMetadata in Docker:
`docker compose up -d`
- To check the status of the services, run `docker compose ps`
- To access the UI: http://localhost:8585 (a quick Python reachability check is sketched after step 6)

4. Install the OpenMetadata ingestion package.
- (Optional but highly recommended) Before installing the package, create and activate a virtual environment:
`python -m venv env` and `source env/bin/activate`

- To install the OpenMetadata ingestion package:
`pip install --upgrade "openmetadata-ingestion[docker]==0.10.3"` (specify the release version to ensure compatibility)

5. Install Airflow:
- 5A: Install the Airflow Lineage Backend: `pip3 install "openmetadata-ingestion[airflow-container]"==0.10.3`
- 5B: Install the Airflow Postgres connector module: `pip3 install "openmetadata-ingestion[postgres]"==0.10.3`
- 5C: Install the Airflow APIs: `pip3 install "openmetadata-airflow-managed-apis"==0.10.3`
- 5D: Install the necessary Airflow plugins:
- 1) Download the latest openmetadata-airflow-apis-plugins release from https://github.com/open-metadata/OpenMetadata/releases
- 2) Untar it under your {AIRFLOW_HOME} directory (usually C:/Users/YourName/airflow). This will create and set up a plugins directory under {AIRFLOW_HOME}.
- 3) `cp -r {AIRFLOW_HOME}/plugins/dag_templates {AIRFLOW_HOME}`
- 4) `mkdir -p {AIRFLOW_HOME}/dag_generated_configs`
- 5) (Re)start the Airflow webserver and scheduler

6. Configure Airflow:
- 6A: Configure airflow.cfg in your AIRFLOW_HOME directory. Check that all the folder paths point to the right places. For instance, dags_folder = YOUR_AIRFLOW_HOME/dags
- 6B: Configure openmetadata.yaml and update the airflowConfiguration section. See: https://docs.open-metadata.org/integrations/airflow/configure-airflow-in-the-openmetadata-server

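Optional: before wiring Airflow to the server, you can confirm that the OpenMetadata UI from step 3 is reachable. This is a minimal sketch using only the Python standard library; the URL assumes the default local Docker deployment:

```
import urllib.request

# Default address of a local Docker deployment (see step 3).
OM_HOST = "http://localhost:8585"

try:
    with urllib.request.urlopen(OM_HOST, timeout=5) as resp:
        print(f"OpenMetadata is reachable at {OM_HOST} (HTTP {resp.status})")
except OSError as exc:
    print(f"OpenMetadata is not reachable at {OM_HOST}: {exc}")
```
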
## To run a metadata ingestion workflow with Airflow ingestion DAGs on Postgres data:

1. Prepare the ingestion DAG:
For a more complete tutorial on ingestion DAGs, see https://docs.open-metadata.org/integrations/connectors/postgres/run-postgres-connector-with-the-airflow-sdk
To be brief, below is my own DAG. Copy and paste the following into a Python file (postgres_demo.py):
```
import json
from datetime import timedelta

from airflow import DAG

try:
    from airflow.operators.python import PythonOperator
except ModuleNotFoundError:
    from airflow.operators.python_operator import PythonOperator

from airflow.utils.dates import days_ago

from metadata.ingestion.api.workflow import Workflow

default_args = {
    "owner": "user_name",
    "email": ["username@org.com"],
    "email_on_failure": False,
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "execution_timeout": timedelta(minutes=60),
}

# Change username, password, hostPort, and database to your own values.
# Each sourceConfig option accepts "true" or "false".
config = """
{
  "source": {
    "type": "postgres",
    "serviceName": "postgres_demo",
    "serviceConnection": {
      "config": {
        "type": "Postgres",
        "username": "postgres",
        "password": "postgres",
        "hostPort": "192.168.1.55:5432",
        "database": "surveillance_hub"
      }
    },
    "sourceConfig": {
      "config": {
        "enableDataProfiler": "false",
        "markDeletedTables": "true",
        "includeTables": "true",
        "includeViews": "true",
        "generateSampleData": "true"
      }
    }
  },
  "sink": {
    "type": "metadata-rest",
    "config": {}
  },
  "workflowConfig": {
    "openMetadataServerConfig": {
      "hostPort": "http://localhost:8585/api",
      "authProvider": "no-auth"
    }
  }
}
"""


def metadata_ingestion_workflow():
    workflow_config = json.loads(config)
    workflow = Workflow.create(workflow_config)
    workflow.execute()
    workflow.raise_from_status()
    workflow.print_status()
    workflow.stop()


with DAG(
    "sample_data",
    default_args=default_args,
    description="An example DAG which runs an OpenMetadata ingestion workflow",
    start_date=days_ago(1),
    is_paused_upon_creation=False,
    schedule_interval="*/5 * * * *",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(
        task_id="ingest_using_recipe",
        python_callable=metadata_ingestion_workflow,
    )

if __name__ == "__main__":
    metadata_ingestion_workflow()
```
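
Before scheduling the DAG, it is worth checking that Airflow can import the file without errors. A minimal sketch, assuming postgres_demo.py sits in the current directory:

```
from airflow.models import DagBag

# Parse DAG files in the current directory, skipping Airflow's bundled examples.
dagbag = DagBag(dag_folder=".", include_examples=False)

print("Import errors:", dagbag.import_errors)  # expect an empty dict
print("Loaded DAGs:", list(dagbag.dags))       # expect ["sample_data"]
```
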
2. Run the DAG:
`python postgres_demo.py`

- Alternatively, we could run without the Airflow SDK, using OpenMetadata's own CLI. Run `metadata ingest -c /Your_Path_To_Json/.json`
The JSON configuration is exactly the same as the one in the DAG.
- Or, we could also run it with `metadata ingest -c /Your_Path_To_Yaml/.yaml`
The YAML configuration is exactly the same, just without the curly brackets and the double quotes.
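
Because the conversion is mechanical, you can generate the YAML from the JSON rather than rewrite it by hand. A minimal sketch, assuming PyYAML is available (the ingestion package pulls it in) and that the `config` string lives in postgres_demo.py as above:

```
import json

import yaml  # PyYAML

from postgres_demo import config  # the JSON string from the DAG file

# Parsing the JSON and re-serializing it as YAML drops the curly
# brackets and double quotes, producing the equivalent configuration.
print(yaml.safe_dump(json.loads(config), sort_keys=False))
```
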
150+
151+
## To run a profiler workflow on Postgres data
152+
1. Prepare the DAG OR configure the yaml/json:
153+
- To configure the quality tests in json/yaml, see https://docs.open-metadata.org/data-quality/data-quality-overview/tests
154+
- To prepare the DAG, see https://github.com/open-metadata/OpenMetadata/tree/0.10.3-release/data-quality/data-quality-overview
155+
156+
Example yaml I was using:
157+
```
source:
  type: postgres
  serviceName: your_service_name
  serviceConnection:
    config:
      type: Postgres
      username: your_username
      password: your_password
      hostPort: localhost:5432
      database: your_database
  sourceConfig:
    config:
      type: Profiler

processor:
  type: orm-profiler
  config:
    test_suite:
      name: demo_test
      tests:
        - table: your_table_name  # must be the fully qualified name (FQN)
          column_tests:
            - columnName: id
              testCase:
                columnTestType: columnValuesToBeBetween
                config:
                  minValue: 0
                  maxValue: 10

sink:
  type: metadata-rest
  config: {}

workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth
```

Note that the table name must be the fully qualified name (FQN) and must match the table path on the OpenMetadata UI exactly.

2. Run it with:
`metadata profile -c /path_to_yaml/.yaml`
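
If you prefer to trigger the profiler from Python instead of the CLI (for example, from a DAG like the ingestion one above), below is a minimal sketch. It assumes the 0.10.x import path for ProfilerWorkflow, which moved in later releases, and an illustrative config file name:

```
import yaml

# 0.10.x module path; later releases relocated this class.
from metadata.orm_profiler.api.workflow import ProfilerWorkflow

# Load the same YAML configuration shown above (file name is illustrative).
with open("/path_to_yaml/profiler.yaml") as f:
    workflow_config = yaml.safe_load(f)

workflow = ProfilerWorkflow.create(workflow_config)
workflow.execute()
workflow.raise_from_status()
workflow.print_status()
workflow.stop()
```
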
Make sure to refresh the OpenMetadata UI and click on the Data Quality tab to see the results.
