
Commit

[Docs] Add transform catalog page for enrichment transform (apache#30187)

* add transform catalog page

* add to side bar

* add snippet of bigtable cluster

* Update website/www/site/content/en/documentation/transforms/python/elementwise/enrichment.md

Co-authored-by: Rebecca Szper <[email protected]>

* Update website/www/site/content/en/documentation/transforms/python/elementwise/enrichment.md

Co-authored-by: Rebecca Szper <[email protected]>

* Update website/www/site/content/en/documentation/transforms/python/elementwise/enrichment.md

Co-authored-by: Rebecca Szper <[email protected]>

* update context from review

* add enrichment-notebook link

* Update website/www/site/content/en/documentation/transforms/python/elementwise/enrichment.md

Co-authored-by: Rebecca Szper <[email protected]>

* Update website/www/site/content/en/documentation/transforms/python/elementwise/enrichment.md

Co-authored-by: Rebecca Szper <[email protected]>

---------

Co-authored-by: Rebecca Szper <[email protected]>
riteshghorse and rszper authored Feb 14, 2024
1 parent bfe6168 commit 914cf14
Showing 7 changed files with 175 additions and 2 deletions.
@@ -0,0 +1,51 @@
# coding=utf-8
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# pytype: skip-file
# pylint: disable=line-too-long


def enrichment_with_bigtable():
  # [START enrichment_with_bigtable]
  import apache_beam as beam
  from apache_beam.transforms.enrichment import Enrichment
  from apache_beam.transforms.enrichment_handlers.bigtable import BigTableEnrichmentHandler

  project_id = 'apache-beam-testing'
  instance_id = 'beam-test'
  table_id = 'bigtable-enrichment-test'
  row_key = 'product_id'

  data = [
      beam.Row(sale_id=1, customer_id=1, product_id=1, quantity=1),
      beam.Row(sale_id=3, customer_id=3, product_id=2, quantity=3),
      beam.Row(sale_id=5, customer_id=5, product_id=4, quantity=2)
  ]

  bigtable_handler = BigTableEnrichmentHandler(
      project_id=project_id,
      instance_id=instance_id,
      table_id=table_id,
      row_key=row_key)
  with beam.Pipeline() as p:
    _ = (
        p
        | "Create" >> beam.Create(data)
        | "Enrich W/ BigTable" >> Enrichment(bigtable_handler)
        | "Print" >> beam.Map(print))
  # [END enrichment_with_bigtable]
@@ -0,0 +1,53 @@
# coding=utf-8
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# pytype: skip-file
# pylint: disable=line-too-long

import unittest
from io import StringIO

import mock

# pylint: disable=unused-import
try:
  from apache_beam.examples.snippets.transforms.elementwise.enrichment import enrichment_with_bigtable
  from apache_beam.io.requestresponse import RequestResponseIO
except ImportError:
  raise unittest.SkipTest('RequestResponseIO dependencies are not installed')


def validate_enrichment_with_bigtable():
  expected = '''[START enrichment_with_bigtable]
Row(sale_id=1, customer_id=1, product_id=1, quantity=1, product={'product_id': '1', 'product_name': 'pixel 5', 'product_stock': '2'})
Row(sale_id=3, customer_id=3, product_id=2, quantity=3, product={'product_id': '2', 'product_name': 'pixel 6', 'product_stock': '4'})
Row(sale_id=5, customer_id=5, product_id=4, quantity=2, product={'product_id': '4', 'product_name': 'pixel 8', 'product_stock': '10'})
[END enrichment_with_bigtable]'''.splitlines()[1:-1]
  return expected


@mock.patch('sys.stdout', new_callable=StringIO)
class EnrichmentTest(unittest.TestCase):
  def test_enrichment_with_bigtable(self, mock_stdout):
    enrichment_with_bigtable()
    output = mock_stdout.getvalue().splitlines()
    expected = validate_enrichment_with_bigtable()
    self.assertEqual(output, expected)


if __name__ == '__main__':
  unittest.main()
3 changes: 2 additions & 1 deletion website/www/site/content/en/documentation/ml/overview.md
@@ -90,7 +90,8 @@ You can use Apache Beam for data validation and preprocessing by setting up data
| Task | Example |
| ------- | ---------------|
| I want to transform my data for preprocessing| [Preprocess data with MLTransform](/documentation/ml/preprocess-data) |
| I want to explore my data | [Data exploration workflow and example](/documentation/ml/data-processing) |:
| I want to explore my data | [Data exploration workflow and example](/documentation/ml/data-processing) |
| I want to enrich my data | [Data enrichment with Enrichment transform](https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb) |:
{{< /table >}}


@@ -0,0 +1,66 @@
---
title: "Enrichment"
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Enrichment transform

{{< localstorage language language-py >}}

<table>
<tr>
<td>
<a>
{{< button-pydoc path="apache_beam.transforms" class="Enrichment" >}}
</a>
</td>
</tr>
</table>


The enrichment transform lets you dynamically enrich data in a pipeline by performing a key-value lookup against a remote service. The transform uses [`RequestResponseIO`](https://beam.apache.org/releases/pydoc/current/apache_beam.io.requestresponseio.html#apache_beam.io.requestresponseio.RequestResponseIO) internally. This feature uses client-side throttling to ensure that the remote service isn't overloaded with requests. If service-side errors occur, such as `TooManyRequests` or `Timeout` exceptions, the transform retries the requests by using exponential backoff.
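The throttling and retry behavior described above can be illustrated with a small, framework-free sketch. This is not Beam's implementation. The exception classes `TooManyRequests` and `Timeout` below are hypothetical local stand-ins, not Beam types, and `throttle_probability` uses the client-side throttling formula from Google's *Site Reliability Engineering* book, assumed here as one common way to realize the idea.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TooManyRequests(Exception):
    """Hypothetical stand-in for a service-side quota error."""


class Timeout(Exception):
    """Hypothetical stand-in for a service-side timeout."""


# Only these errors are retried; anything else propagates immediately.
RETRYABLE = (TooManyRequests, Timeout)


def throttle_probability(requests: int, accepts: int, k: float = 2.0) -> float:
    """Fraction of new calls a client should reject locally.

    `requests` is the number of calls attempted in a recent window and
    `accepts` the number the service accepted. Rejection starts once
    requests exceed k * accepts (formula from the Google SRE book).
    """
    return max(0.0, (requests - k * accepts) / (requests + 1))


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Calls fn, retrying retryable errors with exponential backoff.

    The wait before retry n (0-based) is base_delay * 2**n seconds.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except RETRYABLE:
            if attempt >= max_retries:
                raise  # Out of retries: surface the last error.
            sleep(base_delay * 2 ** attempt)
            attempt += 1


# Usage: a flaky call that fails twice before succeeding.
outcomes = [TooManyRequests(), Timeout(), "ok"]


def flaky() -> str:
    result = outcomes.pop(0)
    if isinstance(result, Exception):
        raise result
    return result


delays = []
print(call_with_backoff(flaky, base_delay=0.5, sleep=delays.append))
print(delays)
```

Production retry loops usually add random jitter and cap the maximum delay; both are left out here so the sketch stays deterministic.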

In Apache Beam 2.54.0 and later versions, the transform includes a built-in enrichment handler for [Bigtable](https://cloud.google.com/bigtable/docs/overview).

## Use Bigtable to enrich data

The following example demonstrates how to create a pipeline that uses the enrichment transform with [`BigTableEnrichmentHandler`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#apache_beam.transforms.enrichment_handlers.bigtable.BigTableEnrichmentHandler).

The data stored in the Bigtable cluster uses the following format:

| Row key | product:product_id | product:product_name | product:product_stock |
|:---------:|:--------------------:|:----------------------:|:-----------------------:|
| 1 | 1 | pixel 5 | 2 |
| 2 | 2 | pixel 6 | 4 |
| 3 | 3 | pixel 7 | 20 |
| 4 | 4 | pixel 8 | 10 |


{{< highlight language="py" >}}
{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/elementwise/enrichment.py" enrichment_with_bigtable >}}
{{</ highlight >}}

{{< paragraph class="notebook-skip" >}}
Output:
{{< /paragraph >}}
{{< highlight class="notebook-skip" >}}
{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/elementwise/enrichment_test.py" enrichment_with_bigtable >}}
{{< /highlight >}}
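To relate each output row to the Bigtable table above, the following pure-Python sketch mimics the lookup and join without Beam or Bigtable. The in-memory `TABLE` dict, the `Sale` tuple, and the `lookup` and `enrich` helpers are hypothetical. The sketch assumes, as the example output suggests, that `row_key='product_id'` names the input field whose value is used as the Bigtable row key, and that each column family (here `product`) becomes one dictionary-valued field on the enriched row. It is an illustration, not `BigTableEnrichmentHandler` code.

```python
from typing import Any, Dict, NamedTuple


class Sale(NamedTuple):
    """Hypothetical stand-in for the input beam.Row."""
    sale_id: int
    customer_id: int
    product_id: int
    quantity: int


# In-memory stand-in for the Bigtable table:
# row key -> column family -> column qualifier -> value.
# Bigtable stores bytes; the values here are already decoded to str.
TABLE: Dict[str, Dict[str, Dict[str, str]]] = {
    key: {"product": {"product_id": key, "product_name": name,
                      "product_stock": stock}}
    for key, name, stock in [
        ("1", "pixel 5", "2"),
        ("2", "pixel 6", "4"),
        ("3", "pixel 7", "20"),
        ("4", "pixel 8", "10"),
    ]
}


def lookup(sale: Sale) -> Dict[str, Dict[str, str]]:
    """Fetches the row whose key equals the sale's product_id field."""
    return TABLE.get(str(sale.product_id), {})


def enrich(sale: Sale) -> Dict[str, Any]:
    """Merges the input fields with one field per column family."""
    return {**sale._asdict(), **lookup(sale)}


sales = [Sale(1, 1, 1, 1), Sale(3, 3, 2, 3), Sale(5, 5, 4, 2)]
for row in map(enrich, sales):
    print(row)
```

In this sketch, an input whose key has no matching row is passed through without a `product` field; how the real handler treats missing rows is not covered on this page.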

## Related transforms

Not applicable.

{{< button-pydoc path="apache_beam.transforms" class="Enrichment" >}}
@@ -27,7 +27,7 @@ limitations under the License.
</tr>
</table>

The following examples demonstrate how to to create pipelines that use the Beam RunInference API and PyTorch.
The following examples demonstrate how to create pipelines that use the Beam RunInference API and PyTorch.

## Example 1: PyTorch unkeyed model

@@ -21,6 +21,7 @@ limitations under the License.

<table class="table-bordered table-striped">
<tr><th>Transform</th><th>Description</th></tr>
<tr><td><a href="/documentation/transforms/python/elementwise/enrichment">Enrichment</a></td><td>Performs data enrichment with a remote service.</td></tr>
<tr><td><a href="/documentation/transforms/python/elementwise/filter">Filter</a></td><td>Given a predicate, filter out all elements that don't satisfy the predicate.</td></tr>
<tr><td><a href="/documentation/transforms/python/elementwise/flatmap">FlatMap</a></td><td>Applies a function that returns a collection to every element in the input and
outputs all resulting elements.</td></tr>
@@ -288,6 +288,7 @@
<span class="section-nav-list-title">Element-wise</span>

<ul class="section-nav-list">
<li><a href="/documentation/transforms/python/elementwise/enrichment/">Enrichment</a></li>
<li><a href="/documentation/transforms/python/elementwise/filter/">Filter</a></li>
<li><a href="/documentation/transforms/python/elementwise/flatmap/">FlatMap</a></li>
<li><a href="/documentation/transforms/python/elementwise/keys/">Keys</a></li>
