Commit 0f83324

Add logic for building "ml-pipelines-sdk" pipeline SDK package with minimal dependencies.
PiperOrigin-RevId: 345747124
1 parent 72b67b9 commit 0f83324

8 files changed

+240 -32 lines changed

README.ml-pipelines-sdk.md

+12
@@ -0,0 +1,12 @@
1+
<!-- See: www.tensorflow.org/tfx/ -->
2+
3+
# ML Pipelines SDK
4+
5+
The `ml-pipelines-sdk` package provides the pipeline-authoring SDK for the
6+
[TFX](https://tensorflow.org/tfx) machine learning framework. This package is
7+
a subset of the `tfx` package and is provided for TFX pipeline authors who wish
8+
to minimize runtime dependencies.
9+
10+
Releases are synchronized in lock-step with the
11+
[`tfx` package](https://pypi.org/project/tfx) and overall release notes
12+
can be found at https://github.com/tensorflow/tfx/releases.

package_build/README.md

+52
@@ -0,0 +1,52 @@
1+
# TFX package
2+
3+
TFX is packaged as the `tfx` package on PyPI. We recommend that users install
4+
TFX using `pip install tfx`. As of version 0.26.0, users also have the option
5+
to install a standalone version of the TFX pipeline authoring SDK, as the
6+
`ml-pipelines-sdk` package. This package has minimal dependencies, but does not
7+
include first-party TFX components like the TFX ExampleGen, Transform and
8+
Trainer in `tfx.components.*`, nor any additional tools requiring these
9+
components.
10+
11+
Both the `tfx` and `ml-pipelines-sdk` packages share the `tfx` namespace and
12+
the same source repository at https://github.com/tensorflow/tfx. These two
13+
packages can be built using the instructions below. During development, a
14+
single editable package may be installed for convenience (see the "Installing
15+
the development-only `tfx-dev` package" section below).
16+
17+
# Building TFX pip package from source
18+
19+
## Setting up the build environment
20+
21+
First, set up the build environment by running:
22+
23+
```
24+
package_build/initialize.sh
25+
```
26+
27+
## Building the `tfx` and `ml-pipelines-sdk` packages
28+
29+
Next, each package can be built using the `bdist_wheel` command:
30+
31+
```
32+
python package_build/ml-pipelines-sdk/setup.py bdist_wheel
33+
python package_build/tfx/setup.py bdist_wheel
34+
```
35+
36+
As a result, `.whl` files will be generated in the `dist/` directory.
37+
38+
# Installing the development-only `tfx-dev` package
39+
40+
During development, it is convenient to install a single editable pip package.
41+
This package will contain the union of the `tfx` and `ml-pipelines-sdk`
42+
packages in an editable environment. To install this combined package for
43+
development, run from the repository root:
44+
45+
```
46+
pip install -e .
47+
```
48+
49+
This `tfx-dev` package should not be packaged as a binary or source
50+
distribution using `python setup.py {bdist_wheel,sdist}` to avoid conflicts
51+
with the two official `tfx` and `ml-pipelines-sdk` packages. Instead, users
52+
should build the two packages for distribution with the directions above.

package_build/initialize.sh

+36
@@ -0,0 +1,36 @@
1+
#!/bin/bash
2+
# Copyright 2020 Google LLC. All Rights Reserved.
3+
#
4+
# Licensed under the Apache License, Version 2.0 (the "License");
5+
# you may not use this file except in compliance with the License.
6+
# You may obtain a copy of the License at
7+
#
8+
# http://www.apache.org/licenses/LICENSE-2.0
9+
#
10+
# Unless required by applicable law or agreed to in writing, software
11+
# distributed under the License is distributed on an "AS IS" BASIS,
12+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
# See the License for the specific language governing permissions and
14+
# limitations under the License.
15+
#
16+
# Initialization script for building TFX SDK release packages.
17+
#
18+
# After this script is run, `python setup.py` commands can be run in the
19+
# `tfx/` and `ml-pipelines-sdk/` packages.
20+
21+
BASEDIR=$(dirname "$(pwd)/${0#./}")/..
22+
23+
mkdir -p $BASEDIR/dist
24+
25+
for CONFIG_NAME in tfx ml-pipelines-sdk
26+
do
27+
ln -sf $BASEDIR/setup.py $BASEDIR/package_build/$CONFIG_NAME/
28+
ln -sf $BASEDIR/dist $BASEDIR/package_build/$CONFIG_NAME/
29+
ln -sf $BASEDIR/tfx $BASEDIR/package_build/$CONFIG_NAME/
30+
ln -sf $BASEDIR/README*.md $BASEDIR/package_build/$CONFIG_NAME/
31+
32+
rm -rf $BASEDIR/package_build/$CONFIG_NAME/build
33+
mkdir $BASEDIR/package_build/$CONFIG_NAME/build
34+
ln -sf $BASEDIR/build/BUILD $BASEDIR/package_build/$CONFIG_NAME/build/
35+
ln -sf $BASEDIR/build/gen_proto.sh $BASEDIR/package_build/$CONFIG_NAME/build/
36+
done
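
For readers who want to see the resulting layout without running the shell script, the following is a minimal Python sketch of the same symlink steps, run against a throwaway directory. The helper names (`force_symlink`, `initialize`, `make_fake_repo`) and the fake repository contents are assumptions made for illustration; they are not part of this commit.

```python
import shutil
import tempfile
from pathlib import Path

CONFIG_NAMES = ("tfx", "ml-pipelines-sdk")


def force_symlink(target: Path, link_dir: Path) -> Path:
    """Rough equivalent of `ln -sf target link_dir/`."""
    link = link_dir / target.name
    if link.is_symlink():
        link.unlink()  # Replace a link left over from an earlier run.
    link.symlink_to(target)
    return link


def initialize(base_dir: Path) -> None:
    """Mirror the steps of initialize.sh for each package configuration."""
    (base_dir / "dist").mkdir(exist_ok=True)
    for name in CONFIG_NAMES:
        config_dir = base_dir / "package_build" / name
        shared = [base_dir / "setup.py", base_dir / "dist", base_dir / "tfx"]
        shared += sorted(base_dir.glob("README*.md"))
        for target in shared:
            force_symlink(target, config_dir)
        # Recreate build/ and link only the two files the script links.
        build_dir = config_dir / "build"
        shutil.rmtree(build_dir, ignore_errors=True)
        build_dir.mkdir()
        for fname in ("BUILD", "gen_proto.sh"):
            force_symlink(base_dir / "build" / fname, build_dir)


def make_fake_repo(root: Path) -> Path:
    """Create empty placeholder files standing in for the real repository."""
    for rel in (
        "setup.py",
        "README.md",
        "README.ml-pipelines-sdk.md",
        "build/BUILD",
        "build/gen_proto.sh",
        "tfx/__init__.py",
        "package_build/tfx/package_config.py",
        "package_build/ml-pipelines-sdk/package_config.py",
    ):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    return root


with tempfile.TemporaryDirectory() as tmp:
    repo = make_fake_repo(Path(tmp).resolve())
    initialize(repo)
    initialize(repo)  # A second run should succeed, like `ln -sf`.
    tfx_dir = repo / "package_build" / "tfx"
    linked = sorted(p.name for p in tfx_dir.iterdir())
    setup_is_link = (tfx_dir / "setup.py").is_symlink()

print(linked)
```

After the run, each configuration directory holds its own `package_config.py` next to links to the shared `setup.py`, `dist`, `tfx`, and README files, which is what lets the same `setup.py` build a different package depending on the directory it is invoked from.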
@@ -1,5 +1,5 @@
1-
# Lint as: python2, python3
2-
# Copyright 2019 Google LLC. All Rights Reserved.
1+
# Lint as: python3
2+
# Copyright 2020 Google LLC. All Rights Reserved.
33
#
44
# Licensed under the Apache License, Version 2.0 (the "License");
55
# you may not use this file except in compliance with the License.
@@ -12,8 +12,8 @@
1212
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1313
# See the License for the specific language governing permissions and
1414
# limitations under the License.
15+
"""Configuration for the "ml-pipelines-sdk" package.
1516
16-
"""Init module for TFX."""
17-
18-
# Import version string.
19-
from tfx.version import __version__
17+
Core TFX pipeline authoring SDK, with a minimal set of dependencies.
18+
"""
19+
PACKAGE_NAME = 'ml-pipelines-sdk'

package_build/tfx/package_config.py

+21
@@ -0,0 +1,21 @@
1+
# Lint as: python3
2+
# Copyright 2020 Google LLC. All Rights Reserved.
3+
#
4+
# Licensed under the Apache License, Version 2.0 (the "License");
5+
# you may not use this file except in compliance with the License.
6+
# You may obtain a copy of the License at
7+
#
8+
# http://www.apache.org/licenses/LICENSE-2.0
9+
#
10+
# Unless required by applicable law or agreed to in writing, software
11+
# distributed under the License is distributed on an "AS IS" BASIS,
12+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
# See the License for the specific language governing permissions and
14+
# limitations under the License.
15+
"""Configuration for the "tfx" package.
16+
17+
Recommended installation package for TFX. This package builds on top of
18+
the "ml-pipelines-sdk" pipeline authoring SDK package and adds first-party TFX
19+
components and additional functionality.
20+
"""
21+
PACKAGE_NAME = 'tfx'

package_config.py

+26
@@ -0,0 +1,26 @@
1+
# Lint as: python3
2+
# Copyright 2020 Google LLC. All Rights Reserved.
3+
#
4+
# Licensed under the Apache License, Version 2.0 (the "License");
5+
# you may not use this file except in compliance with the License.
6+
# You may obtain a copy of the License at
7+
#
8+
# http://www.apache.org/licenses/LICENSE-2.0
9+
#
10+
# Unless required by applicable law or agreed to in writing, software
11+
# distributed under the License is distributed on an "AS IS" BASIS,
12+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
# See the License for the specific language governing permissions and
14+
# limitations under the License.
15+
"""Configuration for the "tfx-dev" package.
16+
17+
Monolithic development package with the entirety of `tfx.*` and the full
18+
set of dependencies.
19+
20+
Once installed, this is functionally equivalent to the union of the "tfx" and
21+
"ml-pipelines-sdk" packages, and thus cannot be installed together with the
22+
latter two packages.
23+
24+
See `package_build/README.md` for packaging details.
25+
"""
26+
PACKAGE_NAME = 'tfx-dev'
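
The three `package_config.py` files differ only in `PACKAGE_NAME`, and `setup.py` chooses between them by putting its own directory first on `sys.path` before importing `package_config`. The sketch below illustrates that lookup in isolation, using temporary directories in place of the repository; `load_package_name` is a hypothetical helper, not code from this commit.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_package_name(config_dir: Path) -> str:
    """Import `package_config` with `config_dir` first on sys.path."""
    sys.modules.pop("package_config", None)  # Drop any cached copy.
    importlib.invalidate_caches()
    sys.path.insert(0, str(config_dir))
    try:
        return importlib.import_module("package_config").PACKAGE_NAME
    finally:
        # Undo the path change so later imports are unaffected.
        sys.path.remove(str(config_dir))
        sys.modules.pop("package_config", None)


# Directory layout taken from this commit; contents are stand-ins.
CONFIGS = (
    (".", "tfx-dev"),
    ("package_build/tfx", "tfx"),
    ("package_build/ml-pipelines-sdk", "ml-pipelines-sdk"),
)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    names = {}
    for rel, name in CONFIGS:
        config_dir = root / rel
        config_dir.mkdir(parents=True, exist_ok=True)
        (config_dir / "package_config.py").write_text(
            f"PACKAGE_NAME = {name!r}\n")
        names[rel] = load_package_name(config_dir)

print(names)
```

Whichever directory is placed first on `sys.path` determines the `PACKAGE_NAME` that is seen, which is presumably why `initialize.sh` symlinks `setup.py` into each `package_build/` subdirectory rather than copying the configuration.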

setup.py

+72-16
@@ -16,12 +16,13 @@
1616

1717
from __future__ import print_function
1818

19+
import logging
1920
import os
2021
import subprocess
2122
import sys
2223

2324
import setuptools
24-
from setuptools import find_packages
25+
from setuptools import find_namespace_packages
2526
from setuptools import setup
2627
from setuptools.command import develop
2728
# pylint: disable=g-bad-import-order
@@ -37,6 +38,15 @@
3738
from tfx.tools import resolve_deps
3839
from wheel import bdist_wheel
3940

41+
# Prefer to import `package_config` from the setup.py script's directory. The
42+
# `package_config.py` file is used to configure which package to build (see
43+
# the logic below switching on `package_config.PACKAGE_NAME`) and the overall
44+
# package build README at `package_build/README.md`.
45+
sys.path.insert(0, os.path.dirname(__file__))
46+
# pylint: disable=g-bad-import-order,g-import-not-at-top
47+
import package_config
48+
# pylint: enable=g-bad-import-order,g-import-not-at-top
49+
4050

4151
class _BdistWheelCommand(bdist_wheel.bdist_wheel):
4252
"""Overridden bdist_wheel command.
@@ -160,13 +170,67 @@ def run(self):
160170
env=os.environ)
161171

162172

163-
# Get the long description from the README file.
173+
# Get the long descriptions from README files.
164174
with open('README.md') as fp:
165-
_LONG_DESCRIPTION = fp.read()
175+
_TFX_LONG_DESCRIPTION = fp.read()
176+
with open('README.ml-pipelines-sdk.md') as fp:
177+
_PIPELINES_SDK_LONG_DESCRIPTION = fp.read()
178+
179+
package_name = package_config.PACKAGE_NAME
180+
tfx_extras_requires = {
181+
# In order to use 'docker-image' or 'all', system libraries specified
182+
# under 'tfx/tools/docker/Dockerfile' are required
183+
'docker-image': dependencies.make_extra_packages_docker_image(),
184+
'tfjs': dependencies.make_extra_packages_tfjs(),
185+
'examples': dependencies.make_extra_packages_examples(),
186+
'test': dependencies.make_extra_packages_test(),
187+
'all': dependencies.make_extra_packages_all(),
188+
}
189+
ml_pipelines_sdk_packages = ['tfx.dsl', 'tfx.dsl.*']
166190

191+
# This `setup.py` file can be used to build packages in 3 configurations. See
192+
# the discussion in `package_build/README.md` for an overview. The `tfx` and
193+
# `ml-pipelines-sdk` pip packages can be built for distribution using the
194+
# selectable `package_config.PACKAGE_NAME` specifier. Additionally, for
195+
# development convenience, the `tfx-dev` package containing the union of the
196+
# `tfx` and `ml-pipelines-sdk` packages can be installed as an editable
197+
# package using `pip install -e .`, but should not be built for distribution.
198+
if package_config.PACKAGE_NAME == 'tfx-dev':
199+
# Monolithic development package with the entirety of `tfx.*` and the full
200+
# set of dependencies. Functionally equivalent to the union of the "tfx" and
201+
# "ml-pipelines-sdk" packages.
202+
install_requires = dependencies.make_required_install_packages()
203+
extras_require = tfx_extras_requires
204+
long_description = _TFX_LONG_DESCRIPTION
205+
packages = find_namespace_packages(include=['tfx', 'tfx.*'])
206+
# TODO(b/174503231): Remove the override on the following line and prevent
207+
# build of wheels from a monolithic "tfx" package.
208+
package_name = 'tfx'
209+
elif package_config.PACKAGE_NAME == 'ml-pipelines-sdk':
210+
# Core TFX pipeline authoring SDK, without dependency on component-specific
211+
# packages like "tensorflow" and "apache-beam".
212+
install_requires = dependencies.make_pipeline_sdk_required_install_packages()
213+
extras_require = {}
214+
long_description = _PIPELINES_SDK_LONG_DESCRIPTION
215+
packages = find_namespace_packages(include=ml_pipelines_sdk_packages)
216+
elif package_config.PACKAGE_NAME == 'tfx':
217+
# Recommended installation package for TFX. This package builds on top of
218+
# the "ml-pipelines-sdk" pipeline authoring SDK package and adds first-party
219+
# TFX components and additional functionality.
220+
install_requires = (
221+
['ml-pipelines-sdk==%s' % version.__version__] +
222+
dependencies.make_required_install_packages())
223+
extras_require = tfx_extras_requires
224+
long_description = _TFX_LONG_DESCRIPTION
225+
packages = find_namespace_packages(
226+
include=['tfx', 'tfx.*'], exclude=ml_pipelines_sdk_packages)
227+
else:
228+
raise ValueError('Invalid package config: %r.' % package_config.PACKAGE_NAME)
229+
230+
logging.info('Executing build for package %r.', package_name)
167231

168232
setup(
169-
name='tfx',
233+
name=package_name,
170234
version=version.__version__,
171235
author='Google LLC',
172236
author_email='[email protected]',
@@ -192,16 +256,8 @@ def run(self):
192256
'Topic :: Software Development :: Libraries :: Python Modules',
193257
],
194258
namespace_packages=[],
195-
install_requires=dependencies.make_required_install_packages(),
196-
extras_require={
197-
# In order to use 'docker-image' or 'all', system libraries specified
198-
# under 'tfx/tools/docker/Dockerfile' are required
199-
'docker-image': dependencies.make_extra_packages_docker_image(),
200-
'tfjs': dependencies.make_extra_packages_tfjs(),
201-
'examples': dependencies.make_extra_packages_examples(),
202-
'test': dependencies.make_extra_packages_test(),
203-
'all': dependencies.make_extra_packages_all(),
204-
},
259+
install_requires=install_requires,
260+
extras_require=extras_require,
205261
# TODO(b/158761800): Move to [build-system] requires in pyproject.toml.
206262
setup_requires=[
207263
'pytest-runner',
@@ -219,10 +275,10 @@ def run(self):
219275
'resolve_deps': resolve_deps.ResolveDepsCommand,
220276
},
221277
python_requires='>=3.6,<3.9',
222-
packages=find_packages(),
278+
packages=packages,
223279
include_package_data=True,
224280
description='TensorFlow Extended (TFX) is a TensorFlow-based general-purpose machine learning platform implemented at Google',
225-
long_description=_LONG_DESCRIPTION,
281+
long_description=long_description,
226282
long_description_content_type='text/markdown',
227283
keywords='tensorflow tfx',
228284
url='https://www.tensorflow.org/tfx',
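
The `include` and `exclude` arguments above split one source tree into two disjoint package sets. As a rough illustration, the sketch below approximates that wildcard matching with `fnmatch` on dotted package names. The `select` helper and the list of discovered package names are assumptions for illustration; the real build calls `setuptools.find_namespace_packages` against the actual `tfx/` tree.

```python
from fnmatch import fnmatchcase

# Patterns taken from the setup.py changes in this commit.
SDK_PATTERNS = ["tfx.dsl", "tfx.dsl.*"]
ALL_PATTERNS = ["tfx", "tfx.*"]


def select(packages, include, exclude=()):
    """Keep names matching any include pattern and no exclude pattern."""
    def matches(name, patterns):
        return any(fnmatchcase(name, pattern) for pattern in patterns)

    return [p for p in packages if matches(p, include) and not matches(p, exclude)]


# Hypothetical package names standing in for the real source tree.
discovered = [
    "tfx",
    "tfx.components",
    "tfx.components.trainer",
    "tfx.dsl",
    "tfx.dsl.components",
    "tfx.orchestration",
]

sdk_packages = select(discovered, SDK_PATTERNS)
tfx_packages = select(discovered, ALL_PATTERNS, exclude=SDK_PATTERNS)
dev_packages = select(discovered, ALL_PATTERNS)

print("ml-pipelines-sdk:", sdk_packages)
print("tfx:", tfx_packages)
print("tfx-dev:", dev_packages)
```

Under these assumptions the `ml-pipelines-sdk` and `tfx` sets do not overlap and together equal the `tfx-dev` set, which matches the description of `tfx-dev` as the union of the two distributable packages.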

tfx/dependencies.py

+15-10
@@ -50,12 +50,25 @@ def select_constraint(default, nightly=None, git_master=None):
5050
return default
5151

5252

53+
def make_pipeline_sdk_required_install_packages():
54+
return [
55+
'absl-py>=0.9,<0.11',
56+
'ml-metadata' + select_constraint(
57+
# LINT.IfChange
58+
default='>=0.25,<0.26',
59+
# LINT.ThenChange(tfx/workspace.bzl)
60+
nightly='>=0.26.0.dev',
61+
git_master='@git+https://github.com/google/ml-metadata@master'),
62+
'protobuf>=3.12.2,<4',
63+
'six>=1.10,<2',
64+
]
65+
66+
5367
def make_required_install_packages():
5468
# Make sure to sync the versions of common dependencies (absl-py, numpy,
5569
# six, and protobuf) with TF.
5670
# TODO(b/130767399): add flask once the frontend is exposed externally.
57-
return [
58-
'absl-py>=0.9,<0.11',
71+
return make_pipeline_sdk_required_install_packages() + [
5972
# LINT.IfChange
6073
'apache-beam[gcp]>=2.25,<3',
6174
# LINT.ThenChange(examples/chicago_taxi_pipeline/setup/setup_beam.sh)
@@ -69,16 +82,8 @@ def make_required_install_packages():
6982
# dependency expecatation with TensorFlow is sorted out.
7083
'keras-tuner>=1,<1.0.2',
7184
'kubernetes>=10.0.1,<12',
72-
'ml-metadata' + select_constraint(
73-
# LINT.IfChange
74-
default='>=0.25,<0.26',
75-
# LINT.ThenChange(tfx/workspace.bzl)
76-
nightly='>=0.26.0.dev',
77-
git_master='@git+https://github.com/google/ml-metadata@master'),
78-
'protobuf>=3.12.2,<4',
7985
'pyarrow>=0.17,<0.18',
8086
'pyyaml>=3.12,<6',
81-
'six>=1.10,<2',
8287
'tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<3',
8388
'tensorflow-hub>=0.9.0,<0.10',
8489
# TODO(b/159488890): remove user module-only dependency.
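
The refactoring above layers the requirement lists: the SDK list is the base, the full TFX list extends it, and the `tfx` package additionally pins `ml-pipelines-sdk` to its own version. The sketch below shows that layering with a shortened dependency list. The function names, the always-default `select_constraint`, and the version string `0.26.0` are simplifications for illustration, not the module's actual contents.

```python
def select_constraint(default, nightly=None, git_master=None):
    # Simplified stand-in: the real selection logic is not shown in this
    # diff, so this version always returns the default constraint.
    return default


def sdk_requirements():
    """Minimal dependencies, as listed for the pipeline SDK in this commit."""
    return [
        "absl-py>=0.9,<0.11",
        "ml-metadata" + select_constraint(default=">=0.25,<0.26"),
        "protobuf>=3.12.2,<4",
        "six>=1.10,<2",
    ]


def full_requirements():
    """SDK dependencies plus an excerpt of the heavier TFX dependencies."""
    return sdk_requirements() + [
        "apache-beam[gcp]>=2.25,<3",
        "pyarrow>=0.17,<0.18",
    ]


def tfx_install_requires(version):
    """The `tfx` package pins the SDK to exactly its own version."""
    return ["ml-pipelines-sdk==%s" % version] + full_requirements()


requirements = tfx_install_requires("0.26.0")  # Hypothetical version string.
for requirement in requirements:
    print(requirement)
```

Because the full list is built from the SDK list, a constraint such as the `ml-metadata` range only has to be changed in one place, and the exact `==` pin is consistent with the README statement that the two packages are released in lock-step.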
