
Commit 9391b81

Browse files
authored
Merge pull request #10 from IBMStreams/develop
pypi.streamsx.hdfs: Merge develop to master branch
2 parents 03206f9 + 23f0c1c commit 9391b81

File tree

9 files changed

+2415
-93
lines changed


README.md

+49-1
@@ -84,6 +84,32 @@ or
 ant test
 
 
+### Composite Test with local Streams instance
+
+This test requires:
+- The environment variable `STREAMS_INSTALL` set, with the Streams instance running.
+- The environment variable `STREAMS_HDFS_TOOLKIT` set to the `com.ibm.streamsx.hdfs` toolkit location.
+- A running IBM Hadoop cluster with a running HDFS instance.
+- The environment variable `HDFS_SITE_XML` set to the HDFS configuration file `core-site.xml`.
+
+This test copies the HDFS configuration file `core-site.xml` into the application directory `etc` and performs the following tests:
+
+- The standard operator `Beacon` generates 1000 lines.
+- The `HdfsFileSink` operator writes every 100 lines produced by the Beacon operator into a new file in `pytest` (sample41.txt, sample42.txt, ...).
+- The `HdfsDirectoryScan` operator scans the directory `pytest` and delivers the HDFS file names on its output port.
+- The `HdfsFileSource` operator reads the HDFS files in the directory `pytest` delivered by HdfsDirectoryScan and returns the lines of the files on its output port.
+- A second `HdfsDirectoryScan` operator scans the directory `pytest` and delivers the HDFS file names on its output port.
+- The `HdfsFileCopy` operator copies the HDFS files from the directory `pytest` delivered by HdfsDirectoryScan into the local directory `/tmp/`.
+
+
+```
+cd package
+python3 -u -m unittest streamsx.hdfs.tests.test_hdfs.TestCompositeDistributed
+```
+
+
+
+
 ### Test with Streaming Analytics Service
 
 This requires Streaming Analytics service and IBM Analytics Engine service in IBM cloud.
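The batching the bullets above describe can be illustrated in plain Python, independent of the Streams operators. This is only a sketch of the expected file layout; the name pattern `sample4N.txt` is inferred from the examples in the README and may differ in the actual test.

```python
# Plain-Python sketch (not the Streams operators) of the file layout the
# composite test checks: 1000 Beacon lines, split by HdfsFileSink into
# batches of 100, each batch landing in a new file under `pytest`.
BATCH = 100
lines = [f"line {i}" for i in range(1000)]

files = {}
for batch_no, start in enumerate(range(0, len(lines), BATCH), start=1):
    # Hypothetical name pattern inferred from "sample41.txt, sample42.txt, ..."
    name = f"pytest/sample4{batch_no}.txt"
    files[name] = lines[start:start + BATCH]

print(len(files))  # -> 10
print(min(files))  # -> pytest/sample41.txt
```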
@@ -100,6 +126,29 @@ or
 ant test-sas
 
 
+### Composite Test with IBM Analytics Engine
+
+This test requires:
+- The environment variable `STREAMS_INSTALL` set, with the Streams instance running.
+- The environment variable `STREAMS_HDFS_TOOLKIT` set to the `com.ibm.streamsx.hdfs` toolkit location.
+- A running IBM Analytics Engine Hadoop cluster with a running HDFS instance.
+- The environment variable `ANALYTICS_ENGINE` set to a credentials file that contains the Hadoop cluster webhdfs credentials as a JSON string.
+
+This test reads the IAE credentials (HdfsUri, HdfsUser, HdfsPassword) from the JSON file and performs the following tests:
+
+- The standard operator `Beacon` generates 1000 lines.
+- The `HdfsFileSink` operator writes every 100 lines produced by the Beacon operator into a new file in the `pytest` directory
+  (`sample41.txt, sample42.txt, ...`).
+- The `HdfsDirectoryScan` operator scans the directory `pytest` and delivers the HDFS file names on its output port.
+- The `HdfsFileSource` operator reads the HDFS files in the directory `pytest` delivered by HdfsDirectoryScan and returns the lines of the files on its output port.
+
+
+```
+cd package
+python3 -u -m unittest streamsx.hdfs.tests.test_hdfs.TestCompositeWebHdfs
+```
+
+
 #### Remote build
 
 For using the toolkit from the build service (**force_remote_build**) run the test with:
@@ -115,4 +164,3 @@ cd package
 python3 -u -m unittest streamsx.hdfs.tests.test_hdfs.TestCloudRemote.test_close_on_tuples streamsx.hdfs.tests.test_hdfs.TestCloudRemote.test_hdfs_uri
 ```
 
-

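The IAE test above hinges on loading the webhdfs credentials JSON that `ANALYTICS_ENGINE` points at. A minimal sketch of that step; the helper name `load_iae_credentials` is hypothetical, but the three keys are the ones the README names:

```python
import json
import os

def load_iae_credentials(path):
    """Load the IAE webhdfs credentials JSON and verify the expected keys.

    Hypothetical helper for illustration; the keys HdfsUri, HdfsUser and
    HdfsPassword are the ones the README says the test reads.
    """
    with open(path) as f:
        creds = json.load(f)
    missing = {"HdfsUri", "HdfsUser", "HdfsPassword"} - creds.keys()
    if missing:
        raise KeyError(f"credentials file is missing: {sorted(missing)}")
    return creds

# In the test setup, ANALYTICS_ENGINE points at the credentials file:
# creds = load_iae_credentials(os.environ["ANALYTICS_ENGINE"])
```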
package/DESC.txt

+1-36
@@ -1,7 +1,7 @@
 Overview
 ========
 
-Provides functions to access files on HDFS. For example, connect to IBM Analytics Engine on IBM Cloud.
+Provides functions and classes to access files on HDFS. For example, connect to IBM Analytics Engine on IBM Cloud.
 
 This package exposes the `com.ibm.streamsx.hdfs <https://ibmstreams.github.io/streamsx.hdfs/>`_ toolkit as Python methods for use with Streaming Analytics service on
 IBM Cloud and IBM Streams including IBM Cloud Pak for Data.
@@ -10,41 +10,6 @@ IBM Cloud and IBM Streams including IBM Cloud Pak for Data.
 * `IBM Streams developer community <https://developer.ibm.com/streamsdev/>`_
 * `IBM Analytics Engine <https://www.ibm.com/cloud/analytics-engine>`_
 
-
-Sample
-======
-
-A simple hello world example of a Streams application writing string messages to
-a file to HDFS. Scan for created file on HDFS and read the content::
-
-    from streamsx.topology.topology import *
-    from streamsx.topology.schema import CommonSchema, StreamSchema
-    from streamsx.topology.context import submit
-    import streamsx.hdfs as hdfs
-
-    credentials = json.load(credentials_analytics_engine_service)
-
-    topo = Topology('HDFSHelloWorld')
-
-    to_hdfs = topo.source(['Hello', 'World!'])
-    to_hdfs = to_hdfs.as_string()
-
-    # Write a stream to HDFS
-    hdfs.write(to_hdfs, credentials=credentials, file='/sample/hw.txt')
-
-    scanned = hdfs.scan(topo, credentials=credentials, directory='/sample', init_delay=10)
-
-    # read text file line by line
-    r = hdfs.read(scanned, credentials=credentials)
-
-    # print each line (tuple)
-    r.print()
-
-    submit('STREAMING_ANALYTICS_SERVICE', topo)
-    # Use for IBM Streams including IBM Cloud Pak for Data
-    # submit ('DISTRIBUTED', topo)
-
 Documentation
 =============
 

package/docs/source/conf.py

+2-2
@@ -65,9 +65,9 @@
 # built documents.
 #
 # The short X.Y version.
-version = '1.4'
+version = '1.5'
 # The full version, including alpha/beta/rc tags.
-release = '1.4.0'
+release = '1.5.0'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
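The hunk above bumps `version` and `release` as a pair. A common Sphinx `conf.py` convention, shown here as a sketch, derives the short X.Y `version` from `release` so only one string needs editing on the next bump:

```python
# Derive the short X.Y version from the full release string, so a future
# bump (e.g. to 1.6.0) only touches `release`.
release = "1.5.0"
version = ".".join(release.split(".")[:2])
print(version)  # -> 1.5
```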

package/docs/source/index.rst

+2-2
@@ -1,8 +1,8 @@
 streamsx.hdfs package
 #####################
 
-IBM Streams HDFS integration
-============================
+HDFS integration for IBM Streams
+================================
 
 For details of implementing applications in Python
 for IBM Streams including IBM Cloud Pak for Data and the Streaming Analytics service

package/setup.py

+1-1
@@ -5,7 +5,7 @@
     packages = ['streamsx.hdfs'],
     include_package_data=True,
     version = streamsx.hdfs.__version__,
-    description = 'IBM Streams HDFS integration',
+    description = 'HDFS integration for IBM Streams',
     long_description = open('DESC.txt').read(),
     author = 'IBM Streams @ github.com',
     author_email = '[email protected]',
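`setup.py` single-sources its version from `streamsx.hdfs.__version__` by importing the package. When importing at build time is undesirable, a regex over `__init__.py` is a common alternative; the snippet below sketches that pattern with the file contents inlined (the string is a stand-in, not the real file):

```python
import re

# Stand-in for the contents of package/streamsx/hdfs/__init__.py;
# parsing the text avoids importing the package (and its dependencies)
# inside setup.py.
init_py = "__version__='1.5.0'\n"

match = re.search(r"__version__\s*=\s*'([^']+)'", init_py)
if match is None:
    raise RuntimeError("__version__ not found")
version = match.group(1)
print(version)  # -> 1.5.0
```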

package/streamsx/hdfs/__init__.py

+3-3
@@ -65,7 +65,7 @@
 
 """
 
-__version__='1.4.0'
+__version__='1.5.0'
 
-__all__ = ['download_toolkit', 'configure_connection', 'scan', 'read', 'write']
-from streamsx.hdfs._hdfs import download_toolkit, configure_connection, scan, read, write, copy
+__all__ = ['HdfsDirectoryScan', 'HdfsFileSink', 'HdfsFileSource', 'HdfsFileCopy', 'download_toolkit', 'configure_connection', 'scan', 'read', 'write']
+from streamsx.hdfs._hdfs import download_toolkit, configure_connection, scan, read, write, copy, HdfsDirectoryScan, HdfsFileSink, HdfsFileSource, HdfsFileCopy
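Note that the new `__all__` lists the four composite classes but not `copy`, even though the import line pulls `copy` in: `__all__` only governs what `from streamsx.hdfs import *` re-exports, not what the module contains. A self-contained sketch of that behavior, using a stand-in module rather than the real package:

```python
import sys
import types

# Build a stand-in module mirroring the shape of streamsx/hdfs/__init__.py:
# it defines three names but lists only two of them in __all__.
mod = types.ModuleType("fake_hdfs")
exec(
    "__all__ = ['HdfsDirectoryScan', 'scan']\n"
    "def HdfsDirectoryScan(): pass\n"
    "def scan(): pass\n"
    "def copy(): pass\n",  # defined, but deliberately left out of __all__
    mod.__dict__,
)
sys.modules["fake_hdfs"] = mod

ns = {}
exec("from fake_hdfs import *", ns)
print(sorted(k for k in ns if not k.startswith("__")))
# -> ['HdfsDirectoryScan', 'scan']   ('copy' is not star-exported)
```

`copy` remains reachable as `fake_hdfs.copy`; it is simply excluded from the star-import namespace.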
