Commit 2da5fea

[REVIEW] gquant 1.0.1 (#115)
* added external plugin
* upgrade the build.sh jupyterlab 2 support
* change version to 1.0.1
1 parent 6415f44 commit 2da5fea

14 files changed: +401 -9 lines changed

CHANGELOG.md

Lines changed: 12 additions & 2 deletions
@@ -1,8 +1,16 @@
 # Changelog

-## [1.0](https://github.com/rapidsai/gQuant/tree/1.0) (2020-12-17)
+## [v1.0.1](https://github.com/rapidsai/gQuant/tree/v1.0.1) (2021-01-20)

-[Full Changelog](https://github.com/rapidsai/gQuant/compare/0.5...1.0)
+[Full Changelog](https://github.com/rapidsai/gQuant/compare/v1.0.0...v1.0.1)
+
+**Merged pull requests:**
+
+- \[REVIEW\] Simple external plugin example [\#113](https://github.com/rapidsai/gQuant/pull/113) ([yidong72](https://github.com/yidong72))
+
+## [v1.0.0](https://github.com/rapidsai/gQuant/tree/v1.0.0) (2020-12-30)
+
+[Full Changelog](https://github.com/rapidsai/gQuant/compare/0.5...v1.0.0)

 **Closed issues:**

@@ -15,13 +23,15 @@

 - \[REVIEW\]gQuant plugin implementation [\#112](https://github.com/rapidsai/gQuant/pull/112) ([yidong72](https://github.com/yidong72))
 - Gpuciscripts clean and update [\#111](https://github.com/rapidsai/gQuant/pull/111) ([msadang](https://github.com/msadang))
+- \[REVIEW\] gQuant 1.0 [\#110](https://github.com/rapidsai/gQuant/pull/110) ([yidong72](https://github.com/yidong72))
 - Streamz gQuant example 2 [\#109](https://github.com/rapidsai/gQuant/pull/109) ([yidong72](https://github.com/yidong72))
 - Revert "Streamz gQuant example" [\#108](https://github.com/rapidsai/gQuant/pull/108) ([yidong72](https://github.com/yidong72))
 - Streamz gQuant example [\#107](https://github.com/rapidsai/gQuant/pull/107) ([yidong72](https://github.com/yidong72))
 - Bump node-fetch from 2.6.0 to 2.6.1 in /gquantlab [\#104](https://github.com/rapidsai/gQuant/pull/104) ([dependabot[bot]](https://github.com/apps/dependabot))
 - Nemo and xgboost integration [\#103](https://github.com/rapidsai/gQuant/pull/103) ([yidong72](https://github.com/yidong72))
 - FIX Update change log check [\#102](https://github.com/rapidsai/gQuant/pull/102) ([mike-wendt](https://github.com/mike-wendt))
 - \[REVIEW\] Update CI scripts to remove references to master \[skip ci\] [\#99](https://github.com/rapidsai/gQuant/pull/99) ([dillon-cullinan](https://github.com/dillon-cullinan))
+- \[skip ci\] Update master references for main branch [\#98](https://github.com/rapidsai/gQuant/pull/98) ([ajschmidt8](https://github.com/ajschmidt8))
 - \[REVIEW\]gQuant UI, first version [\#89](https://github.com/rapidsai/gQuant/pull/89) ([yidong72](https://github.com/yidong72))

 ## [0.5](https://github.com/rapidsai/gQuant/tree/0.5) (2020-07-10)

README.md

Lines changed: 4 additions & 0 deletions
@@ -44,6 +44,10 @@ To install JupyterLab plugin, install the following dependence libraries:
 ```bash
 conda install nodejs ipywidgets
 ```
+Build the ipywidgets Jupyterlab plugin
+```bash
+jupyter labextension install @jupyter-widgets/jupyterlab-manager@2.0
+```
 Then install the gquantlab lib:
 ```bash
 pip install gquantlab==0.1.1

docker/build.sh

Lines changed: 6 additions & 5 deletions
@@ -39,7 +39,7 @@ echo -e "\nPlease, select your CUDA version:\n" \

 read -p "Enter your option and hit return [1]-3: " CUDA_VERSION

-RAPIDS_VERSION="0.14.1"
+RAPIDS_VERSION="0.17.0"

 CUDA_VERSION=${CUDA_VERSION:-1}
 case $CUDA_VERSION in
@@ -158,7 +158,7 @@ RUN conda install -y -c conda-forge jupyterlab'<3.0.0'
 RUN conda install -y -c conda-forge python-graphviz bqplot nodejs ipywidgets \
 pytables mkl numexpr pydot flask pylint flake8 autopep8

-RUN jupyter labextension install @jupyter-widgets/jupyterlab-manager --no-build
+RUN jupyter labextension install @jupyter-widgets/jupyterlab-manager@2.0 --no-build
 RUN jupyter labextension install bqplot --no-build
 #RUN jupyter labextension install jupyterlab-nvdashboard --no-build
 RUN jupyter lab build && jupyter lab clean
@@ -169,7 +169,7 @@ RUN pip install jupyterlab-nvdashboard
 RUN jupyter labextension install jupyterlab-nvdashboard

 ## install the dask extension
-RUN pip install dask_labextension
+RUN pip install "dask_labextension<5.0.0"
 RUN jupyter labextension install dask-labextension
 RUN jupyter serverextension enable dask_labextension

@@ -289,9 +289,10 @@ index 901a79af..4eb76f95 100644
 @@ -14,4 +14,4 @@ unidecode
 webdataset
 kaldi-python-io
-librosa<=0.7.2
+-librosa<=0.7.2
++librosa<=0.8.0
 -numba<=0.48
-+numba==0.49.1
++numba==0.52.0
 diff --git a/requirements/requirements_nlp.txt b/requirements/requirements_nlp.txt
 index 885adf3e..0e4e44e2 100644
 --- a/requirements/requirements_nlp.txt

external/README.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+## Simple External Plugin Example
+
+This is a simple example showing how to write an external gQuant plugin. gQuant uses the entry points declared in the `setup.py` file to register the plugin, and it can discover every plugin registered under the entry point group name `gquant.plugin`. See the `setup.py` file for details.
+
+### Create a new Python environment
+```bash
+conda create -n test python=3.8
+```
+
+### Install the gQuant lib
+To install the gQuant graph computation library, first install its dependencies:
+```bash
+pip install dask[dataframe] distributed networkx
+conda install python-graphviz ruamel.yaml numpy pandas
+```
+Then install the gquant lib:
+```bash
+pip install gquant
+```
+
+### Install the gQuantlab plugin
+To install the JupyterLab plugin, first install the following dependencies:
+```bash
+conda install nodejs ipywidgets
+```
+Build the ipywidgets JupyterLab plugin:
+```bash
+jupyter labextension install @jupyter-widgets/jupyterlab-manager@2.0
+```
+Then install the gquantlab lib:
+```bash
+pip install gquantlab
+```
+When you launch JupyterLab, it prompts you to build the new plugin. You can
+also build it explicitly with:
+```bash
+jupyter lab build
+```
+
+### Install the external example plugin
+To install the external plugin, run the following command in the plugin directory:
+```bash
+pip install .
+```
+
+### Launch JupyterLab
+Launch JupyterLab with:
+```bash
+jupyter-lab --allow-root --ip=0.0.0.0 --no-browser --NotebookApp.token=''
+```
+You can then see the `DistanceNode` and `PointNode` under the name `custom_node` in the menu.
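Review note: the README says gQuant discovers plugins through the `gquant.plugin` entry point group. The sketch below shows how such a lookup can be done with the standard library's `importlib.metadata`. It is an assumption about the general mechanism, not gQuant's actual discovery code, and the helper names `find_plugin_entry_points` and `load_plugins` are hypothetical.

```python
from importlib.metadata import EntryPoint, entry_points


def find_plugin_entry_points(group="gquant.plugin"):
    """Return the entry points that installed packages register under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns a selectable collection.
        return list(eps.select(group=group))
    # Python 3.8 and 3.9 return a plain dict keyed by group name.
    return list(eps.get(group, []))


def load_plugins(found):
    """Import each entry point target and key the result by entry point name."""
    return {ep.name: ep.load() for ep in found}


# Stand-in for what `pip install .` would register from external/setup.py.
# It targets the stdlib `json` module only so the sketch runs without the
# example package installed; the real declaration is 'custom_nodes = example'.
demo = EntryPoint(name="custom_nodes", value="json", group="gquant.plugin")
plugins = load_plugins([demo])
print(sorted(plugins))  # ['custom_nodes']
```

In an environment where the example package is installed, `load_plugins(find_plugin_entry_points())` would import the `example` package instead of the stand-in.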

external/example/__init__.py

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+from .distanceNode import DistanceNode
+from .pointNode import PointNode
+import pandas as pd
+import numpy as np
+from .client import validation, display  # noqa: F40
+from gquant.dataframe_flow._node_flow import register_validator
+from gquant.dataframe_flow._node_flow import register_copy_function
+
+
+def _validate_df(df_to_val, ref_cols, obj):
+    '''Validate a pandas DataFrame.
+
+    :param df_to_val: A dataframe typically of type pd.DataFrame
+    :param ref_cols: Dictionary of column names and their expected types.
+    :returns: True or False based on matching all columns in the df_to_val
+        and columns spec in ref_cols.
+    :raises: Exception - Raised when invalid dataframe length or unexpected
+        number of columns. TODO: Create a ValidationError subclass.
+
+    '''
+    if (isinstance(df_to_val, pd.DataFrame) and len(df_to_val) == 0):
+        err_msg = 'Node "{}" produced empty output'.format(obj.uid)
+        raise Exception(err_msg)
+
+    if not isinstance(df_to_val, pd.DataFrame):
+        return True
+
+    i_cols = df_to_val.columns
+    if len(i_cols) != len(ref_cols):
+        print("expect %d columns, only see %d columns"
+              % (len(ref_cols), len(i_cols)))
+        print("ref:", ref_cols)
+        print("columns", i_cols)
+        raise Exception("not valid for node %s" % (obj.uid))
+
+    for col in ref_cols.keys():
+        if col not in i_cols:
+            print("error for node %s, column %s is not in the required "
+                  "output df" % (obj.uid, col))
+            return False
+
+        if ref_cols[col] is None:
+            continue
+
+        err_msg = "for node {} type {}, column {} type {} "\
+            "does not match expected type {}".format(
+                obj.uid, type(obj), col, df_to_val[col].dtype,
+                ref_cols[col])
+
+        if ref_cols[col] == 'category':
+            # comparing pandas.core.dtypes.dtypes.CategoricalDtype to
+            # numpy.dtype causes TypeError. Instead, let's compare
+            # after converting all types to their string representation
+            # d_type_tuple = (pd.core.dtypes.dtypes.CategoricalDtype(),)
+            d_type_tuple = (str(pd.CategoricalDtype()),)
+        elif ref_cols[col] == 'date':
+            # Cudf read_csv doesn't understand 'datetime64[ms]' even
+            # though it reads the data in as 'datetime64[ms]', but
+            # expects 'date' as dtype specified passed to read_csv.
+            d_type_tuple = ('datetime64[ms]', 'date', 'datetime64[ns]')
+        else:
+            d_type_tuple = (str(np.dtype(ref_cols[col])),)
+
+        if (str(df_to_val[col].dtype) not in d_type_tuple):
+            print("ERROR: {}".format(err_msg))
+            # Maybe raise an exception here and have the caller
+            # try/except the validation routine.
+            return False
+    return True
+
+
+def copy_df(df_obj):
+    return df_obj.copy(deep=False)
+
+
+register_validator(pd.DataFrame, _validate_df)
+register_copy_function(pd.DataFrame, copy_df)
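Review note: the last two lines register a validator and a copy function keyed by data type. A minimal sketch of how a type-keyed registry of this kind could dispatch is shown below. The function bodies here are hypothetical stand-ins written for illustration; the real `register_validator` and `register_copy_function` live in `gquant.dataframe_flow._node_flow` and may behave differently. A plain `dict` of column name to dtype name stands in for a DataFrame so the sketch needs no third-party packages.

```python
# Hypothetical stand-ins for illustration only. The real functions are
# imported from gquant.dataframe_flow._node_flow and may work differently.
_VALIDATORS = {}
_COPY_FUNCTIONS = {}


def register_validator(data_type, fun):
    _VALIDATORS[data_type] = fun


def register_copy_function(data_type, fun):
    _COPY_FUNCTIONS[data_type] = fun


def validate(value, ref_cols, node_uid):
    """Run the validator registered for type(value); accept unregistered types."""
    fun = _VALIDATORS.get(type(value))
    return True if fun is None else fun(value, ref_cols, node_uid)


def copy_value(value):
    """Copy with the registered function, or pass the value through unchanged."""
    fun = _COPY_FUNCTIONS.get(type(value))
    return value if fun is None else fun(value)


def _validate_table(table, ref_cols, node_uid):
    # Column names must match; an expected type of None means "any type".
    if set(table) != set(ref_cols):
        return False
    return all(expected is None or table[col] == expected
               for col, expected in ref_cols.items())


register_validator(dict, _validate_table)
register_copy_function(dict, dict.copy)

table = {"x": "float64", "y": "float64"}
print(validate(table, {"x": "float64", "y": None}, "node_a"))  # True
```

Keying the registry by type is what lets an external plugin add support for a new container type without touching the core package.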

external/example/client.py

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+
+display_fun = """
+    const columnKeys = Object.keys(metaObj);
+    let header = '';
+    if (columnKeys.length > 0) {
+        header += '<table>';
+        header += '<tr>';
+        header += '<th>Column Name</th>';
+        for (let i = 0; i < columnKeys.length; i++) {
+            header += `<th>${columnKeys[i]}</th>`;
+        }
+        header += '</tr>';
+        header += '<tr>';
+        header += '<th>Type</th>';
+        for (let i = 0; i < columnKeys.length; i++) {
+            header += `<td>${metaObj[columnKeys[i]]}</td>`;
+        }
+        header += '</tr>';
+        header += '</table>';
+    }
+    return header;
+"""
+
+validation = {}
+display = {}
+display['pandas.core.frame.DataFrame'] = display_fun
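Review note: `display_fun` is a JavaScript function body stored as a string. To make its output easier to reason about, here is a Python mirror of the same string building. It assumes, based only on how the JavaScript indexes `metaObj`, that `metaObj` maps column names to type names; the function name `render_meta_table` is hypothetical.

```python
def render_meta_table(meta_obj):
    """Build the same two-row HTML table as the JavaScript in display_fun."""
    column_keys = list(meta_obj)
    header = ""
    if column_keys:
        header += "<table>"
        header += "<tr>"
        header += "<th>Column Name</th>"
        header += "".join(f"<th>{key}</th>" for key in column_keys)
        header += "</tr>"
        header += "<tr>"
        header += "<th>Type</th>"
        header += "".join(f"<td>{meta_obj[key]}</td>" for key in column_keys)
        header += "</tr>"
        header += "</table>"
    return header


print(render_meta_table({"x": "float64", "y": "float64"}))
```

An empty mapping yields an empty string, matching the JavaScript branch that skips the table when there are no columns.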

external/example/distanceNode.py

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+import pandas as pd
+import numpy as np
+from gquant.dataframe_flow import Node, MetaData
+from gquant.dataframe_flow import NodePorts, PortsSpecSchema
+from gquant.dataframe_flow import ConfSchema
+
+
+class DistanceNode(Node):
+
+    def ports_setup(self):
+        port_type = PortsSpecSchema.port_type
+        input_ports = {
+            'points_df_in': {
+                port_type: [pd.DataFrame]
+            }
+        }
+
+        output_ports = {
+            'distance_df': {
+                port_type: [pd.DataFrame]
+            },
+            'distance_abs_df': {
+                PortsSpecSchema.port_type: [pd.DataFrame]
+            }
+        }
+        input_connections = self.get_connected_inports()
+        if 'points_df_in' in input_connections:
+            types = input_connections['points_df_in']
+            # connected, use the types passed in from parent
+            return NodePorts(inports={'points_df_in': {port_type: types}},
+                             outports={'distance_df': {port_type: types},
+                                       'distance_abs_df': {port_type: types},
+                                       })
+        else:
+            return NodePorts(inports=input_ports, outports=output_ports)
+
+    def conf_schema(self):
+        return ConfSchema()
+
+    def init(self):
+        self.delayed_process = True
+
+    def meta_setup(self):
+        req_cols = {
+            'x': 'float64',
+            'y': 'float64'
+        }
+        required = {
+            'points_df_in': req_cols,
+        }
+        input_meta = self.get_input_meta()
+        output_cols = ({
+            'distance_df': {
+                'distance_cudf': 'float64',
+                'x': 'float64',
+                'y': 'float64'
+            },
+            'distance_abs_df': {
+                'distance_abs_cudf': 'float64',
+                'x': 'float64',
+                'y': 'float64'
+            }
+        })
+        if 'points_df_in' in input_meta:
+            col_from_inport = input_meta['points_df_in']
+            # additional ports
+            output_cols['distance_df'].update(col_from_inport)
+            output_cols['distance_abs_df'].update(col_from_inport)
+        return MetaData(inports=required, outports=output_cols)
+
+    def process(self, inputs):
+        df = inputs['points_df_in']
+        output = {}
+        if self.outport_connected('distance_df'):
+            copy_df = df.copy()
+            copy_df['distance_cudf'] = np.sqrt((df['x'] ** 2 + df['y'] ** 2))
+            output.update({'distance_df': copy_df})
+        if self.outport_connected('distance_abs_df'):
+            copy_df = df.copy()
+            copy_df['distance_abs_cudf'] = np.abs(df['x']) + np.abs(df['y'])
+            output.update({'distance_abs_df': copy_df})
+        return output
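Review note: `process` fills two columns, the Euclidean distance from the origin (`distance_cudf`) and the sum of absolute coordinates (`distance_abs_cudf`). The sketch below reproduces both formulas on a couple of hand-picked points using only the standard library, so the expected values are easy to check. The helper name `distances` and the sample points are made up for illustration.

```python
import math


def distances(points):
    """Return one row per (x, y) point with both distance measures."""
    rows = []
    for x, y in points:
        rows.append({
            "x": x,
            "y": y,
            # same formula as np.sqrt(x ** 2 + y ** 2) in DistanceNode.process
            "distance_cudf": math.sqrt(x ** 2 + y ** 2),
            # same formula as np.abs(x) + np.abs(y) in DistanceNode.process
            "distance_abs_cudf": abs(x) + abs(y),
        })
    return rows


rows = distances([(3.0, 4.0), (-1.0, 2.0)])
print(rows[0]["distance_cudf"], rows[0]["distance_abs_cudf"])  # 5.0 7.0
```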

external/example/pointNode.py

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+import numpy as np
+import pandas as pd
+from gquant.dataframe_flow import Node, MetaData
+from gquant.dataframe_flow import NodePorts, PortsSpecSchema
+from gquant.dataframe_flow import ConfSchema
+
+
+class PointNode(Node):
+
+    def ports_setup(self):
+        input_ports = {}
+        output_ports = {
+            'points_df_out': {
+                PortsSpecSchema.port_type: pd.DataFrame
+            }
+        }
+        return NodePorts(inports=input_ports, outports=output_ports)
+
+    def conf_schema(self):
+        json = {
+            "title": "PointNode configure",
+            "type": "object",
+            "properties": {
+                "npts": {
+                    "type": "number",
+                    "description": "number of data points",
+                    "minimum": 10
+                }
+            },
+            "required": ["npts"],
+        }
+
+        ui = {
+            "npts": {"ui:widget": "updown"}
+        }
+        return ConfSchema(json=json, ui=ui)
+
+    def init(self):
+        pass
+
+    def meta_setup(self):
+        columns_out = {
+            'points_df_out': {
+                'x': 'float64',
+                'y': 'float64'
+            },
+        }
+        return MetaData(inports={}, outports=columns_out)
+
+    def process(self, inputs):
+        npts = self.conf['npts']
+        df = pd.DataFrame()
+        df['x'] = np.random.rand(npts)
+        df['y'] = np.random.rand(npts)
+        output = {}
+        if self.outport_connected('points_df_out'):
+            output.update({'points_df_out': df})
+        return output
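Review note: the schema declares `npts` as a JSON `"number"` with a minimum of 10, while `np.random.rand` needs an integer length. The sketch below is a hand-rolled check of those constraints that also rejects non-whole values. It is not gQuant's configuration validation, and the helper name `check_npts` is hypothetical.

```python
def check_npts(conf):
    """Return conf['npts'] as an int, or raise ValueError if it is unusable."""
    if "npts" not in conf:
        raise ValueError("'npts' is required")
    npts = conf["npts"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(npts, bool) or not isinstance(npts, (int, float)):
        raise ValueError("'npts' must be a number")
    if npts < 10:
        raise ValueError("'npts' must be at least 10")
    if not float(npts).is_integer():
        raise ValueError("'npts' must be a whole number to serve as a length")
    return int(npts)


print(check_npts({"npts": 25.0}))  # 25
```

Declaring the property as `"integer"` in the schema would be another way to get the same guarantee, at the cost of changing the committed schema.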

external/setup.py

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+from setuptools import setup, find_packages
+
+setup(
+    name='example_plugin',
+    packages=find_packages(include=['example']),
+    entry_points={
+        'gquant.plugin': [
+            'custom_nodes = example',
+        ],
+    }
+)
