
Commit c874a7a

vatj and robzor92 authored
[FSTORE-1371] Move to pyproject, add badge for supported python version (#1307)
* Add python version and ruff badge
* Update README.md
* Add info to pyproject.toml
* Move to pyproject.toml
* Update ruff, minor adjustments toml
* Switch email
* Add minor Readme changes, further needed
* Minor readme update for app connection
* Correct typo
* Remove comment about not existing setup section
* Fix missing new_line
* Apply ruff format manually to non-changed files
* Update ruff
* Remove manifest.in
* remove ruff pin version
* Include actual package
* Remove console.log
* Add upper limit to the python version
* Update README.md

Co-authored-by: Robin Andersson <[email protected]>

---------

Co-authored-by: Robin Andersson <[email protected]>
1 parent 845c90c commit c874a7a

13 files changed, +195 -157 lines changed

.github/workflows/python-lint.yml

+1-1
Original file line number | Diff line number | Diff line change
@@ -26,7 +26,7 @@ jobs:
2626
- 'python/tests/**/*.py'
2727
2828
- name: install deps
29-
run: pip install ruff==0.3.5
29+
run: pip install ruff==0.4.2
3030

3131
- name: ruff on python files
3232
if: steps.get-changed-files.outputs.src_any_changed == 'true'

README.md

+37-8
@@ -9,6 +9,10 @@
99
src="https://img.shields.io/badge/docs-HSFS-orange"
1010
alt="Hopsworks Feature Store Documentation"
1111
/></a>
12+
<a><img
13+
src="https://img.shields.io/badge/python-3.8+-blue"
14+
alt="python"
15+
/></a>
1216
<a href="https://pypi.org/project/hsfs/"><img
1317
src="https://img.shields.io/pypi/v/hsfs?color=blue"
1418
alt="PyPiStatus"
@@ -21,9 +25,9 @@
2125
src="https://pepy.tech/badge/hsfs/month"
2226
alt="Downloads"
2327
/></a>
24-
<a href="https://github.com/psf/black"><img
25-
src="https://img.shields.io/badge/code%20style-black-000000.svg"
26-
alt="CodeStyle"
28+
<a href="https://github.com/astral-sh/ruff"><img
29+
src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json"
30+
alt="Ruff"
2731
/></a>
2832
<a><img
2933
src="https://img.shields.io/pypi/l/hsfs?color=green"
@@ -41,19 +45,44 @@ The library is environment independent and can be used in two modes:
4145

4246
The library automatically configures itself based on the environment in which it is run.
4347
However, to connect from an external environment such as Databricks or AWS Sagemaker,
44-
additional connection information, such as host and port, is required. For more information about the setup from external environments, see the setup section.
48+
additional connection information, such as host and port, is required. For more information, check out the [Hopsworks documentation](https://docs.hopsworks.ai/latest/).
4549

4650
## Getting Started On Hopsworks
4751

48-
Instantiate a connection and get the project feature store handler
52+
Get started easily by registering an account on [Hopsworks Serverless](https://app.hopsworks.ai/). Create your project and a [new API key](https://docs.hopsworks.ai/latest/user_guides/projects/api_key/create_api_key/). In a new Python environment with Python 3.8 or higher, install the [client library](https://docs.hopsworks.ai/latest/user_guides/client_installation/) using pip:
53+
54+
```bash
55+
# Get all Hopsworks SDKs: Feature Store, Model Serving and Platform SDK
56+
pip install hopsworks
57+
# or minimum install with the Feature Store SDK
58+
pip install hsfs[python]
59+
# if using zsh don't forget the quotes
60+
pip install 'hsfs[python]'
61+
```
62+
63+
Start a notebook, instantiate a connection, and get the project feature store handler.
64+
65+
```python
66+
import hopsworks
67+
68+
project = hopsworks.login() # you will be prompted for your API key
69+
fs = project.get_feature_store()
70+
```
71+
72+
or using `hsfs` directly:
73+
4974
```python
5075
import hsfs
5176

52-
connection = hsfs.connection()
77+
connection = hsfs.connection(
78+
host="c.app.hopsworks.ai",
79+
project="your-project",
80+
api_key_value="your-api-key",
81+
)
5382
fs = connection.get_feature_store()
5483
```
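
The `hsfs.connection(...)` call above takes `host`, `project`, and `api_key_value` as literals. A minimal sketch of collecting the same three arguments from environment variables instead, so the key stays out of the notebook; the variable names `HOPSWORKS_HOST`, `HOPSWORKS_PROJECT`, and `HOPSWORKS_API_KEY` and the helper itself are assumptions made for this illustration, not something this commit defines:

```python
import os
from typing import Dict, Mapping, Optional

# Default host taken from the README snippet above.
DEFAULT_HOST = "c.app.hopsworks.ai"


def connection_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build keyword arguments for hsfs.connection() from environment variables.

    The variable names are hypothetical; adapt them to your own setup.
    """
    env = os.environ if env is None else env
    required = ("HOPSWORKS_PROJECT", "HOPSWORKS_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "host": env.get("HOPSWORKS_HOST", DEFAULT_HOST),
        "project": env["HOPSWORKS_PROJECT"],
        "api_key_value": env["HOPSWORKS_API_KEY"],
    }


# Usage (requires hsfs to be installed and the variables to be set):
#   import hsfs
#   connection = hsfs.connection(**connection_kwargs_from_env())
#   fs = connection.get_feature_store()
```

Passing the mapping explicitly, as in `connection_kwargs_from_env({"HOPSWORKS_PROJECT": "demo", "HOPSWORKS_API_KEY": "..."})`, makes the helper easy to exercise without touching the real environment.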
5584

56-
Create a new feature group
85+
Create a new feature group to start inserting feature values.
5786
```python
5887
fg = fs.create_feature_group("rain",
5988
version=1,
@@ -135,7 +164,7 @@ You can find more examples on how to use the library in our [hops-examples](http
135164

136165
## Usage
137166

138-
Usage data is collected for improving quality of the library. It is turned on by default if the backend
167+
Usage data is collected for improving quality of the library. It is turned on by default if the backend
139168
is "c.app.hopsworks.ai". To turn it off, use one of the following ways:
140169
```python
141170
# use environment variable

docs/css/dropdown.css

+2-5
@@ -12,10 +12,7 @@
1212
.md-tabs {
1313
overflow: inherit;
1414
}
15-
/*
16-
.md-header {
17-
z-index: 1000;
18-
} */
15+
1916

2017
/* The container <div> - needed to position the dropdown content */
2118
.dropdown {
@@ -55,4 +52,4 @@
5552
}
5653

5754
/* Change the background color of the dropdown button when the dropdown content is shown */
58-
.dropdown:hover .dropbtn {}
55+
.dropdown:hover .dropbtn {}

docs/js/inject-api-links.js

-2
@@ -9,13 +9,11 @@ window.addEventListener("DOMContentLoaded", function () {
99
document.getElementById("hsml_api_link").href = "https://docs.hopsworks.ai/machine-learning-api/" + windowPathNameSplits[1] + "/generated/connection_api/";
1010
} else { // on docs.hopsworks.api/feature-store-api/3.0 / docs.hopsworks.api/hopsworks-api/3.0 / docs.hopsworks.api/machine-learning-api/3.0
1111
if (latestRegex.test(windowPathNameSplits[2]) || latestRegex.test(windowPathNameSplits[1])) {
12-
console.log("latest version");
1312
var majorVersion = "latest";
1413
} else {
1514

1615
var apiVersion = windowPathNameSplits[2];
1716
var majorVersion = apiVersion.match(majorVersionRegex)[0];
18-
console.log("specific version", majorVersion);
1917
}
2018
// Version main navigation
2119
document.getElementsByClassName("md-tabs__link")[0].href = "https://docs.hopsworks.ai/" + majorVersion;

docs/js/version-select.js

-2
@@ -29,11 +29,9 @@ window.addEventListener("DOMContentLoaded", function() {
2929
return i.version === CURRENT_VERSION ||
3030
i.aliases.includes(CURRENT_VERSION);
3131
}).version;
32-
console.log("Current version: " + realVersion);
3332
var latestVersion = versions.find(function(i) {
3433
return i.aliases.includes("latest");
3534
}).version;
36-
console.log("Latest version: " + latestVersion);
3735
let outdated_banner = document.querySelector('div[data-md-color-scheme="default"][data-md-component="outdated"]');
3836
if (realVersion !== latestVersion) {
3937
outdated_banner.removeAttribute("hidden");

mkdocs.yml

+1
@@ -45,6 +45,7 @@ nav:
4545
- EmbeddingIndex: generated/api/embedding_index_api.md
4646
- EmbeddingFeature: generated/api/embedding_feature_api.md
4747
- SimilarityFunctionType: generated/api/similarity_function_type_api.md
48+
# Added to allow navigation using the side drawer
4849
- Hopsworks API: https://docs.hopsworks.ai/hopsworks-api/latest/
4950
- MLOps API: https://docs.hopsworks.ai/machine-learning-api/latest/
5051
- Feature Store JavaDoc: https://docs.hopsworks.ai/feature-store-javadoc/latest/

python/.pre-commit-config.yaml

+2-7
@@ -1,13 +1,8 @@
11
exclude: setup.py
22
repos:
33
- repo: https://github.com/astral-sh/ruff-pre-commit
4-
rev: v0.3.5
4+
rev: v0.4.2
55
hooks:
66
- id: ruff
77
args: [--fix]
8-
- id: ruff-format
9-
- repo: https://github.com/pre-commit/pre-commit-hooks
10-
rev: v4.3.0
11-
hooks:
12-
- id: trailing-whitespace
13-
- id: end-of-file-fixer
8+
- id: ruff-format

python/MANIFEST.in

-2
This file was deleted.

python/hsfs/core/opensearch.py

+25-15
@@ -34,8 +34,10 @@
3434

3535

3636
def _is_timeout(exception):
37-
return (isinstance(exception, urllib3.exceptions.ReadTimeoutError)
38-
or isinstance(exception, ConnectionTimeout))
37+
return isinstance(exception, urllib3.exceptions.ReadTimeoutError) or isinstance(
38+
exception, ConnectionTimeout
39+
)
40+
3941

4042
def _handle_opensearch_exception(func):
4143
@wraps(func)
@@ -44,15 +46,18 @@ def error_handler_wrapper(*args, **kw):
4446
return func(*args, **kw)
4547
except Exception as e:
4648
if _is_timeout(e):
47-
raise FeatureStoreException(OpenSearchClientSingleton.TIMEOUT_ERROR_MSG) from e
49+
raise FeatureStoreException(
50+
OpenSearchClientSingleton.TIMEOUT_ERROR_MSG
51+
) from e
4852
else:
4953
raise e
5054

5155
return error_handler_wrapper
5256

57+
5358
class OpensearchRequestOption:
5459
DEFAULT_OPTION_MAP = {
55-
"timeout": 30,
60+
"timeout": 30,
5661
}
5762

5863
@classmethod
@@ -122,18 +127,24 @@ def _refresh_opensearch_connection(self):
122127
@_handle_opensearch_exception
123128
def search(self, index=None, body=None, options=None):
124129
try:
125-
return self._opensearch_client.search(body=body, index=index, params=OpensearchRequestOption.get_options(options))
130+
return self._opensearch_client.search(
131+
body=body,
132+
index=index,
133+
params=OpensearchRequestOption.get_options(options),
134+
)
126135
except (ConnectionError, AuthenticationException):
127136
# OpenSearchConnectionError occurs when connection is closed.
128137
# OpenSearchAuthenticationException occurs when jwt is expired
129138
self._refresh_opensearch_connection()
130-
return self._opensearch_client.search(body=body, index=index,
131-
params=OpensearchRequestOption.get_options(options))
139+
return self._opensearch_client.search(
140+
body=body,
141+
index=index,
142+
params=OpensearchRequestOption.get_options(options),
143+
)
132144
except RequestError as e:
133145
caused_by = e.info.get("error") and e.info["error"].get("caused_by")
134146
if caused_by and caused_by["type"] == "illegal_argument_exception":
135-
raise self._create_vector_database_exception(
136-
caused_by["reason"]) from e
147+
raise self._create_vector_database_exception(caused_by["reason"]) from e
137148
raise VectorDatabaseException(
138149
VectorDatabaseException.OTHERS,
139150
f"Error in Opensearch request: {e}",
@@ -147,9 +158,10 @@ def search(self, index=None, body=None, options=None):
147158
)
148159
@_handle_opensearch_exception
149160
def count(self, index, body=None, options=None):
150-
result = self._opensearch_client.count(index=index, body=body,
151-
params=OpensearchRequestOption.get_options(options))
152-
return result['count']
161+
result = self._opensearch_client.count(
162+
index=index, body=body, params=OpensearchRequestOption.get_options(options)
163+
)
164+
return result["count"]
153165

154166
def close(self):
155167
if self._opensearch_client:
@@ -166,9 +178,7 @@ def _create_vector_database_exception(self, message):
166178
f"Illegal argument in vector database request: "
167179
f"Requested k is too large, it needs to be less than {k}."
168180
)
169-
info = {
170-
VectorDatabaseException.REQUESTED_K_TOO_LARGE_INFO_K: int(
171-
k)}
181+
info = {VectorDatabaseException.REQUESTED_K_TOO_LARGE_INFO_K: int(k)}
172182
else:
173183
reason = VectorDatabaseException.REQUESTED_K_TOO_LARGE
174184
message = "Illegal argument in vector database request: Requested k is too large."
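
The hunks above only reformat `_is_timeout` and `_handle_opensearch_exception`; behavior is unchanged. For readers unfamiliar with the pattern, here is a self-contained sketch of a decorator that translates timeout errors into a domain exception and re-raises everything else. The exception classes and the message text are local stand-ins defined for this example (the real code uses `urllib3.exceptions.ReadTimeoutError`, the OpenSearch `ConnectionTimeout`, and `OpenSearchClientSingleton.TIMEOUT_ERROR_MSG`), and `handle_timeout` is a hypothetical name:

```python
from functools import wraps


# Stand-ins for the library exceptions referenced in the diff.
class ReadTimeoutError(Exception):
    pass


class ConnectionTimeout(Exception):
    pass


class FeatureStoreException(Exception):
    pass


# Placeholder text; the real message lives on OpenSearchClientSingleton.
TIMEOUT_ERROR_MSG = "Request to the vector database timed out."


def _is_timeout(exception):
    # isinstance accepts a tuple, which avoids chaining two calls with `or`.
    return isinstance(exception, (ReadTimeoutError, ConnectionTimeout))


def handle_timeout(func):
    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kw):
        try:
            return func(*args, **kw)
        except Exception as e:
            if _is_timeout(e):
                # `from e` keeps the original error available as __cause__.
                raise FeatureStoreException(TIMEOUT_ERROR_MSG) from e
            raise  # anything else propagates unchanged

    return wrapper


@handle_timeout
def slow_search():
    raise ReadTimeoutError("read timed out")


@handle_timeout
def bad_search():
    raise ValueError("unrelated failure")
```

Calling `slow_search()` raises `FeatureStoreException` with the original timeout attached as `__cause__`, while `bad_search()` still raises `ValueError`.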

python/hsfs/embedding.py

+1-3
@@ -342,9 +342,7 @@ def count(self, options: map = None):
342342
FeaturestoreException: If an error occurs during the count operation.
343343
"""
344344
if self._vector_db_client is None:
345-
self._vector_db_client = VectorDbClient(
346-
self._feature_group.select_all()
347-
)
345+
self._vector_db_client = VectorDbClient(self._feature_group.select_all())
348346
return self._vector_db_client.count(self.feature_group, options=options)
349347

350348
@classmethod

python/pyproject.toml

+117-1
@@ -1,3 +1,118 @@
1+
[project]
2+
name = "hsfs"
3+
dynamic = ["version"]
4+
requires-python = ">=3.8,<=3.12"
5+
readme = "README.md"
6+
description = "HSFS Python SDK to interact with Hopsworks Feature Store"
7+
keywords = [
8+
"Hopsworks",
9+
"Feature Store",
10+
"hsfs",
11+
"Spark",
12+
"Machine Learning",
13+
"MLOps",
14+
"DataOps",
15+
]
16+
authors = [{ name = "Hopsworks AB", email = "[email protected]" }]
17+
license = { text = "Apache-2.0" }
18+
19+
classifiers = [
20+
"Development Status :: 5 - Production/Stable",
21+
"Topic :: Utilities",
22+
"License :: OSI Approved :: Apache Software License",
23+
"Programming Language :: Python :: 3",
24+
"Programming Language :: Python :: 3.8",
25+
"Programming Language :: Python :: 3.9",
26+
"Programming Language :: Python :: 3.10",
27+
"Programming Language :: Python :: 3.11",
28+
"Programming Language :: Python :: 3.12",
29+
"Intended Audience :: Developers",
30+
]
31+
32+
dependencies = [
33+
"pyhumps==1.6.1",
34+
"requests",
35+
"furl",
36+
"boto3",
37+
"pandas<2.2.0",
38+
"numpy<2",
39+
"pyjks",
40+
"mock",
41+
"avro==1.11.3",
42+
"sqlalchemy",
43+
"PyMySQL[rsa]",
44+
"great_expectations==0.18.12",
45+
"tzlocal",
46+
"fsspec",
47+
"retrying",
48+
"aiomysql[sa] @ git+https://[email protected]/logicalclocks/aiomysql",
49+
"polars>=0.20.18,<=0.21.0",
50+
"opensearch-py>=1.1.0,<=2.4.2",
51+
]
52+
53+
[project.optional-dependencies]
54+
dev = [
55+
"pytest==7.4.4",
56+
"pytest-mock==3.12.0",
57+
"ruff",
58+
"pyspark==3.1.1",
59+
"moto[s3]==5.0.0",
60+
"typeguard==4.2.1",
61+
]
62+
dev-pandas1 = [
63+
"pytest==7.4.4",
64+
"pytest-mock==3.12.0",
65+
"ruff",
66+
"pyspark==3.1.1",
67+
"moto[s3]==5.0.0",
68+
"pandas<=1.5.3",
69+
"sqlalchemy<=1.4.48",
70+
]
71+
docs = [
72+
"mkdocs==1.5.3",
73+
"mkdocs-material==9.5.17",
74+
"mike==2.0.0",
75+
"sphinx==7.2.6",
76+
"keras_autodoc @ git+https://[email protected]/logicalclocks/keras-autodoc",
77+
"markdown-include==0.8.1",
78+
"mkdocs-jupyter==0.24.3",
79+
"markdown==3.6",
80+
"pymdown-extensions==10.7.1",
81+
"mkdocs-macros-plugin==1.0.4",
82+
"mkdocs-minify-plugin>=0.2.0",
83+
]
84+
hive = [
85+
"pyhopshive[thrift]",
86+
"pyarrow>=10.0",
87+
"confluent-kafka<=2.3.0",
88+
"fastavro>=1.4.11,<=1.8.4",
89+
]
90+
python = [
91+
"pyhopshive[thrift]",
92+
"pyarrow>=10.0",
93+
"confluent-kafka<=2.3.0",
94+
"fastavro>=1.4.11,<=1.8.4",
95+
"tqdm",
96+
]
97+
98+
[build-system]
99+
requires = ["setuptools", "wheel"]
100+
build-backend = "setuptools.build_meta"
101+
102+
[tool.setuptools.packages.find]
103+
exclude = ["tests*"]
104+
include = ["../Readme.md", "../LICENSE", "hsfs"]
105+
106+
[tool.setuptools.dynamic]
107+
version = { attr = "hsfs.version.__version__" }
108+
109+
[project.urls]
110+
Documentation = "https://docs.hopsworks.ai/latest"
111+
Repository = "https://github.com/logicalclocks/feature-store-api"
112+
Homepage = "https://www.hopsworks.ai"
113+
Community = "https://community.hopsworks.ai"
114+
115+
1116
[tool.ruff]
2117
# Exclude a variety of commonly ignored directories.
3118
exclude = [
@@ -39,14 +154,15 @@ target-version = "py38"
39154

40155
[tool.ruff.lint]
41156
# 1. Enable flake8-bugbear (`B`) rules, in addition to the defaults.
42-
select = ["E4", "E7", "E9", "F", "B", "I"]#, "ANN"]
157+
select = ["E4", "E7", "E9", "F", "B", "I", "W"] #, "ANN"]
43158
ignore = [
44159
"B905", # zip has no strict kwarg until Python 3.10
45160
"ANN101", # Missing type annotation for self in method
46161
"ANN102", # Missing type annotation for cls in classmethod
47162
"ANN003", # Missing type annotation for **kwarg in function
48163
"ANN002", # Missing type annotation for *args in function
49164
"ANN401", # Allow Any in type annotations
165+
"W505", # Doc line too long
50166
]
51167

52168
# Allow fix for all enabled rules (when `--fix`) is provided.
