Trouble uploading datasets to the test server #1159
This is just a warning. The test server is supposed to show these; the production server doesn't. Did you actually have a problem uploading the dataset?
Thanks for the clarification!

```python
from openml.datasets import create_dataset
import sklearn
import numpy as np
from sklearn import datasets
import openml

openml.config.apikey = "API_TEST_KEY"
openml.config.server = "https://test.openml.org/api/v1"

diabetes = sklearn.datasets.load_diabetes()
name = "Diabetes(scikit-learn)"
X = diabetes.data
y = diabetes.target
attribute_names = diabetes.feature_names
description = diabetes.DESCR

data = np.concatenate((X, y.reshape((-1, 1))), axis=1)
attribute_names = list(attribute_names)
attributes = [(attribute_name, "REAL") for attribute_name in attribute_names] + [
    ("class", "INTEGER")
]
citation = (
    "Bradley Efron, Trevor Hastie, Iain Johnstone and "
    "Robert Tibshirani (2004) (Least Angle Regression) "
    "Annals of Statistics (with discussion), 407-499"
)
paper_url = "https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf"

diabetes_dataset = create_dataset(
    # The name of the dataset (needs to be unique).
    # Must not be longer than 128 characters and only contain
    # a-z, A-Z, 0-9 and the following special characters: _\-\.(),
    name=name,
    # Textual description of the dataset.
    description=description,
    # The person who created the dataset.
    creator="Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani",
    # People who contributed to the current version of the dataset.
    contributor=None,
    # The date the data was originally collected, given by the uploader.
    collection_date="09-01-2012",
    # Language in which the data is represented.
    # Starts with 1 upper case letter, rest lower case, e.g. 'English'.
    language="English",
    # License under which the data is/will be distributed.
    licence="BSD (from scikit-learn)",
    # The attribute that represents the target. Can also have multiple
    # values (comma-separated).
    default_target_attribute="class",
    # The attribute that represents the row-id column, if present in the
    # dataset.
    row_id_attribute=None,
    # Attribute or list of attributes that should be excluded in modelling,
    # such as identifiers and indexes. E.g. "feat1" or ["feat1", "feat2"]
    ignore_attribute=None,
    # How to cite the paper.
    citation=citation,
    # Attributes of the data
    attributes=attributes,
    data=data,
    # A version label which is provided by the user.
    version_label="test",
    original_data_url="https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html",
    paper_url=paper_url,
)

diabetes_dataset.publish()
print(f"URL for dataset: {diabetes_dataset.openml_url}")
```

gives me:
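One thing worth double-checking before an upload is the dataset name, since the server rejects names outside its allowed character set. Below is a quick standalone check, assuming the rule quoted in the snippet above (at most 128 characters, drawn from a-z, A-Z, 0-9 and the specials `_ - . ( ) ,`); the helper is hypothetical and not part of openml-python:

```python
import re

# Sketch of the naming rule quoted in the create_dataset comment above:
# at most 128 characters from a-z, A-Z, 0-9 and _ - . ( ) ,
# (hypothetical helper, not part of openml-python).
NAME_RE = re.compile(r"^[a-zA-Z0-9_\-.(),]{1,128}$")

def is_valid_dataset_name(name: str) -> bool:
    return NAME_RE.fullmatch(name) is not None

print(is_valid_dataset_name("Diabetes(scikit-learn)"))  # True
print(is_valid_dataset_name("bad name with spaces"))    # False
```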
Similar for this:

```python
import openml
from sklearn import compose, ensemble, impute, neighbors, preprocessing, pipeline, tree

openml.config.apikey = "TEST_KEY"
openml.config.server = "https://test.openml.org/api/v1"

# NOTE: We are using dataset 68 from the test server: https://test.openml.org/d/68
dataset = openml.datasets.get_dataset(68)
X, y, categorical_indicator, attribute_names = dataset.get_data(
    dataset_format="array", target=dataset.default_target_attribute
)
clf = neighbors.KNeighborsClassifier(n_neighbors=1)
clf.fit(X, y)

dataset = openml.datasets.get_dataset(17)
X, y, categorical_indicator, attribute_names = dataset.get_data(
    dataset_format="array", target=dataset.default_target_attribute
)
print(f"Categorical features: {categorical_indicator}")
transformer = compose.ColumnTransformer(
    [("one_hot_encoder", preprocessing.OneHotEncoder(categories="auto"), categorical_indicator)]
)
X = transformer.fit_transform(X)
clf.fit(X, y)

# Get a task
task = openml.tasks.get_task(403)
# Build any classifier or pipeline
clf = tree.DecisionTreeClassifier()
# Run the flow
run = openml.runs.run_model_on_task(clf, task)
print(run)
myrun = run.publish()
# For this tutorial, our configuration publishes to the test server
# so as not to pollute the main server.
print(f"Uploaded to {myrun.openml_url}")
```
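For the categorical handling in the snippet above, the boolean `categorical_indicator` list returned by `get_data` can be passed straight to `ColumnTransformer` as a column mask. A minimal offline sketch of that pattern, with a toy array standing in for the OpenML dataset (the values and column layout are made up for illustration):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data standing in for an OpenML dataset: two numeric columns and one
# categorical column (values 0/1/2). The boolean list mirrors the
# categorical_indicator returned by dataset.get_data().
X = np.array([[0.5, 1, 0.0],
              [1.5, 2, 1.0],
              [2.5, 0, 0.5]])
categorical_indicator = [False, True, False]

transformer = ColumnTransformer(
    [("one_hot_encoder", OneHotEncoder(categories="auto"), categorical_indicator)],
    remainder="passthrough",  # keep the numeric columns alongside the encoded ones
)
X_t = transformer.fit_transform(X)
# 3 one-hot columns (categories 0, 1, 2) + 2 passthrough numeric columns
print(X_t.shape)  # (3, 5)
```

Using `remainder="passthrough"` keeps the non-categorical columns; the default (`"drop"`), as in the snippet above, would keep only the encoded columns.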
Hi Seb, I recently fixed a number of issues with the test server. Thanks!
It seems that listing datasets from the test server does not work. I have not checked the other things yet (e.g. upload) but will report back when I have.

I have tried with the API as well as through the website.
When trying to upload a dataset to the test server, I encounter the following error: