feat: Modify the user API of skore to respect "what you put is what you get" principle (#1052)

Closes #1045
Closes #734

Refactor the user API to hide all notions of `Item` and `View`, and to
respect "what you put is what you get" from a user's point of view.
Among others:

- Hide item classes in a sub-directory so they are less visible to users
- All `@cached_property` in items have been removed, because items are
no longer used directly by users
- Remove the ability to store anything other than strings in `MediaItem`
- Split `MediaItem` into new item classes
- Add a `PickleItem` class which can persist any object that cannot be
persisted otherwise
- Add a `display_as` parameter to `project.put` to control how a string is
displayed in the frontend
- Remove `project.put_item` so that users only need to use
`project.put`
- The `project.get` function always returns what the user has put
- `project.get` and `project.get_item_versions` have been merged
- The `CrossValidationItem` has been replaced by a
`CrossValidationReporterItem` based on pickle
- To move fast, and because a report is composed of complex objects such
as the estimator, X and y, I chose to persist the report as a
pickle. That way, we can load a report back from storage without
effort. In a next iteration, we should think about how to persist a
report more efficiently and environment-independently, so that it can be
rebuilt entirely from storage.
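
The pickle fallback mentioned in the list above can be illustrated with a minimal, standard-library-only sketch. The `serialize` and `deserialize` helpers are hypothetical names chosen for this illustration and are not part of skore; the sketch only shows the general idea of trying a portable format first and falling back to pickle for objects that format cannot handle.

```python
import json
import pickle


def serialize(value):
    """Return (kind, payload): JSON when possible, pickle as a fallback.

    Hypothetical helper for illustration; not skore's implementation.
    """
    try:
        return "json", json.dumps(value).encode("utf-8")
    except TypeError:
        # json.dumps raises TypeError for unsupported types such as sets
        return "pickle", pickle.dumps(value)


def deserialize(kind, payload):
    """Rebuild the original object from the (kind, payload) pair."""
    if kind == "json":
        return json.loads(payload.decode("utf-8"))
    return pickle.loads(payload)


kind_a, payload_a = serialize({"alpha": 0.1})  # plain dict: JSON is enough
kind_b, payload_b = serialize({1, 2, 3})       # set: falls back to pickle
restored = deserialize(kind_b, payload_b)
```

Pickle payloads depend on the Python environment that produced them and should only be loaded from trusted sources, which is consistent with the wish above for a more environment-independent format in a later iteration.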

---

- [ ] hide item API
    - [x] hide `put_item`
    - [x] hide `get_item`
    - [x] change `get_item_versions` to be item agnostic
- [ ] change the constructor of the `Project` to hide repositories ->
postponed #1160
- [x] update each item class to return its original object
    - [x] cross validation reporter
    - [x] pillow image
    - [x] plotly figure
    - [x] altair figure
    - [x] matplotlib figure
    - [x] media item, to only accept str with `display_as`
    - [x] ~primitive~ -> already 🆗 
    - [x] ~pandas dataframe~ -> already 🆗 
    - [x] ~polars dataframe~ -> already 🆗 
    - [x] ~pandas series~ -> already 🆗 
    - [x] ~polars series~ -> already 🆗 
    - [x] ~numpy array~ -> already 🆗 
    - [x] ~scikit-learn estimator~ -> already 🆗 
- [x] set note in each factory
- [x] update `put` to allow the new parameter `display_as`
- [x] hide view API
- [ ] move `repr_html` to reporters ->
#1161
- [ ] change the way numpy arrays are serialized -> postponed
#1159
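
The user-facing behaviour targeted by this checklist can be sketched with a toy in-memory store. `ToyProject` is a hypothetical class written for this illustration, not skore's `Project`. The method names (`put`, `get`, `keys`, `delete`), the `latest`, `metadata` and `display_as` parameters, and the `"value"` and `"date"` keys mirror what appears in this commit; everything else, including the rule that `display_as` is rejected for non-string values, is an assumption.

```python
from datetime import datetime, timezone


class ToyProject:
    """Hypothetical in-memory store; not skore's implementation."""

    def __init__(self):
        # key -> list of versions, oldest first
        self._versions = {}

    def put(self, key, value, *, display_as=None):
        # Assumption: display_as only makes sense for string values.
        if display_as is not None and not isinstance(value, str):
            raise TypeError("display_as only applies to string values")
        version = {
            "value": value,
            "date": datetime.now(timezone.utc).isoformat(),
            "display_as": display_as,  # kept for a UI, never returned by get
        }
        self._versions.setdefault(key, []).append(version)

    def get(self, key, *, latest=True, metadata=False):
        history = self._versions[key]
        selected = history[-1:] if latest else history
        if metadata:
            results = [{"value": v["value"], "date": v["date"]} for v in selected]
        else:
            results = [v["value"] for v in selected]
        # A single object for the latest version, a list for the full history
        return results[0] if latest else results

    def keys(self):
        return list(self._versions)

    def delete(self, key):
        del self._versions[key]


project = ToyProject()
project.put("my_int", 3)
project.put("my_int", 4)
project.put("my_string", "<b>bold</b>", display_as="HTML")

latest = project.get("my_int")  # the object that was put last
history = project.get("my_int", latest=False, metadata=True)
```

In this sketch `get` returns the stored object itself rather than a wrapper, and `display_as` never changes what comes back, which is the "what you put is what you get" principle in miniature.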

---------

Co-authored-by: Auguste Baum <[email protected]>
thomass-dev and auguste-probabl authored Jan 20, 2025
1 parent 8b2f0c7 commit 5cb9468
Showing 63 changed files with 2,082 additions and 1,426 deletions.
23 changes: 10 additions & 13 deletions examples/getting_started/plot_tracking_items.py
@@ -75,17 +75,14 @@
 # We retrieve the history of the ``my_int`` item:

 # %%
-item_versions = my_project.get_item_versions("my_int")
+history = my_project.get("my_int", latest=False, metadata=True)

 # %%
 # We can print the details of the first version of this item:

 # %%
-first_item = item_versions[0]
-print(first_item)
-print(first_item.primitive)
-print(first_item.created_at)
-print(first_item.updated_at)
+
+print(history[0])

 # %%
 # Let us construct a dataframe with the values and last updated times:
@@ -94,13 +91,13 @@
 import numpy as np
 import pandas as pd

-list_primitive, list_created_at, list_updated_at = zip(
-    *[(elem.primitive, elem.created_at, elem.updated_at) for elem in item_versions]
+list_value, list_created_at, list_updated_at = zip(
+    *[(version["value"], history[0]["date"], version["date"]) for version in history]
 )

 df_track = pd.DataFrame(
     {
-        "primitive": list_primitive,
+        "value": list_value,
         "created_at": list_created_at,
         "updated_at": list_updated_at,
     }
@@ -113,9 +110,9 @@
 # :language: python
 #
 # Notice that the ``created_at`` dates are the same for all iterations because they
-# correspond to the same item, but the ``updated_at`` dates are spaced by 0.1 second
-# (approximately) as we used :python:`time.sleep(0.1)` between each
-# :func:`~skore.Project.put`.
+# correspond to the date of the first version of the item, but the ``updated_at`` dates
+# are spaced by 0.1 second (approximately) as we used :python:`time.sleep(0.1)` between
+# each :func:`~skore.Project.put`.

 # %%
 # We can now track the value of the item over time:
@@ -126,7 +123,7 @@
 fig = px.line(
     df_track,
     x="version_number",
-    y="primitive",
+    y="value",
     hover_data=df_track.columns,
     markers=True,
 )
24 changes: 9 additions & 15 deletions examples/getting_started/plot_working_with_projects.py
@@ -71,20 +71,20 @@
 # see :ref:`example_tracking_items`.

 # %%
-# By using the :func:`~skore.Project.delete_item` method, we can also delete an object
+# By using the :func:`~skore.Project.delete` method, we can also delete an object
 # so that our skore UI does not become cluttered:

 # %%
 my_project.put("my_int_2", 10)

 # %%
-my_project.delete_item("my_int_2")
+my_project.delete("my_int_2")

 # %%
 # We can display all the keys in our project:

 # %%
-my_project.list_item_keys()
+my_project.keys()

 # %%
 # Storing strings and texts
@@ -119,25 +119,19 @@ def my_func(x):
 )

 # %%
-# Moreover, we can also explicitly tell skore the media type of an object, for example
-# in HTML:
+# Moreover, we can also explicitly tell skore the way we want to display an object, for
+# example in HTML:

 # %%
-from skore.item import MediaItem

-my_project.put_item(
+my_project.put(
     "my_string_3",
-    MediaItem.factory(
-        "<p><h1>Title</h1> <b>bold</b>, <i>italic</i>, etc.</p>", media_type="text/html"
-    ),
+    "<p><h1>Title</h1> <b>bold</b>, <i>italic</i>, etc.</p>",
+    display_as="HTML",
 )

 # %%
-# .. note::
-# We used :func:`~skore.Project.put_item` instead of :func:`~skore.Project.put`.
-
-# %%
-# Note that the media type is only used for the UI, and not in this notebook at hand:
+# Note that the `display_as` is only used for the UI, and not in this notebook at hand:

 # %%
 my_project.get("my_string_3")
11 changes: 6 additions & 5 deletions examples/model_evaluation/plot_cross_validate.py
@@ -155,15 +155,16 @@
 reporter.plots.scores

 # %%
-# We can also access the plot after we have stored the ``CrossValidationReporter``:
-my_project.put("cross_validation_regression", reporter)
-cv_item = my_project.get_item("cross_validation_regression")
-cv_item.plots["Scores"]
+# We can put the reporter in the project, and retrieve it as is:
+my_project.put("cross_validation_reporter", reporter)
+
+reporter = my_project.get("cross_validation_reporter")
+reporter.plots.scores

 # %%
 # .. note::
 #
-# If we put a cross-validation item in a skore project, we get some nice
+# If we put a cross-validation reporter in a skore project, we get some nice
 # information in the UI:
 #
 # .. image:: https://media.githubusercontent.com/media/probabl-ai/skore/main/sphinx/_static/images/2024_12_12_skore_demo_comp.gif
3 changes: 2 additions & 1 deletion skore/pyproject.toml
@@ -7,9 +7,10 @@ maintainers = [{ name = "skore developers", email = "[email protected]" }]
 dependencies = [
     "diskcache",
     "fastapi",
+    "joblib",
+    "matplotlib",
     "numpy",
     "pandas",
-    "matplotlib",
     "plotly>=5,<6",
     "pyarrow",
     "rich",
63 changes: 0 additions & 63 deletions skore/src/skore/item/__init__.py

This file was deleted.
