I'm looking at using pins to version various model assets, but one of the pain points I'm running into is the lack of custom drivers for reading and writing. Many of our model artifacts are serialized to YAML, ONNX, or a custom format, and there's no driver for any of those. There's friction too if we wanted, say, a polars driver to read a csv or parquet directly into polars.
This leaves pin_upload and pin_download, which, while workable, don't provide the same level of integration as pin_read and pin_write. If you use a user-defined helper with pin_download to read a custom type, then users have to know in advance to use that helper, versus just board.pin_read(name) for officially supported types.
For example, to show the difference in experience between a csv file and yaml:
import pandas as pd
import yaml
from pins import board_temp

board = board_temp()

data = pd.DataFrame({"a": [1, 2, 3]})
board.pin_write(data, "pandas_data", type="csv")

with open("data.yml", "w") as f:
    yaml.dump({"a": [1, 2, 3]}, f)
board.pin_upload("data.yml", name="yaml_data")

# Reading
pandas_data_read = board.pin_read("pandas_data")
yaml_path = board.pin_download("yaml_data")[0]
with open(yaml_path) as f:
    yaml_data_read = yaml.safe_load(f)

# Using helper
def pin_read_yaml(board, *args, **kwargs):
    yaml_path = board.pin_download(*args, **kwargs)[0]
    with open(yaml_path) as f:
        return yaml.safe_load(f)

yaml_helper_read = pin_read_yaml(board, "yaml_data")  # need to know in advance to use yaml, not just pin_read(name)
Ideally, I want to be able to specify custom types that are associated with custom drivers. The custom type would be written to data.txt, which would allow pin_read to dispatch to the associated driver, as is done for officially supported types, without needing to know in advance to use a user-defined helper.
I'm also seeing similarities between this problem and "dynamic artifact handling" in another library we work on (lazyscribe/lazyscribe#56), where we used handler classes and entrypoints to allow users to define custom artifact serializers/deserializers.
Anyway, please let me know if I'm missing anything here and what you think!
Hello there 👋 Thank you for this feedback--I'm glad you were able to make pin_upload/pin_download work for your use case! At this point in time, there are no plans to allow custom pin types for base reading and writing, since pin_upload/pin_download is in place as an escape hatch for this purpose. That being said, I'll leave this issue open in case others in the community have input.
I am going to break out supporting yaml as a built-in type into a separate issue, since I think that is a super reasonable driver to have for reading and writing!