diff --git a/customize-pins-metadata.html b/customize-pins-metadata.html
index cd088a0..03120da 100644
--- a/customize-pins-metadata.html
+++ b/customize-pins-metadata.html
@@ -2,7 +2,7 @@
-
+
@@ -190,7 +190,7 @@

Create consistent metadata for pins

The metadata argument in pins is flexible and can hold any kind of metadata that you can express as a dict (convertible to JSON). In some situations, you may want to read and write with consistent customized metadata; you can create functions to wrap pin_write and pin_read for your particular use case.
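
As a rough illustration of the wrapping idea, here is a minimal sketch that runs without pins installed. The InMemoryBoard class, the REQUIRED_KEYS convention, and the pin_write_with_meta and pin_read_with_meta helpers are all hypothetical names invented for this sketch; only the pin_write(x, name, type, metadata) call shape mirrors what the rest of this page uses.

```python
import json


class InMemoryBoard:
    """Hypothetical stand-in for a pins board; NOT part of the pins package."""

    def __init__(self):
        self._pins = {}

    def pin_write(self, x, name, type="json", metadata=None):
        # Round-trip through JSON so non-serializable data or metadata fails loudly.
        self._pins[name] = {
            "data": json.loads(json.dumps(x)),
            "user": json.loads(json.dumps(metadata or {})),
        }

    def pin_read(self, name):
        return self._pins[name]["data"]

    def pin_user_meta(self, name):
        # Hypothetical accessor; stands in for reading the user metadata of a pin.
        return self._pins[name]["user"]


REQUIRED_KEYS = ("owner", "schema_version")  # hypothetical project convention


def pin_write_with_meta(board, x, name, *, owner, schema_version, **extra):
    """Write a pin whose metadata always carries the required keys."""
    metadata = {"owner": owner, "schema_version": schema_version, **extra}
    board.pin_write(x, name=name, type="json", metadata=metadata)


def pin_read_with_meta(board, name):
    """Read a pin and its metadata, checking that the required keys are present."""
    user = board.pin_user_meta(name)
    missing = [key for key in REQUIRED_KEYS if key not in user]
    if missing:
        raise KeyError(f"pin {name!r} is missing metadata keys: {missing}")
    return board.pin_read(name), user


board = InMemoryBoard()
pin_write_with_meta(board, [1, 2, 3], "numbers", owner="data-team", schema_version=1)
data, user = pin_read_with_meta(board, "numbers")
```

With a real board, the reader would presumably take the user metadata from board.pin_meta(name).user instead of the stand-in accessor, as the categorical example on this page does.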

We’ll begin by creating a temporary board for demonstration:

-
+
import pins
 import pandas as pd
 
@@ -202,7 +202,7 @@ 

Create consistent metadata for pins

A function to store pandas Categoricals

Say you want to store a pandas Categorical object as JSON together with the categories of the categorical in the metadata.

For example, here is a simple categorical and its categories:

-
+
some_cat = pd.Categorical(["a", "a", "b"])
 
 some_cat.categories
@@ -212,7 +212,7 @@

A function to store pandas Categoricals

Notice that the categories attribute is just the unique values in the categorical.

We can write a function wrapping pin_write that holds the categories in metadata, so we can easily re-create the categorical with them.

-
+
def pin_write_cat_json(
     board,
     x: pd.Categorical,
@@ -224,39 +224,39 @@ 

A function to store pandas Categoricals

board.pin_write(json_data, name = name, type = "json", metadata = metadata, **kwargs)

We can use this new function to write a pin as JSON with our specific metadata:

-
+
some_cat = pd.Categorical(["a", "a", "b", "c"])
 pin_write_cat_json(board, some_cat, name = "some-cat")
Writing pin:
 Name: 'some-cat'
-Version: 20240322T204927Z-6ce8e
+Version: 20240329T183955Z-6ce8e

A function to read categoricals

It’s possible to read this pin using the regular pin_read function, but the object we get is no longer a categorical!

-
+
board.pin_read("some-cat")
['a', 'a', 'b', 'c']

However, notice that if we use pin_meta, the information we stored on categories is in the .user field.

-
+
pprint(
     board.pin_meta("some-cat")
 )
Meta(title='some-cat: a pinned list object',
      description=None,
-     created='20240322T204927Z',
+     created='20240329T183955Z',
      pin_hash='6ce8eaa9de0dfd54',
      file='some-cat.json',
      file_size=20,
      type='json',
      api_version=1,
-     version=Version(created=datetime.datetime(2024, 3, 22, 20, 49, 27),
+     version=Version(created=datetime.datetime(2024, 3, 29, 18, 39, 55),
                      hash='6ce8e'),
      tags=None,
      name='some-cat',
@@ -265,7 +265,7 @@ 

A function

This enables us to write a special function for reading, to reconstruct the categorical, using the categories stashed in metadata:

-
+
def pin_read_cat_json(board, name, version=None, hash=None, **kwargs):
   data = board.pin_read(name = name, version = version, hash = hash, **kwargs)
   meta = board.pin_meta(name = name, version = version, **kwargs)
diff --git a/get_started.html b/get_started.html
index 7b3984c..8c336cf 100644
--- a/get_started.html
+++ b/get_started.html
@@ -2,7 +2,7 @@
 
 
 
-
+
 
 
 
@@ -192,13 +192,13 @@ 

Get started with pins

The pins package helps you publish data sets, models, and other Python objects, making it easy to share them across projects and with your colleagues. You can pin objects to a variety of “boards”, including local folders (to share on a networked drive or with DropBox), Posit Connect, Amazon S3, Google Cloud Storage, Azure, and more. This vignette will introduce you to the basics of pins.

-
+
from pins import board_local, board_folder, board_temp, board_url

Getting started

Every pin lives in a pin board, so you must start by creating a pin board. In this vignette I’ll use a temporary board which is automatically deleted when your Python session is over:

-
+
board = board_temp()

In real life, you’d pick a board depending on how you want to share the data. Here are a few options:

@@ -210,19 +210,19 @@

Getting started

Reading and writing data

Once you have a pin board, you can write data to it with the pin_write method:

-
+
from pins.data import mtcars
 
 meta = board.pin_write(mtcars, "mtcars", type="csv")
Writing pin:
 Name: 'mtcars'
-Version: 20240322T204942Z-3b134
+Version: 20240329T184009Z-3b134

The first argument is the object to save (usually a data frame, but it can be any Python object), and the second argument gives the “name” of the pin. The name is basically equivalent to a file name; you’ll use it when you later want to read the data from the pin. The only rule for a pin name is that it can’t contain slashes.

After you’ve pinned an object, you can read it back with pin_read:

-
+
board.pin_read("mtcars")
@@ -444,10 +444,10 @@

How and wha

Metadata

Every pin is accompanied by some metadata that you can access with pin_meta:

-
+
board.pin_meta("mtcars")
-
Meta(title='mtcars: a pinned 32 x 11 DataFrame', description=None, created='20240322T204942Z', pin_hash='3b134bae183b50c9', file='mtcars.csv', file_size=1333, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 22, 20, 49, 42), hash='3b134'), tags=None, name='mtcars', user={}, local={})
+
Meta(title='mtcars: a pinned 32 x 11 DataFrame', description=None, created='20240329T184009Z', pin_hash='3b134bae183b50c9', file='mtcars.csv', file_size=1333, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 29, 18, 40, 9), hash='3b134'), tags=None, name='mtcars', user={}, local={})

This shows you the metadata that’s generated by default. This includes:

@@ -459,7 +459,7 @@

Metadata

  • a unique pin_hash that you can supply to pin_read to ensure that you’re reading exactly the data that you expect.
When creating the pin, you can override the default description or provide additional metadata that is stored with the data:

    -
    +
    board.pin_write(
         mtcars,
         name="mtcars2",
    @@ -472,16 +472,16 @@ 

    Metadata

    Writing pin:
     Name: 'mtcars2'
    -Version: 20240322T204942Z-3b134
    +Version: 20240329T184009Z-3b134
    -
    Meta(title='mtcars2: a pinned 32 x 11 DataFrame', description='Data extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).', created='20240322T204942Z', pin_hash='3b134bae183b50c9', file='mtcars2.csv', file_size=1333, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 22, 20, 49, 42, 326458), hash='3b134bae183b50c9'), tags=None, name='mtcars2', user={'source': 'Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.'}, local={})
    +
    Meta(title='mtcars2: a pinned 32 x 11 DataFrame', description='Data extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).', created='20240329T184009Z', pin_hash='3b134bae183b50c9', file='mtcars2.csv', file_size=1333, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 29, 18, 40, 9, 626562), hash='3b134bae183b50c9'), tags=None, name='mtcars2', user={'source': 'Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.'}, local={})
    -
    +
    board.pin_meta("mtcars")
    -
    Meta(title='mtcars: a pinned 32 x 11 DataFrame', description=None, created='20240322T204942Z', pin_hash='3b134bae183b50c9', file='mtcars.csv', file_size=1333, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 22, 20, 49, 42), hash='3b134'), tags=None, name='mtcars', user={}, local={})
    +
    Meta(title='mtcars: a pinned 32 x 11 DataFrame', description=None, created='20240329T184009Z', pin_hash='3b134bae183b50c9', file='mtcars.csv', file_size=1333, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 29, 18, 40, 9), hash='3b134'), tags=None, name='mtcars', user={}, local={})

    While we’ll do our best to keep the automatically generated metadata consistent over time, I’d recommend manually capturing anything you really care about in metadata.

    @@ -489,7 +489,7 @@

    Metadata

    Versioning

    Every pin_write will create a new version:

    -
    +
    board2 = board_temp()
     board2.pin_write([1,2,3,4,5], name = "x", type = "json")
     board2.pin_write([1,2,3], name = "x", type = "json")
    @@ -498,13 +498,13 @@ 

    Versioning

    Writing pin:
     Name: 'x'
    -Version: 20240322T204942Z-2bc5d
    +Version: 20240329T184009Z-2bc5d
     Writing pin:
     Name: 'x'
    -Version: 20240322T204942Z-c24c0
    +Version: 20240329T184009Z-c24c0
     Writing pin:
     Name: 'x'
    -Version: 20240322T204942Z-91d9a
    +Version: 20240329T184009Z-91d9a
    @@ -523,21 +523,21 @@

    Versioning

    0
    -2024-03-22 20:49:42
    +2024-03-29 18:40:09
    2bc5d
    -20240322T204942Z-2bc5d
    +20240329T184009Z-2bc5d
    1
    -2024-03-22 20:49:42
    +2024-03-29 18:40:09
    91d9a
    -20240322T204942Z-91d9a
    +20240329T184009Z-91d9a
    2
    -2024-03-22 20:49:42
    +2024-03-29 18:40:09
    c24c0
    -20240322T204942Z-c24c0
    +20240329T184009Z-c24c0
    @@ -547,14 +547,14 @@

    Versioning

    By default, pin_read will return the most recent version:

    -
    +
    board2.pin_read("x")
    [1, 2, 3]

    But you can request an older version by supplying the version argument:

    -
    +
    version = board2.pin_versions("x").version[1]
     board2.pin_read("x", version = version)
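
The version strings in the output above look like a UTC timestamp followed by a short hash, which matches the created and hash fields shown in the Meta output. Assuming that layout holds, a small helper (parse_pin_version is a hypothetical name, not a pins function) can split a version string into its two parts:

```python
from datetime import datetime


def parse_pin_version(version: str):
    """Split a version string such as '20240329T184009Z-2bc5d' into its parts.

    Assumes the '<timestamp>Z-<short hash>' layout seen in the output above.
    """
    stamp, _, short_hash = version.partition("-")
    created = datetime.strptime(stamp, "%Y%m%dT%H%M%SZ")
    return created, short_hash


created, short_hash = parse_pin_version("20240329T184009Z-2bc5d")
```

This can be handy when you want to pick a version by date rather than by position in the pin_versions table.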
    @@ -579,7 +579,7 @@

    Storing models

    You can write a pin with type="joblib" to store arbitrary Python objects, including fitted models from packages like scikit-learn.

    For example, suppose you wanted to store a custom namedtuple object.

    -
    +
    from collections import namedtuple
     
     board3 = board_temp(allow_pickle_read=True)
    @@ -593,14 +593,14 @@ 

    Storing models

    Using type="joblib" lets you store and read back the custom coords object.

    -
    +
    board3.pin_write(coords, "my_coords", type="joblib")
     
     board3.pin_read("my_coords")
    Writing pin:
     Name: 'my_coords'
    -Version: 20240322T204942Z-d5e4a
    +Version: 20240329T184009Z-d5e4a
    Coords(x=1, y=2)
    @@ -611,13 +611,13 @@

    Storing models

    Caching

    The primary purpose of pins is to make it easy to share data. But pins is also designed to help you spend as little time as possible downloading data. pin_read and pin_download automatically cache remote pins: they maintain a local copy of the data (so it’s fast) but always check that it’s up-to-date (so your analysis doesn’t use stale data).

    Wouldn’t it be nice if you could take advantage of this feature for any dataset on the internet? That’s the idea behind board_url; you can assemble your own board from datasets, wherever they live on the internet. For example, this code creates a board containing a single pin, penguins, that refers to some fun data I found on GitHub:

    -
    +
    my_data = board_url("", {
       "penguins": "https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins_raw.csv"
     })

    You can read this data by combining pin_download with read_csv from pandas:

    -
    +
    fname = my_data.pin_download("penguins")
     
     fname
    @@ -625,7 +625,7 @@

    Caching

    ['/home/runner/.cache/pins-py/http_e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/e6ac0d2da33fad7e72df6b900933a691b89ed7d54ec0e4a36fe45c32d7e2f67e_penguins_raw.csv']
    -
    +
    import pandas as pd
     
     pd.read_csv(fname[0]).head()
    @@ -765,7 +765,7 @@

    Caching

    -
    +
    my_data.pin_download("penguins")
    ['/home/runner/.cache/pins-py/http_e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/e6ac0d2da33fad7e72df6b900933a691b89ed7d54ec0e4a36fe45c32d7e2f67e_penguins_raw.csv']
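
The first folder in the cache path above, after the http_ prefix, happens to equal the SHA-256 digest of the empty string. That is consistent with the cache folder being named after a hash of the board path (here the empty string passed to board_url), but this is an inference from the output, not documented behavior. The sketch below only illustrates the general idea of a stable, hash-derived cache folder; cache_dir_name is a hypothetical helper, not a pins function.

```python
import hashlib


def cache_dir_name(protocol: str, board_path: str) -> str:
    """Hypothetical helper: name a cache folder after the hashed board path."""
    digest = hashlib.sha256(board_path.encode("utf-8")).hexdigest()
    return f"{protocol}_{digest}"


# An empty board path, as in board_url("", {...}) above.
folder = cache_dir_name("http", "")
```

Because the folder name depends only on the protocol and the board path, repeated runs would resolve to the same folder, which is what lets a cached copy be reused between sessions.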
    diff --git a/index.html b/index.html
    index ff8dddd..3e185ed 100644
    --- a/index.html
    +++ b/index.html
    @@ -2,7 +2,7 @@
    -
    +
    @@ -203,27 +203,27 @@

    Installation

    Usage

    To use the pins package, you must first create a pin board. A good place to start is board_folder, which stores pins in a directory you specify. Here we’ll use a special version of board_folder called board_temp which creates a temporary board that’s automatically deleted when your Python script or notebook session ends. This is great for examples, but obviously you shouldn’t use it for real work!

    -
    +
    import pins
     from pins.data import mtcars
     
     board = pins.board_temp()

    You can “pin” (save) data to a board with the pin_write method. It requires three arguments: an object, a name, and a pin type:

    -
    +
    board.pin_write(mtcars.head(), "mtcars", type="csv")
    Writing pin:
     Name: 'mtcars'
    -Version: 20240322T204930Z-120a5
    +Version: 20240329T183958Z-120a5
    -
    Meta(title='mtcars: a pinned 5 x 11 DataFrame', description=None, created='20240322T204930Z', pin_hash='120a54f7e0818041', file='mtcars.csv', file_size=249, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 22, 20, 49, 30, 764591), hash='120a54f7e0818041'), tags=None, name='mtcars', user={}, local={})
    +
    Meta(title='mtcars: a pinned 5 x 11 DataFrame', description=None, created='20240329T183958Z', pin_hash='120a54f7e0818041', file='mtcars.csv', file_size=249, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 29, 18, 39, 58, 661525), hash='120a54f7e0818041'), tags=None, name='mtcars', user={}, local={})

    Above, we saved the data as a CSV, but depending on what you’re saving and who else you want to read it, you might use the type argument to instead save it as a joblib, parquet, or json file.

    You can later retrieve the pinned data with pin_read:

    -
    +
    board.pin_read("mtcars")
    diff --git a/reference/board.html b/reference/board.html
    index e48992d..68b8f5a 100644
    --- a/reference/board.html
    +++ b/reference/board.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/board_azure.html b/reference/board_azure.html
    index 0e16a5b..47ed481 100644
    --- a/reference/board_azure.html
    +++ b/reference/board_azure.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/board_connect.html b/reference/board_connect.html
    index 70f5dd6..b87873d 100644
    --- a/reference/board_connect.html
    +++ b/reference/board_connect.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/board_folder.html b/reference/board_folder.html
    index 2685294..5eb5b01 100644
    --- a/reference/board_folder.html
    +++ b/reference/board_folder.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/board_gcs.html b/reference/board_gcs.html
    index 26ff508..de79625 100644
    --- a/reference/board_gcs.html
    +++ b/reference/board_gcs.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/board_local.html b/reference/board_local.html
    index 8958966..ad7ac79 100644
    --- a/reference/board_local.html
    +++ b/reference/board_local.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/board_s3.html b/reference/board_s3.html
    index 0783e7e..8a672a1 100644
    --- a/reference/board_s3.html
    +++ b/reference/board_s3.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/board_temp.html b/reference/board_temp.html
    index 309d370..f28755b 100644
    --- a/reference/board_temp.html
    +++ b/reference/board_temp.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/board_url.html b/reference/board_url.html
    index e115fa6..0be5200 100644
    --- a/reference/board_url.html
    +++ b/reference/board_url.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/index.html b/reference/index.html
    index 8970ea7..bdc8c8b 100644
    --- a/reference/index.html
    +++ b/reference/index.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/pin_delete.html b/reference/pin_delete.html
    index eb2f449..644b83d 100644
    --- a/reference/pin_delete.html
    +++ b/reference/pin_delete.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/pin_download.html b/reference/pin_download.html
    index 0b498eb..8cac1fe 100644
    --- a/reference/pin_download.html
    +++ b/reference/pin_download.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/pin_exists.html b/reference/pin_exists.html
    index e22f604..fc8e39a 100644
    --- a/reference/pin_exists.html
    +++ b/reference/pin_exists.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/pin_list.html b/reference/pin_list.html
    index 29df435..dbcf14b 100644
    --- a/reference/pin_list.html
    +++ b/reference/pin_list.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/pin_meta.html b/reference/pin_meta.html
    index a0b2d9d..1618c89 100644
    --- a/reference/pin_meta.html
    +++ b/reference/pin_meta.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/pin_read.html b/reference/pin_read.html
    index 5442883..3909ad6 100644
    --- a/reference/pin_read.html
    +++ b/reference/pin_read.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/pin_search.html b/reference/pin_search.html
    index 6437f7f..44872c0 100644
    --- a/reference/pin_search.html
    +++ b/reference/pin_search.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/pin_upload.html b/reference/pin_upload.html
    index c07c303..3aef07e 100644
    --- a/reference/pin_upload.html
    +++ b/reference/pin_upload.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/pin_version_delete.html b/reference/pin_version_delete.html
    index c355955..926da1d 100644
    --- a/reference/pin_version_delete.html
    +++ b/reference/pin_version_delete.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/pin_versions.html b/reference/pin_versions.html
    index 14418e2..0cbe071 100644
    --- a/reference/pin_versions.html
    +++ b/reference/pin_versions.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/pin_versions_prune.html b/reference/pin_versions_prune.html
    index 933667b..e499abd 100644
    --- a/reference/pin_versions_prune.html
    +++ b/reference/pin_versions_prune.html
    @@ -2,7 +2,7 @@
    -
    +
    diff --git a/reference/pin_write.html b/reference/pin_write.html
    index a9e13b8..bb37307 100644
    --- a/reference/pin_write.html
    +++ 
b/reference/pin_write.html @@ -2,7 +2,7 @@ - + diff --git a/search.json b/search.json index d6b22a1..b60ad85 100644 --- a/search.json +++ b/search.json @@ -18,7 +18,7 @@ "href": "get_started.html#reading-and-writing-data", "title": "Get started with pins", "section": "Reading and writing data", - "text": "Reading and writing data\nOnce you have a pin board, you can write data to it with the pin_write method:\n\nfrom pins.data import mtcars\n\nmeta = board.pin_write(mtcars, \"mtcars\", type=\"csv\")\n\nWriting pin:\nName: 'mtcars'\nVersion: 20240322T204942Z-3b134\n\n\nThe first argument is the object to save (usually a data frame, but it can be any Python object), and the second argument gives the “name” of the pin. The name is basically equivalent to a file name; you’ll use it when you later want to read the data from the pin. The only rule for a pin name is that it can’t contain slashes.\nAfter you’ve pinned an object, you can read it back with pin_read:\n\nboard.pin_read(\"mtcars\")\n\n\n\n\n\n\n\n\n\nmpg\ncyl\ndisp\nhp\ndrat\nwt\nqsec\nvs\nam\ngear\ncarb\n\n\n\n\n0\n21.0\n6\n160.0\n110\n3.90\n2.620\n16.46\n0\n1\n4\n4\n\n\n1\n21.0\n6\n160.0\n110\n3.90\n2.875\n17.02\n0\n1\n4\n4\n\n\n2\n22.8\n4\n108.0\n93\n3.85\n2.320\n18.61\n1\n1\n4\n1\n\n\n3\n21.4\n6\n258.0\n110\n3.08\n3.215\n19.44\n1\n0\n3\n1\n\n\n4\n18.7\n8\n360.0\n175\n3.15\n3.440\n17.02\n0\n0\n3\n2\n\n\n...\n...\n...\n...\n...\n...\n...\n...\n...\n...\n...\n...\n\n\n27\n30.4\n4\n95.1\n113\n3.77\n1.513\n16.90\n1\n1\n5\n2\n\n\n28\n15.8\n8\n351.0\n264\n4.22\n3.170\n14.50\n0\n1\n5\n4\n\n\n29\n19.7\n6\n145.0\n175\n3.62\n2.770\n15.50\n0\n1\n5\n6\n\n\n30\n15.0\n8\n301.0\n335\n3.54\n3.570\n14.60\n0\n1\n5\n8\n\n\n31\n21.4\n4\n121.0\n109\n4.11\n2.780\n18.60\n1\n1\n4\n2\n\n\n\n\n32 rows × 11 columns\n\n\n\n\nYou don’t need to supply the file type when reading data from a pin because pins automatically stores the file type in the metadata.\n\n\n\n\n\n\nNote\n\n\n\nIf you are using the Posit Connect board board_connect, 
then you must specify your pin name as \"user_name/content_name\". For example, \"hadley/sales-report\"." + "text": "Reading and writing data\nOnce you have a pin board, you can write data to it with the pin_write method:\n\nfrom pins.data import mtcars\n\nmeta = board.pin_write(mtcars, \"mtcars\", type=\"csv\")\n\nWriting pin:\nName: 'mtcars'\nVersion: 20240329T184009Z-3b134\n\n\nThe first argument is the object to save (usually a data frame, but it can be any Python object), and the second argument gives the “name” of the pin. The name is basically equivalent to a file name; you’ll use it when you later want to read the data from the pin. The only rule for a pin name is that it can’t contain slashes.\nAfter you’ve pinned an object, you can read it back with pin_read:\n\nboard.pin_read(\"mtcars\")\n\n\n\n\n\n\n\n\n\nmpg\ncyl\ndisp\nhp\ndrat\nwt\nqsec\nvs\nam\ngear\ncarb\n\n\n\n\n0\n21.0\n6\n160.0\n110\n3.90\n2.620\n16.46\n0\n1\n4\n4\n\n\n1\n21.0\n6\n160.0\n110\n3.90\n2.875\n17.02\n0\n1\n4\n4\n\n\n2\n22.8\n4\n108.0\n93\n3.85\n2.320\n18.61\n1\n1\n4\n1\n\n\n3\n21.4\n6\n258.0\n110\n3.08\n3.215\n19.44\n1\n0\n3\n1\n\n\n4\n18.7\n8\n360.0\n175\n3.15\n3.440\n17.02\n0\n0\n3\n2\n\n\n...\n...\n...\n...\n...\n...\n...\n...\n...\n...\n...\n...\n\n\n27\n30.4\n4\n95.1\n113\n3.77\n1.513\n16.90\n1\n1\n5\n2\n\n\n28\n15.8\n8\n351.0\n264\n4.22\n3.170\n14.50\n0\n1\n5\n4\n\n\n29\n19.7\n6\n145.0\n175\n3.62\n2.770\n15.50\n0\n1\n5\n6\n\n\n30\n15.0\n8\n301.0\n335\n3.54\n3.570\n14.60\n0\n1\n5\n8\n\n\n31\n21.4\n4\n121.0\n109\n4.11\n2.780\n18.60\n1\n1\n4\n2\n\n\n\n\n32 rows × 11 columns\n\n\n\n\nYou don’t need to supply the file type when reading data from a pin because pins automatically stores the file type in the metadata.\n\n\n\n\n\n\nNote\n\n\n\nIf you are using the Posit Connect board board_connect, then you must specify your pin name as \"user_name/content_name\". For example, \"hadley/sales-report\"." 
}, { "objectID": "get_started.html#how-and-what-to-store-as-a-pin", @@ -32,21 +32,21 @@ "href": "get_started.html#metadata", "title": "Get started with pins", "section": "Metadata", - "text": "Metadata\nEvery pin is accompanied by some metadata that you can access with pin_meta:\n\nboard.pin_meta(\"mtcars\")\n\nMeta(title='mtcars: a pinned 32 x 11 DataFrame', description=None, created='20240322T204942Z', pin_hash='3b134bae183b50c9', file='mtcars.csv', file_size=1333, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 22, 20, 49, 42), hash='3b134'), tags=None, name='mtcars', user={}, local={})\n\n\nThis shows you the metadata that’s generated by default. This includes:\n\ntitle, a brief textual description of the dataset.\nan optional description, where you can provide more details.\nthe date-time when the pin was created.\nthe file_size, in bytes, of the underlying files.\na unique pin_hash that you can supply to pin_read to ensure that you’re reading exactly the data that you expect.\n\nWhen creating the pin, you can override the default description or provide additional metadata that is stored with the data:\n\nboard.pin_write(\n mtcars,\n name=\"mtcars2\",\n type=\"csv\",\n description = \"Data extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).\",\n metadata = {\n \"source\": \"Henderson and Velleman (1981), Building multiple regression models interactively. 
Biometrics, 37, 391–411.\"\n }\n)\n\nWriting pin:\nName: 'mtcars2'\nVersion: 20240322T204942Z-3b134\n\n\nMeta(title='mtcars2: a pinned 32 x 11 DataFrame', description='Data extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).', created='20240322T204942Z', pin_hash='3b134bae183b50c9', file='mtcars2.csv', file_size=1333, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 22, 20, 49, 42, 326458), hash='3b134bae183b50c9'), tags=None, name='mtcars2', user={'source': 'Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.'}, local={})\n\n\n\nboard.pin_meta(\"mtcars\")\n\nMeta(title='mtcars: a pinned 32 x 11 DataFrame', description=None, created='20240322T204942Z', pin_hash='3b134bae183b50c9', file='mtcars.csv', file_size=1333, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 22, 20, 49, 42), hash='3b134'), tags=None, name='mtcars', user={}, local={})\n\n\nWhile we’ll do our best to keep the automatically generated metadata consistent over time, I’d recommend manually capturing anything you really care about in metadata." + "text": "Metadata\nEvery pin is accompanied by some metadata that you can access with pin_meta:\n\nboard.pin_meta(\"mtcars\")\n\nMeta(title='mtcars: a pinned 32 x 11 DataFrame', description=None, created='20240329T184009Z', pin_hash='3b134bae183b50c9', file='mtcars.csv', file_size=1333, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 29, 18, 40, 9), hash='3b134'), tags=None, name='mtcars', user={}, local={})\n\n\nThis shows you the metadata that’s generated by default. 
This includes:\n\ntitle, a brief textual description of the dataset.\nan optional description, where you can provide more details.\nthe date-time when the pin was created.\nthe file_size, in bytes, of the underlying files.\na unique pin_hash that you can supply to pin_read to ensure that you’re reading exactly the data that you expect.\n\nWhen creating the pin, you can override the default description or provide additional metadata that is stored with the data:\n\nboard.pin_write(\n mtcars,\n name=\"mtcars2\",\n type=\"csv\",\n description = \"Data extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).\",\n metadata = {\n \"source\": \"Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.\"\n }\n)\n\nWriting pin:\nName: 'mtcars2'\nVersion: 20240329T184009Z-3b134\n\n\nMeta(title='mtcars2: a pinned 32 x 11 DataFrame', description='Data extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).', created='20240329T184009Z', pin_hash='3b134bae183b50c9', file='mtcars2.csv', file_size=1333, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 29, 18, 40, 9, 626562), hash='3b134bae183b50c9'), tags=None, name='mtcars2', user={'source': 'Henderson and Velleman (1981), Building multiple regression models interactively. 
Biometrics, 37, 391–411.'}, local={})\n\n\n\nboard.pin_meta(\"mtcars\")\n\nMeta(title='mtcars: a pinned 32 x 11 DataFrame', description=None, created='20240329T184009Z', pin_hash='3b134bae183b50c9', file='mtcars.csv', file_size=1333, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 29, 18, 40, 9), hash='3b134'), tags=None, name='mtcars', user={}, local={})\n\n\nWhile we’ll do our best to keep the automatically generated metadata consistent over time, I’d recommend manually capturing anything you really care about in metadata." }, { "objectID": "get_started.html#versioning", "href": "get_started.html#versioning", "title": "Get started with pins", "section": "Versioning", - "text": "Versioning\nEvery pin_write will create a new version:\n\nboard2 = board_temp()\nboard2.pin_write([1,2,3,4,5], name = \"x\", type = \"json\")\nboard2.pin_write([1,2,3], name = \"x\", type = \"json\")\nboard2.pin_write([1,2], name = \"x\", type = \"json\")\nboard2.pin_versions(\"x\")\n\nWriting pin:\nName: 'x'\nVersion: 20240322T204942Z-2bc5d\nWriting pin:\nName: 'x'\nVersion: 20240322T204942Z-c24c0\nWriting pin:\nName: 'x'\nVersion: 20240322T204942Z-91d9a\n\n\n\n\n\n\n\n\n\n\ncreated\nhash\nversion\n\n\n\n\n0\n2024-03-22 20:49:42\n2bc5d\n20240322T204942Z-2bc5d\n\n\n1\n2024-03-22 20:49:42\n91d9a\n20240322T204942Z-91d9a\n\n\n2\n2024-03-22 20:49:42\nc24c0\n20240322T204942Z-c24c0\n\n\n\n\n\n\n\n\nBy default, pin_read will return the most recent version:\n\nboard2.pin_read(\"x\")\n\n[1, 2, 3]\n\n\nBut you can request an older version by supplying the version argument:\n\nversion = board2.pin_versions(\"x\").version[1]\nboard2.pin_read(\"x\", version = version)\n\n[1, 2]" + "text": "Versioning\nEvery pin_write will create a new version:\n\nboard2 = board_temp()\nboard2.pin_write([1,2,3,4,5], name = \"x\", type = \"json\")\nboard2.pin_write([1,2,3], name = \"x\", type = \"json\")\nboard2.pin_write([1,2], name = \"x\", type = 
\"json\")\nboard2.pin_versions(\"x\")\n\nWriting pin:\nName: 'x'\nVersion: 20240329T184009Z-2bc5d\nWriting pin:\nName: 'x'\nVersion: 20240329T184009Z-c24c0\nWriting pin:\nName: 'x'\nVersion: 20240329T184009Z-91d9a\n\n\n\n\n\n\n\n\n\n\ncreated\nhash\nversion\n\n\n\n\n0\n2024-03-29 18:40:09\n2bc5d\n20240329T184009Z-2bc5d\n\n\n1\n2024-03-29 18:40:09\n91d9a\n20240329T184009Z-91d9a\n\n\n2\n2024-03-29 18:40:09\nc24c0\n20240329T184009Z-c24c0\n\n\n\n\n\n\n\n\nBy default, pin_read will return the most recent version:\n\nboard2.pin_read(\"x\")\n\n[1, 2, 3]\n\n\nBut you can request an older version by supplying the version argument:\n\nversion = board2.pin_versions(\"x\").version[1]\nboard2.pin_read(\"x\", version = version)\n\n[1, 2]" }, { "objectID": "get_started.html#storing-models", "href": "get_started.html#storing-models", "title": "Get started with pins", "section": "Storing models", - "text": "Storing models\n\n\n\n\n\n\nWarning\n\n\n\nThe examples in this section use joblib to read and write data. Joblib uses the pickle format, and pickle files are not secure. Only read pickle files you trust. In order to read pickle files, set the allow_pickle_read=True argument. Learn more about pickling.\n\n\nYou can write a pin with type=\"joblib\" to store arbitrary python objects, including fitted models from packages like scikit-learn.\nFor example, suppose you wanted to store a custom namedtuple object.\n\nfrom collections import namedtuple\n\nboard3 = board_temp(allow_pickle_read=True)\n\nCoords = namedtuple(\"Coords\", [\"x\", \"y\"])\ncoords = Coords(1, 2)\n\ncoords\n\nCoords(x=1, y=2)\n\n\nUsing type=\"joblib\" lets you store and read back the custom coords object.\n\nboard3.pin_write(coords, \"my_coords\", type=\"joblib\")\n\nboard3.pin_read(\"my_coords\")\n\nWriting pin:\nName: 'my_coords'\nVersion: 20240322T204942Z-d5e4a\n\n\nCoords(x=1, y=2)" + "text": "Storing models\n\n\n\n\n\n\nWarning\n\n\n\nThe examples in this section use joblib to read and write data. 
Joblib uses the pickle format, and pickle files are not secure. Only read pickle files you trust. In order to read pickle files, set the allow_pickle_read=True argument. Learn more about pickling.\n\n\nYou can write a pin with type=\"joblib\" to store arbitrary python objects, including fitted models from packages like scikit-learn.\nFor example, suppose you wanted to store a custom namedtuple object.\n\nfrom collections import namedtuple\n\nboard3 = board_temp(allow_pickle_read=True)\n\nCoords = namedtuple(\"Coords\", [\"x\", \"y\"])\ncoords = Coords(1, 2)\n\ncoords\n\nCoords(x=1, y=2)\n\n\nUsing type=\"joblib\" lets you store and read back the custom coords object.\n\nboard3.pin_write(coords, \"my_coords\", type=\"joblib\")\n\nboard3.pin_read(\"my_coords\")\n\nWriting pin:\nName: 'my_coords'\nVersion: 20240329T184009Z-d5e4a\n\n\nCoords(x=1, y=2)" }, { "objectID": "get_started.html#caching", @@ -391,7 +391,7 @@ "href": "customize-pins-metadata.html#a-function-to-read-categoricals", "title": "Create consistent metadata for pins", "section": "A function to read categoricals", - "text": "A function to read categoricals\nIt’s possible to read this pin using the regular pin_read function, but the object we get is no longer a categorical!\n\nboard.pin_read(\"some-cat\")\n\n['a', 'a', 'b', 'c']\n\n\nHowever, notice that if we use pin_meta, the information we stored on categories is in the .user field.\n\npprint(\n board.pin_meta(\"some-cat\")\n)\n\nMeta(title='some-cat: a pinned list object',\n description=None,\n created='20240322T204927Z',\n pin_hash='6ce8eaa9de0dfd54',\n file='some-cat.json',\n file_size=20,\n type='json',\n api_version=1,\n version=Version(created=datetime.datetime(2024, 3, 22, 20, 49, 27),\n hash='6ce8e'),\n tags=None,\n name='some-cat',\n user={'categories': ['a', 'b', 'c']},\n local={})\n\n\nThis enables us to write a special function for reading, to reconstruct the categorical, using the categories stashed in metadata:\n\ndef 
pin_read_cat_json(board, name, version=None, hash=None, **kwargs):\n data = board.pin_read(name = name, version = version, hash = hash, **kwargs)\n meta = board.pin_meta(name = name, version = version, **kwargs)\n return pd.Categorical(data, categories=meta.user[\"categories\"])\n\npin_read_cat_json(board, \"some-cat\")\n\n['a', 'a', 'b', 'c']\nCategories (3, object): ['a', 'b', 'c']\n\n\nFor an example of how this approach is used in a real project, look at look at how the vetiver package wraps these functions to write and read model binaries as pins." + "text": "A function to read categoricals\nIt’s possible to read this pin using the regular pin_read function, but the object we get is no longer a categorical!\n\nboard.pin_read(\"some-cat\")\n\n['a', 'a', 'b', 'c']\n\n\nHowever, notice that if we use pin_meta, the information we stored on categories is in the .user field.\n\npprint(\n board.pin_meta(\"some-cat\")\n)\n\nMeta(title='some-cat: a pinned list object',\n description=None,\n created='20240329T183955Z',\n pin_hash='6ce8eaa9de0dfd54',\n file='some-cat.json',\n file_size=20,\n type='json',\n api_version=1,\n version=Version(created=datetime.datetime(2024, 3, 29, 18, 39, 55),\n hash='6ce8e'),\n tags=None,\n name='some-cat',\n user={'categories': ['a', 'b', 'c']},\n local={})\n\n\nThis enables us to write a special function for reading, to reconstruct the categorical, using the categories stashed in metadata:\n\ndef pin_read_cat_json(board, name, version=None, hash=None, **kwargs):\n data = board.pin_read(name = name, version = version, hash = hash, **kwargs)\n meta = board.pin_meta(name = name, version = version, **kwargs)\n return pd.Categorical(data, categories=meta.user[\"categories\"])\n\npin_read_cat_json(board, \"some-cat\")\n\n['a', 'a', 'b', 'c']\nCategories (3, object): ['a', 'b', 'c']\n\n\nFor an example of how this approach is used in a real project, look at how the vetiver package wraps these functions to write and read model binaries as 
pins." }, { "objectID": "index.html", @@ -412,7 +412,7 @@ "href": "index.html#usage", "title": "pins ", "section": "Usage", - "text": "Usage\nTo use the pins package, you must first create a pin board. A good place to start is board_folder, which stores pins in a directory you specify. Here we’ll use a special version of board_folder called board_temp which creates a temporary board that’s automatically deleted when your Python script or notebook session ends. This is great for examples, but obviously you shouldn’t use it for real work!\n\nimport pins\nfrom pins.data import mtcars\n\nboard = pins.board_temp()\n\nYou can “pin” (save) data to a board with the pin_write method. It requires three arguments: an object, a name, and a pin type:\n\nboard.pin_write(mtcars.head(), \"mtcars\", type=\"csv\")\n\nWriting pin:\nName: 'mtcars'\nVersion: 20240322T204930Z-120a5\n\n\nMeta(title='mtcars: a pinned 5 x 11 DataFrame', description=None, created='20240322T204930Z', pin_hash='120a54f7e0818041', file='mtcars.csv', file_size=249, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 22, 20, 49, 30, 764591), hash='120a54f7e0818041'), tags=None, name='mtcars', user={}, local={})\n\n\nAbove, we saved the data as a CSV, but depending on what you’re saving and who else you want to read it, you might use the type argument to instead save it as a joblib, parquet, or json file.\nYou can later retrieve the pinned data with pin_read:\n\nboard.pin_read(\"mtcars\")\n\n\n\n\n\n\n\n\n\nmpg\ncyl\ndisp\nhp\ndrat\nwt\nqsec\nvs\nam\ngear\ncarb\n\n\n\n\n0\n21.0\n6\n160.0\n110\n3.90\n2.620\n16.46\n0\n1\n4\n4\n\n\n1\n21.0\n6\n160.0\n110\n3.90\n2.875\n17.02\n0\n1\n4\n4\n\n\n2\n22.8\n4\n108.0\n93\n3.85\n2.320\n18.61\n1\n1\n4\n1\n\n\n3\n21.4\n6\n258.0\n110\n3.08\n3.215\n19.44\n1\n0\n3\n1\n\n\n4\n18.7\n8\n360.0\n175\n3.15\n3.440\n17.02\n0\n0\n3\n2\n\n\n\n\n\n\n\n\nA board on your computer is good place to start, but the real power of pins comes when you use a board that’s 
shared with multiple people. To get started, you can use board_folder with a directory on a shared drive or in DropBox, or if you use Posit Connect you can use board_connect:\n# Note that this uses one approach to connecting,\n# the environment variables CONNECT_SERVER and CONNECT_API_KEY\n\nboard = pins.board_connect()\nboard.pin_write(tidy_sales_data, \"hadley/sales-summary\", type=\"csv\")\nThen, someone else (or an automated report) can read and use your pin:\nboard = board_connect()\nboard.pin_read(\"hadley/sales-summary\")\nYou can easily control who gets to access the data using the Posit Connect permissions pane.\nThe pins package also includes boards that allow you to share data on services like Amazon’s S3 (board_s3), Google Cloud Storage (board_gcs), and Azure blob storage (board_azure)." + "text": "Usage\nTo use the pins package, you must first create a pin board. A good place to start is board_folder, which stores pins in a directory you specify. Here we’ll use a special version of board_folder called board_temp which creates a temporary board that’s automatically deleted when your Python script or notebook session ends. This is great for examples, but obviously you shouldn’t use it for real work!\n\nimport pins\nfrom pins.data import mtcars\n\nboard = pins.board_temp()\n\nYou can “pin” (save) data to a board with the pin_write method. 
It requires three arguments: an object, a name, and a pin type:\n\nboard.pin_write(mtcars.head(), \"mtcars\", type=\"csv\")\n\nWriting pin:\nName: 'mtcars'\nVersion: 20240329T183958Z-120a5\n\n\nMeta(title='mtcars: a pinned 5 x 11 DataFrame', description=None, created='20240329T183958Z', pin_hash='120a54f7e0818041', file='mtcars.csv', file_size=249, type='csv', api_version=1, version=Version(created=datetime.datetime(2024, 3, 29, 18, 39, 58, 661525), hash='120a54f7e0818041'), tags=None, name='mtcars', user={}, local={})\n\n\nAbove, we saved the data as a CSV, but depending on what you’re saving and who else you want to read it, you might use the type argument to instead save it as a joblib, parquet, or json file.\nYou can later retrieve the pinned data with pin_read:\n\nboard.pin_read(\"mtcars\")\n\n\n\n\n\n\n\n\n\nmpg\ncyl\ndisp\nhp\ndrat\nwt\nqsec\nvs\nam\ngear\ncarb\n\n\n\n\n0\n21.0\n6\n160.0\n110\n3.90\n2.620\n16.46\n0\n1\n4\n4\n\n\n1\n21.0\n6\n160.0\n110\n3.90\n2.875\n17.02\n0\n1\n4\n4\n\n\n2\n22.8\n4\n108.0\n93\n3.85\n2.320\n18.61\n1\n1\n4\n1\n\n\n3\n21.4\n6\n258.0\n110\n3.08\n3.215\n19.44\n1\n0\n3\n1\n\n\n4\n18.7\n8\n360.0\n175\n3.15\n3.440\n17.02\n0\n0\n3\n2\n\n\n\n\n\n\n\n\nA board on your computer is a good place to start, but the real power of pins comes when you use a board that’s shared with multiple people. 
To get started, you can use board_folder with a directory on a shared drive or in Dropbox, or if you use Posit Connect you can use board_connect:\n# Note that this uses one approach to connecting,\n# the environment variables CONNECT_SERVER and CONNECT_API_KEY\n\nboard = pins.board_connect()\nboard.pin_write(tidy_sales_data, \"hadley/sales-summary\", type=\"csv\")\nThen, someone else (or an automated report) can read and use your pin:\nboard = pins.board_connect()\nboard.pin_read(\"hadley/sales-summary\")\nYou can easily control who gets to access the data using the Posit Connect permissions pane.\nThe pins package also includes boards that allow you to share data on services like Amazon’s S3 (board_s3), Google Cloud Storage (board_gcs), and Azure blob storage (board_azure)." }, { "objectID": "index.html#contributing",