Typo repair and PEP8 cleanup (#1190)
* Fixed typos, spelling, and grammar

* Fixed several simple PEP warnings

* Reverted changes to _vendor folder

* Black formatting

* Black linting

* Reset _vendor folder after black formatting
InterferencePattern authored Nov 28, 2022
1 parent 6764209 commit 2061104
Showing 112 changed files with 360 additions and 336 deletions.
2 changes: 1 addition & 1 deletion R/inst/tutorials/02-statistics/README.md
@@ -1,6 +1,6 @@
# Episode 02-statistics: Is this Data Science?

**Use metaflow to load the movie metadata CSV file into a data frame and compute some movie genre specific statistics. These statistics are then used in
**Use metaflow to load the movie metadata CSV file into a data frame and compute some movie genre-specific statistics. These statistics are then used in
later examples to improve our playlist generator. You can optionally use the
Metaflow client to eyeball the results in a Markdown Notebook, and make some simple
plots.**
2 changes: 1 addition & 1 deletion R/inst/tutorials/02-statistics/stats.Rmd
@@ -5,7 +5,7 @@ output:
df_print: paged
---

MovieStatsFlow loads the movie metadata CSV file into a Pandas Dataframe and computes some movie genre specific statistics. You can use this notebook and the Metaflow client to eyeball the results and make some simple plots.
MovieStatsFlow loads the movie metadata CSV file into a Pandas Dataframe and computes some movie genre-specific statistics. You can use this notebook and the Metaflow client to eyeball the results and make some simple plots.

```{r}
suppressPackageStartupMessages(library(metaflow))
2 changes: 1 addition & 1 deletion R/inst/tutorials/05-statistics-redux/README.md
@@ -6,7 +6,7 @@ running on remote compute. In this example we re-run the 'stats.R' workflow
adding the '--with batch' command line argument. This instructs Metaflow to run
all your steps on AWS batch without changing any code. You can control the
behavior with additional arguments, like '--max-workers'. For this example,
'max-workers' is used to limit the number of parallel genre specific statistics
'max-workers' is used to limit the number of parallel genre-specific statistics
computations.
You can then access the data artifacts (even the local CSV file) from anywhere
because the data is being stored in AWS S3.**
18 changes: 9 additions & 9 deletions docs/Environment escape.md
@@ -20,7 +20,7 @@ but *some* can execute in another Python environment.
At a high-level, the environment escape plugin allows a Python interpreter to
forward calls to another interpreter. To set semantics, we will say that a
*client* interpreter escapes to a *server* interpreter. The *server* interpreter
operates in a slave-like mode with regards to the *client*. To give a concrete
operates in a slave-like mode with regard to the *client*. To give a concrete
example, imagine a package ``data_accessor`` that is available in the base
environment you are executing in but not in your Conda environment. When
executing within the Conda environment, the *client* interpreter is the Conda
@@ -69,7 +69,7 @@ identifier to find the correct stub. There is therefore a **one-to-one mapping
between stub objects on the client and backing objects on the server**.

The next method called on ```job``` is ```wait``` which returns ```None```. In
this system, by design, only certain objects are able to be transferred between
this system, by design, only certain objects may be transferred between
the client and the server:
- any Python basic type; this can be extended to any object that can be pickled
without any external library;
@@ -224,9 +224,9 @@ everything to the server:
performs computations at the request of the client when the client is unable
to do so.

The server is thus started by the client and the client is responsible for
terminating it when it dies. A big part of the client and server code consist
in loading the configuration for the emulated module, particularly the
The server is thus started by the client, and the client is responsible for
terminating the server when it dies. A big part of the client and server code
consist in loading the configuration for the emulated module, particularly the
overrides.

The steps to bringing up the client/server connection are as follows:
@@ -274,7 +274,7 @@ used).

## Defining an emulated module

To define an emulated module, you need to create a sub directory in
To define an emulated module, you need to create a subdirectory in
```plugins/env_escape/configurations``` called ```emulate_<name>``` where
```<name>``` is the name of the library you want to emulate. It can be a "list"
where ```__``` is the list separator; this allows multiple libraries to be
@@ -286,9 +286,9 @@ create two files:
- ```EXPORTED_CLASSES```: This is a dictionary of dictionary describing the
whitelisted classes. The outermost key is either a string or a tuple of
strings and corresponds to the "module" name (it doesn't really have to be
the module but the prefix of the full name of the whitelisted class)). The
the module but the prefix of the full name of the whitelisted class). The
inner key is a string and corresponds to the suffix of the whitelisted
class. Finally, the value is the class that the class maps to internally. If
class. Finally, the value is the class to which the class maps internally. If
the outermost key is a tuple, all strings in that tuple will be considered
aliases of one another.
- ```EXPORTED_FUNCTIONS```: This is the same structure as
@@ -324,7 +324,7 @@ create two files:
define how attributes are accessed. Note that this is not restricted to
attributes accessed using the ```getattr``` and ```setattr``` functions but
any attribute. Both of these functions take as arguments ```stub```,
```name``` and ```func``` which is the function to call to call the remote
```name``` and ```func``` which is the function to call in order to call the remote
```getattr``` or ```setattr```. The ```setattr``` version takes an additional
```value``` argument. The remote versions simply take the target object and
the name of the attribute (and ```value``` if it is a ```setattr``` override)
18 changes: 9 additions & 9 deletions docs/cards.md
@@ -29,7 +29,7 @@ Metaflow cards can be created by placing an [`@card` decorator](#@card-decorator

Since the cards are stored in the datastore we can access them via the `view/get` commands in the [card_cli](#card-cli) or by using the `get_cards` [function](../metaflow/plugins/cards/card_client.py).

Metaflow ships with a [DefaultCard](#defaultcard) which visualizes artifacts, images, and `pandas.Dataframe`s. Metaflow also ships custom components like `Image`, `Table`, `Markdown` etc. These can be added to a card at `Task` runtime. Cards can also be edited from `@step` code using the [current.card](#editing-metaflowcard-from-@step-code) interface. `current.card` helps add `MetaflowCardComponent`s from `@step` code to a `MetaflowCard`. `current.card` offers methods like `current.card.append` or `current.card['myid']` to helps add components to a card. Since there can be many `@card`s over a `@step`, `@card` also comes with an `id` argument. The `id` argument helps disambigaute the card a component goes to when using `current.card`. For example, setting `@card(id='myid')` and calling `current.card['myid'].append(x)` will append `MetaflowCardComponent` `x` to the card with `id='myid'`.
Metaflow ships with a [DefaultCard](#defaultcard) which visualizes artifacts, images, and `pandas.Dataframe`s. Metaflow also ships custom components like `Image`, `Table`, `Markdown` etc. These can be added to a card at `Task` runtime. Cards can also be edited from `@step` code using the [current.card](#editing-metaflowcard-from-@step-code) interface. `current.card` helps add `MetaflowCardComponent`s from `@step` code to a `MetaflowCard`. `current.card` offers methods like `current.card.append` or `current.card['myid']` to helps add components to a card. Since there can be many `@card`s over a `@step`, `@card` also comes with an `id` argument. The `id` argument helps disambiguate the card a component goes to when using `current.card`. For example, setting `@card(id='myid')` and calling `current.card['myid'].append(x)` will append `MetaflowCardComponent` `x` to the card with `id='myid'`.

### `@card` decorator
The `@card` [decorator](../metaflow/plugins/cards/card_decorator.py) is implemented by inheriting the `StepDecorator`. The decorator can be placed over `@step` to create an HTML file visualizing information from the task.
@@ -75,7 +75,7 @@ if __name__ == "__main__":


### `CardDatastore`
The [CardDatastore](../metaflow/plugins/cards/card_datastore.py) is used by the the [card_cli](#card-cli) and the [metaflow card client](#access-cards-in-notebooks) (`get_cards`). It exposes methods to get metadata about a card and the paths to cards for a `pathspec`.
The [CardDatastore](../metaflow/plugins/cards/card_datastore.py) is used by the [card_cli](#card-cli) and the [metaflow card client](#access-cards-in-notebooks) (`get_cards`). It exposes methods to get metadata about a card and the paths to cards for a `pathspec`.

### Card CLI
Methods exposed by the [card_cli](../metaflow/plugins/cards/.card_cli.py). :
@@ -142,12 +142,12 @@ class CustomCard(MetaflowCard):

The class consists of the `_get_mustache` method that returns [chevron](https://github.com/noahmorrison/chevron) object ( a `mustache` based [templating engine](http://mustache.github.io/mustache.5.html) ). Using the `mustache` templating engine you can rewrite HTML template file. In the above example the `PATH_TO_CUSTOM_HTML` is the file that holds the `mustache` HTML template.
#### Attributes
- `type (str)` : The `type` of card. Needs to ensure correct resolution.
- `ALLOW_USER_COMPONENTS (bool)` : Setting this to `True` will make the a card be user editable. More information on user editable cards can be found [here](#editing-metaflowcard-from-@step-code).
- `type (str)` : The `type` of card. Needs to ensure correct resolution.
- `ALLOW_USER_COMPONENTS (bool)` : Setting this to `True` will make the card be user editable. More information on user editable cards can be found [here](#editing-metaflowcard-from-@step-code).

#### `__init__` Parameters
- `components` `(List[str])`: `components` is a list of `render`ed `MetaflowCardComponent`s created at `@step` runtime. These are passed to the `card create` cli command via a tempfile path in the `--component-file` argument.
- `graph` `(Dict[str,dict])`: The DAG associated to the flow. It is a dictionary of the form `stepname:step_attributes`. `step_attributes` is a dictionary of metadata about a step , `stepname` is the name of the step in the DAG.
- `graph` `(Dict[str,dict])`: The DAG associated to the flow. It is a dictionary of the form `stepname:step_attributes`. `step_attributes` is a dictionary of metadata about a step , `stepname` is the name of the step in the DAG.
- `options` `(dict)`: helps control the behavior of individual cards.
- For example, the `DefaultCard` supports `options` as dictionary of the form `{"only_repr":True}`. Here setting `only_repr` as `True` will ensure that all artifacts are serialized with `reprlib.repr` function instead of native object serialization.

@@ -201,7 +201,7 @@ class CustomCard(MetaflowCard):
```

### `DefaultCard`
The [DefaultCard](../metaflow/plugins/cards/card_modules/basic.py) is a default card exposed by metaflow. This will be used when the `@card` decorator is called without any `type` argument or called with `type='default'` argument. It will also be the default card used with cli. The card uses a [HTML template](../metaflow/plugins/cards/card_modules/base.html) along with a [JS](../metaflow/plugins/cards/card_modules/main.js) and a [CSS](../metaflow/plugins/cards/card_modules/bundle.css) files.
The [DefaultCard](../metaflow/plugins/cards/card_modules/basic.py) is a default card exposed by metaflow. This will be used when the `@card` decorator is called without any `type` argument or called with `type='default'` argument. It will also be the default card used with cli. The card uses an [HTML template](../metaflow/plugins/cards/card_modules/base.html) along with a [JS](../metaflow/plugins/cards/card_modules/main.js) and a [CSS](../metaflow/plugins/cards/card_modules/bundle.css) files.

The [HTML](../metaflow/plugins/cards/card_modules/base.html) is a template which works with [JS](../metaflow/plugins/cards/card_modules/main.js) and [CSS](../metaflow/plugins/cards/card_modules/bundle.css).

@@ -237,17 +237,17 @@ def train(self):
)
self.next(self.end)
```
In the above scenario there are two `@card` decorators which are being customized by `current.card`. The `current.card.append`/ `current.card['a'].append` methods only accepts objects which are subclasses of `MetaflowCardComponent`. The `current.card.append`/ `current.card['a'].append` methods only add a component to **one** card. Since there can be many cards for a `@step`, a **default editabled card** is resolved to disambiguate which card has access to the `append`/`extend` methods within the `@step`. A default editable card is a card that will have access to the `current.card.append`/`current.card.extend` methods. `current.card` resolve the default editable card before a `@step` code gets executed. It sets the default editable card once the last `@card` decorator calls the `task_pre_step` callback. In the above case, `current.card.append` will add a `Markdown` component to the card of type `default`. `current.card['a'].append` will add the `Markdown` to the `blank` card whose `id` is `a`. A `MetaflowCard` can be user editable, if `ALLOW_USER_COMPONENTS` is set to `True`. Since cards can be of many types, **some cards can also be non editable by users** (Cards with `ALLOW_USER_COMPONENTS=False`). Those cards won't be eligible to access the `current.card.append`. A non user editable card can be edited through expicitly setting an `id` and accessing it via `current.card['myid'].append` or by looking it up by its type via `current.card.get(type=’pytorch’)`.
In the above scenario there are two `@card` decorators which are being customized by `current.card`. The `current.card.append`/ `current.card['a'].append` methods only accepts objects which are subclasses of `MetaflowCardComponent`. The `current.card.append`/ `current.card['a'].append` methods only add a component to **one** card. Since there can be many cards for a `@step`, a **default editable card** is resolved to disambiguate which card has access to the `append`/`extend` methods within the `@step`. A default editable card is a card that will have access to the `current.card.append`/`current.card.extend` methods. `current.card` resolve the default editable card before a `@step` code gets executed. It sets the default editable card once the last `@card` decorator calls the `task_pre_step` callback. In the above case, `current.card.append` will add a `Markdown` component to the card of type `default`. `current.card['a'].append` will add the `Markdown` to the `blank` card whose `id` is `a`. A `MetaflowCard` can be user editable, if `ALLOW_USER_COMPONENTS` is set to `True`. Since cards can be of many types, **some cards can also be non-editable by users** (Cards with `ALLOW_USER_COMPONENTS=False`). Those cards won't be eligible to access the `current.card.append`. A non-user editable card can be edited through explicitly setting an `id` and accessing it via `current.card['myid'].append` or by looking it up by its type via `current.card.get(type=’pytorch’)`.

#### `current.card` (`CardComponentCollector`)

The `CardComponentCollector` is the object responsible for resolving a `MetaflowCardComponent` to the card referenced in the `@card` decorator.

Since there can be many cards, `CardComponentCollector` has a `_finalize` function. The `_finalize` function is called once the **last** `@card` decorator calls `task_pre_step`. The `_finalize` function will try to find the **default editable card** from all the `@card` decorators on the `@step`. The default editable card is the card that can access the `current.card.append`/`current.card.extend` methods. If there are multiple editable cards with no `id` then `current.card` will throw warnings when users call `current.card.append`. This is done because `current.card` cannot resolve which card the component belongs.
Since there can be many cards, `CardComponentCollector` has a `_finalize` function. The `_finalize` function is called once the **last** `@card` decorator calls `task_pre_step`. The `_finalize` function will try to find the **default editable card** from all the `@card` decorators on the `@step`. The default editable card is the card that can access the `current.card.append`/`current.card.extend` methods. If there are multiple editable cards with no `id` then `current.card` will throw warnings when users call `current.card.append`. This is done because `current.card` cannot resolve which card the component belongs.

The `@card` decorator also exposes another argument called `customize=True`. **Only one `@card` decorator over a `@step` can have `customize=True`**. Since cards can also be added from CLI when running a flow, adding `@card(customize=True)` will set **that particular card** from the decorator as default editable. This means that `current.card.append` will append to the card belonging to `@card` with `customize=True`. If there is more than one `@card` decorator with `customize=True` then `current.card` will throw warnings that `append` won't work.

One important feature of the `current.card` object is that it will not fail. Even when users try to access `current.card.append` with multiple editable cards, we throw warnings but don't fail. `current.card` will also not fail when a user tries to access a card of a non-existing id via `current.card['mycard']`. Since `current.card['mycard']` gives reference to a `list` of `MetaflowCardComponent`s, `current.card` will return a non-referenced `list` when users try to access the dictionary inteface with a non existing id (`current.card['my_non_existant_card']`).
One important feature of the `current.card` object is that it will not fail. Even when users try to access `current.card.append` with multiple editable cards, we throw warnings but don't fail. `current.card` will also not fail when a user tries to access a card of a non-existing id via `current.card['mycard']`. Since `current.card['mycard']` gives reference to a `list` of `MetaflowCardComponent`s, `current.card` will return a non-referenced `list` when users try to access the dictionary interface with a nonexistent id (`current.card['my_non_existant_card']`).
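
The non-failing lookup described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyCardCollector` and its `warnings` list are hypothetical names invented for illustration, and the class is not Metaflow's `CardComponentCollector`. It shows the one idea from the paragraph: a known id returns the live component list, while an unknown id produces a warning and a throwaway list instead of an exception.

```python
class ToyCardCollector:
    """Hypothetical stand-in for the behavior described above (not Metaflow code)."""

    def __init__(self, card_ids):
        # One live component list per known card id.
        self._components = {card_id: [] for card_id in card_ids}
        # Warnings are recorded here instead of raising an error.
        self.warnings = []

    def __getitem__(self, card_id):
        if card_id in self._components:
            return self._components[card_id]
        # Unknown id: record a warning and return a list that nothing else
        # references, so appends to it are silently discarded.
        self.warnings.append(f"no card with id {card_id!r}")
        return []


collector = ToyCardCollector(["myid"])
collector["myid"].append("component-x")            # stored on the card
collector["my_non_existant_card"].append("lost")   # does not raise, not stored
```

In this sketch the step code keeps running even when an id is mistyped; the cost is that the component is dropped, which is why the warning matters.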

Once the `@step` completes execution, every `@card` decorator will call `current.card._serialize` (`CardComponentCollector._serialize`) to get a JSON serializable list of `str`/`dict` objects. The `_serialize` function internally calls all [component's](#metaflowcardcomponent) `render` function. This list is `json.dump`ed to a `tempfile` and passed to the `card create` subprocess where the `MetaflowCard` can use them in the final output.

6 changes: 3 additions & 3 deletions docs/concurrency.md
@@ -29,7 +29,7 @@ Concurrency is practically never needed during the first two phases.

We divide the concurrency constructs into two categories: Primary and
Secondary. Whenever possible, you should prefer the constructs in
the first category. The patterns are well established and they have
the first category. The patterns are well established and have
been used successfully in the core Metaflow modules, `runtime.py`
and `task.py`. The constructs in the second category can be used in
subprocesses, outside the core code paths in `runtime.py` and `task.py`.
@@ -109,7 +109,7 @@ delay, to avoid the parent from blocking.

The sidecar subprocess may die for various reasons, in which case
messages sent to it by the parent may be lost. To keep communication
essentially non-blocking and fast, there is no blocking acklowdgement of
essentially non-blocking and fast, there is no blocking acknowledgement of
successful message processing by the sidecar. Hence the communication is
lossy. In this sense, communication with a sidecar is more akin to UDP
than TCP.
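
The lossy, acknowledgement-free messaging described in this hunk can be sketched with the standard library alone. The sketch below is an assumption-laden illustration, not Metaflow's sidecar implementation: `LossySidecar`, `send`, and the `dropped` counter are hypothetical names. The parent writes a message to the helper's stdin and never waits for a reply; if the helper has died, the message is counted as dropped and the parent carries on.

```python
import subprocess
import sys


class LossySidecar:
    """Hypothetical fire-and-forget channel to a helper subprocess."""

    def __init__(self, argv):
        self._proc = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.DEVNULL
        )
        self.dropped = 0  # messages lost because the helper was gone

    def send(self, message):
        # No acknowledgement is requested or awaited, similar in spirit to UDP.
        try:
            self._proc.stdin.write((message + "\n").encode("utf-8"))
            self._proc.stdin.flush()
        except OSError:
            # Covers BrokenPipeError: the helper died, so the message is lost.
            self.dropped += 1

    def close(self):
        try:
            self._proc.stdin.close()
        except OSError:
            pass  # unflushed data for a dead helper is discarded
        self._proc.wait()


# Demonstration with a helper that exits immediately.
sidecar = LossySidecar([sys.executable, "-c", "pass"])
sidecar._proc.wait()       # make sure the helper is gone before sending
sidecar.send("heartbeat")  # lost, but the parent does not fail
sidecar.close()
```

This sketch does not address the blocking concern mentioned in the hunk header: a plain pipe write can still block when the pipe buffer is full, so a real implementation would need a non-blocking write or a timeout.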
@@ -139,7 +139,7 @@ Use a sidecar if you need a task that runs during scheduling or
execution of user code. A sidecar task can not perform any critical
operations that must succeed in order for a task or a run to be
considered valid. This makes sidecars suitable only for opportunistic,
best effort tasks.
best-effort tasks.

### 3. Data Parallelism
