diff --git a/assets/example_applying_mask.svg b/assets/example_applying_mask.svg
new file mode 100644
index 0000000..bc10add
--- /dev/null
+++ b/assets/example_applying_mask.svg
@@ -0,0 +1,444 @@
[SVG markup not recoverable. Figure: "An Example of Applying a Mask". Code shown: filtered_dataframe = dataframe[mask]. Panels: "mask" (index 0-3; values True, False, True, False), "dataframe" (index 0-3; A: Pineapple, Tomato, Green Pepper, Cheese; B: 90, 20, 30, 10; C: 0.1, 0.2, 0.6, 0.3), "filtered_dataframe (result)" (index 0 and 2; A: Pineapple, Green Pepper; B: 90, 30; C: 0.1, 0.6). Annotations: "Is this row True in the mask?" (Yes/No per row); "Rows are matched by the index label."; "The result is a DataFrame. It only contains the rows that were True in the mask."]
diff --git a/assets/example_dataframe.svg b/assets/example_dataframe.svg
new file mode 100644
index 0000000..bbf6c7b
--- /dev/null
+++ b/assets/example_dataframe.svg
@@ -0,0 +1,200 @@
[SVG markup not recoverable. Figure: "The Dataframe (A Pandas Data Structure)". Table: index 0-3; A: Pineapple, Tomato, Green Pepper, Cheese; B: 90, 20, 30, 10; C: 0.1, 0.2, 0.6, 0.3. Labels: "Columns", "Rows". Annotation: "An index can be any kind of value (integer, string, date/time, etc). Each value is unique. The index is not a column."]
diff --git a/assets/example_mask.svg b/assets/example_mask.svg
new file mode 100644
index 0000000..59381f8
--- /dev/null
+++ b/assets/example_mask.svg
@@ -0,0 +1,370 @@
[SVG markup not recoverable. Figure: "An Example of Making a Mask". Code shown: mask = dataframe["B"] > 25. Panels: the dataframe (index 0-3; columns A, B, C as above), column B (90, 20, 30, 10), "mask (result)" (name B; index 0-3; values True, False, True, False). Annotations: "Is the value in column B greater than 25?" (Yes/No per row); "The result is a Series assigned to the variable 'mask'. This is a series of boolean values. Multi-criteria masks will not have a name."]
diff --git a/assets/example_series.svg b/assets/example_series.svg
new file mode 100644
index 0000000..d895cbd
--- /dev/null
+++ b/assets/example_series.svg
@@ -0,0 +1,104 @@
[SVG markup not recoverable. Figure: "The Series (A Pandas Data Structure)". Series: index 0-3; values Pineapple, Tomato, Green Pepper, Cheese. Annotation: "The name of this series is "A"".]
diff --git a/assets/jupyterlab.png b/assets/jupyterlab.png
new file mode 100644
index 0000000..20a34cd
Binary files /dev/null and b/assets/jupyterlab.png differ
diff --git a/assets/pd_merge.svg b/assets/pd_merge.svg
new file mode 100644
index 0000000..4946b49
--- /dev/null
+++ b/assets/pd_merge.svg
@@ -0,0 +1,405 @@
[SVG markup not recoverable. Figure: "Merging Two DataFrames with a Shared Column (An Inner Join)". Code shown: pd.merge(dataframe1, dataframe2, on="ID"). Panels: "dataframe1", "dataframe2", "Result:". Column labels appearing in the figure: index, ID, A, B, C, D, E. ID values appearing in the figure: 001, 122, 010, 113, 123. Annotations: "Will not be in result."; "The result will only contain rows where the ID was in both dataframes. The result is a DataFrame with a new index."]
diff --git a/assets/select_column.svg b/assets/select_column.svg
new file mode 100644
index 0000000..6082db3
--- /dev/null
+++ b/assets/select_column.svg
@@ -0,0 +1,185 @@
[SVG markup not recoverable. Figure: "Selecting One Column". Code shown: df["B"]. Column labels: index, A, B, C; "Result:" panel shows index and B. Annotation: "The result is a Series with the same index."]
diff --git a/assets/select_multiple_columns.svg b/assets/select_multiple_columns.svg
new file mode 100644
index 0000000..c92664e
--- /dev/null
+++ b/assets/select_multiple_columns.svg
@@ -0,0 +1,208 @@
[SVG markup not recoverable. Figure: "Selecting Multiple Columns". Code shown: df[["B", "C"]]. Column labels: index, A, B, C; "Result:" panel shows index, B, and C. Annotation: "The result is a DataFrame with the same index."]
diff --git a/assets/select_rows.svg b/assets/select_rows.svg
new file mode 100644
index 0000000..ac9574f
--- /dev/null
+++ b/assets/select_rows.svg
@@ -0,0 +1,196 @@
[SVG markup not recoverable. Figure: "Selecting Rows with loc". Code shown: df.loc[1:2]. Index labels 0, 1, 2, 3; column labels A, B, C; "Result:" panel shows rows 1 and 2. Annotation: "The result is a DataFrame with the same columns."]
diff --git a/assets/select_rows_and_cols.svg b/assets/select_rows_and_cols.svg
new file mode 100644
index 0000000..8378571
--- /dev/null
+++ b/assets/select_rows_and_cols.svg
@@ -0,0 +1,171 @@
[SVG markup not recoverable. Figure: "Selecting Rows and a Single Column with loc". Code shown: df.loc[1:2, "B"]. Index labels 0, 1, 2, 3; column labels A, B, C; "Result:" panel shows rows 1 and 2 of column B. Annotation: "The result is a Series."]
diff --git a/pandas_diagrams.afdesign b/pandas_diagrams.afdesign
new file mode 100644
index 0000000..1df9c42
Binary files /dev/null and b/pandas_diagrams.afdesign differ
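The operations captioned in the SVG diagrams above can be reproduced with a short pandas sketch. The values in `dataframe` are the ones shown in the diagrams. The contents of `dataframe1` and `dataframe2` are hypothetical: the merge diagram only shows ID values and column labels, so which IDs belong to which frame, and every value in columns A to E, are assumptions made for illustration.

```python
import pandas as pd

# The example table from the diagrams (default integer index 0..3).
dataframe = pd.DataFrame(
    {
        "A": ["Pineapple", "Tomato", "Green Pepper", "Cheese"],
        "B": [90, 20, 30, 10],
        "C": [0.1, 0.2, 0.6, 0.3],
    }
)

# Making a mask: a boolean Series named "B" with the same index.
mask = dataframe["B"] > 25

# Applying the mask: keeps only the rows whose index label is True in the mask.
filtered_dataframe = dataframe[mask]

# Selecting one column gives a Series; a list of columns gives a DataFrame.
one_column = dataframe["B"]
two_columns = dataframe[["B", "C"]]

# loc slices by index label and includes both endpoints.
rows = dataframe.loc[1:2]               # DataFrame with the same columns
rows_one_col = dataframe.loc[1:2, "B"]  # Series

# Hypothetical frames for the merge diagram (values are assumptions).
dataframe1 = pd.DataFrame(
    {"ID": ["001", "122", "010", "113"], "A": [1, 2, 3, 4], "B": [5, 6, 7, 8]}
)
dataframe2 = pd.DataFrame(
    {
        "ID": ["001", "122", "010", "123"],
        "C": [10, 20, 30, 40],
        "D": [50, 60, 70, 80],
        "E": [90, 100, 110, 120],
    }
)

# pd.merge defaults to an inner join: only IDs present in both frames
# survive, and the result gets a new 0..n-1 index.
merged = pd.merge(dataframe1, dataframe2, on="ID")

print(filtered_dataframe)
print(merged)
```

Under these assumed inputs, the rows with ID 113 and 123 drop out of `merged` because each appears in only one of the two frames.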