Merge pull request #256 from vizzuhq/spark
Added: spark DataFrame support, added `typing_extensions` dependency on Python < 3.8
veghdev committed Aug 15, 2023
2 parents a1f2859 + 9f997d2 commit e06c5a3
Showing 54 changed files with 2,299 additions and 1,086 deletions.
6 changes: 3 additions & 3 deletions docs/installation.md
@@ -25,11 +25,11 @@ pip install -U ipyvizzu
 ```

 !!! note
-    `ipyvizzu` has some extra dependencies such as `pandas`, `numpy` and
-    `fugue`.
+    `ipyvizzu` can be used with some extra dependencies such as `pandas`,
+    `pyspark`, `numpy` and `fugue`.

     For example if you would like to work with `pandas` `DataFrame` and
-    `ipyvizzu`, you can install `pandas` as an extra:
+    `ipyvizzu`, you should install `pandas` as an extra:

     ```sh
     pip install ipyvizzu[pandas]
60 changes: 57 additions & 3 deletions docs/tutorial/data.md
@@ -50,7 +50,7 @@ There are multiple ways you can add data to `ipyvizzu`.

 Use
 [`add_df`](../reference/ipyvizzu/animation.md#ipyvizzu.animation.Data.add_df)
-method for adding data frame to
+method for adding `pandas` DataFrame to
 [`Data`](../reference/ipyvizzu/animation.md#ipyvizzu.animation.Data).

 ```python
@@ -143,12 +143,12 @@ df = pd.DataFrame(
 )

 data = Data()
-data.add_df_index(df, column_name="IndexColumnName")
 data.add_df(df)
+data.add_df_index(df, name="IndexColumnName")
 ```

 !!! note
-    If you want to work with `pandas` `DataFrame` and `ipyvizzu`, you need to
+    If you want to work with `pandas` DataFrame and `ipyvizzu`, you need to
     install `pandas` or install it as an extra:

     ```sh
@@ -320,6 +320,60 @@ data.add_df(df)
 You'll need to adjust the SQL query and the database connection parameters
 to match your specific use case.

+### Using `pyspark` DataFrame
+
+Use
+[`add_df`](../reference/ipyvizzu/animation.md#ipyvizzu.animation.Data.add_df)
+method for adding `pyspark` DataFrame to
+[`Data`](../reference/ipyvizzu/animation.md#ipyvizzu.animation.Data).
+
+```python
+from pyspark.sql import SparkSession
+from pyspark.sql.types import (
+    StructType,
+    StructField,
+    StringType,
+    IntegerType,
+)
+from ipyvizzu import Data
+
+
+spark = SparkSession.builder.appName("ipyvizzu").getOrCreate()
+spark_schema = StructType(
+    [
+        StructField("Genres", StringType(), True),
+        StructField("Kinds", StringType(), True),
+        StructField("Popularity", IntegerType(), True),
+    ]
+)
+spark_data = [
+    ("Pop", "Hard", 114),
+    ("Rock", "Hard", 96),
+    ("Jazz", "Hard", 78),
+    ("Metal", "Hard", 52),
+    ("Pop", "Smooth", 56),
+    ("Rock", "Experimental", 36),
+    ("Jazz", "Smooth", 174),
+    ("Metal", "Smooth", 121),
+    ("Pop", "Experimental", 127),
+    ("Rock", "Experimental", 83),
+    ("Jazz", "Experimental", 94),
+    ("Metal", "Experimental", 58),
+]
+df = spark.createDataFrame(spark_data, spark_schema)
+
+data = Data()
+data.add_df(df)
+```
+
+!!! note
+    If you want to work with `pyspark` DataFrame and `ipyvizzu`, you need to
+    install `pyspark` or install it as an extra:
+
+    ```sh
+    pip install ipyvizzu[pyspark]
+    ```
+
 ### Using `numpy` Array

 Use
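Note on the `pyspark` section added in docs/tutorial/data.md: the docs show `add_df` accepting a Spark DataFrame, but this page does not show how the rows become chart data. The sketch below is a pure-Python illustration of the general idea only, under the assumption that each column ends up as a named series that is either a dimension (categories) or a measure (numbers). `rows_to_series` and the dictionary layout are hypothetical names chosen for this example; they are not taken from the `ipyvizzu` source, and the real implementation in this commit may differ.

```python
from numbers import Number

# First four rows of the `pyspark` example in docs/tutorial/data.md.
columns = ["Genres", "Kinds", "Popularity"]
rows = [
    ("Pop", "Hard", 114),
    ("Rock", "Hard", 96),
    ("Jazz", "Hard", 78),
    ("Metal", "Hard", 52),
]


def rows_to_series(columns, rows):
    """Pivot row tuples into column-wise series.

    Hypothetical helper for illustration; not part of the ipyvizzu API.
    """
    series = []
    for index, name in enumerate(columns):
        values = [row[index] for row in rows]
        # Assumption: all-numeric columns are measures, anything else is a dimension.
        is_measure = all(
            isinstance(value, Number) and not isinstance(value, bool)
            for value in values
        )
        series.append(
            {
                "name": name,
                "type": "measure" if is_measure else "dimension",
                "values": values,
            }
        )
    return series


series = rows_to_series(columns, rows)
for item in series:
    print(item["name"], item["type"], item["values"])
```

With the sample rows, `Genres` and `Kinds` are classified as dimensions and `Popularity` as a measure, which matches the `StringType` and `IntegerType` columns declared in the Spark schema.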