From b77d57750b7799b83e42611f85b7baae4cae2d35 Mon Sep 17 00:00:00 2001
From: Kristin Cowalcijk
Date: Fri, 23 Aug 2024 00:41:54 +0800
Subject: [PATCH] Update documentation

---
 docs/api/sql/Constructor.md | 46 -------------------------
 docs/tutorial/sql.md        | 69 +++++++++++++++++++++++++++++++++++--
 2 files changed, 67 insertions(+), 48 deletions(-)

diff --git a/docs/api/sql/Constructor.md b/docs/api/sql/Constructor.md
index d84f2c8b84..a1f715a0a7 100644
--- a/docs/api/sql/Constructor.md
+++ b/docs/api/sql/Constructor.md
@@ -1,49 +1,3 @@
-## Read ESRI Shapefile
-
-Introduction: Construct a DataFrame from a Shapefile
-
-Since: `v1.0.0`
-
-SparkSQL example:
-
-```scala
-var spatialRDD = new SpatialRDD[Geometry]
-spatialRDD.rawSpatialRDD = ShapefileReader.readToGeometryRDD(sparkSession.sparkContext, shapefileInputLocation)
-var rawSpatialDf = Adapter.toDf(spatialRDD,sparkSession)
-rawSpatialDf.createOrReplaceTempView("rawSpatialDf")
-var spatialDf = sparkSession.sql("""
-  | ST_GeomFromWKT(rddshape), _c1, _c2
-  | FROM rawSpatialDf
-  """.stripMargin)
-spatialDf.show()
-spatialDf.printSchema()
-```
-
-!!!note
-    The path to the shapefile is the path to the folder that contains the .shp file, not the path to the .shp file itself. The file extensions of .shp, .shx, .dbf must be in lowercase. Assume you have a shape file called ==myShapefile==, the path should be `XXX/myShapefile`. The file structure should be like this:
-    ```
-    - shapefile1
-    - shapefile2
-    - myshapefile
-        - myshapefile.shp
-        - myshapefile.shx
-        - myshapefile.dbf
-        - myshapefile...
-    - ...
-    ```
-
-!!!warning
-    Please make sure you use ==ST_GeomFromWKT== to create Geometry type column otherwise that column cannot be used in SedonaSQL.
-
-If the file you are reading contains non-ASCII characters you'll need to explicitly set the Spark config before initializing the SparkSession, then you can use `ShapefileReader.readToGeometryRDD`.
-
-Example:
-
-```scala
-spark.driver.extraJavaOptions -Dsedona.global.charset=utf8
-spark.executor.extraJavaOptions -Dsedona.global.charset=utf8
-```
-
 ## ST_GeomCollFromText
 
 Introduction: Constructs a GeometryCollection from the WKT with the given SRID. If SRID is not provided then it defaults to 0. It returns `null` if the WKT is not a `GEOMETRYCOLLECTION`.
diff --git a/docs/tutorial/sql.md b/docs/tutorial/sql.md
index 0b617384be..c21ea12db1 100644
--- a/docs/tutorial/sql.md
+++ b/docs/tutorial/sql.md
@@ -459,9 +459,74 @@ root
  |-- prop0: string (nullable = true)
 ```
 
-## Load Shapefile using SpatialRDD
+## Load Shapefile
 
-Shapefile can be loaded by SpatialRDD and converted to DataFrame using Adapter. Please read [Load SpatialRDD](rdd.md#create-a-generic-spatialrdd) and [DataFrame <-> RDD](#convert-between-dataframe-and-spatialrdd).
+Since v`1.7.0`, Sedona supports loading Shapefile as a DataFrame.
+
+=== "Scala"
+
+    ```scala
+    val df = sedona.read.format("shapefile").load("/path/to/shapefile")
+    ```
+
+=== "Java"
+
+    ```java
+    Dataset<Row> df = sedona.read().format("shapefile").load("/path/to/shapefile");
+    ```
+
+=== "Python"
+
+    ```python
+    df = sedona.read.format("shapefile").load("/path/to/shapefile")
+    ```
+
+The input path can be a directory containing one or multiple shapefiles, or the path to a `.shp` file.
+
+- When the input path is a directory, all shapefiles under the directory will be loaded.
+- When the input path is a `.shp` file, that shapefile will be loaded. Sedona will look for sibling files (`.dbf`, `.shx`, etc.) with the same main file name and load them automatically.
+
+The name of the geometry column is `geometry` by default. You can change it using the `geometry.name` option. If one of the non-spatial attributes is named "geometry", `geometry.name` must be set to a different name to avoid a conflict.
+
+=== "Scala"
+
+    ```scala
+    val df = sedona.read.format("shapefile").option("geometry.name", "geom").load("/path/to/shapefile")
+    ```
+
+=== "Java"
+
+    ```java
+    Dataset<Row> df = sedona.read().format("shapefile").option("geometry.name", "geom").load("/path/to/shapefile");
+    ```
+
+=== "Python"
+
+    ```python
+    df = sedona.read.format("shapefile").option("geometry.name", "geom").load("/path/to/shapefile")
+    ```
+
+Each record in a shapefile has a unique record number, which is not loaded by default. To include the record number in the loaded DataFrame, set the `key.name` option to the name of the record number column:
+
+=== "Scala"
+
+    ```scala
+    val df = sedona.read.format("shapefile").option("key.name", "FID").load("/path/to/shapefile")
+    ```
+
+=== "Java"
+
+    ```java
+    Dataset<Row> df = sedona.read().format("shapefile").option("key.name", "FID").load("/path/to/shapefile");
+    ```
+
+=== "Python"
+
+    ```python
+    df = sedona.read.format("shapefile").option("key.name", "FID").load("/path/to/shapefile")
+    ```
+
+If you are using a Sedona version earlier than v`1.7.0`, you can load shapefiles as a SpatialRDD and convert them to a DataFrame using Adapter. Please read [Load SpatialRDD](rdd.md#create-a-generic-spatialrdd) and [DataFrame <-> RDD](#convert-between-dataframe-and-spatialrdd).
 
 ## Load GeoParquet