Merge pull request #41 from jiayuasu/master
Push GeoSpark 0.4.0
jiayuasu authored Dec 22, 2016
2 parents de58ead + 8d13489 commit 99ae47b
Showing 94 changed files with 15,979 additions and 9,787 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -4,4 +4,6 @@
/.settings/
/.classpath
/.project
/dependency-reduced-pom.xml
/dependency-reduced-pom.xml
/bin/
/doc/
2 changes: 1 addition & 1 deletion .travis.yml
@@ -2,7 +2,7 @@ os:
- linux

language: java
sudo: true
sudo: false
git:
submodules: false

35 changes: 10 additions & 25 deletions README.md
@@ -13,15 +13,15 @@ GeoSpark artifacts are hosted in Maven Central. You can add a Maven dependency w
```
groupId: org.datasyslab
artifactId: geospark
version: 0.3.2
version: 0.4.0
```

The following version supports Apache Spark 1.X versions:

```
groupId: org.datasyslab
artifactId: geospark
version: 0.3.2-spark-1.x
version: 0.4.0-spark-1.x
```


@@ -31,9 +31,10 @@ version: 0.3.2-spark-1.x

| Version | Summary |
|:----------------: |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 0.4.0| **Major updates:** ([Example](https://github.com/DataSystemsLab/GeoSpark/blob/master/src/main/java/org/datasyslab/geospark/showcase/Example.java)) 1. Refactor constructor API usage. 2. Simplify Spatial Join Query API. 3. Add native support for LineStringRDD; **Functionality enhancement:** 1. Release the persist function back to users. 2. Add more exception explanations.
| 0.3.2 | Functionality enhancement: 1. [JTSplus Spatial Objects](https://github.com/jiayuasu/JTSplus) now carry the original input data. Each object stores "UserData" and provides getter and setter. 2. Add a new SpatialRDD constructor to transform a regular data RDD to a spatial partitioned SpatialRDD. |
| 0.3.1 | Bug fix: Support Apache Spark 2.X version, fix a bug which results in inaccurate results when doing join query, add more unit test cases |
| 0.3 | Major updates: Significantly shorten query time on spatial join for skewed data; Support load balanced spatial partitioning methods (also serve as the global index); Optimize code for iterative spatial data mining |
| 0.3 | Major updates: Significantly shorten query time on spatial join for skewed data; Support load balanced spatial partitioning methods (also serve as the global index); Optimize code for iterative spatial data mining ||
| Master branch | even with 0.3.2 |
| Spark 1.X branch | even with 0.3.2 but only supports Apache Spark 1.X |

@@ -89,36 +90,20 @@ Please refer [GeoSpark Scala and Java API Usage](http://www.public.asu.edu/~jiay

GeoSpark extends RDDs to form Spatial RDDs (SRDDs), efficiently partitions SRDD data elements across machines, and introduces novel parallelized spatial transformations and actions for SRDDs (geometric operations that follow the Open Geospatial Consortium (OGC) standard) that provide a more intuitive interface for users to write spatial data analytics programs. Moreover, GeoSpark extends the SRDD layer to execute spatial queries (e.g., Range query, KNN query, and Join query) on large-scale spatial datasets. After geometrical objects are retrieved in the Spatial RDD layer, users can invoke the spatial query processing operations provided in the Spatial Query Processing Layer of GeoSpark, which runs over the in-memory cluster, decides how spatial object-relational tuples are stored, indexed, and accessed using SRDDs, and returns the spatial query results required by the user.



### PointRDD

(column, column,..., Longitude, Latitude, column, column,...)

### RectangleRDD

(column, column,...,Longitude 1, Longitude 2, Latitude 1, Latitude 2,column, column,...)

Two pairs of longitude and latitude present the vertexes lie on the diagonal of one rectangle.

### PolygonRDD

(column, column,...,Longitude 1, Latitude 1, Longitude 2, Latitude 2, ...)

Each tuple contains unlimited points.
**Supported Spatial RDDs: PointRDD, RectangleRDD, PolygonRDD, LineStringRDD**

## Supported data format
GeoSpark supports Comma-Separated Values ("csv"), Tab-separated values ("tsv"), Well-Known Text ("wkt"), and GeoJSON ("geojson") as the input formats. Users only need to specify input format as Splitter and the start column (if necessary) of spatial info in one tuple as Offset when call Constructors.
GeoSpark supports Comma-Separated Values (**CSV**), Tab-Separated Values (**TSV**), Well-Known Text (**WKT**), and **GeoJSON** as input formats. Users only need to specify the input format as Splitter and, if necessary, the start and end offsets of the spatial fields in one row when calling constructors.
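
The role of the splitter and the start and end offsets can be illustrated with a small standalone sketch. It is not GeoSpark code: `OffsetDemo` and `extractCoordinates` are hypothetical names, and the sketch assumes 0-based, inclusive offsets and treats a negative end offset as "every remaining column is spatial", which is how the `FormatMapper` comment in this commit describes the default for LineString and Polygon input.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class OffsetDemo {

    /**
     * Hypothetical helper (not a GeoSpark API): parses the columns from
     * startOffset to endOffset (both inclusive, 0-based) as doubles.
     * A negative endOffset means "through the last column".
     */
    static double[] extractCoordinates(String row, String delimiter,
                                       int startOffset, int endOffset) {
        // Quote the delimiter so characters such as "|" are not read as regex syntax.
        String[] fields = row.split(Pattern.quote(delimiter), -1);
        int last = endOffset < 0 ? fields.length - 1 : endOffset;
        double[] coordinates = new double[last - startOffset + 1];
        for (int i = startOffset; i <= last; i++) {
            coordinates[i - startOffset] = Double.parseDouble(fields[i].trim());
        }
        return coordinates;
    }

    public static void main(String[] args) {
        // Columns 1 and 2 hold longitude and latitude; columns 0 and 3 are non-spatial.
        double[] point = extractCoordinates("id42,-111.93,33.42,Tempe", ",", 1, 2);
        System.out.println(Arrays.toString(point)); // prints [-111.93, 33.42]
    }
}
```

Passing the offsets explicitly lets a row carry non-spatial columns on either side of the coordinates without a separate preprocessing step.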

## Important features

### Spatial partitioning

GeoSpark supports equal size ("equalgrid"), R-Tree ("rtree") and Voronoi diagram ("voronoi") spatial partitioning methods. Spatial partitioning is to repartition RDD according to objects' spatial locations. Spatial join on spatial partitioned RDD will be very fast.
GeoSpark supports R-Tree (**RTREE**) and Voronoi diagram (**VORONOI**) spatial partitioning methods. Spatial partitioning is to repartition RDD according to objects' spatial locations. Spatial join on spatial partitioned RDD will be very fast.

### Spatial Index

GeoSpark supports two Spatial Indexes, Quad-Tree and R-Tree.
GeoSpark supports two Spatial Indexes, Quad-Tree (**QUADTREE**) and R-Tree (**RTREE**). Quad-Tree doesn't support Spatial K Nearest Neighbors query.

### Geometrical operation

@@ -168,5 +153,5 @@ We appreciate the help and suggestions from the following GeoSpark users (List i
### Project website
Please visit [GeoSpark project website](http://geospark.datasyslab.org) for the latest news and releases.

### DataSys Lab
GeoSpark is one of the projects under [DataSys Lab](http://www.datasyslab.org/) at Arizona State University. The mission of DataSys Lab is designing and developing experimental data management systems (e.g., database systems).
### Data Systems Lab
GeoSpark is one of the projects under [Data Systems Lab](http://www.datasyslab.org/) at Arizona State University. The mission of Data Systems Lab is designing and developing experimental data management systems (e.g., database systems).
109 changes: 93 additions & 16 deletions pom.xml
@@ -75,21 +75,8 @@
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<executions>
<execution>
<id>attach-sources</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
</plugin>


<sourceDirectory>src/main/java</sourceDirectory>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
@@ -100,6 +87,7 @@
</configuration>
</plugin>


<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
@@ -141,12 +129,101 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.jacoco</groupId>
<artifactId>jacoco-maven-plugin</artifactId>
<version>0.7.7.201606060606</version>
<executions>
<execution>
<goals>
<goal>prepare-agent</goal>
</goals>
</execution>
<execution>
<id>report</id>
<phase>test</phase>
<goals>
<goal>report</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
<resources>
<resource>
<directory>src/resource</directory>
</resource>
</resources>
</build>
<profiles>
<profile>
<id>release-sign-artifacts</id>
<activation>
<property>
<name>performRelease</name>
<value>true</value>
</property>
</activation>
<build>
<plugins>
<plugin>
<groupId>org.sonatype.plugins</groupId>
<artifactId>nexus-staging-maven-plugin</artifactId>
<version>1.6.7</version>
<extensions>true</extensions>
<configuration>
<serverId>ossrh</serverId>
<nexusUrl>https://oss.sonatype.org/</nexusUrl>
<stagingProfileId>21756750b51471</stagingProfileId>
<autoReleaseAfterClose>true</autoReleaseAfterClose>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-gpg-plugin</artifactId>
<executions>
<execution>
<id>sign-artifacts</id>
<phase>verify</phase>
<goals>
<goal>sign</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>3.0.1</version>
<executions>
<execution>
<id>attach-sources</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
</plugin>

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>2.10.4</version>
<executions>
<execution>
<id>attach-javadocs</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
<configuration>
<additionalparam>-Xdoclint:none</additionalparam>
</configuration>
</plugin>
</plugins>
</build>
</profile>
</profiles>
</project>


1 change: 1 addition & 0 deletions src/.gitignore
@@ -0,0 +1 @@
*.DS_Store
1 change: 1 addition & 0 deletions src/main/.gitignore
@@ -0,0 +1 @@
*.DS_Store
src/main/java/org/datasyslab/geospark/enums/FileDataSplitter.java
@@ -1,9 +1,10 @@
/**
* FILE: FileDataSplitter.java
* PATH: org.datasyslab.geospark.enums.FileDataSplitter.java
* Copyright (c) 2016 Arizona State University Data Systems Lab.
* All rights reserved.
* Copyright (c) 2017 Arizona State University Data Systems Lab
* All right reserved.
*/

package org.datasyslab.geospark.enums;

import java.io.Serializable;
@@ -40,8 +41,14 @@ public static FileDataSplitter getFileDataSplitter(String str) {
return null;
}

/** The splitter. */
private String splitter;

/**
* Instantiates a new file data splitter.
*
* @param splitter the splitter
*/
private FileDataSplitter(String splitter) {
this.splitter = splitter;
}
4 changes: 2 additions & 2 deletions src/main/java/org/datasyslab/geospark/enums/GridType.java
@@ -1,8 +1,8 @@
/**
* FILE: GridType.java
* PATH: org.datasyslab.geospark.enums.GridType.java
* Copyright (c) 2016 Arizona State University Data Systems Lab.
* All rights reserved.
* Copyright (c) 2017 Arizona State University Data Systems Lab
* All right reserved.
*/
package org.datasyslab.geospark.enums;

4 changes: 2 additions & 2 deletions src/main/java/org/datasyslab/geospark/enums/IndexType.java
@@ -1,8 +1,8 @@
/**
* FILE: IndexType.java
* PATH: org.datasyslab.geospark.enums.IndexType.java
* Copyright (c) 2016 Arizona State University Data Systems Lab.
* All rights reserved.
* Copyright (c) 2017 Arizona State University Data Systems Lab
* All right reserved.
*/
package org.datasyslab.geospark.enums;

src/main/java/org/datasyslab/geospark/formatMapper/FormatMapper.java
@@ -0,0 +1,57 @@
/**
* FILE: FormatMapper.java
* PATH: org.datasyslab.geospark.formatMapper.FormatMapper.java
* Copyright (c) 2017 Arizona State University Data Systems Lab
* All right reserved.
*/
package org.datasyslab.geospark.formatMapper;

import java.io.Serializable;

import org.datasyslab.geospark.enums.FileDataSplitter;

// TODO: Auto-generated Javadoc
/**
* The Class FormatMapper.
*/
public abstract class FormatMapper implements Serializable{


/** The start offset. */
Integer startOffset = 0;

/** The end offset. */
Integer endOffset = -1; /* If the initial value is negative, GeoSpark will consider each field as a spatial attribute if the target object is LineString or Polygon. */

/** The splitter. */
FileDataSplitter splitter = FileDataSplitter.CSV;

/** The carry input data. */
boolean carryInputData = false;

/**
* Instantiates a new format mapper.
*
* @param startOffset the start offset
* @param endOffset the end offset
* @param Splitter the splitter
* @param carryInputData the carry input data
*/
public FormatMapper(Integer startOffset, Integer endOffset, FileDataSplitter Splitter, boolean carryInputData) {
this.startOffset = startOffset;
this.endOffset = endOffset;
this.splitter = Splitter;
this.carryInputData = carryInputData;
}

/**
* Instantiates a new format mapper.
*
* @param Splitter the splitter
* @param carryInputData the carry input data
*/
public FormatMapper(FileDataSplitter Splitter, boolean carryInputData) {
this.splitter = Splitter;
this.carryInputData = carryInputData;
}
}
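
Since `FormatMapper` is abstract, a concrete mapper has to supply the parsing itself. The following is a minimal sketch of how a subclass might use the inherited fields, not the mappers shipped in this commit. `DemoSplitter`, `DemoFormatMapper`, `DemoPointMapper`, and `FormatMapperDemo` are hypothetical stand-ins so that the example compiles on its own. The `userData` method is an assumption about what `carryInputData` is meant to enable, based on the README note that spatial objects carry the original input data.

```java
import java.io.Serializable;
import java.util.regex.Pattern;

/** Stand-in for FileDataSplitter; only two delimiters are modelled here. */
enum DemoSplitter {
    CSV(","), TSV("\t");

    private final String delimiter;

    DemoSplitter(String delimiter) { this.delimiter = delimiter; }

    String getDelimiter() { return delimiter; }
}

/** Mirrors the fields and the two constructors of FormatMapper shown above. */
abstract class DemoFormatMapper implements Serializable {
    Integer startOffset = 0;
    Integer endOffset = -1; // negative: every field counts as spatial
    DemoSplitter splitter = DemoSplitter.CSV;
    boolean carryInputData = false;

    DemoFormatMapper(Integer startOffset, Integer endOffset,
                     DemoSplitter splitter, boolean carryInputData) {
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.splitter = splitter;
        this.carryInputData = carryInputData;
    }

    DemoFormatMapper(DemoSplitter splitter, boolean carryInputData) {
        this.splitter = splitter;
        this.carryInputData = carryInputData;
    }
}

/** Hypothetical subclass: reads one (x, y) pair starting at startOffset. */
class DemoPointMapper extends DemoFormatMapper {

    DemoPointMapper(Integer startOffset, DemoSplitter splitter, boolean carryInputData) {
        // A point occupies exactly two columns, so the end offset is start + 1.
        super(startOffset, startOffset + 1, splitter, carryInputData);
    }

    double[] parse(String line) {
        String[] fields = line.split(Pattern.quote(splitter.getDelimiter()), -1);
        double x = Double.parseDouble(fields[startOffset].trim());
        double y = Double.parseDouble(fields[startOffset + 1].trim());
        return new double[] {x, y};
    }

    /** Returns the raw line when carryInputData is set, otherwise null. */
    String userData(String line) {
        return carryInputData ? line : null;
    }
}

class FormatMapperDemo {
    public static void main(String[] args) {
        DemoPointMapper mapper = new DemoPointMapper(1, DemoSplitter.TSV, true);
        String line = "id42\t-111.93\t33.42";
        double[] xy = mapper.parse(line);
        System.out.println(xy[0] + ", " + xy[1]);
        System.out.println(mapper.userData(line));
    }
}
```

Keeping the offsets, splitter, and carry flag in the base class means each geometry-specific mapper only has to decide how many columns it consumes.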