Make geoparquet the new standard for outputs (#217)
* make geoparquet the default spatial file format

* fix doc strings

* fix use of old form attributes

* fix docstring

* remove unnecessary test

* fix test

* update CHANGELOG.md

* apply PR comments:
- move save_geodataframe to io module
- rename geojson to spatial
- remove dedicated methods and use file type parameter to request the desired filetype

* update notebooks

* fix a friday afternoon brain fart involving saving shapefiles

* lint

* lint tests

* add extra test for file type check

* update pyarrow version requirement

* Apply suggestions from code review

Co-authored-by: Bryn Pickering <[email protected]>

* apply PR comments:
- add extra test for saving empty geodataframes
- remove remaining `Optional[] != None`
- lint

---------

Co-authored-by: Bryn Pickering <[email protected]>
KasiaKoz and brynpickering authored May 28, 2024
1 parent 6934146 commit 76af776
Showing 19 changed files with 405 additions and 335 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -23,6 +23,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Fixed

* Fixed generating standard outputs by highway tag which were broken after moving to storing additional attributes in short form [#217](https://github.com/arup-group/genet/pull/217)
* Fixed summary report:
* Intermodal Access/Egress reporting is more general (not expecting just car and bike mode access to PT) [#204](https://github.com/arup-group/genet/pull/204)
* Node/Links numbers were reported incorrectly (switched) [#207](https://github.com/arup-group/genet/pull/207)
@@ -32,6 +33,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

* GeNet's standard outputs now produce geoparquet format by default [#217](https://github.com/arup-group/genet/pull/217). The output file size is reduced significantly (e.g. network links output was reduced by ~80% on a test network). Networks/Schedules can still be saved to geojson and shape files as before.
* GeNet's pre-baked python scripts have been retired in favour of CLI [#194](https://github.com/arup-group/genet/pull/194)
* Support for python v3.11 [#192](https://github.com/arup-group/genet/pull/192) and v3.12 [#234](https://github.com/arup-group/genet/pull/234)
* **[Breaking change]** Updated to more accurate pyproj version [#192](https://github.com/arup-group/genet/pull/192)
6 changes: 3 additions & 3 deletions examples/3_4_writing_data_json_geojson.ipynb
@@ -97,9 +97,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also choose to write the network to GeoJSON. This will produce spatial representation of the Network graph and the Schedule graph. \n",
"You can also choose to write the network to GeoJSON, GeoParquet or as a shapefile. This will produce spatial representation of the Network graph and the Schedule graph. \n",
"\n",
"- The main diference for the Network graph outputs is that the link geometry is a `LINESTRING`, whereas in the JSON outputs, this geometry is an encoded polyline.\n",
"- The main difference for the Network graph outputs is that the link geometry is a `LINESTRING`, whereas in the JSON outputs, this geometry is an encoded polyline.\n",
"- The biggest difference is for the Schedule graph. With JSON output you get the entire Schedule data saved to file. With GeoJSON you get only the spatial representation of the graph, nodes and edges, where nodes are the Stops in the Schedule and edges are the connections between Stops as defined by the Route and Service objects which use those Stops. It does not include any information about the vehicles, their IDs or modes, vehicle definitions or network routes (the edges are straight lines between the Stops)"
]
},
@@ -123,7 +123,7 @@
}
],
"source": [
"n.write_to_geojson(\"example_data/outputs/geojson\")"
"n.write_spatial(\"example_data/outputs/geojson\", filetype=\"geojson\")"
]
}
],
7 changes: 4 additions & 3 deletions examples/6_2_validating_network_google_directions_api.ipynb
@@ -33,7 +33,8 @@
"import genet\n",
"import matplotlib.pyplot as plt\n",
"from genet import google_directions, read_matsim\n",
"from genet.output.geojson import generate_geodataframes\n",
"from genet.output.spatial import generate_geodataframes\n",
"from genet.utils.io import save_geodataframe\n",
"from shapely.geometry import LineString"
]
},
@@ -980,8 +981,8 @@
],
"source": [
"logging.info(\"saving network links with valid google speed values to geojson\")\n",
"genet.output.geojson.save_geodataframe(\n",
" with_gs, \"api_requests_viz\", \"example_data/outputs/google_speed_data/\"\n",
"save_geodataframe(\n",
" with_gs, \"api_requests_viz\", \"example_data/outputs/google_speed_data/\", filetype=\"geojson\"\n",
")"
]
},
2 changes: 2 additions & 0 deletions requirements/base.txt
@@ -15,6 +15,8 @@ osmnx < 2
pandas >= 1.5, < 2.2
polyline >= 2, < 3
pre-commit < 4
pyarrow >= 12, < 15
pyogrio < 0.8
pyomo >= 6, < 7
pyproj >= 3, < 4
pyyaml >= 6, < 7
8 changes: 2 additions & 6 deletions src/genet/cli.py
@@ -15,12 +15,8 @@
import genet.utils.spatial as spatial
from genet import google_directions, read_gtfs, read_matsim, read_matsim_schedule, read_osm
from genet.core import Network
from genet.output.geojson import (
generate_headway_geojson,
generate_speed_geojson,
modal_subset,
save_geodataframe,
)
from genet.output.spatial import generate_headway_geojson, generate_speed_geojson, modal_subset
from genet.utils.io import save_geodataframe
from genet.utils.persistence import ensure_dir
from genet.variables import EPSG4326

48 changes: 30 additions & 18 deletions src/genet/core.py
@@ -22,13 +22,14 @@
import genet.modify.change_log as change_log
import genet.modify.graph as modify_graph
import genet.modify.schedule as modify_schedule
import genet.output.geojson as geojson
import genet.output.matsim_xml_writer as matsim_xml_writer
import genet.output.sanitiser as sanitiser
import genet.output.spatial as spatial_output
import genet.schedule_elements as schedule_elements
import genet.utils.dict_support as dict_support
import genet.utils.elevation as elevation
import genet.utils.graph_operations as graph_operations
import genet.utils.io as gnio
import genet.utils.pandas_helpers as pd_helpers
import genet.utils.parallel as parallel
import genet.utils.persistence as persistence
@@ -2917,7 +2918,7 @@ def check_connectivity_for_mode(self, mode):
return con_desc

def generate_standard_outputs(
self, output_dir: str, gtfs_day: str = "19700101", include_shp_files: bool = False
self, output_dir: str, gtfs_day: str = "19700101", filetype: str = "parquet"
):
"""Generates geojsons that can be used for generating standard kepler visualisations.
@@ -2928,9 +2929,13 @@ def generate_standard_outputs(
gtfs_day (str, optional):
Day in format YYYYMMDD for the network's schedule for consistency in visualisations,
Defaults to "19700101" (1970-01-01).
include_shp_files (bool, optional): If True, also store shapefiles. Defaults to False.
filetype (str, optional):
The file type to save the GeoDataFrame to: geojson, geoparquet or shp are supported.
Defaults to parquet format.
"""
geojson.generate_standard_outputs(self, output_dir, gtfs_day, include_shp_files)
spatial_output.generate_standard_outputs(
self, output_dir, gtfs_day=gtfs_day, filetype=filetype
)
logging.info("Finished generating standard outputs. Zipping folder.")
persistence.zip_folder(output_dir)

@@ -3010,30 +3015,37 @@ def write_to_json(self, output_dir: str):
self.schedule.write_to_json(output_dir)
self.write_extras(output_dir)

def write_to_geojson(self, output_dir: str, epsg: Optional[str] = None):
"""Writes Network graph and Schedule (if applicable) to nodes and links geojson files.
def write_spatial(self, output_dir, epsg: Optional[str] = None, filetype: str = "parquet"):
"""Transforms Network and Schedule (if applicable) to geopandas.GeoDataFrame of nodes and links and saves to
the requested file format.
Args:
output_dir (str): Output directory.
output_dir (str):
Path to folder where to save the file.
epsg (Optional[str], optional):
Projection if the geometry is to be reprojected. Defaults to None (no reprojection).
filetype (str, optional):
The file type to save the GeoDataFrame to: geojson, geoparquet or shp are supported.
Defaults to parquet format.
"""
# do a quick check the file type is supported before generating all the files
gnio.check_file_type_is_supported(filetype)

persistence.ensure_dir(output_dir)
_network = self.to_geodataframe()
if epsg is not None:
_network["nodes"] = _network["nodes"].to_crs(epsg)
_network["links"] = _network["links"].to_crs(epsg)
logging.info(f"Saving Network to GeoJSON in {output_dir}")
geojson.save_geodataframe(_network["nodes"], "network_nodes", output_dir)
geojson.save_geodataframe(_network["links"], "network_links", output_dir)
geojson.save_geodataframe(
_network["nodes"]["geometry"], "network_nodes_geometry_only", output_dir
)
geojson.save_geodataframe(
_network["links"]["geometry"], "network_links_geometry_only", output_dir
)
logging.info(f"Saving Network in {output_dir}")
for gdf, filename in (
(_network["nodes"], "network_nodes"),
(_network["links"], "network_links"),
(_network["nodes"]["geometry"], "network_nodes_geometry_only"),
(_network["links"]["geometry"], "network_links_geometry_only"),
):
gnio.save_geodataframe(gdf, filename, output_dir, filetype=filetype)
if self.schedule:
self.schedule.write_to_geojson(output_dir, epsg)
self.schedule.write_spatial(output_dir, epsg=epsg, filetype=filetype)
self.write_extras(output_dir)

def to_geodataframe(self) -> dict:
@@ -3042,7 +3054,7 @@ def to_geodataframe(self) -> dict:
Returns:
dict: dict with keys 'nodes' and 'links', values are the GeoDataFrames corresponding to nodes and links.
"""
return geojson.generate_geodataframes(self.graph)
return spatial_output.generate_geodataframes(self.graph)

def to_encoded_geometry_dataframe(self):
_network = self.to_geodataframe()
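
The `write_spatial` change above calls `gnio.check_file_type_is_supported(filetype)` before any files are generated, but the body of that helper is not part of the visible diff. The following is a minimal sketch of what such an early check could look like, not GeNet's actual implementation. The accepted values (`parquet`, `geojson`, `shp`) are taken from the docstrings above; the exception type, the case-insensitive comparison and the `build_output_path` helper are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: the accepted values mirror those named in the docstrings above.
SUPPORTED_FILE_TYPES = ("parquet", "geojson", "shp")


def check_file_type_is_supported(filetype: str) -> None:
    """Raise early if `filetype` is not one of the supported spatial formats.

    Assumption: ValueError is used here; the real helper may raise something else.
    """
    if filetype.lower() not in SUPPORTED_FILE_TYPES:
        raise ValueError(
            f"Unsupported file type: {filetype!r}. "
            f"Choose one of: {', '.join(SUPPORTED_FILE_TYPES)}"
        )


def build_output_path(filename: str, output_dir: str, filetype: str = "parquet") -> Path:
    """Hypothetical helper: validate the type, then use it as the file extension."""
    check_file_type_is_supported(filetype)
    return Path(output_dir) / f"{filename}.{filetype.lower()}"
```

Running the check once at the top of `write_spatial`, rather than inside each save call, means an unsupported `filetype` fails before the GeoDataFrames are built and before any partial output is written.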
4 changes: 2 additions & 2 deletions src/genet/max_stable_set.py
@@ -7,7 +7,7 @@
import pandas as pd
import pyomo.environ as pe

import genet.output.geojson as gngeojson
import genet.output.spatial as spatial_output
import genet.utils.dict_support as dict_support
import genet.utils.graph_operations as graph_operations
from genet.exceptions import InvalidMaxStableSetProblem
@@ -34,7 +34,7 @@ def __init__(self, pt_graph, network_spatial_tree, modes, distance_threshold=30,
self.step_size = step_size
self.network_spatial_tree = network_spatial_tree
self.pt_graph = pt_graph
_gdf = gngeojson.generate_geodataframes(pt_graph)
_gdf = spatial_output.generate_geodataframes(pt_graph)
self.stops, self.pt_edges = _gdf["nodes"].to_crs("epsg:4326"), _gdf["links"].to_crs(
"epsg:4326"
)
2 changes: 1 addition & 1 deletion src/genet/output/sanitiser.py
@@ -19,7 +19,7 @@ def sanitise_geodataframe(gdf):
not_missing_mask = gdf[col].notna()
if gdf[col].apply(lambda x: isinstance(x, (set, list))).any():
gdf.loc[not_missing_mask, col] = gdf.loc[not_missing_mask, col].apply(
lambda x: ",".join(x)
lambda x: ",".join(x) if isinstance(x, (list, set)) else x
)
elif gdf[col].apply(lambda x: isinstance(x, dict)).any():
gdf.loc[not_missing_mask, col] = gdf.loc[not_missing_mask, col].apply(lambda x: str(x))
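
The `isinstance` guard added to the lambda in `sanitise_geodataframe` matters when a column mixes collections with plain values: the join branch is taken if any value in the column is a list or set, and is then applied to every non-missing value. The sketch below reproduces the effect with plain Python lists instead of a GeoDataFrame. The `join_if_collection` name and the sample data are hypothetical.

```python
def join_if_collection(value):
    """Join lists/sets into a comma-separated string; pass anything else through."""
    return ",".join(value) if isinstance(value, (list, set)) else value


# Hypothetical column holding a list, a plain string and a set.
mixed_column = [["car", "bike"], "bus", {"rail"}]

# Without the guard, the plain string is iterated character by character.
unguarded = [",".join(v) for v in mixed_column]
guarded = [join_if_collection(v) for v in mixed_column]

print(unguarded)  # ['car,bike', 'b,u,s', 'rail']
print(guarded)    # ['car,bike', 'bus', 'rail']
```

A non-iterable value such as an integer would make the unguarded join raise a `TypeError`, whereas the guarded version returns it unchanged.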