As a newer user of ydata-profiling, I am using the package with various types of data. One of them is geospatial data, which I currently store as separate Latitude and Longitude columns of float values. While the bar chart and correlation output may be useful in some cases, it may be more useful to present the data as a map so the user can see which areas the data covers. Unless I missed something in the documentation, there is no current functionality for this.
I noticed in the contributing guidelines this was highlighted as a potential EDA: extending data type support (GPS coordinates).
Proposed feature
Plotting of points on a map within the variable exploration so you could consider the coverage areas of the data.
I am happy to help contribute to a feature like this. However, there are a few different approaches it could take, so I wanted to know whether the ydata team has a preference.
Data size
Depending on the data size, plotting points with some packages may take too much time or compute. This could be avoided in a few ways; one of my initial thoughts is to use Datashader to make the plots.
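To make the data-size concern concrete, here is a minimal sketch of aggregating points into a fixed grid before drawing anything, so the plotting cost depends on the number of occupied cells rather than the number of rows. This is not Datashader and not part of ydata-profiling; it is a standard-library stand-in for the same general idea, and the name `bin_points` and the `cell_size_deg` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]  # (row, column) index of a grid cell


def bin_points(
    points: Iterable[Tuple[float, float]],
    cell_size_deg: float = 1.0,
) -> Dict[Cell, int]:
    """Count (lat, lon) points per grid cell of `cell_size_deg` degrees.

    Hypothetical helper: rows are measured from latitude -90 and
    columns from longitude -180, so indices are never negative.
    """
    counts: Counter = Counter()
    for lat, lon in points:
        row = math.floor((lat + 90.0) / cell_size_deg)
        col = math.floor((lon + 180.0) / cell_size_deg)
        counts[(row, col)] += 1
    return dict(counts)


# Two nearby points fall into the same one-degree cell; the third does not.
sample = [(38.72, -9.14), (38.75, -9.20), (51.51, -0.13)]
grid = bin_points(sample, cell_size_deg=1.0)
print(grid)
```

The resulting cell counts could then be drawn as a heatmap by whichever plotting library is chosen. For large inputs a vectorised aggregation would be preferable to this pure-Python loop.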
Type of data
I think it might be easiest to handle a single column of Shapely Point objects. Coordinating two separate columns would likely be awkward with the current structure of ydata-profiling.
This could also be handled as a list of tuples to reduce dependencies.
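As a rough illustration of the list-of-tuples option, the sketch below pairs two float columns into (latitude, longitude) tuples using only the standard library. The helper name `pair_coordinates` and the decision to silently drop missing or out-of-range rows are assumptions made for illustration, not existing ydata-profiling behaviour.

```python
import math
from typing import Iterable, List, Optional, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def pair_coordinates(
    latitudes: Iterable[Optional[float]],
    longitudes: Iterable[Optional[float]],
) -> List[Coordinate]:
    """Zip two columns into (lat, lon) tuples, skipping invalid rows.

    Hypothetical helper: a row is kept only if both values are present,
    not NaN, and inside the valid ranges for decimal degrees.
    """
    points: List[Coordinate] = []
    for lat, lon in zip(latitudes, longitudes):
        if lat is None or lon is None:
            continue
        if math.isnan(lat) or math.isnan(lon):
            continue
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            points.append((float(lat), float(lon)))
    return points


# The third row has a missing latitude and the fourth is out of range.
lat_column = [38.72, 51.51, float("nan"), 95.0]
lon_column = [-9.14, -0.13, 2.35, 10.0]
points = pair_coordinates(lat_column, lon_column)
print(points)
```

A real implementation would probably report how many rows were dropped rather than discard them quietly, since that count is itself useful profiling information.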
Alternatives considered
ydata-profiling could use Plotly Express for graphing; however, this package may struggle with plotting larger datasets.
ydata-profiling could use Cartopy with matplotlib for graphing; I am unsure of its limitations on data size.
Additional context
No response
Thank you for your feature request, and for your enthusiasm. Contributions are always welcome!
I would be happy to have your contribution for this feature; nevertheless, there are a few points that need further definition:
Latitude and Longitude can be provided in several formats by users, and we want to keep it as easy as possible. For that reason there is an interface decision to be made;
The report structure design and flow for datasets with geo-location data;
The library to be used for the plot: Datashader depends on Dask, and for that reason we prefer to avoid it. We will assess Cartopy in more depth to validate its potential.
Let me get back to you with some more detailed requirements for this feature so we can iterate together!
An alternative could be to extend ydata-profiling to support GeoPandas. GeoPandas uses Shapely internally and can handle Points, Multi-Points, Lines, Multi-Lines, Polygons, Multi-Polygons.
Profiling one dataset could give information about the type of geometry, coordinate system, etc… And eventually plot the shapes on a map.
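For a sense of what such a profile could collect, here is a minimal sketch that builds a GeoDataFrame from latitude and longitude columns and reads back the geometry types, coordinate system, and bounding box. It assumes GeoPandas is installed and that the coordinates are WGS 84 decimal degrees (EPSG:4326). The column names and the `summary` dictionary are illustrative only and do not reflect any existing ydata-profiling interface.

```python
import geopandas as gpd
import pandas as pd

# Illustrative data; the column names are an assumption.
df = pd.DataFrame(
    {"latitude": [38.72, 51.51, 48.86], "longitude": [-9.14, -0.13, 2.35]}
)

# points_from_xy expects x (longitude) first, then y (latitude).
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",  # assumption: coordinates are WGS 84 decimal degrees
)

summary = {
    "geometry_types": gdf.geom_type.value_counts().to_dict(),
    "crs": gdf.crs.to_string(),
    "bounds": gdf.total_bounds.tolist(),  # [minx, miny, maxx, maxy]
}
print(summary)
```

Plotting the shapes on a map could then start from `gdf.plot()`, which draws with matplotlib.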
Comparing two geographic datasets may be more challenging, since the differences between the geometries would need to be represented.