Validation for OSM and ENTSOE data using resource files for Austria and North Macedonia #67

GbotemiB · 2023-10-25T12:27:09Z

This PR contains a validation notebook that explains the discrepancy found here.

Objective

The objective of this notebook is to validate OSM data against ENTSOE data using resource files for Austria and North Macedonia.

Results

The discrepancy was confirmed to arise because the OSM elec.nc file that was compared against PyPSA-Eur contains lines with voltages below 220kV that were rebased to 220kV. Comparisons should therefore be made using the base_network_csv files, since voltages are not rebased in the CSV files.
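To illustrate why rebasing distorts the comparison, here is a minimal sketch with made-up line records. The field names `v_nom` and `length_km`, the values, and the `max(...)` rebase rule are hypothetical simplifications, not taken from the PyPSA-Earth files: once sub-220kV lines are relabelled as 220kV they inflate the 220kV total, whereas keeping the original voltages lets them be filtered out before comparing with ENTSOE.

```python
from collections import defaultdict

# Hypothetical line records; field names and values are illustrative only.
LINES = [
    {"v_nom": 110, "length_km": 40.0},
    {"v_nom": 132, "length_km": 25.0},
    {"v_nom": 220, "length_km": 100.0},
    {"v_nom": 380, "length_km": 150.0},
]


def rebase_voltages(lines, base_kv=220):
    """Mimic a rebase step: every line below base_kv is relabelled as base_kv."""
    return [{**line, "v_nom": max(line["v_nom"], base_kv)} for line in lines]


def length_by_voltage(lines, min_kv=220):
    """Sum line lengths per voltage level, ignoring levels below min_kv."""
    totals = defaultdict(float)
    for line in lines:
        if line["v_nom"] >= min_kv:
            totals[line["v_nom"]] += line["length_km"]
    return dict(totals)


print(length_by_voltage(LINES))                   # → {220: 100.0, 380: 150.0}
print(length_by_voltage(rebase_voltages(LINES)))  # → {220: 165.0, 380: 150.0}
```

In this toy case the 220kV total grows by the 65 km of lower-voltage lines, which is the kind of inflation the rebased file introduces.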

PS: There is an ongoing conversation about this notebook here

tutorial documentation

ekatef · 2023-10-27T12:44:34Z

Hello @GbotemiB!
Great improvements! It looks like we are converging :)

Here are a couple of comments aimed at making the code a bit easier to read.

I wonder if the if_country_in_entsoe( ) checking function may be a bit of overkill. We know in advance that the requested countries are in the ENTSOE data. I agree that finding the countries missing from the ENTSOE data has been valuable, but this notebook has quite a narrow purpose.
I have the impression that geospatial_plot( ) could be made a bit clearer if

we removed the # Filter for country code block and avoided using global variables (at_country_shape etc.) inside the function. I see two possible solutions:
the country code would be passed as an argument to geospatial_plot( ), and the data would be loaded inside geospatial_plot( ) itself;
a country shape and both OSM dataframes would be passed as arguments.

What is your feeling about that?

Is there a specific reason for starting the name of _apply_simplication( ) with an underscore? Do you mean that the function is intended for internal use?

Regarding the plots, I have the following comments.

The plots look amazing. A really great work!
In Geospatial Visualization, it looks like the raw and clean geometries are mostly completely overlaid by the base data, right? If that is the case, I think we could state it in the introduction to the section. Something like: "In this section we compare the topology of the raw, clean and base datasets for transmission lines. The clean_osm_data script removes lines outside the considered region, so only "raw" lines are present outside each of the considered countries. Inside a country, all three PyPSA-Earth datasets mostly overlap fully." Please feel free to improve this draft.
I think it would help the reader to add a couple of lines explaining the purpose of Simplified vs Non Simplified, along with some kind of conclusion. For example: "ENTSOE-extracted data have a simplified topology, while OSM-extracted lines reproduce the topology of the transmission network in considerable detail. To estimate the effect of this simplification, we have applied a simplification procedure to the OSM line geometries as well, using the Douglas-Peucker algorithm."
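For readers unfamiliar with the algorithm, a minimal pure-Python sketch of Douglas-Peucker simplification on projected (x, y) coordinates might look like this. It is an illustration only: the notebook relies on the geopandas `simplify` method, and the coordinates and tolerance below are made up. The tolerance is expressed in the units of the coordinates, e.g. meters in a metric CRS.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping only vertices that deviate more than tolerance."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the vertex farthest from the chord between the end points.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    # Recurse on both halves; drop the split vertex duplicated at the join.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right


def polyline_length(points):
    """Sum of Euclidean segment lengths."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


line = [(0, 0), (100, 5), (200, -5), (300, 0), (400, 120), (500, 0)]
simplified = douglas_peucker(line, tolerance=10)
print(simplified)  # → [(0, 0), (300, 0), (400, 120), (500, 0)]
print(round(polyline_length(line), 1), round(polyline_length(simplified), 1))  # → 613.2 612.4
```

The small wiggles along the first 300 units are dropped, so the simplified line is slightly shorter than the original, which is the mechanism expected to bring detailed OSM lengths closer to simplified ENTSOE geometries.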

GbotemiB · 2023-10-28T18:09:21Z

I wonder if the if_country_in_entsoe( ) checking function may be a bit of overkill. We know in advance that the requested countries are in the ENTSOE data. I agree that finding the countries missing from the ENTSOE data has been valuable, but this notebook has quite a narrow purpose.

The function described here is meant to make it easy to filter the ENTSOE data down to just AT and MK using the geometry coordinates.

Is there a specific reason for starting the name of _apply_simplication( ) with an underscore? Do you mean that the function is intended for internal use?

Actually, no particular reason. I will rename the function 😀.

I have the impression that geospatial_plot( ) could be made a bit clearer if

we removed the # Filter for country code block and avoided using global variables (at_country_shape etc.) inside the function. I see two possible solutions:
the country code would be passed as an argument to geospatial_plot( ), and the data would be loaded inside geospatial_plot( ) itself;
a country shape and both OSM dataframes would be passed as arguments.

Sounds great. I shouldn't be using global variables in my functions.

ekatef · 2023-11-03T20:10:08Z

@GbotemiB amazing work and very nice presentation. I think we need one final clean-up round, and then the task will be complete.

Please find my comments below.

My feeling is that the block with if_country_in_entsoe() can be simplified:

I'm not sure the function is really needed: it consists of only a single line, and entsoe_ref["if_at"] = if_country_in_entsoe(at_country_shape, entsoe_ref) is not necessarily clearer than base_data["geometry"].apply(lambda row: row.within(country_df["geometry"][0]))
the function name is misleading, as is the comment accompanying its call:

# Determine if Austria (AT) is in the ENTSO-E data.
entsoe_ref["if_at"] = if_country_in_entsoe(at_country_shape, entsoe_ref)

As you have explained, this is filtering rather than actually determining whether the country is present in the data.

the filtering itself can probably be simplified: could you please check whether geo_df.loc[geo_df.within(country_df["geometry"][0])] works?
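To make the "pass the data in, don't read globals" suggestion concrete without depending on a geospatial stack, here is a minimal pure-Python sketch of a filter that receives the country shape and the lines as arguments. It only stands in for `geo_df.loc[geo_df.within(country_shape)]`: the helper names and coordinates are hypothetical, and the containment test used here (every vertex inside the polygon, via ray casting) approximates the true `within` predicate only for convex polygons.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is point inside the polygon given as a list of vertices?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray cast to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def lines_within(country_shape, lines):
    """Keep lines whose vertices all lie inside country_shape.

    Both inputs are passed explicitly instead of being read from global variables.
    """
    return {
        name: coords
        for name, coords in lines.items()
        if all(point_in_polygon(p, country_shape) for p in coords)
    }


# Hypothetical square "country" and three lines.
country = [(0, 0), (10, 0), (10, 10), (0, 10)]
lines = {
    "inside": [(1, 1), (5, 5), (9, 2)],
    "crossing": [(5, 5), (15, 5)],
    "outside": [(20, 20), (30, 25)],
}
print(sorted(lines_within(country, lines)))  # → ['inside']
```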

I'd avoid duplicating comments, as is currently the case in this part:

# Clean the Austria (AT) OSM raw data, addressing non-uniform column names and NaN values.
at_osm_raw_lines = preprocess_raw_data(at_osm_raw_lines)

# Clean the North Macedonia (MK) OSM raw data, addressing non-uniform column names and NaN values.
mk_osm_raw_lines = preprocess_raw_data(mk_osm_raw_lines)

My feeling is that it would be clearer to keep both code lines together under a single comment.

Please note that comments in the code should explain why the following part is needed or what it is for, not what it does. Conversely, the names of functions and methods should indicate what they actually do.

In particular, I think that this comment should be revised: # Applying simplification function to preprocess the dataframe with __apply_simplification func
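As an illustration of this guidance, here is a hypothetical sketch: a single comment explains why both raw datasets are cleaned, while the function name says what is done. The `preprocess_raw_data` stand-in below (normalising column names and dropping missing values on lists of dicts) is an assumption for illustration, not the notebook's actual implementation.

```python
def preprocess_raw_data(records):
    """Hypothetical stand-in: normalise column names and drop missing values."""
    return [
        {key.strip().lower(): value for key, value in row.items() if value is not None}
        for row in records
    ]


at_osm_raw_lines = [{"Voltage ": 220000, "Circuits": None}]
mk_osm_raw_lines = [{"voltage": 400000, "circuits": 2}]

# Raw OSM exports differ in column naming and contain gaps, so both countries
# are harmonised the same way before being compared with the ENTSOE data.
at_osm_raw_lines = preprocess_raw_data(at_osm_raw_lines)
mk_osm_raw_lines = preprocess_raw_data(mk_osm_raw_lines)
```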

A few suggestions on improving the naming (that is a really hard part!):

it would be great to rename converted_length to something like reprojected_length or length_3035;
it would be great to make the create_plot name more specific.

The section Comparison with Barchart lacks explanations.
It would also be great to specify the unit of the tolerance in the dedicated comment # Define tolerance level for simplification: is it in meters?
Aren't the bar diagrams duplicated in Comparison with Barchart and Section 2 Applying Simplification?
It's great that you have added explanations to the Conclusion. To make our statement more credible, I'd add a bit more detail: instead of declaring that OSM data are more accurate, we could explain that OSM data are more detailed and represent the actual grid topology more accurately.

GbotemiB · 2023-11-06T13:30:06Z

Aren't the bar diagrams duplicated in Comparison with Barchart and Section 2 Applying Simplification?

The plots are not the same; they look similar, which is expected because simplification changes the line lengths only slightly.
If you take a look at the Simplified vs Non-Simplified section, you will find that the difference is not that large.

This is the difference in kilometers

GbotemiB · 2023-11-07T09:35:27Z

@ekatef Thank you for the wonderful comments and review.
I have pushed recent commits implementing the changes you suggested. I would be glad if you could review the notebook again with these commits.

GbotemiB · 2023-11-29T14:23:42Z

Hi @ekatef, I have refined the conclusion in the notebook. From here we can iterate on the notebook to make it understandable for a wider audience. You can check my recent commits.

ekatef · 2023-12-04T13:41:16Z

Hey @GbotemiB and thanks for the nice conclusion draft!

A couple of comments.

Some typo fixing wouldn't hurt, e.g. making sure that ENTSO is capitalised everywhere.
The statement at the beginning may feel like wishful thinking: "OSM data provides a more detailed and accurate representation". That may be totally correct in its essence, but readers should have the opportunity to judge for themselves, using the evidence we provide. The aim of the conclusion is to summarise the evidence provided in the notebook.
Generally, it's better to write the text so that each sentence connects with the previous one and the whole text tells a story. That makes reading much easier and helps to communicate the intended message.

I'd suggest writing the next iteration of the conclusion with some structure. Making a preliminary plan may help with that. Based on the notebook content and the draft you have created, I'd suggest the following plan.

Briefly recall which OSM-extracted data have been considered (basically, the whole data preparation workflow).
Explain which types of comparison have been done: topology and a quantitative comparison of length values -> the effects found suggested that simplification methods may have been applied to the ENTSO data.
Check the simplification hypothesis by applying the geopandas simplify method (which uses the Douglas-Peucker algorithm), with quite satisfactory results: the simplified OSM topologies do get closer to the ENTSO data.
It would be great to provide a quantitative estimate for the length comparison: no need to include a lot of numbers in the conclusion, just a reference to an order of magnitude or one or two representative estimates.

Then the main conclusion point would look well-reasoned ;)

ekatef · 2023-12-04T14:32:59Z

Btw, looking at it with fresh eyes, I feel the clarity of the notebook could be slightly improved:

under # Create plot using the simplified dataframe in cell [26], it is not clear whether the plot shows values for the original or the simplified geometry; in either case there may be duplication: do we really need it?
in cells [28] and [29], it may not be evident which OSM-derived values have been used for comparison. I assume they are the simplified ones, but that does not follow directly from the table itself. It would be great to double-check this and add some explanation to the table.
would it make sense to add the same percentage difference for the non-simplified values as well?
as a side note: please keep in mind that it's better to show only meaningful digits when presenting numbers, in tables in particular. Setting up proper number formatting can help with that.

GbotemiB · 2023-12-05T23:16:00Z

Hi @ekatef, thanks for the review. I have revamped the conclusion based on your recent suggestions.

Here is the new conclusion.

This analysis aimed to compare OSM data against ENTSOE data for Austria and North Macedonia. The following steps were carried out during the analysis:

A geospatial comparison plot of 220kV and 380kV lines showing the line topology.
A bar-chart comparison of the line lengths in the OSM and ENTSOE data.
A simplification hypothesis was tested by applying the geopandas simplify method, which uses the Douglas-Peucker algorithm, to the OSM lines.
Geospatial and bar-chart comparison plots were produced for the simplified lines. Simplification brought the OSM topologies closer to the ENTSOE data, reducing the estimated percentage difference to about 5%.
A CRS check was done to verify the OSM line lengths against lengths measured manually on Google Maps.

The analysis showed an estimated percentage difference of 11% between the OSM and ENTSOE data. After applying the Douglas-Peucker algorithm, the estimated percentage difference dropped to about 5%.

ekatef · 2023-12-19T12:35:15Z

Hello @GbotemiB! Thank you for working on that. My feeling is that we are converging.

Comments on the overall structure of the notebook:

It would be nice to clarify which quantities the percentage_differences are computed between. I assume it is always the difference between the length in question x and the ENTSOE length x_entso: percentage_difference = (x - x_entso)/x_entso. But I may be wrong.
It would be perfect to find a way to reduce the number of displayed digits. Hopefully this can be fixed by specifying format options for the notebook.

Regarding the Conclusion: now it reads much smoother! A couple of comments:

I'm not sure it's ideal to have introductory phrases (In summary and In conclusion) in the conclusion. I feel they may make the text "heavier". But that is very subjective 🙂
I think it could help to add a bit of context to the conclusion, to remind the reader that we are dealing with the PyPSA-Earth data extraction workflow. A few words would be enough: at the beginning, explain what these OSM-extracted data are, and at the end, which CSVs we mean.
I do not understand the third paragraph ("While a detailed quantitative ..."): I can't agree that the quantitative estimation is not well presented in the notebook, and I'm not sure which impact is meant. That said, the writing style is very nice.

I think we are very close to finalise. Great work!

GbotemiB · 2023-12-19T13:56:54Z

Thanks for the review.
Regarding your comments:

It looks like the formula you are referring to is percentage change. I think we need to decide whether to use percentage change or percentage difference.
Thanks for the tip. I found an option with pandas.
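The specific pandas option is not named in the thread; one common approach, shown here as an assumption, is the `display.float_format` option, which controls how floats are rendered in DataFrame output. Using it through `pd.option_context` limits the effect to one block. The table values are invented placeholders, not notebook results.

```python
import pandas as pd

# Placeholder values, not results from the notebook.
table = pd.DataFrame(
    {"osm_km": [1234.56789, 567.891234], "entsoe_km": [1111.11111, 540.123456]},
    index=["AT", "MK"],
)

# Render floats with one decimal only inside this block, leaving global settings untouched.
with pd.option_context("display.float_format", "{:.1f}".format):
    print(table)
```

Calling `pd.set_option("display.float_format", "{:.1f}".format)` once at the top of a notebook would instead apply the same formatting to every table that follows.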

GbotemiB · 2023-12-20T15:54:30Z

Hi @ekatef, I have revised the notebook based on your comments. Ready for another revision 🙃

ekatef · 2024-01-05T22:13:09Z

Hello @GbotemiB! Perfect!! :D

It's great that you have found a neat way to fix the numeric precision issue. And I very much like the result of your work.

The only comment left is an answer to your question regarding the percentage change: I think it should rather be a relative change compared to the ENTSO reference data, since we assume them to be the de facto standard. It would also be perfect to add a couple of the relative change values to the conclusion, like: "closely align with ENTSO data, being about x1..x2% for Austria and y1..y2% for North Macedonia" (where x1, x2 correspond to the network that has been made as close as possible to the ENTSOE data).
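To make the distinction between the two metrics concrete, the sketch below contrasts the relative change with respect to the ENTSO reference with the symmetric percentage difference, which divides by the mean of the two values. The function names and the lengths are made-up placeholders, not notebook results.

```python
def relative_change_pct(value, reference):
    """Signed relative change of value with respect to a reference, in percent."""
    return (value - reference) / reference * 100


def percentage_difference_pct(a, b):
    """Symmetric percentage difference: |a - b| relative to the mean of a and b."""
    return abs(a - b) / ((a + b) / 2) * 100


osm_km, entsoe_km = 1100.0, 1000.0  # placeholder lengths
print(round(relative_change_pct(osm_km, entsoe_km), 2))        # → 10.0
print(round(percentage_difference_pct(osm_km, entsoe_km), 2))  # → 9.52
```

The relative change is signed and depends on which dataset is taken as the reference, which is why fixing ENTSO as the reference makes the reported values unambiguous; the percentage difference is symmetric but has no natural reference.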

After that, I think the notebook is basically ready to merge.

@pz-max do you have any comments or recommendations? 🙂

GbotemiB · 2024-01-05T23:07:58Z

Thank you @ekatef for the review, I will make the final changes as you have suggested.

ekatef · 2024-01-10T14:40:40Z

Thank you @ekatef for the review, I will make the final changes as you have suggested.

Great work! I think the only point left is the very final polishing of the conclusion, and then the PR is ready to merge.

Thanks for the amazing contribution! 😄

ekatef · 2024-01-11T09:37:25Z

Merged 🎉 🎉 🎉
Thanks for the outstanding contribution, @GbotemiB! 🥇
Fantastic work, which will definitely be very helpful in the future.

GbotemiB · 2024-01-11T09:51:42Z

Thank you for the amazing support.

GbotemiB added 15 commits September 19, 2023 03:35

network analysis for osm and pypsa-eur data

74c3dfa

corrected parameters

7e8a48f

filename changes

19d235e

jupyter notebook formatting

9125f57

removed extra line and comments

ef7d1d6

corrected units

5d15b13

variable renamed

b0867a6

comparison for at and mk against entsoe

da21c2f

Merge pull request #1 from pypsa-meets-earth/main

37a4193

tutorial documentation

changes made to the dataframe

fab0baa

simplification added

0bec0b2

validation notebooks

c6ccd9e

documentation

631d5c3

documentation

c0d98ad

removed comparison notebook

a076219

pz-max mentioned this pull request Oct 25, 2023

Consistency between power line datasets pypsa-meets-earth/earth-osm#44

Closed

GbotemiB added 2 commits October 26, 2023 15:07

added base network files and refactor code

424679c

added gmap validation

0254417

GbotemiB added 7 commits October 28, 2023 20:09

refactor some functions

6c653b6

update on crs check

c6be705

fixed crs to ESRI:54009

6e4ca39

fixed crs to ESRI:3035

a335fbc

Merge branch 'pypsa-meets-earth:main' into osm_entsoe_validation

45c1e5c

redefined crs section

dc1d497

removed unwanted comments

13ea742

redefined checking for country data in entsoe

33758b5

add more comments

740af25

GbotemiB added 3 commits November 28, 2023 14:05

modified notebook

02b6859

modified conclusion

6b35b13

Merge branch 'pypsa-meets-earth:main' into osm_entsoe_validation

4503038

ekatef mentioned this pull request Dec 4, 2023

Investigation of different voltage levels on the Transmission Capacity using IT and DE resource files #68

Open

GbotemiB added 2 commits December 11, 2023 11:45

revised technical details and conclusion

5d15fde

Merge branch 'pypsa-meets-earth:main' into osm_entsoe_validation

f20ee36

GbotemiB added 2 commits December 20, 2023 12:02

fixed floating point

6aa441b

revised conclusion

d7a95e9

removed unused comments

41d8d27

fixed percentage change

b07385d

GbotemiB and others added 2 commits January 10, 2024 16:11

revised conclusion

f579faf

Minor fixes

3226f69

ekatef force-pushed the osm_entsoe_validation branch from cdd8796 to 3226f69 January 11, 2024 09:28

ekatef merged commit 0dc4a39 into pypsa-meets-earth:main Jan 11, 2024
2 checks passed

