Challenge 12 - Compression of Geospatial Data with Varying Information Density #3
Comments
Note that we recently added some compression features to netCDF, including support for lossy compression and for the faster Zstandard compression library. These may be helpful to those working on this challenge. For more details see: https://www.researchgate.net/publication/365006139_NetCDF_Compression_Improvements
Amazing, thanks Ed. Great summary!
Hello there! I came across this project and it immediately caught my attention. The idea seems very interesting, and I would love to learn more about it. I have started drafting a proposal, and during my research I found that this project is also listed as a Google Summer of Code (GSoC) project. Please let me know if there are any updates regarding the project, considering that the GSoC deadline is April 4th.
Hi Ayoub! Thanks for your interest! Yes, indeed, we also got this project into Google Summer of Code, meaning that it is possible to get funding through either track. Note the different deadlines though. We therefore expect two participants (one from Code for Earth, one from Summer of Code) to work on xbitinfo simultaneously. Depending on the proposals, we will then define the individual projects in discussion with the participants so that they are somewhat independent of one another. For us mentors it makes no difference whether you get accepted through Summer of Code or Code for Earth, but the programmes are distinct and there is only funding to accept one participant from each. So yes, please write down your ideas and interests in a proposal and apply! You can also pick up ideas from the project ideas we wrote down for GSoC. In the end, we would like to see that you have understood the challenge, have ideas for how to solve it, and are motivated to work on this during the summer.
Thank you so much, Milan, for your response and for clarifying the details about the project and the funding options available.
Challenge 12 - Compression of Geospatial Data with Varying Information Density
Goal
Development of an information-density adapting compression
Mentors and skills
Challenge description
Geospatial data can vary in its information density from one part of the world to another. A dataset containing streets will be very dense in cities but will contain little information in remote places like the Alps or the ocean. The same is true for datasets about the ocean or the atmosphere. The variability of sea surface temperatures and currents is much larger in the vicinity of the Gulf Stream than in the middle of the Atlantic basin. This variability might also change in time. A hurricane, for example, has a lot of variability in winds, temperature and rain rates, and in addition travels across entire ocean basins.
The challenge of this project is to improve xbitinfo so that it preserves the natural variability of these features without storing random noise where the real information density is low. In particular, this means that the number of bits that must be preserved during compression changes with location. A hurricane has a different information density than a same-sized area in the steadily blowing trade-wind regimes. The compressibility of climate data can therefore change drastically in time and space, which we want to exploit.
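To make the idea of a location-dependent number of preserved bits concrete, here is a minimal, dependency-free sketch of bit rounding applied with a different number of kept mantissa bits per region. It is not xbitinfo's implementation: the names `bitround` and `round_by_region`, the region labels and the chosen `keepbits` values are all hypothetical, and the sketch assumes finite float32 data.

```python
import struct


def to_f32(value: float) -> float:
    """Return `value` rounded to the nearest float32."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def bitround(value: float, keepbits: int) -> float:
    """Round a float32 to `keepbits` mantissa bits (nearest, ties to even).

    Assumes a finite input; NaN and infinity are not handled specially.
    """
    if keepbits >= 23:  # float32 has 23 explicit mantissa bits
        return to_f32(value)
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    drop = 23 - keepbits              # number of trailing bits to discard
    half = 1 << (drop - 1)
    # Adding (half - 1) plus the lowest kept bit gives round-to-nearest-even.
    bits += (half - 1) + ((bits >> drop) & 1)
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF   # zero the discarded bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def round_by_region(regions, keepbits):
    """Apply bit rounding with a separate keepbits value for each region.

    `regions` maps a (hypothetical) region name to a list of values and
    `keepbits` maps the same names to the mantissa bits to preserve.
    """
    return {
        name: [bitround(v, keepbits[name]) for v in values]
        for name, values in regions.items()
    }


# Hypothetical example: a calm trade-wind area needs fewer bits than a hurricane.
winds = {
    "trade_winds": [7.1, 7.2, 7.15, 7.3],
    "hurricane": [12.4, 38.9, 61.2, 25.7],
}
rounded = round_by_region(winds, {"trade_winds": 2, "hurricane": 10})
```

The trailing zero bits produced by rounding are what a subsequent lossless compressor can exploit, so regions with fewer kept bits compress better.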
Currently, in the bitinformation framework, preserving all real information requires applying the maximum information content calculated by xbitinfo to the entire dataset. However, bitinformation can also be calculated on subsets, so that the ‘boring’ parts can be compressed more efficiently.
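The difference between one global setting and per-subset settings can be sketched as follows. This is a simplified, dependency-free illustration and not xbitinfo's algorithm: it estimates the information of each float32 bit position as the mutual information between that bit in adjacent elements, and then picks the number of mantissa bits holding 99% of that information. The function names, the 99% level and the subset labels are assumptions, and the sketch skips the filtering of statistically insignificant information that a full implementation would need.

```python
import math
import struct


def bitwise_information(values):
    """Mutual information (in bits) between adjacent elements, per bit position.

    Index 0 is the float32 sign bit, 1-8 the exponent, 9-31 the mantissa.
    """
    words = [struct.unpack("<I", struct.pack("<f", v))[0] for v in values]
    npairs = len(words) - 1
    info = []
    for pos in range(32):
        shift = 31 - pos
        counts = [[0, 0], [0, 0]]     # joint counts of (bit in x[i], bit in x[i+1])
        for a, b in zip(words[:-1], words[1:]):
            counts[(a >> shift) & 1][(b >> shift) & 1] += 1
        mutual = 0.0
        for i in (0, 1):
            for j in (0, 1):
                if counts[i][j] == 0:
                    continue
                p = counts[i][j] / npairs
                p_first = (counts[i][0] + counts[i][1]) / npairs
                p_second = (counts[0][j] + counts[1][j]) / npairs
                mutual += p * math.log2(p / (p_first * p_second))
        info.append(mutual)
    return info


def keepbits(info, level=0.99):
    """Mantissa bits needed to retain `level` of the mantissa information."""
    mantissa = info[9:]
    total = sum(mantissa)
    if total == 0:
        return 0
    running = 0.0
    for n, bits in enumerate(mantissa, start=1):
        running += bits
        if running >= level * total:
            return n
    return len(mantissa)


# Hypothetical subsets: one constant ("boring") and one that varies.
subsets = {
    "calm": [2.0] * 16,
    "active": [1.0, 1.5] * 8,
}
per_subset = {name: keepbits(bitwise_information(v)) for name, v in subsets.items()}
# A single global setting has to cover the most demanding subset.
global_keepbits = max(per_subset.values())
```

With per-subset values, the constant subset can be rounded much more aggressively than the global maximum would allow.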
Xbitinfo is an open-source Python package that enables lossy compression of geospatial data based on its information content. Embedded in the Pangeo ecosystem, xbitinfo builds on top of xarray and dask and allows for fast compression and analysis of various data formats, including netCDF and Zarr. Xbitinfo addresses the challenge of the increasingly large datasets, split into chunks, that are now being created thanks to increasingly available compute power. Climate simulations at sub-kilometre resolution with petabytes of output are just one example where xbitinfo can help to keep the dataset manageable.
The successful applicant will refine the application of xbitinfo to data subsections (chunks) and improve our ability to compress spatially and temporally varying fields. Furthermore, the applicant will learn about information theory and software engineering with international mentors.