A practical account of cloud-optimized data formats such as Zarr and Parquet, the challenges we at the Australian Ocean Data Network have faced in adopting them, and the questions that remain over the use of these cloud-native formats.
The Australian Ocean Data Network (AODN) is an interoperable online network of marine observation data and services. The AODN contains data observed by the Integrated Marine Observing System (IMOS) program and partner organisations from across Australia. All data are shared under an open license and published using open source geospatial software through a platform that has been running on Amazon Web Services for eight years.
The wide variety of marine data we host means that we cannot rely on a single strategy or tool to store and provide it. Meanwhile, recent work around the globe has established new cloud-native ways of storing, accessing and analysing large volumes of data. The AODN has recently reviewed cloud-native data formats such as GeoParquet, Zarr, TileDB and Kerchunk. In one case, we completed a workload 1900 times faster using cloud-native formats than with our legacy solution.
This talk will cover the modern cloud-native formats we have been exploring, our assessment of them and the challenges we have faced, and will highlight questions that remain open over their use. We're planning the next generation of the AODN so that we can continue to meet the current needs of our user community and position ourselves for the future.