Several open source tools are enabling the shift to cloud-native geospatial machine learning workflows. Stream data from STAC APIs, generate machine-learning-ready chips on the fly, and train models for different downstream tasks! Find out about advances in the Pangeo ML community towards scalable GPU-native workflows.
This talk presents an overview of open source Python packages from the Pangeo (big data geoscience) machine learning community. For reading and writing data, [kvikIO](https://github.com/rapidsai/kvikio) allows low-latency data transfers from Zarr archives via NVIDIA GPUDirect Storage. Once tensors are loaded into xarray data structures, [xbatcher](https://github.com/xarray-contrib/xbatcher) enables efficient slicing of arrays in an iterative fashion. To connect the pieces, [zen3geo](https://github.com/weiji14/zen3geo) acts as the glue between geospatial libraries, from reading [STAC](https://stacspec.org) items and rasterizing vector geometries to stacking multi-resolution datasets for custom data pipelines. Learn more as the Pangeo community develops tutorials at [Project Pythia](https://cookbooks.projectpythia.org), and join the [Pangeo ML Working Group](https://pangeo.io/meeting-notes.html#working-group-meetings) to hear about the challenges and ideas on scaling machine learning in the geosciences.
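
To make the idea of generating chips on the fly more concrete, here is a minimal, library-free sketch of iterative array slicing. It is not xbatcher's API: the `iter_chips` helper, its `chip_size` and `stride` parameters, and the toy raster are all hypothetical names chosen for illustration, and a real pipeline would operate on xarray objects rather than nested lists.

```python
from typing import Iterator, List, Tuple

Grid = List[List[float]]


def iter_chips(
    raster: Grid, chip_size: int, stride: int
) -> Iterator[Tuple[int, int, Grid]]:
    """Lazily yield (row, col, chip) for each full square window.

    Hypothetical helper for illustration only; windows that would run
    past the raster edge are skipped rather than padded.
    """
    if chip_size <= 0 or stride <= 0:
        raise ValueError("chip_size and stride must be positive")
    n_rows = len(raster)
    n_cols = len(raster[0]) if raster else 0
    for row in range(0, n_rows - chip_size + 1, stride):
        for col in range(0, n_cols - chip_size + 1, stride):
            # Slice the rows first, then the columns of each selected row.
            chip = [r[col : col + chip_size] for r in raster[row : row + chip_size]]
            yield row, col, chip


# Assumed toy input: a 4 x 6 single-band raster whose pixel value
# encodes its position as 10 * row + column.
raster = [[float(10 * r + c) for c in range(6)] for r in range(4)]

# Non-overlapping 2 x 2 chips; a stride smaller than chip_size would overlap them.
chips = list(iter_chips(raster, chip_size=2, stride=2))
```

Because the helper is a generator, chips are produced one at a time instead of being materialized up front, which is the property that matters when the source array is too large to hold in memory.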