👥 2 conferences
🎤 4 talks
📅 Years active: 2015 to 2017
With the advent of the massive deluge in Earth data, serving these data to diverse communities is increasingly promising and challenging alike. A useful abstraction for spatio-temporal raster data (and beyond) is the coverage data model, as standardized by ISO, OGC, and INSPIRE. Rather than zillions of individual image files, it provides spatio-temporal "datacubes" for simple, efficient handling through the corresponding service model, the Web Coverage Service (WCS) with its Web Coverage Processing Service (WCPS) geo analytics language - "one cube tells more than a million images".
Open-source rasdaman ("raster data manager") is the official reference implementation of both OGC and INSPIRE WCS. It supports easy incremental construction and maintenance of spatio-temporal datacubes, based on the OGC WCS-T standard. Retrieval may use WMS for visual navigation, WCS for data extraction and download, and WCPS for massive server-side processing. On the server side, adaptive data partitioning and "tile streaming" processing enable fast query responses. In July 2016, the US magazine CIO Review included rasdaman in its list of top 100 Big Data technologies.
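To give a feel for what a WCPS request looks like, the following minimal sketch assembles a query that trims a datacube along its time axis and averages the result on the server. The helper name, the coverage name "ExampleTemperatureCube", and the axis name "ansi" are assumptions made for illustration; a real service defines its own coverage and axis names.

```python
def wcps_average_query(coverage_id: str, axis: str, low: str, high: str) -> str:
    """Build a WCPS query that averages a coverage over a trimmed axis interval.

    All arguments are placeholders chosen by the caller; nothing is validated here.
    """
    # Trim the coverage along one axis, then condense the values with avg().
    return (
        f"for $c in ({coverage_id}) "
        f'return avg($c[{axis}("{low}":"{high}")])'
    )


# Hypothetical coverage and axis names, used only for illustration.
query = wcps_average_query("ExampleTemperatureCube", "ansi", "2015-01", "2015-12")
print(query)
```

The query text is sent to the server, which evaluates it against the datacube and returns only the result, so no image files need to be downloaded for the computation.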
In this talk we present coverages in terms of concepts, implementation, and large-scale application. Live demos underpin the talk, using publicly accessible sites where the audience can replay and modify the examples. Being editor of the OGC and ISO coverage standards, the presenter can give first-hand insights and answers, for example about the new generalized grid model for coverages (CIS 1.1), which OGC adopted in fall 2016, as well as the newly adopted INSPIRE WCS. This is an excellent opportunity to learn about the state of the art and standards in an open, free-of-cost setup.
Never before has it been so easy and inexpensive to gather, as well as generate, massive amounts of data. Often, data get discretized in space and time, naturally leading to multi-dimensional arrays. In fact, arrays play a core role in most domains of science, engineering, and business - generally speaking, spatio-temporal sensor, image, timeseries, simulation, and statistics data. This raises the need for flexible, scalable, and open services to replace the bespoke silo solutions that have prevailed in the past.
Traditional databases have been successful due to their flexibility (through query languages) and scalability (through manifold optimizations and parallelization in the server). Unfortunately, however, they do not support massive arrays. This is currently being remedied within ISO, where SQL/MDA ("Multi-Dimensional Arrays") is at an advanced stage and likely to be adopted in summer 2017. SQL/MDA adds declarative array definition and operations to SQL. Not only does this pave the way for powerful services; perhaps even more importantly, it allows, for the first time, integrating data and metadata in the same archive, even in one and the same query. As such, SQL/MDA will be a game changer in data services, and not only for science and engineering.
We present the concepts and rationales, as well as the open-source technology rasdaman ("raster data manager") which is serving as the blueprint for MDA.
We have learnt to live with the pain of separating data and metadata into non-interoperable silos. For metadata, we enjoy the flexibility of databases, be they relational, graph, or some other NoSQL. Contrasting this, users still "drown in files" as an unstructured, low-level archiving paradigm. It is time to bridge this chasm which once was technologically induced, but today can be overcome.
One building block towards a common re-integrated information space is to support massive multi-dimensional spatio-temporal arrays. These "datacubes" appear as sensor, image, simulation, and statistics data in all science and engineering domains, and beyond. For example, 2-D satellite imagery, 3-D x/y/t image timeseries and x/y/z geophysical voxel data, and 4-D x/y/z/t climate data contribute to today's data deluge in the Earth sciences. Virtual observatories in the Space sciences routinely generate Petabytes of such data. Life sciences deal with microarray data, confocal microscopy, and human brain data, which all fall into the same category.
The ISO SQL/MDA (Multi-Dimensional Arrays) candidate standard is extending SQL with modelling and query support for n-D arrays ("datacubes") in a flexible, domain-neutral way. This heralds a new generation of services with new quality parameters, such as flexibility, ease of access, embedding into well-known user tools, and scalability mechanisms that remain completely transparent to users. Technology like the EU rasdaman ("raster data manager") Array Database system can support all of the above examples simultaneously, with one technology. This is practically proven: As of today, rasdaman is in operational use on hundreds of Terabytes of satellite image timeseries datacubes, with transparent query distribution across more than 1,000 nodes.
Therefore, Array Databases offering SQL/MDA constitute a natural common building block for next-generation data infrastructures. Being initiator and editor of the standard, we present principles, implementation facets, and application examples as a basis for further discussion. Time permitting, we will present live demos from services exceeding 20 TB of "datacubes".
While Python has developed into the lingua franca of Data Science, there is often a paradigm break when accessing specialized tools. In particular, for one of the core data categories in science and engineering, massive multi-dimensional arrays, out-of-memory solutions typically employ their own, different models.
We discuss this situation using the example of the scalable open-source array engine rasdaman ("raster data manager"), which offers access to and processing of Petascale multi-dimensional arrays through an SQL-style array query language, rasql. Such queries are executed in the server on a storage engine that uses adaptive array partitioning, together with a processing engine that implements a "tile streaming" paradigm so that arrays massively larger than server RAM can be processed. The rasdaman QL has acted as the blueprint for the forthcoming ISO Array SQL and for the Open Geospatial Consortium (OGC) geo analytics language, Web Coverage Processing Service, adopted in 2008. Not surprisingly, rasdaman is the OGC and INSPIRE Reference Implementation for their "Big Earth Data" standards suite.
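The general idea behind processing an array tile by tile can be illustrated with a minimal sketch. This is not rasdaman's implementation; the function names are hypothetical and the "array" is generated synthetically, whereas a real engine would fetch each tile from storage. The point is only that an aggregate can be computed while holding a single tile in memory at any time.

```python
from typing import Iterator, List

Tile = List[List[float]]


def iter_tiles(rows: int, cols: int, tile: int) -> Iterator[Tile]:
    """Yield tiles of a synthetic rows x cols array, one tile at a time."""
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Synthetic cell value r * cols + c stands in for data read from disk.
            yield [
                [float(r * cols + c) for c in range(c0, min(c0 + tile, cols))]
                for r in range(r0, min(r0 + tile, rows))
            ]


def streamed_mean(tiles: Iterator[Tile]) -> float:
    """Compute the mean over all cells while keeping only one tile in memory."""
    total, count = 0.0, 0
    for t in tiles:
        for row in t:
            total += sum(row)
            count += len(row)
    return total / count


# Tiles at the array border are smaller than 32 x 32; the loop handles that.
mean = streamed_mean(iter_tiles(rows=100, cols=80, tile=32))
print(mean)
```

Because the aggregate is accumulated incrementally, memory use depends on the tile size rather than on the size of the whole array.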
Recently, rasdaman has been augmented with a Python interface which allows transparent interaction with the database (credits go to Siddharth Shukla's Master's thesis at Jacobs University). Programmers do not need to know the rasdaman query language, as the operators are silently transformed, through lazy evaluation, into queries. Arrays delivered are likewise automatically transformed into their Python representation.
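How operators can be turned into queries through lazy evaluation may be sketched as follows. The class below is a hypothetical illustration, not the actual rasdaman Python interface: the name `LazyArray`, the collection name "scenes", and the generated rasql-style text are assumptions. Each operator only records an expression; query text is produced when it is explicitly requested.

```python
class LazyArray:
    """Hypothetical lazy array handle that records operations as query text."""

    def __init__(self, collection: str, expr: str = "c"):
        self.collection = collection
        self.expr = expr

    def __getitem__(self, key):
        slices = key if isinstance(key, tuple) else (key,)
        # Python slices exclude the upper bound; the interval notation used
        # here includes it, hence stop - 1. Explicit integer bounds are assumed.
        bounds = ",".join(f"{s.start}:{s.stop - 1}" for s in slices)
        return LazyArray(self.collection, f"{self.expr}[{bounds}]")

    def __add__(self, other):
        return LazyArray(self.collection, f"({self.expr} + {other})")

    def __mul__(self, other):
        return LazyArray(self.collection, f"({self.expr} * {other})")

    def to_query(self) -> str:
        # Nothing has been evaluated so far; only now is query text emitted.
        return f"select {self.expr} from {self.collection} as c"


scene = LazyArray("scenes")              # hypothetical collection name
result = (scene[0:100, 0:100] + 5) * 2   # builds an expression, runs nothing
print(result.to_query())
```

In such a design the whole expression is shipped to the server as one query, so intermediate results never need to be transferred to the client.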
The presenter is Principal Architect of rasdaman, editor of several "Big Data" standards, and co-chair of "Big Data" relevant working groups in several high-impact bodies. In the talk, the rasdaman concept will be illustrated with the help of large-scale real-life examples of operational satellite image and weather data services, and sample Python code will be demonstrated live.
In Geo service terminology, coverages represent spatio-temporally varying phenomena, such as sensor, image, simulation, and statistics data; incidentally, these typically are prime Big Data contributors in practice. The OGC unified coverage model encompasses regular and irregular grids, point clouds, and general meshes. As opposed to the (abstract) coverage model of ISO 19123, the (concrete) OGC coverage and service model establishes verifiable interoperability while still being grounded in ISO 19123. The OGC Web Coverage Service (WCS) comprises a modular suite for accessing large coverage assets. WCS Core provides simple data subsetting, whereas extensions add optional service facets up to ad-hoc filtering and processing.
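A WCS Core subsetting request can be sketched as a key-value-pair URL. The following minimal example assembles a GetCoverage request with one `subset` parameter per axis. The endpoint, the coverage name, and the axis labels "Lat" and "Long" are placeholders assumed for illustration; actual axis labels depend on the coverage being served.

```python
from urllib.parse import urlencode


def get_coverage_url(endpoint, coverage_id, subsets, fmt="image/tiff"):
    """Assemble a WCS 2.0 GetCoverage KVP request URL with axis subsetting."""
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    # One subset parameter per axis, written as Axis(low,high).
    for axis, (low, high) in subsets.items():
        params.append(("subset", f"{axis}({low},{high})"))
    params.append(("format", fmt))
    # Keep parentheses, commas, and slashes readable instead of percent-encoding.
    return endpoint + "?" + urlencode(params, safe="(),/")


# Placeholder endpoint and coverage name, used only for illustration.
url = get_coverage_url(
    "https://example.org/ows",
    "ExampleCoverage",
    {"Lat": (40, 50), "Long": (10, 20)},
)
print(url)
```

Only the requested cutout is returned by the server, which is what makes subsetting practical on coverages far larger than a client could download.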
By separating coverage data and service model, any service - such as WMS, WFS, SOS and WPS - can provide and consume coverages in addition to WCS. Generally, the WCS suite is appreciated by implementers due to its clear structuring and concise conformance testing, down to pixel/voxel level. Many WCS implementations are available today, such as rasdaman which has proven efficient on 130+ TB datacubes.
In our talk, we present the OGC coverage data and service model with an emphasis on practical aspects. The presentation will make use of publicly available services, allowing participants to retrace many of the facets addressed.