Sentinel Hub - federated on-demand ARD generation
- Sinergise/Sentinel Hub, Ljubljana, Slovenia (grega.milcinski@sinergise.com)
Every experiment starts with data, which need to be fine-tuned for the specific use case. We call the result "analysis ready data" (ARD). In some cases, for the sake of reusability and comparability, the specifications for ARD are well defined. In many other cases, however, the procedures are not yet mature enough to support standardisation. This is especially true in the Earth Observation (EO) field, as the whole community is moving from (semi-)manual analysis of individual scenes, dating from a time when barely any data were available, to processing of time series, now that Landsat and Sentinel have made this possible. We now even face the opposite problem: there is simply too much data, with petabytes of open and commercial imagery readily available. With the data distributed across different places (Copernicus Data Access Service for Sentinel, AWS for Landsat), the challenge is further magnified. A machine learning (ML) approach can address the challenge of sifting through the data, but ML also requires the data to be pre-processed for purpose and made available where the ML is running. It is therefore essential to have a facility that can generate ARD customised to the requirements of a specific analysis.
Sentinel Hub (SH) is a satellite imagery processing service capable of on-the-fly gridding, re-projection, re-scaling, mosaicking, compositing, orthorectification and other required operations. It can be integrated into web applications, where mostly rendered images are served, or into ML and similar analysis workflows, where pixel values and statistics are essential. SH works with the original satellite data files and does not require replication or pre-processing. It uses cloud infrastructure and innovative methods to process and distribute data efficiently, in a matter of seconds. Sentinel Hub gives access to a rich collection of satellite data, including the full set of Sentinel satellites, Landsat collections, commercial VHR collections and other complementary collections. It also allows users to onboard their own data in one of the standardised formats. Furthermore, data located in different clouds can be fused together in a single process, benefiting from the variety and volume of data from different sensors.
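To make the idea of compositing concrete, the following is a minimal sketch of a per-pixel median composite over a stack of co-registered scenes. It only illustrates the concept and is not a description of how Sentinel Hub implements it; the function name `median_composite`, the use of `None` as a mask for unusable (for example cloudy) pixels, and the sample values are all assumptions made for this example.

```python
from statistics import median
from typing import List, Optional

# A scene is a grid of pixel values; None marks a masked (e.g. cloudy) pixel.
Scene = List[List[Optional[float]]]


def median_composite(scenes: List[Scene]) -> Scene:
    """Per-pixel median over co-registered scenes, skipping masked values."""
    if not scenes:
        raise ValueError("need at least one scene")
    rows, cols = len(scenes[0]), len(scenes[0][0])
    composite: Scene = []
    for r in range(rows):
        out_row: List[Optional[float]] = []
        for c in range(cols):
            # Collect the valid observations of this pixel through time.
            valid = [s[r][c] for s in scenes if s[r][c] is not None]
            # A pixel masked in every scene stays masked in the composite.
            out_row.append(median(valid) if valid else None)
        composite.append(out_row)
    return composite


# Three hypothetical 2x2 acquisitions of the same area (illustrative values).
stack = [
    [[0.2, None], [0.4, 0.1]],
    [[0.3, 0.5], [None, 0.1]],
    [[0.9, 0.7], [0.6, None]],
]
result = median_composite(stack)
```

The median is used here because it is robust to outliers such as undetected clouds; other reducers (mean, maximum, most recent valid value) would fit the same structure.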
Two main capabilities make SH especially fit for the purpose of generating on-demand ARD. The first is the support for user-provided processing scripts, which are recipes describing what should happen with the sensor data (band composites, indices, even simple neural networks combining the available data). The second is a set of processing orchestration options. The Process API provides immediate access to pixel values. The Statistical API is optimised for time-series analysis: it aggregates the data over a specific area of interest and provides configurable statistics through time. Finally, there are asynchronous siblings of these services, fine-tuned for large-scale processing, for example when one wants to prepare ML features for an entire continent or obtain time series for millions of agricultural parcels.
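As an illustration of how a user-provided processing script and the Process API fit together, the sketch below assembles a request body that asks for an NDVI raster computed from Sentinel-2 bands. It is a hedged example and not an authoritative client: the JSON field names and the script layout follow our recollection of the public Sentinel Hub documentation and should be checked against it, the helper `build_process_request`, the bounding box and the dates are hypothetical, and authentication and the actual HTTP call are deliberately left out.

```python
import json

# Processing script (JavaScript, sent to the service as a string).
# Layout assumed from the public evalscript documentation; verify before use.
EVALSCRIPT_NDVI = """
//VERSION=3
function setup() {
  return {
    input: ["B04", "B08"],
    output: { bands: 1, sampleType: "FLOAT32" }
  };
}
function evaluatePixel(sample) {
  return [(sample.B08 - sample.B04) / (sample.B08 + sample.B04)];
}
""".strip()


def build_process_request(bbox, time_from, time_to, width=256, height=256):
    """Build a Process-API-style request body (hypothetical helper)."""
    if len(bbox) != 4:
        raise ValueError("bbox must be [min_x, min_y, max_x, max_y]")
    return {
        "input": {
            "bounds": {"bbox": list(bbox)},
            "data": [
                {
                    "type": "sentinel-2-l2a",
                    "dataFilter": {
                        "timeRange": {"from": time_from, "to": time_to}
                    },
                }
            ],
        },
        "output": {
            "width": width,
            "height": height,
            "responses": [
                {"identifier": "default", "format": {"type": "image/tiff"}}
            ],
        },
        "evalscript": EVALSCRIPT_NDVI,
    }


# Illustrative area and period; the values are assumptions for this example.
request_body = build_process_request(
    [14.4, 46.0, 14.6, 46.1],
    "2023-04-01T00:00:00Z",
    "2023-04-30T23:59:59Z",
)
request_json = json.dumps(request_body)
```

Keeping the script separate from the request parameters means the same recipe can be reused for other areas and periods, or handed to the statistical and asynchronous services, by changing only the surrounding request.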
We will present the technology behind the scenes that makes this processing possible, as well as several use cases showing how the service can be used efficiently in ML.
How to cite: Milcinski, G. and Kolaric, P.: Sentinel Hub - federated on-demand ARD generation, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-4160, https://doi.org/10.5194/egusphere-egu23-4160, 2023.