Six Years of ExoMars TGO Active Archiving – Lessons Learned
- 1 Telespazio UK Ltd for ESA, Camino Bajo del Castillo s/n, 28692 Villanueva de la Cañada, Madrid, Spain (tanya.lim@ext.esa.int)
- 2 ESA/ESAC, Camino Bajo del Castillo s/n, 28692 Villanueva de la Cañada, Madrid, Spain
- 3 Rhea for ESA, Camino Bajo del Castillo s/n, 28692 Villanueva de la Cañada, Madrid, Spain
- 4 SERCO for ESA, Camino Bajo del Castillo s/n, 28692 Villanueva de la Cañada, Madrid, Spain
- 5 Aurora Technology B.V. for ESA, Camino Bajo del Castillo s/n, 28692 Villanueva de la Cañada, Madrid, Spain
Introduction
The ExoMars Trace Gas Orbiter (TGO) was launched in 2016 and its Science Phase started in April 2018. Apart from the failure of one sub-instrument, the spacecraft and payload remain healthy and in normal operations today. The mission's approach to archiving differed from that of previous ESA missions archiving in the Planetary Science Archive (PSA) in two significant ways: ExoMars was the first mission to archive data in the PDS4 standard, and it was the first to actively process and archive the mission raw data daily.
During the pre-launch phase, the planning for the ExoMars archiving was done in close coordination with the BepiColombo mission team, who would also take the active archiving approach and use the PDS4 standard. We were therefore able to achieve a common data structure and set of rules for our data providers. This set of PSA rules has been recorded in the PSA Archiving Guide, which remains a living document used by all new ESA missions archiving in the PSA.
Data Structures
One of the most impactful decisions made for the PSA PDS4 archives was to create a single bundle for each instrument and then separate collections by data type, such as document, schema, data, etc. Science data is also subdivided by processing level, hence all of an instrument's raw data for the mission accumulates in one collection inside one bundle. This has naturally led to large, accumulating collections. While data volumes have not presented any significant issues, the sheer number of files, especially if stored in a flat structure, has made it necessary to consider carefully the physical structure adopted.
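As an illustration of the bundle and collection layout described above, the following minimal sketch derives a nested storage path instead of a flat one. The bundle name, the `data_<level>` collection naming, and the year/month/day subdirectories are assumptions made for the example, not the documented PSA layout.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional


def collection_name(data_type: str, processing_level: Optional[str] = None) -> str:
    """Return a collection name; only science data is split by processing level."""
    if data_type == "data":
        if not processing_level:
            raise ValueError("science data needs a processing level")
        return f"data_{processing_level}"
    return data_type


def storage_path(bundle: str, data_type: str, observed: date, filename: str,
                 processing_level: Optional[str] = None) -> PurePosixPath:
    """Build bundle/collection/YYYY/MM/DD/filename (hypothetical layout)."""
    collection = collection_name(data_type, processing_level)
    return PurePosixPath(bundle, collection,
                         f"{observed:%Y}", f"{observed:%m}", f"{observed:%d}",
                         filename)


if __name__ == "__main__":
    print(storage_path("em16_tgo_cas", "data", date(2018, 5, 1),
                       "example_product.xml", processing_level="raw"))
```

Splitting by date keeps the number of entries per directory bounded by a day's deliveries, which is one way to avoid the flat-structure problem; other partitioning keys would serve the same purpose.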
Another issue was that, early in the mission, the directory structures were not settled, and changing the structure required the data products to be deleted and then re-ingested with the new path. This has since been improved with internal tools that change a path in place. It is important to note that the storage layout follows a different scheme from the one the end user sees. Data offered to users (via FTP, web pages, or custom applications) is governed by the distribution path value stored in a database, so future rearrangements do not require files to be physically moved.
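The separation between storage layout and distribution path can be sketched as follows. This is a minimal illustration using an in-memory SQLite table; the table name, column names, paths, and LID are hypothetical and do not describe the actual PSA database.

```python
import sqlite3


def make_catalogue() -> sqlite3.Connection:
    """Create a toy catalogue mapping each LID to two independent paths."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product ("
        " lid TEXT PRIMARY KEY,"
        " storage_path TEXT NOT NULL,"       # where the file physically lives
        " distribution_path TEXT NOT NULL)"  # what the end user sees
    )
    return conn


def register(conn, lid, storage_path, distribution_path):
    conn.execute("INSERT INTO product VALUES (?, ?, ?)",
                 (lid, storage_path, distribution_path))


def rearrange(conn, old_prefix: str, new_prefix: str) -> int:
    """Rewrite the user-facing prefix only; no file is moved on disk."""
    cur = conn.execute(
        "UPDATE product"
        " SET distribution_path = ? || substr(distribution_path, ?)"
        " WHERE substr(distribution_path, 1, ?) = ?",
        (new_prefix, len(old_prefix) + 1, len(old_prefix), old_prefix),
    )
    return cur.rowcount


def paths(conn, lid):
    return conn.execute(
        "SELECT storage_path, distribution_path FROM product WHERE lid = ?",
        (lid,),
    ).fetchone()
```

Calling `rearrange(conn, "/em16/cas/", "/exomars/cassis/")` changes only the `distribution_path` column, so the rearrangement is a database update rather than a delete and re-ingest.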
Bundle and Collection Versions:
A major difficulty with the single accumulating bundle/collection approach, faced early on, has been versioning. Initially the bundle/collection versions were incremented on every daily delivery, but it quickly became evident that maintaining a full record of each version would take more effort than it made sense to expend. The approach was modified to increment artificially on a monthly basis, but this has also now been dropped. The current approach is to increment only if there has been a significant change, such as the ingestion of reprocessed data from the whole mission.
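To give a sense of scale for the three versioning policies, the sketch below counts how many bundle/collection versions each would generate. The six-year span, the 30-day approximation of a month, and the days on which a significant change occurs are assumptions chosen for illustration.

```python
def count_versions(n_days: int, policy: str, significant_days=frozenset()) -> int:
    """Count version increments over n_days of daily deliveries."""
    if policy == "daily":
        return n_days                      # one new version per delivery
    if policy == "monthly":
        return n_days // 30                # approximate calendar months
    if policy == "significant":
        # only days carrying a significant change (e.g. a full reprocessing)
        return sum(1 for day in range(n_days) if day in significant_days)
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    days = 6 * 365
    for policy in ("daily", "monthly"):
        print(policy, count_versions(days, policy))
    print("significant", count_versions(days, "significant", {400, 1500}))
```

Under these assumptions the daily policy produces thousands of versions to record, the monthly policy tens, and the significant-change policy only as many as there are major reprocessing events.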
Product Design:
One choice that has to be made for all science products is whether to group data files together into a single product, or even a single file, or whether to produce smaller separate products. For the CaSSIS instrument on the TGO it was decided that the framelets making up the push-broom image should be stored as separate images. However, a CaSSIS observation typically has around 600 PDS4 framelet products at Raw level. Given that CaSSIS typically makes around 30 observations a day, and that data gets re-processed, we now have an archive with tens of millions of CaSSIS products. This has caused two main issues. The first has been difficulty in maintaining database performance. The second has been the discovery and tracking of missing data due to issues with the initial data downlink, the data processing, validation and/or transfer, and ingestion into the PSA. Putting all the framelets for each filter into one product per observation would reduce the number of products, but at the cost of a very long label (e.g. including geometry information for each framelet), which potentially has different downsides.
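The figures quoted above can be checked with a quick calculation, and the missing-data problem can be illustrated with a simple completeness check. This is a sketch only: the six-year span, the observation identifiers, and the assumption that framelets are numbered from zero with a known expected count are all hypothetical, and the real tracking in the PSA may work quite differently.

```python
from collections import defaultdict


def estimate_raw_products(framelets_per_obs: int, obs_per_day: int, days: int) -> int:
    """Rough count of Raw framelet products, ignoring reprocessed versions."""
    return framelets_per_obs * obs_per_day * days


def missing_framelets(ingested, expected_per_obs: int) -> dict:
    """Map each partially ingested observation to its missing framelet indices.

    `ingested` is an iterable of (observation_id, framelet_index) pairs.
    An observation with no ingested framelets at all cannot be detected here.
    """
    seen = defaultdict(set)
    for obs_id, index in ingested:
        seen[obs_id].add(index)
    expected = set(range(expected_per_obs))
    return {obs_id: sorted(expected - got)
            for obs_id, got in seen.items() if expected - got}


if __name__ == "__main__":
    print(estimate_raw_products(600, 30, 6 * 365))
    sample = [("obs_a", 0), ("obs_a", 2), ("obs_b", 0), ("obs_b", 1), ("obs_b", 2)]
    print(missing_framelets(sample, expected_per_obs=3))
```

With roughly 600 framelets and 30 observations per day, the Raw level alone grows by about 18,000 products a day, which is consistent with an archive of tens of millions of products after several years.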
Filenames:
One of the successful decisions made was to standardise file naming and, at least for the science products, to include filenames as part of the LID. Using the mission/instrument as the bundle name, plus the filename for the products, has created a convention which keeps the LIDs in the PSA unique.
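A minimal sketch of this convention is shown below. The `urn:esa:psa` prefix, the bundle and collection names, and the example filename are assumptions used for illustration; the character check reflects the lowercase letters, digits, dash, period, and underscore that PDS4 allows in LID fields.

```python
import re
from pathlib import PurePosixPath

LID_PREFIX = "urn:esa:psa"               # assumed authority prefix
_FIELD = re.compile(r"[a-z0-9._-]+")     # characters allowed in a LID field


def make_lid(bundle: str, collection: str, filename: str) -> str:
    """Build a LID from the bundle, the collection, and the filename stem."""
    product = PurePosixPath(filename).stem.lower()   # drop the extension
    fields = [bundle.lower(), collection.lower(), product]
    for field in fields:
        if not _FIELD.fullmatch(field):
            raise ValueError(f"invalid LID field: {field!r}")
    return ":".join([LID_PREFIX, *fields])


if __name__ == "__main__":
    print(make_lid("em16_tgo_cas", "data_raw", "CAS_RAW_example.XML"))
```

Because the bundle encodes mission and instrument and the filename is unique within it, two products can only share a LID if they share a filename, which is what makes standardised file naming sufficient for uniqueness.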
A different issue encountered was the use of time in the filename. As data for most instruments did not have an observation ID, it was necessary to use the date/UTC to create a unique filename/LID. However, the spacecraft clock drifted by up to a few seconds, and the drift was only corrected later in SPICE. When the Raw data underwent a full re-processing, the filename and hence the LID therefore also changed, meaning we had two different LIDs associated with a single observation.
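The effect can be reproduced with a small sketch. The filename pattern, the prefix, and the two-second correction are hypothetical, and the tolerance check at the end is only an illustration of how such pairs might be recognised, not a description of how the PSA handles them.

```python
from datetime import datetime, timedelta, timezone


def product_name(prefix: str, utc: datetime) -> str:
    """Build a time-based product name (hypothetical pattern)."""
    return f"{prefix}_{utc:%Y%m%dt%H%M%S}"


def likely_same_observation(a: datetime, b: datetime,
                            tolerance: timedelta = timedelta(seconds=5)) -> bool:
    """Flag two start times that differ by no more than the assumed drift."""
    return abs(a - b) <= tolerance


if __name__ == "__main__":
    first_pass = datetime(2018, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
    reprocessed = first_pass + timedelta(seconds=2)   # after clock correction
    print(product_name("cas_raw_sc", first_pass))
    print(product_name("cas_raw_sc", reprocessed))
    print(likely_same_observation(first_pass, reprocessed))
```

The two names differ only in the seconds field, yet because the filename is part of the LID they identify two distinct products for what is physically one observation.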
PSA Dictionary:
Another success has been the development of a dictionary for the PSA, allowing the addition of cross-mission attributes which are not included in the PDS4 core model. For some attributes, such as mission phase, we additionally include Schematron rules in individual mission dictionaries, and this standardisation in the schema has aided the development of the PSA tables and UI.
Summary:
Design choices made before the launch of ExoMars in 2016 remain in place, and the active archives now include the BepiColombo and JUICE missions. The structure and conventions adopted are easy to follow and have generally been successful. Some PDS conventions, such as versioning, have not been compatible with an active archive approach, hence the way the PSA deals with these has evolved.
How to cite: Lim, T., Coia, D., Bentley, M., Oside, J., Rancero, E., Giordano, F., and Docasal, R.: Six Years of ExoMars TGO Active Archiving – Lessons Learned, Europlanet Science Congress 2024, Berlin, Germany, 8–13 Sep 2024, EPSC2024-848, https://doi.org/10.5194/epsc2024-848, 2024.