Using machine learning to define and characterise 3D ocean regions - MarDATA

Doctoral Researcher:

Yvonne Jenniges, AWI and University of Bremen, yvonne.jenniges@awi.de

Supervisors:

Prof. Dr. Boris Peter Koch, Alfred Wegener Institute - Helmholtz Centre for Polar and Marine Research, Bremerhaven, boris.koch@awi.de
Prof. Dr. Sebastian Maneth, University of Bremen, maneth@uni-bremen.de

Location: Bremen/Bremerhaven

Disciplines: Computer science, data science, machine learning, marine chemistry

Keywords: Ocean regions, clustering, biogeochemistry

Motivation:

Dividing the ocean into meaningful regions has proved to be a valuable concept for investigating marine biogeochemical processes at different scales. Such regions make observations and measurements easier to interpret and allow regions to be compared with one another. They also help expand the understanding of biogeochemical and ecological processes.

Longhurst introduced one of the first definitions of 2D marine regions, called ecological provinces. He derived them by dividing the surface ocean into 56 areas based on the surface chlorophyll field, plankton ecology, physical oceanography, geographical morphology, nutrient availability and his extensive experience as an oceanographer [1]. A study preceding this work supports the relevance of Longhurst’s provinces: provinces in the eastern Atlantic Ocean were inferred from the chemical composition of surface seawater [2], and these inferred provinces strongly resemble Longhurst’s.

Considering only the ocean surface is insufficient for a more complete understanding of oceanic processes. Studies of marine regions in 3D therefore also explore how far the regions extend with depth. For example, Sayre et al. (2017) computed such regions using machine learning and called them ecological marine units (EMUs) [3].

The EMU partitioning leaves several questions unanswered, such as how the choice of algorithm and of oceanographic parameters influences the resulting regions. Moreover, it is a static definition, although the ocean varies over time. Lastly, objective measures for evaluating and comparing such 3D region definitions are lacking.

Figure: Data points coloured by cluster affiliation. This is the result of a first clustering by k-means (KMeans) with 37 clusters (similar to the EMU project). Depth, temperature and salinity were used as parameters. Each feature was scaled to the range [0, 1] prior to clustering. Missing values were ignored. Data source: global dataset assembled by Alexander Korablev for the COMFORT project.
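The clustering described in the caption can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the code used to produce the figure. "Missing values were ignored" is read here as dropping every row that has a missing feature. Each feature is then min-max scaled to [0, 1], and plain Lloyd's k-means is run on the result. All function names are hypothetical; in practice a library implementation such as scikit-learn's `KMeans` would typically be used. The helper `assign_new` shows one simple way to place a new measurement into an existing clustering: scale it with the stored bounds and pick the nearest centroid.

```python
import random


def drop_incomplete(rows):
    """Assumption: 'ignore missing values' = discard rows containing any None."""
    return [r for r in rows if all(v is not None for v in r)]


def scale_row(row, bounds):
    """Min-max scale one row using per-column (min, max) bounds."""
    return tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                 for v, (lo, hi) in zip(row, bounds))


def min_max_scale(rows):
    """Scale each feature column to [0, 1]; also return the bounds for reuse."""
    bounds = [(min(col), max(col)) for col in zip(*rows)]
    return [scale_row(r, bounds) for r in rows], bounds


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm, initialised with k distinct data points chosen at random."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous centroid if a cluster ends up empty.
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centroids[j])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    # Final labels consistent with the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


def assign_new(measurement, bounds, centroids):
    """Assign a new (depth, temperature, salinity) measurement to its nearest cluster."""
    return nearest(scale_row(measurement, bounds), centroids)
```

For the clustering in the figure, each row would hold the depth, temperature and salinity of one measurement, and `k` would be 37. Because k-means depends on its random initialisation, fixing `seed` keeps runs reproducible.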

Aim: This work addresses some of the drawbacks of 3D marine regions mentioned above. To make existing regions comparable, evaluation and comparison measures are investigated first. These measures then serve as a basis for comparing how different machine learning algorithms affect the regions and how the regions behave over time. Since this work is based on an observational dataset, the imputation of missing values will be studied as well.
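The text does not say which comparison measures will be investigated. As one illustration, the adjusted Rand index (ARI) is a widely used measure of agreement between two partitions of the same data points, for example regions produced by two different algorithms, or by the same algorithm at two points in time. A value of 1.0 means the partitions are identical up to renaming of clusters, and values near 0.0 mean agreement no better than chance. The sketch below is a plain-Python implementation of the standard formula; the function name is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same points."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same points")
    total = comb(len(labels_a), 2)
    if total == 0:
        # Zero or one point: the partitions trivially agree.
        return 1.0
    # Pairs of points placed together in both labelings (contingency table).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of points placed together in each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both labelings are all singletons or both are one cluster.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

The ARI only compares two partitions with each other. Judging whether a single partition is a good description of the data would need a different kind of measure.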

Objectives:

(1) Implement a tool to evaluate new measurements as well as the underlying ocean clustering.

(2) Develop a 3D clustering of the global ocean and investigate temporal changes.

(3) Test different strategies for imputing missing values in the dataset and assess how they affect the clustering.
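Objective (3) does not name specific imputation strategies. Two simple baselines such a comparison might start from are sketched below, both as assumptions rather than the project's chosen methods. The first replaces each missing value with the mean of its feature column. The second linearly interpolates missing values along depth within a single vertical profile. Function names are hypothetical, and `None` marks a missing value.

```python
def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return [tuple(m if v is None else v for v, m in zip(r, means)) for r in rows]


def interpolate_profile(depths, values):
    """Linearly interpolate missing values along depth within one profile.

    Assumes depths are sorted in increasing order. Gaps above the shallowest or
    below the deepest observation take the nearest observed value.
    """
    known = [(d, v) for d, v in zip(depths, values) if v is not None]
    if not known:
        return list(values)
    filled = []
    for d, v in zip(depths, values):
        if v is not None:
            filled.append(v)
        elif d <= known[0][0]:
            filled.append(known[0][1])
        elif d >= known[-1][0]:
            filled.append(known[-1][1])
        else:
            # Find the observations bracketing depth d and interpolate between them.
            for (d0, v0), (d1, v1) in zip(known, known[1:]):
                if d0 <= d <= d1:
                    filled.append(v0 if d1 == d0 else v0 + (v1 - v0) * (d - d0) / (d1 - d0))
                    break
    return filled
```

Mean imputation ignores the vertical structure of the ocean, whereas profile interpolation exploits it but cannot fill a variable that is missing throughout a profile. Clustering each imputed version of the dataset and comparing the resulting partitions is one way to quantify how much the choice of imputation changes the regions.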

References

[1] Longhurst, A.R., Ecological Geography of the Sea. 2007.
[2] Koch, B.P. and G. Kattner, Preface "Sources and rapid biogeochemical transformation of dissolved organic matter in the Atlantic surface ocean". Biogeosciences, 2012. 9(7): p. 2597-2602.
[3] Sayre, R., et al., A Three-Dimensional Mapping of the Ocean Based on Environmental Data. Oceanography, 2017. 30(1): p. 90-103.