HPC for semi-parametric statistical modeling on massive data sets

HSU

22 August 2023

Philipp Wittenberg, Lizzie Neumann & Jan Gertheiss (School of Economics and Social Sciences, HSU)

The project "HPC for semi-parametric statistical modeling on massive data sets" will be a vital input to and enhancement of the dtec.bw project "SHM – Digitization and Monitoring of Infrastructure Structures." Its primary objective is to develop and estimate semi-parametric and non-parametric statistical and statistical-mechanical models for monitoring and detecting changes in infrastructure conditions. The project involves the (pre-)processing and analysis of large data sets consisting of the output of approximately 100 sensors sampled at frequencies of up to 1 kHz. Real-time analysis of this continuous influx of large data streams is essential for an effective monitoring system. Structural Health Monitoring (SHM) uses sensor data to assess the state of infrastructure such as bridges. However, environmental factors like temperature, humidity, and solar radiation often influence the sensor measurements. To interpret the data accurately, a model is needed that accounts for these confounding variables; various methods are being considered for this purpose.
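As an illustration of such a confounder adjustment, the sketch below fits a semi-parametric additive model in R that removes smooth temperature, humidity, and radiation effects from a sensor signal. The data and variable names are purely illustrative assumptions, not the project's actual data or code; mgcv is used here simply as a common open-source choice for this kind of model.

```r
# Minimal sketch with simulated (hypothetical) data: adjust a sensor signal
# for environmental confounders via a semi-parametric additive model, so that
# the adjusted signal (the residuals) mainly reflects the structural state.
library(mgcv)

set.seed(1)
n <- 5000
dat <- data.frame(
  temperature = runif(n, -5, 35),   # degrees Celsius
  humidity    = runif(n, 20, 100),  # percent relative humidity
  radiation   = runif(n, 0, 900)    # W/m^2
)
# Simulated sensor response driven mainly by the environment
dat$strain <- 10 + 0.3 * dat$temperature - 0.01 * dat$humidity +
  0.002 * dat$radiation + rnorm(n, sd = 0.5)

# Smooth (non-parametric) effects for each confounder
fit <- gam(strain ~ s(temperature) + s(humidity) + s(radiation), data = dat)

# Confounder-adjusted signal used for subsequent damage detection
adjusted <- residuals(fit)
```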

The project focuses on two main areas. First, it addresses the estimation of conditional covariances for big data, with bandwidths selected via cross-validation. These covariances provide the basis for subsequently applied damage detection schemes such as principal component analysis or the Mahalanobis distance. Second, the project pursues a functional data approach that treats the sensor measurements as functional data; here, we apply and validate regression models using open-source software and develop quality control charts.
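The following sketch illustrates the first of these two ideas on simulated data, simplified to a single environmental covariate and plain base R: a kernel-weighted estimate of the conditional mean and covariance of a sensor vector given temperature, with the bandwidth chosen by leave-one-out cross-validation, followed by a Mahalanobis distance as a simple damage indicator. All names and numbers are assumptions for illustration; this is not the project's implementation, which must additionally scale to much larger data.

```r
# Minimal sketch (simulated data): conditional mean/covariance given
# temperature via Nadaraya-Watson-type kernel weights, bandwidth chosen by
# leave-one-out cross-validation, Mahalanobis distance as damage indicator.
set.seed(1)
n <- 1000; p <- 5
x  <- runif(n, -5, 35)                             # temperature
mu <- cbind(1 + 0.2 * x, sin(x / 5), 0.1 * x, -0.05 * x, 2)
Y  <- mu + matrix(rnorm(n * p, sd = 0.3), n, p)    # sensor vectors

kern <- function(u) dnorm(u)                       # Gaussian kernel

# Leave-one-out CV criterion based on the conditional mean
cv_error <- function(h) {
  err <- 0
  for (i in seq_len(n)) {
    w    <- kern((x[-i] - x[i]) / h)
    yhat <- colSums(w * Y[-i, , drop = FALSE]) / sum(w)
    err  <- err + sum((Y[i, ] - yhat)^2)
  }
  err / n
}

h_grid <- seq(0.5, 5, by = 0.5)
h_cv   <- h_grid[which.min(sapply(h_grid, cv_error))]

# Conditional mean and covariance at a reference temperature x0
x0 <- 20
w  <- kern((x - x0) / h_cv); w <- w / sum(w)
mu_hat    <- colSums(w * Y)
centered  <- sweep(Y, 2, mu_hat)
Sigma_hat <- crossprod(centered * sqrt(w))         # kernel-weighted covariance

# Mahalanobis distance of a new measurement taken at about x0
y_new <- c(1 + 0.2 * x0, sin(x0 / 5), 0.1 * x0, -0.05 * x0, 2) + rnorm(p, sd = 0.3)
mahalanobis(y_new, center = mu_hat, cov = Sigma_hat)
```

A large distance relative to a suitable reference distribution would then flag a potentially changed structural state.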

Given the massive size of the data set (100–1000 Hz data from multiple sensors over 1.5 years) and the need for concurrent computations by multiple users, the HPC cluster HSUper is a suitable choice. Its capabilities are needed to handle the memory consumption and long runtimes of the individual scripts. Collaboration with the hpc.bw team is expected to facilitate porting the existing shared-memory parallel scripts, primarily written in R, Rcpp, and Armadillo, to distributed- and shared-memory parallel implementations using the tools available on the HSUper cluster. Additionally, the collaboration aims to apply the acquired knowledge to other bridge data sets, including the two reference structures "Stader Straße" and "Vahrendorfer Stadtweg" within the dtec.bw SHM project. For more details, see SHM. This collaboration will greatly enhance the efficiency and scalability of the statistical modeling process and contribute to the broader field of infrastructure monitoring.
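As a rough, hypothetical illustration of the kind of parallelism involved, the sketch below distributes a bandwidth grid search over several worker processes on a single shared-memory node using R's parallel package. Scaling such a step across several HSUper nodes would additionally require a distributed-memory backend (for instance via MPI), which is precisely the kind of porting the collaboration targets and is not shown here.

```r
# Minimal sketch (placeholder workload): evaluate a cross-validation
# criterion over a bandwidth grid in parallel on one shared-memory node.
library(parallel)

h_grid <- seq(0.5, 5, by = 0.5)

# Stand-in for an expensive criterion such as the leave-one-out error above
cv_error <- function(h) { Sys.sleep(0.1); (h - 2.5)^2 }

cl <- makeCluster(max(1, detectCores() - 1))   # leave one core free
errors <- parSapply(cl, h_grid, cv_error)      # evaluate the grid in parallel
stopCluster(cl)

h_grid[which.min(errors)]                      # CV-optimal bandwidth
```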