This event has already taken place.
Ángel López-Oriona (University of A Coruña, Spain)
12 October 2022, 16:00-17:00
Clustering of categorical time series based on two novel feature-based distances with an application to biological sequences
Two novel distances between categorical time series are introduced. Both measure the discrepancy between extracted features that describe the underlying serial dependence patterns. One is based on well-known association measures. The other relies on the so-called binarization of a categorical process, which encodes the category observed at each time point as the corresponding canonical vector. The binarization is used to construct a collection of new association measures capturing every possible type of serial dependence. Both metrics are then used to construct crisp and fuzzy algorithms for clustering nominal series. The proposed approaches group together series generated by similar underlying stochastic processes, achieve accurate results for series coming from a broad range of models, and are computationally efficient. Extensive simulation studies show that both the hard and the soft clustering algorithms outperform several alternative procedures from the literature. Two applications involving biological sequences from different species highlight the usefulness of the introduced techniques.
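The idea behind the binarization-based distance can be illustrated with a minimal sketch. It is not the speaker's implementation. It assumes that the feature vector consists of the marginal category frequencies plus the lag-l correlations Corr(Y_{t,i}, Y_{t-l,j}) between indicator series, and that two series are compared with a plain Euclidean distance. The exact feature set, estimators and weighting used in the talk may differ, and all function names are hypothetical.

```python
from itertools import product
from math import sqrt
from typing import List, Sequence


def binarize(series: Sequence[str], categories: Sequence[str]) -> List[List[int]]:
    """Replace each observation by the canonical vector of its category."""
    position = {c: k for k, c in enumerate(categories)}
    vectors = []
    for x in series:
        v = [0] * len(categories)
        v[position[x]] = 1  # 1 marks the category observed at time t
        vectors.append(v)
    return vectors


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample correlation; returns 0.0 when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def binarized_features(series, categories, max_lag) -> List[float]:
    """Hypothetical features: marginals plus Corr(Y_{t,i}, Y_{t-l,j}), l = 1..max_lag."""
    y = binarize(series, categories)
    r, n = len(categories), len(y)
    features = [sum(row[i] for row in y) / n for i in range(r)]
    for lag in range(1, max_lag + 1):
        for i, j in product(range(r), repeat=2):
            current = [y[t][i] for t in range(lag, n)]
            lagged = [y[t - lag][j] for t in range(lag, n)]
            features.append(_pearson(current, lagged))
    return features


def feature_distance(x, y, categories, max_lag) -> float:
    """Euclidean distance between the feature vectors of two series."""
    fx = binarized_features(x, categories, max_lag)
    fy = binarized_features(y, categories, max_lag)
    return sqrt(sum((a - b) ** 2 for a, b in zip(fx, fy)))


s1 = "ACGT" * 25              # deterministic cycle A -> C -> G -> T
s2 = "CGTA" * 25              # same cycle, different starting point
s3 = ("AACCGGTT" * 13)[:100]  # each symbol repeated twice
print(feature_distance(s1, s2, "ACGT", 2) < feature_distance(s1, s3, "ACGT", 2))  # True
```

Series generated by the same cyclic mechanism end up close even though they never agree position by position, which is the property a feature-based distance is meant to have. Any distance-based crisp or fuzzy clustering routine can then operate on the resulting pairwise distances.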
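For the distance based on association measures, two classical serial association measures for categorical series are Cramér's v and Cohen's κ evaluated at each lag l. With marginals p_i and lag-l joint probabilities p_ij(l) = P(X_t = c_i, X_{t-l} = c_j), they are v(l) = sqrt( Σ_{i,j} (p_ij(l) − p_i p_j)² / (p_i p_j) / (r − 1) ) and κ(l) = Σ_j (p_jj(l) − p_j²) / (1 − Σ_j p_j²). The sketch below assumes that these two measures, estimated from empirical frequencies, make up the feature vector and that series are compared with a Euclidean distance. Which measures and lags the talk actually uses is an assumption here, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def _marginals(series: Sequence[str], categories: Sequence[str]) -> Dict[str, float]:
    counts = Counter(series)
    return {c: counts[c] / len(series) for c in categories}


def _lagged_joint(series, categories, lag) -> Dict[Tuple[str, str], float]:
    """Empirical P(X_t = a, X_{t-lag} = b) over t = lag, ..., n-1 (lag >= 1)."""
    pairs = Counter(zip(series[lag:], series[:-lag]))
    m = len(series) - lag
    return {(a, b): pairs[(a, b)] / m for a in categories for b in categories}


def cramers_v(series, categories, lag) -> float:
    """Lag-l Cramér's v; assumes at least two categories."""
    p = _marginals(series, categories)
    joint = _lagged_joint(series, categories, lag)
    total = 0.0
    for a in categories:
        for b in categories:
            expected = p[a] * p[b]
            if expected > 0:  # skip categories absent from the series
                total += (joint[(a, b)] - expected) ** 2 / expected
    return sqrt(total / (len(categories) - 1))


def cohens_kappa(series, categories, lag) -> float:
    """Lag-l Cohen's kappa: excess probability of repeating the same category."""
    p = _marginals(series, categories)
    joint = _lagged_joint(series, categories, lag)
    chance = sum(q * q for q in p.values())
    if chance >= 1.0:  # constant series: kappa is undefined, report 0
        return 0.0
    return sum(joint[(c, c)] - p[c] ** 2 for c in categories) / (1.0 - chance)


def association_features(series, categories, max_lag) -> List[float]:
    feats = []
    for lag in range(1, max_lag + 1):
        feats.append(cramers_v(series, categories, lag))
        feats.append(cohens_kappa(series, categories, lag))
    return feats


def association_distance(x, y, categories, max_lag) -> float:
    """Euclidean distance between the association-feature vectors."""
    fx = association_features(x, categories, max_lag)
    fy = association_features(y, categories, max_lag)
    return sqrt(sum((a - b) ** 2 for a, b in zip(fx, fy)))


print(round(cohens_kappa("ACGT" * 25, "ACGT", 1), 4))  # -0.3333
```

For the cyclic series "ACGT" repeated, no category is ever followed by itself, so κ(1) is negative, while v(1) is close to 1 because each category fully determines its successor.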