MCbiF: Measuring Topological Autocorrelation in Multiscale Clusterings via 2-Parameter Persistent Homology
Abstract
Datasets often possess an intrinsic multiscale structure with meaningful descriptions at different levels of coarseness. Such datasets are naturally described as multi-resolution clusterings, i.e., sequences of partitions across scales that are not necessarily hierarchical. To analyse and compare such sequences, we use tools from topological data analysis and define the Multiscale Clustering Bifiltration (MCbiF), a 2-parameter filtration of abstract simplicial complexes that encodes cluster intersection patterns across scales. The MCbiF is a complete invariant of (non-hierarchical) sequences of partitions and can be interpreted as a higher-order extension of Sankey diagrams, which reduce to dendrograms for hierarchical sequences. We show that the multiparameter persistent homology (MPH) of the MCbiF yields a finitely presented and block decomposable module, and that its stable Hilbert functions characterise the topological autocorrelation of the sequence of partitions. In particular, in dimension zero the MPH captures violations of the refinement order of partitions, whereas in dimension one it captures higher-order inconsistencies between clusters across scales. We then demonstrate experimentally the use of MCbiF Hilbert functions as interpretable topological feature maps for downstream machine learning tasks, and show that MCbiF feature maps outperform both baseline features and representation learning methods on regression and classification tasks for non-hierarchical sequences of partitions. We also showcase an application of MCbiF to real-world data on the non-hierarchical social grouping patterns of wild mice over time.
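To make the dimension-zero claim concrete, the following is a minimal sketch, not the authors' implementation. It assumes, as a simplification, that the complex at a scale window [s, t] is the union over all r in [s, t] of the full simplices spanned by each cluster of partition θ(r). Under that assumption, the dimension-zero Hilbert value at (s, t) is the number of connected components of that union, which a union-find pass computes. The function names `count_components` and `hilbert_dim0` are hypothetical.

```python
def count_components(n, clusters):
    """Number of connected components of the union of full simplices
    spanned by `clusters` on vertices 0..n-1 (computed via union-find)."""
    parent = list(range(n))

    def find(x):
        # Path-halving find
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for cluster in clusters:
        members = list(cluster)
        for v in members[1:]:
            ra, rb = find(members[0]), find(v)
            if ra != rb:
                parent[rb] = ra
    return len({find(v) for v in range(n)})


def hilbert_dim0(partitions, n):
    """Assumed dimension-zero Hilbert function H[s][t] for s <= t.

    Hypothetical simplification: the complex at window [s, t] is the union
    of cluster simplices from partitions s..t. Entries with s > t are None.
    """
    m = len(partitions)
    H = [[None] * m for _ in range(m)]
    for s in range(m):
        for t in range(s, m):
            window = [c for r in range(s, t + 1) for c in partitions[r]]
            H[s][t] = count_components(n, window)
    return H


if __name__ == "__main__":
    # Hierarchical sequence: each partition coarsens the previous one
    hier = [[{0}, {1}, {2}, {3}], [{0, 1}, {2, 3}], [{0, 1, 2, 3}]]
    print(hilbert_dim0(hier, 4))  # [[4, 2, 1], [None, 2, 1], [None, None, 1]]

    # Non-hierarchical sequence: the second partition does not coarsen the first
    nonhier = [[{0, 1}, {2, 3}], [{0, 2}, {1, 3}]]
    print(hilbert_dim0(nonhier, 4))  # [[2, 1], [None, 2]]
```

For the hierarchical sequence, H[s][t] equals the number of clusters in θ(t). In the non-hierarchical example, H[0][1] = 1 is smaller than the two clusters of θ(1), which is the kind of refinement-order violation the dimension-zero MPH is described as detecting. In that same window the four cluster edges form a 4-cycle, a dimension-one feature that a union-find count cannot see; capturing it would require a general homology computation over the full simplicial complex.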