On intrinsic dimension of point clouds by a persistent homology approach: computational tips
DOI: https://doi.org/10.64700/mmm.89
Keywords: Topological data analysis, persistent homology, intrinsic dimension
Abstract
We present new results on estimating the intrinsic dimension (ID) of point clouds using persistent homology. In particular, we compare topological ID estimators with other approaches and assess the strengths and weaknesses of each. We show that a combination of the so-called i-dimensional persistent homology fractal dimension estimator and the persistent homology dimension, which we term the i-dimensional α persistent homology fractal dimension, is a suitable choice for effectively estimating the ID on many benchmark datasets.
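To make the idea of a persistent-homology-based ID estimate concrete, the following is a minimal sketch of the standard 0-dimensional case only. It is not the estimator proposed in the paper. It assumes the commonly used scaling law in which the α-weighted sum of 0-dimensional persistence lifetimes of n sampled points grows like n^((d-α)/d), and it uses the fact that, for a Vietoris-Rips filtration, those lifetimes coincide (up to a constant factor fixed by the filtration convention) with the edge lengths of a Euclidean minimum spanning tree. The function names mst_edge_lengths and ph0_dimension, the sample sizes and the default α = 1 are illustrative choices, not taken from the paper.

```python
import math
import random


def mst_edge_lengths(points):
    """Edge lengths of a Euclidean minimum spanning tree (Prim's algorithm, O(n^2)).

    Up to a constant factor, these are the 0-dimensional persistence
    lifetimes of the Vietoris-Rips filtration built on the points.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    best[0] = 0.0          # start the tree from point 0
    lengths = []
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if u != 0:  # the starting point contributes no edge
            lengths.append(best[u])
        # Update connection costs of the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return lengths


def ph0_dimension(points, sizes, alpha=1.0, seed=0):
    """Hypothetical helper: estimate the dimension d from the assumed scaling
    E_alpha(n) ~ n^((d - alpha) / d), i.e. d = alpha / (1 - slope),
    where slope is fitted by least squares on log E_alpha versus log n.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for n in sizes:
        sample = rng.sample(points, n)  # random subsample of size n
        total = sum(length ** alpha for length in mst_edge_lengths(sample))
        xs.append(math.log(n))
        ys.append(math.log(total))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return alpha / (1.0 - slope)


if __name__ == "__main__":
    # Toy data: a flat 2-dimensional square embedded in 3-dimensional space.
    rng = random.Random(1)
    square = [(rng.random(), rng.random(), 0.0) for _ in range(800)]
    print(ph0_dimension(square, sizes=[100, 200, 400, 800]))
```

On a toy point cloud such as the flat square above, the estimate should come out close to 2 even though the ambient space has three coordinates. For real data, a dedicated persistent homology library would normally replace the quadratic-time spanning tree routine, and higher homological degrees require the full persistence computation.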
License
Copyright (c) 2025 Cinzia Bandiziol, Stefano De Marchi, Michele Allegra

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.