Manifold Learning and Dimensionality Reduction are vast areas of study in mathematics and computer science. The task is to determine the essential relationships and structure of a dataset. Researchers in this area look at ways to automatically extract meaningful features, dimensions, or subspaces from data in order to build better models, augment data, or compress it.
The recent focus on Deep Learning raises the question of whether dedicated research on Manifold Learning and Dimensionality Reduction is still required as its own pursuit. After all, some form of encoder-decoder neural network could always be devised as a replacement.
While such systems work well given the right training process and enough data, there is certainly still a role for interpretable models built on solid statistical foundations.
Extracting lower-dimensional representations of data allows more compact storage and transmission and can also improve the performance of downstream ML tasks such as classification and regression, since a compact representation must encode the most important relationships in the data in order to maintain accuracy.
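As a simple illustration of this point (a minimal sketch with scikit-learn, not drawn from any of the papers below), a classifier trained on a 16-dimensional PCA representation of 64-dimensional digit images typically retains most of its accuracy:

```python
# Illustrative sketch: dimensionality reduction before classification.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# 8x8 digit images flattened into 64-dimensional feature vectors.
X, y = load_digits(return_X_y=True)

# Compress 64 features to 16 principal components, then classify.
pipeline = make_pipeline(PCA(n_components=16),
                         LogisticRegression(max_iter=1000))
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"5-fold accuracy with 16 PCA components: {scores.mean():.3f}")
```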
An exciting body of work on this topic has been published in recent years; see the Publications list below.
New Textbook on Manifold Learning!
This work culminated recently in the graduation of my first doctoral student, Benyamin Ghojogh, in April 2021, with a thesis encompassing many of these advances.
Dr. Ghojogh continued as a postdoc in my lab until 2022 and now works in industry. In February 2023 we published, via Springer, a new textbook, Elements of Dimensionality Reduction and Manifold Learning (Ghojogh et al., 2023), written in collaboration with Prof. Ali Ghodsi and Prof. Fakhri Karray.
Our Papers on Manifold Learning
Elements of Dimensionality Reduction and Manifold Learning
Springer Nature, Feb 2023.
Dimensionality reduction, also known as manifold learning, is an area of machine learning used for extracting informative features from data, for better representation of data or separation between classes. This book presents a cohesive review of linear and nonlinear dimensionality reduction and manifold learning. Three main aspects of dimensionality reduction are covered: spectral, probabilistic, and neural network-based dimensionality reduction, which take geometric, probabilistic, and information-theoretic points of view, respectively. The book delves into both basic concepts and recent developments in the field, and the necessary background on linear algebra, optimization, and kernels is explained so that the reader can follow the algorithms in full.
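To make the spectral point of view concrete, here is a minimal NumPy sketch (our illustration, not code from the book) of PCA as an eigendecomposition of the sample covariance matrix:

```python
# Illustrative sketch: PCA from the spectral (eigendecomposition) view.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # 200 samples, 10 features
Xc = X - X.mean(axis=0)                 # center the data

cov = Xc.T @ Xc / (len(Xc) - 1)         # 10 x 10 sample covariance
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

k = 2
W = eigvecs[:, -k:]                     # top-k principal directions
Z = Xc @ W                              # low-dimensional embedding
print(Z.shape)                          # (200, 2)
```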
Theoretical Connection between Locally Linear Embedding, Factor Analysis, and Probabilistic PCA
In Canadian Conference on Artificial Intelligence (CAIAC), Toronto, Ontario, Canada, May 2022.
Locally Linear Embedding (LLE) is a nonlinear spectral dimensionality reduction and manifold learning method. It has two main steps: linear reconstruction of points in the input space and linear embedding of points in the embedding space. In this work, we look at the linear reconstruction step from a stochastic perspective, where every data point is assumed to be conditioned on its linear reconstruction weights as latent factors. The stochastic linear reconstruction of LLE is solved using expectation maximization. We show that there is a theoretical connection between three fundamental dimensionality reduction methods: LLE, factor analysis, and probabilistic Principal Component Analysis (PCA). The stochastic linear reconstruction of LLE is formulated similarly to factor analysis and probabilistic PCA. We also explain why factor analysis and probabilistic PCA are linear while LLE is a nonlinear method. This work builds a bridge between two broad approaches to dimensionality reduction: the spectral and probabilistic algorithms.
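For reference, here is a minimal NumPy sketch of the deterministic linear reconstruction step described above (the helper name is ours); the paper replaces this closed-form solve with a stochastic formulation fitted by expectation maximization:

```python
# Illustrative sketch: LLE's deterministic reconstruction weights.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def reconstruction_weights(X, k=5, reg=1e-3):
    """Affine weights reconstructing each point from its k neighbors."""
    n = len(X)
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    W = np.zeros((n, n))
    for i in range(n):
        neighbors = idx[i, 1:]              # drop the point itself
        G = X[neighbors] - X[i]             # k x d difference vectors
        C = G @ G.T                         # local Gram matrix
        C += reg * np.trace(C) * np.eye(k)  # regularize for stability
        w = np.linalg.solve(C, np.ones(k))
        W[i, neighbors] = w / w.sum()       # enforce sum-to-one constraint
    return W

X = np.random.default_rng(0).normal(size=(100, 3))
W = reconstruction_weights(X)
print(np.allclose(W.sum(axis=1), 1.0))      # True: affine weights
```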
Quantile–Quantile Embedding for distribution transformation and manifold embedding with ability to choose the embedding distribution
Machine Learning with Applications (MLWA), 6, 2021.
We propose a new embedding method, named Quantile-Quantile Embedding (QQE), for distribution transformation and manifold embedding with the ability to choose the embedding distribution. QQE, which uses the concept of the quantile-quantile plot from visual statistical tests, can transform the distribution of data to any desired theoretical distribution or empirical reference sample. Moreover, QQE gives the user a choice of embedding distribution when embedding the data manifold into a low-dimensional space. It can also be used to modify the embedding distribution of other dimensionality reduction methods, such as PCA, t-SNE, and deep metric learning, for better representation or visualization of data. We propose QQE in both unsupervised and supervised forms. QQE can also transform a distribution to either an exact reference distribution or its shape. We show that QQE allows for better discrimination of classes in some cases. Our experiments on different synthetic and image datasets show the effectiveness of the proposed embedding method.
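The quantile-matching idea at the heart of QQE can be sketched in one dimension (this shows only the underlying concept, not the QQE algorithm itself):

```python
# Illustrative sketch: rank-based quantile matching in one dimension.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.exponential(size=1000)        # skewed source sample

ranks = stats.rankdata(x)             # ranks 1..n
u = (ranks - 0.5) / len(x)            # empirical quantile levels in (0, 1)
x_new = stats.norm.ppf(u)             # map onto standard normal quantiles

# The transformed sample now follows the target distribution.
print(stats.kstest(x_new, "norm").pvalue > 0.05)   # True
```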
Generative locally linear embedding: A module for manifold unfolding and visualization
Software Impacts, 9 (100105), Elsevier, 2021.
Data in machine learning often exhibit nonlinear patterns. One can unfold the nonlinear manifold of a dataset for low-dimensional visualization and feature extraction. Locally Linear Embedding (LLE) is a nonlinear spectral method for dimensionality reduction and manifold unfolding that embeds data using the same linear reconstruction weights as in the input space. In this paper, we propose an open-source module which not only implements LLE but also includes implementations of two generative LLE algorithms whose linear reconstruction phases are stochastic. Using this module, one can generate as many manifold unfoldings as desired for data visualization or feature extraction.
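For context, the deterministic baseline the module builds on can be run with scikit-learn (a sketch of the unfolding task only; the module's generative variants additionally make the reconstruction weights stochastic):

```python
# Illustrative sketch: deterministic LLE unfolding of a swiss roll.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, t = make_swiss_roll(n_samples=1500, random_state=0)   # 3-D manifold
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Z = lle.fit_transform(X)   # 2-D unfolding
print(Z.shape)             # (1500, 2)
```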
Acceleration of Large Margin Metric Learning for Nearest Neighbor Classification Using Triplet Mining and Stratified Sampling
Journal of Computational Vision and Imaging Systems, 6 (1), Jan 2021.
Metric learning is a technique in manifold learning for finding a projection subspace that increases and decreases the inter- and intra-class variances, respectively. Some metric learning methods are based on triplet learning with anchor-positive-negative triplets, and large margin metric learning for nearest neighbor classification is one of the fundamental methods of this kind. Recently, Siamese networks have been introduced with the triplet loss, and many triplet mining methods have been developed for them; however, these techniques have not been applied to the triplets of large margin metric learning. In this work, inspired by the mining methods for Siamese nets, we propose several triplet mining techniques for large margin metric learning. Moreover, a hierarchical approach is proposed for acceleration and scalability of optimization, where triplets are selected by stratified sampling in hierarchical hyper-spheres. We analyze the proposed methods on three publicly available datasets.
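To give the flavor of triplet mining, here is a minimal sketch of the well-known batch-hard strategy (for illustration only; the paper proposes its own mining schemes for large margin metric learning, and the helper name is ours):

```python
# Illustrative sketch: batch-hard triplet mining.
import numpy as np

def batch_hard_triplets(X, y):
    """(anchor, positive, negative) indices, hardest per anchor."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise dists
    triplets = []
    for a in range(len(X)):
        pos = np.where((y == y[a]) & (np.arange(len(X)) != a))[0]
        neg = np.where(y != y[a])[0]
        if len(pos) == 0 or len(neg) == 0:
            continue
        hardest_pos = pos[np.argmax(D[a, pos])]  # farthest same-class point
        hardest_neg = neg[np.argmin(D[a, neg])]  # closest other-class point
        triplets.append((a, hardest_pos, hardest_neg))
    return triplets

rng = np.random.default_rng(0)
X, y = rng.normal(size=(20, 5)), rng.integers(0, 3, size=20)
print(batch_hard_triplets(X, y)[:3])
```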
Batch-Incremental Triplet Sampling for Training Triplet Networks Using Bayesian Updating Theorem
In 25th International Conference on Pattern Recognition (ICPR), IEEE, Milan, Italy (virtual), Jan 2021.
Variants of triplet networks are robust tools for learning a discriminative embedding subspace. There exist different triplet mining approaches for selecting the most suitable training triplets. Some of these mining methods rely on the extreme distances between instances, and some others make use of sampling. However, sampling from stochastic distributions of data, rather than merely from the existing embedding instances, can provide more discriminative information. In this work, we sample triplets from distributions of data rather than from existing instances. We consider a multivariate normal distribution for the embedding of each class. Using Bayesian updating and conjugate priors, we update the distributions of classes dynamically as new mini-batches of training data arrive. The proposed triplet mining with Bayesian updating can be used with any triplet-based loss function, e.g., the triplet loss or the Neighborhood Component Analysis (NCA) loss. Accordingly, our triplet mining approaches are called Bayesian Updating Triplet (BUT) and Bayesian Updating NCA (BUNCA), depending on which loss function is used. Experimental results on two public datasets, namely MNIST and histopathology colorectal cancer (CRC), substantiate the effectiveness of the proposed triplet mining method.
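As a simplified sketch of the conjugate updating involved (here only the class mean is updated, with a known covariance, and the helper name is ours; the paper maintains full class distributions):

```python
# Illustrative sketch: conjugate Bayesian update of a class mean
# from streaming mini-batches (normal prior, normal likelihood).
import numpy as np

def update_mean(mu0, Sigma0, batch, Sigma_lik):
    """Posterior (mean, covariance) over the class mean."""
    n, xbar = len(batch), batch.mean(axis=0)
    prec0 = np.linalg.inv(Sigma0)             # prior precision
    prec_d = n * np.linalg.inv(Sigma_lik)     # data precision
    Sigma_n = np.linalg.inv(prec0 + prec_d)
    mu_n = Sigma_n @ (prec0 @ mu0 + prec_d @ xbar)
    return mu_n, Sigma_n

rng = np.random.default_rng(0)
mu, Sigma = np.zeros(2), np.eye(2)            # prior over the class mean
for _ in range(5):                            # stream of mini-batches
    batch = rng.normal(loc=[3.0, -1.0], size=(32, 2))
    mu, Sigma = update_mean(mu, Sigma, batch, np.eye(2))
print(mu.round(2))                            # close to [3.0, -1.0]
```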
Theoretical Insights into the Use of Structural Similarity Index in Generative Models and Inferential Autoencoders
In International Conference on Image Analysis and Recognition (ICIAR-2020), Springer, Póvoa de Varzim, Portugal (virtual), Jun 2020.
Weighted Fisher Discriminant Analysis in the Input and Feature Spaces
In International Conference on Image Analysis and Recognition (ICIAR-2020), Springer, Póvoa de Varzim, Portugal (virtual), Jun 2020.
Generalized Subspace Learning by Roweis Discriminant Analysis
In International Conference on Image Analysis and Recognition (ICIAR-2020), Springer, Póvoa de Varzim, Portugal (virtual), Jun 2020.
We present a new method which generalizes subspace learning based on eigenvalue and generalized eigenvalue problems. This method, Roweis Discriminant Analysis (RDA), is named after Sam Roweis, to whom the field of subspace learning owes a great deal. RDA is an infinite family of algorithms in which Principal Component Analysis (PCA), Supervised PCA (SPCA), and Fisher Discriminant Analysis (FDA) are special cases. One of the extreme special cases, which we name Double Supervised Discriminant Analysis (DSDA), uses the labels twice; it is novel and has not appeared elsewhere. We propose a dual for RDA for some special cases. We also propose kernel RDA, generalizing kernel PCA, kernel SPCA, and kernel FDA, using both dual RDA and representation theory. Our theoretical analysis explains previously known facts such as why SPCA can use regression but FDA cannot, why PCA and SPCA have duals but FDA does not, why kernel PCA and kernel SPCA use the kernel trick but kernel FDA does not, and why PCA is the best linear method for reconstruction. Roweisfaces and kernel Roweisfaces are also proposed, generalizing eigenfaces, Fisherfaces, supervised eigenfaces, and their kernel variants. We also report experiments showing the effectiveness of RDA and kernel RDA on some benchmark datasets.
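To show the shared machinery, here is a minimal sketch of the generalized eigenvalue problem underlying the RDA family, instantiated as FDA (our illustration, not the paper's code):

```python
# Illustrative sketch: FDA as a generalized eigenvalue problem
# S_B w = lambda * S_W w, one member of the RDA family.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, size=(50, 4)) for m in (0.0, 3.0)])
y = np.repeat([0, 1], 50)

mean = X.mean(axis=0)
Sw = np.zeros((4, 4))                         # within-class scatter
Sb = np.zeros((4, 4))                         # between-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)
    Sb += len(Xc) * np.outer(mc - mean, mc - mean)

# Generalized eigendecomposition; leading eigenvectors span the
# discriminant subspace (regularize S_W to keep it positive definite).
eigvals, eigvecs = eigh(Sb, Sw + 1e-6 * np.eye(4))
w = eigvecs[:, -1]                            # leading Fisher direction
print(np.round(eigvals[-1], 2))
```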
Instance Ranking and Numerosity Reduction Using Matrix Decomposition and Subspace Learning
In Canadian Conference on Artificial Intelligence, Springer Lecture Notes in Artificial Intelligence, Kingston, ON, Canada, 2019.
One way to deal with the ever-increasing amount of data available for processing is to rank data instances by usefulness and reduce the dataset size. In this work, we introduce a framework for doing so using matrix decomposition and subspace learning. Our central contribution is a novel similarity measure for data instances that uses the basis obtained from the matrix decomposition of the dataset. Using this similarity measure, we propose several related algorithms for ranking data instances and performing numerosity reduction. We then validate the effectiveness of these algorithms for data reduction on several datasets for classification, regression, and clustering tasks.
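One simple way to instantiate this idea (statistical leverage scores from the SVD, a standard measure rather than the paper's similarity) looks like this:

```python
# Illustrative sketch: ranking instances by SVD leverage scores.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))              # 200 instances, 10 features

k = 3
U, S, Vt = np.linalg.svd(X, full_matrices=False)
leverage = np.sum(U[:, :k] ** 2, axis=1)    # influence on top-k subspace

keep = np.argsort(leverage)[::-1][:50]      # retain 50 most influential
X_reduced = X[keep]
print(X_reduced.shape)                      # (50, 10)
```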
Locally Linear Image Structural Embedding for Image Structure Manifold Learning
In International Conference on Image Analysis and Recognition (ICIAR-19), Waterloo, Canada, 2019.
Image Structure Subspace Learning Using Structural Similarity Index
In International Conference on Image Analysis and Recognition (ICIAR-19), Waterloo, Canada, 2019.
Principal Component Analysis Using Structural Similarity Index for Images
In International Conference on Image Analysis and Recognition (ICIAR-19), Waterloo, Canada, 2019.
Principal Sample Analysis for Data Reduction
In 2018 IEEE International Conference on Big Knowledge (ICBK), Singapore, 2018.