This project spans multiple modalities of medical imaging and multiple types of modelling to empower medical experts.
Digital Pathology - In this dataset we have looked at NLP methods for analyzing medical reports, Deep Learning methods for classifying images, and Manifold Learning methods to extract compact embeddings for using in classifiers and search engines.
Alzheimer’s Classification - learning predictive classification models for diffusion MRI data to provide decision support for degenerative brain diseases using Deep Neural Network methods currently only used for 2D image classification. This domain is challenging due to the 3D structure of the data as well as the non-visual properties which do not necessarily carry over from other domains.
Our Papers on Medical Imaging
Quantile–Quantile Embedding for distribution transformation and manifold embedding with ability to choose the embedding distribution
Machine Learning with Applications (MLWA).
6,
2021.
We propose a new embedding method, named Quantile-Quantile Embedding (QQE), for distribution transformation and manifold embedding with the ability to choose the embedding distribution. QQE, which uses the concept of quantile-quantile plot from visual statistical tests, can transform the distribution of data to any theoretical desired distribution or empirical reference sample. Moreover, QQE gives the user a choice of embedding distribution in embedding the manifold of data into the low dimensional embedding space. It can also be used for modifying the embedding distribution of other dimensionality reduction methods, such as PCA, t-SNE, and deep metric learning, for better representation or visualization of data. We propose QQE in both unsupervised and supervised forms. QQE can also transform a distribution to either an exact reference distribution or its shape. We show that QQE allows for better discrimination of classes in some cases. Our experiments on different synthetic and image datasets show the effectiveness of the proposed embedding method.
Analysis of Language Embeddings for Classification of Unstructured Pathology Reports
In
International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).
IEEE,
Nov,
2021.
A pathology report is one of the most significant medical documents providing interpretive insights into the visual appearance of the patient’s biopsy sample. In digital pathology, high-resolution images of tissue samples are stored along with pathology reports. Despite the valuable information that pathology reports hold, they are not used in any systematic manner to promote computational pathology. In this work, we focus on analyzing the reports, which are generally unstructured documents written in English with sophisticated and highly specialized medical terminology. We provide a comparative analysis of various embedding models like BioBERT, Clinical BioBERT, BioMed-RoBERTa and Term Frequency-Inverse Document Frequency (TF-IDF), a traditional NLP technique, as well as the combination of embeddings from pre-trained models with TF-IDF. Our results demonstrate the effectiveness of various word embedding techniques for pathology reports.
Magnification Generalization for Histopathology Image Embedding
In
IEEE International Symposium on Biomedical Imaging (ISBI).
Apr,
2021.
Histopathology image embedding is an active research area in computer vision. Most of the embedding models exclusively concentrate on a specific magnification level. However, a useful task in histopathology embedding is to train an embedding space regardless of the magnification level. Two main approaches for tackling this goal are domain adaptation and domain generalization, where the target magnification levels may or may not be introduced to the model in training, respectively. Although magnification adaptation is a well-studied topic in the literature, this paper, to the best of our knowledge, is the first work on magnification generalization for histopathology image embedding. We use an episodic trainable domain generalization technique for magnification generalization, namely Model Agnostic Learning of Semantic Features (MASF), which works based on the Model Agnostic Meta-Learning (MAML) concept. Our experimental results on a breast cancer histopathology dataset with four different magnification levels show the proposed method’s effectiveness for magnification generalization.
Batch-Incremental Triplet Sampling for Training Triplet Networks Using Bayesian Updating Theorem
In
25th International Conference on Pattern Recognition (ICPR).
IEEE,
Milan, Italy (virtual).
Jan,
2021.
Variants of Triplet networks are robust entities for learning a discriminative embedding subspace. There exist different triplet mining approaches for selecting the most suitable training triplets. Some of these mining methods rely on the extreme distances between instances, and some others make use of sampling. However, sampling from stochastic distributions of data rather than sampling merely from the existing embedding instances can provide more discriminative information. In this work, we sample triplets from distributions of data rather than from existing instances. We consider a multivariate normal distribution for the embedding of each class. Using Bayesian updating and conjugate priors, we update the distributions of classes dynamically by receiving the new mini-batches of training data. The proposed triplet mining with Bayesian updating can be used with any triplet-based loss function, e.g., triplet-loss or Neighborhood Component Analysis (NCA) loss. Accordingly, Our triplet mining approaches are called Bayesian Updating Triplet (BUT) and Bayesian Updating NCA (BUNCA), depending on which loss function is being used. Experimental results on two public datasets, namely MNIST and histopathology colorectal cancer (CRC), substantiate the effectiveness of the proposed triplet mining method.
Offline versus Online Triplet Mining based on Extreme Distances of Histopathology Patches
In
15th International Symposium on Visual Computing (ISCV 2020).
Springer International Publishing,
(virtual).
Oct,
2020.
We analyze the effect of offline and online triplet mining for colorectal cancer (CRC) histopathology dataset containing 100,000 patches. We consider the extreme, i.e., farthest and nearest patches with respect to a given anchor, both in online and offline mining. While many works focus solely on how to select the triplets online (batch-wise), we also study the effect of extreme distances and neighbor patches before training in an offline fashion. We analyze the impacts of extreme cases for offline versus online mining, including easy positive, batch semi-hard, and batch hard triplet mining as well as the neighborhood component analysis loss, its proxy version, and distance weighted sampling. We also investigate online approaches based on extreme distance and comprehensively compare the performance of offline and online mining based on the data patterns and explain offline mining as a tractable generalization of the online mining with large mini-batch size. As well, we discuss the relations of different colorectal tissue types in terms of extreme distances. We found that offline mining can generate a better statistical representation of the population by working on the whole dataset.
Supervision and Source Domain Impact on Representation Learning: A Histopathology Case Study
In
International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’20).
Montreal, Quebec, Canada (virtual).
Jul,
2020.
As many algorithms depend on a suitable representation of data, learning
unique features is considered a crucial task. Although supervised techniques
using deep neural networks have boosted the performance of representation
learning, the need for a large set of labeled data limits the application of
such methods. As an example, high-quality delineations of regions of interest
in the field of pathology is a tedious and time-consuming task due to the large
image dimensions. In this work, we explored the performance of a deep neural
network and triplet loss in the area of representation learning. We
investigated the notion of similarity and dissimilarity in pathology
whole-slide images and compared different setups from unsupervised and
semi-supervised to supervised learning in our experiments. Additionally,
different approaches were tested, applying few-shot learning on two publicly
available pathology image datasets. We achieved high accuracy and
generalization when the learned representations were applied to two different
pathology datasets
Fisher Discriminant Triplet and Contrastive Losses for Training Siamese Networks
In
IEEE International Joint Conference on Neural Networks (IJCNN).
Glasgow, UK (virtual).
Jul,
2020.
Weighted Fisher Discriminant Analysis in the Input and Feature Spaces
In
International Conference on Image Analysis and Recognition (ICIAR-2020).
Springer,
Póvoa de Varzim, Portugal (virtual).
Jun,
2020.
Application of probabilistically-weighted graphs to image-based diagnosis of Alzheimer’s disease using diffusion MRI
Syeda Maryam,
Laura McCrackin,
Mark Crowley,
Yogesh Rathi,
and Oleg Michailovich.
In
Proceedings of SPIE 101324, Medical Imaging 2017 : Computer-Aided Diagnosis.
International Society for Optics and Photonics,
Mar,
2017.
The world’s aging population has given rise to an increasing awareness towards neurodegenerative disorders, including Alzheimers Disease (AD). Treatment options for AD are currently limited, but it is believed that future success depends on our ability to detect the onset of the disease in its early stages. The most frequently used tools for this include neuropsychological assessments, along with genetic, proteomic, and image-based diagnosis. Recently, the applicability of Diffusion Magnetic Resonance Imaging (dMRI) analysis for early diagnosis of AD has also been reported. The sensitivity of dMRI to the microstructural organization of cerebral tissue makes it particularly well-suited to detecting changes which are known to occur in the early stages of AD. Existing dMRI approaches can be divided into two broad categories: region-based and tract-based. In this work, we propose a new approach, which extends region-based approaches to the simultaneous characterization of multiple brain regions. Given a predefined set of features derived from dMRI data, we compute the probabilistic distances between different brain regions and treat the resulting connectivity pattern as an undirected, fully-connected graph. The characteristics of this graph are then used as markers to discriminate between AD subjects and normal controls (NC). Although in this preliminary work we omit subjects in the prodromal stage of AD, mild cognitive impairment (MCI), our method demonstrates perfect separability between AD and NC subject groups with substantial margin, and thus holds promise for fine-grained stratification of NC, MCI and AD populations.