Toward An Outlier Uncertainty Model: A Comparative Analysis (2024)

Undergraduate: Sophia Lin


Faculty Advisor: Danielle Szafir
Department: Computer Science


In exploratory data analysis, creating visualizations such as scatterplots is a fundamental step in discerning patterns and trends. However, outliers and noise can obscure these patterns and complicate interpretation. Identifying and removing outliers is therefore crucial for meaningful analysis; in real-world datasets, however, missing ground truth and unclear cluster boundaries often make outlier identification and classification ambiguous.

The study investigates 18 outlier detection algorithms, evaluating their interpretability, computational complexity, scalability, and suitability for visualization. The goal is to leverage insights from visual clustering and outlier detection to develop a perception-based model for estimating outlier uncertainty.

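The summary does not list the 18 algorithms or include code, but a comparison of this kind can be sketched with off-the-shelf detectors. The snippet below, a minimal illustration rather than the study's actual pipeline, runs two standard methods (Isolation Forest and Local Outlier Factor from scikit-learn) on synthetic scatterplot data with planted outliers; all variable names and parameter choices are assumptions for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Synthetic scatterplot data: one dense Gaussian cluster plus five
# far-away points planted as outliers (indices 200-204).
cluster = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
planted = rng.uniform(low=6.0, high=8.0, size=(5, 2))
X = np.vstack([cluster, planted])

# Two detectors with comparable expected outlier fractions.
detectors = {
    "IsolationForest": IsolationForest(contamination=0.025, random_state=0),
    "LocalOutlierFactor": LocalOutlierFactor(n_neighbors=20, contamination=0.025),
}

for name, det in detectors.items():
    # fit_predict returns +1 for inliers and -1 for flagged outliers.
    labels = det.fit_predict(X)
    flagged = np.where(labels == -1)[0]
    print(name, "flagged indices:", sorted(flagged))
```

Disagreement between such detectors on borderline points is one concrete way the outlier ambiguity discussed above shows up in practice, and comparing flagged sets across methods is a natural starting point for the comparative analysis described here.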
This research area focuses on outlier perception and ambiguity, with the goal of developing a robust model that mirrors human perception in outlier identification. By analyzing different outlier detection techniques, this study lays the groundwork for feature engineering and model design to estimate outlier uncertainty effectively. Ultimately, this research contributes to advancing outlier detection methodologies, enhancing the reliability and interpretability of data analysis in complex datasets.