Processing Pipelines, Feature Extraction, and Multimodal Fusion in the Computational Analysis of Psychiatric Disorders Using Auditory and Visual Data
Chapter from the book:
Yılmaz, A. (ed.) 2026. Theoretical Foundations of Computational Intelligence: Artificial Intelligence, Learning Theory, and the Big Data Paradigm.
Synopsis
The early diagnosis, treatment, and monitoring of mental disorders are difficult owing to the subjective nature of symptoms and the measurement limitations of traditional clinical methods. This chapter examines, from a signal processing and machine learning perspective, computational approaches that aim to derive objective biomarkers from auditory, visual, and multimodal data. It covers data sources, clinical assessment scales, preprocessing steps, feature extraction methods for audio and visual data, classification architectures, and multimodal fusion strategies at a technical level. The underlying literature has expanded rapidly in recent years, particularly since the COVID-19 pandemic, and has concentrated around particular methods and objectives: depression detection has emerged as the dominant research target, and convolutional neural networks (CNNs) have become the principal architecture. Audio-visual fusion performed at the feature level has yielded considerable accuracy gains over unimodal solutions. The chapter also discusses representative facial and voice recognition architectures, the distinction between primary and auxiliary classification methods, and real-world applications such as remote and continuous assessment, privacy-preserving non-invasive monitoring, and adaptability to resource-constrained settings. Nevertheless, datasets lacking cultural and linguistic diversity, the gap between diagnosis and treatment, and the limited interpretability of models remain the principal obstacles to translating innovations in the field into clinical impact. Consequently, overcoming these obstacles directly shapes the field's future priorities.
Studies indicate that, beyond improving model accuracy, the field needs to construct diverse and longitudinal datasets, extend models across the entire clinical pathway, and build interpretable systems that directly address clinical needs.
