Theoretical Foundations of Computational Intelligence: Artificial Intelligence, Learning Theory, and the Big Data Paradigm

Atınç Yılmaz; Sara Naghib Zadeh; Zühre Aydın; Hatice Nur Gök; Cansu Arslan; Uygar Aydin; İnci Zaim Gökbay; Mehmet Dinçer Erbaş; Kenan Peker; Gökhan Önder Ergüven; Tevfik Erdal Baylay

doi:10.58830/ozgur.pub1351

Processing Pipelines, Feature Extraction, and Multimodal Fusion in the Computational Analysis of Psychiatric Disorders Using Auditory and Visual Data
Chapter from the book: Yılmaz, A. (ed.) 2026. Theoretical Foundations of Computational Intelligence: Artificial Intelligence, Learning Theory, and the Big Data Paradigm.

Uygar Aydin

İstanbul University

İnci Zaim Gökbay

İstanbul University

Synopsis

The early diagnosis, treatment, and monitoring of mental disorders are difficult because symptoms are subjective and traditional clinical methods have limited measurement capability. This chapter examines, from a signal processing and machine learning perspective, computational approaches that aim to derive objective biomarkers from auditory, visual, and multimodal data. It covers, at a technical level, data sources, clinical assessment scales, preprocessing steps, feature extraction methods for audio and visual data, classification architectures, and multimodal fusion strategies. This framework rests on a broad body of literature that has expanded rapidly, particularly since the COVID-19 pandemic, and has concentrated around particular methods and objectives: depression detection has emerged as the dominant research target, and convolutional neural networks (CNNs) have become the principal architecture. Audio-visual fusion performed at the feature level has yielded considerable accuracy gains compared with unimodal solutions. The chapter also discusses representative facial and voice recognition architectures, the distinction between primary and auxiliary classification methods, and real-world applications such as remote and continuous assessment, privacy-preserving non-invasive monitoring, and adaptability to resource-constrained settings. Nevertheless, datasets lacking cultural and linguistic diversity, the gap between diagnosis and treatment, and the limited interpretability of models remain the principal obstacles to translating innovations in the field into clinical impact, and overcoming them shapes the field's future priorities.
Studies indicate that, beyond improving model accuracy, the field needs to construct diverse and longitudinal datasets, extend models across the entire clinical pathway, and build interpretable systems that directly address clinical needs.
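
As a rough illustration of what feature-level audio-visual fusion means, the sketch below standardizes two per-recording feature matrices and concatenates them into one vector per recording before any classifier is applied. It is not taken from the chapter: the function names, the feature dimensions (13 audio and 20 visual descriptors), and the random data are hypothetical placeholders, and real pipelines may use learned embeddings or other normalization.

```python
import numpy as np


def zscore(features: np.ndarray) -> np.ndarray:
    """Standardize each feature column to zero mean and unit variance."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (features - mean) / std


def feature_level_fusion(audio: np.ndarray, visual: np.ndarray) -> np.ndarray:
    """Concatenate per-sample audio and visual features (early fusion)."""
    if audio.shape[0] != visual.shape[0]:
        raise ValueError("audio and visual must describe the same samples")
    # Standardize each modality first so neither dominates by scale alone.
    return np.concatenate([zscore(audio), zscore(visual)], axis=1)


# Hypothetical data: 8 recordings, 13 audio and 20 visual descriptors each.
rng = np.random.default_rng(0)
audio_feats = rng.normal(size=(8, 13))
visual_feats = rng.normal(size=(8, 20))

fused = feature_level_fusion(audio_feats, visual_feats)
print(fused.shape)  # (8, 33)
```

The fused matrix would then be passed to a single classifier. This differs from decision-level fusion, in which separate unimodal models are trained and only their outputs are combined.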

Keywords:

Artificial Intelligence; Learning Theory; Big Data; Machine Learning; Computational Intelligence

How to cite this book

Aydin, U. & Zaim Gökbay, İ. (2026). Processing Pipelines, Feature Extraction, and Multimodal Fusion in the Computational Analysis of Psychiatric Disorders Using Auditory and Visual Data. In: Yılmaz, A. (ed.), Theoretical Foundations of Computational Intelligence: Artificial Intelligence, Learning Theory, and the Big Data Paradigm. Özgür Publications. DOI: https://doi.org/10.58830/ozgur.pub1351.c5539

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Published

June 30, 2026

DOI

https://doi.org/10.58830/ozgur.pub1351.c5539