Current Approaches in Applied Statistics II

Yalçın Tahtalı; İbrahim Demir; Lütfi Bayyurt; Samet Hasan Abacı; Berna Özbaşaran; Gözde Ulutagay; Bilal Saraç; Çağlar Karamaşa; Çağdaş Yıldız; Adem Tüzemen; Diana Bratić; Eyüp Sayin; Murat Topuz; Muhammet Fatih Aslan; Akif Durdu; Gizem Şara Onay; Mehmet Çakmakçı; Hasan Arda Solak; Şeyma Koltuklu; Sueda Turgut; Mahmut Bağcı; Melisa Dikici; Gökçe Sabriye Hörük; Deniz Efendioğlu; Mervenur Ünver; Şahika Gökmen; Murat Yildirim; Ziya Çakır; Cafer Yildirim; Özlem Türkşen; Necdet Ünüvar; İlker Astarcı; Murat Açıkgöz; Defne Akay; Özlem Türkşen

doi:10.58830/ozgur.pub865

Latent Similarity Clustering of Video Games Based on Euclidean Distance and PCA
Chapter from the book: Tahtalı, Y. & Demir, İ. & Bayyurt, L. & Abacı, S. H. (eds.) 2025. Current Approaches in Applied Statistics II.

Diana Bratić

University of Zagreb

Synopsis

This paper presents a multi-criteria similarity analysis of video games using quantitative variables from an available dataset. The analysis uses five variables: user rating, number of recommendations, average playing time (overall and over the last two weeks), and percentage of positive reviews. The aim is to build a similarity model for games in the multidimensional space defined by these attributes and to identify patterns and groupings based on their quantitative profiles.
The data was standardized to ensure comparability across variables with different scales. Euclidean distance was used to measure similarity between games, as it is intuitively interpretable in real space: the distance between two games is calculated as the square root of the sum of squared differences across all dimensions. This metric enables accurate positioning of games within the attribute space and forms the basis for hierarchical clustering. Principal component analysis (PCA) was applied to reduce dimensionality and facilitate visual interpretation of the results.
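
As a rough illustration of the first two steps described above (standardization followed by Euclidean distance), the following minimal sketch may help. It is not the chapter's implementation: the game names and values are invented toy data, the helper names `standardize` and `euclidean` are hypothetical, and the use of the population standard deviation for z-scoring is an assumption.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy data (not from the chapter's dataset). Columns, in order:
# user rating, recommendations, avg. playtime overall,
# avg. playtime in the last two weeks, % positive reviews.
games = {
    "game_a": [8.5, 12000, 15.0, 1.0, 92.0],
    "game_b": [8.7, 11000, 14.0, 1.2, 94.0],
    "game_c": [6.1, 3000, 120.0, 9.0, 61.0],
}


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its std.

    Assumption: population standard deviation; a constant column maps to 0.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


def euclidean(u, v):
    """Square root of the sum of squared differences across all dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


names = list(games)
z = standardize([games[name] for name in names])

# Pairwise distances between standardized profiles (upper triangle only).
dist = {
    (names[i], names[j]): euclidean(z[i], z[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
}

# The pair with the smallest distance is the most similar pair of games.
closest = min(dist, key=dist.get)
print(closest, round(dist[closest], 3))
```

Without the standardization step, the recommendations column (in the thousands) would dominate the distance and the other attributes would barely register. A distance matrix of this kind is the usual input to hierarchical clustering.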

Preliminary findings indicate the existence of several stable clusters, including games with high ratings and recommendations but relatively short playing time, as well as a group of games played extensively but rated lower by users. These combinations suggest distinct usage patterns and perceived value, which are not directly aligned with traditional categories such as genre or publisher.
The approach presented in this study can serve as a foundation for structuring large-scale game datasets and as a starting point for developing classification and recommendation algorithms based on objective rather than subjective product characteristics.

Keywords:

Statistics; Applied Statistics; Statistical Methods; Data Analysis; Machine Learning and Statistics

How to cite this book

Bratić, D. (2025). Latent Similarity Clustering of Video Games Based on Euclidean Distance and PCA. In: Tahtalı, Y. & Demir, İ. & Bayyurt, L. & Abacı, S. H. (eds.), Current Approaches in Applied Statistics II. Özgür Publications. DOI: https://doi.org/10.58830/ozgur.pub865.c3503

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Published

October 11, 2025

DOI

https://doi.org/10.58830/ozgur.pub865.c3503
