
Latent Similarity Clustering of Video Games Based on Euclidean Distance and PCA
Chapter from the book:
Tahtalı, Y. & Demir, İ. & Bayyurt, L. & Abacı, S. H. (eds.) 2025. Current Approaches in Applied Statistics II.
Synopsis
This paper presents a multi-criteria similarity analysis of video games using quantitative variables from an available dataset. Five variables are used: user rating, number of recommendations, average playing time overall, average playing time in the last two weeks, and percentage of positive reviews. The research aims to build a similarity model for games in the multidimensional space defined by these attributes and to identify patterns and groupings based on their quantitative profiles.
The data were standardized so that variables measured on very different scales (for example, recommendation counts and percentages) contribute comparably to the distance. Euclidean distance was used to measure similarity between games because it is intuitive to interpret: the distance between two games is the square root of the sum of squared differences across all dimensions. This metric positions each game within the attribute space and forms the basis for hierarchical clustering. Principal component analysis (PCA) was applied to reduce dimensionality and facilitate visual interpretation of the results.
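The steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the four rows are toy values invented for illustration (not taken from the dataset), the column order is hypothetical, and Ward linkage and the two-cluster cut are assumptions, since the synopsis names neither a linkage method nor a number of clusters. PCA is computed here through a singular value decomposition of the standardized matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# Toy values, NOT from the chapter's dataset. Hypothetical column order:
# rating, recommendations, avg playtime (overall), avg playtime (2 weeks), % positive
games = np.array([
    [9.1, 52000.0, 12.0, 1.5, 96.0],
    [8.9, 48000.0, 10.0, 1.2, 94.0],
    [6.2, 15000.0, 310.0, 22.0, 68.0],
    [6.0, 17000.0, 290.0, 25.0, 65.0],
])


def standardize(x):
    """Z-score each column: mean 0 and unit variance per variable."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


z = standardize(games)

# Pairwise Euclidean distance: square root of the sum of squared differences.
dist = squareform(pdist(z, metric="euclidean"))

# Agglomerative clustering. Ward linkage is an assumption, not from the source.
tree = linkage(z, method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")

# PCA via SVD of the standardized (hence centered) matrix; keep two components.
_, s, vt = np.linalg.svd(z, full_matrices=False)
scores = z @ vt[:2].T                  # 2-D coordinates for plotting
explained = s**2 / np.sum(s**2)        # share of variance per component
```

In this toy setup the first two rows stand in for highly rated, briefly played games and the last two for heavily played, lower-rated ones. The `scores` array gives the two-dimensional coordinates one would plot to inspect the groupings visually.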
Preliminary findings indicate several stable clusters, including games with high ratings and many recommendations but relatively short playing time, as well as a group of games that are played extensively but rated lower by users. These combinations suggest distinct usage patterns and perceived value that do not map directly onto traditional categories such as genre or publisher.
The approach presented in this study can serve as a foundation for structuring large-scale game datasets and as a starting point for developing classification and recommendation algorithms based on objective rather than subjective product characteristics.