
Topic Identification and Classification of GooglePlay Store Reviews

Year: 2022       Vol.: 71       No.: 2      

Authors: Daniel David M. Pamplona


Digital distribution platforms, such as the Google® Play Store, contain an enormous quantity of information in the form of app data and user reviews. A particularly challenging task is to organize a large unstructured dataset into smaller clusters or topics. To this end, 19,886 user reviews were extracted from the Google Play Store. The main task is to determine, through common themes, the app characteristics that are frequently mentioned in positive and negative reviews. The text data were preprocessed, and common topics were then identified using Latent Dirichlet Allocation (LDA) for positive reviews and for negative reviews. The accuracy of the topics was assessed using a perplexity-based approach and human interpretation. To further validate the topic model, the topic assignment was used as the outcome variable in a Naive Bayes model with the review text as input. Empirical results show that the extracted topics can be predicted well from the review text. Finally, the distribution of topics was calculated across different app categories.
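
The validation step described above, predicting a review's assigned topic from its text with a Naive Bayes model, can be illustrated with a minimal sketch. It is not the author's implementation: the multinomial variant, the Laplace smoothing constant, the whitespace tokenization, the function names, the toy reviews, and the topic labels are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing (alpha)."""
    vocab = {word for doc in docs for word in doc.split()}
    word_counts = defaultdict(Counter)  # topic -> word -> count
    doc_counts = Counter(labels)        # topic -> number of reviews
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())

    model = {}
    for label, n_docs in doc_counts.items():
        total_words = sum(word_counts[label].values())
        denom = total_words + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_like": {
                w: math.log((word_counts[label][w] + alpha) / denom)
                for w in vocab
            },
        }
    return model


def predict(model, doc):
    """Return the topic with the highest posterior log-probability."""
    scores = {}
    for label, params in model.items():
        # Words outside the training vocabulary are ignored.
        scores[label] = params["log_prior"] + sum(
            params["log_like"][w] for w in doc.split() if w in params["log_like"]
        )
    return max(scores, key=scores.get)


# Toy, made-up reviews with hypothetical topic labels (not the study's data).
reviews = [
    "app crashes after update",
    "crashes every time i open it",
    "too many ads in the app",
    "ads pop up every minute",
]
topics = ["stability", "stability", "ads", "ads"]

model = train_naive_bayes(reviews, topics)
predicted = predict(model, "the app crashes on open")
print(predicted)
```

In the study's setting, the labels would come from the topic assignments produced by the LDA model rather than being written by hand, and predictive performance would be measured on reviews held out from training.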

Keywords: Topic Modeling, Latent Dirichlet Allocation, Naive Bayes Classifier, Perplexity
