Relationship between the product rating and the topic proportions

We will be analyzing Amazon product review data. Please download the review dataset for the beauty product line and complete the following analysis. We will continue to analyze the most reviewed product.

(1) Please conduct a topic modeling analysis of the product reviews to summarize their main themes.
a. Please preprocess the data and format it so that it can be analyzed by topic models.
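
As a rough illustration of step a, the sketch below cleans a few toy reviews in base R and builds the list-of-matrices structure that the lda package expects (row 1 holds 0-based vocabulary indices, row 2 holds counts). The review texts, the stop-word list, and the helper names `clean_tokens` and `to_lda_doc` are all made up for this example; in practice `lda::lexicalize` can produce a comparable structure.

```r
# Toy reviews standing in for the real review text column (hypothetical data).
reviews <- c("Great shampoo, my hair feels soft!",
             "The smell is too strong. Hair feels dry.",
             "Soft hair and a great smell")

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
stop_words <- c("the", "is", "a", "and", "my", "too")

# Lowercase, strip non-letters, split on spaces, and drop stop words.
clean_tokens <- function(text) {
  text <- tolower(text)
  text <- gsub("[^a-z ]", " ", text)
  tokens <- strsplit(text, " +")[[1]]
  tokens <- tokens[nchar(tokens) > 0]
  tokens[!tokens %in% stop_words]
}

tokens <- lapply(reviews, clean_tokens)
vocab <- sort(unique(unlist(tokens)))

# One 2-row integer matrix per review:
# row 1 = 0-based index into vocab, row 2 = count of that word.
to_lda_doc <- function(tok) {
  counts <- table(factor(tok, levels = vocab))
  counts <- counts[counts > 0]
  rbind(as.integer(match(names(counts), vocab) - 1L),
        as.integer(counts))
}

input <- list(documents = lapply(tokens, to_lda_doc), vocab = vocab)
str(input$documents[[1]])
```

The resulting `input$documents` and `input$vocab` have the shape that the sampler call in step c takes as its first and third arguments.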

b. Draw the word clouds for this product and discuss the findings.
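
A word cloud is driven by word frequencies, so one possible starting point for step b is a frequency table like the one sketched here. The token vector is toy data, and the commented-out call assumes the separate wordcloud package is installed.

```r
# Toy tokens standing in for the cleaned review words (hypothetical data).
tokens <- c("hair", "soft", "hair", "smell", "great", "hair", "soft", "dry")

# Word frequencies, most frequent first; these set the word sizes in a cloud.
freq <- sort(table(tokens), decreasing = TRUE)
word_freq <- data.frame(word = names(freq), n = as.integer(freq),
                        stringsAsFactors = FALSE)
print(word_freq)

# With the wordcloud package installed, the cloud could then be drawn with:
# wordcloud::wordcloud(words = word_freq$word, freq = word_freq$n, min.freq = 1)
```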

c. Please run the lda.collapsed.gibbs.sampler function to fit the topic model. This set of reviews requires more iterations to converge. Please set the number of topics to 4 and run the sampler with the following parameters.
library(lda)

set.seed(12345)

# Model settings: 4 topics and 10,000 Gibbs sampling iterations
K <- 4
N <- 10000
result <- lda.collapsed.gibbs.sampler(
  input$documents,
  K,                 # number of topics
  input$vocab,
  N,                 # number of iterations
  alpha = 1 / K,     # Dirichlet hyperparameter for the topic proportions
  eta = 0.1,         # Dirichlet hyperparameter for the topic multinomials
  compute.log.likelihood = TRUE)
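
Because the sampler is run with compute.log.likelihood=TRUE, the fit stores a 2-row matrix of log-likelihoods (one column per iteration) in result$log.likelihoods, which can be used to judge convergence. The sketch below shows one simple check on a simulated trace; the simulated values, the window size, and the idea of comparing the last two windows are assumptions made for illustration, not part of the assignment.

```r
# Simulated stand-in for one row of result$log.likelihoods (hypothetical values);
# with a real fit, use ll <- result$log.likelihoods[1, ] instead.
set.seed(1)
N <- 1000
ll <- -5000 - 3000 * exp(-(1:N) / 100) + rnorm(N, sd = 5)

# Compare the mean log-likelihood over the last two windows of iterations.
window <- 200
last <- mean(ll[(N - window + 1):N])
prev <- mean(ll[(N - 2 * window + 1):(N - window)])
rel_change <- abs(last - prev) / abs(prev)
rel_change

# A trace plot gives the same information visually:
# plot(ll, type = "l", xlab = "Iteration", ylab = "Log-likelihood")
```

A small relative change suggests the trace has flattened out; a trace that is still climbing suggests more iterations are needed.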

d. Please generate the top 10 words for each topic.
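
For step d, the fitted result$topics is a K x V matrix of word counts per topic, with the vocabulary as column names. The sketch below picks the top words per topic from a toy matrix of that shape; the numbers and the helper name `top_words` are made up. The lda package also provides top.topic.words, which does this directly.

```r
# Toy K x V count matrix shaped like result$topics (hypothetical numbers).
topics <- matrix(c(9, 1, 0, 4, 2,
                   0, 7, 5, 1, 3),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, c("hair", "smell", "price", "soft", "skin")))

# For each topic (row), take the n words with the largest counts.
top_words <- function(topics, n) {
  apply(topics, 1, function(counts) {
    colnames(topics)[order(counts, decreasing = TRUE)[seq_len(n)]]
  })
}

top <- top_words(topics, 3)   # one column per topic
top

# With a real fit, the package helper could be used instead:
# top.topic.words(result$topics, 10, by.score = TRUE)
```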

e. Based on these top words, please come up with a theme to describe each topic.

f. Which topic is the most discussed and which is the least discussed?
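
One way to approach step f is to use result$document_sums, a K x D matrix counting how many words in each review were assigned to each topic. The sketch below computes each topic's overall share from a toy matrix of that shape; the numbers are made up.

```r
# Toy K x D matrix shaped like result$document_sums (hypothetical numbers).
document_sums <- matrix(c(8, 1, 0,
                          2, 6, 3,
                          0, 3, 7),
                        nrow = 3, byrow = TRUE)

# Share of all word tokens assigned to each topic across the corpus.
topic_share <- rowSums(document_sums) / sum(document_sums)
most  <- which.max(topic_share)   # index of the most discussed topic
least <- which.min(topic_share)   # index of the least discussed topic
round(topic_share, 3)
```

This weights long reviews more heavily than short ones; averaging per-review proportions instead is an equally reasonable choice.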

g. Check the relationship between the product rating and the topic proportions. Based on the output, identify the topic that tends to generate lower ratings.
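
A minimal sketch of step g, assuming each review has a star rating and the per-review topic counts are available in the shape of result$document_sums. The counts and ratings here are toy data, and using a plain correlation is just one option (a regression of rating on the topic proportions is another).

```r
# Toy stand-ins (hypothetical): K x D topic counts and one star rating per review.
document_sums <- matrix(c(9, 7, 1,  0, 2, 8,
                          1, 3, 9, 10, 8, 2),
                        nrow = 2, byrow = TRUE)
rating <- c(5, 4, 2, 1, 2, 5)

# Per-review topic proportions: a D x K matrix whose rows sum to 1.
# (A review with no words left after preprocessing would give NaN here.)
theta <- t(document_sums) / colSums(document_sums)
colnames(theta) <- paste0("topic", seq_len(ncol(theta)))

# Correlation of each topic's proportion with the rating.
cors <- apply(theta, 2, cor, y = rating)
round(cors, 3)

# The topic with the most negative correlation is the candidate "low rating" topic.
names(which.min(cors))
```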
