review_LDA
Code description:
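Description: Uses LDA to extract n topics from an English corpus and reports which topic each article belongs to:
1) Preprocess the English review data: tokenization, part-of-speech tagging, and removal of stop words and junk strings
2) Keep only nouns, adjectives, and verbs
3) Represent each review as a TF-IDF vector, dropping the words whose frequency falls in the bottom 2%
4) Fit the LDA model
5) Extract n topics and output the keywords of each topic (sorted by importance)
6) For each review, report which topic it belongs to (and its probability under every topic)
7) Count how many reviews fall under each topic
Dependencies: Python 3, NLTK, enchant, sklearn, numpy, pickle, and others; see the code for details
Dataset: 80,000+ English reviews
Sample output:
topic #1: view night river light building nice walk day beautiful skyline visit evening amazing spectacular stroll time floor architecture people amaze modern top enjoy cruise look photo fantastic skyscraper awesome picture
topic #2: garden bike nice beautiful visit peaceful ride chinese walk ancient temple town time rent cycle gate china history bicycle building middle hour oasis quiet busy look enjoy hire lot architecture
topic #3: ...
(An LDA topic model for review topic classification. It can extract n topics from 80,000 English reviews or articles. Implemented in Python 3 with packages such as NLTK, enchant, sklearn, numpy, and pickle.)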
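Steps 3 to 7 above can be sketched roughly as follows. This is a minimal illustration and not the code in final_lda.py: the function name `fit_review_topics`, its parameters, and the four toy reviews are hypothetical, the NLTK/enchant preprocessing of steps 1 and 2 is assumed to have been done already, and "bottom 2%" is interpreted here as the 2% of terms with the lowest document frequency, which may differ from what the original script does.

```python
from collections import Counter

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer


def fit_review_topics(reviews, n_topics=2, n_top_words=5, drop_fraction=0.02, seed=0):
    """Fit LDA on TF-IDF vectors and summarise the topics (illustrative only)."""
    # Step 3: TF-IDF representation of each (already preprocessed) review.
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(reviews)
    vocab = vectorizer.vocabulary_  # maps term -> column index
    terms = np.array(sorted(vocab, key=vocab.get))

    # Step 3 (cont.): drop the least frequent terms. Assumption: "bottom 2%"
    # means the 2% of terms with the lowest document frequency.
    doc_freq = np.asarray((X > 0).sum(axis=0)).ravel()
    n_drop = int(drop_fraction * len(terms))
    keep = np.sort(np.argsort(doc_freq, kind="stable")[n_drop:])
    X, terms = X[:, keep], terms[keep]

    # Step 4: fit the LDA model. Each row of doc_topic holds one review's
    # probabilities over the topics.
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    doc_topic = lda.fit_transform(X)

    # Step 5: keywords of each topic, sorted by descending weight.
    topics = [
        terms[np.argsort(row)[::-1][:n_top_words]].tolist()
        for row in lda.components_
    ]

    # Step 6: most probable topic for each review.
    assigned = doc_topic.argmax(axis=1)
    # Step 7: number of reviews assigned to each topic.
    counts = Counter(int(t) for t in assigned)
    return topics, doc_topic, assigned, counts


# Toy stand-in for the preprocessed review corpus.
reviews = [
    "view night river light building skyline",
    "river view night cruise skyline photo",
    "garden bike temple quiet ride ancient",
    "temple garden peaceful bike walk ancient",
]
topics, doc_topic, assigned, counts = fit_review_topics(reviews)
for i, words in enumerate(topics, start=1):
    print(f"topic #{i}:", " ".join(words))
for review, topic, probs in zip(reviews, assigned, doc_topic):
    print(f"topic #{topic + 1}", np.round(probs, 3), review)
```

On a corpus this small the topics are not meaningful; the point is only the shape of the pipeline. With the real 80,000+ reviews, `n_topics` and `n_top_words` would be set to match the desired output (the sample output lists 30 keywords per topic).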
File list:
review_LDA\.idea\lda.iml, 398 , 2019-06-03
review_LDA\.idea\modules.xml, 258 , 2019-06-03
review_LDA\.idea\workspace.xml, 12122 , 2019-06-03
review_LDA\final_lda.py, 11183 , 2019-06-03
review_LDA\lda_topics.txt, 1037 , 2019-06-03
review_LDA\results.txt, 13248837 , 2019-06-03
review_LDA\rubbish_words.txt, 22337 , 2019-06-03
review_LDA\used_words.txt, 155661 , 2019-06-03
review_LDA\~$结果截图.docx, 162 , 2019-06-03
review_LDA\结果截图.docx, 24366 , 2019-06-03
review_LDA\.idea, 0 , 2019-06-03
review_LDA, 0 , 2019-06-03