chenwenqi228 commented on August 18, 2024

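I kept hitting errors too. After repeated fixes with help from ChatGPT, I found the problems mainly came from:
1) The RStudio version and the versions of the commands used are incompatible; check them and update whatever needs updating.
2) Paper1 and Paper2 are not in UTF-8 format, so the encoding has to be set explicitly.
3) The command that filters stop words has a bug and needs the following addition: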
corp <- tm_map(corp, content_transformer(function(x) iconv(enc2utf8(x), sub = "byte")))
corp <- tm_map(corp, function(x){removeWords(x,stopwords("en"))})
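
The encoding problem behind points 2) and 3) can be reproduced without the book's files. The following is a minimal sketch in base R, assuming a hypothetical sample vector `x` whose second element contains a byte that is not valid UTF-8. It passes `from` and `to` to `iconv()` explicitly, which is a variation on the call above rather than the exact fix used there.

```r
# Hypothetical sample: the second string ends in the raw byte 0xe9,
# which is a Latin-1 "e acute" and is not valid UTF-8 on its own.
x <- c("plain ascii text", rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9))))

# validUTF8() (base R >= 3.3.0) flags elements that are not valid UTF-8.
valid <- validUTF8(x)

# sub = "byte" replaces each unconvertible byte with its hex code in
# angle brackets, so later text-processing steps no longer fail on it.
cleaned <- iconv(x, from = "UTF-8", to = "UTF-8", sub = "byte")

print(valid)
print(cleaned)
```

If `validUTF8()` returns `FALSE` for text read from Paper1.txt or Paper2.txt, the file is probably saved in another encoding, and re-saving it as UTF-8 may be cleaner than substituting bytes.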

This version now runs:
library(tm)
library(wordcloud)
Paper1 <- paste(scan("R语言可视化之美/第3章_类别比较型图表/Paper1.txt", what = character(0), sep = "", encoding = "UTF-8"), collapse = " ")
Paper2 <- paste(scan("R语言可视化之美/第3章_类别比较型图表/Paper2.txt", what = character(0), sep = "", encoding = "UTF-8"), collapse = " ")
# Added encoding = "UTF-8" so the file encoding is not misidentified

tmpText<- data.frame(c(Paper1, Paper2),row.names=c("Text1","Text2"))
df_title <- data.frame(doc_id=row.names(tmpText),
text=tmpText$c.Paper1..Paper2.)
ds <- DataframeSource(df_title)
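# Create a data-frame source: the first column is the document id (doc_id), the second is the document text
corp <- VCorpus(ds)
# Load the documents and build the corpus
corp <- tm_map(corp,removePunctuation) # remove punctuation from the corpus
corp <- tm_map(corp,PlainTextDocument) # convert to plain text
corp <- tm_map(corp,removeNumbers) # remove numbers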

corp <- tm_map(corp, content_transformer(function(x) iconv(enc2utf8(x), sub = "byte")))
corp <- tm_map(corp, function(x){removeWords(x,stopwords("en"))}) # filter out stop words
term.matrix <- TermDocumentMatrix(corp)
# Use TermDocumentMatrix() to tokenize the processed corpus and build the term-frequency matrix

term.matrix <- as.matrix(term.matrix) # frequencies
colnames(term.matrix) <- c("Paper1","Paper2")
df<-data.frame(term.matrix)
write.csv(df,'term_matrix.csv') # export the word-frequency results for the two papers

#---------------------------------------Import data------------------------------------------
df<-read.csv('term_matrix.csv',header=TRUE,row.names=1)

#----------------------------------------Comparison of the two papers-------------------------------------------------------------
# scale is now named explicitly; it was previously passed as an unnamed positional argument
comparison.cloud(df, max.words=300, random.order=FALSE, rot.per=.15, scale=c(4,0.4), title.size=1.4)
comparison.cloud(df,max.words=300,random.order=FALSE,colors=c("#00B2FF", "red"))
commonality.cloud(df,max.words=100,random.order=FALSE,colors="#E7298A")

comparison.cloud(df, random.order=FALSE,
colors = c("#00B2FF", "red", "#FF0099", "#6600CC"),
title.size=1.5, max.words=500)
#-------------------------------------Display of a single paper-----------------------------------------------------------------
#Colors<-colorRampPalette(rev(brewer.pal(9,'RdBu')))(length(df$Paper1>10))
wordcloud(row.names(df) , df$Paper1 , min.freq=10,col=brewer.pal(8, "Dark2"), rot.per=0.3 )
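
To check what the frequency matrix handed to comparison.cloud() looks like, it can help to build one from a tiny corpus first. This is a minimal sketch, assuming two hypothetical one-line documents in place of Paper1.txt and Paper2.txt, and using VectorSource() instead of the DataframeSource() route above.

```r
library(tm)

# Hypothetical two-document corpus standing in for Paper1 and Paper2
docs <- c("word cloud word", "cloud plot")
corp <- VCorpus(VectorSource(docs))

# Rows are terms, columns are documents, cells are raw counts
m <- as.matrix(TermDocumentMatrix(corp))
colnames(m) <- c("Paper1", "Paper2")

print(m)
```

Each row is a term and each column a document, which is the layout that comparison.cloud() and commonality.cloud() expect.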

from beautiful-visualization-with-r.
