Comments (1)
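I kept getting errors too. After repeatedly revising the code with ChatGPT's help, I found the problems were mainly these:
1) The RStudio version and the package (command) versions are incompatible; check them and update whatever needs updating.
2) Paper1 and Paper2 are not in UTF-8 format, so the encoding has to be set explicitly.
3) The stop-word filtering command has a bug and needs the following added: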
corp <- tm_map(corp, content_transformer(function(x) iconv(enc2utf8(x), sub = "byte")))
corp <- tm_map(corp, function(x){removeWords(x,stopwords("en"))})
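For point 2, here is a minimal base-R sketch of how one might check whether text is valid UTF-8 and what `sub = "byte"` does. The sample bytes and the assumption that the source encoding is Latin-1 are hypothetical; the real encoding of Paper1.txt and Paper2.txt may differ.

```r
# Hypothetical sample: the bytes of "cafe" with a Latin-1 e-acute (0xe9),
# which is not a valid UTF-8 sequence.
raw_text <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# validUTF8() reports whether each string is valid UTF-8.
is_valid_before <- validUTF8(raw_text)

# Convert to UTF-8, ASSUMING the source encoding is Latin-1.
fixed <- iconv(raw_text, from = "latin1", to = "UTF-8")
is_valid_after <- validUTF8(fixed)

# With sub = "byte", bytes that cannot be converted are replaced by <xx>
# hex escapes instead of turning the whole string into NA.
escaped <- iconv(raw_text, from = "UTF-8", to = "UTF-8", sub = "byte")

print(c(before = is_valid_before, after = is_valid_after))
print(escaped)
```

If the check reports invalid text, converting from the file's actual encoding is usually cleaner than escaping bytes, since the escapes end up as tokens in the corpus.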
The version below now runs:
library(tm)
library(wordcloud)
Paper1 <- paste(scan("R语言可视化之美/第3章_类别比较型图表/Paper1.txt", what = character(0), sep = "", encoding = "UTF-8"), collapse = " ")
Paper2 <- paste(scan("R语言可视化之美/第3章_类别比较型图表/Paper2.txt", what = character(0), sep = "", encoding = "UTF-8"), collapse = " ")
# Added encoding = "UTF-8" so the file encoding is not misidentified
df_title <- data.frame(doc_id = c("Text1", "Text2"),
text = c(Paper1, Paper2),
stringsAsFactors = FALSE)
ds <- DataframeSource(df_title)
# Create a data-frame source: the first column is the document id (doc_id), the second is the document text
corp <- VCorpus(ds)
# Load the text from the document set and build the corpus
corp <- tm_map(corp, removePunctuation) # remove punctuation from the corpus
corp <- tm_map(corp, PlainTextDocument) # convert to plain text
corp <- tm_map(corp, removeNumbers) # remove digits
corp <- tm_map(corp, content_transformer(function(x) iconv(enc2utf8(x), sub = "byte")))
corp <- tm_map(corp, function(x){removeWords(x,stopwords("en"))}) # filter out English stop words
term.matrix <- TermDocumentMatrix(corp)
# TermDocumentMatrix() tokenizes the processed corpus and builds the term-frequency matrix
term.matrix <- as.matrix(term.matrix) # term frequencies
colnames(term.matrix) <- c("Paper1","Paper2")
df<-data.frame(term.matrix)
write.csv(df,'term_matrix.csv') # export the term frequencies of the two papers
#---------------------------------------Import data------------------------------------------
df<-read.csv('term_matrix.csv',header=TRUE,row.names=1)
#----------------------------------------Comparing the two papers-------------------------------------------------------------
comparison.cloud(df, max.words=300, random.order=FALSE, rot.per=.15, scale=c(4,0.4), title.size=1.4)
comparison.cloud(df,max.words=300,random.order=FALSE,colors=c("#00B2FF", "red"))
commonality.cloud(df,max.words=100,random.order=FALSE,color="#E7298A")
comparison.cloud(df, random.order=FALSE,
colors = c("#00B2FF", "red", "#FF0099", "#6600CC"),
title.size=1.5, max.words=500)
#-------------------------------------Showing a single paper-----------------------------------------------------------------
#Colors<-colorRampPalette(rev(brewer.pal(9,'RdBu')))(length(df$Paper1>10))
wordcloud(row.names(df) , df$Paper1 , min.freq=10,col=brewer.pal(8, "Dark2"), rot.per=0.3 )
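Before plotting, it can help to sanity-check the table that the cloud functions receive: terms as rows, one column per paper. The sketch below uses a hypothetical toy table (the terms and counts are made up, not taken from Paper1 or Paper2) and only base R. Taking the minimum count across papers as a "shared" score is an illustrative choice here, not a claim about how the plots are computed.

```r
# Hypothetical toy term table: rows are terms, columns are documents.
toy <- data.frame(Paper1 = c(12, 0, 5),
                  Paper2 = c(3, 7, 5),
                  row.names = c("model", "cloud", "data"))

# Terms that occur in both papers.
shared <- toy[toy$Paper1 > 0 & toy$Paper2 > 0, ]

# Minimum count across papers, used here as a simple "shared" score.
shared_score <- apply(as.matrix(shared), 1, min)

# Terms that reach the min.freq = 10 cutoff used for the Paper1 cloud.
frequent_p1 <- row.names(toy)[toy$Paper1 >= 10]

print(shared_score)
print(frequent_p1)
```

Running the same filters on the real `df` shows how many words each plot can actually draw, which makes an empty or sparse cloud easier to diagnose.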
from beautiful-visualization-with-r.