lepennec / ggwordcloud

A word cloud geom for ggplot2

Home Page: https://lepennec.github.io/ggwordcloud/

License: GNU General Public License v3.0

R 78.81% C++ 21.19%

ggwordcloud

ggwordcloud provides a word cloud text geom for ggplot2. The placement algorithm, implemented in C++, is a hybrid of the algorithms used by wordcloud and wordcloud2.js. The cloud can grow according to a shape and stay within a mask. The size aesthetic controls either the font size or the printed area of the words. ggwordcloud also supports arbitrary text rotation and works with the faceting scheme of ggplot2. Two functions meant to be the equivalents of wordcloud and wordcloud2 are provided. Last but not least, you can use gridtext markdown/html syntax in the labels.

Installation

You can install the released version of ggwordcloud from CRAN with:

install.packages("ggwordcloud")

or the development version from the GitHub repository:

devtools::install_github("lepennec/ggwordcloud")

Please check the latest development version before submitting an issue.

Some word clouds

Because sometimes, pictures are better than a thousand words…

library(ggwordcloud)
#> Loading required package: ggplot2
data("love_words_small")
set.seed(42)
ggplot(love_words_small, aes(label = word, size = speakers)) +
  geom_text_wordcloud() +
  scale_size_area(max_size = 40) +
  theme_minimal()

data("love_words")
set.seed(42)
ggplot(
  love_words,
  aes(
    label = word, size = speakers,
    color = speakers
  )
) +
  geom_text_wordcloud_area(aes(angle = 45 * sample(-2:2, nrow(love_words),
    replace = TRUE,
    prob = c(1, 1, 4, 1, 1)
  )),
  mask = png::readPNG(system.file("extdata/hearth.png",
    package = "ggwordcloud", mustWork = TRUE
  )),
  rm_outside = TRUE
  ) +
  scale_size_area(max_size = 40) +
  theme_minimal() +
  scale_color_gradient(low = "darkred", high = "red")
#> Some words could not fit on page. They have been removed.

library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
library(tidyr, quietly = TRUE)
set.seed(42)
ggplot(
  love_words_small %>%
    gather(key = "type", value = "speakers", -lang, -word) %>%
    arrange(desc(speakers)),
  aes(label = word, size = speakers)
) +
  geom_text_wordcloud_area() +
  scale_size_area(max_size = 40) +
  theme_minimal() +
  facet_wrap(~type)

set.seed(42)
ggplot(love_words_small, aes(label = word, size = speakers,
                             label_content = sprintf("%s<span style='font-size:7.5pt'>(%g)</span>", word, speakers))) +
  geom_text_wordcloud_area() +
  scale_size_area(max_size = 40) +
  theme_minimal()

More examples are available in the vignette.

ggwordcloud's Issues

NA displayed as legend key

Hi Erwan,

While answering this question on SO, I stumbled over what looks to me like a bug introduced with the new label_content aesthetic added in ggwordcloud 0.6.0.

When adding a legend to a word cloud the value of label_content (which defaults to NA) is displayed as the legend key glyph instead of the default "a" one would expect from ggplot2::draw_key_text. Here is a reprex of the issue:

library(ggwordcloud)
#> Loading required package: ggplot2

set.seed(42)
data("love_words_latin_small")

p <- ggplot(love_words_latin_small, aes(label = word, size = speakers)) +
  geom_text_wordcloud(show.legend = TRUE) +
  scale_size_area(max_size = 20) +
  theme_minimal()

p

By default ggplot2::draw_key_text sets a letter "a" as the default key glyph by checking the condition:

 if(is.null(data$label)) data$label <- "a"

However, because of the new label_content aesthetic, the data passed from geom_text_wordcloud to draw_key_text now contains a column named label_content. Because $ does partial matching, the check is.null(data$label) returns FALSE even when there is no label column (whereas is.null(data[["label"]]) would return TRUE). More importantly, for the same reason the value of the label_content column (NA by default) is used as the label for the legend key.
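
The partial-matching behaviour described above can be reproduced in base R without ggplot2. This is a minimal sketch in which the one-row data frame key_data is a hypothetical stand-in for the data that reaches draw_key_text:

```r
# Hypothetical stand-in for the key data: it has a label_content column
# but no label column.
key_data <- data.frame(label_content = NA)

# `$` falls back to partial matching, so this finds label_content.
dollar_is_null <- is.null(key_data$label)

# `[[` matches names exactly by default, so no column is found.
bracket_is_null <- is.null(key_data[["label"]])

print(c(dollar = dollar_is_null, bracket = bracket_is_null))
```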

A workaround would be to set a value for label or label_content via the override.aes argument of guide_legend:

p +
  guides(size = guide_legend(override.aes = list(label_content = "a")))

Cannot display Chinese characters in Mac OS

I ran the "love" example with this package and the results are fine on both Windows 10 and Ubuntu 16.04. On Mac OS, however, the Chinese characters are not displayed correctly: they only show up as polygons. With the wordcloud package, the Chinese characters are fine as well. Setting the text font family in the theme function does not help. Can you tell me how to deal with it? Thank you!

An issue of too much space between words

Dear creator

I have a problem while using geom_text_wordcloud. The word spacing is too large. That happens even when I use exactly the same code as yours. I found two questions online about the same issue, but no solution. Can you let me know why? Thank you!

This is the desired result, but produced with the wordcloud() command.

This is the undesired result, with too much spacing, produced with geom_text_wordcloud and ggplot():

Fit to plot area

Hi! Thanks for a very nice package. I wonder if there is a way to get the figure to fit to the plot area without adjusting max_size. There must be many applications where the number of terms and their relative sizes/lengths isn't known beforehand (in Shiny apps and so on).

Unable to make the colourbar appear

Hello.
First off, thanks for the package, very easy to use and to customize.

There's just one thing that I can't do: make the color legend appear... I've tried many approaches, but it just doesn't want to appear. Is there a way to force it?

Below a small example, using the love words dataset:

library(ggwordcloud)
#> Loading required package: ggplot2
data("love_words_small")
data("love_words")

set.seed(42)
ggplot(love_words_small, aes(label = word, color = speakers)) +
  geom_text_wordcloud() +
  theme_minimal() +
  scale_color_viridis_c(guide = "colourbar")
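
One thing that may be worth trying: other examples on this page pass show.legend = TRUE to the geom, which suggests that the word cloud layer does not draw a legend unless asked to. The following is a sketch under that assumption, not a confirmed fix:

```r
library(ggwordcloud)
data("love_words_small")

set.seed(42)
p <- ggplot(love_words_small, aes(label = word, color = speakers)) +
  # Assumption: the layer only contributes to the guides when
  # show.legend = TRUE is set explicitly.
  geom_text_wordcloud(show.legend = TRUE) +
  theme_minimal() +
  scale_color_viridis_c(guide = "colourbar")
p
```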

grouping words by a var in one single cloud?

Hi,

Is there a way to position words grouped by a variable, but in one single cloud?
Maybe a cloud with multiple points of gravity?

Mask not performing

Hello,

I love this package and am eternally thankful for your reimplementation in ggplot.
I have managed to reproduce almost every wordcloud you provide in the vignette except the mask.

I have struggled with all aspects and cannot seem to get the mask function to work at all. There is no error at all on the console, but the masking is not applied to the resulting cloud. I have also replaced the default hearth.png mask with my own files in /extdata/ but same (lack of) issue.

Could this be something specific to my R install? I did not see this coming up as an outstanding issue, but I noticed a number of folks asking about this on the wordcloud2 forums.

Thanks again for this most excellent package

Mask plots not reproducible

Hello, I was playing around with your package and ran into a problem when trying to use a mask. At first I thought it was something with my image, but then I tried reproducing the example in the vignette using the languages dataset and the heart shape. I basically copy-pasted the code and it just results in blank space. I tried waiting, but even after about 20 minutes it was still a blank square. Everything else in the vignette works: shapes (star, triangle, etc.), faceting, and coloring. Do you happen to have an idea what could be causing the mask issues? I saw another mask issue that was closed last year but could not find any details or explanations in the thread.
The output of my sessionInfo() is provided at the end.
Thanks for the amazing package!

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] png_0.1-7 stringr_1.4.0 lubridate_1.7.4 rio_0.5.16 dplyr_0.8.3
[6] wordcloud2_0.2.2 hwordcloud_0.1.0 ggwordcloud_0.5.0.9000 ggplot2_3.2.1

loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 remotes_2.1.1 purrr_0.3.3 haven_2.2.0 vctrs_0.2.0 colorspace_1.4-1
[7] testthat_2.3.2 usethis_1.6.1 htmltools_0.4.0 yaml_2.2.0 rlang_0.4.6 pkgbuild_1.0.6
[13] pillar_1.4.2 foreign_0.8-72 glue_1.4.1 withr_2.1.2 readxl_1.3.1 sessioninfo_1.1.1
[19] cellranger_1.1.0 munsell_0.5.0 gtable_0.3.0 zip_2.0.4 devtools_2.3.0 htmlwidgets_1.5.1
[25] memoise_1.1.0 forcats_0.4.0 labeling_0.3 callr_3.4.3 ps_1.3.0 curl_4.2
[31] fansi_0.4.0 Rcpp_1.0.3 scales_1.0.0 backports_1.1.5 desc_1.2.0 pkgload_1.0.2
[37] jsonlite_1.6.1 fs_1.3.1 hms_0.5.2 digest_0.6.25 stringi_1.4.3 openxlsx_4.1.3
[43] processx_3.4.1 grid_3.6.1 rprojroot_1.3-2 cli_2.0.2 tools_3.6.1 magrittr_1.5
[49] lazyeval_0.2.2 tibble_2.1.3 zeallot_0.1.0 crayon_1.3.4 pkgconfig_2.0.3 ellipsis_0.3.0
[55] data.table_1.12.6 prettyunits_1.0.2 assertthat_0.2.1 rstudioapi_0.11 R6_2.4.0 compiler_3.6.1
[61] git2r_0.26.1

Cannot put the biggest word at the center of the word cloud.

I want to plot a word cloud in which the biggest word (word sizes are scaled by the column "freq" which represents odds-ratio) is displayed at the center of the picture. Sometimes geom_text_wordcloud() worked. However, I failed with the data attached to this issue. The result looks like this

How can I place the biggest word, such as "PK--M", at the center?
df.txt

`facet_wrap(scales='free')` is ignored

Factors that are used for faceting aren't always balanced.
A facet for a rare category will contain very small wordcloud words. I think it would be nice if each facet were rescaled on a relative basis.

rcpp boosts efficiency

Big words appearing on the outside?

In your example plots, it appears like the larger words all appear towards the centre of the image and the smaller words are on the periphery. When I try to use this package, I'm finding the opposite: that the larger words are around the periphery and the smaller words are at the centre.

Is this something I can change? I'd like to see the larger words near the centre. I've tried relevelling the label factor a few different ways, but it doesn't seem to make a difference.
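
A possible approach, assuming that words are placed in the order in which they appear in the data (the faceted example in the README sorts with arrange(desc(speakers)) before plotting), is to sort the rows by decreasing size first. The data frame words below is made up for illustration:

```r
# Made-up data for illustration only
words <- data.frame(
  word = c("small", "huge", "medium", "tiny"),
  freq = c(2, 40, 10, 1)
)

# Put the largest words first so that they are handed to the geom first
words_sorted <- words[order(words$freq, decreasing = TRUE), ]

# Draw the cloud only if ggwordcloud is installed
if (requireNamespace("ggwordcloud", quietly = TRUE)) {
  library(ggwordcloud)
  set.seed(42)
  p <- ggplot(words_sorted, aes(label = word, size = freq)) +
    geom_text_wordcloud() +
    scale_size_area(max_size = 20) +
    theme_minimal()
  print(p)
}
```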

ggwordcloud fontfamily and legends

Hello,

I had two questions about customizing my word cloud. The first was how to change the fonts used by ggwordcloud. The second was whether, for data sets with groups, it would be possible to suppress the size legend. I wanted something like:

I wrote to @lepennec, who asked me to post my problem here. He gave me a piece of advice on changing the font that made me realize where to use the family parameter. Later, I found the solution to the second problem. So I decided to post my code here to help others and to check with the developers whether it is an appropriate solution.

The example uses the same love words data set, grouping the languages in families. Here is the code:

library(ggwordcloud)
library(dplyr)
library('ISOcodes') # To find the language families we use the ISOcodes package


# Changing to the same ID of 'love words'
## Maybe it's not correct, but it's just an example
ISO_3 <- ISO_639_3 %>% select(Id, Family)
ISO_2 <- ISO_639_2 %>% select(Alpha_3_B, Alpha_2)

# Love words - merge to ISO
data("love_words_small")
dataWord <- merge(love_words_small, ISO_2, by.x="lang", by.y = "Alpha_2")
dataWord <- merge(dataWord, ISO_3, by.x="Alpha_3_B", by.y = "Id")

ggplot(dataWord, aes(label = word, x=Family, size = speakers, colour=Family)) +
  geom_text_wordcloud_area(show.legend = TRUE, family="Purisa") +
  scale_size_area(max_size = 24) +
  scale_x_discrete(breaks = NULL) +
  theme_minimal()+  guides(size = FALSE)

My solution was to provide both 'show.legend = TRUE' and 'family="Purisa"' in geom_text_wordcloud_area. To suppress the size legend, I used 'guides(size = FALSE)' from ggplot2.

Sorry for the naivety, but I was wrongly trying to change the parameters directly in the ggplot functions.

Thank you.

wordcloud needs a legend

Hi, I used geom_text_wordcloud_area() and it works like a charm! One thing I think it needs to have is a legend. This is especially the case when I add color = something so that I am able to distinguish which text belongs to which category. Maybe this is more like user suggestion than an issue I can provide after using it. Thanks!

rationalize for provision of new word data sets

Word clouds are a nice piece of visualization, especially in slides. Slide decks usually end with a slide like "Thank you" or "Questions?" that could be rendered with a word cloud.
I propose to rationalize the existing infrastructure of the 'Love' word data set to make it easy to add new ones.
This is possible if there are 2 tables such as:

language stats table (ISO 639-3 code, L1, L2), and
the word dictionary (ISO 639-3 code, word).

How can I tell the user about a fitting error in a Shiny app?

Hello! Thank you for your great package.
I am having one problem.
I am currently working on a shiny application for wordcloud that allows the user to adjust the size of the plot as shown below.

library(shiny)
library(ggwordcloud)

runApp(shinyApp(
  ui = fluidPage(
    sliderInput("size", "size", min = 100, max = 1000, step = 100, value = 300),
    plotOutput("cloud")
  ),
  server = function(input, output, session) {
    # The plot width and height follow the slider value
    output$cloud <- renderPlot(width = reactive(input$size), height = reactive(input$size), {
      data("love_words")
      ggplot(love_words, aes(label = word)) +
        geom_text_wordcloud(rm_outside = TRUE)
    })
  }
))

When the plot size is small, the RStudio console displays the message "Some words could not fit on page. They have been removed."
Is there any way to make this visible to Shiny app users as well?
I have tried combining renderUI and sink() myself, among other things, but it didn't work.
In the wordcloud package, I can set rm_outside to FALSE to center the words that could not be fitted,
but I would like to explicitly notify the user about the fitting failure.

I apologize for the lack of clarity as I am not a native English speaker, but I would appreciate your help.
Thank you.
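
One way this might be approached in base R is to collect messages and warnings with withCallingHandlers() while the plotting code runs. The helper collect_conditions below is hypothetical (it is not part of ggwordcloud or Shiny), and the sketch assumes the fitting note is signalled as an ordinary R message or warning:

```r
# Hypothetical helper: evaluate an expression, return its value together
# with the text of every message and warning signalled along the way.
collect_conditions <- function(expr) {
  notes <- character()
  value <- withCallingHandlers(
    expr,
    message = function(m) {
      notes <<- c(notes, trimws(conditionMessage(m)))
      invokeRestart("muffleMessage")  # keep it out of the console
    },
    warning = function(w) {
      notes <<- c(notes, trimws(conditionMessage(w)))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, notes = notes)
}

# Stand-in for the plotting code: emits a message, then returns a value
res <- collect_conditions({
  message("Some words could not fit on page. They have been removed.")
  1 + 1
})
print(res$notes)
```

In a Shiny app, one option might be to call print() on the plot inside collect_conditions() within renderPlot() and pass the collected notes to showNotification(). This assumes the condition is raised while the plot is being drawn.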

Put this message on Shiny app for users.

Error when using ggwordcloud inside of Shiny application

Here is a minimal shiny app, using one of the test examples in a renderPlot expression.

library(shiny)
library(ggwordcloud)

data("love_words_small")

set.seed(42)

ui <- fluidPage(

    titlePanel("GGWordCloud Test"),

    sidebarLayout(
        sidebarPanel(
        ),

        mainPanel(
           plotOutput("wc")
        )
    )
)

server <- function(input, output) {

    output$wc <- renderPlot({
        ggplot(love_words_small, aes(label = word, size = speakers)) +
            geom_text_wordcloud() +
            scale_size_area(max_size = 24) +
            theme_minimal()
    })
}

shinyApp(ui = ui, server = server)

This produces:

Warning: Error in [: subscript out of bounds

Any ideas? This is using R 3.6, and the latest CRAN versions of both Shiny (1.3.2) and ggwordcloud (0.4.0)

Too many warnings clutter example in Reference

The examples in geom_text_wordcloud generate a huge number of warnings, as seen online:
https://lepennec.github.io/ggwordcloud/reference/geom_text_wordcloud.html

Chinese Character cannot be shown

The Chinese characters cannot be shown in the example. I have tried the following:

ggplot(love_words_small, aes(label = word)) +
  geom_text_wordcloud() +
  theme_minimal(base_family = "STKaiti")

It still does not work.
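
Another post on this page reports that the font is changed through the family parameter of the geom rather than through the theme. Here is a sketch along those lines, assuming a font named "STKaiti" is installed and known to the graphics device in use:

```r
library(ggwordcloud)
data("love_words_small")

set.seed(42)
p <- ggplot(love_words_small, aes(label = word)) +
  # family is passed to the geom itself; base_family in the theme only
  # affects theme elements such as titles and axis text.
  geom_text_wordcloud(family = "STKaiti") +
  theme_minimal(base_family = "STKaiti")

# Call print(p) on a device that can render the chosen font.
```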