wordcloud.jl's Introduction

A word cloud (also called a tag cloud or wordle) is a novel visual representation of text data. The importance of each word is shown by its font size, position, or color. WordCloud.jl is a tool for generating word clouds, and it offers several advantages:

  • Flexible - You have control over every aspect of generating a word cloud. You can customize the shape, color, angle, position, distribution, density, and spacing to align with your preferences and artistic style.
  • Faithful - This visualization solution guarantees precise results. Each word appears only once, and its font size is determined solely by the provided weight. Words are never repeated or shrunk artificially to fill empty spaces.
  • Efficient - It utilizes intelligent strategies and efficient nesting algorithms, implemented entirely in Julia (see Stuffing.jl). As a result, it can easily generate high-resolution word clouds.

🌐 Try the online generator 🌐

✨ Go to the gallery ✨


Installation

import Pkg; Pkg.add("WordCloud")

Basic Usage

using WordCloud
using Random
words = [randstring(rand(1:8)) for i in 1:300]
weights = randexp(length(words))
wc = wordcloud(words, weights)
generate!(wc)
paint(wc, "random.svg")

Alternatively, a word cloud can be created from other kinds of input:

wc = wordcloud("It's easy to generate word clouds") |> generate! # from a string
wc = wordcloud(open(pkgdir(WordCloud)*"/res/alice.txt")) |> generate! # from a file
wc = wordcloud(["中文", "需要", "提前", "分词"], fonts="") |> generate! # from a list
wc = wordcloud(["the"=>1.0, "to"=>0.51, "and"=>0.50,
                  "of"=>0.47, "a"=>0.44, "in"=>0.33]) |> generate! # from pairs or a dict

Advanced Usage

using WordCloud
textfile = pkgdir(WordCloud)*"/res/alice.txt"
maskfile = pkgdir(WordCloud)*"/res/alice_mask.png"
wc = wordcloud(
    open(textfile),
    stopwords_extra = ["said"],
    maxnum = 500, 
    mask = maskfile,
    maskcolor = "#faeef8",
    outline = 4,
    linecolor = "purple",
    colors = :Set1_5,
    angles = (0, 90),
    fonts = "Tahoma",
    density = 0.55,
    spacing = 3,) |> generate!
paint(wc, "alice.png", ratio=0.5)

Try runexample(:alice) or showexample(:alice).
[image: alice]

More Examples

Gathering style

[image: gathering]
Try runexample(:gathering) or showexample(:gathering).

Recolor

[image: recolor]
Try runexample(:recolor) or showexample(:recolor).

Semantic

[image: semantic]
Try runexample(:semantic) or showexample(:semantic).
The variable WordCloud.examples holds all available examples.

About Implementation

WordCloud.jl differs from other tools in its approach, which is based on local gradient optimization of images. Unlike conventional greedy algorithms, it uses a non-greedy algorithm that allows words to be repositioned after their initial placement. Words can therefore be assigned to any desired initial position, regardless of overlaps, and they do not need to be rescaled during the adjustment phase. This design makes the generator flexible and leaves a lot of room for customization. For a more detailed description of the algorithm, see Stuffing.jl - Algorithm Description.

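  • Weight calculation and word position initialization
  • Collision detection based on quadtrees (hierarchical bounding boxes)
  • Shifting words according to the local grayscale gradient (training iterations)
  • Momentum to accelerate training
  • Generational detection to improve performance (for the pairwise trainer)
  • Batch collision detection with region quadtrees
  • LRU to improve performance (for the element-wise trainer)
  • Strategies for controlling font size and fill density
  • A repositioning strategy to escape local optima
  • A rescaling strategy to reduce training difficulty
  • Training failure detection and early stopping
  • Theme color schemes, etc.
  • Parallel computing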
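
The momentum item in the list above can be illustrated with a generic momentum update. This is a minimal sketch and not Stuffing.jl's code: the function name momentum_step, the parameters momentum and rate, and the constant gradient are all assumptions made for illustration.

```julia
# Generic momentum update (hypothetical helper, not taken from Stuffing.jl).
# The velocity accumulates a decayed sum of past gradients, so a word that is
# pushed in the same direction repeatedly moves faster on each iteration.
function momentum_step(position, velocity, gradient; momentum=0.9, rate=1.0)
    velocity = momentum .* velocity .+ rate .* gradient
    return position .+ velocity, velocity
end

# Apply three steps with a constant gradient pointing along the x axis.
function run_steps(n)
    pos, vel = (0.0, 0.0), (0.0, 0.0)
    for _ in 1:n
        pos, vel = momentum_step(pos, vel, (1.0, 0.0))
    end
    return pos, vel
end

pos, vel = run_steps(3)
```

With a constant gradient the step size grows from one iteration to the next, which is the acceleration effect that momentum is meant to provide.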

Other word cloud generators

wordcloud.jl's People

Contributors

guo-yong-zhi, jakewilliami

wordcloud.jl's Issues

how to NOT mutate word array after wordcloud(word, weights)?

Is there a flag in wordcloud() to NOT rearrange the word array after its usage?

I have words = ["John", "Peter", "James"] with weights = [1,6,3]

After

wordcloud(words, weights)

words = ["Peter", "James", "John"]

Is there a way to leave it as it was before running wordcloud?
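
One possible workaround, independent of any flag, is to pass copies of the arrays. The sketch below uses a hypothetical stand-in function, reorder_by_weight!, that mimics the in-place reordering described above; it is not part of WordCloud.jl and exists only to keep the example self-contained.

```julia
# Hypothetical stand-in that sorts both arrays in place by descending weight,
# mimicking the reordering reported in this issue. NOT a WordCloud.jl function.
function reorder_by_weight!(words, weights)
    p = sortperm(weights, rev=true)
    permute!(words, p)
    permute!(weights, p)
    return words, weights
end

words = ["John", "Peter", "James"]
weights = [1, 6, 3]

# Passing copies leaves the caller's arrays in their original order.
sorted_words, sorted_weights = reorder_by_weight!(copy(words), copy(weights))

# The same idea applied to the real constructor (requires WordCloud.jl):
# wc = wordcloud(copy(words), copy(weights))
```

The copies cost one extra allocation per array, which is negligible for typical word lists.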

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

[Question] How to suppress logging/progress printing?

Thank you for the amazing package.

I was wondering if there is a way to suppress all the printed messages when producing a word cloud?

I have disabled logging, but there are still a lot of messages printed to stdout (e.g., here), which show up anyway.

My ideal outcome would be that I get only the word cloud output and none of the following messages:

total words: 63. Unique words: 43. After filtration: 42.
The top 42 words are kept.
angles = -60:60
mask size: 300×400, content area: 346² (53²/word)
set fontsize ∈ [8.0, 300]
set density = 0.5
The word "insightful"(56.063448564744085) is too big. Set maxfontsize = 50.883237416299686.
The word "collaborative"(45.12210419223353) is too big. Set maxfontsize = 39.79636673099108.
⋯scale=59.76143046671968, density=0.5652075000000001 ↑
⋯scale=56.208504238858396, density=0.5291157500000001 ↑
⋯scale=53.16956950394639, density=0.4920475 ↓
fontsize ∈ [11.078697398031899, 39.79636673099108]
▸1. scale = 53.83598778968367
31 epochs
Total words: 14. Unique words: 10. After filtration: 9.
The top 9 words are kept.
angles = 0
mask size: 300×400, content area: 346² (115²/word)
set fontsize ∈ [8.0, 300]
set density = 0.5
⋯scale=129.09944487358058, density=0.640553 ↑
⋯scale=114.05959947564652, density=0.5048582500000001 ↑
fontsize ∈ [29.435917304955776, 115.67395628431842]
center the word "hazy"
gathering style: rt = 1, ellipse
▸1. scale = 113.48417855265
25 epochs
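
If the messages are written with plain print calls to stdout, as the linked code suggests, one generic option is to redirect stdout for the duration of the call. This is a sketch under that assumption. The function noisy_compute is a hypothetical stand-in for the wordcloud and generate! calls, and redirecting to devnull requires Julia 1.6 or later.

```julia
# Hypothetical noisy function standing in for wordcloud(...) |> generate!.
function noisy_compute()
    println("set density = 0.5")
    println("31 epochs")
    return 42
end

# Everything printed to stdout inside the block is discarded;
# the value returned by the block is still passed through.
result = redirect_stdout(devnull) do
    noisy_compute()
end

# The same pattern with the real package (requires WordCloud.jl):
# wc = redirect_stdout(devnull) do
#     wordcloud(words, weights) |> generate!
# end
```

Messages sent to stderr or through the logging system are not affected by this redirection and would need to be handled separately.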

s-ending words are not portable to other languages

The hard-coded s_ending_words list is well suited to English. However, when I run WordCloud.jl on a French text, I encounter several words ending in "s" that get stripped by lemmatize (such as dans->dan, nous->nou, etc.), which makes the word cloud less meaningful.

A workaround would be to allow other s_ending_words to be specified as an argument to the lemmatize function (which may be used in the process option of processwords).

What else could I do?
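
The proposed workaround could look roughly like the sketch below. It is not WordCloud.jl's lemmatize: the function strip_plural_s, its keep keyword, and the French word list are hypothetical and only illustrate how a caller-supplied exception set would behave.

```julia
# Hypothetical helper, NOT WordCloud.jl's lemmatize: strips a trailing "s"
# unless the word is listed in the caller-supplied exception set.
function strip_plural_s(word::AbstractString; keep=Set{String}())
    if word in keep || !endswith(word, "s") || length(word) <= 1
        return String(word)
    end
    return String(chop(word))  # drop the last character
end

# Illustrative (incomplete) list of French words whose final "s" is not a plural.
french_keep = Set(["dans", "nous", "vous", "plus"])

kept = strip_plural_s("dans"; keep=french_keep)
stripped = strip_plural_s("clouds"; keep=french_keep)
```

Another option that needs no new argument is to do the lemmatization outside the package and pass the finished words and weights to wordcloud directly.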

Display seems to be full of bars of numbers

I tried the example:

using WordCloud
using Random
words = [randstring(rand(1:8)) for i in 1:300]
weights = randexp(length(words))
wc = wordcloud(words, weights)
generate!(wc)
paint(wc, "D:/random.svg")

and this is what it looks like:
random

environment:

julia> versioninfo()
Julia Version 1.8.4
Commit 00177ebc4f (2022-12-23 21:32 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores

(@v1.8) pkg> st WordCloud
Status `C:\Users\anonymous\.julia\environments\v1.8\Project.toml`
  [6385f0a0] WordCloud v0.10.7

How to set the word colors to be based on weight?

Thanks for this package. I am testing it at the moment and was not able to find out how to set word colors such that the color depends on the weight (based on occurrence). This could perhaps create a nice gradient of color rather than the random look I seem to get.

I've reproduced the code I'm using at the moment here:

using WordCloud

stopwords = WordCloud.stopwords_en
textfile = "/home/kevin/Desktop/temp.txt"
mydata = lowercase(read(textfile, String)) 

wc = wordcloud(
        processtext(mydata, stopwords=WordCloud.stopwords_en), 
        angles=(0, 90), 
        density=0.7,     
        mask=ellipse,   
        outline = 0,
        colors = :seaborn_dark,#colors = :Set1_5,
        fonts = "Tahoma",
        backgroundcolor="white",
        state=initwords!)
placewords!(wc, style=:gathering, level=6)#, centeredword="modelling")
pin(wc, "modelling") do # keep "modelling" in the center
    generate!(wc, reposition=0.5) # don't teleport largest 30% words
    #generate!(wc)
end

paint(wc, "/home/kevin/Desktop/plot.svg",background=true)

Currently I get something like:
image

But I would prefer if it would look more like this:
image
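
One way to approach a weight-based gradient is to compute one color per word from its weight. The sketch below only builds the list of hex color strings. The helper weight_to_hex and the two endpoint colors are hypothetical, and it is an assumption (to be checked against the WordCloud.jl documentation) that the colors keyword accepts such a vector and applies it in word order.

```julia
# Hypothetical helper: linearly interpolate between two RGB colors
# (components in 0..1) and return a hex string such as "#ccccff".
function weight_to_hex(w, wmin, wmax; low=(0.8, 0.8, 1.0), high=(0.0, 0.0, 0.4))
    t = wmax == wmin ? 1.0 : (w - wmin) / (wmax - wmin)
    rgb = ntuple(i -> low[i] + t * (high[i] - low[i]), 3)
    return "#" * join(string(round(Int, 255 * c), base=16, pad=2) for c in rgb)
end

weights = [1.0, 0.5, 0.0]

# Heavier words get darker colors, lighter words get paler ones.
palette = [weight_to_hex(w, extrema(weights)...) for w in weights]

# Assumed usage (requires WordCloud.jl; unverified that colors accepts this):
# wc = wordcloud(words, weights, colors=palette)
```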

Luxor incompatibility?

The package looks great, but when running the Basic usage example, I'm getting Luxor.jl related errors:

julia> using WordCloud

julia> using Random

julia> words = [randstring(rand(1:8)) for i in 1:300]
300-element Vector{String}:
 "p"
 "hllURK"
 "na8bWIoc"
 "1P"
 "KUqjUB5Z"
 "zPjL"
 "TWZWxzru"
 "mAKVdNJ"
 "Zi"
 "CAIJfm"
 
 "U470Bg"
 "EOEuSvC"
 "xn6DM"
 "GHUntrN"
 "Smlb"
 "L9JCRE4a"
 "DhEin0QK"
 "FyVpG"
 "c8Irnq"

julia> weights = randexp(length(words))
300-element Vector{Float64}:
 0.5256522217671034
 0.9403936612800416
 0.017777891248460475
 1.1106986836398218
 0.15621395690673698
 0.5832926340742414
 1.4217763203578326
 0.06599237517817222
 0.11019957939738709
 2.235672409116253
 
 0.613767753907875
 0.5273219607104737
 0.7117686410768388
 0.5831573224635551
 0.3647362587696542
 0.3574125301761831
 0.598172058781939
 0.3009548664671495
 1.6857832106298407

julia> wc = wordcloud(words, weights)
color scheme: :Dark2_3, random size: 3, shuffled
angles = [0, 45, -45]
shape(squircle, 1078, 765, rt=2.0, color=1.0, padding=64)
ERROR: MethodError: no method matching squircle(::Luxor.Point, ::Float64, ::Float64, ::Symbol; action::Symbol, rt::Float64)

Closest candidates are:
  squircle(::Luxor.Point, ::Any, ::Any, ::Any; rt, vertices, stepby, reversepath) got unsupported keyword argument "action"
   @ Luxor <hidden>/.julia/packages/Luxor/e7R1R/src/curves.jl:120
  squircle(::Luxor.Point, ::Any, ::Any) got unsupported keyword arguments "action", "rt"
   @ Luxor <hidden>/.julia/packages/Luxor/e7R1R/src/curves.jl:120

[ ... ]

My versions:

julia> versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 256 × AMD EPYC 7742 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
  Threads: 96 on 256 virtual cores
Environment:
  JULIA_WORKER_TIMEOUT = 3600
  JULIA_LOAD_PATH = <hidden>/.julia:
  JULIA_DEPOT_PATH = <hidden>/.julia
  LD_GOLD = <hidden>/miniconda3/bin/x86_64-conda-linux-gnu-ld.gold
  JULIA_NUM_THREADS = 96

(@v1.9) pkg> st WordCloud
Status `<hidden>/.julia/environments/v1.9/Project.toml`
  [6385f0a0] WordCloud v0.11.0

Any idea what could be the issue?

Issue with short wordlists?

Trying to run

words = ["multiple", "dispatch", "generic", "programming", "parallel", "computing", "high", "performance", "linear", "algebra"]
weights = rand(length(words)) .^ 2 .* 100 .+ 30
wc = wordcloud(words, weights)
generate!(wc)

leads to

(scheme, length(colors)) = (:Pastel2_5, 5)
angles = (0, 90, 45)
maskcolor = 0.0
set minfontsize to 8.0
groundoccupied = 320185
length(words) = 10
(sc, tg, tg / ground_size) = (223.69490766953797, 133760.0, 0.4177584833767978)
(sc, tg, tg / ground_size) = (224.71148861392473, 206258.0, 0.6441838312225745)
set fillingrate to 0.65, with scale=224.71148861392473
#1. scale = 224.71148861392473
(nepoch, patient) = (1000, 10)
MethodError: no method matching getindex(::Set{Int64}, ::Int64)

Stacktrace:
 [1] (::Combinatorics.var"#9#11"{Set{Int64}})(::Int64) at ./none:0
 [2] iterate at ./generator.jl:47 [inlined]
 [3] collect at ./array.jl:686 [inlined]
 [4] reorder at /Users/crstnbr/.julia/packages/Combinatorics/Udg6X/src/combinations.jl:48 [inlined]
 [5] iterate at ./generator.jl:47 [inlined]
 [6] collect(::Base.Generator{Combinatorics.Combinations,Combinatorics.var"#reorder#10"{Set{Int64}}}) at ./array.jl:686
 [7] |> at ./operators.jl:834 [inlined]
 [8] listcollision_native(::Array{Any,1}, ::WordCloud.QTree.ShiftedQtree{Array{WordCloud.QTree.PaddedMat{Array{UInt8,2}},1}}, ::Set{Int64}; collist::Array{Pair{Tuple{Int64,Int64},Tuple{Int64,Int64,Int64}},1}, queue::Array{Tuple{Int64,Int64,Int64},1}, at::Tuple{Int64,Int64,Int64}) at /Users/crstnbr/.julia/packages/WordCloud/JZsfo/src/qtreetools.jl:124
 [9] #listcollision#27 at /Users/crstnbr/.julia/packages/WordCloud/JZsfo/src/qtreetools.jl:367 [inlined]
 [10] trainepoch_EM2!(::Array{Any,1}, ::WordCloud.QTree.ShiftedQtree{Array{WordCloud.QTree.PaddedMat{Array{UInt8,2}},1}}; memory::WordCloud.LRU{Int64,WordCloud.IntMap{Int64}}, optimiser::Function, queue::Array{Tuple{Int64,Int64,Int64},1}, collpool::Array{Pair{Tuple{Int64,Int64},Tuple{Int64,Int64,Int64}},1}) at /Users/crstnbr/.julia/packages/WordCloud/JZsfo/src/train.jl:202
 [11] train!(::Array{Any,1}, ::WordCloud.QTree.ShiftedQtree{Array{WordCloud.QTree.PaddedMat{Array{UInt8,2}},1}}, ::Int64; trainer::typeof(WordCloud.trainepoch_EM2!), patient::Int64, callbackstep::Int64, callbackfun::WordCloud.var"#46#48", kargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /Users/crstnbr/.julia/packages/WordCloud/JZsfo/src/train.jl:376
 [12] train!(::Array{Any,1}, ::WordCloud.QTree.ShiftedQtree{Array{WordCloud.QTree.PaddedMat{Array{UInt8,2}},1}}, ::Int64) at /Users/crstnbr/.julia/packages/WordCloud/JZsfo/src/train.jl:359 (repeats 2 times)
 [13] generate!(::wordcloud; retry::Int64, krags::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /Users/crstnbr/.julia/packages/WordCloud/JZsfo/src/interface.jl:323
 [14] generate!(::wordcloud) at /Users/crstnbr/.julia/packages/WordCloud/JZsfo/src/interface.jl:313
 [15] top-level scope at In[29]:5
 [16] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091

When I make the list longer by, for example, repeat(words, 3) it works fine.
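
One reading of the stack trace, offered as a guess rather than a confirmed diagnosis: frame [4] shows Combinatorics indexing into a Set{Int64}, and Julia sets do not support indexing. The sketch below reproduces that class of error with Base only and shows the usual remedy of collecting the set into a vector first. The variable names are hypothetical.

```julia
labels = Set([3, 1, 2])

# Indexing a Set throws a MethodError, because sets are unordered.
index_failed = try
    labels[1]
    false
catch err
    err isa MethodError
end

# Collecting (and sorting, for a stable order) gives an indexable vector.
indexable = sort!(collect(labels))

# All unordered pairs, built by index as a combinations routine would.
index_pairs = [(indexable[i], indexable[j])
               for i in 1:length(indexable) for j in i+1:length(indexable)]
```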

Word == Sentences ??

Hi
I would like to generate word cloud but with ... sentences/expressions...
The given text file may be of the following form

"This is sentence A"
"Another expression"
"What a good tool"
"Julia is powerful"

It seems that this is not possible for now with WordCloud (because of the splitword regexp?).
Thanks for WordCloud!
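
Since the README shows that wordcloud accepts a list of words, pairs, or a dict, one possible approach is to bypass the text splitting and count whole lines. This is a minimal sketch: the sample text and variable names are made up, and only Base functions are used for the counting.

```julia
# Sample input with one quoted expression per line (one line is repeated).
text = """
"This is sentence A"
"Another expression"
"What a good tool"
"Julia is powerful"
"Another expression"
"""

# Count each line as a single "word", stripping spaces and quotes.
counts = Dict{String,Int}()
for line in eachline(IOBuffer(text))
    phrase = String(strip(line, [' ', '"']))
    isempty(phrase) && continue
    counts[phrase] = get(counts, phrase, 0) + 1
end

# The dict can then be handed to the constructor (requires WordCloud.jl):
# wc = wordcloud(counts) |> generate!
```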

Make some flag for case-insensitive processing

I think case-insensitive processing might be nice, if that functionality doesn't already exist.

P.S., I think docs would be really nice for this package. Happy to help you do that when I have a little more time :-)

Thanks for this package!
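
Until such a flag exists, case can be normalized before the words reach the package, as another issue on this page does with lowercase(read(textfile, String)). A minimal sketch with made-up input that merges counts case-insensitively:

```julia
words = ["Julia", "julia", "JULIA", "Cloud"]

# Lowercase each word before counting so that case variants are merged.
counts = Dict{String,Int}()
for w in words
    key = lowercase(w)
    counts[key] = get(counts, key, 0) + 1
end

# The merged counts can be passed on as usual (requires WordCloud.jl):
# wc = wordcloud(counts) |> generate!
```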
