pytagcloud's Introduction

PyTagCloud

PyTagCloud lets you create simple tag clouds inspired by http://www.wordle.net/

The following output formats are currently supported:

  • PNG images
  • HTML/CSS code

If you have ideas for other formats please let us know.

Installation

You can install PyTagCloud either via the Python Package Index (PyPI) or from source.

To install using pip:

$ pip install -U pytagcloud

To install using easy_install:

$ easy_install -U pytagcloud

Downloading and installing from source

Download the latest version of PyTagCloud from http://pypi.python.org/pypi/pytagcloud/

You can install it by doing the following:

$ tar xfz pytagcloud-*.tar.gz
$ cd pytagcloud-*/
$ python setup.py build
$ python setup.py install # as root

Requirements

  1. Install pygame >= 1.9.1:

    $ apt-get install python-pygame
    
  2. Install simplejson:

    $ pip install simplejson
    

Quick start

You probably want to see some code by now, so here's an example:

from pytagcloud import create_tag_image, make_tags
from pytagcloud.lang.counter import get_tag_counts

YOUR_TEXT = "A tag cloud is a visual representation for text data, typically \
used to depict keyword metadata on websites, or to visualize free form text."

tags = make_tags(get_tag_counts(YOUR_TEXT), maxsize=80)

create_tag_image(tags, 'cloud_large.png', size=(900, 600), fontname='Lobster')

import webbrowser
webbrowser.open('cloud_large.png') # see results

More examples can be found in test.py.

Example

Demo

https://github.com/atizo/PyTagCloud/raw/master/docs/example.png

Contributing

Development of pytagcloud happens on GitHub: https://github.com/atizo/PyTagCloud

You are highly encouraged to participate in the development of pytagcloud. If you don't like GitHub (for some reason), you're welcome to send regular patches.

pytagcloud's People

Contributors

jsma, justis, kernc, konstantint, paulklinger, ubershmekel, yossi

pytagcloud's Issues

Bug in get_tag_counts

I tried running the example code on Python 3.

I get an iteritems() error in get_tag_counts.
Replacing iteritems() with items() does the trick.

However, looking at the counted variable in that function,
it is an empty dictionary, whereas the words variable is
non-empty.

It seems that s.guess(words) loads "german", not "english",
which might have caused the counted variable to be empty
because the wrong stop words were used.

Here is a screenshot of my fix:
capture

Great library by the way, kudos.
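
One possible contributor to the wrong language guess, offered as an assumption rather than a confirmed diagnosis: on Python 3, map() returns a one-shot iterator, so any code that loops over the words more than once sees nothing after the first pass. The sketch below uses only the standard library; count_hits and the two tiny stop-word sets are hypothetical and not part of pytagcloud.

```python
import re

text = "A tag cloud is a visual representation for text data"

def count_hits(words, stop_words):
    """Count how many of the given words appear in a stop-word set."""
    return sum(1 for word in words if word in stop_words)

# Hypothetical, tiny stop-word sets used only for this demonstration.
english = {"a", "is", "for"}
german = {"der", "die", "das"}

# Python 3: map() yields a one-shot iterator.
lazy_words = map(str.lower, re.findall(r"\w+", text))
first_pass = count_hits(lazy_words, german)    # consumes the iterator
second_pass = count_hits(lazy_words, english)  # nothing left to iterate over

# Materialising the words as a list makes them reusable.
words = [w.lower() for w in re.findall(r"\w+", text)]
list_pass = count_hits(words, english)

print(first_pass, second_pass, list_pass)
```

If the guesser checks one language after another against the same iterator, only the first language checked would ever see any words, which would be consistent with the symptom described here.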

html_data example

Is there any demo/example of using create_html_data to display a word cloud in HTML?
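
A minimal sketch of one way to turn such data into markup. It assumes, without having checked the installed version, that create_html_data returns a dict with a 'size' pair, a 'css' mapping of class name to (normal, hover) colours, and a 'links' list of dicts with the keys 'tag', 'cls', 'top', 'left', 'size' and 'lh'. Inspect the real return value before relying on this; render_cloud_html and the sample dict are hypothetical.

```python
from html import escape

def render_cloud_html(data):
    """Render a dict shaped like the assumed create_html_data() result."""
    width, height = data["size"]
    # One CSS rule pair per colour class: normal and hover colours.
    css = "".join(
        "a.%s{color:%s;}a.%s:hover{color:%s;}" % (cls, normal, cls, hover)
        for cls, (normal, hover) in data["css"].items()
    )
    # One absolutely positioned link per tag.
    links = "".join(
        '<a class="tag %s" href="#%s" style="position:absolute;top:%dpx;'
        'left:%dpx;font-size:%dpx;line-height:%dpx;">%s</a>'
        % (link["cls"], escape(link["tag"], quote=True), link["top"],
           link["left"], link["size"], link["lh"], escape(link["tag"]))
        for link in data["links"]
    )
    return (
        "<style>a.tag{text-decoration:none;}%s</style>"
        '<div style="position:relative;width:%dpx;height:%dpx;">%s</div>'
        % (css, width, height, links)
    )

# Hand-written sample in the assumed shape; with pytagcloud installed this
# dict would come from create_html_data(tags, ...) instead.
sample = {
    "size": (200, 100),
    "css": {"c0": ("#336699", "#6699cc")},
    "links": [
        {"tag": "cloud", "cls": "c0", "top": 10, "left": 20, "size": 32, "lh": 36},
    ],
}
html_page = render_cloud_html(sample)
print(html_page)
```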

Tags overlap

Tags are not generated correctly on my server. They overlap and have the wrong position and size.
cloud_large

Here is my simple code

from pytagcloud import create_tag_image, make_tags, LAYOUT_HORIZONTAL, LAYOUTS, LAYOUT_MIX, LAYOUT_VERTICAL, LAYOUT_MOST_HORIZONTAL, LAYOUT_MOST_VERTICAL
from pytagcloud.lang.counter import get_tag_counts

text = "abc sda 123 343 hello world asia innovation djaksdja daksjdaslkd alkdha djhasjkdahskjdahsd kjasdjkasd jkah dakj djfsdhf kjsfhd sdjfhs kjfshd k"
tags = make_tags(get_tag_counts(text), maxsize=80, minsize=20)
create_tag_image(tags, '/home/hoanghua/cloud_large.png', size=(900, 900), fontname='Lobster')

Please help me to find out the issue. Thanks in advance

Evenly distributed tagset (all counts the same) causes div/0 error in defscale

The default scaling function has this expression:
1.0 / (maxcount - mincount)

When maxcount == mincount, things blow up. As a workaround I've used a hacky alternative that avoids this well enough for my needs...

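from pytagcloud import defscale

def tagscale(count, mincount, maxcount, minsize, maxsize):
    # Widen the range by one when all counts are equal to avoid dividing by zero.
    if maxcount == mincount:
        return defscale(count, mincount, maxcount + 1, minsize, maxsize)
    return defscale(count, mincount, maxcount, minsize, maxsize)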

There's probably a better way though
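
An alternative sketch that does not call defscale at all: a plain linear scale that falls back to the midpoint of the size range when every count is the same. safe_scale is a hypothetical name and the linear mapping is an assumption, not the library's own scaling curve, so sizes will differ somewhat from the default.

```python
def safe_scale(count, mincount, maxcount, minsize, maxsize):
    """Linearly map a count onto a font size, tolerating a zero-width range."""
    if maxcount == mincount:
        # Every tag has the same count: use the midpoint of the size range.
        return int(minsize + (maxsize - minsize) / 2.0)
    fraction = (count - mincount) / float(maxcount - mincount)
    return int(minsize + (maxsize - minsize) * fraction)

equal_counts = safe_scale(5, 5, 5, 20, 80)   # all counts equal
smallest = safe_scale(1, 1, 10, 20, 80)      # smallest count in the set
largest = safe_scale(10, 1, 10, 20, 80)      # largest count in the set
print(equal_counts, smallest, largest)
```

It takes the same five arguments as the tagscale workaround above, so it could be used wherever that function is.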

Example not working on Python 3

Hello, I tried the example on Python 3.5 and realized it was not compatible. To make it work on this version, I modified the pytagcloud/lang/counter.py file to look like this:

# -*- coding: utf-8 -*-
import re
from pytagcloud.lang.stopwords import StopWords
from operator import itemgetter

def get_tag_counts(text):
    """
    Search tags in a given text. The language detection is based on stop lists.
    This implementation is inspired by https://github.com/jdf/cue.language. Thanks Jonathan Feinberg.
    """

    # On Python 3, map() returns a one-shot iterator, so the words are
    # collected into a list that both the language guesser and the
    # counting loop below can traverse.
    words = [word.lower() for word in re.findall(r'\w+', text, re.UNICODE)]

    s = StopWords()
    s.load_language(s.guess(words))

    counted = {}

    for word in words:
        if not s.is_stop_word(word) and len(word) > 1:
            if word in counted: #no has_key() method on python 3, use in instead
                counted[word] += 1
            else:
                counted[word] = 1

    return sorted(counted.items(), key=itemgetter(1), reverse=True)

I leave this here because I did not find a branch for python 3 and maybe someone could find it useful

words with apostrophes like "don't" show up as "don" and "t"

Most of them should be getting snagged in stopwords.

This is a simple fix to the regex in counter.py

words = map(lambda x:x.lower(), re.findall(r'\w+', text, re.UNICODE))

should be:

words = map(lambda x:x.lower(), re.findall(r"[\w']+", text, re.UNICODE))

Not a perfect fix, but it pushes the corner cases to places where they are less likely to be noticeable to an end user.
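
A quick standard-library check of what the two patterns do, written for Python 3, where str patterns are Unicode-aware by default and re.UNICODE can be omitted. The sample sentence is made up for illustration.

```python
import re

text = "Don't worry, it's only the users' data"

# Original pattern: apostrophes split words apart.
plain = [w.lower() for w in re.findall(r"\w+", text)]

# Proposed pattern: apostrophes are kept as part of the word.
with_apostrophes = [w.lower() for w in re.findall(r"[\w']+", text)]

print(plain)
print(with_apostrophes)
```

The second list keeps "don't" and "it's" whole, but also yields "users'" with its trailing apostrophe, which is one of the corner cases that remain.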

Linked tags

Hi!

I wonder if your code would be flexible enough to provide links as well.

Imagine the following structure:

tagslist = [ ['tag1', 23, 'http://domain.example.com/tags/tag1/'], ['tag2', 42, 'http://domain.example.com/tags/tag2/'], ['tag3', 13, 'http://domain.example.com/tags/tag3/'] ]

A tag cloud generated with the elements [tagname, count, url] should result in an image and an image-map as described in http://www.w3schools.com/tags/tag_area.asp

This way, I can use your cool library to generate a tag cloud for navigation within my blog system.

What do you think?
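
To make the request concrete, here is a sketch of the image-map half only. It assumes the pixel rectangle of each rendered tag is available, which the library would have to report; build_image_map and the rectangles are hypothetical.

```python
from html import escape

def build_image_map(name, regions):
    """Build an HTML <map> from (tag, url, (left, top, right, bottom)) entries."""
    areas = []
    for tag, url, (left, top, right, bottom) in regions:
        areas.append(
            '<area shape="rect" coords="%d,%d,%d,%d" href="%s" alt="%s">'
            % (left, top, right, bottom,
               escape(url, quote=True), escape(tag, quote=True))
        )
    return '<map name="%s">\n%s\n</map>' % (escape(name, quote=True), "\n".join(areas))

# Hypothetical pixel rectangles; pytagcloud would have to report the real ones.
regions = [
    ("tag1", "http://domain.example.com/tags/tag1/", (10, 10, 120, 40)),
    ("tag2", "http://domain.example.com/tags/tag2/", (130, 20, 300, 90)),
]
image_map = build_image_map("tagcloud", regions)
print(image_map)
```

The generated map would then be attached to the image with an img element whose usemap attribute is "#tagcloud".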

Words still cut out of the picture

Kudos,

Using PyTagCloud to amend my CV with relevant keywords, I found words still get offset out of the image.

While @konstantint's issue #12 helps considerably, I experienced truncation until I replaced this line (https://github.com/atizo/PyTagCloud/blob/master/pytagcloud/__init__.py#L355):
image_surface = Surface((sizeRect.w, sizeRect.h), SRCALPHA, 32)
with
image_surface = Surface((sizeRect.w + 2*TAG_CLOUD_PADDING, sizeRect.h + 2*TAG_CLOUD_PADDING), SRCALPHA, 32)
I'm not sure this is a proper fix, but it helped in my case.

font issue with pygame

This might not be a pytagcloud issue per se... but here goes.

I'm getting an IOError related to the font filenames. I assume it's a pygame problem:

Traceback (most recent call last):
  File "/home/blake/Dropbox/2012.03_WorkCode/tagcloud.py", line 41, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pytagcloud/__init__.py", line 344, in create_tag_image
  File "/usr/local/lib/python2.7/dist-packages/pytagcloud/__init__.py", line 275, in _draw_cloud
  File "/usr/local/lib/python2.7/dist-packages/pytagcloud/__init__.py", line 62, in __init__
IOError: unable to read font filename

When I run the example code from your site, it runs through without a problem. I just can't figure out for the life of me why my code, which uses the same font as the example, won't run.

Here's my code:

import csv
from pytagcloud import create_tag_image, make_tags
from pytagcloud.lang.counter import get_tag_counts

input_csv = csv.reader(open('sql_output.csv','rb'),delimiter="~")

....[some stuff here that creates a list of word frequency tuples from my csv data]

word_frequencies = []
for n,irecord in enumerate(sorted(word_freqs.items(), key=lambda item: item[1])):
    word_frequencies.append(tuple([irecord[0],irecord[1]]))

tags = make_tags(word_frequencies, maxsize=120)

create_tag_image(tags, 'cloud_large2.png', size=(900, 600), fontname='Lobster')

Sprite overlap

Taking the initial example from the unit tests and running:

create_tag_image(tags, os.path.join(home, 'cloud.png'), size=(300,400), background=(255,255,255,255), vertical=True, crop=True, fontname='fonts/Arial.ttf', fontzoom=3.2)

gives significant overlap between many of the sprites. Is it possible for the code to, at the very least, be aware this is happening and warn the user?
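
As an illustration of the kind of check meant here, a sketch that compares axis-aligned bounding boxes pairwise. The placements are hypothetical (tag, (left, top, width, height)) entries rather than anything read from pytagcloud, and bounding boxes are coarser than glyph outlines, so this can flag pairs that do not visibly collide.

```python
from itertools import combinations

def rects_overlap(a, b):
    """Return True if two (left, top, width, height) rectangles intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def find_overlaps(placed):
    """Return the pairs of tag names whose bounding boxes intersect."""
    return [
        (name_a, name_b)
        for (name_a, rect_a), (name_b, rect_b) in combinations(placed, 2)
        if rects_overlap(rect_a, rect_b)
    ]

# Hypothetical placements: (tag, (left, top, width, height)).
placed = [
    ("python", (0, 0, 100, 40)),
    ("cloud", (80, 20, 90, 30)),
    ("tags", (200, 200, 60, 25)),
]
overlaps = find_overlaps(placed)
print(overlaps)
```

A non-empty result could be turned into a warning with warnings.warn instead of being printed.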

Image output appears offset, words get cut off

I've been using the PyTagCloud master from here on GitHub and I've been finding that when attempting to output images, the result ends up frequently cutting words off at the edges -- the output looks like it might be offset from one side.

An example:

Image

Note how the left-hand side is a white margin and the word 'respect' gets cut off at the right-hand side. I've also noticed this happening at the bottom as well, particularly with words that are longer and vertically aligned.

Snippet of code used to produce this:

text = "..."
#Set a max_tags to avoid infinite processing on large word counts
max_tags = 100
tags = make_tags(get_tag_counts(text)[:max_tags], minsize=1,  maxsize=60)
size = (1024, 500)
create_tag_image(tags, 'cloud.png', size=size, fontname='IM Fell DW Pica', layout=LAYOUT_MIX)

Am I using something incorrectly?

Thanks for this fantastic library -- extremely helpful.

PyTagCloud fails when running on a server without a sound card...

Is it necessary for PyTagCloud to depend on pygame? pygame raises the exception below, apparently because it needs to see a sound card? Any advice?

ALSA lib confmisc.c:768:(parse_card) cannot find card '0'
ALSA lib conf.c:4241:(_snd_config_evaluate) function snd_func_card_driver returned error: No such file or directory
ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings
ALSA lib conf.c:4241:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
ALSA lib confmisc.c:1251:(snd_func_refer) error evaluating name
ALSA lib conf.c:4241:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4720:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM default
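
One thing that may be worth trying, untested against this exact setup: SDL, which pygame builds on, can be told to use dummy audio and video drivers through environment variables. The sketch only sets the variables. Whether that silences the ALSA output above is an assumption.

```python
import os

# SDL reads these variables when its subsystems are initialised, so they
# need to be set before pygame (and therefore pytagcloud) is imported.
os.environ.setdefault("SDL_AUDIODRIVER", "dummy")
os.environ.setdefault("SDL_VIDEODRIVER", "dummy")

# With pytagcloud installed, the usual imports would follow here, e.g.:
# from pytagcloud import create_tag_image, make_tags
print(os.environ["SDL_AUDIODRIVER"], os.environ["SDL_VIDEODRIVER"])
```

setdefault leaves any value already exported in the shell untouched, so an explicit configuration on the server still wins.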

Tips for Chinese, maybe Japanese or Korean?

A really awesome tool for generating tag clouds :)
It can be extended to generate tag clouds for languages like Chinese, but I couldn't find any doc/wiki about it, and many people think it doesn't support CJK.

So, would it be a good idea to add some tips to the readme.rst for CJK users?

I don't know whether the method I use is good practice:

I use another module named jieba for Chinese word segmentation instead of get_tag_counts.

I add a Chinese font to the fonts directory and edit fonts.json.

Sorry, poor English T T...
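
A sketch of the counting step in that approach, assuming the text has already been segmented. In practice the tokens would come from a segmenter such as jieba.cut(text); a hand-segmented list is used here so the example needs no third-party imports. tag_counts_from_tokens and the minimum-length filter are hypothetical choices, not part of pytagcloud.

```python
from collections import Counter

def tag_counts_from_tokens(tokens, min_length=2):
    """Count pre-segmented tokens and return (word, count) pairs, most frequent first."""
    cleaned = (token.strip() for token in tokens)
    counts = Counter(token for token in cleaned if len(token) >= min_length)
    return counts.most_common()

# Hand-segmented sample; a real run would use the segmenter's output instead.
tokens = ["标签", "云", "是", "文本", "数据", "的", "可视化", "标签", "文本", "标签"]
tag_counts = tag_counts_from_tokens(tokens)
print(tag_counts)
```

The result is a list of (word, count) pairs, the same shape that get_tag_counts returns in the Python 3 version posted above, so it should be usable as the input to make_tags.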

When using a dummy display (e.g. a headless environment), all of the tags are on top of each other

Possibly I'm just missing some config, do let me know if so! Here's what I did:

$ export SDL_VIDEODRIVER="dummy"
$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pygame
>>> pygame.display.set_mode((1024,768)) 
<Surface(1024x768x8 SW)>
>>> from pytagcloud import create_tag_image, make_tags
>>> from pytagcloud.lang.counter import get_tag_counts
>>> 
>>> YOUR_TEXT = "A tag cloud is a visual representation for text data, typically\
... used to depict keyword metadata on websites, or to visualize free form text."
>>> 
>>> tags = make_tags(get_tag_counts(YOUR_TEXT), maxsize=120)
>>> 
>>> create_tag_image(tags, 'cloud_large.png', size=(900, 600), fontname='Lobster') 

The result looks like this:
image
