
SynthText

Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.

Synthetic scene-text image samples

The code in the master branch is for Python2. Python3 is supported in the python3 branch.

The main dependencies are:

pygame==2.0.0, opencv (cv2), PIL (Image), numpy, matplotlib, h5py, scipy
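
A quick way to confirm the dependencies are importable in your environment (a small sketch, not part of the repository):

import pygame, cv2, PIL, numpy, matplotlib, h5py, scipy
print('pygame', pygame.version.ver)
print('opencv', cv2.__version__)
print('h5py', h5py.__version__)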

Generating samples

python gen.py --viz [--datadir <path-to-downloaded-renderer-data>]

where --datadir points to the renderer_data directory included in the data torrent. Specifying this datadir is optional; if it is not specified, the script will automatically download and extract the same renderer.tar.gz data file (~24 MB). This data file includes:

  • sample.h5: This is a sample h5 file which contains a set of 5 images along with their depth and segmentation information. Note: this is given only as an example; you are encouraged to add more images (along with their depth and segmentation information) to this database for your own use. A short reading sketch follows this list.
  • fonts: three sample fonts (add more fonts to this folder and then update fonts/fontlist.txt with their paths).
  • newsgroup: Text source (from the News Group dataset). This can be substituted with any text file. Look inside text_utils.py to see how the text inside this file is used by the renderer.
  • models/colors_new.cp: Color-model (foreground/background text color model), learnt from the IIIT-5K word dataset.
  • models: Other cPickle files (char_freq.cp: frequency of each character in the text dataset; font_px2pt.cp: conversion from pt to px for various fonts. If you add a new font, make sure the corresponding model is present in this file; if it is not, you can add it by adapting invert_font_size.py).
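
The following is a minimal sketch for inspecting such a database with h5py. It assumes top-level groups named image, depth and seg keyed by image name, with area and label stored as attributes on each seg dataset, and a hypothetical path renderer_data/sample.h5; check gen.py and the comment linked below for the exact layout (e.g. the depth arrays may need transposing or slicing before use).

import h5py

# Path is an assumption; point this at the downloaded sample database.
with h5py.File('renderer_data/sample.h5', 'r') as db:
    for imname in db['image']:
        img   = db['image'][imname][:]              # RGB background image
        depth = db['depth'][imname][:]              # depth map
        seg   = db['seg'][imname][:]                # segmentation map
        area  = db['seg'][imname].attrs['area']     # area of each segment
        label = db['seg'][imname].attrs['label']    # segment labels
        print(imname, img.shape, depth.shape, seg.shape, len(label))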

This script will generate random scene-text image samples and store them in an h5 file in results/SynthText.h5. If the --viz option is specified, the generated output is visualized as the script runs; omit the --viz option to turn off the visualizations. If you want to visualize the results stored in results/SynthText.h5 later, run:

python visualize_results.py
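
For programmatic access, a minimal reading sketch is below. It assumes the layout used by visualize_results.py: a top-level data group whose datasets hold the rendered images, with charBB, wordBB and txt stored as attributes.

import h5py

with h5py.File('results/SynthText.h5', 'r') as db:
    for name in db['data']:
        rgb    = db['data'][name][...]               # rendered image
        charBB = db['data'][name].attrs['charBB']    # 2 x 4 x n_chars box corners
        wordBB = db['data'][name].attrs['wordBB']    # 2 x 4 x n_words box corners
        txt    = db['data'][name].attrs['txt']       # text strings (may be bytes)
        print(name, rgb.shape, wordBB.shape, len(txt))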

Pre-generated Dataset

A dataset with approximately 800,000 synthetic scene-text images generated with this code can be found in the SynthText.zip file in the torrent here; the dataset details/description are in the readme.txt file in the same torrent.

Adding New Images

Segmentation and depth-maps are required to use new images as background. Sample scripts for obtaining these are available here.

  • predict_depth.m: MATLAB script to regress a depth mask for a given RGB image; uses the network of Liu et al. However, more recent works (e.g., this) might give better results.
  • run_ucm.m and floodFill.py for getting segmentation masks using gPb-UCM.

For an explanation of the fields in sample.h5 (e.g. seg, area, label), please check this comment.

Pre-processed Background Images

The 8,000 background images used in the paper, along with their segmentation and depth masks, are included in the same torrent as the pre-generated dataset under the bg_data directory. The files are:

  • imnames.cp: names of images which do not contain background text
  • bg_img.tar.gz: the images (filter these using imnames.cp)
  • depth.h5: depth maps
  • seg.h5: segmentation maps

use_preproc_bg.py provides sample code for reading this data.
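
A minimal sketch for filtering the extracted background images with imnames.cp is shown below; it assumes imnames.cp is a pickled list of filenames and that bg_img.tar.gz has been extracted to a bg_img/ directory (use_preproc_bg.py is the reference for the exact handling).

import os
import pickle

# imnames.cp is assumed to be a pickled list of filenames; under Python 3,
# reading a Python 2 pickle may need pickle.load(f, encoding='latin1').
with open('imnames.cp', 'rb') as f:
    valid = set(pickle.load(f))

bg_dir = 'bg_img'   # directory where bg_img.tar.gz was extracted (assumption)
bg_files = [os.path.join(bg_dir, n) for n in os.listdir(bg_dir) if n in valid]
print(len(bg_files), 'usable background images')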

Note: We do not own the copyright to these images.

Generating Samples with Text in non-Latin (non-English) Scripts

  • @JarveeLee has modified the pipeline for generating samples with Chinese text here.
  • @adavoudi has modified it for Arabic/Persian script, which flows right-to-left, here.
  • @MichalBusta has adapted it for a number of languages (e.g. Bangla, Arabic, Chinese, Japanese, Korean) here.
  • @gachiemchiep has adapted it for Japanese here.
  • @gungui98 has adapted it for Vietnamese here.
  • @youngkyung has adapted it for Korean here.
  • @kotomiDu has developed an interactive UI for generating images with text here.
  • @LaJoKoch has adapted it for German here.

Further Information

Please refer to the paper for more information, or contact me (email address in the paper).

Contributors

ankush-me, carandraug, codeveryslow


SynthText Issues

Problems with OpenCV 3.0

This code cannot run with OpenCV 3.1. Is there a version of this code that can run with OpenCV 3.1?

seg question

How do I get the "seg", "label" and "area" from the gPb-UCM (https://github.com/jponttuset/mcg) code that you advised? I only found that it produces a UCM image. Is this information ("seg", "label", "area") related to superpixels? Can you give me a clue about the specific process for gathering this information? Thanks a lot. @ankush-me

ValueError: too many values to unpack

I get this error:
xting@xiaomi-To-be-filled-by-O-E-M:~/SynthText$ python gen.py
getting data..
-> done
Storing the output in: results/SynthText.h5
0 of 4
Traceback (most recent call last):
File "/home/xting/SynthText/synthgen.py", line 615, in render_text
regions = self.filter_for_placement(xyz,seg,regions)
File "/home/xting/SynthText/synthgen.py", line 391, in filter_for_placement
res = get_text_placement_mask(xyz,seg==l,regions['coeff'][idx],pad=2)
File "/home/xting/SynthText/synthgen.py", line 219, in get_text_placement_mask
contour,hier = cv2.findContours(mask.copy().astype('uint8'),mode=cv2.RETR_CCOMP,method=cv2.CHAIN_APPROX_SIMPLE)
ValueError: too many values to unpack

The same traceback repeats for "1 of 4" through "4 of 4", interleaved with a few ALSA "underrun occurred" messages.
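
A likely cause (an observation, not from the original thread): OpenCV 3.x changed cv2.findContours to return three values (image, contours, hierarchy), while OpenCV 2.x and 4.x return two (contours, hierarchy). A version-agnostic sketch of the call in get_text_placement_mask:

import cv2
import numpy as np

mask = np.zeros((32, 32), dtype='uint8')   # stand-in for the placement mask
mask[8:24, 8:24] = 1

res = cv2.findContours(mask.copy(), mode=cv2.RETR_CCOMP,
                       method=cv2.CHAIN_APPROX_SIMPLE)
contours, hier = res[-2:]                  # works across OpenCV 2.x/3.x/4.x
print(len(contours))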

Have you tried any type of stochastic pooling for regularization similar to dropout?

I'm training my network now, and from epoch to epoch I'm testing the model to see how well it predicts. So far it's getting me less than 25% recall, and I'm trying to figure out what I'm doing wrong. I believe the math I'm doing for converting the pose parameters back to bounding boxes is correct. Anyway, some light research brought me to stochastic pooling, and I'm curious whether you have ever tried this in your networks and whether it's worth trying in case my network is overfitting.

Also- how many epochs did you end up running? My epochs are each taking 30 hours with roughly 550k images (I will try a larger set once I am able to see if this approach is going to generalize well, if at all, to my actual data).

question in function get_text_placement_mask(xyz,mask,plane,pad=2,viz=False):

In synthgen.py, I know that pts holds the contour points in the 2D image and pts_fp is where these points should be in the fronto-parallel view. But what does pts_tmp mean?
rect = cv2.minAreaRect(pts_fp[0].copy().astype('float32'))
box = np.array(cv2.cv.BoxPoints(rect))
R2d = su.unrotate2d(box.copy())
box = np.vstack([box,box[0,:]]) #close the box for visualization

mu = np.median(pts_fp[0],axis=0)
pts_tmp = (pts_fp[0]-mu[None,:]).dot(R2d.T) + mu[None,:]
boxR = (box-mu[None,:]).dot(R2d.T) + mu[None,:]
s = rescale_frontoparallel(pts_tmp,boxR,pts[0])

Why are these lines necessary? Why can't pts_fp and box be used in rescale_frontoparallel directly, instead of pts_tmp and boxR?

How to filter this?

[attached image: 43_0]
Some characters are not placed correctly (maybe because of the font), but I cannot find how to filter this in pygame.

Generating Cropped Word Images

@ankush-me This is great work. Thanks for sharing it.
Currently I am working on the problem of text recognition from cropped word images. Initially I used the MJSynth dataset provided by the Visual Geometry Group to train my LSTM-based model, but this dataset is heavily biased towards words and has very few occurrences of numbers. I have a dictionary containing the types of text/numbers/symbols that I want to recognize, but I am more concerned about the rendering process. Can I use your script to synthetically generate cropped images (resembling natural text images) containing words and numbers?
Please help.

model training question

Hi,
Is there any trick or constraint to force the confidence in the 7-length vector to lie between 0 and 1 during training?

Thanks

IoU in CVPR 2016 paper

Hi,
For the paper "Synthetic Data for Text Localisation in Natural Images", did you calculate IoU for oriented bounding boxes, or did you convert them to axis-aligned boxes?

Thanks
M

The rotations of some word bounding boxes do not line up properly with the word itself.

It's strange and it happens quite often, maybe 20% of the time. It usually occurs when a word is rotated itself or has been projected onto a surface causing it to have a strange dimension. Thing is, though, it does not only happen on extreme projections. It also happens on only slightly projected text. It makes it hard for the fully convolutional network to converge well on the sin/cos pose params.

questions on your paper

Hi Ankush,
In your paper you mention Hough voting, but I do not see it being used in the paper. Can you clarify?

Thanks

How to get some bigger text?

Thanks for sharing the code. I want to get some bigger text (for example, text that occupies a large part of the image); can you tell me how to configure your code to do this? Thank you very much.

SynthText question

Hi,
Sorry to ask a question regarding the SynthText dataset here, since I am not aware of any other place to pose it.

I am guessing that the sizes of the text boxes and the angles are with respect to the original image size and not with respect to 512 x 512?

Thanks

Using the "FCRN" approach to create a saliency map

I've recently been kicking around the idea of possibly bypassing the localization stage altogether and moving right into a holistic method for text recognition, more specifically creating a saliency map of the characters found within an image. My thought is that instead of trying to predict the pose params for a bounding box around the text, I predict a vector that gives the confidence that a specific cell contains each character of the alphabet.

I don't recall seeing any research papers published on this topic yet. Ankush, have you seen any research done on this? Have you tried this yourself?

Recently, I trained up a text detection network that, similar to your FCRN, only tried to detect the presence of text within each cell. Basically a 0 meant no text while a 1 meant high confidence of text. The results I've had both on the synth text dataset and on other datasets have been very promising.

I'm having a lot of fun with this FCRN approach, if you couldn't tell. For my specific case, I don't care as much about having bounding boxes around text as I do for knowing what the text in the image says.

Thanks again for any advice, comments, lessons learned, or references you may have.

It's really hard to hack the code

First, thanks for sharing the code. I want to remove some of the restrictions in your filtering (which uses the depth and segmentation masks), but the many numeric operations (like RANSAC, the depth camera model, ...) are not very intuitive to understand. Can you point me in the right direction? Some detailed material on how the depth and segmentation information is used would help a lot.

Text angle

I found that the text angle is related to the plane normal, and it ranges from -90 to 90 degrees. Is it possible for the text angle to range from -180 to 180 degrees?

Loss function

Hi,
I have a question about the loss function in your SynthText paper. Can you give a LaTeX formula for the version used in the paper?

When I run the code, it generates errors

When I use python gen.py --viz on the command line, it generates:
~/SynthText-master$ python gen.py --viz
Traceback (most recent call last):
File "gen.py", line 19, in
from synthgen import *
File "/home/ubuntu/SynthText-master/synthgen.py", line 20, in
import text_utils as tu
File "/home/ubuntu/SynthText-master/text_utils.py", line 12, in
from pygame import freetype
ImportError: cannot import name freetype

What is wrong?
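
One thing worth checking (a suggestion, not from the original thread): the freetype module is missing from some older pygame builds, while the pygame==2.0.0 pinned in the dependencies ships it. A quick check:

import pygame
print(pygame.version.ver)
from pygame import freetype   # raises ImportError on builds without freetype support
freetype.init()
print('freetype OK')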

Ability to specify minimum words per image

I have some images with quite a bit of open space for text to be drawn, but only a single 2-3 character word gets drawn on the entire image. It would be nice to be able to specify the minimum number of words I'd like drawn on an image, so that the script keeps looking for words to draw.

What training params did you use for the discount in your FCRN?

Similar to your research, I'm finding that about 3% of the cells in my input images contain bounding boxes, and I'm starting my discount at 0.01 of the loss for the negative cells in the confidence matrix, c, during training. However, I am having trouble finding a good step value for increasing the discount during training. How long did it take you to train your final network with 800,000 images? How often did you increase the discount applied to the loss of the negative c classes during training? What did you change it by?

Question about FCRN downscaling inference

In the paper "Synthetic Data for Text Localisation in Natural Images", you mention that you downscale the images by 1/2, 1/4 and 1/8 in order to pick up larger words. I'm wondering if you are able to provide more details about this. For instance, when you downscale the images, did you feed them into the same (1x512x512) network? Did you fill the empty area with black/white pixels?

Cannot download 'http://www.robots.ox.ac.uk/~ankush/data.tar.gz'

Hello, when I run gen.py it fails. I cannot reach "http://www.robots.ox.ac.uk/~ankush/data.tar.gz". Is there another way to get the data? Thanks.

The error is as follows:
File "gen.py", line 55
print colorize(Color.RED,'Data not found and have problems downloading.',bold=True)
^
SyntaxError: invalid syntax

By the way, the download of the 800,000 synthetic scene-text images does not complete; the URL I tried is http://www.robots.ox.ac.uk/~vgg/data/scenetext/, is it right?
Thanks a lot!

Difference between px and pt?

def get_nline_nchar(self, mask_size, font_height, font_width):
    """
    Returns the maximum number of lines and characters which can fit
    in the MASK_SIZED image.
    """
    H, W = mask_size
    nline = int(np.ceil(H / (2 * font_height)))
    nchar = int(np.floor(W / font_width))
    return nline, nchar
In this function, font_height and font_width are in pt, but H and W are in px. Why can a px value be divided by a pt value? What is the difference between px and pt?
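
For reference (general typography, not from the original thread): one point is 1/72 of an inch, so the conversion depends on the rendering DPI, px = pt * dpi / 72. The font_px2pt.cp model mentioned in the README stores a fitted per-font conversion rather than a fixed DPI. A tiny sketch:

def pt_to_px(pt, dpi=96.0):
    """Convert a font size in points to pixels at a given DPI."""
    return pt * dpi / 72.0

print(pt_to_px(12))   # 16.0 px at 96 DPI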

Dataset

If I use my own images, how do I get the depth and segmentation information for them? Is there code available for this?
