xhtml2pdf / xhtml2pdf

A library for converting HTML into PDFs using ReportLab

Home Page: https://xhtml2pdf.readthedocs.io/

License: Apache License 2.0

HTML 45.50% Python 52.17% CSS 2.14% Makefile 0.14% Dockerfile 0.06%

html-pdf html-pdf-converter html-to-pdf html-to-pdf-converter pdf pdf-converter pdf-generation pypdf python reportlab

xhtml2pdf's Introduction

XHTML2PDF

Release notes can be found here: Release Notes.

As with all open-source software, its use in production depends on many factors, so be aware that you may find issues in some cases.

Big thanks to everyone who has worked on this project so far and to those who help maintain it.

About

xhtml2pdf is an HTML to PDF converter using Python, the ReportLab Toolkit, html5lib and pypdf. It supports HTML5 and CSS 2.1 (and some of CSS 3). It is written entirely in pure Python, so it is platform independent.

The main benefit of this tool is that a user with web skills like HTML and CSS is able to generate PDF templates very quickly without learning new technologies.

Please consider supporting this project using Patreon or Bitcoins: bc1qmr0skzwx5scyvh2ql28f7gfh6l65ua250qv227

Documentation

The documentation of xhtml2pdf is available at Read the Docs.

And we could use your help improving it! A good place to start is doc/source/usage.rst.

Installation

This is a typical Python library and can be installed using pip:

pip install xhtml2pdf

Requirements

Only Python 3.8+ is tested and guaranteed to work.

All mandatory requirements are listed in the pyproject.toml file and are installed automatically using the pip install xhtml2pdf method.

As a PDF library we depend on reportlab, which needs a rendering backend to generate bitmaps and vector graphic formats. For more information about this, have a look at the reportlab docs.

The recommended choice is the cairo graphics library, which has to be installed system-wide (e.g. via the OS package manager) in combination with the PyCairo extra dependency:

pip install xhtml2pdf[pycairo]

Alternatively, the legacy RenderPM can be used by installing:

pip install xhtml2pdf[renderpm]

Alternatives

You can try WeasyPrint. The codebase is pretty, it has different features and it does a lot of what xhtml2pdf does.

Call for testing

This project is heavily dependent on getting its test coverage up! Furthermore, parts of the codebase could do well with cleanups and refactoring.

If you benefit from xhtml2pdf, perhaps look at the test coverage and identify parts that are yet untouched.

Development environment

If you don't have it, install pip, the python package installer:
```
sudo easy_install pip
```
For more information about pip, refer to http://www.pip-installer.org
We recommend using venv for development.
Create a virtual environment for the project. This can be inside the project directory, but must not be under version control:
```
python -m venv .venv
```
Activate your virtual environment:
```
source .venv/bin/activate
```
Later to deactivate it use:
```
deactivate
```
The next step will be to install/upgrade dependencies from the pyproject.toml file:
```
pip install -e .[test,docs,build]
```
Run tests to check your configuration:
```
tox
```
You should have a log with the following success status:
```
congratulations :) (75.67 seconds)
```

Python integration

Some simple demos of how to integrate xhtml2pdf into a Python program may be found here: test/simple.py
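As a minimal sketch of what such an integration might look like, the snippet below converts an HTML string to a PDF file with `pisa.CreatePDF` and checks the `err` attribute of the returned status, as several of the reports further down do. The helper name `html_to_pdf` is hypothetical and not part of the library; test/simple.py remains the reference.

```python
def html_to_pdf(html, dest_path):
    """Render an HTML string to dest_path; return True when no errors were reported."""
    # Imported lazily so this module can be loaded even where xhtml2pdf is absent.
    from xhtml2pdf import pisa

    # Open in binary mode and let the context manager close the file,
    # so the PDF is fully written before anything else reads it.
    with open(dest_path, "wb") as dest:
        status = pisa.CreatePDF(html, dest=dest)
    return not status.err
```

Calling `html_to_pdf("<h1>Hello</h1>", "hello.pdf")` should write `hello.pdf` to the current directory and return `True` if the conversion reported no errors.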

Running tests

Two different test suites are available to assert that xhtml2pdf works reliably:

Unit tests. The unit testing framework is currently minimal, but is being improved on a regular basis (contributions welcome). They should run in the expected way for Python's unittest module, i.e.:
```
tox
```
Functional tests. Thanks to mawe42's super cool work, a full functional test suite is available at testrender/.

You can run them using make:

make test       # run tests
make test-ref   # generate reference data for testrender
make test-all   # run all tests using tox

Contact

This project is community-led! Feel free to open up issues on GitHub about new ideas to improve xhtml2pdf.

History

These are the major milestones and the maintainers of the project:

2000-2007 Dirk Holtwick (commercial project of spirito.de)
2007-2010 Dirk Holtwick (project named "pisa", project released as GPL)
2010-2012 Dirk Holtwick (project named "xhtml2pdf", changed license to Apache)
2012-2015 Chris Glass (@chrisglass)
2015-2016 Benjamin Bach (@benjaoming)
2016-2018 Sam Spencer (@LegoStormtroopr)
2018-Current Luis Zarate (@luisza)

For more history, see the Release Notes.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

xhtml2pdf's People

Contributors

holtwick lasconic bachp leopay nduthoit alanjds poswald dtran320 djcoin orestis k1000 davidszotten stefanfoulis ejucovy mawe42 mattjmorrison jacobninja kezabelle amites sbarysiuk tecknicaltom oviboy fpliger repos-python pi11 jacobwegner maurojr gabrielgrant tadeo emildv golumn datakurre fernandotakai gabejackson bbzoo laiwei thefunny42 atul-bhouraskar mauler atn9 stefanofontanelli vsr inmeet tendigitsdown lqik2004 sethportman tedliang vmallory ahalife daleobrien ibyer bentley4 bertrandbordage devasia2112 pranavk zhiwehu gtp pdl vangheem dmitry89 wpbird007 jesusmaherrera vrialland bradleyayers kvdb taito-zz andrewschoen bruschistefania kurmaev andrewjhart utburd xrmx useevil grangier frague59 mkudo jamzo neilalbrock mmohiudd pombredanne vincentrosso jgentes zhangdjxx yakky chrispbailey jorise c-steindl ssutee hawstein nielssteenkrogh shibz lecstor krsreenatha xushiwei emilkjer hengesense kadmillos ericfernandez eaudeweb nickpack

xhtml2pdf's Issues

Having trouble with images.

I'm using this in Django and I read about link_callback.

My pdfs are missing images.

How can I debug this? Are there ways to figure out whether the callback is being called, or other things to look for?
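One way to check whether the callback is invoked is to log every call. The sketch below assumes the `link_callback(uri, rel)` form and returns a local path; `MEDIA_URL` and `MEDIA_ROOT` are hypothetical values standing in for whatever the Django settings define.

```python
import logging
import os

log = logging.getLogger("pdf.links")

# Hypothetical values; in a Django project these would come from settings.py.
MEDIA_URL = "/media/"
MEDIA_ROOT = "/srv/app/media"


def link_callback(uri, rel):
    """Map a URI found in the HTML to a local file path and log every call."""
    if uri.startswith(MEDIA_URL):
        path = os.path.join(MEDIA_ROOT, uri[len(MEDIA_URL):])
    else:
        path = uri  # leave absolute URLs and other paths untouched
    # The log line shows whether the callback ran and whether the file exists.
    log.debug("link_callback uri=%r rel=%r -> %r (exists=%s)",
              uri, rel, path, os.path.exists(path))
    return path
```

With `logging.basicConfig(level=logging.DEBUG)` enabled, a missing log line for an image would suggest the callback was never called for it, while `exists=False` would point at a wrong path mapping.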

Thanks,
Shige

link callback and svg

I edited the pisaLoader getFileName so that the default suffix is the lower() alphanumeric content-type, that is, '.' + the subtype of the link, e.g. .png, .svgxml.
I was hoping to find where you render images and add a mode for svgxml using this:
http://stackoverflow.com/questions/5835795/generating-pdfs-from-svg-input

But I am relatively new to Python and can't figure out how or where you handle fetched files and images.
If you can just tell me where you do this, I will try to make a patch; otherwise please have a look.

Support svg

It would be nice to have SVG supported directly rather than having an intermediary step to make a PNG first.

Cannot discover last page

I have content of dynamic length, and would like to put a footer only on the last page. From what I've seen from the mail chains and docs, there is no way to do this.

fails on 'Printing a book with css' code.

Hi,

there is an article here on using css+html to print a book.
http://www.alistapart.com/articles/boom

It may be a good test case, since it deals with a fairly common layout.

Here are the files from the article.

curl -O http://people.opera.com/howcome/2005/ala/sample.html
curl -O http://people.opera.com/howcome/2005/ala/sample.css
curl -O http://people.opera.com/howcome/2005/ala/sample.pdf

$ xhtml2pdf sample.html xhtml2pdf_sample.pdf

Here is the error:

File "/Users/rene/xhtml2pdfenv/lib/python2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py", line 664, in _parseAtPage
stylesheetElements.extend(atResults)
TypeError: 'NotImplementedType' object is not iterable

usage.rst: import incorrect

/doc/usage.rst

import xhtml2pdf
import xhtml2pdf.pisa

now it works out of the box :-)

make rtl languages from left

For example, Persian text must start from the right, but your result looks like this:
Farsi / Persian: .‫یم نم‬
‫مروخب هشيش درد ساسحا ِنودب مناوت‬
Also, the characters are separated in the PDF.

correct Persian من می نوانم ...

PS: maybe I couldn't explain the problem well. For example, the correct word is "word" but your result is "dorw".

Footer frames in landscape pages

Bottom aligned frames are positioned outside the page if the document is in landscape.

<html>
<style>
@page {
  margin: 1cm;
  margin-bottom: 2.5cm;
  size: A4 landscape; 
  @frame footer {
    -pdf-frame-content: footerContent;
    bottom: 2cm;
    margin-left: 1cm;
    margin-right: 1cm;
    height: 1cm;
  }
}
</style>
<body>
  Some text
  <div id="footerContent">
    This is a footer on page #<pdf:pagenumber>
  </div>
</body>
</html>

This is probably because the frame position is calculated with c.pageSize still in portrait:

https://github.com/chrisglass/xhtml2pdf/blob/master/xhtml2pdf/context.py#L325

The _parseAt* functions consume one too many characters which can cause problems with minified CSS

In _parseAtPage, for example, the line

src = src[len('@page '):].lstrip()

will throw a ParseError for the following valid css

@page{/.../}
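The effect can be reproduced outside the parser. The sketch below is not the library's code; the two helper names are hypothetical and only contrast the reported slicing with a variant that removes the keyword alone.

```python
def strip_at_keyword_naive(src, keyword="@page"):
    # Mirrors the reported line: assumes a space always follows the keyword.
    return src[len(keyword + " "):].lstrip()


def strip_at_keyword(src, keyword="@page"):
    # Remove only the keyword itself, then any optional whitespace.
    if not src.startswith(keyword):
        raise ValueError(f"expected {keyword!r} at start of {src[:20]!r}")
    return src[len(keyword):].lstrip()


minified = "@page{margin:1cm}"
print(strip_at_keyword_naive(minified))  # margin:1cm}   (opening brace lost)
print(strip_at_keyword(minified))        # {margin:1cm}
```

With the brace gone, a parser that next expects `{` has nothing to match, which would explain the ParseError on minified input.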

Not latin chars?

Can't display non latin chars. Document encoding utf-8. How?

Twitter-Bootstrap Causes Selector CSSParseError

Twitter Bootstrap has some pretty gnarly CSS selectors that xhtml2pdf doesn't like.

Result is:

Selector Pseudo Function closing ')' not found:: (u':not(', u'[controls]) {\n disp')

pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("UTF-8")), dest=result, link_callback=fetch_resources )

File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/document.py" in pisaDocument
    encoding, context=context, xml_output=xml_output)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/document.py" in pisaStory
    pisaParser(src, context, default_css, xhtml, encoding, xml_output)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/parser.py" in pisaParser
    context.parseCSS()
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/context.py" in parseCSS
    self.css = self.cssParser.parse(self.cssText)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in parse
    src, stylesheet = self._parseStylesheet(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseStylesheet
    src, atResults = self._parseAtKeyword(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseAtKeyword
    src, result = self._parseAtImports(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseAtImports
    stylesheet = self.cssBuilder.atImport(import_, mediums, self)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/css.py" in atImport
    return cssParser.parseExternal(import_)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/context.py" in parseExternal
    result = self.parse(cssFile.getData())
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in parse
    src, stylesheet = self._parseStylesheet(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseStylesheet
    src, ruleset = self._parseRuleset(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseRuleset
    src, selectors = self._parseSelectorGroup(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSelectorGroup
    src, selector = self._parseSelector(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSelector
    src, selector = self._parseSimpleSelector(src)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSimpleSelector
    src, selector = self._parseSelectorPseudo(src, selector)
File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSelectorPseudo
    raise self.ParseError('Selector Pseudo Function closing \')\' not found', src, ctxsrc)

Exception Type: CSSParseError at /p/pdf/gd8lx6xbl
Exception Value: Selector Pseudo Function closing ')' not found:: (u':not(', u'[controls]) {\n disp')

PDF background breaks links

If you create an HTML document with background: url(something.pdf), and then have links on the document, they do not work. There is a patch available in my fork to fix this, but it slows the process down.

Orphans, Widows

Please add these CSS properties to prevent improper page breaking.

xhtml support broken?

Hi,

Just installed xhtml2pdf 0.0.3 using pip:

Traceback (most recent call last):
  File "testxhtml2pdf.py", line 5, in <module>
    pisa.CreatePDF(infile, dest=outfile, xhtml=True, encoding='utf-8')
  File "/Users/herve/dev/autoslave_venv/lib/python2.6/site-packages/xhtml2pdf/document.py", line 85, in pisaDocument
    encoding, context=context, xml_output=xml_output)
  File "/Users/herve/dev/autoslave_venv/lib/python2.6/site-packages/xhtml2pdf/document.py", line 56, in pisaStory
    pisaParser(src, context, default_css, xhtml, encoding, xml_output)
  File "/Users/herve/dev/autoslave_venv/lib/python2.6/site-packages/xhtml2pdf/parser.py", line 616, in pisaParser
    parser = html5lib.XHTMLParser(tree=treebuilders.getTreeBuilder("dom"))
AttributeError: 'module' object has no attribute 'XHTMLParser'

With xhtml=False, it all runs fine.

html5lib==0.90

Truncate debug output for src

Logging takes some time when big HTML documents are being created because the full HTML source is dumped to the log file. It also increases the log file size.

I have made the following change in file document.py at line 77 (log.debug("...):

src[:100],

This keeps the document identifiable without making logging so heavy.

Full html dump to log should be optional, IMHO.
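A truncated, opt-in debug message could be sketched as follows. This is an illustration only, not the code in document.py; `describe_source`, `log_options`, and the 100-character limit are hypothetical.

```python
import logging

log = logging.getLogger("xhtml2pdf.example")

MAX_LOGGED_CHARS = 100  # hypothetical limit


def describe_source(src, limit=MAX_LOGGED_CHARS):
    """Return a short, log-friendly description of src."""
    if isinstance(src, (str, bytes)) and len(src) > limit:
        # Keep only the beginning, which is enough to identify the document.
        return f"{src[:limit]!r}... ({len(src)} total)"
    return repr(src)  # file objects and short strings already have a short repr


def log_options(src):
    # Skip the formatting work entirely unless DEBUG output is enabled.
    if log.isEnabledFor(logging.DEBUG):
        log.debug("pisaDocument options: src = %s", describe_source(src))
```

Guarding with `isEnabledFor` means large sources cost nothing when debug logging is off.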

Support cStringIO module for Google App Engine tempfile strategies

The current list of tempfile strategies for Google App Engine in xhtml2pdf/util.py references StringIO. For Python 2.7 users, I believe they may have enabled the cStringIO module for improved performance.

I'd like to request a change to use cStringIO instead of StringIO.

Regarding compatibility, cStringIO is already aliased to StringIO.

Ol Tag CSS Counting Broken

CSS Style

ol{counter-reset: section;} 
ol li{counter-increment: section;} 
ol li:before{content: counters(section, ".") " ";}

Example Code:


<ol>
    <li>item 1
        <ol>
            <li>sub item 1
                <ol>
                    <li>sub-sub item 1</li>
                    <li>sub-sub item 2</li>
                    <li>sub-sub item 3</li>
                </ol>
            </li>
            <li>Sub item 2</li>
        </ol>
    </li>
    <li>item 2</li>
</ol>

I Need this

1 item 1
        1.1 sub item 1
                1.1.1 sub-sub item 1
                1.1.2 sub-sub item 2
                1.1.3 sub-sub item 3
        1.2 Sub item 2
2 item 2

But I'm getting this

1 item 1
        1 sub item 1
                1 sub-sub item 1
                2 sub-sub item 2
                3 sub-sub item 3
        2 Sub item 2
2 item 2

Is There A Cure ? :)
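Until nested counters are rendered as expected, one possible workaround is to compute the labels before rendering and emit them as plain text in the template. The sketch below is not xhtml2pdf code; `number_items` is a hypothetical helper that mimics what `counters(section, ".")` should produce for nested lists.

```python
def number_items(items, prefix=()):
    """Yield (label, text) pairs for a nested list.

    `items` is a list of (text, children) tuples; each nesting level starts
    its own counter, as `counter-reset` on every <ol> would.
    """
    for index, (text, children) in enumerate(items, start=1):
        path = prefix + (index,)
        yield ".".join(map(str, path)), text
        yield from number_items(children, path)


outline = [
    ("item 1", [
        ("sub item 1", [
            ("sub-sub item 1", []),
            ("sub-sub item 2", []),
            ("sub-sub item 3", []),
        ]),
        ("Sub item 2", []),
    ]),
    ("item 2", []),
]

for label, text in number_items(outline):
    print(label, text)
```

The printed labels follow the "I Need this" listing above (1, 1.1, 1.1.1, and so on).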

Relative sizes don't work

Relative sizes are ignored or worse, cause a crash.
The following html:

<html>
<body>
<div style="width:50%; height:100px; background-color:red">
hi
</div>
</body>
</html>

Gives a red box the entire width of the document.

Providing a relative size for an image causes a crash.

<html>
<body>
<div style="width:100px; height:100px; background-color:blue">
<img src="http://www.google.com/intl/en_com/images/srpr/logo3w.png" style="width: 50%; height: 50%"/>
</div>
</body>
</html>

gives:

$ xhtml2pdf -d ./2.html 
Converting ./2.html to /usr/local/CustomerPortal/src/docs/2.pdf...
DEBUG [xhtml2pdf] /usr/local/CustomerPortal/src/docs/src/xhtml2pdf/xhtml2pdf/document.py line 77: pisaDocument options:
  src = <open file '/usr/local/CustomerPortal/src/docs/2.html', mode 'rb' at 0x1849ae0>
  dest = <open file '/usr/local/CustomerPortal/src/docs/2.pdf', mode 'wb' at 0x1849db0>
  path = '/usr/local/CustomerPortal/src/docs/2.html'
  link_callback = None
  xhtml = False
DEBUG [xhtml2pdf] /usr/local/CustomerPortal/src/docs/src/xhtml2pdf/xhtml2pdf/util.py line 513: FileObject 'http://www.google.com/intl/en_com/images/srpr/logo3w.png', Basepath: '/usr/local/CustomerPortal/src/docs'
DEBUG [xhtml2pdf] /usr/local/CustomerPortal/src/docs/src/xhtml2pdf/xhtml2pdf/util.py line 529: URLParts: ParseResult(scheme='http', netloc='www.google.com', path='/intl/en_com/images/srpr/logo3w.png', params='', query='', fragment='')
WARNING [xhtml2pdf] /usr/local/CustomerPortal/src/docs/src/xhtml2pdf/xhtml2pdf/util.py line 277: getSize: Not a float '50%'
Traceback (most recent call last):
  File "/usr/local/bin/xhtml2pdf", line 9, in <module>
    load_entry_point('xhtml2pdf==0.0.3', 'console_scripts', 'xhtml2pdf')()
  File "/usr/local/CustomerPortal/src/docs/src/xhtml2pdf/xhtml2pdf/pisa.py", line 173, in command
    execute()
  File "/usr/local/CustomerPortal/src/docs/src/xhtml2pdf/xhtml2pdf/pisa.py", line 425, in execute
    xml_output = xml_output,
  File "/usr/local/CustomerPortal/src/docs/src/xhtml2pdf/xhtml2pdf/document.py", line 131, in pisaDocument
    doc.build(context.story)
  File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/doctemplate.py", line 880, in build
    self.handle_flowable(flowables)
  File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/doctemplate.py", line 763, in handle_flowable
    if frame.add(f, canv, trySplit=self.allowSplitting):
  File "/usr/local/lib/python2.7/dist-packages/reportlab/platypus/frames.py", line 159, in _add
    w, h = flowable.wrap(aW, h)
  File "/usr/local/CustomerPortal/src/docs/src/xhtml2pdf/xhtml2pdf/xhtml2pdf_reportlab.py", line 555, in wrap
    self._calcImageMaxSizes(availWidth, self.getMaxHeight() - self.deltaHeight)
  File "/usr/local/CustomerPortal/src/docs/src/xhtml2pdf/xhtml2pdf/xhtml2pdf_reportlab.py", line 532, in _calcImageMaxSizes
    wfactor = float(width) / img.width
ZeroDivisionError: float division by zero
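The warning `getSize: Not a float '50%'` followed by the division by zero suggests the percentage ended up as a zero size. A defensive approach could look like the sketch below, where `resolve_length` and `scale_factor` are hypothetical helpers, not functions from xhtml2pdf, and pixel values are treated as points purely to keep the example short.

```python
def resolve_length(value, available):
    """Resolve a CSS-like length to a float.

    Percentages are taken relative to `available`; plain numbers, optionally
    suffixed with 'px' or 'pt', are returned unchanged as floats.
    """
    text = str(value).strip()
    if text.endswith("%"):
        return available * float(text[:-1]) / 100.0
    for unit in ("px", "pt"):
        if text.endswith(unit):
            text = text[:-len(unit)]
            break
    return float(text)


def scale_factor(target, natural):
    """Return target / natural, or 1.0 when the natural size is zero or unknown."""
    if not natural:
        return 1.0
    return float(target) / natural
```

For the failing example, `resolve_length("50%", 100)` would give the image half the width of its 100px container instead of zero.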

AttributeError: 'tuple' object has no attribute 'scheme'

I get this error when trying to run pisa / xhtml2pdf :

pisa test.html test.pdf

Converting test.html to test.pdf...
Traceback (most recent call last):
File "bin/pisa", line 7, in ?
sys.exit(
File "build/bdist.macosx-10.6-i386/egg/xhtml2pdf/pisa.py", line 173, in command
File "build/bdist.macosx-10.6-i386/egg/xhtml2pdf/pisa.py", line 425, in execute
File "build/bdist.macosx-10.6-i386/egg/xhtml2pdf/document.py", line 80, in pisaDocument
File "build/bdist.macosx-10.6-i386/egg/xhtml2pdf/context.py", line 404, in init
AttributeError: 'tuple' object has no attribute 'scheme'

input form elements?

Hello!
How can I display a box representing an input element? Right now it looks like they are just skipped.
Thanks in advance.

Upgrade from 0.0.4 to 0.0.5

Hey,

When I upgrade from xhtml2pdf==0.0.4 to xhtml2pdf==0.0.5 everything breaks and half my pdf content is gone.

Tables and paragraphs are messed up or missing.
Images are missing or not styled.
And as it seems half the styling is not applied.

Do you have any idea where I should begin with fixing this? In the release notes I did not see any mentions about this new version being not backwards compatible. Any help would be much appreciated :)

State of testrender

I ran testrender/testrender.py

Seems the generated results have a transparent background, where the reference images do not. That's causing a big difference. Should the reference images be re-rendered to adjust to the new (transparent) situation, or is it not supposed to be like that?

As a side note, the generated result differed so much from the reference image that the value could not be parsed as an int:

diff_value = int(result.strip())

should be:

diff_value = float(result.strip())
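Python integers do not overflow, so the likely cause is the string format rather than the magnitude. The sketch below assumes the comparison tool prints large values in scientific notation; the sample string is made up for illustration.

```python
def parse_diff_value(result):
    """Parse the numeric difference reported by the image comparison tool."""
    return float(result.strip())


sample = "1.23457e+06\n"  # assumed output format, not captured from a real run
print(parse_diff_value(sample))  # 1234570.0

try:
    int(sample.strip())
except ValueError as exc:
    # int() only accepts plain integer literals, so this branch is taken.
    print("int() rejects it:", exc)
```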

logging.getLogger somehow still uses ho.pisa

As reported on the mailing list, when converting a document from the command line, a "cannot get logger for 'ho.pisa'" message is displayed, preventing clean output when running in silent mode.

This needs to be changed. (opening this bug to keep track of it)

short options like -c don't work

yac@deathstar % xhtml2pdf -c ../default.css machine-id.html 
Traceback (most recent call last):
  File "/usr/bin/xhtml2pdf", line 9, in <module>
    load_entry_point('xhtml2pdf==0.0.4', 'console_scripts', 'xhtml2pdf')()
  File "/usr/lib64/python2.7/site-packages/xhtml2pdf/pisa.py", line 173, in command
    execute()
  File "/usr/lib64/python2.7/site-packages/xhtml2pdf/pisa.py", line 307, in execute
    css = file(a, "r").read()
IOError: [Errno 2] No such file or directory: ''
--------------------------------------------------------------------------------
~/w  [1]
yac@deathstar % xhtml2pdf -c=../default.css machine-id.html 
XHTML2PDF/pisa 3.0.33 (Build 2010-06-16)
http://www.xhtml2pdf.com

Copyright 2010 Dirk Holtwick, holtwick.it

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

USAGE: pisa [options] SRC [DEST]

SRC
  Name of a HTML file or a file pattern using * placeholder.
  If you want to read from stdin use "-" as file name.
  You may also load an URL over HTTP. Take care of putting
  the <src> in quotes if it contains characters like "?".

DEST
  Name of the generated PDF file or "-" if you like
  to send the result to stdout. Take care that the
  destination file is not already opened by an other
  application like the Adobe Reader. If the destination is
  not writeable a similar name will be calculated automatically.

[options]
  --base, -b:
    Specify a base path if input come via STDIN
  --css, -c:
    Path to default CSS file
  --css-dump:
    Dumps the default CSS definitions to STDOUT
  --debug, -d:
    Show debugging informations
  --encoding:
    the character encoding of SRC. If left empty (default) this
    information will be extracted from the HTML header data
  --help, -h:
    Show this help text
  --quiet, -q:
    Show no messages
  --start-viewer, -s:
    Start PDF default viewer on Windows and MacOSX
    (e.g. AcrobatReader)
  --version:
    Show version information
  --warn, -w:
    Show warnings
  --xml, --xhtml, -x:
    Force parsing in XML Mode
    (automatically used if file ends with ".xml")
  --html:
    Force parsing in HTML Mode (default)
--------------------------------------------------------------------------------
~/w  [2]
yac@deathstar % xhtml2pdf --css ../default.css machine-id.html 
Converting machine-id.html to /home/yac/w/machine-id.pdf...
--------------------------------------------------------------------------------
~/w  
yac@deathstar % xhtml2pdf --css=../default.css machine-id.html 
Converting machine-id.html to /home/yac/w/machine-id.pdf...
--------------------------------------------------------------------------------
~/w  
yac@deathstar %
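The behaviour above is consistent with a short option declared without its argument marker. The sketch below assumes the command line is parsed with the standard `getopt` module and that `-c` was registered as `"c"` instead of `"c:"`; both are assumptions, and `parse_cli` is a hypothetical helper.

```python
import getopt


def parse_cli(argv, shortopts):
    """Parse argv with the given short option string and the --css long option."""
    return getopt.getopt(argv, shortopts, ["css="])


argv = ["-c", "../default.css", "machine-id.html"]

# Without the trailing colon, -c is a flag that takes no value, so the CSS
# path is left among the positional arguments and the option value is ''.
print(parse_cli(argv, "c"))   # ([('-c', '')], ['../default.css', 'machine-id.html'])

# With "c:", the next argument is consumed as the option's value.
print(parse_cli(argv, "c:"))  # ([('-c', '../default.css')], ['machine-id.html'])
```

An empty option value would match the `No such file or directory: ''` error in the first run.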

CSS rules not always applied in the same order

Usually in CSS if you have the following code:

#content {
    border: 2px solid blue;
}

#content {
    border: 2px solid red;
}

The border of #content should always be red since styles declared last always override previously declared styles.

However this is not the case with xhtml2pdf, with a css declaration like this sometimes the border will be blue, sometimes it will be red.

Since it's an on-and-off problem, it looks awfully like a threading problem. You can press F5 many times and the styles will flip randomly, though not necessarily every time.

I know reportlab isn't thread safe, so I was pretty sure it was a threading problem, but it doesn't seem so. I tried with the dev server with --nothreading and with apache+wsgi on a single thread, and the problem persists in both cases.

I dumped what the CSS parser was seeing and the CSS code was always right, which leaves us only with one possibility. The CSS rules aren't always applied in the same order.
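For reference, a deterministic cascade can be sketched by sorting rules on specificity and then on source order before merging. This is an illustration of the expected behaviour, not xhtml2pdf's implementation; whether the flipping comes from iterating an unordered container is an assumption, and `Rule` and `cascade` are hypothetical names.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    selector: str
    specificity: tuple   # (ids, classes, elements)
    order: int           # position in the stylesheet
    declarations: dict


def cascade(rules):
    """Merge declarations so that higher specificity, then later order, wins."""
    merged = {}
    for rule in sorted(rules, key=lambda r: (r.specificity, r.order)):
        merged.update(rule.declarations)
    return merged


# Deliberately listed out of order to show the result does not depend on it.
rules = [
    Rule("#content", (1, 0, 0), 1, {"border": "2px solid red"}),
    Rule("#content", (1, 0, 0), 0, {"border": "2px solid blue"}),
]
print(cascade(rules))  # {'border': '2px solid red'}
```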

'module' object has no attribute 'CreatePDF', version - 3.0.33

When I try to use the CreatePDF function, I get an AttributeError:

>>> xhtml2pdf.CreatePDF('<h1>Test</h1>', file('test.pdf', 'wb'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'CreatePDF'

But I can create a PDF using the document.pisaDocument function

Not working with non-latin chars

Please download http://dl.dropbox.com/u/137209/pisa_problem.zip
The test.html is a trivial UTF-8 encoded valid XHTML Strict
The test.pdf is the result of

pisa --encoding utf-8 test.html

(the same happens when not specifying the encoding)

2 screenshots show the source opened in the text editor, and the result opened in Preview

Generated PDF is damaged and is unable to open

I'm trying to narrow down why the PDFs I create can't be opened properly. Is this happening for anyone else? I'm using the recipe below, which is the same as the one in test/simple.py. pdf.err returns "0", which I assume means that there are no errors, but when loading the .pdf file Adobe Acrobat says that it is damaged and cannot be opened.

pdf = pisa.CreatePDF(
    cStringIO.StringIO(data),
    file(dest, "wb")
    )

if pdf.err:
    dumpErrors(pdf)
else:
    pisa.startViewer(dest)

pdf-frame-content not (always) working

I have an HTML document that defines two extra frames, a header and a footer. In the HTML body I define two div elements, one with id "header" and one with id "footer", and in the CSS section I set them to show up on every page.

For some reason only the footer, and the regular content, is shown in the resulting PDF. The header is empty.

Here is the HTML:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
        <style>
            @page {
                size:a4;

                @frame {
                    top:1.5cm;
                    left:0.5cm;
                    right:0.5cm;
                    bottom:1.5cm;
                }

                @frame header {
                    -pdf-frame-content:header;
                    top:0.5cm;
                    left:0.5cm;
                    right:0.5cm;
                    height:0.5cm;
                }

                @frame footer {
                    -pdf-frame-content:footer;
                    bottom:0.5cm;
                    left:0.5cm;
                    right:0.5cm;
                    height:0.5cm;
                }
            }

            img {
                zoom: 80%;
            }

            .report-title {
                text-align:center;
            }

            .footer-left {
                text-align:left;
            }

            .footer-center {
                text-align:center;
            }

            .footer-right {
                text-align:right;
            }

        </style>
        <title>
            Document Title
        </title>
    </head>

    <body>
        <div id="header">
            <h1 class="report-title">
            Header Titel
            </h1>
        </div>

        <div id="content">
            <h1 class="report-title">
            Content Title 1
            </h1>

            <h2>
            Content Title 2
            </h2>

            <h2>
            Content Title 3
            </h1>

            <p>
                This is a test content
            </p>
        </div>

        <div id="footer">
            <table>
                <tr>
                    <td class="footer-left">
                        Footer left
                    </td>
                    <td class="footer-center">
                        Footer center
                    </td>
                    <td class="footer-right">
                        <pdf:pagenumber/>
                    </td>
                </tr>
            </table>
        </div>
    </body>
</html>

Recent changes broke my test case

I can't say which one is the problem, but some recent change (maybe some optimization) causes my test case to fail, producing a malformed PDF.

`LayoutError: Flowable too large` when embedding large images

Hello, I have a lot of issues with content inside a table that does not fit on the page, like this:

Flowable <PmlTable@0x091C87EC 4 rows x 5 cols(tallest row 713)> with cell(0,0) containing
'<PmlKeepInFrame at 0x91c89ec> size= maxWidth=188.1878x maxHeight=756.8504'(538.582677165 x 764.999997), tallest cell 714.0 points, too large on page 2 in frame 'body'(538.582677165 x 756.850393701*) of template 'body'

So how difficult would it be for xhtml2pdf to support CSS max-width and max-height, so that one can hide what does not fit?
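As an interim workaround, large images can be scaled down before they reach the table. Note that this is scaling rather than the clipping the request describes. `fit_within` is a hypothetical helper, not an xhtml2pdf function.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down to fit the box, preserving aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Use the tighter of the two constraints and never enlarge the image.
    factor = min(max_width / width, max_height / height, 1.0)
    return width * factor, height * factor
```

For the error above, the limits would be the frame size reported in the message (538.58 x 756.85 points), with the resulting dimensions written into the image's width and height attributes.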

landscape output

It would be great to have a -O / --orientation flag for specifying landscape output.

test-css-border functional test fails

It seems the CSS border has problems cascading the following to the frame: border-top: 12px solid blue; The color gets lost in translation, somehow.
See : https://www.tribaal.org/images/test-css-border.png

wsgi demo requirement

After setting up a virtual env and pip installing all packages in the requirements.xml, the demo/wsgi borks with this error.

(wsgi)mich@xgray:~/dev/xhtml2pdf/demo/wsgi (master)$ python pisawsgidemo.py
Traceback (most recent call last):
File "pisawsgidemo.py", line 40, in
import sx.pisa3.pisa_wsgi as pisa_wsgi
ImportError: No module named sx.pisa3.pisa_wsgi

<pdf:pagecount> issues with multiple templates

In my templates I'm using <pdf:pagenumber> of <pdf:pagecount>, but this produces weird results when combined with multiple templates. There are three templates for a total of three pages, and they are selected with <pdf:nexttemplate name="second">. The page numbers are shown using -pdf-frame-content: footer; which is the same in all the templates. However, the page numbering shows like this:

Page 1 of 1
Page 2 of 2
Page 3 of 3

When I drop the third template, so page 2 and 3 have the same template, it shows like this:

Page 1 of 1
Page 2 of 3
Page 3 of 3

`LayoutError: Splitting Error` on some line breaks

With some documents I get a LayoutError:

Splitting error(n==1) on page 16 in
<PmlParagraphAndImage at 0xb36cc20cL frame=contentPage>...

This seems to depend on where page breaks fall, since most very similar documents work. The offending pieces of the input HTML are a bunch of paragraphs with images:

<p><img src="foo.png" hspace="5" vspace="5" align="left" width="200" />blah blah blah</p>

No special CSS applies to them, and no special page break attributes afaik.

Is there any way to prevent this error? Get it to open a new page when this happens, for example?

How to implement setCharSpace for letter-spacing

ReportLab in the documentation:
http://www.reportlab.com/docs/reportlab-userguide.pdf

has this method:

textobject.setCharSpace(charSpace)

I am trying to implement it, but even when I hardcode tx.setCharSpace(10) at line 1618 of reportlab_paragraph.py, right before it calls canvas.drawText(tx), I still don't see it take effect. Chris, maybe you have some better ideas?

ZeroDivisionError: float division by zero

Hi,
I get this error while trying to parse an HTML document containing the following piece of code.
I'm using the latest versions of all packages needed:

html5lib-0.90
pyPdf-1.13
reportlab-2.5
xhtml2pdf-0.0.3

and Python 2.7 (2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)])

Python Code:
-[
import cStringIO as StringIO
from xhtml2pdf import pisa
....

html = '''
<TABLE BORDER="0" CELLPADDING="2" CELLSPACING="2">
<TR>
<TD></TD>
</TR>
</TABLE>
'''
dest = file('test.pdf', "wb")
pdf = pisa.CreatePDF(
    StringIO.StringIO(html),
    dest,
    log_warn = 1,
    log_err = 1
)
]-

Note: If I put something inside the TD (for example "<TD>some stuff</TD>") or change the value of the cellpadding attribute, it works!

Traceback:
-[
Traceback (most recent call last):
  File "C:\tmp\test.py", line 95, in <module>
    log_err = 1
  File "C:\Python27\lib\site-packages\xhtml2pdf\document.py", line 131, in pisaDocument
    doc.build(context.story)
  File "C:\Python27\lib\site-packages\reportlab\platypus\doctemplate.py", line 880, in build
    self.handle_flowable(flowables)
  File "C:\Python27\lib\site-packages\reportlab\platypus\doctemplate.py", line 763, in handle_flowable
    if frame.add(f, canv, trySplit=self.allowSplitting):
  File "C:\Python27\lib\site-packages\reportlab\platypus\frames.py", line 174, in _add
    flowable.drawOn(canv, self._x + self._leftExtraIndent, y, _sW=aW-w)
  File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 108, in drawOn
    self._drawOn(canvas)
  File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 89, in _drawOn
    self.draw()#this is the bit you overload
  File "C:\Python27\lib\site-packages\reportlab\platypus\tables.py", line 1302, in draw
    self._drawCell(cellval, cellstyle, (colpos, rowpos), (colwidth, rowheight))
  File "C:\Python27\lib\site-packages\reportlab\platypus\tables.py", line 1393, in _drawCell
    w, h = self._listCellGeom(cellval,colwidth,cellstyle,W=W, H=H,aH=rowheight)
  File "C:\Python27\lib\site-packages\xhtml2pdf\xhtml2pdf_reportlab.py", line 710, in _listCellGeom
    return Table._listCellGeom(self, V, w, s, W=W, H=H, aH=aH)
  File "C:\Python27\lib\site-packages\reportlab\platypus\tables.py", line 377, in _listCellGeom
    vw, vh = v.wrapOn(canv, aW, aH)
  File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 119, in wrapOn
    w, h = self.wrap(aW,aH)
  File "C:\Python27\lib\site-packages\xhtml2pdf\xhtml2pdf_reportlab.py", line 693, in wrap
    return KeepInFrame.wrap(self, availWidth, availHeight)
  File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 970, in wrap
    W, H = func(s1)
  File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 951, in func
    W /= x
ZeroDivisionError: float division by zero
]-
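
Since the note above reports that a non-empty cell avoids the error, one possible workaround is to pad empty cells before handing the HTML to pisa.CreatePDF(). The following is only a minimal sketch and not part of xhtml2pdf: pad_empty_cells is a hypothetical helper, the regular expression only handles cells with nothing but whitespace between the tags, and it assumes that a non-breaking space counts as "something inside the TD".

```python
import re

# Matches <td> or <th> elements (any case, any attributes) whose body is
# empty or whitespace only.
EMPTY_CELL = re.compile(r"(<t[dh]\b[^>]*>)\s*(</t[dh]\s*>)", re.IGNORECASE)


def pad_empty_cells(html, filler="&nbsp;"):
    """Return html with filler inserted into every empty table cell."""
    return EMPTY_CELL.sub(lambda m: m.group(1) + filler + m.group(2), html)


html = '<TABLE CELLPADDING="2"><TR><TD></TD></TR></TABLE>'
print(pad_empty_cells(html))
```

The padded string can then be wrapped in StringIO and passed to the converter exactly as in the reproduction above.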

Thanks for your great job,
Shen139

Issue with drawing background image for landscape page orientation.

Generation of the background image for landscape page orientation is currently broken.

ZeroDivisionError Caused by Empty Table Rows

If the HTML you send to pisa.CreatePDF() contains an empty table row (for example "<tr></tr>"), a ZeroDivisionError exception is thrown. The traceback ends in reportlab/platypus/flowables.py in func, line 951. Removing the empty table row resolves the error.
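
The workaround of removing empty rows could be applied automatically before conversion. This is a minimal sketch, assuming the empty rows contain only whitespace between the opening and closing tags; drop_empty_rows is a hypothetical helper and not part of xhtml2pdf.

```python
import re

# Matches a <tr> element (any case, any attributes) with no cells inside.
EMPTY_ROW = re.compile(r"<tr\b[^>]*>\s*</tr\s*>", re.IGNORECASE)


def drop_empty_rows(html):
    """Return html with every cell-less table row removed."""
    return EMPTY_ROW.sub("", html)


print(drop_empty_rows("<table><tr></tr><tr><td>x</td></tr></table>"))
```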

/doc/usage.rst: close output file explicitly

Changed the import (see issue #93).
The output file is now opened and closed explicitly; otherwise it is not flushed immediately.

import xhtml2pdf.pisa                         # (1)

def helloWorld():
    filename = __file__ + ".pdf"              # (2)
    output = open(filename, "wb")
    pdf = xhtml2pdf.pisa.CreatePDF(           # (3)
        "Hello World",
        output)
    output.close()                            # flush the PDF to disk before viewing

    if not pdf.err:                           # (4)
        xhtml2pdf.pisa.startViewer(filename)  # (5)

if __name__ == "__main__":
    xhtml2pdf.pisa.showLogging()              # (6)
    helloWorld()
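
A context manager is another way to make sure the output file is closed, and therefore flushed, before it is inspected or opened in a viewer. The sketch below assumes a callable with the same shape as pisa.CreatePDF (source first, dest as a keyword argument); render_to_file and the stand-in fake_create_pdf are hypothetical names, used only so the example runs without xhtml2pdf installed.

```python
import os
import tempfile


def render_to_file(html, filename, create_pdf):
    """Write the converter's output to filename and close the file on exit."""
    with open(filename, "wb") as output:
        # The file is closed (and flushed) when the with block ends,
        # even if create_pdf raises.
        status = create_pdf(html, dest=output)
    return status


def fake_create_pdf(src, dest):
    """Stand-in for pisa.CreatePDF so the sketch has no dependencies."""
    dest.write(b"%PDF-1.4 placeholder for " + src.encode("utf-8"))
    return "ok"


path = os.path.join(tempfile.mkdtemp(), "hello.pdf")
print(render_to_file("Hello World", path, fake_create_pdf))
print(os.path.getsize(path))
```

With xhtml2pdf installed, passing pisa.CreatePDF as create_pdf should work the same way, with pdf.err checked on the returned object as in the snippet above.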

new keep-in-frame seems to fail functional tests

While it seems to work, the functional tests report that the scaling doesn't actually occur (see the visual diff here: https://www.tribaal.org/images/keep-in-frame.png).

Silent error if link_callback raises an exception

I check the path in pisaDocument.link_callback and raise an AssertionError, but it is silently ignored :(

This happens even though raise_exception is True (the default).
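
Until the exception propagates on its own, one way to avoid losing it is to record it inside the callback and check afterwards. This is a minimal sketch: RecordingCallback and check_path are hypothetical names, and the only assumption about xhtml2pdf is that link_callback is called with the URI and the relative base as its two arguments.

```python
class RecordingCallback:
    """Wrap a link_callback and remember every exception it raises."""

    def __init__(self, callback):
        self.callback = callback
        self.errors = []  # list of (uri, exception) pairs

    def __call__(self, uri, rel):
        try:
            return self.callback(uri, rel)
        except Exception as exc:
            self.errors.append((uri, exc))
            raise  # keep the original behaviour for the caller


def check_path(uri, rel):
    """Example callback that rejects anything outside /static/."""
    if not uri.startswith("/static/"):
        raise AssertionError("unexpected path: %s" % uri)
    return uri


callback = RecordingCallback(check_path)
try:
    callback("/etc/passwd", "")
except AssertionError:
    pass  # a converter that swallows the error would end up here
print(len(callback.errors))
```

After the conversion, a non-empty callback.errors list shows that the callback failed even if the converter reported nothing.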

Add support for `page-break-before`/`-after` CSS property

When I export a PDF, the heading ends up on one page and its paragraphs on the next. http://www.w3.org/wiki/CSS/Properties/page-break-after

Pillow instead of PIL please?

I would love to see the requirements.txt mention pillow instead of PIL.

https://github.com/python-imaging/Pillow

`LayoutError: Flowable too large` when embedding large images

If the HTML contains two or more images whose combined height exceeds the PDF page height, ReportLab crashes. I don't know where the ReportLab issue tracker is, so I'm reporting it here; maybe someone can fix this bug.

Here is an example of HTML that triggers the crash:

<img src="http://escalibro.com/static/img/logo.png" height="500"><br>
<img src="https://secure.gravatar.com/avatar/f8f0734a3d7563e5504433dbef483472/?s=900" height="675">

Python code:

pdf = pisa.pisaDocument(_content, pdf_file, raise_exception=False)

Traceback:

  File "/mnt/storage/projects/escalibro/project/apps/poetry/utils/html2pdf.py", line 9, in convert
    pdf = pisa.pisaDocument(_content, pdf_file, raise_exception=False)
  File "/mnt/storage/projects/escalibro/env/local/lib/python2.7/site-packages/xhtml2pdf/document.py", line 131, in pisaDocument
    doc.build(context.story)
  File "/mnt/storage/projects/escalibro/env/local/lib/python2.7/site-packages/reportlab/platypus/doctemplate.py", line 888, in build
    self.handle_flowable(flowables)
  File "/mnt/storage/projects/escalibro/env/local/lib/python2.7/site-packages/reportlab/platypus/doctemplate.py", line 801, in handle_flowable
    raise LayoutError(ident)
LayoutError: Flowable <PmlParagraph at 0x4dd9998 frame=body>(538.582677165 x 786.005338766) too large on page 2 in frame 'body'(538.582677165 x 785.196850394*) of template 'body'

Here is the PDF produced when the height of the second image is changed from 675px to 670px: http://bb.escalibro.com/other/reportlab_bug.pdf
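
A possible workaround, suggested by the fact that a slightly smaller height converts cleanly, is to scale images down so they fit the frame before rendering. The helper below is a hypothetical sketch and plain arithmetic: the frame size is taken from the error message above, the image sizes are made up for illustration, and it ignores both the conversion between HTML pixels and PDF points and any extra space the enclosing paragraph adds.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down, keeping the aspect ratio, to fit the box.

    Sizes that already fit are returned unchanged (never scaled up).
    """
    scale = min(1.0, max_width / width, max_height / height)
    return width * scale, height * scale


# Frame size reported in the LayoutError message (in points).
FRAME_WIDTH, FRAME_HEIGHT = 538.582677165, 785.196850394

print(fit_within(900, 675, FRAME_WIDTH, FRAME_HEIGHT))
print(fit_within(300, 200, FRAME_WIDTH, FRAME_HEIGHT))
```

The resulting values could be written back into the width and height attributes of the img tags, leaving some margin for the paragraph that wraps them.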

PIL or Pillow?

I'm using Pillow (the excellent fork of PIL) instead of PIL. However, setup.py lists PIL as a requirement, so pip constantly tries to install PIL even though Pillow is available. xhtml2pdf doesn't seem to care whether Pillow or PIL is used (both work), but I don't want both on my system. In other projects, neither PIL nor Pillow is listed in setup.py, to give end users freedom of choice. Would you consider changing the project's requirements?

requirements.txt not in distrib package.

requirements.xml is not distributed in the downloadable package:
http://pypi.python.org/pypi/xhtml2pdf/

https://github.com/chrisglass/xhtml2pdf:

"file requirements.xml July 14, 2011 added requirements file [kgrodzicki]"

...

"Next step will be to install/upgrade dependencies from requirements.txt file:

pip install -r requirements.txt"

Installation documentation incorrect, inaccurate, or out of date:

requirements.txt isn't distributed in
http://pypi.python.org/packages/source/x/xhtml2pdf/xhtml2pdf-0.0.3.tar.gz#md5=13b0d6059b72c994473fddfa7a528451
from http://pypi.python.org/pypi/xhtml2pdf/. http://pypi.python.org/pypi/xhtml2pdf/ documentation is identical to https://github.com/chrisglass/xhtml2pdf.
requirements.txt is actually requirements.xml.
requirements.xml isn't distributed in the above linked package.
requirements.xml doesn't contain information parsable by the above listed installation method.

Unknown page size should raise an exception instead of showing blank pages

Hi there!

Thanks for the amazing application!!

When running xhtml2pdf -d -w test.html test.pdf I get this warning:

WARNING [ho.pisa] (...)pisa3/pisa_context.py line 279: Unknown size value for @page

...however, this should definitely be an exception as the actual result seems to be blank pages. The html in the broken file said:

@page
{
      size: a4 portrait;
...

Changing it to this was the solution:

@page
{
      size: A4;
...

Installing Requirements Fails

Using Arch Linux. I have followed the README exactly.

$ pip install -r requirements.xml
Downloading/unpacking PIL==1.1.7 (from -r requirements.xml (line 1))
  Real name of requirement PIL is PIL
  Downloading PIL-1.1.7.tar.gz (506Kb): 506Kb downloaded
  Running setup.py egg_info for package PIL
    Traceback (most recent call last):
      File "<string>", line 14, in <module>
      File "/xhtml2pdf/xhtml2pdfenv/build/PIL/setup.py", line 182
        print "--- using Tcl/Tk libraries at", TCL_ROOT
                                            ^
    SyntaxError: invalid syntax
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):

  File "<string>", line 14, in <module>

  File "/xhtml2pdf/xhtml2pdfenv/build/PIL/setup.py", line 182

    print "--- using Tcl/Tk libraries at", TCL_ROOT

                                        ^

SyntaxError: invalid syntax

----------------------------------------
Command python setup.py egg_info failed with error code 1
Storing complete log in /.pip/pip.log