Giter Club home page Giter Club logo

Comments (7)

gmischler avatar gmischler commented on August 17, 2024 2

Complicated yes, because there seems to be a conflict between the default value False and a False value being an unset value.

Ah I see, but then this is the underlying problem that we should really fix, instead of trying to work around it.

The default value for the skip_leading_spaces parameter of ParagraphCollectorMixin.paragraph() should not be False, but rather None. Then we test for skip_leading_spaces is None before applying the regions default. This will make sure that an argument explicitly given to the paragraph will always take precedence, whether it is True or False.

from fpdf2.

gmischler avatar gmischler commented on August 17, 2024 1

Thanks for reporting this, @dmail00.

It looks like you're right, and the space skipping feature conflicts with <pre> sections.
Your attempted solution looks way too complicated though.
Since a <pre> tag sets its own state flag before triggerring a new paragraph, a rather small change should do the trick:

        self._paragraph = self._column.paragraph(
            text_align=align,
            line_height=line_height,
            skip_leading_spaces=not self._pre_formatted,
            top_margin=top_margin,
            bottom_margin=bottom_margin,
        )

We could also use some more tests in this area to cover the various possible combinations of tags.

Would you be interested to submit a PR?

from fpdf2.

dmail00 avatar dmail00 commented on August 17, 2024

Here is my hack to fix this and boy is it a hack :)

in html._new_paragraph add a parameter with a default of None. In the method if the param is not None and not the same as the current region value, then created a new region and cached the current. Then when creating the paragraph after ending the current check this value to supply for the constructor

    def _new_paragraph(
        self, align=None, line_height=1.0, top_margin=0, bottom_margin=0,skip_leading_spaces=None
    ):
        self._end_paragraph()
        self.align = align or ""
        if not top_margin and not self.follows_heading:
            top_margin = self.font_size / self.pdf.k
        if skip_leading_spaces is not None and skip_leading_spaces !=self._column.skip_leading_spaces:
            self._cached_column = self._column
            self._column = self.pdf.text_columns(skip_leading_spaces=skip_leading_spaces)
        self._paragraph = self._column.paragraph(
            text_align=align,
            line_height=line_height,
            skip_leading_spaces=skip_leading_spaces if skip_leading_spaces is not None else True,
            top_margin=top_margin,
            bottom_margin=bottom_margin,
        )
        self.follows_trailing_space = True
        self.follows_heading = False

When handling the start pre tag supply the new parameter to _new_paragraph

def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.font_stack.append((self.font_face, self.font_size, self.font_color))
            self.set_font(self.pre_code_font, self.font_size)
            self._pre_formatted = True
            self._new_paragraph(skip_leading_spaces=False)

In endtag when handling the pre tag assert that there is a cached region and reset it to the default and clear out the cached member instance

def handle_endtag(self, tag):
        ...
        if tag == "pre":
            self._end_paragraph()
            assert self._cached_column is not None
            self._column = self._cached_column
            self._cached_column = None

from fpdf2.

dmail00 avatar dmail00 commented on August 17, 2024

Complicated yes, because there seems to be a conflict between the default value False and a False value being an unset value.
I tried something similar earlier however it does not work because firstly the paragraphs value will use the regions value

https://github.com/py-pdf/fpdf2/blob/master/fpdf/text_region.py#L328

And then even if it did, ._end_paragraph() calls self._column.render() which calls self.collect_lines() which calls paragraph.build_lines which calls MultiLineBreak which still defaults to the regions value if the paragraphs value is False.

https://github.com/py-pdf/fpdf2/blob/master/fpdf/text_region.py#L121

Hence why I swapped out the region and later swapped it back in. As I said it is a hack as I am really not sure the correct method to apply in a clean manner without messing other things up that rely on the False value in paragraph defaulting to the regions value.

When a pre tag is found in the method handle_starttag a new paragraph is created, no value is provided for skip_leading_spaces , however even if False was provided (which is the defeault) this would be ignored and it would default to the region value, which as I noted about is set sensibibly to False.
I did attempt to set the paragraphs value to False after calling _new_paragraph in the "pre" tag handling but this also has no effect. This is because when collect_lines calls paragraph.build_lines the paragraphs value is set to False but it will then default to the regions value.

from fpdf2.

dmail00 avatar dmail00 commented on August 17, 2024

Here is the same hack yet more confined to the _end_paragraph method and using @gmischler suggestion.
Still ugly as sin, brittle and doesn't resolve the underlying issue.

 self._paragraph = self._column.paragraph(
            text_align=align,
            line_height=line_height,
            skip_leading_spaces=not self._pre_formatted,
            top_margin=top_margin,
            bottom_margin=bottom_margin,
        )
    def _end_paragraph(self):
        self.align = ""
        if self._paragraph:
            self._column.end_paragraph()
            our_context = (
                self.pdf._pop_local_stack()  # pylint: disable=protected-access
            )
            self._column.render()
            self.pdf._push_local_stack(our_context)  # pylint: disable=protected-access
            self._paragraph = None
            self.follows_trailing_space = True
        if self._pre_formatted:
            if self._pre_started:
                #exited the pre block reset the column
                assert self._cached_column #type: ignore
                self._column = self._cached_column #type: ignore
                self._cached_column = None
            else:
                #next paragraph will be pre contents
                self._cached_column = self._column
                self._column = self.pdf.text_columns(skip_leading_spaces=False)

from fpdf2.

dmail00 avatar dmail00 commented on August 17, 2024

@gmischler Have you actually tried that code?
As I have mentioned now three times that only handles one part of the problem, what about in build_lines where even if the variable is false (as is the default for the class) it defaults to the region value?

from fpdf2.

gmischler avatar gmischler commented on August 17, 2024

Have you actually tried that code?

No, I just tried to explain the concept.
If you find other places where things are not handled correctly, then of course those should also be fixed.
After the correct argument values or defaults have been picked when creating it, a paragraph should indeed just use its own value in build_lines(), and not refer to the regions settings anymore.

Skipping leading spaces was a relatively late and quick addition to the text region and line wrapping code, for the benefit of the HTML parser. At that time I simply made sure that the existing tests kept giving the correct results (in most cases: more correct than before). Unfortunately the existing tests didn't contain any leading whitespace in a <pre> segment, so I didn't catch this special case. Now that you have identified the problem, if you manage to clean it up for good, more power to you!

from fpdf2.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.