Comments (23)

coolwanglu commented on August 19, 2024

Right, I'll do that.

from pdf2htmlex.

iapain commented on August 19, 2024

Thanks 👍

coolwanglu commented on August 19, 2024

I cannot reproduce it; there should already be a newline character there.
Could you please provide an affected PDF file?

iapain commented on August 19, 2024

I can still reproduce it (only on mobile devices). The obvious fix is to introduce a space between the line divs. You may want to close this, as it might not be relevant for some use cases.

coolwanglu commented on August 19, 2024

I guess it's a browser bug then. As far as I remember, the browser should append a 'newline' after each div.

iapain commented on August 19, 2024

I'd encourage you to reopen this. As of Firefox 22, it can now be reproduced in both Firefox and WebKit-based browsers. To reproduce it, select multi-line text and paste it. The last word of one line runs into the first word of the next line.

coolwanglu commented on August 19, 2024

@iapain Let me try again.

coolwanglu commented on August 19, 2024

@iapain I cannot reproduce it, e.g. with demo.pdf, which is one of the demo files of pdf2htmlEX. Can you please provide an example PDF?

iapain commented on August 19, 2024

@coolwanglu See this attached screenshot. I tried this demo.pdf as well and I was able to repeat it.

Steps:

  • Select text across multiple lines (as shown in the image).
  • Paste this text. There will be no space between the last char of the first line and the first char of the next line, although there should be one.

[Screenshot: case_without_space]

coolwanglu commented on August 19, 2024

@iapain Which browser are you using?

iapain commented on August 19, 2024

@coolwanglu That screenshot is from Firefox 22 but I can also repeat it on Google Chrome 27

coolwanglu commented on August 19, 2024

@iapain As we've discussed before, this has always been the behaviour of Chrome.

For Firefox 22, I've just tested on Windows and Linux (Ubuntu). If you select the text and paste it into a multi-line text editor, you will see the line breaks. If you paste it into the location bar, the Linux version will consume all the line breaks, and the Windows version will convert them into spaces.

So I think it's how the location bar handles the line breaks, but the line breaks are there.

Can you please verify this?

iapain commented on August 19, 2024

@coolwanglu You're right about Firefox, but it still fails on both WebKit-based browsers and IE 9/10.

It looks like:
WebKit omits the newline char.
IE 9/10 preserves it as a newline (and if you paste it, it keeps the text before the newline char).
Firefox converts it to a space.

In my opinion we should unify this behaviour.
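
As a rough illustration of what a unified result could look like on the pasted text itself, here is a minimal Python sketch. The function name `normalize_pasted_text` and the sample strings are hypothetical and not part of pdf2htmlEX. It only covers the cases where a line break or a space survives the copy; when the browser drops the newline entirely (the WebKit case above), the word boundary is already lost and cannot be restored afterwards.

```python
import re


def normalize_pasted_text(text: str) -> str:
    """Turn each run of line breaks (CRLF, CR or LF) into a single space.

    Spaces and tabs directly around the line breaks are absorbed, and
    leading/trailing whitespace of the whole string is stripped.
    """
    collapsed = re.sub(r"(?:[ \t]*(?:\r\n|\r|\n)[ \t]*)+", " ", text)
    return collapsed.strip()


# Hypothetical clipboard contents for the same two-line selection.
newline_preserved = "end of the first line\nstart of the second line"
space_substituted = "end of the first line start of the second line"

# Both variants end up as the same single-spaced string.
print(normalize_pasted_text(newline_preserved) == normalize_pasted_text(space_substituted))
```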

coolwanglu commented on August 19, 2024

@iapain This is indeed an issue, and I'll reopen it. But I don't have a good solution right now.

iapain commented on August 19, 2024

@coolwanglu I will try to patch this bug.

coolwanglu commented on August 19, 2024

@iapain Thanks! Maybe we can discuss your solution before you implement it.

iapain commented on August 19, 2024

@coolwanglu A possible solution is to get rid of the newline char, since it has little or no influence in HTML, and substitute it with a proper HTML equivalent. Do you have a better idea?

coolwanglu commented on August 19, 2024

@iapain I think you can add a <br> there, but I'm not sure how you can get rid of the newline there.

iapain commented on August 19, 2024

@coolwanglu <br> is not really required. I was thinking about scanning each line and, if the line ends with a newline char, just replacing it with a space.
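
A minimal Python sketch of that scan-and-replace idea, assuming each text line is already available as a decoded string. The function name `replace_trailing_newline` is hypothetical; this is not pdf2htmlEX code, and it simply trusts that a trailing `\n` really is a line break.

```python
def replace_trailing_newline(line: str) -> str:
    """Return the line with one trailing line break swapped for a space."""
    # Check CRLF first so the carriage return is not left behind.
    if line.endswith("\r\n"):
        return line[:-2] + " "
    if line.endswith("\n"):
        return line[:-1] + " "
    # Lines without a trailing line break are returned unchanged.
    return line


lines = ["end of the first line\n", "start of the second line"]
print("".join(replace_trailing_newline(line) for line in lines))
```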

coolwanglu commented on August 19, 2024

@iapain We should never modify the content unless necessary. When you see a char 0xa, which is supposed to be a newline char, it might actually be something else, due to the evil encodings of the fonts. This is exactly the reason that characters can sometimes be lost due to --space-as-offset.

Also, afaik there are rarely actual line break characters in a PDF file; instead, text is simply repositioned with a PDF instruction.

iapain commented on August 19, 2024

@coolwanglu You're correct about that, but I found a much simpler way. It'd be great if you could test it with IE. I have tested it with Gecko and WebKit.

coolwanglu commented on August 19, 2024

@iapain No, it didn't work. I've tested on Firefox and Chrome on Windows.
As far as I remember, all whitespace between tags is ignored in HTML.

ficolo commented on August 19, 2024

How about using &nbsp; before every </div>? (It is a workaround, but it worked for me.)
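
A minimal Python sketch of this workaround as a post-processing step on the generated HTML. The function name `add_nbsp_before_div_close` and the sample markup are hypothetical, and the plain string replacement is deliberately naive: it touches every closing </div>, including container divs, so a real fix would need to target only the text-line divs.

```python
def add_nbsp_before_div_close(html: str) -> str:
    """Insert a non-breaking space entity before every closing </div> tag."""
    return html.replace("</div>", "&nbsp;</div>")


# Hypothetical two-line fragment, one div per text line.
sample = "<div>first line</div><div>second line</div>"
print(add_nbsp_before_div_close(sample))
```

With the entity in place, the copied text carries a space character at the end of each line even when the browser drops the line break between the divs.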
