Comments (23)
Right, I'll do that.
from pdf2htmlex.
Thanks!
I cannot reproduce it; there should already be a newline character there.
Could you please provide an affected PDF file?
I can still reproduce it (only on mobile devices). An obvious fix is to introduce a space between the line divs. You may want to close it, since it might not be relevant for some use cases.
I guess it's a browser bug then; as I remember, the browser should append a 'newline' after a div.
I'd encourage re-opening this. As of Firefox 22, it can now be reproduced in both Firefox and WebKit-based browsers. To reproduce it, select multiline text and paste it: the last word of one line runs into the first word of the next line.
@iapain Let me try again.
@iapain I cannot reproduce it with, e.g., demo.pdf, which is one of the demo files of pdf2htmlEX. Can you please provide an example PDF?
@coolwanglu See the attached screenshot. I tried demo.pdf as well and was able to reproduce it.
Steps:
- Select multiline text (as shown in the image).
- Paste the text. It won't have the space that should be between the last character of the first line and the first character of the next line.
@iapain Which browser are you using?
@coolwanglu That screenshot is from Firefox 22, but I can also reproduce it on Google Chrome 27.
@iapain As we've discussed before, this has always been the behaviour of Chrome.
For Firefox 22, I've just tested on Windows and Linux (Ubuntu). If you select the text and paste it into a multi-line text editor, you will see the line breaks, but if you paste it into the location bar, the Linux version drops all the line breaks and the Windows version converts them into whitespace.
So I think this is just how the location bar handles line breaks; the line breaks themselves are there.
Can you please verify this?
@coolwanglu You're right about Firefox, but it still fails on both WebKit-based browsers and IE 9/10.
It looks like:
- WebKit omits the newline character.
- IE 9/10 preserves it as a newline (and if you paste it, it keeps the text before the newline character).
- Firefox converts it to a space.
In my opinion we should unify this behaviour.
@iapain This is indeed an issue, and I'll reopen it. But I don't have a good solution right now.
@coolwanglu I will try to patch this bug.
@iapain Thanks! Maybe we can discuss your solution before you implement it.
@coolwanglu A possible solution is to get rid of the newline character, since in HTML it has little or no influence, and substitute a proper HTML equivalent. Do you have a better idea?
@iapain I think you can add a `<br>` there, but I'm not sure how you can get rid of the newline there.
@coolwanglu `<br>` is not really required. I was thinking about scanning each line and, if it ends with a newline character, just replacing that character with a space.
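The proposal can be sketched as a post-processing pass over the text of each line. This is a minimal sketch, assuming the converter has each line's text available as a Python string before it is written into its `<div>`; `replace_trailing_newlines` is a hypothetical name, not part of pdf2htmlEX. Note that it treats every trailing `0xa` as a real newline.

```python
from __future__ import annotations


def replace_trailing_newlines(lines: list[str]) -> list[str]:
    """Replace a newline character at the end of each line's text with a space.

    Hypothetical helper: 'lines' is assumed to hold the text of each output
    line before it is written into its <div>.
    """
    result = []
    for text in lines:
        if text.endswith("\n"):
            # Swap only the final character; anything earlier in the line is untouched.
            text = text[:-1] + " "
        result.append(text)
    return result


print(replace_trailing_newlines(["first line\n", "second line"]))  # prints ['first line ', 'second line']
```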
@iapain We should never modify the content unless necessary. When you see a character `0xa`, which is supposed to be a newline, it might actually be something else due to the evil encodings of the fonts; this is exactly why characters can sometimes be lost with `--space-as-offset`.
Also, as far as I know, PDF files rarely contain actual line break characters; instead, text is simply repositioned with a PDF instruction.
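To illustrate the repositioning point, here is a toy content stream in which two lines of text are placed with the `Td` (move to the start of the next line) and `Tj` (show a string) operators, plus a minimal sketch that recovers the line break from the vertical offset. It is a toy: it only handles `tx ty Td` and simple `(literal) Tj` strings without escapes, while real PDFs also use `TJ`, `T*`, `Tm` and others; `text_runs` and `to_plain_text` are hypothetical helper names.

```python
from __future__ import annotations

import re

# Toy content stream: two lines of text placed with Td; it contains no newline character.
STREAM = "BT /F1 12 Tf 72 720 Td (First line) Tj 0 -14 Td (Second line) Tj ET"

# Matches either "tx ty Td" or "(literal) Tj" (no escapes or nested parentheses).
TOKEN = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td|\(([^()]*)\)\s*Tj")


def text_runs(stream: str) -> list[tuple[float, str]]:
    """Return (y, text) for each Tj, tracking y through successive Td moves."""
    y = 0.0
    runs = []
    for m in TOKEN.finditer(stream):
        if m.group(3) is not None:
            runs.append((y, m.group(3)))
        else:
            # Td moves to the start of the next line, offset by (tx, ty).
            y += float(m.group(2))
    return runs


def to_plain_text(runs: list[tuple[float, str]]) -> str:
    """Join runs, inserting a newline whenever the vertical position changes."""
    parts = []
    last_y = None
    for y, text in runs:
        if last_y is not None and y != last_y:
            parts.append("\n")
        parts.append(text)
        last_y = y
    return "".join(parts)


print(repr(to_plain_text(text_runs(STREAM))))  # prints 'First line\nSecond line'
```

The newline in the output is inferred by the consumer from the change in `y`; it never appears in the stream itself.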
@coolwanglu You're correct about that, but I found a much simpler way. It'd be great if you could test it with IE. I have tested it with Gecko and WebKit.
@iapain No, it didn't work. I've tested it on Firefox and Chrome on Windows.
I remember that whitespace between tags is ignored in HTML.
How about adding `&nbsp;` before every `</div>`? (It is a workaround, but it worked for me.)
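The `&nbsp;` workaround can be applied as a post-processing step on the generated HTML. This is a minimal sketch, assuming each text line is a `<div>` whose class attribute starts with the token `t` (as in pdf2htmlEX output such as `class="t m0 x0 ..."`) and that line divs contain no nested `<div>`; `add_nbsp_to_lines` is a hypothetical helper, and a regular expression is not a general HTML parser.

```python
import re

# Assumption: a text line is <div class="t ...">...</div> with no nested <div>.
LINE_DIV = re.compile(r'(<div class="t\b[^"]*">)(.*?)(</div>)', re.DOTALL)


def add_nbsp_to_lines(html: str) -> str:
    """Insert &nbsp; just before the closing tag of every text-line div."""
    return LINE_DIV.sub(lambda m: m.group(1) + m.group(2) + "&nbsp;" + m.group(3), html)


page = ('<div class="pc"><div class="t m0 x0">Hello<span class="_ _0"></span>world</div>'
        '<div class="t m0 x1">next</div></div>')
print(add_nbsp_to_lines(page))
```

Only the line divs are touched; the enclosing page container keeps its closing tag as-is. The inserted `&nbsp;` is real text, so it becomes part of the copied selection as well.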