tavinus avatar tavinus commented on August 16, 2024

Hi, I am not sure I understand everything, but I will try to explain what I can.

First, some questions:

  • What version of pdfScale are you using?
  • Can you post the verbose output of your run here (as code pls)?
  • Can you provide me with a PDF file example so I can try doing the same here?
  • Can you provide me with the resulting PDF that didn't go as expected?

You are running in mixed mode, resizing the paper and then scaling down.

1st Step - Resize

  • Always runs before scaling
  • Fits to page
    • Will resize paper and reposition the content
    • If the new paper has the same proportions, things will look the same
    • If the new paper has different proportions, the fitting of the contents will change
    • You can also disable the fit-to-page setting with --no-fit-to-page
  • By default it will try to detect if you are flipping portrait/landscape and correct it; you may disable this with -f disable or -f d
  • By default it will run ghostscript auto-rotation detection in auto mode, you can also disable this with -a none or -a n
  • You can also position things manually with the vertical/horizontal alignment settings and the x/y offset settings. Here is the help description for them:
 --hor-align, --horizontal-alignment <left|center|right>
             Where to translate the scaled page
             Default: center
             Options: left, right, center
 --vert-align, --vertical-alignment <top|center|bottom>
             Where to translate the scaled page
             Default: center
             Options: top, bottom, center
 --xoffset, --xtrans-offset <FloatNumber>
             Add/Subtract from the X translation (move left-right)
             Default: 0.0 (zero)
             Options: Positive or negative floating point number
 --yoffset, --ytrans-offset <FloatNumber>
             Add/Subtract from the Y translation (move top-bottom)
             Default: 0.0 (zero)
             Options: Positive or negative floating point number

2nd Step - Scale

  • Never affects page size, whatever came from step 1, will be kept
  • Will zoom contents inside the page, which may bleed "outside" the page

On MacOSX, there may be a problem if you are trying to process a PDF file that was just created by another script step and does not yet have Spotlight's metadata. This happens when mdls is used for size detection, because mdls reads that metadata. It only happens if the PDF was created milliseconds before.

You can force another method of size detection to make sure this is not a problem though. Try installing (from homebrew) imagemagick or xpdf and using -m i or -m p to force using one of them.

Next version will have ghostscript detection, so this should not be a problem anymore.


Let me know if this helps.
Cheers! 🍺
Gus

from pdfscale.

tavinus avatar tavinus commented on August 16, 2024

BTW, postscript points are integers.
They should be rounded to integers even if you use floating point numbers (from what I recall).
I remember that Ghostscript would not like to receive Points as floats.

Can you try it using integers for custom paper size? Or use metric.
Can you also try using a pre-defined paper size like letter or A4?
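
To see which whole-point size a metric custom size works out to, here is a minimal sketch of the conversion (1 pt = 1/72 inch, 1 inch = 25.4 mm). The helper name mm_to_pt is hypothetical and not part of pdfScale; the rounding to the nearest integer is an assumption, although it agrees with the 658 x 865 pts that the verbose log reports for 'custom mm 232 305'.

```shell
# Hypothetical helper: convert millimetres to whole PostScript points.
# 1 pt = 1/72 inch and 1 inch = 25.4 mm, so pt = mm * 72 / 25.4.
mm_to_pt() {
    awk -v mm="$1" 'BEGIN { printf "%.0f\n", mm * 72 / 25.4 }'
}

mm_to_pt 232   # prints 658
mm_to_pt 305   # prints 865
```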

EDIT3: Does the merged (input) PDF maybe have a different paper size on each page?


Cellomaster87 avatar Cellomaster87 commented on August 16, 2024


tavinus avatar tavinus commented on August 16, 2024

Hi, there are no links in your post.

Seems like the upgrade is broken on MacOSX because it uses BSD's readlink instead of GNU's.
I have already found a bash implementation to replace it, which will come in the next version. This is related to #17
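
For reference, here is a minimal sketch of one common way to resolve a symlinked script path without readlink -f. It is not necessarily the implementation that will ship with pdfScale; the function name resolve_path is hypothetical, and it only relies on plain readlink, cd and pwd -P, which exist on both BSD and GNU systems.

```shell
# Hypothetical stand-in for "readlink -f": follow symlinks one hop at a time.
# The body runs in a subshell ( ... ) so the caller's working directory is untouched.
resolve_path() (
    CDPATH=                                   # keep cd from printing or searching elsewhere
    target=$1
    cd "$(dirname "$target")" || exit 1
    target=$(basename "$target")
    while [ -L "$target" ]; do                # still a symlink: read where it points
        target=$(readlink "$target")
        cd "$(dirname "$target")" || exit 1   # relative links resolve from the link's folder
        target=$(basename "$target")
    done
    printf '%s/%s\n' "$(pwd -P)" "$target"    # physical directory + final file name
)

# Usage: resolve_path "$0"
```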

To force imagemagick, just add -m i to your call.

You can paste images here, so maybe the screenshots would help me understand the results.

It would be nice to have the actual PDFs as well though.


Cellomaster87 avatar Cellomaster87 commented on August 16, 2024


tavinus avatar tavinus commented on August 16, 2024

Yep, now I got it.
Downloaded and checking.


tavinus avatar tavinus commented on August 16, 2024

Still investigating, but this is what I have so far:

  • File Two.pdf does have some weird stuff coming with the /Mediabox definition (from grep).
  • Even though this causes an error, the page size seems to be processed accordingly.
  • Using imagemagick (or pdfinfo) will solve this problem (add -m i to call).

The verbose run of Two.pdf

$ pdfscale -v -r 'custom mm 232 305' -s 0.985 Two.pdf
pdfscale v2.4.9 - Verbose Execution
   Mixed Tasks: Resize & Scale
       Dry-Run: FALSE
    Input File: Two.pdf
   Output File: Two.CUSTOM.SCALED.pdf
 Get Page Size: Adaptive Enabled
        Method: Grep
/usr/local/bin/pdfscale: line 1497: warning: command substitution: ignored null byte in input
  Source Width: 595 postscript-points
 Source Height: 842 postscript-points
    Print Mode: Print ( auto/empty )
   Fit To Page: Enabled (default)
   Auto Rotate: PageByPage
   Flip Detect: No change needed
  Run Resizing: CUSTOM ( 658 x 865 ) pts
     New Width: 658 postscript-points
    New Height: 865 postscript-points
  Scale Factor: 0.985
    Vert-Align: CENTER
     Hor-Align: CENTER
 Translation X: 5.01 = 5.01 + 0.00 (offset)
 Translation Y: 6.59 = 6.59 + 0.00 (offset)
   Run Scaling: -1 %
    Background: No background (default)
  Final Status: File created successfully

The error is here:

/usr/local/bin/pdfscale: line 1497: warning: command substitution: ignored null byte in input

But as mentioned the page size is parsed correctly and the execution seems to proceed without problems.

This is what the grep call returns on One.pdf and Two.pdf

One.pdf

$ grep -a -e '/MediaBox' -m 1 ./One.pdf
/MediaBox [0 0 595.000000 842.000000]

Two.pdf

$ grep -a -e '/MediaBox' -m 1 ./Two.pdf
ðV ù(Õ��çKp       �a§��uV4L��ò×ç]áÐ�Àxú©AÖ0�àt~îSD?�NT�Äg¢jO�§|®I�O|C|%´�áÑu?k�Óºá�º�òÛ JÀz�È_H/üÛ
<</Contents[1422 0 R 1423 0 R 1424 0 R 1425 0 R 1426 0 R 1427 0 R 1428 0 R 1430 0 R]/CropBox[0 0 595.2756 841.8898]/MediaBox[0 0 595.2756 841.8898]/Parent 1400 0 R/Resources 1437 0 R/Rotate 0/T<</Filter/FlateDecode/First 72/Length 642/N 8/Type/ObjStm>>stream

Those weird chars (binary data on the same line as the match) are what cause the parsing problems.
Maybe I can run it through a pipe with strings or cat to mitigate the problem, e.g.:

$ strings Two.pdf | grep -e '/MediaBox' -m 1
<</Contents[1422 0 R 1423 0 R 1424 0 R 1425 0 R 1426 0 R 1427 0 R 1428 0 R 1430 0 R]/CropBox[0 0 595.2756 841.8898]/MediaBox[0 0 595.2756 841.8898]/Parent 1400 0 R/Resources 1437 0 R/Rotate 0/Type/Page>>

But as mentioned, you can use -m i to solve this as well.
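
As an alternative sketch to the strings pipe, grep -o prints only the matched text, so binary bytes elsewhere on the line never reach the shell. This is not what pdfScale does (the fix mentioned in this thread uses strings); the function name mediabox_size is hypothetical, and it only sees a /MediaBox that is stored uncompressed in the file.

```shell
# Hypothetical helper: print "width height" (rounded points) of the first /MediaBox.
mediabox_size() {
    LC_ALL=C grep -a -o '/MediaBox *\[[^]]*\]' "$1" |  # keep only the matched box, not the whole line
        head -n 1 |                                     # first match only
        sed -e 's/.*\[//' -e 's/\]//' |                 # strip "/MediaBox[" and "]"
        awk '{ printf "%.0f %.0f\n", $3 - $1, $4 - $2 }' # urx - llx, ury - lly
}

# Usage: mediabox_size Two.pdf
```

On a line like the one grep returned for Two.pdf, this would report 595 842, the same source size shown in the verbose run.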

However, this does not seem to be the problem, since the page size is parsed correctly (even with the error).

Please note how complex the second PDF definition is and how it has a lot more stuff than the other file has. I would guess that these other things are interfering with the result.


Honestly, I am still not 100% sure I understand what the problem is.
It is a bit confusing, but it seems like the proportions of the original file are maintained, right?

The resulting MediaBox size seems to be correct, so pdfScale seems to be working properly, but the CropBox seems to be keeping the original proportion and that is what ends up rendering on screen.

I am not sure why you have a cropbox defined. From what I understand, that is used in pre-press to define a page with a bleed. So they can print it a bit bigger than the actual needed size and then cut the excess later (for a better finishing and no borders).

So maybe you can configure the Acrobat merger so that it does not define a cropbox?
I would try to tinker with the merger options to see if it makes any difference.

Here are some explanations on the PDF boxes:

Anyways, let me know if this helps.
I recommend using Lightshot to create screenshots (copy to memory) and then you can just paste them here (ctrl + V). You can save the image and drag+drop here as well.


While writing this I made a few more tests and got some new info:

Example run

$ pdfscale -m i -v -r 'custom mm 232 305' -s 0.985 Two.pdf
Checking for imagemagick's identify
pdfscale v2.4.9 - Verbose Execution
   Mixed Tasks: Resize & Scale
       Dry-Run: FALSE
    Input File: Two.pdf
   Output File: Two.CUSTOM.SCALED.pdf
 Get Page Size: Adaptive Disabled
        Method: ImageMagick's Identify
  Source Width: 595 postscript-points
 Source Height: 842 postscript-points
    Print Mode: Print ( auto/empty )
   Fit To Page: Enabled (default)
   Auto Rotate: PageByPage
   Flip Detect: No change needed
  Run Resizing: CUSTOM ( 658 x 865 ) pts
     New Width: 658 postscript-points
    New Height: 865 postscript-points
  Scale Factor: 0.985
    Vert-Align: CENTER
     Hor-Align: CENTER
 Translation X: 5.01 = 5.01 + 0.00 (offset)
 Translation Y: 6.59 = 6.59 + 0.00 (offset)
   Run Scaling: -1 %
    Background: No background (default)
  Final Status: File created successfully
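
The Translation X/Y values in the log are consistent with centering the scaled content: offset = (size / scale - size) / 2. This is an inference from the numbers in the log, not a statement about pdfScale's source; the helper name center_offset is hypothetical.

```shell
# Hypothetical helper: translation needed to keep content centered after scaling.
# Inferred from the verbose log, not taken from pdfScale's code.
center_offset() {
    awk -v size="$1" -v scale="$2" \
        'BEGIN { printf "%.2f\n", (size / scale - size) / 2 }'
}

center_offset 658 0.985   # prints 5.01 (Translation X)
center_offset 865 0.985   # prints 6.59 (Translation Y)
```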

Notes

  • Source Width/Height is always correct (even when using grep with the error)
  • Target Width/Height is also correct
    • PTS ( 658 x 865 ) == MM ( 232 x 305 )
  • The resulting /Mediaboxes have the correct size
$ strings Two.CUSTOM.SCALED.pdf | grep -e '/MediaBox'
<</Type/Page/MediaBox [0 0 658 865]
<</Type/Page/MediaBox [0 0 658 865]
<</Type/Page/MediaBox [0 0 658 865]
. . . 
  • The resulting /Cropboxes are the ones keeping the proportion of the page
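    • They span x = 23.19 to 634.81, so their size is about ( 611.62 x 865.0 ), which has the same aspect ratio (about 0.707) as the source page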
$ strings Two.CUSTOM.SCALED.pdf | grep -e '/CropBox'
/CropBox [23.191925 0 634.808105 865.0]
/CropBox [23.191803 .00003051758 634.808228 865.0]
/CropBox [23.191925 0 634.808105 865.0]
/CropBox [23.191925 0 634.808105 865.0]
/CropBox [23.191925 0 634.808105 865.0]
/CropBox [23.191803 .00003051758 634.808228 865.0]
/CropBox [23.3735352 .00003051758 634.626465 865.0]
. . .
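
The CropBox values above look like the original 595.2756 x 841.8898 box scaled to fit inside the new 658 x 865 page and then centered. Here is a minimal sketch of that arithmetic; it is an inference from the numbers, not a confirmed description of what Ghostscript does internally, and the helper name fit_box is hypothetical.

```shell
# Hypothetical helper: fit_box SRC_W SRC_H DST_W DST_H
# Prints "llx lly urx ury" of the source box scaled (keeping proportions) to fit
# inside the destination page and centered on it.
fit_box() {
    awk -v sw="$1" -v sh="$2" -v dw="$3" -v dh="$4" 'BEGIN {
        if (dw / sw < dh / sh) { w = dw; h = sh * dw / sw }   # width is the limiting side
        else                   { h = dh; w = sw * dh / sh }   # height is the limiting side
        x = (dw - w) / 2; y = (dh - h) / 2                    # center the fitted box
        printf "%.2f %.2f %.2f %.2f\n", x, y, x + w, y + h
    }'
}

fit_box 595.2756 841.8898 658 865   # prints 23.19 0.00 634.81 865.00
```

The result agrees with /CropBox [23.191925 0 634.808105 865.0] to two decimals, which supports the idea that the CropBox is keeping the original proportions.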

So we at least know where the problem is now, but I still don't know which path I should take to solve it.

This post seems to shed some light on the /Cropbox issue and offers a workaround.
https://stackoverflow.com/a/26989410/1273636

Your file also has a /Cropbox defined for EACH page as in the question above.

I will keep researching it.

Cheers!
Gus


Cellomaster87 avatar Cellomaster87 commented on August 16, 2024


tavinus avatar tavinus commented on August 16, 2024

Seems like I have a solution to bypass the Cropboxes with the new sizes.

I am still researching the best way to implement it though.
I don't think I want to always apply the Cropboxes.
I would prefer that files like One.pdf, which do not have any /Cropboxes defined, are kept as they are, without adding a cropbox to each page. I am not sure I can detect this automagically in all cases and then apply the change.

I am inclined to just add a cli parameter that will redefine all Cropboxes to the same size as the paper (Mediabox). This will be easy to implement and run, but will not be a universal/automatic solution (which would be nice).

Gus


tavinus avatar tavinus commented on August 16, 2024

The problem is thinking of all possible outcomes and situations.

As mentioned before, this only applies to resizing, for scaling this is all irrelevant.

Possibilities

  • Resize Cropbox proportionally to the original
    • Seems to be the current behavior, even though unintended
  • Resize Cropbox to same size as Mediabox
    • Seems to be what you want, but may not always be the case
    • Seems to be the only solution for files with Cropboxes defined
  • Resize Cropboxes to custom size
    • Allows to set different Mediabox / Cropbox sizes
    • This may be important for printing jobs with bleeds

Options for now

  • Add a parameter to set a custom /CropBox size (independent of the /MediaBox size)
  • Have a parameter that sets the size of the /CropBox the same as the /MediaBox

Detecting Cropboxes would be nice, but the complexity grows a lot. Detecting is the first problem, since it may not always work (it seems to be exactly the same as Mediabox detection). There is also no clear definition of what the default behaviour should be in each case, since it will always depend on what the user actually wants.

Still digging and thinking here.
I will add the readlink bash implementation to fix the MacOSX installer while I think a bit more.


Cellomaster87 avatar Cellomaster87 commented on August 16, 2024

Good morning Gus!
I have read now everything and those three articles you shared are just amazing!
Before even trying any of the suggested tools, I followed the instructions and opened one of the resized files in Acrobat CC 2020, then in Preferences activated the display of all boxes (which showed the media box in blue around the page). Then I went to Edit > Crop Pages, double-clicked on a page, and noticed this:
Screenshot 2020-03-31 at 10 59 46
I therefore clicked on "Set to zero" under "Margin control" and this fixed the page for good.

Now, as doing this for every document I export would be as slow as manually converting them one by one, I wonder if the culprit may be the PDF export engine of Apple Pages. In macOS Finder, the info of the combined file shows Pages in the "Created by" field, which is strange as the combining software was Acrobat.
My suspicion is that the different encoding software used by Pages and Sibelius causes issues. In Pages the encoding software is "macOS Version 10.15.3 (Build 19D76) Quartz PDFContext", while in Sibelius it is "Qt 5.12.5".
While pdfScale has no issue converting each one of them separately, it probably gets confused, and rightfully so, when it has to convert a PDF made up of 2A+B+2A, where A is a Pages-created PDF and B is a Sibelius-created one.

From the screenshot you can see that margins were added to the Crop Box.
Would it be possible to set its margins to 0, simply? Maybe putting a condition that would check if the encoding software between the components is different (don't know if this is at all possible).
I think setting the size of one to the other would not be desirable because, as explained in the articles, sometimes one would want the media box to be bigger. We just want the CropBox to be itself, unmodified, so removing the margins would be enough, I guess.

What do you think?


tavinus avatar tavinus commented on August 16, 2024

Hi, things got a bit hectic yesterday, so I could not finish anything.
Also, the readlink -f implementation for non-GNU systems is being a pain to test and implement (this will fix the installer/upgrader on MacOSX, Solaris, etc).

Anyways, for your specific use case I already have a solution (which will be to reset all cropboxes to the same size as the Mediabox by issuing an execution flag). On top of that I will also add the option to manually change the Cropbox to a custom size.

It will probably be something like

--cropbox a4
--cropbox 'custom mm 200x200'
--cropbox fullsize (or any other appropriate name)

So it will probably be similar to the regular page size definition.


Cellomaster87 avatar Cellomaster87 commented on August 16, 2024

Thank you so much!
I really appreciate all this!
Looking forward to seeing this in action!


tavinus avatar tavinus commented on August 16, 2024

Hi,
I have pushed a new Branch so we can test it before I merge and release it.

https://github.com/tavinus/pdfScale/tree/v2.5

Can you please try it and let me know?
I had trouble testing on MacOSX (I currently only have a VM and it does not work very well).

Things to note

  • The Installer and the Upgrader should now work on MacOSX (can you please test them for me?)
    • Please note that the Upgrader will end up failing because it will download the old version from the master branch (for now). But if the installer of the new version works, the upgrader will also work after merging it to the master branch.
  • The GREP page size detection should not have any errors with your file Two.pdf anymore (uses strings)
  • -c | --cropbox option added

Here is the --help explanation for the --cropbox parameter

 -c, --cropbox <paper>
             Resets Cropboxes on all pages to a specific paper size
             Only applies to resize mode
             <paper> can be: full | fullsize - Uses the same size as the main paper/mediabox
                             custom          - Define a custom cropbox size in inches, mm or points
                             std paper name  - Uses a paper size name (eg. a4, letter, etc)

So in your case you should just add -c full to your pdfScale call and you should be fine.


EDIT
v2.5.2 fixes a problem with curl redirects that was breaking upgrades.


Cellomaster87 avatar Cellomaster87 commented on August 16, 2024

The --upgrade is still not working for me on Catalina. Here is the Terminal output:

pdfScale.sh --upgrade
readlink: illegal option -- f
usage: readlink [-n] [file ...]
pdfScale.sh v2.4.9 - Self Upgrade

Preparing download to temp folder
 > /tmp/pdfScale_20200404-160012.tar.gz
Downloading file with curl

Extracting compressed file
Extraction error.

Cleaning up downloaded files from /tmp
 > /tmp/pdfScale_20200404-160012.tar.gz > Ok
 > no temporary master folder was found to remove

I have installed it manually and the version is now correctly 2.5.2.
I ran the script adding -c full and it works perfectly.
Do you have any suggestion on how to make this script automatically be applied on the content of a folder of PDFs and possibly adding a suffix to the name of the output? Or is the original Automator I pasted in the beginning the best thing?

Thank you so much for this!
I stay at your disposal for testing the upgrading issue on macOS.


tavinus avatar tavinus commented on August 16, 2024

> The --upgrade is still not working for me on Catalina. Here is the Terminal output:
>
> pdfScale.sh v2.4.9 - Self Upgrade

^ This was running 2.4.9, so it is normal for it not to work. Only 2.5.2 will run the upgrade properly on Macs (even though it will offer the older version, with a warning).

Proceeding with the upgrade will downgrade (until I merge with the master branch).

I was able to test on a Yosemite VM (which is when I found the problem with curl that was patched on 2.5.2).

> I ran the script adding -c full and it works perfectly.
> Do you have any suggestion on how to make this script automatically be applied on the content of a folder of PDFs and possibly adding a suffix to the name of the output? Or is the original Automator I pasted in the beginning the best thing?

Your Automator script seems fine for what you need, and I can't think of any reason for it not to work with the new version.

Would be nice to have batch processing for folders included into pdfScale, but I am not sure I will be able to do it right now.
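
Until something like that exists, a plain shell loop can cover a folder. This is only a sketch: the function name resize_folder and the default suffix are hypothetical, the size and scale values are the ones from this thread, and it assumes pdfscale accepts an output file name as a second positional argument.

```shell
# Hypothetical helper: run pdfscale on every PDF in a folder and write
# "<name><suffix>.pdf" next to each input file.
# Assumption: pdfscale takes the output file as a second positional argument.
resize_folder() {
    dir=$1
    suffix=${2:-.resized}
    for in_file in "$dir"/*.pdf; do
        [ -e "$in_file" ] || continue                        # no PDFs: the glob stayed literal
        case $in_file in *"$suffix.pdf") continue ;; esac    # skip outputs of earlier runs
        out_file="${in_file%.pdf}${suffix}.pdf"
        pdfscale -r 'custom mm 232 305' -s 0.985 -c full "$in_file" "$out_file"
    done
}

# Usage: resize_folder ~/Scores .print
```

Skipping files that already end in the suffix keeps a second run from producing names like score.resized.resized.pdf.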

I will probably merge with master today, so everything will be easier to test and the upgrade will not downgrade anymore.


tavinus avatar tavinus commented on August 16, 2024

To install using the v2.5 branch you need to adjust the URLs, changing master to v2.5:

# Normal install with prompts
curl -s -o /tmp/pdfScale.sh 'https://raw.githubusercontent.com/tavinus/pdfScale/v2.5/pdfScale.sh' && bash /tmp/pdfScale.sh --install

# Automated install with --assume-yes
curl -s -o /tmp/pdfScale.sh 'https://raw.githubusercontent.com/tavinus/pdfScale/v2.5/pdfScale.sh' && bash /tmp/pdfScale.sh --install --assume-yes

# To ignore SSL, use --insecure
curl --insecure -s -o /tmp/pdfScale.sh 'https://raw.githubusercontent.com/tavinus/pdfScale/v2.5/pdfScale.sh' && bash /tmp/pdfScale.sh --install


fabern avatar fabern commented on August 16, 2024

I believe I had a similar problem resizing PDFs in letter format to A4. Sorry if this is hijacking this thread. Just wanted to give feedback that using --cropbox A4 works marvellously for me.

Here is an example of a scientific article in letter format: https://www.hydrol-earth-syst-sci.net/23/303/2019/hess-23-303-2019.pdf

The standard command doesn't yield the desired result. Although the dimensions of the Media and Crop Box change somewhat with respect to the original file, the PDF is still shown in letter format:
pdfscale -v -r A4 Downloads/hess-23-303-2019.pdf

Using the --cropbox argument effectively modifies how the pdf is shown:

bernharf@bernstein:~|⇒  pdfscale -v -r A4 --cropbox A4 Downloads/hess-23-303-2019.pdf
pdfscale v2.5.3 - Verbose Execution
   Single Task: Resize PDF Paper
       Dry-Run: FALSE
    Input File: Downloads/hess-23-303-2019.pdf
   Output File: Downloads/hess-23-303-2019.A4.pdf
 Get Page Size: Adaptive Enabled
        Method: Grep
  Source Width: 612 postscript-points
 Source Height: 802 postscript-points
    Print Mode: Print ( auto/empty )
  Scale Factor: Disabled (resize only)
   Fit To Page: Enabled (default)
   Auto Rotate: PageByPage
   Flip Detect: No change needed
  Run Resizing: A4 ( 595 x 842 ) pts
 Cropbox Reset: A4 ( 595 x 842 ) pts
  Final Status: File created successfully


tavinus avatar tavinus commented on August 16, 2024

Not hijacking at all. Thanks for the feedback @fabern

From what I tested, most problematic PDFs had different cropbox sizes on different pages (some were very close but still a bit different).

If you want the cropbox reset to the SAME size as you are resizing, -c full should be the best option (will use the same size as the main resize in any case without the need to specifically set a size).
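
To check whether a file falls into that category, here is a rough sketch that counts how many distinct /CropBox entries it contains. The function name count_cropbox_variants is hypothetical, it is not part of pdfScale, and it only sees boxes that are stored uncompressed; entries that differ only in spacing are counted separately.

```shell
# Hypothetical helper: count distinct /CropBox definitions in a PDF.
# 0 means no (visible) CropBox, 1 means all pages share one box,
# anything higher means the boxes differ between pages.
count_cropbox_variants() {
    LC_ALL=C grep -a -o '/CropBox *\[[^]]*\]' "$1" | sort -u | awk 'END { print NR }'
}

# Usage: count_cropbox_variants Two.CUSTOM.SCALED.pdf
```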


tavinus avatar tavinus commented on August 16, 2024

Ok,

v2.5.3 was merged and released and the v2.5 branch was deleted.
Feel free to report any problems.

Cheers 🍻
Gus
