pdftoimage's People

Contributors

johanneseh, michalblaha, sungaila, tadelsucht

pdftoimage's Issues

Big file sizes generate blank pictures

When you upload PDF files over 3 KB, the returned image turns blank. Could this be a buffer issue?

    IAsyncEnumerable<SKBitmap> images = Conversion.ToImagesAsync(data, null, 100);
    await foreach (SKBitmap image in images)
    {
        var encoded = image.Encode(SKEncodedImageFormat.Jpeg, 100);
        imageList.Add(Convert.ToBase64String(encoded.ToArray()));
    }
    return new OkObjectResult(imageList);

The variable data is a Base64 string.
The operating system is Windows 11.
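
One way to narrow this down is to check that the Base64 payload decodes to a complete PDF before it reaches the converter. The following is a minimal sketch using only the .NET base class library; `PdfInput` and `TryDecodePdf` are hypothetical names, not part of PDFtoImage, and the check is strict (it expects the `%PDF-` signature at offset 0, although some readers tolerate leading bytes). It does not establish the cause of the blank images.

```csharp
using System;
using System.Text;

public static class PdfInput
{
    // Decodes a Base64 string and reports whether the decoded bytes
    // start with the PDF signature "%PDF-".
    // Hypothetical helper for diagnostics only.
    public static bool TryDecodePdf(string base64, out byte[] bytes)
    {
        try
        {
            bytes = Convert.FromBase64String(base64);
        }
        catch (FormatException)
        {
            // The input was not valid Base64 (for example, it was truncated or URL-encoded).
            bytes = Array.Empty<byte>();
            return false;
        }

        byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
        if (bytes.Length < signature.Length)
        {
            return false;
        }

        for (int i = 0; i < signature.Length; i++)
        {
            if (bytes[i] != signature[i])
            {
                return false;
            }
        }

        return true;
    }
}
```

If the check fails for the larger uploads only, the data is probably being altered before conversion (for example by a request size limit or an encoding step), which would point away from the library itself.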

Please make PdfDocument accessible.

Detailed feature request

Hi,

Are there any plans to publicly expose the PDFtoImage.Internals.PdfDocument class?

I would like an efficient way to get the page count and page images without requiring PDFtoImage to instantiate the PdfDocument class multiple times in the background.

Current Behavior

Calling PDFtoImage.Conversion.GetPageCount(..) first and later PDFtoImage.Conversion.ToImagesAsync(...) results in PdfDocument.Load(..) being called twice.

Requested Behavior

Introduce a class similar to PdfDocument with additional methods like the static methods defined in PDFtoImage.Conversion.*. These methods could be extension methods if preferred.

This high-level implementation could focus on efficient resource handling such as:

  • Reading the stream only once (if possible; is seeking required?) or at least reading metadata only once.
  • Reducing heap allocations.

Example code to demonstrate the desired API:

using var pdf = PdfDocument.Load(stream, ...);

var count = pdf.PagesCount;

SKBitmap image = await pdf.GetImageAsync(page: 1);

await foreach (var img in pdf.GetImagesAsync()) {
  // ..
}

Please let me know what you think. Does this feature align with your roadmap?

Kind regards,
René

Convert PDF to PNG problem

PDFtoImage version

4.0.2

OS

Windows

OS version

Windows 11 23H2

Architecture

x64

Framework

.NET (Core)

App framework

WinUI3 .Net7.0

Detailed bug report

Hello,

I'm trying to use the PDFtoImage package to convert a single page PDF file to a PNG image via the Conversion.SavePng() function.

The conversion works in that a PNG file is created, but the image is truncated at the bottom: the last N rows of pixels are transparent!

I should point out that the PDF contains a square page and not a rectangular A4 type page.

The generated PNG image is square too, but a part, at the bottom, is missing.

I tried using different RenderOptions options but couldn't generate a correct PNG image.

Example PDF and PNG files for this problem are available via the links:
https://1drv.ms/i/s!AqhmCsNj4U4CyjugqM2Iv5hn297s?e=KKiqNd
https://1drv.ms/b/s!AqhmCsNj4U4CyjpmZPIBqUG1UR4t?e=sea6MF

Best regards

PDFToImage.Conversion pdf file size limit?

Question

Hello, we are wondering if there is a PDF file size limit when using the Conversion methods. We are currently trying to convert a PDF, supplied as a Base64 string, to an image using the Conversion.SaveJpeg method, but we run into trouble with the PDF we need to use. Its file size is 500 KB, and the program ends up stuck. However, converting a smaller PDF of around 50 KB works fine.

Is this something that is supposed to be handled in the method, or is it the case that we should find a work-around ourselves?

PDFtoImage closes stream after any operation done on it

First of all, thank you for making PDFtoImage, it's just what I was looking for:
A very lightweight package that does just that: converts PDFs to images.

I've been having some issues with it and was hoping you could provide a solution to them.
I'm using PDFtoImage version 1.2.1
.NET Core 3.1 (WPF Windows Application)

After using GetPageCount, ToImage, etc., I cannot use the same FileStream (pdfStream) again; otherwise I get the following exception:
"Cannot access a closed file"

Sample code:
using (FileStream fs = File.Open(pdfPath, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    int imageNumber = 0;
    imageNumber = Conversion.GetPageCount(fs);
    System.Drawing.Image myImage = Conversion.ToImage(fs); // Exception occurs here
}

Below is the stack trace:

   at System.IO.FileStream.get_Length()
   at PDFtoImage.PdfiumViewer.NativeMethods.FPDF_LoadCustomDocument(Stream input, String password, Int32 id)
   at PDFtoImage.PdfiumViewer.PdfFile..ctor(Stream stream, String password)
   at PDFtoImage.PdfiumViewer.PdfDocument..ctor(Stream stream, String password)
   at PDFtoImage.PdfiumViewer.PdfDocument.Load(Stream stream, String password)
   at PDFtoImage.Conversion.ToImage(Stream pdfStream, String password, Int32 page, Int32 dpi, Nullable`1 width, Nullable`1 height, Boolean withAnnotations, Boolean withFormFill)

To work around this, I've had to copy to separate MemoryStreams, like:

using (FileStream fs = File.Open(pdfPath, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    int imageCount = 0;
    using (MemoryStream fs2 = new MemoryStream())
    {
        fs.CopyTo(fs2);
        imageCount = Conversion.GetPageCount(fs2);
    }
    System.Drawing.Image image = Conversion.ToImage(fs, null, 0);
}

Would it be possible to add an extra argument/flag that defines whether a FileStream is closed after operations are performed on it, so that the same FileStream can be reused?
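
Until such a flag exists, one possible workaround is to read the file into a byte array once and hand every operation its own short-lived stream over those bytes. This is a minimal sketch using only the .NET base class library; `PdfBuffer` and `WithFreshStream` are hypothetical names, and the approach assumes the PDF fits comfortably in memory.

```csharp
using System;
using System.IO;

public static class PdfBuffer
{
    // Reads the whole file once so that each operation can get its own stream.
    public static byte[] ReadAll(string path)
    {
        return File.ReadAllBytes(path);
    }

    // Runs an operation on a fresh, read-only MemoryStream over the same bytes.
    // It does not matter whether the operation closes the stream: the byte
    // array stays valid and the next call gets a new stream at position 0.
    public static T WithFreshStream<T>(byte[] pdfBytes, Func<Stream, T> operation)
    {
        using (var stream = new MemoryStream(pdfBytes, writable: false))
        {
            return operation(stream);
        }
    }
}
```

With the calls from the sample above, usage would look like `int pages = PdfBuffer.WithFreshStream(bytes, s => Conversion.GetPageCount(s));` followed by a second `WithFreshStream` call for `Conversion.ToImage`. Unlike the copy-based workaround, each operation starts reading at the beginning of the document.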

Thanks,

Memory leak

Hi, I was using your library in our project, but at scale we found a memory leak: the server kept crashing with out-of-memory errors. I wasn't able to reproduce it on Windows, but it is pretty consistent on Linux. It took a long time to track down, because we use other libraries that also allocate unmanaged memory, so it wasn't obvious where the leak was. In the end I found that this library is leaking. I had no time to investigate whether the leak is inside the wrapper or in pdfium itself, so please be aware of it.

Hope to optimize it

In my testing, calling into pdfium via LoadLibrary performs better than DllImport, especially during batch processing.

Opening Pdfs with Password fails

09 September 15 th July PW 6BA065200.pdf
While making a call to ToImage method in Conversion with the correct password as shown below

using var inputStream = new FileStream(pdfFilePath, FileMode.Open, FileAccess.Read);
var image = Conversion.ToImage(inputStream, password, page: 0);

the code fails with the exception PDFtoImage.PdfiumViewer.PdfException: 'Password required or incorrect password'

at line 30 in file PDFtoImage\PdfiumViewer\PdfFile, right after the call to NativeMethods.FPDF_LoadCustomDocument in line 28:

var document = NativeMethods.FPDF_LoadCustomDocument(stream, password, _id);
if (document == IntPtr.Zero)
throw new PdfException((PdfError)NativeMethods.FPDF_GetLastError());

*Attached is a password-protected PDF file for testing. Please use the password 6BA065200.

PDF image conversion error

PDFtoImage version

4.0.1

OS

Windows

OS version

No response

Architecture

x64

Framework

.NET (Core)

App framework

No response

Detailed bug report

1.pdf
PDF image conversion error. This PDF file is a bit large. How can this library handle converting images from such large files?

When using bounds, the output gets scaled to the same size as the input document

PDFtoImage version

4.0.1

OS

Windows

OS version

Windows 11

Architecture

x64

Framework

.NET (Core)

App framework

No response

Detailed bug report

I am extracting bits from PDF files using bounds. However, it seems like the output is scaled to the size of the input document rather than using the size of the bounds, which is what I believe would be the correct behaviour.

Bounds.pdf is a 2:1 aspect ratio (2000pt x 1000pt) document containing a red square with a black border of size 500pt at (750, 250).

Sample repro:

using var input = File.OpenRead("Bounds.pdf"); 

var bounds = new RectangleF(750, 250, 500, 500); 

var dpi = 72; // For clarifying the problem.

using var image = PDFtoImage.Conversion.ToImage(input, true, null, 0, new RenderOptions { Dpi = dpi, Bounds = bounds });

// BoundsCrop.png is cropped correctly but scaled to the size of the input document.
using var output = File.OpenWrite("BoundsCrop.png"); 
image.Encode(output, SKEncodedImageFormat.Png, 100);

// Expected: Width: 500, Height: 500
// Actual: Width: 2000, Height: 1000
Console.WriteLine($"Width: {image.Width}, Height: {image.Height}");

Bounds.pdf
BoundsCrop.png
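
The expected size in the repro follows from the usual point-to-pixel conversion: PDF user space has 72 points per inch, so at 72 DPI one point maps to one pixel. A minimal sketch of that arithmetic is shown here; `PdfUnits` is a hypothetical helper, and rounding to the nearest pixel is an assumption (the library may round differently).

```csharp
using System;

public static class PdfUnits
{
    // PDF user space uses 72 points per inch by default.
    public const double PointsPerInch = 72.0;

    // Converts a length in points to pixels at the given DPI.
    // Rounding to the nearest pixel is an assumption of this sketch.
    public static int PointsToPixels(double points, int dpi)
    {
        return (int)Math.Round(points * dpi / PointsPerInch);
    }
}
```

For the 500pt bounds at 72 DPI this gives 500 pixels per side, which is the expected output. The 2000 x 1000 result matches the full page size in points instead.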

Consider making the PdfDocument struct public.

Detailed feature request

I am working on a project to render a pdf that has a lot of pages. I need to "recon" the document to get the number of pages and the sizes so that I can virtualize the process of getting the actual images. The PdfDocument struct would be perfect for this. Unfortunately, it is internal.

I understand the desire to keep the API simple but I feel there is a lot of value in exposing a few of the internals.

In-Use memory increasing every time when you load a new page

PDFtoImage version

4.0.0

OS

Windows

OS version

Hololens 2

Architecture

arm64

Framework

.NET Framework

App framework

UWP

Detailed bug report

Hi @sungaila
The memory usage is consistently growing with each new page load, suggesting that certain unreferenced objects are not being properly disposed of in the code. Consequently, the application crashes after loading several pages. On Hololens 2, the crash occurs after loading 40 pages, with an observed increase of around 20 MB in used memory per new page. On an Android phone, the app crashes after loading 15 pages. It's worth noting that I'm setting the Encode quality to 1000. Below is the code snippet used for loading a page.

 private async Task GetPage(int pageNo, MemoryStream pdfStream,OnPDFImageLoaded onPDFImageLoaded)
    {
        try
        {
            await Task.Run(() =>
            {
                MemoryStream imageStream = new MemoryStream();
                SKBitmap img = PDFtoImage.Conversion.ToImage(pdfStream, true, null, pageNo);
                img.Encode(imageStream, SKEncodedImageFormat.Png, quality);
                onPDFImageLoaded?.Invoke(true,"Success",imageStream);
            });
        }
        catch (Exception e)
        {
            onPDFImageLoaded?.Invoke(false, "Reason to fail:"+e.ToString(), null);
        }
    }
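
In the snippet above, the rendered bitmap is never disposed, so its native memory is only released when the finalizer eventually runs. A minimal sketch of a render-and-encode helper that disposes the intermediate bitmap deterministically is shown here. `DisposableRender` and `RenderToStream` are hypothetical names, and the helper is generic so that it depends only on the .NET base class library. Whether this resolves the reported growth has not been verified.

```csharp
using System;
using System.IO;

public static class DisposableRender
{
    // Renders a bitmap, encodes it into a new MemoryStream, and disposes the
    // bitmap before returning. The caller owns (and must dispose) the result.
    public static MemoryStream RenderToStream<TBitmap>(
        Func<TBitmap> render,
        Action<TBitmap, Stream> encode)
        where TBitmap : IDisposable
    {
        var output = new MemoryStream();
        using (TBitmap bitmap = render())
        {
            encode(bitmap, output);
        }

        // Rewind so the caller can read the encoded image from the start.
        output.Position = 0;
        return output;
    }
}
```

With the calls from the snippet above, usage would be `RenderToStream(() => PDFtoImage.Conversion.ToImage(pdfStream, true, null, pageNo), (img, s) => img.Encode(s, SKEncodedImageFormat.Png, quality))`. The stream passed to the callback also has to be disposed by its consumer once the image has been displayed.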

Color was not correct

I use PDFtoImage to convert the first page of a PDF file to a PNG image.
This is a screenshot of the original PDF:
Original Pdf screenshot
This is the conversion result:
Convert Image
This is the original PDF file:
322145774-Coca-Cola-by-Group-5.pdf
As you can see, the colors are not right.
How can I fix this?

Here is my code

            using var inputStream = new FileStream(inputFile, FileMode.Open, FileAccess.Read);
            string outputPath = Path.Combine(outputFolder, outputFileName + "_200.png");
            Conversion.SavePng(outputPath, inputStream, null, 0, 600, 200, 200);

Export to SVG

Question

Is it possible to draw a PDF onto the SKSvgCanvas to be able to export as SVG?

Anti-Alias being applied causing distortion to barcodes

Question

Not sure if it's a problem with this library, an issue with PDFium, or I'm just not getting the right combination of parameters.

I'm trying to print shipment labels to a Zebra GK420d printer (203 dpi) using PDFtoZPL, and I've got it mostly working. However, I noticed some issues with the 2D Data Matrix barcode: the elements are rounded instead of square. I think the culprit is the PDF to image conversion, using this code:

var pdfBytes = File.ReadAllBytes("example-label.pdf");

var images = PDFtoImage.Conversion.ToImages(pdfBytes, dpi:203, width:812, height:1218);

foreach (var image in images)
{
    var data = image.Encode(SKEncodedImageFormat.Png, 100);
    File.WriteAllBytes("label-test.png", data.Span.ToArray());
}

It seems anti-aliasing has been applied.
That is fine for the text elements of the label, but it could cause problems reading the barcodes, especially after the image goes through the image to ZPL conversion. That might be tweakable by adjusting the threshold value, but it seems a bit too fragile to rely on. I thought it might be related to the scaling caused by the dpi, height and width settings. Removing those three from the above code seems to remove the anti-aliasing from the 2D Data Matrix, but not from the rest of the label, which is just plain confusing.

Original PDF file, 300dpi formatted to print on 4inch * 6 inch labels

PNG file generated by above code
label-test

PNG file generated by above code with dpi,width and height not set
label-test-no-params

Replacing System.Drawing.Common (GDI+) with SkiaSharp

Support for System.Drawing.Common (GDI+) on other platforms than Windows will stop with .NET 6.0 and newer:
System.Drawing.Common only supported on Windows

This project used GDI+ for encoding the bitmap provided by PDFium when rendering. PDFium just returns a pointer to the memory where it stored the rendered PDF bitmap.

GDI+ supports BMP, GIF, JPEG, PNG and TIFF as output image formats.

I decided on SkiaSharp as a cross-platform replacement. However, two drawbacks have been found so far:

  • BMP, GIF and TIFF are not supported. Third-party libraries are needed to reimplement these image formats.
  • The NuGet package becomes even larger. Now both pdfium and SkiaSharp native libraries have to be included.

On the positive side:

  • WebP added as supported image format.
  • No need to install libgdiplus on Linux and macOS anymore.

Loading of pdfium.dll fails when running in an IIS context

Because IIS is shadow copying files you will get a path like C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\b7dc711f\a2c2b98f\assembly\dl3\8b76e845\0097b45e_1094d701\ when you call Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location), and not the original path of the dll.

I'm not sure how to fix this in a way that works across all runtimes and environments. It would be nice if you could provide a pdfium.dll path yourself.

AOT issue

Thanks for the project!
Sadly, there is an issue:
PDFtoImage.Conversion.GetPageCount and PDFtoImage.Conversion.ToImage cause a deadlock (a permanent loop) after AOT compilation.
(Tested with native AOT on all versions.)

Unable to build Blazor project with .NET 8

PDFtoImage version

3.0.0

OS

Windows

OS version

Windows 11

Architecture

x64

Framework

.NET (Core)

App framework

Blazor

Detailed bug report

I cloned the repository and tried to run the WebConverter project on .net8 instead of .net7. I'm unable to get the build working and get the following errors in the build output

2>EXEC : warning : To disable errors for undefined symbols use `-sERROR_ON_UNDEFINED_SYMBOLS=0`
2>EXEC : warning : _emscripten_longjmp may need to be added to EXPORTED_FUNCTIONS if it arrives from a system library
2>EXEC : error : undefined symbol: emscripten_longjmp (referenced by top-level compiled C/C++ code)
2>EXEC : error : Aborting compilation due to previous errors
2>emcc : error : '"C:\Program Files\dotnet\packs\Microsoft.NET.Runtime.Emscripten.3.1.34.Node.win-x64\8.0.0\tools\bin\node.exe" "C:\Program Files\dotnet\packs\Microsoft.NET.Runtime.Emscripten.3.1.34.Sdk.win-x64\8.0.0\tools\emscripten\src\compiler.js" C:\Users\dev\AppData\Local\Temp\tmpyv8t97kd.json' failed (returned 1)
2>C:\Program Files\dotnet\packs\Microsoft.NET.Runtime.WebAssembly.Sdk\8.0.0\Sdk\WasmApp.Native.targets(513,5): error MSB3073: The command "emcc "@C:\Program Files\dotnet\packs\Microsoft.NETCore.App.Runtime.Mono.browser-wasm\8.0.0\runtimes\browser-wasm\native\src\emcc-default.rsp" -msimd128 "@C:\Program Files\dotnet\packs\Microsoft.NETCore.App.Runtime.Mono.browser-wasm\8.0.0\runtimes\browser-wasm\native\src\emcc-link.rsp" "@C:\dev\PDFtoImage-master\src\WebConverter\obj\Debug\net8.0\wasm\for-build\emcc-link.rsp"" exited with code 1.
2>    5 Warning(s)
2>    4 Error(s)

Has anyone tried running PDFtoImage on Blazor WASM in .NET8?

runtimes/** dirs & files aren't copied during deploy/publish - .NET 5 web app.

  • asp.net core 5 application
  • TargetFramework: net5.0
  • Runtime: win-x64
  • PDFtoImage referenced as nuget

Publishing with dotnet publish using the above parameters.

The runtimes/* directory with its *.dll and *.dylib files is not copied to the destination.

Compiling in Visual Studio 2019 works: the /runtimes/** files are copied.

Type initializer exception on Android

PDFtoImage version

3.0.0

OS

Android

OS version

Android 12+

Architecture

arm64

Framework

(Other)

App framework

.NET 4.6.2 (.NET standard 2.1)

Detailed bug report

Hi,

I am trying to use this lib in a very special context : Android Unity app.

Until now we used UWP methods to read a PDF file as a byte array. Now that we are migrating to Android, this library seemed to be the solution. In the editor everything works great, but on the Quest 3 headset (Android) we get this exception:

The type initializer for 'PDFtoImage.PdfiumViewer.NativeMethods' threw an exception. [line: 105]

When calling this method

pageCount = Conversion.GetPageCount(documentStream, false);

with document stream :

using (UnityWebRequest www = UnityWebRequest.Get(pdfFileName))
{
				www.SendWebRequest();
				while (!www.isDone)
				{
					if (www.result == UnityWebRequest.Result.ConnectionError)
					{
						throw new Exception($"Connection Error while reading File '{pdfFileName}'");
					}
				}

				if (www.result == UnityWebRequest.Result.ConnectionError)
				{
					throw new Exception($"Connection Error while reading File '{pdfFileName}'");
				}

				pdfData = www.downloadHandler.data;

				using (var documentStream = new MemoryStream(pdfData));

(the only way to access streamingassets files on android, but this code is also called in the editor and works.)

Do you have any idea where this is coming from?

There is no lib or dll error prior to this, and I have the Pdfium .so for Android arm64 imported.

UPM couldn't fetch

PDFtoImage version

4.0.1

OS

Windows

OS version

No response

Architecture

x64

Framework

.NET Framework

App framework

Unity

Detailed bug report

The Unity Package Manager sometimes fails to fetch the package, and even when it does manage to load it, it is very slow.

System.DllNotFoundException: '/data/data/com.companyname.pos.apphost/files/.__override__/runtimes/android-x64/native/libpdfium.so'

I'm testing PDFtoImage.Conversion.ToImage() in .NET MAUI project that targets windows and android.
It works fine in windows platform but throws an exception in android:

System.DllNotFoundException: '/data/data/com.companyname.pos.apphost/files/.__override__/runtimes/android-x64/native/libpdfium.so'

I'm testing on Android Emulator: Android 12.0 - API 31

I installed PDFtoImage v2.1.2

Can you please help me solve this issue.

Best regards,

Support of .NET 7 MAUI

I tried the "PDFtoImage" package in a .NET 7 MAUI environment. I want to target all platforms supported by MAUI (Android, iOS, macOS, Windows). This package seems to be great work, but unfortunately PDFtoImage.Conversion.SavePng(img, pdfStream, height: 300) throws an exception:

{System.NotSupportedException: Only win-x86, win-x64, win-arm64, linux-x64, linux-arm, linux-arm64, osx-x64, and osx-arm64 are supported.    at PDFtoImage.PdfiumViewer.NativeMethods.LoadNativeLibraryNetCore(String path)    at PDFtoImage.PdfiumViewer.NativeMetho…}

Do you have any plans to make this package work for the .NET 7 MAUI platform?

Text missing when render pdf into image

PDFtoImage version

3.0.0

OS

Windows

OS version

No response

Architecture

x64

Framework

.NET (Core)

App framework

No response

Detailed bug report

using (var pdfStream = File.OpenRead(pdfFile))
            {
                PDFtoImage.Conversion.SavePng("D:\\output.png", pdfStream);
            }

1.pdf
When processing the attached PDF, all text is missing in the lower part of the created image. The PNG file is too big to upload.

Add support for UWP arm64 within the unity package

Kindly include UWP arm64 support in the Unity package. I am currently developing a project designed for Hololens 2, which is of ARM64 architecture. The pdfium and libSkiaSharp libraries are not available for the UWP arm64 target in our Unity package. To address this, I manually incorporated pdfium for the arm64 target from this repository. This modification enabled me to retrieve the total number of pages of the PDF file; however, I encountered errors related to libSkiaSharp with the use of PDFtoImage.Conversion.ToImage since the libSkiaSharp is not present for the arm64 target. Despite my efforts, I was unable to locate this library. I would greatly appreciate it if you could add support for UWP ARM64.

Thank you.

Render predefined area instead of the entire pdf page to receive a subset of the pdf page as an image

Detailed feature request

At the moment, all methods that save a PDF page as an image render the entire page using X:0 and Y:0 as their origin.
If I only want a subset of the page as an image file, I need to post-process the image.

            using (MemoryStream pdfStream = new MemoryStream(resultPdfBytes))
            {
                SkiaSharp.SKBitmap bitmap = PDFtoImage.Conversion.ToImage(pdfStream);
                bitmap.ExtractSubset(bitmap, new SkiaSharp.SKRectI(0, 0, 1000, 1000));

                var data = bitmap.Encode(SkiaSharp.SKEncodedImageFormat.Png, 100);
                if(data is not null)
                {
                    resultPdfBytes = data.ToArray();
                }
            }

This is time-consuming for very large PDF pages, and it would not be necessary if all methods accepted a Rect or x,y coordinates as their render origin plus a width and height. It would probably be better to just pass a rectangle that defines the area the pdfium viewer should render.

Additionally, rendering a very large PDF (for example, large building plans) might fail. I have reproduced this multiple times by setting a DPI higher than 350, which in one case always fails to render. Rendering a subset should also solve this problem and allow areas of large PDF pages to be rendered at a higher DPI.
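
A rough estimate of the uncompressed bitmap size illustrates why full-page rendering of large plans at high DPI can fail while a subset would not. This is a minimal sketch; `BitmapEstimate` is a hypothetical helper, and 4 bytes per pixel (for example BGRA) is an assumption about the render target, not something taken from the library.

```csharp
using System;

public static class BitmapEstimate
{
    // Estimated size in bytes of an uncompressed bitmap for an area given in
    // PDF points (72 points per inch), rendered at the given DPI.
    // bytesPerPixel = 4 is an assumption (e.g. a BGRA pixel format).
    public static long Bytes(double widthPt, double heightPt, int dpi, int bytesPerPixel = 4)
    {
        long widthPx = (long)Math.Ceiling(widthPt * dpi / 72.0);
        long heightPx = (long)Math.Ceiling(heightPt * dpi / 72.0);
        return widthPx * heightPx * bytesPerPixel;
    }
}
```

For example, an A0 sheet (about 2384 x 3370 points) at 350 DPI comes to 11589 x 16382 pixels, or roughly 760 MB for a single bitmap under these assumptions, while a 500 x 500 point region at the same DPI needs about 24 MB.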

I think this should be fairly easy to implement, since both methods already expose left and top parameters, which currently default to 0 (PDFtoImage.PdfiumViewer.PdfDocument.Render()):

FPDFBitmap_FillRect()
RenderPDFPageToBitmap()

Greetings ynnob!

Page batching - Is it supported?

Question

While navigating the source code, I couldn't find native support for batching pages (e.g., GetImages(from: 1, to: 5)). Is this feature already implemented, and I missed it, or is it not available yet?

I see that GetImage has a page property, which could be used to manually implement batch logic. However, this would introduce additional overhead and performance issues.

If this functionality is not currently present, I am considering forking the project to add it. Please let me know if this feature is planned or desired, and I will create a PR.
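
The manual approach mentioned above could be sketched as follows, with a caller-supplied delegate standing in for the per-page render call. `PageBatch` and `Render` are hypothetical names, not part of PDFtoImage. The sketch only expresses the range logic: if the delegate reopens the document for every page, the overhead described above remains.

```csharp
using System;
using System.Collections.Generic;

public static class PageBatch
{
    // Lazily renders the pages in the inclusive range [from, to] by calling
    // the supplied renderer once per page index.
    public static IEnumerable<T> Render<T>(int from, int to, Func<int, T> renderPage)
    {
        if (renderPage == null) throw new ArgumentNullException(nameof(renderPage));
        if (from < 0) throw new ArgumentOutOfRangeException(nameof(from));
        if (to < from) throw new ArgumentOutOfRangeException(nameof(to));

        return Iterate();

        // Local iterator so that argument validation above runs eagerly.
        IEnumerable<T> Iterate()
        {
            for (int page = from; page <= to; page++)
            {
                yield return renderPage(page);
            }
        }
    }
}
```

Because the sequence is lazy, a consumer can dispose each rendered image before the next page is requested, which keeps peak memory at roughly one page at a time.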
