qpdfnet's Introduction

What is QPdfNet?

A C# wrapper for qpdf that exposes all the functionality available through the qpdf command-line tool.

qpdf supports linearization, encryption, and numerous other features. It can also be used for splitting and merging files, creating PDF files (but you have to supply all the content yourself), and inspecting files for study or analysis. qpdf does not render PDFs or perform text extraction, and it does not contain higher-level interfaces for working with page contents. It is a low-level tool for working with the structure of PDF files and can be a valuable tool for anyone who wants to do programmatic or command-line-based manipulation of PDF files.

The qpdf Manual is hosted online at https://qpdf.readthedocs.io. The project website is https://qpdf.sourceforge.io. The source code repository is hosted at GitHub: https://github.com/qpdf/qpdf.

License Information

QPdfNet is Copyright (C) 2021-2024 Magic-Sessions and is licensed under the MIT license:

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NON INFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

Microsoft Visual C++ runtimes

The DLL qpdf29.dll is compiled with Visual Studio 2022, so you need the corresponding Microsoft Visual C++ runtimes installed on your computer.

How to use

Getting the number of pages

var job = new Job();
var result = job.InputFile(Path.Combine("TestFiles", "20_pages.pdf"))
    .ShowNPages()
    .Run(out var output);

Assert.AreEqual(ExitCode.Success, result);
Assert.AreEqual("20", output); // expected value first, actual value second

Encrypting a PDF file

var outputFile = Path.Combine(_testFolder, "output_encryption_256_bit.pdf");

var job = new Job();
var result = job.InputFile(Path.Combine("TestFiles", "test.pdf"))
    .OutputFile(outputFile)
    .Encrypt("user", "owner", new Encryption256Bit(true, true, true, true, true, true, Modify.None, Print.None))
    .Linearize()
    .Run(out _);

Assert.AreEqual(ExitCode.Success, result);

Checking if a file is encrypted

var job = new Job();
var result = job.InputFile(Path.Combine("TestFiles", "encryption_256_bit.pdf"))
    .IsEncrypted()
    .RunIsEncrypted(out _);

Assert.AreEqual(ExitCodeIsEncrypted.Encrypted, result);

See the test project for more examples: https://github.com/Sicos1977/QPdfNet/blob/main/QpdfNetTest/QpdfTests.cs

Logging

QPdfNet uses the Microsoft ILogger interface (https://docs.microsoft.com/en-us/dotnet/api/microsoft.extensions.logging.ilogger?view=dotnet-plat-ext-5.0). You can use any logging library that uses this interface.

The following three log levels are used:

  • LogInformation: Logs information about the usage of QPdfNet
  • LogError: Logs any errors that occur
  • LogDebug: Logs debug information

QPdfNet has some built-in loggers that can be found in the QPdfNet.Loggers namespace.

For example:

var logger = !string.IsNullOrWhiteSpace(<some logfile>)
                ? new QPdfNet.Loggers.Stream(File.OpenWrite(<some logfile>))
                : new QPdfNet.Loggers.Console();
                
var job = new Job(logger);                

Installing via NuGet

The easiest way to install QPdfNet is via NuGet.

In Visual Studio's Package Manager Console, simply enter the following command:

Install-Package QPdfNet

Core Team

Sicos1977 (Kees van Spelde)

Support

If you like my work then please consider a donation as a thank you.

qpdfnet's People

Contributors

deralbert, hagman, m-holger, sicos1977

qpdfnet's Issues

System.AccessViolationException occurs when calling Run() in an application targeting the x86 architecture

Brief description
When utilizing QPdfNet and targeting the x86 architecture, an exception occurs: System.AccessViolationException - 'Attempted to read or write protected memory. This is often an indication that other memory is corrupt.' The issue arises within the InternalRun method in Job.cs.

Repro steps (in brief)

  1. Add QPdfNet as a NuGet package to your program.
  2. Choose the x86 architecture for your program.
  3. Include test files, create a job, and execute something like var result = job.InputFile(pdfPath).Check().Run(out var output). This immediately triggers the System.AccessViolationException.

Yes, unfortunately, I am still compelled to release both an x86 and an x64 version of the application.

Cause
The delegate private delegate int CallbackDelegate(IntPtr data, int length, IntPtr udata) in Job.cs is declared without an explicit calling convention.

According to Microsoft's documentation on the platform default calling convention:

For the x86 architecture, the default calling convention is platform specific. Stdcall ("standard call") is the default calling convention on Windows x86 and it is used by most Win32 APIs. Cdecl is the default calling convention on Linux x86. Windows ports of open-source libraries that originated on Unix often use the Cdecl calling convention even on Windows x86. It's necessary to explicitly specify the Cdecl calling convention in P/Invoke declarations for interop with these libraries.

For non-x86 architectures, both Stdcall and Cdecl calling conventions are treated as the canonical platform default calling convention.

Proposed fix
The qpdf library uses the Cdecl calling convention irrespective of the architecture. The delegate mentioned above must therefore be annotated with the UnmanagedFunctionPointerAttribute, specifying the Cdecl calling convention:

[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
private delegate int CallbackDelegate(IntPtr data, int length, IntPtr udata);
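
As a rough, self-contained illustration of the proposed fix (not QPdfNet's actual code), the sketch below declares a delegate with the same shape, annotates it with Cdecl, and obtains the native function pointer that would be handed to the library. The class name CdeclCallbackSketch, the callback body OnData, and the method GetNativePointer are hypothetical names introduced for this example only.

```csharp
using System;
using System.Runtime.InteropServices;

public static class CdeclCallbackSketch
{
    // Same shape as the delegate quoted in the issue. The attribute tells the
    // marshaller to use cdecl instead of the Windows x86 default (stdcall).
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate int CallbackDelegate(IntPtr data, int length, IntPtr udata);

    // Hypothetical callback body: it only reports how many bytes it was given.
    private static int OnData(IntPtr data, int length, IntPtr udata)
    {
        return length;
    }

    // The delegate is kept in a static field so the garbage collector cannot
    // collect it while native code may still hold the function pointer.
    private static readonly CallbackDelegate Callback = OnData;

    // Returns the pointer that would be passed to the native library.
    public static IntPtr GetNativePointer()
    {
        return Marshal.GetFunctionPointerForDelegate(Callback);
    }
}
```

Keeping the delegate in a field matters independently of the calling convention: if the delegate is collected while the native side still calls through the pointer, the result is a similar access violation.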

Error in qpdf terminates whole process

Running the qpdf wrapper on an input file that is not in PDF format causes an error that terminates the whole process.
The error output is:

renderer_1  | terminate called after throwing an instance of 'std::runtime_error'
renderer_1  |   what():  error logger function returned code -1299308016

Environment: Alpine Linux 3.18 in Docker. The library libqpdf.so.29 is loaded successfully.

Expected behavior: a non-success return value instead of an uncatchable C++ exception.

The library is used as follows:

using var workspace = new Workspace(_options.WorkingDirectory);

var inputFileName = workspace.TempFile();
await input.SaveAsFileAsync(inputFileName, cancellationToken);

using Job job = new Job(_logger)
    .InputFile(inputFileName)
    .When(() => !string.IsNullOrWhiteSpace(password), j => j.Password(password));

var result = job
    .WarningExit0()
    .NoWarn()
    .RequiresPassword()
    .RunRequiresPassword(out var output);

System.AccessViolationException on Run() if Dispose() was called before

Consider the situation:

List<string> pdfPaths = GetPdfFiles(path);

foreach (string pdfPath in pdfPaths)
{
    using (var job = new Job())
    {
        var result = job.InputFile(pdfPath).Check().Run(out var output);
    }
}

pdfPaths contains at least two documents. The second job run raises "System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt." at the invocation of _native.RunFromJSON(json).

The cause is that the Dispose method cleans up the default logger, while the default logger is static in the C++ library.
The second invocation of _native.RunFromJSON(json) therefore leads to a use-after-free error in the native library.

Proposed solution:
Delete the line in the Dispose method that cleans up the default logger.
Technically this is a memory leak once no further jobs are executed, but it should not be a concern because the default logger uses very little memory.

Note:
I am currently working on an adjustment of the Job class to enable parallel execution. In that context, the default logger cannot and will not be used, so this issue will not occur. I will create an issue for the adjustment in the upcoming days and submit a pull request.

qpdf JSON v2

I really like your library, thanks for all the work you put into it. I have a quick question: will you be updating your library to use the new JSON version 2 and the functionality introduced with qpdf v11?

No Symbols exported in .NetStandard

When including the NuGet package, class definitions are only available under .NET Framework. Under .NET Standard I am unable to use any of the classes.

Issue with Non-ASCII Characters in File Paths

The library seems to have an issue handling file paths that include non-ASCII characters, such as 'ą' or 'ć'.
The following code:

using QPdfNet;

string? output;

new Job().InputFile(@"D:\abc\test.pdf").Check().Run(out output);
Console.WriteLine(output);

Console.WriteLine();

new Job().InputFile(@"D:\ąbć\test.pdf").Check().Run(out output);
Console.WriteLine(output);

gives the following output:

checking D:\abc\test.pdf
PDF Version: 1.4
File is not encrypted
File is not linearized
No syntax or stream encoding errors found; the file may still contain
errors that qpdf cannot detect

qpdfjob json: open D:\?b?\test.pdf: No such file or directory

QPdfNet version: 1.4.3
OS: Windows 10
.NET version: 8
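
The `?` characters in the reported path are consistent with a lossy string conversion somewhere between the managed string and the native library. The issue does not say where that conversion happens, so the following is only a sketch of the general effect, not a diagnosis of QPdfNet. The class PathEncodingSketch and its method RoundTrip are hypothetical names introduced for this example.

```csharp
using System;
using System.Text;

public static class PathEncodingSketch
{
    // Encodes the path to bytes and decodes it again, mimicking a managed
    // string being converted to a native char* and read back.
    public static string RoundTrip(string path, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(path);
        return encoding.GetString(bytes);
    }
}
```

Under these assumptions, PathEncodingSketch.RoundTrip(@"D:\ąbć\test.pdf", Encoding.ASCII) yields D:\?b?\test.pdf, because characters outside the ASCII range are replaced with `?`, while the same call with Encoding.UTF8 returns the path unchanged.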

tesseract51.dll

Hello,

Unfortunately, adding your library does not work for me. The correct qpdf29.dll files are copied but wrongly named, so the library cannot find the DLL at startup. Attached is a screenshot from my IDE as well as from the current NuGet package.

Repo cleanup

I recommend cleaning up the repository. I analyzed your .gitignore file, and many items are already excluded. However, I suspect that many directories that are now excluded were committed before the file was added. This has significantly impeded the readability of the commits, because numerous generated files were included in each of them.

My suggestion:

  1. Remove the line "x64" from the .gitignore file, since the files under QpdfNetTest\x64 are needed for the tests.
  2. Open "Git Bash Here" in the directory with the local repository and .git.
  3. Execute git ls-files -c -i --exclude-standard -z | xargs -0 rm -r -f &> /dev/null (Git 2.32 and later require -c or -o together with -i).
  4. Make a commit and push.

This will delete all files ignored by .gitignore.

Exception System.Reflection.TargetInvocationException on example Encrypt code

When attempting to run the Encrypt example code, the application throws this exception:
System.Reflection.TargetInvocationException: 'Exception has been thrown by the target of an invocation.'
Inner Exception
ArgumentNullException: Value cannot be null.
Parameter name: ptr
This exception was originally thrown at this call stack:
[External Code]

I downloaded the latest QPDF MSVC 32-bit and 64-bit builds from the QPDF project and copied all the files in the 'bin' folder of each one into the project's /bin/debug/x86 and /bin/debug/x64 folders, respectively.

System.Runtime.InteropServices.SEHException when calling RunFromJSON

Context
I want to use QPdfNet in my .NET Framework 4.8 project. Within the project, I'm modifying PDF documents using PdfSharp, but PdfSharp is not very tolerant of errors in the PDF structure. Therefore, I need to validate the documents before making modifications. I plan to do this using QPDF, specifically with the beautiful wrapper QPdfNet. My application runs on a Windows 10 machine.

Problem
When calling an external native C++ function, I encounter an exception: System.Runtime.InteropServices.SEHException: 'External component has thrown an exception.'.

The value of the ErrorCode and HResult properties of this exception both are -2147467259. The stack trace is:

at QPdfNet.Job.InternalRun(String& output, Byte[]& data) in C:\Temp\QPdfNet\QPdfNet\Job.cs:line 2265
at QPdfNet.Job.Run(String& output) in C:\Temp\QPdfNet\QPdfNet\Job.cs:line 2298
at PdfSharpTest.Program.IstPdfDokumentValide(String pdfPath) in C:\Temp\PdfSharpTest\PdfSharpTest\Program.cs:line 67
at PdfSharpTest.Program.SindPdfDokumenteValide(List`1 pdfPaths) in C:\Temp\PdfSharpTest\PdfSharpTest\Program.cs:line 55
at PdfSharpTest.Program.Main() in C:\Temp\PdfSharpTest\PdfSharpTest\Program.cs:line 41

Repro steps

  1. Create a new ConsoleApplication project, choose .NET 4.8 or .NET 4.6.2 (I have this exception with both frameworks) as the target framework.
  2. In the Main method, add the lines:
QPdfNet.Loggers.Console logger = new QPdfNet.Loggers.Console();
var job = new Job(logger);
string pdfPath = "Path/To/A/Pdf/Document.pdf";
var result = job.InputFile(pdfPath).Check().Run(out var output);
  3. Compile and run the program.

Observations

  1. The exception does not occur when using the latest version of the framework, net8.0. Even though net8.0 is not officially supported, QPdfNet works seamlessly with it.
  2. When I include the function with the built-in DllImport attribute in my program, I can call it without any exceptions:
[DllImport("qpdf29.dll", CallingConvention = CallingConvention.Cdecl, EntryPoint = "qpdfjob_run_from_json")]
static extern int RunFromJSON(string json);

I suspect the issue lies in how the native function is loaded using [RuntimeDllImport(Constants.QPdfDllName, CallingConvention = CallingConvention.Cdecl, EntryPoint = "qpdfjob_run_from_json")].
  3. The built-in Console Logger in QPdfNet reports that all native functions have been successfully loaded, specifically: Successfully loaded native function 'qpdfjob_run_from_json' with handle '140734778344928'.

NuGet package not packaged properly?

Hi,

I've imported your package and dependencies from NuGet - the intellisense and stuff works perfectly. However, when I attempt to run a quick test, I'm met with this exception:

[DllNotFoundException: Failed to find library 'qpdf29.dll' for platform x64]

I'm guessing it's as simple as you doing some sort of versioning in the name of the DLL, as opposed to the metadata of it?

For now I can probably work around it by renaming the DLL manually, but it's probably something worth looking at.

[Linux][Docker] libqpdf29.dll.so not found

Hi,

I've imported your package from NuGet and the library works perfectly on Windows. However, when I attempt to run my application in a Docker container, an exception is thrown:

[DllNotFoundException: Failed to find library 'libqpdf29.dll.so' for platform x64]

Do you have any advice on how to use this library on Linux/Docker?
