Document Merge Service

A document template merge service providing an API to manage templates and merge them with given data. It can also be used to convert Docx files to PDF.

Installation

Requirements

  • docker
  • docker-compose

After installing and configuring those, download docker-compose.yml and run the following command:

docker-compose up -d

You can now access the API at http://localhost:8000/api/v1/, which includes a browsable API.

Workaround for LibreOffice lockups

The workaround is controlled by the ISOLATE_UNOCONV setting, which is enabled only in the development environment. If ISOLATE_UNOCONV is enabled, the container needs the CAP_SYS_ADMIN capability. See docker-compose.override.yml.

cap_add:
  - CAP_SYS_ADMIN
security_opt:
  - apparmor:unconfined
environment:
  - ISOLATE_UNOCONV=true

Getting started

Uploading templates

Upload templates using the following:

curl --form template=@docx-template.docx --form name="Test Template" --form engine=docx-template http://localhost:8000/api/v1/template/

Merging a template

After uploading successfully, you can merge a template with the following call:

curl -H "Content-Type: application/json" --data '{"data": {"test": "Test Input"}}' http://localhost:8000/api/v1/template/test-template/merge/ > output.docx
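
The same merge call can also be issued from Python. The following is a minimal sketch using only the standard library; `build_merge_request` and `merge_to_file` are hypothetical helper names, and the base URL and the `test-template` slug are simply taken from the curl example above.

```python
import json
import urllib.request


def build_merge_request(base_url, slug, data):
    """Build (but do not send) a POST request for the merge endpoint."""
    url = f"{base_url.rstrip('/')}/template/{slug}/merge/"
    body = json.dumps({"data": data}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def merge_to_file(request, output_path):
    """Send the request and write the response body to output_path."""
    with urllib.request.urlopen(request) as response:
        content = response.read()
    with open(output_path, "wb") as fh:
        fh.write(content)


# Usage (requires a running service):
# req = build_merge_request(
#     "http://localhost:8000/api/v1", "test-template", {"test": "Test Input"}
# )
# merge_to_file(req, "output.docx")
```

Building the request separately from sending it keeps the payload easy to inspect before anything goes over the network.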

Converting a template

To convert a standalone Docx file the following call can be used:

curl -X POST --form file=@my-test-file.docx --form target_format="pdf" http://localhost:8000/api/v1/convert > example.pdf

Further reading

  • Configuration - Further configuration and how to do a production setup
  • Usage - How to use the DMS and its features
  • Contributing - Look here to see how to start with your first contribution. Contributions are welcome!

License

Code released under the GPL-3.0-or-later license.

document-merge-service's People

Contributors

anehx, czosel, dependabot-preview[bot], dependabot[bot], fkm, fkm-adfinis, ganwell, luytena, open-dynamix, pyup-bot, sliverc, stephanh90, trowik, winged, winpat, yelinz

document-merge-service's Issues

Fix broken exports

The exported documents are broken. Opened in a word processor, they look like a docx file that was opened in a text editor.

I tested the included service docx-template directly and that works as expected.

There is a passing test at:
https://github.com/adfinis-sygroup/document-merge-service/blob/master/document_merge_service/api/tests/test_template.py#L125
But when manually raising an AssertionError, the saved file is also garbled.

I also had the problem that the following line from Docker Compose didn't work:
https://github.com/adfinis-sygroup/document-merge-service/blob/master/docker-compose.override.yml#L8
My workaround was to manually enter the value of $UID (in my case 1000).

I used the following file to make my tests:
https://github.com/adfinis-sygroup/document-merge-service/blob/master/document_merge_service/api/tests/data/docx-template.docx

Test all supported databases in Travis

MySQL, PostgreSQL and SQLite are supported by document-merge-service, but only SQLite is tested in Travis. All engines should be tested for proper support.

Add date filter

In order to make date formatting in documents more powerful / flexible, adding a date filter would be nice. Maybe something like

{{ my_date | date("%d.%m.%Y") }}

Not sure if a separate datetime filter would be needed, probably date could also support printing the "time" part.
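
One possible shape for such a filter is sketched below. It is only an illustration of the idea in this issue, not the project's implementation: `date_filter` is a hypothetical name, it uses strftime codes as in the example above, and it assumes dates arrive either as date/datetime objects or as ISO 8601 strings.

```python
from datetime import date, datetime


def date_filter(value, fmt="%d.%m.%Y"):
    """Format a date, datetime, or ISO 8601 string with strftime codes."""
    if value is None or value == "":
        return ""
    if isinstance(value, str):
        # Accepts "2019-08-31" as well as "2019-08-31T14:30:00".
        value = datetime.fromisoformat(value)
    return value.strftime(fmt)


# Registering it on a jinja2.Environment would look like:
# env.filters["date"] = date_filter
```

Because the same function handles both date-only and date-and-time values, a separate datetime filter may not be needed under these assumptions.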

Add CORS headers

As the service will probably almost never run on the same domain and port as the client application we need to set CORS headers.

From what I've seen, this should not be too much work:
https://stackoverflow.com/a/38162454

I guess the trouble starts with the configuration. The - IMO - ideal solution would be to have the configuration in the Docker Compose file. But not sure, if that makes sense 😄
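
To illustrate how the configuration could come from the Docker Compose file, here is a small sketch that reads allowed origins from an environment variable. It assumes a package such as django-cors-headers handles the headers themselves; the helper name `cors_origins_from_env` and the variable name `CORS_ALLOWED_ORIGINS` as an environment variable are assumptions made for this example.

```python
import os


def cors_origins_from_env(environ=None, name="CORS_ALLOWED_ORIGINS"):
    """Parse a comma-separated environment variable into a list of origins."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name, "")
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


# In a Django settings module this might be used as:
# CORS_ALLOWED_ORIGINS = cors_origins_from_env()
```

The Compose file would then only need an `environment` entry with the comma-separated origins.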

Missing docs and examples - even in new Version 3

Where are docs to this great service?
There is a test directory but no examples which we could learn from.

What I'd like to have:

  • How to create docx templates
  • Which placeholders are allowed?
  • What are limitations of the templating feature?
  • Samples, samples, samples (docx-templates, csv merge data, result documents in pdf format)
  • The most simple usage pattern

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Docker setup

Provide a base image that installs only the required dependencies and can be used as the image for the build step of your own DMS image. The base image should use env vars that define which dependencies are installed. However, we should still offer a full-blown DMS image for those who don't care about image size etc.

'datetime.date' object has no attribute 'hour'

I ran into an exception when trying to export data that includes a date.

I assume it's from the application.mf_erfassung_terminvorgabe. The relevant part of the template would be:

{%p if application.mf_erfassung_terminvorgabe %}
Terminvorgabe: {{ application.mf_erfassung_terminvorgabe | date("%a %d.%m.%y") }}
{%p endif %}

Error Message (extracts)

AttributeError at /api/v1/template/test-1/merge/
'datetime.date' object has no attribute 'hour'

Traceback:

File "/usr/local/lib/python3.6/site-packages/django/core/handlers/exception.py" in inner
  34.             response = get_response(request)

File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
  115.                 response = self.process_exception_by_middleware(e, request)

File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
  113.                 response = wrapped_callback(request, *callback_args, **callback_kwargs)

File "/usr/local/lib/python3.6/site-packages/django/views/decorators/csrf.py" in wrapped_view
  54.         return view_func(*args, **kwargs)

File "/usr/local/lib/python3.6/site-packages/rest_framework/viewsets.py" in view
  114.             return self.dispatch(request, *args, **kwargs)

File "/usr/local/lib/python3.6/site-packages/rest_framework/views.py" in dispatch
  497.             response = self.handle_exception(exc)

File "/usr/local/lib/python3.6/site-packages/rest_framework/views.py" in handle_exception
  457.             self.raise_uncaught_exception(exc)

File "/usr/local/lib/python3.6/site-packages/rest_framework/views.py" in raise_uncaught_exception
  468.         raise exc

File "/usr/local/lib/python3.6/site-packages/rest_framework/views.py" in dispatch
  494.             response = handler(request, *args, **kwargs)

File "./document_merge_service/api/views.py" in merge
  52.         response = engine.merge(serializer.data["data"], response)

File "./document_merge_service/api/engines.py" in merge
  29.         doc.render(data, get_jinja_env())

File "/usr/local/lib/python3.6/site-packages/docxtpl/__init__.py" in render
  266.         xml_src = self.build_xml(context, jinja_env)

File "/usr/local/lib/python3.6/site-packages/docxtpl/__init__.py" in build_xml
  229.         xml = self.render_xml(xml, context, jinja_env)

File "/usr/local/lib/python3.6/site-packages/docxtpl/__init__.py" in render_xml
  211.             dst_xml = template.render(context)

File "/usr/local/lib/python3.6/site-packages/jinja2/asyncsupport.py" in render
  76.             return original_render(self, *args, **kwargs)

File "/usr/local/lib/python3.6/site-packages/jinja2/environment.py" in render
  1008.         return self.environment.handle_exception(exc_info, True)

File "/usr/local/lib/python3.6/site-packages/jinja2/environment.py" in handle_exception
  780.         reraise(exc_type, exc_value, tb)

File "/usr/local/lib/python3.6/site-packages/jinja2/_compat.py" in reraise
  37.             raise value.with_traceback(tb)

File "<template>" in top-level template code
  10. <source code not available>

File "./document_merge_service/api/jinja.py" in dateformat
  16.     return format_date(parsed_value, format, locale=locale)

File "/usr/local/lib/python3.6/site-packages/babel/dates.py" in format_date
  700.     return pattern.apply(date, locale)

File "/usr/local/lib/python3.6/site-packages/babel/dates.py" in apply
  1232.         return self % DateTimeFormat(datetime, locale)

File "/usr/local/lib/python3.6/site-packages/babel/dates.py" in __mod__
  1229.         return self.format % other

File "/usr/local/lib/python3.6/site-packages/babel/dates.py" in __getitem__
  1267.             return self.format_period(char)

File "/usr/local/lib/python3.6/site-packages/babel/dates.py" in format_period
  1406.         period = {0: 'am', 1: 'pm'}[int(self.value.hour >= 12)]

Exception Type: AttributeError at /api/v1/template/test-1/merge/
Exception Value: 'datetime.date' object has no attribute 'hour'

Data

{
  "number": 50000,
  "application": {
    "verfahren": "Baugesuch",
    "bauvorhaben": "Alles neu.",
    "datum_eingang_arp": "2019-08-01",
    "mf_erfassung_terminvorgabe": "2019-08-31",
    "projektleiter": "some name",
    "mf_erfassung_status": "in Bearbeitung"
  },
  "userinfo": {
    "name": "Leitbehörde",
    "given_name": "Leitbehörde",
    "family_name": "",
    "email": "[email protected]",
    "groups": [
      "Leitbehörde/Koordinationsstelle"
    ],
    "roles": [
      "Leitbehörde/Koordinationsstelle",
      "Admin"
    ]
  },
  "circulations": [
    {
      "id": "950d08c6-f485-4efc-b24c-2faa10e2eb44",
      "status": "READY",
      "deadline": null,
      "created": "2019-08-14",
      "closed": null,
      "users": [],
      "groups": [
        "Leitbehörde/Koordinationsstelle"
      ],
      "tasks": [
        {
          "id": "83d223e8-cccf-4a4b-8a1c-67f49fdb55ba",
          "status": "COMPLETED",
          "precursor": "21cfbd7b-eba3-42b7-a497-a45d258cb9ba",
          "deadline": null,
          "created": "2019-08-14",
          "closed": "2019-08-14",
          "users": [],
          "groups": [
            "Amt für Umwelt"
          ],
          "circulations": [],
          "response": {
            "stellungnahme_ruckmeldungfazit": "Ablehnung",
            "stellungnahme_ablehnende_beurteilungen": "Nope!"
          }
        },
        {
          "id": "63fa17fb-d12b-45db-bf0f-ff15059412ae",
          "status": "COMPLETED",
          "deadline": null,
          "created": "2019-08-14",
          "closed": "2019-08-14",
          "users": [],
          "groups": [
            "Fachstelle 1"
          ],
          "circulations": [],
          "response": {
            "stellungnahme_ruckmeldungfazit": "Zustimmung",
            "stellungnahme_zustimmende_beurteilungen": "Passt!"
          }
        }
      ]
    }
  ]
}
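
The traceback ends in Babel's `format_period`, which suggests that the `a` in "%a %d.%m.%y" was read as an am/pm field rather than as a strftime code, and that field needs an `hour` attribute that a plain `datetime.date` does not have. The sketch below only demonstrates that difference with the standard library; `ensure_datetime` is a hypothetical helper, and whether the real fix belongs in the filter or in the template pattern (for instance an LDML-style pattern such as `EEE dd.MM.yy`) is left open.

```python
from datetime import date, datetime, time


def ensure_datetime(value):
    """Promote a plain date to a datetime at midnight; pass datetimes through."""
    # datetime is a subclass of date, so check for it first.
    if isinstance(value, datetime):
        return value
    if isinstance(value, date):
        return datetime.combine(value, time.min)
    raise TypeError(f"expected date or datetime, got {type(value).__name__}")


plain = date(2019, 8, 31)
print(hasattr(plain, "hour"))               # a plain date has no hour
print(hasattr(ensure_datetime(plain), "hour"))  # the promoted value does
```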

Excel Engine

We want simple template fields in Excel documents. There is a prototype in #472. If conditionals and loops can be had with low effort, we will take them too. The author of the library used in the prototype has since written a new implementation.

Switch to poetry: #480

Plan

  • Try out the new library
    • How do I get Excel files (via LibreOffice obviously, but is that OK?)
    • Field replacement
    • Conditionals
    • Loops
  • Cleanup document-merge-service
    • Switch to poetry
    • Update dependencies
    • Caret the versions of packages that are only tested in the container
    • Fix regressions from updates
    • Fix CI problems
  • Integrate xltpl in DMS
    • Conversion
    • Verification
    • Examples
    • Tests
    • Review
  • Release

Progress

Test

image

Get infos in code via pkg_resources

import pkg_resources
my_version = pkg_resources.get_distribution('my-package-name').version
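
As an alternative sketch, the standard library's `importlib.metadata` (Python 3.8 and later) can answer the same question without `pkg_resources`. The helper name `get_version` and its `default` parameter are assumptions for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(package_name, default=None):
    """Return the installed version of a distribution, or default if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return default


# Usage:
# my_version = get_version("my-package-name", default="unknown")
```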

The "emptystring" jinja filter should accept a "default" parameter.

It's not always desirable to replace a missing property with an empty string. The emptystring filter should therefore optionally accept a default parameter and fall back to an empty string when none is given.

This also means the name of the filter should be changed to replacemissingvalue.
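
A minimal sketch of what the renamed filter could look like follows. It assumes that a "missing" value reaches the filter as `None`; how undefined template variables are actually represented in the service is not covered here, and the function name merely mirrors the proposed filter name.

```python
def replace_missing_value(value, default=""):
    """Return default when value is missing (modelled as None), else value."""
    return default if value is None else value


# In a template this might read:
# {{ application.projektleiter | replacemissingvalue("n/a") }}
```

Values such as `0` or `False` are deliberately passed through unchanged, since only `None` is treated as missing in this sketch.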

unoconv is deprecated / better solution for isolating LibreOffice

In theory we could switch to unoserver but it has the same issues as unoconv.

To fix the problem of hanging LibreOffice processes and race conditions, every call to LibreOffice should use --env:UserInstallation=$RANDOM_UNIQUE_TEMP_DIR. Of course, it is not 100% certain that this works until we have tested it.
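
The idea of one profile directory per call could be sketched as follows. This is untested against the hang described here; the helper names are hypothetical, the `soffice` binary is assumed to be on the PATH, and the bootstrap parameter is written with a single dash and a `file://` URI, which is the form LibreOffice is usually invoked with.

```python
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(source, target_format, outdir, profile_dir):
    """Build a soffice command line that uses its own user profile directory."""
    profile_uri = Path(profile_dir).resolve().as_uri()
    return [
        "soffice",
        f"-env:UserInstallation={profile_uri}",
        "--headless",
        "--convert-to",
        target_format,
        "--outdir",
        str(outdir),
        str(source),
    ]


def convert_isolated(source, target_format, outdir, timeout=60):
    """Run one conversion with a throwaway profile that is removed afterwards."""
    with tempfile.TemporaryDirectory(prefix="lo-profile-") as profile_dir:
        command = build_convert_command(source, target_format, outdir, profile_dir)
        subprocess.run(command, check=True, timeout=timeout)
```

The timeout gives a hanging process an upper bound, and the temporary directory ensures that two concurrent conversions never share profile state.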

Using unshare to isolate LibreOffices needs the settings:

    cap_add:
      - CAP_SYS_ADMIN
    security_opt:
      - apparmor:unconfined

Error reporting

  • The reported errors are not very helpful at the moment and this should be extended.
  • Add email error handler which sends an email to the admins.
  • Add sentry integration.

When will there be releases?

When do you plan to create releases?

Can I already start to use this repo or is it still under heavy development and long before version 1.0.0?

Permission system is too simplistic

Currently, there is e.g. no way to say that someone can read / merge templates, but can't write / delete them. I think we should either add new configuration options for this, or introduce a full permission/visibility system as we have in other APIs.

Placeholder validation fails with complex placeholders

When a placeholder contains more than just the placeholder name, for example
{{NAME and ", represents " + NAME}}, the validation fails.

Error log
document-merge-service_1  | Internal Server Error: /document-merge-service/api/v1/template/
document-merge-service_1  | Traceback (most recent call last):
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
document-merge-service_1  |     response = get_response(request)
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py", line 115, in _get_response
document-merge-service_1  |     response = self.process_exception_by_middleware(e, request)
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
document-merge-service_1  |     response = wrapped_callback(request, *callback_args, **callback_kwargs)
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
document-merge-service_1  |     return view_func(*args, **kwargs)
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/rest_framework/viewsets.py", line 114, in view
document-merge-service_1  |     return self.dispatch(request, *args, **kwargs)
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/rest_framework/views.py", line 505, in dispatch
document-merge-service_1  |     response = self.handle_exception(exc)
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/rest_framework/views.py", line 465, in handle_exception
document-merge-service_1  |     self.raise_uncaught_exception(exc)
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/rest_framework/views.py", line 476, in raise_uncaught_exception
document-merge-service_1  |     raise exc
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/rest_framework/views.py", line 502, in dispatch
document-merge-service_1  |     response = handler(request, *args, **kwargs)
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/rest_framework/mixins.py", line 18, in create
document-merge-service_1  |     serializer.is_valid(raise_exception=True)
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/rest_framework/serializers.py", line 235, in is_valid
document-merge-service_1  |     self._validated_data = self.run_validation(self.initial_data)
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/rest_framework/serializers.py", line 433, in run_validation
document-merge-service_1  |     value = self.validate(value)
document-merge-service_1  |   File "./document_merge_service/api/serializers.py", line 70, in validate
document-merge-service_1  |     available_placeholders=available_placeholders, sample_data=sample_data
document-merge-service_1  |   File "./document_merge_service/api/engines.py", line 60, in validate
document-merge-service_1  |     self.validate_template_syntax(available_placeholders, sample_data)
document-merge-service_1  |   File "./document_merge_service/api/engines.py", line 107, in validate_template_syntax
document-merge-service_1  |     doc.render(ph, env)
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/docxtpl/__init__.py", line 266, in render
document-merge-service_1  |     xml_src = self.build_xml(context, jinja_env)
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/docxtpl/__init__.py", line 229, in build_xml
document-merge-service_1  |     xml = self.render_xml(xml, context, jinja_env)
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/docxtpl/__init__.py", line 211, in render_xml
document-merge-service_1  |     dst_xml = template.render(context)
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/jinja2/environment.py", line 1090, in render
document-merge-service_1  |     self.environment.handle_exception()
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/jinja2/environment.py", line 832, in handle_exception
document-merge-service_1  |     reraise(*rewrite_traceback_stack(source=source))
document-merge-service_1  |   File "/usr/local/lib/python3.6/site-packages/jinja2/_compat.py", line 28, in reraise
document-merge-service_1  |     raise value.with_traceback(tb)
document-merge-service_1  |   File "<template>", line 23, in top-level template code
document-merge-service_1  | TypeError: must be str, not _MagicPlaceholder
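
The final TypeError suggests that the object substituted for NAME during validation cannot be concatenated to a string. The sketch below shows a stand-in object that tolerates such expressions. `Placeholder` is a hypothetical class written for this example, not the project's `_MagicPlaceholder`.

```python
class Placeholder:
    """Stand-in value for validation that survives string concatenation."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return f"<{self.name}>"

    def __bool__(self):
        # Truthy, so expressions like `NAME and ...` evaluate the right side.
        return True

    def __add__(self, other):
        # Handles Placeholder + "text".
        return str(self) + str(other)

    def __radd__(self, other):
        # Handles "text" + Placeholder, the case from the error log.
        return str(other) + str(self)


NAME = Placeholder("NAME")
print(NAME and ", represents " + NAME)
```

When `str.__add__` cannot handle the right operand, Python falls back to the operand's `__radd__`, which is why defining it is enough for the failing expression.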

Template files are not deleted

When a template is deleted or updated with a new file, the old files should be deleted; otherwise we clutter the filesystem. This also needs some kind of migration to delete the thousands (in my use case) of obsolete files that are lying on the filesystem and will never be used again.
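
The cleanup part of such a migration could start from a helper like the one sketched here. It only lists candidates and deletes nothing; `find_orphaned_files` is a hypothetical name, and the set of referenced names is assumed to come from the template records in the database, as paths relative to the storage directory.

```python
from pathlib import Path


def find_orphaned_files(media_root, referenced_names):
    """Return files under media_root that no template record refers to."""
    root = Path(media_root)
    referenced = set(referenced_names)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.relative_to(root).as_posix() not in referenced
    )


# Usage, with names collected from the database beforehand:
# for path in find_orphaned_files("/path/to/media", referenced_names):
#     print(path)  # review the list before deleting anything
```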

Store files in database

Currently, files are stored in MEDIA_ROOT, which means that deploying document-merge-service requires a file share volume in addition to a database.

One issue is that this is not really documented. Beyond that, it would be better to save the templates in the database for easier persistence.

Add "meta" field

It would be nice if applications could save custom metadata for templates. I'd propose adding a meta field (dict or json).

Readme

Can we have a README to explain what this is, what it does, and how to use it?

Define role-based permissions

Is it currently possible to define separate read and upload permission based on the user's role?

Our current use-case: Only admins can upload templates, while everyone with a valid token is allowed to read them.
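
The rule described in this use case can be written down as a small decision function. This is a sketch of the desired behaviour only, not of anything the service currently offers; the function name, the `Admin` role name and the way roles are passed in are assumptions. Note that the merge endpoint is called with POST, so it would need its own rule under a scheme like this.

```python
SAFE_METHODS = ("GET", "HEAD", "OPTIONS")


def is_request_allowed(method, is_authenticated, roles, admin_role="Admin"):
    """Allow reads for any authenticated user and writes for admins only."""
    if not is_authenticated:
        return False
    if method.upper() in SAFE_METHODS:
        return True
    return admin_role in roles
```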
