merge-pdf's Introduction

README

A suite of tools written in Perl to assist with offline electronic marking of skills-based assessments using PDF forms, cloud storage (such as Google Drive), and mobile devices (such as iPads).

The tools can also be easily adapted to many other tasks.

Overview

  • merge-pdf
    Given a csv file containing a list of student information, and an empty PDF Form as a marking rubric template, generate a copy of the PDF for each student, matching any fields in the PDF template with values from matching field names in the csv file, and naming the PDF file according to a given specification, based on values in the csv file.
  • merge-csv
    Given a list of PDF files generated by merge-pdf, read the contents and extract values stored in the named fields, and write into a csv file. In other words, retrieve values stored in the forms created by merge-pdf. This tool is used to populate a marking spreadsheet csv file that can be loaded into an LMS grade book.
  • join-csv
    Given two csv files with a column containing values common to both (such as a student ID), join them together. In the context of the suite of tools, join-csv can be used to merge the grading worksheet downloaded from an offline assessment in Moodle (see screenshot below) with the grades/marks that merge-csv has collected from the PDF files into a csv file.
    [Screenshot: Download Grading Worksheet]

The tools interact as shown in the following illustration:

                                        ...
                                     /- PDF-\
                                    /        \
       PDF Template------ merge-pdf --- PDF--- merge-csv----join-csv --> Final grades/marks csv file --> Upload back to Moodle
                         /          \        /             /
                        /            \- PDF-/             /
                       /                ...              /
 Grading CSV Worksheet-----------------------------------
 (Downloaded from Moodle)

Contributions

Contributions to this suite are very welcome, as is re-use and adaptation. See the TODO.md file for a current list of outstanding features.

Installation

To install these tools on your computer, you can clone this repository or download the zip in the usual ways.

Once you have it on your local computer, you need to install the local::lib CPAN module using:

$ sudo ./cpanm install local::lib

This is the only globally installed CPAN module required, as merge-pdf uses a localised installation path for its required modules and so does not pollute your global module installs. Once local::lib is installed, run the install_modules shell script as the user who will be using the tools (not as root). Using cpanm (included in this distribution), the script will download all the required modules, and you will be all set to go.

Enter the bin directory and run the commands, e.g. ./merge-pdf

The remainder of this README details the usage of this suite of 3 tools.

merge-pdf

Synopsis

merge-pdf - Merge csv file data into PDF form files

This script performs the opposite task to merge-csv.

Usage

$ merge-pdf <csv filename> <destination path> <pdf filename> [...]

<csv filename> is the path to a csv file with a header row, containing the data to be merged into the PDF form files.

Field names in the PDF form files must match the csv field headers.

The <destination path> is where the merged PDF form files will be saved. The <destination path> can be templated to use values from the csv file.

Example

$ merge-pdf class2016.csv /tmp/2016/%campus%/%className%/%studentName%-%pdf% rubric.pdf

In the above example, for each row in class2016.csv a copy of rubric.pdf is created with its like-named fields filled in from that row. Each copy is written to the /tmp/2016/%campus%/%className%/ directory, where %campus% is replaced by the value of the campus field in the given row, and %className% by the value of the className field. The resulting PDF file is named %studentName%-%pdf%, where %studentName% is replaced by the value of the studentName field in the given row, and %pdf% by the original name of the pdf file, which in this case is "rubric.pdf". For example:

/tmp/2016/ROCKHAMPTON/LAB1/Fred Smith-rubric.pdf

Description

This script takes a csv file, one or more pdf filenames with form fields, and a destination path as command line arguments.

For each row in the csv file, the script will merge like-named field values into each pdf form file, and write a merged copy of all pdf files named into the destination path.

The destination path itself can be composed of values from the csv file, as well as the original name of the PDF file being merged. This ensures that a unique destination filename can be generated for each combination of row in the CSV file and PDF file on the command line.

CSV FILE

The CSV file must contain a header row that gives the name of each field in the file. The CSV file can use double-quotes to escape any special characters in a field, such as a comma or a carriage return.

PDF FILES

The script can take one or more PDF filenames on the command line. For each row in the CSV file, a merged copy of each PDF file will be created.

For example:

A CSV file containing:

student_id,student_name,class_name
S12345678,Fred Smith,Class1
S87654321,Joanna Carpenter,Class2

two PDF files: report1.pdf and report2.pdf

and the following command:

$ merge-pdf students.csv /tmp/%student_name%_%pdf% report1.pdf report2.pdf

will generate 4 PDF files in /tmp:

  • Fred Smith_report1.pdf
  • Fred Smith_report2.pdf
  • Joanna Carpenter_report1.pdf
  • Joanna Carpenter_report2.pdf
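
The "one merged copy per row, per PDF file" behaviour can be illustrated with a short Python sketch. This is not the Perl implementation used by merge-pdf; the function name `planned_outputs` is hypothetical, and the sketch only computes the output paths (it does not read or fill any PDF forms).

```python
import csv
import io

# Same data as the example CSV file above.
CSV_TEXT = """student_id,student_name,class_name
S12345678,Fred Smith,Class1
S87654321,Joanna Carpenter,Class2
"""


def planned_outputs(csv_text, template, pdf_names):
    """Return one destination path per (csv row, pdf file) combination.

    Hypothetical helper for illustration only: it performs plain
    %fieldname% substitution and ignores the extension-stripping rule
    described under DESTINATION PATH.
    """
    outputs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for pdf in pdf_names:
            path = template.replace("%pdf%", pdf)
            for field, value in row.items():
                path = path.replace(f"%{field}%", value)
            outputs.append(path)
    return outputs


if __name__ == "__main__":
    for path in planned_outputs(
        CSV_TEXT, "/tmp/%student_name%_%pdf%", ["report1.pdf", "report2.pdf"]
    ):
        print(path)
```

With two rows and two PDF files, the sketch yields four paths, matching the list above.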

DESTINATION PATH

The <destination path> is where the merged PDF form files will be saved. The <destination path> can be templated to use values from the csv file.

The syntax for doing this is as follows:

%fieldname%

The special template field %pdf% can be used to represent the original filename of the PDF file/s. If %pdf% is not present in the <destination path>, and the destination path ends with a /, then the original filename is appended to the <destination path> unchanged. Make sure that the destination path is unique for each row in the CSV file, or the script will fail with an error.

If the %pdf% field is located somewhere in the path, other than the end, then as a convenience, the extension is removed from the filename. This means that the pdf filename can be used as a directory name in the path of the file, and have the extension removed. E.g.

$ merge-pdf students.csv /tmp/2016/%pdf%/%studentName%.pdf report1.pdf report2.pdf

Will result in:

/tmp/2016/report1/Fred Smith.pdf
/tmp/2016/report2/Fred Smith.pdf

See further examples above.
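
The templating rules in this section can be summarised in a minimal Python sketch. It is an illustration of the rules as described here, not the tool's own Perl code. The function name `expand_destination` is hypothetical, and two details are assumptions: "the end" is taken to mean that %pdf% is the very last thing in the template, and an unknown field name is treated as an error.

```python
import os
import re


def expand_destination(template, row, pdf_path):
    """Expand %fieldname% and %pdf% placeholders in a destination template.

    Hypothetical helper sketching the documented rules:
    - %fieldname% is replaced by that field's value in the csv row;
    - if %pdf% is absent and the template ends with "/", the original
      filename is appended unchanged;
    - %pdf% at the very end keeps its extension, anywhere else the
      extension is removed (assumed reading of "the end").
    """
    pdf_name = os.path.basename(pdf_path)
    stem = os.path.splitext(pdf_name)[0]

    if "%pdf%" not in template and template.endswith("/"):
        template += "%pdf%"

    def substitute(match):
        field = match.group(1)
        if field == "pdf":
            at_end = match.end() == len(template)
            return pdf_name if at_end else stem
        # Assumption: an unknown field name raises KeyError.
        return row[field]

    return re.sub(r"%([^%/]+)%", substitute, template)


if __name__ == "__main__":
    row = {"studentName": "Fred Smith"}
    print(expand_destination("/tmp/2016/%pdf%/%studentName%.pdf", row, "report1.pdf"))
    print(expand_destination("/tmp/2016/%studentName%-%pdf%", row, "report1.pdf"))
    print(expand_destination("/tmp/2016/", row, "report1.pdf"))
```

The first call reproduces the /tmp/2016/report1/Fred Smith.pdf result shown above.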

merge-csv

Synopsis

merge-csv - Extract form field values from PDF files and merge into a single CSV file

This script performs the opposite task to merge-pdf.

Usage

$ merge-csv <csv filename> <fieldlist> <pdf filename> [...]

$ find . -name \*.pdf -print | merge-csv <csv filename> <fieldlist> [-]

<csv filename> is the path to the csv file to be written. It will contain a header row followed by the data extracted from the PDF form files.

<fieldlist> is a comma-separated list of the PDF form field names to extract. These names are also used as the csv field headers.

<pdf filename> is one or more pdf filenames given on the command line. If no pdf filenames are given, or the filename is simply -, the pdf filenames are read from standard input.
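
As a rough illustration of the filename handling just described, the following Python sketch chooses between command-line filenames and standard input. It is not taken from the merge-csv source; the function name `resolve_pdf_names` is hypothetical, and the one-filename-per-line input format is an assumption based on the `find ... -print` usage above.

```python
import io


def resolve_pdf_names(args, stdin):
    """Return the list of pdf filenames to process.

    Hypothetical helper: if no filenames are given, or the only
    argument is "-", read one filename per line from stdin.
    """
    if not args or args == ["-"]:
        return [line.strip() for line in stdin if line.strip()]
    return list(args)


if __name__ == "__main__":
    piped = io.StringIO("./a/one.pdf\n./b/two.pdf\n")
    print(resolve_pdf_names(["-"], piped))
    print(resolve_pdf_names(["x.pdf", "y.pdf"], io.StringIO("")))
```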

Example

$ merge-csv class2016-results.csv 'Assessment,studentId,studentName,Total Marks' *.pdf

The above example will result in the class2016-results.csv file being generated with a header row given as:

Assessment,studentId,studentName,Total Marks

followed by one row per pdf file, containing the values extracted from each pdf file passed to the script on the command line (*.pdf).

If there were 6 PDF files, output might look like:

Assessment,studentId,studentName,"Total Marks"
"Cardiac Arrest",S01234567,"Barry Allen",0
"Conscious Patient",S01234567,"Barry Allen",0
"Cardiac Arrest",S12345678,"Luke Skywalker",22.5
"Conscious Patient",S12345678,"Luke Skywalker",30
"Cardiac Arrest",S23456789,"James Bond",17.25
"Conscious Patient",S23456789,"James Bond",49

Description

This script will merge the contents of form fields from multiple PDF documents into a single csv file. A header row will be created, based on the input provided on the command line. Then values of like-named fields within each PDF file will be extracted and stored in the csv file. There will be one row per PDF file.

To identify the 'type' of a PDF form, add a hidden text field with a default value to the form.

PDF filenames can be passed on the command line. Alternatively, if there are too many of them, or they are deeply nested in a directory structure, the filenames can be provided on standard input.
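
The shape of the output (a header row taken from the field list, then one row per PDF file) can be sketched in Python as follows. This is an illustration only: the function name `write_merged_csv` is hypothetical, the form values are assumed to have already been extracted from each PDF into a dictionary, and the quoting produced by Python's csv module may differ from the quoting merge-csv emits.

```python
import csv
import io


def write_merged_csv(fieldlist, forms, out):
    """Write a header row, then one row per form, to the file object out.

    Hypothetical helper. fieldlist is a comma-separated string of field
    names; forms is a list of dicts, one per PDF file, holding field
    values that are assumed to have been extracted already.
    """
    fields = fieldlist.split(",")
    writer = csv.DictWriter(
        out,
        fieldnames=fields,
        restval="",              # fields missing from a form stay empty
        extrasaction="ignore",   # form fields not in the list are skipped
        lineterminator="\n",
    )
    writer.writeheader()
    for form in forms:
        writer.writerow(form)


if __name__ == "__main__":
    forms = [
        {"Assessment": "Cardiac Arrest", "studentId": "S12345678",
         "studentName": "Luke Skywalker", "Total Marks": "22.5"},
        {"Assessment": "Conscious Patient", "studentId": "S12345678",
         "studentName": "Luke Skywalker", "Total Marks": "30"},
    ]
    buffer = io.StringIO()
    write_merged_csv("Assessment,studentId,studentName,Total Marks", forms, buffer)
    print(buffer.getvalue(), end="")
```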

join-csv

Synopsis

join-csv - Join rows from two csv files using a unique field common to both

Usage

$ join-csv <csv filename1> <csv filename2> <filename1field>:<filename2field> [ <output csv> | - ]

<csv filename1> is the path to first csv file to join. It must contain a header row. Fields from this csv file will be added to the output in the same order they appear in the original, and will appear before fields of <csv filename2>.

<csv filename2> is the second csv file to join with. It has the same requirements as file 1. Fields will retain order but be added to end of fields from <csv filename1>.

<filename1field>:<filename2field> specifies the field name from each csv file on which to test for equality for joining. The equality test is conducted alphanumerically and case-insensitively. The field for filename1 is given first, before the field for filename2. They are separated by a colon. For example:

csv-file1-fieldname:csv-file2-fieldname

Any shell special characters (such as spaces) will need to be escaped or quoted.

If a path to a filename is specified, the joined results will be written to that filename. Otherwise, if the filename is given as '-' or no filename is given at all, the resulting joined results will be sent to standard output.

Example

Given file class2016-assignment1-results.csv as:

name,studentid,result
Fred Smith,s0000008,99
Sally Jacks,s0000007,94
John Oxford,s0000009,47
Mary Sale,s0000010,91
Joseph Banks,s0000004,19

and file class2016-assignment2-results.csv as:

name,id,result
Mary Sale,S0000010,100
Fred Smith,S0000008,90
Sally Jacks,S0000007,44
John Oxford,S0000009,77

If we run:

$ join-csv class2016-assignment1-results.csv class2016-assignment2-results.csv studentid:id class2016-results.csv

then class2016-results.csv will contain:

name,studentid,result,name,id,result
"Sally Jacks",S0000007,94,"Sally Jacks",S0000007,44
"Fred Smith",S0000008,99,"Fred Smith",S0000008,90
"John Oxford",S0000009,47,"John Oxford",S0000009,77
"Mary Sale",S0000010,91,"Mary Sale",S0000010,100
"Joseph Banks",s0000004,19,,,

The capitalisation does not matter on the field that is being used to join.

The extra row in assignment1 for Joseph was still merged, but had no values from assignment2 csv file because there was no row for him there. If additional rows were to exist in class2016-assignment2-results.csv, they too would be added at the bottom, but would only contain values from file2.

Description

This script takes the file paths of two separate csv files. The join is based on one field from the header of each csv file. The values in these fields are what the two files have in common, and within each file every value must be unique.

The csv files need not be sorted on the given joining fields. However, the values for the joining field must be unique on each row of each csv file.
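
The join described in this section behaves like a full outer join on a case-insensitive key. The Python sketch below illustrates that idea; it is not the Perl code used by join-csv. The function name `join_csv` is hypothetical, and the row ordering (matched rows in file 1 order, then unmatched rows) and the fact that values are copied through without changing their case are assumptions of the sketch rather than documented behaviour.

```python
import csv
import io


def join_csv(text1, text2, field1, field2):
    """Join two csv texts on field1 (file 1) and field2 (file 2).

    Hypothetical helper. Keys are compared case-insensitively. Returns
    a list of rows (lists of strings), header first, then matched
    rows, then rows found in only one of the files.
    """
    reader1 = csv.DictReader(io.StringIO(text1))
    reader2 = csv.DictReader(io.StringIO(text2))
    rows1, rows2 = list(reader1), list(reader2)
    header1, header2 = list(reader1.fieldnames), list(reader2.fieldnames)

    # Index file 2 by its lower-cased key; keys are assumed unique.
    index2 = {row[field2].lower(): row for row in rows2}

    matched, unmatched, seen = [], [], set()
    for row in rows1:
        key = row[field1].lower()
        left = [row[name] for name in header1]
        other = index2.get(key)
        if other is None:
            unmatched.append(left + [""] * len(header2))
        else:
            seen.add(key)
            matched.append(left + [other[name] for name in header2])

    # Rows that exist only in file 2 carry empty file 1 columns.
    for row in rows2:
        if row[field2].lower() not in seen:
            unmatched.append([""] * len(header1) + [row[name] for name in header2])

    return [header1 + header2] + matched + unmatched


if __name__ == "__main__":
    file1 = "name,studentid,result\nFred Smith,s0000008,99\nJoseph Banks,s0000004,19\n"
    file2 = "name,id,result\nFred Smith,S0000008,90\n"
    csv.writer(io.TextIOWrapper(open(1, "wb", closefd=False), newline="")).writerows(
        join_csv(file1, file2, "studentid", "id")
    )
```

Lists are used for the output rows, rather than dictionaries, because the two files may share column names such as name and result.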

Licence

Copyright (c) 2016 Damien Clark, Damo's World

Licenced under the terms of the GPLv3

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL DAMIEN CLARK BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
