Giter Club home page Giter Club logo

testing's Introduction

oxfama.transmogrifier

Tranmogrifier pipelines for the analysis and export of Plone content from the Oxfam America website.

Based on collective.transmogrifier and related tools.

What do You Get?

This package provides a transmogrifier export pipeline which will dump the field values of the schema of all of the items in the oxfam america website into a set of CSV files, organized by object type. Each type of object which has a schema will have its own csv file. Each file will have a header row which lists the fields output in alphabetical order. Each row after the header will contain the values for the given fields, if any, present on a single object within the site. In addition to schema fields, the path to the object within the plone site and the UID and UUID (if present) will be provided.

For file fields (images, files, etc), the value will be a simple string indicating that the field is a binary file field. No file data will be contained in the csv export. File data will be exported separately (see below).

For reference fields, the type and UID of the referenced object(s) will be listed. This information can be used to look up the referenced object. Simply find the row with the given UID value in the csv file named for the type.

You can control which fields are output (see Controlling the Pipeline below).

All binary file data (images, media, .pdf, .doc, etc.) will be written out to a filesystem structure that mirrors the Plone site structure. Each file will be located at the same path where it is found in the site. The sole exception to this is if a content object has more than one binary file field, in which case a folder named for the Plone object will be created and each of the files will be written to that folder.

Installing This Package

As this package is unreleased, installing it requires the mr.developer buildout extension.

First, include the package in the [sources] section of your buildout. The source is in a private repository, so unless you have ssh authentication for git set up on the machine where you are working, you'll need to use HTTP authentication:

oxfama.transmogrifier = git https://github.com/oxfamamerica/oxfama.transmogrifier.git

Next, include the package name in the eggs = section of the buildout:

eggs =
    ...
    oxfama.transmogrifier

Once this is done, you can re-run buildout. You should also see collective.transmogrifier and quintagroup.transmogrifier installed as dependencies of this package.

This package does not provide an installable Generic Setup profile, so you should not expect to see it listed either in the Add-ons panel in Plone Site Setup or in the list of available add-ons when you add a new Plone site.

To verify that the package is properly installed, go to the ZMI (Zope Management Interface). Find and click on the portal_setup tool and then click on the import tab. Look for the Oxfam America Site Dump profile listed in the drop-down list of available Profiles or Snapshots at the top.

Running the Export Pipeline

To run the export dump, you'll need to use the portal_setup as described above.

  • Go to the ZMI
  • Find and click on the portal_setup tool
  • Click on the Import tab
  • Find and select the Oxfam America Site Dump profile in the Select Profile or Snapshot dropdown.
  • Find the Run transmogrifier pipeline step in the list of available import steps. Click the checkbox to select it.
  • At the bottom of the page, unselect the Include dependencies checkbox.
  • Click the import selected steps button.

The export will run for some time. You can see progress in the terminal if you are running the site in fg mode.

After all items are dumped, the CSV files will be written to the destination provided in the pipeline configuration (see Controlling the Pipeline below)

Controlling the Pipeline

There are a number of settings you can control for the pipeline. These controls are available by editing the content_to_csv.cfg configuration file in this package. This file is located in oxfama/transmogrifier/export.

The file is organized into a series of sections, each delineated by a [name] in square brackets.

The [transmogrifier] section provides a list of the pipeline sections that make up the export pipeline as well as the local_destination which should be an absolute filesystem path to the folder where the CSV output files will be written:

[transmogrifier]
# configure pipeline and other required information
pipeline =
    source
    schemadumper
    writer
    logger

local_destination = /absolute/path/to/folder

It is vital to ensure that the user running the Plone process have write access to the destination folder. If no local destination is provided, the system will attempt to use /tmp.

The [source] section provides settings for the SiteWalkerSection that is used to walk the object graph of the site and read the contents. You can provide this section a starting path, if you wish to focus only on one section of the website. This path should be absolute, based from the ZMI (/). The object identified by the path will be included. The walk can be limited using the limit setting. At most this number of objects will be dumped. If the setting is omitted, or set to 0, all objects will be dumped:

[source]
blueprint = oxfama.transmogrifier.sitewalker
start-path = /oxfam
limit = 1000

The [schemadumper] section controls the way in which the schema of objects is converted to a csv row. You may provide a list of fields to exclude. If the named field exists in an object's schema, it will be left out of the final csv file for that type. This can be used to control the volume of data dumped:

[schemadumper]
blueprint = oxfama.transmogrifier.schemadumper
exclude =
    text
    description
    file
    image

The [filewriter] section controls the writing of binary file data to the filesystem. The path variable (which defaults to the local_destination value set above) determines the root filesystem path on the server to which all files will be written. This folder must exist and be writable by the user running Plone. No default location will be set if this value is omitted. The context variable determines how the files will be exported. Leave this set to directory to have all files written to the filesystem.

The [writer] section controls the writing of csv files to the filesystem. There are no settings currently available for this section.

The [logger] section controls the writing of log output for each object as it is passed through the pipeline. You can provide a list of the keys for each item that will be written to the log line.

To omit any section, simply remove its name from the pipeline setting in the [transmogrifier] section of the configuration file.

testing's People

Contributors

kofic avatar

Watchers

 avatar James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.