DOCX to JATS XML Converter Plugin

A plugin for OJS 3.1+ that converts articles from DOCX to JATS XML. The output is compatible with the Texture plugin, an online JATS XML editor.

A DOCX file is a ZIP archive that complies with the Office Open XML (OOXML) format. It is commonly used for creating and editing text documents, and it serves as an exchange format for scientific articles, mainly between authors and publishers. The converter is designed to support the output of MS Word, LibreOffice Writer, and Google Docs.
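
Because a DOCX file is a ZIP archive, its main text part, word/document.xml, can be read with ordinary ZIP and XML tools. The following is a minimal Python sketch of that idea, not the plugin's PHP code. The helper names build_sample_docx and read_paragraphs are hypothetical, and the sample archive is deliberately incomplete: it contains only the document part, so it illustrates the layout but would not open in a word processor.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_sample_docx():
    """Build a tiny, incomplete DOCX-like archive in memory (illustration only)."""
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>First paragraph.</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second </w:t></w:r><w:r><w:t>paragraph.</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


def read_paragraphs(docx_bytes):
    """Return the plain text of every paragraph (w:p) in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs (w:r), each holding w:t nodes.
        text_nodes = paragraph.iter(f"{{{W_NS}}}t")
        paragraphs.append("".join(node.text or "" for node in text_nodes))
    return paragraphs


if __name__ == "__main__":
    print(read_paragraphs(build_sample_docx()))
```

For a real file, the same read_paragraphs logic would apply to the bytes of a .docx read from disk.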

This plugin is aimed at publishers that use JATS XML as the pivotal format in their publication workflow. The idea behind the converter is to create the basic structure of the document from a given DOCX file and then prepare the manuscript for production in an online JATS XML editor, such as the Texture plugin. The converter is written in pure PHP and doesn't require any additional extensions for basic functionality.

Installation

The plugin can be installed in two ways: 1) by downloading the latest stable release, or 2) by cloning the master branch. For the latter, use Git to clone the repository with its submodules into the plugins/generic directory, relative to the web root of the OJS instance: git clone --recurse-submodules https://github.com/Vitaliy-1/docxConverter.git.

After installation, activate the plugin in the plugins menu: Settings -> Website -> Plugins -> check Enable next to DOCX to JATS XML Converter Plugin. The Convert to JATS XML button will then appear in the dropdown under a draft DOCX file at any workflow stage.

Important note: parsing of references is not yet supported, so the only way to add them is manually, with an available JATS XML WYSIWYG editor such as the Texture Plugin for OJS 3.1+. How to use them together:

What article elements are supported?

The DOCX to JATS XML Converter is planned to support all major features of DOCX. The table below lists the elements that are already supported and those planned for the near future. A mark in the Planned for the 1.0.0 release column means that the feature is likely to be included in the first stable release; otherwise, it is planned for a later release.

| Feature | Supported | Planned for the 1.0.0 release | Notes |
| --- | --- | --- | --- |
| Paragraphs | ✔️ | | All formatted text in the paragraph can be nested, e.g. bold, bold + italic. |
| Bold | ✔️ | | |
| Italic | ✔️ | | |
| Text superscript | ✔️ | | |
| Text subscript | ✔️ | | |
| Strikethrough | ✔️ | | |
| Lists | ✔️ | | Can be nested |
| List style | ✔️ | | |
| Headings and sections | ✔️ | | Can be nested; OOXML headings are transformed to JATS XML sections with a title and the corresponding level |
| Tables | ✔️ | | |
| Cells with row- and colspan | ✔️ | | |
| Table caption | ✔️ | | |
| JPEG and PNG figures | ✅ (Partially) | | |
| Figure caption | ✔️ | | |
| Diagrams | | | |
| Formulas | | | |
| Footnotes | | | |
| MS Word citations | | | |
| Zotero citations | | | |
| External links | ✔️ | | |
| OOXML metadata | | | OOXML contains a limited set of metadata, and this feature is rarely used by authors |
| Article's metadata from OJS | | | Metadata such as author names, affiliations, and the article title is transferred from OJS; abstracts are not supported yet |
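
To illustrate the note on the Paragraphs row about nested formatting, here is a minimal Python sketch that maps the formatting of OOXML runs to nested JATS inline elements (bold, italic, strike, sup, sub). It is an illustration under stated assumptions, not the plugin's PHP implementation: the function names run_to_jats and paragraph_to_jats and the sample paragraph are hypothetical, and toggle values such as w:val="false" are ignored for brevity.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def run_to_jats(run):
    """Convert one OOXML run (w:r) to a JATS markup string."""
    markup = escape("".join(t.text or "" for t in run.iter(f"{W}t")))
    props = run.find(f"{W}rPr")
    if props is None:
        return markup
    tags = []
    # Simplification: a toggle such as <w:b w:val="false"/> is not handled.
    for ooxml_name, jats_tag in (("b", "bold"), ("i", "italic"), ("strike", "strike")):
        if props.find(f"{W}{ooxml_name}") is not None:
            tags.append(jats_tag)
    vert_align = props.find(f"{W}vertAlign")
    if vert_align is not None:
        value = vert_align.get(f"{W}val")
        if value == "superscript":
            tags.append("sup")
        elif value == "subscript":
            tags.append("sub")
    # Wrap from the innermost tag outwards, so the first tag ends up outermost.
    for tag in reversed(tags):
        markup = f"<{tag}>{markup}</{tag}>"
    return markup


def paragraph_to_jats(paragraph):
    """Convert an OOXML paragraph (w:p) to a JATS <p> string."""
    runs = paragraph.iter(f"{W}r")
    return "<p>" + "".join(run_to_jats(r) for r in runs) + "</p>"


# Hypothetical sample paragraph: plain text, bold + italic text, and a subscript.
SAMPLE = (
    f'<w:p xmlns:w="{W_NS}">'
    "<w:r><w:t>Plain, </w:t></w:r>"
    "<w:r><w:rPr><w:b/><w:i/></w:rPr><w:t>bold italic</w:t></w:r>"
    "<w:r><w:t>, H</w:t></w:r>"
    '<w:r><w:rPr><w:vertAlign w:val="subscript"/></w:rPr><w:t>2</w:t></w:r>'
    "<w:r><w:t>O</w:t></w:r>"
    "</w:p>"
)

if __name__ == "__main__":
    print(paragraph_to_jats(ET.fromstring(SAMPLE)))
```

For the sample paragraph, the bold + italic run becomes an italic element nested inside a bold element.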

How to achieve best results?

The best results can be obtained only with well-structured articles. The DOCX to JATS XML Converter Plugin should work with DOCX files produced by Google Docs, MS Word, and LibreOffice Writer. However, results may differ between them because the DOCX output of these applications is not fully interoperable.

Google Docs

Link to a general example: Google document. Keep in mind that Google Docs doesn't support citations, reference lists, or figure and table captions. General recommendations:

  • Sections and Headings. Use the built-in headings to mark the sections of a scientific article. The heading level represents the level of the section, so sections can be nested. Start a new line and choose a text style from the dropdown menu in the upper-left corner: Heading 1, Heading 2, or another level. By default, the text style is Normal text. Guideline on YouTube: https://www.youtube.com/watch?v=q58KRXwg93E. Note: there is no need to create a table of contents.
  • Formatted text. Bold, italic, and other text formatting are fully supported. The corresponding menu items are in the center of the toolbar.
  • Tables and figures. Tables and figures can be added using the Insert menu on the upper-left side of the toolbar. Cell merging is supported. Unfortunately, Google Docs doesn't support captions. How to create a table: https://www.youtube.com/watch?v=5HkarJaViQU; how to insert an image: https://www.youtube.com/watch?v=5Eh5WmTJ6qo
  • Lists. Lists can be inserted using the toolbar items to the right of the text formatting options. Nested lists are supported: to change the level of a list item, place the cursor on it and press Tab (level down) or Shift + Tab (level up). How to manage lists: https://www.youtube.com/watch?v=g2UhdpozSdQ
  • Export as DOCX. File -> Download as -> Microsoft Word (.docx)
  • Upload to OJS. The resulting file can be uploaded to the Copyediting or Production stage as a Draft File; the Convert to JATS XML button will appear in the dropdown menu under the file.
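
The Sections and Headings recommendation above relies on heading levels to express nesting. The following minimal Python sketch shows one way a flat sequence of headings and paragraphs can be turned into nested JATS sec elements with a stack. It is an assumption about the general technique, not the plugin's PHP code; the function name build_sections and the input format (tuples of block type, level, and text) are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_sections(blocks):
    """Nest a flat list of blocks into JATS <sec> elements.

    Each block is either ("heading", level, text) or ("para", text).
    """
    body = ET.Element("body")
    # Stack of (level, element) for open sections; the body acts as level 0.
    stack = [(0, body)]
    for block in blocks:
        if block[0] == "heading":
            _, level, text = block
            # Close every open section at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            sec = ET.SubElement(stack[-1][1], "sec")
            ET.SubElement(sec, "title").text = text
            stack.append((level, sec))
        else:
            # Paragraphs belong to the innermost open section.
            ET.SubElement(stack[-1][1], "p").text = block[1]
    return body


# Hypothetical article outline: Heading 1, Heading 2, then Heading 1 again.
SAMPLE_BLOCKS = [
    ("heading", 1, "Introduction"),
    ("para", "Background text."),
    ("heading", 2, "Aims"),
    ("para", "Aim text."),
    ("heading", 1, "Methods"),
    ("para", "Method text."),
]

if __name__ == "__main__":
    print(ET.tostring(build_sections(SAMPLE_BLOCKS), encoding="unicode"))
```

In this sketch the Aims section ends up inside Introduction, while Methods becomes a sibling of Introduction, which mirrors how Heading 2 nests under the preceding Heading 1.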

LibreOffice and MS Word

Coming soon.

Troubleshooting

To propose features or report bugs, please open an issue on the converter's page.
