Giter Club home page Giter Club logo

bagit's Introduction

bagit's People

Contributors

steffenfritz avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

Forkers

subugoe

bagit's Issues

Path encoding bug

I recently discovered that the BagIt 1.0 specification requires that CR, LF, and % in file paths within manifest files are percent-encoded, and that there isn't a single BagIt implementation that does this correctly. Implementations either only encode CR and LF but not % or they encode nothing.

This implementation does not encode paths in the manifest, which means that it would fail to validate BagIt 1.0 bags that include file paths containing CR, LF, or %. Likewise, it would create bags that would fail BagIt 1.0 validation in the case that there are paths that naturally contain percent-encoded characters.

For example, let's say a bag contains the file data/file%0A1.txt. This file should be written to the manifest per the spec as data/file%250A1.txt. However, this implementation writes it as data/file%0A1.txt. This means, that when this implementation validates a properly constructed 1.0 bag it will look for the file data/file%250A1.txt which does not exist. Similarly, if another implementation that follows the spec attempts to validate a bag produced by this implementation, it would look for data/file\n1.txt, which does not exist.

It would seem desirable to me to move the ecosystem in the direction of properly implementing the 1.0 specification, while at the same acknowledging that there are a large number of 1.0 bags in existence that may then become invalid.

As such, it may be prudent to, when validating bags, fall back on a series of tests. You may want to first attempt to validate per the spec, and then, if a file cannot be found, attempt to locate it by either only decoding the CR and LF or leaving the path unchanged, ideally validating all of the files using the same method.

I have not examined fetch.txt implementations, but the same encoding requirements exist for paths in that file as well. This is potentially a thornier problem to address in a backward compatible way as it is unclear if the path data/file%250A1.txt is supposed to create data/file%250A1.txt (incorrect) or data/file%0A1.txt (correct).

Finally, I created a related ticket against the spec discussing this encoding problem, in particular how it breaks checksum utility compatibility.

Add fetch.txt command line flag

What:
Add a command line flag that inserts a fetch.txt file into the bag.

Why:
BagIt allows the usage of a fetch.txt file that references remote resources.

Things to consider:
The referenced resources MUST be listed in all payload manifest files. So it is necessary to download all resources temporary and calculate all hashsums on creation time.

Validation fails depending on whitespaces

Describe the bug
Validation fails if there are more whitespaces between hashsum and file path. BagIt 1.0 allows more than one whitespaces.

To Reproduce

  1. create bag
  2. add additional whitespace between hashsum and filepath in manifest-algo.txt
  3. Validation fails

Refactor code

The code base is three years old and needs some re-work.

Add tagmanifest to create and validate

BagIt version 1.0 allows optional tagmanifest files

Describe the solution you'd like
Add a command line argument for creating at least one optional tagmanifest file

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.