
bagitphp's People

Contributors

davidmcclure, erochest, john-devil, ubermichael, whikloj

bagitphp's Issues

Duplicate tags and tag case folding in bag-info.txt

Hello, I did some testing with an existing bag-info.txt, and used BagItPHP to update the structure. I noticed two unexpected things:

  • All tags are changed to lower case; e.g. "DC-Title" is saved as "dc-title", which is not what I want.
  • Each tag is only allowed once (probably because tags are used as PHP array keys). This is also not what I want: Dublin Core allows multiple occurrences of the same tag (from dublincore.org: "4.5 Repeatability: All elements in the Dublin Core are repeatable. For example, multiple author elements would be used when a resource has multiple authors.")

DC-Author: Wayne Graham
DC-Author: Mark Jordan

gets changed to:

dc-author: Mark Jordan

Your opinion?
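
For illustration, here is a minimal sketch of one way to keep both tag case and repeated tags: store the tags as an ordered list of pairs instead of an array keyed by tag name. The functions parseBagInfo() and formatBagInfo() are hypothetical names made up for this example; they are not part of BagItPHP and do not reflect how the library is implemented.

```php
<?php
// Hypothetical helpers, not part of BagItPHP.

// Parse bag-info.txt text into an ordered list of [tag, value] pairs,
// so that tag case and repeated tags both survive.
function parseBagInfo(string $text): array
{
    $pairs = [];
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        if (trim($line) === '') {
            continue; // skip blank lines
        }
        // A line starting with whitespace continues the previous value.
        if (preg_match('/^\s+(.*)$/', $line, $m) && count($pairs) > 0) {
            $pairs[count($pairs) - 1][1] .= ' ' . trim($m[1]);
            continue;
        }
        // Split on the first colon only, so values may contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $pairs[] = [trim($parts[0]), trim($parts[1])];
        }
    }
    return $pairs;
}

// Write the pairs back out, one "Tag: value" line per pair.
function formatBagInfo(array $pairs): string
{
    $out = '';
    foreach ($pairs as [$tag, $value]) {
        $out .= $tag . ': ' . $value . "\n";
    }
    return $out;
}
```

With this representation, the two DC-Author lines above come back as two separate pairs with their original capitalization.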

Populating bag-info.txt

Hi,

I can't figure out how to populate bag-info.txt. I've tried the following, but bag-info.txt is not populated:

 $bag = new BagIt($bag_output_path);
 $bag->bagInfoData = array(
  'First' => 'This is the first tag value', 
  'Second' => 'This is the second tag value'
 );
 $bag->addFile('/tmp/test.txt', 'test.txt');
 $bag->update();
 $bag->package($serialized_bag_path, 'zip');

I've taken a look at the bagit.php code and can see the _createExtendedBag() function, but at the end of the function it looks like an empty file is created by touch() and $this->bagInfoData = array() isn't populated with any values. Any help would be appreciated.

Default Payloads

Should the default payload path automatically be treated as relative to the 'data' directory, or are there cases where this would have unwanted consequences?

Remove `build.xml`

Now that we're using make + composer, I'm not sure that we need this anymore. We should probably migrate the code quality tasks to the Makefile, though.

Bags in .zip format with no payload files in /data don't validate

The ZipArchive class doesn't create empty directories when it generates zip files. This means that the /data directory doesn't get created when a bag without any payload files is compressed to the .zip format. Then, when the bag is read, the library kicks out an error when it doesn't find the data directory and the bag doesn't validate.

Fix by checking for the data directory on ingestion and creating it if it doesn't exist.
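
A minimal sketch of that check, assuming the bag has already been unpacked to a directory on disk. The function name ensureDataDirectory() is hypothetical and is not taken from the library.

```php
<?php
// Hypothetical helper, not part of BagItPHP: make sure an unpacked bag
// has a data/ directory, creating it if the archive did not contain one.
function ensureDataDirectory(string $bagDir): bool
{
    $dataDir = rtrim($bagDir, '/\\') . DIRECTORY_SEPARATOR . 'data';
    if (is_dir($dataDir)) {
        return true; // nothing to do
    }
    // Create the missing directory (and any missing parents).
    return mkdir($dataDir, 0777, true);
}
```

On the writing side, ZipArchive::addEmptyDir() could be used to add an explicit data/ entry to the archive, which may avoid the problem at the source.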

_openBag does not create a missing bagit.txt

If you construct a BagIt on an existing empty directory, Bagit calls _openBag (as it should) but does not (re)create the bagit.txt file.

I'm not sure that it should (re)create bagit.txt -- that is why I filed an issue rather than forking BagItPHP and fixing the issue.

fetch.txt is created even if bag doesn't use it

fetch.txt is added to the bag even if $fetch is false in the BagIt constructor. I think this file should not be added if it is not used as the BagIt spec indicates that fetch.txt is optional.

I would be happy to add code to eliminate fetch.txt if $fetch is false and issue a pull request.

Can't use an algorithm that isn't sha1 or md5

I'm new to BagIt so I apologize if this is an obvious question.

If I try to specify sha256 or sha512 I get an error due to this check.

The BagIt specification says you SHOULD use one of these two algorithms but does not disallow others. Is there a reason for forcing a choice of one of these two algorithms?

Alternatively, would it be acceptable to look at extending the code to allow the specification of more than one algorithm and have multiple manifests generated?
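
To make the idea concrete, here is a rough sketch of generating one manifest per algorithm with PHP's built-in hash_algos() and hash_file(). The function buildManifests() is a hypothetical name for this example, and the sketch assumes the payload already sits in a data/ directory under the bag root; it is not how BagItPHP currently works.

```php
<?php
// Hypothetical helper, not part of BagItPHP: build the text of one
// manifest per requested algorithm for the files under $bagDir/data.
function buildManifests(string $bagDir, array $algorithms): array
{
    $dataDir = rtrim($bagDir, '/\\') . DIRECTORY_SEPARATOR . 'data';

    // Collect every payload file, in a stable order.
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dataDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile()) {
            $files[] = $file->getPathname();
        }
    }
    sort($files);

    $manifests = [];
    foreach ($algorithms as $algo) {
        // Only accept algorithms this PHP build actually supports.
        if (!in_array($algo, hash_algos(), true)) {
            throw new InvalidArgumentException("Unsupported algorithm: $algo");
        }
        $lines = '';
        foreach ($files as $path) {
            // Path relative to the bag root, always with forward slashes.
            $relative = 'data/' . str_replace('\\', '/', substr($path, strlen($dataDir) + 1));
            $lines .= hash_file($algo, $path) . ' ' . $relative . "\n";
        }
        $manifests["manifest-$algo.txt"] = $lines;
    }
    return $manifests;
}
```

Calling buildManifests($bagDir, ['sha256', 'sha512']) would return two entries keyed by manifest file name, which the caller could then write to the bag root.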

Create a bag not extended but ask to fetch causes error.

Current logic in the bagit.php constructor means you can create a bag that is not extended but also make it fetch the contents of the fetch file.

To replicate, do

$tmp = "someNewDirectory";
new BagIt($tmp, false, false, true);

This results in the error Error: Call to a member function download() on null

Fetch should only be called if there is a fetch file loaded.

List and clear bag info keys

A bag cannot provide a list of its info data keys, nor can it clear the info data, unless calling code accesses the bagInfoData member variable directly.

There should be methods in the BagIt class to access a list of keys and to clear the bag info data.
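
As a sketch of what such methods might look like, the class below is a small stand-in that only holds a bagInfoData array. The class name BagInfoSketch and the method names getBagInfoKeys() and clearBagInfo() are assumptions made for this example, not existing BagItPHP API.

```php
<?php
// Hypothetical stand-in for the bag-info part of a bag class.
// Only the bagInfoData property name comes from the issue text.
class BagInfoSketch
{
    /** @var array tag => value */
    public $bagInfoData = [];

    // Return the list of tag names currently stored.
    public function getBagInfoKeys(): array
    {
        return array_keys($this->bagInfoData);
    }

    // Remove every tag and value.
    public function clearBagInfo(): void
    {
        $this->bagInfoData = [];
    }
}
```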

Fetched files not added to output of getBagContents()

Using the code in the "Reading a bag" example, with a bag that produces no errors and that has a fetch.txt, I was expecting the files identified in fetch.txt to end up in the same output directory as my payload files.

I can confirm that the remote file is being downloaded to the temporary directory with the rest of the payload files, but it is not being added to the output of getBagContents(). Is the intent that it get added to that array or do fetched files need to be handled separately? If the latter is the case, the example should illustrate how they could be copied like the payload files into the 'final/destination/' directory.

Remove line breaks from file encoding

From PHP

Notice: iconv(): Wrong charset, conversion from `UTF-8\r' to `UTF-8' is not allowed at...

Solution

lib/bagit_utils.php
readFileText method, Line 219, add:

    // Remove line breaks from $fileEncoding.
    $fileEncoding = preg_replace( '/\r|\n/', '', $fileEncoding );

Pull Request #18

Optionally have all errors throw some kind of Exception

It would be nice if there was a setting so all errors would throw some kind of Exception rather than requiring the extra step of calling getBagErrors() (tripped over this issue myself today).

I'm not likely to try and implement this change myself any time soon, as the code I'm working on (as of now) always remembers to call getBagErrors().
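
Until such a setting exists, calling code could wrap the check itself. The sketch below is one possible shape for that, assuming only that getBagErrors() returns an array that is empty when the bag is valid. BagValidationException and throwOnBagErrors() are hypothetical names and not part of the library.

```php
<?php
// Hypothetical exception type carrying the collected bag errors.
class BagValidationException extends RuntimeException
{
    private $errors;

    public function __construct(array $errors)
    {
        $this->errors = $errors;
        parent::__construct(count($errors) . ' bag error(s) found');
    }

    // Return the original error list for logging or display.
    public function getErrors(): array
    {
        return $this->errors;
    }
}

// Hypothetical helper: throw if the given error list is not empty.
function throwOnBagErrors(array $errors): void
{
    if (count($errors) > 0) {
        throw new BagValidationException($errors);
    }
}
```

A caller would pass the result of getBagErrors() to throwOnBagErrors() right after opening or validating a bag, so a forgotten check turns into an exception instead of silently passing.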
