scholarslab / bagitphp
A PHP implementation of https://wiki.ucop.edu/display/Curation/BagIt.
Home Page: http://scholarslab.github.com/BagItPHP/
License: Apache License 2.0
Hello, I did some testing with an existing bag-info.txt and used BagItPHP to update the structure. I noticed two unexpected things:
DC-Author: Wayne Graham
DC-Author: Mark Jordan
gets changed to:
dc-author: Mark Jordan
Your opinion?
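The example above shows two effects: the label is lowercased and the repeated label collapses to a single value. A minimal sketch of one way to avoid both is to keep tags as an ordered list of label/value pairs instead of an array keyed by label. This is not BagItPHP's parser; the function name `parseBagInfoPairs` is hypothetical and used only for illustration.

```php
<?php
/**
 * Hypothetical helper (not part of BagItPHP): parse bag-info.txt text into
 * an ordered list of [label, value] pairs, keeping label case and repeats.
 */
function parseBagInfoPairs(string $text): array
{
    $pairs = [];
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        if (trim($line) === '') {
            continue; // skip blank lines
        }
        // A line starting with whitespace continues the previous value.
        if ($pairs && preg_match('/^\s+(.*)$/', $line, $m)) {
            $pairs[count($pairs) - 1][1] .= ' ' . trim($m[1]);
            continue;
        }
        // Split on the first colon only, so values may contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $pairs[] = [trim($parts[0]), trim($parts[1])];
        }
    }
    return $pairs;
}

$pairs = parseBagInfoPairs("DC-Author: Wayne Graham\nDC-Author: Mark Jordan\n");
```

With a pair list, both DC-Author entries survive a read/write round trip with their original capitalization.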
There are getBagInfo(), getBagInfoKeys() and getBagInfoData($key) functions.
But getBagInfo() actually refers to information parsed from bagit.txt, whereas the other two refer to information parsed from the bag-info.txt file.
Seems like getBagInfo() should have a different name, like getBagDeclaration()?
The line
if(count($bag->getErrors()) == 0 {
should be
if (count($bag->getBagErrors()) == 0) {
Wrong method name and also missing closing ')'.
Hi,
I can't figure out how to populate bag-info.txt. I've tried the following, but bag-info.txt is not populated:
$bag = new BagIt($bag_output_path);
$bag->bagInfoData = array(
'First' => 'This is the first tag value',
'Second' => 'This is the second tag value'
);
$bag->addFile('/tmp/test.txt', 'test.txt');
$bag->update();
$bag->package($serialized_bag_path, 'zip');
I've taken a look at the bagit.php code and can see the _createExtendedBag() function, but at the end of the function it looks like an empty file is created by touch() and $this->bagInfoData = array() isn't populated with any values. Any help would be appreciated.
Should the default payload be auto-magically relative to the 'data' directory, or are there instances where this would lead to some consequences?
Now that we're using make + composer, I'm not sure that we need this anymore. We should probably migrate the code quality tasks to the Makefile, though.
Originating from #15
The ZipArchive class doesn't create empty directories when it generates zip files. This means that the /data directory doesn't get created when a bag without any payload files is compressed to the .zip format. Then, when the bag is read, the library kicks out an error when it doesn't find the data directory and the bag doesn't validate.
Fix by checking for the data directory on ingestion and creating it if it doesn't exist.
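The ingestion-side fix described above could be sketched as follows. This is an illustration under assumptions, not the library's code: the function name `ensureDataDir` is hypothetical, and `$bagDir` stands for the directory the bag was extracted into.

```php
<?php
/**
 * Hypothetical helper (not part of BagItPHP): make sure the payload
 * directory exists after a bag has been unpacked.
 * Returns true if data/ exists when the function returns.
 */
function ensureDataDir(string $bagDir): bool
{
    $dataDir = rtrim($bagDir, '/\\') . DIRECTORY_SEPARATOR . 'data';
    if (is_dir($dataDir)) {
        return true; // already there, nothing to do
    }
    // Create data/ (and any missing parents) for an empty-payload bag.
    return mkdir($dataDir, 0777, true);
}
```

On the writing side, PHP's ZipArchive::addEmptyDir() can add an explicit directory entry to the archive, which may be an alternative to repairing the structure on read.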
If you construct a BagIt on an existing empty directory, BagIt calls _openBag (as it should) but does not (re)create the bagit.txt file.
I'm not sure that it should (re)create bagit.txt -- that is why I filed an issue rather than forking BagItPHP and fixing the issue.
fetch.txt is added to the bag even if $fetch is false in the BagIt constructor. I think this file should not be added if it is not used, as the BagIt spec indicates that fetch.txt is optional.
I would be happy to add code to eliminate fetch.txt if $fetch is false and issue a pull request.
I'm new to BagIt so I apologize if this is an obvious question.
If I try to specify sha256 or sha512 I get an error due to this check.
The BagIt specification says you SHOULD use one of these two algorithms but does not disallow others. Is there a reason for forcing a choice of one of these two algorithms?
Alternatively, would it be acceptable to look at extending the code to allow the specification of more than one algorithm and have multiple manifests generated?
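As a rough sketch of what generating several manifests might look like, the code below uses PHP's built-in hash_file() and hash_algos(). It does not use or reflect BagItPHP's internals; the function names `buildManifestLines` and `writeManifests` are hypothetical, and the paths are assumed to be relative to the bag root (e.g. data/file.txt).

```php
<?php
/**
 * Hypothetical helper: build "checksum path" lines for one algorithm.
 */
function buildManifestLines(string $bagDir, array $relativePaths, string $algorithm): array
{
    if (!in_array($algorithm, hash_algos(), true)) {
        throw new InvalidArgumentException("Unsupported algorithm: $algorithm");
    }
    $lines = [];
    foreach ($relativePaths as $path) {
        $hash = hash_file($algorithm, $bagDir . '/' . $path);
        if ($hash === false) {
            throw new RuntimeException("Could not hash $path");
        }
        $lines[] = $hash . ' ' . $path;
    }
    return $lines;
}

/**
 * Hypothetical helper: write one manifest-<algorithm>.txt per algorithm.
 */
function writeManifests(string $bagDir, array $relativePaths, array $algorithms): void
{
    foreach ($algorithms as $algorithm) {
        $lines = buildManifestLines($bagDir, $relativePaths, $algorithm);
        file_put_contents(
            $bagDir . '/manifest-' . $algorithm . '.txt',
            implode("\n", $lines) . "\n"
        );
    }
}
```

Calling writeManifests($bagDir, $paths, ['sha256', 'sha512']) would produce manifest-sha256.txt and manifest-sha512.txt side by side.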
Current logic in the bagit.php constructor means you can create a bag that is not extended but also make it fetch the contents of the fetch file.
To replicate, do
$tmp = "someNewDirectory";
new BagIt($tmp, false, false, true);
This results in the error Error: Call to a member function download() on null
Fetch should only be called if there is a fetch file loaded.
Bags cannot provide a list of info data keys, nor can they clear the info data, unless code accesses the bagInfoData member variable directly.
There should be methods in the BagIt class to access a list of keys and to clear the bag info data.
Using the code in the "Reading a bag" example, with a bag that produces no errors and that has a fetch.txt, I was expecting the files identified in fetch.txt to end up in the same output directory as my payload files.
I can confirm that the remote file is being downloaded to the temporary directory with the rest of the payload files, but it is not being added to the output of getBagContents(). Is the intent that it get added to that array or do fetched files need to be handled separately? If the latter is the case, the example should illustrate how they could be copied like the payload files into the 'final/destination/' directory.
From PHP
Notice: iconv(): Wrong charset, conversion from `UTF-8\r' to `UTF-8' is not allowed at...
Solution
lib/bagit_utils.php
readFileText method, Line 219, add:
// Remove line breaks from $fileEncoding.
$fileEncoding = preg_replace( '/\r|\n/', '', $fileEncoding );
Pull Request #18
If I set
$bag->setHashEncoding("md5");
The manifest is indeed changed to:
manifest-md5.txt
The tagmanifest remains
tagmanifest-sha1.txt
Shouldn't this also be using md5?
It would be nice if there was a setting so all errors would throw some kind of Exception rather than requiring the extra step of calling getBagErrors() (tripped over this issue myself today).
I'm not likely to try and implement this change myself any time soon, as the code I'm working on (as of now) always remembers to call getBagErrors().
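Until such a setting exists, calling code could wrap the check itself. The sketch below is not part of BagItPHP: the exception class `BagValidationException` and the function `throwOnBagErrors` are hypothetical, and it assumes the error list is an array whose entries are either strings or arrays of strings (the exact shape returned by getBagErrors() is an assumption here).

```php
<?php
/** Hypothetical exception type for bag validation failures. */
class BagValidationException extends RuntimeException
{
}

/**
 * Hypothetical helper: throw if the given error list is non-empty.
 * Assumes each entry is a string or an array of strings.
 */
function throwOnBagErrors(array $errors): void
{
    if (count($errors) === 0) {
        return; // nothing to report
    }
    $messages = [];
    foreach ($errors as $error) {
        $messages[] = is_array($error) ? implode(': ', $error) : (string) $error;
    }
    throw new BagValidationException(implode('; ', $messages));
}
```

A caller would then write throwOnBagErrors($bag->getBagErrors()); once after opening or validating a bag, so a forgotten check becomes an exception instead of a silent failure.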
Related to mjordan/islandora_bagger#38
This happens when your installed version of PHP does not have the Zip extension, which BagItPHP assumes is present. It results in an error like:
In bagit_utils.php line 437:
Attempted to load class "ZipArchive" from the global namespace.
Did you forget a "use" statement?
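A caller can test for the extension up front and report a clearer message than the one above. This is a minimal sketch using PHP's built-in extension_loaded() and class_exists(); the function name `zipSupportAvailable` is hypothetical and not part of BagItPHP.

```php
<?php
/**
 * Hypothetical helper: report whether the Zip extension (and with it the
 * ZipArchive class) is available in this PHP installation.
 */
function zipSupportAvailable(): bool
{
    return extension_loaded('zip') && class_exists('ZipArchive');
}

if (!zipSupportAvailable()) {
    // Log a readable hint instead of failing later with a class-not-found error.
    error_log('The PHP Zip extension is not installed; zip packaging is unavailable.');
}
```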