samvera / hydra-derivatives
Derivative generation for Samvera repositories
License: Other
Move code from Hydra::Works full text extraction into hydra-derivatives allowing the use of the #make_derivatives configuration process defined by hydra-derivatives to identify that full text extraction should occur.
See issue samvera/hydra-works#220 in Hydra::Works for more details.
No such file or directory @ unlink_internal - /tmp/20516b3c-9efa-4b2e-b471-33fbdd822da7-120150827-22031-3paka9.pdf
/opt/goldenseal/shared/bundle/ruby/2.2.0/gems/hydra-derivatives-2.0.0/lib/hydra/derivatives/document.rb:20:in `unlink'
/opt/goldenseal/shared/bundle/ruby/2.2.0/gems/hydra-derivatives-2.0.0/lib/hydra/derivatives/document.rb:20:in `block in encode_file'
/opt/goldenseal/shared/bundle/ruby/2.2.0/gems/hydra-derivatives-2.0.0/lib/hydra/derivatives/services/tempfile_service.rb:34:in `block in default_tempfile'
/usr/local/lib/ruby/2.2.0/tempfile.rb:319:in `open'
/opt/goldenseal/shared/bundle/ruby/2.2.0/gems/hydra-derivatives-2.0.0/lib/hydra/derivatives/services/tempfile_service.rb:25:in `default_tempfile'
/opt/goldenseal/shared/bundle/ruby/2.2.0/gems/hydra-derivatives-2.0.0/lib/hydra/derivatives/services/tempfile_service.rb:20:in `tempfile'
/opt/goldenseal/shared/bundle/ruby/2.2.0/gems/hydra-derivatives-2.0.0/lib/hydra/derivatives/services/tempfile_service.rb:7:in `create'
/opt/goldenseal/shared/bundle/ruby/2.2.0/gems/hydra-derivatives-2.0.0/lib/hydra/derivatives/document.rb:14:in `encode_file'
/opt/goldenseal/shared/bundle/ruby/2.2.0/gems/hydra-derivatives-2.0.0/lib/hydra/derivatives/shell_based_processor.rb:22:in `block in process'
/opt/goldenseal/shared/bundle/ruby/2.2.0/gems/hydra-derivatives-2.0.0/lib/hydra/derivatives/shell_based_processor.rb:17:in `each'
/opt/goldenseal/shared/bundle/ruby/2.2.0/gems/hydra-derivatives-2.0.0/lib/hydra/derivatives/shell_based_processor.rb:17:in `process'
/opt/goldenseal/shared/bundle/ruby/2.2.0/gems/hydra-derivatives-2.0.0/lib/hydra/derivatives.rb:104:in `transform_file'
/opt/goldenseal/shared/bundle/ruby/2.2.0/bundler/gems/hydra-works-dc0bfd08eb24/lib/hydra/works/models/concerns/generic_file/derivatives.rb:17:in `block (2 levels) in <module:Derivatives>'
/opt/goldenseal/shared/bundle/ruby/2.2.0/gems/hydra-derivatives-2.0.0/lib/hydra/derivatives.rb:66:in `call'
/opt/goldenseal/shared/bundle/ruby/2.2.0/gems/hydra-derivatives-2.0.0/lib/hydra/derivatives.rb:66:in `block in create_derivatives'
/opt/goldenseal/shared/bundle/ruby/2.2.0/gems/hydra-derivatives-2.0.0/lib/hydra/derivatives.rb:64:in `each'
/opt/goldenseal/shared/bundle/ruby/2.2.0/gems/hydra-derivatives-2.0.0/lib/hydra/derivatives.rb:64:in `create_derivatives'
/opt/goldenseal/shared/bundle/ruby/2.2.0/bundler/gems/curation_concerns-9b49d4d66ed5/curation_concerns-models/app/jobs/create_derivatives_job.rb:10:in `run'
/opt/goldenseal/shared/bundle/ruby/2.2.0/bundler/gems/curation_concerns-9b49d4d66ed5/curation_concerns-models/lib/curation_concerns/models/resque.rb:32:in `perform'
There are a few instances of "its", and potentially other things that went away with RSpec 3.
Removed from curation_concerns in samvera-deprecated/curation_concerns#184
When running Hydra::Derivatives::DocumentDerivatives with a file name that includes parentheses, soffice exits with an error.
Unable to execute command "soffice --invisible --headless --convert-to pdf --outdir /tmp /tmp/file(1).pptx".
Exit code: pid 18230 exit 2 Error message: sh: 1: Syntax error: "(" unexpected
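The error above is the shell parsing the parentheses in the file name. A minimal sketch of one way to avoid it with Ruby's standard library, assuming the command line is assembled as a string; `soffice_command` is a hypothetical helper, not part of hydra-derivatives:

```ruby
require "shellwords"

# Hypothetical helper: builds the soffice command as one shell-safe string
# by escaping the parts that vary from file to file.
def soffice_command(source_path, out_dir, format = "pdf")
  [
    "soffice", "--invisible", "--headless",
    "--convert-to", format,
    "--outdir", Shellwords.escape(out_dir),
    Shellwords.escape(source_path)
  ].join(" ")
end

puts soffice_command("/tmp/file(1).pptx", "/tmp")
```

An alternative with the same effect is to pass the arguments as separate strings to system or Open3, which skips the shell entirely.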
Currently the instructions specify 0.8.x (0.8.5 is known to be good).
Is there any reason why destination_name needs to be a concatenation of source_file and derivative name when it gets set here?
I think it makes sense to have it just be the derivative name. The way this works now I have to strip out the source name in Hydra::Works.
Remove mimeType references, pid references, and any other issues.
WARN: Unable to find a registered mime type for nil on http://127.0.0.1:8080/fedora/rest/prod/0r/96/73/73/0r967373m
Which is logged here:
https://github.com/projecthydra/hydra-derivatives/blob/master/lib/hydra/derivatives/extract_metadata.rb#L19
In curation_concerns we're delegating mime_type to the output of characterization here:
https://github.com/projecthydra-labs/curation_concerns/blob/6cd47b5173305e37ba7563850c3deaea09b60f45/curation_concerns-models/app/models/concerns/curation_concerns/generic_file/characterization.rb#L7
For the default processors: image, audio, video, document
Pass the name of the directive responsible for the derivative (e.g. thumbnail) as the original_name on the derivative file that gets emitted to output_file_service.call. This can be a convenience to downstream uses. The original_name is currently set to "derivative" and was previously nil.
The original_name attribute of the file passed to output_file_service.call should be the name of the directive that caused the file to be generated. For example:
transform_file :original_file, thumbnail: { format: 'jpg', size: '338x493' }
should eventually pass a file to the output_file_service that has the attribute original_name set as "thumbnail".
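A rough sketch of the requested behavior, assuming the directives arrive as a hash keyed by directive name. `DerivativeFile` and `build_outputs` are hypothetical stand-ins, not the gem's classes:

```ruby
# Hypothetical stand-in for the file object handed to output_file_service.call.
DerivativeFile = Struct.new(:content, :original_name)

# Hypothetical: walk the directives hash and tag each output with the name
# of the directive that produced it.
def build_outputs(directives)
  directives.map do |name, _options|
    DerivativeFile.new("", name.to_s)
  end
end

outputs = build_outputs(thumbnail: { format: "jpg", size: "338x493" })
puts outputs.first.original_name
```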
Related to #70.
Some TIFF files contain metadata that causes ImageMagick to exit with non-zero status. This causes the derivative job to fail.
This is my output from Sufia:
`mogrify -resize 200x150> /tmp/mini_magick20160829-32286-19a488t.tiff` failed with error: mogrify: /tmp/mini_magick20160829-32286-19a488t.tiff: unknown field with tag 37724 (0x935c) encountered. `TIFFReadDirectory' @ warning/tiff.c/TIFFWarnings/715. mogrify: /tmp/mini_magick20160829-32286-19a488t.tiff: Unknown tag 37724. `TIFFSetField' @ error/tiff.c/TIFFErrors/499.
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/mini_magick-4.5.1/lib/mini_magick/shell.rb:18:in `run'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/mini_magick-4.5.1/lib/mini_magick/tool.rb:92:in `call'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/mini_magick-4.5.1/lib/mini_magick/tool.rb:40:in `new'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/mini_magick-4.5.1/lib/mini_magick/image.rb:504:in `mogrify'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/mini_magick-4.5.1/lib/mini_magick/image.rb:395:in `method_missing'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/hydra-derivatives-3.1.1/lib/hydra/derivatives/processors/image.rb:32:in `block in create_resized_image'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/hydra-derivatives-3.1.1/lib/hydra/derivatives/processors/image.rb:38:in `create_image'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/hydra-derivatives-3.1.1/lib/hydra/derivatives/processors/image.rb:31:in `create_resized_image'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/hydra-derivatives-3.1.1/lib/hydra/derivatives/processors/image.rb:25:in `process_without_timeout'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/hydra-derivatives-3.1.1/lib/hydra/derivatives/processors/image.rb:8:in `process'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/hydra-derivatives-3.1.1/lib/hydra/derivatives/runners/runner.rb:32:in `block (2 levels) in create'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/hydra-derivatives-3.1.1/lib/hydra/derivatives/runners/runner.rb:29:in `each'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/hydra-derivatives-3.1.1/lib/hydra/derivatives/runners/runner.rb:29:in `block in create'
/data/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/curation_concerns-1.2.0/app/services/curation_concerns/local_file_service.rb:7:in `call'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/hydra-derivatives-3.1.1/lib/hydra/derivatives/runners/runner.rb:43:in `source_file'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/hydra-derivatives-3.1.1/lib/hydra/derivatives/runners/runner.rb:28:in `create'
/data/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/curation_concerns-1.2.0/app/models/concerns/curation_concerns/file_set/derivatives.rb:40:in `create_derivatives'
/data/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/curation_concerns-1.2.0/app/jobs/create_derivatives_job.rb:10:in `perform'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activejob-4.2.7/lib/active_job/execution.rb:32:in `block in perform_now'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:117:in `call'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:555:in `block (2 levels) in compile'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:505:in `call'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:498:in `block (2 levels) in around'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:343:in `block (2 levels) in simple'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/i18n-0.7.0/lib/i18n.rb:257:in `with_locale'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activejob-4.2.7/lib/active_job/translation.rb:7:in `block (2 levels) in <module:Translation>'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:441:in `instance_exec'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:441:in `block in make_lambda'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:342:in `block in simple'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:497:in `block in around'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:505:in `call'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:498:in `block (2 levels) in around'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:343:in `block (2 levels) in simple'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activejob-4.2.7/lib/active_job/logging.rb:23:in `block (4 levels) in <module:Logging>'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/notifications.rb:164:in `block in instrument'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/notifications.rb:164:in `instrument'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activejob-4.2.7/lib/active_job/logging.rb:22:in `block (3 levels) in <module:Logging>'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activejob-4.2.7/lib/active_job/logging.rb:43:in `block in tag_logger'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/tagged_logging.rb:68:in `block in tagged'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/tagged_logging.rb:26:in `tagged'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/tagged_logging.rb:68:in `tagged'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activejob-4.2.7/lib/active_job/logging.rb:43:in `tag_logger'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activejob-4.2.7/lib/active_job/logging.rb:19:in `block (2 levels) in <module:Logging>'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:441:in `instance_exec'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:441:in `block in make_lambda'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:342:in `block in simple'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:497:in `block in around'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:505:in `call'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:92:in `__run_callbacks__'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:778:in `_run_perform_callbacks'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activesupport-4.2.7/lib/active_support/callbacks.rb:81:in `run_callbacks'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activejob-4.2.7/lib/active_job/execution.rb:31:in `perform_now'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activejob-4.2.7/lib/active_job/execution.rb:21:in `execute'
/usr/local/hydra/lakeshore/shared/bundle/ruby/2.3.0/gems/activejob-4.2.7/lib/active_job/queue_adapters/resque_adapter.rb:46:in `perform'
The mogrify command actually creates the correct derivative.
As per the ImageMagick documentation [1], missing tags like the ones in the example above produce a status code between 300 and 399. Codes below 400 are considered warnings that still produce a valid image. Therefore, the function calling the ImageMagick wrapper should not raise an exception if the return code is less than 400.
It is costly to pull down files and write them to disk unnecessarily. For sufficiently large files, this will break the ingest/derivative pipeline. This is made worse by attempts at job parallelization, where each job (potentially serviced on a different worker box) incurs this cost. But it is possible to avoid this problem.
Even though we are forking to shell for many of the non-ruby derivative processors, we should avoid forcing the input (and ideally output) to be literal filesystem files, when there is no such legitimate need:
This also allows optimizations for processors that don't use the bulk of a large file (e.g., only the metadata and first 2 minutes of, say, a 6 hour video). They can read until satisfied and then reset/close the IO. Most of the GBs are never pulled down, never put in memory, and never written to disk.
With a cloud-based platform like Hyku, it is very conceivable that this derivatives code is the tightest bottleneck in supporting large files.
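The "read until satisfied" idea can be sketched against any IO-like object. This is only an illustration under stated assumptions: `read_prefix` is a hypothetical helper, and a StringIO stands in for a remote stream.

```ruby
require "stringio"

# Hypothetical helper: read at most `limit` bytes from an IO-like object in
# fixed-size chunks, then stop without consuming the rest of the stream.
def read_prefix(io, limit, chunk_size = 16 * 1024)
  buffer = String.new
  while buffer.bytesize < limit
    chunk = io.read([chunk_size, limit - buffer.bytesize].min)
    break if chunk.nil? # end of stream reached before the limit
    buffer << chunk
  end
  buffer
end

# A large source of which only the first few bytes are needed.
source = StringIO.new("HEADER" + ("x" * 1_000_000))
header = read_prefix(source, 6)
puts header
puts source.pos
```

Only the requested prefix is read; the remaining bytes are never pulled into memory or written to disk.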
Converting documents to pdf using libreoffice produces this error:
No such file or directory @ unlink_internal - /var/folders/yb/pbl7568x7073w02tyghtk3140000gp/T/thumbnail.pdf
LibreOffice converts the document to PDF successfully, but names the output after the source file. So, if your source file is example.odt, the resulting file is named example.pdf. The encoding methods are expecting a file named thumbnail.pdf. See:
Returns this:
Unable to execute command "kdu_compress -i /tmp/sufia20151020-74582-10wnpfx.tif -o /tmp/sufia20151020-74582-dpm4u5.jp2 -rate 2.4,1.48331273,0.91675694,0.56659885,0.3501847,0.21643059,0.13376427,0.0826726 -jp2_space sRGB -double_buffering 10 -num_threads 4 -no_weights Clevels=1 "Stiles={1024,1024}" "Cblk={64,64}" Cuse_sop=yes Cuse_eph=yes Corder=RPCL ORGgen_plt=yes ORGtparts=R". Exit code: pid 74792 SIGPIPE (signal 13)
The command runs fine on its own, I assume this is a problem with the new code around popen3?
Many partners use Kakadu for making JPEG2000 service files and have established profiles or "recipes" for doing so. As these recipes are complex and varied, the recipe and scenario should be configurable in a few different ways: via a config file, by passing a string, or else letting the application make a best guess.
This is expensive, potentially out of scope, requires parsing XML...all just to find out if the image is color or grayscale.
This
quality = image['%[channels]'] == 'gray' ? 'gray' : 'color'
is all we need.
See hydra-tech message from 9/4/2015:
I'm having the same problem. It seems like their fix could be pushed upstream.
Happy to do this and submit a PR unless there's a reason not to.
When processing office-categorized documents, if the conversion to PDF fails you get a strange error message, No such file or directory, instead of some indication that the PDF conversion failed.
This also shows up as bad URI(is not URI?) when the original filename contains a space or other non-URI character.
Here is where the conversion happens that is returning no output:
https://github.com/projecthydra/hydra-derivatives/blob/master/lib/hydra/derivatives/processors/document.rb#L38
It would be better to capture the error and or verify the file exists before passing on to the ImageProcessor: https://github.com/projecthydra/hydra-derivatives/blob/master/lib/hydra/derivatives/processors/document.rb#L24
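A minimal sketch of the "verify the file exists" suggestion, assuming (as reported in an earlier issue on this page) that soffice names its output after the source file's base name. `ConversionError`, `expected_converted_path`, and `verify_conversion!` are hypothetical names, not part of the gem:

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical error class for a conversion that produced no output.
class ConversionError < StandardError; end

# Where the converted file should appear: <out_dir>/<source base name>.<format>
def expected_converted_path(source_path, out_dir, format)
  File.join(out_dir, "#{File.basename(source_path, '.*')}.#{format}")
end

# Return the converted path, or raise a descriptive error instead of letting a
# later step fail with "No such file or directory".
def verify_conversion!(source_path, out_dir, format)
  path = expected_converted_path(source_path, out_dir, format)
  return path if File.exist?(path)

  raise ConversionError,
        "converting #{File.basename(source_path)} to #{format} produced no file at #{path}"
end

Dir.mktmpdir do |dir|
  begin
    verify_conversion!("/data/example.odt", dir, "pdf")
  rescue ConversionError => e
    puts e.message
  end
  FileUtils.touch(File.join(dir, "example.pdf")) # simulate a successful conversion
  puts verify_conversion!("/data/example.odt", dir, "pdf")
end
```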
When I point my Gemfile to the master branch of projecthydra/hydra-derivatives, I get the following error starting rails console: https://gist.github.com/coblej/9209242 .
@jeremyf The error seems to be coming from https://github.com/projecthydra/hydra-derivatives/blob/master/lib/hydra/derivatives/railtie.rb#L3 . I think you made the commit that added this. Any insights on why it might be causing a problem? Should it be just "initializer" rather than "config.initializer" for Rails 4?
We don't really want any of this flattening business. Best way to make it optional? @awead
When using Processors::Image to create thumbnails from PDF files, the resulting image is completely black. This is a common issue in Imagemagick and there are available fixes:
Additional options need to be passed to the convert command if pdf is the source file.
To reproduce, execute convert -resize '200x150>' test.pdf test.jpg on the attached file.
Passing any one of the following options to the convert command fixes the issue:
-flatten
-alpha flatten
-alpha remove
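As a sketch of passing the extra option only for PDF sources, the argument list could be assembled like this. `convert_args` is a hypothetical helper rather than the processor's real code, it only builds the list and runs nothing, and -flatten is just one of the three options listed above. Placing the option after the input file is an assumption about argument order:

```ruby
# Hypothetical helper: build the argv for ImageMagick's convert, adding
# -flatten only when the source file has a .pdf extension.
def convert_args(source, destination, size)
  args = ["convert", source]
  args << "-flatten" if File.extname(source).downcase == ".pdf"
  args + ["-resize", size, destination]
end

p convert_args("test.pdf", "test.jpg", "200x150>")
p convert_args("test.png", "test.jpg", "200x150>")
```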
I think there is a non-backwards-compatible change introduced between 3.1.3 and 3.1.4. Starting with 3.1.4, I see this error when trying to generate a JP2:
Failure/Error:
  Hydra::Derivatives::Jpeg2kImageDerivatives.create(
    filename,
    outputs: [
      label: 'intermediate_file',
      service: {
        datastream: 'intermediate_file',
        recipe: :default
      },
      url: derivative_url('intermediate_file')
    ]
  )
NoMethodError:
  protected method `long_dim' called for Hydra::Derivatives::Processors::Jpeg2kImage:Class
Steps to reproduce the problem:
irb -I./lib
2.3.3 :001 > require 'hydra/derivatives'
=> true
2.3.3 :002 > Hydra::Derivatives.enable_ffmpeg
=> true
2.3.3 :003 > Hydra::Derivatives.enable_ffmpeg = false
=> false
2.3.3 :004 > Hydra::Derivatives.enable_ffmpeg
=> true
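One common way to end up with the behavior in this transcript is a getter that applies its default with `||=`. Whether the gem's getter is actually written this way is an assumption; the modules below are a hypothetical reconstruction, not the gem's source:

```ruby
# Hypothetical reconstruction of the symptom.
module BuggyConfig
  class << self
    attr_writer :enable_ffmpeg

    # `||=` treats a stored false like "unset", so false is replaced by true.
    def enable_ffmpeg
      @enable_ffmpeg ||= true
    end
  end
end

module FixedConfig
  class << self
    attr_writer :enable_ffmpeg

    # Fall back to the default only when nothing has been assigned.
    def enable_ffmpeg
      @enable_ffmpeg = true if @enable_ffmpeg.nil?
      @enable_ffmpeg
    end
  end
end

BuggyConfig.enable_ffmpeg = false
FixedConfig.enable_ffmpeg = false
puts BuggyConfig.enable_ffmpeg # => true
puts FixedConfig.enable_ffmpeg # => false
```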
enable_ffmpeg always returns true, even when you set it to false.
It looks like hydra-derivatives now expects FITS 0.8.x output or something similar. Can someone confirm which versions of FITS are compatible with the current release (3.0.1) of hydra-derivatives and update the README?
The current README says version 0.6.2 is required here: https://github.com/projecthydra/hydra-derivatives#dependencies
If I want to create my own processor class, I have to create an instance of Hydra::Derivatives::MyProcessor. It would be easier to create:
class MyProcessor < Hydra::Derivatives::Image
end
and pass that class in the makes_derivatives block. From what I can tell, I can't do that.
Turn on coveralls for hydra derivatives
audio_derivatives_spec.rb is located under spec/services and that doesn't seem like an appropriate place for it. It should maybe move into units/transcoding_spec.rb?
Libreoffice returns zero status when it errors:
awead@pooh T $ soffice --invisible --headless --convert-to doc --outdir /. non-existent-file.txt
Error: source file could not be loaded
awead@pooh T $ echo $?
0
Output is directed to STDERR, so maybe we should parse that and raise something?
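A sketch of the "parse STDERR and raise" idea using Open3 from the standard library. It assumes failures always show up as a line beginning with "Error:", as in the single example above. `run_checked` and `ConversionFailed` are hypothetical names, and a small Ruby one-liner stands in for soffice so the sketch runs anywhere:

```ruby
require "open3"
require "rbconfig"

# Hypothetical error class for a command that reports failure only on STDERR.
class ConversionFailed < StandardError; end

# Run argv without a shell; raise when the exit status is non-zero OR when
# STDERR contains a line starting with "Error:" (assumed failure marker).
def run_checked(*argv)
  stdout, stderr, status = Open3.capture3(*argv)
  if !status.success? || stderr.match?(/^Error:/)
    raise ConversionFailed, "command failed (exit #{status.exitstatus}): #{stderr.strip}"
  end
  stdout
end

# Stand-in for soffice: prints an error to STDERR but exits with status 0.
fake_soffice = [RbConfig.ruby, "-e", "warn 'Error: source file could not be loaded'"]

begin
  run_checked(*fake_soffice)
rescue ConversionFailed => e
  puts e.message
end
```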
The output file service is configured to pass around mime_type as an option and use that when attaching files. Rather than passing around the mime_type option, the file object should respond to .mime_type. Also the file sent to the output file service should respond to .read. This is blocking samvera/hydra-works#183
Anything that calls .output_file_service (as well as uses it such as the PersistBasicContainedOutputFile service) should be modified to remove the mime_type option and to make sure the file passed around responds to .mime_type.
These are mainly found in the derivative processors:
Dir::Tmpname.create(['sufia', ".#{file_suffix}"], Hydra::Derivatives.temp_file_base){}
at line 40 in hydra-derivatives/lib/hydra/derivatives/processors/shell_based_processor.rb
Dir::Tmpname.create(['sufia', ext], Hydra::Derivatives.temp_file_base){}
at line 62 in hydra-derivatives/lib/hydra/derivatives/processors/jpeg2k_image.rb
> file = File.open('/etc/passwd')
=> #<File:/etc/passwd>
> io1 = Hydra::Derivatives::IoDecorator.new(file, 'image/png', "foobar.png")
=> #<File:/etc/passwd>
> io2 = Hydra::Derivatives::IoDecorator.new(file, 'image/png', "foobar.png")
=> #<File:/etc/passwd>
> file == file
=> true
> io1.original_name == io2.original_name
=> true
> io1.mime_type == io2.mime_type
=> true
> io1.__getobj__ == io2.__getobj__
=> true
> io1 == io2
=> false
Final line should be true. Literally all aspects of the two objects are identical, including the internal SimpleDelegator delegate object! Failing to provide adequate comparison makes the class untestable.
Indeed, in rspec testing, I get unhelpful failures like:
expected: (#<Hydra::Derivatives::IoDecorator(#<File:/Users/atz/repos/hyrax/spec/fixtures/world.png>)>)
got: (#<Hydra::Derivatives::IoDecorator(#<File:/Users/atz/repos/hyrax/spec/fixtures/world.png>)>)
Diff:
Literally, "I got exactly what I expected, there is no discernable difference, but I failed anyway."
Note that this seems to be a limitation of SimpleDelegator with IO objects that is not part of the objects themselves:
> file == file
=> true
> SimpleDelegator.new(1) == SimpleDelegator.new(1)
=> true
> SimpleDelegator.new(file) == SimpleDelegator.new(file)
=> false
That makes it a particularly questionable choice for Hydra::Derivatives::IoDecorator, since all it does is wrap IO.
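One possible shape for the missing comparison is sketched below. `ComparableIoDecorator` is a hypothetical class that mirrors the three constructor arguments used in the transcript above; it is not the gem's implementation. It treats two decorators as equal when they wrap the same underlying object and carry the same mime_type and original_name:

```ruby
require "delegate"
require "stringio"

# Hypothetical decorator with an explicit equality definition.
class ComparableIoDecorator < SimpleDelegator
  attr_accessor :mime_type, :original_name

  def initialize(file, mime_type = nil, original_name = nil)
    super(file)
    @mime_type = mime_type
    @original_name = original_name
  end

  # Equal when both wrap the very same IO object and the metadata matches.
  def ==(other)
    return true if equal?(other)
    return false unless other.is_a?(::ComparableIoDecorator)

    __getobj__.equal?(other.__getobj__) &&
      mime_type == other.mime_type &&
      original_name == other.original_name
  end
end

file = StringIO.new("png bytes")
io1 = ComparableIoDecorator.new(file, "image/png", "foobar.png")
io2 = ComparableIoDecorator.new(file, "image/png", "foobar.png")
puts io1 == io2 # => true
```

Comparing the wrapped objects by identity is a deliberate simplification here; comparing stream contents would require reading both streams.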
As the recipe is likely being read in from a YAML file, making the key a String makes more sense.
I was running derivative creation on the ~75k objects in our repository. However, suddenly all of my derivatives started erroring and the AVI Processing server reported it had no space left on its hard drive. Looking at /tmp/, there are tons of files like mini_magick20140730-1251-4a7xzf.jpg that ate up all of the available space (~60 GB of them).
I recalled that when using MiniMagick in the past, I had relied on an unmerged pull request from someone else for this exact issue (though I stuck with RMagick in the end due to other image processing bugs in MiniMagick). Regardless, the relevant unmerged pull request is: minimagick/minimagick#188
Once the current backlog of ~2k remaining failed objects finishes, I'll attempt to use that old pull request to confirm it fixes the issue as it did in the past. Assuming it does, I'm not sure whether the best course of action is to pressure MiniMagick to accept it and fix their bug, or to just use that unmerged version of the code.
To duplicate the jpg spam, just create a thumbnail derivative from a tif and check your /tmp/ directory. These files will get deleted upon a restart (by default) but MiniMagick leaving them behind when done with them just doesn't seem like correct behavior.
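For comparison, Ruby's own Tempfile.create removes the file when its block finishes, which is the cleanup behavior one would expect here. The sketch below only illustrates that stdlib behavior and is not MiniMagick's code; `with_scratch_file` is a hypothetical wrapper:

```ruby
require "tempfile"

# Hypothetical wrapper: yield a scratch file that is closed and deleted as
# soon as the block finishes, even if the block raises.
def with_scratch_file(suffix)
  Tempfile.create(["derivative", suffix]) do |file|
    yield file
  end
end

scratch_path = nil
with_scratch_file(".jpg") do |file|
  file.write("fake image bytes")
  file.flush
  scratch_path = file.path
  puts File.exist?(scratch_path) # => true while the block is running
end
puts File.exist?(scratch_path)   # => false once the block has returned
```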
I'm seeing a few failures running the "attached RAW image" specs at https://github.com/projecthydra/hydra-derivatives/blob/master/spec/units/transcoding_spec.rb#L104. I have a fix ready, PR incoming.
Steps to reproduce:
run bundle exec rspec spec/units/transcoding_spec.rb:104
The deprecated method transform_datastream is passing an empty options hash to transform_file. This means that only image derivatives get created as the default processor.
def transform_datastream(file_name, transform_directives, opts={})
transform_file(file_name, transform_directives, opts={})
end
deprecation_deprecate :transform_datastream
It should just pass opts.
This is shown in PR #59
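The effect of writing `opts={}` at the call site can be reproduced in isolation. In this sketch `transform_file` is a stub that simply returns the options it receives, and `processor: :video` is only an illustrative option value:

```ruby
# Stub standing in for the real method: returns the options it was given.
def transform_file(file_name, transform_directives, opts = {})
  opts
end

# As quoted above: `opts={}` in the call is an assignment expression, so the
# caller's options are overwritten with an empty hash before being passed on.
def transform_datastream_buggy(file_name, transform_directives, opts = {})
  transform_file(file_name, transform_directives, opts = {})
end

# Proposed fix: pass the received options through unchanged.
def transform_datastream_fixed(file_name, transform_directives, opts = {})
  transform_file(file_name, transform_directives, opts)
end

directives = { mp4: { format: "mp4" } }
p transform_datastream_buggy(:content, directives, processor: :video) # => {}
p transform_datastream_fixed(:content, directives, processor: :video) # => {:processor=>:video} (Ruby 3.4+ prints {processor: :video})
```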
hydra-derivatives currently uses libfdk_aac for encoding audio. If you just install ffmpeg from a Linux distro's package repos, it's not going to have it and you'll need to compile ffmpeg to support it.
All of the processors wrap files/IO streams in an IoDecorator class, which responds to .original_name. It isn't set in the processors, so currently it is up to the output_file_service to determine its value. The PersistBasicContainedOutputFileService uses the determine_original_name method to set it with a value. However, since all of the IoDecorator objects respond to .original_name and that value is nil, the function returns nil.
I'm not really sure how original_name is used outside of Derivatives, but maybe we should have it set in the processors to ensure it returns a non-nil value.
Make sure runners have tests and are organized appropriately in spec/runners
Under load, it is terribly inefficient to fork to system to spin up a new JVM for each FITS call. It is bad enough to crash several increasingly large AWS VMs that we have been testing out. It imposes a huge scaling cost on attempting to get parallelism.
A more appropriate architecture would put the edu.harvard.hul.ois.fits.Fits class in memory once and service multiple requests from the same JVM, exactly what https://github.com/harvard-lts/FITSservlet does. Hydra::Derivatives should support (and likely recommend) the servlet architecture.
Requires #124