jhove's Introduction

JHOVE

JSTOR/Harvard Object Validation Environment

Licensing

Copyright 2003-2012 by JSTOR and the President and Fellows of Harvard College, 2015-2022 by the Open Preservation Foundation. JHOVE is made available under the GNU Lesser General Public License (LGPL).

Rev. 1.26.1, 2022-07-14

JHOVE Homepage

http://jhove.openpreservation.org/

Overview

JHOVE (the JSTOR/Harvard Object Validation Environment, pronounced "jove") is an extensible software framework for performing format identification, validation, and characterization of digital objects.

  • Format identification is the process of determining the format to which a digital object conforms: "I have a digital object; what format is it?"
  • Format validation is the process of determining the level of compliance of a digital object to the specification for its purported format: "I have an object purportedly of format F; is it?"
  • Format characterization is the process of determining the format-specific significant properties of an object of a given format: "I have an object of format F; what are its salient properties?"

These actions are frequently necessary during routine operation of digital repositories and for digital preservation activities.

The output from JHOVE is controlled by output handlers. JHOVE uses an extensible plug-in architecture; it can be configured at the time of its invocation to include whichever format modules and output handlers are desired. The initial release of JHOVE includes modules for arbitrary byte streams, ASCII and UTF-8 encoded text, AIFF and WAVE audio, GIF, JPEG, JPEG 2000, TIFF, and PDF; and text and XML output handlers.

The JHOVE project is a collaboration of JSTOR and the Harvard University Library. Development of JHOVE was funded in part by the Andrew W. Mellon Foundation. JHOVE is made available under the GNU Lesser General Public License (LGPL; see the file LICENSE for details).

JHOVE is currently being maintained by the Open Preservation Foundation.

Pre-requisites

  1. Java JRE 1.8
    Version 1.20 of JHOVE is built and tested against Oracle JDK 8, and OpenJDK 8 on Travis. Releases are built using Oracle JDK 8 from the OPF's Jenkins server.

  2. If you would like to build JHOVE from source, then life will be easiest if you use Apache Maven.

Getting JHOVE

For Users: JHOVE Cross Platform Installer

You can download the latest version of JHOVE here.

For Developers: JHOVE JARs via Maven

From v1.16 onwards all production releases of JHOVE are deployed to Maven Central. Add the version of JHOVE you'd like to use as a property in your Maven POM:

<properties>
  ...
  <jhove.version>1.20.1</jhove.version>
</properties>

Use this dependency for the core classes Maven module (e.g. JhoveBase, Module, ModuleBase, etc.):

<dependency>
  <groupId>org.openpreservation.jhove</groupId>
  <artifactId>jhove-core</artifactId>
  <version>${jhove.version}</version>
</dependency>

this for the JHOVE internal module implementations:

<dependency>
  <groupId>org.openpreservation.jhove</groupId>
  <artifactId>jhove-modules</artifactId>
  <version>${jhove.version}</version>
</dependency>

this for the JHOVE external module implementations:

<dependency>
  <groupId>org.openpreservation.jhove</groupId>
  <artifactId>jhove-ext-modules</artifactId>
  <version>${jhove.version}</version>
</dependency>

and this for the JHOVE applications:

<dependency>
  <groupId>org.openpreservation.jhove</groupId>
  <artifactId>jhove-apps</artifactId>
  <version>${jhove.version}</version>
</dependency>

If you want the latest development packages you'll need to add the Open Preservation Foundation's Maven repository to your settings file:

  <profiles>
    <profile>
      <id>opf-artifactory</id>
      <repositories>
        <repository>
          <snapshots>
            <enabled>false</enabled>
          </snapshots>
          <id>central</id>
          <name>opf-dev</name>
          <url>http://artifactory.openpreservation.org/artifactory/opf-dev</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>opf-artifactory</activeProfile>
  </activeProfiles>

You can then follow the instructions above to include particular Maven modules, but you can now also choose odd minor versioned development builds. At the time of writing the latest development version could be included by using the following property:

<properties>
  ...
  <jhove.version>1.21.1</jhove.version>
</properties>

or even:

<properties>
  ...
  <jhove.version>[1.21.0,1.22.0]</jhove.version>
</properties>

to always use the latest 1.21 build.

For Developers: Building JHOVE from Source

Clone this project, check out the integration branch, and build with Maven, e.g.:

git clone git@github.com:openpreserve/jhove.git
cd jhove
git checkout integration
mvn clean install

See the Project Structure section for a guide to the Maven artifacts produced by the build.

Installation

Application Installation

Download the JHOVE installer. The installer itself requires Java 1.6 or later to be pre-installed. Installation is OS-dependent:

Windows

Currently only tested on Windows 7.

Simply double-click the downloaded installer JAR. If Java is installed then the windowed installer will guide you through selection. It's best to stay with the default choices if installing the beta.

Once the installation has finished you'll be able to double-click C:\Users\yourName\jhove\jhove-gui to start the JHOVE GUI. Alternatively, open a Command window, e.g. press the Windows key and type cmd, then issue these commands:

C:\Users\yourName>cd jhove
C:\Users\yourName\jhove>jhove

to display the command-line usage message.

It is also possible to use JHOVE with OpenJDK, e.g. JDK 13. It might be necessary to set the Java path in the environment variables, which usually requires administrator rights on the Windows machine.

Mac OS

Currently only tested on OS X Mavericks.

Simply double-click the downloaded installer JAR. If Java is installed then the windowed installer will guide you through selection. It's best to stay with the default choices if installing the beta.

Once the installation has finished you'll be able to double-click /Users/yourName/jhove/jhove-gui to start the JHOVE GUI. Alternatively, open a Terminal command window and then issue these commands:

cd ~/jhove
./jhove

to display the command-line usage message.

Linux

Currently tested on Ubuntu 16.10 and Debian Jessie.

Once the installer has downloaded, start a terminal, e.g. Ctrl+Alt+T, and type the following, assuming the download is in ~/Downloads:

java -jar ~/Downloads/jhove-latest.jar

Once the installation is finished you'll be able to:

cd ~/jhove
./jhove

to run the command-line application and show the usage message. Alternatively:

cd ~/jhove
./jhove-gui

will run the GUI application.

Distribution

We've moved to Maven and have taken the opportunity to update the distribution. For now we're producing:

  • a Maven package, for developers wishing to incorporate JHOVE into their own software;
  • a "fat" (1MB) JAR that contains the old CLI and desktop GUI, for anyone who doesn't want to use the new installer; and
  • a simple cross-platform installer that installs the application JAR, support scripts, etc.

Usage

jhove [-c config] [-m module] [-h handler] [-e encoding] [-H handler]
           [-o output] [-x saxclass] [-t tempdir] [-b bufsize]
           [-l loglevel] [[-krs] dir-file-or-uri [...]]

-c config   Configuration file pathname
-m module   Module name
-h handler  Output handler name (defaults to TEXT)
-e encoding Character encoding used by output handler (defaults to UTF-8)
-H handler  About handler name
-o output   Output file pathname (defaults to standard output)
-x saxclass SAX parser class (defaults to J2SE default)
-t tempdir  Temporary directory in which to create temporary files
-b bufsize  Buffer size for buffered I/O (defaults to J2SE 1.4 default)
-l loglevel Logging level
-k          Calculate CRC32, MD5, and SHA-1 checksums
-r          Display raw data flags, not textual equivalents
-s          Format identification based on internal signatures only
dir-file-or-uri Directory or file pathname or URI of formatted content
                stream

All named modules and output handlers must be found on the Java CLASSPATH at the time of invocation. The JHOVE driver script, jhove/jhove, automatically sets the CLASSPATH and invokes the Jhove main class:

jhove [-c config] [-m module] [-h handler] [-e encoding] [-H handler]
      [-o output] [-x saxclass] [-t tempdir] [-b bufsize] [-l loglevel]
      [[-krs] dir-file-or-uri [...]]

The following additional programs are available, primarily for testing and debugging purposes. They display a minimally processed, human-readable version of the contents of AIFF, GIF, JPEG, JPEG 2000, PDF, TIFF, and WAVE files:

java ADump  aiff-file
java GDump  gif-file
java JDump  jpeg-file
java J2Dump jpeg2000-file
java PDump  pdf-file
java TDump  tiff-file
java WDump  wave-file

For convenience, the following driver scripts are also available:

adump  aiff-file
gdump  gif-file
jdump  jpeg-file
j2dump jpeg2000-file
pdump  pdf-file
tdump  tiff-file
wdump  wave-file

The JHOVE Swing-based GUI can be invoked from a command shell in the jhove/bin sub-directory:

jhove-gui -c <configFile>

where <configFile> is the pathname of the JHOVE configuration file.

Project Structure

A quick introduction to the restructured Maven project. The project has been broken into four Maven modules, with an additional installer module added.

jhove/
  |-jhove-apps/
  |-jhove-core/
  |-jhove-installer/
  |-jhove-ext-modules/
  |-jhove-modules/

All Maven artifacts are produced in versioned form, i.e. ${artifactId}-${project.version}.jar, where ${project.version} defaults to 1.20.0 unless you explicitly set the version number.

jhove

The jhove project root acts as a Maven parent and reactor for the sub-modules. This simply builds sub-modules and doesn't produce any artifacts, but decides which sub-modules are built.

The jhove-core and jhove-modules are most likely all that are required for developers wishing to call and run JHOVE from their own code.

jhove-core

The jhove-core module contains all of the main data type definitions and the output handlers. This module produces a single JAR:

./jhove/jhove-core/target/jhove-core-${project.version}.jar

The jhove-core JAR contains a single module implementation, the default BytestreamModule. For the format-specific modules you'll need the jhove-modules JAR.

jhove-modules

The jhove-modules contains all of JHOVE's core format-specific module implementations, specifically:

  • AIFF
  • ASCII
  • GIF
  • HTML
  • JPEG
  • JPEG 2000
  • PDF
  • TIFF
  • UTF-8
  • WAVE
  • XML

These are all packaged in a single modules JAR:

./jhove/jhove-modules/target/jhove-modules-${project.version}.jar

jhove-ext-modules

The jhove-ext-modules contains JHOVE modules developed by external parties, specifically:

  • PNG
  • WARC
  • GZIP
  • EPUB

These are all packaged in a single modules JAR:

./jhove/jhove-ext-modules/target/jhove-ext-modules-${project.version}.jar

jhove-apps

The jhove-apps module contains the command-line and GUI application code and builds a fat JAR containing the entire Java application. This JAR can be used to execute the command-line app:

./jhove/jhove-apps/target/jhove-apps-${project.version}.jar

jhove-installer

Finally, the jhove-installer module takes the fat JAR and creates a Java-based installer for JHOVE. The installer bundles up invocation scripts and the like, installs them under <userHome>/jhove/ (default, can be changed) while also looking after:

  • variable substitution to ensure that JHOVE_HOME and the like are set to reflect a user's install location;
  • making sure that Windows users get batch scripts, while Mac and Linux users get bash scripts; and
  • optionally generating unattended install and uninstall files.

The module produces two JARs, one called jhove-installer-${project.version}, which contains the JARs for the installer, and an executable JAR to install JHOVE:

./jhove/jhove-installer/target/jhove-xplt-installer-${project.version}.jar

The xplt stands for cross-platform.

jhove's People

Contributors

andreakb, anjackson, archivist-liz, asciim0, bezrukovm, bitsgalore, brunolmfg, carlwilson, david-russo, deanforsmith, garvita-jain, georgiamoppett, gmcgath, jaygattusonlnz, jolf, karenhanson, leefrank9527, marhop, marti1125, maximplusov, nvanderperren, paulmer, prettybits, pwinckles, rosetta-development, rsteph-de, sakthi-goutham, samalloing, tledoux, transifex-integration[bot]

jhove's Issues

WaveModule->FormatChunk: ArrayIndexOutOfBoundsException

Class: package edu.harvard.hul.ois.jhove.module.wave.FormatChunk

The setting of "compName" in readChunk() throws an ArrayIndexOutOfBoundsException for all "compressionCode" values greater than the length of WaveStrings.COMPRESSION_INDEX[] (e.g. 0xfffe).

Also, the calculated value will be wrong for all values of "compressionCode" that are greater than 0xB.
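
One way to avoid the out-of-bounds access described above is to search for the code instead of using it as an array index. The following is a minimal sketch, not the FormatChunk code: the class CompressionNames, its two tables, and the fallback label are hypothetical, and only a handful of well-known WAVE format codes are listed.

```java
import java.util.Arrays;

/** Hypothetical code-to-name lookup; a stand-in, not JHOVE's WaveStrings. */
final class CompressionNames {
    // Assumed sample of WAVE format codes, kept sorted for binary search.
    private static final int[] CODES =
        {0x0000, 0x0001, 0x0002, 0x0003, 0x0006, 0x0007, 0xFFFE};
    private static final String[] NAMES = {
        "Unknown", "PCM", "Microsoft ADPCM", "IEEE float",
        "ITU G.711 a-law", "ITU G.711 mu-law", "Extensible"
    };

    /** Returns the name for a code, or a labelled fallback for unlisted codes. */
    static String nameFor(int compressionCode) {
        int pos = Arrays.binarySearch(CODES, compressionCode);
        if (pos >= 0) {
            return NAMES[pos];
        }
        // Never index an array with the raw code: unlisted values land here.
        return String.format("Unregistered (0x%04X)", compressionCode);
    }

    public static void main(String[] args) {
        System.out.println(nameFor(0x0001));
        System.out.println(nameFor(0xFFFE));
        System.out.println(nameFor(0x000B));
    }
}
```

Because the lookup searches for the code, sparse values such as 0xfffe cost nothing extra and unknown codes produce a readable label instead of an exception.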

Enhance JHOVE's checksumming capabilities

Dev Effort

1D

See PR: #386

Description

The digest algorithms currently supported by JHOVE are:

  • CRC32
  • MD5
  • SHA1

Java provides native support for these additional algorithms:

  • SHA256
  • SHA384
  • SHA512

These could be added quite easily but this would also require a change to JHOVE's config to allow the user to select the algorithms they wanted to use.
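
As a rough illustration of how little code the additional algorithms need, the sketch below computes digests with java.security.MessageDigest and java.util.zip.CRC32 from the Java standard library. The class and method names are invented for this example, and nothing here reflects how JHOVE's configuration or checksumming code is organised.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

/** Illustrative digest helper; names are hypothetical, not JHOVE classes. */
final class DigestSketch {
    /** Returns the lowercase hex digest of data for a standard algorithm name. */
    static String hexDigest(String algorithm, byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }

    /** CRC32 is not a MessageDigest, so it is handled separately. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        String[] algorithms = {"MD5", "SHA-1", "SHA-256", "SHA-384", "SHA-512"};
        for (String algorithm : algorithms) {
            System.out.println(algorithm + ": " + hexDigest(algorithm, data));
        }
        System.out.printf("CRC32: %08x%n", crc32(data));
    }
}
```

A user-facing option would only need to map configured algorithm names onto the strings passed to hexDigest.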

README out of date

There is a v1.14 release, but the README still has a lot of outdated info, e.g. "The OPF is preparing to release a JHOVE 1.12.x-beta in September".

XMLHandler outputs wrong bitsPerSample in MIX

While using XMLHandler, JHOVE creates an incorrect number (2) of mix:bitsPerSampleValue MIX tags:

<mix:ImageColorEncoding>
  <mix:BitsPerSample>
    <mix:bitsPerSampleValue>8</mix:bitsPerSampleValue>
    <mix:bitsPerSampleValue>8</mix:bitsPerSampleValue>
    <mix:bitsPerSampleUnit>integer</mix:bitsPerSampleUnit>
  </mix:BitsPerSample>
  <mix:samplesPerPixel>3</mix:samplesPerPixel>
</mix:ImageColorEncoding>

The attached patch fixes the iteration over the bitsPerSampleValue array so that it starts from index 0. The accompanying patch is available from SourceForge: https://sourceforge.net/p/jhove/patches/_discuss/thread/5d7c7155/77d4/attachment/XmlHandler.patch

JPEG/Exif image incorrectly "Not well-formed"

When a JPEG file begins with an APP1 segment and has an APP0 segment after it, the file is declared "Not well-formed" with the following message: "JFIF APP0 marker not at beginning of file".

Even though the file does not conform to the JPEG JFIF standard, it still conforms to the JPEG/Exif standard (also known as JEITA CP-3451), which JHOVE is supposed to handle as specified in http://jhove.openpreservation.org/modules/jpeg/

The following file shows this behaviour:
20150213_140637.zip
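
The distinction drawn in this report comes down to which application segment follows the SOI marker. The sketch below is illustrative only and is not the JPEG module's logic: the class name and return labels are made up, while the marker values (SOI = FF D8, APP0 = FF E0, APP1 = FF E1) come from the JPEG standard.

```java
/** Hypothetical helper that classifies a JPEG by its first segment after SOI. */
final class JpegFirstSegment {
    /** Returns "JFIF", "Exif", "other", or "not JPEG" for the first four bytes. */
    static String classify(byte[] head) {
        if (head.length < 4
                || (head[0] & 0xFF) != 0xFF
                || (head[1] & 0xFF) != 0xD8) {
            return "not JPEG"; // SOI marker missing
        }
        if ((head[2] & 0xFF) != 0xFF) {
            return "other"; // no marker directly after SOI
        }
        switch (head[3] & 0xFF) {
            case 0xE0:
                return "JFIF"; // APP0 comes first
            case 0xE1:
                return "Exif"; // APP1 comes first, as in the reported file
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        byte[] exifFirst = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE1};
        System.out.println(classify(exifFirst));
    }
}
```

A fuller check would also read the segment identifier strings; this sketch only looks at marker order.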

ICC Profile extraction

Hi,
I'm using JHOVE 1.9 and I want to validate the ICC profile of a TIFF image created by a scanner. Can JHOVE extract this information and add it to the metadata?
I don't know the TIFF standard very well. I want to extract CPP-CS-2012-1498 from the following information:

00000500 00 07 57 2c 00 07 f9 01 43 49 45 44 00 07 57 2c |..W,....CIED..W,|
00000510 00 07 f9 01 64 65 73 63 00 00 00 00 00 00 00 11 |....desc........|
00000520 43 50 50 2d 43 53 2d 32 30 31 32 2d 31 34 39 38 |CPP-CS-2012-1498|

Could you help me please ?

@gmcgath commented on SourceForge:

ICC Profile validation would certainly be a useful thing for JHOVE to do. It's a significant task, though, so it's not likely to happen unless it gets funding from somewhere.
If anyone does want to undertake this project, of course, they're welcome to contribute. JHOVE is open source, after all.

JHOVE Incorrectly reading beyond RIFF 'data' Chunk ID and calling it invalid...

I have received a 2GB wav file that I'm having difficulty validating in JHOVE. The tool tells me that I have an invalid character within a CHUNK ID.

Analyzing the file, however, it seems that JHOVE is reading beyond the CHUNK ID and returning an invalid result.

52 49 46 46 
Chunk ID: 'RIFF'

F8 DE A0 84 
Chunk Size: ~2GB

57 41 56 45 
Format: 'WAVE'

66 6D 74 20 
Sub Chunk 1 ID: 'fmt'

10 00 00 00 
Sub Chunk 1 Size: 16

01 00 
Audio Format: WAVE_FORMAT_PCM

01 00 
Number of Channels: 1

00 77 01 00 
Sample Rate: 96000

00 65 04 00 
Byte Rate: 288,000

03 00 
Block Align: 3

18 00 
Bits per sample: 24-bits

64 61 74 61 
Sub Chunk 2 ID: 'data'

80 C6 A0 84 
Sub Chunk 2 Size: ~2GB

A7 *05 00* 70 04 00 6E F6 FF E9 FC FF F7 F4 FF B5 24 00
... data / payload ...

The error message seems to be returned from this part of the code:

https://github.com/gmcgath/jhove/blob/0dc774d98efa8c7581fe1602c3f6e713f499201d/src/main/java/edu/harvard/hul/ois/jhove/module/iff/ChunkHeader.java#L53

The byte causing the first issue is 0x05 at offset 46, I've starred offset 46 and 47. See also the screenshot.

The screenshot has been generated by looking at the following snippet from the 2GB file:

52 49 46 46 F8 DE A0 84 57 41 56 45 66 6D 74 20 10 00 00 00 01 00 01 00 00
77 01 00 00 65 04 00 03 00 18 00 64 61 74 61 80 C6 A0 84 A7 05 00 70 04 00 
6E F6 FF E9 FC FF F7 F4 FF B5 24 00 F2 FC FF 88 FC FF 2C E8 FF 1B 08 00 74 
03 00 26 EE FF 20 F6 FF 86 F6 FF 33 01 00 5F F3 FF C0 FC FF 47

The analysis shows that 0x05 is no longer in the CHUNK ID, nor is the preceding byte 0x00, which will also show up as an error if one artificially turns 0x05 into a byte greater than 0x32.
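
The header fields listed above can be re-derived from the quoted bytes with a few lines of Java. This is a sketch for checking the arithmetic, not JHOVE's ChunkHeader code; the class and method names are invented. It reads the 32-bit sizes as unsigned little-endian values, and it also makes visible that both sizes exceed Integer.MAX_VALUE, so they come out negative if held in a signed int. Whether that is related to the reported error is not established here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical decoder for the 44-byte header quoted in this report. */
final class RiffHeaderSketch {
    // First 44 bytes of the snippet above.
    static final String HEX =
        "52 49 46 46 F8 DE A0 84 57 41 56 45 66 6D 74 20 10 00 00 00 01 00 01 00 00 "
      + "77 01 00 00 65 04 00 03 00 18 00 64 61 74 61 80 C6 A0 84";

    static byte[] parseHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    static String fourCc(ByteBuffer buf) {
        byte[] id = new byte[4];
        buf.get(id);
        return new String(id, StandardCharsets.US_ASCII);
    }

    static long unsignedInt(ByteBuffer buf) {
        return Integer.toUnsignedLong(buf.getInt());
    }

    /** Returns {riffSize, sampleRate, byteRate, dataSize}. */
    static long[] decode(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        fourCc(buf);                         // "RIFF"
        long riffSize = unsignedInt(buf);
        fourCc(buf);                         // "WAVE"
        fourCc(buf);                         // "fmt "
        unsignedInt(buf);                    // fmt chunk size
        buf.getShort();                      // audio format
        buf.getShort();                      // number of channels
        long sampleRate = unsignedInt(buf);
        long byteRate = unsignedInt(buf);
        buf.getShort();                      // block align
        buf.getShort();                      // bits per sample
        fourCc(buf);                         // "data"
        long dataSize = unsignedInt(buf);
        return new long[] {riffSize, sampleRate, byteRate, dataSize};
    }

    public static void main(String[] args) {
        long[] f = decode(parseHex(HEX));
        System.out.println("RIFF size:   " + f[0]);
        System.out.println("Sample rate: " + f[1]);
        System.out.println("Byte rate:   " + f[2]);
        System.out.println("data size:   " + f[3] + " (as signed int: " + (int) f[3] + ")");
    }
}
```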

Screenshot:

invalid_character

JHOVE Version: 1.11
Java: 1.7
Platform: Windows XP SP3
Creating Application (WAV): Adobe Audition CS6 (Macintosh)

Whitespace in name of target file

Is there a way to work on files with whitespace in their names?
e.g. on Linux

touch 'test file.zip'
jhove test\ file.zip # or even jhove 'test file.zip', I tried it in many different ways 

The output from jhove is (it thinks I pass 2 files):

Jhove (Rel. 1.15.0-SNAPSHOT, 2016-08-29)
 Date: 2016-08-29 13:11:25 CEST
 RepresentationInformation: test\
  Status: Not well-formed
  ErrorMessage: file not found
 RepresentationInformation: file.zip
  Status: Not well-formed
  ErrorMessage: file not found

The only way it worked for me was to use a wildcard:

jhove 'test*file.zip' 

Then I get the Output:

Jhove (Rel. 1.15.0-SNAPSHOT, 2016-08-29)
 Date: 2016-08-29 13:11:44 CEST
 RepresentationInformation: test file.zip
  ReportingModule: BYTESTREAM, Rel. 1.3 (2007-04-10)
  LastModified: 2016-08-29 13:08:11 CEST
  Size: 0
  Format: bytestream
  Status: Well-Formed and valid
  SignatureMatches:
   WARC-kb
   GZIP-kb
  InfoMessage: Zero-length file
  MIMEtype: application/octet-stream

BUT THAT IS TOO DIRTY!

Is it possible to fix this, or is there another usable way to run jhove without such problems?

incorrect validity report on image

I tested JHOVE (1.11) with an "invalid image" (https://bitbucket.org/tdar/tdar.src/src/9c2656809786e6a8730e57e3b71333b9aa5258fd/test-resources/src/main/resources/images/sample_image_formats/grandcanyon_lzw_corrupt.tif?at=default ). It won't open in Photoshop or Preview; tiffinfo (libtiff) and identify (ImageMagick) both read it as invalid, but jhove seems to report it as "well formed and valid":

[abrin@dev jhove]$ ./jhove ~tdar/tdar.src/test-resources/src/main/resources/images/sample_image_formats/grandcanyon_lzw_corrupt.tif 
Mar 15, 2016 9:02:57 AM edu.harvard.hul.ois.jhove.JhoveBase init
SEVERE: Testing SEVERE level
Jhove (Rel. 1.11, 2013-09-29)
 Date: 2016-03-15 09:02:57 MST
 RepresentationInformation: /home/tdar/tdar.src/test-resources/src/main/resources/images/sample_image_formats/grandcanyon_lzw_corrupt.tif
  ReportingModule: BYTESTREAM, Rel. 1.3 (2007-04-10)
  LastModified: 2015-08-30 16:06:33 MST
  Size: 50496
  Format: bytestream
  Status: Well-Formed and valid
  MIMEtype: application/octet-stream


[abrin@dev jhove]$ tiffinfo ~tdar/tdar.src/test-resources/src/main/resources/images/sample_image_formats/grandcanyon_lzw_corrupt.tif 
/home/tdar/tdar.src/test-resources/src/main/resources/images/sample_image_formats/grandcanyon_lzw_corrupt.tif: Not a TIFF or MDI file, bad magic number 24909 (0x614d).
[abrin@dev jhove]$ identify ~tdar/tdar.src/test-resources/src/main/resources/images/sample_image_formats/grandcanyon_lzw_corrupt.tif 
identify.im6: Not a TIFF or MDI file, bad magic number 24909 (0x614d). `/home/tdar/tdar.src/test-resources/src/main/resources/images/sample_image_formats/grandcanyon_lzw_corrupt.tif' @ error/tiff.c/TIFFErrors/508.
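
For context, the "bad magic number" that tiffinfo and identify complain about refers to the first bytes of the file. The sketch below shows the classic TIFF header test; it is not the TIFF module's code, and the class name is hypothetical. The example input assumes the file starts with the bytes 0x4D 0x61, which is what the reported value 0x614d corresponds to if the tools read it as a little-endian 16-bit number; the remaining two bytes are placeholders.

```java
/** Hypothetical check of the 4-byte classic TIFF header. */
final class TiffHeaderCheck {
    /** True for "II" + 42 (little-endian) or "MM" + 42 (big-endian). */
    static boolean looksLikeTiff(byte[] head) {
        if (head.length < 4) {
            return false;
        }
        boolean littleEndian =
            head[0] == 'I' && head[1] == 'I' && head[2] == 42 && head[3] == 0;
        boolean bigEndian =
            head[0] == 'M' && head[1] == 'M' && head[2] == 0 && head[3] == 42;
        return littleEndian || bigEndian;
    }

    public static void main(String[] args) {
        byte[] valid = {'I', 'I', 42, 0};
        byte[] corrupt = {0x4D, 0x61, 0, 0}; // assumed start of the corrupt file
        System.out.println(looksLikeTiff(valid));
        System.out.println(looksLikeTiff(corrupt));
    }
}
```

Note that in the output above the reporting module is BYTESTREAM, so the "Well-Formed and valid" status describes the file as a byte stream, not as a TIFF.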

CrossRefStream incorrectly assumes /Index value is a 2 element array

The "isValid" method of CrossRefStream is hard-coded to assume that an Index element, if present, is an array of exactly 2 integers. According to the specification, the Index element is "an array containing a pair of integers for each subsection in this section." (http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference15_v6.pdf, page 83 and PDF versions 1.6 and 1.7) Documents that have more than one subsection fail validation as a result. PdfModule.readXRefStreams appears to incorporate this assumption and does not validate object numbers against the actual object ranges specified in the index, but instead looks for numbers between 0 and the (meaningless?) value of CrossRefStream.getNumObjects().
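
To make the expected behaviour concrete, here is a minimal sketch of treating Index as a list of (first object number, count) pairs. It is not the CrossRefStream implementation; the class and method names are hypothetical, and the Index value is modelled as a plain int array.

```java
/** Hypothetical handling of a cross-reference stream's Index array. */
final class XrefIndexSketch {
    /** Valid when the array is a non-empty list of non-negative pairs. */
    static boolean isValidIndex(int[] index) {
        if (index.length == 0 || index.length % 2 != 0) {
            return false;
        }
        for (int value : index) {
            if (value < 0) {
                return false;
            }
        }
        return true;
    }

    /** True if objNum lies inside any (first, count) subsection. */
    static boolean covers(int[] index, int objNum) {
        for (int i = 0; i + 1 < index.length; i += 2) {
            long first = index[i];
            long count = index[i + 1];
            if (objNum >= first && objNum < first + count) {
                return true;
            }
        }
        return false;
    }

    /** Total number of entries described by all subsections. */
    static long totalEntries(int[] index) {
        long total = 0;
        for (int i = 1; i < index.length; i += 2) {
            total += index[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] index = {0, 3, 10, 2}; // two subsections: objects 0-2 and 10-11
        System.out.println(isValidIndex(index));
        System.out.println(covers(index, 11));
        System.out.println(covers(index, 5));
    }
}
```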

Enable generation of textMD property for text files

Here is a patch against version 1.4 so that JHOVE can generate a property conformant with the textMD schema (see http://www.loc.gov/standards/textMD) for textual files.
The initial thought was to make a simple XSLT transform over the output of jhove in order to generate this information but this doesn't work well because:

  1. not all the needed information is generated by jhove or the output information is already bundled and
  2. the correct management of the charset and the language need to be programmatically verified.

This patch modifies 4 modules:

  • ASCII-hul
  • UTF8-hul
  • HTML-hul
  • XML-hul (the version number has been modified appropriately).

Setting the parameter withTextMD=true activates generation of the property for each module (see jhove-withTextMD.conf for an example).
By default the property is not generated, so the behaviour is the same as before.
I added line-ending detection to the HTML and XML modules to be able to generate the required element:

  • there is no performance penalty, since the stream classes have been modified using the same algorithm as the one in the ASCII module.
  • I decided NOT to add a TextMDMetadata property type so that the schema jhove.xsd will be unchanged.

So the TextMDMetadata property is of OBJECT type.
The TextHandler and XmlHandler are modified to generate the information (the version number has been modified appropriately).
Hope this patch could be added into Jhove to enhance its handling of textual files.
Thanks for your attention.

The accompanying patch is available from SourceForge: https://sourceforge.net/p/jhove/patches/_discuss/thread/ef9d4da0/52ff/attachment/withTextMD.patch
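
The line-ending detection mentioned in the patch description can be pictured roughly as follows. This is an assumed, simplified single-pass scan written for illustration; it is not taken from the patch or from the ASCII module, and the class and enum names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical single-pass line-ending scan over a byte array. */
final class LineEndingSketch {
    enum LineEnding { CR, LF, CRLF }

    /** Returns every line-ending style that occurs in data. */
    static Set<LineEnding> detect(byte[] data) {
        Set<LineEnding> found = EnumSet.noneOf(LineEnding.class);
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r') {
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    found.add(LineEnding.CRLF);
                    i++; // consume the LF that belongs to this CRLF
                } else {
                    found.add(LineEnding.CR);
                }
            } else if (data[i] == '\n') {
                found.add(LineEnding.LF);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        byte[] mixed = {'a', '\r', '\n', 'b', '\n', 'c', '\r', 'd'};
        System.out.println(detect(mixed));
    }
}
```

Because the scan sees each byte once while the content is being read anyway, it adds little work, which is consistent with the "no performance penalty" remark above.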

Java exception under Windows; seems to be config related

Dev Effort

0.5D

Description

After installing JHOVE on Windows and configuring it as described in the readme, execution results in:

Exception in thread "main" java.lang.NoClassDefFoundError: edu/harvard/hul/ois/j
hove/viewer/ConfigWindow
        at edu.harvard.hul.ois.jhove.DefaultConfigurationBuilder.writeDefaultCon
figFile(Unknown Source)
        at edu.harvard.hul.ois.jhove.JhoveBase.init(Unknown Source)
        at Jhove.main(Unknown Source)
Caused by: java.lang.ClassNotFoundException: edu.harvard.hul.ois.jhove.viewer.Co
nfigWindow
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 3 more

This happens under Windows 7; java version is 1.8.0_51. Going back through my files I see this is the same error I got almost 2 years ago:

http://openpreservation.org/blog/2014/01/31/why-cant-we-have-digital-preservation-tools-just-work/

And I remember this started happening after some changes to the way JHOVE looks for its configuration, which was prompted by this long-running issue:

http://sourceforge.net/p/jhove/bugs/53/

If I explicitly specify the location of the config file with the -c switch, JHOVE does run normally, e.g.:

jhove -c C:\jhove\conf\jhove.conf

Problem with PDF annotation dictionaries

A file from the Open Planets Foundation format corpus, simple-annotated-in-adobe-x.pdf, is reported as well-formed but not valid, with the not very informative message "Invalid annotations." Setting breakpoints reveals that where an array is expected for the "Annots" array of annotation dictionaries, a keyword is being found instead. I can't immediately figure out why this is. Even if it's not in accordance with the spec, it's an Adobe-generated file.

Further comment from @gmcgath :

File from format corpus:
simple-annotated-in-adobe-x.pdf

and again from @gmcgath

A similar problem exists in the same file with the "Names" dictionary. This looks like an underlying feature of PDF that I've overlooked.

and again from @gmcgath 22-05-2013

I've posted a question at http://superuser.com/questions/589207/can-a-keyword-be-in-a-pdf-annots-array to see if anyone can explain what's going on. So far there have been no answers.

TIFF module should check for overlapping tag data

Dev Effort

1D

Description

The TIFF specification says: "No data should be referenced from more than one place. TIFF readers and editors are under no obligation to detect this condition and handle it properly. This would not be a problem if TIFF files were read-only entities, but they are not. This warning covers both TIFF field value offsets and fields that are defined as offsets, such as StripOffsets."

The TIFF module doesn't currently check this, and some TIFF files cheat on this point, e.g., by using the same data storage for X and Y resolution if they're the same. Since this is a violation of the spec with regard to file structure, this should really be checked. We have a request for this check.
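
A check of this kind could be sketched as an interval-overlap test over all byte ranges that tags reference. The code below is a hypothetical illustration, not a proposal for the TIFF module's actual structure: the Range type, the labels, and the message format are invented, and collecting the ranges from a real file is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical overlap detection for byte ranges referenced by TIFF tags. */
final class OverlapSketch {
    /** A referenced byte range [offset, offset + length). */
    static final class Range {
        final String label;
        final long offset;
        final long length;

        Range(String label, long offset, long length) {
            this.label = label;
            this.offset = offset;
            this.length = length;
        }

        long end() {
            return offset + length;
        }
    }

    /** Reports each range that starts before an earlier range has ended. */
    static List<String> findOverlaps(List<Range> ranges) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong(r -> r.offset));
        List<String> problems = new ArrayList<>();
        Range furthest = null; // earlier range that extends furthest so far
        for (Range r : sorted) {
            if (furthest != null && r.offset < furthest.end()) {
                problems.add(furthest.label + " overlaps " + r.label);
            }
            if (furthest == null || r.end() > furthest.end()) {
                furthest = r;
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Range> ranges = new ArrayList<>();
        ranges.add(new Range("XResolution", 100, 8));
        ranges.add(new Range("YResolution", 100, 8)); // shares storage with X
        ranges.add(new Range("Strip 0", 200, 50));
        System.out.println(findOverlaps(ranges));
    }
}
```

Sorting by offset keeps the check at O(n log n), which matters for files with many strips or tiles.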

Why is jhoveHome needed?

Dev Effort

0.5D investigation

Description

Before running jhove one needs to set jhoveHome in the configuration file. I don't really understand why this variable is needed, since the relative locations of the launcher scripts and JARs are always fixed. So I think all launch scripts / JARs should be able to 'know' their dependencies without any user input (also, for each re-install or update the config gets overwritten by the default values, which makes things unnecessarily complex for a user).
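
One way an application can work out its own install directory is from the location of the JAR it was loaded from. The sketch below assumes a hypothetical layout of <home>/bin/<application>.jar; both that layout and the class and method names are assumptions for illustration, not a description of how JHOVE is packaged or how it resolves jhoveHome.

```java
import java.net.URISyntaxException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.CodeSource;

/** Hypothetical derivation of an install directory from a JAR location. */
final class HomeFromJar {
    /** For <home>/bin/app.jar returns <home>; null if there is no such parent. */
    static Path homeFor(Path jarPath) {
        Path binDir = jarPath.toAbsolutePath().normalize().getParent();
        return binDir == null ? null : binDir.getParent();
    }

    /** Location a class was loaded from, or null when it cannot be determined. */
    static Path codeLocation(Class<?> cls) {
        try {
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            return source == null ? null : Paths.get(source.getLocation().toURI());
        } catch (URISyntaxException | RuntimeException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Path location = codeLocation(HomeFromJar.class);
        System.out.println("Loaded from: " + location);
        if (location != null) {
            System.out.println("Derived home: " + homeFor(location));
        }
    }
}
```

An explicit setting could still override the derived value for unusual installations.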

PDF module error with TeX-created documents

User Chris Yocum reports:
Anyway, here is the output that I am getting. You can try this on any TeX generated document and it should give you the same results.

java.lang.ClassCastException: edu.harvard.hul.ois.jhove.module.pdf.PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictionary
        at edu.harvard.hul.ois.jhove.module.PdfModule.readDocCatalogDict(Unknown Source)
        at edu.harvard.hul.ois.jhove.module.PdfModule.parse(Unknown Source)
        at edu.harvard.hul.ois.jhove.JhoveBase.processFile(Unknown Source)
        at edu.harvard.hul.ois.jhove.JhoveBase.process(Unknown Source)
        at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(Unknown Source)
        at Jhove.main(Unknown Source)

Thomas Fischer 03-04-2013:

I can confirm this bug, although the file is not TeX-generated, but from Acrobat Distiller. The file is attached. Here is my complete output:

Jhove (Rel. 1.9, 2012-12-17)
Date: 2013-03-04 13:59:26 CET
RepresentationInformation: b6c99639fc62e6a7430b78f6d8494931_http___www_bolagsverket_se_polopoly_fs_1_5530__Menu_general_column_content_file_p25_personinformation.pdf
ReportingModule: PDF-hul, Rel. 1.7 (2012-08-12)
 LastModified: 2013-01-04 12:22:13 CET
 Size: 80219
 Format: PDF
 Version: 1.6
 Status: Not well-formed
 SignatureMatches:
  PDF-hul
 ErrorMessage: Unexpected error in findFonts: java.lang.ClassCastException: edu.harvard.hul.ois.jhove.module.pdf.PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictionary
  Offset: 1849
 MIMEtype: application/pdf
 PDFMetadata: 
  Objects: 0
  FreeObjects: 1
  IncrementalUpdates: 0
  DocumentCatalog: 
   PageLayout: SinglePage
   PageMode: UseNone
  Filters: 
   FilterPipeline: FlateDecode
  Fonts: 
   TrueType: 
    Font: 
     BaseFont: CBMFOF+Garamond
     FontSubset: true
     FirstChar: 32
     LastChar: 246
     FontDescriptor: 
      FontName: CBMFOF+Garamond
      Flags: Serif, Nonsymbolic
      FontBBox: -139, -307, 1063, 986
      FontFile2: true
     Encoding: WinAnsiEncoding
  XMP: <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.2-c001 63.139439, 2010/09/27-13:37:26        ">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
     <rdf:Description rdf:about=""
           xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:format>application/pdf</dc:format>
        <dc:creator>
           <rdf:Seq>
              <rdf:li>Bolagsverket</rdf:li>
           </rdf:Seq>
        </dc:creator>
        <dc:title>
           <rdf:Alt>
              <rdf:li xml:lang="x-default">Produktbeskrivning P25_Personinformation</rdf:li>
           </rdf:Alt>
        </dc:title>
     </rdf:Description>
     <rdf:Description rdf:about=""
           xmlns:xmp="http://ns.adobe.com/xap/1.0/">
        <xmp:CreateDate>2008-10-13T15:55:07+02:00</xmp:CreateDate>
        <xmp:CreatorTool>PScript5.dll Version 5.2.2</xmp:CreatorTool>
        <xmp:ModifyDate>2012-08-17T15:56:07+02:00</xmp:ModifyDate>
        <xmp:MetadataDate>2012-08-17T15:56:07+02:00</xmp:MetadataDate>
     </rdf:Description>
     <rdf:Description rdf:about=""
           xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
        <pdf:Producer>Acrobat Distiller 8.1.0 (Windows)</pdf:Producer>
     </rdf:Description>
     <rdf:Description rdf:about=""
           xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
        <xmpMM:DocumentID>uuid:c90d60fd-280e-4af3-bf14-87f96badb896</xmpMM:DocumentID>
        <xmpMM:InstanceID>uuid:dde7d516-b11d-4d86-be2a-5cc56c489a1d</xmpMM:InstanceID>
     </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
Pages: 
   Page: 
    Label: 1
   Page: 
    Label: 2
   Page: 
    Label: 3
   Page: 
    Label: 4
   Page: 
    Label: 5
   Page: 
    Label: 6
   Page: 
    Label: 7

b6c99639fc62e6a7430b78f6d8494931_http___www_bolagsverket_se_polopoly_fs_1_5530__Menu_general_column_content_file_p25_personinformation.pdf

@gmcgath replied

JHOVE is getting caught because it's seeing a keyword where it expects a font dictionary in a page node's resources. As far as I can tell from reading the spec, this is incorrect PDF. I've fixed it so that instead of throwing an exception it reports that it failed to see a font dictionary. This is in the checked-in PdfModule.java.
This seems to imply that many TeX-generated PDFs are broken. If there's something I've missed and a keyword object is valid in this context, please let me know. At least now the error message is more to the point, and there won't be a stack dump.

Thomas Fischer replied 05-06-2013:

The fix doesn't seem to cover all cases. I was able to create a PDF file using pdfLaTeX which recreates the crash in 1.10b2. The crash is triggered as soon as I include the MinionPro font (i.e. commenting out the MinionPro package makes JHOVE run OK):
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[lf]{MinionPro}
\begin{document}
ABC
\end{document}
The output looks like this:

java.lang.ClassCastException: edu.harvard.hul.ois.jhove.module.pdf.PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictionary
at edu.harvard.hul.ois.jhove.module.PdfModule.readDocCatalogDict(Unknown Source)
at edu.harvard.hul.ois.jhove.module.PdfModule.parse(Unknown Source)
at edu.harvard.hul.ois.jhove.JhoveBase.processFile(Unknown Source)
at edu.harvard.hul.ois.jhove.JhoveBase.process(Unknown Source)
at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(Unknown Source)
at Jhove.main(Unknown Source)
Jhove (Rel. 1.9, 2013-05-28)
Date: 2013-06-05 10:08:04 CEST
RepresentationInformation: /tmp/test.pdf
ReportingModule: PDF-hul, Rel. 1.7 (2012-08-12)
LastModified: 2013-06-05 10:00:09 CEST
Size: 42554
Format: PDF
Status: Not well-formed
SignatureMatches:
PDF-hul
ErrorMessage: No document catalog dictionary
Offset: 0
MIMEtype: application/pdf

BTW, both the version from CVS and the tar-ball report version number 1.9 instead of 1.10b2 or something else.

@gmcgath replied:

Re Thomas Fischer: I'm not getting a crash, and it looks from the output you've posted as if JHOVE is in fact running to completion after writing out a stack dump. However, either JHOVE isn't processing the file properly, or the file is broken and Acrobat is able to open it anyway. (This may hinge on fine points of what "broken" means.) I'm seeing that in trying to read the document catalog dictionary, JHOVE is instead getting a keyword of "rstChar". This is most likely a fragment of a "FirstChar" keyword.
There is legitimately a bug, but I'm afraid it will have to stay open for version 1.10. Hopefully I or someone else will find a fix for it later.

Denis Bitouzé 03-11-2013:

Hi,
is this bug still present in the current version of JHOVE (1.11)?
Best regards.

edu.harvard.hul.ois.jhove.ModuleBase: skipBytes() might not skip all requested bytes

For large WAVE files (>100 MB) it can happen that not all bytes in the data chunk are skipped as expected in the method:

public long skipBytes(DataInputStream stream, long bytesToSkip, ModuleBase counted)

This seems to be because the call:

long n = stream.skip(bytesToSkip);

might actually skip fewer bytes than requested (this is also stated in the Java documentation). If this occurs, it will most probably cause the parsing of the WAVE file to fail, since the pointer to the next chunk will be placed inside the data chunk.

To avoid this problem, the "long n = stream.skip(bytesToSkip);" call could be placed inside a loop that continues until all the desired bytes are skipped, or no more bytes can be skipped (i.e. n=0).
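
A minimal sketch of such a loop, for illustration only. The class and method names (SkipFully, skipFully, shortSkipping) are hypothetical and not part of JHOVE, and the sketch takes a plain InputStream instead of the real skipBytes() parameters. It also goes one step beyond the proposal above: because skip() may return 0 before the end of the stream is reached, it probes with a single read() before giving up.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper, not part of JHOVE. */
final class SkipFully {
    private SkipFully() {}

    /**
     * Skips up to bytesToSkip bytes, retrying after short skips.
     * Returns the number of bytes actually skipped, which is smaller
     * than bytesToSkip only if the stream ended first.
     */
    static long skipFully(InputStream stream, long bytesToSkip) throws IOException {
        long remaining = bytesToSkip;
        while (remaining > 0) {
            long n = stream.skip(remaining);
            if (n <= 0) {
                // skip() may return 0 without being at end of stream,
                // so probe with a single read() before giving up.
                if (stream.read() < 0) {
                    break; // genuine end of stream
                }
                n = 1; // the probe consumed one byte
            }
            remaining -= n;
        }
        return bytesToSkip - remaining;
    }

    /** Demo stream whose skip() never advances more than 3 bytes per call. */
    static InputStream shortSkipping(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public long skip(long n) throws IOException {
                return super.skip(Math.min(n, 3));
            }
        };
    }
}
```

A caller can compare the return value with the requested count and report a truncated chunk when they differ, rather than continuing to parse from inside the data chunk.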

JHOVE reporting PDF as v1.3 and as ISO PDF/A-1, Level B

Siegfried reports the file to be PDF v1.3 and not PDF/A.

JHOVE output snippet:

Jhove (Rel. 1.12.48, 2016-05-12)
 Date: 2016-08-08 10:09:34 BST
 RepresentationInformation: c:\Users\pmay\Downloads\281474990846918.pdf
  ReportingModule: PDF-hul, Rel. 1.7 (2012-08-12)
  LastModified: 2016-08-05 14:49:21 BST
  Size: 205146
  Format: PDF
  Version: 1.3
  Status: Well-Formed, but not valid
  SignatureMatches:
   PDF-hul
  ErrorMessage: <snip...>
  MIMEtype: application/pdf
  Profile: ISO PDF/A-1, Level B
  PDFMetadata: <snip...>

ReleaseDetailsTest.java fails when compiling during standard time (NZST)

Likely because the test explicitly requests the daylight-saving name of the time zone. Per https://docs.oracle.com/javase/7/docs/api/java/util/TimeZone.html#getDisplayName(boolean,%20int), passing true as the first argument returns the daylight-saving name:

TimeZone.getDefault().getDisplayName(true, TimeZone.SHORT)

Affected code here:

assertEquals("ReleaseDetails [version=0.1.2-TESTER, buildDate=Sun Jul 31 00:00:00 " + TimeZone.getDefault().getDisplayName(true, TimeZone.SHORT) + " 2011]", instance.toString());
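
One possible way to make the expected string independent of the season is to ask the zone whether daylight time is in effect on the date being formatted, instead of hard-coding true. The sketch below is an assumption about how the test could be adapted, not the project's actual fix; ZoneAbbreviation and shortNameAt are hypothetical names. It passes Locale.US on the assumption that this matches what Date.toString() prints in OpenJDK.

```java
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical test helper, not part of JHOVE. */
final class ZoneAbbreviation {
    private ZoneAbbreviation() {}

    /**
     * Short zone name (standard or daylight form) that is in effect
     * for the given instant in the given zone.
     */
    static String shortNameAt(Date date, TimeZone zone) {
        // Pick the daylight flag from the date itself rather than hard-coding it.
        boolean daylight = zone.inDaylightTime(date);
        return zone.getDisplayName(daylight, TimeZone.SHORT, Locale.US);
    }
}
```

In the assertion above, the expected string would then be built with shortNameAt(buildDate, TimeZone.getDefault()), where buildDate is the Date the test uses for 31 July 2011.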

ICCProfiles in JPEG are not extracted

When a JPEG file embeds an ICC profile in an APP2 data segment, this information doesn't appear in the associated MIX information (NisoImageMetadata): the information is supposed to be located in the IccProfile element.

Such an ICC profile can be validated by attempting to construct a java.awt.color.ICC_Profile with the getInstance() method in Java.
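
A minimal sketch of that check, assuming the profile bytes have already been extracted from the file (for JPEG, that means the APP2 payload with the segment header removed, and multiple segments concatenated in order). IccProfileCheck and looksLikeIccProfile are hypothetical names, not part of JHOVE.

```java
import java.awt.color.ICC_Profile;

/** Hypothetical helper, not part of JHOVE. */
final class IccProfileCheck {
    private IccProfileCheck() {}

    /** Returns true if the JDK accepts the bytes as an ICC profile. */
    static boolean looksLikeIccProfile(byte[] profileBytes) {
        if (profileBytes == null) {
            return false;
        }
        try {
            ICC_Profile.getInstance(profileBytes);
            return true;
        } catch (IllegalArgumentException e) {
            // getInstance(byte[]) rejects data that is not a valid ICC profile.
            return false;
        }
    }
}
```

This only tells whether the JDK can parse the data; it says nothing about whether the profile is appropriate for the image.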

Menus get lost when closing document window in GUI on OS X

If I "Open" a file with the GUI version of JHOVE version 1.12 beta and then close it again, bringing the focus back to the main window, the File, Edit, and Help menus disappear. They can be restored by bringing up "About JHOVE" from the JhoveView menu and then closing the resulting window. This was on OS X 10.10.5 (Yosemite).

After noticing this in 1.12 beta, I tried it with 1.9 and got the same result. It seems I should have noticed if it was happening all along. I tried it on Linux, and the problem doesn't occur there. I suspect it's something that's turned up in recent versions of OS X.

Example of using jhove within another java application

Hello,

I am working on incorporating jhove into another application, however there doesn't seem to be any documentation on how to do so. Could you please point me to some examples of how to validate the various module types in my own java application?

Thanks

Java exception Mac

Running jhove in command line on Mac gives the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: JHOVE
Caused by: java.lang.ClassNotFoundException: JHOVE
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

Additions to JPEG2000 MIX Output

Dev Effort

1D

Description

I have two feature requests for the JPEG2000 module's MIX output that I think would be useful additions:

  1. The default display resolution of the JPEG2000 file reported in the MIX output, preferably as dpi.
    <jhove:property>
      <jhove:name>DefaultDisplayResolution</jhove:name>
      <jhove:values arity="Array" type="Property">
        <jhove:property>
          <jhove:name>HorizResolution</jhove:name>
          <jhove:values arity="List" type="Property">
            <jhove:property>
              <jhove:name>Numerator</jhove:name>
              <jhove:values arity="Scalar" type="Integer">
                <jhove:value>3870</jhove:value>
              </jhove:values>
            </jhove:property>
            <jhove:property>
              <jhove:name>Denominator</jhove:name>
              <jhove:values arity="Scalar" type="Integer">
                <jhove:value>32768</jhove:value>
              </jhove:values>
            </jhove:property>
            <jhove:property>
              <jhove:name>Exponent</jhove:name>
              <jhove:values arity="Scalar" type="Integer">
                <jhove:value>5</jhove:value>
              </jhove:values>
            </jhove:property>
          </jhove:values>
        </jhove:property>
        <jhove:property>
          <jhove:name>VertResolution</jhove:name>
          <jhove:values arity="List" type="Property">
            <jhove:property>
              <jhove:name>Numerator</jhove:name>
              <jhove:values arity="Scalar" type="Integer">
                <jhove:value>3870</jhove:value>
              </jhove:values>
            </jhove:property>
            <jhove:property>
              <jhove:name>Denominator</jhove:name>
              <jhove:values arity="Scalar" type="Integer">
                <jhove:value>32768</jhove:value>
              </jhove:values>
            </jhove:property>
            <jhove:property>
              <jhove:name>Exponent</jhove:name>
              <jhove:values arity="Scalar" type="Integer">
                <jhove:value>5</jhove:value>
              </jhove:values>
            </jhove:property>
          </jhove:values>
        </jhove:property>
<mix:SpatialMetrics>
  <mix:samplingFrequencyUnit>in.</mix:samplingFrequencyUnit>
  <mix:xSamplingFrequency>
    <mix:numerator>300</mix:numerator>
    <mix:denominator>1</mix:denominator>
  </mix:xSamplingFrequency>
  <mix:ySamplingFrequency>
    <mix:numerator>300</mix:numerator>
    <mix:denominator>1</mix:denominator>
  </mix:ySamplingFrequency>
</mix:SpatialMetrics>
  2. MIX output of the compression scheme used (lossy / lossless), like:
<mix:Compression>
  <mix:compressionScheme>JPEG 2000 Lossless</mix:compressionScheme>
</mix:Compression>

Runs out of java heap space while JHOVE process a specific pdf against tag profiles

Dev Effort

1D

Description

While JHOVE processes this pdf, http://www.fcla.edu/daitss-test/files/01471-213X-12-33-S2.pdf, it runs out of all JAVA heap space. Is there an infinite loop during tag profile checking?

./jhove -c conf/jhove.conf -m pdf-hul ~/Workspace/describe/01471-213X-12-33-S2.pdf 
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
at java.lang.StringBuilder.<init>(StringBuilder.java:68)
at edu.harvard.hul.ois.jhove.module.pdf.Tokenizer.getNext(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.getNext(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.getNext(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readArray(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readDictionary(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObjectDef(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObjectDef(Unknown Source)
at edu.harvard.hul.ois.jhove.module.PdfModule.getObject(Unknown Source)
at edu.harvard.hul.ois.jhove.module.PdfModule.resolveIndirectObject(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureElement.isStructElem(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureElement.buildSubtree(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureElement.buildSubtree(Unknown Source)


at edu.harvard.hul.ois.jhove.module.pdf.StructureElement.buildSubtree(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureTree.getChildren(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureTree.<init>(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.TaggedProfile.satisfiesThisProfile(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.PdfProfile.satisfiesProfile(Unknown Source)
at edu.harvard.hul.ois.jhove.module.PdfModule.parse(Unknown Source)
at edu.harvard.hul.ois.jhove.JhoveBase.processFile(Unknown Source)
at edu.harvard.hul.ois.jhove.JhoveBase.process(Unknown Source)
at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(Unknown Source)

Stefan Hein - 29-04-2013:

I have the same problem with the following pdf file: http://docserv.uni-duesseldorf.de/servlets/DerivateServlet/Derivate-25614 Are there any updates to that issue?

@gmcgath 29-04-2013:

I'm working on this issue. One user showed that with huge amounts of patience and memory, at least some PDF files that appear to be in an infinite loop are completed after several hours. The StructureTree object can take a huge amount of memory for some files, but once it's built, only a couple of flags that were set during its construction are checked. This suggests that the whole tree doesn't have to be in memory at once. I hope to have a fix that takes this into account before too long.
Thanks for the additional example. I'll use it in testing.

Stefan Hein 20-08-2016:

Unfortunately, this bug still exists in JHOVE 1.10.

Issues with JPEG2000 validation

Hello Folks,
sample tests have shown that JHOVE cannot cope with certain JPEG2000 files.
When the JPEG2000 module is selected, the JHOVE GUI version does not show any findings. For the JHOVE library, the error is a java.io.EOFException in the code line: jb.process(app, module, handler, files.get(i).toString());

I have example files:
The jpylyzer test files (https://github.com/openpreserve/jpylyzer-test-files/blob/master/bitwiser-icc-corrupted-tagcount-1951.jp2) do not work with JHOVE.

The JPEG2000 files from the Google image test suite (https://drive.google.com/file/d/0B9lJIDXo2oPYZlNnVnRKRFdwVDg/edit) do work with JHOVE.

I have not yet found the difference between the two of them. jpylyzer can cope with them all so far.
Best, Yvonne

Empty message body in the validation error

When processing this PDF with JHOVE, http://www.fcla.edu/daitss-test/files/Zheng_Liping_200512_PHD.pdf, an error occurs with a status of "Well-Formed, but not valid":

<size>3649429</size>
<format>PDF</format>
<version>1.5</version>
<status>Well-Formed, but not valid</status>
<sigMatch>
  <module>PDF-hul</module>
</sigMatch>
<messages>
  <message offset="2098097" severity="error"></message>
 <message offset="2098153" severity="error"></message>
</messages>

Shouldn't there be a message body indicating what the validation error is?

@gmcgath 02-06-2013:

I've checked in a new version of PdfModule.java that fixes the problem. addDestination was failing to check whether it could safely get a page object number, and throwing a NullPointerException when it couldn't. The handler was assuming there would be a message string in the exception. I've fixed it on both of these points. For now it can be built with the updated source code; it should be fixed in JHOVE 1.10, whenever that happens.
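
The second point (a handler that assumes every exception carries a message string) can be illustrated with a small fallback. This is only a sketch of the general idea, not the code that was checked in; MessageText and describe are hypothetical names.

```java
/** Hypothetical helper, not part of JHOVE. */
final class MessageText {
    private MessageText() {}

    /**
     * Returns the exception's message, or its class name when the
     * message is missing, so that a report never gets an empty body.
     */
    static String describe(Throwable t) {
        String message = t.getMessage();
        if (message == null || message.isEmpty()) {
            return t.getClass().getName();
        }
        return message;
    }
}
```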

jhove installer: more information in window of step 1

The step 1 info ("Please read the following information:") currently only includes the jhove logo and version number. Would be beneficial to have some more info here (such as github link - or a pointer towards the option to save the installation process into an auto-install script at the end).

Use of xsi:type in AES output

Dev Effort

1D

Description

Both the WAVE and AIFF modules embed audio metadata in AES format without providing a schema. One of the produced elements makes use of xsi:type, <tcf:filmFraming tcf:framing="NOT_APPLICABLE" xsi:type="tcf:ntscFilmFramingType"/>.

Because the JHOVE schema does not validate embedded XML (processContents="skip"), the use of xsi:type does not cause a problem. However, the METS and PREMIS schemas will validate embedded XML if a sufficient definition is available (processContents="lax").

When we import this element into a PREMIS document, it is not valid because xsi:type references a type definition (http://www.w3.org/TR/xmlschema-1/#xsi_type), so validation against that type is explicitly attempted.

The type tcf:ntscFilmFramingType cannot be resolved and causes validation to fail.
Looking into aes.org, we cannot find a schema describing the element in the namespace: http://www.aes.org/tcf.

It appears the AES X098B schema is not publicly available yet (according to Gary).

JhoveView taking ten minutes to initialise...

We've spotted this in our environments here at Archives New Zealand. Our IT vendor in the larger department has also found the same issue after testing quite considerably.

Here is their description of the issue:

I managed to install this on my Win7 PC and I get the same result, takes approx. 9min 50secs every time??!
I first tried installing it under my username, then under C:\Temp and got the same results with both locations
I then tried 2 different versions of Java – 706071 and 802518, still with the same result.
I then downloaded a Java decompile tool and decompiled all of the class files (thousands of them!!) I trawled through all of the files that would be the obvious culprit but found nothing (I’m no expert on Java mind you..)
I found heaps of LOOPS within the code but nothing that stood out, I could not find any code relating to a TIMEOUT either, I was thinking there was a 9:50 timeout somewhere??
I also tried the same test on a separate PC with the same result.
SO… in short, I do not know what is causing this? Is this supposed to be running on a particular version of Java, Is this software supported at all? Do you know if there is another method of using it to bypass this? (via cmd window or batch file..?) Sorry this is the first time I’ve seen this application and I’ve tried everything which seems logical to resolve it.

Any help appreciated as JhoveView is a useful tool for teaching, and also getting results quickly.

Thanks,

Ross

PDF 1.7

The PDF module currently doesn't support PDF 1.7 / ISO 32000. It would be very desirable to update it to 1.7.

ICCProfiles in TIFF files are not extracted

When a TIFF file embeds an ICC profile in a TIFFTAG_ICCPROFILE (code 34675), this information doesn't appear in the associated MIX information (NisoImageMetadata): the information is supposed to be located in the IccProfile element.

Such an ICC profile can be validated by attempting to construct a java.awt.color.ICC_Profile with the getInstance() method in Java.

java.lang.ClassCastException: edu.harvard.hul.ois.jhove.module.pdf.PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictionary

I'm getting an exception when running JHOVE on a PDF. This happens rarely.

jhove$ ./jhove -c ../cular/ingest/target/classes/jhove.conf ~/fulltext.pdf 
Sep 16, 2015 1:21:25 PM edu.harvard.hul.ois.jhove.JhoveBase init
SEVERE: Testing SEVERE level
java.lang.ClassCastException: edu.harvard.hul.ois.jhove.module.pdf.PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictionary
at edu.harvard.hul.ois.jhove.module.PdfModule.readDocCatalogDict(Unknown Source)
at edu.harvard.hul.ois.jhove.module.PdfModule.parse(Unknown Source)
at edu.harvard.hul.ois.jhove.JhoveBase.processFile(Unknown Source)
at edu.harvard.hul.ois.jhove.JhoveBase.process(Unknown Source)
at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(Unknown Source)
at Jhove.main(Unknown Source)
Jhove (Rel. 1.11, 2013-09-29)
Date: 2015-09-16 13:21:26 EDT
RepresentationInformation: /users/bdc34/fulltext.pdf
ReportingModule: BYTESTREAM, Rel. 1.3 (2007-04-10)
LastModified: 2015-09-16 13:08:10 EDT
Size: 938845
Format: bytestream
Status: Well-Formed and valid
SignatureMatches:
PDF-hul
MIMEtype: application/octet-stream

JhoveView: Markup Parsing Error: dynPolLoginRedirect.html

Dev Effort

1D

Description

Ubuntu 10.04.4 LTS
JHOVE 1.10

Running JhoveView I get an error; however, it proceeds to start as expected. Command and trace below.

Command: java -jar JhoveView.jar

[Warning] jhove.conf:6:73: schema_reference.4: Failed to read schema document 'http://hul.harvard.edu/ois/xml/xsd/jhove/jhoveConfig.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
[Error] jhove.conf:6:73: cvc-elt.1: Cannot find the declaration of element 'jhoveConfig'.
[Fatal Error] dynPolLoginRedirect.html:1:3: The markup in the document preceding the root element must be well-formed.

PDF module - Indirect objects in image dictionary not handled

We found an issue with scanned technical drawings (large bitmaps in CCITT G4 format) where image width and height are indirect objects. JHOVE does not handle this case but tries to access them as simple objects (PdfSimpleObject), leading to a ClassCastException.
The fix is a few new lines in PdfModule.java.
Diff:

# This patch file was generated by NetBeans IDE
# It uses platform neutral UTF-8 encoding and \n newlines.
--- C:\usr\sw\jhove-1.11-original\classes\edu\harvard\hul\ois\jhove\module\PdfModule.java
+++ C:\usr\sw\jhove-1.11\classes\edu\harvard\hul\ois\jhove\module\PdfModule.java
@@ -1990,13 +1990,20 @@
                                 imgList.add (new Property ("NisoImageMetadata",
                                     PropertyType.NISOIMAGEMETADATA, niso));
                                 niso.setMimeType("application/pdf");
-                                PdfSimpleObject widObj = (PdfSimpleObject)
-                                    xobdict.get ("Width");
+                                PdfSimpleObject widObj = null;
+                                PdfSimpleObject htObj = null;
+                                if (xobdict.get("Width") instanceof PdfIndirectObj) {
+                                    PdfIndirectObj io = (PdfIndirectObj)xobdict.get("Width");
+                                    widObj = (PdfSimpleObject)resolveIndirectObject(io);
+                                    io = (PdfIndirectObj)xobdict.get("Height");
+                                    htObj = (PdfSimpleObject)resolveIndirectObject(io);
+                                }
+                                else {
+                                    widObj = (PdfSimpleObject)xobdict.get ("Width");
+                                    htObj = (PdfSimpleObject)xobdict.get ("Height");
+                                }
                                 niso.setImageWidth(widObj.getIntValue ());
-                                PdfSimpleObject htObj = (PdfSimpleObject)
-                                    xobdict.get ("Height");
                                 niso.setImageLength(htObj.getIntValue ());
-
                                 // Check for filters to add to the filter list
                                 Filter[] filters = ((PdfStream) xob).getFilters ();
                                 String filt = extractFilters (filters, (PdfStream) xob);

/Håkan
