image-type's Issues

Add support for HEIF formats

PCD - Kodak Photo CD

Extension: *.pcd

Magic bytes

The ASCII string "PCD_IPI" appears in the file, usually(?) at offset 2048.

Header contains PCD_IPI and PCD_OPA.

overview = compareBytes(pcdFile->header.signature,"PCD_OPA") == 0;

if ((compareBytes(pcdFile->ipiHeader.ipiSignature,"PCD_IPI") != 0) && !overview) {
    strncpy(errorString, "That is not a valid PCD file", kPCDMaxStringLength*3-1);
    return false;
}
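The check above can be sketched in Python as follows. This is a minimal sketch, not the library's implementation: the function name `looks_like_pcd` is hypothetical, and placing "PCD_OPA" at offset 0 is an assumption (the text only says it is in the header). The offset 2048 for "PCD_IPI" is the "usual" offset reported above.

```python
PCD_IPI_OFFSET = 2048  # usual offset of "PCD_IPI" according to the notes above


def looks_like_pcd(data: bytes) -> bool:
    """Return True if the buffer carries one of the Photo CD signatures."""
    # Overview files: assumed here to start with "PCD_OPA" at offset 0.
    if data.startswith(b"PCD_OPA"):
        return True
    # Image packs: "PCD_IPI" at the usual offset 2048.
    return data[PCD_IPI_OFFSET:PCD_IPI_OFFSET + 7] == b"PCD_IPI"
```

Note that a detector using this check has to read at least 2055 bytes, far more than most other formats need.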

Source:

Detect IIQ (Phase One, Mamiya) RAW images

IIQ RAW S and IIQ RAW L

IIQ RAW stands for Intelligent Image Quality RAW. It is an intelligent way of turning the full 16 bit image data captured by the camera into a compact RAW file format.

The IIQ Large RAW format is unique because it is completely lossless. IIQ RAW Large can be processed into a 16 bit TIFF, even though it is only half the size of a traditional RAW file.

The IIQ Small RAW format is based on the full 16 bit data that is captured by the digital back’s CCD. However, unlike IIQ RAW Large, it is not 100% lossless. Most users will not notice any quality difference between the two file formats especially if the IIQ RAW Small format capture is well exposed and set on a low ISO rating.

Extension: iiq or tiff

Identification

Since IIQ Raw is structurally identical to a TIFF, magic-based identification tools like Unix file and Apache Tika identify these files as image/tiff. Exiftool is able to correctly identify the format:

Detect ERF (Epson)

Epson RAW Format

This format was created to save images in RAW format that were created with Epson digital cameras. The image will be saved in uncompressed form and it can be printed directly to Epson printers. This format can be edited with the Epson PhotoRAW software or via plug-ins in other image editing programs.

ERF files are generated by the Epson R-D1 and R-D1s digital rangefinder cameras.

Identification

tag=0x010e/ 270 "EPSON DSC Picture."
tag=0x010f/ 271 "SEIKO EPSON CORP.."

Hard to identify in a safe way. This format is very rare and will not be supported.

Detect MOS (Leaf)

Leaf MOS (or Aptus MOS, Mamiya MOS, etc.) is a raw image format used by some Mamiya/Leaf/Aptus digital cameras. It is sometimes considered, along with Mamiya MEF, to be a member of the family of formats called Mamiya RAW.

Format

MOS is based on TIFF. The full-size image is in the first IFD, and uses Lossless JPEG compression.

MOS apparently uses a custom metadata format (stored in TIFF tag 34310), which uses the ASCII signature "PKTS".

Identification

The first (and only?) TIFF IFD uses Compression type 99, and contains tag 34310.
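Although this format will not be supported, the identification rule can be sketched to show what it would cost. The following Python sketch walks the first TIFF IFD and checks for Compression 99 plus tag 34310. The function names are hypothetical, and it assumes the Compression value is stored inline as a SHORT, as is usual in TIFF files.

```python
import struct

TAG_COMPRESSION = 259       # standard TIFF Compression tag
TAG_LEAF_METADATA = 34310   # tag named in the text above


def read_first_ifd(data: bytes):
    """Return (endian, {tag: raw 4-byte value field}) for the first TIFF IFD."""
    endian = {b"II": "<", b"MM": ">"}.get(data[:2])
    if endian is None:
        raise ValueError("missing TIFF byte-order mark")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a TIFF file")
    (count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    entries = {}
    for i in range(count):
        # Each IFD entry is 12 bytes: tag, type, count, value-or-offset.
        tag, _type, _count, raw = struct.unpack_from(
            endian + "HHI4s", data, ifd_offset + 2 + 12 * i)
        entries[tag] = raw
    return endian, entries


def looks_like_mos(data: bytes) -> bool:
    """Apply the rule above: Compression 99 and tag 34310 in the first IFD."""
    try:
        endian, entries = read_first_ifd(data)
    except (ValueError, struct.error):
        return False
    if TAG_COMPRESSION not in entries or TAG_LEAF_METADATA not in entries:
        return False
    # Assumption: Compression is stored inline as a SHORT.
    (compression,) = struct.unpack(endian + "H", entries[TAG_COMPRESSION][:2])
    return compression == 99
```

Unlike a fixed-offset magic number, this requires parsing the IFD, which is part of why TIFF-based raw formats are hard to detect cheaply.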

This format will not be supported.

PFM - Portable Float Map (HDR)

Portable Float Map (.pfm): A 32-bit HDR format that, like RAW, stores a simple image; it is therefore very compatible, but also memory-consuming (12 MB for 1 MP).

PFM (Portable Float Map) is an unofficial extension to the pbm image format collection that supports HDR imaging, namely a floating point value per r,g,b [or a single floating point value for grey scale HDR images].
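The "12 MB for 1 MP" figure follows directly from one 32-bit float per channel. A small sketch of the arithmetic (the function name `pfm_raster_bytes` is hypothetical, and the header size is ignored):

```python
def pfm_raster_bytes(width: int, height: int, color: bool = True) -> int:
    """Size of the raw pixel data: one 32-bit float per channel per pixel."""
    channels = 3 if color else 1   # r, g, b or a single grey value
    bytes_per_float = 4            # 32 bits
    return width * height * channels * bytes_per_float


# A 1-megapixel colour image: 1000 * 1000 * 3 * 4 bytes.
one_megapixel = pfm_raster_bytes(1000, 1000)
```

For a greyscale image of the same size, the result is a third of that.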

MS-WMF: Windows Metafile Format

A Windows metafile is a container for an image, which is defined by series of variable-length records, called metafile records.

Type: Vector

Magic bytes

Most Microsoft Windows applications that create metafiles prepend a 22-byte header to the file.

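Magic number of this header: 9AC6CDD7h (stored little-endian, so the file starts with the bytes D7 CD C6 9A). Metafiles written without the 22-byte header do not carry it.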
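A minimal Python sketch of this check, assuming the key is the first 32-bit little-endian value of the 22-byte header (the function name is hypothetical):

```python
import struct

PLACEABLE_KEY = 0x9AC6CDD7   # magic number quoted above
PLACEABLE_HEADER_SIZE = 22   # header length quoted above


def has_placeable_wmf_header(data: bytes) -> bool:
    """Return True if the buffer starts with the 22-byte prepended header."""
    if len(data) < PLACEABLE_HEADER_SIZE:
        return False
    (key,) = struct.unpack_from("<I", data, 0)  # little-endian 32-bit key
    return key == PLACEABLE_KEY
```

A metafile without the prepended header would not be recognized by this check.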

Enhance parseType method

As the title says, I find the ImageTypeDetector::parseType method very complicated, because it detects all the different image formats in one place.

The image formats also differ in how many bytes have to be read.

To improve the parseType method, I suggest the following:

  • Split parseType into separate methods, one per image format to detect.
  • Read a fixed number of bytes once. This is enough to detect every image format as long as we make sure the magic bytes fall within that fixed range.
    For example, we can read the first 30 bytes and then use substr to check whether the magic bytes are at the expected indexes.
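The two suggestions can be sketched as follows. This is an illustration in Python rather than the library's PHP; all function names and the `DETECTORS` table are hypothetical, and only three well-known signatures are included.

```python
from typing import Callable, List, Optional, Tuple

PREFIX_LENGTH = 30  # fixed read size from the suggestion above


def is_png(prefix: bytes) -> bool:
    return prefix.startswith(b"\x89PNG\r\n\x1a\n")


def is_gif(prefix: bytes) -> bool:
    return prefix[:6] in (b"GIF87a", b"GIF89a")


def is_psd(prefix: bytes) -> bool:
    return prefix[:4] == b"8BPS"


# One small check per format instead of one large method.
DETECTORS: List[Tuple[str, Callable[[bytes], bool]]] = [
    ("png", is_png),
    ("gif", is_gif),
    ("psd", is_psd),
]


def detect_type(data: bytes) -> Optional[str]:
    """Slice the fixed-size prefix once, then try each detector in order."""
    prefix = data[:PREFIX_LENGTH]
    for name, check in DETECTORS:
        if check(prefix):
            return name
    return None
```

One caveat: formats whose signature sits deeper in the file, such as "DICM" at offset 128 or "PCD_IPI" at offset 2048, would need a larger prefix than 30 bytes.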

What do you think about this, @odan?

Throw exception when it's unrecognized image

I found that the ImageTypeDetector class returns null if the image cannot be recognized.

I think we should throw an exception instead, with the message: Unrecognized image file: /path/to/image/file.

This lets users know that the image is invalid or broken.

It also lets us know whether an image was recognized correctly.
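The proposed behaviour, sketched in Python for illustration. The exception class and the wrapper function are hypothetical names; `detect` stands in for whatever function currently returns null for unknown files.

```python
from typing import Callable, Optional


class UnrecognizedImageError(Exception):
    """Raised when no known image format matches the file."""


def detect_or_raise(path: str, detect: Callable[[str], Optional[str]]) -> str:
    """Wrap a detector that returns None so that it raises instead."""
    result = detect(path)
    if result is None:
        raise UnrecognizedImageError(f"Unrecognized image file: {path}")
    return result
```

Callers that relied on the null return would then need a try/except around the call, so this is a breaking change for existing users.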

Detect ARW, SRF, SR2 (Sony)

Consider closing file handler after using SplFileObject and SplFileInfo

As the title says, it seems that SplFileObject and SplFileInfo variables should be set to null once we are done with them.

The ImageTypeDetector class appears to use unset to release its SplFileObject variable.

But the SplFileInfo variable in the ImageTypeDetectorTest class is not released.

BTW, using unset or assigning null has the same effect as closing the file handle.

Which one do we prefer in this repository?

Detect MEF (Mamiya)

Mamiya MEF is a raw camera image format (or family of formats?) used by some Mamiya digital cameras.

It is sometimes known as Mamiya RAW, though this can also be a collective name for both MEF and Leaf MOS.

Format
At least some MEF files use DNG format, or something similar to DNG. The first IFD (refer to TIFF) contains a thumbnail image, and is expected to have three sub-IFDs. The first sub-IFD contains the full-resolution image, and the others contain reduced-resolution images.

There is also an Exif IFD. It includes a MakerNote with no signature, which appears to be an integrated or embedded IFD.

No clear identification possible.

PSB (Photoshop Large Document)

PSB (Large Document Format)

PSB is a file extension for an image file used by Adobe Photoshop. PSB files are a large document format similar to a PSD file but for a larger image size.

The Large Document Format (8BPB/PSB) supports documents up to 300,000 pixels in any dimension. All Photoshop features, such as layers, effects, and filters, are supported by the PSB format. The PSB format is identical to the Photoshop native format in many ways.

Identification

The file header contains the basic properties of the image.

Length  Description
4       Signature: always equal to '8BPS'. Do not try to read the file if the signature does not match this value.
2       Version: equal to 1 for PSD files; for PSB files the version is 2.

Source: https://www.adobe.com/devnet-apps/photoshop/fileformatashtml/#50577409_pgfId-1057388
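A minimal Python sketch that tells PSD and PSB apart from the first six bytes. The function name is hypothetical; the version field is read as big-endian, which is the byte order Photoshop files use.

```python
import struct
from typing import Optional


def photoshop_variant(data: bytes) -> Optional[str]:
    """Return "psd", "psb", or None based on signature and version."""
    if len(data) < 6 or data[:4] != b"8BPS":
        return None
    (version,) = struct.unpack_from(">H", data, 4)  # big-endian 16-bit
    return {1: "psd", 2: "psb"}.get(version)
```

Because both formats share the '8BPS' signature, a detector that only compares the first four bytes cannot distinguish them.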

Detect HDR images

HDR: High Dynamic Range Image

HDR formats

  • Cineon and DPX (.cin, .dpx): Originally developed for film scans; like RAW, an uncompressed digital image and therefore very memory-intensive
  • Portable Float Map (.pfm, .pbm): A 32-bit HDR format that, like RAW, stores a simple image; therefore very compatible, but also memory-consuming (12 MB for 1 MP)
  • Floating Point TIFF (*.tiff): Very flexible, up to 32-bit color depth
  • Radiance (*.hdr, *.pic): The first HDR format (1987); it has a large dynamic range but weaknesses in color resolution. Due to its broad user base, it is ideal for exchange
  • OpenEXR (*.exr): Very high color resolution, efficient compression, sufficient dynamic range
  • JPEG HDR: Regarded as tomorrow's format, but not yet widespread because of its licensing procedure

DICOM

Digital Imaging and Communications in Medicine (DICOM)

DICOM is the standard for the communication and management of medical imaging information and related data.

Magic numbers: (.{128}DICM|\0[\x02\x04\x06\x08]\0[\0-\x20]|[\x02\x04\x06\x08]\0[\0-\x20]\0)
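The first alternative of this pattern (128 arbitrary bytes followed by "DICM") is the simplest to check. A minimal Python sketch covering only that alternative (the function name is hypothetical):

```python
DICOM_PREAMBLE_LENGTH = 128  # bytes skipped before the "DICM" marker


def has_dicom_marker(data: bytes) -> bool:
    """Return True if "DICM" follows the 128-byte preamble."""
    start = DICOM_PREAMBLE_LENGTH
    return data[start:start + 4] == b"DICM"
```

Files matching only the other two alternatives of the pattern are not recognized by this sketch.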

PDN - Paint.NET file format

The ".pdn" format does not have a binary specification, it consists of Paint.NET's internal object state saved to a file using .NET serialization.

Magic bytes

The first 4 bytes: PDN3

Source:

magicStr = fh.read(4).decode('ascii')

if magicStr != 'PDN3':
    raise PDNReaderError('Invalid magic string for PDN file: %s' % magicStr)

headerSizeStr = fh.read(3) + b'\x00'
if len(headerSizeStr) != 4:
    raise PDNReaderError('Unable to read header size. File may be corrupted.')

Detect DNG (Digital Negative)

EMF - Windows Enhanced Metafile

http://fileformats.archiveteam.org/wiki/Enhanced_Metafile

Enhanced Metafile (EMF) is a vector graphics format native to 32-bit versions of Microsoft Windows. It is the successor to Windows Metafile (WMF).

There is an extension of the format, named Enhanced Metafile Format Plus Extensions (EMF+).

The .emz filename extension is reportedly used for gzip-compressed EMF files.

Windows Enhanced Metafile (EMF) is a graphics format from Microsoft and a successor to Windows Metafile. It extends the arbitrarily scalable vector graphics with the possibility to use raster graphics as filling. In contrast to its predecessor, EMF offers the possibility to work with Bézier curves and can therefore also be used for more complex graphics with curves. EMF can be used as a file format for the exchange of vector data between illustration programs and MS Office applications.

While WMF is a 16-bit format, EMF is a 32-bit format.

Identification

EMF files begin with bytes 01 00 00 00 (representing record type EMR_HEADER), and have ASCII " EMF" (with the leading space) at file offset 40.

EMF+ files are EMF files with the following characteristics. Let n be the 32-bit integer at offset 4. At offset n is the 32-bit integer 0x00000046 (representing record type EMR_COMMENT). At offset n+12 is the ASCII string "EMF+".
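Both rules can be sketched together in Python. The function name is hypothetical, and the 32-bit integers are read as little-endian, consistent with the leading bytes 01 00 00 00 representing the value 1.

```python
import struct
from typing import Optional

EMR_COMMENT = 0x00000046  # record type quoted above


def emf_kind(data: bytes) -> Optional[str]:
    """Return "emf+", "emf", or None according to the rules above."""
    # EMR_HEADER record type at offset 0 and " EMF" at offset 40.
    if len(data) < 44 or data[:4] != b"\x01\x00\x00\x00":
        return None
    if data[40:44] != b" EMF":
        return None
    # n is the 32-bit integer at offset 4.
    (n,) = struct.unpack_from("<I", data, 4)
    if len(data) >= n + 16:
        (record_type,) = struct.unpack_from("<I", data, n)
        if record_type == EMR_COMMENT and data[n + 12:n + 16] == b"EMF+":
            return "emf+"
    return "emf"
```

Detecting EMF+ requires reading at least n + 16 bytes, so a short fixed-size prefix can only confirm plain EMF.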
