Comments (5)

MitchellBlack commented on June 4, 2024

@DamienIrving scrap the above, a more elegant solution is:

# Create a sample image and write provenance text into its metadata
import matplotlib.pyplot as plt
import numpy as np

X = np.random.random((50, 50))
plt.imshow(X)
plt.savefig('test.pdf', metadata={'Title': 'Data provenance here'})    # PDF: standard 'Title' key
plt.savefig('test.png', metadata={'History': 'Data provenance here'})  # PNG: free-form text key

For PDF images the standard keys are 'Title', 'Author', 'Subject', 'Keywords', 'Creator', 'Producer', 'CreationDate', 'ModDate', and 'Trapped'. See the metadata argument of PdfPages.

For PNG images each key can be at most 79 characters long (the PNG specification limits keywords to 1-79 bytes). See the metadata argument of print_png.

from python-aos-lesson.

MitchellBlack commented on June 4, 2024

Another option is to simply open the image using vim (I did this to check that my suggestions worked). The 'History/Title' shows up among the binary jargon. However, this certainly isn't as 'clean' as what you have proposed with exiftool.
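The text shows up in the raw file because PNG stores this kind of metadata in plain text chunks. As a rough sketch of the same check without vim, the function below walks the chunks of a PNG and collects the tEXt entries using only the standard library. The name read_png_text is made up for illustration, and it assumes the metadata sits in uncompressed tEXt chunks (it ignores zTXt and iTXt chunks and does not verify CRCs).

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(data):
    """Return {keyword: text} for every tEXt chunk in the bytes of a PNG file."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    text = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # tEXt data is a keyword, a null separator, then Latin-1 text.
            keyword, _, value = body.partition(b"\x00")
            text[keyword.decode("latin-1")] = value.decode("latin-1")
        pos += 8 + length + 4  # skip header, data and CRC (CRC is not checked)
    return text
```

Usage would be along the lines of read_png_text(open('test.png', 'rb').read()), which should include the 'History' entry if it was written as a tEXt chunk.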


hot007 commented on June 4, 2024

Nice!


DamienIrving commented on June 4, 2024

Thanks, @MitchellBlack. This is great.

I think for the provenance lesson we'd only need to cover .png, .pdf and perhaps .svg (the latter accepts the Title keyword too).
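One way to cover those three formats with a single call is to pick the metadata key from the file extension. This is only a sketch: provenance_metadata, save_with_provenance and the PROVENANCE_KEYS mapping are hypothetical names, and the key choices simply follow the ones used in this thread ('History' for .png, 'Title' for .pdf and .svg).

```python
from pathlib import Path

# Assumed mapping from file extension to the metadata key used in this thread.
PROVENANCE_KEYS = {".png": "History", ".pdf": "Title", ".svg": "Title"}


def provenance_metadata(filename, text):
    """Return a metadata dict for savefig, keyed according to the file format."""
    suffix = Path(filename).suffix.lower()
    if suffix not in PROVENANCE_KEYS:
        raise ValueError(f"no provenance key known for {suffix!r} files")
    return {PROVENANCE_KEYS[suffix]: text}


def save_with_provenance(fig, filename, text):
    """Save a Matplotlib figure with the provenance text in its metadata."""
    fig.savefig(filename, metadata=provenance_metadata(filename, text))
```

With a figure in hand, something like save_with_provenance(plt.gcf(), 'rainfall.png', 'Log of command line entries...') would then work the same way for all three extensions.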

The second part of the problem is viewing the metadata once we've added it to an image. Using Python is a bit messy, because you basically need a different library for each file format. In the following examples I've created a rainfall image in various file formats, with Log of command line entries... entered into the metadata:

# PNG: Pillow exposes the text metadata via image.info
from PIL import Image

image = Image.open('rainfall.png')
image.load()
print(image.info)

Output:
{'Software': 'Matplotlib version3.3.3, https://matplotlib.org/', 'History': 'Log of command line entries...', 'dpi': (72, 72)}

# PDF: pdfminer exposes the document information dictionary via doc.info
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument

with open('rainfall.pdf', 'rb') as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    print(doc.info)

Output:
[{'CreationDate': b"D:20210206134032+11'00'", 'Creator': b'Matplotlib v3.3.3, https://matplotlib.org', 'Producer': b'Matplotlib pdf backend v3.3.3', 'Title': b'Log of command line entries...'}]

It's probably easier to use a command line program that can handle many different image formats instead. Lots of people online suggest the identify command line tool that comes with ImageMagick, but I found it hard to install on my Mac (the conda recipes for it didn't work, I didn't want to install brew, my laptop thought it was malware, etc). A more straightforward alternative is exiftool, for which the conda recipes work great. It works for lots of different image formats, e.g.:

$ conda install exiftool
$ exiftool rainfall.png 
ExifTool Version Number         : 11.99
File Name                       : rainfall.png
Directory                       : .
File Size                       : 75 kB
File Modification Date/Time     : 2021:02:06 08:55:30+11:00
File Access Date/Time           : 2021:02:06 08:55:32+11:00
File Inode Change Date/Time     : 2021:02:06 08:55:30+11:00
File Permissions                : rw-r--r--
File Type                       : PNG
File Type Extension             : png
MIME Type                       : image/png
Image Width                     : 864
Image Height                    : 360
Bit Depth                       : 8
Color Type                      : RGB with Alpha
Compression                     : Deflate/Inflate
Filter                          : Adaptive
Interlace                       : Noninterlaced
Software                        : Matplotlib version3.3.3, https://matplotlib.org/
History                         : Log of command line entries...
Pixels Per Unit X               : 2835
Pixels Per Unit Y               : 2835
Pixel Units                     : meters
Image Size                      : 864x360
Megapixels                      : 0.311
$ exiftool rainfall.svg
ExifTool Version Number         : 11.99
File Name                       : rainfall.svg
Directory                       : .
File Size                       : 637 kB
File Modification Date/Time     : 2021:02:06 14:29:11+11:00
File Access Date/Time           : 2021:02:06 14:29:12+11:00
File Inode Change Date/Time     : 2021:02:06 14:29:11+11:00
File Permissions                : rw-r--r--
File Type                       : SVG
File Type Extension             : svg
MIME Type                       : image/svg+xml
Image Height                    : 360pt
SVG Version                     : 1.1
View Box                        : 0 0 864 360
Image Width                     : 864pt
Xmlns                           : http://www.w3.org/2000/svg
Title                           : Log of command line entries...
Work Type                       : http://purl.org/dc/dcmitype/StillImage
Work Title                      : Log of command line entries...
Work Date                       : 2021:02:06 14:29:10.697468
Work Format                     : image/svg+xml
Work Creator Agent Title        : Matplotlib v3.3.3, https://matplotlib.org/
$ exiftool rainfall.pdf
ExifTool Version Number         : 11.99
File Name                       : rainfall.pdf
Directory                       : .
File Size                       : 200 kB
File Modification Date/Time     : 2021:02:06 13:40:33+11:00
File Access Date/Time           : 2021:02:06 13:45:43+11:00
File Inode Change Date/Time     : 2021:02:06 13:40:33+11:00
File Permissions                : rw-r--r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.4
Linearized                      : No
Create Date                     : 2021:02:06 13:40:32+11:00
Creator                         : Matplotlib v3.3.3, https://matplotlib.org
Producer                        : Matplotlib pdf backend v3.3.3
Title                           : Log of command line entries...
Page Count                      : 1

... or with a bit of cleaning:

$ exiftool rainfall.pdf | grep '^Title' | cut -d ":" -f2-
 Log of command line entries...
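If the value is needed inside Python rather than at the shell, the same cleaning can be sketched by parsing exiftool's default "Tag Name : value" output. The helper names parse_exiftool_output and read_tag are made up for illustration, and the sketch assumes exiftool is on the PATH and prints one tag per line as in the listings above.

```python
import subprocess


def parse_exiftool_output(output):
    """Turn exiftool's 'Tag Name : value' lines into a dict."""
    tags = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as dates stay intact.
        name, sep, value = line.partition(":")
        if sep:
            tags[name.strip()] = value.strip()
    return tags


def read_tag(filename, tag):
    """Run exiftool on a file and return one tag's value (None if absent)."""
    result = subprocess.run(
        ["exiftool", filename], capture_output=True, text=True, check=True
    )
    return parse_exiftool_output(result.stdout).get(tag)
```

For example, read_tag('rainfall.pdf', 'Title') would be the Python counterpart of the grep and cut pipeline, without the leading space.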


DamienIrving commented on June 4, 2024

I've had a go at updating the Data Provenance lesson so that the command log is written to the output PNG metadata:
https://carpentrieslab.github.io/python-aos-lesson/09-provenance/index.html

The old lesson was also a bit confusing because cmdprov.new_log would return the command that was executed in the background in order to launch the Jupyter notebook, so I've changed the lesson to avoid that confusion.
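To illustrate where that confusion comes from, here is a minimal sketch of a command log entry built from the interpreter path and sys.argv. This is not cmdprov's actual code; build_log_entry is a hypothetical helper, and the timestamp format is an assumption. Inside a Jupyter notebook, sys.argv holds the command that launched the kernel rather than anything the user typed, so a log assembled this way records the kernel launch.

```python
import sys
from datetime import datetime


def build_log_entry(argv=None, executable=None, now=None):
    """Return a 'timestamp: interpreter script args' string for a command log."""
    # Fall back to the running process when no explicit values are given.
    argv = sys.argv if argv is None else argv
    executable = sys.executable if executable is None else executable
    now = datetime.now() if now is None else now
    timestamp = now.strftime("%a %b %d %H:%M:%S %Y")  # assumed format
    return f"{timestamp}: {executable} {' '.join(argv)}"
```

Run from a script, build_log_entry() records the script name and its arguments, and that string could then be passed as the 'History' value when saving the PNG.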

