Comments (5)
@DamienIrving scrap the above, a more elegant solution is:
# Create a sample image
import matplotlib.pyplot as plt
import numpy as np
X = np.random.random((50, 50))
plt.imshow(X)
# Embed the provenance string in each output file's metadata
plt.savefig('test.pdf', metadata={'Title': 'Data provenance here'})
plt.savefig('test.png', metadata={'History': 'Data provenance here'})
For PDF images the standard keys are 'Title', 'Author', 'Subject', 'Keywords', 'Creator', 'Producer', 'CreationDate', 'ModDate', and 'Trapped'. See the metadata parameter of PdfPages.
For PNG images the keys must be shorter than 79 characters. See the metadata parameter of print_png.
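The comment above says PNG keys must be shorter than 79 characters. The PNG specification itself states that a text keyword is 1 to 79 bytes of printable Latin-1 (0x20-0x7E, 0xA1-0xFF) with no leading, trailing, or consecutive spaces. A minimal sketch of a pre-flight check along those lines, using a hypothetical helper name `check_png_key` (not part of Matplotlib):

```python
def check_png_key(key):
    """Hypothetical check: raise ValueError if `key` is not a valid PNG text keyword.

    Rules follow the PNG specification: 1-79 bytes of printable Latin-1,
    with no leading, trailing, or consecutive spaces.
    """
    try:
        raw = key.encode("latin-1")
    except UnicodeEncodeError:
        raise ValueError(f"{key!r} cannot be encoded as Latin-1") from None
    if not 1 <= len(raw) <= 79:
        raise ValueError(f"{key!r} must be 1-79 bytes long, got {len(raw)}")
    # Printable Latin-1 only: 0x20-0x7E and 0xA1-0xFF
    if not all(32 <= b <= 126 or 161 <= b <= 255 for b in raw):
        raise ValueError(f"{key!r} contains non-printable characters")
    if key.startswith(" ") or key.endswith(" ") or "  " in key:
        raise ValueError(f"{key!r} has leading, trailing, or doubled spaces")
    return key
```

Calling it on each key before plt.savefig(..., metadata=...) would surface a bad key before the file is written.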
from python-aos-lesson.
Another option is to simply open the image using vim (I did this to check that my suggestions worked). The 'History' or 'Title' entry shows up among the binary gibberish. However, this certainly isn't as clean as what you have proposed with exiftool.
Nice!
Thanks, @MitchellBlack. This is great.
I think for the provenance lesson we'd only need to cover .png, .pdf and perhaps .svg (the latter accepts the Title keyword too).
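A minimal sketch of covering those three formats in one call, assuming a recent Matplotlib (3.3 or later, matching the versions in the output below). The helper `save_with_provenance` and the key mapping are hypothetical choices: 'History' for PNG follows the earlier example (PNG accepts arbitrary keys), and 'Title' is one of the standard keys the PDF and SVG backends accept.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical mapping: which metadata key carries the provenance string per format
PROVENANCE_KEY = {"png": "History", "pdf": "Title", "svg": "Title"}


def save_with_provenance(fig, stem, log, formats=("png", "pdf", "svg")):
    """Save `fig` once per format, embedding `log` in each file's metadata."""
    paths = []
    for fmt in formats:
        path = f"{stem}.{fmt}"
        # savefig infers the format from the file extension
        fig.savefig(path, metadata={PROVENANCE_KEY[fmt]: log})
        paths.append(path)
    return paths
```

For example, after fig, ax = plt.subplots() and ax.imshow(np.random.random((50, 50))), calling save_with_provenance(fig, 'rainfall', log) writes rainfall.png, rainfall.pdf and rainfall.svg, each carrying the same log string.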
The second part of the problem is viewing the metadata once we've added it to an image. Using Python is a bit messy, because you basically need a different library for each file format (in the following examples I've created a rainfall image in various file formats, with 'Log of command line entries...' entered into the metadata):
from PIL import Image
image = Image.open('rainfall.png')
image.load()
print(image.info)
{'Software': 'Matplotlib version3.3.3, https://matplotlib.org/', 'History': 'Log of command line entries...', 'dpi': (72, 72)}
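For PNG specifically, PIL isn't strictly required: PNG text metadata lives in tEXt, zTXt and iTXt chunks, which the standard library can read directly. A minimal sketch (the name `read_png_text` is hypothetical; it swaps PIL for a hand-rolled chunk reader and skips CRC verification):

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(path):
    """Return the tEXt/zTXt/iTXt key-value pairs stored in a PNG file."""
    text = {}
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            raise ValueError(f"{path} is not a PNG file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # truncated file: stop rather than crash
            length, ctype = struct.unpack(">I4s", header)
            data = f.read(length)
            f.read(4)  # skip the CRC (not verified in this sketch)
            if ctype == b"tEXt":
                key, _, value = data.partition(b"\x00")
                text[key.decode("latin-1")] = value.decode("latin-1")
            elif ctype == b"zTXt":
                key, _, rest = data.partition(b"\x00")
                # rest[0] is the compression method (0 = zlib/deflate)
                text[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
            elif ctype == b"iTXt":
                key, _, rest = data.partition(b"\x00")
                compressed = rest[0]  # rest[1] is the compression method
                # Skip the language tag and translated keyword (both NUL-terminated)
                _lang, _, rest = rest[2:].partition(b"\x00")
                _translated, _, value = rest.partition(b"\x00")
                if compressed:
                    value = zlib.decompress(value)
                text[key.decode("latin-1")] = value.decode("utf-8")
            elif ctype == b"IEND":
                break
    return text
```

On a Matplotlib PNG like the one above, this should return the same 'Software' and 'History' entries as image.info (but not 'dpi', which comes from a different chunk).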
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
# Use a context manager so the file is closed afterwards
with open('rainfall.pdf', 'rb') as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    print(doc.info)
[{'CreationDate': b"D:20210206134032+11'00'", 'Creator': b'Matplotlib v3.3.3, https://matplotlib.org', 'Producer': b'Matplotlib pdf backend v3.3.3', 'Title': b'Log of command line entries...'}]
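The values in doc.info come back as raw PDF strings (bytes), and dates use the PDF "D:YYYYMMDDHHmmSS+HH'mm'" form. A minimal sketch for tidying that output, with hypothetical helper names; it assumes Latin-1 is a good enough stand-in for PDFDocEncoding (the two agree on ASCII) and passes through any non-bytes values (pdfminer may return object references) unchanged:

```python
import re
from datetime import datetime, timedelta, timezone


def decode_pdf_string(raw):
    """Decode a PDF text string: UTF-16BE if it starts with a BOM, else Latin-1."""
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")
    return raw.decode("latin-1")  # approximation of PDFDocEncoding


_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+-])(\d{2})?'?(\d{2})?'?)?"
)


def parse_pdf_date(text):
    """Parse a PDF date such as "D:20210206134032+11'00'" into a datetime."""
    m = _PDF_DATE.match(text)
    if m is None:
        raise ValueError(f"not a PDF date: {text!r}")
    year, month, day, hour, minute, second, sign, tzh, tzm = m.groups()
    tz = None
    if sign in ("Z", "z"):
        tz = timezone.utc
    elif sign:
        offset = timedelta(hours=int(tzh or 0), minutes=int(tzm or 0))
        tz = timezone(offset if sign == "+" else -offset)
    return datetime(int(year), int(month or 1), int(day or 1),
                    int(hour or 0), int(minute or 0), int(second or 0), tzinfo=tz)


def clean_info(info):
    """Merge pdfminer's doc.info (a list of dicts of bytes) into one dict of str."""
    cleaned = {}
    for entry in info:
        for key, value in entry.items():
            if isinstance(value, bytes):
                value = decode_pdf_string(value)
                if key in ("CreationDate", "ModDate"):
                    value = parse_pdf_date(value)
            cleaned[key] = value
    return cleaned
```

With the output above, clean_info(doc.info)['Title'] would give the plain string 'Log of command line entries...'.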
It's probably easier to use a command line program that can handle many different image formats instead. Lots of people online suggest the identify command line tool that comes with ImageMagick, but I found it hard to install on my Mac (the conda recipes for it didn't work, I didn't want to install brew, my laptop thought it was malware, etc.). A more straightforward alternative is exiftool, for which the conda recipes work great. It works for lots of different image formats, e.g.:
$ conda install exiftool
$ exiftool rainfall.png
ExifTool Version Number : 11.99
File Name : rainfall.png
Directory : .
File Size : 75 kB
File Modification Date/Time : 2021:02:06 08:55:30+11:00
File Access Date/Time : 2021:02:06 08:55:32+11:00
File Inode Change Date/Time : 2021:02:06 08:55:30+11:00
File Permissions : rw-r--r--
File Type : PNG
File Type Extension : png
MIME Type : image/png
Image Width : 864
Image Height : 360
Bit Depth : 8
Color Type : RGB with Alpha
Compression : Deflate/Inflate
Filter : Adaptive
Interlace : Noninterlaced
Software : Matplotlib version3.3.3, https://matplotlib.org/
History : Log of command line entries...
Pixels Per Unit X : 2835
Pixels Per Unit Y : 2835
Pixel Units : meters
Image Size : 864x360
Megapixels : 0.311
$ exiftool rainfall.svg
ExifTool Version Number : 11.99
File Name : rainfall.svg
Directory : .
File Size : 637 kB
File Modification Date/Time : 2021:02:06 14:29:11+11:00
File Access Date/Time : 2021:02:06 14:29:12+11:00
File Inode Change Date/Time : 2021:02:06 14:29:11+11:00
File Permissions : rw-r--r--
File Type : SVG
File Type Extension : svg
MIME Type : image/svg+xml
Image Height : 360pt
SVG Version : 1.1
View Box : 0 0 864 360
Image Width : 864pt
Xmlns : http://www.w3.org/2000/svg
Title : Log of command line entries...
Work Type : http://purl.org/dc/dcmitype/StillImage
Work Title : Log of command line entries...
Work Date : 2021:02:06 14:29:10.697468
Work Format : image/svg+xml
Work Creator Agent Title : Matplotlib v3.3.3, https://matplotlib.org/
$ exiftool rainfall.pdf
ExifTool Version Number : 11.99
File Name : rainfall.pdf
Directory : .
File Size : 200 kB
File Modification Date/Time : 2021:02:06 13:40:33+11:00
File Access Date/Time : 2021:02:06 13:45:43+11:00
File Inode Change Date/Time : 2021:02:06 13:40:33+11:00
File Permissions : rw-r--r--
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
PDF Version : 1.4
Linearized : No
Create Date : 2021:02:06 13:40:32+11:00
Creator : Matplotlib v3.3.3, https://matplotlib.org
Producer : Matplotlib pdf backend v3.3.3
Title : Log of command line entries...
Page Count : 1
... or with a bit of cleaning:
$ exiftool rainfall.pdf | grep '^Title' | cut -f2- -d ':'
Log of command line entries...
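If the lesson wants to stay in Python but still lean on exiftool, its default output can be parsed by splitting each line at the first colon only. That keeps values containing colons intact (e.g. "Work Date : 2021:02:06 ..." or URLs), which cut -f2 -d ":" would truncate. A minimal sketch with hypothetical helper names; it assumes exiftool is on the PATH, and if a tag name appears twice the later value wins:

```python
import subprocess


def parse_exiftool_output(text):
    """Parse exiftool's default "Tag Name : value" lines into a dict."""
    tags = {}
    for line in text.splitlines():
        # partition splits at the first ":" only, so later colons stay in the value
        name, sep, value = line.partition(":")
        if sep:
            tags[name.strip()] = value.strip()
    return tags


def exiftool_tags(path):
    """Run exiftool on `path` and return its tags as a dict."""
    result = subprocess.run(["exiftool", path], capture_output=True, text=True, check=True)
    return parse_exiftool_output(result.stdout)
```

exiftool_tags('rainfall.pdf')['Title'] would then give the log string directly. exiftool also has a -j option for JSON output, which may be sturdier than parsing the padded text layout.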
I've had a go at updating the Data Provenance lesson so that the command log is written to the output PNG metadata:
https://carpentrieslab.github.io/python-aos-lesson/09-provenance/index.html
The old lesson was also a bit confusing because cmdprov.new_log would return the command that was executed in the background to launch the Jupyter notebook, so I've changed the lesson to avoid that confusion.
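A log builder that relies on sys.argv will, inside a Jupyter kernel, see the kernel launcher's command line (typically ipykernel_launcher with a -f connection file) rather than anything the user typed, which is likely the kind of confusion described above. The sketch below is a hypothetical stand-in, not cmdprov.new_log (whose signature isn't shown here); it records a time stamp plus a command and lets the caller pass argv explicitly to sidestep the notebook case:

```python
import shlex
import sys
from datetime import datetime


def new_log_entry(argv=None, timestamp=None):
    """Hypothetical helper: one provenance line, "<time>: python <args...>".

    Not cmdprov.new_log. With argv=None it falls back to sys.argv, which in a
    Jupyter kernel is the kernel launcher's command line.
    """
    argv = sys.argv if argv is None else argv
    timestamp = datetime.now() if timestamp is None else timestamp
    # shlex.join (Python 3.8+) quotes arguments containing spaces etc.
    command = shlex.join(["python", *argv])
    return f"{timestamp:%Y-%m-%dT%H:%M:%S}: {command}"
```

In a script, the result could go straight into the PNG metadata, e.g. plt.savefig('rainfall.png', metadata={'History': new_log_entry()}).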