
pix-plot's Introduction

PixPlot

This repository contains code that can be used to visualize tens of thousands of images in a two-dimensional projection within which similar images are clustered together. The image analysis uses Tensorflow's Inception bindings, and the visualization layer uses a custom WebGL viewer.

See the change log for recent updates.

App preview

Installation & Dependencies

We maintain several platform-specific installation cookbooks online.

Broadly speaking, to install the Python dependencies, we recommend you install Anaconda and then create a conda environment with a Python 3.7 runtime:

conda create --name=3.7 python=3.7
source activate 3.7

Then you can install the dependencies by running:

pip install https://github.com/yaledhlab/pix-plot/archive/master.zip

The website that PixPlot eventually creates requires a WebGL-enabled browser.

Quickstart

If you have a WebGL-enabled browser and a directory full of images to process, you can prepare the data for the viewer by installing the dependencies above then running:

pixplot --images "path/to/images/*.jpg"

To see the results of this process, you can start a web server by running:

# for python 3.x
python -m http.server 5000

# for python 2.x
python -m SimpleHTTPServer 5000

The visualization will then be available at http://localhost:5000/output.

Sample Data

To acquire some sample data with which to build a plot, feel free to use some data prepared by Yale's DHLab:

pip install image_datasets

Then in a Python script:

import image_datasets
image_datasets.oslomini.download()

The .download() command will make a directory named datasets in your current working directory. That datasets directory will contain a subdirectory named 'oslomini', which contains a directory of images and another directory with a CSV file of image metadata. Using that data, we can next build a plot:

pixplot --images "datasets/oslomini/images/*" --metadata "datasets/oslomini/metadata/metadata.csv"

Creating Massive Plots

If you need to plot more than 100,000 images but don't have an expensive graphics card with which to visualize huge WebGL displays, you might want to specify a smaller "cell_size" parameter when building your plot. The "cell_size" argument controls how large each image is in the atlas files; smaller values require fewer textures to be rendered, which decreases the GPU RAM required to view a plot:

pixplot --images "path/to/images/*.jpg" --cell_size 10

Controlling UMAP Layout

The UMAP algorithm is particularly sensitive to three hyperparameters:

--min_dist: determines the minimum distance between points in the embedding
--n_neighbors: determines the tradeoff between local and global clusters
--metric: determines the distance metric to use when positioning points

UMAP's creator, Leland McInnes, has written up a helpful overview of these hyperparameters. To specify the value for one or more of these hyperparameters when building a plot, one may use the flags above, e.g.:

pixplot --images "path/to/images/*.jpg" --n_neighbors 2

Curating Automatic Hotspots

If installed and available, PixPlot uses Hierarchical density-based spatial clustering of applications with noise, a refinement of the earlier DBSCAN algorithm, to find hotspots in the visualization. You may be interested in consulting this explanation of how HDBSCAN works.

Tip: If you are using HDBSCAN and find that PixPlot creates too few (or only one) 'automatic hotspots', try lowering --min_cluster_size from its default of 20. This often happens with smaller datasets (fewer than a few thousand images).

If HDBSCAN is not available, PixPlot will fall back to scikit-learn's implementation of KMeans.

Adding Metadata

If you have metadata associated with each of your images, you can pass in that metadata when running the data processing script. Doing so will allow the PixPlot viewer to display the metadata associated with an image when a user clicks on that image.

To specify the metadata for your image collection, you can add --metadata=path/to/metadata.csv to the command you use to call the processing script. For example, you might specify:

pixplot --images "path/to/images/*.jpg" --metadata "path/to/metadata.csv"

Metadata should be in a comma-separated value file, should contain one row for each input image, and should contain headers specifying the column order. Here is a sample metadata file:

| filename | category | tags | description | permalink | Year |
|----------|----------|------|-------------|-----------|------|
| bees.jpg | yellow | a\|b\|c | bees' knees | https://... | 1776 |
| cats.jpg | dangerous | b\|c\|d | cats' pajamas | https://... | 1972 |

The following column labels are accepted:

| Column | Description |
|--------|-------------|
| filename | the filename of the image |
| category | a categorical label for the image |
| tags | a pipe-delimited list of categorical tags for the image |
| description | a plaintext description of the image's contents |
| permalink | a link to the image hosted on another domain |
| year | a year timestamp for the image (should be an integer) |
| label | a categorical label used for supervised UMAP projection |
| lat | the latitudinal position of the image |
| lng | the longitudinal position of the image |
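A metadata file with these columns can be generated with Python's standard csv module; a minimal sketch (the rows and values below are hypothetical examples, not required content):

```python
import csv

# Hypothetical rows for illustration; only the filename values must match
# your actual images.
rows = [
    {"filename": "bees.jpg", "category": "yellow", "tags": "a|b|c", "year": 1776},
    {"filename": "cats.jpg", "category": "dangerous", "tags": "b|c|d", "year": 1972},
]

with open("metadata.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["filename", "category", "tags", "year"])
    writer.writeheader()   # the header row tells PixPlot the column order
    writer.writerows(rows)
```

The resulting metadata.csv can then be passed to pixplot with the --metadata flag shown above.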

IIIF Images

If you would like to process images that are hosted on an IIIF server, you can specify a newline-delimited list of IIIF image manifests as the --images argument. For example, the following could be saved as manifest.txt:

https://manifests.britishart.yale.edu/manifest/40005
https://manifests.britishart.yale.edu/manifest/40006
https://manifests.britishart.yale.edu/manifest/40007
https://manifests.britishart.yale.edu/manifest/40008
https://manifests.britishart.yale.edu/manifest/40009

One could then specify these images as input by running pixplot --images manifest.txt --n_clusters 2
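If the manifest list lives in a Python workflow, it can be written out with the standard library; a minimal sketch using two of the example URLs above:

```python
# Write IIIF manifest URLs to manifest.txt, one per line, in the
# newline-delimited format PixPlot expects for its --images argument.
manifests = [
    "https://manifests.britishart.yale.edu/manifest/40005",
    "https://manifests.britishart.yale.edu/manifest/40006",
]

with open("manifest.txt", "w") as f:
    f.write("\n".join(manifests) + "\n")
```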

Demonstrations (Developed with PixPlot 2.0 codebase)

| Link | Image Count | Collection Info | Browse Images | Download for PixPlot |
|------|-------------|-----------------|---------------|----------------------|
| NewsPlot: 1910-1912 | 24,026 | George Grantham Bain Collection | News in the 1910s | Images, Metadata |
| Bildefelt i Oslo | 31,097 | oslobilder | Advanced search, 1860-1924 | Images, Metadata |

Acknowledgements

The DHLab would like to thank Cyril Diagne and Nicolas Barradeau, lead developers of the spectacular Google Arts Experiments TSNE viewer, for generously sharing ideas on optimization techniques used in this viewer, and Lillianna Marie for naming this viewer PixPlot.


pix-plot's Issues

Can't visualize images

When running python -m http.server 5000 in the pix-plot folder, I see the following:

Serving HTTP on 0.0.0.0 port 5000 ...
127.0.0.1 - - [12/Oct/2018 13:10:50] code 404, message File not found
127.0.0.1 - - [12/Oct/2018 13:10:50] "GET /output/plot_data.json HTTP/1.1" 404 -
127.0.0.1 - - [12/Oct/2018 13:13:25] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [12/Oct/2018 13:13:26] code 404, message File not found
127.0.0.1 - - [12/Oct/2018 13:13:26] "GET /output/plot_data.json HTTP/1.1" 404 -

And on the 'localhost:5000' page in my browser I see the error shown in the attached screenshot.

What am I doing wrong?

3D image location

Hi,
Thanks for this fantastic code.

I was wondering, is it possible to change the number of dimensions in the UMAP reduction to 3, and then visualise the 3D distribution instead of the 2D plane with parallax?

Cheers

Connect from a client computer

I installed and ran pix-plot on a computing server, but the server doesn't support WebGL. How can I run the computing part on the server and stream the result/viewer to the client computer I'm using?
Thank you

File names with multiple dots fails

Having a file named `double.dot.jpg` among the images throws an exception when creating the atlas:

montage: unable to open image `output/thumbs/32px/double.jpg': No such file or directory @ error/blob.c/OpenBlob/2712.

Note how the .dot-part has been removed.

I guess it is because of line 243, os.path.basename(img).split('.')[0]. StackOverflow suggests using os.path.splitext instead.
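The suggested fix is easy to verify; a quick stdlib sketch contrasting the two approaches on the filename from this report:

```python
import os

img = "double.dot.jpg"

# Buggy: split('.')[0] drops everything after the FIRST dot.
buggy_stem = os.path.basename(img).split('.')[0]          # 'double'

# Fixed: os.path.splitext strips only the final extension.
fixed_stem = os.path.splitext(os.path.basename(img))[0]   # 'double.dot'

print(buggy_stem, fixed_stem)
```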

Upgrade greyscale images to 3-channel RGB?

Some digitized collections of images are in 8-bit greyscale (0-255, 1 channel) jpeg format. The libraries we use tend to fail with an "unexpected data shape" error in this case, since they expect 3-channel RGB. I suspect many folks won't know how to fix the problem, due to the obtuse TF error message. I believe PIL can non-destructively convert to RGB (in the sense that it won't harm existing color images) with img.convert("RGB") or similar...

Such conversions just duplicate the single grey ramp into all 3 color channels, so there's no information loss or judgement call required...
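A minimal sketch of the suggested conversion, assuming Pillow is installed (a synthetic in-memory image stands in for a digitized scan):

```python
from PIL import Image  # requires Pillow

# A small 8-bit greyscale ("L" mode) image stands in for a digitized scan.
grey = Image.new("L", (4, 4), color=128)

# convert("RGB") duplicates the single grey channel into R, G, and B;
# it is a no-op for images that are already RGB, so it is safe to apply blindly.
rgb = grey.convert("RGB")

print(grey.mode, rgb.mode)   # L RGB
```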

unrecognized arguments

Loading the directory of images that worked on the master branch does not work with experimental. It throws the error "unrecognized arguments."

Rendered image size

Thanks for the great application.
I have a quick question.
Is it possible to adjust the image size of each data point?
Currently I'm rendering only about 200 data points at very small resolutions. The distances between data points are quite large, so the screen is mostly black space.
I'd really appreciate your help.

Question: absolute limit of images that can be plotted?

Hi all,

Just a quick question: I wanted to ask what the maximum number of images is that can be plotted with pix-plot. I know the examples show around 27,000 images being plotted, but what would be the absolute maximum? I have a dataset consisting of around 240,000 images that are 64x64 pixels and was wondering if it would be at all possible to plot maybe half this dataset.

Thanks.

Making selective images hidden or invisible

I was wondering if you have any advice on whether there is any way to make images visible or invisible in pix-plot. I have named my images by date, and what I want to do is this: after a button is clicked, I want to make every year except the year the button represents invisible. I have been working on this for a while, but I have yet to find a solution. Any help would be very much appreciated.

pix-plot: unknown command line flag

I am trying to make my own pix-plot however when I try to run this line;

python utils\process_images.py "C:\Users\oakie\Pictures\*.jpg"

It errors and I get this as the output;

Traceback (most recent call last):
  File "utils\process_images.py", line 22, in <module>
    FLAGS.model_dir = '/tmp/imagenet'
  File "C:\Users\oakie\AppData\Local\conda\conda\envs\2018Summer\lib\site-packages\tensorflow\python\platform\flags.py", line 88, in __setattr__
    return self.__dict__['__wrapped'].__setattr__(name, value)
  File "C:\Users\oakie\AppData\Local\conda\conda\envs\2018Summer\lib\site-packages\absl\flags\_flagvalues.py", line 496, in __setattr__
    return self._set_unknown_flag(name, value)
  File "C:\Users\oakie\AppData\Local\conda\conda\envs\2018Summer\lib\site-packages\absl\flags\_flagvalues.py", line 374, in _set_unknown_flag
    raise _exceptions.UnrecognizedFlagError(name, value)
absl.flags._exceptions.UnrecognizedFlagError: Unknown command line flag 'model_dir'

Do you know what is going on? Any suggestions? I am using Windows and the Anaconda prompt. I have everything I need installed, and everything seems to be working except when I try to actually run pix-plot.

Install error: Could not find tensorflow version

pip install pixplot

gave me this:

ERROR: Could not find a version that satisfies the requirement tensorflow<=2.0.0,>=1.14.0 (from pixplot) (from versions: 2.2.0rc1, 2.2.0rc2)
ERROR: No matching distribution found for tensorflow<=2.0.0,>=1.14.0 (from pixplot)

As a workaround I switched to python 3.6 (from 3.8) and the install worked.

Clustering is calculated, but discarded

For the visualization, utils/process_images.py produces a plot_data.json file. In that program, the write_json() function calls get_centroids(), which calculates a k-means clustering in the reduced space for the purpose of selecting an image to "summarise" each cluster as a "hotspot". However, the clustering is then discarded.

It might, or might not, be nice to keep the k-means clustering results and store them for further use.
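One way to keep the results would be to serialize the cluster assignments next to plot_data.json; a stdlib-only sketch (the filenames, labels, and output filename are hypothetical stand-ins, not PixPlot's actual data):

```python
import json
from collections import defaultdict

# Hypothetical stand-ins for what get_centroids() computes:
# one k-means label per input image.
filenames = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
labels = [0, 1, 0, 1]

# Group image names by their cluster label.
clusters = defaultdict(list)
for name, label in zip(filenames, labels):
    clusters[str(label)].append(name)

# Persist the assignments alongside plot_data.json instead of discarding them.
with open("cluster_data.json", "w") as f:
    json.dump(clusters, f, indent=2)
```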

Is it possible to maintain relative size among image thumbnails?

Hello again :)
I have a question about whether it's possible to show images at their relative sizes.
I'm experimenting with the experimental branch.
(screenshot attached)
For example, the actual heights of the three original images above are the same, but since get_atlas_data resizes every image with resize_to_height, they are all normalized to their max height or width.
I've tried the square_cells option set to "true", but it results in images patched weirdly on top of one another. It would be really helpful if you could give me some comments or ideas.
Thank you in advance!!

Clustering results are discarded - new release

Hi all, after the new release of pixplot 17 days ago, I was wondering how to display the number of images / image names in a cluster.
The code from past posts (#65, #72) does not work anymore. Thanks again for this!

Would be much appreciated!

Cheers,
Andi

Count images in a cluster

Question: Is it possible to get the number of images in each cluster?
Probably this is a dumb question since I have no background in image processing or machine learning.

ImageMagick issues on PCs

When running Pix-Plot on a PC, I'm getting this error:

'identify' is not recognized as an internal or external command,
operable program or batch file.
'identify' is not recognized as an internal or external command,
operable program or batch file.
'identify' is not recognized as an internal or external command,
operable program or batch file.
Traceback (most recent call last):
  File "utils/process_images.py", line 438, in <module>
    clusters=args.clusters, validate_files=args.validate_files)
  File "utils/process_images.py", line 62, in __init__
    self.validate_inputs(validate_files)
  File "utils/process_images.py", line 98, in validate_inputs
    print(message)
  File "C:\Python35\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0301' in position 1397: character maps to

Idea: metadata on hover

Chatting with @broadwell and we thought of something slightly different than the "deep link on click" use case: a hover state where a caption shows when you're mousing over each image. This might lend some context without a complete jump to a different page. (Wouldn't replace deep-link functionality.)

Presumably the metadata would either be at the top or the bottom of the screen (our current site design would support the top very easily). This instead of an info box that followed the cursor, which we think would be distracting.

Memory problem with experimental

In my quest to make a visualization of 270K images, I tried the master checkout as well as experimental. master ran through the processing steps without problems, but the visualization did not scale well. experimental seems to have the visualization part handled, but unfortunately I am not able to complete a render. It fails with an out-of-memory error (details below) on a machine with 100GB free. Is there something I can disable or tweak to reduce the memory requirements?

python utils/process_images.py --clusters=20 --validate_images=False --image_files="/data01/dsc/static_content/pixplot/kb_all/output/1200/*.jpg"

 * loaded 270680 of 270682 image vectors
 * loaded 270681 of 270682 image vectors
 * loaded 270682 of 270682 image vectors
 * calculating 20 clusters
 * generating image position data
 * building lower-dimensional projections
Traceback (most recent call last):
  File "utils/process_images.py", line 638, in <module>
    tf.app.run()
  File "/data01/dsc/static_content/pixplot/kb_all_experimental/ex/lib64/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "utils/process_images.py", line 635, in main
    PixPlot(image_glob)
  File "utils/process_images.py", line 75, in __init__
    if self.flags.process_images: self.process_images()
  File "utils/process_images.py", line 190, in process_images
    self.write_json()
  File "utils/process_images.py", line 391, in write_json
    'cells': self.get_cell_data(),
  File "utils/process_images.py", line 333, in get_cell_data
    layout_models = self.get_layout_models()
  File "utils/process_images.py", line 376, in get_layout_models
    'tsne_2d': center_features(tsne_2d_model.fit_transform(vecs)),
  File "/data01/dsc/static_content/pixplot/kb_all_experimental/ex/lib64/python3.6/site-packages/sklearn/manifold/t_sne.py", line 884, in fit_transform
    embedding = self._fit(X)
  File "/data01/dsc/static_content/pixplot/kb_all_experimental/ex/lib64/python3.6/site-packages/sklearn/manifold/t_sne.py", line 730, in _fit
    squared=True)
  File "/data01/dsc/static_content/pixplot/kb_all_experimental/ex/lib64/python3.6/site-packages/sklearn/metrics/pairwise.py", line 1240, in pairwise_distances
    return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
  File "/data01/dsc/static_content/pixplot/kb_all_experimental/ex/lib64/python3.6/site-packages/sklearn/metrics/pairwise.py", line 1083, in _parallel_pairwise
    return func(X, Y, **kwds)
  File "/data01/dsc/static_content/pixplot/kb_all_experimental/ex/lib64/python3.6/site-packages/sklearn/metrics/pairwise.py", line 245, in euclidean_distances
    distances = safe_sparse_dot(X, Y.T, dense_output=True)
  File "/data01/dsc/static_content/pixplot/kb_all_experimental/ex/lib64/python3.6/site-packages/sklearn/utils/extmath.py", line 189, in safe_sparse_dot
    return fast_dot(a, b)
MemoryError

Flickering (and wrong) images when zooming in

I just played on the experimental branch and as soon as I zoom in and the 32px thumbs are getting replaced with higher resolutions, the images start to flicker and often show only black rectangles or parts of the atlases:

(screenshot attached)

I am not quite sure what is happening. While zooming, one can see that the correct data is loaded as well, but as soon as the zoom animation stops, things are as shown above.

Just wanted to report this bug in case you haven't already seen it. Amazing piece of work, though. Thank you so much!

(Though I don't think it's important: I used my own dataset and distance metric for it. Had no problems during the processing steps whatsoever.)

EDIT:
Chrome: Same problem
Microsoft Edge: Same problem
Firefox: Does not show the 32px thumbs at all. Also flickering and black rectangles, but no parts of the atlas files are shown.

Idea: split up positions and clusters into 2 json files

Curating the k-means-generated clusters is unwieldy due to the size of the positions data in the JSON file. Generating a separate clusters_data.json (perhaps even pretty-printed, because n is only 20 by default) would make editing this file easier.
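A sketch of what the split could look like, assuming a minimal (hypothetical) plot_data.json shape with a bulky "positions" key and a small "clusters" key:

```python
import json

# Hypothetical minimal plot_data.json contents: bulky positions plus a
# small set of cluster hotspots.
plot_data = {
    "positions": [[0.1, 0.2], [0.3, 0.4]],
    "clusters": [{"label": "Cluster 1", "img": "a.jpg"}],
}

# Write the small, hand-curated part pretty-printed for easy editing,
# and keep the bulky positions in their own compact file.
with open("clusters_data.json", "w") as f:
    json.dump(plot_data.pop("clusters"), f, indent=2)
with open("positions_data.json", "w") as f:
    json.dump(plot_data, f)
```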

Recursion problem with umap?

While processing 26K images, pix-plot threw the exception

Traceback (most recent call last):
  File "pix-plot/utils/process_images.py", line 349, in <module>
    PixPlot(image_dir=sys.argv[1], output_dir='output')
  File "pix-plot/utils/process_images.py", line 58, in __init__
    self.write_json()
  File "pix-plot/utils/process_images.py", line 261, in write_json
    'positions': self.get_2d_image_positions(),
  File "pix-plot/utils/process_images.py", line 173, in get_2d_image_positions
    model = self.build_model(self.image_vectors)
  File "pix-plot/utils/process_images.py", line 198, in build_model
    return model.fit_transform( np.array(image_vectors) )
  File "/usr/local/lib/python2.7/dist-packages/umap/umap_.py", line 1402, in fit_transform
    self.fit(X)
  File "/usr/local/lib/python2.7/dist-packages/umap/umap_.py", line 1361, in fit
    self.verbose
  File "/usr/local/lib/python2.7/dist-packages/umap/umap_.py", line 391, in rptree_leaf_array
    except (RuntimeError, RecursionError):
NameError: global name 'RecursionError' is not defined

Note: The line numbers for pix-plot are a bit off because I inserted a few lines, as described below.

This was with Python 2.7.12. It worked fine with a sample corpus of 70 images. The machine had 7GB of free mem when I started processing. I tried adding a line with sys.setrecursionlimit(2000) at the start of the script (Random Page On The Internet said that the default for Python is 1000), but that did not change anything.

What is the highest number of images that pix-plot can handle?

Saving vector representation as text or pickle

Question: the file extension of the vector representation is .npz:

https://github.com/YaleDHLab/tsne-images-webgl/blob/9fc627da78d09c11924a2654a5d2e5905faa96fa/utils/classify_images.py#L208

... but I think we may be saving as a generic text file:

https://github.com/YaleDHLab/tsne-images-webgl/blob/9fc627da78d09c11924a2654a5d2e5905faa96fa/utils/classify_images.py#L210

This only affects whether we numpy.load or numpy.loadtxt in later stages but just thought I'd check...
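The distinction is easy to demonstrate with a small array: np.save writes the binary .npy format for np.load, while np.savetxt writes plain text for np.loadtxt, regardless of what the filename's extension suggests:

```python
import numpy as np

vec = np.arange(6, dtype=np.float32).reshape(2, 3)

# np.save writes the binary .npy format, which np.load reads back directly.
np.save("vec.npy", vec)
loaded = np.load("vec.npy")

# np.savetxt writes plain text, which needs np.loadtxt -- even if the
# filename misleadingly ends in .npz.
np.savetxt("vec.txt", vec)
loaded_txt = np.loadtxt("vec.txt")

assert (loaded == vec).all() and (loaded_txt == vec).all()
```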

Docker image errors

I am getting the following incompatibility errors:
ERROR: google-auth 1.8.1 has requirement setuptools>=40.3.0, but you'll have setuptools 20.7.0 which is incompatible.
ERROR: markdown 3.1.1 has requirement setuptools>=36, but you'll have setuptools 20.7.0 which is incompatible.
ERROR: tensorboard 2.0.2 has requirement setuptools>=41.0.0, but you'll have setuptools 20.7.0 which is incompatible.
ERROR: tensorflow 2.0.0 has requirement numpy<2.0,>=1.16.0, but you'll have numpy 1.14.3 which is incompatible.

And then when I attempt to process the images, I get:

  File "utils/process_images.py", line 40, in <module>
    flags = tf.app.flags
AttributeError: module 'tensorflow' has no attribute 'app'

Filtering images

That's a great piece of code!

What would be a good way to dynamically filter images at runtime? I could add some values in the plot_data.json file and then in the imageData object. But using buildGeometry and only drawing some images doesn't seem to be the solution, because the faces are already indexed.

Maybe using the z positioning to put the filtered images behind the scene ?

Fail-fast or skipping of problematic images

I am in the process of testing pix-plot with 26K images with varying names.

Some of them contained parentheses, which caused pix-plot to fail very late in the process. Others had an apostrophe, which gave the same error.

Traceback (most recent call last):
  File "pix-plot/utils/process_images.py", line 349, in <module>
    PixPlot(image_dir=sys.argv[1], output_dir='output')
  File "pix-plot/utils/process_images.py", line 58, in __init__
    self.write_json()
  File "pix-plot/utils/process_images.py", line 261, in write_json
    'positions': self.get_2d_image_positions(),
  File "pix-plot/utils/process_images.py", line 174, in get_2d_image_positions
    return self.get_image_positions(model)
  File "pix-plot/utils/process_images.py", line 218, in get_image_positions
    with Image.open(thumb_path) as image:
  File "/usr/local/lib/python2.7/dist-packages/PIL/Image.py", line 2410, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: "output/thumbs/32px/bi_20150530-1609_2_AstaCykelSt'tteben.jpg"

I checked and the file output/thumbs/32px/bi_20150530-1609_2_AstaCykelSt'tteben.jpg does exist, but I can understand why it could give problems in a script.

Some images were corrupt, which (guessing here) meant that the thumbnail-processing left no file, which again caused some later step to fail with a different error. Unfortunately I did not keep that error message.

So far I have handled the problem by finding & removing files with problematic characters when the script throws an exception and removing all corrupt images using for F in sources_sshfs/*.jpg; do if [[ -z $(identify "$F" 2> /dev/null) ]]; then echo "Removing $F" ; rm "$F" ; fi ; done, but I haven't completed a full run of the 26K images yet.

Whether due to naming or format problems, the overall problem is the same: the script needs to run for 10+ minutes before it fails. Preferably it could be made to just skip the problematic images, but alternatively it would be nice if it could fail early in the process.
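A fail-fast pre-flight check could be sketched like this (stdlib only; the character set and candidate filenames are illustrative, not PixPlot's actual validation logic):

```python
import os

# Illustrative pre-flight filter: drop names that downstream shell commands
# are likely to mangle, i.e. anything containing quotes, parentheses, or
# non-ASCII characters. Requires Python 3.7+ for str.isascii().
SUSPECT_CHARS = set("'\"()")

def is_safe(path):
    name = os.path.basename(path)
    return name.isascii() and not (SUSPECT_CHARS & set(name))

candidates = ["ok_image.jpg", "bad(image).jpg", "AstaCykelSt'tteben.jpg"]
safe = [p for p in candidates if is_safe(p)]
print(safe)   # ['ok_image.jpg']
```

Running such a filter (or a full attempt to open each file) before the expensive processing steps would surface problematic images in seconds rather than after 10+ minutes.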

Gradual image enhancement

I tried generating a pix-plot with 270,000 images. I was able to view it (choppily) for 20 seconds in Chrome or 60 seconds in Firefox before it crashed the graphics system on my Ubuntu desktop with 16GB of RAM and integrated graphics on an i5 CPU. A colleague with a MacBook got nowhere with Safari (it complained about too much memory being requested) but was able to view it choppily with Firefox, which allocated 14GB of RAM. The 270K images are from our open image collection and we would very much like to use PixPlot on it in a way that is usable for random visitors.

Glancing at the Google demo, the solution seems to be selective loading of higher-res images based on image proximity. This would also have the positive secondary effect that smaller collections would display nicely when zooming in.

Alas, JavaScript and WebGL are very far from my knowledge bubble, so I don't know how to even approach this solution.

IIIF integration (import/export)

Some thoughts on how PixPlot can integrate with IIIF repos. This is not connected to the annotation functionality - this is only about getting images in and out.

Import

  • Parse a newline-delimited list of IIIF manifests instead of a folder of images and a csv.
  • In case of multiple images per manifest: take the first image. Assume this is the "main" image of the set. (Surely a false assumption but good enough to prove IIIF functionality. We are shooting for a use case where the objects are likely to be well represented by the first image, and the second might be something like the back of the painting or an uncropped variant with color balance targets.)
  • Take the label field of the manifest as the caption or title. (I believe label is the only field guaranteed to be present in a manifest.) Do not seek to create facets yet from the metadata (requires human judgement on a per-collection basis).

Export

  • If the PixPlot was generated from IIIF manifests, show the IIIF logo image on the Lightbox view with a link to either...
    • The raw IIIF manifest itself?
      • https://manifests.britishart.yale.edu/manifest/46796
    • The IIIF manifest as an argument to a Mirador instance?
      • http://mirador.britishart.yale.edu/?manifest=https://manifests.britishart.yale.edu/manifest/46796

The problem with the first option is that the manifest itself isn't useful unless you have a Mirador or other IIIF client running somewhere. The problem with the second is, there's no central, universal Mirador instance that I know of (besides something running on your localhost). It's not clear if for example YCBA would be cool with us directing all links to their Mirador...

This Export idea only lets you drag one IIIF manifest at a time, not the much-cooler export of whole subsets of the cluster.

ImageMagick Installation Issue

When attempting to run the ImageMagick line of the Dependencies section of the ReadMe, the following error results:

Error: No such keg: /usr/local/Cellar/imagemagick

Is this an issue with version compatibility of Pix Plot and tensorflow?

Names of the images/data

Do the images need to be named or organized in a certain way?
For example, can the image names have spaces in them, or can they be numbers? Also, can you use image formats other than jpg?

update atlasCounts in tsne-webgl.js

We may need to dynamically update this line:

var atlasCounts = { '32px': 1, '64px': 3 }

...upon completion of the atlas maps... I think right now it requires some manual adjustment.

Location lookup from image

When an image is clicked, the URL changes to http://example.com/pixplot/#imagename. Judging from how the cluster-links works, it seems doable to make it work the other way around, so that changing the URL would make the view jump to the corresponding image.

This feature would allow for bookmarking and act as a building block for an image searcher by making it easy to see the found image in context.

Add feature to print time taken at each step

Request to enhance the code for printing time taken for each of the steps like building thumbs, making 2D projections, computing clusters, etc.

Building 2D projections is taking forever in my case with 300 clusters and 80K images. Any reason why?

get the original image names and the centroid in a given cluster

This is a question I have. Can I get the names of the images that belong to all clusters?
For example, if my input images were named like img1.jpg, img2.jpg, and so on, can I get the cluster numbers and all images that belong to it and the centroid of that cluster. Like:
Cluster 1: img1.jpg, img4.jpg, ....................... : centroid: img4.jpg
Cluster 2: img6.jpg, img8.jpg, ....................... : centroid: img6.jpg
Cluster 3: img14.jpg, img41.jpg, ....................... : centroid: img41.jpg
Sorry if this is a very basic question, I am new to this area.
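This isn't exposed by the viewer, but given per-image positions and k-means labels it can be computed directly; a stdlib sketch with hypothetical positions and labels (the "centroid image" here is the member closest to the cluster's mean position):

```python
from collections import defaultdict
import math

# Hypothetical 2D positions and k-means labels for a handful of images.
positions = {
    "img1.jpg": (0.0, 0.0), "img4.jpg": (0.1, 0.1), "img7.jpg": (0.8, 0.1),
    "img6.jpg": (1.0, 1.0), "img8.jpg": (1.0, 1.4), "img9.jpg": (1.2, 1.0),
}
labels = {"img1.jpg": 0, "img4.jpg": 0, "img7.jpg": 0,
          "img6.jpg": 1, "img8.jpg": 1, "img9.jpg": 1}

# Group image names by cluster label.
clusters = defaultdict(list)
for name, label in labels.items():
    clusters[label].append(name)

reps = {}
for label, names in sorted(clusters.items()):
    # Centroid of the cluster in the embedding space ...
    cx = sum(positions[n][0] for n in names) / len(names)
    cy = sum(positions[n][1] for n in names) / len(names)
    # ... and the member image closest to that centroid.
    reps[label] = min(names, key=lambda n: math.hypot(positions[n][0] - cx,
                                                      positions[n][1] - cy))
    print(f"Cluster {label}: {', '.join(names)} : centroid image: {reps[label]}")
```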

Adjust UMAP hyperparameters given user data

When running UMAP with very small or very large datasets, the default hyperparameters produce results that are too far apart or too tightly clustered. We can help fix this problem by setting UMAP's hyperparameters using insights from the user's dataset.

Since last update to pixplot package, pip install pixplot does not work properly

on Mac it returned this installation error:
TSNE<SplitTree, &(euclidean_distance(DataPoint const&, DataPoint const&))>::run(double*, int, int, double*, int, double, double, int, int, int, int, bool, int, double, double, double*) in tsne.cpp.o
TSNE<SplitTree, &(euclidean_distance_squared(DataPoint const&, DataPoint const&))>::run(double*, int, int, double*, int, double, double, int, int, int, int, bool, int, double, double, double*) in tsne.cpp.o
VpTree<DataPoint, &(euclidean_distance(DataPoint const&, DataPoint const&))>::buildFromPoints(int, int) in tsne.cpp.o
VpTree<DataPoint, &(euclidean_distance_squared(DataPoint const&, DataPoint const&))>::buildFromPoints(int, int) in tsne.cpp.o
"_srand", referenced from:
TSNE<SplitTree, &(euclidean_distance(DataPoint const&, DataPoint const&))>::run(double*, int, int, double*, int, double, double, int, int, int, int, bool, int, double, double, double*) in tsne.cpp.o
TSNE<SplitTree, &(euclidean_distance_squared(DataPoint const&, DataPoint const&))>::run(double*, int, int, double*, int, double, double, int, int, int, int, bool, int, double, double, double*) in tsne.cpp.o
"_time", referenced from:
TSNE<SplitTree, &(euclidean_distance(DataPoint const&, DataPoint const&))>::run(double*, int, int, double*, int, double, double, int, int, int, int, bool, int, double, double, double*) in tsne.cpp.o
TSNE<SplitTree, &(euclidean_distance_squared(DataPoint const&, DataPoint const&))>::run(double*, int, int, double*, int, double, double, int, int, int, int, bool, int, double, double, double*) in tsne.cpp.o
ld: symbol(s) not found for architecture x86_64
clang-4.0: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [/private/var/folders/sf/4y0f6rb52gg54rskg_jdn5zc0000gr/T/pip-install-lrmh_jmm/MulticoreTSNE/build/lib.macosx-10.14-x86_64-3.7/MulticoreTSNE/libtsne_multicore.so] Error 1
make[1]: *** [CMakeFiles/tsne_multicore.dir/all] Error 2
make: *** [all] Error 2

ERROR: Cannot find make? See above errors.

ERROR: Failed building wheel for MulticoreTSNE
Running setup.py clean for MulticoreTSNE
Failed to build MulticoreTSNE
ERROR: Could not build wheels for hdbscan which use PEP 517 and cannot be installed directly

on linux 16.4 it returned
Traceback (most recent call last):
  File "/usr/local/bin/pip", line 5, in <module>
    from pip._internal.cli.main import main
ImportError: No module named 'pip._internal.cli.main'

ConnectionResetError: [Errno 54] Connection reset by peer when running "process_images.py"

Hi all,

I am trying to run the "process_images.py" file and I am certain I have all the dependencies installed correctly, but the following error messages appear. Please note that I made a small change to "process_images.py" by changing the number of clusters, hence the file is called process_imagesv2.py; I have not changed anything else. That said, I have no idea why this error is appearing and would appreciate any help in resolving this issue.

Thanks.
(screenshot attached)

experimental broken on large datasets?

Not sure if you've run into this, but I'm seeing bad texture lookups when visualizing a dataset of about 80,000 images. It seems suspiciously like it's brushing up against some 32k limit somewhere. I'm running Chrome 78.0.3904.108 (Official Build) (64-bit).

(screenshot attached)

Use custom feature set for images?

Is there a way to use custom features as the basis for the clustering? Specifically, I have a CNN trained on my images, and I'd like to use the weights from the last NN layer as input to the UMAP projection.
