Disentangled Sequential Autoencoder

PyTorch implementation of the Disentangled Sequential Autoencoder, a variational autoencoder architecture for learning latent representations of high-dimensional sequential data by approximately disentangling the time-invariant and time-varying features.

Results

We test our network on the Liberated Pixel Cup dataset, which consists of sprites of video game characters with varying hairstyle, clothing, skin color and pose. We restrict ourselves to three types of poses: walking, slashing and spellcasting. The network learns disentangled vector representations of the static aspects (such as skin color and hair color) in the vector f, and of the dynamic aspects (motion) in the vectors z1, z2, z3, ..., z8 (one per frame).

Style Transfer

We perform style transfer by obtaining the f and z encodings of two characters that differ in both appearance and pose, and then swapping their z encodings. The characters exchange their motion patterns while keeping their appearance. For example, swapping "blue dark elf walking" with "lightskinned human spellcasting" gives "blue dark elf spellcasting" and "lightskinned human walking", respectively.
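The swap can be sketched in a few lines of PyTorch. This is only an illustration under stated assumptions, not the repository's code: the function names `swap_dynamics` and `decoder_input`, the latent sizes (64 for f, 32 for each z) and the idea that the decoder sees f concatenated with each per-frame z are all hypothetical. Only the 8 frames per sequence come from the text above.

```python
import torch


def swap_dynamics(f_a, z_a, f_b, z_b):
    """Pair each sprite's static code f with the other sprite's dynamic codes z."""
    return (f_a, z_b), (f_b, z_a)


def decoder_input(f, z):
    """Build a per-frame decoder input by repeating f for every frame.

    Assumption: the decoder is conditioned on [f, z_t] for each frame t.
    f: (f_dim,), z: (frames, z_dim) -> (frames, f_dim + z_dim)
    """
    frames = z.shape[0]
    f_repeated = f.unsqueeze(0).expand(frames, -1)
    return torch.cat([f_repeated, z], dim=1)


# Random stand-ins for encoder outputs (hypothetical sizes: f_dim=64, z_dim=32).
f_elf, z_walk = torch.randn(64), torch.randn(8, 32)
f_human, z_spell = torch.randn(64), torch.randn(8, 32)

(f_1, z_1), (f_2, z_2) = swap_dynamics(f_elf, z_walk, f_human, z_spell)
elf_spellcasting = decoder_input(f_1, z_1)  # elf appearance, spellcasting motion
human_walking = decoder_input(f_2, z_2)     # human appearance, walking motion
```

Passing each resulting tensor through a trained decoder would then produce the swapped frame sequences.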

Sprite 1	Sprite 2	Sprite 1's Body With Sprite 2's Pose	Sprite 2's Body With Sprite 1's Pose

Cosine Similarities of Encodings

We consider 12 pairs of randomly chosen sprites and compare the cosine similarities of their f and z encoding vectors. Sprites with the same physical appearance have a high cosine similarity between their f encodings, whether or not their motion patterns match, and sprites with different physical appearance have a low cosine similarity between their f encodings. Similarly, sprites with similar motion patterns have a high cosine similarity between their z encodings irrespective of their appearance, and sprites with dissimilar motion patterns have a low cosine similarity between their z encodings. This supports the interpretation that f encodes the time-invariant features of the sprites while the z vectors encode the time-varying features.

Sprite 1	Sprite 2	Cosine Similarity of f	Cosine Similarity of z
		1.00	1.00
		0.83	0.09
		0.78	0.06
		0.19	0.95
		0.26	0.97
		0.18	0.97
		0.25	0.99
		0.31	0.98
		0.00	0.97
		0.75	0.08
		0.83	0.03
		0.73	0.08
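
A comparison of this kind could be computed roughly as follows. This is a sketch under assumptions: the text does not say how the eight per-frame z vectors are combined, so concatenating them into a single vector before comparing is a guess. The function name `encoding_similarity` and the latent sizes are likewise hypothetical.

```python
import torch
import torch.nn.functional as F


def encoding_similarity(f_a, z_a, f_b, z_b):
    """Return (cosine similarity of f, cosine similarity of z) for two sprites.

    f_*: (f_dim,) static encodings.
    z_*: (frames, z_dim) per-frame dynamic encodings.
    Assumption: the per-frame z vectors are flattened into one long vector.
    """
    sim_f = F.cosine_similarity(f_a, f_b, dim=0).item()
    sim_z = F.cosine_similarity(z_a.flatten(), z_b.flatten(), dim=0).item()
    return sim_f, sim_z


# Random stand-ins for encoder outputs (hypothetical sizes: f_dim=64, z_dim=32).
f_a, z_a = torch.randn(64), torch.randn(8, 32)
f_b, z_b = torch.randn(64), torch.randn(8, 32)
sim_f, sim_z = encoding_similarity(f_a, z_a, f_b, z_b)
print(f"cosine similarity of f: {sim_f:.2f}, of z: {sim_z:.2f}")
```

Comparing a sprite with itself gives a similarity of 1 for both encodings, which matches the first row of the table.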
