deep-convolutional-autoencoder's People

Contributors

arashsaber


deep-convolutional-autoencoder's Issues

Fetch values of the latent space

Hi @arashsaber
Thank you for the comprehensive explanation of the code. Can you let me know how I can fetch the values of the compressed representation (latent space), which is the 'fc2' layer as I understand it? I have been trying out code snippets for this, but couldn't get the values yet.
Thank you
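
One minimal sketch of a possible approach, assuming the graph is built by the repository's ConvAutoEncoder (where fc2 is created with the fullyConnected helper) and that batch_x is a hypothetical NumPy batch matching the x placeholder: return the fc2 tensor alongside the output and cost, then evaluate it in the session.

# Sketch: expose the latent code by returning fc2 from ConvAutoEncoder.
# Inside ConvAutoEncoder, after fc2 is built, change the return statement to:
#     return output, cost, fc2

output, cost, fc2 = ConvAutoEncoder(x, 'ConvAutoEnc')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    latent = sess.run(fc2, feed_dict={x: batch_x})  # batch_x: your input batch (hypothetical name)
    print(latent.shape)  # one latent vector per example in the batch

# Alternatively, once the graph is built, the tensor can often be looked up by name,
# e.g. tf.get_default_graph().get_tensor_by_name('ConvAutoEnc/fc2/Relu:0');
# the exact name depends on the scopes and ops in your graph.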

Allocation of X exceeds 10% of system memory for custom input.

I'm trying to extend the code to train on color images of size 84x84, but I am getting the following warnings during training:

2018-07-23 11:23:45.136501: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-23 11:23:45.156209: W tensorflow/core/framework/allocator.cc:101] Allocation of 1792336896 exceeds 10% of system memory.
2018-07-23 11:23:47.214466: W tensorflow/core/framework/allocator.cc:101] Allocation of 1792336896 exceeds 10% of system memory.
2018-07-23 11:23:49.253364: W tensorflow/core/framework/allocator.cc:101] Allocation of 1792336896 exceeds 10% of system memory.

Initially I thought the issue was that my CPU was not compatible or that my machine had too little memory, so I tried to train on a K80 GPU but got the same warnings. Now I am thinking I made an error in memory allocation, but I can't pinpoint the issue. Here is my code:

import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from glob import glob
from utils import *  # Self-defined utils file in the same dir

# Some important consts
num_examples = 669
batch_size = 1  # try varying the batch size to observe mode collapse; was 100, but large batch sizes may run into memory issues

# Fetch input data (faces/trees/imgs)
data_path = os.path.join("./data/celebC/", '*.jpg')
data = glob(data_path)

if len(data) == 0:
    raise Exception("[!] No data found in '" + data_path+ "'")

np.random.shuffle(data)
imreadImg = imread(data[0])  # test read an image

if __debug__:
    print(imreadImg.shape)

if len(imreadImg.shape) >= 3: # check if image is a non-grayscale image by checking channel number
    c_dim = imread(data[0]).shape[-1]
else:
    c_dim = 1

is_grayscale = (c_dim == 1)

# tf Graph Input
# face data image, flattened to 84*84*3 = 21168 (originally 84*84 = 7056, without the depth-3 channel)
x = tf.placeholder(tf.float32, [1, 84*84*3], name='InputData')

print(x.shape)

# This is the directory where the TensorBoard summaries are written
logs_path = "./logs-CAE/"
#   ---------------------------------
"""
We start by creating the layers with name scopes so that the graph in
the tensorboard looks meaningful
"""
#   ---------------------------------
def conv2d(input, name, kshape, strides=[1, 1, 1, 1]):
    with tf.name_scope(name):
        W = tf.get_variable(name='w_'+name,
                            shape=kshape,
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        b = tf.get_variable(name='b_' + name,
                            shape=[kshape[3]],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        out = tf.nn.conv2d(input,W,strides=strides, padding='SAME')
        out = tf.nn.bias_add(out, b)
        out = tf.nn.relu(out)
        return out
# ---------------------------------
# tf.contrib.layers.conv2d_transpose, do not get confused with 
# tf.layers.conv2d_transpose
def deconv2d(input, name, kshape, n_outputs, strides=[1, 1]):
    with tf.name_scope(name):
        out = tf.contrib.layers.conv2d_transpose(input,
                                                 num_outputs= n_outputs,
                                                 kernel_size=kshape,
                                                 stride=strides,
                                                 padding='SAME',
                                                 weights_initializer=tf.contrib.layers.xavier_initializer_conv2d(uniform=False),
                                                 biases_initializer=tf.contrib.layers.xavier_initializer(uniform=False),
                                                 activation_fn=tf.nn.relu)
        return out
#   ---------------------------------
# ksize: A list or tuple of 4 ints. The size of the window for each dimension of the input tensor.
# strides: A list or tuple of 4 ints. The stride of the sliding window for each dimension of the input tensor.
# reference https://www.quora.com/What-is-the-size-of-the-output-of-a-maxpool-layer-in-a-CNN
# for size of output of maxpool layer
def maxpool2d(x,name,kshape=[1, 2, 2, 1], strides=[1, 2, 2, 1]):
    with tf.name_scope(name):
        out = tf.nn.max_pool(x,
                             ksize=kshape, #size of window
                             strides=strides,
                             padding='SAME')
        return out
#   ---------------------------------
def upsample(input, name, factor=[2,2]):
    size = [int(input.shape[1] * factor[0]), int(input.shape[2] * factor[1])]
    with tf.name_scope(name):
        out = tf.image.resize_bilinear(input, size=size, align_corners=None, name=None)
        return out
#   ---------------------------------
def fullyConnected(input, name, output_size):
    with tf.name_scope(name):
        input_size = input.shape[1:]
        input_size = int(np.prod(input_size)) # get total num of cells in one input image
        W = tf.get_variable(name='w_'+name,
                            shape=[input_size, output_size],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        b = tf.get_variable(name='b_'+name,
                            shape=[output_size],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        input = tf.reshape(input, [-1, input_size])
        out = tf.nn.relu(tf.add(tf.matmul(input, W), b))
        return out
#   ---------------------------------
def dropout(input, name, keep_rate):
    with tf.name_scope(name):
        out = tf.nn.dropout(input, keep_rate)
        return out
#   ---------------------------------
# Let us now design the autoencoder
def ConvAutoEncoder(x, name):
    with tf.name_scope(name):
        """
        We want to get dimensionality reduction of the 84*84*3 = 21168 input down to the fc2 code
        Layers:
            input --> 84, 84 (7056)
            conv1 --> kernel size: (5,5), n_filters:25 ???make it small so that it runs fast
            pool1 --> 42, 42, 25
            dropout1 --> keeprate 0.8
            reshape --> 42*42*25
            FC1 --> 42*42*25, 42*42*5
            dropout2 --> keeprate 0.8
            FC2 --> 42*42*5, 8820 --> output is the encoder vars
            FC3 --> 8820, 42*42*5
            dropout3 --> keeprate 0.8
            FC4 --> 42*42*5,42*42*25
            dropout4 --> keeprate 0.8
            reshape --> 42, 42, 25
            deconv1 --> kernel size:(5,5,25), n_filters: 25
            upsample1 --> 84, 84, 25
            FullyConnected (outputlayer) -->  84* 84* 25, 84 * 84 *  1
            reshape --> 84 * 84
        """
        input = tf.reshape(x, shape=[-1, 84, 84, 3])

        # coding part
        c1 = conv2d(input, name='c1', kshape=[5, 5, 3, 25])  # kshape = [k_h, k_w, in_channels, out_channels]
        p1 = maxpool2d(c1, name='p1')
        do1 = dropout(p1, name='do1', keep_rate=0.75)
        do1 = tf.reshape(do1, shape=[-1, 42*42*25])  # reshape to 1 dimensional (-1 is batch size)
        fc1 = fullyConnected(do1, name='fc1', output_size=42*42*5)
        do2 = dropout(fc1, name='do2', keep_rate=0.75)
        fc2 = fullyConnected(do2, name='fc2', output_size=42*42)
        # Decoding part
        fc3 = fullyConnected(fc2, name='fc3', output_size=42 * 42 * 5)
        do3 = dropout(fc3, name='do3', keep_rate=0.75)
        fc4 = fullyConnected(do3, name='fc4', output_size=42 * 42 * 25)
        do4 = dropout(fc4, name='do4', keep_rate=0.75)
        do4 = tf.reshape(do4, shape=[-1, 42, 42, 25])
        dc1 = deconv2d(do4, name='dc1', kshape=[5,5],n_outputs=25)
        up1 = upsample(dc1, name='up1', factor=[2, 2])
        output = fullyConnected(input, name='output', output_size=84*84*3)
        # print(output1.shape)
        # print(x.shape)
        with tf.name_scope('cost'):
            # N.B. reduce_mean averages over every element of the tensor (all pixels across the batch)
            cost = tf.reduce_mean(tf.square(tf.subtract(output, x)))
        return output, cost
#   ---------------------------------
def train_network(x):
    # Use this output to visualize the output of the decoder.

    output, cost = ConvAutoEncoder(x, 'ConvAutoEnc')
    with tf.name_scope('opt'):
        optimizer = tf.train.AdamOptimizer().minimize(cost)

    # Create a summary to monitor cost tensor
    tf.summary.scalar("cost", cost)

    # Merge all summaries into a single op
    merged_summary_op = tf.summary.merge_all()

    n_epochs = 5
    with tf.Session() as sess:

        sess.run(tf.global_variables_initializer())  # memory allocation exceeded 10% issue

        # create log writer object
        writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
        if __debug__:
            print("init session")
        for epoch in range(n_epochs):
            avg_cost = 0
            n_batches = int(num_examples / batch_size)
            print("epoch " + str(epoch))
            # Loop over all batches
            for i in range(n_batches):
                print("batch " + str(i))
                # batch_x, batch_y = mnist.train.next_batch(batch_size)
                # .next_batch -> https://stackoverflow.com/questions/41454511/tensorflow-how-is-dataset-train-next-batch-defined/41454722

                batch_files = data[i*batch_size:(i+1)*batch_size]  # get the current batch of files
                # TODO: add get_image() functionality from model.py to transform the batch as well.
                batch = [
                get_image(batch_file,
                        input_height=84,
                        input_width=84,
                        resize_height=84,
                        resize_width=84,
                        crop=True,
                        grayscale=False) for batch_file in batch_files] # get_image will get image from file dir after applying resize operation. 
                batch_images = np.array(batch).astype(np.float32)[:, :, :, None]

                print("BATCH_IMG SHAPE")
                print(batch_images.shape)

                # Run optimization op (backprop) and cost op (to get loss value)
                # _, c, summary = sess.run([optimizer, cost, merged_summary_op], feed_dict={x: batch_x, y: batch_y})
                _, c, summary = sess.run([optimizer, cost, merged_summary_op], feed_dict={x: batch_images})

                # Compute average loss
                avg_cost += c / n_batches
                # write log
                writer.add_summary(summary, epoch * n_batches + i)

            # Display logs per epoch step
            print('Epoch', epoch+1, ' / ', n_epochs, 'cost:', avg_cost)
        print('Optimization Finished')
        # print('Cost:', cost.eval({x: mnist.test.images}))  # leftover from the MNIST version; mnist is undefined in this script


train_network(x)

Any ideas on what might have caused this issue? Thanks!
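
For reference, one possibility suggested by the numbers alone (a back-of-the-envelope check, assuming float32 variables): 1,792,336,896 bytes is exactly (84*84*3)^2 * 4, i.e. the weight matrix of a fully connected layer mapping 21168 inputs to 21168 outputs. The final fullyConnected 'output' layer in ConvAutoEncoder is applied to the flattened 84*84*3 input, so that single weight matrix is about 1.8 GB on its own.

# Back-of-the-envelope check of the allocation size in the warning (float32 assumed).
flat = 84 * 84 * 3         # 21168 values per flattened colour image
n_weights = flat * flat    # fully connected layer mapping 21168 -> 21168
n_bytes = n_weights * 4    # 4 bytes per float32 value
print(n_bytes)             # 1792336896 -- matches the reported allocation

If that layer is indeed the culprit, shrinking its output size (or decoding back to image shape with convolutions on up1 rather than a dense layer over the full image) would be one way to bring the allocation down; note also that the layer is applied to input rather than up1, which may or may not be intended.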
