Unsupervised Feature Learning and Deep Learning

Solutions to the exercises of the UFLDL Tutorial, both the old version and the new version (2016).

Some of the files, especially images and .mat files, cannot be uploaded due to size constraints. Please download them from the tutorial website.

Exercise 1: Sparse Autoencoder

The following files are the core of this exercise:
  • sampleIMAGES.m: Load IMAGES.mat and randomly sample patches to train on.
  • sparseAutoencoderCost.m: Forward and back propagation. Note that two implementations are provided.
  • computeNumericalGradient.m: Perform the numerical gradient check. This step should be skipped in later exercises because it takes a huge amount of time (a minimal sketch of the check follows the notes below).
  • test.m: The overall procedure.
Notes:
  • The first implementation in sparseAutoencoderCost.m is already vectorized. An unvectorized and somewhat inelegant implementation is left commented out after it.
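
For reference, a minimal sketch of such a centered-difference gradient check, in the spirit of computeNumericalGradient.m (the function shape matches the exercise, but treat the details as illustrative):

    function numgrad = computeNumericalGradient(J, theta)
    % Approximate the gradient of J at theta with centered differences.
    numgrad = zeros(size(theta));
    EPSILON = 1e-4;
    for i = 1:numel(theta)
        e = zeros(size(theta));
        e(i) = EPSILON;
        numgrad(i) = (J(theta + e) - J(theta - e)) / (2 * EPSILON);
    end
    end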

Exercise 2: Vectorized Sparse Autoencoder

The following files are the core of this exercise:
  • sparseAutoencoderCost.m: Forward and back propagation. Note that two implementations are provided.
  • test.m: The overall procedure. Notice that this time we use a different set of images (and labels), with altered parameters.
  • Several .m files to read the images (and labels); see info\Using the MNIST Dataset.docx.
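
For reference, loading the MNIST data typically looks like the following, assuming the loadMNISTImages.m and loadMNISTLabels.m helpers distributed with the tutorial (the exact file names depend on how the MNIST archives were extracted):

    % Each column of images is one 28x28 MNIST image flattened to 784x1
    images = loadMNISTImages('train-images-idx3-ubyte');
    labels = loadMNISTLabels('train-labels-idx1-ubyte');
    % Remap label 0 to 10 so labels can be used directly as MATLAB indices (1..10)
    labels(labels == 0) = 10;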

Exercise 3A: PCA 2D

The following files are the core of this exercise:
  • pca_2d.m: Includes finding the PCA basis, checking xRot, reducing the dimension and replotting, PCA whitening, and ZCA whitening. A minimal sketch of these steps follows.
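
A minimal sketch of the PCA and whitening steps (variable names are illustrative; epsilon is a small regularization constant):

    % x is an n-by-m data matrix (one example per column), already zero-mean
    sigma = x * x' / size(x, 2);       % covariance matrix
    [U, S, ~] = svd(sigma);            % U: PCA basis, diag(S): variances
    xRot = U' * x;                     % rotate the data into the PCA basis
    epsilon = 1e-5;
    xPCAWhite = diag(1 ./ sqrt(diag(S) + epsilon)) * xRot;   % PCA whitening
    xZCAWhite = U * xPCAWhite;                                % ZCA whitening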

Exercise 3B: PCA and Whitening

The following files are the core of this exercise:
  • pca_gen.m: Includes loading and zero-meaning the data, implementing PCA (and checking the covariance), finding the number of components to retain, PCA with dimension reduction, PCA with whitening and regularization (and checking the covariance), and ZCA whitening.
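
One step that differs from Exercise 3A is choosing how many components to retain. A minimal sketch, assuming U and S come from the same PCA step as in the sketch above and that we keep 99% of the variance:

    % Choose the smallest k that retains at least 99% of the variance
    eigenvalues = diag(S);
    retained = cumsum(eigenvalues) / sum(eigenvalues);
    k = find(retained >= 0.99, 1);     % number of components to keep
    xTilde = U(:, 1:k)' * x;           % reduced-dimension representation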

Exercise 4: Softmax Regression

The following files are the core of this exercise:
  • softmaxCost.m: Compute the softmax cost function J(θ) and its gradient.
  • softmaxPredict.m: Compute the predicted labels (classification) using the learned theta on the test data.
  • test.m: The overall procedure, including initializing constants and parameters, loading data, implementing softmaxCost (using softmaxCost.m), gradient checking (using computeNumericalGradient.m from Exercise 1), learning parameters (using softmaxTrain.m, which minimizes softmaxCost.m with L-BFGS), and testing with the test data. A minimal sketch of the cost and gradient follows.
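
A minimal sketch of the softmax cost and gradient, assuming data has one example per column, labels take values in 1..numClasses, and lambda is the weight-decay parameter (names are illustrative):

    M = theta * data;                               % numClasses-by-numExamples scores
    M = bsxfun(@minus, M, max(M, [], 1));           % subtract column max for numerical stability
    p = exp(M);
    p = bsxfun(@rdivide, p, sum(p, 1));             % class probabilities
    groundTruth = full(sparse(labels, 1:numel(labels), 1));   % 0/1 indicator matrix
    cost = -mean(sum(groundTruth .* log(p), 1)) + lambda / 2 * sum(theta(:).^2);
    thetagrad = -(groundTruth - p) * data' / size(data, 2) + lambda * theta;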

Exercise 5: Self-Taught Learning

The following files are the core of this exercise:
  • feedForwardAutoencoder.m: Convert the raw image data into hidden-unit activations a(2) (a minimal sketch follows the notes below).
  • stlExercise.m: The overall procedure, including setting parameters, loading data from the MNIST database (divided into labeled and unlabeled data sets), training the sparse autoencoder on the unlabeled data set (as in Exercise 2), extracting features from the supervised data set (using feedForwardAutoencoder.m, based on w(1) from the autoencoder), training the softmax classifier (on the extracted features), and testing with the test data.
Notes:
  • The whole procedure can be explained as:
    1. Use the sparse autoencoder to train on unlabeled data and obtain w(1) and w(2);
    2. Use self-taught learning to obtain a(2) using w(1);
    3. Use softmax regression to train on the labeled data (a(2), y) and optimize theta (the new w(2) in the final network).
  • The overall procedure is explained in topic 6.1. Notice that with fine-tuning (introduced in topic 6), we can also optimize w(1) when training on the labeled data.
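
A minimal sketch of the feed-forward feature extraction in feedForwardAutoencoder.m, assuming W1 and b1 are the autoencoder's first-layer parameters and the activation is the sigmoid:

    % data: inputSize-by-numExamples; W1: hiddenSize-by-inputSize; b1: hiddenSize-by-1
    z2 = bsxfun(@plus, W1 * data, b1);
    activations = 1 ./ (1 + exp(-z2));   % a(2), used as features for the softmax classifier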

Exercise 6: Stacked Autoencoder for Digit Classification

This exercise is extremely important; you are highly recommended to read stackedAECost.m, stackedAEPredict.m, and stackedAEExercise.m thoroughly.
The following files are the core of this exercise:
  • stackedAECost.m: This function does the fine-tuning, including

    1. Feed-forward through the autoencoder layers for the hidden levels (levels 2 ~ depth+1);
    2. Compute J and ∇J for the softmax level (level depth+2);
    3. Back-propagate from the last hidden level to the input level (levels depth ~ 1; we subtract one from depth+1 ~ 2 because f(w(i-1), b(i-1); x(i-1)) = a(i), i.e. the parameters come from the previous level).
  • stackedAEExercise.m: The overall procedure, including

    1. Set parameters; we set depth = 2;
    2. Load data from the MNIST database;
    3. Train the first sparse autoencoder (input level 1, hidden level 2, output level ignored);
    4. Train the second sparse autoencoder (input level 2, hidden level 3 = depth+1, output level ignored);
    5. Train the softmax classifier (input level 3 = depth+1, output level 4 = depth+2);
    6. Fine-tune the whole network (using stackedAECost.m);
    7. Test (using stackedAEPredict.m).
  • stackedAEPredict.m: Use the trained network to classify the test data (a minimal sketch of the forward pass follows the notes below).

Notes:
  • The levels in stackedAECost.m are:
    1. input level: level 1;
    2. hidden levels: levels 2 ~ depth+1; more specifically, levels 2 and 3, levels 3 and 4, ..., levels depth and depth+1, where level i is the input level of one constituent autoencoder and level i+1 is the hidden level it learns by self-taught training;
    3. softmax level: level depth+2.
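
A minimal sketch of the fine-tuned network's forward pass as used for prediction, assuming each stack{d} holds the w and b of one hidden level and softmaxTheta holds the softmax weights (the names follow the exercise's conventions, but treat the details as illustrative):

    % Forward pass through the stacked hidden levels, then the softmax level
    a = data;                                    % level 1: input
    for d = 1:numel(stack)                       % levels 2 ~ depth+1
        a = 1 ./ (1 + exp(-bsxfun(@plus, stack{d}.w * a, stack{d}.b)));
    end
    [~, pred] = max(softmaxTheta * a, [], 1);    % level depth+2: predicted class labels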

Exercise 7: Linear Decoder on Color Features

The following files are the core of this exercise:
  • sparseAutoencoderLinearCost.m: Modified from sparseAutoencoderCost.m in Exercise 1 so that f(·) and the delta of the last level use the identity ("linear") function, generating color representations rather than grayscale values in [0, 1] (a minimal sketch follows this list).
  • linearDecoderExercise.m: The overall procedure, including setting parameters, gradient checking of the linear decoder, loading patches, ZCA whitening, learning features (using the autoencoder with a linear decoder), and visualization.
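
A minimal sketch of how the output level changes with a linear decoder, compared with the sigmoid output in Exercise 1 (names are illustrative):

    % Sigmoid hidden level, linear ("identity") output level
    a2 = 1 ./ (1 + exp(-bsxfun(@plus, W1 * data, b1)));
    a3 = bsxfun(@plus, W2 * a2, b2);    % f(z3) = z3, so a3 can take any real value
    delta3 = -(data - a3);              % f'(z3) = 1, so no sigmoid-derivative factor here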

Exercise 8: Convolution and Pooling

This exercise is extremely important; you are highly recommended to read cnnExercise.m, cnnConvolve.m, and cnnPool.m thoroughly.
The following files are the core of this exercise:
  • cnnConvolve.m: This function does the convolution. The return value convolvedFeatures has 4 dimensions:

    1. numFeatures: equals the number of hidden units in the network; we use this as the number of features (i.e. convolution kernels/masks/convolution matrices);
    2. numImages: equals the number of images to convolve;
    3. imageDim - patchDim + 1: the dimension of the convolved image;
    4. imageDim - patchDim + 1: the dimension of the convolved image.

    Moreover, the 3rd and 4th dimensions form convolvedImage, which is computed as follows (a minimal sketch follows the quoted note below):

    1. feature: represents the convolution matrix; it is computed by:
       • obtaining optTheta and ZCAWhite: these are the theta and the ZCA matrix obtained from the color features in Exercise 7. More specifically, optTheta contains the w and b of the neurons, and ZCAWhite is the ZCA whitening matrix;
       • using w * ZCAWhite as each feature (convolution matrix), where w holds the weights from the input neurons to the specific hidden neuron, extracted from optTheta.
    2. im: the pixels of a specific image in a specific color channel.
    3. convolvedImage = convolvedImage + conv2(im, feature, 'valid'): the convolution step, accumulated over the color channels.

    • As described in topic 8.3, the calculation using w, b and ZCAWhite can be abstracted as:
      Taking the preprocessing steps into account, the feature activations that you should compute is σ(W(T(x-u)) + b), where T is the whitening matrix and u is the mean patch. Expanding this, you obtain σ(WTx - WTu + b), which suggests that you should convolve the images with WT rather than W as earlier, and you should add (b - WTu), rather than just b to convolvedFeatures, before finally applying the sigmoid function.
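
    Putting these pieces together, a minimal sketch of the per-feature, per-image convolution in cnnConvolve.m (W and b are extracted from optTheta; featureNum and imageNum index the enclosing loops; the exact column ordering of the color channels is an assumption):

        WT = W * ZCAWhite;                               % fold ZCA whitening into the weights
        bT = b - WT * meanPatch;                         % adjusted bias, per the note above
        convolvedImage = zeros(imageDim - patchDim + 1);
        for channel = 1:3
            % weights of this feature for this color channel, reshaped into a patch
            cols = (channel - 1) * patchDim^2 + (1:patchDim^2);
            feature = reshape(WT(featureNum, cols), patchDim, patchDim);
            feature = rot90(feature, 2);                 % flip so conv2 correlates with the filter
            im = images(:, :, channel, imageNum);
            convolvedImage = convolvedImage + conv2(im, feature, 'valid');
        end
        convolvedImage = 1 ./ (1 + exp(-(convolvedImage + bT(featureNum))));   % sigmoid
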
  • cnnPool.m: This function does the pooling. The return value pooledFeatures has 4 dimensions:

    1. numFeatures: the number of features (i.e. convolution kernels/masks/convolution matrices);
    2. numImages: equals the number of images being pooled;
    3. resultDim: equals floor(convolvedDim / poolDim), which is the resulting dimension;
    4. resultDim: equals floor(convolvedDim / poolDim), which is the resulting dimension.

    We simply take the mean over each poolDim × poolDim region (a minimal sketch follows).
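
    A minimal sketch of the mean pooling, assuming the loop indices and array layout described above:

        resultDim = floor(convolvedDim / poolDim);
        for r = 1:resultDim
            for c = 1:resultDim
                % mean over one poolDim-by-poolDim region of the convolved feature map
                region = convolvedFeatures(featureNum, imageNum, ...
                             (r-1)*poolDim + (1:poolDim), (c-1)*poolDim + (1:poolDim));
                pooledFeatures(featureNum, imageNum, r, c) = mean(region(:));
            end
        end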

  • cnnExercise.m: The overall procedure, including

    1. Initialization of parameters;
    2. Train a sparse autoencoder (with a linear decoder) to learn features: we simply reuse the result of Exercise 7. Three objects are used:
       • optTheta: theta (w and b) of the autoencoder;
       • ZCAWhite: the ZCA whitening matrix;
       • meanPatch: the mean of the patches.
    3. Test the convolution: use cnnConvolve.m to test the convolution.
    4. Test the pooling: use cnnPool.m to test the pooling.
    5. Convolve and pool the dataset: the core part, including
       • loading the train and test sets;
       • dividing the features into groups (this part can be omitted; it is only for testing and logging, since in the end we still convolve with all features/convolution matrices);
       • convolving and pooling the train and test data sets.
    6. Use the pooled features for classification: here we choose a softmax classifier.
    7. Test the classifier. You should expect an accuracy of around 80% on the test images.

    • As described in topic 8.1, step 2 and step 5 (convolution part) can be abstracted as:
      Given some large r × c images xlarge, we first train a sparse autoencoder on small a × b patches xsmall sampled from these images, learning k features f = σ(W(1) × xsmall + b(1)) (where σ is the sigmoid function), given by the weights W(1) and biases b(1) from the visible units to the hidden units. For every a × b patch xs in the large image, we compute fs = σ(W(1) × xs + b(1)), giving us fconvolved, a k × (r - a + 1) × (c - b + 1) array of convolved features.

    • The changes of dimension can be abstracted as:

      • autoencode: vectors of patches used to train the autoencoder (whatever×1) ==> convolution matrix (8×8);
      • convolve: patches (convolution matrix) (8×8) + image to convolve (64×64) ==> convolvedFeature (57×57);
      • pool (size=19): convolvedFeature (57×57) ==> pooledFeature (3×3).

Exercise 9: Sparse Coding

The following files are the core of this exercise:
  • sparseCodingFeatureCost.m: This function calculates J(s) and ∇J(s) when A is fixed.
  • sparseCodingWeightCost.m: This function calculates J(A) and ∇J(A) when s is fixed. In our procedure, however, the optimal A can be derived in closed form, so this function is not actually needed.
  • sparseCodingExercise.m: The overall procedure, including
    1. Initialization of parameters. Every parameter matters here, and changing any one of them makes a huge difference; see the Notes below;
    2. Load patches sampled from the original image data;
    3. Checking (the checking process is skipped to save time);
    4. Iterative optimization (a minimal sketch of one iteration follows this list):
       1. Select a random mini-batch;
       2. Initialize s:
          1. Set s = A^T x (where x is the matrix of patches in the mini-batch);
          2. For each feature in s (i.e. each row of s, when the patches are the columns of x), divide the feature by the norm of the corresponding basis vector in A.
       3. Optimize the feature matrix s;
       4. Optimize the weight matrix A. As noted above, the result can be derived in closed form here;
       5. Visualize the result at the end of the iteration.
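
A minimal sketch of one iteration of the alternating optimization, assuming A is visibleSize-by-numFeatures, x holds one mini-batch patch per column, and the closed-form update for A is the regularized least-squares solution (the exact scaling of the gamma term depends on how the cost is normalized; names are illustrative):

    % Initialize the feature matrix from the current bases
    s = A' * x;
    normA = sqrt(sum(A.^2, 1))';               % norm of each basis vector (column of A)
    s = bsxfun(@rdivide, s, normA);            % scale each feature by the corresponding norm

    % Optimize s with A fixed, e.g. by running minFunc ('cg') on sparseCodingFeatureCost

    % Optimize A with s fixed: closed-form regularized least squares
    numPatches = size(x, 2);
    A = (x * s') / (s * s' + gamma * numPatches * eye(size(s, 1)));
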
Notes in sparseCodingExercise.m:
  • In Step 0, we choose the patches to be 16 × 16 instead of 8 × 8, and the number of features to learn to be 18 × 18 instead of 11 × 11, to obtain better visual results. lambda, epsilon, and gamma can also be adjusted to obtain better results.
  • When iterating, we use 'cg' (conjugate gradient) instead of 'lbfgs', because lbfgs takes more aggressive steps and leads to worse results. One alternative is to keep lbfgs while decreasing the number of iterations (e.g. options.maxIter=15), or to increase the dimension of the grouping region for topographic sparse coding (e.g. poolDim=5) so that each sparse code is shared over a larger area, which keeps s from being fitted too precisely.
Other Notes:
  • Since we minimize with respect to s and A alternately, the cost may not decrease monotonically, but the overall trend should be downward.
  • There are two changes in sampleIMAGES.m:
    1. Three parameters (images, patchDim, numPatches) are added to the function, so that we can customize patch choice;
    2. The rescaling from [-1, 1] to [0.1, 0.9] is removed, because we need the data to have zero mean (see the comment in the file for an explanation).

Exercise 10: ICA

The following files are the core of this exercise:
  • orthonormalICACost.m: Compute J and ∇J. See here for the method of calculating the gradient. Notice that here we use the L2 norm instead of the L1 norm to compute J. An introduction to the L0, L1, and L2 norms can be found here (in Chinese).
  • ICAExercise.m: The overall procedure, including initialization of parameters, sampling patches, ZCA-whitening the patches, gradient checking, and optimization for orthonormal ICA (a minimal sketch of one optimization step follows the notes below).
Notes:
  • This exercise takes around 1-2 days to run on a laptop.
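
A minimal sketch of one step of the projected gradient descent commonly used for orthonormal ICA, where the weight matrix is projected back onto the orthonormality constraint after every update (the cost and gradient come from orthonormalICACost.m; alpha and the variable names are illustrative assumptions):

    % grad: gradient from orthonormalICACost.m, reshaped to numFeatures-by-visibleSize
    weightMatrix = weightMatrix - alpha * grad;                    % gradient step
    % Project back onto the orthonormality constraint W * W' = I
    weightMatrix = (weightMatrix * weightMatrix') ^ (-0.5) * weightMatrix;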
