
zackchase / mxnet-the-straight-dope


An interactive book on deep learning. Much easy, so MXNet. Wow. [Straight Dope is growing up] ---> Much of this content has been incorporated into the new Dive into Deep Learning Book available at https://d2l.ai/.

Home Page: https://d2l.ai/

License: Apache License 2.0

Makefile 0.16% Python 0.19% Jupyter Notebook 99.64% JavaScript 0.01% CSS 0.01% Shell 0.01%

mxnet-the-straight-dope's People

Contributors

aaronmarkham, ahazrat, alwye, bhavinthaker, dingran, fyears, hcho3, indhub, ishitori, jeankossaifi, jerryzcn, julianslzr, kazizzad, kellensunderland, lcallot, luckypigeon, mli, nbertagnolli, piiswrong, smolix, srochel, szha, tanvikumar, turiphro, vishaalkapoor, wlbksy, xmyqsh, zackchase, zhreshold, ziyuehuang


mxnet-the-straight-dope's Issues

Citations

In the preliminary material it may or may not be so essential to cite much (could relegate to end-of-chapter references). But as the book approaches more recent material, we'll want to make sure that we have a good protocol for citing papers and for doing so inline.

Is there a good way to handle this? Is there any hope of doing LaTeX style citations within the notebook/markdown world?

@mli @smolix @piiswrong

why Batch Normalization is under Conv part?

BN is kind of useful for mlp or cnn or even other networks. (Although, it's especially useful in very deep networks.)

Maybe it's better to put BN inside Part 3 DNN, analogous to dropout?

PS: If no one is writing this introduction, I am interested in contributing to it. :-)

Consistent notation for linear algebra objects

I noticed some inconsistent notations in the linear algebra chapter.
Can we decide one way or another and stick with it?

Here's a suggestion of mine:

  • Vector: \mathbf{x} (bold, no italic)
  • Matrix / Tensor / Set: A (italic, no bold)
  • Row vector: \mathbf{a}_i^T
  • Column vector: \mathbf{b}_i
  • Matrix entry: a_{ij}
  • Dot product: \mathbf{x}^T \mathbf{y} (without \cdot)
  • Matrix-vector product: \mathbf{y} = A\mathbf{x}
  • Real number set: \mathbb{R}
  • Norm: \| \mathbf{x} \|
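As a sketch of how the proposed convention might read in practice (the particular expressions are illustrative and not taken from the chapter; the norm is assumed to be Euclidean), a matrix-vector product, one of its entries, and a norm could be written as:

```latex
\mathbf{y} = A\mathbf{x}, \qquad
y_i = \mathbf{a}_i^T \mathbf{x} = \sum_{j=1}^{n} a_{ij} x_j, \qquad
\| \mathbf{x} \| = \sqrt{\mathbf{x}^T \mathbf{x}}, \qquad
\mathbf{x} \in \mathbb{R}^n
```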

Once we decide on a convention, I'll submit a pull request.

better installation guide

Loving the gluon.mxnet.io docs so far.

Noticed the link is wrong on Dependencies intro: “More detailed instructions are here” (for mxnet install)
Link extension should be .html, not .rst

Also noticed the following:
In the setup instructions (gluon.mxnet.io/docs/C01-install.html) it is not clear which OS options can be used. For instance, it’s not clear whether I can do this on Windows. If so, point out that the sudo commands do not exist on Windows; instead, right-click the CMD Prompt and choose "Run as administrator".
How does one know whether CUDA is installed and a GPU is available (on Windows, on Linux/Mac)?
Following these instructions on Windows, Jupyter doesn’t start. It doesn’t get installed in the dir mentioned in the instructions (or placed on the PATH); it seems to get installed in a /user/appdata/roaming/python/scripts dir instead.

P03-C03 Detailed Feedback

I understand the tutorial is in an early state, but it is looking great! My detailed feedback is below. Also currently the model is not working because there is a missing argument (last_batch) for DataLoader.

High-level comments:

  1. I think the flow would be better if the bias-variance tradeoff discussion preceded the discussion of overfitting and regularization.
  2. Do we need the softmax function anymore?

Areas that require clarification:

  1. We should clarify the following sentence: “Given a network with n nodes we are sampling uniformly at random from the 2^n networks in which a subset of the nodes are turned off.”
  2. Why are we using the Gluon DataLoader function in the ‘From Scratch’ tutorial? (I see you recently added this to P03-C01 also)
  3. We should explain why the transform function is necessary. (I see you recently added this to P03-C01 also)
  4. It would be helpful to discuss why scaling is necessary in the dropout function
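On the scaling question in the last item, a minimal sketch may help. It assumes the notebook uses the common "inverted dropout" form, in which surviving activations are divided by the keep probability so that their expected value is unchanged; the function name and arguments are hypothetical and not copied from the notebook.

```python
import random


def dropout(values, drop_prob, rng):
    """Inverted dropout: zero each value with probability drop_prob and
    scale the survivors by 1 / (1 - drop_prob)."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0] * 100_000
dropped = dropout(activations, drop_prob=0.5, rng=rng)

# Without the 1 / keep_prob factor the mean would fall to about 0.5;
# with it, the mean stays close to the original value of 1.0.
mean_after = sum(dropped) / len(dropped)
print(round(mean_after, 2))
```

Because the expected activation is preserved during training, the layer can simply be switched off at prediction time with no further rescaling.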

Typos / Minor Changes:

  1. There is a missing ‘n’ in “Wester Union”
  2. You need to add an ‘a’ in “either positive or negative weight”
  3. You do not need the one_hot_label line in the evaluate_accuracy function
  4. It would be better to create a variable for the dropout rate in the training loop
  5. We need to align the moving loss section with previous tutorials

Access to raw model parameters with gluon-based models

Firstly, let me thank you for writing those awesome tutorials for mxnet!

After training a model that I set up using the gluon interface, I would like to access the model's parameters, preferably as numpy arrays. E.g., I would like to call

w = net.conv1.W

similarly to what is possible with PyTorch, and receive as output either an mxnet ndarray or a numpy ndarray. Is there a convenient way to access the whole model in terms of layer types and parameters? Following your current tutorial and the outdated official API, it seems rather hidden.

Is there an updated and accessible API for the nightly builds of mxnet, considering gluon?

Cheers,
Sebastian

[Need fix] Use block.save_params() instead of block.collect_params().save()

In gluon, net.collect_params().save() won't automatically strip the model prefix in parameters. This would prevent the checkpoint files from being loaded into models with different prefixes. The net.save_params() strips the prefix before saving. Going forward, please use block.save_params() and block.load_params().

The following notebooks need to be fixed:
P05-C04-rnns-gluon.ipynb
P06-C03-object-detection.ipynb
P14-C05-hybridize.ipynb
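To illustrate why the prefix matters, here is a toy sketch using plain dictionaries in place of real checkpoints. The parameter names and the strip_prefix helper are hypothetical; they only mimic the idea of prefixed parameter names and are not Gluon's actual implementation.

```python
def strip_prefix(params, prefix):
    """Return a copy of params with prefix removed from every key that has it."""
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in params.items()
    }


# Hypothetical parameter names; real Gluon prefixes depend on the model.
saved_by_model_a = {"model_a_dense0_weight": [0.1, 0.2], "model_a_dense0_bias": [0.0]}
expected_by_model_b = {"model_b_dense0_weight", "model_b_dense0_bias"}

# Saved with prefixes: no name matches what model B expects.
print(set(saved_by_model_a) & expected_by_model_b)  # set()

# With prefixes stripped on both sides, the names line up.
stripped_a = strip_prefix(saved_by_model_a, "model_a_")
stripped_b = {name[len("model_b_"):] for name in expected_by_model_b}
print(set(stripped_a) == stripped_b)  # True
```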

Add link to document website to the repo introduction.

How about adding the link to the documentation website to the repo introduction? Google is showing multiple related domains.

BTW, P02-C06 will timeout when make html because it's using 1000 epochs. Better have results stored in advance, otherwise nbsphinx_timeout = 600 is required in conf.py
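The setting mentioned above would go into the Sphinx conf.py roughly as follows. This is only a fragment; the commented-out alternative assumes the notebooks are committed with their outputs already stored.

```python
# conf.py (Sphinx configuration), fragment only.

# Per-cell execution timeout used by nbsphinx, in seconds.
nbsphinx_timeout = 600

# Possible alternative if results are stored in the notebooks in advance:
# skip execution during the build altogether.
# nbsphinx_execute = 'never'
```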

Detailed feedback for Ch 01 - part 2

chapter01_crashcourse/linear-algebra.ipynb

  • dysfluent: If you're already confident basic linear algebra
  • dysfluent: Matrices, which we'll denote with capital letters ($A$, $B$, $C$), are order represented in code as arrays with 2 axes.
  • funky layout of math. for a_i,j looks like the a is superscript.
  • Python code treats print as a function which is the default in Python 3 but not Python 2.7. Did I miss something that said the code was Python 3? Code will still work in Python 2.7 but may have parentheses around output to represent a tuple
  • groundtruth >> ground truth
  • hitherto ->> on this page
  • Capitalize "Deep Learning"?

confusing with pretrained model finetune tutorial

In the following part of code.

deep_dog_net = models.squeezenet1_1(prefix='deep_dog_', classes=2)
deep_dog_net.collect_params().initialize()
deep_dog_net._children[0] = net.features
print(deep_dog_net)

What is the _children here? It's a little confusing.
What if I use the following instead?

deep_dog_net.features = net.features

@zhreshold

Feedback for Ch 01 - part 3

chapter01_crashcourse/probability.ipynb

  • dysfluent "easy for humans to recognize cats and dogs 320 pixel resolution"
  • the text surrounding the images at different resolutions appears to conflate probability and confidence
  • Error: probability estimate cannot be 1.86: the lowest estimated probability for any of the numbers is about $.15$ and the highest estimated probability is $1.86$.

some problems in P01-C03-linear-algebra.ipynb

the first one is this line:
nd.dot(A.T, B)
I think the shapes of A and B in the example already allow a direct nd.dot() operation. What do you think?

second one is, near the end of document, the numeric version of norm is calculated as
nd.sqrt(nd.sum(nd.abs(u)**2))

Why not use mx.nd.sqrt(mx.nd.sum(x**2)), since mx.nd.abs() does not seem to add anything here?
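A small pure-Python sketch of the two variants (plain lists stand in for NDArrays, and the function names are made up for illustration): for real-valued entries they agree, and the abs() only matters in the general definition of the norm, for example with complex entries.

```python
import math


def l2_norm(values):
    """Euclidean norm: square root of the sum of squared magnitudes."""
    return math.sqrt(sum(abs(v) ** 2 for v in values))


def l2_norm_no_abs(values):
    """Same formula without abs(); only valid for real-valued entries."""
    return math.sqrt(sum(v ** 2 for v in values))


u = [3.0, -4.0]
print(l2_norm(u), l2_norm_no_abs(u))  # 5.0 5.0

# For complex entries the two differ: (1j) ** 2 == -1, but abs(1j) ** 2 == 1.
z = [1j, 1j]
print(l2_norm(z))  # about 1.414, i.e. sqrt(2)
```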

Detailed feedback for Ch 01

Notes on chapter01_crashcourse/introduction.ipynb

Preface:
• Tone is flippant
• Vocabulary level too high for ESL readers e.g. “cognizant”, “buffoonery”
• Eliminate aspirational / modest statements
• Entire preface could be reduced to “mxnet-the-straight-dope is an educational resource for deep learning that leverages the strengths of Jupyter notebooks to present prose, graphics, equations, and (importantly) code together in one place. The result will be a resource that could be simultaneously a book, course material, a prop for live tutorials, and a resource for plagiarising (with our blessing) useful code.”

Learning by doing – who is “I”? Rest of intro uses authorial “we”

Introduction
• Inappropriate vocabulary level: fabricated, pedagogical
• we ourselves are nonetheless capable of performing the cognitive feat ourselves.
• Saying that you turn knobs is usually a reference to hyperparameter tuning, not parameter setting
• Dysfluent: Generally, our model is just a machine transforms its input into some output.
• Typo: English langauge
• Acronym ML is used without being defined
• “sucks less” – rephrase
• “model is dope” rephrase
• dysfluent: They're mostly because they are problems where coding we wouldn't be able program their behavior directly in code, but we can program with data
• Oftentimes >> Often
• Dysfluent: To get going with at machine learning
• Rephrase: Generally, the more data we have, the easier our job as modelers.
• Structured data: I would not call a Jupyter notebook structured data. It’s unstructured but marked up
• Typos: ingesting high resolution image deep neural networks
• deep neural networks >> deep artificial neural networks
• Models section: bulleted section beginning “loss functions” appears with no connection to running text.
• Loss functions: AMZN stock prediction is one example of a loss function
• Training section: “the latter” – the latter what? There are not two antecedents
• Trained error: italicized f is used without introduction.
• Incomplete sentence: “Encouraging but by no means a guarantee.”
• Rephrase: “This can be off by quite a bit (statisticians call this overfitting).” The point to make is that the error on test data can be greater than the error on the training data.
• “one aims to do” – tone difference from colloquial “you” throughout
• Supervised learning: too many terms used without introduction: x, y, targets, inputs
• Incomplete sentence “Predict cancer vs not cancer, given a CT image.”
• “Perhaps the simplest supervised learning task wrap your head around in regression”. I think predicting labels is much simpler.
• Term vector should have been introduced much earlier
• Typo: whacky. What purpose is served by introducing notation?
• “Lots of practical problems are well described regression problems.” “Lots of practical problems can be formulated as regression problems”
• dysfluent: Imagine, for example assume
• Eliminate discussion of L1 loss – way too much detail for the place where we’re describing the kinds of learning algorithms
• Fix: In classification, we want to look at a feature vector and then say which among a set of categories categories (formally called classes) an example blongs to.
• Paragraph starting “more formally”. Mangled text. Unnecessary math symbols and terminology
• Death cap example: eliminate math
• Extensive spelling errors
• dysfluent: But not matter accurate
• “This problem emerges in the biomedical literature where correctly taggin articles is important because it allows researchers to do exhaustive reviews of the literature.” It doesn’t emerge there. Applies there perhaps?
• “A possible solution to this problem is to score every element in the set of possible sets with a relevance score and then retrieve the top-rated elements.” >> “A possible solution to this problem is to score every element in the set of possible sets with a relevance score and then display retrieve the top-rated elements.”
• Recommender systems: “Generally, such systems strive to…” Eliminate math symbols or at least fix the funky rendering – it looks like a superscript u for user
• “So far we've looked at problems where we have some fixed number of inputs and produce a fixed number of outputs. Take some features of a home (square footage, number of bedrooms, number of bathrooms, walking time to downtown), and predict its value. Take an image (of fixed dimension) and produce a vector of probabilities (for a fixed number of classes). Take a user ID and an product ID and predict a star rating. And once we feed our fixed-length input into the model to generate an output, the model immediately forgets what it just saw.”
o A common idiom in the preceding text is “Take X for example” so I initially garden-pathed on these examples. One example is sufficient, preceded by “for example”.
o The preceding text did not stipulate that the input vector is fixed length. Nor did it stipulate that the labels are a fixed set.
• Automatic speech recognition: “In other words, this is a seq2seq problem where the output is much shorter than the input.” That is a very peculiar way to describe it: you’re comparing length (in ms) to length (in chars), which is not mathematically valid. Ditto for the TTS discussion.
• Machine Translation: “Unlike in the previous cases where the order of the inputs was preserved, in machine translation, order inversion can be vital.” Which previous examples?
o Speech recognition doesn’t preserve order, even in English e.g. “$10” is pronounced “ten dollars”
o “obnoxious tendency” this is offensive and English-centric. Remove
o Reordering is one problem with MT. A bigger problem is the many-to-many mappings of words across languages e.g. several words in one language may map to one word in another.
• Unsupervised learning: rephrase: extremely anal boss.
• Rephrase: pretty lame.
• Why do the examples of unsupervised learning only get bullet points and not sub-sections? They’re just as important and with work in autoencoders etc a huge research area
• Environment: So far we didn't discuss at all yet,
• Monikers >> terms
• “there is a large area of situations where” “There are many situations where”
• “Needless to say, “ then don’t say it. Or use a different discourse connective
• “However there are many cases…” but then the text doesn’t explicitly connect to the images that follow.
• Conclusion: does not summarize the section. Total non-sequitur. Says the chain rule is easy but no mention of the chain rule on that page or on the page linked to

how to get the parameter value after I initialize it

import mxnet as mx
import mxnet.gluon as g

ctx = mx.gpu()
net = g.nn.Sequential()
with net.name_scope():
    net.add(g.nn.Dense(10, in_units=100))

net.collect_params().initialize(mx.init.Xavier(), ctx=ctx)

If I define the network like this and run the script, I encounter a problem.

I also don't know how to get the weight value.

Prefix notebook filenames with a sequence number

Working through the jupyter notebooks, I am having to keep referring to index.rst to know what order to read them in. It would be helpful if they use a number to indicate reading order.

E.g.

   chapter03_deep-neural-networks/mlp-scratch
   chapter03_deep-neural-networks/mlp-gluon
   chapter03_deep-neural-networks/mlp-dropout-scratch
   chapter03_deep-neural-networks/mlp-dropout-gluon
   chapter03_deep-neural-networks/plumbing
   chapter03_deep-neural-networks/custom-layer
   chapter03_deep-neural-networks/serialization

would become:

   chapter03_deep-neural-networks/01.mlp-scratch
   chapter03_deep-neural-networks/02.mlp-gluon
   chapter03_deep-neural-networks/03.mlp-dropout-scratch
   chapter03_deep-neural-networks/04.mlp-dropout-gluon
   chapter03_deep-neural-networks/05.plumbing
   chapter03_deep-neural-networks/06.custom-layer
   chapter03_deep-neural-networks/07.serialization
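The renaming could be generated from the reading order rather than typed by hand. A minimal sketch, assuming the ordered list is taken from index.rst (here it is simply copied from the example above, and the helper name is hypothetical):

```python
def numbered_names(ordered_names, width=2):
    """Prefix each name with a zero-padded sequence number, e.g. '01.mlp-scratch'."""
    return [f"{i:0{width}d}.{name}" for i, name in enumerate(ordered_names, start=1)]


# Reading order for chapter 3, copied from the example above.
chapter03 = [
    "mlp-scratch",
    "mlp-gluon",
    "mlp-dropout-scratch",
    "mlp-dropout-gluon",
    "plumbing",
    "custom-layer",
    "serialization",
]

for name in numbered_names(chapter03):
    print("chapter03_deep-neural-networks/" + name)
```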

network visualization

How can a network created with gluon be visualized, given that mx.viz.plot_network can only visualize a symbol?

who's awesome?


good stuff. I loved the style of writing the documentation. Very doge!

P03-C04 Detailed Feedback

Overall the tutorial was easy to follow and well written. Here is my detailed feedback:

High-level comments:

  1. To better align the ‘Scratch’ and ‘Gluon’ example, let’s create a dropout variable to be used in the training loop and allow a dropout % parameter to be passed to the net (versus hard coding the dropout % in net)

Areas that require clarification:

  1. We should clarify (and perhaps cover in-depth in P03-C03) the following: “…but scale down their values, essentially averaging the various dropped out nets.”
  2. We should expand on the following and further explain in the Gluon context: “DropOut is a special kind of layer because it behaves differently when training and predicting.”
  3. I found the following sentence difficult to parse: “For example, when we generate adversarial examples (a topic we'll investigate later) we may want to record, but for the model to behave as in predict mode.”

Typos / minor changes:

  1. Change ‘implementation by hand’ to ‘implementation from scratch’
  2. In the model, the phrase “Adding first hidden layer” is repeated
  3. Remove parentheses around (with autograd.record(train_mode=False):)

formulas in Sums and means section of P01-C03-linear-algebra.ipynb are ill formatted

Currently:
A related quantity to the sum is the mean, also commonly called the average. We calculate the mean by dividing the sum by the total number of elements. With mathematical notation, we could write the average over a vector ${\boldsymbol{u}$ as \frac{1}{d} \sum_{i=1}^{d} ui$ and the average over a matrix $A$ as $\frac{1}{n \cdot m} \sum{i=1}^{m} \sum{j=1}^{n} a{i,j}$. In code, we could just call nd.mean() tensors of arbitrary shape:
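For comparison, a well-formed version of the two formulas, reconstructed from the quoted text (index ranges taken from it as-is), would presumably read:

```latex
\frac{1}{d} \sum_{i=1}^{d} u_i
\qquad \text{and} \qquad
\frac{1}{n \cdot m} \sum_{i=1}^{m} \sum_{j=1}^{n} a_{i,j}
```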

Documentation improvement for Automatic Differentiation

In the documentation section on head gradients and the chain rule, I think it would be better to explain the context behind the head gradient in a bit more detail.
For example, the CS231N class notes explain backprop in terms of an incoming gradient (the gradient on a node's output) and a local gradient, in the section "Intuitive understanding of backpropagation". If I am correct, the incoming gradient is what is referred to here as the head gradient, and I believe adding that explanation to the documentation would make it more intuitive for readers.

Please let me know if my understanding is correct, I will update the documentation and raise a pull request.
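The reading proposed here (head gradient = incoming gradient, multiplied by the local gradient) can be sketched for a single node in plain Python. The function names and numbers are illustrative only and are not taken from the notebook.

```python
def square_forward(x):
    return x * x


def square_backward(x, head_grad):
    """Gradient w.r.t. x = incoming (head) gradient * local gradient d(x^2)/dx."""
    local_grad = 2.0 * x
    return head_grad * local_grad


# z = 3 * y with y = x^2, so dz/dy = 3 is the head gradient arriving at y.
x = 4.0
y = square_forward(x)
head_grad = 3.0
dz_dx = square_backward(x, head_grad)
print(dz_dx)  # 24.0
```

The node only needs its own local derivative; everything downstream of it is summarized by the single head gradient passed in.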

Reorganization

Hi all contributors --

A warning. In the next 24 hours @mli and I are restructuring the book to contain all chapters in folders. This will make the repo more navigable and will allow us to strip the P**-C**- from section names. That will be awesome because we'll be able to move notebooks around within a section without breaking all the html links.

However it might screw up your commits by creating a crap ton of conflicts so please bear with us for the next day so we can get this right. Thanks!!!

ch5, rnns-gluon.ipynb; various feedback

This was much harder to follow than the other three in this chapter. I think it would have been helpful to first have had the gluon port of what the other three do, so we can compare speed, readability, etc. And in particular, see how to write the sample() function, which is the big thing missing in rnns-gluon.ipynb.
(Or, rephrased, a big appeal of the other three notebooks in this chapter was that we could watch the learning, and could easily substitute in our own starter sentences.)

There was no mention of temperature here, though this was a very interesting way to control the output.
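For readers unfamiliar with the term, a common formulation of temperature is to divide the logits by it before the softmax. The sketch below shows that formulation in plain Python; it is not necessarily what the other notebooks in the chapter do, and the function name is hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature, shifted by the max for stability."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the most likely token (more conservative text), while higher temperatures flatten the distribution (more varied text).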

It is not clear what tie_weights is doing. It defaults to True; why, and in what situations would we set it to False?

How is encoder working? Does it learn a word2vec encoding for the entire training data, and then convert the entire training data, and then that is what is divided into batches? If so, I guess a sample() function would have to take each 100-dimensional word vector output and find the closest word?

That would make sense to me, but https://github.com/apache/incubator-mxnet/blob/master/example/rnn/lstm_bucketing.py appears to be creating an embedding for each batch, in isolation, before training on it. (I could be wrong on that, as that seems a silly thing to do) BTW, that example has the same problem: failing to show how to use the model to generate text. (See also https://stackoverflow.com/q/42671658/841830 )

cannot download ssd_pretrained.params in P06-C03-object-detection.ipynb

It seems the download link is broken.

epochs = 150  # set larger to get better performance
log_interval = 20
from_scratch = False  # set to True to train from scratch
if from_scratch:
    start_epoch = 0
else:
    start_epoch = 148
    pretrained = 'ssd_pretrained.params'
    sha1 = 'fbb7d872d76355fff1790d864c2238decdb452bc'
    url = 'https://apache-mxnet.s3-accelerate.amazonaws.com/gluon/datasets/pikachu/ssd_pretrained.params'
    if not osp.exists(pretrained) or not verified(pretrained, sha1):
        print('Downloading', pretrained, url)
        download(url, fname=pretrained, overwrite=True)
    net.load_params(pretrained, ctx)
('Downloading', 'ssd_pretrained.params', 'https://apache-mxnet.s3-accelerate.amazonaws.com/gluon/datasets/pikachu/ssd_pretrained.params')
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-23-68fbd88e5a12> in <module>()
     11     if not osp.exists(pretrained) or not verified(pretrained, sha1):
     12         print('Downloading', pretrained, url)
---> 13         download(url, fname=pretrained, overwrite=True)
     14     net.load_params(pretrained, ctx)

/home/lxy/anaconda2/lib/python2.7/site-packages/mxnet/test_utils.pyc in download(url, fname, dirname, overwrite)
    984 
    985     r = requests.get(url, stream=True)
--> 986     assert r.status_code == 200, "failed to open %s" % url
    987     with open(fname, 'wb') as f:
    988         for chunk in r.iter_content(chunk_size=1024):

AssertionError: failed to open https://apache-mxnet.s3-accelerate.amazonaws.com/gluon/datasets/pikachu/ssd_pretrained.params
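The snippet above relies on a verified(pretrained, sha1) helper that is not shown in the issue. A hypothetical equivalent, useful for checking a manually downloaded checkpoint against the expected SHA-1 before loading it, might look like this (demonstrated on a temporary file rather than the real checkpoint):

```python
import hashlib
import os
import tempfile


def sha1_matches(path, expected_sha1, chunk_size=1 << 20):
    """Return True if the file exists and its SHA-1 hex digest equals expected_sha1."""
    if not os.path.exists(path):
        return False
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large checkpoints do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha1.lower()


# Demo on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name

demo_ok = sha1_matches(demo_path, hashlib.sha1(b"hello").hexdigest())
demo_bad = sha1_matches(demo_path, "0" * 40)
os.remove(demo_path)
print(demo_ok, demo_bad)  # True False
```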

fails to run on the list of cpus

Unable to use all cpu cores.
Notebook P04-C02-cnn-gluon.ipynb.
Changed the second line to ctx = [mx.cpu(i) for i in range(4)]
Mxnet's version is 0.10.1 (stable).

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-4359ea201a97> in <module>()
      5     moving_loss = 0.
      6     for i, batch in enumerate(train_data):
----> 7         data = batch.data[0].as_in_context(ctx)
      8         label = batch.label[0].as_in_context(ctx)
      9         with autograd.record():

/Users/dizcza/pkgs/mxnet/python/mxnet/ndarray.py in as_in_context(self, context)
   1022         if self.context == context:
   1023             return self
-> 1024         return self.copyto(context)
   1025 
   1026     def attach_grad(self, grad_req='write'):

/Users/dizcza/pkgs/mxnet/python/mxnet/ndarray.py in copyto(self, other)
    973             return _internal._copyto(self, out=hret)
    974         else:
--> 975             raise TypeError('copyto does not support type ' + str(type(other)))
    976 
    977     def copy(self):

TypeError: copyto does not support type <class 'list'>

a batch_size question in Linear regression with gluon

epochs = 2
smoothing_constant = .01

for e in range(epochs):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx)
        label = label.as_in_context(ctx)
        with autograd.record():
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()
        trainer.step(batch_size)  # how to specify the batch_size when training with multiple gpus? also 4 if i use 4 gpus.
        ....

Thanks.
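On the question in the code comment: assuming the trainer sums the gradients coming from all devices and then normalizes by 1/batch_size, the value to pass would be the total batch size across all GPUs, not the per-GPU share. The arithmetic behind that can be checked with a toy one-parameter model in plain Python (the data and the four "devices" are made up):

```python
def grad_sum(xs, ys, w):
    """Sum over examples of d/dw (w*x - y)^2 = 2 * (w*x - y) * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
w = 0.5
batch_size = len(xs)

# Single device: mean gradient over the whole batch.
single = grad_sum(xs, ys, w) / batch_size

# Four hypothetical devices with two examples each: per-device gradient
# sums are added together, then divided by the TOTAL batch size.
num_devices = 4
shard = batch_size // num_devices
per_device = [
    grad_sum(xs[i:i + shard], ys[i:i + shard], w)
    for i in range(0, batch_size, shard)
]
multi = sum(per_device) / batch_size

print(abs(single - multi) < 1e-12)  # True
```

Dividing by the per-device size instead would make the effective step four times larger in this setup.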

Numerical instability on CNN from scratch

after getting to 98+% accuracy, the CNN from scratch tends to blow up, might be a problem with the bespoke softmax.

Things to look into:
nansum, log softmax, actually debugging the fracking thing.
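As a starting point for the log-softmax idea, here is a minimal sketch of the usual log-sum-exp trick in plain Python. It is a generic technique, not the notebook's code, and the function names are made up.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax using the log-sum-exp trick."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return [z - log_sum_exp for z in logits]


def cross_entropy(logits, label):
    """Negative log-probability of the true class."""
    return -log_softmax(logits)[label]


# Very confident logits: a naive math.exp(1000.0) would overflow, and a
# naive log(softmax) would hit log(0). The shifted version stays finite.
logits = [1000.0, 0.0, -1000.0]
print(cross_entropy(logits, 0))
print(cross_entropy(logits, 1))
```

Computing the loss directly from logits this way avoids taking the log of a probability that has already underflowed to zero, which is one plausible source of the blow-up at high accuracy.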

suggestions

  1. ndarray

    • slicing, one can write multiple dimensions, such as x[2:3, 3:4] = 2
    • context. cpu(0) and cpu(1) are the same; both represent all CPU cores. People may think cpu(0) means core 0, whereas gpu(0) does mean the 0-th GPU. As discussed before, we should add a try_gpu_ctx().
  2. autograd

    • beginning of head gradients: you are using x, y and z in the text, but x, y, f in the code
    • end: should print(a.grad)
  3. lr scratch

    • loss function: can we use (yhat-y)**2?
    • last cell
      • can we move unnecessary stuff out of record()?
      • if we set moving_loss=0 within the first for loop, we don't need to check if i==0.
      • we don't need to call asnumpy(); I remember there is nd.mean(), after which we can call asscalar(). That should perform better when running on a GPU.

another question is, will you also add a softmax-gluon?

Consider merging scratch/gluon notebooks

(Just for discussion.)
I'm mainly interested in reading at as high a level as possible. However, I'm finding I have to go through everything: I tried just reading the gluon notebooks, but they assume you've read through the scratch versions, for things like introducing datasets and how to load them.

If this were a print book, one alternative would be to have the -scratch and -gluon versions in the same section, with the -scratch versions done as sidebars. I don't know if such a thing is possible in Jupyter notebooks.

Another alternative would be to split two notebooks into three; a first one introducing the data sets, how to load and prepare, and any theory. Then a short one on how to implement from scratch, then another short one on how to implement in gluon.

Slides from KDD'17 tutorial?

Hi, I attended the excellent tutorial you did with Alex yesterday at KDD. You mentioned that the slides would be posted; could you point me to where they will be once they are up? Thanks.

P03-C01 / P03-C02 - Moving Loss / Hyperparameters

I have two overall comments:

  1. It would be helpful to explain the moving loss calculation and why it is important. The moving loss calculation is not entirely intuitive and may seem arbitrary to beginners. I think it is important to frame it from the perspective that it helps SGD. In addition, here we've focused on a momentum-based approach, but I think it would be good to introduce learning rate decay as well (we should also explain the learning rate).

  2. It would be helpful to understand how to choose appropriate values for the hyperparameters introduced (e.g., learning rate and smoothing constant). Choosing appropriate values for hyperparameters is difficult, and here they are chosen for you without any explanation of why they make sense. This is especially confusing when different learning rates are used for the Gluon implementation of the neural network versus the from-scratch one (0.1 versus 0.001).
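For context on the first comment, the moving loss can be sketched as an exponential moving average of the per-batch loss. The version below is an assumption about the general shape of the calculation (the helper name and the loss values are made up); in this form it only smooths the number that gets reported and does not feed back into the parameter update.

```python
def update_moving_loss(moving_loss, current_loss, smoothing_constant, first_batch):
    """Exponential moving average of the per-batch loss."""
    if first_batch:
        # Start from the first observed loss instead of an arbitrary zero.
        return current_loss
    return (1.0 - smoothing_constant) * moving_loss + smoothing_constant * current_loss


batch_losses = [4.0, 1.0, 3.0, 0.5, 2.5, 0.4]  # made-up noisy losses
moving_loss = 0.0
for i, loss in enumerate(batch_losses):
    moving_loss = update_moving_loss(
        moving_loss, loss, smoothing_constant=0.01, first_batch=(i == 0)
    )
    print(i, loss, round(moving_loss, 4))
```

A small smoothing constant makes the reported value change slowly, which hides batch-to-batch noise but also lags behind real improvements.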

Feedback for ch02

Shouldn't there be an introduction for the chapter?

chapter02_supervised-learning/linear-regression-scratch.ipynb

  • Instructions for installing matplotlib appear here, but notebooks in ch01 used it without describing how to install it. Just put those prereqs somewhere at the beginning?
  • "Train a model means making" >> "Training..."
  • Need explanation for this choice: "In this case, we'll use the squared distance between our prediction and the true value."
  • This needs elaboration: "It turns out that linear regression actually has a closed-form solution. "

chapter02_supervised-learning/linear-regression-gluon.ipynb

chapter02_supervised-learning/perceptron.ipynb

  • Perceptron == E. coli?? I do not see the analogy
  • This page is much more math-intensive than what has preceded. Rework to provide text-oriented explanation with subsequent deep dive on math?

chapter02_supervised-learning/softmax-regression-scratch.ipynb

  • What is the basis for this claim? Maybe just say it's widely used. "The relevant loss function here is called cross-entropy and it may be the most common loss function you'll find in all of deep learning. That's because at the moment, classification problems tend to be far more abundant than regression problems."
  • Inappropriate tone (defined as mild oath by Merriam Websters): Jeepers.
  • Authorial voice of book is "we" in Ch01: "I reviewed"

chapter02_supervised-learning/softmax-regression-gluon.ipynb

  • Peculiar turn of phrase: "We won't suck up too much wind"

chapter02_supervised-learning/regularization-scratch.ipynb

  • no prior mention in book of decision trees: "decision trees versus neural networks"
  • typo: formulizes

chapter02_supervised-learning/loss.ipynb

  • Raw math formula: $$\mathop{\mathrm{minimize}}_f \sum_i |y_i - f|$$

chapter02_supervised-learning/environment.ipynb

  • Needs elaboration: "he click-through rate for NOKIA phone ads"
  • authorial voice: " I had"
  • typo: bild
  • raw math formula: $$\mathop{\mathrm{minimize}}_w \frac{1}{m} \sum_{i=1}^m l(x_i, y_i, f(x_i)) + \frac{\lambda}{2} \|w\|_2^2$$
  • No navigation to the first section of chapter 3

P01-C04 render & math problem

  • Jupyter throws error when loading the page. It seems to be fine in the last commit.
  • Section Conditional Probability, 0.01 * 0.03 should be 0.0003

TOC link missing in preface

The table of contents (TOC) link is missing in the first notebook. I've forked the repo and will submit a pull request. I decided to set the link to the TOC for this repo, which seems like a reasonable first solution. That being said, one could envision the link going to the TOC of each individual's forked repo, but I won't do that at the moment.

Copy and paste error in P02_C02

Looks like the gluon version of linear regression still has the training steps from the hand-rolled example of linear regression.

Generate predictions (yhat) and the loss (loss) by executing a forward pass through the network.
Calculate gradients by making a backwards pass through the network (loss.backward()).
Update the model parameters by invoking our SGD optimizer.

Question about Performance: gluon vs from scratch

Running the tutorials in different device context, I noticed that on my workstation (20 core Intel [email protected]), when using the cpu, that the code written from scratch tends to only use one cpu core at a time while the rest idle. When using gluon, the same context declaration will use all of the CPU cores fully.

However, when using the same model (I adapted the number of hidden neurons in tutorials P03-C01-scratch and P03-C02-gluon to 256 and 128, used ReLU activations for both scripts) and using the same evaluation function (the one for the "from scratch" tutorial) and measuring the time required to run training on an epoch of data (after resetting the iterator), I found that the gluon-based code takes slightly more than double the time per epoch, even though all CPU cores are in use.

I have obtained similar results for the tutorials P04-* , dealing with CNN architectures:
The gluon.nn.*-based model requires 4.8 times the time per epoch, while the model inheriting from gluon.Block "only" requires roughly double the time per epoch, compared with the model built from scratch.

Is this a hardware-related issue (e.g. would an i7 or an AMD processor benefit from gluon and experience no penalties), are the models too shallow for efficient parallelization, or is mxnet poorly optimized for multiple CPU cores? Why is gluon so much slower despite using all CPU cores instead of just one?

Executing all notebooks in a GPU context results in a slight run-time advantage when using gluon. Inheriting a model from gluon.Block instead of building the same model with gluon.nn.* is about 20% faster in training for the P04-* tutorials.
