Notes on chapter01_crashcourse/introduction.ipynb
Preface:
• Tone is flippant
• Vocabulary level too high for ESL readers e.g. “cognizant”, “buffoonery”
• Eliminate aspirational / modest statements
• Entire preface could be reduced to “mxnet-the-straight-dope is an educational resource for deep learning that leverages the strengths of Jupyter notebooks to present prose, graphics, equations, and (importantly) code together in one place. The result will be a resource that could be simultaneously a book, course material, a prop for live tutorials, and a resource for plagiarising (with our blessing) useful code.”
Learning by doing – who is “I”? Rest of intro uses authorial “we”
Introduction
• Inappropriate vocabulary level: fabricated, pedagogical
• Redundant: "we ourselves are nonetheless capable of performing the cognitive feat ourselves."
• Saying that you turn knobs is usually a reference to hyperparameter tuning, not parameter setting
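If the revised text needs to make the knob distinction concrete, a minimal sketch could help (all names and values here are illustrative, not from the notebook): hyperparameters are the knobs we set by hand; parameters are what the learning algorithm sets from data.

```python
# Illustrative sketch: parameters vs. hyperparameters in a tiny linear model.
learning_rate = 0.1   # hyperparameter: a knob we turn by hand
n_epochs = 100        # hyperparameter

w, b = 0.0, 0.0       # parameters: learned from data, never hand-tuned
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # points on y = 2x + 1

for _ in range(n_epochs):
    for x, y in data:
        err = (w * x + b) - y
        w -= learning_rate * err * x   # gradient steps update the parameters
        b -= learning_rate * err
# w converges toward 2.0 and b toward 1.0
```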
• Dysfluent: Generally, our model is just a machine transforms its input into some output.
• Typo: English langauge
• Acronym ML is used without being defined
• “sucks less” – rephrase
• “model is dope” rephrase
• dysfluent: They're mostly because they are problems where coding we wouldn't be able program their behavior directly in code, but we can program with data
• Oftentimes >> Often
• Dysfluent: To get going with at machine learning
• Rephrase: Generally, the more data we have, the easier our job as modelers.
• Structured data: I would not call a Jupyter notebook structured data. It’s unstructured but marked up
• Typos: ingesting high resolution image deep neural networks
• deep neural networks >> deep artificial neural networks
• Models section: bulleted section beginning “loss functions” appears with no connection to running text.
• Loss functions: AMZN stock prediction is given as an example of a loss function; it is a prediction task, not a loss function.
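For the revision it may help to show what a loss function actually is: it scores the gap between one prediction and one target; predicting the stock price is the task. A minimal sketch with made-up values:

```python
def squared_loss(y_hat, y):
    """Squared loss: scores how far a prediction is from its target."""
    return (y_hat - y) ** 2

# Predicting a stock price is the *task*; the loss scores one prediction.
print(squared_loss(101.0, 100.0))  # off by 1.0 -> loss 1.0
```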
• Training section: “the latter” – the latter what? There are not two antecedents
• Trained error: italicized f is used without introduction.
• Incomplete sentence: “Encouraging but by no means a guarantee.”
• Rephrase: “This can be off by quite a bit (statisticians call this overfitting).” The point to make is that the error on test data can be greater than the error on the training data.
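That point could be supported by a tiny sketch (hypothetical data and plain NumPy; nothing here is from the chapter) in which training error is essentially zero while held-out error is not:

```python
import numpy as np

# Five noisy training points around y = x^2, plus one held-out test point.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = x_train ** 2 + np.array([0.5, -0.4, 0.3, -0.2, 0.1])  # fixed "noise"
x_test = np.array([2.5])
y_test = np.array([6.25])  # true value of x^2 at 2.5

# A degree-4 polynomial interpolates all five training points exactly,
# so training error is ~0 -- but the error on the held-out point is not.
coefs = np.polyfit(x_train, y_train, deg=4)
train_err = float(np.mean((np.polyval(coefs, x_train) - y_train) ** 2))
test_err = float(np.mean((np.polyval(coefs, x_test) - y_test) ** 2))
```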
• “one aims to do” – tone difference from colloquial “you” throughout
• Supervised learning: too many terms used without introduction: x, y, targets, inputs
• Incomplete sentence “Predict cancer vs not cancer, given a CT image.”
• “Perhaps the simplest supervised learning task wrap your head around in regression”. I think predicting labels is much simpler.
• Term vector should have been introduced much earlier
• Typo: whacky. What purpose is served by introducing notation?
• “Lots of practical problems are well described regression problems.” >> “Lots of practical problems can be formulated as regression problems”
• dysfluent: Imagine, for example assume
• Eliminate discussion of L1 loss – way too much detail for the place where we’re describing the kinds of learning algorithms
• Fix: In classification, we want to look at a feature vector and then say which among a set of categories (formally called classes) an example belongs to.
• Paragraph starting “more formally”. Mangled text. Unnecessary math symbols and terminology
• Death cap example: eliminate math
• Extensive spelling errors
• dysfluent: But not matter accurate
• “This problem emerges in the biomedical literature where correctly taggin articles is important because it allows researchers to do exhaustive reviews of the literature.” It doesn’t emerge there. Applies there perhaps?
• “A possible solution to this problem is to score every element in the set of possible sets with a relevance score and then retrieve the top-rated elements.” >> “A possible solution to this problem is to score every element in the set of possible sets with a relevance score and then display the top-rated elements.”
• Recommender systems: “Generally, such systems strive to…” Eliminate math symbols or at least fix the funky rendering – it looks like a superscript u for user
• “So far we've looked at problems where we have some fixed number of inputs and produce a fixed number of outputs. Take some features of a home (square footage, number of bedrooms, number of bathrooms, walking time to downtown), and predict its value. Take an image (of fixed dimension) and produce a vector of probabilities (for a fixed number of classes). Take a user ID and an product ID and predict a star rating. And once we feed our fixed-length input into the model to generate an output, the model immediately forgets what it just saw.”
o A common idiom in the preceding text is “Take X for example” so I initially garden-pathed on these examples. One example is sufficient, preceded by “for example”.
o The preceding text did not stipulate that the input vector is fixed length. Nor did it stipulate that the labels are a fixed set.
• Automatic speech recognition: “In other words, this is a seq2seq problem where the output is much shorter than the input.” That is a very peculiar way to describe it: you’re comparing length (in ms) to length (in chars), which is not mathematically valid. Ditto for the TTS discussion.
• Machine Translation: “Unlike in the previous cases where the order of the inputs was preserved, in machine translation, order inversion can be vital.” Which previous examples?
o Speech recognition doesn’t preserve order, even in English e.g. “$10” is pronounced “ten dollars”
o “obnoxious tendency” this is offensive and English-centric. Remove
o Reordering is one problem with MT. A bigger problem is the many-to-many mappings of words across languages e.g. several words in one language may map to one word in another.
• Unsupervised learning: rephrase: extremely anal boss.
• Rephrase: pretty lame.
• Why do the examples of unsupervised learning only get bullet points and not sub-sections? They’re just as important and, with work on autoencoders etc., a huge research area.
• Environment: dysfluent: “So far we didn't discuss at all yet,”
• Monikers >> terms
• “there is a large area of situations where” >> “There are many situations where”
• “Needless to say, “ then don’t say it. Or use a different discourse connective
• “However there are many cases…” but then the text doesn’t explicitly connect to the images that follow.
• Conclusion: does not summarize the section. Total non-sequitur. Says the chain rule is easy, but there is no mention of the chain rule on that page or on the page linked to.