
Data science for economists

Lectures | Details | FAQ | License

Lectures

Note: While I have provided PDF versions of the lectures, they are best viewed in the original HTML format.

  1. Introduction [.html | .pdf | .Rmd]
  2. Version control with Git(Hub) [.html | .pdf | .Rmd]
  3. Learning to love the shell [.html | .pdf | .Rmd]
  4. R language basics [.html | .pdf | .Rmd]
  5. Data wrangling & tidying
  6. Webscraping: (1) Server-side & CSS [.html | .pdf | .Rmd]
  7. Webscraping: (2) Client-side & APIs [.html | .pdf | .Rmd]
  8. Regression analysis in R [.html | .pdf | .Rmd]
  9. Spatial analysis in R [.html | .pdf | .Rmd]
  10. Functions in R: (1) Introductory concepts [.html | .pdf | .Rmd]
  11. Functions in R: (2) Advanced concepts [.html | .pdf | .Rmd]
  12. Parallel programming [.html | .pdf | .Rmd]
  13. Docker [.html | .pdf | .Rmd]
  14. Google Compute Engine
  15. HPC (UO Talapas cluster) [Guest lecture]
  16. Databases [.html | .pdf | .Rmd]
  17. Spark [.html | .pdf | .Rmd]
  18. Machine learning
  19. Workflow & project management

Details

This is a graduate course taught by Grant McDermott at the University of Oregon. Here is the course description, taken from the syllabus:

This seminar is targeted at economics PhD students and will introduce you to the modern data science toolkit. While some material will likely overlap with your other quantitative and empirical methods courses, this is not just another econometrics course. Rather, my goal is to bring you up to speed on the practical tools and techniques that I feel will most benefit your dissertation work and future research career. This includes seemingly mundane skills, generally excluded from the core graduate curriculum, which are nevertheless essential to any scientific project. We will cover topics like version control (Git) and project management; data acquisition, cleaning and visualization; efficient programming; and tools for big data analysis (e.g. relational databases, cloud computation and machine learning). In short, we will cover things that I wish someone had taught me when I was starting out in graduate school.

Please do read the rest of the syllabus before you go through the lectures. This will detail software requirements and installation, and give you a better sense of the full aims and scope of the course. I also have an "FAQ" section at the end that covers frequently asked questions (or, at least, potentially asked questions). Speaking of which, here follow answers to some questions that are more specifically related to this repo.

FAQ

How do I download this material and keep up to date with any changes?

Please note that this is a work in progress, with new material being added every week.

If you just want to read the lecture slides or HTML notebooks in your browser, then you should simply scroll up to the Lectures quicklinks section at the top of this page. Completed lectures will be hyperlinked as soon as they have been added. Remember to check back in regularly to get any updates. Or, you can watch or star the repo to get notified automatically.

If you actually want to run the analysis and code on your own system (highly recommended), then you will need to download the material to your local machine. The best way to do this is to clone the repo via Git and then pull regularly to get updates. Please take a look at these slides if you are unfamiliar with Git or are unsure how to do any of that. Once that's done, you will find each lecture contained in a numbered folder (e.g. 01-intro). The lectures themselves are written in R Markdown and then exported to HTML format. Click on the HTML files if you just want to view the slides or notebooks.
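The clone-then-pull routine can be wrapped in a small shell function. The function name `sync_lectures` is hypothetical. The default URL is also an assumption, since this page does not link the repo, so pass whatever address the GitHub page actually shows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone the lecture repo on the first run and simply
# pull new material on every later run.
sync_lectures() {
  # Assumed default URL; pass the address shown on GitHub if it differs.
  local url="${1:-https://github.com/uo-ec607/lectures.git}"
  # Local folder that will hold the numbered lecture folders (01-intro, ...).
  local dir="${2:-lectures}"

  if [ -d "$dir/.git" ]; then
    # Already cloned: fast-forward to the latest upstream commits.
    git -C "$dir" pull --ff-only
  else
    # First run: make a full local copy, history included.
    git clone "$url" "$dir"
  fi
}
```

Run `sync_lectures` once to create the local folder, then run it again whenever you want updates. With `--ff-only`, `git pull` stops with an error instead of creating a merge commit if your local history has diverged from the remote. This keeps your copy a clean mirror of the course material.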

I've spotted a mistake or would like to contribute

Please open a new issue. Better yet, please fork the repo and submit an upstream pull request. I'm very grateful for any contributions, but may be slow to respond while this course is still being developed. Similarly, I am unlikely to help with software troubleshooting or conceptual difficulties for non-enrolled students. Others may feel free to jump in, though.

Can I use/adapt your material for a similar course that I'm teaching?

Sure. That's partly why I have made everything publicly available. I only ask two favours. 1) Please let me know (email/Twitter) if you do use material from this course, or have found it useful in other ways. 2) An acknowledgment somewhere in your own syllabus or notes would be much appreciated.

Are you willing to teach a (condensed) version of this course at my institution?

Possibly. Please contact me if you would like to discuss further.

What are you using to produce these lecture slides/notebooks?

All of the lecture material is written in R Markdown. For the slide decks (lectures 1-5) I'm using xaringan. For the notebooks (lecture 6 onwards) I'm using my lecturenotes template.

Do you plan to turn these lecture notes into a book?

Yes! My friend and colleague Ed Rubin and I are slowly porting our combined lecture material to a book under the tentative title "Data science for economists and other animals".

License

The material in this repository is made available under the MIT license.

People

Contributors

beeb22, canovasjm, drlafave, eddelbuettel, grantmcdermott, julienolivier3, katiagallegost, kevinxperese, luispfonseca, mizuhirosuzuki, nk027, pat-s, shambhavipriyam, shitao5


Issues

Typos on title of slides 42-43 of lecture 2 (Git)

Hi,

Many thanks for making this amazing course public, I'm learning a lot! I just wanted to point out that you titled slides 42-43 of your second lecture ".gitgnore" (you missed the second "i"). I thought maybe it could be misleading in some situations.

Best,

Pierre

A Small Typo Lecture Note 2 #40

Thank you very much for sharing your course material!

I just saw a typo in the first line of page 40 when I was reading lecture note 2: "and" appears twice instead of once. It isn't misleading, but it should be a quick and easy fix.

Broken link on slide 42 of lecture 5

The twitter link to the 'long-running joke' about Stata's reshape command no longer works (Scott Imberman seems to have changed his handle and deleted past tweets?). The link's on slide 42 of lecture 5.

Thank you for sharing these materials: it's hugely appreciated!

Some thoughts on Lecture 3

I was looking through your shell slides and they are awesome!!!

As I was reading I had a couple comments, which are below. Feel free to ignore (some of these are stylistic). Thanks again for posting these slides!

  • I find it important to know that in a man page /term will search for term.
  • You might want to recommend single quotation marks for quoted names, or else people will get confused when special symbols end up breaking things (e.g. your sed commands use single quotes)
  • On slide 36 /n should be \n
  • Not sure you want to bring it up, but head and tail also have a character mode that is often useful for files with long lines (head -c 1000)
  • A bit of a pain that always comes up when I teach this stuff: Mac installs the BSD utilities instead of the GNU utilities. So if people start exploring grep and sed, they’re probably going to hit issues pretty quickly (see https://unix.stackexchange.com/questions/13711/differences-between-sed-on-mac-osx-and-other-standard-sed)
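Several of the suggestions in this list can be shown in runnable form for a POSIX-like shell. The sample file name and strings are made up. The attached-suffix `sed -i.bak` form is one common workaround for the GNU/BSD difference, not necessarily what the lecture uses. The `/term` search inside `man` is interactive, so it is not shown.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1

# Single quotes pass text through literally; double quotes expand variables.
name="world"
echo 'hello $name'   # prints: hello $name
echo "hello $name"   # prints: hello world

# A small sample file (the name demo.txt is just for illustration).
printf 'alpha beta gamma delta\n' > demo.txt

# head -c N prints only the first N bytes, useful when lines are very long.
head -c 5 demo.txt; echo   # prints: alpha

# The newline escape is \n (backslash, not slash). tr understands it on both
# GNU and BSD systems, so this prints each word on its own line.
tr ' ' '\n' < demo.txt

# Portable in-place edit: attaching the backup suffix to -i (no space) works
# with GNU sed (Linux) and BSD sed (macOS). Remove the backup afterwards.
sed -i.bak 's/beta/BETA/' demo.txt && rm demo.txt.bak
cat demo.txt   # prints: alpha BETA gamma delta
```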

Installed WSL and now my laptop fan is perpetually whirring

Sorry to bother you, and thanks for putting the course online!
I followed along with L3, and installed WSL in PowerShell using wsl --install
I rebooted my computer and the fan has been going mental as it typically does when running large simulations.
I've uninstalled Ubuntu using the walkthrough here, but the laptop is still whirring. Do you have any advice? It may settle down overnight, but I'd be grateful to know if there's a way to fully undo what wsl --install did.

Thank you! And apologies again.

Slightly misleading description of line endings in Git

The "culprit" is the fact that Git adds an invisible character at the end of every line. This is how Git tracks changes. (More info [here](https://help.github.com/articles/dealing-with-line-endings/).)

Line endings ("invisible characters") aren't added by Git; they're always there, whether you're in a Git repo or not. Most people don't notice them under normal circumstances because their text editors handle it seamlessly. It only comes up with Git because, by default, Git considers "stuff\r\n" and "stuff\n" to be different lines.

(These slides are great, by the way!)
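You can check the point in this comment for yourself. The sketch writes the same word with a Windows-style (CRLF) ending and a Unix-style (LF) ending, then compares the bytes, with no Git involved. The file names are invented for illustration. `core.autocrlf` is a real Git setting, but the right value depends on your platform.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory; the file names are illustrative only.
cd "$(mktemp -d)" || exit 1

printf 'stuff\r\n' > windows.txt   # carriage return + line feed (CRLF)
printf 'stuff\n'   > unix.txt      # line feed only (LF)

wc -c < windows.txt   # 7 bytes
wc -c < unix.txt      # 6 bytes
od -c windows.txt     # shows the hidden \r just before \n

# The bytes differ, which is why Git treats the two lines as different.
cmp -s windows.txt unix.txt || echo "files differ"

# Deleting the carriage returns makes the two files identical.
tr -d '\r' < windows.txt > normalized.txt
cmp -s normalized.txt unix.txt && echo "identical once CR is removed"

# Git can normalise endings on commit; a common choice on macOS/Linux is:
#   git config --global core.autocrlf input
# (and core.autocrlf true on Windows).
```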

Parallelization, furrr, and future

I could be wrong, but I think you need to update the seed setting for the furrr and future (future.apply) functions. furrr functions want an additional argument .options = future_options(seed = T), and future (future.apply) functions want a similar argument future.seed = T.
