Comments (16)
I would pick the first one to start with.
https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1492032646/
There is also this book that is available online and in paper:
https://www.deeplearningbook.org/
https://mitpress.mit.edu/books/deep-learning
from practical-statistics-for-data-scientists.
Thank you for the contribution. I'm reluctant to add it to the repository in its current form. However, we may come up with a way to do it. I have a second, private repository in addition to the public repository. The private repository contains only the notebooks, along with some additional code to create the figure files that were used for the book. Whenever I make changes to the code, I modify the private repository and then run a script that takes each notebook, strips out the code that is book-specific, creates notebooks and code files, and runs each of them. On success, the files are copied to the public repository. This has the advantage that I only need to update one file and can create all of the others automatically. This is the reason why I would like to keep the R and Python directories under my full control.
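The strip-and-export step described above could look roughly like the following. This is only a sketch under stated assumptions, not the actual script: the marker comment `# book-only`, the function names, and the in-memory notebook are all hypothetical, and the real workflow also runs the generated files before copying them.

```python
import json

# Hypothetical marker; the real script's convention is not described in the thread.
BOOK_ONLY_MARKER = "# book-only"


def cell_text(cell):
    """Return a cell's source as one string (the format allows str or list of str)."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def strip_book_cells(notebook):
    """Return a copy of the notebook without code cells that start with the marker."""
    kept = [
        cell
        for cell in notebook.get("cells", [])
        if not (
            cell.get("cell_type") == "code"
            and cell_text(cell).lstrip().startswith(BOOK_ONLY_MARKER)
        )
    ]
    return {**notebook, "cells": kept}


def extract_code(notebook):
    """Concatenate the remaining code cells into one plain script."""
    parts = [cell_text(c) for c in notebook["cells"] if c.get("cell_type") == "code"]
    return "\n\n".join(parts) + "\n"


# Tiny in-memory notebook standing in for a file in the private repository.
private_nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Chapter 1"]},
        {"cell_type": "code", "metadata": {}, "source": ["x = 1 + 1\n", "print(x)"]},
        {"cell_type": "code", "metadata": {}, "source": ["# book-only\n", "save_figure()"]},
    ],
}

public_nb = strip_book_cells(private_nb)
public_script = extract_code(public_nb)

print(json.dumps(public_nb, indent=1)[:60])
print(public_script)
```

Keeping a single source notebook and deriving everything else from it is what makes the one-file update possible; the marker convention is just one way to flag book-specific cells.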
Coming back to your suggestion. What we could do is have a contrib directory where we add code contributed by the community. This could be something like your contribution or variations of the code using different packages (e.g. ggplot and not base-R plotting, or building models in scikit-learn using pipelines). I would not take responsibility for maintaining the code in this directory.
What do you think of this?
There is no need to have write access. Instead of creating a new repository, you fork this one and make changes in the forked repository. You can then create a pull request from your forked repository into mine. Here is a screenshot that should explain how the fork can create a pull request to the original repository.
The command
print(np.mean(perm_diffs > mean_b - mean_a))
is correct, but it probably warrants some explanation. perm_diffs is a vector of possible differences of means for A and B. mean_b - mean_a is a number, the actual difference between the means of A and B.
perm_diffs > mean_b - mean_a is a boolean vector of the same length as perm_diffs, with True where the corresponding element of perm_diffs is greater than the actual difference of the means and False otherwise, e.g.
[True, True, False, ...., False, True]
Python and R (and a lot of other languages) interpret True as 1 and False as 0. Calculating the mean of this vector therefore gives the proportion of True values. In the book this is 0.121. This is what we want to know.
print(np.mean(perm_diffs) > mean_b - mean_a)
on the other hand will print either True or False.
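To make the difference between the two parenthesizations concrete, here is a small sketch. The data are simulated and the names observed_diff, exceeds, p_value, and single_flag are illustrative assumptions, not taken from the book's code; the value it prints will not match the book's 0.121.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the book's variables: 1,000 simulated
# permutation differences and an observed difference of the group means.
perm_diffs = rng.normal(loc=0.0, scale=1.0, size=1000)
observed_diff = 1.2  # plays the role of mean_b - mean_a

# Element-wise comparison: one boolean per permutation.
exceeds = perm_diffs > observed_diff

# Mean of booleans = proportion of True values, i.e. the share of
# permutations whose difference exceeds the observed one.
p_value = np.mean(exceeds)

# Misplaced parenthesis: this compares the mean of perm_diffs with the
# observed difference and yields a single boolean.
single_flag = np.mean(perm_diffs) > observed_diff

print(p_value)
print(single_flag)
```

The first result is a number between 0 and 1; the second is a single True or False and carries no information about how often the permuted difference exceeds the observed one.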
Thanks for spotting the typo in the filename. It is now corrected.
This issue was reported before (#23), but I never got the versions. The problem is that mean_a and mean_b are float and not numpy.float64. The means come from pandas, so it must be an inconsistency with that version. Can you send your pandas version?
My versions are: Python 3.9.4, numpy 1.20.2, and pandas 1.2.4
I looked at the various pandas release notes since 1.1.3 but couldn't pinpoint when it was fixed. There are several fixes related to regressions in type casting and it's likely that this was working before 1.1.3 and fixed again after.
I suggest you update pandas to a newer version.
The scipy version that I use is scipy==1.7.0.
I just downgraded my pandas and numpy versions to yours and the code still works. It could be an OS-related issue. I can run the code on macOS and Linux, but I don't have Windows to try it on.
Did you try:
print(np.mean(np.array(perm_diffs) > mean_b - mean_a))
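A small sketch of why the explicit conversion helps, assuming that perm_diffs is a plain Python list and that the two means arrive as plain Python floats, as described earlier in the thread. The values below are made up for illustration.

```python
import numpy as np

# Illustrative values only; perm_diffs is assumed to be a plain list.
perm_diffs = [0.5, -1.0, 2.0, 0.1]
mean_a, mean_b = 1.0, 1.3  # plain Python floats, not numpy.float64

# Comparing a list with a plain float is not defined in Python 3.
try:
    perm_diffs > mean_b - mean_a
    list_comparison_failed = False
except TypeError:
    list_comparison_failed = True

# Converting to an array first makes the comparison element-wise,
# whatever the type of the two means.
p_value = np.mean(np.array(perm_diffs) > mean_b - mean_a)

print(list_comparison_failed)
print(p_value)
```

With the array conversion the comparison no longer depends on whether pandas returned a numpy.float64 or a built-in float.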