gedeck / mistat-code-solutions
Code repository for "Modern Statistics: A Computer Based Approach with Python" and "Industrial Statistics: A Computer Based Approach with Python"
Describe the problem
The Granger causality test requires clarification of why the p-values are greater than the significance level. The statsmodels documentation states:
The Null hypothesis for grangercausalitytests is that the time series in the second column, x2, does NOT Granger cause the time series in the first column, x1. Granger causality means that past values of x2 have a statistically significant effect on the current value of x1, taking past values of x1 into account as regressors. We reject the null hypothesis that x2 does not Granger cause x1 if the p-values are below a desired size of the test.
Describe the problem
Chapter 2, page 55, section 2.2.1.1. The bolded words after equation 2.13 read "probability distribution function"; shouldn't that be "probability mass function"?
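Change the code to create figures.
import matplotlib.pyplot as plt
import numpy as np
import pingouin as pg
from scipy import stats

np.random.seed(1)
x = stats.norm(loc=10, scale=1).rvs(50)
fig, ax = plt.subplots(figsize=[5, 5])
pg.qqplot(x, ax=ax)
# Restyle the lines drawn by pg.qqplot in greyscale; line 0 holds the markers
ax.get_lines()[0].set_color('grey')
ax.get_lines()[0].set_markerfacecolor('none')
ax.get_lines()[1].set_color('black')
ax.get_lines()[2].set_color('grey')
ax.get_lines()[3].set_color('grey')
plt.show()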
Version 2.0 caused breaking changes to the code
Describe the problem
The pylibkriging package causes issues during installation. Find an alternative.
Describe the problem
File is missing for Industrial statistics
Consider adding --user to the pip install commands.
Chapter 4, page 251, states that "The square roots of these variances estimates are the "std err".... The Se value is shown in the regression summary output as Scale."
Page 250 states that Se^2 = 5.8869.
Page 248, in the results summary, states Scale: 5.8832.
The small difference between the two values is probably rounding error. Nonetheless, sqrt(5.8832) = 2.426, so Se =/= Scale.
Problem:
The Scale output cannot be both a variance and a standard error.
https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLSResults.scale.html
Note that the square root of scale is often called the standard error of the regression.
Solution:
The square roots of these variances estimates are the "std err".... The Se^2 value is shown in the regression summary output as Scale.
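The arithmetic behind this report can be checked with a short sketch. It uses only the two values quoted above from pages 248 and 250; the variable names are illustrative.

```python
import math

se_squared_p250 = 5.8869  # Se^2 as quoted from page 250
scale_p248 = 5.8832       # Scale as quoted from the summary on page 248

# If Scale were Se itself, its value would be close to the square root below
se_from_scale = math.sqrt(scale_p248)
se_from_variance = math.sqrt(se_squared_p250)

print(f"sqrt(Scale) = {se_from_scale:.3f}")
print(f"sqrt(Se^2)  = {se_from_variance:.3f}")
print(f"Scale - Se^2 = {scale_p248 - se_squared_p250:.4f}")
```

Scale is close to Se^2 and far from its square root, which is consistent with the statsmodels note that the square root of scale is the standard error of the regression.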
Describe the problem
Chapter 8 has deprecation warnings and failures.
Suggested change
Check and fix
Describe the problem
The pandas method iteritems is deprecated (and was removed in pandas 2.0), which causes the code in Example 7.3 to fail.
Suggested change
Replace
df = pd.DataFrame([
    {satisfaction: counts for satisfaction, counts
     in response.value_counts().iteritems()},
    {satisfaction: counts for satisfaction, counts
     in response[q1_5].value_counts().iteritems()},
])
with
df = pd.DataFrame([
    {satisfaction: counts for satisfaction, counts
     in response.value_counts().items()},
    {satisfaction: counts for satisfaction, counts
     in response[q1_5].value_counts().items()},
])
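A self-contained sketch of the replacement follows. The response series and the q1_5 mask are hypothetical stand-ins, since the data from Example 7.3 are not reproduced in this issue.

```python
import pandas as pd

# Hypothetical stand-ins for the survey data used in Example 7.3
response = pd.Series(["high", "low", "high", "medium", "high", "low"])
q1_5 = pd.Series([True, True, False, True, False, True])  # boolean mask

# Series.items() replaces the removed Series.iteritems()
df = pd.DataFrame([
    {satisfaction: counts for satisfaction, counts
     in response.value_counts().items()},
    {satisfaction: counts for satisfaction, counts
     in response[q1_5].value_counts().items()},
])
print(df)
```

Series.items() also exists in older pandas releases, so the change should not require a version check.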
Describe the problem
Ask Springer to change this line on their website:
The mistat Python package can be accessed at https://gedeck.github.io/mistat-code-solutions/ModernStatistics/
The link should be clickable and should be referenced as the source for code and solutions.
Suggested change
???
Describe the problem
Chapter 3, page 162, Equation 3.30
The denominator of the lower limit is missing "/2" in the subscript of the chi-square symbol. It should appear as: Chi-Square 1-a/2[n-1].
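For reference, a minimal sketch of the interval with the corrected subscript. It assumes that Equation 3.30 is the standard two-sided confidence interval for the variance of a normal sample and that the subscript denotes the quantile level; the function name and the sample values are illustrative only.

```python
from scipy import stats


def variance_confidence_interval(s2, n, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for a normal variance.

    s2 is the sample variance and n the sample size.
    """
    df = n - 1
    # Lower limit divides by the upper quantile chi2_{1 - alpha/2}[n - 1]
    lower = df * s2 / stats.chi2.ppf(1 - alpha / 2, df)
    # Upper limit divides by the lower quantile chi2_{alpha/2}[n - 1]
    upper = df * s2 / stats.chi2.ppf(alpha / 2, df)
    return lower, upper


# Illustrative values, not taken from the book
lower, upper = variance_confidence_interval(s2=4.0, n=20)
print(f"95% interval for the variance: ({lower:.3f}, {upper:.3f})")
```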
Describe the problem
The P-value is explained at the top of page 152, but there is no reference to it in the index.
Suggested change
Add "p. 152" to the P-value entry in the index on page 437.
Describe the problem
The dtreeviz API has changed.
Describe the problem
The package used in the book seems to have low support. Check alternative GP models.
Suggested change
GPyTorch: https://docs.gpytorch.ai/en/latest/examples/01_Exact_GPs/Simple_GP_Regression.html
Scikit-learn: https://jekel.me/2016/Guassian-Process-Prediction-aka-Kriging/
PyKrige: https://geostat-framework.readthedocs.io/projects/pykrige/en/stable/contents.html
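As a starting point for evaluating the scikit-learn option, here is a minimal sketch of Gaussian process regression. The data, kernel choice, and prediction points are assumptions for illustration and do not reproduce the book's example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Hypothetical noise-free training data: samples of sin(x) on [0, 10]
X = np.linspace(0, 10, 15).reshape(-1, 1)
y = np.sin(X).ravel()

# Assumed kernel: a scaled RBF; hyperparameters are tuned during fit()
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)
gp = GaussianProcessRegressor(kernel=kernel, random_state=0)
gp.fit(X, y)

# Predict the mean and standard deviation at two new locations
X_new = np.array([[2.5], [7.5]])
mean, std = gp.predict(X_new, return_std=True)
print(mean, std)
```

The returned standard deviation plays the role of the kriging prediction uncertainty, which is what makes this a candidate replacement.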
Describe the problem
Code has changed. New zip files are required. Also update the Install_packages notebook.
Describe the problem
Error below when using mistat.stepwise_regression(outcome=y, all_vars=X, data=df3)
Software Error
File ~\mistat\regression\stepwiseRegression.py:19, in find_best_model_partialF(outcome, variable_sets, data, old_model, opt_max)
17 with warnings.catch_warnings():
18 warnings.simplefilter("ignore")
---> 19 comparison = sms.anova.anova_lm(old_model, new_model)
20 if optF * partialF < optF * comparison.F[1]:
21 best_vars = variables
AttributeError: module 'statsmodels.stats' has no attribute 'anova'
Suggested change
import statsmodels.api as sm
---> 19 comparison = sm.stats.anova_lm(old_model, new_model)
Software (if the problem is with the code):
OS: Windows
Release: 10
Python implementation: CPython
Python version : 3.11.0
IPython version : 8.12.0
statsmodels: 0.14.0
Describe the problem
Example 9.17 Using the censored data from Exercise 9.16, we estimate the ....
should be:
Example 9.17 Using the censored data from Example 9.16, we estimate the ....
Describe the problem
Equation 3.30 is incorrect
Suggested change
The term
Make it clearer which book each directory refers to.
Describe the problem
Improve description of interaction plot figure.
Add information at the top of the files on what to do when Errors or Warnings occur.
Make binder link similar to Modern Statistics
Add links like:
Example:
https://github.com/UVADS/DS1001/blob/master/ddsbook/analytics-lab-III.ipynb
This will require adding pip install statements at the start.
For the model_3 object in the code block, the formula argument of the smf.ols() function references the wrong formula object. It should read formula instead of poly_formula, or vice versa.
Looking at the notebook at the following link: https://github.com/gedeck/mistat-code-solutions/blob/main/ModernStatistics/notebooks/Chap006.ipynb, it appears poly_formula is the selected object name.
This is just an object-labeling inconsistency in the textbook.
Both books
Some code breaks. Review all notebooks.
Proportional Sample Allocation Ch. 5 pg 317 Equation Clarification
The equation on page 317 in the top section under V
where
Should it read?:
I get the sense that it should because of how the equation for
Add a page to guide readers on what they can do next (see predictive analytics FAQ for a start)
Describe the problem
Deprecation warnings are not identified during tests. See if we can add code that identifies warnings so they can be addressed early.
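One possible approach, sketched with the standard library only, is to promote deprecation warnings to errors while the tests run. The functions old_api and run_strict are hypothetical names used for illustration; they are not part of the repository.

```python
import warnings


def old_api():
    """Hypothetical function standing in for a deprecated library call."""
    warnings.warn("old_api is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def run_strict(func):
    """Run func with DeprecationWarning and FutureWarning raised as errors."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        warnings.simplefilter("error", FutureWarning)
        return func()


try:
    run_strict(old_api)
    caught = None
except DeprecationWarning as exc:
    caught = str(exc)

print(f"caught: {caught}")
```

If the tests run under pytest, its -W error::DeprecationWarning command-line option or the filterwarnings configuration setting may achieve the same effect without code changes.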