effects.jl's Introduction

Effects.jl

Effects Prediction for Linear and Generalized Linear models

Regression is a foundational technique of statistical analysis, and many common statistical tests are based on regression models (e.g., ANOVA, t-tests, correlation tests). Despite the expressive power of regression models, users often prefer the simpler procedures because regression models themselves can be difficult to interpret. Most notably, the interpretation of individual regression coefficients (including their magnitude, sign, and even significance) changes depending on the presence, centering, or contrast coding of other terms and interactions. For instance, a common source of confusion in regression analysis is the meaning of the intercept coefficient. On its own, this coefficient corresponds to the grand mean of the dependent variable, but in the presence of a contrast-coded categorical variable, it can correspond to the mean of the baseline level of that variable, the grand mean, or something else altogether, depending on the contrast coding scheme that is used.

Effects.jl provides a general-purpose tool for interpreting fitted regression models by projecting the effects of one or more terms in the model back into "data space", along with the associated uncertainty, while fixing the values of other terms at typical or user-specified values. This makes it straightforward to interrogate the estimated effects of any predictor at any combination of other predictors' values. Because these effects are computed in data space, they can be plotted on the same scale as raw or aggregated data, enabling intuitive model interpretation and sanity checks.
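
A minimal sketch of the idea for an ordinary linear model, written with plain matrices rather than Effects.jl itself: build a reference grid that varies the focal predictor x while holding the other predictor z at a typical value (its mean), then compute the predictions Xβ and their standard errors sqrt(diag(X V X')), where V is the coefficient covariance matrix. The data and variable names are made up for illustration; for a GLM the results would additionally need to be mapped through the inverse link function.

```julia
using LinearAlgebra, Statistics

# Toy data (hypothetical): response y, focal predictor x, nuisance predictor z.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
z = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.05]
y = 2.0 .+ 0.5 .* x .+ 1.5 .* z .+ noise

# Fit y ~ 1 + x + z by ordinary least squares.
X = hcat(ones(length(x)), x, z)
β = X \ y
resid = y .- X * β
σ² = sum(abs2, resid) / (size(X, 1) - size(X, 2))
V = σ² .* inv(X' * X)          # classical covariance of the coefficients

# Reference grid: vary x, hold z at a "typical" value (its mean).
xgrid = [2.0, 4.0, 6.0]
Xgrid = hcat(ones(length(xgrid)), xgrid, fill(mean(z), length(xgrid)))

# Effects in data space and their standard errors.
eff = Xgrid * β
se = sqrt.(diag(Xgrid * V * Xgrid'))

for (xi, e, s) in zip(xgrid, eff, se)
    println("x = $xi: effect = $(round(e; digits=3)) ± $(round(s; digits=3))")
end
```

Because z is held fixed, the differences between grid rows reflect only the coefficient on x, which is what makes the effect readable on the scale of the data.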

effects.jl's People

Contributors

ararslan, dependabot[bot], github-actions[bot], glennmoy, kimlaberinto, kleinschmidt, palday, spinkney

effects.jl's Issues

Rely on template formula instead of contrasts

Relying on the contrasts can be dangerous, since the ordering of the levels and the levels themselves (and the center for a CenteredTerm from StandardizedPredictors.jl) are determined from the data if they are not specified in the contrasts. So you could get a different contrast coding if someone specified, e.g., contrasts=Dict(:x => EffectsCoding(), :y => Center()) and passed in a reference grid.

A better alternative is to match the Terms that are present in the 'effects formula' with the 'model formula' that was actually used to generate the modelcols for the model fit, e.g. by finding matching terms based on StatsModels.symequal/termsyms.
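
One way to sketch the proposed matching, assuming each term is reduced to the set of variable names it involves (in the spirit of StatsModels.termsyms): look up each effects-formula term among the model-formula terms by set equality, so that b & a matches a & b. The matched model term would then carry the contrasts and levels fixed at fit time instead of re-deriving them from the reference grid. match_terms and the Set-of-Symbols representation are hypothetical, not part of Effects.jl or StatsModels.

```julia
# Hypothetical representation: a term is the set of variable names it involves,
# so `a & b` and `b & a` compare equal.
"""
    match_terms(effect_terms, model_terms)

For each effects-formula term, return the index of the model-formula term with the
same variables, or `nothing` if the model has no such term.
"""
match_terms(effect_terms, model_terms) = [findfirst(==(t), model_terms) for t in effect_terms]

model_terms = [Set([:a]), Set([:b]), Set([:a, :b])]   # e.g. y ~ 1 + a + b + a & b
effect_terms = [Set([:b, :a]), Set([:c])]             # b & a, plus a variable not in the model

idx = match_terms(effect_terms, model_terms)           # [3, nothing]
```

A nothing entry signals a term the model never saw, which seems like a natural place to raise an informative error rather than silently building new columns.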

Allow user-specified `vcov` computation

Right now, we use StatsBase.vcov but we could add a kwarg vcov=StatsBase.vcov that allows the user to specify an alternative computation (e.g. based on robust standard errors or the like).

(Inspired by a question at JuliaCon)
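
A self-contained sketch of the proposed keyword, using plain matrices instead of a fitted model: the covariance estimator is passed in as a function, defaulting to the classical OLS estimate, and a heteroskedasticity-robust HC0 (White) estimator is swapped in as an example. The (X, resid) signature and all helper names are hypothetical; StatsBase.vcov itself takes a fitted model.

```julia
using LinearAlgebra

# Classical OLS covariance of the coefficients: σ² (X'X)⁻¹.
function classical_vcov(X, resid)
    n, p = size(X)
    σ² = sum(abs2, resid) / (n - p)
    return σ² .* inv(X' * X)
end

# HC0 (White) robust covariance: (X'X)⁻¹ X' diag(e²) X (X'X)⁻¹.
function hc0_vcov(X, resid)
    bread = inv(X' * X)
    meat = X' * Diagonal(resid .^ 2) * X
    return bread * meat * bread
end

# Hypothetical effects-style helper: the covariance estimator is a keyword
# argument with a default, in the spirit of the proposed vcov=StatsBase.vcov.
function effect_se(Xgrid, X, resid; vcov=classical_vcov)
    V = vcov(X, resid)
    return sqrt.(diag(Xgrid * V * Xgrid'))
end

# Toy straight-line fit.
X = hcat(ones(6), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]
β = X \ y
resid = y .- X * β

Xgrid = [1.0 2.0; 1.0 5.0]   # predict at x = 2 and x = 5
se_classical = effect_se(Xgrid, X, resid)
se_robust = effect_se(Xgrid, X, resid; vcov=hc0_vcov)
```

Passing a function rather than a precomputed matrix keeps the default cheap and lets callers plug in any estimator that follows the same signature.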

error from empairs

I get an error from this code:

    using MixedModels, StatsModels, Effects

    cat_mm = fit(LinearMixedModel,
                 @formula(CAT_SCORE ~ 0 + Vizit + HTPuse + AGE + SEX + PACK_YEARS + NATION + DIS_LUNG +
                          MARITAL_STATUS + Vizit*HTPuse + (1|CODE_CAT)),
                 ds_work_cat, REML = true,
                 contrasts = Dict(:Vizit => StatsModels.DummyCoding(base = "V02"),
                                  :HTPuse => StatsModels.DummyCoding(base = "CC"),
                                  :NATION => StatsModels.DummyCoding(base = "Азиатская"),    # "Asian"
                                  :SEX => StatsModels.DummyCoding(base = "мужской"),         # "male"
                                  :DIS_LUNG => StatsModels.DummyCoding(base = "нет"),        # "no"
                                  :MARITAL_STATUS => StatsModels.DummyCoding(base = "Нет"))) # "No"
    empairs(cat_mm)

Error msg:

ERROR: MethodError: Cannot `convert` an object of type 
  String7 to an object of type
  Union{Number, String}
Closest candidates are:
  convert(::Type{S}, ::CategoricalValue) where S<:Union{AbstractChar, AbstractString, Number} at C:\Users\a\.julia\packages\CategoricalArrays\0yLZN\src\value.jl:92
  convert(::Type{T}, ::T) where T at Base.jl:61
Stacktrace:
  [1] setindex!(h::Dict{String, Union{Number, String}}, v0::String7, key::String)
    @ Base .\dict.jl:382
  [2] (::Effects.var"#24#26"{String, String, DataFrame, Vector{String}})(::Vector{DataFrameRow{DataFrame, DataFrames.Index}})
    @ Effects C:\Users\a\.julia\packages\Effects\eXakY\src\emmeans.jl:149
  [3] MappingRF (repeats 2 times)
    @ .\reduce.jl:95 [inlined]
  [4] _foldl_impl(op::Base.MappingRF{Combinatorics.var"#10#13"{Combinatorics.var"#reorder#11"{DataFrames.DataFrameRows{DataFrame}}}, Base.MappingRF{Effects.var"#24#26"{String, String, DataFrame, Vector{String}}, Base.BottomRF{typeof(vcat)}}}, init::Base._InitialValue, itr::Combinatorics.Combinations)
    @ Base .\reduce.jl:58
  [5] foldl_impl
    @ .\reduce.jl:48 [inlined]
  [6] mapfoldl_impl
    @ .\reduce.jl:44 [inlined]
  [7] #mapfoldl#259
    @ .\reduce.jl:170 [inlined]
  [8] mapfoldl
    @ .\reduce.jl:170 [inlined]
  [9] #mapreduce#263
    @ .\reduce.jl:302 [inlined]
 [10] mapreduce
    @ .\reduce.jl:302 [inlined]
 [11] empairs(df::DataFrame; eff_col::String, err_col::Symbol, padjust::Function)
    @ Effects C:\Users\a\.julia\packages\Effects\eXakY\src\emmeans.jl:142
 [12] empairs(model::LinearMixedModel{Float64}; eff_col::Nothing, err_col::Symbol, invlink::Function, levels::Dict{Any, Any}, dof::Nothing, padjust::Function)
    @ Effects C:\Users\a\.julia\packages\Effects\eXakY\src\emmeans.jl:128
 [13] empairs(model::LinearMixedModel{Float64})
    @ Effects C:\Users\a\.julia\packages\Effects\eXakY\src\emmeans.jl:124
 [14] top-level scope
    @ REPL[33]:1

My data types:

(screenshot of the column types not preserved)
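
The stack trace suggests that empairs stores level values in a Dict{String, Union{Number, String}}, while String7 (from InlineStrings.jl, which CSV.jl typically uses for short string columns) is an AbstractString but not a String, so the conversion fails. The sketch below reproduces the same failure using only Base, with a SubString standing in for String7 because it has the same property, and shows that converting to String first avoids it. Under that reading, converting the affected columns before fitting (e.g. ds_work_cat.NATION = String.(ds_work_cat.NATION)) is a plausible workaround, not a confirmed fix.

```julia
# Same failure mode as in the report, using only Base: a SubString, like
# InlineStrings' String7, is an AbstractString but not a String.
d = Dict{String, Union{Number, String}}()
level = SubString("CC-extra", 1, 2)   # the text "CC", stored as a SubString{String}

failed = try
    d["HTPuse"] = level               # setindex! tries convert(Union{Number, String}, level)
    false
catch err
    err isa MethodError
end

# Converting to a plain String first avoids the MethodError.
d["HTPuse"] = String(level)
println(failed, " ", typeof(d["HTPuse"]))   # true String
```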

Support for interactions without main effects?

Hello,

I noticed yesterday that I get an error when I include a variable only as part of an interaction term, without also including it as a main effect. Initially I figured this had something to do with standardization, but it happens in general.

using StableRNGs, DataFrames, Effects, StandardizedPredictors, GLM, StatsModels
rng = StableRNG(1);

data = DataFrame(age=[13:20; 13:20],
                 sex=repeat(["male", "female"], inner=8),
                 weight=[range(100, 155; length=8); range(100, 125; length=8)] .+ randn(rng, 16))

m = lm(@formula(weight ~ 1 + sex & age), data, contrasts=Dict(:age => Center()))

design = Dict(:sex => unique(data.sex))
eff = effects(design, m)

I receive the following:

ERROR: ArgumentError: Can't determine columns corresponding to 'age(centered: 16.5)' in matrix term 1 + sex & age(centered: 16.5)

Thank you again for this package.
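
As a stopgap, the effects for this particular model can be computed by hand. The sketch below assumes the model matrix for weight ~ 1 + sex & age (with age centered) consists of an intercept plus one centered-age slope per sex level, uses a noise-free version of the issue's data, and evaluates the fit at chosen ages; sex_effect is a hypothetical helper. At the mean age both sexes get the same prediction, because without a sex main effect the model has a single shared intercept.

```julia
using LinearAlgebra, Statistics

# Noise-free version of the issue's data (the original adds randn noise).
age = Float64.([13:20; 13:20])
sex = repeat(["male", "female"], inner=8)
weight = [range(100, 155; length=8); range(100, 125; length=8)]

# Assumed model matrix for weight ~ 1 + sex & age(centered): an intercept plus
# one centered-age slope per sex level, and no sex main effect.
agec = age .- mean(age)
X = hcat(ones(length(age)), agec .* (sex .== "female"), agec .* (sex .== "male"))
β = X \ weight

# Hypothetical helper: fitted weight for sex level `s` at age `a`.
function sex_effect(s, a)
    ac = a - mean(age)
    return β[1] + ac * ((s == "female") * β[2] + (s == "male") * β[3])
end

for a in (mean(age), 19.0), s in ("male", "female")
    println("age = $a, sex = $s: ", round(sex_effect(s, a); digits=2))
end
```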

API for reference grid/design

What's the reason for having both the formula and the design? Does everything listed in the design have to have a corresponding entry in the formula RHS?

It seems like you might have two different methods: one for a design dictionary, and one with a formula (where the design stuff is calculated automagically from the original data). I think it's a safe assumption that a formula is available from the original model, so you can get things like levels for categorical variables from those terms.

If we wanted to get REALLY fancy, we could have a special term type/syntax for this where in the formula you could specify an expression that would allow you to manually control the design for some terms and let the defaults happen for the others...but that's a low priority I think.

Originally posted by @kleinschmidt in #1 (comment)
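
A sketch of the "design dictionary" method, assuming the original data is available as a NamedTuple of predictor columns: variables listed in the design take all of their listed values, every other variable is held at a typical value (the mean by default), and the grid is their Cartesian product. reference_grid is a hypothetical helper, and it assumes the non-design columns are numeric.

```julia
using Statistics

"""
    reference_grid(design, data; typical=mean)

Hypothetical helper: variables in `design` take all of their listed values, every
other column of `data` is held at `typical` of its values, and the grid is the
Cartesian product. Assumes the non-design columns of `data` are numeric.
"""
function reference_grid(design::AbstractDict{Symbol}, data::NamedTuple; typical=mean)
    full = Dict{Symbol, Any}(design)
    for k in keys(data)
        haskey(full, k) || (full[k] = [typical(data[k])])
    end
    ks = sort!(collect(keys(full)))   # fixed, reproducible column order
    vals = [full[k] for k in ks]
    return vec([NamedTuple{Tuple(ks)}(combo) for combo in Iterators.product(vals...)])
end

data = (age = [13.0, 14.0, 15.0, 16.0], sex = ["male", "female", "male", "female"])
grid = reference_grid(Dict(:sex => ["male", "female"]), data)
# grid == [(age = 14.5, sex = "male"), (age = 14.5, sex = "female")]
```

Filling unlisted variables from the data means the user never has to spell out values for terms they are not interested in.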

Compatibility with Turing models

Thanks for the package!
Any plans to make Effects.jl work with (at least some simple) Turing models?
What would be the things needed for it to work?

y~age+sex, dict has only one of them

using DataFrames, Effects, GLM, StatsModels, Random

growthdata = DataFrame(; age=[13:20; 13:20],
                       sex=repeat(["male", "female"], inner=8),
                       weight=[range(100, 155; length=8); range(100, 125; length=8)] .+ randn(16))

mod_uncentered = lm(@formula(weight ~ 1 + sex + age), growthdata)

effects(Dict(:sex=>["male","female"]), mod_uncentered)

results in

ERROR: type NamedTuple has no field age
Stacktrace:
  [1] getindex
    @ ./namedtuple.jl:118 [inlined]
  [2] modelcols
    @ ~/.julia/packages/StatsModels/m1jYD/src/terms.jl:494 [inlined]
  [3] #34
[...]

But e.g. effects(Dict(:sex=>["male","female"],:age=>[0,0]), mod_uncentered) works. If I add the interaction, weight ~ 1 + sex*age, it also works with my initial effects dict.

Conceptually, I don't think I should need to specify both sex and age in the dict.

I might have narrowed it down to a "<" that needs to be a "<=", but then again, I have no idea what I'm doing ;-)

return any(x -> _termsyms(tt) < _termsyms(x), nonfunc_terms)

(I can of course do the pull request; it would be fun if that is the only change :-D)
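
For Julia sets, < is a strict-subset test (⊊) and <= is a non-strict one (⊆). The sketch below represents terms as hypothetical sets of variable names and shows why age only counts as covered by another term once the interaction is present. It also shows a caveat: with <=, a term always compares as covered by itself, so whether the change is safe depends on whether nonfunc_terms can contain tt itself, which the quoted line does not show.

```julia
# Hypothetical stand-ins: each term is the set of variable names it involves.
main_only = [Set([:sex]), Set([:age])]                            # weight ~ 1 + sex + age
with_interaction = [Set([:sex]), Set([:age]), Set([:sex, :age])]  # weight ~ 1 + sex * age

# Mirrors the check quoted above, with the strict and non-strict comparisons.
covered_strict(t, terms) = any(x -> t < x, terms)   # is t a proper subset of some term?
covered_loose(t, terms) = any(x -> t <= x, terms)   # is t a subset of (or equal to) some term?

age = Set([:age])
println(covered_strict(age, main_only))          # false: no term strictly contains age
println(covered_strict(age, with_interaction))   # true: sex & age strictly contains age
println(covered_loose(age, main_only))           # true, but only because age ⊆ age
```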

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Tracking Issue: Initial Release

We're spinning off our implementation of effects computations and releasing it to the community. Hooray! Here's what needs to happen:

Task list

  • Make this tracking issue :)
  • No private info in code
  • Pre-make public commit:
    • Add MIT license
    • Add tests + CI badge to README
  • Make sure repo follows best-practice settings
  • Make docs build
  • Add docs badges to README
  • Add TagBot GitHub Action
  • Add CompatHelper GitHub Action
  • Make repo public
  • Have @jrevels enable repo access for the Julia General registry's Registrator
  • Open PR in Julia registry
    • Wait for it to auto-merge (or have @ericphanson do a drive-by 😄)

Define behavior for missing levels in design/reference grid

I'm a bit worried about computing the schema from the reference grid (rather than the original data/concrete terms in the original formula). If the reference grid omits levels for any of the categorical variables, I think you'll get a mismatch between the columns you generate and columns that are present in the original modelmatrix, unless you also specify the correct contrasts in the contrasts argument.

Like I said above, I think it's a safe assumption that you can get the formula that was used to fit the model, so you can just match the terms present in that with the (un-typed) terms in the provided formula (or even with the keys from the design dict)

Originally posted by @kleinschmidt in #1 (comment)
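
A small illustration of the mismatch, using a hypothetical treatment-coding helper in which the first level is the baseline and every other level gets one indicator column: if levels are re-derived from a reference grid that omits a level, the grid produces fewer columns than the model was fitted with, and if the first level present changes, so does the baseline.

```julia
# Hypothetical treatment (dummy) coding: the first level is the baseline and each
# remaining level gets one indicator column.
dummy_columns(vals, levels) = [v == l ? 1.0 : 0.0 for v in vals, l in levels[2:end]]

model_levels = ["a", "b", "c"]   # levels seen when the model was fitted
grid = ["a", "c"]                # reference grid that omits level "b"

from_model = dummy_columns(grid, model_levels)        # 2×2: columns for "b" and "c"
from_grid = dummy_columns(grid, sort(unique(grid)))   # 2×1: only a column for "c"

# If the grid starts at a different level, the inferred baseline moves as well.
shifted = dummy_columns(["b", "c"], sort(unique(["b", "c"])))  # baseline is now "b"
```

Taking the levels from the terms of the fitted model (model_levels here) keeps the grid's columns aligned with the coefficient vector regardless of which levels the grid happens to contain.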

More tests

It's probably good to add at least two more sets of tests: one with categorical variables, and one with some variables missing from the design/formula.

Originally posted by @kleinschmidt in #1 (comment)
