ashryaagr / Fairness.jl
Julia toolkit with fairness metrics and bias mitigation algorithms
Home Page: https://ashryaagr.github.io/Fairness.jl/dev/
License: MIT License
It would be nice if one could use the measures provided here in MLJ performance evaluation and elsewhere. This means implementing the measures API documented here, which does not appear to have been done yet.
Note that an MLJ measure does not have to return a numerical value. We regard, for example, confmat as a measure:
using MLJ
X, y = @load_crabs
model = @load DecisionTreeClassifier
e = evaluate(model, X, y, measure=confmat, operation=predict_mode)
julia> e.per_fold[1][1]
┌───────────────────────────┐
│ Ground Truth │
┌─────────────┼─────────────┬─────────────┤
│ Predicted │ B │ O │
├─────────────┼─────────────┼─────────────┤
│ B │ 31 │ 0 │
├─────────────┼─────────────┼─────────────┤
│ O │ 3 │ 0 │
└─────────────┴─────────────┴─────────────┘
So here's one tentative suggestion for implementing the MLJ API.
In MLJ one can already have a measure m with signature m(yhat, y, X), where X represents the full table of input features, which we can suppose is a Tables.jl table. In your case, you only care about one particular column of X - let's call it the group column - whose classes you want to filter on (e.g., a column like ["male", "female", "male", "male", "binary"]). One could:
Introduce a new parameter for each MLJFair metric, called group_name or similar, which specifies the name of the group column. So one would instantiate the measure like this: m = MLJFair.TruePositive(group_name=:gender).
Overload calling of the metric appropriately, so that m(yhat, y, X) returns a dictionary of numerical values keyed on the "group" class, e.g., Dict("male" => 2, "female" => 3, "binary" => 1). Or I suppose you could return a struct of some kind, but I think a dict would be the most user-friendly.
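The two suggestions above could be sketched in plain Julia roughly as follows. All names here (GroupedTruePositive, group_name) are hypothetical stand-ins, not part of any released API, and the table access is simplified:

```julia
# Hypothetical sketch: a group-aware true-positive count returning a Dict
# keyed on the classes of the group column.
struct GroupedTruePositive
    group_name::Symbol
end

# Calling the measure returns a Dict of per-group counts.
function (m::GroupedTruePositive)(yhat, y, X)
    groups = X[m.group_name]   # assumes X supports column access by Symbol
    counts = Dict{eltype(groups),Int}()
    for (g, ŷ, yᵢ) in zip(groups, yhat, y)
        counts[g] = get(counts, g, 0) + (ŷ && yᵢ)  # count true positives
    end
    return counts
end

m = GroupedTruePositive(:gender)
X = Dict(:gender => ["male", "female", "male", "male", "binary"])
result = m([true, true, false, true, true], [true, true, true, true, true], X)
# Dict("male" => 2, "female" => 1, "binary" => 1)
```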
To complete the API you may have to overload some measure traits, for example:
MLJBase.name(::Type{<:MLJFair.TruePositive}) = "TruePositive"
MLJBase.target_scitype(...) = OrderedFactor{2}
MLJBase.supports_weights(...) = false # for now
MLJBase.prediction_type(...) = :deterministic
MLJBase.orientation(::Type) = :other # other options are :score, :loss
MLJBase.reports_each_observation(::Type) = false
MLJBase.aggregation(::Type) = Sum()
MLJBase.is_feature_dependent(::Type) = true # <---- important
If you did this, then things like evaluate(model, X, y, measure=MLJFair.TruePositive(group_name=:gender), resampling=CV()) would work.
How does this sound?
https://github.com/ashryaagr/Fairness.jl/blob/master/src/datasets/datasets.jl has many data-loading macros that could simply be functions. Best practice in Julia is to use macros only in situations where code rewriting is actually necessary.
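A toy illustration of the point (the real loaders do more than this, of course): a loader performs no code rewriting, so a plain function does the same job as a macro with less machinery.

```julia
# macro version (works, but unnecessary: no code rewriting is happening):
macro load_toy()
    return :(([1 2; 3 4], [0, 1]))
end

# function version (preferred Julia style):
load_toy() = ([1 2; 3 4], [0, 1])

X1, y1 = @load_toy
X2, y2 = load_toy()
X1 == X2 && y1 == y2   # true
```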
It would be good to think about a general-purpose data loader so that this package works nicely with other packages such as https://github.com/JuliaStats/RDatasets.jl
Hi, I don't know if this is a known issue, but using @load_adult appears to change the working directory.
julia> using Fairness
julia> pwd()
"/home/orefici/Repos/CItA_experiments/CItA_XAI"
julia> X, y = @load_adult;
julia> pwd()
"/home/orefici/.julia/packages/Fairness/1mEPa/data"
If the issue is real I'd be happy to try and help with a PR if you can point me to where I should look to fix it.
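If the loader is calling cd(dir) to read its files, a likely fix (a sketch only; I haven't located the exact line) is the do-block form of cd, which restores the caller's working directory automatically:

```julia
# Sketch of the fix: wrap file reads in the do-block form of `cd`, which
# restores the previous working directory even if an error is thrown.
function load_data_from(dir)
    cd(dir) do
        readdir()          # read data files relative to `dir` here
    end
end

before = pwd()
load_data_from(homedir())
pwd() == before            # true: directory unchanged
```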
Hi @ashryaagr and maintainers,
I was trying to follow along with the NextJournal Tutorial for Fairness.jl - great tutorial by the way! - but it seems that there is some sort of error now with MLJFlux. Here are the steps to reproduce what I am seeing:
pkg> activate --temp
Here is the full stack trace when I try to load a NeuralNetworkClassifier:
julia> @load NeuralNetworkClassifier
[ Info: For silent loading, specify `verbosity=0`.
import MLJFlux
[ Info: Precompiling MLJFlux [094fc8d1-fd35-5302-93ea-dabda2abf845]
ERROR: LoadError: UndefVarError: @aggressive_constprop not defined
Stacktrace:
[1] top-level scope
@ ~/.julia/packages/ArrayInterface/gMtB5/src/ArrayInterface.jl:12
[2] include
@ ./Base.jl:419 [inlined]
[3] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::String)
@ Base ./loading.jl:1554
[4] top-level scope
@ stdin:1
in expression starting at /home/cedarprince/.julia/packages/ArrayInterface/gMtB5/src/ArrayInterface.jl:1
in expression starting at stdin:1
ERROR: LoadError: Failed to precompile ArrayInterface [4fba245c-0d91-5ea0-9b3e-6abc04ee57a9] to /home/cedarprince/.julia/compiled/v1.8/ArrayInterface/jl_nxMy3K.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool)
@ Base ./loading.jl:1705
[3] compilecache
@ ./loading.jl:1649 [inlined]
[4] _require(pkg::Base.PkgId)
@ Base ./loading.jl:1337
[5] _require_prelocked(uuidkey::Base.PkgId)
@ Base ./loading.jl:1200
[6] macro expansion
@ ./loading.jl:1180 [inlined]
[7] macro expansion
@ ./lock.jl:223 [inlined]
[8] require(into::Module, mod::Symbol)
@ Base ./loading.jl:1144
[9] include(mod::Module, _path::String)
@ Base ./Base.jl:419
[10] include(x::String)
@ Flux ~/.julia/packages/Flux/KkC79/src/Flux.jl:1
[11] top-level scope
@ ~/.julia/packages/Flux/KkC79/src/Flux.jl:28
[12] include
@ ./Base.jl:419 [inlined]
[13] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::String)
@ Base ./loading.jl:1554
[14] top-level scope
@ stdin:1
in expression starting at /home/cedarprince/.julia/packages/Flux/KkC79/src/optimise/Optimise.jl:1
in expression starting at /home/cedarprince/.julia/packages/Flux/KkC79/src/Flux.jl:1
in expression starting at stdin:1
ERROR: LoadError: Failed to precompile Flux [587475ba-b771-5e3f-ad9e-33799f191a9c] to /home/cedarprince/.julia/compiled/v1.8/Flux/jl_qD0gAY.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool)
@ Base ./loading.jl:1705
[3] compilecache
@ ./loading.jl:1649 [inlined]
[4] _require(pkg::Base.PkgId)
@ Base ./loading.jl:1337
[5] _require_prelocked(uuidkey::Base.PkgId)
@ Base ./loading.jl:1200
[6] macro expansion
@ ./loading.jl:1180 [inlined]
[7] macro expansion
@ ./lock.jl:223 [inlined]
[8] require(into::Module, mod::Symbol)
@ Base ./loading.jl:1144
[9] include
@ ./Base.jl:419 [inlined]
[10] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
@ Base ./loading.jl:1554
[11] top-level scope
@ stdin:1
in expression starting at /home/cedarprince/.julia/packages/MLJFlux/6XVNm/src/MLJFlux.jl:1
in expression starting at stdin:1
ERROR: Failed to precompile MLJFlux [094fc8d1-fd35-5302-93ea-dabda2abf845] to /home/cedarprince/.julia/compiled/v1.8/MLJFlux/jl_Gyt3nF.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool)
@ Base ./loading.jl:1705
[3] compilecache
@ ./loading.jl:1649 [inlined]
[4] _require(pkg::Base.PkgId)
@ Base ./loading.jl:1337
[5] _require_prelocked(uuidkey::Base.PkgId)
@ Base ./loading.jl:1200
[6] macro expansion
@ ./loading.jl:1180 [inlined]
[7] macro expansion
@ ./lock.jl:223 [inlined]
[8] require(into::Module, mod::Symbol)
@ Base ./loading.jl:1144
[9] eval
@ ./boot.jl:368 [inlined]
[10] eval(x::Expr)
@ Base.MainInclude ./client.jl:478
[11] _import(modl::Module, api_pkg::Symbol, pkg::Symbol, doprint::Bool)
@ MLJModels ~/.julia/packages/MLJModels/GKDnU/src/loading.jl:34
[12] top-level scope
@ ~/.julia/packages/MLJModels/GKDnU/src/loading.jl:206
I noticed that my MLJ packages were quite old, so I also ran an update via pkg> up after this error. After updating, trying to load the NeuralNetworkClassifier results in this error:
julia> @load NeuralNetworkClassifier
[ Info: For silent loading, specify `verbosity=0`.
import MLJFlux
[ Info: Precompiling MLJFlux [094fc8d1-fd35-5302-93ea-dabda2abf845]
ERROR: LoadError: UndefVarError: doc_header not defined
Stacktrace:
[1] getproperty(x::Module, f::Symbol)
@ Base ./Base.jl:31
[2] top-level scope
@ ~/.julia/packages/MLJFlux/6XVNm/src/types.jl:134
[3] include(mod::Module, _path::String)
@ Base ./Base.jl:419
[4] include(x::String)
@ MLJFlux ~/.julia/packages/MLJFlux/6XVNm/src/MLJFlux.jl:1
[5] top-level scope
@ ~/.julia/packages/MLJFlux/6XVNm/src/MLJFlux.jl:25
[6] include
@ ./Base.jl:419 [inlined]
[7] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
@ Base ./loading.jl:1554
[8] top-level scope
@ stdin:1
in expression starting at /home/cedarprince/.julia/packages/MLJFlux/6XVNm/src/types.jl:134
in expression starting at /home/cedarprince/.julia/packages/MLJFlux/6XVNm/src/MLJFlux.jl:1
in expression starting at stdin:1
ERROR: Failed to precompile MLJFlux [094fc8d1-fd35-5302-93ea-dabda2abf845] to /home/cedarprince/.julia/compiled/v1.8/MLJFlux/jl_MwDCnk.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool)
@ Base ./loading.jl:1705
[3] compilecache
@ ./loading.jl:1649 [inlined]
[4] _require(pkg::Base.PkgId)
@ Base ./loading.jl:1337
[5] _require_prelocked(uuidkey::Base.PkgId)
@ Base ./loading.jl:1200
[6] macro expansion
@ ./loading.jl:1180 [inlined]
[7] macro expansion
@ ./lock.jl:223 [inlined]
[8] require(into::Module, mod::Symbol)
@ Base ./loading.jl:1144
[9] eval
@ ./boot.jl:368 [inlined]
[10] eval(x::Expr)
@ Base.MainInclude ./client.jl:478
[11] _import(modl::Module, api_pkg::Symbol, pkg::Symbol, doprint::Bool)
@ MLJModels ~/.julia/packages/MLJModels/GKDnU/src/loading.jl:34
[12] top-level scope
@ ~/.julia/packages/MLJModels/GKDnU/src/loading.jl:206
Additionally, here is my current set-up and version info:
Julia Version: 1.8
[7c232609] Fairness v0.3.2
⌃ [add582a8] MLJ v0.16.7
[094fc8d1] MLJFlux v0.2.9
Info Packages marked with ⌃ have new versions available
What am I doing incorrectly here? Really looking forward to working with this package - thanks! 😄
I'm guessing that the example below fails because you assume the target is a categorical vector whose "raw" type is Bool. However, it could be any type (in this case it is String). The example does not error if I insert the following code immediately after the unpack line:
y = map(y) do η
η == "1" ? true : false
end
The example:
using MLJFair, MLJ
import DataFrames
model = @pipeline ContinuousEncoder @load(EvoTreeClassifier)
# load Indian Liver Patient Dataset:
data = OpenML.load(1480) |> DataFrames.DataFrame ;
y, X = unpack(data, ==(:Class), name->true; rng=123);
y = coerce(y, Multiclass);
coerce!(X, :V2 => Multiclass, Count => Continuous);
schema(X)
# Notes:
# - The target `y` is 1 for liver patients, 2 otherwise
# - The attribute `V2` of `X` is gender
wrappedModel = ReweighingSamplingWrapper(model, grp=:V2)
julia> evaluate(wrappedModel,
X, y,
measures=MetricWrappers(
[true_positive, true_positive_rate]; grp=:V2))
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Bool
Closest candidates are:
convert(::Type{T}, ::T) where T<:Number at number.jl:6
convert(::Type{T}, ::Number) where T<:Number at number.jl:7
convert(::Type{T}, ::Ptr) where T<:Integer at pointer.jl:23
...
Stacktrace:
[1] convert(::Type{Bool}, ::CategoricalArrays.CategoricalValue{String,UInt32}) at /Users/anthony/.julia/packages/CategoricalArrays/nd8kj/src/value.jl:68
[2] setindex!(::Array{Bool,1}, ::CategoricalArrays.CategoricalValue{String,UInt32}, ::Int64) at ./array.jl:782
[3] copyto! at ./abstractarray.jl:807 [inlined]
[4] copyto! at ./abstractarray.jl:799 [inlined]
[5] AbstractArray at ./array.jl:499 [inlined]
[6] convert at ./abstractarray.jl:16 [inlined]
[7] fair_tensor(::CategoricalArrays.CategoricalArray{String,1,UInt32,String,CategoricalArrays.CategoricalValue{String,UInt32},Union{}}, ::CategoricalArrays.CategoricalArray{String,1,UInt32,String,CategoricalArrays.CategoricalValue{String,UInt32},Union{}}, ::CategoricalArrays.CategoricalArray{String,1,UInt32,String,CategoricalArrays.CategoricalValue{String,UInt32},Union{}}) at /Users/anthony/Dropbox/Julia7/MLJ/MLJFair/src/fair_tensor.jl:54
[8] (::MetricWrapper)(::MLJBase.UnivariateFiniteArray{Multiclass{2},String,UInt32,Float32,1}, ::DataFrames.DataFrame, ::CategoricalArrays.CategoricalArray{String,1,UInt32,String,CategoricalArrays.CategoricalValue{String,UInt32},Union{}}) at /Users/anthony/Dropbox/Julia7/MLJ/MLJFair/src/measures/metricWrapper.jl:50
[9] value at /Users/anthony/.julia/packages/MLJBase/r3heT/src/measures/measures.jl:74 [inlined]
[10] value at /Users/anthony/.julia/packages/MLJBase/r3heT/src/measures/measures.jl:64 [inlined]
[11] (::MLJBase.var"#264#269"{DataFrames.DataFrame,CategoricalArrays.CategoricalArray{String,1,UInt32,String,CategoricalArrays.CategoricalValue{String,UInt32},Union{}},MLJBase.UnivariateFiniteArray{Multiclass{2},String,UInt32,Float32,1}})(::MetricWrapper) at ./none:0
[12] collect(::Base.Generator{Array{Any,1},MLJBase.var"#264#269"{DataFrames.DataFrame,CategoricalArrays.CategoricalArray{String,1,UInt32,String,CategoricalArrays.CategoricalValue{String,UInt32},Union{}},MLJBase.UnivariateFiniteArray{Multiclass{2},String,UInt32,Float32,1}}}) at ./generator.jl:47
[13] (::MLJBase.var"#get_measurements#268"{Array{Tuple{Array{Int64,1},Array{Int64,1}},1},Nothing,Int64,Array{Any,1},typeof(predict),Bool,DataFrames.DataFrame,CategoricalArrays.CategoricalArray{String,1,UInt32,String,CategoricalArrays.CategoricalValue{String,UInt32},Union{}}})(::Machine{ReweighingSamplingWrapper}, ::Int64) at /Users/anthony/.julia/packages/MLJBase/r3heT/src/resampling.jl:779
[14] #248 at /Users/anthony/.julia/packages/MLJBase/r3heT/src/resampling.jl:64 [inlined]
[15] _mapreduce(::MLJBase.var"#248#249"{MLJBase.var"#get_measurements#268"{Array{Tuple{Array{Int64,1},Array{Int64,1}},1},Nothing,Int64,Array{Any,1},typeof(predict),Bool,DataFrames.DataFrame,CategoricalArrays.CategoricalArray{String,1,UInt32,String,CategoricalArrays.CategoricalValue{String,UInt32},Union{}}},Machine{ReweighingSamplingWrapper},Int64,ProgressMeter.Progress}, ::typeof(vcat), ::IndexLinear, ::UnitRange{Int64}) at ./reduce.jl:309
[16] _mapreduce_dim at ./reducedim.jl:312 [inlined]
[17] #mapreduce#584 at ./reducedim.jl:307 [inlined]
[18] mapreduce at ./reducedim.jl:307 [inlined]
[19] _evaluate!(::MLJBase.var"#get_measurements#268"{Array{Tuple{Array{Int64,1},Array{Int64,1}},1},Nothing,Int64,Array{Any,1},typeof(predict),Bool,DataFrames.DataFrame,CategoricalArrays.CategoricalArray{String,1,UInt32,String,CategoricalArrays.CategoricalValue{String,UInt32},Union{}}}, ::Machine{ReweighingSamplingWrapper}, ::CPU1{Nothing}, ::Int64, ::Int64) at /Users/anthony/.julia/packages/MLJBase/r3heT/src/resampling.jl:643
[20] evaluate!(::Machine{ReweighingSamplingWrapper}, ::Array{Tuple{Array{Int64,1},Array{Int64,1}},1}, ::Nothing, ::Nothing, ::Int64, ::Int64, ::Array{Any,1}, ::typeof(predict), ::CPU1{Nothing}, ::Bool) at /Users/anthony/.julia/packages/MLJBase/r3heT/src/resampling.jl:796
[21] evaluate!(::Machine{ReweighingSamplingWrapper}, ::CV, ::Nothing, ::Nothing, ::Int64, ::Int64, ::Array{Any,1}, ::Function, ::CPU1{Nothing}, ::Bool) at /Users/anthony/.julia/packages/MLJBase/r3heT/src/resampling.jl:859
[22] #evaluate!#242(::CV, ::Array{Any,1}, ::Array{Any,1}, ::Nothing, ::Function, ::CPU1{Nothing}, ::Nothing, ::Int64, ::Bool, ::Bool, ::Int64, ::typeof(evaluate!), ::Machine{ReweighingSamplingWrapper}) at /Users/anthony/.julia/packages/MLJBase/r3heT/src/resampling.jl:608
[23] (::MLJBase.var"#kw##evaluate!")(::NamedTuple{(:measures,),Tuple{Array{Any,1}}}, ::typeof(evaluate!), ::Machine{ReweighingSamplingWrapper}) at ./none:0
[24] #evaluate#247(::Base.Iterators.Pairs{Symbol,Array{Any,1},Tuple{Symbol},NamedTuple{(:measures,),Tuple{Array{Any,1}}}}, ::typeof(evaluate), ::ReweighingSamplingWrapper, ::DataFrames.DataFrame, ::Vararg{Any,N} where N) at /Users/anthony/.julia/packages/MLJBase/r3heT/src/resampling.jl:626
[25] (::MLJModelInterface.var"#kw##evaluate")(::NamedTuple{(:measures,),Tuple{Array{Any,1}}}, ::typeof(evaluate), ::ReweighingSamplingWrapper, ::DataFrames.DataFrame, ::Vararg{Any,N} where N) at ./none:0
[26] top-level scope at REPL[89]:1
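A type-generic way to avoid the failing Bool assumption in fair_tensor might look like the sketch below: compare each label against a designated positive level rather than converting to Bool. The positive level would need to come from the measure or the categorical pool; it is hard-coded here purely for illustration.

```julia
# Sketch: map labels of any raw type to Booleans by comparing against an
# assumed "positive" level, instead of convert(Bool, label).
as_positives(y, positive) = [η == positive for η in y]

as_positives(["1", "2", "1", "2"], "1")   # [true, false, true, false]
as_positives([1, 2, 1, 2], 1)             # works for any label type
```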
That's quite cool how you use the learning network stuff to create the wrappers!
I've made some minor corrections on a fork: ablaom@d7f1d5c - mostly relating to the MLJBase 0.14 update.
Suggestions:
Implement some checks (via the clean! method) on the classifier, such as classifier isa Deterministic, although you may want to handle Probabilistic too and just change predict to predict_mode, or have a probability-based re-weighting. Also, to ensure it's a classifier, you should check target_scitype(classifier) <: AbstractArray{<:Finite}.
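The suggested clean! check could be sketched as below, with stand-in types (in real code these would be MLJ's Deterministic and Probabilistic, and clean! would follow the MLJ model-API convention of returning a warning string):

```julia
# Hypothetical sketch of a clean!-style check using stand-in types.
abstract type Deterministic end
abstract type Probabilistic end
struct GoodClassifier <: Deterministic end
struct NotAClassifier end

function clean!(model)
    warning = ""
    if !(model isa Deterministic || model isa Probabilistic)
        warning *= "Expected a Deterministic or Probabilistic classifier. "
    end
    return warning
end

clean!(GoodClassifier())   # "" (no warning)
clean!(NotAClassifier())   # non-empty warning string
```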
I believe "reweighing" should be "reweighting" (despite auto-correction to the contrary). "to weight" (a data point) is a verb in this context, distinct from the related verb "to weigh" (an apple).
The functionality for FairTensor seems to be primarily to store labels together with the fairness-confusion tensor. This functionality already exists in https://github.com/JuliaArrays/AxisArrays.jl and I think it would make sense to define FairTensor as an AxisArray.
Also, let's discuss why FairTensor needs to be mutable: the current design carries potential performance penalties.
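For concreteness, here is a minimal sketch of an immutable label-carrying tensor in plain Julia, assuming a groups x 2 x 2 layout (group, predicted, truth). The names are illustrative; an AxisArray would provide this label-based lookup out of the box.

```julia
# Sketch: an immutable wrapper storing group labels alongside the
# fairness-confusion tensor.
struct LabeledTensor{T}
    mat::Array{T,3}            # group × predicted × truth
    labels::Vector{String}     # one label per group slice
end

groupindex(ft::LabeledTensor, label) = findfirst(==(label), ft.labels)

ft = LabeledTensor(zeros(Int, 2, 2, 2), ["female", "male"])
groupindex(ft, "male")   # 2
```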
This repo is behind on compat updates, and CompatHelper has gone into hibernation, so there are probably more outdated bounds. This is preventing newer versions of MLJ from being used with Fairness.jl.
MLJBase is a dependency of this package, but since MLJBase 1.0 the measures API lives in StatisticalMeasures.jl. To help this package make the transition, the following may be helpful:
Registrator has been installed on this repository. This issue will be used to trigger it. It will then trigger TagBot, which will create a tag for the registered version. This issue is strictly meant for the owner and collaborators to create new releases.
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers. Please see this post on Discourse for instructions and more details. If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
Hello,
I find this a really interesting project and would like to make some contributions. What can I start with?