lalvim / PartialLeastSquaresRegressor.jl
Implementation of a Partial Least Squares Regressor
License: MIT License
Hi, I have a few questions regarding the outputs of f = fitted_params(mach) and r = report(mach) on a trained mach:
1. What is the structure of the fitted_params(mach) output? After navigating this repo I could find out that the first element is f[1].W, the second element is f[1].b, and the third element is f[1].P; but this is not very clear and definitely not straightforward. It would be nice for the docs of this package to describe how to access these objects and what they are: there is a lot of inconsistency in terminology out there, and it is not easy to know what they actually are.
2. Is report(mach) expected to return nothing?
3. Can the regression coefficients be recovered from (W, b, P)? Alternatively, it would be nice to have some metric of feature importance after fitting a model (https://learnche.org/pid/latent-variable-modelling/projection-to-latent-structures/coefficient-plots-in-pls).
4. Is the model.P attribute from PartialLeastSquaresRegressor.jl equivalent to the x_rotations_ attribute from scikit-learn? The scikit-learn documentation describes that attribute as: "The projection matrix used to transform X".
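For anyone hitting the same question, here is a minimal sketch of how these objects can be pulled out of a trained machine. The field names W, b, and P are reverse-engineered in this issue, not a documented contract of the package, and the interpretive comments are assumptions:

```julia
# Sketch only: assumes MLJ and PartialLeastSquaresRegressor are installed,
# and that fitted_params returns a tuple whose first element carries W, b, P
# (field names taken from this issue, not from documented API).
using MLJ, PartialLeastSquaresRegressor

X, y = make_regression(100, 4)   # toy continuous data from MLJ
mach = machine(PartialLeastSquaresRegressor.PLSRegressor(n_factors=2), X, y)
fit!(mach)

f = fitted_params(mach)
W = f[1].W   # projection weights (terminology varies across the PLS literature)
b = f[1].b   # latent-space regression coefficients (assumption)
P = f[1].P   # X loadings (assumption)
```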
Follow the approach given in: Martens H., Næs T. Multivariate Calibration. Wiley: New York, 1989,
as shown here.
This turns inference into a single matmul, because PLS truly follows Y = XB when the data are centered and scaled. It should work for both PLS1 and PLS2.
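A self-contained sketch of the idea for PLS1, using an illustrative NIPALS implementation (not this package's code): with weights W, loadings P, and y-loadings q from a run on centered data, the Martens & Næs coefficients B = W (PᵀW)⁻¹ q collapse prediction to one matmul.

```julia
# Illustrative NIPALS PLS1, to show score-based prediction T*q equals the
# single matmul X0*B with B = W (PᵀW)⁻¹ q. Not this package's implementation.
using LinearAlgebra, Random
Random.seed!(1)

n, p, A = 50, 6, 3
X0 = randn(n, p); X0 .-= sum(X0, dims=1) ./ n   # center columns
y0 = randn(n);    y0 .-= sum(y0) / n

X, y = copy(X0), copy(y0)                        # working (deflated) copies
W = zeros(p, A); P = zeros(p, A); q = zeros(A); T = zeros(n, A)
for a in 1:A
    w = X' * y; w ./= norm(w)                    # weight vector
    t = X * w                                    # score
    P[:, a] = (X' * t) ./ dot(t, t)              # X loading
    q[a]    = dot(y, t) / dot(t, t)              # y loading
    X .-= t * P[:, a]'                           # deflate X
    y .-= q[a] .* t                              # deflate y
    W[:, a] = w; T[:, a] = t
end

B = W * ((P' * W) \ q)        # regression coefficients for centered data
yhat_scores = T * q           # prediction via accumulated scores
yhat_matmul = X0 * B          # one matmul on the original centered X
maximum(abs.(yhat_scores .- yhat_matmul))   # zero up to roundoff
```

The agreement follows from the standard identity T = X0 W (PᵀW)⁻¹ for NIPALS scores.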
julia> PLSRegressor.fit(rand(3,3)', rand(3,3),nfactors=1)
PLSRegressor.PLS2Model{Float64}([-0.24304924768468017; -0.7652241603951436; 0.5961199942523806], [0.5000828333756707; 0.8403325816466939; 0.20918487513671646], [-1.014428813230312; -0.5140513192948163; 1.5284801325251283], [-0.5833408395500677; -0.6270781681635975; 0.6347112774551629], 1, [0.38126711006014835 0.7374788535159716 0.5745868740183244], [0.4833154788820418 0.3610395095534553 0.5343679246337361], [0.416986758761901 0.25893432832014807 0.23456254998816195], [0.4338424499764614 0.2871044566173785 0.1631054938683769], 3, 3, true)
julia> PLSRegressor.fit(rand(3,3), rand(3,3)',nfactors=1)
ERROR: MethodError: no method matching check_constant_cols(::Adjoint{Float64,Array{Float64,2}})
Closest candidates are:
check_constant_cols(::Array{T,2}) where T<:AbstractFloat at /home/tyler/.julia/packages/PLSRegressor/w4SF2/src/utils.jl:31
check_constant_cols(::Array{T,1}) where T<:AbstractFloat at /home/tyler/.julia/packages/PLSRegressor/w4SF2/src/utils.jl:32
Stacktrace:
[1] fit(::Array{Float64,2}, ::Adjoint{Float64,Array{Float64,2}}; nfactors::Int64, copydata::Bool, centralize::Bool, kernel::String, width::Float64) at /home/tyler/.julia/packages/PLSRegressor/w4SF2/src/method.jl:27
[2] top-level scope at REPL[445]:1
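The root cause is that `rand(3,3)'` is a lazy `Adjoint` wrapper, not a `Matrix`, so the `::Array{T,2}` method in utils.jl is never matched. Until `check_constant_cols` accepts `AbstractMatrix`, a likely workaround (an assumption based on the MethodError above, not a confirmed fix) is to materialize the adjoint first:

```julia
# An adjoint is a lazy view type, not a plain Array.
A = rand(3, 3)'
A isa Matrix           # false: it is an Adjoint{Float64, Matrix{Float64}}
collect(A) isa Matrix  # true: collect materializes a plain Matrix

# So, as a workaround (untested against this package):
# PLSRegressor.fit(rand(3,3), collect(rand(3,3)'), nfactors=1)
```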
Example 1 in the readme has issues:
julia> regressor = PLSRegressor(n_factors=2)
ERROR: UndefVarError: PLSRegressor not defined
This is easily fixed by adding
using PartialLeastSquaresRegressor: PLSRegressor
but maybe you really should export that type from the package?
Then a bit later this happens:
julia> pls_model = @pipeline Standardizer regressor target=Standardizer
ERROR: LoadError: The `@pipeline` macro is deprecated. For pipelines without target transformations use pipe syntax, as in `ContinuousEncoder() |> Standardizer() |> my_classifier`. For details and advanced options, query the `Pipeline` docstring. To wrap a supervised model in a target transformation, use `TransformedTargetModel`, as in `TransformedTargetModel(my_regressor, target=Standardizer())`
in expression starting at REPL[16]:1
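Following the deprecation message's own guidance, the readme example presumably needs to become something like the following. This is an untested sketch: the pipe syntax and the `TransformedTargetModel(...; target=...)` call are taken directly from the error text, and the model names from the original example.

```julia
# Sketch derived from the deprecation message above; not verified against
# the current readme or MLJ release.
using MLJ
using PartialLeastSquaresRegressor: PLSRegressor

regressor = PLSRegressor(n_factors=2)

# Pipe syntax replaces @pipeline for the input-side transformation ...
pipe = Standardizer() |> regressor

# ... and TransformedTargetModel replaces target=Standardizer.
pls_model = TransformedTargetModel(pipe, target=Standardizer())
```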
We would love to use the package, but we need a working example.
In the long term, I recommend using Literate.jl to show working examples, because then they are tested as part of CI, whereas examples in a README are not. But in the short run, could you please fix the readme? Here is one Literate example:
https://jefffessler.github.io/ScoreMatching.jl/dev/generated/examples/01-overview/
julia> Pkg.add("PLSRegressor")
Resolving package versions...
ERROR: Unsatisfiable requirements detected for package PLSRegressor [fba1ee03]:
PLSRegressor [fba1ee03] log:
├─possible versions are: 1.0.1 or uninstalled
├─restricted to versions * by an explicit requirement, leaving only versions 1.0.1
└─restricted by julia compatibility requirements to versions: uninstalled — no versions left
Stacktrace:
[1] #propagate_constraints!#61(::Bool, ::Function, ::Pkg.GraphType.Graph, ::Set{Int64}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/GraphType.jl:1007
[2] propagate_constraints! at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/GraphType.jl:948 [inlined]
[3] #simplify_graph!#121(::Bool, ::Function, ::Pkg.GraphType.Graph, ::Set{Int64}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/GraphType.jl:1462
[4] simplify_graph! at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/GraphType.jl:1462 [inlined] (repeats 2 times)
[5] resolve_versions!(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}, ::Nothing) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/Operations.jl:371
[6] resolve_versions! at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/Operations.jl:315 [inlined]
[7] #add_or_develop#63(::Array{Base.UUID,1}, ::Symbol, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/Operations.jl:1172
[8] #add_or_develop at ./none:0 [inlined]
[9] #add_or_develop#17(::Symbol, ::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:59
[10] #add_or_develop at ./none:0 [inlined]
[11] #add_or_develop#16 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:36 [inlined]
[12] #add_or_develop at ./none:0 [inlined]
[13] #add_or_develop#13 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:34 [inlined]
[14] #add_or_develop at ./none:0 [inlined]
[15] #add_or_develop#12(::Base.Iterators.Pairs{Symbol,Symbol,Tuple{Symbol},NamedTuple{(:mode,),Tuple{Symbol}}}, ::Function, ::String) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:33
[16] #add_or_develop at ./none:0 [inlined]
[17] #add#22 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:64 [inlined]
[18] add(::String) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:64
[19] top-level scope at none:0
How can loadings and scores be extracted after performing a fit with fit!(pls_machine, rows=train)?
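One hedged sketch, assuming the W and P fields that fitted_params appears to expose (field names taken from another issue on this repo, not from documented API), and assuming X has been preprocessed the same way as during training:

```julia
# Hypothetical sketch: the field names W and P and the preprocessing
# assumptions are not documented contracts of the package.
f = fitted_params(pls_machine)
W, P = f[1].W, f[1].P             # projection weights and X loadings (assumed)
Xmat = MLJ.matrix(X)[train, :]    # same rows used in fit!
T = Xmat * (W / (P' * W))         # scores, via the rotation R = W (PᵀW)⁻¹
```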
A Fast PLS version
https://www.rdocumentation.org/packages/plsdepot/versions/0.1.17/topics/simpls
S. de Jong. SIMPLS: An Alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 18, 1993 (251-263).
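For reference, a compact single-response SIMPLS sketch following de Jong (1993). This is an illustrative implementation, not code from this package. With as many factors as predictors, PLS1 reproduces the least-squares fit on centered data, which makes a handy sanity check:

```julia
using LinearAlgebra, Random
Random.seed!(2)

# SIMPLS for a single response (PLS1), after de Jong (1993).
function simpls1(X, y, A)
    n, p = size(X)
    Xc = X .- sum(X, dims=1) ./ n        # center predictors
    yc = y .- sum(y) / n                 # center response
    s = Xc' * yc                         # covariance direction
    R = zeros(p, A); Q = zeros(A); V = zeros(p, A)
    for a in 1:A
        r = copy(s)
        t = Xc * r                       # score
        nt = norm(t); t ./= nt; r ./= nt # normalize score and weight
        pv = Xc' * t                     # X loading
        Q[a] = dot(yc, t)                # y loading
        v = copy(pv)
        for j in 1:a-1                   # orthonormal basis of loadings
            v .-= dot(v, V[:, j]) .* V[:, j]
        end
        v ./= norm(v); V[:, a] = v
        s .-= v .* dot(v, s)             # deflate the covariance vector
        R[:, a] = r
    end
    return R * Q                         # coefficients for centered data
end

X = randn(30, 4); y = randn(30)
B = simpls1(X, y, 4)                     # full-rank: should match OLS
Xc = X .- sum(X, dims=1) ./ 30; yc = y .- sum(y) / 30
maximum(abs.(B .- (Xc \ yc)))            # near zero
```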
Add diagnostics like leverage, explained variance in X and Y, and Q and Hotelling's T² statistics, as in:
https://github.com/caseykneale/ChemometricsTools.jl/blob/master/src/ModelAnalysis.jl#L21
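A self-contained sketch of two of these diagnostics, leverage and Hotelling's T², computed from a score matrix T. These are the textbook formulas, not necessarily the exact scaling ChemometricsTools uses:

```julia
using LinearAlgebra, Random
Random.seed!(3)

# Given an n×A score matrix T (one column per latent variable, centered):
n, A = 40, 3
T = randn(n, A); T .-= sum(T, dims=1) ./ n

G = T' * T
leverage = [dot(T[i, :], G \ T[i, :]) for i in 1:n]   # h_i = tᵢᵀ(TᵀT)⁻¹tᵢ
sum(leverage)                                          # equals A (trace identity)

s2 = vec(sum(abs2, T; dims=1)) ./ (n - 1)              # per-component variance
T2 = [sum(T[i, :] .^ 2 ./ s2) for i in 1:n]            # Hotelling's T² per sample
```

Since the leverages are the diagonal of the hat matrix T(TᵀT)⁻¹Tᵀ, they sum to the number of components, a quick consistency check.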
As described in https://discourse.julialang.org/t/ann-plans-for-removing-packages-that-do-not-yet-support-1-0-from-the-general-registry/ we are planning on removing packages that do not support 1.0 from the General registry. This package has been detected to not support 1.0 and is thus slated to be removed. The removal of packages from the registry will happen approximately a month after this issue is open.
To transition to the new Pkg system using Project.toml
, see https://github.com/JuliaRegistries/Registrator.jl#transitioning-from-require-to-projecttoml.
To then tag a new version of the package, see https://github.com/JuliaRegistries/Registrator.jl#via-the-github-app.
If you believe this package has erroneously been detected as not supporting 1.0 or have any other questions, don't hesitate to discuss it here or in the thread linked at the top of this post.
An N × n_factors matrix, P, is allocated on line 59 in kpls.jl and then populated on line 103, but is otherwise unused. I'm not sure whether Julia optimizes it away, but if not, it can be a pretty sizeable allocation that should be removed.
Hi @lalvim
I'm excited about your updated package!
Have you considered making an announcement here: https://discourse.julialang.org/c/community/packages/47
It can get your package some attention.
I tried to fit uninformative data (random, uniform, and centered) with PLS2 and the regressor was unable to learn the baseline (note that I am using the MLJ interface from #10).
regressor = PLS(n_factors=1)
X = rand(1000, 5) .- 0.5
y = rand(1000, 2) .- 0.5
plsmachine = MLJ.machine(regressor, MLJ.table(X), MLJ.table(y))
MLJ.fit!(plsmachine)
pred = MLJ.predict(plsmachine)
yhat = MLJ.matrix(pred)
# Error of the model
println(sum((y .- yhat).^2)) # 249.26
# Baseline prediction yhat = 0
println(sum(y.^2)) # 166.33
I would expect the error to be no worse for the PLS2 model here, since by learning all internal parameters to be zero it would always return [0, 0] as output and match the baseline prediction.
The scikit-learn version, on the other hand, works as expected: it doesn't quite learn all parameters to be zero, but its final error matches the baseline's.
Hi, and thank you for this package!
Have you considered porting it to MLJ.jl?
For some data sets, training is failing. Given the MethodError thrown, this looks like a bug to me:
julia> using MLJBase, PartialLeastSquaresRegressor
julia> X, y = @load_boston;
julia> machine(PartialLeastSquaresRegressor.PLSRegressor(), X, y) |> fit!
[ Info: Training machine(PLSRegressor(n_factors = 1), …).
┌ Error: Problem fitting the machine machine(PLSRegressor(n_factors = 1), …).
└ @ MLJBase ~/.julia/packages/MLJBase/wnJff/src/machines.jl:617
[ Info: Running type checks...
[ Info: Type checks okay.
ERROR: MethodError: no method matching check_constant_cols(::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
Closest candidates are:
check_constant_cols(::Matrix{T}) where T<:AbstractFloat at /Users/anthony/.julia/packages/PartialLeastSquaresRegressor/OrIoJ/src/utils.jl:26
check_constant_cols(::Vector{T}) where T<:AbstractFloat at /Users/anthony/.julia/packages/PartialLeastSquaresRegressor/OrIoJ/src/utils.jl:27
Stacktrace:
[1] fit(m::PartialLeastSquaresRegressor.PLSRegressor, verbosity::Int64, X::NamedTuple{(:Crim, :Zn, :Indus, :NOx, :Rm, :Age, :Dis, :Rad, :Tax, :PTRatio, :Black, :LStat), NTuple{12, SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Y::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
@ PartialLeastSquaresRegressor ~/.julia/packages/PartialLeastSquaresRegressor/OrIoJ/src/mlj_interface.jl:65
[2] fit_only!(mach::Machine{PartialLeastSquaresRegressor.PLSRegressor, true}; rows::Nothing, verbosity::Int64, force::Bool)
@ MLJBase ~/.julia/packages/MLJBase/wnJff/src/machines.jl:615
[3] fit_only!
@ ~/.julia/packages/MLJBase/wnJff/src/machines.jl:568 [inlined]
[4] #fit!#52
@ ~/.julia/packages/MLJBase/wnJff/src/machines.jl:683 [inlined]
[5] fit!
@ ~/.julia/packages/MLJBase/wnJff/src/machines.jl:681 [inlined]
[6] |>(x::Machine{PartialLeastSquaresRegressor.PLSRegressor, true}, f::typeof(fit!))
@ Base ./operators.jl:858
[7] top-level scope
@ REPL[162]:1
[8] top-level scope
@ ~/.julia/packages/CUDA/fAEDi/src/initialization.jl:52
It would be nice to have good documentation.
The tag name "v1.0" is not of the appropriate SemVer form (vX.Y.Z).
cc: @lalvim