beacon-biosignals / Lighthouse.jl
Performance evaluation tools for multiclass, multirater classification models
License: MIT License
I don't think this is the right thing -- but for some reason Pkg grabs OldLighthouse.
currently, we compute many metrics directly from the confusion matrix, but not Cohen's kappa:
Lines 105 to 127 in 2feb223
It could clean up some of the data flow to do so. Here is an implementation.
using LinearAlgebra: dot

function cohens_kappa_from_confusion_matrix(conf)
    p₀ = accuracy(conf)  # observed agreement
    pₑ = probability_of_chance_agreement_from_confusion_matrix(conf)
    return (p₀ - pₑ) / (1 - ifelse(pₑ == 1, zero(pₑ), pₑ))
end

function probability_of_chance_agreement_from_confusion_matrix(conf)
    counts_1 = dropdims(sum(conf; dims=1); dims=1)  # per-class totals along one axis
    counts_2 = dropdims(sum(conf; dims=2); dims=2)  # per-class totals along the other axis
    n = sum(counts_1)
    @assert n == sum(counts_2)
    return dot(counts_1, counts_2) / n^2
end
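A quick sanity check on a toy 2×2 confusion matrix (made-up counts; this assumes accuracy(conf) here is the usual trace of the confusion matrix divided by its total):

conf = [40 10; 5 45]                       # toy confusion matrix
cohens_kappa_from_confusion_matrix(conf)   # p₀ = 0.85, pₑ = 0.5, so κ = 0.7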
I think it would make sense for Lighthouse to support per-epoch checkpointing and automatic resumption from checkpoints. I imagine the interface could work like this: MyClassifier <: AbstractClassifier may optionally provide methods for

load_checkpoint(::Type{MyClassifier}, uri) -> MyClassifier
save_checkpoint(uri, ::MyClassifier) -> Nothing
has_checkpoint(uri, ::AbstractClassifier) = isfile(uri)
checkpoint_extension(::Type{<:AbstractClassifier}) = ".checkpoint"

Then learn! optionally takes a checkpoint_dir URI as a keyword argument. At each epoch, uri = joinpath(checkpoint_dir, string(epoch), "classifier_checkpoint$(ext)") (with ext = checkpoint_extension(MyClassifier)) is checked for the existence of a checkpoint (by has_checkpoint); a rough sketch of this flow is below.

Not sure what to do about logs though -- should we also load logs from a checkpoint? I'd like any model-managed state to be part of the AbstractClassifier and have its checkpointing handle any of that state, but I'm not sure about any Lighthouse-managed state, and especially what guarantees callbacks should have about the availability of that state if we resumed from a checkpoint.
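Here is a hypothetical sketch of how that per-epoch check could look inside a learn!-style loop. None of these functions, nor the checkpoint_dir keyword, exist in Lighthouse today; they just follow the proposal above, and train_one_epoch! is a stand-in for the existing per-epoch training step.

# Hypothetical sketch of the per-epoch checkpoint/resume flow described above.
function train_with_checkpoints!(model::AbstractClassifier, n_epochs; checkpoint_dir=nothing)
    ext = checkpoint_extension(typeof(model))
    for epoch in 1:n_epochs
        uri = checkpoint_dir === nothing ? nothing :
              joinpath(checkpoint_dir, string(epoch), "classifier_checkpoint$(ext)")
        if uri !== nothing && has_checkpoint(uri, model)
            model = load_checkpoint(typeof(model), uri)  # resume from the saved state
            continue
        end
        train_one_epoch!(model)  # stand-in for the existing per-epoch training step
        uri === nothing || (mkpath(dirname(uri)); save_checkpoint(uri, model))
    end
    return model
end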
I was trying to increase the text size of my confusion matrix plot, but when I tried to pass in the keyword arg annotation_text_size=34, it gave me an error. The code works fine without the keyword arg (Lighthouse.plot_confusion_matrix(rand(2, 2), ["1", "2"], :Row)) and produces a confusion matrix plot, but when I try to change the default text size I get this error:
Lighthouse.plot_confusion_matrix(rand(2, 2), ["1", "2"], :Row, annotation_text_size=34)
MethodError: no method matching plot_confusion_matrix(::Matrix{Float64}, ::Vector{String}, ::Symbol; annotation_text_size=34)
Closest candidates are:
plot_confusion_matrix(::AbstractMatrix, ::Any, ::Any) at ~/.julia/packages/Lighthouse/K3PBy/src/learn.jl:333 got unsupported keyword argument "annotation_text_size"
Here is my environment/system info, if it's of any help:
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i3-4030U CPU @ 1.90GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, haswell)
Environment:
JULIA_REVISE_WORKER_ONLY = 1
Status `~/usr/project/dir/Project.toml`
[ac2c24cd] Lighthouse v0.11.3
I recently made a time-boxed* attempt at deprecating EvaluationRow
(#71), and it was Not A Fun Time™. I'm not sure what the correct thing to do here is, and I think that it is very possible that we should remove some (all?) of the training harness aspects of Lighthouse.jl altogether---or at least, we should figure out the minimal set of them that users actually depend on.
We've spent a certain amount of time maintaining the existing harness behavior, but afaik our internal usage of the harness itself is basically non-existent, as teams have combined the Lighthouse components in their own ways to support their specific use-cases. (Specifics to be left to a Beacon slack thread!) Externally, I would be surprised if anyone was depending on them (although if someone is, please say the word!!). Additionally, https://github.com/beacon-biosignals/LighthouseFlux.jl/blob/main/src/LighthouseFlux.jl does not use learn!
or evaluate!
.
*technically it was also [hurtling-through-]space-boxed as well, huzzah for transcontinental flight....
The particular hole I fell down was trying to cleanly swap out the use of evaluation_metrics_row
(and its corresponding log_evaluation_row!
) inside evaluate!
.
Line 233 in 0540cdd
Why? Well, the existing usage conflates tradeoff metrics and hardened metrics, and we shouldn't be computing (and logging) both there without the caller specifying things like how the per-class binarization should happen (to create the hardened metrics), whether they want to additionally compute multirater label metrics (i.e., if votes exist), and whether they want to compute additional metrics that (until now) have only been computed for binary classification problems.
At the point that we expose separate interfaces for the various combinations of inputs that a user might want (multiclass v binary, single rater v multirater, binarization choice, pre-computed predicted_hard_labels for multiclass input), we will have created a lot more code to maintain for a lot less benefit. At the end of the day, asking the caller to implement their own evaluate!(model::AbstractClassifier, ...) would be easier.
...which seems like a viable option. Except what inputs would a template evaluate!
function take? What combination of the existing
predicted_hard_labels::AbstractVector,
predicted_soft_labels::AbstractMatrix,
elected_hard_labels::AbstractVector,
classes, logger;
logger_prefix, logger_suffix,
votes::Union{Nothing,AbstractMatrix}=nothing,
thresholds=0.0:0.01:1.0, optimal_threshold_class::Union{Nothing,Integer}=nothing
? Because the choice depends on the type of metrics being computed in the first place.
If various projects were depending on this evaluate!
function, it seems like choosing some minimal required arguments (
predicted_soft_labels::AbstractMatrix,
elected_hard_labels::AbstractVector,
classes, logger;
logger_prefix, logger_suffix
?) and passing the others as kwargs from the learn! function might do it (see the sketch below). Buuuut I'm not sure how many folks use this function in the first place!
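For concreteness, here is what that split could look like as a signature -- purely a hypothetical sketch of the option described above, not an agreed-upon design:

# Hypothetical minimal evaluate! signature; the threshold- and multirater-related
# arguments would be forwarded as keyword arguments from learn!.
function evaluate!(predicted_soft_labels::AbstractMatrix,
                   elected_hard_labels::AbstractVector,
                   classes, logger;
                   logger_prefix, logger_suffix,
                   predicted_hard_labels::Union{Nothing,AbstractVector}=nothing,
                   votes::Union{Nothing,AbstractMatrix}=nothing,
                   thresholds=0.0:0.01:1.0,
                   optimal_threshold_class::Union{Nothing,Integer}=nothing)
    # ...compute whichever metrics the supplied keyword arguments call for,
    # then log them via `logger`...
end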
Which brings me to: what is the correct thing to do here? And does anyone use the learn!
and evaluate!
functions, or could/should we remove them except as examples (in the docs) for how to implement a custom loop for one's own model?
To quote @ericphanson, when talking about this issue,
I think a harness like that is just too rigid. We just need a lot of useful primitives like metrics, logging, plots, etc, and then we can combine them in different ways as makes sense for the project
I agree---and so I think losing the evaluate!
and learn!
calls as they currently exist would allow us to focus on those primitives and sink less time into maintaining code that isn't used. And if more flexible (or smaller scoped) harnesses make their way back into Lighthouse.jl, I think that would be cool also, but we should let that happen from the ground up once we create training loops that actively work for us and could be shared.
I think this would be nice to have. Could just put it in a log_resource_info! call, I think.
Once #69 lands (via #70), the next piece is to refactor out the individual plotting components in relation to their given *MetricsRow
---and then we can get rid of EvaluationRow
, which combines multiple unrelated metrics types in a confusing (and easy to misinterpret) way.
(Ref brief discussion in #70 (comment))
Tasks:
- Replace evaluation_metrics_plot with per-schema plots that take in a table of the relevant metrics row
- Deprecate EvaluationRow and remove evaluation_metrics_row
A dependency of this package, FFMPEG
, cannot be built due to the following exception:
ERROR: Error building `FFMPEG`:
┌ Warning: Platform `arm64-apple-darwin22.4.0` is not an officially supported platform
└ @ BinaryProvider ~/.julia/packages/BinaryProvider/U2dKK/src/PlatformNames.jl:450
ERROR: LoadError: KeyError: key "unknown" not found
Since this dependency cannot be built, the Lighthouse package itself won't build. BinaryProvider.jl
is more or less a deprecated package that hasn't been updated since 2020, but it appears that FFMPEG
still relies on it. I will file a related issue there as well.
(transferred from https://github.com/beacon-biosignals/OldLighthouse.jl/issues/134)
The consensus is that AbstractPlotting is easier to work with and gives you more control, and that the conversion shouldn't be too much work -- @kolia
Currently, Lighthouse asks for a set of input thresholds (e.g. 0.0:0.01:1.0
), then computes binary stats for each threshold, then forms corresponding fpr/tpr curves, then computes the area underneath those curves w/ the trapezoid rule.
I came across An Improved Model Selection Heuristic for AUC today, which among other things pointed out that you can compute the AUC without reference to specific thresholds by sorting the soft-labels of all instances and then doing a particular kind of count:
I think this is a nice algorithm we should consider using, so that the accuracy of the AUC estimate is not influenced by the granularity of the thresholds chosen.
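For reference, a small sketch of the threshold-free computation (this is the standard rank-statistic formulation of AUC, not code from the paper or from Lighthouse; the function and argument names are made up):

# AUC as a rank statistic: the probability that a randomly chosen positive
# instance is assigned a higher soft label than a randomly chosen negative one.
# (Ties are broken arbitrarily here; a mid-rank correction could be added.)
function auc_from_soft_labels(soft_labels::AbstractVector, is_positive::AbstractVector{Bool})
    ranks = invperm(sortperm(soft_labels))   # rank of each instance, 1 = lowest score
    n_pos = count(is_positive)
    n_neg = length(is_positive) - n_pos
    rank_sum = sum(ranks[is_positive])       # sum of the positive instances' ranks
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
end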
After #60 we've defined an interface that folks can plug into for custom logging.
We could drop the Tensorboard dep by removing the LearnLogger
. We could make a no-op logger that doesn't actually log anything, or maybe a Dict
logger that logs things to a logged
dict as the current one does, which could be good for testing. Or we could keep the current LearnLogger.
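As a rough illustration of the Dict-logger idea, something like the following could work (a minimal sketch; it assumes log_value!(logger, field, value) is one of the interface hooks, as used in the snippets elsewhere in these issues, and the DictLogger name and layout are made up):

using Lighthouse

# Minimal sketch of a Dict-backed logger, mirroring how LearnLogger's `logged`
# dict behaves today: each field name maps to a vector of logged values.
struct DictLogger
    logged::Dict{String,Vector{Any}}
end
DictLogger() = DictLogger(Dict{String,Vector{Any}}())

function Lighthouse.log_value!(logger::DictLogger, field::AbstractString, value)
    push!(get!(Vector{Any}, logger.logged, field), value)
    return value
end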
(transferred from beacon-biosignals/OldLighthouse.jl#126)
Regarding the bootstrapping stuff ... I actually have a Julia package for jackknife resampling and estimation https://github.com/ararslan/Jackknife.jl -- @ararslan
As introduced in #45, and blocked by apache/arrow-julia#125. Fix will be to this line:
Line 18 in 9e9de96
Okay, I'm not "allowed" to comment below this point, but I think we should specialize the below implementation of log_evaluation_row! on LearnLogger so that special-casing the Spearman correlation doesn't have to happen by default---that is a very TensorBoard-specific logged item. E.g.,
function log_evaluation_row!(logger, field::AbstractString, metrics)
    metrics_plot = evaluation_metrics_plot(metrics)
    metrics_dict = _evaluation_row_dict(metrics)
    log_plot!(logger, field, metrics_plot, metrics_dict)
    return metrics_plot
end
or, better:
function log_evaluation_row!(logger, field::AbstractString, metrics)
    metrics_plot = evaluation_metrics_plot(metrics)
    log_plot!(logger, field, metrics_plot) # if we make the plot_data field optional
    # optionally, could also then log each field of `metrics_plot` with `log_values!`
    return metrics_plot
end
and
function log_evaluation_row!(logger::LearnLogger, field::AbstractString, metrics)
    metrics_plot = evaluation_metrics_plot(metrics)
    metrics_dict = _evaluation_row_dict(metrics)
    log_plot!(logger, field, metrics_plot, metrics_dict)
    # TensorBoard-specific: additionally log the Spearman correlation as a scalar
    if haskey(metrics_dict, "spearman_correlation")
        sp_field = replace(field, "metrics" => "spearman_correlation")
        log_value!(logger, sp_field, metrics_dict["spearman_correlation"].ρ)
    end
    return metrics_plot
end
Originally posted by @hannahilea in #60 (comment)
We should abstract our logging interface so there is no default logger, and folks can provide methods for a well-documented interface, likely based on this code:
Lines 16 to 55 in 344672c
just realized this would be a nice illustration of what the package currently provides :)
Should probably be optional (as it will increase computation time!); should at least include CIs for inter-rater agreement, possibly also ROC curves etc.
(transferred from https://github.com/beacon-biosignals/OldLighthouse.jl/issues/82)
"Is there class imbalance" and "how many observations are in each class?" are common questions people ask when viewing the output Lighthouse metrics plots. Let's add a class distribution bar plot (or similar) to the default plots to answer these questions!
Right now, the examples in the docs are generated from test data that shows very poor (basically random!) model performance. Update the test data to approximate values from a "decently" performing model, so that the metrics plots are meaningful/interpretable and demonstrate accurate implementation (i.e., give someone looking at the ROC curve demo a reason to trust that it is a correctly implemented ROC curve!).
Follow-up: can use the results to generate #17.
Lines 76 to 87 in e829617
IMO 1.0 is misleading here and NaN would be better. I think this is a breaking change.
The idea is to have something like evaluation_metrics_plot
except for when you have more than 1 model, and for the simpler case of binary classifiers instead of multi-class. Ideally, this plot would include:
Ideally the plots could scale from 2 to say 20 models, though if 20 gets cramped we could split it up and only do smaller subsets.
Essentially, it would look like evaluation_metrics_plot
except for most of the plots, it would be 1 curve per model instead of 1 curve per class.
Krippendorff's alpha is a measure of inter-rater reliability, and there is a Julia package to compute it: https://discourse.julialang.org/t/ann-krippendorff-jl/54261
This way we can call it from log_evaluation_row!
. Perhaps as the fallback we can call log_value!
on each entry.
(transferred from beacon-biosignals/OldLighthouse.jl#115)
The numbers should be white when the color in a cell is below a certain threshold, so that we can still read them.
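A tiny sketch of the kind of rule this could use (the helper name and the 0.5 cutoff are made up, and it assumes low cell values map to dark colors):

# Use white text on dark (low-valued) cells, black otherwise.
annotation_color(cell_value, cutoff=0.5) = cell_value < cutoff ? :white : :black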
This is the counterpart to #45 which is about producing rows. It would be nice if the inputs were tables.
- evaluation_metrics method that consumes tables with this schema, along with keyword arguments (classes, thresholds, probably?)

Currently, if votes
is not provided to Lighthouse.evaluation_metrics_row
, it outputs a row with per_class_IRA_kappas
and multiclass_IRA_kappas
fields defined as missing
. If this is a multiclass result, then passing in such a row to evaluation_metrics_plot
fails because it doesn't expect these fields to exist when not dealing with multi-rater metrics.
For example, for a 5 class EvaluationRow
, it throws:
ERROR: DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 12 and 8")
The computation of true negatives based on the confusion matrix looks like it's summing only the diagonal of the submatrix instead of the full submatrix of all classes except the class of interest. Since this is supposed to be a binary metric, it should sum over all entries whose true class is not the class of interest and whose predicted class is also not the class of interest.
Line 73 in e829617
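In other words, for class i the true negatives should count every cell that lies outside both row i and column i. A one-line sketch (not the actual Lighthouse code, and assuming the same row/column orientation as the snippet above):

# True negatives for class `i`: total count minus row `i` and column `i`,
# adding back the diagonal cell that was subtracted twice.
true_negatives(conf, i) = sum(conf) - sum(conf[i, :]) - sum(conf[:, i]) + conf[i, i]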
Right now, evaluation_metrics_row
is kludgy and tries to do way too much stuff, which makes it (a) hard to know exactly what your outputs relate to and (b) hard to customize any internal subfunctions without rewriting the entire loop (or threading some new param everywhere). It also combines a bunch of different types of output metrics into a single hard-to-parse schema (EvaluationRow
). We'd like to refactor this!
Plan (as devised w/ @ericphanson): Split current evaluation_metrics_row
function into three separate (types of) functions:
What is the output if you call all three steps?
EvaluationRow, which contains threshold-dependent AND threshold-independent metrics, and where each field contains results for all classes AND (when valid) a multiclass value (i.e., some fields are per class, some are multiclass, some contain both; some fields are threshold dependent, some are threshold independent).
Other important points:
(transferred from beacon-biosignals/OldLighthouse.jl#130)
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml
to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix
on this issue.
I'll open a PR within a few hours, please be patient!
Okay yup, using WGLMakie as a backend did the trick. But is it mentioned in the docs that we have to activate one of the Makie backends?
Originally posted by @abhinav3398 in #82 (comment)
We define the following schemas:
- "lighthouse.evaluation@1"
- "lighthouse.observation@1"
- "lighthouse.class@1"
- "lighthouse.label-metrics@1" > "lighthouse.class@1"
- "lighthouse.hardened-metrics@1" > "lighthouse.class@1"
- "lighthouse.tradeoff-metrics@1" > "lighthouse.class@1"
We use Legolas.validate in code and tests.
(transferred from https://github.com/beacon-biosignals/OldLighthouse.jl/issues/110)
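For illustration, the declarations might look roughly like this (a sketch assuming the Legolas.@row style of schema declaration from that era of Legolas.jl; the field names are placeholders, not the actual Lighthouse schema fields):

using Legolas

# Placeholder fields; "lighthouse.label-metrics@1" extends "lighthouse.class@1".
const ClassRow = Legolas.@row("lighthouse.class@1",
                              class_index::Union{Int64,Symbol},
                              class_labels::Union{Missing,Vector{String}})

const LabelMetricsRow = Legolas.@row("lighthouse.label-metrics@1" > "lighthouse.class@1",
                                     ira_kappa::Union{Missing,Float64})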