A set of functions to help with the analytical work of a data scientist. Full documentation can be found on the Package Homepage. The main motivation of the package is to simplify these tasks by using a common input and output structure: Spark DataFrames (SparkDF) and pandas DataFrames (PandasDF).
The package can be installed either from PyPI:
pip install ds-toolbox
Or directly from GitHub:
pip install git+https://github.com/viniciusmsousa/ds-toolbox.git@main
- statistics:
  - contigency_chi2_test: Wrapper for a SciPy function.
  - mannwhitney_pairwise: Wrapper for a scikit-posthocs function.
  - ks_test: Computes the KS test, for pandas and Spark DataFrames.
  - ab_test_pairwise: A simple A/B test based on the mean, standard deviation and variance, for pandas and Spark DataFrames.
- ml:
  - evaluator:
    - binary_classifier_metrics: Computes classification metrics (confusion_matrix, accuracy, f1, precision, recall, aucroc, aucpr) from a DataFrame (SparkDF or PandasDF) containing the ground truth and the predictions.
- econometrics:
  - causal_regression:
    - CausalRegression: A class built on top of what is presented in chapters 19-21 of the book Causal Inference for The Brave and True.
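The chi-squared contingency test compares observed counts in a cross-tabulation against the counts expected if the two variables were independent. The sketch below calls `scipy.stats.chi2_contingency` directly on a pandas cross-tab; it is not the `contigency_chi2_test` signature, and the column names `group` and `converted` and the data are hypothetical.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical example data: one row per user, with a group label and a binary outcome.
df = pd.DataFrame({
    "group": ["A"] * 100 + ["B"] * 100,
    "converted": [1] * 30 + [0] * 70 + [1] * 45 + [0] * 55,
})

# Contingency table: rows are groups, columns are outcome levels (0, 1).
table = pd.crosstab(df["group"], df["converted"])

# Returns the statistic, the p-value, the degrees of freedom and the expected counts.
# For a 2x2 table SciPy applies Yates' continuity correction by default.
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.3f}, p={p_value:.4f}, dof={dof}")
```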
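A pairwise Mann-Whitney comparison runs a two-sample rank test for every pair of groups and then adjusts the p-values for multiple comparisons. As an assumption, the sketch below swaps scikit-posthocs for `scipy.stats.mannwhitneyu` plus a Bonferroni correction; `pairwise_mannwhitney` is a hypothetical helper, not the package's `mannwhitney_pairwise`.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import mannwhitneyu


def pairwise_mannwhitney(df, value_col, group_col):
    """Two-sided Mann-Whitney U test for every pair of groups (hypothetical helper)."""
    groups = {name: sub[value_col].to_numpy() for name, sub in df.groupby(group_col)}
    pairs = list(combinations(sorted(groups), 2))
    rows = []
    for a, b in pairs:
        stat, p = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
        # Bonferroni: multiply each p-value by the number of comparisons, capped at 1.
        rows.append({"group_a": a, "group_b": b, "u_stat": stat,
                     "p_value": p, "p_adj": min(p * len(pairs), 1.0)})
    return pd.DataFrame(rows)


df = pd.DataFrame({
    "group": ["x"] * 5 + ["y"] * 5 + ["z"] * 5,
    "value": [1, 2, 3, 4, 5, 11, 12, 13, 14, 15, 2, 3, 4, 5, 6],
})
result = pairwise_mannwhitney(df, "value", "group")
print(result)
```

Bonferroni is the simplest correction to illustrate; scikit-posthocs offers other adjustment methods as well.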
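The two-sample KS statistic is the largest vertical distance between two empirical CDFs; in model evaluation it is often computed between the score distributions of the positive and negative classes. Assuming that is the use case of `ks_test` (not confirmed here), the sketch computes the statistic by hand with NumPy and checks it against `scipy.stats.ks_2samp`; the scores are simulated.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_statistic(scores_pos, scores_neg):
    """Max vertical distance between the two empirical CDFs (two-sample KS statistic)."""
    grid = np.sort(np.concatenate([scores_pos, scores_neg]))
    # Fraction of each sample that is <= every point of the pooled grid.
    cdf_pos = np.searchsorted(np.sort(scores_pos), grid, side="right") / len(scores_pos)
    cdf_neg = np.searchsorted(np.sort(scores_neg), grid, side="right") / len(scores_neg)
    return float(np.max(np.abs(cdf_pos - cdf_neg)))


rng = np.random.default_rng(0)
scores_pos = rng.normal(0.7, 0.1, 500)  # simulated scores of the positive class
scores_neg = rng.normal(0.4, 0.1, 500)  # simulated scores of the negative class

manual = ks_statistic(scores_pos, scores_neg)
reference = ks_2samp(scores_pos, scores_neg)
print(manual, reference.statistic, reference.pvalue)
```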
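An A/B test based on the mean and variance of each group can be framed as a normal-approximation (z) test on the difference in means, with the standard error built from each group's variance. The sketch below is one such formulation for a pandas DataFrame; `ab_test_summary` and its parameters are hypothetical and may differ from what `ab_test_pairwise` actually computes.

```python
import math

import pandas as pd
from scipy.stats import norm


def ab_test_summary(df, group_col, value_col, control, treatment, alpha=0.05):
    """Difference-in-means z-test between two groups (hypothetical helper)."""
    # Per-group mean, sample variance (ddof=1) and size.
    stats = df.groupby(group_col)[value_col].agg(["mean", "var", "count"])
    c, t = stats.loc[control], stats.loc[treatment]
    diff = t["mean"] - c["mean"]
    # Standard error of the difference, allowing unequal variances.
    se = math.sqrt(t["var"] / t["count"] + c["var"] / c["count"])
    z = diff / se
    p_value = 2 * norm.sf(abs(z))            # two-sided p-value
    margin = norm.ppf(1 - alpha / 2) * se    # half-width of the confidence interval
    return {"diff": diff, "se": se, "z": z, "p_value": p_value,
            "ci_low": diff - margin, "ci_high": diff + margin}


df = pd.DataFrame({
    "variant": ["control"] * 5 + ["treatment"] * 5,
    "value": [1, 2, 3, 4, 5, 3, 4, 5, 6, 7],
})
result = ab_test_summary(df, "variant", "value", "control", "treatment")
print(result)
```

With only a handful of observations per group a t-test would be more appropriate; the z approximation is used here to keep the arithmetic simple.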
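The metrics listed for `binary_classifier_metrics` can all be derived from a label column and a score column. The sketch below computes them for a pandas DataFrame with NumPy and SciPy only; the function name `binary_metrics`, the column names `label` and `score`, and the 0.5 threshold are assumptions, not the package's API. AUC-ROC uses the rank (Mann-Whitney) formulation and AUC-PR is approximated by average precision.

```python
import numpy as np
import pandas as pd
from scipy.stats import rankdata


def binary_metrics(df, label_col="label", score_col="score", threshold=0.5):
    """Classification metrics from ground truth and scores (hypothetical helper)."""
    y = df[label_col].to_numpy().astype(int)
    s = df[score_col].to_numpy().astype(float)
    pred = (s >= threshold).astype(int)

    tp = int(((pred == 1) & (y == 1)).sum())
    tn = int(((pred == 0) & (y == 0)).sum())
    fp = int(((pred == 1) & (y == 0)).sum())
    fn = int(((pred == 0) & (y == 1)).sum())

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # AUC-ROC via the rank formulation; average ranks handle tied scores.
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    ranks = rankdata(s)
    auc_roc = (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

    # AUC-PR as average precision; this simple version assumes no tied scores.
    hits = y[np.argsort(-s)]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    auc_pr = float((precision_at_k * hits).sum() / n_pos)

    return {
        "confusion_matrix": np.array([[tn, fp], [fn, tp]]),  # rows: true 0/1, cols: predicted 0/1
        "accuracy": (tp + tn) / len(y),
        "precision": precision, "recall": recall, "f1": f1,
        "auc_roc": float(auc_roc), "auc_pr": auc_pr,
    }


df = pd.DataFrame({"label": [0, 0, 1, 1, 0, 1],
                   "score": [0.1, 0.4, 0.35, 0.8, 0.6, 0.9]})
metrics = binary_metrics(df)
print(metrics)
```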
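One way to estimate treatment effects that vary across units with plain linear regression is to interact the treatment with the covariates, so the fitted effect becomes a function of those covariates. Whether `CausalRegression` works exactly this way is an assumption; the sketch below only illustrates the idea with `numpy.linalg.lstsq` on simulated data, and `cate` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2_000
x = rng.normal(size=n)                # a single covariate
t = rng.binomial(1, 0.5, size=n)      # randomized binary treatment
# Simulated outcome: the treatment effect is 2 + 1.5 * x, i.e. it varies with x.
y = 1.0 + 2.0 * t + 0.5 * x + 1.5 * t * x + rng.normal(scale=0.1, size=n)

# Design matrix: intercept, treatment, covariate and the treatment-covariate interaction.
X = np.column_stack([np.ones(n), t, x, t * x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
_, b_t, _, b_tx = coef


def cate(x_new):
    """Conditional average treatment effect implied by the fitted interaction model."""
    return b_t + b_tx * np.asarray(x_new)


print(cate([-1.0, 0.0, 1.0]))  # roughly [0.5, 2.0, 3.5]
```

Because the treatment here is randomized, the interaction coefficients can be read causally; with observational data the same regression needs the usual confounding assumptions.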