Oreum Core Tools oreum_core


1. Description and Scope

This is an ever-growing package of core tools for use on client projects by Oreum Industries.

  • Provides an essential workflow for data curation, EDA, and basic ML using the core scientific Python stack incl. numpy, scipy, matplotlib, seaborn, pandas, scikit-learn, umap-learn
  • Optionally provides an advanced Bayesian modelling workflow in R&D and Production using a leading probabilistic programming stack incl. pymc, pytensor, arviz (do pip install 'oreum_core[pymc]', quoted so that zsh does not glob the brackets)
  • Optionally enables a generalist black-box ML workflow in R&D using a leading Gradient Boosted Trees stack incl. catboost, xgboost, optuna, shap (do pip install 'oreum_core[tree]')
  • Also includes several utilities for text cleaning, SQL scripting, and file handling
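Because the Bayesian and tree stacks are optional, it can help to check which of them are importable in the current environment before relying on them. The sketch below is only an illustration: the helper name `available_extras` and the mapping `OPTIONAL_EXTRAS` are hypothetical and not part of oreum_core, and the import names are assumed to match the package names listed above.

```python
from importlib.util import find_spec

# Hypothetical mapping from pip extra to the top-level import names it is
# assumed to provide (taken from the package lists in this README).
OPTIONAL_EXTRAS = {
    "pymc": ("pymc", "pytensor", "arviz"),
    "tree": ("catboost", "xgboost", "optuna", "shap"),
}


def available_extras(extras=OPTIONAL_EXTRAS):
    """Return {extra: bool}; True means every listed package is importable."""
    return {
        extra: all(find_spec(pkg) is not None for pkg in pkgs)
        for extra, pkgs in extras.items()
    }
```

Calling `available_extras()` returns a dict such as `{"pymc": ..., "tree": ...}` that can be used to skip optional workflows cleanly instead of failing on import.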

This package is:

  • A work in progress (v0.y.z) and liable to breaking changes and inconvenience to the user
  • Solely designed for ease of use and rapid development by employees of Oreum Industries, and selected clients with guidance

This package is not:

  • Intended for public usage and will not be supported for public usage
  • Intended for contributions by anyone not an employee of Oreum Industries, and unsolicited contributions will not be accepted

Notes

  • Project began on 2021-01-01
  • The README.md is MacOS and POSIX oriented
  • See LICENCE.md for licensing and copyright details
  • See CONTRIBUTORS.md for list of contributors
  • This uses a logger named 'oreum_core', feel free to incorporate or ignore
  • Hosting:
    • Source code repo on GitHub
    • Source code release on GitHub
    • Package release on PyPI
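One way to incorporate the 'oreum_core' logger into a host project is to attach a handler to it with the standard library `logging` module. This is a minimal sketch: only the logger name comes from this README, while the helper name `configure_oreum_logging` and the message format are assumptions.

```python
import logging


def configure_oreum_logging(level=logging.INFO):
    """Attach a stream handler to the logger named 'oreum_core'."""
    logger = logging.getLogger("oreum_core")
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

To mostly ignore the logger instead, raise its level, e.g. `logging.getLogger("oreum_core").setLevel(logging.ERROR)`.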

2. Instructions to Create Dev Environment

For local development on MacOS

2.0 Pre-requisite installs via homebrew

  1. Install Homebrew, see instructions at https://brew.sh
  2. Install direnv, git, git-lfs, graphviz, zsh

$> brew update && brew upgrade
$> brew install direnv git git-lfs graphviz zsh

2.1 Git clone the repo

Assumes direnv, git, git-lfs and zsh installed as above

$> git clone https://github.com/oreum-industries/oreum_core
$> cd oreum_core

Then allow direnv to automatically load the .envrc file whenever you enter the directory:

$> direnv allow

2.2 Create virtual environment and install dev packages

Notes:

  • We use conda virtual envs controlled by mamba (quicker than conda)
  • We install packages using miniforge (sourced from the conda-forge repo) wherever possible and only use pip for packages that are handled better by pip and/or more up-to-date on pypi
  • Packages might not be the very latest because we want stability for pymc which is usually in a state of development flux
  • See cheat sheet of conda commands
  • The Makefile creates a dev env and will also download and preinstall miniforge if not yet installed on your system

2.2.1 Create the dev environment

From the dir above oreum_core/ project dir:

$> make -C oreum_core/ dev

This will also create some files to help confirm / diagnose successful installation:

  • dev/install_log/blas_info.txt for the BLAS MKL installation for numpy
  • dev/install_log/pipdeptree[_rev].txt lists installed package deps (and reversed)
  • LICENSES_THIRD_PARTY.md details the license for each package used
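A quick way to confirm that these diagnostic files were written is to check for them from Python. The sketch below rests on several assumptions: the paths are relative to the oreum_core/ project dir, `pipdeptree[_rev].txt` expands to `pipdeptree.txt` and `pipdeptree_rev.txt`, and `LICENSES_THIRD_PARTY.md` sits at the project root. The helper name `missing_install_files` is hypothetical.

```python
from pathlib import Path

# Assumed locations, relative to the project dir (see the caveats above)
EXPECTED_FILES = (
    "dev/install_log/blas_info.txt",
    "dev/install_log/pipdeptree.txt",
    "dev/install_log/pipdeptree_rev.txt",
    "LICENSES_THIRD_PARTY.md",
)


def missing_install_files(project_dir, expected=EXPECTED_FILES):
    """Return the expected files that do not exist under project_dir."""
    root = Path(project_dir)
    return [rel for rel in expected if not (root / rel).is_file()]
```

For example, `missing_install_files("oreum_core")` run from the dir above the project returns an empty list when all of the files are present.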

2.2.2 (Optional best practice) Test successful installation of dev environment

From the dir above oreum_core/ project dir:

$> make -C oreum_core/ test-dev-env

This will also add files dev/install_log/[numpy|scipy].txt which detail successful installation (or not) for numpy, scipy

2.2.3 (Useful during env install experimentation): To remove the dev environment

From the dir above oreum_core/ project dir:

$> make -C oreum_core/ uninstall-env

2.3 Code Linting & Repo Control

2.3.1 Pre-commit

We use pre-commit to run a suite of automated tests for code linting & quality control and repo control prior to commit on local development machines.

  • pre-commit is already installed by the make dev command (which itself calls pip install -e .[dev])
  • The pre-commit script will then run on your system upon git commit
  • See this project's .pre-commit-config.yaml for details

2.3.2 GitHub Actions

We use GitHub Actions (aka GitHub Workflows) to run:

  1. A suite of automated tests for commits received at the origin (i.e. GitHub)
  2. Publishing to PyPI upon creating a GH Release

  • See Makefile for the CLI commands that are issued
  • See .github/workflows/* for workflow details

Copyright 2024 Oreum OÜ t/a Oreum Industries. All rights reserved. See LICENSE.md.

Oreum OÜ t/a Oreum Industries, Sepapaja 6, Tallinn, 15551, Estonia, reg.16122291, oreum.io


oreum_core's Issues

modify classic_mod.tplx to adjust images properly

nbconvert 5.6.1

Regular latex template has images too large:

\adjustimage{max size={0.9\linewidth}{0.9\paperheight}}{((( filename )))}

I want to override using classic_mod.tplx, something like

((*- block figure -*))
    ((( super() )))
    %\renewcommand{\adjustimage}{max size={0.5\linewidth}{0.5\paperheight}}{((( filename )))}
((*- endblock figure -*))

but nothing I try works. Instead have manually hacked

/Users/jon/opt/anaconda3/envs/freberg_trk/lib/python3.8/site-packages/nbconvert/templates/latex/document_contents.tplx

with line:

\adjustimage{max size={0.9\linewidth}{0.3\paperheight}}{((( filename )))}

which is poor form!

add zero-inflated dist invcdf

Useful for e.g. frequency defined as claims / $exposure, which is zero-inflated.

Consider zero-inflated Poisson / negative-binomial, but these are discrete and frequency defined over $exposure is continuous, so prefer a zero-inflated continuous distribution instead, perhaps Gamma:

e.g.

see also:
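One possible reading of this request, sketched with scipy: a zero-inflated Gamma puts probability mass `p_zero` at exactly 0 and spreads the remaining `1 - p_zero` over a Gamma, so its CDF is `F(x) = p_zero + (1 - p_zero) * G(x)` for `x >= 0`. Inverting gives 0 for `u <= p_zero`, and the Gamma inverse CDF of the rescaled `u` otherwise. The function name `zig_invcdf` and the shape/rate parameterisation (`alpha`, `beta`) are assumptions, not something the issue specifies.

```python
import numpy as np
from scipy import stats


def zig_invcdf(u, p_zero, alpha, beta):
    """Inverse CDF of a zero-inflated Gamma (hypothetical helper).

    u      : probabilities in [0, 1]
    p_zero : probability of a structural zero, in [0, 1)
    alpha  : Gamma shape
    beta   : Gamma rate (scale = 1 / beta)
    """
    u = np.asarray(u, dtype=float)
    # Rescale the part of u that lies above the zero mass onto [0, 1]
    u_pos = np.clip((u - p_zero) / (1.0 - p_zero), 0.0, 1.0)
    x = stats.gamma.ppf(u_pos, a=alpha, scale=1.0 / beta)
    # Any u at or below the zero mass maps to exactly 0
    return np.where(u <= p_zero, 0.0, x)
```

Note that some libraries use the opposite convention, where the mixture weight is the probability of the non-zero component, so check the parameter meaning before mixing this with other code.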

Create 2-sided sample comparison between PPC of pred(s) vs training

Create calc_2_sample_delta_prop

Calculate the 2-sided sample delta (difference) between arrays row-wise, so that we can make a statement about how different a test array arr is from a reference array arr_ref

    Basic algo
    ---------- 

    for each row i in arr:
        for each row j in arr_ref:
            do: d = arr[i] - arr_ref[j]
            do: q = quantiles[0.03, 0.97](d)
            do: b_i = q > 0
            if: sum(b_i) == 2:
                we state "94% of arr[i] > arr_ref[j], substantially larger"
            elif: sum(b_i) == 1:
                we state "not different"                
            else (sum(b_i) == 0):                
                we state "94% of arr[i] < arr_ref[j], substantially smaller"                    
        do: prop = unique_count(b) / len(b)

Then we can state "prop arr[i] larger, same, smaller than arr_ref"
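The basic algo above could be sketched with NumPy as follows. This assumes `arr` and `arr_ref` are 2D arrays of shape (n_rows, n_samples) with the same number of samples per row (e.g. PPC draws), and it reads the final step as computing, for each row of `arr`, the proportion of `arr_ref` rows in each outcome. The signature and the column order of the result are assumptions, since the issue does not define them.

```python
import numpy as np


def calc_2_sample_delta_prop(arr, arr_ref, q=(0.03, 0.97)):
    """For each row of arr, return the proportion of arr_ref rows that it is
    substantially smaller than, not different from, or substantially larger than.

    Returns an array of shape (len(arr), 3): columns [smaller, same, larger].
    """
    arr = np.asarray(arr, dtype=float)
    arr_ref = np.asarray(arr_ref, dtype=float)
    # d[i, j, :] = arr[i] - arr_ref[j], built via broadcasting
    d = arr[:, None, :] - arr_ref[None, :, :]
    # Lower and upper quantiles over the sample axis: shape (2, n, m)
    qs = np.quantile(d, q, axis=-1)
    # Count how many of the two quantiles are above zero: 0, 1 or 2
    n_pos = (qs > 0).sum(axis=0)
    # 0 -> smaller, 1 -> not different, 2 -> larger; average over arr_ref rows
    return np.stack([(n_pos == k).mean(axis=1) for k in (0, 1, 2)], axis=1)
```

The broadcasted difference holds n x m x n_samples values in memory, so for large arrays it may be preferable to loop over rows of `arr` instead.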
