
klib's Introduction

Hi there, I'm a Quantitative Developer based in Germany! 👋

Software Engineer 👨‍💻 | Economist 🎓 | Researcher 📚

  • 🐍 Currently working on a package that facilitates data exploration and data cleaning 👉 klib
  • 💳 Interested in projects and collaborations related to banking and financial markets 🏦
  • 📚 Take a look at the coding exercises on Exercism and reach out to me for feedback and mentoring!
  • 📫 How to reach me: LinkedIn

klib's People

Contributors

akanz1, deepsourcebot, dependabot[bot], hasan-alper, jrrmcalcio, m-marqx, px39n, snyk-bot, withshubh

klib's Issues

Check dependencies

  • Check if jinja2 is still required.
  • Check dependencies
  • Check dev dependencies
  • Upgrade dev dependencies

[BUG] - missingval_plot method raises ValueError

Describe the bug
While trying to plot missing values, I get the following error:
ValueError: rotation must be 'vertical', 'horizontal' or a number, not 90

Here is the error traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [24], line 1
----> 1 klib.missingval_plot(data_interim_df)

File ~/venv/lib/python3.10/site-packages/klib/describe.py:689, in missingval_plot(data, cmap, figsize, sort, spine_color)
    687 for rect, label in zip(ax1.patches, mv_cols):
    688     height = rect.get_height()
--> 689     ax1.text(
    690         rect.get_x() + rect.get_width() / 2,
    691         height + max(np.log(1 + height / 6), 0.075),
    692         label,
    693         ha="center",
    694         va="bottom",
    695         rotation="90",
    696         alpha=0.5,
    697         fontsize="11",
    698     )
    700 ax1.set_frame_on(True)
    701 for _, spine in ax1.spines.items():

File /shared-libs/python3.10/py/lib/python3.10/site-packages/matplotlib/axes/_axes.py:678, in Axes.text(self, x, y, s, fontdict, **kwargs)
    617 """
    618 Add text to the Axes.
    619 
   (...)
    668     >>> text(x, y, s, bbox=dict(facecolor='red', alpha=0.5))
    669 """
    670 effective_kwargs = {
    671     'verticalalignment': 'baseline',
    672     'horizontalalignment': 'left',
   (...)
    676     **kwargs,
    677 }
--> 678 t = mtext.Text(x, y, text=s, **effective_kwargs)
    679 t.set_clip_path(self.patch)
    680 self._add_text(t)

File /shared-libs/python3.10/py/lib/python3.10/site-packages/matplotlib/_api/deprecation.py:454, in make_keyword_only.<locals>.wrapper(*args, **kwargs)
    448 if len(args) > name_idx:
    449     warn_deprecated(
    450         since, message="Passing the %(name)s %(obj_type)s "
    451         "positionally is deprecated since Matplotlib %(since)s; the "
    452         "parameter will become keyword-only %(removal)s.",
    453         name=name, obj_type=f"parameter of {func.__name__}()")
--> 454 return func(*args, **kwargs)

File /shared-libs/python3.10/py/lib/python3.10/site-packages/matplotlib/text.py:178, in Text.__init__(self, x, y, text, color, verticalalignment, horizontalalignment, multialignment, fontproperties, rotation, linespacing, rotation_mode, usetex, wrap, transform_rotates_text, parse_math, **kwargs)
    176 self.set_horizontalalignment(horizontalalignment)
    177 self._multialignment = multialignment
--> 178 self.set_rotation(rotation)
    179 self._transform_rotates_text = transform_rotates_text
    180 self._bbox_patch = None  # a FancyBboxPatch instance

File /shared-libs/python3.10/py/lib/python3.10/site-packages/matplotlib/text.py:1197, in Text.set_rotation(self, s)
   1195     self._rotation = 90.
   1196 else:
-> 1197     raise ValueError("rotation must be 'vertical', 'horizontal' or "
   1198                      f"a number, not {s}")
   1199 self.stale = True

ValueError: rotation must be 'vertical', 'horizontal' or a number, not 90
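
The traceback shows klib passing rotation="90" (a string) to ax1.text, and the installed Matplotlib accepting only 'vertical', 'horizontal' or a number. Passing the number 90 instead of the string should therefore avoid the error. As a minimal sketch of that idea, the helper below converts numeric strings to numbers before they reach a text call. The name normalize_rotation is hypothetical and is not part of klib or Matplotlib.

```python
from numbers import Real


def normalize_rotation(value):
    """Return a rotation value in the form the error message asks for.

    Hypothetical helper for illustration; not part of klib or Matplotlib.
    """
    # Real numbers pass through as floats (bool is excluded on purpose).
    if isinstance(value, Real) and not isinstance(value, bool):
        return float(value)
    if isinstance(value, str):
        text = value.strip().lower()
        # The two keywords named in the error message stay as strings.
        if text in ("vertical", "horizontal"):
            return text
        try:
            # Numeric strings such as "90" become the number 90.0.
            return float(text)
        except ValueError:
            pass
    raise ValueError(
        f"rotation must be 'vertical', 'horizontal' or a number, not {value!r}"
    )


print(normalize_rotation("90"))        # 90.0
print(normalize_rotation("vertical"))  # vertical
```

Until the string is replaced by a number in klib itself, pinning Matplotlib to an earlier release that still accepted the string may also work, though that is an assumption and not something confirmed in this report.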

[BUG] - data cleaning sometimes returns float32 instead of float64

Describe the bug
Hi @akanz1, first of all thanks for this amazing package. I am not sure whether this is really a bug.
The cleaning function sometimes converts data to float32 instead of float64, and the dist_plot function raises ValueError: data type <class 'numpy.object_'> not inexact. If I manually convert the data with .astype(float), everything works fine.

Here is the data that produces the error. You can create a DataFrame column from it and try it out:

[0.0,
2.0331595,
2.0331595,
0.0,
0.0,
2.0331595,
2.0331595,
0.0,
0.0,
2.0331595,
2.0331595,
2.0331595,
1.0165797,
2.0331595,
2.0331595,
0.0,
2.0331595,
1.0165797,
0.0,
2.0331595,
2.0331595,
0.0,
0.0,
2.0331595,
2.0331595,
0.0,
0.0,
2.0331595,
2.0331595,
2.0331595,
1.0165797,
2.0331595,
2.0331595,
0.0,
2.0331595,
1.0165797,
0.0,
2.0331595,
2.0331595,
0.0,
0.0,
2.0331595,
2.0331595,
0.0,
0.0,
2.0331595,
2.0331595,
2.0331595,
1.0165797,
2.0331595,
2.0331595,
0.0,
2.0331595,
1.0165797,
0.0,
2.0331595,
2.0331595,
0.0,
0.0,
2.0331595,
2.0331595,
0.0,
0.0,
2.0331595,
2.0331595,
2.0331595,
1.0165797,
2.0331595,
2.0331595,
0.0,
2.0331595,
1.0165797,
0.0,
2.0331595,
2.0331595,
0.0,
0.0,
2.0331595,
2.0331595,
0.0,
0.0,
2.0331595,
2.0331595,
2.0331595,
1.0165797,
2.0331595,
2.0331595,
0.0,
2.0331595,
1.0165797]

[BUG] - Plots not showing up in Jupyter Notebooks on Mac M1

Describe the bug

Plots do not show up in Jupyter notebooks on a Mac M1.

To Reproduce
Steps to reproduce the behavior:

  1. Install the library in a fresh conda environment (on a MacBook Air M1, Big Sur).
  2. Run the Jupyter notebook.
  3. Import data from seaborn.
  4. Plot the charts (the corr_mat does show up).

Expected behavior
The plots should show up as per the library homepage.

Screenshots
image

Desktop (please complete the following information):

  • OS: Mac Big Sur
  • Browser: Chrome/VS Code + Jupyter Notebooks

[BUG] - Cannot set non-string value {value} into a StringArray with klib.cat_plot

When running klib.cat_plot(df_cleaned), I get this error:
Cannot set non-string value '2' into a StringArray.

Python version: Python 3.6.9
Pandas version: 1.1.2

My data looks like this:
 0   vmid                 string
 1   subscriptionid       category
 2   deploymentid         category
 3   vmcreated            int32
 4   vmdeleted            int32
 5   maxcpu               float32
 6   avgcpu               float32
 7   p95maxcpu            float32
 8   vmcategory           category
 9   vmcorecountbucket    category
 10  vmmemorybucket       category
 11  lifetime_h           float32

[BUG] - Broken missing values plot for small percentage of missing values

Describe the bug
The missing-values plot breaks when the percentage of missing values is small.

To Reproduce

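import numpy as np
import pandas as pd
import klib
df = pd.DataFrame.from_dict({'col': np.ones(1000)})
df.loc[:2] = np.nan  # rows 0-2 become missing; the np.NaN alias was removed in NumPy 2.0
klib.missingval_plot(df)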

Expected behavior
Appropriate positioning of the text and the y-axis labels.

Screenshots
image

I have a fix, but I'm not allowed to push my branch. How can I do it?

P.S. awesome library, thanks for the work!

[BUG] - numpy overflow encountered in reduce

Thanks for sharing this package, I'm loving it!

I did run into a bug today. When I try to run dist_plot on my dataset, I get the following message:

\numpy\core\_methods.py:160: RuntimeWarning:
overflow encountered in reduce

I isolated it down to one particular series in my dataframe. It's not one I really care about, but maybe someone else will run into it for a series they DO care about. Here's a describe() after running it through klib's data_cleaning function:

df.created_at.describe()
count    5.213400e+04
mean     1.610795e+12
std      4.225043e+08
min      1.609891e+12
25%      1.610552e+12
50%      1.610838e+12
75%      1.611198e+12
max      1.611274e+12
Name: created_at, dtype: float64

Meanwhile, info() reports something different:

df.info()
...
2 created_at 52134 non-null float32
...

Notice one reports float32 while the other says float64... Seems fishy.

I'm using miniconda on Windows 10.
conda v4.9.2
numpy v1.19.5
klib v0.1.0

If you need me to provide my dataset, I can do so.
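
To illustrate why a float32 downcast of this column deserves a second look, here is a small standard-library sketch. It assumes the created_at values are epoch timestamps in milliseconds (the magnitude of about 1.6e12 suggests this, but the report does not say so), and it uses an illustrative value rather than the reporter's data. It does not pinpoint where the overflow warning comes from. It only shows how coarse float32 becomes at this magnitude.

```python
import struct


def to_float32(value):
    """Round a Python float (float64) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Illustrative value of the same magnitude as the reported mean
# (an assumption; not taken from the reporter's dataset).
timestamp_ms = 1_610_795_000_000.0

as_float32 = to_float32(timestamp_ms)
error_ms = abs(as_float32 - timestamp_ms)

# Between 2**40 and 2**41, adjacent float32 values are 2**17 = 131072 apart,
# so rounding can shift a value by up to 65536 ms (roughly 65 seconds).
print(f"float64: {timestamp_ms:.0f}")
print(f"float32: {as_float32:.0f}")
print(f"rounding error: {error_ms:.0f} ms")
```

If the column really holds millisecond timestamps, keeping it as float64 (or an integer or datetime type) avoids this rounding. Whether klib's data_cleaning should skip the downcast for such columns is a question for the maintainer.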

CI updates

  • Add Python 3.12-dev to CI
  • Replace flake8/pylint/reorder_imports with ruff
  • Dependency updates

[BUG] - ... The command -- klib.dist_plot(df) does not plot the distribution for all the numeric features of a DataFrame

Describe the bug
The command klib.dist_plot(df) does not plot the distribution for all the numeric features of a DataFrame; it plots the distribution for the first numeric feature only.
To Reproduce
Steps to reproduce the behavior:

  1. Open a Python notebook and import klib.
  2. Create a DataFrame from any dataset; in my case I used "df = pd.read_csv('https://raw.githubusercontent.com/agconti/kaggle-titanic/master/data/train.csv')".
  3. Then use klib.dist_plot(df) to plot the distribution.
  4. Observe that only the first numeric feature is plotted, even if showall is also set to True.

Screenshots
image
