
datasist's Introduction

Hi there 👋

My name is Rising Odegua. I am a Full Stack Software Engineer. I combine my knowledge of software and data science to build data-driven products that solve problems.

Strong Languages: JavaScript, Python, and TypeScript

  • 🔭 I'm currently building and maintaining open source data tools like Danfojs, Dnotebook, Datasist, etc.
  • 👯 I'm looking to collaborate on open source tools for data science and machine learning.
  • 💬 Ask me about OSS, software engineering, machine learning, and data science.
  • 📫 Connect with me: LinkedIn.

Latest Updates:

My latest writings are:

  • See more of my technical articles on Medium

  • Learn more about me on my website


datasist's People

Contributors

aminuisrael, codebrain001, dependabot[bot], e-stat, emekaborisama, emmarex, ezekielolugbami, kennyrich, marquisvictor, mihael147work, nelsonchris1, opeyemibami, rexsimiloluwah, risenw, thayeylolu, tosi-n, vishwaak


datasist's Issues

Visualization

  • Plotting a confusion matrix

  • Plotting scatter plots with up to 5 legends

  • Plotting a ROC curve

Add a function to timeseries.py, which displays crypto coin market data as a Candlestick Chart

Utilizing Plotly to make a "Coin-USD" market data Candlestick plot

Required Packages

  • NumPy

  • Pandas

  • Matplotlib

  • yfinance

  • Plotly

  • kaleido

  • Random (Python built-in)


✅ - signifies packages currently in requirements.txt

  Default function - get_crypto_visuals
  -------------------------------------

def get_crypto_visuals(coin, period='5d', interval='15m', MA=False, days=[7, 25, 99], boll=False, boll_sma=25, save_fig=False, img_format='png'):
    """
    ----------------------SKIPPING PARAMETER DESCRIPTION TO END OF DOCSTRING--------------------

    Examples of valid use
    ----------------

    >>> get_crypto_visuals("ETH")

    >>> get_crypto_visuals("ETH", MA=True)

    >>> get_crypto_visuals("ETH", period="5d", interval="15m", MA=True, days=[5, 20], save_fig=True)

    >>> get_crypto_visuals("ETH", period="5d", interval="15m", boll=True, boll_sma=26, save_fig=True, img_format='jpeg')
    """

    # -------Main Code goes here-------

End Result Snippet

ETH
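
The MA, days, boll, and boll_sma parameters suggest that the chart can overlay simple moving averages and Bollinger bands. The issue does not show how these are computed, so the following is only a minimal sketch of the two calculations on a plain list of closing prices. The helper names and the band width of two standard deviations are assumptions and not part of the proposal.

```python
from statistics import mean, pstdev


def simple_moving_average(closes, window):
    """Return the SMA series; positions without a full window are None."""
    out = []
    for i in range(len(closes)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(mean(closes[i + 1 - window:i + 1]))
    return out


def bollinger_bands(closes, window=25, num_std=2.0):
    """Return (lower, middle, upper) band series around the SMA."""
    middle = simple_moving_average(closes, window)
    lower, upper = [], []
    for i, mid in enumerate(middle):
        if mid is None:
            lower.append(None)
            upper.append(None)
            continue
        # Population standard deviation over the same window as the SMA.
        spread = num_std * pstdev(closes[i + 1 - window:i + 1])
        lower.append(mid - spread)
        upper.append(mid + spread)
    return lower, middle, upper


# Hypothetical closing prices, only for illustration.
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
sma3 = simple_moving_average(closes, 3)
lower, middle, upper = bollinger_bands(closes, window=3)
```

Each series has the same length as the input, so it could be drawn as an extra line trace on top of the candlesticks.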

ValueError: Cannot take a larger sample than population when 'replace=False'

The to_date function in the feature_engineering module raises the above error when a dataset has fewer than 20 rows. to_date calls get_date_cols in the structdata module, which in turn calls the private function _match_date (also in structdata); this returns a list of the columns that match the DateTime expression. The default sample size in _match_date is 20. Although users rarely work with datasets of fewer than 20 rows, it would be nice if sample_size were set to the number of rows when the dataset has fewer than 20 rows, and left at the default otherwise.
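
The suggested fix amounts to capping the sample size at the number of rows. Here is a minimal sketch of that idea using only the standard library; effective_sample_size and sample_values are hypothetical helpers written for illustration, not datasist functions.

```python
import random


def effective_sample_size(n_rows, sample_size=20):
    """Cap the requested sample size at the number of available rows."""
    return min(sample_size, n_rows)


def sample_values(values, sample_size=20, seed=0):
    """Sample without replacement, never asking for more items than exist."""
    rng = random.Random(seed)
    return rng.sample(values, effective_sample_size(len(values), sample_size))


# A column with only three rows no longer triggers a "larger sample than
# population" error, because only three items are requested.
small_column = ["2020-01-01", "2020-01-02", "2020-01-03"]
picked = sample_values(small_column)
```

With a pandas DataFrame the same cap could be written as data.sample(min(sample_size, len(data))).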


Add a log transformation feature

A feature that helps normalize data using a log_transformation method and also visualizes the dataset to show its skewness.
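
To illustrate what such a feature might compute, here is a small standard-library sketch that applies log(1 + x) and compares skewness before and after. The function names are hypothetical, log1p is one possible choice of transform, and the plotting part of the request is left out.

```python
import math


def skewness(values):
    """Population skewness: third central moment / variance ** 1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values):
    """Apply log(1 + x), which is defined at zero; inputs must be > -1."""
    return [math.log1p(v) for v in values]


# Hypothetical right-skewed data: most values are small, two are large.
raw = [1, 2, 2, 3, 3, 4, 50, 400]
transformed = log_transform(raw)
skew_before = skewness(raw)
skew_after = skewness(transformed)
```

For this sample the skewness is lower after the transform, which is the effect the feature would visualize.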

Unable to extract categorical features from dataframe using datasist

I once used datasist to extract both numerical and categorical features successfully. Now, when I rerun my code, only numerical features are extracted and the categorical features list is blank. It even reports all 69 features in the dataframe as numerical. Surprisingly, it ran perfectly before the rerun, showing both numerical and categorical features, which is how I was able to do feature engineering on my test dataset in the first place. Now that I want to apply the same feature engineering to my train dataset, it extracts only numerical features and leaves the categorical features blank, even though checking "dtype" shows that there are many object (i.e. categorical) features in both the train and test datasets. In fact, only 25 of the 69 features are numerical; the others are categorical.
What can be the problem? I tried running my code on both a Kaggle notebook and Google Colab, and the same problem persists.
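
As a cross-check that does not depend on datasist, the columns can be split by dtype directly in pandas. This is a minimal sketch with a hypothetical helper, split_by_dtype; it is not how datasist itself classifies features, and it treats every non-numeric column (including bool and datetime) as categorical.

```python
import pandas as pd


def split_by_dtype(df):
    """Return (numerical, categorical) column-name lists based on dtypes."""
    numerical = df.select_dtypes(include="number").columns.tolist()
    # Everything that is not numeric is treated as categorical here.
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return numerical, categorical


# Hypothetical data, only for illustration.
df = pd.DataFrame({
    "age": [21, 35],
    "city": ["Lagos", "Abuja"],
    "score": [0.5, 0.7],
})
numerical, categorical = split_by_dtype(df)
```

If this split reports the expected 25 numerical columns on the train dataset, the data itself is probably fine and the difference lies in the library call.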

Exploratory Data Analysis in one line of code

In regard to the vision behind Datasist: I dislike always having to run EDA functions in Pandas one after the other, like df.describe, df.shape, df.dtypes, and so on. I would like to have it all in one line of code.
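
One way to picture the request is a single function that bundles the usual first-look calls. The sketch below assumes a hypothetical helper named quick_eda; it is not an existing datasist function.

```python
import pandas as pd


def quick_eda(df):
    """Collect common first-look summaries of a DataFrame in one dict."""
    return {
        "shape": df.shape,
        "dtypes": df.dtypes,
        "missing": df.isna().sum(),
        "describe": df.describe(include="all"),
        "head": df.head(),
    }


# Hypothetical data, only for illustration.
df = pd.DataFrame({"age": [21, 35, None], "city": ["Lagos", "Abuja", "Lagos"]})
summary = quick_eda(df)
```

Returning a dict keeps each piece accessible on its own, for example summary["missing"], while still needing only one call.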

Add Tests

Tests should be added for all functionality to prevent breakage in the future. A CI can be used to run PRs against the test suite to ensure they do not break any existing functionality.

Dropping missing values

----> 4 df=ds.drop_missing(data=data, percent=80)

AttributeError: module 'datasist' has no attribute 'drop_missing'

So is this function not in the library?
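
The AttributeError only says that the top-level datasist module has no attribute drop_missing; the function may live in a submodule instead, which is an assumption worth checking against the installed version. As a stopgap, the same effect can be sketched in plain pandas. drop_sparse_columns is a hypothetical helper, and reading percent as "drop columns with at least this share of missing values" is also an assumption.

```python
import pandas as pd


def drop_sparse_columns(data, percent=80):
    """Drop columns whose share of missing values is at least `percent`."""
    missing_pct = data.isna().mean() * 100
    to_drop = missing_pct[missing_pct >= percent].index
    return data.drop(columns=to_drop)


# Hypothetical data, only for illustration.
data = pd.DataFrame({
    "a": list(range(10)),
    "b": [None] * 9 + [1.0],   # 90% missing, so it is dropped
    "c": ["x", None] * 5,      # 50% missing, so it is kept
})
cleaned = drop_sparse_columns(data, percent=80)
```

The input DataFrame is left unchanged; the function returns a new one without the sparse columns.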
