
cryptocurrencies_analysis_unsupervised-ml_scikit-learn_pca_k-means's Introduction

Cryptocurrency analysis to create a classification system for a new investment using unsupervised machine learning with scikit-learn, Pandas, hvPlot, PCA, and K-means algorithms.


Overview of the Cryptocurrencies Analysis

The purpose of the project is to prepare an analysis for clients who are preparing to enter the cryptocurrency market. Because there is no known output, the data has to be processed, reduced in dimensions to its principal components, and clustered using unsupervised learning. Unsupervised learning has no defined outcome or target variable; it is used to find patterns in the data.

The analysis consists of four technical deliverables:

  • Preprocessing the Data for PCA;
  • Reducing Data Dimensions Using PCA;
  • Clustering Cryptocurrencies Using K-means;
  • Visualizing Cryptocurrencies Results.

Resources

The analysis was created using the following software: Jupyter Notebook 6.3.0, Python 3.8.8, Pandas 1.2.4, Visual Studio Code 1.58.0, the Python machine learning library scikit-learn 0.24.1, the Python visualization library hvPlot 0.7.3, and the graphing library Plotly 5.1.0.

The dataset for the analysis can be found in Crypto_data.

The Cryptocurrencies Analysis Summary

The result of the Cryptocurrencies Analysis can be found in the Crypto_clustering file.

First, the data must be properly prepared so that we can select features that help us find patterns.

During data processing, the focus was on making sure the data met the following conditions:

  • Null values are handled;
  • Only numerical data is used;
  • Values are scaled.

In other words, the data was transformed so that differences in magnitude between the features won't skew the results.
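The three preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the rows and column names are hypothetical stand-ins, and dropping incomplete rows, one-hot encoding with `pd.get_dummies`, and scaling with `StandardScaler` are assumed choices for each step.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in rows; the real Crypto_data columns may differ.
raw = pd.DataFrame(
    {
        "CoinName": ["Alpha", "Beta", "Gamma", "Delta"],
        "Algorithm": ["Scrypt", "SHA-256", "Scrypt", "X11"],
        "TotalCoinsMined": [4.2e7, None, 1.0e9, 2.5e6],
        "TotalCoinSupply": [8.4e7, 2.1e7, 3.0e9, 2.2e7],
    }
)

# 1. Handle null values (here: drop every row that has a missing value).
clean = raw.dropna()

# 2. Keep only numerical data: set the names aside and
#    one-hot encode the remaining text column.
names = clean["CoinName"]
X = pd.get_dummies(clean.drop(columns=["CoinName"]), columns=["Algorithm"])

# 3. Scale each column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X.astype(float))

print(X_scaled.shape)
```

Scaling matters here because coin supply values span several orders of magnitude, while the encoded columns only take the values 0 and 1.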

After data processing, Principal Component Analysis (PCA), a statistical technique that speeds up machine learning algorithms when the number of input features (or dimensions) is too high, was applied, and the dimensions of the X DataFrame were reduced to three principal components.

image
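As a rough sketch of the reduction to three principal components, the example below applies scikit-learn's `PCA` to synthetic data. The random matrix stands in for the scaled X DataFrame, and the column names `PC 1` to `PC 3` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for the scaled feature matrix (50 rows, 10 features).
rng = np.random.default_rng(0)
X_scaled = rng.normal(size=(50, 10))

# Project the data onto its first three principal components.
pca = PCA(n_components=3)
components = pca.fit_transform(X_scaled)

# Hypothetical column names for the reduced DataFrame.
pcs_df = pd.DataFrame(components, columns=["PC 1", "PC 2", "PC 3"])

print(pcs_df.shape)
# Fraction of the original variance kept by the three components.
print(pca.explained_variance_ratio_.sum())
```

The explained variance ratio is a quick check of how much information survives the reduction.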

Next, the data points need to be grouped together, in other words clustered. K-means was used for this. The K-means algorithm groups the data into K clusters, where membership in a cluster is based on a similarity or distance measure to a centroid (the arithmetic mean position of all the points in the cluster).
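To make the centroid idea concrete, here is a small sketch using scikit-learn's `KMeans` on synthetic 3D points. The four blob centers are invented for illustration, and `n_init=10` and `random_state=0` are assumed settings rather than the ones used in the analysis.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the three principal components:
# four well-separated blobs of 25 points each.
rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0], [10, 10, 10], [-10, 10, 0], [10, -10, 5]])
points = np.vstack([c + rng.normal(scale=0.5, size=(25, 3)) for c in centers])

# Fit K-means with K = 4 and get one cluster label per point.
model = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = model.fit_predict(points)

# Each centroid is the mean position of the points assigned to it.
for k in range(4):
    members = points[labels == k]
    print(k, len(members), members.mean(axis=0).round(2))
```

The printed means should match the rows of `model.cluster_centers_`, which is how the algorithm defines each centroid.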

An easy method for determining the best value of K is the elbow curve. The elbow curve was created, and the bend at K = 4 looks like an elbow on the graph below:

image
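The values behind an elbow curve can be computed along these lines. This sketch uses synthetic data with four blobs, so the bend at K = 4 is built in; the range of K from 1 to 10 is an assumption, and plotting is left out.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data with four well-separated groups.
rng = np.random.default_rng(2)
centers = np.array([[0, 0, 0], [10, 10, 10], [-10, 10, 0], [10, -10, 5]])
points = np.vstack([c + rng.normal(scale=0.5, size=(25, 3)) for c in centers])

# Inertia is the sum of squared distances from each point to its centroid.
k_values = list(range(1, 11))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    inertias.append(model.inertia_)

# The elbow is where adding another cluster stops reducing inertia by much.
for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 1))
```

Plotting `inertias` against `k_values` gives the curve; on this data the drop flattens sharply after K = 4.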

Finally, the clusters (results) were visualized using the hvPlot Python library.

The 3D scatter plot can be rotated by clicking and dragging with the mouse and panned using the scroll wheel. There are four distinct groups that correspond to the four clusters, as expected.

image

In the results, the 3D plot looks more informative than the 2D plot, but the latter is easier to analyze because it has only two features. See the graph below:

image

On the last plot, two clusters appear to overlap and do not form fully distinct groups. The separation is weak because many data points are mixed together in the middle.

Contributors

olesyapro888
