
machine-learning-from-scratch's Introduction

machine-learning-from-scratch's People

Contributors

misraturp, ploeber


machine-learning-from-scratch's Issues

Problem in result of (y_pred - y) in Linear Regression implementation

First of all, self.w has shape (n_features,), so y_pred ends up with shape (n_samples,).
Since y_pred has shape (n_samples,) and y has shape (n_samples, 1), the subtraction y_pred - y broadcasts to shape (n_samples, n_samples). Taking the dot product with X.T, which has shape (n_features, n_samples), then gives a result of shape (n_features, n_samples), whereas it was expected to be (n_features,).
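A minimal sketch of the mismatch and one way to fix it, assuming the gradient is computed roughly as X.T.dot(y_pred - y) / n_samples as in typical from-scratch implementations (the exact expression in the repo may differ); flattening y before the subtraction keeps the gradient at the expected (n_features,) shape:

import numpy as np

n_samples, n_features = 5, 3
X = np.random.rand(n_samples, n_features)
y = np.random.rand(n_samples, 1)                 # column vector, shape (n_samples, 1)
w = np.zeros(n_features)                         # analogue of self.w, shape (n_features,)

y_pred = X.dot(w)                                # shape (n_samples,)

# (n_samples,) - (n_samples, 1) broadcasts to (n_samples, n_samples)
bad_grad = X.T.dot(y_pred - y)                   # shape (n_features, n_samples) -- wrong

# Fix: flatten y so the residual stays one-dimensional
grad = X.T.dot(y_pred - y.ravel()) / n_samples   # shape (n_features,) -- as expected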

Code Outdated in DecisionTree.py

Some places need to use .iloc for positional slicing instead of the default accessors, for example in _best_split:

def _best_split(self, X, y, feat_idxs):
        best_gain = -1
        split_idx, split_threshold = None, None

        for feat_idx in feat_idxs:
            # use iloc for positional column access when X is a DataFrame
            X_column = X.iloc[:, feat_idx]
            thresholds = np.unique(X_column)

            for thr in thresholds:
                # calculate the information gain
                gain = self._information_gain(y, X_column, thr)

                if gain > best_gain:
                    best_gain = gain
                    split_idx = feat_idx
                    split_threshold = thr
        return split_idx, split_threshold
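A minimal alternative sketch, assuming X may arrive either as a pandas DataFrame or as a NumPy array: converting the inputs once in fit() lets plain positional indexing like X[:, feat_idx] keep working everywhere downstream (the fit and _grow_tree names here follow the usual structure of such implementations and are assumptions):

import numpy as np

def fit(self, X, y):
    # convert a possible DataFrame/Series to plain NumPy arrays once,
    # so X[:, feat_idx] works in _best_split without .iloc
    X = np.asarray(X)
    y = np.asarray(y)
    self.root = self._grow_tree(X, y)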

DecisionTree code - suggestion

Hi,
In the DecisionTree implementation, line 33 contains a call to the _most_common_label() method.
Line 33 is reached only if the y numpy array has a single unique value, so instead of computing the most common label you can simply take the value at any index of the array.

Maybe something like:

# check the stopping criteria
if (depth>=self.max_depth or n_labels==1 or n_samples<self.min_samples_split):
    return Node(value=y[0])

instead of :

# check the stopping criteria
if (depth>=self.max_depth or n_labels==1 or n_samples<self.min_samples_split):
    leaf_value = self._most_common_label(y)
    return Node(value=leaf_value)
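One caveat, reading the condition above: the same branch is also reached when depth >= self.max_depth or n_samples < self.min_samples_split, in which case y may still contain several labels. A sketch that keeps the shortcut only for the pure-node case (names follow the snippets above):

# check the stopping criteria
if n_labels == 1:
    # pure node: every element of y is identical, so any index works
    return Node(value=y[0])
if depth >= self.max_depth or n_samples < self.min_samples_split:
    # mixed labels are still possible here, so keep the majority vote
    return Node(value=self._most_common_label(y))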

PCA

Line 20 in the PCA implementation should be changed: np.linalg.eig returns eigenvalues followed by eigenvectors, but the code unpacks them in the reverse order.
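A minimal sketch of the correct unpacking order, assuming the covariance matrix is computed with np.cov as in typical PCA implementations (variable names here are assumptions):

import numpy as np

# toy data: 10 samples, 3 features
X = np.random.rand(10, 3)
cov = np.cov(X.T)

# np.linalg.eig returns (eigenvalues, eigenvectors), in that order;
# column eigenvectors[:, i] pairs with eigenvalues[i]
eigenvalues, eigenvectors = np.linalg.eig(cov)

# sort components by descending eigenvalue, as PCA typically does
idxs = np.argsort(eigenvalues)[::-1]
components = eigenvectors[:, idxs]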
