
qais's Issues

On how to respond to the script request

Develop a script to ingest data through the api's main endpoint 
(if hosted locally "localhost:5000/api/v1/data")

NOTE: The data from the API endpoint is paginated. Thus, you also need to pass the API endpoint a
GET parameter "page" (starting from 0 and incrementing until it throws an error).
IMPORTANT: The ingestion script should ingest data in batches and feed it to the model in batches.
Do not just pre-load all the data in advance. This is the "streaming" part of the challenge.

So, steps to be taken:

  • extend api endpoint to insert into database
  • create script to use this api endpoint for pushing data

The question is: which data? the data we already have in 'main.db'?
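The paginated, batch-by-batch ingestion loop could be sketched like this. The fetch function is injectable, so the pagination logic stays independent of how the request is actually made (the function names here are assumptions, not part of the challenge):

```python
def paginate(fetch, start_page=0):
    """Yield one batch per page until the endpoint runs out of data.

    `fetch(page)` should return a (possibly empty) list of records;
    an empty batch (or an exception) signals the end of the stream.
    """
    page = start_page
    while True:
        batch = fetch(page)
        if not batch:
            break
        yield batch
        page += 1


def ingest(fetch, feed):
    """Stream batches into the model without pre-loading everything."""
    for batch in paginate(fetch):
        feed(batch)  # e.g. insert into the DB, then partial_fit the model
```

With the API hosted locally, `fetch` could be something like `lambda p: requests.get("http://localhost:5000/api/v1/data", params={"page": p}).json()`, though the exact response shape depends on the endpoint.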

On validation of the results

After checking the first results that came from #2, I believe that we might have some accuracy issues.
I need to check with the other results from the static run.

On the learning process

Recognition of Difference

Even though it seems like the learning process is well defined, when put in the context of Online Learning, things change.

  • Ingestion-Batch-Size != Learning-Batch-Size (at least they should be different variables)
  • How to separate test and train?
    1. Use the strategy from the DeepNets example. The question here is: do I keep a fixed baseline testing set, train on each new input, and test against that set?
    2. Use the strategy from our Prediction Models Analysis. The question here still remains: do I lose the values I tested on? (I do not want to lose them.)
    3. Go crazy: test on each new batch, then train on it for the next step. This way you don't lose data points, but the measured accuracy is shifted by one step, as the accuracy of model 1 is only observed at step 2, when its testing data arrives. Still, this allows us to play well.

Many questions arise from this discussion, but what I care about is:

  1. I do not want to lose data points in my training - skip option 2.
  2. I do not want to keep a fixed baseline testing set, because that data would be lost from the beginning, and I do not see the point of keeping one: the data might shift, and we would lose accuracy when we only need to shift parameters.

Implications of choosing 3:

  1. Generalization will be good.

On how to store the models in the database

The first question seems to be:

Firstly, develop a simple classification algorithm which attempts
to predict the variable "promoted" through the other variables. 
The focus of this model is pure prediction capability.

Here is how it is described:

Bonus: Please save the model, the current page,
the coefficients and any relevant statistical measure 
to the SQLite database (on a different table than "data") while you are updating it.

Step by step

  • pick which model to play with
  • understand fields that model needs ( create table )
  • store these fields into SQLite ( problem might be pickling )
  • load back
  • connect to #1
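The pickling step could look like the sketch below, assuming a hypothetical `models` table (the schema and column names are mine, not from the challenge):

```python
import pickle
import sqlite3


def save_model(conn, model, page, accuracy):
    """Pickle the model and store it alongside the current page and
    a statistical measure, in a table separate from "data"."""
    conn.execute("""CREATE TABLE IF NOT EXISTS models (
        id INTEGER PRIMARY KEY,
        page INTEGER,
        accuracy REAL,
        blob BLOB)""")
    conn.execute(
        "INSERT INTO models (page, accuracy, blob) VALUES (?, ?, ?)",
        (page, accuracy, pickle.dumps(model)))
    conn.commit()


def load_latest_model(conn):
    """Load back the most recently stored model, or None if empty."""
    row = conn.execute(
        "SELECT blob FROM models ORDER BY id DESC LIMIT 1").fetchone()
    return pickle.loads(row[0]) if row else None
```

`pickle.dumps` returns bytes, which SQLite stores as a BLOB directly, so the anticipated pickling problem mostly reduces to making sure the model class is importable when loading back.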

On how to share information with the user

User-oriented

You can be creative about how a user should interact with your streaming ML pipeline.

How can the user visualize the convergence/updating of the model? How are you going to present the results?

User? Who is the user?
I guess that it is me, the person who will add the data to the stream.

Step by step

  • Generate results after reading from database
  • Graphical results in a dashboard page using Google Charts in index.html
  • API endpoint bringing results
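The results endpoint could simply read the stored metrics back and serialize them in a shape Google Charts can plot. A sketch of the handler body (the `models` table and its columns are assumptions; in practice this would sit inside e.g. a Flask route):

```python
import json
import sqlite3


def results_payload(conn):
    """Return accuracy-per-page as JSON rows: header row first, then
    data rows, matching the array form Google Charts accepts."""
    rows = conn.execute(
        "SELECT page, accuracy FROM models ORDER BY page").fetchall()
    return json.dumps([["page", "accuracy"], *[list(r) for r in rows]])
```

The dashboard page can then poll this endpoint and redraw the chart, which gives a live view of the model's convergence as new batches arrive.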

On endogenous variables and their impact on the model

Working with weird words

Secondly, develop one or more regression models in 
which the ENDOGENOUS variable is "network_ability". 

The goal of this second kind of model is to understand 
the relationship between the other variables and "network_ability". 
Hint: Try to reason qualitatively about the dependencies
of the variables before starting to crunch numbers, try to think like a scientist.

I guess this describes the inter-dependency between
promotion and network ability (a larger network will eventually create more options for promotion),
or the inter-dependency between promotion and competence.

Step by step

  • Understand endogenous variable better
  • Understand what is meant by: 'develop models where the endogenous variable is x' read this
  • Build model
  • Show results using Google Charts
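Before reaching for a library, the basic relationship can be checked with a hand-rolled univariate OLS fit, e.g. regressing "network_ability" on one other variable (the variable pairing is an assumption; this is just the simplest baseline before a proper multivariate model):

```python
def ols_fit(xs, ys):
    """Univariate ordinary least squares: ys ~ beta * xs + intercept.
    E.g. xs = some explanatory variable, ys = network_ability."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    return beta, my - beta * mx
```

Reasoning qualitatively first (as the hint asks) means deciding which variables plausibly *cause* network ability before interpreting any coefficient this produces.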

On creating a package solution with docker-compose

Why complicate it?

At some point the model might take long to load, or it might not be a good idea to make the user wait.
Introduce a mechanism to do the training in the background (while informing the user that some learning is happening).
This will keep the user experience a happy trip, and will keep the code clean.

Step by step

  • Decide on which worker type you will use
  • Decide if you will store the tasks in db
  • Proceed with docker-compose
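A compose file for this setup might be shaped roughly like the sketch below. Every service, image, and command here is an assumption (Celery with Redis is just one possible worker choice), not the repo's actual configuration:

```yaml
# Sketch only: service names, images, and the worker command are assumptions.
version: "3"
services:
  web:
    build: .
    ports:
      - "5000:5000"          # the Flask-style API and dashboard
    depends_on:
      - redis
  worker:
    build: .
    command: celery -A app.celery worker   # background training tasks
    depends_on:
      - redis
  redis:
    image: redis:alpine      # message broker between web and worker
```

Whether tasks are also persisted in the database (the second step above) is independent of this layout: the worker can write its progress into the same SQLite file, or a result backend can be added to the broker service.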

Additional

  • Run multiple models - and bring the best result? [ completely unnecessary - still fun to think and do ]
