
qais's Issues

On how to respond to the script request

Develop a script to ingest data through the api's main endpoint 
(if hosted locally "localhost:5000/api/v1/data")

NOTE: The data from the API endpoint is paginated. Thus, you also need to pass the API endpoint a
GET parameter "page" (starting from 0 and incrementing until it throws an error).
IMPORTANT: The ingestion script should ingest data in batches and feed it to the model in batches.
Do not just pre-load all the data in advance. This is the "streaming" part of the challenge.

So, steps to be taken:

  • extend api endpoint to insert into database
  • create script to use this api endpoint for pushing data

The question is: which data? the data we already have in 'main.db'?
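The paginated, batch-by-batch ingestion loop could be sketched like this. The fetch function is injectable, so the pagination logic stays independent of how the request is actually made (the function names here are assumptions, not part of the challenge):

```python
def paginate(fetch, start_page=0):
    """Yield one batch per page until the endpoint runs out of data.

    `fetch(page)` should return a (possibly empty) list of records;
    an empty batch (or an exception) signals the end of the stream.
    """
    page = start_page
    while True:
        batch = fetch(page)
        if not batch:
            break
        yield batch
        page += 1


def ingest(fetch, feed):
    """Stream batches into the model without pre-loading everything."""
    for batch in paginate(fetch):
        feed(batch)  # e.g. insert into the DB, then partial_fit the model
```

With the API hosted locally, `fetch` could be something like `lambda p: requests.get("http://localhost:5000/api/v1/data", params={"page": p}).json()`, though the exact response shape depends on the endpoint.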

On validation of the results

After checking the first results that came from #2, I believe that we might have some accuracy issues.
I need to check with the other results from the static run.

On the learning process

Recognition of Difference

Even though it seems like the learning process is well defined, when put in the context of Online Learning, things change.

  • Ingestion-Batch-Size != Learning-Batch-Size (at least they should be different variables)
  • How to separate test and train?
    1. Use the strategy from the DeepNets example. The question here is: do I keep a fixed baseline testing set, train on each new input, and test against that set?
    2. Use the strategy from our Prediction Models Analysis. The question here still remains: do I lose the values I tested on? (I do not want to lose them.)
    3. Go crazy: test on each new batch, then train on it for the next step. This way you don't lose data points, but the measured accuracy is shifted by one step, as the accuracy of model 1 is only observed at step 2, when its testing data arrives. Still, this allows us to play well.

Many questions arise from this discussion, but what I care about is:

  1. I do not want to lose data points in my training - skip option 2.
  2. I do not want to keep a fixed baseline testing set, because that data would be lost from the beginning, and I do not see the point of keeping one: the data might shift, and we would lose accuracy when we only need to shift parameters.

Implications of choosing 3:

  1. Generalization will be good.

On how to store the models in the database

The first question seems to be:

Firstly, develop a simple classification algorithm which attempts
to predict the variable "promoted" through the other variables. 
The focus of this model is pure prediction capability.

Here is how it is described:

Bonus: Please save the model, the current page,
the coefficients and any relevant statistical measure 
to the SQLite database (on a different table than "data") while you are updating it.

Step by step

  • pick which model to play with
  • understand fields that model needs ( create table )
  • store these fields into SQLite ( problem might be pickling )
  • load back
  • connect to #1
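The pickling step could look like the sketch below, assuming a hypothetical `models` table (the schema and column names are mine, not from the challenge):

```python
import pickle
import sqlite3


def save_model(conn, model, page, accuracy):
    """Pickle the model and store it alongside the current page and
    a statistical measure, in a table separate from "data"."""
    conn.execute("""CREATE TABLE IF NOT EXISTS models (
        id INTEGER PRIMARY KEY,
        page INTEGER,
        accuracy REAL,
        blob BLOB)""")
    conn.execute(
        "INSERT INTO models (page, accuracy, blob) VALUES (?, ?, ?)",
        (page, accuracy, pickle.dumps(model)))
    conn.commit()


def load_latest_model(conn):
    """Load back the most recently stored model, or None if empty."""
    row = conn.execute(
        "SELECT blob FROM models ORDER BY id DESC LIMIT 1").fetchone()
    return pickle.loads(row[0]) if row else None
```

`pickle.dumps` returns bytes, which SQLite stores as a BLOB directly, so the anticipated pickling problem mostly reduces to making sure the model class is importable when loading back.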

On how to share information with the user

User-oriented

You can be creative about how a user should interact with your streaming ML pipeline.

How can the user visualize the convergence/updating of the model? How are you going to present the results?

User? Who is the user?
I guess that it is me, the person who will add the data to the stream.

Step by step

  • Generate results after reading from database
  • Graphical results in a dashboard page using Google Charts in index.html
  • API endpoint bringing results
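The results endpoint could simply read the stored metrics back and serialize them in a shape Google Charts can plot. A sketch of the handler body (the `models` table and its columns are assumptions; in practice this would sit inside e.g. a Flask route):

```python
import json
import sqlite3


def results_payload(conn):
    """Return accuracy-per-page as JSON rows: header row first, then
    data rows, matching the array form Google Charts accepts."""
    rows = conn.execute(
        "SELECT page, accuracy FROM models ORDER BY page").fetchall()
    return json.dumps([["page", "accuracy"], *[list(r) for r in rows]])
```

The dashboard page can then poll this endpoint and redraw the chart, which gives a live view of the model's convergence as new batches arrive.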

On endogenous variables and their impact on the model

Working with weird words

Secondly, develop one or more regression models in 
which the ENDOGENOUS variable is "network_ability". 

The goal of this second kind of model is to understand 
the relationship between the other variables and "network_ability". 
Hint: Try to reason qualitatively about the dependencies
of the variables before starting to crunch numbers, try to think like a scientist.

I guess this describes the inter-dependency between
promotion and network ability (a larger network will eventually create more options for promotion),
or the inter-dependency between promotion and competence.

Step by step

  • Understand endogenous variable better
  • Understand what is meant by: 'develop models where the endogenous variable is x' read this
  • Build model
  • Show results using Google Charts
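Before reaching for a library, the basic relationship can be checked with a hand-rolled univariate OLS fit, e.g. regressing "network_ability" on one other variable (the variable pairing is an assumption; this is just the simplest baseline before a proper multivariate model):

```python
def ols_fit(xs, ys):
    """Univariate ordinary least squares: ys ~ beta * xs + intercept.
    E.g. xs = some explanatory variable, ys = network_ability."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    return beta, my - beta * mx
```

Reasoning qualitatively first (as the hint asks) means deciding which variables plausibly *cause* network ability before interpreting any coefficient this produces.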

On creating a package solution with docker-compose

Why complicate it?

At some point the model might take long to load, or it might not be a good idea to make the user wait.
Introduce a mechanism to do the training in the background (while informing the user that some learning is happening).
This will keep the user experience a happy trip, and will keep the code clean.

Step by step

  • Decide on which worker type you will use
  • Decide if you will store the tasks in db
  • Proceed with docker-compose
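A compose file for this setup might be shaped roughly like the sketch below. Every service, image, and command here is an assumption (Celery with Redis is just one possible worker choice), not the repo's actual configuration:

```yaml
# Sketch only: service names, images, and the worker command are assumptions.
version: "3"
services:
  web:
    build: .
    ports:
      - "5000:5000"          # the Flask-style API and dashboard
    depends_on:
      - redis
  worker:
    build: .
    command: celery -A app.celery worker   # background training tasks
    depends_on:
      - redis
  redis:
    image: redis:alpine      # message broker between web and worker
```

Whether tasks are also persisted in the database (the second step above) is independent of this layout: the worker can write its progress into the same SQLite file, or a result backend can be added to the broker service.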

Additional

  • Run multiple models - and bring the best result? [ completely unnecessary - still fun to think and do ]
