I had some more thoughts on one.list. (Moving the conversation to git as suggested.) …

one.list about ibllib HOT 20 CLOSED

int-brain-lab commented on September 27, 2024

one.list

from ibllib.

Comments (20)

rossant commented on September 27, 2024

What would be the output of one.list(eids=['someeid'], search_term='dataset_type')?

kdharris101 commented on September 27, 2024

It would be all the dataset types available for the eid 'someeid' - provided that was a valid eid. If it wasn't, you would get None, or an error, or something.

Again, we would have the issue of passing a string vs a list of strings, and I would suggest we interpret a single string the same way as a list of one string.
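A minimal sketch of that convention in Python. It assumes a hypothetical helper, as_list, that every one.list argument would pass through before querying. The name is illustrative and not part of ibllib:

```python
from typing import Iterable, List, Optional, Union


def as_list(value: Optional[Union[str, Iterable[str]]]) -> Optional[List[str]]:
    """Hypothetical helper: treat a single string like a one-element list."""
    if value is None:
        return None
    if isinstance(value, str):
        # A bare string would otherwise be iterated character by character.
        return [value]
    return list(value)


print(as_list("someeid"))         # ['someeid']
print(as_list(["eid1", "eid2"]))  # ['eid1', 'eid2']
```

Without the isinstance check, list('someeid') would split the eid into single characters. That is the usual bug this convention avoids.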

rossant commented on September 27, 2024

So one.list(eids=['someeid'], search_term='dataset_type') and one.list(eids=['someeid']) would be equivalent?

I'm just wondering whether it would be confusing for users to have the same function list() return different kinds of items depending on the arguments. Would one.list_dataset_types(...), one.list_users(...), one.list_eids(...), one.list_datasets(...) be sensible? Doing one.<TAB> in IPython would immediately give you the list of relevant functions.

kdharris101 commented on September 27, 2024

Yes, isn't that just how it works with Python default arguments?

I see your point about having multiple functions - but the problem then is that introducing a new search term requires adding a new function.

How about one.list('search_terms') to return a list of search terms used by this implementation?
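A sketch of the default-argument idea, under these assumptions:

- A hypothetical in-memory ROWS table stands in for the real database.
- The signature is (eid, keyword), matching the one.list(None, 'percent_correct') form used later in the thread.
- The special value 'search_terms' is passed as the keyword rather than as the first argument; that placement is also an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the database: one row per dataset.
ROWS: List[Dict[str, str]] = [
    {"eid": "eid-a", "dataset_type": "spikes.times", "user": "kenneth"},
    {"eid": "eid-a", "dataset_type": "trials.choice", "user": "kenneth"},
    {"eid": "eid-b", "dataset_type": "trials.choice", "user": "olivier"},
]


class One:
    def list(self, eid: Optional[str] = None, keyword: str = "dataset_type") -> List[str]:
        """Return the distinct values of `keyword`, optionally for one eid only.

        With the defaults, one.list(eid) answers the most common question:
        which dataset types exist for this session?
        """
        if keyword == "search_terms":
            # Self-describing: report the keywords this implementation supports.
            return sorted({key for row in ROWS for key in row})
        rows = [row for row in ROWS if eid is None or row["eid"] == eid]
        return sorted({row[keyword] for row in rows})


one = One()
print(one.list("eid-a"))               # ['spikes.times', 'trials.choice']
print(one.list(None, "user"))          # ['kenneth', 'olivier']
print(one.list(None, "search_terms"))  # ['dataset_type', 'eid', 'user']
```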

nbonacchi commented on September 27, 2024

What about both?
I see good arguments for both ways of doing things.
We could implement one.list with a set of user-facing wrapper functions, named as Cyrille suggested, that in reality just call one.list() with the appropriate arguments...

To Kenneth's point: yes, that would be nice. My initial post in the openneurodata repo was exactly that, called metasearch...

So what if it looked like this:
- one.list() returns all the listable dimensions; all args can be used as now
- one.list_eids() returns eIDs; all args except eID can be used
- one.list_users() returns a list of users; all args except users can be used
- one.list_dataset_types() returns dataset types; ...
- etc.

Of course, one.list would always be there and available for users. Its output would be the intersection of all the args that were passed, and each one.list_* function would basically specify what the user wants returned.
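A minimal sketch of those intersection semantics. It assumes a hypothetical cached ROWS table (one row per dataset) and uses illustrative names: list_values, list_eids and list_users. Each keyword argument yields a set of matching eids. The sets are intersected, and then the requested field is read off the surviving sessions:

```python
from typing import Dict, List, Set

# Hypothetical cached copy of the database: one row per dataset.
ROWS: List[Dict[str, str]] = [
    {"eid": "eid-a", "user": "kenneth", "subject": "mouse1", "dataset_type": "spikes.times"},
    {"eid": "eid-a", "user": "kenneth", "subject": "mouse1", "dataset_type": "trials.choice"},
    {"eid": "eid-b", "user": "olivier", "subject": "mouse2", "dataset_type": "trials.choice"},
]


def list_values(target: str, **filters: str) -> Set[str]:
    """Return the values of `target` for sessions matching *all* filters."""
    # One set of matching eids per filter argument...
    eid_sets = [
        {row["eid"] for row in ROWS if row[field] == value}
        for field, value in filters.items()
    ]
    # ...intersected, so every extra argument narrows the result further.
    eids = set.intersection(*eid_sets) if eid_sets else {row["eid"] for row in ROWS}
    return {row[target] for row in ROWS if row["eid"] in eids}


def list_eids(**filters: str) -> Set[str]:
    return list_values("eid", **filters)


def list_users(**filters: str) -> Set[str]:
    return list_values("user", **filters)
```

For example, list_eids(dataset_type='trials.choice', user='kenneth') keeps only the sessions that satisfy both filters, here {'eid-a'}.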

rossant commented on September 27, 2024

Depending on how we implement it, we could also have list() transparently call the various list_stuff() functions, depending on the arguments. I can already see a big nested sequence of if statements in the main list() implementation, which could be error-prone.
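One way to avoid the nested if statements is a dispatch table. It maps each keyword to its lister, so list() stays a single lookup. This is sketched here as an assumption, not as the ibllib design. The function names and the data they return are placeholders:

```python
from typing import Callable, Dict, List, Optional


def _list_dataset_types(eid: Optional[str]) -> List[str]:
    # Placeholder: a real version would query the database for this eid.
    return ["spikes.times", "trials.choice"]


def _list_users(eid: Optional[str]) -> List[str]:
    # Placeholder: a real version would query the database for this eid.
    return ["kenneth", "olivier"]


# One entry per supported keyword: supporting a new keyword means adding one
# line here rather than another branch in a chain of if statements.
LISTERS: Dict[str, Callable[[Optional[str]], List[str]]] = {
    "dataset_type": _list_dataset_types,
    "user": _list_users,
}


def one_list(eid: Optional[str] = None, keyword: str = "dataset_type") -> List[str]:
    """Hypothetical list(): dispatch to the lister registered for `keyword`."""
    lister = LISTERS.get(keyword)
    if lister is None:
        raise ValueError(f"unknown keyword {keyword!r}; expected one of {sorted(LISTERS)}")
    return lister(eid)
```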

kdharris101 commented on September 27, 2024

The most common thing people are going to want to do is find what files are associated with an eid. The reasoning behind this suggestion was to have a function that does that with the default parameters, but also does other things (required less often) with other parameters. The fewer functions we have, the better!

One other thing we should have is a way to get a docstring describing what a particular dataset type means. (We have this in the spreadsheet now.)

nbonacchi commented on September 27, 2024

:) I see what you mean... I was thinking of using sets and intersections of sets after caching a query of the whole database, but that can be problematic too...

oliche commented on September 27, 2024

My initial thinking for the list function was to provide the user with the possible values for session fields that can be filtered on: dataset_types, users and subjects. This is functionality A.

myone.list('dataset-types') returns the result of the REST API query on the dataset-types Django table.

Kenneth has a very valid point: as of yesterday there was no way to export the session info (functionality B) without also downloading the full datasets. To put it mildly, that was somewhat stupid:
d = myone.load(eid, dataset_types=dataset_types, dclass_output=True)
I fixed it yesterday with the following proposed implementation:
d = myone.info(eid)

d is a data structure (it auto-completes in editors and matches Matlab structure syntax) with the following fields available: d.dataset_id and d.dataset_type, among others.

I called the functionality A method list and the functionality B method info.
I suggest calling them list and session_info.
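The attribute-style session info structure can be sketched with a Python dataclass. Only dataset_id and dataset_type come from the thread. The eid field and the example values are hypothetical, and the real object returned by myone.info may differ:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionInfo:
    """Attribute-style container, similar in spirit to a Matlab struct."""
    eid: str
    dataset_id: List[str] = field(default_factory=list)
    dataset_type: List[str] = field(default_factory=list)


d = SessionInfo(
    eid="someeid",
    dataset_id=["id-1", "id-2"],
    dataset_type=["spikes.times", "trials.choice"],
)
print(d.dataset_type)  # ['spikes.times', 'trials.choice']
```

Dataclass fields are ordinary attributes, so editors and IPython can auto-complete d.<TAB>.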

kdharris101 commented on September 27, 2024

This has the functionality we need, but maybe not in the most user-friendly way.

It would be great to just type one.list(eid) into my Jupyter notebook and get a simple list of the dataset types available for this experiment, because that is what I will want to know 90% of the time. The other stuff I will want rarely, and could type a more complex command to get it.

oliche commented on September 27, 2024

OK, that makes sense: a short, easy-to-remember command for widely used functionality.
Will do:

- myone.list(eid) returns a list of dataset types
- myone.session_info(eid) returns a data structure with more fields about the datasets
- I'll refactor myone.list('dataset_types') etc. as:
  - myone.ls_dataset_types
  - myone.ls_users
  - myone.ls_subjects

The Matlab implementation will use the same names.
Everybody on board?
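A sketch of the ls_* convenience wrappers. It assumes a generic ls(table) method and uses canned data in place of the REST table dumps; the class name and table keys are illustrative. functools.partialmethod binds the table name, so each wrapper is a one-liner that still shows up in tab completion:

```python
from functools import partialmethod
from typing import Dict, List


class MyOne:
    # Hypothetical canned responses standing in for REST table dumps.
    _TABLES: Dict[str, List[str]] = {
        "dataset-types": ["spikes.times", "trials.choice"],
        "users": ["kenneth", "olivier"],
        "subjects": ["mouse1", "mouse2"],
    }

    def ls(self, table: str) -> List[str]:
        """Generic lister: a plain dump of one table (no aggregation)."""
        return list(self._TABLES[table])

    # Thin wrappers that exist only so that myone.ls_<TAB> auto-completes.
    ls_dataset_types = partialmethod(ls, "dataset-types")
    ls_users = partialmethod(ls, "users")
    ls_subjects = partialmethod(ls, "subjects")


myone = MyOne()
print(myone.ls_users())  # ['kenneth', 'olivier']
```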

oliche commented on September 27, 2024

Pushed to dev and master:

- syntax refactoring
- updated tests
- updated tutorial

kdharris101 commented on September 27, 2024

Re-opening this since I can still see a problem with having the function names encode the search terms.

We don't know exactly how the search functionality is going to work, but it will probably be something like the Django syntax, i.e. one.search(field, value) or one.search(relation, value); for example, one.search('user', 'kenneth') or one.search('date_>', '1/1/2017').

The search fields could be anything in the database, not just the things we are coding here. If we need a separate function to list every one of these, we will need to code a new ls_ function for every field in the database. If we just pass the search term as an argument to one.list, this won't happen. For example, if the experiments table had a field percent_correct, then

one.list(None, 'percent_correct')

would give all possible percent-correct scores across all experiments.

Django also allows for relational queries spanning multiple tables. All this comes for free. Why would we want to reimplement it with lots of new functions?
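This sketch illustrates passing the search term as data rather than encoding it in a function name. It turns the one.search('date_>', '1/1/2017') style of arguments into a keyword dictionary for Django's QuerySet.filter(). The suffix-to-lookup mapping (> to gt, >= to gte, and so on) is an assumption about how such a syntax might work; gt, gte, lt and lte themselves are standard Django lookups. The helper name search_to_lookup is hypothetical:

```python
from typing import Any, Dict

# Assumed mapping from comparison suffixes to Django field lookups.
_OPERATORS = {">=": "gte", "<=": "lte", ">": "gt", "<": "lt"}


def search_to_lookup(term: str, value: Any) -> Dict[str, Any]:
    """Turn one.search-style arguments into a Django filter() keyword dict."""
    field, sep, op = term.rpartition("_")
    if sep and op in _OPERATORS:
        return {f"{field}__{_OPERATORS[op]}": value}
    # No recognised operator suffix: plain equality on the field as given.
    return {term: value}


print(search_to_lookup("user", "kenneth"))      # {'user': 'kenneth'}
print(search_to_lookup("date_>", "1/1/2017"))   # {'date__gt': '1/1/2017'}
```

Because field names pass through untouched, a relational term such as 'subject__nickname' (a hypothetical field) needs no new code.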

oliche commented on September 27, 2024

Ok, I've re-opened the thread.

The ls_* functions are wrappers around the generic ls function. They were just a convenience to get auto-completion, since I expect them to be used often.
Note that these functions do not perform any aggregation: each one is a simple REST query to the endpoint (a plain table dump).

Aggregating unique values for fields of existing experiments:

- If the field is a foreign key, this can be done through REST via the table endpoint. Listing the users of existing experiments would require a custom filter in users/views.py, and the endpoint queried would be users?custom_query.
- If the field is not a foreign key, like percent_correct, this requires creating a view and mapping it to a URL that returns a JsonResponse.

Cyrille, I may be wrong, but it seems to me that the REST API is not meant to directly forward aggregations or complex queries unless they correspond to existing endpoints (tables). Plus, neither case allows for dynamic definitions.

So yes, it is easy and efficient to perform queries on the Django side (AWS instance), but I'm not sure it is as easy through REST. I'm looking into it now; ideas welcome!

kdharris101 commented on September 27, 2024

Is there any way to make Django queries directly from a client computer, so that you do not need a specific REST API for each query? It’s not much of a security risk if it uses a read-only account.

oliche commented on September 27, 2024

OK, so I've opened the postgres ports and can connect from a client computer via the Django command line. This means adding Alyx as a dependency on the client, but it requires neither the full installation nor running an Alyx instance on the client: the client just uses the models to connect to the database and perform queries.

However, it seems impossible to run queries with a read-only postgres user.

rossant commented on September 27, 2024

The whole point of alyx was to provide two interfaces, a complex, low-level one with SQL or django, and a simpler, high-level one with REST for the most common operations.

The REST API should provide endpoints/filters for 90% of the use-cases.

Are you suggesting that these 90% are not sufficient for ONE?

If there is a highly complex query that would be very common, we should just implement a new endpoint or a new filter. We shouldn't try to reimplement django or SQL on top of REST.

kdharris101 commented on September 27, 2024

This is a really important question, and it brings up a bigger issue that we have put off discussing but should get to soon: exactly what sort of searches we want to enable via ONE. I would guess that 90% of the searches people want to do will be to find experiments with a certain combination of dataset types, possibly restricted to a certain user, subject, or date range. But in the remaining 10% of cases they might want to search for experiments with a certain number of neurons of a particular class, a certain fraction of trials correct, or all sorts of other things.

I get what you are saying about using REST wherever possible, and we certainly don’t want to reimplement Django’s query functions ourselves. However, for this question of searching, it is possible that Django queries already provide exactly what we need. Specifically, the [filter](https://docs.djangoproject.com/en/2.1/ref/models/querysets/#django.db.models.query.QuerySet.filter) function looks like it does what we want, for both the 90% and the 10% cases. It lets you search on multiple columns in a DB, and you don’t need to hard-code the names and datatypes of these columns. It allows comparisons like matching only the year of a date. I think it even allows lookups via linked tables.

So if it is possible to write the one.search client function so that it can execute a generic Django filter function, that could be the best solution. It could either run on the client side via an SQL connection, or on the server side via a REST query that specifies the fields to go in the filter function. (The latter might be better, come to think of it.)

rossant commented on September 27, 2024

It would be better not to rely on the Alyx codebase client-side. If all you ever want to do is call filter(...), then it would be nearly trivial to implement something where all the arguments to filter() are passed as a serialized string to a REST API, and passed again to Django's filter() server-side. This is a bit of a hack, but it would work...

@oliche you could implement this as follows:

one.filter(**kwargs) => kwargs dictionary as JSON => base64 encoding => REST /custom/ endpoint or something => base64 decoding => JSON decoding => django.filter(**kwargs)
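The encoding chain can be sketched with the standard library alone. Some notes on the choices made here:

- URL-safe base64 is used so the token contains no '+' or '/' characters in a query string.
- The function names encode_filter and decode_filter are hypothetical.
- The field names in the example are hypothetical.
- A real endpoint would need to validate the decoded keys before handing them to filter(); this sketch does not.

```python
import base64
import json
from typing import Any, Dict


def encode_filter(**kwargs: Any) -> str:
    """Client side: filter kwargs -> JSON -> URL-safe base64 text."""
    raw = json.dumps(kwargs, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_filter(token: str) -> Dict[str, Any]:
    """Server side: base64 text -> JSON -> kwargs for Model.objects.filter(**kwargs)."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    return json.loads(raw.decode("utf-8"))


token = encode_filter(subject__nickname="mouse1", start_time__year=2018)
kwargs = decode_filter(token)
print(kwargs)  # {'start_time__year': 2018, 'subject__nickname': 'mouse1'}
# On the server this dict would then be splatted into something like
# Session.objects.filter(**kwargs) (model name hypothetical).
```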

oliche commented on September 27, 2024

Yes.

For the filter, I would implement this with Q objects, as they are equivalent and more flexible (we can't express OR with a plain filter() call).

To communicate from client to server, existing or simple filters can use the current endpoints. However, if we need to return aggregations that do not correspond to an endpoint, I'll create a new app, one, with a view that returns a JSON object.
