For one.list it seems the correct syntax is: <code class="notranslat

For search, why not just use the django syntax ? <a href="https://docs.djangoproject.c

one.list keywords and plurals about ibllib HOT 18 CLOSED

int-brain-lab commented on September 27, 2024

one.list keywords and plurals

from ibllib.

Comments (18)

kdharris101 commented on September 27, 2024

Actually how about this (to get around the problem of lists of lists):

one.list(eids, search_terms) returns a dict. There is one key for each entry in the list search_terms, whose value contains a list of the same length as eids.

If you pass search_terms=None, it will default to all the search terms there are. If you pass eids=None, it will default to all the eids there are. The default argument for search_terms is 'dataset_types', but that can also be passed as part of a longer list of search terms, and is also included in the None input option.

from ibllib.

oliche commented on September 27, 2024

First Post

The problem is that the keyword refers to 3 different things depending on the use:
in one.list(eids, keyword = 'subject'), it refers to the field subject of the sessions referenced by eids
in one.list(None, keyword = 'subjects') it refers to the table subjects (endpoint)
in one.search(subject='toto') subject refers to a filter object in the Django API, that overlaps by default with the fields, but may be customized (as in date_range).

To address this, the one class opens with the following

ENDPOINT_LIST = ('dataset-types', 'users', 'subjects')
SESSION_FIELDS = ('subject', 'users', 'lab', 'type', 'start_time', 'end_time')
SEARCH_TERMS = ('dataset_types', 'users', 'subjects', 'date_range')

To help the users, I've tried to make meaningful error messages that list possible values when fields don't exist, but I still wanted to have keywords matching the names of the objects they referred to.

Suggestion

Thinking about user-friendliness, we may not be leveraging auto-completion as much as we could. We could be using classes/dataclasses/named tuples as such:
one.list.dataset_types(), one.list.any_endpoint()
one.list.dateset_types(eid), one.list.subjects(eid)
This makes the functions more usable, less prone to nagging spelling issues, and the ONE standard can stay flexible on the methods/fields to implement with a few must-haves.

This will also make the code consistent with the Matlab structures and packages.

from ibllib.

kdharris101 commented on September 27, 2024

We could be using classes/dataclasses/named tuples as such one.list.any_endpoint()

Very nice idea! And I guess this could even be done dynamically? I.e. if in the future the list of search terms is found from the database, the one object could be given these functions dynamically - at least in Python?

from ibllib.

kdharris101 commented on September 27, 2024

About plurals, I understand what you are saying, but let's try to think of it from the user's perspective -- i.e. a user who doesn't understand about database tables, endpoints, and filter objects.

Trying to think from this perspective, I had some more thoughts about one.search and one.list. The end result is very similar to what we already have, but just systematizes things a bit. I don't know how easy or hard it would be to implement this, I am just trying to guess what would seem natural for a user who doesn't know about databases. If it sounds good, we can then ask how hard it would be to implement.

You can consider every experiment as having a set of attributes, each of which is either a single object, or a list of objects. For example each experiment has a list of dataset types, a list of users associated with the experiment, but a single subject and a single date. (In reality sometimes these lists come from multi-table queries, but the user doesn't need to know that.)

To do a search, you compare attributes to values via an operator. For example you can ask is the subject equal to 'Hercules'; you can ask is the date after Jan 1 2017; you can ask if the list of dataset types contains everything in ('spikes.times', 'spikes.clusters', 'clusters.probes'); you can ask if the subject's list of alleles contains an one of the form "*Pvalb*".

So a natural query would be of the form one.search(attribute1=(operator1, value1), attribute2=(operator2, value2), ...). For example:

one.search(subject=('=', 'Hercules'), data=('contains', ['spikes.times', 'spikes.clusters', 'clusters.probes']), date=('>', 'Jan 1 2017'))

To simplify things, we could note that the user will nearly always want to use the same operator for a given search term. So if the argument is not a tuple whose first element is an operator, the default operator is prepended. The default for subject and date would be '=', for data would be 'contains'. Thus, you could make the same search as:

one.search(subject='Hercules', data=['spikes.times', 'spikes.clusters', 'clusters.probes'], date=('>', 'Jan 1 2017'))

To make things more flexible, there could be an operator for between i.e. date=('between', 'Jan 1 2017', 'Feb 27 2017'), and strings could by default be matched with wildcards unless otherwise specified.

If we did this, then one.list would be simple: it would just list the possible values for each attribute. If the attribute takes a single value, it would collect the values for all eids into a list; if the attribute is a list, it would concatenate these lists for all specified eids and return their unique values.

Does this make sense?

from ibllib.

oliche commented on September 27, 2024

List

Yes I was wondering if I had to do the set of unique values or not. One user case would be to get all information for each session in the same order as the query for further selection, another use-case is to get the unique set.

I implemented the 'details' to get all information about each queried session and will probably keep an option like that, in the same order as the session. Other than that I'm agnostic.

Search

Does this make sense?

Yes ! The good thing is that we could re-use most of Django operators this way, and overload or create the one we need. I need to have a look at Datajoint query syntax.
What may be missing above is the relation between queries (AND (default) / OR) but this is probably for further down the road.

Keywords flexibility

There is one thing I am sure of: if we stay with named arguments (no autocomplete), the translation of input arguments will be implemented using dictionaries, where the key name is the user possible input keywords and the values the actual implementation:

_ENDPOINTS = {
     'data': 'dataset-types',
     'dataset': 'dataset-types',
     'datasets': 'dataset-types',
     'dataset-types': 'dataset-types',
     'users': 'users',
     'user': 'users',
     'subject': 'subjects',
     'subjects': 'subjects'
}

In short we design the functions so that typos from the users are allowed.

from ibllib.

rossant commented on September 27, 2024

For search, why not just use the django syntax ? https://docs.djangoproject.com/en/2.1/ref/models/querysets/#field-lookups

from ibllib.

kdharris101 commented on September 27, 2024

That would be attractive, since potentially we could just pass the query through. However I couldn’t think how to do our most common query: finding sessions that have data of all of a list of required dataset types.

from ibllib.

oliche commented on September 27, 2024

What I'm gathering from all the discussions above is that the implementation and the user syntax are bound to be different.

The keywords do not match the database field names (plurals), an argument refers to different objects kinds (tables,fields or search filter keywords)...

From the user perspective, I think we shouldn't assume that the user will have to know that x field is in another table (double underscore) nor the exact name of searchable fields: this pretty much rules out the Django syntax. Plus some operators we'll have to implement with Q functions.
For our implementation, yes !

from ibllib.

rossant commented on September 27, 2024

We don't have to follow the django syntax to the letter. But why inventing a new syntax that does the same thing, when we can just use the django syntax?

We don't have to care about which fields belong to which table, it's just a matter of using the appropriate keywords and operators with __. This syntax doesn't have to be strictly the one we'd use in the django console, it can just be loosely inspired by it. As a user I think I'd prefer to type something like date__gt='20170101' rather than date=('>', 'Jan 1 2017').

For the dataset types, we can come up with anything we'd like, e.g. dataset_types=['spikes.times', 'spikes.clusters', 'clusters.probes'].

from ibllib.

kdharris101 commented on September 27, 2024

Sounds good. To be clear, you are suggesting: 1. Rather than passing relationships as tuples, using the Django `field__relation` when a relationship needs to be specified. 2. Not passing Django queries straight to Django, but just copying some of their syntax 3. Not using this syntax to link between tables – but instead having linked items (such as dataset_types) be translated to a list, and defining new relationship operators for these lists, also encoded in field__relation form?

from ibllib.

rossant commented on September 27, 2024

@kdharris101 yes, absolutely!

from ibllib.

kdharris101 commented on September 27, 2024

OK sounds good! Olivier what do you think?

from ibllib.

oliche commented on September 27, 2024

Search

For the search syntax, I think this is a good list of functionalities we want to implement:
https://tutorials.datajoint.io/beginner/building-first-pipeline/python/more-queries.html
I'm undecided between attempting to re-use datajoint syntax for standardization or have a go with Django style syntax. I see merits in both so I'll be easy to convince.

Current List

In the short term, I've implemented dynamic typo-proof keywords through dictionaries (in Python, but on the Matlab branch). Even I was confused so this was a must-have.

ibllib/python/oneibl/one.py

Lines 38 to 46 in fb21305

 _LIST_KEYWORDS = dict(_SESSION_FIELDS, **{ 

 'all': 'all', 

 'data': 'dataset-type', 

 'dataset': 'dataset-type', 

 'datasets': 'dataset-type', 

 'dataset-types': 'dataset-type', 

 'dataset_types': 'dataset-type', 

 'dataset-type': 'dataset-type', 

 'dataset_type': 'dataset-type'})

Proposed roadmap for next week:

catch-up with Matlab and release 0.3.0
write small prototypes that pull behaviour data and show at least one psychometric curve. This is to lure in a few users and get their feedback
implement prototype functionalities shown in Datajoint via the REST API and Django. Low level private methods.
assess which syntax Django or Datajoint will suit us the best to wrap the methods

from ibllib.

kdharris101 commented on September 27, 2024

If we can use Datajoint syntax that would be great. It will be familiar to more users than Django. Let’s think about some specific use cases. Again, the most important one is to find all experiments that have all in a list of DatasetTypes. I’m not sure how to do that in DataJoint. I guess we could ask them! A more difficult one is to find all recordings that have at least 8 narrow-spiking neurons recorded in area CA1. I can’t think of any way to do this easily, including in the system I just suggested.

from ibllib.

kdharris101 commented on September 27, 2024

OK here's a though of how to use the same framework to do searches for experiments with enough cells of a certain type.

Each experiment has an attribute called cells or something like that, which is a data frame - i.e. a table containing an entry for each cell recorded in that experiment, with multiple fields for each cell like brain region, mean firing rate, isolation quality, putative cell type, etc.

We then define an operator that tests if the experiment has at least N cells matching a certain combination of criteria. Something like:

one.search(cellCounts__gt(10,brainLocation='CA1', spikeWidth='narrow', firingRate__gt=5))

The idea here is that cellCounts__gt(N, conditions...) would specifying a condition that there must be at least N cells obeying the specified conditions. In this case there would need to be at least 10 cells with location in CA1, narrow spike width, firing rate at least 5Hz.

Not sure if this syntax is legal in python though... but something along these lines...

from ibllib.

rossant commented on September 27, 2024

It would be good if we could use the datajoint syntax. But then we'll have to answer the obvious question: why not just use datajoint? Why do we need to reimplement something that already exists? I'm sure users will ask us this question: "shall I use ONE or datajoint?". Could we have a simple REST API and a corresponding Python syntax for the simplest/most common queries, and defer to datajoint for more advanced search capabilities?

from ibllib.

kdharris101 commented on September 27, 2024

Yes, that’s a good point. There’s no need to spend too much time implementing complicated search features in ONE – unless we find users actually want them because they are not happy with DJ for some reason.

from ibllib.

rossant commented on September 27, 2024

closing the issue here and continuing the discussion on the ONE v2 proposal

from ibllib.

one.list keywords and plurals about ibllib HOT 18 CLOSED

Comments (18)

First Post

Suggestion

List

Search

Keywords flexibility

Search

Current List

Proposed roadmap for next week:

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	_LIST_KEYWORDS = dict(_SESSION_FIELDS, **{
	'all': 'all',
	'data': 'dataset-type',
	'dataset': 'dataset-type',
	'datasets': 'dataset-type',
	'dataset-types': 'dataset-type',
	'dataset_types': 'dataset-type',
	'dataset-type': 'dataset-type',
	'dataset_type': 'dataset-type'})