Comments (18)
Actually, how about this (to get around the problem of lists of lists): `one.list(eids, search_terms)` returns a dict. There is one key for each entry in the list `search_terms`, whose value contains a list of the same length as `eids`.

If you pass `search_terms=None`, it will default to all the search terms there are. If you pass `eids=None`, it will default to all the eids there are. The default argument for `search_terms` is `'dataset_types'`, but that can also be passed as part of a longer list of search terms, and is also included in the `None` input option.
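A minimal sketch of the return shape being proposed; the function name, the in-memory `database` stand-in, and the field names are all illustrative assumptions, not the actual ibllib implementation:

```python
# Hypothetical sketch of the proposed one.list() semantics. `database` is a
# stand-in for the real backend: a dict mapping eid -> {search_term: value}.
def list_sessions(eids, search_terms=('dataset_types',), database=None):
    """Return {search_term: [value for each eid, in query order]}."""
    database = database or {}
    if eids is None:                   # default: every eid there is
        eids = sorted(database)
    if search_terms is None:           # default: every search term there is
        search_terms = sorted({k for rec in database.values() for k in rec})
    return {term: [database[eid].get(term) for eid in eids]
            for term in search_terms}
```

Each returned list is aligned with `eids`, so the caller can zip them together for further selection.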
from ibllib.
First Post
The problem is that the keyword refers to 3 different things depending on the use:

- in `one.list(eids, keyword='subject')`, it refers to the field `subject` of the sessions referenced by `eids`
- in `one.list(None, keyword='subjects')`, it refers to the table `subjects` (endpoint)
- in `one.search(subject='toto')`, `subject` refers to a filter object in the Django API, which overlaps by default with the fields but may be customized (as with `date_range`).
To address this, the `one` class opens with the following:

```python
ENDPOINT_LIST = ('dataset-types', 'users', 'subjects')
SESSION_FIELDS = ('subject', 'users', 'lab', 'type', 'start_time', 'end_time')
SEARCH_TERMS = ('dataset_types', 'users', 'subjects', 'date_range')
```
To help users, I've tried to write meaningful error messages that list the possible values when a field doesn't exist, but I still wanted the keywords to match the names of the objects they refer to.
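The error messages described above could look something like the helper below; this is an illustrative sketch (the function name `validate_keyword` is an assumption), reusing the constants from the comment:

```python
ENDPOINT_LIST = ('dataset-types', 'users', 'subjects')

def validate_keyword(keyword, valid, what='field'):
    """Sketch of a 'meaningful error' check: reject unknown keywords
    and list the possible values in the exception message."""
    if keyword not in valid:
        raise ValueError(
            f"'{keyword}' is not a valid {what}; possible values are: "
            + ', '.join(valid))
    return keyword
```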
Suggestion
Thinking about user-friendliness, we may not be leveraging auto-completion as much as we could. We could be using classes/dataclasses/named tuples as such:
```python
one.list.dataset_types(), one.list.any_endpoint()
one.list.dataset_types(eid), one.list.subjects(eid)
```
This makes the functions more usable, less prone to nagging spelling issues, and the ONE standard can stay flexible on the methods/fields to implement with a few must-haves.
This will also make the code consistent with the Matlab structures and packages.
> We could be using classes/dataclasses/named tuples as such: `one.list.any_endpoint()`
Very nice idea! And I guess this could even be done dynamically? I.e., if in the future the list of search terms comes from the database, the `one` object could be given these functions dynamically - at least in Python?
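Dynamic attachment is indeed straightforward in Python. A minimal sketch, assuming the endpoint names arrive as a plain tuple (in reality they would come from the database) and a hypothetical `fetch` callable doing the real work:

```python
import functools

class ListAccessor:
    """Attach one method per endpoint at construction time, so that
    one.list.dataset_types(eid) etc. exist and autocomplete."""
    def __init__(self, endpoints, fetch):
        for name in endpoints:
            # each attribute calls fetch(endpoint_name, eid=...)
            setattr(self, name, functools.partial(fetch, name))

def fake_fetch(endpoint, eid=None):
    # stand-in for the real database query
    return f'{endpoint} for {eid}' if eid else f'all {endpoint}'

listing = ListAccessor(('dataset_types', 'subjects', 'users'), fake_fetch)
```

One caveat of attaching methods at runtime is that static tooling (linters, some IDEs) won't see them; a dataclass or `__getattr__` would trade that off differently.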
About plurals, I understand what you are saying, but let's try to think of it from the user's perspective -- i.e. a user who doesn't understand about database tables, endpoints, and filter objects.
Trying to think from this perspective, I had some more thoughts about `one.search` and `one.list`. The end result is very similar to what we already have, but just systematizes things a bit. I don't know how easy or hard it would be to implement this; I am just trying to guess what would seem natural for a user who doesn't know about databases. If it sounds good, we can then ask how hard it would be to implement.
You can consider every experiment as having a set of attributes, each of which is either a single object, or a list of objects. For example each experiment has a list of dataset types, a list of users associated with the experiment, but a single subject and a single date. (In reality sometimes these lists come from multi-table queries, but the user doesn't need to know that.)
To do a search, you compare attributes to values via an operator. For example you can ask if the subject is equal to 'Hercules'; you can ask if the date is after Jan 1 2017; you can ask if the list of dataset types contains everything in ('spikes.times', 'spikes.clusters', 'clusters.probes'); you can ask if the subject's list of alleles contains one of the form "*Pvalb*".
So a natural query would be of the form `one.search(attribute1=(operator1, value1), attribute2=(operator2, value2), ...)`. For example:

```python
one.search(subject=('=', 'Hercules'),
           data=('contains', ['spikes.times', 'spikes.clusters', 'clusters.probes']),
           date=('>', 'Jan 1 2017'))
```
To simplify things, we could note that the user will nearly always want to use the same operator for a given search term. So if the argument is not a tuple whose first element is an operator, the default operator is prepended. The default for subject and date would be '=', for data would be 'contains'. Thus, you could make the same search as:
```python
one.search(subject='Hercules',
           data=['spikes.times', 'spikes.clusters', 'clusters.probes'],
           date=('>', 'Jan 1 2017'))
```
To make things more flexible, there could be an operator for between, i.e. `date=('between', 'Jan 1 2017', 'Feb 27 2017')`, and strings could by default be matched with wildcards unless otherwise specified.
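The default-operator convention above is easy to express as a normalization step; this is an illustrative sketch (the operator set and per-term defaults are assumptions, not an agreed spec):

```python
# If a value is not an (operator, value...) tuple, prepend that term's
# default operator: '=' for subject/date, 'contains' for data.
DEFAULT_OPS = {'subject': '=', 'date': '=', 'data': 'contains'}
KNOWN_OPS = ('=', '>', '<', 'contains', 'between')

def normalize_query(**kwargs):
    out = {}
    for term, value in kwargs.items():
        if isinstance(value, tuple) and value and value[0] in KNOWN_OPS:
            out[term] = value                      # operator given explicitly
        else:
            out[term] = (DEFAULT_OPS.get(term, '='), value)
    return out
```

With this, both query forms from the comment above normalize to the same internal representation before being sent to the backend.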
If we did this, then `one.list` would be simple: it would just list the possible values for each attribute. If the attribute takes a single value, it would collect the values for all eids into a list; if the attribute is a list, it would concatenate these lists for all specified eids and return their unique values.
Does this make sense?
List
Yes, I was wondering whether I had to return the set of unique values or not. One use case is to get all the information for each session, in the same order as the query, for further selection; another use case is to get the unique set.
I implemented the `'details'` keyword to get all the information about each queried session, and will probably keep an option like that, in the same order as the sessions. Other than that I'm agnostic.
Search
> Does this make sense?
Yes! The good thing is that we could re-use most of the Django operators this way, and overload or create the ones we need. I need to have a look at the DataJoint query syntax.
What may be missing above is the relation between queries (AND (default) / OR) but this is probably for further down the road.
Keywords flexibility
There is one thing I am sure of: if we stay with named arguments (no autocomplete), the translation of input arguments will be implemented using dictionaries, where the keys are the possible user input keywords and the values are the actual implementation:
```python
_ENDPOINTS = {
    'data': 'dataset-types',
    'dataset': 'dataset-types',
    'datasets': 'dataset-types',
    'dataset-types': 'dataset-types',
    'users': 'users',
    'user': 'users',
    'subject': 'subjects',
    'subjects': 'subjects'
}
```
In short, we design the functions so that near-miss keywords from the users (plurals, common synonyms) are still accepted.
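A sketch of the lookup wrapping that dictionary; the function name `to_endpoint` and the case-folding are my assumptions, the mapping itself is the one quoted above:

```python
_ENDPOINTS = {
    'data': 'dataset-types', 'dataset': 'dataset-types',
    'datasets': 'dataset-types', 'dataset-types': 'dataset-types',
    'users': 'users', 'user': 'users',
    'subject': 'subjects', 'subjects': 'subjects',
}

def to_endpoint(keyword):
    """Translate a user-facing keyword (singular/plural, any case)
    into the canonical endpoint name, with a helpful error otherwise."""
    try:
        return _ENDPOINTS[keyword.lower()]
    except KeyError:
        raise ValueError(f"unknown keyword '{keyword}'; try one of: "
                         + ', '.join(sorted(set(_ENDPOINTS)))) from None
```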
For search, why not just use the Django syntax? https://docs.djangoproject.com/en/2.1/ref/models/querysets/#field-lookups
What I'm gathering from all the discussions above is that the implementation and the user syntax are bound to be different.
The keywords do not match the database field names (plurals), and a given argument can refer to different kinds of objects (tables, fields, or search-filter keywords)...
From the user's perspective, I think we shouldn't assume that the user knows that field x lives in another table (double underscore), nor the exact names of the searchable fields: this pretty much rules out the Django syntax. Plus, some operators we'll have to implement with Q objects.
For our implementation, yes!
We don't have to follow the Django syntax to the letter. But why invent a new syntax that does the same thing, when we can just use the Django one?
We don't have to care about which fields belong to which table; it's just a matter of using the appropriate keywords and operators with `__`. This syntax doesn't have to be strictly the one we'd use in the Django console; it can just be loosely inspired by it. As a user I think I'd prefer to type something like `date__gt='20170101'` rather than `date=('>', 'Jan 1 2017')`.
For the dataset types, we can come up with anything we'd like, e.g. `dataset_types=['spikes.times', 'spikes.clusters', 'clusters.probes']`.
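For context, the Django-style `field__op` convention is cheap to parse on our side even without Django; here is a minimal sketch (the lookup list and the `'exact'` fallback name are assumptions, loosely mirroring Django's field lookups):

```python
# Split 'date__gt' into ('date', 'gt'); a bare key means an exact match.
LOOKUPS = ('gt', 'gte', 'lt', 'lte', 'contains', 'range')

def parse_lookup(key, value):
    field, sep, op = key.rpartition('__')
    if sep and op in LOOKUPS:
        return field, op, value
    return key, 'exact', value
```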
@kdharris101 yes, absolutely!
OK sounds good! Olivier what do you think?
Search
For the search syntax, I think this is a good list of functionalities we want to implement:
https://tutorials.datajoint.io/beginner/building-first-pipeline/python/more-queries.html
I'm undecided between attempting to re-use the DataJoint syntax for standardization or having a go at a Django-style syntax. I see merits in both, so I'll be easy to convince.
Current List
In the short term, I've implemented dynamic typo-proof keywords through dictionaries (in Python, but on the Matlab branch). Even I was getting confused, so this was a must-have.
Lines 38 to 46 in fb21305
Proposed roadmap for next week:
- catch-up with Matlab and release 0.3.0
- write small prototypes that pull behaviour data and show at least one psychometric curve. This is to lure in a few users and get their feedback
- implement prototype functionalities shown in Datajoint via the REST API and Django. Low level private methods.
- assess which syntax Django or Datajoint will suit us the best to wrap the methods
OK, here's a thought on how to use the same framework to do searches for experiments with enough cells of a certain type.
Each experiment has an attribute called `cells` or something like that, which is a data frame - i.e. a table containing an entry for each cell recorded in that experiment, with multiple fields for each cell like brain region, mean firing rate, isolation quality, putative cell type, etc.
We then define an operator that tests if the experiment has at least N cells matching a certain combination of criteria. Something like:
```python
one.search(cellCounts__gt(10, brainLocation='CA1', spikeWidth='narrow', firingRate__gt=5))
```
The idea here is that `cellCounts__gt(N, conditions...)` would specify a condition that there must be at least N cells obeying the specified conditions. In this case there would need to be at least 10 cells located in CA1, with narrow spike width and a firing rate of at least 5 Hz.
Not sure if this syntax is legal in Python, though... but something along these lines...
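The exact syntax above isn't legal Python (a bare call can't be a keyword), but a small condition object gets close. Everything here is a hypothetical sketch: the class name, the field names, and the restriction to the `gt` lookup are all assumptions:

```python
class CellCount:
    """Condition: at least `n` cells matching the given field criteria,
    where criteria use Django-style lookups like firingRate__gt=5."""
    def __init__(self, n, **criteria):
        self.n, self.criteria = n, criteria

    def matches(self, cells):
        """`cells` is a list of dicts, one per recorded cell."""
        def ok(cell):
            for key, want in self.criteria.items():
                field, sep, op = key.rpartition('__')
                if sep and op == 'gt':
                    value = cell.get(field)
                    if value is None or not value > want:
                        return False
                elif cell.get(key) != want:
                    return False
            return True
        return sum(ok(c) for c in cells) >= self.n
```

A query would then read something like `one.search(cell_counts=CellCount(10, brainLocation='CA1', spikeWidth='narrow', firingRate__gt=5))`, which is valid Python while keeping the spirit of the proposal.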
It would be good if we could use the DataJoint syntax. But then we'll have to answer the obvious question: why not just use DataJoint? Why do we need to reimplement something that already exists? I'm sure users will ask us this question: "shall I use ONE or DataJoint?". Could we have a simple REST API and a corresponding Python syntax for the simplest/most common queries, and defer to DataJoint for more advanced search capabilities?
Closing the issue here and continuing the discussion in the ONE v2 proposal.