
Comments (18)

kdharris101 commented on September 27, 2024

Actually how about this (to get around the problem of lists of lists):

one.list(eids, search_terms) returns a dict with one key for each entry in the list search_terms; each key's value is a list of the same length as eids.

If you pass search_terms=None, it will default to all the search terms there are. If you pass eids=None, it will default to all the eids there are. The default argument for search_terms is 'dataset_types', but that can also be passed as part of a longer list of search terms, and is also included in the None input option.
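
For concreteness, here is a toy sketch of that contract (made-up data and a stand-in function, not the ibllib implementation):

# Toy sketch of the proposed contract: one key per search term, each value a list aligned with eids.
def list_sessions(eids, search_terms=('dataset_types',), db=None):
    db = db or {}  # hypothetical eid -> {search_term: value} store
    return {term: [db.get(eid, {}).get(term) for eid in eids] for term in search_terms}

# Example with made-up data:
db = {'eid1': {'dataset_types': ['spikes.times'], 'users': ['user_a']},
      'eid2': {'dataset_types': ['clusters.probes'], 'users': ['user_b']}}
list_sessions(['eid1', 'eid2'], ('dataset_types', 'users'), db=db)
# -> {'dataset_types': [['spikes.times'], ['clusters.probes']],
#     'users': [['user_a'], ['user_b']]}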

oliche commented on September 27, 2024

First Post

The problem is that the keyword refers to three different things depending on the use:

- in one.list(eids, keyword='subject'), it refers to the field subject of the sessions referenced by eids
- in one.list(None, keyword='subjects'), it refers to the table subjects (endpoint)
- in one.search(subject='toto'), subject refers to a filter object in the Django API, which overlaps by default with the fields but may be customized (as in date_range)

To address this, the one class opens with the following:

ENDPOINT_LIST = ('dataset-types', 'users', 'subjects')
SESSION_FIELDS = ('subject', 'users', 'lab', 'type', 'start_time', 'end_time')
SEARCH_TERMS = ('dataset_types', 'users', 'subjects', 'date_range')

To help users, I've tried to write meaningful error messages that list the possible values when a field doesn't exist, but I still wanted the keywords to match the names of the objects they refer to.

Suggestion

Thinking about user-friendliness, we may not be leveraging auto-completion as much as we could. We could use classes/dataclasses/named tuples, like so:
one.list.dataset_types(), one.list.any_endpoint()
one.list.dataset_types(eid), one.list.subjects(eid)
This makes the functions more usable and less prone to nagging spelling issues, and the ONE standard can stay flexible about which methods/fields to implement beyond a few must-haves.

This will also make the code consistent with the Matlab structures and packages.
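
A minimal sketch of the namespace idea, assuming a plain class whose methods delegate to a generic private query; all names here are illustrative rather than the actual ibllib implementation:

# Illustrative namespace object giving one.list.<endpoint>() with tab completion.
class ListNamespace:
    def __init__(self, query_fn):
        self._query = query_fn  # e.g. a private one._list(eid, keyword=...) method

    def dataset_types(self, eid=None):
        return self._query(eid, keyword='dataset-types')

    def subjects(self, eid=None):
        return self._query(eid, keyword='subjects')

    def users(self, eid=None):
        return self._query(eid, keyword='users')

# Hypothetical wiring: one.list = ListNamespace(one._list), then one.list.dataset_types(eid)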

kdharris101 commented on September 27, 2024

> We could be using classes/dataclasses/named tuples as such one.list.any_endpoint()

Very nice idea! And I guess this could even be done dynamically? I.e. if, in the future, the list of search terms is fetched from the database, the one object could be given these functions dynamically, at least in Python?
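
In Python that does seem feasible, e.g. by attaching one callable per search term at connection time; a rough sketch with hypothetical names:

# Rough sketch: build the list namespace dynamically from terms fetched at runtime.
from functools import partial
from types import SimpleNamespace

def build_list_namespace(query_fn, search_terms):
    ns = SimpleNamespace()
    for term in search_terms:
        # each term becomes an attribute, so it shows up in autocompletion
        setattr(ns, term, partial(query_fn, keyword=term))
    return ns

# e.g. if the valid terms come from the database at login:
# one.list = build_list_namespace(one._list, ['dataset_types', 'users', 'subjects'])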

kdharris101 commented on September 27, 2024

About plurals, I understand what you are saying, but let's try to think of it from the user's perspective -- i.e. a user who doesn't know about database tables, endpoints, and filter objects.

Trying to think from this perspective, I had some more thoughts about one.search and one.list. The end result is very similar to what we already have; it just systematizes things a bit. I don't know how easy or hard it would be to implement this; I am just trying to guess what would seem natural for a user who doesn't know about databases. If it sounds good, we can then ask how hard it would be to implement.

You can consider every experiment as having a set of attributes, each of which is either a single object, or a list of objects. For example each experiment has a list of dataset types, a list of users associated with the experiment, but a single subject and a single date. (In reality sometimes these lists come from multi-table queries, but the user doesn't need to know that.)

To do a search, you compare attributes to values via an operator. For example, you can ask whether the subject equals 'Hercules'; whether the date is after Jan 1 2017; whether the list of dataset types contains everything in ('spikes.times', 'spikes.clusters', 'clusters.probes'); or whether the subject's list of alleles contains one of the form "*Pvalb*".

So a natural query would be of the form one.search(attribute1=(operator1, value1), attribute2=(operator2, value2), ...). For example:

one.search(subject=('=', 'Hercules'), data=('contains', ['spikes.times', 'spikes.clusters', 'clusters.probes']), date=('>', 'Jan 1 2017'))

To simplify things, we could note that the user will nearly always want to use the same operator for a given search term. So if the argument is not a tuple whose first element is an operator, the default operator is prepended. The default for subject and date would be '=', and for data it would be 'contains'. Thus, you could make the same search as:

one.search(subject='Hercules', data=['spikes.times', 'spikes.clusters', 'clusters.probes'], date=('>', 'Jan 1 2017'))

To make things more flexible, there could be an operator for ranges, e.g. date=('between', 'Jan 1 2017', 'Feb 27 2017'), and strings could be matched with wildcards by default unless otherwise specified.
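
A small sketch of that defaulting rule, with hypothetical operator names (not an existing ibllib helper):

# Hypothetical helper: prepend the default operator unless the user already supplied one.
OPERATORS = {'=', '>', '<', 'contains', 'between'}
DEFAULT_OPERATOR = {'subject': '=', 'date': '=', 'data': 'contains'}

def normalize(attribute, value):
    if isinstance(value, tuple) and value and value[0] in OPERATORS:
        return value  # e.g. ('>', 'Jan 1 2017') or ('between', 'Jan 1 2017', 'Feb 27 2017')
    return (DEFAULT_OPERATOR.get(attribute, '='), value)

# normalize('subject', 'Hercules')                        -> ('=', 'Hercules')
# normalize('data', ['spikes.times', 'spikes.clusters'])  -> ('contains', [...])
# normalize('date', ('>', 'Jan 1 2017'))                  -> unchanged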

If we did this, then one.list would be simple: it would just list the possible values for each attribute. If the attribute takes a single value, it would collect the values for all eids into a list; if the attribute is a list, it would concatenate these lists for all specified eids and return their unique values.
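
Under that scheme, the core of one.list could reduce to something like this sketch (session dicts standing in for whatever the API returns):

# Sketch only: scalar attributes stay one-per-eid, list attributes are pooled and deduplicated.
def list_attribute(sessions, attribute):
    values = [s[attribute] for s in sessions]
    if all(isinstance(v, list) for v in values):
        return sorted({item for v in values for item in v})  # unique values across all eids
    return values  # one value per eid, in the same order as the query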

Does this make sense?

oliche commented on September 27, 2024

List

Yes, I was wondering whether to return the set of unique values or not. One use-case would be to get all the information for each session, in the same order as the query, for further selection; another use-case is to get the unique set.

I implemented the 'details' option to get all the information about each queried session, and will probably keep an option like that, in the same order as the sessions. Other than that I'm agnostic.

Search

> Does this make sense?

Yes! The good thing is that we could re-use most of the Django operators this way, and overload or create the ones we need. I need to have a look at the DataJoint query syntax.
What may be missing above is the relation between queries (AND by default / OR), but this is probably for further down the road.

Keywords flexibility

There is one thing I am sure of: if we stay with named arguments (no autocomplete), the translation of input arguments will be implemented using dictionaries, where the keys are the possible user input keywords and the values are the actual implementation names:

_ENDPOINTS = {
     'data': 'dataset-types',
     'dataset': 'dataset-types',
     'datasets': 'dataset-types',
     'dataset-types': 'dataset-types',
     'users': 'users',
     'user': 'users',
     'subject': 'subjects',
     'subjects': 'subjects'
}

In short, we design the functions so that likely typos and variants from the user are tolerated.
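
For instance, a lookup wrapped around that table can keep the helpful error message (hypothetical helper, not the actual code):

# Hypothetical lookup built on the translation table above.
def _endpoint(keyword):
    try:
        return _ENDPOINTS[keyword.lower()]
    except KeyError:
        raise ValueError(f"'{keyword}' is not a valid keyword; "
                         f"try one of {sorted(set(_ENDPOINTS))}") from None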

rossant commented on September 27, 2024

For search, why not just use the Django syntax? https://docs.djangoproject.com/en/2.1/ref/models/querysets/#field-lookups

kdharris101 commented on September 27, 2024

oliche commented on September 27, 2024

What I'm gathering from all the discussions above is that the implementation and the user syntax are bound to be different.

The keywords do not match the database field names (plurals), and an argument can refer to different kinds of objects (tables, fields, or search-filter keywords)...

From the user's perspective, I think we shouldn't assume that the user will know that field x lives in another table (double underscore), nor the exact names of the searchable fields: this pretty much rules out the Django syntax. Plus, some operators we'll have to implement with Q objects.
For our implementation, yes!

rossant commented on September 27, 2024

We don't have to follow the Django syntax to the letter. But why invent a new syntax that does the same thing, when we can just use the Django syntax?

We don't have to care about which fields belong to which table; it's just a matter of using the appropriate keywords and operators with __. This syntax doesn't have to be strictly the one we'd use in the Django console; it can just be loosely inspired by it. As a user, I think I'd prefer to type something like date__gt='20170101' rather than date=('>', 'Jan 1 2017').

For the dataset types, we can come up with anything we'd like, e.g. dataset_types=['spikes.times', 'spikes.clusters', 'clusters.probes'].
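
One way to keep that user syntax while staying free on the backend side is to split the double-underscore lookups before forwarding them to the REST layer; a hedged sketch (names assumed, not the actual ibllib/Alyx API):

# Sketch: split Django-style keyword lookups into (field, operator, value) triples.
def parse_lookups(**criteria):
    parsed = []
    for key, value in criteria.items():
        field, sep, op = key.partition('__')
        parsed.append((field, op if sep else 'exact', value))
    return parsed

# parse_lookups(date__gt='20170101', subject='Hercules')
# -> [('date', 'gt', '20170101'), ('subject', 'exact', 'Hercules')]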

kdharris101 commented on September 27, 2024

rossant commented on September 27, 2024

@kdharris101 yes, absolutely!

kdharris101 commented on September 27, 2024

OK, sounds good! Olivier, what do you think?

oliche commented on September 27, 2024

Search

For the search syntax, I think this is a good list of the functionality we want to implement:
https://tutorials.datajoint.io/beginner/building-first-pipeline/python/more-queries.html
I'm undecided between attempting to re-use the DataJoint syntax for standardization or having a go with a Django-style syntax. I see merits in both, so I'll be easy to convince.

Current List

In the short term, I've implemented dynamic typo-proof keywords through dictionaries (in Python, but on the Matlab branch). Even I was getting confused, so this was a must-have.

_LIST_KEYWORDS = dict(_SESSION_FIELDS, **{
    'all': 'all',
    'data': 'dataset-type',
    'dataset': 'dataset-type',
    'datasets': 'dataset-type',
    'dataset-types': 'dataset-type',
    'dataset_types': 'dataset-type',
    'dataset-type': 'dataset-type',
    'dataset_type': 'dataset-type'})

Proposed roadmap for next week:

  1. catch up with Matlab and release 0.3.0
  2. write small prototypes that pull behaviour data and show at least one psychometric curve; this is to lure in a few users and get their feedback
  3. implement the prototype functionality shown in DataJoint via the REST API and Django, as low-level private methods
  4. assess which syntax, Django or DataJoint, will suit us best to wrap these methods

kdharris101 commented on September 27, 2024

kdharris101 commented on September 27, 2024

OK, here's a thought on how to use the same framework to search for experiments with enough cells of a certain type.

Each experiment has an attribute called cells or something like that, which is a data frame - i.e. a table containing an entry for each cell recorded in that experiment, with multiple fields for each cell like brain region, mean firing rate, isolation quality, putative cell type, etc.

We then define an operator that tests if the experiment has at least N cells matching a certain combination of criteria. Something like:

one.search(cellCounts__gt(10,brainLocation='CA1', spikeWidth='narrow', firingRate__gt=5))

The idea here is that cellCounts__gt(N, conditions...) would specify a condition that there must be at least N cells obeying the given conditions. In this case there would need to be at least 10 cells located in CA1, with a narrow spike width and a firing rate of at least 5 Hz.

Not sure if this syntax is legal in Python though... but something along these lines...
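
As written it isn't valid Python, since a call expression can't be used as a keyword argument; a small criterion object would be one legal way to express the same idea (purely hypothetical, not an existing API):

# Hypothetical criterion object: "at least n cells matching these per-cell conditions".
class CellCount:
    def __init__(self, n, **conditions):
        self.n = n
        self.conditions = conditions  # e.g. brainLocation='CA1', firingRate__gt=5

# one.search(cell_counts=CellCount(10, brainLocation='CA1',
#                                  spikeWidth='narrow', firingRate__gt=5))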

rossant commented on September 27, 2024

It would be good if we could use the datajoint syntax. But then we'll have to answer the obvious question: why not just use datajoint? Why do we need to reimplement something that already exists? I'm sure users will ask us this question: "shall I use ONE or datajoint?". Could we have a simple REST API and a corresponding Python syntax for the simplest/most common queries, and defer to datajoint for more advanced search capabilities?

kdharris101 commented on September 27, 2024

rossant commented on September 27, 2024

Closing the issue here and continuing the discussion on the ONE v2 proposal.
