panosc-eu / search-api

PaN search api for WP3 and WP4

License: BSD 2-Clause "Simplified" License

JavaScript 98.30% Dockerfile 1.70%

fair-data loopback loopback4 restful-api

search-api's Issues

Person role

In the Panosc Data Policy, we can see that we have:
PI (Principal Investigator)
experimental team

OpenAIRE uses the DataCite metadata standard for harvesting metadata. In DataCite, we have
Creator: The main researchers involved working on the data, or the authors of the publication in priority order.
This means the PI should be listed first, followed by the experimental team.

Contributor: The institution or person responsible for collecting, creating, or otherwise contributing to the development of the dataset. Here there is a controlled vocabulary list:
https://guidelines.openaire.eu/en/latest/data/field_contributor.html

More details can be found here: https://schema.datacite.org/meta/kernel-4.3/doc/DataCite-MetadataKernel_v4.3.pdf
Annex 1 p.32

Within ExPaNDS, the data policy and the data management processes are part of WP2. We are having regular meetings with WP2, so I can ask for their view on using the DataCite metadata standard as a reference.

Supporting JSON-format queries

Loopback supports queries using a JSON format and a square bracket format. Both versions are supported by this implementation. For example, I can query like this:

curl -g -X GET "http://localhost:3000/datasets?filter[where][pressure.value][gt]=50&filter[limit]=10&filter[skip]=0" -H "accept: application/json"

And like this:

curl -g -X GET 'http://localhost:3000/datasets?filter={"where":{"pressure.value":{"gt":50}},"limit":10,"skip":0}' -H "accept: application/json"

The Datagateway API uses the JSON version. Will both versions be supported by the PaNOSC search API specification?
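
To make the relation between the two notations concrete, the following sketch derives both query strings from a single filter object. The helper `toBracketParams` is hypothetical and is not part of Loopback or the search API; it only illustrates how the nesting of the JSON form maps onto square brackets.

```javascript
// Hypothetical helper: flatten a nested filter object into
// square-bracket query parameters.
function toBracketParams(value, prefix) {
  if (value === null || typeof value !== "object") {
    return [`${prefix}=${value}`];
  }
  return Object.entries(value).flatMap(([key, child]) =>
    toBracketParams(child, `${prefix}[${key}]`)
  );
}

const filter = { where: { "pressure.value": { gt: 50 } }, limit: 10, skip: 0 };

// Square-bracket notation: one parameter per leaf value.
const bracketQuery = toBracketParams(filter, "filter").join("&");

// JSON notation: the whole filter serialised into one parameter.
const jsonQuery = `filter=${JSON.stringify(filter)}`;

console.log(bracketQuery);
console.log(jsonQuery);
```

In a real request the JSON form would normally also be URL-encoded, which the sketch leaves out for readability.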

(Including @louise-davies and @agbeltran in the discussion)

Return entries although their related objects are missing

Is your feature request related to a problem? Please describe.
I want to get e.g. documents with related models; however, if the related entry is not present, the whole "row" goes missing. If I call like so:

{
    "include": [{
        "relation": "members"
    }]
}

then all the results that do not have a corresponding member are not returned at all.

Describe the solution you'd like
Return all results, even without the related models.
Instead of the missing object, return something else: null, string (n/A), ...(?).

Describe alternatives you've considered
Throw an error.

Additional context
Not something crazy important but it feels awkward.
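
The requested behaviour corresponds to a left-join style include. The sketch below shows it on in-memory arrays; the data, the field names and the function `includeMembers` are hypothetical and do not reflect how the search API resolves relations internally.

```javascript
// Hypothetical in-memory data; field names are illustrative only.
const documents = [
  { pid: "doc-1", title: "With members" },
  { pid: "doc-2", title: "Without members" },
];
const members = [{ id: 1, documentId: "doc-1", role: "PI" }];

// Left-join style include: every document is kept, and documents
// without related members get `null` instead of being dropped.
function includeMembers(docs, allMembers) {
  return docs.map((doc) => {
    const related = allMembers.filter((m) => m.documentId === doc.pid);
    return { ...doc, members: related.length > 0 ? related : null };
  });
}

const result = includeMembers(documents, members);
console.log(JSON.stringify(result, null, 2));
```

Returning an empty array instead of `null` would be an equally valid choice and may be easier for clients to iterate over.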

The search example from the README yields internal server error

Describe the bug
When trying the search example query, the server replies with a 500 "Internal Server Error"

To Reproduce
Steps to reproduce the behavior, follow the instructions from the README:

git clone git@github.com:panosc-eu/search-api.git
cd search-api
npm install
npm start
Wait for the message Server is running at http://134.30.210.106:3000 to appear.
In a second terminal: curl -g -X GET "http://134.30.210.106:3000/datasets?filter[where][pressure.value][gt]=50&filter[limit]=10&filter[skip]=0" -H "accept: application/json"

Expected behavior
Expected to see some search result in the second terminal.

Screenshots
The output from the curl command in the second terminal is:
{"error":{"statusCode":500,"message":"Internal Server Error"}}

The first terminal running the npm start command shows a trace log:

Unhandled error in GET /datasets?filter[where][pressure.value][gt]=50&filter[limit]=10&filter[skip]=0: 500 Error: Unit not recognized
    at new QtyError (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/js-quantities/build/quantities.js:180:17)
    at parseUnits (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/js-quantities/build/quantities.js:794:13)
    at Qty.parse (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/js-quantities/build/quantities.js:748:24)
    at new Qty (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/js-quantities/build/quantities.js:872:13)
    at convertUnits (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/src/repositories/dataset.repository.ts:37:15)
    at processQuery (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/src/repositories/dataset.repository.ts:67:11)
    at DatasetRepository.modelClass.observe (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/src/repositories/dataset.repository.ts:26:31)
    at notifySingleObserver (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/loopback-datasource-juggler/lib/observer.js:162:24)
    at /net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/async/dist/async.js:3110:16
    at replenish (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/async/dist/async.js:1011:17)
    at /net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/async/dist/async.js:1016:9
    at eachLimit$1 (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/async/dist/async.js:3196:24)
    at Object.<anonymous> (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/async/dist/async.js:1046:16)
    at doNotify (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/loopback-datasource-juggler/lib/observer.js:159:11)
    at doNotify (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/loopback-datasource-juggler/lib/observer.js:157:49)
    at Function.ObserverMixin._notifyBaseObservers (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/loopback-datasource-juggler/lib/observer.js:180:5)

Desktop (please complete the following information):

OS: openSUSE 15.1
npm --version: 6.9.0
node --version: v10.16.3

Cannot use 'in' operator to search for 'limit'

Hi,

Describe the bug

I am trying to fix the problem when retrieving the list of members linked to a document.

In the UI the following request is triggered when expanding the member list:
https://federated.panosc.ess.eu/api/documents/59848051?filter={%22include%22:[{%22relation%22:%22members%22,%22scope%22:{%22include%22:[{%22relation%22:%22person%22},{%22relation%22:%22affiliation%22}]}}],%22limit%22:50}

returning the error below:

{"error":{"name":"TypeError","status":500,"message":"Cannot use 'in' operator to search for 'limit' in 59848051","stack":"TypeError: Cannot use 'in' operator to search for 'limit' in 59848051\n    at /home/node/app/server/connectors/distributedConnector.js:147:70\n    at Array.map (<anonymous>)\n    at Function.remoteMethodProxy [as findById] (/home/node/app/server/connectors/distributedConnector.js:146:10)\n    at SharedMethod.invoke (/home/node/app/node_modules/loopback/node_modules/strong-remoting/lib/shared-method.js:263:25)\n    at HttpContext.invoke (/home/node/app/node_modules/loopback/node_modules/strong-remoting/lib/http-context.js:389:12)\n    at phaseInvoke (/home/node/app/node_modules/loopback/node_modules/strong-remoting/lib/remote-objects.js:654:9)\n    at runHandler (/home/node/app/node_modules/loopback-phase/lib/phase.js:135:5)\n    at iterate (/home/node/app/node_modules/loopback-phase/node_modules/async/lib/async.js:146:13)\n    at Object.async.eachSeries (/home/node/app/node_modules/loopback-phase/node_modules/async/lib/async.js:162:9)\n    at runHandlers (/home/node/app/node_modules/loopback-phase/lib/phase.js:144:13)\n    at iterate (/home/node/app/node_modules/loopback-phase/node_modules/async/lib/async.js:146:13)\n    at /home/node/app/node_modules/loopback-phase/node_modules/async/lib/async.js:157:25\n    at /home/node/app/node_modules/loopback-phase/node_modules/async/lib/async.js:154:25\n    at execStack (/home/node/app/node_modules/loopback/node_modules/strong-remoting/lib/remote-objects.js:493:7)\n    at RemoteObjects.execHooks (/home/node/app/node_modules/loopback/node_modules/strong-remoting/lib/remote-objects.js:497:10)\n    at phaseBeforeInvoke (/home/node/app/node_modules/loopback/node_modules/strong-remoting/lib/remote-objects.js:650:10)"}}

There are two things that I have noticed:

I have no error trace on my side. It makes me think that the request gets intercepted before calling the ESRF panosc search api deployed via icatplus.esrf.fr.
The very same query seems to work on our local panosc deployment:
https://icatplus.esrf.fr/api/Documents/59848051?filter={%22include%22:[{%22relation%22:%22members%22,%22scope%22:{%22include%22:[{%22relation%22:%22person%22},{%22relation%22:%22affiliation%22}]}}],%22limit%22:50}

I was wondering what is wrong. I can see that for others (ESS, for instance) it works, so I presume that it is somehow related to my deployment, but I cannot understand how.
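
The message itself is standard JavaScript behaviour: the `in` operator throws a TypeError when its right-hand operand is a primitive instead of an object. The sketch below reproduces that situation with the id from the request and shows one possible guard. The function `hasLimit` is hypothetical; it is not the code in distributedConnector.js, and whether the id is really what reaches the operator there is only inferred from the error text.

```javascript
// Hypothetical guard: only use `in` when the argument is an object.
function hasLimit(arg) {
  return typeof arg === "object" && arg !== null && "limit" in arg;
}

const id = "59848051"; // a primitive, not a filter object

let caught = null;
try {
  // Throws: the right-hand side of `in` must be an object.
  const unused = "limit" in id;
  console.log(unused);
} catch (err) {
  caught = err;
}

console.log(caught instanceof TypeError);
console.log(hasLimit(id));
console.log(hasLimit({ limit: 50 }));
```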

To Reproduce
Steps to reproduce the behavior:

Go to here
Compare with this


ilike/nilike on ESS's federated endpoint

Is your feature request related to a problem? Please describe.
I want to use the ilike operator as default for text field search. Currently not working as expected.
Describe the solution you'd like
I'd love it to work like the like/nlike counterparts, but case-insensitively.
Describe alternatives you've considered
If a specific query language will be required to do that, it can be added as middleware to the frontend adapter. Please don't.
Additional context
Add any other context or screenshots about the feature request here.
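
A minimal sketch of the requested semantics, assuming SQL-style wildcards (`%` for any run of characters, `_` for a single character). Whether the federated endpoint interprets patterns this way is an assumption, and `likeToRegExp` is a hypothetical helper rather than a Loopback function.

```javascript
// Hypothetical helper: translate a SQL-style LIKE pattern into a RegExp.
function likeToRegExp(pattern, caseInsensitive) {
  const source = pattern
    .split("")
    .map((ch) => {
      if (ch === "%") return ".*";
      if (ch === "_") return ".";
      // Escape characters that are special in regular expressions.
      return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    })
    .join("");
  return new RegExp(`^${source}$`, caseInsensitive ? "i" : "");
}

const like = (value, pattern) => likeToRegExp(pattern, false).test(value);
const ilike = (value, pattern) => likeToRegExp(pattern, true).test(value);
const nilike = (value, pattern) => !ilike(value, pattern);

console.log(like("Neutron Scattering", "%neutron%"));
console.log(ilike("Neutron Scattering", "%neutron%"));
console.log(nilike("Neutron Scattering", "%x-ray%"));
```

The only difference between the two operators in this sketch is the `i` flag, which is the behaviour the request asks for.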

add documentation

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Questions on the API calls

Note: this issue is mostly a bunch of questions rather than a feature request or bug report.

I have some questions on the calls listed in the API explorer:

The search API seems to implement all HTTP methods for all objects. Is this supposed to be like that in the final API? (I'd suggest the search API should only provide read access to most of our data catalogue content.)
What is GET /datasets/query supposed to do? In particular compared to GET /datasets.
What is PATCH /datasets supposed to do?

Add units

Add unit handling to search-api

Count of items returned for a specific query

Is your feature request related to a problem? Please describe.
I am implementing pagination using skip & limit, but only the total resource count is available (to my knowledge at least).
Describe the solution you'd like
It would feel more graceful on the frontend side if the total number of items matching a specific query was also provided.
Describe alternatives you've considered
It's all right, I just check for an empty array to come and stop there...
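
The request can be sketched as a response envelope that carries the number of matching items next to the requested page. The envelope shape and the function `queryPage` are hypothetical; the search API does not currently return such a structure.

```javascript
// Hypothetical response envelope: the page plus the number of
// items matching the query (not the total resource count).
function queryPage(items, predicate, { skip = 0, limit = 10 } = {}) {
  const matches = items.filter(predicate);
  return {
    count: matches.length,
    items: matches.slice(skip, skip + limit),
  };
}

// 25 illustrative datasets, every second one public.
const datasets = Array.from({ length: 25 }, (_, i) => ({
  pid: `ds-${i}`,
  isPublic: i % 2 === 0,
}));

const page = queryPage(datasets, (d) => d.isPublic, { skip: 10, limit: 10 });
console.log(page.count, page.items.length);
```

With the count available, a frontend can compute the number of pages up front instead of requesting pages until an empty array comes back.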

Document the search API

In #19 and #20 I suggested documenting the API calls and the data model, respectively. It is pretty obvious that this needs to be discussed together with the query syntax for the search calls. So we would need a similar draft documentation for that query syntax as a starting point for that discussion.

Add field translation to rest api backend connection

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Need to define what is included in the results of the API calls

The current API call documentation is not yet specific enough about the return values. In particular, it is not defined which related objects are to be included with a returned object. For instance, the GET /datasets/{id} call returning a dataset might need to include its parameters and files, otherwise there would be no way to retrieve those parameters or files at all.

I can see two possible options to achieve this: either

define for each call in the API a fixed set of related objects to include with the returned object, or
add an includes query parameter to allow the client to define what it needs to be included. E.g. a GET /datasets/{id}?includes=parameters&includes=files call would include the parameters and the files, but not the techniques with the dataset.

The latter option would be more versatile and easier to adapt to future extensions but also require a little more effort in the implementation. The first option would require very careful planning to avoid omitting things that turn out to be essential later on.
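
As a sketch of the second option, a server could read the repeated includes parameter and check it against the relations it knows. The allowed set below is taken from the example in the text, and the function `parseIncludes` is hypothetical.

```javascript
// Relations a client may request; taken from the example in the text.
const ALLOWED = new Set(["parameters", "files", "techniques"]);

// Parse repeated `includes` parameters from a query string and
// reject names that are not known relations.
function parseIncludes(queryString) {
  const requested = new URLSearchParams(queryString).getAll("includes");
  const unknown = requested.filter((name) => !ALLOWED.has(name));
  if (unknown.length > 0) {
    throw new Error(`Unknown relation(s): ${unknown.join(", ")}`);
  }
  return requested;
}

const includes = parseIncludes("includes=parameters&includes=files");
console.log(includes);
```

Rejecting unknown names keeps typos from silently producing results without the expected related objects.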

Furthermore, there are different options how the inclusion of related objects can be done: either

directly include the full JSON objects as in this example dataset:

{
    "id": 1385,
    "title": "e243856",
    "creationDate": "2017-05-04T03:04:57",
    "isPublic": false,
    "techniques": [
        {
            "name": "XMCD"
        },
        {
            "name": "NEXAFS"
        }
    ]
}

or include only the ids, as in:
```
{
    "id": 1385,
    "title": "e243856",
    "creationDate": "2017-05-04T03:04:57",
    "isPublic": false,
    "samples": [27, 45, 88]
}
```
The second sample related to this dataset could then be retrieved in a subsequent GET /samples/45 call. This option would only work for Dataset, Document, Instrument, and Sample, because for other object types we do not have a call to retrieve them later on.

Including both person and affiliation in member scope throws 500

Describe the bug
I didn't succeed in including two related models in one query (full document listing). I can include each of them separately, but if I put both in one call, the server throws a 500.

To Reproduce
Steps to reproduce the behavior:
This query:

{
    "include": [{
        "relation": "datasets"
    }, {
        "relation": "members",
        "scope": {
            "include": [{
                "relation": "affiliation"
            }, {
                "relation": "person"
            }]
        }
    }]
}

%7B%22include%22%3A%5B%7B%22relation%22%3A%22datasets%22%7D%2C%7B%22relation%22%3A%22members%22%2C%22scope%22%3A%7B%22include%22%3A%5B%7B%22relation%22%3A%22affiliation%22%7D%2C%7B%22relation%22%3A%22person%22%7D%5D%7D%7D%5D%7D

Expected behavior
Return all documents with members including both person and affiliation...

Additional context
Sorry if the problem lies in the query...
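
The percent-encoded string above is the compact JSON form of the filter passed through URL encoding. The sketch below reproduces it with standard JavaScript functions only and decodes it again, which can help rule out an encoding mistake when debugging such a query.

```javascript
const filter = {
  include: [
    { relation: "datasets" },
    {
      relation: "members",
      scope: { include: [{ relation: "affiliation" }, { relation: "person" }] },
    },
  ],
};

// Encode the filter for use as the value of the `filter` query parameter.
const encoded = encodeURIComponent(JSON.stringify(filter));

// Decoding restores the original structure.
const decoded = JSON.parse(decodeURIComponent(encoded));

console.log(encoded);
console.log(decoded.include[1].scope.include.map((i) => i.relation));
```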

Unit support in queries

The documentation of the query syntax currently states that units should be “a standard unit as currently defined in js-quantities”. But not all implementations of the API will use Loopback. The API specification should be technology agnostic. We need to check whether the notation imposed by js-quantities is compatible with the corresponding packages available for other technologies like Python and/or Java.

Furthermore we need to define what to do with parameters that are not expressed in these “standard units”. We also have such nice things as a.u. (Atomic units) in our dataset parameters here.
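
One technology-agnostic option would be a plain conversion table that is part of the specification. The sketch below is only an illustration under that assumption: the table, the function names and the fallback rule for unknown units such as a.u. are hypothetical, and it does not use js-quantities.

```javascript
// Hypothetical conversion table: factor to a base unit per dimension.
// Only exact, well-known factors are listed; this is not js-quantities.
const UNITS = new Map([
  ["Pa", { base: "Pa", factor: 1 }],
  ["kPa", { base: "Pa", factor: 1e3 }],
  ["bar", { base: "Pa", factor: 1e5 }],
  ["m", { base: "m", factor: 1 }],
  ["mm", { base: "m", factor: 1e-3 }],
]);

// Returns the value in base units, or null when the unit is unknown
// (for example "a.u."), so the caller can fall back to exact matching.
function toBase(value, unit) {
  const entry = UNITS.get(unit);
  return entry ? { value: value * entry.factor, unit: entry.base } : null;
}

// Compare a stored parameter with a query value given in another unit.
function greaterThan(param, queryValue, queryUnit) {
  const a = toBase(param.value, param.unit);
  const b = toBase(queryValue, queryUnit);
  if (a === null || b === null) {
    // Unknown unit: only comparable when the unit strings are identical.
    return param.unit === queryUnit && param.value > queryValue;
  }
  return a.unit === b.unit && a.value > b.value;
}

console.log(greaterThan({ value: 2, unit: "bar" }, 50, "kPa"));
console.log(greaterThan({ value: 3, unit: "a.u." }, 1, "a.u."));
console.log(greaterThan({ value: 3, unit: "a.u." }, 1, "Pa"));
```

In this sketch an unrecognised unit never raises an error; the comparison simply degrades to an exact match on the unit string.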

Multiple scopes result in 500 (search by parameter & include related data at the same time)

Describe the bug

{
   "include":[
      {
         "relation":"parameters",
         "scope":{
            "where":{
               "and":[
                  {
                     "name":"wavelength"
                  },
                  {
                     "value":{
                        "between":[
                           379,
                           1744
                        ]
                     }
                  },
                  {
                     "unit":"nm"
                  }
               ]
            }
         }
      },
      {
         "relation":"datasets"
      },
      {
         "relation":"members",
         "scope":{
            "include":[
               {
                  "relation":"affiliation"
               },
               {
                  "relation":"person"
               }
            ]
         }
      }
   ]
}

%7B%22include%22%3A%5B%7B%22relation%22%3A%22parameters%22%2C%22scope%22%3A%7B%22where%22%3A%7B%22and%22%3A%5B%7B%22name%22%3A%22wavelength%22%7D%2C%7B%22value%22%3A%7B%22between%22%3A%5B379%2C1744%5D%7D%7D%2C%7B%22unit%22%3A%22nm%22%7D%5D%7D%7D%7D%2C%7B%22relation%22%3A%22datasets%22%7D%2C%7B%22relation%22%3A%22members%22%2C%22scope%22%3A%7B%22include%22%3A%5B%7B%22relation%22%3A%22affiliation%22%7D%2C%7B%22relation%22%3A%22person%22%7D%5D%7D%7D%5D%7D

To Reproduce
using this db.json

Expected behavior
Return array with one document - 10.5072/panosc-document1 including the parameter, members with affiliation and person and its datasets.

Alternative

Provide a document endpoint with related models included by default.
Make multiple api calls from frontend in order to display data :-/

Additional context
Works without additional scope.

{
   "include":[
      {
         "relation":"parameters",
         "scope":{
            "where":{
               "and":[
                  {
                     "name":"wavelength"
                  },
                  {
                     "value":{
                        "between":[
                           696,
                           2000
                        ]
                     }
                  },
                  {
                     "unit":"nm"
                  }
               ]
            }
         }
      },
      {
         "relation":"datasets"
      },
      {
         "relation":"members"
      }
   ]
}

(And again: sorry if this is just me having bugs in the db file or query but it looks ok from my pov)

Divide list of experimental technique by facility type

Not sure if this is the right place.
PaNOSC WP3 has a Confluence page with a list of experimental techniques.
It would be very useful if we could have this list divided by facility type:
Synchrotron
Neutron/Muon sources
FEL
Laser (ELI)

Non-functional technique filter

Describe the bug
Selecting technique "quasi-elastic scattering" returns no results
Searching for "QENS" using the text field returns 50+ results.

Same pattern is observed for other techniques

Hence, the technique field is not working as intended.

To Reproduce
See above

Expected behavior
See all data sets generated by the technique selected in the technique field

Additional context
In its current form, the result of using the technique field is misleading and may lead users to believe that there are no data of interest to them.

If this is due to lack of proper meta-data in the data sets and the associated facilities are not willing to fix that, I suggest that users receive a warning or that the functionality is removed.

Will authentication be implemented? How?

Will the Search API only expose open data or will some data require authentication? How will this be implemented? Will there be a token carried in the HTTP Headers?

(I suppose I have not understood what Work Package 4's requirements for this API are.)

(Including @louise-davies and @agbeltran in the discussion)

Can a Document reference another Document?

In the DataModel, we have a Document entity - at the top level - which references datasets. I'm assuming that this entity is a catch-all for the different, top-level entities needed to map between facility data models and the PaNOSC data model.

The entity has a type field so we can customise its meaning. But will this entity also be able to reference other Document entities? I'm thinking of the case where we model different hierarchies of entities. For example, we have 2 facilities using ICAT with different hierarchies:

ISIS: Study -> Facility Cycle -> Investigation -> Dataset -> Datafile
Diamond: Investigation -> Visit -> Dataset -> Datafile
where '->' is a one-to-many relationship.

(Including @louise-davies and @agbeltran in the discussion)

AND operator seems broken for parameters

Describe the bug
Searching for multiple items using the 'and' operator appears to be broken for parameters.

To Reproduce
A test dataset with parameter A and parameter B is not returned when parameter A and parameter B are queried using the AND operator. The dataset is returned when querying either parameter A or parameter B, as well as when using the OR operator.

Expected behavior
Test dataset from above should be returned when using the AND operator.

~~EDIT: bug is not isolated to parameters, doesn't appear to be working at all.~~
EDIT: indeed works on things other than parameter. Tried searching for type AND title (using ilike) and got the wrong impression, it was just the ilike operator not working as I expected. #56
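
The issue does not state a cause. One plausible reading, which is an assumption and not a diagnosis of the implementation, is that both conditions are evaluated against each parameter row individually, where no single row can be parameter A and parameter B at once. The sketch below contrasts that with the expected dataset-wise semantics, using hypothetical data.

```javascript
// Hypothetical test data: one dataset with two parameters.
const dataset = {
  pid: "ds-1",
  parameters: [
    { name: "A", value: 1 },
    { name: "B", value: 2 },
  ],
};

const isA = (p) => p.name === "A";
const isB = (p) => p.name === "B";

// Row-wise AND: a single parameter must satisfy both conditions.
const rowWise = dataset.parameters.some((p) => isA(p) && isB(p));

// Dataset-wise AND: each condition must be met by some parameter.
const datasetWise = [isA, isB].every((cond) =>
  dataset.parameters.some((p) => cond(p))
);

console.log(rowWise);
console.log(datasetWise);
```

Only the dataset-wise form returns the test dataset, which matches the expected behaviour described above.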

Need a query syntax capable of searching objects based on properties of related objects

In response to #21, a document describing a query syntax based on the Loopback Where filter has been added. However, it seems that this will not be enough to suit our needs. Most queries in practice will need to select the searched objects not on properties of the object itself, but based on properties of related objects.

A few examples:

Use case: “I would like to search experiments of a certain technique”. This needs something like: SELECT ds FROM Dataset ds JOIN ds.techniques t WHERE t.name = '…'
Use case: “I would like to find my own data (experimental data from my proposals)”: SELECT ds FROM Dataset ds JOIN ds.document doc JOIN doc.members m JOIN m.person p WHERE doc.type = 'proposal' AND m.role = 'principle investigator' AND p.pid = '…'
Use case: “I would like to find proposals having a wavelength document parameter in a certain range”: SELECT doc FROM Document doc JOIN doc.parameters p WHERE p.name = 'wavelength' AND p.units = 'nm' AND p.value > '…' AND p.value < '…'
As opposed to use case: “I would like to find proposals having performed a measurement using a wavelength in a certain range”: SELECT doc FROM Document doc JOIN doc.datasets ds JOIN ds.parameters p WHERE p.name = 'wavelength' AND p.units = 'nm' AND p.value > '…' AND p.value < '…'
Use case: “I would like to find proposals having investigated a particular sample using a certain technique”: SELECT doc FROM Document doc JOIN doc.datasets ds JOIN ds.samples s JOIN ds.techniques t WHERE s.pid = '…' AND t.name = '…'

Note: I added pseudo SQL queries to the examples in order to illustrate the structure of the required search query based on our data model. I do not intend to suggest that the query syntax should be SQL like.

It is at least not clear how to spell queries like that using the query syntax as described in the documentation. In the discussion last week, somebody suggested using GraphQL, which might be an option worth considering.
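
To illustrate what selecting on properties of related objects means, independent of any query syntax, the first use case can be evaluated on in-memory objects as follows. The data and the function `datasetsByTechnique` are hypothetical.

```javascript
// Hypothetical datasets with embedded related objects.
const datasets = [
  { pid: "ds-1", techniques: [{ name: "XMCD" }, { name: "NEXAFS" }] },
  { pid: "ds-2", techniques: [{ name: "QENS" }] },
  { pid: "ds-3", techniques: [] },
];

// SELECT ds FROM Dataset ds JOIN ds.techniques t WHERE t.name = :name
function datasetsByTechnique(all, name) {
  return all.filter((ds) => ds.techniques.some((t) => t.name === name));
}

const hits = datasetsByTechnique(datasets, "NEXAFS").map((ds) => ds.pid);
console.log(hits);
```

The filter applies to the related technique objects, but the returned objects are datasets, which is the part a where filter on the dataset's own properties cannot express.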

Digging the openapi.json as served by the API in /components/schemas suggests that the relevant objects with their properties are:

      "Dataset": {
          "pid",
          "title",
          "isPublic",
          "size",
          "creationDate"
        },

all properties required,

      "Document": {
          "pid",
          "type",
          "title",
          "internalID",
          "summary",
          "doi",
          "startDate",
          "endDate",
          "releaseDate",
          "license"
        },

with pid, type, and title being required,

      "Instrument": {
          "id",
          "name",
          "facility"
        },

all properties required, and

      "Sample": {
          "name",
          "pid",
          "description"
        },

with only name being required. Is that correct?

However, I still have some questions:

What are the relations between the objects? How and where are these relations encoded?
What is the id in paths like /datasets/{id}? I assume it's some internal id of the local data catalogue. But how may the client retrieve it?
All objects have additionalProperties set to true in openapi.json. Does that mean that the objects will have arbitrary additional properties? Does something in the API implicitly assume the presence of any particular additional property?
From the examples, it seems that parameters of the measurement are supposed to be added as additional properties (e.g. temperature and pressure). Is this correct?
