panosc-eu / search-api
PaN search api for WP3 and WP4
License: BSD 2-Clause "Simplified" License
In the Panosc Data Policy, we can see that we have:
PI (Principal Investigator)
experimental team
OpenAIRE uses the DataCite metadata standard for harvesting metadata. In DataCite, we have:
Creator: The main researchers involved working on the data, or the authors of the publication in priority order.
This means the PI should be listed first, followed by the experimental team.
Contributor: The institution or person responsible for collecting, creating, or otherwise contributing to the development of the dataset. Here there is a controlled vocabulary list:
https://guidelines.openaire.eu/en/latest/data/field_contributor.html
More details can be found here: https://schema.datacite.org/meta/kernel-4.3/doc/DataCite-MetadataKernel_v4.3.pdf (Annex 1, p. 32)
Within ExPaNDS, the data policy and the data management processes are part of WP2. We have regular meetings with WP2, so I can ask for their view on using the DataCite metadata standard as a reference.
Loopback supports queries using a JSON format and a square bracket format. Both versions are supported by this implementation. For example, I can query like this:
curl -g -X GET "http://localhost:3000/datasets?filter[where][pressure.value][gt]=50&filter[limit]=10&filter[skip]=0" -H "accept: application/json"
And like this:
curl -g -X GET 'http://localhost:3000/datasets?filter={"where":{"pressure.value":{"gt":50}},"limit":10,"skip":0}' -H "accept: application/json"
The Datagateway API uses the JSON version. Will both versions be supported by the PaNOSC search API specification?
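To make the comparison concrete, here is a minimal TypeScript sketch of how one filter object maps onto both notations. The helper names toBracketQuery and toJsonQuery are hypothetical; they are not part of Loopback or of this API, and the sketch only covers the shapes used in the curl examples above.

```typescript
// Hypothetical helpers: they are not part of Loopback or the search API.
type FilterValue =
  | string
  | number
  | boolean
  | FilterValue[]
  | { [key: string]: FilterValue };

// Flatten a filter object into square-bracket query parameters.
function toBracketQuery(value: FilterValue, prefix: string = "filter"): string[] {
  if (typeof value !== "object") {
    return [`${prefix}=${encodeURIComponent(String(value))}`];
  }
  const parts: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, index) => {
      parts.push(...toBracketQuery(item, `${prefix}[${index}]`));
    });
  } else {
    for (const key of Object.keys(value)) {
      parts.push(...toBracketQuery(value[key], `${prefix}[${key}]`));
    }
  }
  return parts;
}

// Serialize the same filter as a single JSON-valued query parameter.
function toJsonQuery(filter: FilterValue): string {
  return `filter=${encodeURIComponent(JSON.stringify(filter))}`;
}

const exampleFilter: FilterValue = {
  where: { "pressure.value": { gt: 50 } },
  limit: 10,
  skip: 0,
};

const bracketQuery = toBracketQuery(exampleFilter).join("&");
const jsonQuery = toJsonQuery(exampleFilter);
console.log(bracketQuery);
console.log(decodeURIComponent(jsonQuery));
```

Since both strings are derived from the same object, a specification could define the filter structure once and treat the two notations as alternative encodings of it.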
(Including @louise-davies and @agbeltran in the discussion)
Is your feature request related to a problem? Please describe.
I want to get, e.g., documents with related models. However, if the related entry is not present, the whole "row" goes missing. If I call like so:
{
"include": [{
"relation": "members"
}]
}
then all the results that do not have a corresponding member don't get returned at all.
Describe the solution you'd like
Return all results, even without the related models.
Instead of the missing object, return something else: null, string (n/A), ...(?).
Describe alternatives you've considered
Throw an error.
Additional context
Not something crazy important but it feels awkward.
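The requested behaviour corresponds to a left join. A minimal in-memory sketch in TypeScript, assuming hypothetical record shapes (DocumentRecord, Member) that are not the API's actual schema, and choosing an empty array rather than null for documents without members:

```typescript
// Hypothetical in-memory model; names are illustrative, not the API's schema.
interface Member {
  documentId: string;
  role: string;
}

interface DocumentRecord {
  pid: string;
  title: string;
}

interface DocumentWithMembers extends DocumentRecord {
  members: Member[];
}

// Attach members to each document, keeping documents that have none
// (left-join semantics): those simply get an empty members array.
function includeMembers(
  documents: DocumentRecord[],
  members: Member[],
): DocumentWithMembers[] {
  return documents.map((doc) => ({
    ...doc,
    members: members.filter((m) => m.documentId === doc.pid),
  }));
}

const sampleDocuments: DocumentRecord[] = [
  { pid: "doc-1", title: "With members" },
  { pid: "doc-2", title: "Without members" },
];
const sampleMembers: Member[] = [
  { documentId: "doc-1", role: "principal investigator" },
];

const documentsWithMembers = includeMembers(sampleDocuments, sampleMembers);
console.log(JSON.stringify(documentsWithMembers, null, 2));
```

An empty array keeps the type of the members field stable for the frontend; null or a placeholder string would work too, but would force an extra check in every consumer.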
Describe the bug
When trying the search example query, the server replies with a 500 "Internal Server Error"
To Reproduce
Steps to reproduce the behavior, follow the instructions from the README:
git clone git@github.com:panosc-eu/search-api.git
cd search-api
npm install
npm start
Wait for the message "Server is running at http://134.30.210.106:3000" to appear, then run in a second terminal:
curl -g -X GET "http://134.30.210.106:3000/datasets?filter[where][pressure.value][gt]=50&filter[limit]=10&filter[skip]=0" -H "accept: application/json"
Expected behavior
Expected to see some search result in the second terminal.
Screenshots
The output from the curl command in the second terminal is:
{"error":{"statusCode":500,"message":"Internal Server Error"}}
The first terminal running the npm start command shows a trace log:
Unhandled error in GET /datasets?filter[where][pressure.value][gt]=50&filter[limit]=10&filter[skip]=0: 500 Error: Unit not recognized
at new QtyError (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/js-quantities/build/quantities.js:180:17)
at parseUnits (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/js-quantities/build/quantities.js:794:13)
at Qty.parse (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/js-quantities/build/quantities.js:748:24)
at new Qty (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/js-quantities/build/quantities.js:872:13)
at convertUnits (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/src/repositories/dataset.repository.ts:37:15)
at processQuery (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/src/repositories/dataset.repository.ts:67:11)
at DatasetRepository.modelClass.observe (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/src/repositories/dataset.repository.ts:26:31)
at notifySingleObserver (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/loopback-datasource-juggler/lib/observer.js:162:24)
at /net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/async/dist/async.js:3110:16
at replenish (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/async/dist/async.js:1011:17)
at /net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/async/dist/async.js:1016:9
at eachLimit$1 (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/async/dist/async.js:3196:24)
at Object.<anonymous> (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/async/dist/async.js:1046:16)
at doNotify (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/loopback-datasource-juggler/lib/observer.js:159:11)
at doNotify (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/loopback-datasource-juggler/lib/observer.js:157:49)
at Function.ObserverMixin._notifyBaseObservers (/net/home/jsi/ExPaNDS/PaNOSC-WP3/search-api/node_modules/loopback-datasource-juggler/lib/observer.js:180:5)
Desktop (please complete the following information):
npm --version: 6.9.0
node --version: v10.16.3
Hi,
Describe the bug
I am trying to fix the problem when retrieving the list of members linked to a document.
In the UI the following request is triggered when expanding the member list:
https://federated.panosc.ess.eu/api/documents/59848051?filter={%22include%22:[{%22relation%22:%22members%22,%22scope%22:{%22include%22:[{%22relation%22:%22person%22},{%22relation%22:%22affiliation%22}]}}],%22limit%22:50}
returning the error below:
{"error":{"name":"TypeError","status":500,"message":"Cannot use 'in' operator to search for 'limit' in 59848051","stack":"TypeError: Cannot use 'in' operator to search for 'limit' in 59848051\n at /home/node/app/server/connectors/distributedConnector.js:147:70\n at Array.map (<anonymous>)\n at Function.remoteMethodProxy [as findById] (/home/node/app/server/connectors/distributedConnector.js:146:10)\n at SharedMethod.invoke (/home/node/app/node_modules/loopback/node_modules/strong-remoting/lib/shared-method.js:263:25)\n at HttpContext.invoke (/home/node/app/node_modules/loopback/node_modules/strong-remoting/lib/http-context.js:389:12)\n at phaseInvoke (/home/node/app/node_modules/loopback/node_modules/strong-remoting/lib/remote-objects.js:654:9)\n at runHandler (/home/node/app/node_modules/loopback-phase/lib/phase.js:135:5)\n at iterate (/home/node/app/node_modules/loopback-phase/node_modules/async/lib/async.js:146:13)\n at Object.async.eachSeries (/home/node/app/node_modules/loopback-phase/node_modules/async/lib/async.js:162:9)\n at runHandlers (/home/node/app/node_modules/loopback-phase/lib/phase.js:144:13)\n at iterate (/home/node/app/node_modules/loopback-phase/node_modules/async/lib/async.js:146:13)\n at /home/node/app/node_modules/loopback-phase/node_modules/async/lib/async.js:157:25\n at /home/node/app/node_modules/loopback-phase/node_modules/async/lib/async.js:154:25\n at execStack (/home/node/app/node_modules/loopback/node_modules/strong-remoting/lib/remote-objects.js:493:7)\n at RemoteObjects.execHooks (/home/node/app/node_modules/loopback/node_modules/strong-remoting/lib/remote-objects.js:497:10)\n at phaseBeforeInvoke (/home/node/app/node_modules/loopback/node_modules/strong-remoting/lib/remote-objects.js:650:10)"}}
There are two things that I have noticed:
I was wondering what is wrong. I can see that it works for others (ESS, for instance), so I presume that it is somehow related to my deployment, but I cannot understand how.
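For what it is worth, the message in the trace is the TypeError that JavaScript raises whenever the right-hand side of the in operator is a primitive, here the document id 59848051. The code of distributedConnector.js is not shown in this report, so the following TypeScript is only a sketch of a defensive check, with a hypothetical helper name hasLimit:

```typescript
// Hypothetical guard; the real distributedConnector.js code is not shown in this report.
function hasLimit(arg: unknown): boolean {
  // The `in` operator throws a TypeError when its right-hand side is a
  // primitive such as the numeric id 59848051, so check for an object first.
  return typeof arg === "object" && arg !== null && "limit" in arg;
}

// Arguments roughly as findById might receive them: an id followed by a filter.
const findByIdArgs: unknown[] = [
  59848051,
  { include: [{ relation: "members" }], limit: 50 },
];

const limitFlags = findByIdArgs.map((arg) => hasLimit(arg));
console.log(limitFlags);
```

If the connector maps over all arguments of findById, the id would be the first element tested, which would be consistent with the error text; whether that is what happens in this deployment is an assumption.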
Is your feature request related to a problem? Please describe.
I want to use the ilike operator as the default for text field search. It is currently not working as expected.
Describe the solution you'd like
I'd love it to work like its like/nlike counterparts, but case-insensitively.
Describe alternatives you've considered
If a specific query language will be required to do that, it can be added as middleware to the frontend adapter. Please don't.
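As an illustration of the expected semantics, here is a minimal TypeScript sketch of a case-insensitive match. It assumes SQL-style wildcards ("%" for any run of characters, "_" for a single character); how Loopback actually interprets like/ilike patterns depends on the connector, so this is an assumption, and the helper names are hypothetical.

```typescript
// Hypothetical helper: translates an SQL-style LIKE pattern into a
// case-insensitive RegExp. "%" matches any run of characters and "_" matches
// exactly one character (assumed semantics, not taken from the API docs).
function ilikeToRegExp(pattern: string): RegExp {
  const source = pattern
    .split("")
    .map((ch) => {
      if (ch === "%") return ".*";
      if (ch === "_") return ".";
      // Escape characters that have a special meaning in regular expressions.
      return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    })
    .join("");
  return new RegExp(`^${source}$`, "i");
}

// Case-insensitive counterpart of like.
function ilike(value: string, pattern: string): boolean {
  return ilikeToRegExp(pattern).test(value);
}

// Case-insensitive counterpart of nlike.
function nilike(value: string, pattern: string): boolean {
  return !ilike(value, pattern);
}

console.log(ilike("Quasi-Elastic Scattering", "%elastic%"));
console.log(nilike("QENS", "%neutron%"));
```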
Note: this issue is mostly a bunch of questions rather than a feature request or bug report.
I have some questions on the calls listed in the API explorer:
What is GET /datasets/query supposed to do? In particular compared to GET /datasets.
What is PATCH /datasets supposed to do?
Add unit handling to search-api
Is your feature request related to a problem? Please describe.
I am implementing pagination using skip & limit. However, only the total resource count is available (to my knowledge at least).
Describe the solution you'd like
It would feel more graceful on the frontend side if the total number of items matched by a specific query were also provided.
Describe alternatives you've considered
It's all right, I just check for an empty array to come and stop there...
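A minimal sketch of what such a response could look like, written in TypeScript against an in-memory array. The Page envelope and the paginate helper are hypothetical and only illustrate the idea of returning the match count next to the requested slice; they do not describe the current API.

```typescript
// Hypothetical response envelope: the page of items plus the number of
// items matched by the query (not the total number of stored resources).
interface Page<T> {
  total: number;
  items: T[];
}

function paginate<T>(
  all: T[],
  predicate: (item: T) => boolean,
  skip: number,
  limit: number,
): Page<T> {
  const matches = all.filter(predicate);
  return { total: matches.length, items: matches.slice(skip, skip + limit) };
}

const pressureValues = [10, 60, 70, 80, 90, 40];
const firstPage = paginate(pressureValues, (p) => p > 50, 0, 2);
const secondPage = paginate(pressureValues, (p) => p > 50, 2, 2);
console.log(firstPage, secondPage);
```

With the total at hand, the frontend can compute the number of pages up front instead of requesting pages until an empty array comes back.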
In #19 and #20 I suggested documenting the API calls and the data model, respectively. It is pretty obvious that this needs to be discussed together with the query syntax for the search calls. So we would need a similar draft documentation for that query syntax as a starting point for that discussion.
The current API call documentation is not yet specific enough about the return values. In particular, it is not defined which related objects are to be included with a returned object. For instance, the GET /datasets/{id} call returning a dataset might need to include its parameters and files, otherwise there would be no way to retrieve those parameters or files at all.
I can see two possible options to achieve this: either
define for each call in the API a fixed set of related objects to include with the returned object, or
add an includes query parameter to allow the client to define what it needs to be included. E.g. a GET /datasets/{id}?includes=parameters&includes=files call would include the parameters and the files, but not the techniques with the dataset.
The latter option would be more versatile and easier to adapt to future extensions but also require a little more effort in the implementation. The first option would require very careful planning to avoid omitting things that turn out to be essential later on.
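The second option can be sketched in a few lines of TypeScript. The StoredDataset shape and the applyIncludes helper are hypothetical; the includes list stands for the values of the repeated includes query parameter, and unknown relation names are silently ignored here, which is only one of several possible choices.

```typescript
// Hypothetical stored shape: core properties plus a bag of related objects.
interface StoredDataset {
  id: number;
  title: string;
  relations: { [relation: string]: object[] };
}

// Build the response for GET /datasets/{id}?includes=...: always return the
// core properties and add only the relations the client asked for.
function applyIncludes(
  dataset: StoredDataset,
  includes: string[],
): { [key: string]: unknown } {
  const result: { [key: string]: unknown } = {
    id: dataset.id,
    title: dataset.title,
  };
  for (const relation of includes) {
    if (Object.prototype.hasOwnProperty.call(dataset.relations, relation)) {
      result[relation] = dataset.relations[relation];
    }
  }
  return result;
}

const storedDataset: StoredDataset = {
  id: 1385,
  title: "e243856",
  relations: {
    parameters: [{ name: "wavelength", value: 700, unit: "nm" }],
    files: [{ name: "scan-0001.nxs" }],
    techniques: [{ name: "XMCD" }, { name: "NEXAFS" }],
  },
};

const datasetResponse = applyIncludes(storedDataset, ["parameters", "files"]);
console.log(Object.keys(datasetResponse));
```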
Furthermore, there are different options for how related objects can be included: either
directly include the full JSON objects as in this example dataset:
{
"id": 1385,
"title": "e243856",
"creationDate": "2017-05-04T03:04:57",
"isPublic": false,
"techniques": [
{
"name": "XMCD"
},
{
"name": "NEXAFS"
}
]
}
or include only the ids, as in:
{
"id": 1385,
"title": "e243856",
"creationDate": "2017-05-04T03:04:57",
"isPublic": false,
"samples": [27, 45, 88]
}
The second sample related to this dataset could then be retrieved in a subsequent GET /samples/45 call. This option would only work for Dataset, Document, Instrument, and Sample, because for other object types, we do not have a call to retrieve them later on.
Describe the bug
I didn't succeed in including two related models in one query (full document listing). I can include each of them separately, but if I put both in one call the server throws a 500.
To Reproduce
Steps to reproduce the behavior:
This query:
{
"include": [{
"relation": "datasets"
}, {
"relation": "members",
"scope": {
"include": [{
"relation": "affiliation"
}, {
"relation": "person"
}]
}
}]
}
%7B%22include%22%3A%5B%7B%22relation%22%3A%22datasets%22%7D%2C%7B%22relation%22%3A%22members%22%2C%22scope%22%3A%7B%22include%22%3A%5B%7B%22relation%22%3A%22affiliation%22%7D%2C%7B%22relation%22%3A%22person%22%7D%5D%7D%7D%5D%7D
Expected behavior
Return all documents with members including both person and affiliation...
Additional context
Sorry if the problem lies in the query...
The documentation of the query syntax currently states that units should be “a standard unit as currently defined in js-quantities”. But not all implementations of the API will use Loopback. The API specification should be technology agnostic. We need to check whether the notation imposed by js-quantities is compatible with corresponding packages available for other technologies like Python and/or Java.
Furthermore we need to define what to do with parameters that are not expressed in these “standard units”. We also have such nice things as a.u. (Atomic units) in our dataset parameters here.
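One technology-agnostic way to pin this down would be for the specification itself to list the accepted unit spellings together with their conversion factors, so that no implementation depends on a particular library. The TypeScript below is a sketch under that assumption: the table, the spellings ("um", "angstrom") and the helper names are illustrative only, and units outside the table (such as a.u.) are rejected rather than guessed.

```typescript
// Illustrative table only: the real list of supported units and their
// spellings would have to be agreed on in the specification.
const LENGTH_UNITS_IN_METRES: { [unit: string]: number } = {
  m: 1,
  mm: 1e-3,
  um: 1e-6,
  nm: 1e-9,
  angstrom: 1e-10,
};

// Look up the conversion factor, rejecting anything not listed explicitly.
function factorFor(unit: string): number {
  if (!Object.prototype.hasOwnProperty.call(LENGTH_UNITS_IN_METRES, unit)) {
    throw new Error(`Unit not recognized: ${unit}`);
  }
  return LENGTH_UNITS_IN_METRES[unit];
}

// Convert a length between two units from the table.
function convertLength(value: number, from: string, to: string): number {
  return (value * factorFor(from)) / factorFor(to);
}

console.log(convertLength(1744, "nm", "um"));

try {
  convertLength(1, "a.u.", "nm");
} catch (error) {
  console.log((error as Error).message);
}
```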
Describe the bug
{
"include":[
{
"relation":"parameters",
"scope":{
"where":{
"and":[
{
"name":"wavelength"
},
{
"value":{
"between":[
379,
1744
]
}
},
{
"unit":"nm"
}
]
}
}
},
{
"relation":"datasets"
},
{
"relation":"members",
"scope":{
"include":[
{
"relation":"affiliation"
},
{
"relation":"person"
}
]
}
}
]
}
%7B%22include%22%3A%5B%7B%22relation%22%3A%22parameters%22%2C%22scope%22%3A%7B%22where%22%3A%7B%22and%22%3A%5B%7B%22name%22%3A%22wavelength%22%7D%2C%7B%22value%22%3A%7B%22between%22%3A%5B379%2C1744%5D%7D%7D%2C%7B%22unit%22%3A%22nm%22%7D%5D%7D%7D%7D%2C%7B%22relation%22%3A%22datasets%22%7D%2C%7B%22relation%22%3A%22members%22%2C%22scope%22%3A%7B%22include%22%3A%5B%7B%22relation%22%3A%22affiliation%22%7D%2C%7B%22relation%22%3A%22person%22%7D%5D%7D%7D%5D%7D
To Reproduce
using this db.json
Expected behavior
Return array with one document - 10.5072/panosc-document1 including the parameter, members with affiliation and person and its datasets.
Alternative
Additional context
Works without additional scope.
{
"include":[
{
"relation":"parameters",
"scope":{
"where":{
"and":[
{
"name":"wavelength"
},
{
"value":{
"between":[
696,
2000
]
}
},
{
"unit":"nm"
}
]
}
}
},
{
"relation":"datasets"
},
{
"relation":"members"
}
]
}
(And again: sorry if this is just me having bugs in the db file or query but it looks ok from my pov)
Not sure if this is the right place.
Panosc WP3 has a confluence page with a list of experimental techniques.
It would be very useful if we could have this list divided by facility type:
Synchrotron
Neutron/Muon sources
FEL
Laser (ELI)
Describe the bug
Selecting technique "quasi-elastic scattering" returns no results
Searching for "QENS" using the text field returns 50+ results.
Same pattern is observed for other techniques
Hence, the technique field is not working as intended.
To Reproduce
See above
Expected behavior
See all data sets generated by the technique selected in the technique field
Additional context
In its current form, the results from using the technique field are misleading and may lead users to believe that there are no data of interest for them.
If this is due to lack of proper meta-data in the data sets and the associated facilities are not willing to fix that, I suggest that users receive a warning or that the functionality is removed.
Will the Search API only expose open data or will some data require authentication? How will this be implemented? Will there be a token carried in the HTTP Headers?
(I suppose I have not understood what Work Package 4's requirements for this API are).
(Including @louise-davies and @agbeltran in the discussion)
In the DataModel, we have a Document entity - at the top level - which references datasets. I'm assuming that this entity is a catch-all for the different, top-level entities needed to map between facility data models and the PaNOSC data model.
The entity has a type field so we can customise its meaning. But will this entity also be able to reference other Document entities? I'm thinking of the case where we model different hierarchies of entities. For example, we have 2 facilities using ICAT with different hierarchies:
ISIS: Study -> Facility Cycle -> Investigation -> Dataset -> Datafile
Diamond: Investigation -> Visit -> Dataset -> Datafile
where '->' is a one-to-many relationship.
(Including @louise-davies and @agbeltran in the discussion)
Describe the bug
Searching for multiple items using the 'and' operator appears to be broken for parameters.
To Reproduce
A test dataset with parameter A and parameter B is not returned when parameter A and parameter B are queried using the AND operator. The dataset is returned when querying either parameter A or parameter B, as well as when using the OR operator.
Expected behavior
Test dataset from above should be returned when using the AND operator.
EDIT: bug is not isolated to parameters, doesn't appear to be working at all.
EDIT: indeed works on things other than parameter. Tried searching for type AND title (using ilike) and got the wrong impression, it was just the ilike operator not working as I expected. #56
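For reference, the semantics expected in this report can be written down as a small in-memory TypeScript sketch: with AND, every condition has to be satisfied by some parameter of the dataset, not all conditions by the same parameter. The types and helper names are hypothetical and say nothing about how the server evaluates the query.

```typescript
// Hypothetical shapes, for illustration only.
interface Parameter {
  name: string;
  value: number;
  unit: string;
}

interface DatasetWithParameters {
  pid: string;
  parameters: Parameter[];
}

type ParameterCondition = (parameter: Parameter) => boolean;

// AND: every condition must be met by at least one parameter of the dataset.
function matchesAll(
  dataset: DatasetWithParameters,
  conditions: ParameterCondition[],
): boolean {
  return conditions.every((condition) =>
    dataset.parameters.some((parameter) => condition(parameter)),
  );
}

// OR: at least one condition must be met by at least one parameter.
function matchesAny(
  dataset: DatasetWithParameters,
  conditions: ParameterCondition[],
): boolean {
  return conditions.some((condition) =>
    dataset.parameters.some((parameter) => condition(parameter)),
  );
}

const isParameterA: ParameterCondition = (p) => p.name === "A";
const isParameterB: ParameterCondition = (p) => p.name === "B";

const datasetWithBoth: DatasetWithParameters = {
  pid: "test-dataset",
  parameters: [
    { name: "A", value: 1, unit: "nm" },
    { name: "B", value: 2, unit: "K" },
  ],
};
const datasetWithOnlyA: DatasetWithParameters = {
  pid: "other-dataset",
  parameters: [{ name: "A", value: 1, unit: "nm" }],
};

console.log(matchesAll(datasetWithBoth, [isParameterA, isParameterB]));
console.log(matchesAll(datasetWithOnlyA, [isParameterA, isParameterB]));
```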
In response to #21 a document describing a query syntax based on Loopback Where filter has been added. However, it seems that this will not be enough to suit our needs. Most queries in practice will need to select the searched objects not on properties of the object itself, but based on properties of related objects.
A few examples:
Use case: “I would like to search experiments of a certain technique”. This needs something like: SELECT ds FROM Dataset ds JOIN ds.techniques t WHERE t.name = '…'
Use case: “I would like to find my own data (experimental data from my proposals)”: SELECT ds FROM Dataset ds JOIN ds.document doc JOIN doc.members m JOIN m.person p WHERE doc.type = 'proposal' AND m.role = 'principal investigator' AND p.pid = '…'
Use case: “I would like to find proposals having a wavelength document parameter in a certain range”: SELECT doc FROM Document doc JOIN doc.parameters p WHERE p.name = 'wavelength' AND p.units = 'nm' AND p.value > '…' AND p.value < '…'
As opposed to use case: “I would like to find proposals having performed a measurement using a wavelength in a certain range”: SELECT doc FROM Document doc JOIN doc.datasets ds JOIN ds.parameters p WHERE p.name = 'wavelength' AND p.units = 'nm' AND p.value > '…' AND p.value < '…'
Use case: “I would like to find proposals having investigated a particular sample using a certain technique”: SELECT doc FROM Document doc JOIN doc.datasets ds JOIN ds.samples s JOIN ds.techniques t WHERE s.pid = '…' AND t.name = '…'
Note: I added pseudo SQL queries to the examples in order to illustrate the structure of the required search query based on our data model. I do not intend to suggest that the query syntax should be SQL like.
At the very least, it is not clear how to express queries like these using the query syntax described in the documentation. In the discussion last week, somebody suggested using GraphQL, which might be an option worth considering.
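To show the structure of such queries independently of any query syntax, two of the use cases above are written below as plain TypeScript filters over in-memory objects. All type and function names are hypothetical, and the unit property is called unit here (the pseudo SQL calls it units); the point is only that the selection criterion sits on a related object one or two levels away from the object being returned.

```typescript
// Hypothetical in-memory shapes mirroring the relations used in the examples.
interface RelatedTechnique {
  name: string;
}

interface RelatedParameter {
  name: string;
  unit: string;
  value: number;
}

interface RelatedDataset {
  pid: string;
  techniques: RelatedTechnique[];
  parameters: RelatedParameter[];
}

interface RelatedDocument {
  pid: string;
  type: string;
  datasets: RelatedDataset[];
}

// SELECT ds FROM Dataset ds JOIN ds.techniques t WHERE t.name = techniqueName
function datasetsByTechnique(
  datasets: RelatedDataset[],
  techniqueName: string,
): RelatedDataset[] {
  return datasets.filter((ds) =>
    ds.techniques.some((t) => t.name === techniqueName),
  );
}

// SELECT doc FROM Document doc JOIN doc.datasets ds JOIN ds.parameters p
// WHERE p.name = 'wavelength' AND p.unit = 'nm' AND p.value > minNm AND p.value < maxNm
function documentsByMeasuredWavelength(
  documents: RelatedDocument[],
  minNm: number,
  maxNm: number,
): RelatedDocument[] {
  return documents.filter((doc) =>
    doc.datasets.some((ds) =>
      ds.parameters.some(
        (p) =>
          p.name === "wavelength" &&
          p.unit === "nm" &&
          p.value > minNm &&
          p.value < maxNm,
      ),
    ),
  );
}

const relatedDatasets: RelatedDataset[] = [
  {
    pid: "ds-1",
    techniques: [{ name: "XMCD" }],
    parameters: [{ name: "wavelength", unit: "nm", value: 700 }],
  },
  {
    pid: "ds-2",
    techniques: [{ name: "NEXAFS" }],
    parameters: [{ name: "wavelength", unit: "nm", value: 2500 }],
  },
];
const relatedDocuments: RelatedDocument[] = [
  { pid: "doc-1", type: "proposal", datasets: [relatedDatasets[0]] },
  { pid: "doc-2", type: "proposal", datasets: [relatedDatasets[1]] },
];

console.log(datasetsByTechnique(relatedDatasets, "XMCD").map((ds) => ds.pid));
console.log(
  documentsByMeasuredWavelength(relatedDocuments, 379, 1744).map((d) => d.pid),
);
```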
The API can deal with standard units (the set of which has been narrowed down) and conversions.
We need to decide on any special provisions for mappings or conversions that need to be implemented (for example photon energy to wavelength).
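The photon energy example is a mapping between two different physical quantities rather than a unit conversion: wavelength and photon energy are related by λ = hc / E, with hc ≈ 1239.841984 eV·nm. A minimal TypeScript sketch, with hypothetical function names and eV and nm chosen as the working units by assumption:

```typescript
// h*c expressed in eV·nm (rounded), so that wavelength[nm] = HC_EV_NM / energy[eV].
const HC_EV_NM = 1239.841984;

// Map a photon energy in eV to the corresponding wavelength in nm.
function photonEnergyEvToWavelengthNm(energyEv: number): number {
  if (!(energyEv > 0)) {
    throw new RangeError("Photon energy must be positive");
  }
  return HC_EV_NM / energyEv;
}

// Inverse mapping: wavelength in nm to photon energy in eV.
function wavelengthNmToPhotonEnergyEv(wavelengthNm: number): number {
  if (!(wavelengthNm > 0)) {
    throw new RangeError("Wavelength must be positive");
  }
  return HC_EV_NM / wavelengthNm;
}

console.log(photonEnergyEvToWavelengthNm(12398.41984));
console.log(wavelengthNmToPhotonEnergyEv(0.1));
```

Because the relation is inverse, a range query on one quantity maps to a range with swapped bounds on the other, which is one reason such mappings may need special treatment beyond ordinary unit conversion.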
Will there be an endpoint for the Files entity or is this deliberately excluded? I imagine one could make a case for or against, but I am not aware whether this has been decided.
(Including @louise-davies and @agbeltran in the discussion)
Note: this issue is mostly a bunch of questions rather than a feature request or bug report.
In the absence of documentation, it is not clear what the underlying schema or data model of the API is. That makes it difficult to assess the fitness and usability of the API.
Digging into /components/schemas in the openapi.json served by the API suggests that the relevant objects with their properties are:
"Dataset": {
"pid",
"title",
"isPublic",
"size",
"creationDate"
},
all properties required,
"Document": {
"pid",
"type",
"title",
"internalID",
"summary",
"doi",
"startDate",
"endDate",
"releaseDate",
"license"
},
with pid, type, and title being required,
"Instrument": {
"id",
"name",
"facility"
},
all properties required, and
"Sample": {
"name",
"pid",
"description"
},
with only name being required. Is that correct?
However, I still have some questions:
id in paths like /datasets/{id}? I assume it's some
additionalProperties set to true in openapi.json.
Add metadata formats