There are 2 implementation schemas. a) <a href="https://raw.github.c


implementation schema(s) about openml HOT 9 CLOSED

openml commented on June 14, 2024

implementation schema(s)

from openml.

Comments (9)

janvanrijn commented on June 14, 2024

Your understanding of the two versions of the schema is correct. Please note that schema A is a subset of schema B, which makes perfect sense to me. The fields not present in schema A are the ID, the uploader_ID, the upload_date, the implements field (ignore this for now), the source URL and the binary URL. These values are all generated by the server, so the client does not know them at upload time. Therefore, these fields should not be allowed in the XSD that the server uses to validate an uploaded implementation.

I also do not think the minOccurs issue is wrong. When no version is provided, the server will determine one: an increment of the highest version number of that implementation so far, or 1.0 if none exists yet.

Please let me know if you do not agree on either of these points.


berndbischl commented on June 14, 2024

Please note that schema A is a subset of schema B.

I understand. Actually that was kinda the criticism, because we MANUALLY have to make sure that that is indeed the case and no inconsistencies occur in the shared fields where somebody changed some field in one file but not the other.

Could you please go through the shared stuff of both XSDs and check whether they really are consistent across both files and tell me?

Also the minOccurs issue is not wrong in my opinion. When not providing a version,
the server will determine one. This will be an increment of the highest version
number of that implementation so far, 1.0 if none yet exists.

At least then the comment in the XSD is slightly misleading. But I disagree anyway; this sounds really error-prone. Force the user to provide the version, IMHO. If you really want to automate stuff like this, do it in the back-end tools that talk to the server.


janvanrijn commented on June 14, 2024

Point 1: I agree that this is not very efficient. I think we can indeed merge both schemas into one, since implementation_upload.xsd is a subset of implementation(download).xsd. If the user provides an illegal field while uploading, e.g., tries to spoof the uploader field, no error is thrown, but the field value can be ignored.

We have a similar construction for the dataset(_upload) schemas; do you suggest that I merge these too?

Point 2: I also agree on this. I can also alter this. Joaquin, do you agree on this?


joaquinvanschoren commented on June 14, 2024

Yes I think it's fine to ask the user to explicitly define a version
number. This will force her to either start versioning or check whether
some form of versioning exists. We cannot guess this.

Still, we should make sure that we always know which version is newer, e.g.
by using the upload date.
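
As a rough illustration of ordering by upload date rather than by the user-supplied version string, here is a minimal Python sketch. The record layout, the example name and the `newest` helper are hypothetical and only mirror the field names used in this thread; they are not taken from the OpenML code.

```python
from datetime import datetime

# Hypothetical records; free-form version strings cannot be compared reliably,
# so the upload date is used to decide which one is newer.
implementations = [
    {"name": "example.learner", "version": "1.0",
     "upload_date": datetime(2013, 9, 1, 12, 0)},
    {"name": "example.learner", "version": "1.2-beta",
     "upload_date": datetime(2013, 9, 8, 9, 30)},
    {"name": "example.learner", "version": "final",
     "upload_date": datetime(2013, 9, 5, 17, 45)},
]


def newest(records, name):
    """Return the most recently uploaded record with this name, or None."""
    candidates = [r for r in records if r["name"] == name]
    return max(candidates, key=lambda r: r["upload_date"], default=None)
```

Calling `newest(implementations, "example.learner")` picks the record uploaded last, whatever its version string looks like.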

Cheers,
Joaquin


Dr. Ir. Joaquin Vanschoren

Leiden Institute of Advanced Computer Science (LIACS)
Universiteit Leiden
Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
office: 1.14
phone: +31 715 27 89 19
fax: +32 16 32 79 96
mobile: (+32) (0)497 90 30 69


berndbischl commented on June 14, 2024

Point 1: I agree that this is not very efficient. I think we can indeed merge both schemas into one, since the
implementation_upload.xsd is the subset of the implementation(download).xsd. If the user provides an illegal
field while uploading, e.g., he tries to spoof the uploader field, no errors are thrown, but the field value can be
ignored.

Yes, I think we should do that.
So to be precise:

We have only one schema for upload and download time
"name" and "version" stay
"uploader" stays like it is. But it is documented: a) that the user does not have to provide this. b) If it is provided, it is ignored.
"id": Either remove this, because "name" + "version" is the id, or, as I would prefer (and I think Joaquin already mentioned this as well), autogenerate an integer "id" on new upload. We would then treat it like "uploader": it is ignored if provided on upload. You can (and probably should) still check whether name + version already exists and throw an error when the user uploads a duplicate.

We have a similar construction for the dataset(_upload) schemas, do you suggest that I merge these too?

It is the same problem so we should solve it the same way. Also I just noticed that what I proposed above for the implementation "id" seems to be exactly what we do for data sets so this would also make it more consistent?
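
The upload rules proposed above could be sketched roughly as follows. This is an in-memory Python illustration under stated assumptions, not the actual server code: `ImplementationRegistry`, `DuplicateImplementation` and the dictionary layout are hypothetical names introduced here.

```python
import itertools


class DuplicateImplementation(Exception):
    """Raised when a name + version pair has already been uploaded."""


class ImplementationRegistry:
    # Fields filled in by the server; client-supplied values are ignored.
    SERVER_FIELDS = ("id", "uploader")

    def __init__(self):
        self._by_key = {}
        self._ids = itertools.count(1)  # autogenerated integer ids

    def upload(self, description, authenticated_user):
        # "name" and "version" must be given explicitly by the user.
        for required in ("name", "version"):
            if not description.get(required):
                raise ValueError(f"missing required field: {required}")

        # name + version must still be unique, even with an integer id.
        key = (description["name"], description["version"])
        if key in self._by_key:
            raise DuplicateImplementation(
                f"{key[0]} version {key[1]} already exists")

        # Drop any client-supplied server fields, then set them ourselves.
        stored = {k: v for k, v in description.items()
                  if k not in self.SERVER_FIELDS}
        stored["id"] = next(self._ids)
        stored["uploader"] = authenticated_user
        self._by_key[key] = stored
        return stored
```

In this sketch a spoofed "uploader" or "id" never raises an error; it is silently replaced, while a repeated name + version pair is rejected.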


joaquinvanschoren commented on June 14, 2024

I agree with all of the above. Maybe document 'id' and 'uploader' as 'This
field is automatically filled out by the server. If a value is provided, it
will be ignored.', or something like that.

Cheers,
Joaquin



janvanrijn commented on June 14, 2024

Sorry for the late commit, but I finally merged the two files. Some additional checks were also needed on the server to ensure that users cannot spoof the uploader and id fields.

Currently implementing an id column in the implementation table.
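
For readers wondering what such a server-side check might look like, here is a minimal Python sketch that drops server-filled elements from an uploaded description before it is processed. It assumes a simplified XML layout with `id` and `uploader` as direct child elements; the function name and the layout are hypothetical and not taken from the actual server code.

```python
import xml.etree.ElementTree as ET

# Elements the server fills in itself (assumed names, following this thread).
SERVER_FILLED = {"id", "uploader"}


def strip_server_fields(xml_text):
    """Parse an uploaded description and remove server-filled child elements."""
    root = ET.fromstring(xml_text)
    for child in list(root):  # copy the list, since we remove while iterating
        # ElementTree writes namespaced tags as "{uri}local"; compare local names.
        local_name = child.tag.rsplit("}", 1)[-1]
        if local_name in SERVER_FILLED:
            root.remove(child)
    return root
```

The returned element then only carries fields the client is allowed to set; the server can add its own `id` and `uploader` afterwards.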


janvanrijn commented on June 14, 2024

The code changes are done. I cannot test them on the server yet, as there are some temporary problems.

I will keep you informed.


janvanrijn commented on June 14, 2024

Tested it on the server. I will close this issue.
