
Comments (9)

janvanrijn commented on June 14, 2024

Your understanding of the two versions of the schema is correct. Please note that schema A is a subset of schema B. To me that makes perfect sense. The fields not present in schema A are the ID, the uploader_ID, the upload_date, the implements field (ignore that one for now), the source URL and the binary URL. These are all values generated by the server, which the client does not know at upload time. Therefore, these fields should not be allowed in the XSD that the server uses to validate an uploaded implementation.

Also, the minOccurs issue is not wrong in my opinion. When no version is provided, the server will determine one: an increment of the highest version number of that implementation so far, or 1.0 if none exists yet.
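A minimal sketch of the server-side version assignment described above. The thread does not say how a version is "incremented", so this assumes versions are "major.minor" strings and that the server bumps the highest major number; `next_version` is a hypothetical helper, not the server's actual code.

```python
def next_version(existing_versions: list[str]) -> str:
    """Return the version a (hypothetical) server would assign when none is given.

    Assumption: versions look like "major.minor" and incrementing means
    bumping the highest major number seen so far.
    """
    if not existing_versions:
        # No version of this implementation exists yet.
        return "1.0"
    # Find the highest major number among the existing versions.
    highest_major = max(int(v.split(".")[0]) for v in existing_versions)
    return f"{highest_major + 1}.0"
```

For example, `next_version([])` gives `"1.0"` and `next_version(["1.0", "3.0", "2.0"])` gives `"4.0"`. Any such rule has to guess the user's versioning scheme, which is part of the objection raised further down.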

Please let me know if you do not agree on either of these points.


berndbischl commented on June 14, 2024

> Please note that schema A is a subset of schema B.

I understand. Actually, that was sort of the criticism: we have to MANUALLY make sure that this is indeed the case, and that no inconsistencies arise in the shared fields when somebody changes a field in one file but not in the other.

Could you please go through the fields shared by both XSDs, check whether they really are consistent across both files, and tell me?
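Such a consistency check can be partly automated. The sketch below is a shallow, hypothetical check using Python's standard `xml.etree.ElementTree`: it collects every named `xs:element` declaration in each schema and reports shared names whose attributes (`type`, `minOccurs`, ...) differ. It assumes element names are unique within each file and does not compare nested complex types.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by ElementTree for XML Schema tags.
XS = "{http://www.w3.org/2001/XMLSchema}"


def element_decls(xsd_text: str) -> dict[str, dict[str, str]]:
    """Map each named xs:element in a schema to its attributes."""
    root = ET.fromstring(xsd_text)
    decls = {}
    for el in root.iter(f"{XS}element"):
        name = el.get("name")
        if name is not None:
            # If a name occurs more than once, the last declaration wins.
            decls[name] = dict(el.attrib)
    return decls


def shared_field_mismatches(xsd_a: str, xsd_b: str) -> list[str]:
    """List fields declared in both schemas whose attributes differ."""
    a, b = element_decls(xsd_a), element_decls(xsd_b)
    problems = []
    for name in sorted(a.keys() & b.keys()):
        if a[name] != b[name]:
            problems.append(f"{name}: {a[name]} != {b[name]}")
    return problems
```

Run against an upload schema that declares `version` with `minOccurs="0"` and a download schema that does not, it would report exactly that `version` mismatch.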

> Also, the minOccurs issue is not wrong in my opinion. When no version is provided, the server will determine one: an increment of the highest version number of that implementation so far, or 1.0 if none exists yet.

In that case, at least the comment in the XSD is slightly misleading. But I disagree anyway; this sounds really error-prone. Force the user to provide the version, IMHO. If you really want to automate something like this, do it in the back-end tools that talk to the server.


janvanrijn commented on June 14, 2024

Point 1: I agree that this is not very efficient. I think we can indeed merge both schemas into one, since implementation_upload.xsd is a subset of implementation(download).xsd. If the user provides an illegal field while uploading, e.g., tries to spoof the uploader field, no error is thrown, but the field value can be ignored.
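With a single merged schema, "ignoring" an illegal field could mean dropping it from the parsed upload before anything is stored. A minimal sketch with Python's `xml.etree.ElementTree`, assuming the server-owned fields are top-level child elements; the element names in `SERVER_FIELDS` are hypothetical stand-ins for the fields listed earlier in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names for the server-generated fields mentioned in the
# thread; the real schema may name them differently.
SERVER_FIELDS = {"id", "uploader", "upload_date", "source_url", "binary_url"}


def strip_server_fields(upload_xml: str) -> ET.Element:
    """Parse an uploaded description and silently drop server-owned fields."""
    root = ET.fromstring(upload_xml)
    # Iterate over a copy, since children are removed during the loop.
    for child in list(root):
        # Compare the local name so namespaced tags are handled as well.
        local_name = child.tag.rsplit("}", 1)[-1]
        if local_name in SERVER_FIELDS:
            root.remove(child)
    return root
```

An upload carrying `<id>99</id><uploader>mallory</uploader>` alongside `name` and `version` would come out with only `name` and `version` left; the server then fills in its own values.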

We have a similar construction for the dataset(_upload) schemas; do you suggest that I merge these too?

Point 2: I also agree on this. I can also alter this. Joaquin, do you agree on this?


joaquinvanschoren commented on June 14, 2024

Yes I think it's fine to ask the user to explicitly define a version
number. This will force her to either start versioning or check whether
some form of versioning exists. We cannot guess this.

Still, we should make sure that we always know which version is newer, e.g.
by using the upload date.

Cheers,
Joaquin
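Joaquin's suggestion of using the upload date to decide which version is newer avoids having to parse user-chosen version labels. A minimal sketch, assuming a hypothetical `ImplementationRecord` type whose `upload_date` is set by the server:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ImplementationRecord:
    # Hypothetical record shape; only the fields needed for this example.
    name: str
    version: str            # free-form label chosen by the user
    upload_date: datetime   # assigned by the server at upload time


def newest_version(records: list, name: str) -> Optional[ImplementationRecord]:
    """Return the most recently uploaded record for `name`, or None."""
    candidates = [r for r in records if r.name == name]
    if not candidates:
        return None
    # Order by server-side upload date instead of comparing version labels,
    # since users may follow any versioning scheme.
    return max(candidates, key=lambda r: r.upload_date)
```

Because the date comes from the server, this ordering holds even when one user writes "beta" and another writes "1.0".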


Dr. Ir. Joaquin Vanschoren

Leiden Institute of Advanced Computer Science (LIACS)
Universiteit Leiden
Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
office: 1.14
phone: +31 715 27 89 19
fax: +32 16 32 79 96
mobile: (+32) (0)497 90 30 69


berndbischl commented on June 14, 2024

> Point 1: I agree that this is not very efficient. I think we can indeed merge both schemas into one, since implementation_upload.xsd is a subset of implementation(download).xsd. If the user provides an illegal field while uploading, e.g., tries to spoof the uploader field, no error is thrown, but the field value can be ignored.

Yes, I think we should do that.
So to be precise:

  • We have only one schema for both upload and download.
  • "name" and "version" stay.
  • "uploader" stays as it is, but we document that a) the user does not have to provide it, and b) if it is provided, it is ignored.
  • "id": Either remove it, because "name" + "version" already identifies an implementation, or (what I would prefer, and I think Joaquin already mentioned this as well) autogenerate an integer "id" on each new upload. In that case we treat it like "uploader": it is ignored if provided on upload. You can (and probably should) still check whether name + version already exists and throw an error if the user uploads a duplicate.
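The rules in the list above, together with the agreement that the user must supply a version, can be sketched as a small in-memory registry. `ImplementationRegistry` and `DuplicateImplementationError` are hypothetical names standing in for the server's implementation table; a real server would enforce the same rules in its database layer.

```python
import itertools


class DuplicateImplementationError(ValueError):
    """Raised when the same name + version is uploaded twice."""


class ImplementationRegistry:
    """Hypothetical in-memory stand-in for the implementation table."""

    def __init__(self):
        self._ids = itertools.count(1)  # auto-generated integer ids
        self._by_key = {}               # (name, version) -> stored record

    def upload(self, fields: dict, authenticated_user: str) -> dict:
        name = fields.get("name")
        version = fields.get("version")
        if not name or not version:
            # The user must provide both name and version explicitly.
            raise ValueError("name and version are required")
        key = (name, version)
        if key in self._by_key:
            raise DuplicateImplementationError(f"{name} {version} already exists")
        record = dict(fields)
        # Server-owned fields: any client-supplied value is overwritten.
        record["id"] = next(self._ids)
        record["uploader"] = authenticated_user
        self._by_key[key] = record
        return record
```

An upload that claims `"id": 999, "uploader": "mallory"` is stored with the next free id and the authenticated user instead, and a second upload of the same name + version is rejected.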

> We have a similar construction for the dataset(_upload) schemas; do you suggest that I merge these too?

It is the same problem, so we should solve it the same way. I also just noticed that what I proposed above for the implementation "id" seems to be exactly what we do for data sets, so this would also make things more consistent.


joaquinvanschoren commented on June 14, 2024

I agree with all of the above. Maybe document 'id' and 'uploader' as 'This
field is automatically filled out by the server. If a value is provided, it
will be ignored.', or something like that.

Cheers,
Joaquin


janvanrijn commented on June 14, 2024

Sorry for the late commit, but I finally merged the two files (some additional checks also had to be added on the server to ensure that users cannot spoof the uploader and id fields).

I am currently adding an id column to the implementation table.


janvanrijn commented on June 14, 2024

The code changes are done. I cannot test them on the server yet, as there are some temporary problems.

I will keep you informed.


janvanrijn commented on June 14, 2024

Tested it on the server. I will close this issue.
