Potential problem / question regarding impl. ids (openml, closed)

openml commented on June 14, 2024

The way I understand it: Impl ID = name + version (both user chosen)

from openml.

Comments (24)

joaquinvanschoren commented on June 14, 2024

Version formats differ depending on the system used: 1.2.3, r1234, 234.

I do agree that we should define this more clearly though. Suggestions?

from openml.

berndbischl commented on June 14, 2024

Maybe we should simply test it with a very weird name first and see what bad things happen?
But my gut feeling is that stuff like in my example above should not be allowed for IDs.
Maybe: (both for "name" and "version")
[a-z] [A-Z] [0-9] [_ , -, .] ???
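Here is a minimal sketch of the proposed whitelist as a Python regular expression. It assumes the rule applies to the whole string and that empty values are rejected. `ID_PART` and `is_valid_id_part` are hypothetical names, not part of any OpenML code.

```python
import re

# Proposed whitelist for both "name" and "version":
# ASCII letters, digits, underscore, dash and dot; at least one character.
ID_PART = re.compile(r"[A-Za-z0-9_.\-]+")


def is_valid_id_part(value: str) -> bool:
    """Return True if every character of value is whitelisted."""
    # fullmatch anchors the pattern at both ends, so a single bad
    # character anywhere in the string causes a rejection.
    return ID_PART.fullmatch(value) is not None


# The version formats mentioned in this thread pass the check.
for version in ["1.2.3", "r1234", "234"]:
    print(version, is_valid_id_part(version))

# Names with spaces, slashes or quotes are rejected.
for name in ["weka.J48", "my flow", "a/b", "x'y"]:
    print(name, is_valid_id_part(name))
```

Using `fullmatch` rather than `match` matters here. A pattern that only has to match a prefix would accept `weka.J48/../x`.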

from openml.

janvanrijn commented on June 14, 2024

I will implement a check for this in the next API update.

from openml.

berndbischl commented on June 14, 2024

Considering the current problems we encountered in #20, I think defining a conservative naming scheme and checking for it on the server side is reasonable.

from openml.

joaquinvanschoren commented on June 14, 2024

Well, the problem with #20 was related to the description field, not the name field, but yes, a conservative naming scheme sounds like a good idea. Any suggestions?

from openml.

berndbischl commented on June 14, 2024

For names I suggested above:

[a-z] [A-Z] [0-9] [_ , -, .]

But I think we should first make a list of where problems like this might occur. Basically, that means all "free", user-provided text. I suppose there are then two (?) categories:
a) stuff that becomes something like an ID / file name, etc.
b) text-like descriptions.

For a) I would try to be as conservative as possible, like the suggestion above. For b) I don't really know what exactly causes problems for you. Another problem is that b) will often be taken from files / data or be generated automatically, and I don't know how freely we can "throw stuff away" from it. Certainly not many users want to edit these text blobs manually if they fail validation on the server.

from openml.

janvanrijn commented on June 14, 2024

I think this is important for all the XSDs that validate uploaded content. I added two datatypes, oml:system_string and oml:simple_string.

oml:system_string allows [a-z] [A-Z] [0-9] [_ , -, .] and is applied to all fields where we want a high restriction level, e.g., because these can occur in URLs. Examples are implementation:name, implementation:version, etc.
oml:simple_string allows [a-z] [A-Z] [0-9] [_ , -, .], commas and whitespace. It is used for textual input where we want to restrict the input but do not need it to be URL-friendly. implementation:creator and implementation:contributor are examples of such fields. We can extend the list of allowed characters even further.
All other fields (which are likely to accept machine-generated content) are still xs:string.
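The thread does not quote the actual XSD patterns, so what follows is only a Python approximation of the two datatypes. The regexes, the `FIELD_TYPES` table and `validate` are hypothetical. Only the plain ASCII space is treated as whitespace, and XSD-specific details such as whitespace facets are ignored. XSD patterns must match the entire value, so the sketch uses `fullmatch` to get the same behaviour.

```python
import re

# Approximation of oml:system_string: letters, digits, _ . - (URL-friendly).
SYSTEM_STRING = re.compile(r"[A-Za-z0-9_.\-]+")
# Approximation of oml:simple_string: the same characters plus commas and spaces.
SIMPLE_STRING = re.compile(r"[A-Za-z0-9_.\-, ]+")

# Which datatype applies to which field. Unlisted fields stay xs:string,
# i.e. they are not restricted by a pattern at all.
FIELD_TYPES = {
    "name": SYSTEM_STRING,
    "version": SYSTEM_STRING,
    "creator": SIMPLE_STRING,
    "contributor": SIMPLE_STRING,
}


def validate(implementation: dict) -> list:
    """Return the names of fields whose values violate their datatype."""
    errors = []
    for field, value in implementation.items():
        pattern = FIELD_TYPES.get(field)
        if pattern is not None and pattern.fullmatch(value) is None:
            errors.append(field)
    return errors


print(validate({
    "name": "weka.J48",
    "version": "1.2.3",
    "creator": "Jan van Rijn, Joaquin Vanschoren",
    "description": "Free text <b>with</b> anything in it",
}))  # → []
print(validate({"name": "weka J48", "creator": "O'Brien"}))  # → ['name', 'creator']
```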

However, we can compile a list of characters that are allowed on OpenML. The workbenches are then responsible for checking for characters outside this list and removing (or replacing) them before uploading the XML. Of course, this happens without the user noticing. Any suggestions on this are welcome.
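Here is a minimal sketch of what the workbench-side cleanup could look like. It assumes the allowed set is the system_string characters and that each disallowed character is replaced with an underscore. Transliterating accents first (`é` → `e`) is an extra choice made here, not something proposed in the thread. `sanitize` is a hypothetical helper.

```python
import re
import unicodedata

# Anything outside the allowed set (here: the system_string characters).
DISALLOWED = re.compile(r"[^A-Za-z0-9_.\-]")


def sanitize(value: str, replacement: str = "_") -> str:
    """Map a user-supplied string onto the allowed character set."""
    # Decompose accented letters (e.g. "é" -> "e" + combining accent) and
    # drop everything that is not ASCII, so "Café" becomes "Cafe".
    ascii_only = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every remaining disallowed character.
    return DISALLOWED.sub(replacement, ascii_only)


print(sanitize("my flow (v2)"))     # → my_flow__v2_
print(sanitize("Café.Classifier"))  # → Cafe.Classifier
```

Characters that have no ASCII decomposition, such as `ß`, are dropped by the `ignore` step rather than replaced. Two different inputs can also map to the same cleaned name (for example `a b` and `a(b`). Both effects are risks of cleaning names silently.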

from openml.

berndbischl commented on June 14, 2024

a) I think compiling such a list, as you are doing above, is a good idea. I hope we can keep the type definitions simple.

b) Why do we need to restrict a field like "creator"? Just a question out of curiosity. Also, many people in non-English-speaking countries have special characters in their names.
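To make the creator-name concern concrete, here is a sketch that compares an ASCII-only approximation of oml:simple_string with a Unicode-aware alternative. Both patterns are assumptions, and the names are only illustrations.

```python
import re
import unicodedata

# ASCII-only approximation of oml:simple_string (letters, digits, _ . -, comma, space).
ASCII_SIMPLE = re.compile(r"[A-Za-z0-9_.\-, ]+")
# Unicode-aware variant: for str patterns, Python's \w matches letters and
# digits from any script (e.g. "ö", "é") as well as the underscore.
UNICODE_SIMPLE = re.compile(r"[\w.\-, ]+")


def accepts(pattern: re.Pattern, creator: str) -> bool:
    # Normalize to NFC first, so "ü" is a single code point rather than
    # "u" followed by a combining mark, which \w would not match.
    return pattern.fullmatch(unicodedata.normalize("NFC", creator)) is not None


for creator in ["Bernd Bischl", "Jörg Müller", "Hélène Dupont"]:
    print(creator, accepts(ASCII_SIMPLE, creator), accepts(UNICODE_SIMPLE, creator))
```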

c) For "free text fields" like "description" and such:
Have you already noticed that some input created problems on the server, even though it was a valid xs:string?

from openml.

janvanrijn commented on June 14, 2024

What I have done for now: I implemented the system_string datatype, which restricts a string to alphanumeric characters, underscores, dashes and dots. system_string is used for fields like name, version and other obvious candidates (md5_hashes); all other fields are not restricted.

a) I agree on that, but I think that for the time being we are good.

b and c) From what I understand, you are hesitant to restrict the input of these fields because it can give users (or the automated system) a hard time getting the format right. I myself have not encountered any problems so far, but I can see that the more we restrict this input, the more likely such problems become.

from openml.

berndbischl commented on June 14, 2024

I asked about creator names because of things like German umlauts, French accents and so on. I am perfectly fine with that not being possible now; there are no umlauts in my name :).

And about problems with weird characters in text descriptions I simply asked out of curiosity to understand our current system better. I thought you already had to clean up some descriptions of UCI data sets because they caused problems? Or am I wrong? If not, what exactly was the problem?

from openml.

joaquinvanschoren commented on June 14, 2024

We previously had problems because of HTML tags in the textual descriptions.

If we can process everything in UTF-8, that would be preferable, but I agree it's not that urgent.
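The thread does not say how the HTML tags broke things. Assuming the failure mode was unescaped markup inside an uploaded XML document, this sketch builds the element with Python's `xml.etree.ElementTree`. That module escapes `&`, `<` and `>` on serialization, and the result can then be encoded as UTF-8. The element name `description` is a placeholder, not taken from the OpenML XSD.

```python
import xml.etree.ElementTree as ET


def description_element(text: str) -> bytes:
    """Wrap free text in an XML element, escaping markup characters."""
    # Placeholder element name; the real schema may use a namespaced name.
    elem = ET.Element("description")
    elem.text = text  # ElementTree escapes &, < and > when serializing
    # Serialize to str, then encode as UTF-8 so non-ASCII text survives.
    return ET.tostring(elem, encoding="unicode").encode("utf-8")


xml_bytes = description_element("Data from <b>UCI</b> & Müller's notes")
print(xml_bytes.decode("utf-8"))
# Round trip: parsing gives back the original text, tags included.
print(ET.fromstring(xml_bytes).text)
```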

Cheers!
Joaquin

(In reply to berndbischl's comment of Friday, 13 September 2013.)

Dr. Ir. Joaquin Vanschoren

Leiden Institute of Advanced Computer Science (LIACS)
Universiteit Leiden
Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
office: 1.14
phone: +31 715 27 89 19
fax: +32 16 32 79 96
mobile: (+32) (0)497 90 30 69

from openml.

mfeurer commented on June 14, 2024

Is there a reason why brackets in flow names are forbidden? My use case is that I want to store a scikit-learn pipeline and add its components to the flow name to distinguish between similar pipelines, for example:

sklearn.grid_search.RandomizedSearchCV(sklearn.pipeline.Pipeline(sklearn.preprocessing.data.StandardScaler,sklearn.pipeline.FeatureUnion(sklearn.preprocessing.data.PolynomialFeatures,sklearn.decomposition.pca.PCA),sklearn.ensemble.weight_boosting.AdaBoostClassifier(sklearn.tree.tree.DecisionTreeClassifier)))
sklearn.grid_search.RandomizedSearchCV(sklearn.pipeline.Pipeline(sklearn.preprocessing.data.StandardScaler,sklearn.pipeline.FeatureUnion(sklearn.preprocessing.data.PolynomialFeatures,sklearn.decomposition.pca.PCA),sklearn.ensemble.weight_boosting.RandomForestClassifier))

I want to use brackets here instead of underscores (as done for WEKA) because the flows contain nested components.
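Here is a minimal sketch of how such nested names can be generated. It assumes each component is described by a hypothetical `(class_path, children)` tuple rather than a real scikit-learn object, and it reproduces the first name above.

```python
def flow_name(component) -> str:
    """Render (class_path, children) as class_path(child1,child2,...)."""
    class_path, children = component
    if not children:
        return class_path
    # Sibling components are joined with commas, as in the names above.
    return class_path + "(" + ",".join(flow_name(c) for c in children) + ")"


def leaf(class_path: str):
    """A component without nested components."""
    return (class_path, [])


search = (
    "sklearn.grid_search.RandomizedSearchCV",
    [(
        "sklearn.pipeline.Pipeline",
        [
            leaf("sklearn.preprocessing.data.StandardScaler"),
            ("sklearn.pipeline.FeatureUnion", [
                leaf("sklearn.preprocessing.data.PolynomialFeatures"),
                leaf("sklearn.decomposition.pca.PCA"),
            ]),
            ("sklearn.ensemble.weight_boosting.AdaBoostClassifier", [
                leaf("sklearn.tree.tree.DecisionTreeClassifier"),
            ]),
        ],
    )],
)
print(flow_name(search))
```

Because sibling components are joined with commas, any pipeline with more than one step produces a comma in its name.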

from openml.

janvanrijn commented on June 14, 2024

I updated the XSD; ( and ) are now allowed.
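This is an approximation of the updated system_string. It assumes the change simply added `(` and `)` to the allowed set; the exact server-side pattern is not shown in the thread. A name that nests one component passes, but listing several components still introduces a comma, which the pattern does not allow.

```python
import re

# Assumed system_string after the update: the original characters plus ( and ).
SYSTEM_STRING_WITH_PARENS = re.compile(r"[A-Za-z0-9_.\-()]+")


def accepted(name: str) -> bool:
    """True if the whole name matches the assumed updated datatype."""
    return SYSTEM_STRING_WITH_PARENS.fullmatch(name) is not None


single = "sklearn.ensemble.weight_boosting.AdaBoostClassifier(sklearn.tree.tree.DecisionTreeClassifier)"
several = "sklearn.pipeline.FeatureUnion(sklearn.preprocessing.data.PolynomialFeatures,sklearn.decomposition.pca.PCA)"
print(accepted(single))   # → True
print(accepted(several))  # → False: the comma is still outside the pattern
```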

from openml.

mfeurer commented on June 14, 2024

Thanks for your fast reply. I just tried this on the test server: it still works with plain strings like dummy, but not with complicated strings like the ones above.

from openml.

joaquinvanschoren commented on June 14, 2024

Jan, did you perhaps only push the change to the production server?

(In reply to Matthias Feurer's comment of Fri, Apr 29, 2016, 9:47 AM.)

from openml.

mfeurer commented on June 14, 2024

@joaquinvanschoren @janvanrijn is there any news on this?

from openml.

janvanrijn commented on June 14, 2024

Yes, I pushed this to the production server.

Now it is also on the test server.

from openml.

mfeurer commented on June 14, 2024

It works now, thanks a lot!

from openml.

mfeurer commented on June 14, 2024

Sorry to open this again, but could you please also allow commas to be part of the name?

from openml.

janvanrijn commented on June 14, 2024

Not so sure, actually. Flow names are defined to be URL-safe, and according to the HTTP specification, commas cannot occur in a URL.

I need to check many things internally before I can change this.
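If commas (or other reserved characters) were ever allowed in flow names, a client could percent-encode them so they cannot interfere with URL parsing. This sketch uses Python's standard `urllib.parse`. The base URL is a placeholder, not a real OpenML endpoint.

```python
from urllib.parse import quote, unquote

# Placeholder endpoint for illustration only; not the real OpenML API.
BASE_URL = "https://example.org/flow/name/"


def flow_url(name: str) -> str:
    # safe="" percent-encodes everything except letters, digits and "_.-~",
    # so "/", ",", "(" and ")" cannot change how the URL is parsed.
    return BASE_URL + quote(name, safe="")


url = flow_url("sklearn.pipeline.FeatureUnion(a,b)")
print(url)
# → https://example.org/flow/name/sklearn.pipeline.FeatureUnion%28a%2Cb%29
print(unquote(url[len(BASE_URL):]))  # → sklearn.pipeline.FeatureUnion(a,b)
```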

from openml.

mfeurer commented on June 14, 2024

I didn't know that. I'll think about a different solution for now. Is it possible to have this in the future or do you think this would be too much work?

from openml.

janvanrijn commented on June 14, 2024

The answer wasn't a "no"!

It was more of a "definitely not now".

from openml.

mfeurer commented on June 14, 2024

Sure, but I need a solution as soon as possible, and it didn't sound like 'not now' means before the 10th of May ;)

from openml.

mfeurer commented on June 14, 2024

Brackets are actually working now.

from openml.

Potential problem / question regarding impl. ids about openml HOT 24 CLOSED

Comments (24)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent