citrineinformatics / citrine-python
License: Apache License 2.0
Currently, each data object and template ("data concept") in taurus is extended as a Resource in citrine-python. That's no biggie for reads, since they expose the same read interface, but it can cause confusion when creating data. The citrine-python version of data concepts must be used, but the class definitions for the rest of the data model still live in taurus. This creates code blocks like:
from citrine.resources.process_spec import ProcessSpec
from citrine.resources.project import Project
from taurus.entity.bounds.integer_bounds import IntegerBounds
from taurus.entity.value.nominal_integer import NominalInteger
It would be nice if the register method would accept bare taurus data concepts in addition to their resource versions. In that case, the register method would also serve the purpose of converting from taurus to citrine-python: it would return the Resource sub-class. This way, users would only ever have to create data in the taurus model but could still interact with an independent REST API.
In a fresh python 3.5 conda environment, I get:
In [1]: import citrine
Traceback (most recent call last):
File "/home/maxhutch/anaconda3/envs/py3.5/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-1-2b383dd42cdf>", line 1, in <module>
import citrine
File "/home/maxhutch/anaconda3/envs/py3.5/lib/python3.5/site-packages/citrine/__init__.py", line 2, in <module>
from citrine.citrine import Citrine # noqa: F401
File "/home/maxhutch/anaconda3/envs/py3.5/lib/python3.5/site-packages/citrine/citrine.py", line 7
DEFAULT_HOST: str = 'citrine.io'
^
SyntaxError: invalid syntax
Reproduction steps:
$ conda create -n py3.5 python=3.5
$ conda activate py3.5
$ pip install citrine
$ pip install ipython
$ ipython
In [1]: import citrine
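The caret in the traceback points at a variable annotation, which is syntax added in Python 3.6 (PEP 526), so the module cannot be parsed on 3.5 at all. The sketch below, meant to be run on Python 3.6 or later, contrasts that line with a type-comment form that older parsers accept. Whether the project should support 3.5 or simply declare a minimum Python version is a separate decision; the helper name `assignment_kind` is made up for illustration.

```python
import ast

# Python 3.6+ only: a variable annotation (PEP 526), as on the flagged line.
ANNOTATED = "DEFAULT_HOST: str = 'citrine.io'"

# Also valid on Python 3.5: the type is carried in a PEP 484 type comment.
TYPE_COMMENT = "DEFAULT_HOST = 'citrine.io'  # type: str"


def assignment_kind(source):
    """Return the AST node name of a one-line assignment (illustrative helper)."""
    return type(ast.parse(source).body[0]).__name__


annotated_kind = assignment_kind(ANNOTATED)        # annotated assignment node
type_comment_kind = assignment_kind(TYPE_COMMENT)  # plain assignment node
```

The annotated form parses to a node type that does not exist in the 3.5 grammar, which is why the import fails with a SyntaxError rather than a runtime error.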
If an ingredient column is empty, it doesn't show up in the gemtable. There should be an option to have unused ingredients show up in the gemtable.
Example data source:
ID | ing 1 | ing 2 | ing 3 (this ingredient doesn't show up in gemtable) |
---|---|---|---|
type | Amount | Amount | Amount |
formulation 1 | 70 | 30 | |
formulation 2 | 50 | 50 | |
formulation 3 | 25 | 75 | |
update is defined for all collections (https://github.com/CitrineInformatics/citrine-python/blob/master/src/citrine/_rest/collection.py#L105), but it does not work for data concepts collections (https://github.com/CitrineInformatics/citrine-python/blob/master/src/citrine/resources/data_concepts.py#L215) because data objects do not have a .uid field. Instead, to update a data object, one can re-call register or use async_update.
update should be overridden for data concepts collections. The following are possible behaviors:
- register
- async_update, wait for it to finish, then get and return the updated data object
- register or async_update, depending on their intent

Currently, if you try to assign a str to an int field or a None to a non-optional field, you get an error message like:
ValueError: None is not one of valid types: <class 'str'>!
which forces you to navigate the stacktrace in order to figure out where it's coming from. It would be really helpful to include the field name and object type in that value error, e.g.:
ValueError: None is not one of valid types: <class 'str'> for MaterialRun.name!
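One way to get the owning class and field name into the message is to have the property record them when the class is created. The sketch below is a standalone illustration, not the library's serialization properties: `TypedField` and this `MaterialRun` are hypothetical stand-ins, and `__set_name__` requires Python 3.6 or later.

```python
class TypedField:
    """Hypothetical typed attribute that names itself in error messages."""

    def __init__(self, *valid_types):
        self.valid_types = valid_types
        self.owner_name = None
        self.field_name = None

    def __set_name__(self, owner, name):
        # Called by Python (3.6+) when the owning class body is evaluated.
        self.owner_name = owner.__name__
        self.field_name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.field_name)

    def __set__(self, obj, value):
        if not isinstance(value, self.valid_types):
            types = ", ".join(str(t) for t in self.valid_types)
            raise ValueError(
                "{!r} is not one of valid types: {} for {}.{}!".format(
                    value, types, self.owner_name, self.field_name
                )
            )
        obj.__dict__[self.field_name] = value


class MaterialRun:
    """Stand-in class with a single required string field."""

    name = TypedField(str)

    def __init__(self, name):
        self.name = name
```

With this in place, `MaterialRun(None)` fails with a message that ends in "for MaterialRun.name!", so the offending field is visible without reading the stacktrace.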
Current predictor report documentation: https://github.com/CitrineInformatics/citrine-python/blob/master/docs/source/workflows/predictor_reports.rst
The example report JSON only contains feature importances, which is out of date. It should contain a set of descriptors and a sequence of models. Here's an example pulled from development:
{
"models": [
{
"name": "GeneralLosslessModel_1559617749",
"type": "GeneralLosslessModel",
"inputs": [
"x",
"y",
"z"
],
"outputs": [
"x"
],
"display_name": "ExpressionRelation_-9162134",
"model_settings": [
{
"name": "Expression",
"value": "(x) <- (x * y * z)",
"children": []
}
],
"feature_importances": []
}
],
"descriptors": [
{
"units": "",
"category": "Real",
"lower_bound": -1.7976931348623157e+308,
"upper_bound": 1.7976931348623157e+308,
"descriptor_key": "x"
},
{
"units": "",
"category": "Real",
"lower_bound": -1.7976931348623157e+308,
"upper_bound": 1.7976931348623157e+308,
"descriptor_key": "y"
},
{
"units": "",
"category": "Real",
"lower_bound": -1.7976931348623157e+308,
"upper_bound": 1.7976931348623157e+308,
"descriptor_key": "z"
},
{
"units": "",
"category": "Real",
"lower_bound": 0,
"upper_bound": 100,
"descriptor_key": "x"
}
]
}
The first technical page describing citrine-python discusses access control in a place that distracts from the priority of describing how the pieces fit together. This can be minimized by linking to the access control section instead.
In the documentation which shows example code here:
https://github.com/CitrineInformatics/citrine-python/blob/master/docs/source/getting_started/code_examples.rst#create-a-linked-process-material-and-measurement
Condition, Parameter, and Property are imported this way:
from citrine.attributes.condition import Condition
from citrine.attributes.parameter import Parameter
from citrine.attributes.property import Property
but citrine.attributes has been deprecated and no longer exists. So running the example code results in
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-68-a74b6227c8ac> in <module>
----> 1 from citrine.attributes.condition import Condition
2 from citrine.attributes.parameter import Parameter
3 from citrine.attributes.property import Property
ModuleNotFoundError: No module named 'citrine.attributes'
If there is a project with, say, 12 items:
projects.list(page=2, per_page=20)
should return an empty list or None. Instead, it returns the last page that has items in it (in this case, page 1).
This issue does not affect datasets or data objects.
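The expected semantics can be pinned down with a small client-side sketch. This is only an illustration of what a page past the end should contain; `paginate` is a hypothetical helper and says nothing about how the server or the library implements listing.

```python
def paginate(items, page=1, per_page=20):
    """Return the requested 1-indexed page, or [] when it lies past the end."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    return list(items[start:start + per_page])


# Twelve stand-in items, as in the report above.
projects = ["project-{}".format(i) for i in range(12)]
first_page = paginate(projects, page=1, per_page=20)   # all twelve items
second_page = paginate(projects, page=2, per_page=20)  # nothing left to return
```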
File currently has minimal content. At a minimum it should include the required format for writing docstrings, a discussion of the linter requirements, and a note about the test coverage requirement.
Specifically the adaptive design capabilities
Right now, bad API requests only include the route in the stacktrace. If an error is encountered, a user has to go back and do something like:
try:
    dataset.process_runs.register(process_run)
except BadRequest as e:
    print(e.api_error)
to see the error. We should include the api_error in the exception message so that it shows up in a stacktrace.
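A minimal sketch of the idea, assuming the exception is constructed with the route and an optional error payload. The class below is a hypothetical stand-in that only borrows the BadRequest name and the api_error attribute from the snippet above; the real constructor signature may differ.

```python
class BadRequest(Exception):
    """Stand-in exception that folds the API error into its message."""

    def __init__(self, path, api_error=None):
        self.path = path
        self.api_error = api_error
        message = path
        if api_error is not None:
            # Append the server-provided detail so it appears in stacktraces.
            message = "{} ({})".format(path, api_error)
        super().__init__(message)


# Hypothetical route and error text, for illustration only.
error = BadRequest("/projects/123/process-runs", "name must not be empty")
```

Because the detail is part of the message passed to `Exception.__init__`, it shows up in an uncaught traceback and in `str(error)` without a try/except wrapper.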
cc: @lkubie @asantas93
All PolymorphicSerializable classes implement a get_type(data) class method that returns the underlying type given serialized data. In general, data has some type field with the name of the class, which is matched against a known list of types. Here are two different example implementations:
@classmethod
def get_type(cls, data) -> Type['Processor']:
    return {
        'Grid': GridProcessor,
        'Enumerated': EnumeratedProcessor
    }[data['config']['type']]

@classmethod
def get_type(cls, data) -> Type['Predictor']:
    type_dict = {
        "Simple": SimpleMLPredictor
    }
    typ = type_dict.get(data['config']['type'])
    if typ is not None:
        return typ
    else:
        raise ValueError(
            '{} is not a valid predictor type. '
            'Must be in {}.'.format(data['config']['type'], type_dict.keys())
        )
Both implementations have a "type dictionary" and return the value in the type dictionary corresponding to a key that is pulled from data. But the first implementation does not gracefully throw an exception if the type is not in the type dictionary, and neither catches the possibility that data could be malformed (e.g., if data['config']['type'] does not exist).
Moving some of this logic and error checking to the abstract PolymorphicSerializable class would lead to code deduplication and standardize exceptions across all implementations of PolymorphicSerializable.
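One possible shape for the shared logic is sketched below. It is a standalone illustration rather than the library's class: `PolymorphicBase`, `type_map`, and `type_path` are hypothetical names, and the processor classes are empty stand-ins. Subclasses only declare their type dictionary; the base class walks the path into data and raises a uniform ValueError for both malformed data and unknown types.

```python
class PolymorphicBase:
    """Hypothetical base class holding the shared get_type logic."""

    type_map = {}                   # subclasses declare: type name -> class
    type_path = ("config", "type")  # where the type name lives in the data

    @classmethod
    def get_type(cls, data):
        path = "".join("[{!r}]".format(key) for key in cls.type_path)
        node = data
        for key in cls.type_path:
            # Malformed data: a level is missing or is not a dictionary.
            if not isinstance(node, dict) or key not in node:
                raise ValueError(
                    "Malformed {} data: data{} does not exist".format(cls.__name__, path)
                )
            node = node[key]
        try:
            return cls.type_map[node]
        except (KeyError, TypeError):  # unknown or unhashable type name
            raise ValueError(
                "{!r} is not a valid {} type. Must be in {}.".format(
                    node, cls.__name__, sorted(cls.type_map)
                )
            )


class GridProcessor:
    """Empty stand-in for a concrete processor."""


class EnumeratedProcessor:
    """Empty stand-in for a concrete processor."""


class Processor(PolymorphicBase):
    type_map = {"Grid": GridProcessor, "Enumerated": EnumeratedProcessor}
```

A class-level dictionary keeps each hierarchy's list of types declarative, while the error handling lives in one place.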
When using Jupyter, the representation of the object shown in the output cell is its __repr__. By default, that only shows object members, which excludes soft links. This leads to the non-obvious behavior that repr(process_run) doesn't include ingredients and repr(material_run) doesn't include measurements.
We should override __repr__ in objects that include soft-links to show linked objects.
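A rough sketch of what such an override could look like, using simplified stand-in classes rather than the real data model. The assumption here is that a measurement holds the hard link to its material and the material keeps a back-reference list; linked objects are shown by name only, which avoids infinite recursion between the two representations.

```python
class MaterialRun:
    """Stand-in material whose measurements are soft links (back-references)."""

    def __init__(self, name):
        self.name = name
        self._measurements = []  # filled in when a measurement links here

    @property
    def measurements(self):
        return self._measurements

    def __repr__(self):
        # Show linked measurements by name to keep the output short.
        linked = [m.name for m in self._measurements]
        return "MaterialRun(name={!r}, measurements={!r})".format(self.name, linked)


class MeasurementRun:
    """Stand-in measurement that holds the hard link to its material."""

    def __init__(self, name, material=None):
        self.name = name
        self.material = material
        if material is not None:
            material._measurements.append(self)

    def __repr__(self):
        material = None if self.material is None else self.material.name
        return "MeasurementRun(name={!r}, material={!r})".format(self.name, material)


steel = MaterialRun("steel")
hardness = MeasurementRun("hardness", material=steel)
```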
h/t @sesevgen
The file src/citrine/informatics/predictors.py is a monolith that contains every predictor. Create a "predictors/" directory and put individual files inside of it. Import the classes into "predictors/__init__.py" so that code that imports predictors does not break.
Right now, the requirements for testing are included in test_requirements.txt but not in setup.py. This isn't a bug, per se, but it could make it difficult for a developer who is building on top of citrine-python to include test-scoped dependencies (because PyPI won't walk requirements files).
These test dependencies should be added with reasonably permissive version bounds, again so that a developer who is using citrine-python can flexibly test their own code.
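One common way to expose such dependencies is the `extras_require` keyword of setuptools. The sketch below only shows turning requirements-file text into a list suitable for that keyword; the package names and version bounds in `EXAMPLE` are placeholders, not the actual contents of test_requirements.txt, and the extra name "tests" is likewise an assumption.

```python
def parse_requirements(text):
    """Return requirement specifiers, skipping blank lines and comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


# Placeholder file contents, for illustration only.
EXAMPLE = """
# test tooling
pytest>=5.0
requests-mock>=1.7
"""

tests_require = parse_requirements(EXAMPLE)

# In setup.py this list would be passed along as:
#     setup(..., extras_require={"tests": tests_require})
```

A downstream project could then depend on the extra (for example `citrine[tests]`, under the assumed extra name) instead of copying the requirements file.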
Is there any way to add some kind of 'alpha' warning when using the new Ara API?
from @andyczerwonka in #189
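One option is a decorator built on the standard `warnings` module. The sketch below is an assumption about how this could be done, not something the library provides: the `alpha` decorator and the `build_table` function are hypothetical.

```python
import functools
import warnings


def alpha(feature):
    """Decorator that emits a warning each time an alpha feature is called."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                "{} is in alpha and may change without notice".format(feature),
                FutureWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@alpha("The Ara API")
def build_table(name):
    """Hypothetical entry point standing in for an Ara call."""
    return "table:{}".format(name)
```

FutureWarning is used because it is shown to end users by default, whereas DeprecationWarning is hidden in most contexts. Users who accept the risk can silence it with a standard warnings filter.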
IngredientSpec and IngredientRun have a "unique_label" field, and ProcessTemplate has an "allowed_unique_label" field. Per the most recent docs, those should be changed to "name" and "allowed_names".
This change must be done after CitrineInformatics/gemd-python#3
Many of our PolymorphicSerializable class hierarchies have shared members. It would be nice to bring these members (e.g. key descriptors) into the parent class to communicate that they are part of the interface and to deduplicate their definition. However, the magic in those properties causes an error to be thrown when this is attempted, e.g.
E assert <citrine.informatics.descriptors.InorganicDescriptor object at 0x7fe6b29af630> == <citrine.informatics.descriptors.InorganicDescriptor object at 0x7fe6945453c8>
Interesting, I get a completely different error when I try to do this in #163:
tests/ara/test_variables.py:9: in <module>
RootInfo("root name", ["Root", "Name"], "name"),
src/citrine/ara/variables.py:71: in __init__
self.short_name = short_name
src/citrine/_serialization/properties.py:114: in __set__
getattr(base_class, self.serialization_path).fset(obj, value_to_set)
E AttributeError: 'NoneType' object has no attribute 'fset'