Comments (5)
The original idea for `_type.contents` envisaged arbitrary datavalue complexity. Part of the driver for this was the perceived need to put all loop key datanames together into a single 'master key' dataname. This approach has been dropped, so almost all datanames have a much simpler structure. Datanames with heterogeneous values are unlikely to be common, as each component would need to be treated in a different way and so would more naturally be defined separately. So my initial response is that we no longer need these complex structures; indeed, the only place that a `_type.container` of `Multiple` occurs is in the definition of `_type.contents`, and the most complex `_type.contents` that we have are those that @vaitkus has identified.
At this point I think it would be worth describing a simple grammar that only includes points (1) and (2). I don't think it is worth developing a grammar that can describe arbitrarily complex data values until such data values actually appear, which I suspect is unlikely. For now we can simply reserve the relevant characters (&|*.).
It is correct to say that a `_type.container` value of `Multiple` implies that combinations of the items in the `_type.contents` list become possible, and that this violates the principle that interpretation of one dataname should not depend on the value of another. One way around this is to define a separate attribute, e.g. `_type.compound_contents`, which contains this information. In this case a `_type.container` of `Multiple` is replaced by `List` or `Single`. So we might have:
_type.container List
_type.contents Compound #New enumerated value
_type.compound_contents "List(Real,Integer,(Text,Integer),Text)" #The contents of each list element
How does that sound? The definition for `_type.compound_contents` would include the grammar that you have developed.
A lot of these `_type` attributes were developed to allow dREL methods to be transformed into typed languages, so that the transforming function would be able to emit the type of the function as well as the types of the parameters. Ideally we would make sure that we preserve this ability.
from cif_core.
I fully agree that things should not be made too complex in advance. The proposed `_type.compound_contents` seems quite flexible; I could easily modify the grammar to only include (1) and (2) for now.
By the way, the example that you provided (`List(Real,Integer,(Text,Integer),Text)`) does not fit the grammar, since any nested list must be prefixed with the `List` keyword (the correct version in this case should be `List(Real,Integer,List(Text,Integer),Text)`). Is this acceptable or should I modify the grammar in some way? Also, should I include the `Matrix` keyword in the grammar as well?
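To make the nesting rule concrete, here is a minimal Python sketch of a recursive-descent parser for this kind of value. It is not the grammar file discussed in this thread: the names `parse` and `CONTAINERS` are hypothetical, only `List` is treated as a container keyword, and any other alphabetic name is accepted as an element type.

```python
import re

# Assumption: names are alphabetic; the only punctuation is "(", ")" and ",".
TOKEN = re.compile(r"\s*([A-Za-z_]+|[(),])")
PUNCT = {"(", ")", ","}
CONTAINERS = {"List"}  # assumption: Array/Matrix could be added here later


def tokenize(text):
    """Split the value into names and punctuation, rejecting anything else."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def _parse_item(tokens, i):
    """Parse one element type or one keyword-prefixed container."""
    if i >= len(tokens) or tokens[i] in PUNCT:
        raise ValueError("expected a type name or a container keyword")
    name = tokens[i]
    i += 1
    if name not in CONTAINERS:
        return name, i  # plain element type such as Real or Text
    if i >= len(tokens) or tokens[i] != "(":
        raise ValueError(f"{name} must be followed by '('")
    i += 1
    children = []
    while True:
        child, i = _parse_item(tokens, i)
        children.append(child)
        if i < len(tokens) and tokens[i] == ",":
            i += 1
            continue
        break
    if i >= len(tokens) or tokens[i] != ")":
        raise ValueError(f"missing ')' to close {name}")
    return (name, children), i + 1


def parse(text):
    """Return a nested (container, children) tree for the whole value."""
    tokens = tokenize(text)
    tree, end = _parse_item(tokens, 0)
    if end != len(tokens):
        raise ValueError("unexpected trailing input")
    return tree


print(parse("List(Real,Integer,List(Text,Integer),Text)"))
```

Under these assumptions the keyword-prefixed form parses into a nested tree, whereas a bare parenthesised group such as `(Text,Integer)` raises a `ValueError`, because a `(` is only accepted directly after a container keyword.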
I am very happy that you are considering including the grammar in the description; however, I would really suggest defining a separate data item for that purpose. The data item could have its `_type.purpose` value set to `Encode`, thus explicitly stating that it can be automatically processed (i.e. for automatically generating a parser). Having a separate data item would also open the door to eventually providing machine-parsable descriptions for other data items as well.
from cif_core.
Yes, I made a mistake in the example; you are correct that `List` was missing. `Array` and `Matrix` are also container types, so they can be included, although it may be difficult to detect any difference between an `Array` and a `Matrix` purely by inspection of the structure or element type.
I am reluctant to define a separate data attribute for machine-readable grammars, mostly because very few dataitems in general should have structure. As I have explained above, any non-uniform structure within a datavalue means that datavalue could be decomposed into distinct parts. So I think we need to wait and see whether or not there is any need for descriptions of internal structure before defining a new data attribute.
from cif_core.
I have updated the grammar (type-contents-simplified.txt) in response to your recent comments. I also included additional restrictions on the `Matrix` and `Array` containers based on the `_type.container` description: these containers are not allowed to nest other containers and can only contain numerical data types. Please let me know if I should change that.
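The two restrictions could be checked along the following lines. This is only a sketch on an already-parsed tree of `(container, children)` tuples, a representation chosen here for illustration; the function name `check` is hypothetical, and the set of numeric types is an assumption rather than a list taken from the grammar file.

```python
NUMERIC = {"Integer", "Real", "Complex"}      # assumption: which types count as numerical
FLAT_NUMERIC_CONTAINERS = {"Matrix", "Array"}  # containers subject to both restrictions


def check(node):
    """Return a list of rule violations found in a (container, children) tree."""
    problems = []
    if isinstance(node, str):
        return problems  # a bare element type has nothing to check
    name, children = node
    for child in children:
        if name in FLAT_NUMERIC_CONTAINERS:
            if not isinstance(child, str):
                # Matrix/Array may not nest any other container.
                problems.append(f"{name} may not nest {child[0]}")
                continue
            if child not in NUMERIC:
                # Matrix/Array may hold numerical types only.
                problems.append(f"{name} may not contain {child}")
        problems.extend(check(child))
    return problems


print(check(("Matrix", ["Real", "Integer"])))
print(check(("List", ["Text", ("Array", [("List", ["Real"])])])))
```

Returning a list of messages rather than raising on the first violation makes it easier to report every problem in a dictionary definition in one pass.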
I agree that probably very few items will have an intricate internal structure, but even the simple ones would benefit from a formal grammar. For example, the `_enumeration.range` description (The inclusive range of values "from:to" allowed for the defined item) leaves some room for interpretation:
- Can a range be defined without the lower bound (e.g. ':10') or the upper bound (e.g. '0:')?
- Can symbol ranges (e.g. 'a:z') be defined as well as number ranges?
- Can non-continuous ranges be defined (e.g. '1:11,12:42'), etc.?
Ambiguities like these can be clarified by adding several more sentences and a few more examples to the description, and/or by a formal grammar (which would actually allow one to validate the provided examples). Of course, one can always add the explanatory grammar to the description, but the same is true of the value of any other data item. Having a separate data item designated just for this purpose seems like a more ontologically elegant solution, and it might even encourage developers of other DDLm dictionaries to include grammar definitions in their own dictionaries.
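To illustrate how a formal definition settles such questions, here is a Python sketch of one possible reading of "from:to". It does not represent what the dictionary actually specifies: in this reading either bound may be omitted, only numbers are allowed, and comma-separated ranges are rejected. The names `NUMBER`, `RANGE` and `parse_range` are hypothetical.

```python
import re

# Assumption: bounds are plain decimal numbers with an optional exponent.
NUMBER = r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?"
RANGE = re.compile(rf"(?P<low>{NUMBER})?:(?P<high>{NUMBER})?")


def parse_range(text):
    """Return (low, high) as floats, with None standing for an omitted bound."""
    match = RANGE.fullmatch(text)
    if match is None or (match["low"] is None and match["high"] is None):
        raise ValueError(f"not a valid range under this reading: {text!r}")
    low = float(match["low"]) if match["low"] is not None else None
    high = float(match["high"]) if match["high"] is not None else None
    if low is not None and high is not None and low > high:
        raise ValueError(f"lower bound exceeds upper bound: {text!r}")
    return low, high


for example in ["0:", ":10", "1:11", "a:z", "1:11,12:42"]:
    try:
        print(example, "->", parse_range(example))
    except ValueError as err:
        print(example, "-> rejected:", err)
```

With this reading the first three examples are accepted and the last two are rejected; a different answer to any of the questions above would simply be a different pattern, which is the point of writing the grammar down.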
In any case, I would be glad for the grammar definitions to be included in any form.
from cif_core.
As `_type.container` and `_type.contents` have been greatly simplified in recent revisions (i.e. the `Multiple` container type was removed), the majority of this discussion is probably no longer relevant. However, I think that the idea of introducing an attribute intended to store a formal grammar that describes all possible attribute values is still worth pursuing, so I might open a separate issue on this topic in the future.
from cif_core.