clld / clldclient

DEPRECATED. Do not use anymore!

License: Apache License 2.0

Python 100.00%

clldclient's Introduction

clld

The clld toolkit - a web framework for the publication of Cross-Linguistic Linked Data.

Documentation for the code base and its use is available at https://clld.readthedocs.io/en/latest/. The source for this documentation is in the docs directory.

How to cite

To cite the clld software, please cite the presentation introducing it

Forkel, R., & Bank, S. (2014, October 7). The clld toolkit. Language Comparison with Linguistic Databases: RefLex and Typological Databases, Nijmegen. Zenodo. https://doi.org/10.5281/zenodo.10846846

Usage

Once the initial steps (installation, bootstrapping a new project) are done with the help of the online documentation, the best resource to guide further development of a clld app is the wealth of existing apps. (Note: GitHub's "Used by" links, created from the dependency graph data, are really helpful here!) The following pointers are meant as a starting point for solving specific problems by perusing the code of other apps.

Integrating language metadata from Glottolog: There's a plugin for that and here's the list of apps on GitHub using it: https://github.com/clld/clld-glottologfamily-plugin/network/dependents
Displaying (data on) phylogenetic language trees: There's a plugin for that and here's the list of apps on GitHub using it: https://github.com/clld/clld-phylogeny-plugin/network/dependents
Displaying cognacy relations between words: There's a plugin for that and here's the list of apps on GitHub using it: https://github.com/clld/clld-cognacy-plugin/network/dependents
Displaying phoneme inventories as IPA charts: There's a plugin for that and here's the list of apps on GitHub using it: https://github.com/clld/clld-ipachart-plugin/network/dependents
Integrating audio recordings of lexical data: There's a plugin for that and here's the list of apps on GitHub using it: https://github.com/clld/clld-audio-plugin/network/dependents
Rendering CLDF Markdown in the context of the app: There's a plugin for that.
Aggregating data from multiple CLDF datasets: The app serving the Intercontinental Dictionary Series does this. Very simple per-dataset metadata of the form
```
{
  "id": "ids-cosgrovevoro",
  "repo": "https://github.com/intercontinental-dictionary-series/cosgrovevoro",
  "doi": "10.5281/zenodo.4280576",
  "order": 2
}
```
is read and used to populate the database, see https://github.com/clld/ids/blob/master/ids/scripts/initializedb.py#L38-L67
Aggregating data from different CLDF modules: While most clld apps are concerned with just one type of data (e.g. typological questionnaires as in WALS, or wordlists as in IDS), some have a different focus (e.g. TuLaR (Tupían Language Resources)). The TuLaR app aggregates data which is curated in several datasets, bundled under a Zenodo community, see https://github.com/tupian-language-resources/tular/blob/main/tular/scripts/initializedb.py
Using Charis SIL fonts: using SIL's Charis fonts on a clld page is simple. Here's an example https://ids.clld.org/valuesets/1-100-316
- Include the relevant style sheet (which will pull in the font resources): https://github.com/clld/ids/blob/b2884e06a53a0a3c7a0dc27955c314869d0a31aa/ids/templates/ids.mako#L10-L12
- Then assign the appropriate css class: https://github.com/clld/ids/blob/b2884e06a53a0a3c7a0dc27955c314869d0a31aa/ids/templates/unit/detail_html.mako#L6
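
The per-dataset metadata shown above for the IDS app could be collected along the following lines. This is only a minimal sketch, not the code of the linked initializedb.py script: it assumes one JSON file per dataset in a single directory, and the function name load_dataset_metadata is hypothetical.

```python
import json
from pathlib import Path


def load_dataset_metadata(directory):
    """Read every *.json file in `directory` and return the records sorted by "order".

    Assumption: each file holds one object with the keys shown above
    ("id", "repo", "doi", "order").
    """
    records = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as fp:
            records.append(json.load(fp))
    # Records without an explicit "order" are placed last.
    return sorted(records, key=lambda md: md.get("order", float("inf")))
```

The sorted records can then be iterated to populate the database in a stable order, one dataset at a time.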



clldclient's Issues

Glottolog client raises "AttributeError" when language is not in glottolog

Instead of a more meaningful error message, the error raised by glottolog.languoid when a language id cannot be found in glottolog is the following AttributeError traceback.

  File "p:\my documents\database\lexibank\lexibank\scripts\util.py", line 79, in import_dataset
    languoid = glottolog.languoid(lang_id)
  File "p:\my documents\database\clldclient\clldclient\glottolog.py", line 40, in languoid
    return self.resource(code, 'language')
  File "p:\my documents\database\clldclient\clldclient\database.py", line 224, in resource
    rtype = get_resource_type(res.content, URIRef(res.canonical_url))
AttributeError: 'NoneType' object has no attribute 'content'
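
A more meaningful error could be produced by checking the response before its attributes are accessed. The following is a sketch only: the exception class ResourceNotFound and the helper require_response are hypothetical names, not part of clldclient.

```python
class ResourceNotFound(LookupError):
    """Hypothetical exception for an identifier the remote database does not know."""


def require_response(res, identifier):
    """Return `res` unchanged, or raise a descriptive error if the lookup returned None."""
    if res is None:
        raise ResourceNotFound(
            "no resource found for identifier {0!r}".format(identifier))
    return res
```

Calling such a helper on the response before `res.content` is read would replace the AttributeError above with an error that names the missing identifier.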

Provide map of resource type names to app-specific names in cli

The command line interface should have an option to discover/list resource type names (to be used when downloading tables) and the app-specific names; e.g. parameter in WALS is feature.
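
One way to picture such a mapping is sketched below. Only the WALS pair (parameter is called feature) comes from the text above; the registry, the function names and the output format are assumptions made for illustration.

```python
# Hypothetical registry: generic clld resource type name -> app-specific name, per app.
# Only the WALS "parameter" -> "feature" pair is taken from the issue text.
APP_SPECIFIC_NAMES = {
    "wals.info": {"parameter": "feature"},
}


def app_specific_name(host, resource_type):
    """Return the app-specific name, falling back to the generic resource type name."""
    return APP_SPECIFIC_NAMES.get(host, {}).get(resource_type, resource_type)


def format_listing(host, resource_types):
    """Build the lines a hypothetical listing option could print."""
    return ["{0}\t{1}".format(rt, app_specific_name(host, rt)) for rt in resource_types]
```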

Dynamic tables must advertise the constraints they support

The command line interface to download tables must provide a way to list all constraints one may use with a specific table.
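
A possible shape for this is sketched below with argparse. The --list-constraints flag and the TABLE_CONSTRAINTS map are hypothetical; the only constraint named on this page is --parameter for the value table.

```python
import argparse

# Hypothetical table -> constraint names map. Only "value" with "parameter" is
# attested on this page (clld-download-table wals.info value --parameter 1A).
TABLE_CONSTRAINTS = {"value": ["parameter"]}


def build_parser():
    """Build a parser with a hypothetical flag for listing constraints."""
    parser = argparse.ArgumentParser(prog="clld-download-table")
    parser.add_argument("host")
    parser.add_argument("table")
    parser.add_argument(
        "--list-constraints", action="store_true",
        help="print the constraints the table supports and exit")
    return parser


def constraints_for(table):
    """Return the sorted constraint names known for `table` (empty if unknown)."""
    return sorted(TABLE_CONSTRAINTS.get(table, []))
```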

example gives error

I tried the bash example in the readme

clld-download-table wals.info value --parameter 1A | in2csv -f json | csvstat

It gives the following error:

Traceback (most recent call last):
  File "/usr/local/bin/clld-download-table", line 11, in <module>
    sys.exit(download_table())
  File "/usr/local/lib/python3.5/site-packages/clldclient/cli.py", line 40, in download_table
    **{name: getattr(args, name) for name in constraints if getattr(args, name)})
  File "/usr/local/lib/python3.5/site-packages/clldclient/database.py", line 251, in table
    return Table(rsc, self, strip_html=strip_html, **constraints)
  File "/usr/local/lib/python3.5/site-packages/clldclient/table.py", line 18, in __init__
    if name in self.client.dataset.resource_types:
  File "/usr/local/lib/python3.5/site-packages/clldclient/database.py", line 212, in dataset
    self._dataset = Dataset(self.get(), self, 'dataset')
  File "/usr/local/lib/python3.5/site-packages/clldclient/database.py", line 48, in __init__
    assert res.mimetype == 'application/rdf+xml'
  File "/usr/local/lib/python3.5/site-packages/clldclient/cache.py", line 67, in mimetype
    return self.content_type.split(';')[0].strip()
  File "/usr/local/lib/python3.5/site-packages/clldclient/cache.py", line 63, in content_type
    return self.headers['content-type']
KeyError: 'content-type'
Expecting value: line 1 column 1 (char 0)

The headers of a response must be a case-insensitive dict

cache.Response must use a case-insensitive dictionary for the headers, for example the CaseInsensitiveDict implementation in requests.structures.
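
To illustrate the idea without depending on requests, here is a minimal standard-library sketch of a case-insensitive header mapping. The class name CaseInsensitiveHeaders is hypothetical, and this is not how cache.Response is implemented; it only shows why a lookup such as headers['content-type'] would then succeed whatever capitalization the server used.

```python
from collections.abc import MutableMapping


class CaseInsensitiveHeaders(MutableMapping):
    """Mapping that ignores case on lookup but remembers each key's original spelling."""

    def __init__(self, data=None):
        self._store = {}  # lowercased key -> (original key, value)
        for key, value in dict(data or {}).items():
            self[key] = value

    def __setitem__(self, key, value):
        self._store[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        return (original for original, _ in self._store.values())

    def __len__(self):
        return len(self._store)


headers = CaseInsensitiveHeaders({"Content-Type": "application/rdf+xml; charset=utf-8"})
# Same expression as in the traceback above, now independent of header capitalization.
mimetype = headers["content-type"].split(";")[0].strip()
```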

support for batch download of data from datatables

It should be possible to access (i.e. download) data from all DataTables within clld apps (in all reasonable formats), including sort and filter criteria.

It is not clear, though, what the interface to this functionality should look like. One possibility: Pass a URL to a page within a clld app, get a list of DataTable instances on that page, then choose one of these. Sorting and filtering would also have to be done based on the column names that appear on the HTML page. While this looks a lot like screen scraping rather than a proper API, it probably is the most appropriate/convenient way to implement this, since the DataTables themselves are already heavily geared towards the human user and not an API client.
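
The first step of that approach, listing the tables found on a page, might look like the sketch below. It assumes, without confirmation from the clld code base, that each DataTable is rendered as a table element carrying an id attribute; fetching the page is left out, and the names TableIdCollector and list_tables are hypothetical.

```python
from html.parser import HTMLParser


class TableIdCollector(HTMLParser):
    """Collect the id attribute of every <table> element that has one."""

    def __init__(self):
        super().__init__()
        self.table_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            table_id = dict(attrs).get("id")
            if table_id:
                self.table_ids.append(table_id)


def list_tables(html):
    """Return the ids of all identified tables in an HTML string, in document order."""
    collector = TableIdCollector()
    collector.feed(html)
    return collector.table_ids
```

A client could present the returned ids to the user and then request the chosen table in the desired format.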

TypeError: a bytes-like object is required, not 'str'

Trying to use the lexibank code for my languages, I ran into a problem similar to #4. Just like @cysouw, I am using Python 3.5. I have adapted lexibank to work with Python 3 and clld 2 (changing print statements to function calls, replacing jsonread() calls with json.load(open()) calls, substituting clld.utils with clldutils, those kinds of things).

Now I have added a single, very minimal data entry referencing a Glottolog language to my data; my dummy entry in data.csv looks as follows.

"Value","Feature_ID","Language_ID"
"e","http://someconcept.db/e","http://glottolog.org/resource/languoid/id/tetu1245"

Running the initializedb.py script to add this to the database, clldclient dies with the following traceback.

ERROR
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "p:\my documents\database\clld_core\clld\scripts\util.py", line 234, in initializedb
    create(args)
  File "p:/My Documents/Database/lexibank/lexibank/scripts/initializedb.py", line 40, in main
    import_cldf(os.path.join(datadir, provider, 'cldf'), provider)
  File "p:\my documents\database\lexibank\lexibank\scripts\util.py", line 124, in import_cldf
    import_dataset(os.path.join(dirpath, fname), provider)
  File "p:\my documents\database\lexibank\lexibank\scripts\util.py", line 73, in import_dataset
    languoid = glottolog.languoid(row['Language_ID'])
  File "c:\Users\me\AppData\Local\Continuum\Anaconda3\lib\site-packages\clldclient-1.3.2-py3.5.egg\clldclient\glottolog.py", line 40, in languoid
    return self.resource(code, 'language')
  File "c:\Users\me\AppData\Local\Continuum\Anaconda3\lib\site-packages\clldclient-1.3.2-py3.5.egg\clldclient\database.py", line 226, in resource
    rtype = get_resource_type(res.content, URIRef(res.canonical_url))
  File "c:\Users\me\AppData\Local\Continuum\Anaconda3\lib\site-packages\rdflib-4.2.1-py3.5.egg\rdflib\term.py", line 206, in __new__
    if not _is_valid_uri(value):
  File "c:\Users\me\AppData\Local\Continuum\Anaconda3\lib\site-packages\rdflib-4.2.1-py3.5.egg\rdflib\term.py", line 77, in _is_valid_uri
    if c in uri: return False
TypeError: a bytes-like object is required, not 'str'

This behaviour is functionally identical no matter whether the entry in the cache exists or not.
When the cache entry does not exist, cache.Cache.add() executes the encoding explicitly, and when it does exist, the database returns the raw bytes objects.
Either I did something wrong while testing* – or it does not help to define the corresponding responses table columns as String(convert_unicode=True) either.

I would place the source of this error in database.Database.get(), because the types of attributes in database.Database.resource() are still str before that call, but that function reads raw bytes from the database without decoding.

I assume there are two ways to solve this issue:
Either add explicit decoding to Database.get(), or be cleverer than myself about database column types and set the column type logic in the responses table to some appropriate value of “Unicode in python 2 and String in python 3” and let sqlalchemy deal with the coding issues.

*) Which is quite likely: it took me a while to get a grasp of the program flow, and I may have forgotten to save before investigating that branch, or I may have hit a case of “cache is empty, encoding is done explicitly in add, not due to being read from the db” when trying to test it, or I missed something about how columns are defined, or something else. Going to retest.
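
The first option, explicit decoding, could be sketched as follows. The helper name to_text and the UTF-8 default are assumptions; where exactly such a call would belong (for example in Database.get(), as suggested above) is not verified here.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as str, decoding it first if it is a bytes object."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Bytes as read from the cache database become str; str values pass through unchanged.
url = to_text(b"http://glottolog.org/resource/languoid/id/tetu1245")
```

Applied to every value read back from the responses table, this would make cached and freshly fetched responses yield the same types.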

PHOIBLE client

A PHOIBLE-specific client should be available, allowing convenient access to the feature system associated with phonemes.