The World Atlas of Language Structures Online

How to cite

If you use these data please cite

  • the original source

    Dryer, Matthew S. & Haspelmath, Martin (eds.) 2013. The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at https://wals.info)

  • the derived dataset using the DOI of the particular released version you were using

Description

This dataset is licensed under a CC-BY-4.0 license

Available online at https://wals.info

CLDF Datasets

The following CLDF datasets are available in cldf:

Contributors: bibiko, mehmetumutmutlu, xrotwang

Issues

Reclassify Eudeve

Eudeve should be moved out of the Cahita genus and placed in a separate new genus called Opata-Eudeve. (This brings WALS in line with Glottolog in this respect.)

Fix chapter 98 and 99

In chapters 98 and 99 change

For only one language has it been claimed that all noun phrases have the tripartite system, namely Warrungu (Pama-Nyungan; Australia; Tasaku Tsunoda, p.c.).

to

Languages where all noun phrases have the tripartite system are at best exceedingly rare; for some possible cases, see Dixon (1994: 40–41).

Merge Juat and Nyungar

The datapoints for the WALS language Juat should be combined with Nyungar (of which it is a dialect), thus removing Juat from WALS as a separate language.

Update datapoint Ladakhi / Tea

Change datapoint Ladakhi / Tea to "Words derived from Sinitic cha".

Ladakhi has a grey dot, which indicates that its word for tea is not derived from Chinese /cha/. I don't know what data this is based on, or what kind of error or misunderstanding led to it. However, the standard word is [ʧa] (with or without low tone), corresponding to Classical Tibetan /ja/, which is usually thought to be derived from Chinese /cha/. (Lhasa) Tibetan, where the same word is found, accordingly has a red dot.

Reclassify Marind for 77A & 78A

Marind is currently classified in 77A as having only indirect evidentials; however, it has both direct and indirect distinctions (Drabbe 1955: 125-128).
These evidentials are also bound morphemes on the verb, so Marind should be classified as having such in 78A.
Drabbe 1955: 134 does not say anything about evidentials; the information is found on pages 125-128.

Split Adamawa

The genus Adamawa in the Niger-Congo family needs to be broken up into the following six genera:

  • Day Genus
    • Day
  • Bua Genus
    • Gula Iro
    • Lua
  • Kim Genus
    • Kosop
  • Mbumic Genus
    • Kuo (Lakka)
    • Mambai
    • Mbum
    • Mundang
    • Tupuri
  • Samba-Duru Genus
    • Doyayo
    • Samba Leko
    • Yag Dii
  • Mumuye-Yandang Genus
    • Mumuye

WALS languoid ID mappings

I don't know if this is useful, but I noticed these things in the WALS "language.csv" download file:

Name of huaa1248 in the language.csv download file is "Err:520", it should be "=|Hoan" or "Amkoe"

There were some languoid entries that lacked glottcode and/or iso code. I think these are the appropriate matchings in those cases:

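  • Arapesh (Abu): wals_code aab, iso_code aah, Glottocode abua1245
  • Ayomán: wals_code ayo, Glottocode ayom1234
  • Berber (Figuig): wals_code bfg, iso_code grr, Glottocode tasn1238
  • Johari: wals_code joh, iso_code rgk, Glottocode rang1266
  • Juat: wals_code jua, iso_code nys, Glottocode nyun1247
  • Kenyah (Uma' Lung): wals_code keu, iso_code ulu, Glottocode umal1238
  • Kriol (Fitzroy Crossing): wals_code kfc, iso_code rop, Glottocode krio1252 (ideally, the exact dialect code should be found)
  • Mongol (Khamnigan): wals_code mkh, Glottocode kham1281
  • Mixe (Ayutla): wals_code mxx, iso_code mxp, Glottocode tlah1239
  • Nahuatl (Huauchinango): wals_code nhu, iso_code ncj, Glottocode nort2957
  • Nahuatl (Milpa Alta): wals_code nmp, iso_code nhm, Glottocode more1259
  • Kualan: wals_code kua, iso_code sdm, Glottocode sema1269
  • Russian-Chinese Pidgin (Birobidjan): wals_code rcp, Glottocode kjac1234
  • Romani (Sepecides): wals_code rse, Glottocode balk1252
  • Tommo So: wals_code tms, Glottocode tomm1242
  • Tasmanian (Oyster Bay to Pitwater): wals_code toy, Glottocode brun1235 (a bit unsure)
  • Tupi: wals_code tup, iso_code pog, Glottocode poti1237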

I am unsure about these:

  • Lughat al-Isharat al-Lubnaniya: wals_code lgh. This is Lebanese Sign Language, which does not have a Glottocode of its own; maybe it should be mapped to Jordanian Sign Language?
  • Tasmanian: wals_code tsm. It is very unclear which Tasmanian language this should be mapped to.
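If these matchings are applied with a script rather than by hand, a minimal sketch could look like the following. It assumes the language table is a CSV file with columns `ID`, `Name`, `ISO639P3code` and `Glottocode` (the usual CLDF LanguageTable column names; the actual download file may use different headers), and it only fills cells that are currently empty, so existing codes are never overwritten. `MISSING_CODES` and both function names are hypothetical, and only a few rows from the list above are included.

```python
import csv

# A few rows from the list above: WALS code -> (ISO 639-3 code, Glottocode).
# None means no value was proposed for that column.
MISSING_CODES = {
    "aab": ("aah", "abua1245"),
    "ayo": (None, "ayom1234"),
    "jua": ("nys", "nyun1247"),
    "tms": (None, "tomm1242"),
}


def fill_missing_codes(rows, mapping=MISSING_CODES):
    """Fill empty ISO639P3code/Glottocode cells; never overwrite existing values."""
    filled = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are left untouched
        iso, glottocode = mapping.get(row["ID"], (None, None))
        if iso and not row.get("ISO639P3code"):
            row["ISO639P3code"] = iso
        if glottocode and not row.get("Glottocode"):
            row["Glottocode"] = glottocode
        filled.append(row)
    return filled


def read_languages(path):
    """Read the language table as a list of dicts (column names are assumed)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

The uncertain cases (lgh, tsm, toy) are better left out of such a mapping until they are resolved by hand.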

Merge Sikuani and Guahibo

Two of the current WALS languages are actually the same language (and not just dialects of the same language), namely Sikuani and Guahibo. These need to be merged into a single WALS language and let's follow Glottolog in calling the language Guahibo and moving the Sikuani data (and source) to Guahibo.

Toussian

The entry for Toussian [tou] is specifically Southern Toussian, which is the same as Win Toussian [tow], so WALS tou and tow need to be merged as Southern Toussian with WALS code tou and iso code wib.

Split Genus Sepik

The Middle Sepik genus within the Sepik family needs to be split up into two genera:

  • Nukuma genus: Kwoma
  • Ndu genus: the other five languages

Changes to classification of Marind

A couple of related changes to the genealogical classification of the set of languages that WALS currently classifies as the Marind family.

In the current version, this family contains two genera, South Bird’s Head and Marind Proper. Both of these need changing:

  1. South Bird’s Head is removed from this family and is a separate family on its own, with that name.

  2. Currently, South Bird’s Head is a single genus. The new South Bird’s Head family should contain two genera:

    • Inanwatan (to which Inanwatan belongs)
    • South Bird’s Head Proper (to which Arandai belongs)
  3. The four remaining languages in what is currently the Marind family all fall within what is called the Marind Proper genus. The family needs to be renamed Anim and the genus needs to be renamed Marind.

Split Gbaya Kara

The WALS entry for Gbaya Kara includes data for two separate languages and will need to be split.

The datapoints based on Tucker and Bryan (1966) are for a separate language, Gbaya (Southwest), ISO code [gso], which needs a new WALS page. We can use the new WALS code ‘gbs’ for this language.

The remaining datapoints, based on the other three sources, are correct, though it would be better to rename Gbaya Kara as Gbaya (Northwest) (keeping the same WALS code and ISO code).

Split family Kwomtari-Baibai

The current WALS has a Kwomtari-Baibai family with two genera, Fas and Kwomtari. These two genera should be treated as separate families, where Fas is renamed Baibai-Fas (as the name of both family and genus) and Kwomtari is the name of both family and genus.

Fix reference Cotterell-1964

The bibliographic information in WALS for Cotterell 1964 is incorrect (as is the copy of it in Glottolog). Instead of

Cotterell, F. P. 1964. Expansion processes in Amharic syntax. (Africa Language Studies, 5.) London: University of London School of Oriental and African Studies.

it should say

Cotterell, F. P. 1964. Expansion processes in Amharic syntax. African Language Studies 5: 1-16.

Split language Bira

It turns out that the data we have for Bira is actually for two separate languages (the two with the two ISO codes bip and brf), so this needs to be split into two languages, Bila and Bera.

We should keep the WALS code bia for Bila [bip], changing the name from Bira to Bila, and add a new language for Bera, for which we could use the WALS code brq.

The data points that use Kutsch Lojenga 2003 as the source should go with Bila, those that use Meinhof 1938/39 should go with Bera.
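The same "split by source" pattern recurs in several issues here (Gbaya Kara, Bira, Chinook (Upper), Yi). A minimal sketch of how it could be done on a value table, assuming each value row is a dict with `Language_ID` and `Source` (a list of reference keys). The function name and the reference keys in the usage note are hypothetical; rows whose sources point to different target languages are returned separately, since they need a manual decision.

```python
def reassign_by_source(values, language_id, source_to_language):
    """Re-point value rows of `language_id` to new languages based on their sources.

    Rows whose mapped sources agree on exactly one target language are moved
    there; rows with no mapped source keep their current language; rows whose
    sources point to different targets are left unchanged and also returned
    in `conflicts` for manual review.
    """
    moved, conflicts = [], []
    for row in values:
        if row["Language_ID"] == language_id:
            targets = {
                source_to_language[s] for s in row["Source"] if s in source_to_language
            }
            if len(targets) > 1:
                conflicts.append(row)
            elif targets:
                # Copy the row instead of mutating the caller's data.
                row = {**row, "Language_ID": targets.pop()}
        moved.append(row)
    return moved, conflicts
```

For Bira this would be called as `reassign_by_source(values, "bia", {"KutschLojenga-2003": "bia", "Meinhof-1938": "brq"})`, with whatever keys the two references actually have in the bibliography.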

Fix Jurchen

First, we need to move the datapoints for every feature value under Jurchen, except for the datapoint for 9A, to Nanai. The datapoint for 9A will remain the only datapoint for Jurchen, and the only reference for Jurchen will be the Pevnov reference.

Second, the reference

Hang, Yanchang and Xi, Hang and Shuyan, Ai. 1989. The Hezhen Language. Lin University Press.

should be changed to

Zhang, Xi and Tai, Shu-yen and Zhang, Yanchang, 1989. The Hezhen Language: Ho-che-yū. [Chʿang-chʿun shih]: Jilin University Press.

and this reference is the reference for the datapoints being moved from
Jurchen to Nanai.
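Moving all datapoints from one language to another except for a few features (as for Jurchen, or for nen and ntu below) could be sketched as follows. This assumes value rows are dicts with `ID`, `Language_ID`, `Parameter_ID` and `Source`, and that value IDs follow the `<feature>-<language>` pattern seen in `108A-bsq`; the function name, language codes and reference key in the usage note are hypothetical. The function refuses to overwrite a value the target language already has.

```python
def move_values(values, from_id, to_id, keep=(), new_source=None):
    """Move value rows from `from_id` to `to_id`, except features listed in `keep`.

    Moved rows get a new ID of the form '<feature>-<language>'. If `new_source`
    is given, it replaces the Source list of every moved row. Raises ValueError
    if the target language already has a value for a feature being moved.
    """
    existing = {r["Parameter_ID"] for r in values if r["Language_ID"] == to_id}
    result = []
    for row in values:
        if row["Language_ID"] == from_id and row["Parameter_ID"] not in keep:
            feature = row["Parameter_ID"]
            if feature in existing:
                raise ValueError(f"{to_id} already has a value for {feature}")
            row = {**row, "Language_ID": to_id, "ID": f"{feature}-{to_id}"}
            if new_source is not None:
                row["Source"] = list(new_source)
        result.append(row)
    return result
```

For Jurchen this would be something like `move_values(values, "jur", "nan", keep=("9A",), new_source=["Zhang-1989"])`, using the real WALS codes and reference key.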

Add reference for 108A-bsq

The following reference

Zuñiga, F. & Fernández, B. In press. Grammatical relations in Basque. In Balthasar Bickel & Alena Witzlack-Makarevich (eds.), Handbook of Grammatical Relations. Amsterdam: John Benjamins.

needs to be added as a source for the datapoint Basque / Antipassive Constructions.

Once the handbook is published, we will need to replace "In press" with the year.

Split Chinook (Upper)

It has been brought to my attention that the WALS entry for Chinook (Upper) conflates two separate languages which need to be separated in WALS.

The datapoints where Hymes (1955) is the source need to be removed from this WALS language and placed as a separate language called Kathlamet, for which we can use the WALS code ktm (there is no ISO code for this language, though the Glottocode is kath1253).

There is also the issue of location: Kathlamet needs its own location, and I would like to use a new location for Chinook (Upper) as well:

  • Kathlamet: 46.14 N, 123.4 W
  • Chinook (Upper): 45.62 N, 121.16 W

Fix Nenets varieties

WALS distinguishes the following two languages:

  • nen (Nenets)
  • ntu (Tundra Nenets)

Quite apart from the fact that ideally we should try to avoid having separate “lects” where one is a sub-variety of the other, it turns out that all the datapoints for nen (with a small exception noted below) are actually specifically datapoints for ntu. Hence all the datapoints for nen should be moved to ntu.

We should rename the page for nen as Nenets (Forest), the other variety of Nenets. (The two varieties are about as similar as Dutch and German are, even though they do not have distinct ISO codes.)

The only minor complication is that two chapters (70 and 71) cite two sources, one of which is a source for Tundra Nenets and the other a source for Forest Nenets. I suggest that these datapoints be “split” in the sense that the values will be recorded under both varieties. The source for ntu for chapter 70 will be Salminen 1998: 534, 536; the source for ntu for chapter 71 will be Salminen 1998: 534. The source for nen for chapter 70 should be Sammallahti 1974: 81-82, while the source for nen for chapter 71 should be Sammallahti 1974: 83. These will be the only datapoints for nen.

Re-classify Karkar-Yuri

WALS treats Karkar-Yuri as a language isolate, so Karkar-Yuri is also the name of both its family and its genus.

It has recently been recognized that Karkar-Yuri belongs to the Pauwasi family, though none of the other Pauwasi languages are in WALS.

The family of Karkar-Yuri should therefore be changed to Pauwasi and its genus to Eastern Pauwasi.

Reclassify Baining-Taulil

The family Baining-Taulil should be split into two families, based on the two genera (Baining, Taulil) and the names of the two new families should be the same as the name of the one genus in that family.

Update datapoints based on Xiong 1983

WALS has two datapoints for Eastern Mnong (http://wals.info/languoid/lect/wals_code_mge), but it turns out that this is in error. The source for these two datapoints

Xiong, Lang, Joua Xiong & Nao Leng Xiong. 1983. English-Mong-English dictionary = Phoo txhais lug Aakiv-Moob-Aakiv. Milwaukee, WI: Xiong Partnership.

is actually a source for Hmong Daw (http://wals.info/languoid/lect/wals_code_hmd), so the two datapoints, with source, should be moved to that language.

Split Yi data into Nuosu and Yi (Wuding-Luquan)

It turns out that the data and sources for Yi are actually for two completely different languages (both in the Burmese-Lolo genus, but in different branches). My intention has been not to add any new languages this time, but perhaps we should make an exception here since we are really splitting a language in two rather than adding a language.

One language is Nuosu. We will keep the WALS code yi for Nuosu. We need to add Yi as a name under "Other". Its location needs to be changed to 28° N, 103° E. Otherwise, the language information remains the same.

The other language is Yi (Wuding-Luquan), which is the new language, whose information is as follows:

  • Ethnologue name: Yi, Wuding-Luquan
  • ISO-code: ywq
  • Glottolog code: wudi1238
  • Other name: Nasu
  • WALS code: yiw
  • Location: 25° 30′ N, 102° 30′ E

The source for Nuosu is

Chen Shilin, Bian Shiming & Li Xiuqing. 1985. Yiyu jianzhi [A brief description of the Yi language].

while the source for Yi (Wuding-Luquan) is

Gao Huanian. 1958. Yiyu yufa yanjiu [A study on Yi grammar].

When I next submit revised data, I will submit the data for the two languages separately; in the current version, however, every item that gives Chen et al. as its source is for Nuosu and every item that gives Gao as its source is for Yi (Wuding-Luquan). I don't think you need to concern yourself with that, as long as Gao doesn't show up on the reference list for Nuosu.

Evidentiality in East Caucasian

The maps on evidentiality contain a few inaccuracies regarding the languages of the East Caucasian / Nakh-Daghestanian family.

Map 78A. Coding of Evidentiality

  • Lak is classified as having "No grammatical evidentiality", while it has both an indirect evidential tense form and clitics marking hearsay and inference (Friedman 2007), i.e. a "Mixed system".

  • Archi is classified as having a "Verbal affix or clitic", while it has an indirect evidential perfect and several derived unwitnessed tenses (Tatevosov 2001) (= "Part of the tense system"), as well as a reported speech clitic (-er), which can function as a quotative (marking an utterance as being a quote, often embedded under a verb of speech) or as a reportative, indicating that a statement is based on hearsay (Kibrik 1977: 231-232) (= "Verbal affix or clitic"). So in my opinion, it should be classified as having a "Mixed system" as well.

  • The inclusion of Batsbi as having a mixed system on the other hand is questionable. I cannot be sure what this classification is based on, since the dataset does not reference specific forms, but to my knowledge, Batsbi features only one type of marking, which can be interpreted as either "Part of the tense system" or "Verbal affix or clitic".

According to Holisky & Gagua (1994), who are also cited by the author, evidentiality in Batsbi is expressed with two distinct affixes (-lo and -no) marking unwitnessed events. Each affix attaches to specific tenses to form their unwitnessed counterparts. They could be construed as verbal affixes, since they are distinct affixes that contribute a specific meaning, rather than repurposed tense forms (such as perfects turned unwitnessed pasts, as in other related languages). At the same time, they are part of the tense system, and at least one of them (-no) plausibly originates from the development of the perfect tense.

(There is also a clitic used to mark quoted utterances (aino). Possibly, it can be used as a hearsay marker on occasion, similar to -er in Archi, but that remains unclear.)

Map 77A. Semantic distinctions of evidentiality

  • For some reason, Batsbi is classified as distinguishing "Direct and indirect" evidentiality, while there is no mention of direct marking in the cited source (Holisky & Gagua 1994).

References

Friedman, V.A. 2007. 'The expression of speaker subjectivity in Lak (Daghestan)'. In: Zlatka Guentchéva & Jon Landaburu (eds.) L’Énonciation médiatisée II, 351–376. Louvain/Paris/Dudley MA: Peeters.

Holisky, D.-A. and R. Gagua. 1994. 'Tsova-Tush (Batsbi)'. In: Riek Smeets (ed.) The indigenous languages of the Caucasus. Volume 4. Part 2, 147–212. Delmar NY: Caravan Books.

Kibrik, A.E. 1977. Opyt strukturnogo opisanija arčinskogo jazyka. Tom II Taksonomičeskaja grammatika [The structural description of Archi. Volume II Taxonomical grammar]. Moscow: Izdatel'stvo Moskovskogo Universiteta.

Tatevosov, S.G. 2001. 'From resultatives to evidentials: Multiple uses of the perfect in Nakh-Daghestanian languages'. Journal of Pragmatics 33(3). 443–464.

Errors in the chapter on Gender Distinctions in Independent Personal Pronouns: French, Italian

The database lists French and Italian as distinguishing gender in the 3rd person singular only, but they of course distinguish gender in the 3rd person plural as well. I checked the cited reference for French to see whether it adopted some unusual analysis, but it does not: it explains the complex, evolving nature of French pronouns from a gender system to an animacy+sex system, but it does not mention the plural pronouns at all, let alone say they behave differently.

Datapoint for feature 138A and language wals_code_lad

Ladakhi is listed with its word for tea not derived from Chinese /cha/. However, the standard word is [ʧa] (with or without low tone), corresponding to Classical Tibetan /ja/, which is usually thought to be derived from Chinese /cha/. (Lhasa) Tibetan, where the same word is found, is listed correctly.

Fula corrections

One of the sources for many datapoints for Fulfulde (Adamawa) is Arnott (1970). However, Arnott (1970) actually deals with Fulfulde (Nigerian), not Fulfulde (Adamawa). What needs to be done, at least in the long run, is to move these datapoints from Fulfulde (Adamawa) to Fulfulde (Nigerian). This is not straightforward, however, since many of the datapoints for Fulfulde (Adamawa) are based both on Arnott and on one or more sources that are correctly sources for Fulfulde (Adamawa). I suggest that you leave all the datapoints in my chapters that use Arnott as a source where they are now and wait until I next update my WALS chapters, which will “automatically” take care of this. For other people’s chapters, however, something will have to be done (though you might wait until I update my data). For other people’s chapters, these datapoints need to be moved to Fulfulde (Nigerian):

  • 36A The Associative Plural: Associative same as additive plural (Arnott 1970: 400; Labatut 1973: 62)
  • 49A Number of Cases: No morphological case-marking (Arnott 1970: 139-148, App. 3)
  • 50A Asymmetrical Case-Marking: No case-marking (Arnott 1970: 139-148, App. 3)
  • 58A Obligatory Possessive Inflection: Absent (Arnott 1970)
  • 58B Number of Possessive Nouns: None reported (Arnott 1970)
  • 59A Possessive Classification: Two classes (Arnott 1970)
  • 72A Imperative-Hortative Systems: Maximal system (Arnott 1970: 248-252, 300-302)
  • 74A Situational Possibility: Affixes on verbs (Arnott 1970: 300)
  • 75A Epistemic Possibility: Affixes on verbs (Arnott 1970: 274f.)
  • 76A Overlap between Situational and Epistemic Modal Marking: Overlap for either possibility or necessity (Arnott 1970: 302-304)
  • 100A Alignment of Verbal Person Marking: Accusative (Arnott 1970: 183, 212)
  • 102A Verbal Person Marking: Both the A and P arguments (Arnott 1970: 212)
  • 103A Third Person Zero of Verbal Person Marking: No zero realization (Arnott 1970: 212)
  • 104A Order of Person Markers on the Verb: P precedes A (Arnott 1970: 212)
  • 107A Passive Constructions: Present (Arnott 1970: 179)
  • 125A Purpose Clauses: Deranked (Arnott 1970: 380)
  • 126A 'When' Clauses: Balanced/deranked (Arnott 1970: 38, 320-1, 326)
  • 127A Reason Clauses: Balanced (Arnott 1970: 38)
  • 136A M-T Pronouns: M-T pronouns, paradigmatic (Arnott 1970)
  • 136B M in First Person Singular: m in first person singular (Arnott 1970)
  • 137A N-M Pronouns: No N-M pronouns (Arnott 1970)
  • 137B M in Second Person Singular: m in second person singular (Arnott 1970)

There is one complication: chapter 36 cites Arnott 1970: 400; Labatut 1973: 62, and while Arnott is a source for Fulfulde (Nigerian), Labatut is a source for Fulfulde (Adamawa). I suggest that this datapoint be moved to Fulfulde (Nigerian) and that Labatut simply be removed as a source.

Separately, the WALS entry for Fulfulde (Nigerian) (= Fula (Nigerian)) at http://wals.info/languoid/lect/wals_code_fni lists values for features 95A, 96A, and 97A. This is necessarily erroneous, since these three features are derived from other features and this language is not coded for those other features. (However, once the Arnott data is moved to Fulfulde (Nigerian), there WILL be data for these.) This raises the question of whether there might be other errors of this sort. If it is not too difficult, could you write a script to check whether any other languages have values for one or more of these three features but no value for one or both of the features that feature is based on:

  • 95A is based on 83A and 85A
  • 96A is based on 83A and 90A
  • 97A is based on 83A and 87A
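A minimal sketch of such a check, assuming the values are available as a CSV file with `Language_ID` and `Parameter_ID` columns (the CLDF ValueTable names; the path `cldf/values.csv` and the feature IDs being stored as strings like `95A` are assumptions about the dataset layout). The dependency table is taken directly from the list above; the function names are hypothetical.

```python
import csv
from collections import defaultdict

# Derived feature -> the features it is based on (from the list above).
DEPENDENCIES = {
    "95A": ("83A", "85A"),
    "96A": ("83A", "90A"),
    "97A": ("83A", "87A"),
}


def load_values(path="cldf/values.csv"):
    """Read (Language_ID, Parameter_ID) pairs from a values CSV (path and columns assumed)."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["Language_ID"], row["Parameter_ID"]) for row in csv.DictReader(f)]


def find_orphaned_values(pairs, dependencies=DEPENDENCIES):
    """Return {language: [(derived_feature, [missing base features]), ...]}.

    A language is reported when it has a value for a derived feature but lacks
    a value for at least one of the features that derived feature is based on.
    """
    coded = defaultdict(set)
    for language, parameter in pairs:
        coded[language].add(parameter)
    problems = defaultdict(list)
    for language in sorted(coded):
        features = coded[language]
        for derived, bases in sorted(dependencies.items()):
            if derived in features:
                missing = [b for b in bases if b not in features]
                if missing:
                    problems[language].append((derived, missing))
    return dict(problems)
```

Running `find_orphaned_values(load_values())` on the full dataset would list every language in the same situation as fni, together with the missing base features.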

Re-classify Jabutí

WALS has a language family Jabutí, consisting of one genus of the same name, which in turn consists of a single language of the same name.

This genus should be treated as a genus within the Macro-Ge family, rather than as a separate family.

Split genus Northern Atlantic

The genus Northern Atlantic should be split up into the following six genera:

  • Bak: Balanta, Diola-Fogny, Diola-Kasa, Manjaku, Mankanya
  • Tenda: Basari, Konyagi
  • Biafada: Biafada
  • Peul-Serer: Fula (Burkina Faso), Fula (Cameroonian), Fula (Guinean), Fula (Mauritanian), Fula (Nigerian), Fulani (Gombe), Fula (Senegal), Fulfulde (Maasina), Ful (Liptako)
  • Cangin: Ndut, Noon, Palor
  • Wolof: Wolof

Basque interrogatives

I believe that Basque has the wrong value ("Interrogative phrases not obligatorily initial") for Feature 93A "Position of Interrogative Phrases in Content Questions". Basque interrogative phrases, whether of the "no-" type (nor, nori, nork, non, nondik, nora, etc.) or of the "ze-" type (zer, zein, etc.), as well as phrases containing them, systematically appear at the beginning of the sentence:

Patxik liburua erosi du
Patxi-ERG book-DET-ABS bought has-it
'Patxi has bought a book.'

Zer erosi du Patxik?
what-ABS bought has-it Patxi-ERG
'What has Patxi bought?'

This seems to be the general behaviour, although some dialects that do not behave this way have been described: https://muse.jhu.edu/article/539650/pdf

Error in reference Shafeev-1964

There is an error in http://wals.info/refdb/record/Shafeev-1964

Shafeev, D. A. 1964. A Short Grammatical Outline of Pashto. (Indiana Research Center in Anthropology, Folklore and Linguistics Publications, 33.) In Paper, Herbert H. (ed.) Bloomington: Indiana University.

should say

Shafeev, D. A. 1964. A Short Grammatical Outline of Pashto. (Indiana Research Center in Anthropology, Folklore and Linguistics Publications, 33.) Bloomington: Indiana University.

Missing source in chapter 20

I notice that chapter 20 in the online WALS (http://wals.info/chapter/20) is missing a reference in “See … for careful discussion” in the following:

Conflicting evidence is found, for example, in Lakhota (Siouan; North and South Dakota), where the future tense marker -kta is part of the same phonological word as the verb stem with regard to morphophonological rules, but not apparently with regard to syllabification (see for careful discussion).
I consulted the author, who says that the hard copy reads "See Russell 1999 for careful discussion."

where the reference in the bibliography is:

Russell, Kevin. 1999. The "word" in two polysynthetic languages. In T. Alan Hall and Ursula Kleinhenz (eds.), Studies on the Phonological Word, 203-221. Amsterdam: Benjamins.

I suspect this happened because this reference is not the source for any datapoint.

Update datapoint Haida / Alignment of Case Marking of Pronouns

The datapoint for Haida for feature 99A must be changed from Neutral to Active-Inactive. This decreases the number of Neutral languages from 79 to 78 and increases the number of Active-Inactive languages from 3 to 4.

In addition, two example sentences need to be replaced, as follows:

daa-hl@ gyaaxa
you.SG.AGT-IMP stand
‘you stand up!’

dang-gw@ q’ud-uus?
you.SG.PAT-Q be.hungry-BIASED
‘you’re hungry, aren’t you?’

Source: Enrico 2003, p. 121, 137
