acqdiv-database (deprecated)

This repository contains the 2018 public-facing ACQDIV database that was used in our paper on frequent frames:

Moran, Steven, Damián E. Blasi, Robert Schikowski, Aylin C. Küntay, Barbara Pfeiler, Shanley Allen and Sabine Stoll. 2018. A universal cue for grammatical categories in the input to children: frequent frames. Cognition, 175, 131–140. DOI: https://doi.org/10.1016/j.cognition.2018.02.005.

Please find the latest public-facing ACQDIV database now archived in Zenodo: http://doi.org/10.5281/zenodo.3558641.

Below is the overview of the 2018 (deprecated) version.

Overview

This repository hosts the public-facing ACQDIV database. Currently, it includes longitudinal child language acquisition corpora in the ACQDIV database format for:

  • Indonesian (Gil & Tadmor, 2007)
  • Japanese MiiPro (Miyata & Nisisawa 2009, Nisisawa & Miyata 2009, Miyata & Nisisawa 2010, Nisisawa & Miyata 2010, Miyata 2012)
  • Japanese Miyata (Miyata 2004a,b,c, 2012)
  • Sesotho (Demuth 1992, 2015)

If you use our database or additional annotations in your research, please cite it as:

Moran, Steven, Robert Schikowski, Danica Pajović, Cazim Hysi and Sabine Stoll. 2016. The ACQDIV Database: Min(d)ing the Ambient Language. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 4423–4429. May 23-28, Portorož, Slovenia. Online: http://www.lrec-conf.org/proceedings/lrec2016/pdf/1198_Paper.pdf.

These corpora are available in various original input formats (e.g. CHAT, CHAT-XML) via the Child Language Data Exchange System (CHILDES) component of the TalkBank system and are made openly available under the Creative Commons license BY-NC-SA 3.0.

We have converted the original data formats into easily accessible tables and we have enriched the annotation data, in particular at the morpheme level. We also provide a linguistically-informed subset of grammatical classes for cross-linguistic research. For detailed information about the corpora and the data structures that we use, see the ACQDIV corpus manual.

The ground rules for using corpora from TalkBank and CHILDES are stipulated at the link below. Individual corpora used in research should be cited accordingly:

https://talkbank.org/share/rules.html

The ACQDIV database also contains privately owned longitudinal child language acquisition corpora from languages that were selected from five clusters calculated via maximum diversity sampling (Stoll & Bickel, 2013) to achieve a typologically maximally diverse language sample:

  • Chintang (Stoll et al. 2015)
  • Cree (Brittain 2015)
  • Inuktitut (Allen Unpublished)
  • Russian (Stoll & Meyer 2008)
  • Turkish (Küntay et al. Unpublished)
  • Yucatec (Pfeiler Unpublished)

Access to these corpora is restricted by the project's Terms of Agreement. Contact Prof. Sabine Stoll for more information.

The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 615988 (PI Sabine Stoll).

References

Allen, Shanley. Unpublished. Allen Inuktitut Child Language Corpus.

Brittain, Julie. 2015. Corpus of the Chisasibi Child Language Acquisition Study (CCLAS). http://childes.psy.cmu.edu/.

Demuth, Katherine. 2015. Demuth Sesotho Corpus. http://childes.psy.cmu.edu/.

Demuth, Katherine. 1992. Acquisition of Sesotho. In Dan Slobin (ed.), The Cross-Linguistic Study of Language Acquisition, vol. 3, 557-638. Hillsdale, N.J.: Lawrence Erlbaum Associates.

Gil, David & Uri Tadmor. 2007. The MPI-EVA Jakarta Child Language Database. A joint project of the Department of Linguistics, Max Planck Institute for Evolutionary Anthropology and the Center for Language and Culture Studies, Atma Jaya Catholic University. https://jakarta.shh.mpg.de/acquisition.php.

Küntay, Aylin Copty, Dilara Koçbaş & Süleyman Sabri Taşçı. Unpublished. Koç University Longitudinal Language Development Database on language acquisition of 8 children from 8 to 36 months of age.

Miyata, Susanne. 2004a. Aki Corpus. Pittsburgh, PA: TalkBank. ISBN 1-59642-055-3.

Miyata, Susanne. 2004b. Ryo Corpus. Pittsburgh, PA: TalkBank. ISBN 1-59642-056-1.

Miyata, Susanne. 2004c. Tai Corpus. Pittsburgh, PA: TalkBank. ISBN 1-59642-057-X.

Miyata, Susanne. 2012. Japanese CHILDES: The 2012 CHILDES manual for Japanese.

Miyata, Susanne & Hiro Yuki Nisisawa. 2009. MiiPro – Asato Corpus. Pittsburgh, PA: TalkBank.

Miyata, Susanne & Hiro Yuki Nisisawa. 2010. MiiPro – Tomito Corpus. Pittsburgh, PA: TalkBank.

Nisisawa, Hiro Yuki & Susanne Miyata. 2009. MiiPro – Nanami Corpus. Pittsburgh, PA: TalkBank.

Nisisawa, Hiro Yuki & Susanne Miyata. 2010. MiiPro – ArikaM Corpus. Pittsburgh, PA: TalkBank.

Pfeiler, Barbara. Unpublished. Pfeiler Yucatec Child Language Corpus.

Stoll, Sabine & Balthasar Bickel. 2013. Capturing diversity in language acquisition research. In Language Typology and Historical Contingency: In Honor of Johanna Nichols, 195–216. Amsterdam: John Benjamins.

Stoll, Sabine & Roland Meyer. 2008. Audio-visual longitudinal corpus on the acquisition of Russian by 5 children.

Stoll, Sabine, Elena Lieven, Goma Banjade, Toya Nath Bhatta, Martin Gaenszle, Netra P. Paudyal, Manoj Rai, Novel Kishor Rai, Ichchha P. Rai, Taras Zakharko, Robert Schikowski & Balthasar Bickel. 2015. Audiovisual corpus on the acquisition of Chintang by six children.
