
sunoikisis2019zg-eklogai

SunoikisisDC 2019, University of Zagreb: From Annotated Text to Vocabulary Exercises

Authors: Neven Jovanović, Petar Soldo, Department of Classical Philology, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia

A Sunoikisis Digital Classics Session, Summer 2019

DOI: Zenodo record 3244012

Synopsis

Demonstrate how to use BaseX and XQuery to produce Anki spaced repetition vocabulary exercises from a set of morphologically annotated and lemmatized short texts in Greek.

Concentrate on recurring words and on words that are very frequent in Greek (according to the Dickinson College Core Vocabulary list).

Produce three types of exercises:

  1. from the form to the lemma
  2. from the form to the grammatical description
  3. from words in the text to entries in the DC Greek Core vocabulary list (Croatian version, converted to XML)

How to use

The Greek texts, annotated in Arethusa (on Perseids), are in the data directory.

The Croatian translation of the Greek and Latin DC Core lists, converted to XML with some additional fields, is in the grclatcore directory.

The BaseX scripts are in the scripts directory.

Activities

  1. Create the main database sunGreek with linguistically annotated Greek texts: createDbGreek.xq
  2. Create the BaseX database grclatcore from the DCC Greek list (with Croatian translations): createDbGrcLatCore.xq
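The two databases are built in BaseX by the XQuery scripts above. For readers who want to inspect the same annotated texts outside BaseX, here is a minimal Python sketch of the loading step. It is not createDbGreek.xq; it assumes Arethusa/AGLDT-style treebank XML in which every token is a <word> element with form, lemma and postag attributes, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip a '{namespace}' prefix, so files with or without a namespace both work."""
    return tag.rsplit("}", 1)[-1]


def tokens_from_xml(text_id, xml_source):
    """Return (text_id, form, lemma, postag) tuples for every <word> element.

    Assumes Arethusa/AGLDT-style treebank XML (an assumption, not taken
    from the project's scripts).
    """
    root = ET.fromstring(xml_source)
    return [
        (text_id, el.get("form", ""), el.get("lemma", ""), el.get("postag", ""))
        for el in root.iter()
        if local_name(el.tag) == "word"
    ]


def load_collection(data_dir):
    """Read every *.xml file in data_dir (e.g. the data directory) into one token list."""
    tokens = []
    for path in sorted(Path(data_dir).glob("*.xml")):
        # Bytes let the parser honour the file's own encoding declaration.
        tokens.extend(tokens_from_xml(path.stem, path.read_bytes()))
    return tokens
```

Each token keeps the file name as its text id, so later steps can ask in which texts a lemma occurs.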

Analyze the collection

  1. For a given lemma, get a list of forms and POS tags in the collection: forLemmaGetFormPOStag.xq
  2. Create a list of lemmata: findLemma.xq
  3. Create a list of lemmata, ordered by frequency: findLemmaFrequency.xq
  4. Narrow the list to lemmata whose forms occur at least twice (excluding punctuation): findLemmaFrequencyTwoPlus.xq
  5. Explore frequencies of linguistic annotations: getFrequenciesAttributes.xq (lemma, form, postag)
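The frequency filter of step 4 can be sketched in Python as follows. This is an illustration, not the XQuery script: it assumes tokens are (form, lemma, postag) triples and that, as in the AGLDT tagset, a postag whose first character is "u" marks punctuation.

```python
from collections import Counter


def lemma_frequencies(tokens, min_count=2):
    """Return (lemma, count) pairs with count >= min_count, most frequent first.

    tokens: iterable of (form, lemma, postag) triples.
    Punctuation is skipped, assuming the AGLDT convention that its postag starts with 'u'.
    """
    counts = Counter(lemma for _form, lemma, postag in tokens
                     if not postag.startswith("u"))
    # most_common() sorts stably, so equally frequent lemmata keep first-seen order.
    return [(lemma, n) for lemma, n in counts.most_common() if n >= min_count]
```

Raising min_count gives the stricter lists; min_count=1 reproduces the full frequency list of step 3.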

From repeated lemmata to Anki exercises

  1. For lemmata with frequency f >= 2, get a list of the forms occurring in the texts: fromLemmaToForms.xq
  2. For a pair of form and lemma, produce an Anki exercise: fromLemmaToAnki.xq
  3. Narrow to a specific number of occurrences: fromLemmaToAnkiNarrowNumber.xq
  4. Narrow to specific types of words (e.g. just inflected words: nouns, verbs, adjectives, pronouns): fromLemmaToAnkiNarrowMorphology.xq

A list of the codes and attributes used for Greek in Arethusa is helpful here.
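The form-to-lemma pipeline can be sketched in Python. It is a sketch under assumptions, not the project's XQuery: tokens are (form, lemma, postag) triples, inflected words are recognised by the first postag character (n, v, a, p in the AGLDT tagset), and the tag "grlemma01" is a hypothetical Anki tag.

```python
from collections import defaultdict

# First postag character for noun, verb, adjective, pronoun (AGLDT codes; an assumption).
INFLECTED = {"n", "v", "a", "p"}


def lemma_exercises(tokens, min_count=2, tag="grlemma01"):
    """Build 'form ; lemma ; tag' lines for inflected lemmata occurring >= min_count times."""
    forms = defaultdict(list)
    for form, lemma, postag in tokens:
        if postag[:1] in INFLECTED:          # narrow by morphology
            forms[lemma].append(form)
    lines = []
    for lemma, occurrences in forms.items():
        if len(occurrences) >= min_count:    # narrow by number of occurrences
            for form in dict.fromkeys(occurrences):  # unique forms, first-seen order
                lines.append(f"{form} ; {lemma} ; {tag}")
    return lines
```

Each occurring form becomes its own card, so a lemma attested in several forms is practised several times.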

From POS tags to Anki exercises

  1. Create a list of morphological descriptions (parts of speech, POS tags): findPOStag.xq
  2. Get frequency of morphological configurations: findPOStagFrequency.xq
  3. Select only the POS tags of inflected forms, then keep the frequent configurations (e.g. where f >= 14): findPOStagInflectedFrequency.xq
  4. For a set of POS tags, get forms, lemma, POS: retrievePOS.xq
  5. Produce Anki exercises asking for the lemma and morphological description of a given form: retrievePOSmapToWords.xq (with Arethusa / Alpheios morphological codes expanded)
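Steps 3 to 5 can be sketched in Python: count the inflected POS tags, keep those with f >= 14, and expand each tag into words. The code table below is a partial rendering of the 9-position AGLDT postag (part of speech, person, number, tense, mood, voice, gender, case, degree) and should be checked against the Arethusa tagset; the function names and the tag "grmorf02" are hypothetical.

```python
from collections import Counter

# Partial AGLDT postag code table, one dict per position (an assumption to verify).
CODES = [
    {"n": "noun", "v": "verb", "a": "adjective", "p": "pronoun"},
    {"1": "1st person", "2": "2nd person", "3": "3rd person"},
    {"s": "singular", "p": "plural", "d": "dual"},
    {"p": "present", "i": "imperfect", "f": "future", "a": "aorist",
     "r": "perfect", "l": "pluperfect"},
    {"i": "indicative", "s": "subjunctive", "o": "optative",
     "m": "imperative", "n": "infinitive", "p": "participle"},
    {"a": "active", "m": "middle", "p": "passive", "e": "medio-passive"},
    {"m": "masculine", "f": "feminine", "n": "neuter"},
    {"n": "nominative", "g": "genitive", "d": "dative", "a": "accusative", "v": "vocative"},
    {"c": "comparative", "s": "superlative"},
]


def describe(postag):
    """Expand a 9-character postag into words, skipping '-' and unknown codes."""
    return " ".join(CODES[i][c] for i, c in enumerate(postag[:9]) if c in CODES[i])


def pos_exercises(tokens, min_count=14, tag="grmorf02"):
    """'form ; lemma, description ; tag' lines for frequent inflected postags."""
    freq = Counter(postag for _, _, postag in tokens if postag[:1] in CODES[0])
    frequent = {t for t, n in freq.items() if n >= min_count}
    seen, lines = set(), []
    for form, lemma, postag in tokens:
        if postag in frequent and (form, postag) not in seen:
            seen.add((form, postag))  # one card per form/tag pair
            lines.append(f"{form} ; {lemma}, {describe(postag)} ; {tag}")
    return lines
```

The answer side combines lemma and description, matching the exercise type that asks for both.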

From one text to vocabulary recurring in other texts

  1. Get vocabulary of one text: vocabularyOneText.xq
  2. Find lemmata recurring in other texts: vocabularyRepeatedInOtherTexts.xq
  3. Prepare Anki exercises for such lemmata: vocabularyRepeatedInOtherTexts.xq
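Finding recurring vocabulary is a set intersection between one text and each of the others. A minimal Python sketch, assuming a hypothetical in-memory dict from text id to lemmata in place of the sunGreek database:

```python
def recurring_lemmata(texts, target):
    """Map each lemma of texts[target] to the other texts in which it also occurs.

    texts: dict of text id -> list of lemmata (hypothetical stand-in for sunGreek).
    Lemmata found only in the target text are left out.
    """
    vocab = {tid: set(lemmata) for tid, lemmata in texts.items()}
    result = {}
    for lemma in sorted(vocab[target]):
        others = sorted(tid for tid, words in vocab.items()
                        if tid != target and lemma in words)
        if others:
            result[lemma] = others
    return result
```

The keys of the result are the candidate lemmata for the Anki exercises; the values show where a learner will meet each word again.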

From vocabulary to the DCC Greek list

  1. Find all DCC lemmata occurring in our texts: findWordsInDCCore.xq
  2. Produce a set of Anki exercises for these lemmata: DCCoreToAnki.xq
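Matching our lemmata against the DCC list can be sketched in Python. The sketch assumes the core list is available as (headword, gloss) pairs such as ("αὐτός αὐτή αὐτό", "on, isti"), that the first word of a headword is the lemma to match, and uses a hypothetical tag "grcore01". Both sides are normalised to Unicode NFC, which maps Greek accents typed with oxia (e.g. U+1F77) to the equivalent tonos characters, so differently encoded but identical lemmata still match.

```python
import unicodedata


def nfc(s):
    """Trim and NFC-normalise a string so equivalent Greek accents compare equal."""
    return unicodedata.normalize("NFC", s.strip())


def core_exercises(lemmata, core_entries, tag="grcore01"):
    """Anki lines for core-list entries whose first headword word is one of our lemmata.

    lemmata: iterable of lemmata from the annotated texts.
    core_entries: (headword, gloss) pairs; matching on the first word is an assumption.
    """
    ours = {nfc(lemma) for lemma in lemmata}
    return [f"{headword} ; {gloss} ; {tag}"
            for headword, gloss in core_entries
            if headword.split() and nfc(headword.split()[0]) in ours]
```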

Anki

About the program: the Anki User Manual

Format of exercises to be imported into Anki (field names are not necessary; the "tag" field can be omitted):


question ; answer ; tag
αὐτός αὐτή αὐτό ; on, isti ; grmorf01
καί ; i ; grmorf01
δέ ; a ; grmorf01
οὗτος αὕτη τοῦτο ; ovaj ; grmorf01

The results of the BaseX scripts (...ToAnki) can be saved as text files (the extension is not important), edited in a text editor (recommended, though only for pedagogical reasons: to select what we want to teach and learn), and then imported into the Anki database (File / Import).
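Because the fields are separated by semicolons, a semicolon inside a question or answer would shift the fields on import. A small Python sketch that writes exercises in the format shown above and rejects such rows; the function names are hypothetical.

```python
from pathlib import Path


def anki_text(rows):
    """Join (question, answer[, tag]) tuples into import text, one note per line."""
    out = []
    for row in rows:
        if len(row) not in (2, 3):
            raise ValueError(f"expected 2 or 3 fields, got {row!r}")
        if any(";" in field or "\n" in field for field in row):
            raise ValueError(f"separator or newline inside a field: {row!r}")
        out.append(" ; ".join(field.strip() for field in row))
    return "\n".join(out) + "\n"


def write_anki_file(rows, path):
    """Save the exercises as a UTF-8 text file, ready for File / Import."""
    Path(path).write_text(anki_text(rows), encoding="utf-8")
```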

For better control, it is recommended to first add a new user to Anki (Add / Open on the welcome screen).

License

CC-BY
