itincknell / word-hoarder

A program for creating a searchable local language dictionary based (mainly) on dumped Wiktionary data. Allows users to collect definitions which can be exported as a machine-readable flashcard file. Currently supports Latin, Ancient Greek and Old English.

License: GNU General Public License v3.0

Python 100.00%
ancient-greek ancient-languages anki anki-flashcards edtech flashcards latin-language old-english wiktionary wiktextract

Word-Hoarder

A program for creating a searchable local language dictionary based (mainly) on extracted Wiktionary data. Allows users to collect definitions which can be exported as a machine-readable flashcard file. Currently supports Latin, Ancient Greek and Old English.

Parsing Data

convert_file_utilities.py

This module processes extracted Wiktionary data files, which can be found at kaikki.org.

See https://github.com/tatuylonen/wiktextract

The module looks for the source files in a subfolder of the main directory named "kaikki_json_files".

The module organizes the data into a standard data structure used in this program.

The word definition data structure

Definitions are made of standard python data structures.

Definitions:
{
    "heading": unicode string of the word as spelled in its original alphabet,
    "handle": heading converted to ASCII,
    "entries": [list of entry objects, see below],
    "tags": [list of identifying tag strings],
    "roots": heading of a root/lemma if the definition is not itself a root
}

Entries:
{
    "partOfSpeech": string "verb", "noun" etc.,
    "principleParts": string representing the principal parts,
    "simpleParts": simplified version of the principal parts, supported for Latin,
    "senses": [list of word 'sense' objects, typically displayed as an ordered list],
    "etymology": string containing etymology information
}

Senses:
{
    "gloss": string containing a word sense as you would find it in a single line of a definition in a typical dictionary,
    "tags": [tags related to a specific word sense such as "Pre-classical" or "transitive"]
}
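As a rough illustration of the structure described above, here is a minimal sketch of one Latin definition built from plain Python dicts and lists. The helper to_handle is hypothetical and not taken from the project: it simply strips diacritics, which is enough for a Latin macron but would not transliterate Greek or Old English letters. The field values (including the format of "simpleParts" and the empty "roots" and "etymology" values) are assumptions made for the example.

```python
import unicodedata


def to_handle(heading: str) -> str:
    """Hypothetical helper: drop diacritics so the heading becomes plain ASCII."""
    decomposed = unicodedata.normalize("NFKD", heading)
    # Combining marks (such as the macron) are non-ASCII and are dropped here.
    return decomposed.encode("ascii", "ignore").decode("ascii")


definition = {
    "heading": "amō",
    "handle": to_handle("amō"),
    "entries": [
        {
            "partOfSpeech": "verb",
            "principleParts": "amō, amāre, amāvī, amātum",
            # Assumed format for the simplified Latin principal parts.
            "simpleParts": "amo, amare, amavi, amatum",
            "senses": [
                {"gloss": "to love", "tags": ["transitive"]},
            ],
            "etymology": "",  # left empty in this sketch
        }
    ],
    "tags": ["Latin"],
    "roots": "",  # assumed to be empty when the word is itself a lemma
}

print(definition["handle"])
```

Keeping every level as a dict, list or string means a definition can be serialized with standard tools such as json without custom classes.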

dictionary_LSJ.py and dictionary_Middle_Liddell.py

These modules are called when the language is set to Ancient Greek. They use machine-readable files of two important Greek lexicons: the Middle Liddell and the Liddell-Scott-Jones (LSJ). The data files can be found here:

These files were originally made available by the Tufts University Perseus Digital Library.

dictionary_MLJohnson.py

This module is called when the language is set to Old English. It uses a text file containing Mary Lynch Johnson's A Modern English - Old English Dictionary.

get_simple.py

Called when the language is set to Latin. Changes the top line of most definitions to a simple string containing the 'principal parts' for verbs, nouns and adjectives. Other parts of speech are unchanged.

language_splitter.py

Organizes parsed definitions into a datrie and saves the data to a local file.
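To show the idea of keying definitions by handle in a trie and saving the result, here is a minimal sketch. It does not use the datrie library the project relies on: a nested dict and a JSON file stand in for it so the example needs only the standard library. The function names, the "_end" marker and the file name are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

END = "_end"  # multi-character marker, so it cannot collide with a single letter


def insert(trie: dict, handle: str, definition: dict) -> None:
    """Walk one character at a time and store the definition at the end node."""
    node = trie
    for ch in handle:
        node = node.setdefault(ch, {})
    node.setdefault(END, []).append(definition)


def with_prefix(trie: dict, prefix: str) -> list:
    """Return the headings of every definition whose handle starts with prefix."""
    node = trie
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == END:
                found.extend(d["heading"] for d in child)
            else:
                stack.append(child)
    return sorted(found)


trie = {}
for heading, handle in [("amō", "amo"), ("amor", "amor"), ("bellum", "bellum")]:
    insert(trie, handle, {"heading": heading, "handle": handle})

# Save to a local file and load it back, as a stand-in for the datrie file.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "latin_trie_sketch.json"
    path.write_text(json.dumps(trie, ensure_ascii=False), encoding="utf-8")
    reloaded = json.loads(path.read_text(encoding="utf-8"))

matches = with_prefix(reloaded, "amo")
print(matches)
```

Searching by ASCII handle rather than by heading lets a user type "amo" and still find "amō".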

Using the dictionary

load_dict.py

Utility functions for creating personal dictionary files or "word hoards".

parser_shell.py

This is the principal module for interacting with datrie files. It contains functions for loading trie objects, searching, and saving definitions to word hoards.

word_print_edit.py and edit_entry.py

Contain functions for editing and displaying word definitions and entries respectively.

Creating a flashcard file

Allows users to export a word hoard to a text file containing separator characters and HTML tags. Export is currently built around the file import tool of the Anki flashcard program: https://apps.ankiweb.net/

edit_dictionary.py

Contains the functions for printing formatted flashcards to a file.
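A minimal sketch of what one exported line could look like, assuming a tab as the separator between the front and back of the card and a small set of HTML tags for the back. The function card_line and the exact layout are hypothetical and are not the project's actual output format.

```python
import html


def card_line(definition: dict) -> str:
    """Build one 'front<TAB>back' line from a definition dict."""
    front = html.escape(definition["heading"])
    parts = []
    for entry in definition["entries"]:
        # Each sense becomes one item of an ordered list.
        items = "".join(
            f"<li>{html.escape(sense['gloss'])}</li>" for sense in entry["senses"]
        )
        parts.append(f"<i>{html.escape(entry['partOfSpeech'])}</i><ol>{items}</ol>")
    back = "".join(parts)
    return f"{front}\t{back}"


definition = {
    "heading": "amō",
    "entries": [
        {
            "partOfSpeech": "verb",
            "senses": [{"gloss": "to love"}, {"gloss": "to like"}],
        }
    ],
}

line = card_line(definition)
print(line)
```

Escaping the text before wrapping it in tags keeps characters such as "<" or "&" in a gloss from being read as markup when the file is imported with HTML enabled.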

Tables

This is an entirely separate part of the program. It fetches pages from wiktionary.org and parses the HTML to find the morphology tables for Latin, Greek and Old English words. It supports nouns, verbs and adjectives in all three languages. The algorithms are quite involved and are not compatible across languages. The word forms are organized into nested dictionaries and saved into a template file. The template files can be used to create flashcards with various configurations.

Example: Front: present tense of verb "x", Back: Table showing present tense forms.
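The example above can be sketched as follows, assuming the word forms are nested as tense, then number, then person. That nesting, the key names and the function tense_card are assumptions for illustration; the project's template files may be organized differently.

```python
import html

# Assumed nesting: tense -> number -> person -> form.
forms = {
    "present": {
        "singular": {"first": "amō", "second": "amās", "third": "amat"},
        "plural": {"first": "amāmus", "second": "amātis", "third": "amant"},
    }
}


def tense_card(verb: str, tense: str, forms: dict) -> tuple:
    """Return (front, back) where back is an HTML table of the tense's forms."""
    front = f'{tense} tense of verb "{verb}"'
    numbers = list(forms[tense])
    persons = list(forms[tense][numbers[0]])
    header = (
        "<tr><th></th>"
        + "".join(f"<th>{html.escape(n)}</th>" for n in numbers)
        + "</tr>"
    )
    rows = []
    for person in persons:
        cells = "".join(
            f"<td>{html.escape(forms[tense][n][person])}</td>" for n in numbers
        )
        rows.append(f"<tr><th>{html.escape(person)}</th>{cells}</tr>")
    back = "<table>" + header + "".join(rows) + "</table>"
    return front, back


front, back = tense_card("amō", "present", forms)
print(front)
print(back)
```

Because the forms are stored separately from the card layout, the same nested dictionary could feed other configurations, such as a card that asks for a single form.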

tables.py, tables_greek_ext.py, tables_latin_ext.py, tables_oe_ext.py

These modules support the morphology table functionality.
