anamR

An R package of helper functions to read/write/process data from the Kaytetye database (aname is Kaytetye for 'helpless'). The lexicon is stored in a backslash-coded .txt file as shown below (first 10 lines):

 1 \lx ahe |1
 2     \lx_id 1
 3     \sk ah
 4     \hm 1
 5     \audio ahe
 6     \ta Stirling 450
 7     \sl MT JG
 8     \ps n.
 9     \sn 1
10         \de fight, quarrel, dispute, squabble, trouble
...
...

The function read.KDB() processes this text file into a table:

lineno  indent  tag    content                                     lx_id
1       0       lx     ahe |1                                      1
2       4       lx_id  1                                           1
3       4       sk     ah                                          1
4       4       hm     1                                           1
5       4       audio  ahe                                         1
6       4       ta     Stirling 450                                1
7       4       sl     MT JG                                       1
8       4       ps     n.                                          1
9       4       sn     1                                           1
10      8       de     fight, quarrel, dispute, squabble, trouble  1
...     ...     ...    ...                                         ...
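
To make the mapping from file to table concrete, here is a minimal base R sketch of how a backslash-coded file could be turned into these five columns. It is not the source of read.KDB(): the function name parse_backslash_lines() is hypothetical, and the rule that lx_id is a running count of \lx lines is an assumption based only on the sample shown above.

```r
# Hypothetical illustration only; this is NOT the anamR implementation of read.KDB().
parse_backslash_lines <- function(lines) {
  # A coded line is: optional leading whitespace, a backslash, a tag, then content
  pattern  <- "^(\\s*)\\\\(\\S+)\\s*(.*)$"
  is_coded <- grepl(pattern, lines)
  coded    <- lines[is_coded]

  df <- data.frame(
    lineno  = which(is_coded),                    # position in the original file
    indent  = nchar(sub(pattern, "\\1", coded)),  # width of the leading whitespace
    tag     = sub(pattern, "\\2", coded),         # text between "\" and the first space
    content = trimws(sub(pattern, "\\3", coded)), # everything after the tag
    stringsAsFactors = FALSE
  )

  # Assumption: each \lx line opens a new entry, and entries are numbered in file order
  df$lx_id <- cumsum(df$tag == "lx")
  df
}

# A few lines in the same shape as the sample above (without the display line numbers)
sample_lines <- c(
  "\\lx ahe |1",
  "    \\lx_id 1",
  "    \\ps n.",
  "        \\de fight, quarrel, dispute, squabble, trouble"
)

kdb_sketch <- parse_backslash_lines(sample_lines)
print(kdb_sketch)
```

Keeping one row per line, rather than one row per entry, is what lets the later examples pick out a single tag with a plain filter.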

Example

Get part of speech (ps tag) for each headword (lx, or lx_id)

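library(anamR)
library(dplyr)   # filter() and distinct() come from dplyr
library(tidyr)   # spread()

ps_df <- read.KDB("path/to/KDB.txt") %>%
            filter(tag == "ps")      %>%   # keep only part-of-speech lines
            spread(tag, content)     %>%   # move the ps values into a "ps" column
            distinct(lx_id,                # keep the first ps row per headword
                     .keep_all = TRUE)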

First 10 headwords and their parts of speech:

lineno  indent  lx_id  ps
8       4       1      n
96      4       2      n
141     4       3      n
221     4       4      n
253     4       5      n
274     4       6      n
310     4       7      n
357     4       8      kin
382     4       9      kin
446     4       10     vi
...     ...     ...    ...
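
The table above identifies each headword only by lx_id. If the headword text is wanted alongside the part of speech, one option is to join the lx rows back on. The sketch below assumes the five-column layout shown earlier and uses a small hand-built data frame in place of the real database; the second entry ("placeholder |1") and the line numbers 90 and 96 are made up for illustration.

```r
library(dplyr)

# Toy stand-in for the table returned by read.KDB(); the second entry is invented
kdb <- data.frame(
  lineno  = c(1, 8, 90, 96),
  indent  = c(0, 4, 0, 4),
  tag     = c("lx", "ps", "lx", "ps"),
  content = c("ahe |1", "n.", "placeholder |1", "n."),
  lx_id   = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# One row per entry holding the headword text
headwords <- kdb %>%
  filter(tag == "lx") %>%
  select(lx_id, lx = content)

# One row per entry holding its first part-of-speech value
parts <- kdb %>%
  filter(tag == "ps") %>%
  distinct(lx_id, .keep_all = TRUE) %>%
  select(lx_id, ps = content)

# Join on lx_id so every headword is paired with its part of speech
lx_ps <- left_join(headwords, parts, by = "lx_id")
print(lx_ps)
```

A left join keeps headwords that have no ps line at all; they show up with NA in the ps column instead of being dropped.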

Plot distribution of parts of speech in dictionary

library(ggplot2)

# Count entries per part of speech, then draw one bar per category
ps_df %>%
  count(ps) %>%
  ggplot(aes(x = ps, y = n)) +
  geom_col()

Looks like there's a whole lot of nouns!
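
The same impression can be checked numerically. As a small sketch in base R, the snippet below tabulates only the ten ps values displayed in the table above, so the shares describe that ten-row sample and not the full dictionary.

```r
# Part-of-speech values copied from the ten rows displayed above
ps <- c("n", "n", "n", "n", "n", "n", "n", "kin", "kin", "vi")

# Count each category and sort from most to least frequent
ps_counts <- sort(table(ps), decreasing = TRUE)

# Convert the counts to shares of the total
ps_share <- prop.table(ps_counts)

print(ps_counts)
print(round(ps_share, 2))
```

On the full data, the same two calls could be applied to ps_df$ps.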
