slcladal / slcladal.github.io

This is the website for the Language Technology and Data Analysis Laboratory (LADAL), which is part of the School of Languages and Cultures at the University of Queensland in Brisbane, Australia.

Home Page: https://ladal.edu.au

License: Other

HTML 52.31% JavaScript 6.25% CSS 0.69% TeX 3.56% R 4.09% Jupyter Notebook 33.09% SCSS 0.01% Shell 0.01%

slcladal.github.io's Introduction

SLCLADAL.github.io

This is the GitHub repo of the website of the Language Technology and Data Analysis Laboratory (LADAL). The LADAL (pronounced lah’dahl) is a school-based, collaborative support infrastructure for digital and computational humanities established and maintained by the School of Languages and Cultures at the University of Queensland.

Goals

The LADAL aims to help develop computational and digital skills by providing information and practical, hands-on tutorials on data and text analytics as well as on statistical methods relevant for language research. In addition, the LADAL provides self-guided study materials relevant for computational Natural Language Processing. To appeal to both beginners and people with advanced skills, the LADAL website covers topics and introduces methods for people with different degrees of prior knowledge and experience, ranging from introductions to concepts of quantitative reasoning to step-by-step guides on advanced statistical modeling.

Since the primary concern of the LADAL is to introduce computational methods that are relevant to research involving natural language, the focus of this website is placed on linguistic data and methods relevant for text analytics. As such, the LADAL provides resources for (computational) text analytics and offers introductions to quantitative reasoning, research designs, and computational methods including data visualization and statistics. The areas covered on the LADAL website are:

introductions to quantitative reasoning and basic concepts in empirical language studies.
introductions to R as a programming environment for processing natural language data.
tutorials on data visualization and data analytics (statistics and machine learning).
tutorials on text analysis, text mining, distant reading, and corpus linguistics.

Contact

To get in touch, please feel free to contact us via email and stay up to date by following us on Twitter or Facebook. Webinars or recordings of other events are available on our YouTube channel.

Email: [email protected]

Twitter: @slcladal

Facebook: https://www.facebook.com/profile.php?id=100073328753218

YouTube: https://slcladal.github.io/opening.html

Audience

The LADAL resources are aimed at researchers in HASS (Humanities, Arts, and the Social Sciences) and we aspire to attract complete novices as well as expert users. And, while the focus of the LADAL website is placed on handling data that represents natural language, anyone who has an interest in quantitative methods, data visualization, statistics, or R is welcome to explore this webpage.

License & Citation

The LADAL website was created by Martin Schweinberger. It was freely released under the GNU General Public License, Version 3, June 2007.

If you use (parts of) it for your own research or in your teaching materials, please cite the individual subpages as shown at the bottom of each page or reference it as:

Schweinberger, Martin. 2021. The Language Technology and Data Analysis Laboratory (LADAL). Brisbane: The University of Queensland, School of Languages and Cultures. url: https://slcladal.github.io/index.html.

slcladal.github.io's People

Forkers

anhnguyendepocen stephenclark restuadi311 grenwi akellerhals stragu complexbrains ddryl001 antonmalko seulette tpetric7 svetaepc diminera

slcladal.github.io's Issues

kwic.character() is deprecated in kwic.rmd

Looks like tokenisation will in the future be required before doing anything, according to the warning message:

'kwic.character()' is deprecated. Use 'tokens()' first.
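
A minimal sketch of the change the warning asks for, assuming the tutorial uses quanteda's `kwic()`: tokenise with `tokens()` first and pass the resulting tokens object to `kwic()` instead of the raw character vector. The example text, the search pattern, and the names `txt`, `toks`, and `hits` are illustrative and not taken from kwic.rmd.

```r
library(quanteda)

# Hypothetical example text, not taken from the tutorial
txt <- c(doc1 = "The LADAL provides tutorials on text analysis. Text mining is covered too.")

# Tokenise first, then hand the tokens object (not the character vector) to kwic()
toks <- tokens(txt)
hits <- kwic(toks, pattern = "text", window = 3)

print(hits)
```

Calling `kwic()` on a tokens object avoids the deprecated character method, so the warning should no longer be raised.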

Investigate Google Analytics issues

Migrate to Quarto?

Quarto seems like the future and avoids many of the limitations with base rmarkdown site building, but we don't need to migrate right away - it might be better to wait a bit for more of the process to be streamlined.

Most content will migrate seamlessly; the only major piece of work associated with the move is migrating the styling.

One issue that we need to investigate further is slow pandoc render times for some specific files. This isn't associated with the knitting step, but with the pandoc step of turning markdown into HTML.

Error: `loops` is `FALSE`, but `x` contains loops.

Hi,
I am going through the tutorial https://slcladal.github.io/net.html
But there is an error when running

```
net = network::network(romeo,
                       directed = FALSE,
                       ignore.eval = FALSE,
                       names.eval = "weights")
```

The error message is:
Error: loops is FALSE, but x contains loops.
The following values are affected:
- x[1, 1:2]
- x[2, 1:2]
- x[7, 1:2]
- x[12, 1:2]

I am running R 4.1.1. Could you please fix this error?
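
One possible workaround, sketched under the assumption that `romeo` is an edge data frame whose first two columns hold the two endpoints of each edge (which the `x[1, 1:2]` entries in the error suggest): find the rows where both endpoints are identical and drop them before building the network. The data frame below is a hypothetical stand-in, not the tutorial's data. The error message also suggests that passing `loops = TRUE` to `network::network()` would be an alternative if self-loops should be kept.

```r
# Hypothetical edge list standing in for the tutorial's `romeo` data
edges <- data.frame(
  from    = c("Romeo", "Juliet", "Romeo", "Nurse"),
  to      = c("Romeo", "Romeo", "Juliet", "Nurse"),
  weights = c(3, 5, 5, 1),
  stringsAsFactors = FALSE
)

# A self-loop is an edge whose two endpoints are the same vertex
is_loop <- edges$from == edges$to

# Row numbers of the offending edges (compare with the rows listed in the error)
print(which(is_loop))

# Keep only edges that connect two different vertices
edges_no_loops <- edges[!is_loop, ]
print(edges_no_loops)
```

The filtered data frame can then be passed to the original `network::network()` call in place of `romeo`.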

Separate knitting to .md from building html

So we can apply styling/header changes without doing the analytical work.

Challenge:

Do this while keeping knit workflows in place
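
A rough sketch of how the two steps could be split, not a description of the current build. It assumes one `.Rmd` file per page and uses `knitr::knit()` for the analytical step and `rmarkdown::pandoc_convert()` for the HTML step. The helper names `needs_knit()` and `build_page()` are hypothetical, and the site's actual pandoc options and templates would still need to be passed in.

```r
# Hypothetical helper: TRUE when the .md is missing or older than its .Rmd
needs_knit <- function(rmd, md) {
  !file.exists(md) || file.mtime(rmd) > file.mtime(md)
}

# Hypothetical helper: knit only when required, but always rebuild the HTML
build_page <- function(rmd, force_knit = FALSE) {
  md   <- sub("\\.[Rr]md$", ".md", rmd)
  html <- sub("\\.[Rr]md$", ".html", rmd)

  # Step 1 (slow): run the R code and write plain markdown
  if (force_knit || needs_knit(rmd, md)) {
    knitr::knit(rmd, output = md, quiet = TRUE)
  }

  # Step 2 (fast): turn the markdown into HTML; styling changes only need this step
  rmarkdown::pandoc_convert(md, to = "html", output = html,
                            options = "--standalone")
  invisible(html)
}
```

With this split, a styling or header change would only rerun the second step, while knitting an individual page from the editor would continue to work as before.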

Use of the term "word embedding" is surprising

Hi @MartinSchweinberger, in looking through some of the text analysis tutorials, you seem to use "word embedding" to denote the frequencies of a word's contexts. This is a bit jarring to me, since I understand an embedding as a representation of a (usually) discrete entity, such as a word, in a finite-dimensional, real-valued vector space. The intention is that distance in the vector space should correspond meaningfully to similarity between the entities. A word embedding may directly encode the frequencies of the word's contexts, or may encode a compressed version of that sparse representation, expressing Harris's distributional hypothesis. Increasingly, however, word embeddings are not simple decompositions of the sparse frequency space, but are discriminatively trained to predict the word's contexts, and may be conditioned on other contextual information to handle polysemy.
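
To make the distinction concrete, here is a small base R sketch on a made-up toy corpus. It builds sparse context-frequency vectors and then compresses them with a truncated SVD into dense, low-dimensional vectors. The corpus, the window size of one word, and the choice of `k = 2` dimensions are all illustrative assumptions. Discriminatively trained embeddings are not covered by this sketch.

```r
# Toy corpus (hypothetical), already tokenised
sentences <- list(
  c("the", "cat", "drinks", "milk"),
  c("the", "dog", "drinks", "water"),
  c("the", "cat", "chases", "the", "dog")
)

vocab <- sort(unique(unlist(sentences)))
cooc <- matrix(0, length(vocab), length(vocab), dimnames = list(vocab, vocab))

# Count how often each word occurs directly next to each other word
for (s in sentences) {
  for (i in seq_along(s)) {
    for (j in seq_along(s)) {
      if (i != j && abs(i - j) <= 1) {
        cooc[s[i], s[j]] <- cooc[s[i], s[j]] + 1
      }
    }
  }
}

# Each row of `cooc` is a sparse context-frequency vector for one word.
# A truncated SVD compresses these rows into k dense dimensions.
k <- 2
dec <- svd(cooc)
emb <- dec$u[, 1:k] %*% diag(dec$d[1:k])
rownames(emb) <- vocab

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Similarity of "cat" and "dog" in the count space and in the compressed space
print(cosine(cooc["cat", ], cooc["dog", ]))
print(cosine(emb["cat", ], emb["dog", ]))
```

In both spaces distance between vectors is meant to reflect similarity of usage; the difference is that the rows of `cooc` are raw context frequencies, while the rows of `emb` are a low-dimensional representation derived from them.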

We shouldn't assume that readers have access to a machine with that much memory (apart from Colab, most online services are going to have much tighter memory limits, and 8 GiB of RAM is a common spec for a low-end laptop).
This exceeds the limit of GitHub Actions on GitHub-hosted runners (7 GiB), limiting our options or adding extra work if we start to automate more of the build in future.

Check links across the site

Migrate links to ladal.edu.au

Links in R examples should point at the new absolute address.
Any other markdown link should be a relative link.
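
The two rules above could be sketched in base R as follows. This assumes that page paths stay the same under the new domain and that markdown links use the `[text](url)` form; the function name `migrate_links()` and the example data path are hypothetical.

```r
old_host <- "https://slcladal.github.io/"
new_host <- "https://ladal.edu.au/"

# Hypothetical helper applying both rules to the lines of one source file
migrate_links <- function(lines) {
  # Rule 2: markdown links drop the old host and become relative
  lines <- gsub(paste0("](", old_host), "](", lines, fixed = TRUE)
  # Rule 1: any remaining occurrence (e.g. inside R code) gets the new absolute address
  gsub(old_host, new_host, lines, fixed = TRUE)
}

# Made-up input lines for illustration
lines <- c(
  "See [the network tutorial](https://slcladal.github.io/net.html).",
  "dat <- read.delim(\"https://slcladal.github.io/data/example.txt\")"
)

print(migrate_links(lines))
```

The order matters: markdown links are rewritten first, so only links outside the `[text](url)` syntax are left for the absolute replacement.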