slcladal.github.io's Introduction

SLCLADAL.github.io

This is the GitHub repo of the website of the Language Technology and Data Analysis Laboratory (LADAL). The LADAL (pronounced lah’dahl) is a school-based, collaborative support infrastructure for digital and computational humanities established and maintained by the School of Languages and Cultures at the University of Queensland.

Goals

The LADAL aims to help develop computational and digital skills by providing information and practical, hands-on tutorials on data and text analytics as well as on statistical methods relevant for language research. In addition, the LADAL provides self-guided study materials relevant for computational Natural Language Processing. To be attractive to both beginners and advanced users, the LADAL website covers topics and methods for people with different degrees of prior knowledge and experience, ranging from introductions to concepts of quantitative reasoning to step-by-step guides on advanced statistical modeling.

Since the primary concern of the LADAL is to introduce computational methods that are relevant to research involving natural language, the focus of this website is placed on linguistic data and methods relevant for text analytics. As such, the LADAL provides resources for (computational) text analytics and offers introductions to quantitative reasoning, research designs, and computational methods including data visualization and statistics. The areas covered on the LADAL website are:

  • introductions to quantitative reasoning and basic concepts in empirical language studies.

  • introductions to R as a programming environment for processing natural language data.

  • tutorials on data visualization and data analytics (statistics and machine learning).

  • tutorials on text analysis, text mining, distant reading, and corpus linguistics.

Contact

To get in touch, please feel free to contact us via email and stay up to date by following us on Twitter or Facebook. Webinars or recordings of other events are available on our YouTube channel.

Email: [email protected]

Twitter: @slcladal

Facebook: https://www.facebook.com/profile.php?id=100073328753218

YouTube: https://slcladal.github.io/opening.html

Audience

The LADAL resources are aimed at researchers in HASS (Humanities, Arts, and the Social Sciences) and we aspire to attract complete novices as well as expert users. And, while the focus of the LADAL website is placed on handling data that represents natural language, anyone who has an interest in quantitative methods, data visualization, statistics, or R is welcome to explore this webpage.

License & Citation

The LADAL website was created by Martin Schweinberger. It is released under the GNU General Public License, Version 3, June 2007.

CC BY-SA

If you use (parts of) it for your own research or in your teaching materials, please cite the individual subpages as shown at the bottom of each page or reference it as:

Schweinberger, Martin. 2021. The Language Technology and Data Analysis Laboratory (LADAL). Brisbane: The University of Queensland, School of Languages and Cultures. url: https://slcladal.github.io/index.html.

slcladal.github.io's People

Contributors

martinschweinberger, samhames, antonmalko, benfoley, seulette, stragu, kjdallaston, grenwi

slcladal.github.io's Issues

kwic.character() is deprecated in kwic.rmd

Looks like tokenisation will in the future be required before doing anything, according to the warning message:

'kwic.character()' is deprecated. Use 'tokens()' first.
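
A minimal sketch of what the warning asks for, assuming the quanteda package is installed: tokenise the text with tokens() and pass the resulting tokens object to kwic(). The example text and the object names text, toks, and kwic_cat are hypothetical and are not taken from kwic.rmd.

```r
library(quanteda)

# Hypothetical example text; not taken from the tutorial
text <- "The cat sat on the mat. The dog chased the cat."

# Tokenise first, then pass the tokens object (not the raw character vector) to kwic()
toks <- tokens(text)
kwic_cat <- kwic(toks, pattern = "cat", window = 3)

print(kwic_cat)
```

Calling kwic() on the tokens object avoids the deprecated kwic.character() method; any tokenisation options (for example removing punctuation) would then be set in the tokens() call.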

Migrate to Quarto?

Quarto seems like the future and avoids many of the limitations with base rmarkdown site building, but we don't need to migrate right away - it might be better to wait a bit for more of the process to be streamlined.

Most content will migrate seamlessly; the only major piece of work associated with the move is migrating the styling.

One issue that we need to investigate further is slow pandoc render times for some specific files. This is associated not with the knitting step but with the pandoc step of converting markdown into HTML.

Error: `loops` is `FALSE`, but `x` contains loops.

Hi,
I am going through the tutorial https://slcladal.github.io/net.html
But there is an error when running

    net = network::network(romeo,
                           directed = FALSE,
                           ignore.eval = FALSE,
                           names.eval = "weights")

The error message is:
Error: loops is FALSE, but x contains loops.
The following values are affected:
- x[1, 1:2]
- x[2, 1:2]
- x[7, 1:2]
- x[12, 1:2]

I am running R 4.1.1. Could you please fix this error?
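
The affected rows listed in the message (x[1, 1:2] and so on) suggest that the first two columns of those rows name the same node, that is, they are self-loops. Below is a minimal base-R sketch of one possible workaround, assuming romeo is an edge-list data frame whose first two columns hold the node names. The edges data frame and its values are a hypothetical stand-in, not the tutorial's data.

```r
# Hypothetical stand-in for the tutorial's `romeo` edge list (values invented)
edges <- data.frame(
  from = c("ROMEO", "ROMEO", "JULIET", "NURSE"),
  to = c("ROMEO", "JULIET", "NURSE", "NURSE"),
  weights = c(5, 12, 7, 3),
  stringsAsFactors = FALSE
)

# A self-loop is a row whose first two columns name the same node
is_loop <- edges[[1]] == edges[[2]]
print(which(is_loop))

# Drop the self-loops before building the network
edges_no_loops <- edges[!is_loop, ]

# The filtered edge list could then be passed on as in the tutorial
# (requires the network package, so it is left commented out here):
# net <- network::network(edges_no_loops,
#                         directed = FALSE,
#                         ignore.eval = FALSE,
#                         names.eval = "weights")
```

If the self-loops are meaningful and should be kept, an alternative worth trying is passing loops = TRUE to network::network() instead of filtering the rows.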

Use of the term "word embedding" is surprising

Hi @MartinSchweinberger, in looking through some of the text analysis tutorials, you seem to use "word embedding" to denote the frequencies of a word's contexts. This is a bit jarring to me, as I understand an embedding to be a representation of a (usually) discrete entity, such as a word, in a finite-dimensional, real-valued vector space. The intention is that distance in the vector space should correspond meaningfully to distance in the entity space. For word embeddings, this may directly encode the frequencies of a word's contexts, or may encode a compressed version of that sparse embedding, expressing Harris's distributional hypothesis. Increasingly, however, word embeddings are not simple decompositions of the sparse frequency space, but are discriminatively trained to predict the word's contexts, and may be conditioned on other contextual information to handle polysemy.

It would be nice if coll.Rmd didn't consume 10GiB of memory

Can we mix up the workflow a bit to use less memory?
This is important for two reasons:

  1. We shouldn't assume that readers have access to a machine with that much memory (apart from Colab, most online services are going to have much tighter memory limits, and 8 GiB of RAM is a common spec for a low-end laptop).
  2. This exceeds the limit of GitHub Actions on GitHub-hosted runners (7 GiB), limiting our options or adding extra work if we start to automate more of the build in future.
