
deepflow-analysis's Introduction

👋 Hi! I'm a researcher at the Meertens Instituut in Amsterdam. My research focuses on why some cultural phenomena are adopted and persist through time, while others change or disappear (why, for example, did Python's f-string syntax take over from the str.format syntax? 😉). Additionally, I'm interested in measuring cultural diversity and compositional complexity, and in how we can account for biases in our estimations of diversity. (Check out our recent publication in Science on estimating the number of lost books.)

🔭 I mostly program in Python, and I also like to teach a bit about programming. Together with Mike Kestemont and Allen Riddell, I recently published a 📖 with Princeton University Press about using Python for Humanities data analysis. An open access version of the book can be found at https://www.humanitiesdataanalysis.org

📫 You can check my website for more information about my academic work, or follow me on Twitter or Mastodon.

deepflow-analysis's People

Contributors

emanjavacas, fbkarsdorp, mikekestemont


deepflow-analysis's Issues

Open goals in the paper

I am adding a number of goals we set ourselves at the last meeting. Feel free to open new issues to discuss each goal in more detail. I am not sure who is in charge of the remaining issues; feel free to add yourself.

  • break down features [E]
  • motivate, group, and explain all features, and refer to the relevant literature [E]
  • explain data collection and preprocessing [F]
  • related research section [F/E/M]
  • move away from the Turing test as background framing (since it is not clearly a Turing test) [M]
  • look into cases that killed people [F/M]

Literature list

Other comments from reviewer 1

Other comments:

  • p. 1: "Increasingly, people interact with a variety of artificial agents, often even without being fully aware of whether or not their conversation partners are in fact human." ==> are you sure people are often not aware of this? This seems like an overstatement.

  • p. 4: Why did you decide to allow the same amount of time for A- and B-runs? Why not allow self-paced reading in both conditions?

  • p. 5: "encompassing the main body of English Hip-Hip music produced and consumed in the United States of America. " ==> ..English-language Hip-Hop.. Also: I think the "consumed in the United States of America" part could be removed, as this is now a global genre.

  • p. 5: since PLOS is a general scientific journal, it would be good to briefly spell out what LSTMs and Transformers are (with references and/or pointers to later sections of the paper)

  • p. 5: "we translated all unique words into a" ==> this slightly confused me; do you mean all words that occur only once (the hapax legomena), or all words that occur (all types)? Initially I assumed the former, but surely it must be the latter.

  • p. 6: "LSTMs have been shown to excel at Language Modeling [31] and we therefore resort to it" ==> .. resort to them

  • p. 6: "(i.e. in the present corpus from 89337 syllables to 172 characters)" ==> 172 characters is more than one might expect, so perhaps briefly explain where this number comes from

  • p. 6: "The reasoning is twofold: (i) noisy data.." ==> here, also, it might be worthwhile to say something about the possibly noisy input

  • p. 7: "extracting single word-level distributional feature vector." ==> .. vectors

  • p. 7: "One possibility to accomplish it is to initialize" ==> .. accomplish this..

  • p. 7: "Our model, however less general since it assumes.." ==> Our model, however, is less general (..) yet still achieves...

  • p. 8: "we fine-tune the on a model-per-model basis" ==> something wrong here

  • p. 8: "by manually inspection of the model output at different temperature values" ==> ..manual inspection.. More importantly: can you say a bit more about how this manual inspection was done?

  • p. 8: "Following the template, we generate as many sentences.." ==> it might be worth pointing out that templates are also often used in NLG (although arguably in a somewhat different way). See e.g., Deemter, K. van et al. (2005). Real versus template-based natural language generation: A false opposition?. Computational Linguistics, 31(1), 15-24.

  • p. 8: "where $\mu$ was selected per model through an inspection of random samples" ==> please briefly say how this was done

  • p. 9, caption table 5: "PC words have been deliberately masked" ==> I assume this should be Non-PC words, right? And what about motherfcking and sht? Also: what is a W model?

  • p. 11: "participants performed significantly words on" ==> ..significantly worse..

  • p. 11: "As can be observed from the marginal effects plot in Fig 2a, the learning effect is present in both question types and it is most strongly pronounced at the beginning of the game, after which it diminishes." ==> "most strong pronounced" is a pleonasm ("most strong" or "most pronounced"). More importantly: could this suggest that people start to pick up cues of neurally generated text (see above)?

  • p. 12: I would suggest removing \mu = and \sigma = and just report means and SDs, as 0.045 (0.057) (i.e., M(SD)), which is much more common.

  • p. 16: "At the same time, Hip-Hop lyrics very often do not develop longer stretches of thematically coherent narrative, ..." ==> I beg to differ. Do you have any evidence for this claim? If not, it would be good to phrase this a bit more cautiously.

  • p. 16: "This effect might also be reduced when longer fragments are admitted." ==> I agree, and think this would be a very interesting question for follow-up research. Maybe make this explicit?

  • p. 17ff: the references are not fully consistent in how they cite pages and dates. The Turing reference stands out because it is all caps. Would be good to make this consistent.
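
For the point about "we translated all unique words" above, a minimal Python sketch of the difference between word types and hapax legomena; the toy lyrics and variable names are made up purely for illustration:

```python
from collections import Counter

# Toy example lines; any tokenised corpus would do.
lines = ["the mic is mine", "the flow is mine tonight"]
counts = Counter(word for line in lines for word in line.split())

types = set(counts)                                  # every distinct word
hapaxes = {w for w, c in counts.items() if c == 1}   # words occurring exactly once

print(len(types), len(hapaxes))  # 6 types vs. 3 hapaxes in this toy example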
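
And for the question about inspecting output at different temperature values: temperature sampling itself is standard, so a hedged sketch of what such an inspection loop might look like is given below. The `model.generate` interface is hypothetical, not the paper's code.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token index from model logits rescaled by a temperature."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Manual inspection at a few temperatures might look roughly like this:
# for t in (0.5, 0.75, 1.0):
#     print(t, model.generate(temperature=t, max_tokens=40))  # hypothetical API
```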

The issue with the expert scoring system

  • Perhaps most importantly, I was not convinced by the expert analysis (p. 15/16). “All players with a score higher than 10 are considered experts (n = 135).” Isn't it likely that a sizeable number of these experts just happened to guess correctly? The scoring system is not entirely clear from the paper, but given the large number of participants, many “experts” could just have been lucky. My suggestion would be to remove this part of the analysis, or consider setting up a "real" expert analysis.
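
To make the "lucky experts" worry concrete, here is a back-of-the-envelope binomial calculation; the number of questions per player, the chance level, and the number of players are assumptions for illustration, not values from the paper:

```python
from math import comb

# Hypothetical setup: each player answers n binary real-vs-generated questions
# at chance level p, and is labelled an "expert" when scoring strictly above k.
n, k, p = 20, 10, 0.5

# Probability that pure guessing already clears the threshold
p_lucky = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1, n + 1))

n_players = 5000  # hypothetical participant count
print(round(p_lucky, 3), round(p_lucky * n_players))  # expected lucky "experts"
```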

The issue with the scope of the findings

  • In the general discussion, I feel the findings could be positioned somewhat more broadly. What do we learn from this study beyond the generation of rap lyrics? The authors very briefly touch upon this, but the discussion remains at a somewhat high level. Could these models also work for the generation of other forms of poetry? There has been some work on this, for example, on the generation of haikus. I am not sure, but I don't think it is true that computer-generated haikus are difficult to distinguish from human-authored haikus, at least until recently (and note that these are very short as well, like the rap fragments studied in this paper). Also: do you think there are practical benefits to computer-generated rap lyrics? And, if so, should rappers acknowledge using tools like these for their lyrics? (I am sure the authors are aware of the controversy surrounding rappers who allegedly don't write their own lyrics, like Drake or Dr. Dre.)

The issue with using semantic coherence measures

Judging a snippet of several lines of lyrics seems like a very hard task if there is no context. I can imagine that the snippet's original position within the song is very important for the rate of success in classifying it as real. A snippet from the beginning of a verse might, for instance, seem more coherent than a snippet from the middle. It seems to me that semantic coherence is the main driver of successful classification. In Fig 1b, for instance, you would not make your choice based on linguistic features but rather on the coherence of the text. I would like to see more analysis of semantic coherence and the role it plays.
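
One way to operationalise the semantic coherence raised here would be the average similarity of adjacent lines in embedding space. The sketch below assumes some pretrained word-vector lookup (`word_vectors`, a dict mapping words to arrays) is available; it illustrates the idea and is not the paper's method:

```python
import numpy as np

def line_embedding(line, word_vectors):
    """Average the word vectors of a line; word_vectors maps word -> np.ndarray."""
    vecs = [word_vectors[w] for w in line.split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else None

def coherence(snippet_lines, word_vectors):
    """Mean cosine similarity between consecutive lines of a lyric snippet."""
    embs = [e for e in (line_embedding(l, word_vectors) for l in snippet_lines)
            if e is not None]
    sims = [float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
            for a, b in zip(embs, embs[1:])]
    return sum(sims) / len(sims) if sims else float("nan")
```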

The issue of Hip-Hop lyrics vs rap lyrics

  • The next point is a detail, but might be worth clarifying nevertheless: why do you speak of Hip-Hop lyrics and not rap lyrics? I guess the latter term would be more appropriate, since Hip-Hop, at least traditionally, refers to a culture, which obviously includes rapping, but also, for example, graffiti and breakdancing.

The issue with the evaluation task setup

Another issue is that we don't really get an idea of how well this particular architecture works. Adding a baseline that uses a very simple language model could tell us something about where to place the quality of the outputs of these more sophisticated models. This might be something for the discussion.
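
A very simple baseline of the kind suggested here could be a plain bigram language model; a minimal sketch (the corpus and whitespace tokenisation are placeholders):

```python
import random
from collections import Counter, defaultdict

def train_bigram(lines):
    """Count bigram transitions over whitespace-tokenised lines."""
    transitions = defaultdict(Counter)
    for line in lines:
        tokens = ["<s>"] + line.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            transitions[prev][nxt] += 1
    return transitions

def generate_line(transitions, max_len=12, seed=0):
    """Sample a line word by word in proportion to the bigram counts."""
    rng, out, prev = random.Random(seed), [], "<s>"
    for _ in range(max_len):
        candidates = transitions.get(prev)
        if not candidates:
            break
        words, weights = zip(*candidates.items())
        prev = rng.choices(words, weights=weights, k=1)[0]
        if prev == "</s>":
            break
        out.append(prev)
    return " ".join(out)
```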

The issue with the training corpus

  • About the training materials: can you say a little about how the OHHLA corpus was collected and whether there is any quality control of the lyrics? Why didn't you use rap.genius? It would also be good, in this context, to refer to Bradley & DuBois' (2010) Anthology of Rap, Yale University Press, since it is the first ‘serious’ anthology of rap lyrics. At the time, this anthology got quite a bit of criticism about the quality of the transliterations, which are indeed notoriously difficult for rap lyrics. How does the possible noisiness of the data influence the models?

Remaining issues

  • Table "Text generation model details." needs to be mentioned and referred to in the text.
  • Table "Examples of generated samples." needs to be mentioned and referred to in the text.

The issue with linguistic features

  • The analysis in terms of linguistic features is very interesting. However, most features are motivated from characteristic properties of rap lyrics. The exceptions are lexical diversity and word repetition. Interestingly, there have been earlier analyses of lexical diversity of rappers, showing that there are huge differences between them (see, for example, here: https://pudding.cool/projects/vocabulary/index.html). In a somewhat similar vein, repetition also occurs a lot in ‘real’ rap lyrics (Travis Scott ft Young Thug -- Yeah Yeah is just one example that springs to mind). It would be good to update the discussion of features accordingly. And what about other features that might be typical of neurally generated text, such as agreement mistakes and lack of global coherence? Would it be possible to integrate those?
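
For reference, the two features singled out here (lexical diversity and word repetition) could be computed along these lines; this is a sketch, not necessarily the paper's exact feature definitions:

```python
from collections import Counter

def type_token_ratio(tokens):
    """Lexical diversity: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def repetition_ratio(tokens):
    """Share of tokens that repeat an already-seen word."""
    counts = Counter(tokens)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(tokens) if tokens else 0.0

tokens = "yeah yeah yeah it's lit it's lit".split()
print(type_token_ratio(tokens), repetition_ratio(tokens))  # ≈0.43 and ≈0.57
```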

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.