Writing Strategies for Science Communication: Data and Computational Analysis

Purpose

This repo holds code and data for the publication:

Tal August, Lauren Kim, Katharina Reinecke, and Noah Smith. "Writing Strategies for Science Communication: Data and Computational Analysis." Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.

Prerequisites

Begin by creating a new virtual environment and installing the required packages (here using conda):

conda env create -f environment.yml

Then activate it:

conda activate sci_articles

Data

The URLs of all scraped articles are in data/cleaned_article_urls.csv, and all annotations made on these articles are in data/annotations.csv.
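A minimal sketch for loading either CSV file with Python's standard csv module. The column names of these files are not documented here, so the sketch makes no assumption about them and returns each row as a dictionary keyed by the file's header row. The helper name load_csv is hypothetical and not part of this repo.

```python
import csv
from pathlib import Path


def load_csv(path):
    """Read a CSV file with a header row into a list of dicts (hypothetical helper).

    Keys come from the file's own header row, so no column names are assumed.
    """
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

For example, run from the repository root, load_csv("data/annotations.csv") returns the annotation rows, and list(rows[0].keys()) shows which columns are actually available.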

The code for the scraper (including the parameters used to run the Scrapy spider) is in the scraper/ directory.

Getting the data

If you want the full text of each article, you can run the scraper with the parameters used in the original paper. In the scraper/ directory, run

python runSpiders.py all <FILEPATH>.jsonl

where <FILEPATH> is wherever you want to store the scraped articles.

Note that scraping all the data might take a few days.
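Judging by the .jsonl extension, the scraper's output is JSON Lines: one JSON object per line. A minimal sketch for streaming such a file back into Python follows. The fields in each record depend on the spider and are not documented here, so the hypothetical helper iter_jsonl just yields plain dicts. It skips blank lines and reports the line number of any malformed record, which can happen if a multi-day scrape is interrupted mid-write.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank line of a .jsonl file (hypothetical helper)."""
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank or trailing lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Point at the bad line, e.g. a record truncated by an interrupted scrape.
                raise ValueError(f"{path}:{line_no}: invalid JSON ({err.msg})") from err
```

Because this is a generator, it reads one record at a time rather than loading the whole scrape into memory. For example, sum(1 for _ in iter_jsonl("articles.jsonl")) counts the scraped records, where articles.jsonl stands in for the <FILEPATH>.jsonl you passed to runSpiders.py.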

Models

We use RoBERTa models from the Hugging Face transformers library; pretraining and finetuning details are in the paper. Many of our scripts are adapted from the Hugging Face example scripts, specifically those in examples/text-classification/; please refer to these for more details.

If you are interested in getting access to the final finetuned models for classifying the writing strategies, feel free to reach out!

Annotation Interface

For the annotation interface, see the repo: https://github.com/talaugust/scientific_article_annotation/.
