bookscribs-io

repos: 54 gists: 0

Name: Redwing Brands

Type: Organization

Bio: AI, Media, and Literary Technology Research

Location: Boston

Blog: allenredwing.com

Redwing Brands's Projects

bert-extractive-summarizer

Easy to use extractive text summarization with BERT

book-dataset

This dataset contains 207,572 books from the Amazon.com, Inc. marketplace.

The goal of this project is to build a recommendation engine that helps users find books that might interest them, based on the books' summaries. It does this by applying the Latent Dirichlet Allocation (LDA) algorithm.
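As a rough illustration of the summary-based recommendation idea described above, here is a minimal sketch in plain Python. It is not the project's code: the toy summaries, the function names (`fit_lda`, `cosine`, `recommend`), the hyperparameters, the use of collapsed Gibbs sampling for inference, and the cosine-similarity ranking are all assumptions made for the example.

```python
import math
import random


def fit_lda(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method).

    docs is a list of token lists. Returns one topic distribution per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                            # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(n_topics)])
    return theta


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(theta, query, top_n=2):
    """Indices of the top_n documents whose topic mix is closest to the query's."""
    scores = [(cosine(theta[query], t), i) for i, t in enumerate(theta) if i != query]
    return [i for _, i in sorted(scores, reverse=True)[:top_n]]


# Hypothetical toy summaries, not taken from the dataset.
summaries = [
    "a detective hunts a killer through the city",
    "the detective solves a murder in the old city",
    "a wizard trains a young dragon rider",
    "the dragon and the wizard defend the kingdom",
]
docs = [s.lower().split() for s in summaries]
theta = fit_lda(docs)
print(recommend(theta, query=0, top_n=2))
```

On a real catalogue, the summaries would need stop-word removal and far more topics than two, and a library implementation of LDA would usually be a better fit than this hand-rolled sampler.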

bookcorpus

Crawl BookCorpus

booknlp

BookNLP, a natural language processing pipeline for books

booksum

c4_200m-synthetic-dataset-for-grammatical-error-correction

This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences from C4 using a tagged corruption model. The approach and the dataset are described in more detail by Stahlberg and Kumar (2021) (https://www.aclweb.org/anthology/2021.bea-1.4/)

chapterbreak

clic

Source code for the CLiC web application

clipshots

ClipShots is the first large-scale dataset for shot boundary detection, collected from YouTube and Weibo and covering more than 20 categories, including sports, TV shows, animals, etc.

condensedmovies

Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset.

cpm-generate

Chinese Pre-Trained Language Models (CPM-LM) Version-I

cs224w-project

csi-corpus

Annotated screenplays for 39 CSI: Crime Scene Investigation episodes, for the paper "Whodunnit? Crime Drama as a Case for Natural Language Understanding"

ctrl

Conditional Transformer Language Model for Controllable Generation

data

Interesting datasets for personal projects or submissions to #TidyTuesday

dialogpt

Large-scale pretraining for dialogue

doccano

Open source text annotation tool for machine learning practitioners.

faceval

EMNLP 2022: Analyzing and Evaluating Faithfulness in Dialogue Summarization

genre-based-story-generator

A web application that generates stories based on genres. Created by fine-tuning GPT2 on genre-based stories.

github-typo-corpus

GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors

google_books_crawler

Python crawler for getting books' metadata from the Google Books API using asyncio and aiohttp

gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"

gpt-2-simple

Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts

gutenberg-dialog

Build a dialog dataset from online books in many languages

jslda

An implementation of latent Dirichlet allocation in JavaScript

litcliches

Code for the paper "Cliche expressions in literary and genre novels"

mica-riskybehavior-identification

Code companion to "Joint Estimation and Analysis of Risk Behavior Ratings in Movie Scripts"
