Topic: corpus-data
Something interesting about corpus-data
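corpus-data,Scrapes the comments under bilibili videos. Newly released! ⚠ This code is for learning purposes only; the author accepts no responsibility for any other use!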
User: 1837669410
corpus-data,open source corpora created, annotated or maintained by the ACoLi group at University of Augsburg, Germany.
Organization: acoli-repo
corpus-data,:globe_with_meridians: ANT Corpus website repository.
Organization: antcorpus
Home Page: https://antcorpus.github.io/
corpus-data,Text deduplication
User: aplmikex
corpus-data,Word-level Filipino wordlist
User: austinzuniga
corpus-data,Cantonese text filter (粵文語料篩選器)
Organization: canclid
Home Page: https://pypi.org/project/canto-filter/
corpus-data,My public domain speech index
User: carlfm01
corpus-data,An annotated corpus of discussion forum threads from Massive Open Online Courses.
User: cmkumar87
corpus-data,DANeS is an open-source e-newspaper dataset created through a collaboration between DATASET JSC (dataset.vn) and AIV Group (aivgroup.vn)
Organization: dataset-vn
corpus-data,This repository contains the DFKI Product Corpus, a dataset of 174 documents annotated for product and company named entities, and the relation CompanyProvidesProduct.
Organization: dfki-nlp
corpus-data,Data from a corpus of written Hawaiian
User: dohliam
Home Page: https://dohliam.github.io/corpus/haw/
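corpus-data,MNBVC (Massive Never-ending BT Vast Chinese corpus): a very-large-scale Chinese corpus, benchmarked against the 40T of data used to train chatGPT. The MNBVC dataset covers not only mainstream culture but also niche cultures and even "Martian script" (火星文) internet slang. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, script lines, forum posts, wiki, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.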
User: esbatmop
corpus-data,Tunisian Sentiment Analysis Corpus.
User: fbougares
corpus-data,A collection of corpus documents in Indonesian containing test cases for external plagiarism detection following the PAN CLEF standard (http://www.uni-weimar.de/medien/webis/events/pan-11).
User: felikjunvianto
corpus-data,Generic corpus-cleaning script made with the tm package
User: filipefilardi
corpus-data,A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
User: gkiril
corpus-data,UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Organization: grammarly
Home Page: https://ua-gec-dataset.grammarly.ai/
corpus-data,Chinese NLP corpus of Chinese science fiction: about 4675 Chinese science fiction novels. Chinese science fiction NLP corpus, Chinese science fiction text corpus, Chinese science fiction text database, science fiction corpus.
User: guhhhhaa
corpus-data,Chinese NLP corpus of Chinese science fiction: Archive of the Ark Plan of Ula Science Fiction Website. Chinese science fiction NLP corpus, Chinese science fiction text corpus, Chinese science fiction text database, science fiction corpus.
User: guhhhhaa
corpus-data,A Public Corpus for Machine Learning
User: hailiang-wang
Home Page: http://bbs.egret.com/forum-94-1.html
corpus-data,Golden Arabic corpus built for testing Assem's arabicstemmer and other Arabic stemmers
Organization: ibnmalik
Home Page: http://arabicstemmer.com/
corpus-data,CCNC: A Comprehensive Chinese Name Corpus (3.65M name samples).
User: jaaack-wang
corpus-data,A parser for annotated MuseScore 3 files.
User: johentsch
Home Page: https://ms3.readthedocs.io
corpus-data,:bookmark_tabs: Galician corpus for misogyny detection
User: luciamariaalvarezcrespo
Home Page: https://aclanthology.org/2024.propor-1.3/
corpus-data,Scraper
Organization: magizbox
corpus-data,Simple bs4-based web crawler for a corpus needed for statistical machine translation
User: marspanther
corpus-data,Datasets with text data for use in NLP, text analysis, information extraction, and ML research.
Organization: maxent-ai
corpus-data,Arabic hate speech data
User: motazsaad
corpus-data,Utilities for Processing the HCRC Map Task Corpus
User: nathanduran
corpus-data,Utilities for Processing the Meeting Recorder Dialogue Act Corpus
User: nathanduran
corpus-data,Utilities for Processing the Switchboard Dialogue Act Corpus
User: nathanduran
corpus-data,Repository dedicated to a collection of resources and supporting material for Urdu language processing tasks
Organization: pakurdu-research-center
corpus-data,ChatGPT Chinese corpus: dialogue corpus, novel corpus, and customer-service corpus, for training large models
User: plexpt
Home Page: https://chat.aimakex.com/
corpus-data,GermaParl: Corpus of Plenary Protocols of the German Bundestag (TEI Format)
Organization: polmine
corpus-data,Thai News Dataset from Thai government website.
Organization: pythainlp
corpus-data,This is a dataset consisting of all song lyric words found on all of Taylor Swift's studio albums (up to and including TTPD), as well as a selection of other songs written by her.
User: sagesolar
corpus-data,A compiled corpus of modern Chinese poetry: 3489 poets, 81.7K poems, 15.43M characters. Continuously expanding...
User: sheepzh
Home Page: https://www.chinese-poetry.org
corpus-data,:books: Chinese Emergency Corpus - Shanghai University - Semantic Intelligence Laboratory
User: shijiebei2009
corpus-data,:books: Chinese Environment Emergency Corpus - Shanghai University - Semantic Intelligence Laboratory
User: shijiebei2009
corpus-data,Uses a Bi-LSTM neural network to classify Chinese text sentiment into eight categories (like, disgust, happiness, sadness, anger, surprise, fear, none)
User: spianmo
corpus-data,Public Domain Words and Texts for Conlangs
Organization: termsurf
corpus-data,Python API for loading language data from the American-English CHILDES database
Organization: uiuclearninglanguagelab
corpus-data,Reading the data from OPIEC - an Open Information Extraction corpus
Organization: uma-pi1
Home Page: https://www.uni-mannheim.de/dws/research/resources/opiec/
corpus-data,Vietnamese Wikipedia Corpus
Organization: undertheseanlp
corpus-data,An Interactive Tool for Annotating Discourse Structure and Text Improvement
User: wiragotama
corpus-data,Korean ASR Corpus generated from TEDx talks
User: yc9701
corpus-data,Biomedical NLP Corpus or Datasets.
User: zonghui0228