Comments (2)
- write a script that converts t&w's .tsv output to prodigy's json-lines format
- use the db-in recipe to load all of the data into a database
- use prodigy train to auto-infer the suggestion function for the spancat and test out training one
- try out train-curve to see how the model responds to more or less data
- add functions to the streamlit app to run the model on arbitrary input and display predictions with the span visualizer
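For the first item in the list, a conversion script might look roughly like the sketch below. The layout of the .tsv file is not described here, so the three-column layout (document id, text segment, tag) is purely an assumption, and `tsv_to_tasks` and `to_jsonl` are hypothetical helper names. The output shape (a `text` string plus a list of `spans` with `start`, `end` and `label`) follows Prodigy's usual span annotation format.

```python
import csv
import io
import json
from itertools import groupby


def tsv_to_tasks(tsv_file):
    """Yield one Prodigy-style task dict per document in a TSV stream.

    Assumed (hypothetical) columns: doc_id, segment, tag.
    Rows belonging to the same document are assumed to be adjacent.
    """
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = (row for row in reader if row)  # skip blank lines
    for doc_id, group in groupby(rows, key=lambda row: row[0]):
        text, spans = "", []
        for _, segment, tag in group:
            start = len(text)
            text += segment  # segments are concatenated without separators
            spans.append({"start": start, "end": len(text), "label": tag})
        yield {"text": text, "spans": spans, "meta": {"id": doc_id}}


def to_jsonl(tasks):
    """Serialize tasks as JSON lines, keeping CJK characters readable."""
    return "".join(json.dumps(task, ensure_ascii=False) + "\n" for task in tasks)


# Tiny made-up sample in the assumed layout.
sample = "1\t樂\tE\n1\t音洛\tC\n2\t說\tE\n2\t音悅\tC\n"
jsonl = to_jsonl(tsv_to_tasks(io.StringIO(sample)))
print(jsonl, end="")
```

Written to a file, output like this is the kind of input the db-in recipe expects, though the real column layout would need to be checked against the actual .tsv first.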
from jdsw.
Jeff & Hantao's span categories include very granular info on some kinds of content, but skip over other kinds we're interested in:
TAG_MAP = {
    "E": "E", # headword
    "B": "T", # book title
    "BC": "C", # commentary on book title
    "F": "F", # fanqie
    "T": "T", # poem title
    "J": "T", # juan number
    "C": "C", # commentary on headword
    "CF": "F", # fanqie reading for char in commentary
    "CC": "C", # commentary on commentary
    "S": "T", # section title
    "SC": "C", # commentary on section title
    "SF": "F", # fanqie reading for char in section title
    "SS": "T", # sub-section title
    "SSC": "C", # commentary on sub-section title
    "SSF": "F", # fanqie reading for char in sub-section title
}
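To illustrate how a mapping like this could be applied, here is a minimal sketch that rewrites the granular label on each span to its collapsed category. It assumes spans are dicts with a `label` key, as in Prodigy's format, and that the keys of TAG_MAP are the granular tags and the values are the collapsed ones. `collapse_labels` is a hypothetical helper, and only a subset of the mapping is repeated to keep the example self-contained.

```python
# Subset of the mapping above, repeated so the sketch runs on its own.
TAG_MAP = {
    "E": "E",   # headword
    "B": "T",   # book title
    "BC": "C",  # commentary on book title
    "F": "F",   # fanqie
    "CF": "F",  # fanqie reading for char in commentary
    "CC": "C",  # commentary on commentary
}


def collapse_labels(spans, tag_map=TAG_MAP):
    """Return copies of the spans with granular labels mapped to coarse ones.

    Spans whose label is missing from the mapping are dropped (an assumption;
    raising an error would be an equally reasonable choice).
    """
    collapsed = []
    for span in spans:
        label = tag_map.get(span["label"])
        if label is None:
            continue
        collapsed.append({**span, "label": label})
    return collapsed


spans = [
    {"start": 0, "end": 1, "label": "BC"},
    {"start": 1, "end": 3, "label": "CF"},
    {"start": 3, "end": 4, "label": "X"},  # unknown tag, dropped
]
collapsed = collapse_labels(spans)
print(collapsed)
```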
I determined that training a model based on this data doesn't really fit our research question. We can already identify fanqie without a model, and that's mostly what this data does, so there doesn't seem to be much point in pursuing it (except perhaps later to aid in detecting which of the characters in the headword is being annotated).
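As a rough illustration of what identifying fanqie without a model can look like, the sketch below uses a regular expression. It assumes a fanqie gloss is written as two characters followed by 反 or 切. That is a simplification rather than the project's actual rule, and it will produce false positives wherever 反 or 切 occurs as an ordinary word. `find_fanqie` is a hypothetical helper name.

```python
import re

# Assumption: a fanqie gloss is two CJK characters followed by 反 or 切.
FANQIE = re.compile(r"([\u4e00-\u9fff])([\u4e00-\u9fff])[反切]")


def find_fanqie(text):
    """Return span dicts for every substring matching the assumed pattern."""
    return [
        {
            "start": match.start(),
            "end": match.end(),
            "label": "F",
            "upper": match.group(1),  # character giving the initial
            "lower": match.group(2),  # character giving the final and tone
        }
        for match in FANQIE.finditer(text)
    ]


found = find_fanqie("樂，音洛，又五孝反")
print(found)
```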