Comments (3)
cool, readings_exceptions or readings_archaic or something makes sense to me.
from jdsw.
Right now this is controlled by lines 70 to 71 in 50b6e6f, where readings_for() just checks our SBGY file, so I could easily have it output a warning when rejecting things.
Do you think it's appropriate in this case to just add the readings to your GDR-SBGY-full.csv? If not, it's probably not too difficult to expand the Reconstruction class in lib/phonology.py to allow augmenting the sound table with more information after it's constructed.
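A rough sketch of both ideas (warn on rejection, augment after construction) might look like the following. Everything here is an assumption for illustration: SoundTable is a hypothetical stand-in, not the actual Reconstruction class, and the method names augment() and accept() as well as the logger name are invented.

```python
import logging

logger = logging.getLogger("jdsw.readings")  # hypothetical logger name


class SoundTable:
    """Hypothetical stand-in for the Reconstruction class in lib/phonology.py."""

    def __init__(self, table):
        # table maps a character to the readings attested in the base (SBGY) data
        self._table = {char: set(readings) for char, readings in table.items()}

    def augment(self, extra):
        """Merge additional readings into the table after construction."""
        for char, readings in extra.items():
            self._table.setdefault(char, set()).update(readings)

    def readings_for(self, char):
        """Return the set of known readings for a character (empty if unknown)."""
        return self._table.get(char, set())

    def accept(self, char, reading):
        """Return True if the reading is known; log a warning when rejecting it."""
        if reading in self.readings_for(char):
            return True
        logger.warning("rejected reading %r for %r: not in sound table", reading, char)
        return False
```

With this shape, the base data stays untouched on disk and the extra readings are only layered on in memory; the warnings give a list of places where the two sources clash.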
from jdsw.
Thanks for pointing to the right line of code for this. Yeah, adding such a warning output would be great so I can analyze where things clash between the SBGY and the JDSW.
As for whether or not to append these readings to GDR-SBGY-full.csv -- I think better not, as I think that data should be left as is. The example I provided above should not occur in the vast majority of medieval texts, nor in "regular" Han texts, but only in that domain of texts that is decidedly archaic (shangshu, maoshi, yijing etc.). I'd estimate that the same will be true for most, if not all, "correct" readings that are omitted in the SBGY.
I'd hence say those are clear exceptions; perhaps a separate readings_exceptions.csv or so might be a better place to store those?
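To make the idea concrete, here is a minimal sketch of how such a file could be read and consulted alongside the SBGY data without modifying it. The column names char and reading, the function names, and the placeholder rows are all assumptions for illustration, not the project's actual format.

```python
import csv
import io
from collections import defaultdict


def load_exception_readings(source):
    """Read rows with hypothetical 'char' and 'reading' columns into a dict of sets."""
    readings = defaultdict(set)
    for row in csv.DictReader(source):
        readings[row["char"]].add(row["reading"])
    return dict(readings)


def combined_readings(char, sbgy, exceptions):
    """Union of base and exception readings; neither input is modified."""
    return sbgy.get(char, set()) | exceptions.get(char, set())


# Placeholder data standing in for the contents of readings_exceptions.csv
sample = io.StringIO("char,reading\nA,x\nA,y\nB,z\n")
exceptions = load_exception_readings(sample)
```

Keeping the exceptions in their own file means the SBGY table remains a faithful copy of its source, and the archaic readings can be switched on only for the texts where they apply.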
from jdsw.