Comments (3)
ya, really cool idea.
Agree that tags are limited as data, and have taken us really far - (probably too far).
I also like the idea for storing captured metadata, like date metadata, within reach of compromise somehow.
Imagine if we could do something in match queries with the json like:
let doc=nlp('paul, john lennon and ringo starr')
doc.match('ringo starr').payload({roles:['drummer', 'singer'], hair:'long'})
//then later...
doc.match("and {roles:'drummer'}") //or something
Been stuck, forever, on this same dilemma - where to store information about groups of words.
The good news is that they are just javascript objects, and we can stick stuff anywhere.
View objects are transient. Every method returns a new one, so any data payload would need to be marshalled around with every interaction, and old views would hold stale payloads. I don't think it's the right place for this.
Putting payloads in Term objects would also be the wrong place - 'ringo' and 'starr' would need dangling or duplicated data between them.
Open to it, just haven't got it clear yet.
from compromise.
Just throwing some ideas around in case they offer any inspiration... far from a solution...!
What if there was some new layer like compromise/four with a method like .commit() that could commit a View and store it separately in the document?
const someObj = {} // my payload
const view = nlp('See you next September').match('next #Month').commit()
view.payload(someObj)
.commit() could hash the Term IDs to generate a deterministic ID for the View. This would ensure that a committed View can be updated later with new data if needed.
doc: {
  commits: {
    "somehash1": {
      terms: [], // list of Terms
      payload: {} // the payload data
    }
  }
}
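To make the hashing idea a bit more concrete, here is a minimal sketch in plain JavaScript. Everything in it is an assumption for illustration: hashIds and commit are hypothetical helpers, not part of compromise, and FNV-1a is just one deterministic hash that could be used. The term ids are copied from the .json() output in this thread.

```javascript
// Hypothetical helper: FNV-1a (32-bit) over the joined term ids.
// Any deterministic hash would work; this one needs no dependencies.
function hashIds(termIds) {
  const input = termIds.join(' ')
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i += 1) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash.toString(16).padStart(8, '0')
}

// Hypothetical commit: store the terms and payload under the hashed key.
function commit(doc, terms, payload = {}) {
  const key = hashIds(terms.map((term) => term.id))
  doc.commits[key] = { terms, payload }
  return key
}

// A stand-in for the document, holding only the commits layer
const doc = { commits: {} }
const terms = [{ id: 'next|00700002C' }, { id: 'september|00800003V' }]

const first = commit(doc, terms, { a: 1 })
// Same terms give the same key, so this updates the existing entry
const second = commit(doc, terms, { a: 2 })
```

Because the key depends only on the term ids, committing the same match twice updates one entry rather than creating a second one.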
This would allow Terms to hold different data in different contexts. For example, a match of next #Month versus #Month could both attach data to the Term September, but independently. A user could then:
const payload1 = { a: 1 }
const payload2 = { a: 2 }
const doc = nlp('See you next September')
doc.match('next #Month').commit().payload(payload1)
doc.match('#Month').commit().payload(payload2)
// ... later in the app
doc.match('next #Month').payload() // Generate checksum for this match and use it to lookup payload1 data from the commit
doc.match('#Month').payload() // Generate checksum for this match and use it to lookup payload2 data from the commit
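A rough sketch of how the two payloads could stay independent, again in plain JavaScript with no compromise calls. CommitStore and its methods are hypothetical names, and the key here is simply the joined term ids rather than a hash, to keep it readable:

```javascript
// Hypothetical store, keyed by the ids of the terms in each committed match
class CommitStore {
  constructor() {
    this.commits = new Map()
  }

  // Derive the lookup key from the term ids (a hash could be used instead)
  keyFor(termIds) {
    return termIds.join(' ')
  }

  // Save or overwrite the payload for this exact set of terms
  commit(termIds, payload) {
    this.commits.set(this.keyFor(termIds), { termIds, payload })
  }

  // Look up the payload for this exact set of terms, if any
  payload(termIds) {
    const entry = this.commits.get(this.keyFor(termIds))
    return entry ? entry.payload : undefined
  }
}

const store = new CommitStore()
const next = 'next|00700002C'
const september = 'september|00800003V'

store.commit([next, september], { a: 1 }) // like match('next #Month')
store.commit([september], { a: 2 }) // like match('#Month')

console.log(store.payload([next, september])) // { a: 1 }
console.log(store.payload([september])) // { a: 2 }
```

The Term september takes part in both commits, but each payload belongs to the match and not to the Term, so nothing is duplicated or left dangling on the Term itself.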
The data could also be output by the .json() function:
doc.match('next #Month').json()
[
  {
    "text": "next september",
    "terms": [
      {
        "text": "next",
        "pre": "",
        "post": " ",
        "tags": [
          "Adjective"
        ],
        "normal": "next",
        "index": [
          0,
          2
        ],
        "id": "next|00700002C",
        "dirty": true,
        "chunk": "Noun"
      },
      {
        "text": "september",
        "pre": "",
        "post": "",
        "tags": [
          "Date",
          "Noun",
          "Month"
        ],
        "normal": "september",
        "index": [
          0,
          3
        ],
        "id": "september|00800003V",
        "chunk": "Noun",
        "dirty": true
      }
    ],
    "payload": {} // ***** MY PAYLOAD *****
  }
]
I think, but am not sure, that this might also support your (excellent!) suggestion of a new match syntax based on payloads:
doc.match("and {roles:'drummer'}") //or something
The matcher could simply know that when it sees {roles:'drummer'} it has to go and find all committed views that have that data, return their term IDs, and use those to complete the match, like and ringo|00012ABC starr|0A11A00B
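As a sketch of that lookup step only (not the matcher itself), assuming the commits layout proposed above: findCommittedTerms is a hypothetical helper, and the john and lennon ids are made up for the example, while the ringo and starr ids are the ones used just above.

```javascript
// Hypothetical commits layer, following the doc.commits shape suggested earlier
const commits = {
  somehash1: {
    terms: ['ringo|00012ABC', 'starr|0A11A00B'],
    payload: { roles: ['drummer', 'singer'], hair: 'long' },
  },
  somehash2: {
    terms: ['john|EXAMPLE01', 'lennon|EXAMPLE02'], // made-up ids
    payload: { roles: ['singer', 'guitarist'] },
  },
}

// Return the term ids of every committed view whose payload has this value.
// Array-valued properties match when they include the value.
function findCommittedTerms(allCommits, key, value) {
  return Object.values(allCommits)
    .filter(({ payload }) => {
      const found = payload[key]
      return Array.isArray(found) ? found.includes(value) : found === value
    })
    .map(({ terms }) => terms)
}

const hits = findCommittedTerms(commits, 'roles', 'drummer')
console.log(hits) // [ [ 'ringo|00012ABC', 'starr|0A11A00B' ] ]
```

The matcher would still need to splice those ids into the rest of the pattern, which this sketch does not attempt.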
check out the compromise-payload plugin
⚡