Signals/Noise Issue about ikanow/absolute-pin

astrite commented on August 25, 2024

That's currently partially implemented via Tag Weighting. When a user
creates a source, they can set a number of user-defined tags. These tags
are attached to every document collected by that source's harvest. If you
give each source a unique tag, you can then define weights to apply to
query scoring in the Advanced Options pane. The format is "Tag1": number,
"Tag2": number, etc., where the number is the weighting factor you want
applied to the score. So for an RSS feed of CNN sources, you can tag it
with "CNN", and if you want all CNN documents weighted x2, you'd put
"CNN": 2 in the tag weighting. When you run a query, each document is
first assigned an overall score based on how well it matches the query
terms, and that score is then weighted further by any geo / time / tag
weighting parameters that exist. Note that in the current implementation,
you can update a source's tags, but this will only affect new documents;
it's not retroactive. There's an open issue to make this retroactive, but
we do not have an ETA at this time for when it might be worked into an
upcoming build.
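A minimal Python sketch of the behaviour described above, under stated assumptions rather than IKANOW's actual code: the Advanced Options field is treated as the body of a JSON object, each of a document's tags multiplies its query-match score by that tag's weight (tags with no configured weight count as 1.0), and the harvester copies the source's tags onto each document at harvest time. `Source`, `Document`, `harvest`, `parse_tag_weights` and `weighted_score` are hypothetical names. The copy-at-harvest step is what makes a later tag change non-retroactive.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Source:
    # Hypothetical source record: a name plus user-defined tags.
    name: str
    tags: list[str] = field(default_factory=list)


@dataclass
class Document:
    title: str
    match_score: float  # how well the document matches the query terms
    tags: list[str] = field(default_factory=list)


def harvest(source: Source, title: str, match_score: float) -> Document:
    # Tags are copied onto the document when it is harvested, so editing
    # the source's tags later does not change documents already stored.
    return Document(title, match_score, list(source.tags))


def parse_tag_weights(text: str) -> dict[str, float]:
    # Assumption: the field holds '"Tag1": n, "Tag2": m' pairs, i.e. a JSON
    # object body without the surrounding braces.
    return {tag: float(w) for tag, w in json.loads("{" + text + "}").items()}


def weighted_score(doc: Document, weights: dict[str, float]) -> float:
    # Assumption: each tag's weight multiplies the score; unweighted tags count as 1.0.
    score = doc.match_score
    for tag in doc.tags:
        score *= weights.get(tag, 1.0)
    return score


if __name__ == "__main__":
    cnn = Source("CNN RSS", tags=["CNN"])
    other = Source("Other feed", tags=["Other"])
    docs = [harvest(cnn, "cnn story", 0.5), harvest(other, "other story", 0.8)]
    weights = parse_tag_weights('"CNN": 2')
    # prints "cnn story 1.0", then "other story 0.8"
    for d in sorted(docs, key=lambda d: weighted_score(d, weights), reverse=True):
        print(d.title, weighted_score(d, weights))

    cnn.tags.append("Priority")  # updating the source's tags...
    print(docs[0].tags)          # ...leaves the harvested document at ['CNN']
```

Multiplying is only one plausible way to combine several tag weights; taking the maximum weight would be an equally reasonable assumption.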

From a functional perspective, the case management layer would also
partially resolve the issue you're describing: once an analyst flags a
document as relevant to a case, it can be moved into the supporting
evidence folder. At that level, you'll only be working with documents an
analyst has deemed relevant, while the analysis / collection layer
retains granular, query-specific relevance.
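To make the split between the two layers concrete, here is a minimal Python sketch. It assumes, hypothetically, that a case simply holds the IDs of documents an analyst has flagged, while per-query relevance scores stay in the collection/analysis layer and are looked up rather than copied. `Case`, `flag_relevant` and `case_documents` are illustrative names, not the platform's API.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    # Hypothetical case record: only the analyst's relevance judgment is stored.
    name: str
    evidence: list[str] = field(default_factory=list)  # supporting-evidence folder

    def flag_relevant(self, doc_id: str) -> None:
        # Move a flagged document into the evidence folder (flagging twice is a no-op).
        if doc_id not in self.evidence:
            self.evidence.append(doc_id)


def case_documents(case: Case, query_scores: dict[str, float]) -> dict[str, float]:
    # The case decides *which* documents matter; their query-specific scores
    # still come from the collection/analysis layer.
    return {d: query_scores[d] for d in case.evidence if d in query_scores}


if __name__ == "__main__":
    # Collection/analysis layer: scores for one query, untouched by the case.
    query_scores = {"doc1": 0.91, "doc2": 0.40, "doc3": 0.75}
    case = Case("example case")
    case.flag_relevant("doc3")
    case.flag_relevant("doc1")
    case.flag_relevant("doc3")
    print(case_documents(case, query_scores))  # {'doc3': 0.75, 'doc1': 0.91}
```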

On Wed, Apr 24, 2013 at 11:42 AM, sschneiderman wrote:

Andrew, we previously discussed methods for promoting or demoting source
documents based on analyst judgment. This was of interest to both Aveshka
and CGS. Please advise if there is any follow-up on how this might work.
Thanks,
Scott

Andrew Strite
Intelligence Solutions Architect | IKANOW http://www.ikanow.com
Email: [email protected]
Mobile: 301.514.1384

sschneiderman commented on August 25, 2024

Can you provide training on Thursday on how Tag Weighting would be applied to reduce false positives on similar names (John Smith the target versus John Smith the innocent bystander)? I understand the principle but not the implementation.
Thanks.

From: Andrew [mailto:[email protected]]
Sent: Wednesday, April 24, 2013 12:24 PM
To: IKANOW/Absolute-Pin
Cc: Scott Schneiderman
Subject: Re: [Absolute-Pin] Signals/Noise Issue (#74)

astrite commented on August 25, 2024

That's a slightly different issue. Tag weighting is appropriate for
inflating the score of a particular kind of document (e.g., all those from
CNN or Databot), which ensures that certain kinds of documents show up
before others.

"False positives" like the one you describe are better solved using
alternative query strategies and query qualifiers, and, to a lesser
extent, aliasing. Selecting documents that match the correct John Smith
and finding their associated entities will give you additional query
terms. These terms, if included in the query for John Smith, should push
the relevant documents up to the top.

e.g. John Smith AND (Company A OR Company B OR Associate A OR Associate B)
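A minimal Python sketch of how such a qualified query narrows and orders results, assuming each document is reduced to the set of entity names extracted from it. The ranking here is a simple count of matched qualifiers, a stand-in for illustration and not the platform's actual relevance scoring; `Doc`, `qualified_search` and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    entities: set[str]  # entity names extracted from the document


def qualified_search(docs: list[Doc], target: str, qualifiers: set[str]) -> list[str]:
    # target AND (q1 OR q2 OR ...), ranked by how many qualifiers each doc mentions.
    hits = []
    for doc in docs:
        if target not in doc.entities:
            continue
        matched = len(doc.entities & qualifiers)
        if matched:  # the OR clause: at least one associated entity present
            hits.append((matched, doc.doc_id))
    hits.sort(key=lambda h: (-h[0], h[1]))  # more qualifiers first, then by id
    return [doc_id for _, doc_id in hits]


docs = [
    Doc("d1", {"John Smith", "Company A", "Associate B"}),
    Doc("d2", {"John Smith", "Parish Church"}),  # the other John Smith
    Doc("d3", {"John Smith", "Company B"}),
    Doc("d4", {"Company A"}),                    # no John Smith at all
]
quals = {"Company A", "Company B", "Associate A", "Associate B"}
print(qualified_search(docs, "John Smith", quals))  # ['d1', 'd3']
```

The document about the other John Smith drops out because it shares no associated entities with the target, and the document with the most associations ranks first.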

Alternatively, if you have a scenario where John Smith (the incorrect
person) and John B. Smith (the correct person) appear as separate
entities, you can either discard one of the entities so it no longer
displays or run queries like:

e.g. (John B. Smith OR "John Smith") NOT John Smith
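Read literally, that query looks self-contradictory, so it only makes sense if quoted and unquoted terms mean different things. One plausible reading, and it is purely an assumption here, is that unquoted names refer to extracted entities while the quoted "John Smith" is an exact-text match. Under that assumption, a minimal Python sketch of the filter (with hypothetical names and data) looks like this:

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str
    entities: set[str]  # disambiguated entities extracted from the document


def matches(doc: Doc) -> bool:
    # Assumed reading: unquoted = entity, quoted = exact phrase in the text.
    # (entity John B. Smith OR text "John Smith") NOT entity John Smith
    wanted = "John B. Smith" in doc.entities or "John Smith" in doc.text
    excluded = "John Smith" in doc.entities
    return wanted and not excluded


docs = [
    Doc("d1", "John B. Smith met the board.", {"John B. Smith"}),
    Doc("d2", "John Smith, the accountant, testified.", {"John Smith"}),
    Doc("d3", "Sources say John Smith signed it.", set()),  # text hit, no entity
]
print([d.doc_id for d in docs if matches(d)])  # ['d1', 'd3']
```

The document tagged with the wrong John Smith entity is excluded, while documents tagged with the right entity, or mentioning the name without any entity tag, are kept.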

A certain amount of experimentation is probably required to develop an
effective query.

As an aside, John Smith (the accountant) vs. John Smith (the priest) isn't
a true false positive. In both cases, a query for John Smith should bring
back documents matching "John Smith" (of whatever entity type you define).
A false positive would be documents getting labeled with John Smith when
they are not actually about that entity. This is more like the situation
where an advertisement on a page might flag a document as being about a
company that does not actually appear in the text.
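A true false positive in that sense can be spotted by checking whether an entity label is supported by the document's main text at all. The following is a minimal Python sketch under assumptions: the main article text is available separately from page furniture such as adverts, and a case-insensitive substring check is good enough (a real check would also need to handle aliases). `unsupported_entities` and "Acme Corp" are hypothetical.

```python
def unsupported_entities(main_text: str, entities: list[str]) -> list[str]:
    # Return entity labels that never appear in the document's main text.
    lowered = main_text.lower()
    return [e for e in entities if e.lower() not in lowered]


article = "John Smith, an accountant, was interviewed about the audit."
# Suppose a sidebar advert on the page mentioned "Acme Corp" and the
# extractor attached it to the document anyway.
labels = ["John Smith", "Acme Corp"]
print(unsupported_entities(article, labels))  # ['Acme Corp']
```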

sschneiderman commented on August 25, 2024

Understood. Let's discuss again Thursday.

From: Andrew [mailto:[email protected]]
Sent: Wednesday, April 24, 2013 12:49 PM
To: IKANOW/Absolute-Pin
Cc: Scott Schneiderman
Subject: Re: [Absolute-Pin] Signals/Noise Issue (#74)
