Comments (4)

astrite commented on August 25, 2024

That's partially implemented currently via Tag Weighting. When a user
creates a source, they can assign it a number of user-defined tags. These
tags are applied to every document that comes in through that source's
harvest. If you give each source a unique tag, you can then define weights
to apply to query scoring in the Advanced Options pane, using the format
"Tag1": number, "Tag2": number, etc., where each number is the weighting
factor you want applied to the score. So for an RSS feed of CNN sources,
you can tag it with "CNN", and if you want all CNN documents weighted by a
factor of 2, you'd put "CNN": 2 in the tag weighting.

When you run a query, each document is assigned an overall score based on
how well it matches the query terms, and that score is then weighted
further by whatever geo, time, and tag weighting parameters are in effect.

Note that in the current implementation, you can update a source's tags,
but the change only affects newly harvested documents; it isn't
retroactive. There's an open issue to make this retroactive, but we do not
have an ETA at this time for when it might be worked into an upcoming build.
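To make the tag-weighting behaviour above concrete, here is a minimal Python sketch. Everything in it is hypothetical (the `Source`/`Document` classes, `harvest`, `parse_tag_weights`, `weighted_score`, and the "Breaking" tag are illustrations, not the platform's API), and it assumes that multiple weighted tags combine by multiplication and that the Advanced Options field is the body of a JSON object. It also shows why tag edits aren't retroactive: tags are copied onto each document at harvest time. Geo and time weighting are left out.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Source:
    name: str
    tags: list[str]


@dataclass
class Document:
    title: str
    text_score: float  # how well the document matches the query terms
    tags: list[str] = field(default_factory=list)


def harvest(source: Source, title: str, text_score: float) -> Document:
    # Hypothetical harvest step: the source's tags are copied onto the
    # document, so later edits to source.tags don't reach older documents.
    return Document(title, text_score, list(source.tags))


def parse_tag_weights(text: str) -> dict[str, float]:
    # Assumption: the field holds '"Tag1": 2, "Tag2": 0.5', i.e. the body
    # of a JSON object, with or without the surrounding braces.
    text = text.strip()
    if not text.startswith("{"):
        text = "{" + text + "}"
    return {tag: float(weight) for tag, weight in json.loads(text).items()}


def weighted_score(doc: Document, tag_weights: dict[str, float]) -> float:
    # Assumption: each weighted tag multiplies the query-match score; tags
    # without an entry leave it unchanged. Geo/time weighting is omitted.
    score = doc.text_score
    for tag in doc.tags:
        score *= tag_weights.get(tag, 1.0)
    return score


if __name__ == "__main__":
    cnn = Source("CNN RSS feed", ["CNN"])
    older = harvest(cnn, "Harvested before the tag edit", 0.5)
    cnn.tags.append("Breaking")  # edit the source's tags afterwards
    newer = harvest(cnn, "Harvested after the tag edit", 0.5)

    weights = parse_tag_weights('"CNN": 2, "Breaking": 1.5')
    print(weighted_score(older, weights))  # 1.0: only "CNN" applies
    print(weighted_score(newer, weights))  # 1.5: "CNN" and "Breaking" apply
```

The older document keeps only the "CNN" tag it was harvested with, which mirrors the non-retroactive behaviour described above.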

From a functional perspective, the case management layer would also
partially resolve the issue you're describing: once an analyst flags a
document as relevant to a case, it can be moved into the supporting
evidence folder. At that level, you'll only be working with documents an
analyst has deemed relevant, while the analysis/collection layer retains
granular, query-specific relevance.

On Wed, Apr 24, 2013 at 11:42 AM, sschneiderman [email protected] wrote:

Andrew, we previously discussed methods for promoting or demoting source
documents based on analyst judgment. This was of interest to both Aveshka
and CGS. Please advise if there has been any follow-up on how this might
work.
Thanks,
Scott


Reply to this email directly or view it on GitHub: https://github.com//issues/74

Andrew Strite
Intelligence Solutions Architect | IKANOW http://www.ikanow.com
Email: [email protected]
Mobile: 301.514.1384

from absolute-pin.

sschneiderman commented on August 25, 2024

Can you provide training on Thursday on how Tag Weighting would be applied to reduce false positives on similar names (John Smith the target versus John Smith the innocent bystander)? I understand the principle but not the implementation.
Thanks.

From: Andrew [mailto:[email protected]]
Sent: Wednesday, April 24, 2013 12:24 PM
To: IKANOW/Absolute-Pin
Cc: Scott Schneiderman
Subject: Re: [Absolute-Pin] Signals/Noise Issue (#74)

astrite commented on August 25, 2024

That's a slightly different issue. Tag weighting is appropriate for
inflating the score of a particular kind of document (e.g., all those from
CNN or Databot), which ensures that certain kinds of documents show up
before others.

"False positives" like the one you describe are better solved using
alternative query strategies and query qualifiers and, to a lesser extent,
aliasing. Selecting documents that match the correct John Smith and
finding the entities associated with him will give you additional query
parameters. Including those terms in the query for John Smith should push
the relevant documents up to the top.

e.g. John Smith AND (Company A OR Company B OR Associate A OR Associate B)
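The "find associated entities, then qualify the query" approach can be sketched in Python as below. This is a hypothetical illustration, not the platform's API: it assumes each analyst-confirmed document is represented as a plain set of extracted entity names, and it only builds a query string in the style of the example above; whether the query builder accepts that literal string is an assumption.

```python
from collections import Counter


def associated_entities(confirmed_docs: list[set[str]], target: str,
                        top_n: int = 4) -> list[str]:
    # Count the other entities that co-occur with the target in documents
    # an analyst has confirmed are about the right person.
    counts = Counter()
    for entities in confirmed_docs:
        counts.update(e for e in entities if e != target)
    return [name for name, _ in counts.most_common(top_n)]


def qualified_query(target: str, qualifiers: list[str]) -> str:
    # Require the target plus at least one associated entity, mirroring
    # "John Smith AND (Company A OR Company B OR ...)".
    if not qualifiers:
        return target
    return f"{target} AND ({' OR '.join(qualifiers)})"


if __name__ == "__main__":
    confirmed = [
        {"John Smith", "Company A", "Associate A"},
        {"John Smith", "Company A", "Company B"},
        {"John Smith", "Company A"},
    ]
    top = associated_entities(confirmed, "John Smith", top_n=1)
    print(qualified_query("John Smith", top))  # John Smith AND (Company A)
```

Ranking by co-occurrence count is just one simple way to pick qualifiers; an analyst would normally review the list before using it.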

Alternatively, if you have a scenario where you have John Smith (incorrect
person) and John B. Smith (correct person), you can either discard one of
the entities so it no longer displays, or run queries like:

e.g. (John B. Smith OR "John Smith") NOT John Smith

A certain amount of experimentation is probably required to develop an
effective query.
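One possible reading of the exclusion query above, sketched in Python. The semantics are an assumption, not confirmed platform behaviour: unquoted names are treated as matches on extracted entities, the quoted name as an exact phrase in the text, and NOT as dropping any document tagged with the incorrect entity. `Doc` and `keep_correct_person` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    entities: set[str]  # entity names extracted for this document


def keep_correct_person(docs: list[Doc], correct_entity: str, phrase: str,
                        wrong_entity: str) -> list[Doc]:
    # Assumed reading of (John B. Smith OR "John Smith") NOT John Smith:
    # keep documents tagged with the correct entity or containing the exact
    # phrase, unless they are tagged with the incorrect entity.
    return [
        d for d in docs
        if (correct_entity in d.entities or phrase in d.text)
        and wrong_entity not in d.entities
    ]


if __name__ == "__main__":
    docs = [
        Doc("John B. Smith met Company A.", {"John B. Smith", "Company A"}),
        Doc("John Smith led the service.", {"John Smith"}),
        Doc("A memo that names John Smith.", set()),
    ]
    for d in keep_correct_person(docs, "John B. Smith", "John Smith", "John Smith"):
        print(d.text)
```

Under this reading the second document is dropped because it carries the incorrect entity, while the third survives on the phrase match alone, which is why some experimentation with the query is still needed.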

As an aside, John Smith (the accountant) vs. John Smith (the priest) isn't
a true false positive. In both cases, a query for John Smith should bring
back matches for "John Smith" (of whatever entity type you define). A false
positive would be a document getting labeled with John Smith when it is
not actually about that entity. That's more like the situation where an
advertisement on a page causes a document to be flagged as being about a
company that doesn't actually appear in the text.
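A rough way to spot the kind of false positive just described is to flag entity labels whose names never appear in the document's body text. The sketch below is a hypothetical helper, not a platform feature, and it uses a naive case-insensitive substring check, so entities mentioned only under an alias will also be flagged; treat the results as candidates for review.

```python
def unsupported_entities(text: str, entities: set[str]) -> set[str]:
    # Flag entity labels whose name never appears in the document text,
    # e.g. a company picked up from an advert around the article.
    # Naive check: aliased mentions will be flagged too.
    lowered = text.lower()
    return {e for e in entities if e.lower() not in lowered}


if __name__ == "__main__":
    body = "John Smith joined Company A as chief accountant."
    labels = {"John Smith", "Company A", "Acme Corp"}
    print(unsupported_entities(body, labels))  # {'Acme Corp'}
```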

On Wed, Apr 24, 2013 at 12:30 PM, sschneiderman [email protected] wrote:

Can you provide training on Thursday on how Tag Weighting would be applied
to reduce false positives on similar names (John Smith the target versus
John Smith the innocent bystander)? I understand the principle but not the
implementation.
Thanks.

sschneiderman commented on August 25, 2024

Understood. Let's discuss again Thursday.

From: Andrew [mailto:[email protected]]
Sent: Wednesday, April 24, 2013 12:49 PM
To: IKANOW/Absolute-Pin
Cc: Scott Schneiderman
Subject: Re: [Absolute-Pin] Signals/Noise Issue (#74)
