rstrahan commented on June 4, 2024

Using a nested datatype for questions seems to be the way to go. With this approach, the terms are matched separately against each question.

Mapping for nested questions:

curl -H'Content-Type: application/json' -XPUT "$ESURL/qna-index" -d '{
  "mappings": {
    "qna": {
      "properties": {
        "qid": {"type": "keyword"},
        "question": {"type": "nested"},
        "a": {"type": "text", "analyzer": "english"},
        "r": {
          "properties": {
            "attachmentLinkUrl": {"type": "keyword"},
            "buttons": {
              "properties": {
                "text": {"type": "text"},
                "value": {"type": "keyword"}
              }
            },
            "imageUrl": {"type": "keyword"},
            "subTitle": {"type": "text"},
            "title": {"type": "text"}
          }
        }
      }
    }
  }
}'

Test Data:

curl -H'Content-Type: application/json' -XPUT "$ESURL/qna-index/qna/test.001" -d '{
         "question": [
            {"q":"tell me about snorkeling"}
         ],
         "a": "Snorkeling is cool!",
         "r": {
            "title": "",
            "imageUrl": ""
         },
         "qid": "test.001"
}'
curl -H'Content-Type: application/json' -XPUT "$ESURL/qna-index/qna/test.002" -d '{
         "question": [
            {"q":"tell me about snorkel prices"}
         ],
         "a": "Snorkels are not expensive",
         "r": {
            "title": "",
            "imageUrl": ""
         },
         "qid": "test.002"
}'

Add more questions to test.001 to check that we still get the correct answer:

curl -H'Content-Type: application/json' -XPUT "$ESURL/qna-index/qna/test.001" -d '{
         "question": [
            {"q":"tell me about snorkeling"},
            {"q":"nothing in common"},
            {"q":"something else again"}
         ],
         "a": "Snorkeling is cool!",
         "r": {
            "title": "",
            "imageUrl": ""
         },
         "qid": "test.001"
}'

Test query:

Notes:

  • Use ?search_type=dfs_query_then_fetch when the index holds only a small number of documents, so that IDF is computed across all shards rather than per shard. (The QnABot handler already does this.)
  • score_mode:max returns the score of the strongest question match, so a strong match is not diluted by weaker matches on the document's other questions.
  • boost gives matches on the question double the weight of matches on the answer (previously implemented with the multi_match 'fields' syntax).
  • Try adding more questions to either test.001 or test.002 to confirm that adding questions never causes the wrong answer to be ranked higher.

curl -H'Content-Type: application/json' -XPOST "$ESURL/qna-index/qna/_search?search_type=dfs_query_then_fetch" -d '{  
   "query":{  
      "bool":{  
         "should":[  
            {  
               "nested":{  
                  "path":"question",
                  "score_mode":"max",
                  "boost":2,
                  "query":{  
                     "match":{  
                        "question.q":"tell me about snorkeling"
                     }
                  }
               }
            },
            {
                "match":{
                    "a":"tell me about snorkeling"
                }
            },
            {
                 "match":{
                    "t":"topicvalue"
                }
            }     
         ]
      }
   }
}'
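
To try several utterances and see which individual question produced the winning score, the query above can be wrapped in a small helper. This is only a sketch: build_query is a hypothetical helper name, not part of QnABot, and it assumes the utterance contains no double quotes or backslashes. It also adds "inner_hits", a standard option of the Elasticsearch nested query that returns the matching nested objects alongside each hit.

```shell
#!/bin/sh
# build_query is a hypothetical helper (not part of QnABot).
# It prints the nested query from this comment for a given utterance.
# $1: the utterance; assumed to contain no double quotes or backslashes.
build_query() {
  cat <<EOF
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "question",
            "score_mode": "max",
            "boost": 2,
            "query": { "match": { "question.q": "$1" } },
            "inner_hits": {}
          }
        },
        { "match": { "a": "$1" } }
      ]
    }
  }
}
EOF
}

# Contact Elasticsearch only when ESURL is set; otherwise just print the body.
if [ -n "${ESURL:-}" ]; then
  build_query "tell me about snorkeling" |
    curl -s -H 'Content-Type: application/json' \
      -XPOST "$ESURL/qna-index/qna/_search?search_type=dfs_query_then_fetch" -d @-
else
  build_query "tell me about snorkeling"
fi
```

With inner_hits enabled, each hit in the response should carry an inner_hits.question section listing the nested question objects that matched, which makes it easier to confirm that score_mode:max is picking up the intended question.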

This change modifies the JSON structure of documents. The content designer 'Import' function should continue to support the previous JSON structure, both for backward compatibility and to allow content migration. However, I think we can move export and import to the new nested structure going forward.
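
As a rough illustration of what such a migration could look like, the sketch below converts a single document with jq. It rests on two assumptions that are not stated in this comment: that the previous structure stored questions as a flat array of strings under "q", and that jq is installed. migrate_doc is a hypothetical name. Documents that have no "q" field are passed through unchanged, so already-migrated content is not damaged.

```shell
#!/bin/sh
# migrate_doc is a hypothetical helper (not part of QnABot).
# ASSUMPTION: legacy documents store questions as "q": ["...", "..."].
# Requires jq. Reads one JSON document on stdin, writes the result on stdout.
migrate_doc() {
  jq -c 'if has("q")
         then (. + {question: [.q[] | {q: .}]} | del(.q))
         else .
         end'
}

# Illustrative legacy document only; not taken from a real export.
echo '{"qid":"test.001","q":["tell me about snorkeling","nothing in common"],"a":"Snorkeling is cool!"}' |
  migrate_doc
```

An importer could apply the same transformation to every document in an exported file before indexing it, which would let old exports keep working while new exports use the nested structure.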

We should probably also take this opportunity to rename the document JSON fields "a", "q", "t", and "r" to more explicit long-form names that better reflect their meaning.

from qnabot-on-aws.

JohnCalhoun commented on June 4, 2024

Fixed in v2.0.0.

