Especially for text classification use cases, it would be very useful to be able to de

Feature Request: Define prompt:assertion pairs rather than only being able to define assertions in test objects about promptfoo HOT 7 OPEN

promptfoo commented on July 26, 2024

Feature Request: Define prompt:assertion pairs rather than only being able to define assertions in test objects

from promptfoo.

Comments (7)

typpo commented on July 26, 2024 1

Thanks for explaining - let me give this some thought and get back to you with a suggestion!

from promptfoo.

typpo commented on July 26, 2024

Hi @zeldrinn,

Would the output depend on the value of legal_case_text? If so, can you structure it like this:

# ...
tests:
  - vars:
      legal_case_text: 'first case text....'
    assert:
      - type: equals
        value: "Yes"
  - vars:
      legal_case_text: 'second case text.....'
    assert:
      - type: equals
        value: "It is likely"

from promptfoo.

zeldrinn commented on July 26, 2024

thanks for the quick reply @typpo! that's the approach i initially took, but it doesn't fully address the fact that the output also depends on the prompt, not just on the value of the vars (in this case legal_case_text). in other words, a common pattern when trying to optimize prompts is to run all the same test cases (input vars) on all of the different prompt variations (i.e. define your test cases via the cartesian product of [[var1a, var1b, ...], [var2a, var2b, ...], ...] and [prompt_variant_1, prompt_variant_2, ...]).

concretely, in the example we're discussing, "It is likely" is specifically a pattern only found in responses from prompt 2, so it wouldn't make sense to use it for tests run against prompt 1. you could alternatively create separate test config files, each of which references a different set of prompts, but that mostly defeats the purpose of being able to compare the performance of different prompt variations. ideally, the full set of tests (the cartesian product) in this example would be:

[{
    "prompt": "Is the following a violation of U.S. Patent Law?\n\n{{legal_case_text}}\n\nPlease ONLY give a Yes or No answer.",
    "legal_case_text": "first case text",
    "assertion_type": "contains",
    "assertion_value": "Yes"
},
{
    "prompt": "Is the following a violation of U.S. Patent Law?\n\n{{legal_case_text}}\n\nPlease ONLY give a Yes or No answer.",
    "legal_case_text": "second case text",
    "assertion_type": "contains",
    "assertion_value": "Yes"
},
{
    "prompt": "Is the following likely a violation of U.S. Patent Law?\n\n{{legal_case_text}}\n\nPlease give a brief explanation along with your answer.",
    "legal_case_text": "first case text",
    "assertion_type": "contains",
    "assertion_value": "It is likely"
},
{
    "prompt": "Is the following likely a violation of U.S. Patent Law?\n\n{{legal_case_text}}\n\nPlease give a brief explanation along with your answer.",
    "legal_case_text": "second case text",
    "assertion_type": "contains",
    "assertion_value": "It is likely"
}]
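
The expansion listed above can be sketched in Python. This is only an illustration of the cartesian product being described, not promptfoo's implementation; the `expand` helper, the `PROMPT_ASSERTIONS` mapping, and the short prompt labels are hypothetical names chosen for the example.

```python
from itertools import product

# Hypothetical data: each prompt variant carries its own expected substring.
PROMPT_ASSERTIONS = {
    "prompt1": ("contains", "Yes"),
    "prompt2": ("contains", "It is likely"),
}
CASES = ["first case text", "second case text"]


def expand(prompt_assertions, cases):
    """Pair every prompt variant with every input, keeping the prompt's own assertion."""
    rows = []
    for (prompt, (a_type, a_value)), case in product(prompt_assertions.items(), cases):
        rows.append({
            "prompt": prompt,
            "legal_case_text": case,
            "assertion_type": a_type,
            "assertion_value": a_value,
        })
    return rows


ROWS = expand(PROMPT_ASSERTIONS, CASES)
```

With two prompts and two inputs this yields four rows, in the same order as the JSON list above.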

from promptfoo.

typpo commented on July 26, 2024

I could implement something like this, which is essentially what you asked for: an assert list associated with each prompt that gets merged into each individual test case.

prompts:
  - prompt1.txt:
      assert:
        - type: contains
          value: Yes
  - prompt2.txt:
      assert:
        - type: contains
          value: It is likely
providers: ['anthropic:completion']
tests:
  - vars:
      legal_case_text: 'first case text.....'
  - vars:
      legal_case_text: 'second case text.....'
    assert:
      - type: contains
        value: Other

Based on my understanding I think this will work for you, but just to be explicit, this would test the following:

anthropic, prompt1, first case text, contains "Yes"
anthropic, prompt1, second case text, contains "Yes" and "Other"
anthropic, prompt2, first case text, contains "It is likely"
anthropic, prompt2, second case text, contains "It is likely" and "Other"
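
The merge behaviour proposed here can be sketched as follows. This is a minimal illustration under the assumption that prompt-level asserts are simply concatenated with test-level asserts; `merged_cases` and the data structures are hypothetical and do not reflect promptfoo's internals.

```python
# Hypothetical in-memory version of the config above.
PROMPTS = {
    "prompt1": [{"type": "contains", "value": "Yes"}],
    "prompt2": [{"type": "contains", "value": "It is likely"}],
}
PROVIDERS = ["anthropic"]
TESTS = [
    {"vars": {"legal_case_text": "first case text"}},
    {
        "vars": {"legal_case_text": "second case text"},
        "assert": [{"type": "contains", "value": "Other"}],
    },
]


def merged_cases(prompts, providers, tests):
    """Return (provider, prompt, input, asserts) tuples.

    Prompt-level asserts come first, followed by any test-level asserts.
    """
    out = []
    for provider in providers:
        for prompt_name, prompt_asserts in prompts.items():
            for test in tests:
                asserts = list(prompt_asserts) + list(test.get("assert", []))
                out.append(
                    (provider, prompt_name, test["vars"]["legal_case_text"], asserts)
                )
    return out


MERGED = merged_cases(PROMPTS, PROVIDERS, TESTS)
```

Each tuple in `MERGED` corresponds to one of the four lines listed above.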

from promptfoo.

zeldrinn commented on July 26, 2024

this would almost work. only issue is that this wouldn't permit toggling the assertion from "positive" to "negative" (in the case of binary classification tasks), because the type is tied to the prompt. if we just put the value in the prompt > assertion and put the type in the tests section then i believe it would be resolved.

better yet, perhaps both type and value are optional in the prompts section, and it uses them if defined there. otherwise, for each one, it will fall back to the type and value defined in the tests section. perhaps this is what you were implying? e.g.:

prompts:
  - prompt1.txt:
      assert:
        - type: icontains
          value: Yes
  - prompt2.txt:
      assert:
        - value: It is likely
providers: ['anthropic:completion']
tests:
  - vars:
      legal_case_text: 'first case text.....'
    assert:
      - type: contains
        value: it is likely
  - vars:
      legal_case_text: 'second case text.....'
    assert:
      - type: contains
        value: it is likely
  - vars:
      legal_case_text: 'third case text.....'
    assert:
      - type: contains
        value: Other

this would effectively let users define both the type and value alongside each prompt as an override of the default that is defined in the test cases. we're probably still missing something fundamental in how binary classification test cases should be modeled, but this would probably be good enough for now!
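
One possible reading of this fallback rule is a field-by-field override, sketched below. The `resolve_assertion` helper is hypothetical, and the thread does not settle the exact semantics (for example, whether a fully specified prompt-level assertion should replace or accompany the test-level one), so treat this as an assumption rather than the agreed design.

```python
def resolve_assertion(prompt_assert, test_assert):
    """Field-wise fallback: prompt-level keys win, test-level keys fill the gaps."""
    return {**test_assert, **prompt_assert}


# prompt2 defines only a value, so the type falls back to the test's type.
PARTIAL = resolve_assertion(
    {"value": "It is likely"},
    {"type": "contains", "value": "it is likely"},
)

# prompt1 defines both fields, so the test-level assertion is fully overridden.
FULL = resolve_assertion(
    {"type": "icontains", "value": "Yes"},
    {"type": "contains", "value": "Other"},
)
```

Under this reading, the third test case's "Other" check would be lost for prompt1, which may be part of what still needs to be worked out.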

from promptfoo.

thiagosalvatore commented on July 26, 2024

Hey guys!

I have a very similar problem. Basically I want to assert that my LLM is not hallucinating. In order to do that, I have a prompt that I send to the LLM and the answer from the LLM should only contain links that are in the prompt, which means that I need to have access to both the prompt sent to the LLM and the output generated by the LLM in my assertion. Is it possible?
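
The check itself is straightforward once both strings are available. Below is a minimal sketch, assuming the assertion is handed the rendered prompt and the model output as plain strings; whether and how promptfoo exposes the prompt to a custom assertion is not confirmed in this thread, and `extract_links` and `unknown_links` are hypothetical helper names.

```python
import re

# Rough URL matcher: anything starting with http(s):// up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")
# Punctuation that commonly trails a URL in prose and is not part of it.
TRAILING = ".,;:!?)]}>\"'"


def extract_links(text):
    """Return the set of URLs found in text, with trailing punctuation stripped."""
    return {match.rstrip(TRAILING) for match in URL_PATTERN.findall(text)}


def unknown_links(prompt, output):
    """Return links that appear in the output but not in the prompt, sorted."""
    return sorted(extract_links(output) - extract_links(prompt))


PROMPT = "Sources: https://example.com/a and https://example.com/b."
OUTPUT = "Read https://example.com/a, then https://example.com/c."
HALLUCINATED = unknown_links(PROMPT, OUTPUT)
```

An assertion built on this would pass when `unknown_links` returns an empty list. The URL pattern is deliberately simple and would need tightening for links embedded in markup.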

from promptfoo.

zhlmmc commented on July 26, 2024

I could implement something like this, essentially what you asked for, an assert list associated with each prompt that will get merged into each individual test cases.

prompts:
  - prompt1.txt:
      assert:
        - type: contains
          value: Yes
  - prompt2.txt:
      assert:
        - type: contains
          value: It is likely
providers: ['anthropic:completion']
tests:
  - vars:
      legal_case_text: 'first case text.....'
  - vars:
      legal_case_text: 'second case text.....'
    assert:
      - type: contains
        value: Other

Based on my understanding I think this will work for you, but just to be explicit, this would test the following:

anthropic, prompt1, first case text, contains "Yes"
anthropic, prompt1, second case text, contains "Yes" and "Other"
anthropic, prompt2, first case text, contains "It is likely"
anthropic, prompt2, second case text, contains "It is likely" and "Other"

Is this feature online?

from promptfoo.
