Comments (7)

typpo commented on July 26, 2024

Thanks for explaining - let me give this some thought and get back to you with a suggestion!

from promptfoo.

typpo commented on July 26, 2024

Hi @zeldrinn,

Does the expected output depend on the value of legal_case_text? If so, can you structure it like this:

```yaml
# ...
tests:
  - vars:
      legal_case_text: 'first case text....'
    assert:
      - type: equals
        value: "Yes"
  - vars:
      legal_case_text: 'second case text.....'
    assert:
      - type: equals
        value: "It is likely"
```

zeldrinn commented on July 26, 2024

Thanks for the quick reply @typpo! That's the approach I initially took, but it doesn't fully address the fact that the expected output also depends on the prompt, not just on the values of the vars (in this case legal_case_text). In other words, a common pattern when optimizing prompts is to run the same test cases (input vars) against every prompt variation, i.e. to define your test cases as the Cartesian product of [[var1a, var1b, ...], [var2a, var2b, ...], ...] and [prompt_variant_1, prompt_variant_2, ...].
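The pattern described above can be sketched in plain Python with `itertools.product`. This only illustrates the combinatorics and is not promptfoo code; the var names, values, and prompt names are placeholders.

```python
from itertools import product


def expand_cases(var_options: dict[str, list[str]], prompts: list[str]) -> list[dict]:
    """Return one case per (prompt, combination of var values)."""
    names = list(var_options)
    # Every combination of var values, e.g. (var1a, var2a), (var1a, var2b), ...
    var_combos = [
        dict(zip(names, values))
        for values in product(*(var_options[n] for n in names))
    ]
    # Cross every var combination with every prompt variant
    return [{"prompt": p, "vars": v} for p, v in product(prompts, var_combos)]


cases = expand_cases(
    {"var1": ["var1a", "var1b"], "var2": ["var2a", "var2b", "var2c"]},
    ["prompt_variant_1", "prompt_variant_2"],
)
print(len(cases))  # 2 prompts x (2 x 3) var combinations = 12
```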

Concretely, in the example we're discussing, "It is likely" is a pattern found only in responses to prompt 2, so it wouldn't make sense to assert on it in tests run against prompt 1. You could instead create separate test config files, each referencing a different set of prompts, but that largely defeats the purpose of comparing different prompt variations side by side. Ideally, the full set of tests (the Cartesian product) in this example would be:

```json
[{
    "prompt": "Is the following a violation of U.S. Patent Law?\n\n{{legal_case_text}}\n\nPlease ONLY give a Yes or No answer.",
    "legal_case_text": "first case text",
    "assertion_type": "contains",
    "assertion_value": "Yes"
},
{
    "prompt": "Is the following a violation of U.S. Patent Law?\n\n{{legal_case_text}}\n\nPlease ONLY give a Yes or No answer.",
    "legal_case_text": "second case text",
    "assertion_type": "contains",
    "assertion_value": "Yes"
},
{
    "prompt": "Is the following likely a violation of U.S. Patent Law?\n\n{{legal_case_text}}\n\nPlease give a brief explanation along with your answer.",
    "legal_case_text": "first case text",
    "assertion_type": "contains",
    "assertion_value": "It is likely"
},
{
    "prompt": "Is the following likely a violation of U.S. Patent Law?\n\n{{legal_case_text}}\n\nPlease give a brief explanation along with your answer.",
    "legal_case_text": "second case text",
    "assertion_type": "contains",
    "assertion_value": "It is likely"
}]
```
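For this specific example, the four entries above can be generated rather than written by hand: pair each prompt with the substring its responses are expected to contain, then cross that with the case texts. A minimal sketch; the `PROMPTS` mapping and the flat output shape simply mirror the JSON list and are not a promptfoo config format.

```python
import json
from itertools import product

# Each prompt variant paired with the substring its responses should contain
PROMPTS = {
    "Is the following a violation of U.S. Patent Law?\n\n{{legal_case_text}}\n\n"
    "Please ONLY give a Yes or No answer.": "Yes",
    "Is the following likely a violation of U.S. Patent Law?\n\n{{legal_case_text}}\n\n"
    "Please give a brief explanation along with your answer.": "It is likely",
}
CASE_TEXTS = ["first case text", "second case text"]

tests = [
    {
        "prompt": prompt,
        "legal_case_text": text,
        "assertion_type": "contains",
        "assertion_value": expected,
    }
    # Outer loop over prompts, inner loop over case texts (same order as the list)
    for (prompt, expected), text in product(PROMPTS.items(), CASE_TEXTS)
]

print(json.dumps(tests, indent=4))
```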

typpo commented on July 26, 2024

I could implement something like this, which is essentially what you asked for: an assert list associated with each prompt that gets merged into each individual test case.

```yaml
prompts:
  - prompt1.txt:
      assert:
        - type: contains
          value: "Yes"
  - prompt2.txt:
      assert:
        - type: contains
          value: It is likely
providers: ['anthropic:completion']
tests:
  - vars:
      legal_case_text: 'first case text.....'
  - vars:
      legal_case_text: 'second case text.....'
    assert:
      - type: contains
        value: Other
```

Based on my understanding, I think this will work for you, but to be explicit, this would test the following:

  • anthropic, prompt1, first case text, contains "Yes"
  • anthropic, prompt1, second case text, contains "Yes" and "Other"
  • anthropic, prompt2, first case text, contains "It is likely"
  • anthropic, prompt2, second case text, contains "It is likely" and "Other"
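The merge behaviour listed above (prompt-level asserts combined with each test's own asserts) can be expressed as a small function. This is a hypothetical sketch of the proposal, not promptfoo's implementation; `prompt_asserts`, `tests`, and `expand` are illustrative names, and assertions are reduced to plain dicts.

```python
from itertools import product

# Hypothetical per-prompt assertion lists, as in the proposed `prompts:` section
prompt_asserts = {
    "prompt1": [{"type": "contains", "value": "Yes"}],
    "prompt2": [{"type": "contains", "value": "It is likely"}],
}

# Test cases with their own (optional) assertion lists, as in `tests:`
tests = [
    {"vars": {"legal_case_text": "first case text....."}},
    {
        "vars": {"legal_case_text": "second case text....."},
        "assert": [{"type": "contains", "value": "Other"}],
    },
]


def expand(provider: str) -> list[dict]:
    """Cross prompts with tests, merging prompt-level asserts into each test."""
    runs = []
    for (prompt, p_asserts), test in product(prompt_asserts.items(), tests):
        runs.append({
            "provider": provider,
            "prompt": prompt,
            "vars": test["vars"],
            # Prompt-level asserts first, then the test's own asserts
            "assert": p_asserts + test.get("assert", []),
        })
    return runs


for run in expand("anthropic:completion"):
    values = " and ".join(f'"{a["value"]}"' for a in run["assert"])
    print(run["prompt"], run["vars"]["legal_case_text"], "contains", values)
```

Run as-is, it prints one line per combination, in the same order as the four bullets above.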

zeldrinn commented on July 26, 2024

This would almost work. The only issue is that it wouldn't permit toggling the assertion between "positive" and "negative" (in the case of binary classification tasks), because the assertion type is tied to the prompt. If we put only the value in the prompt's assert list and put the type in the tests section, I believe that would resolve it.

Better yet, perhaps both type and value could be optional in the prompts section: if defined there, they are used; otherwise, each one falls back to the type or value defined in the tests section. Perhaps this is what you were implying? E.g.:

```yaml
prompts:
  - prompt1.txt:
      assert:
        - type: icontains
          value: "Yes"
  - prompt2.txt:
      assert:
        - value: It is likely
providers: ['anthropic:completion']
tests:
  - vars:
      legal_case_text: 'first case text.....'
    assert:
      - type: contains
        value: it is likely
  - vars:
      legal_case_text: 'second case text.....'
    assert:
      - type: contains
        value: it is likely
  - vars:
      legal_case_text: 'third case text.....'
    assert:
      - type: contains
        value: Other
```

This would effectively let users define both the type and value alongside each prompt as an override of the default defined in the test cases. We're probably still missing something fundamental about how binary classification test cases should be modeled, but this would be good enough for now!
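The field-level fallback described above can be sketched as follows. Two details are assumptions not settled in the thread: prompt and test assertions are paired by position, and `not-contains` stands in for the "negative" type. The `check` function is a toy evaluator for this sketch, not promptfoo's assertion engine.

```python
from itertools import zip_longest


def merge_asserts(test_asserts: list[dict], prompt_asserts: list[dict]) -> list[dict]:
    """Pair assertions by position (an assumption); prompt fields override test fields."""
    merged = []
    for t, p in zip_longest(test_asserts, prompt_asserts, fillvalue={}):
        merged.append({**t, **p})  # keys set on the prompt win; others fall back
    return merged


def check(output: str, assertion: dict) -> bool:
    """Toy evaluator for the three assertion types used in this sketch."""
    kind, value = assertion["type"], assertion["value"]
    if kind == "contains":
        return value in output
    if kind == "icontains":
        return value.lower() in output.lower()
    if kind == "not-contains":
        return value not in output
    raise ValueError(f"unsupported assertion type: {kind}")


# The prompt supplies only the value; each test decides positive vs negative
prompt2_asserts = [{"value": "It is likely"}]
positive_case = [{"type": "contains", "value": "it is likely"}]
negative_case = [{"type": "not-contains", "value": "it is likely"}]

output = "It is likely that this infringes the patent."
for case in (positive_case, negative_case):
    merged = merge_asserts(case, prompt2_asserts)
    print(merged[0], all(check(output, a) for a in merged))
```

Keeping `type` on the test while the prompt supplies `value` is what allows the same prompt to be checked for both the positive and the negative class.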

thiagosalvatore commented on July 26, 2024

Hey guys!

I have a very similar problem. Basically, I want to assert that my LLM is not hallucinating. The answer from the LLM should only contain links that appear in the prompt I send it, which means my assertion needs access to both the prompt sent to the LLM and the output it generated. Is that possible?
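The check itself is straightforward once both strings are available. Whether promptfoo's assertion hooks expose the rendered prompt is exactly the open question here, so this sketch is a standalone function taking the prompt and output as plain arguments. The URL regex is a rough assumption and will miss edge cases in real-world URLs.

```python
import re

# Rough URL pattern: http(s) scheme followed by non-whitespace characters
# (an assumption; real URL extraction has many more edge cases)
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_links(text: str) -> set[str]:
    """Return the set of http(s) URLs found in text, minus trailing punctuation."""
    return {m.rstrip(".,;:!?") for m in URL_RE.findall(text)}


def links_grounded_in_prompt(prompt: str, output: str) -> tuple[bool, set[str]]:
    """Pass only if every link in the output also appears in the prompt."""
    unknown = extract_links(output) - extract_links(prompt)
    return (not unknown, unknown)


prompt = "Answer using only these sources: https://example.com/a and https://example.com/b."
output = "See https://example.com/a, and also https://example.org/made-up."
ok, unknown = links_grounded_in_prompt(prompt, output)
print(ok, unknown)  # False {'https://example.org/made-up'}
```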

zhlmmc commented on July 26, 2024

Is the per-prompt assert feature proposed above available yet?
