Comments (7)
Thanks for explaining - let me give this some thought and get back to you with a suggestion!
from promptfoo.
Hi @zeldrinn,
Would the output depend on the value of `legal_case_text`? If so, can you structure it like this:
```yaml
# ...
tests:
  - vars:
      legal_case_text: 'first case text....'
    assert:
      - type: equals
        value: "Yes"
  - vars:
      legal_case_text: 'second case text.....'
    assert:
      - type: equals
        value: "It is likely"
```
thanks for the quick reply @typpo! that's the approach i initially took, but it doesn't fully address the fact that the output also depends on the prompt, not just on the value of the vars (in this case `legal_case_text`). in other words, a common pattern when trying to optimize prompts is to run all the same test cases (input vars) against all of the different prompt variations, i.e. define your test cases as the cartesian product of `[[var1a, var1b, ...], [var2a, var2b, ...], ...]` and `[prompt_variant_1, prompt_variant_2, ...]`.
concretely, in the example we're discussing, "It is likely" is a pattern found only in responses from prompt 2, so it wouldn't make sense to use it for tests run against prompt 1. you could alternatively create separate test config files, each of which references a different set of prompts, but that mostly defeats the purpose of being able to compare the performance of different prompt variations. ideally, the full set of tests (the cartesian product) in this example would be:
```json
[
  {
    "prompt": "Is the following a violation of U.S. Patent Law?\n\n{{legal_case_text}}\n\nPlease ONLY give a Yes or No answer.",
    "legal_case_text": "first case text",
    "assertion_type": "contains",
    "assertion_value": "Yes"
  },
  {
    "prompt": "Is the following a violation of U.S. Patent Law?\n\n{{legal_case_text}}\n\nPlease ONLY give a Yes or No answer.",
    "legal_case_text": "second case text",
    "assertion_type": "contains",
    "assertion_value": "Yes"
  },
  {
    "prompt": "Is the following likely a violation of U.S. Patent Law?\n\n{{legal_case_text}}\n\nPlease give a brief explanation along with your answer.",
    "legal_case_text": "first case text",
    "assertion_type": "contains",
    "assertion_value": "It is likely"
  },
  {
    "prompt": "Is the following likely a violation of U.S. Patent Law?\n\n{{legal_case_text}}\n\nPlease give a brief explanation along with your answer.",
    "legal_case_text": "second case text",
    "assertion_type": "contains",
    "assertion_value": "It is likely"
  }
]
```
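the expansion above can be generated mechanically; here's a minimal sketch (prompt strings shortened, all names illustrative):

```python
from itertools import product

# Illustrative stand-ins for the example above: each prompt variant
# carries its own expected answer pattern.
prompts = [
    ("yes/no prompt", "Yes"),
    ("explanation prompt", "It is likely"),
]
cases = ["first case text", "second case text"]

# Every (prompt, case) pair becomes one test; the assertion travels with the prompt.
tests = [
    {
        "prompt": p,
        "legal_case_text": c,
        "assertion_type": "contains",
        "assertion_value": v,
    }
    for (p, v), c in product(prompts, cases)
]
print(len(tests))  # 2 prompts x 2 cases = 4 tests
```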
I could implement something like this, essentially what you asked for: an `assert` list associated with each prompt that gets merged into each individual test case.
```yaml
prompts:
  - prompt1.txt:
      assert:
        - type: contains
          value: Yes
  - prompt2.txt:
      assert:
        - type: contains
          value: It is likely

providers: ['anthropic:completion']

tests:
  - vars:
      legal_case_text: 'first case text.....'
  - vars:
      legal_case_text: 'second case text.....'
    assert:
      - type: contains
        value: Other
```
Based on my understanding I think this will work for you, but just to be explicit, this would test the following:
- anthropic, prompt1, first case text, contains "Yes"
- anthropic, prompt1, second case text, contains "Yes" and "Other"
- anthropic, prompt2, first case text, contains "It is likely"
- anthropic, prompt2, second case text, contains "It is likely" and "Other"
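A minimal sketch of the merge described above (names are illustrative, not promptfoo internals):

```python
# Prompt-level assertions apply to every test run against that prompt;
# test-level assertions are appended on top, so both sets must pass.
def merged_asserts(prompt_asserts, test_asserts):
    return list(prompt_asserts) + list(test_asserts)

prompt1 = [{"type": "contains", "value": "Yes"}]
second_case = [{"type": "contains", "value": "Other"}]
# (prompt1, second case text) must contain both "Yes" and "Other":
print(merged_asserts(prompt1, second_case))
```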
this would almost work. the only issue is that it wouldn't permit toggling the assertion from "positive" to "negative" (in the case of binary classification tasks), because the `type` is tied to the prompt. if we just put the `value` in the prompt-level `assert` and put the `type` in the `tests` section, then i believe it would be resolved.
better yet, perhaps both `type` and `value` are optional in the `prompts` section, and each is used if defined there; otherwise, each falls back to the `type` and `value` defined in the `tests` section. perhaps this is what you were implying? e.g.:
```yaml
prompts:
  - prompt1.txt:
      assert:
        - type: icontains
          value: Yes
  - prompt2.txt:
      assert:
        - value: It is likely

providers: ['anthropic:completion']

tests:
  - vars:
      legal_case_text: 'first case text.....'
    assert:
      - type: contains
        value: it is likely
  - vars:
      legal_case_text: 'second case text.....'
    assert:
      - type: contains
        value: it is likely
  - vars:
      legal_case_text: 'third case text.....'
    assert:
      - type: contains
        value: Other
```
this would effectively let users define both the `type` and `value` alongside each prompt as an override of the default defined in the test cases. we're probably still missing something fundamental in how binary classification test cases should be modeled, but this would probably be good enough for now!
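a sketch of the proposed fallback rule (illustrative only, not actual promptfoo code):

```python
# Field-by-field fallback: a field defined alongside the prompt wins;
# anything missing falls back to the test-level default.
def resolve_assert(prompt_assert, test_assert):
    return {
        "type": prompt_assert.get("type", test_assert.get("type")),
        "value": prompt_assert.get("value", test_assert.get("value")),
    }

# prompt2 defines only a value, so `type` falls back to the test default:
print(resolve_assert({"value": "It is likely"},
                     {"type": "contains", "value": "it is likely"}))
# {'type': 'contains', 'value': 'It is likely'}
```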
Hey guys!
I have a very similar problem. Basically, I want to assert that my LLM is not hallucinating: the answer from the LLM should only contain links that appear in the prompt I sent, which means my assertion needs access to both the prompt sent to the LLM and the output it generated. Is that possible?
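promptfoo's Python assertions receive a context object alongside the output, so a check like this should be expressible; a sketch, assuming `context["prompt"]` holds the rendered prompt (verify the context fields against your promptfoo version's docs):

```python
import re

# check_links.py (hypothetical filename), referenced from the config via a
# `python` assertion type. Assumes context["prompt"] is the rendered prompt.
URL_RE = re.compile(r"https?://\S+")

def get_assert(output, context):
    allowed = set(URL_RE.findall(context["prompt"]))
    produced = set(URL_RE.findall(output))
    # Pass only if the model introduced no links absent from the prompt.
    return produced <= allowed
```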
> I could implement something like this, essentially what you asked for: an `assert` list associated with each prompt that gets merged into each individual test case. [...]

Is this feature online?