Writing Tests for LLM Apps

In developing and refining our LLM-based features, we regularly update our prompts and experiment with different models and parameter settings. To test these changes systematically, we follow a few best practices:

Leveraging JSON for Structured Testing

Where possible, we use OpenAI's JSON mode to verify outputs. Structured responses let us assert on specific fields rather than relying on brittle string comparisons against free-form text.
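The sketch below shows the idea, assuming the official openai npm package and a hypothetical sentiment-classification prompt; the model name and JSON schema are illustrative choices, not fixed requirements.

```typescript
import OpenAI from "openai";
import assert from "node:assert";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function classifySentiment(review: string) {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini", // example model; swap in whichever you are testing
    // JSON mode: constrains the model to emit a single valid JSON object
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          'Classify the review. Reply as JSON: {"sentiment": "positive" | "negative", "confidence": number}',
      },
      { role: "user", content: review },
    ],
  });
  return JSON.parse(response.choices[0].message.content ?? "{}");
}

// Assert on structured fields instead of matching substrings of free text.
const result = await classifySentiment("The battery died after two days.");
assert.ok(["positive", "negative"].includes(result.sentiment));
assert.ok(result.confidence >= 0 && result.confidence <= 1);
```

Because the output is parsed JSON, a test failure points at a specific field rather than at an opaque string mismatch.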

Utilizing a Library for Writing Tests for LLMs

We use promptfoo, a library for writing tests against LLM outputs, to maintain a comprehensive test suite. Each test pairs a carefully selected sample prompt with its expected outcome. This streamlines the testing process and keeps our tests robust and relevant as prompts and models change.
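A minimal sketch using promptfoo's Node API is below; it assumes promptfoo.evaluate accepts a test suite shaped like promptfooconfig.yaml, and the prompt text, provider id, and expected labels are illustrative placeholders.

```typescript
import promptfoo from "promptfoo";

const results = await promptfoo.evaluate({
  prompts: [
    'Classify the sentiment of this review and reply as JSON {"sentiment": ...}: {{review}}',
  ],
  providers: ["openai:gpt-4o-mini"],
  tests: [
    {
      vars: { review: "Setup took five minutes and everything just worked." },
      assert: [
        { type: "is-json" },                     // output must parse as JSON
        { type: "contains", value: "positive" }, // expected label appears
      ],
    },
    {
      vars: { review: "The battery died after two days." },
      assert: [
        { type: "is-json" },
        { type: "contains", value: "negative" },
      ],
    },
  ],
});

// Summary statistics for the run (pass/fail counts, token usage, etc.)
console.log(results.stats);
```

The same suite can also live in a promptfooconfig.yaml file and run from the CLI, which makes it easy to wire into CI.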

Continuous Improvement and Experimentation

We continually refine our prompts and experiment with different models and parameter settings to improve performance. Systematic testing is central to this process: it lets us measure the effect of each change objectively instead of relying on ad-hoc inspection of outputs.
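One way to make such experiments systematic is to run the same checks across a grid of candidate models and parameters and compare pass rates. The sketch below assumes the openai npm package; the model names, temperatures, and test cases are placeholders for whatever configurations you are comparing.

```typescript
import OpenAI from "openai";

const client = new OpenAI();
const cases = [
  { review: "Setup took five minutes and everything just worked.", expected: "positive" },
  { review: "The battery died after two days.", expected: "negative" },
];

// Sweep candidate models and temperatures, recording a pass rate per configuration.
for (const model of ["gpt-4o-mini", "gpt-4o"]) {
  for (const temperature of [0, 0.7]) {
    let passes = 0;
    for (const { review, expected } of cases) {
      const response = await client.chat.completions.create({
        model,
        temperature,
        response_format: { type: "json_object" },
        messages: [
          {
            role: "system",
            content: 'Classify the review. Reply as JSON: {"sentiment": "positive" | "negative"}',
          },
          { role: "user", content: review },
        ],
      });
      const parsed = JSON.parse(response.choices[0].message.content ?? "{}");
      if (parsed.sentiment === expected) passes++;
    }
    console.log(`${model} @ temperature=${temperature}: ${passes}/${cases.length} passed`);
  }
}
```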
