Some resources I found useful for learning more about evaluating AI agents:

🔸 Evaluating AI Agents – A DeepLearning.AI short course (in collaboration with Arize AI).
Teaches you how to build an agent, observe its step-by-step process, and evaluate both individual components (like tool selection and router logic) and end-to-end performance in development and production.

🔸 Mastering AI Agents – eBook by Pratik Bhavsar and the Galileo team.
A guide on agentic frameworks: how to choose them, evaluate performance, identify failure points, and deploy reliable agents at scale. Don't miss their blog posts too!

🔸 LLM Agent Evaluation – A blog post by Confident AI.
Provides a deep dive into evaluating agents, including tool usage, multi-step reasoning, and workflow-level metrics, using their DeepEval framework.
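Component-level metrics like the tool-usage checks described above can be as simple as comparing the tool an agent actually called against a labeled expectation for each step. Here is a minimal sketch in plain Python; the trace format and field names are made up for illustration and are not the API of DeepEval or any other framework mentioned here:

```python
# Hypothetical sketch of a tool-selection accuracy metric.
# The trace dicts below are an assumed format, not a real framework's schema.

def tool_selection_accuracy(traces):
    """Fraction of agent steps where the called tool matches the expected tool."""
    if not traces:
        return 0.0
    correct = sum(1 for t in traces if t["called_tool"] == t["expected_tool"])
    return correct / len(traces)

# Example: three logged agent steps, one with the wrong tool choice.
traces = [
    {"called_tool": "search",     "expected_tool": "search"},
    {"called_tool": "calculator", "expected_tool": "calculator"},
    {"called_tool": "search",     "expected_tool": "calculator"},
]
print(tool_selection_accuracy(traces))  # 2 of 3 steps match -> ~0.667
```

Real frameworks layer richer checks on top of this idea (argument correctness, call ordering, workflow-level outcomes), but the core comparison is the same.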

🔸 A Field Guide to Rapidly Improving AI Products – Blog post by Hamel Husain.
Covers practical techniques (error analysis, data-driven experimentation, observability tools) for iterating on and optimizing AI agents effectively.

Please share any resources you've found useful, too!

🔗 Links are added in the comments below.

#AIAgents

Image credits: Galileo team


This post was originally shared on LinkedIn.