🔸 Evaluating AI Agents – A DeepLearning.AI short course (in collaboration with Arize AI).
Teaches you how to build an agent, observe its step-by-step process, and evaluate both individual components (like tool selection and router logic) and end-to-end performance in development and production.
🔸 Mastering AI Agents – eBook by Pratik Bhavsar and the Galileo team.
A guide to agentic frameworks: how to choose them, evaluate performance, identify failure points, and deploy reliable agents at scale. Their blog posts are worth reading too!
🔸 LLM Agent Evaluation – A blog post by Confident AI.
Provides a deep dive into evaluating agents, including tool usage, multi-step reasoning, and workflow-level metrics, using their DeepEval framework.
🔸 A Field Guide to Rapidly Improving AI Products – Blog post by Hamel Husain.
Covers practical techniques (error analysis, data-driven experimentation, observability tools) for iterating on and optimizing AI agents effectively.
Please share any resources you have found useful too!
🔗 Links are added in the comments below.
#AIAgents
Image credits: Galileo team
This post was originally shared on LinkedIn.