@hashwnath

Summary

This PR adds a new agentic-eval skill to the skills collection, focused on patterns for evaluating and improving AI agent outputs.

Skill Contents

  • Reflection Pattern: Self-critique and iterative improvement loops (see the sketch after this list)
  • Evaluator-Optimizer Pattern: Separate generation/evaluation components
  • Code-Specific Reflection: Test-driven refinement workflows
  • Evaluation Strategies: Outcome-based, LLM-as-Judge, Rubric-based
  • Best Practices: Clear criteria, iteration limits, convergence checks

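To make the first two patterns concrete, here is a minimal sketch of a reflection / evaluator-optimizer loop. The `generate` and `evaluate` callables, the `Critique` dataclass, the iteration limit, and the score threshold are illustrative assumptions, not the skill's actual API.

```python
# Minimal sketch of a reflection / evaluator-optimizer loop.
# The generate/evaluate callables are hypothetical stand-ins for LLM calls;
# names and thresholds are illustrative, not the skill's actual API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Critique:
    score: float   # 0.0 - 1.0, higher is better
    feedback: str  # actionable notes fed back to the generator

def refine(
    task: str,
    generate: Callable[[str, str], str],       # (task, feedback) -> draft
    evaluate: Callable[[str, str], Critique],  # (task, draft) -> critique
    max_iters: int = 3,                        # iteration limit (best practice)
    threshold: float = 0.8,                    # convergence check
) -> str:
    draft = generate(task, "")
    for _ in range(max_iters):
        critique = evaluate(task, draft)
        if critique.score >= threshold:        # good enough: stop early
            break
        draft = generate(task, critique.feedback)
    return draft

# Stub components so the sketch runs without an LLM backend.
def fake_generate(task: str, feedback: str) -> str:
    return f"answer to {task!r}" + (" (revised)" if feedback else "")

def fake_evaluate(task: str, draft: str) -> Critique:
    done = "(revised)" in draft
    return Critique(score=1.0 if done else 0.5,
                    feedback="" if done else "add detail")

print(refine("summarize the design doc", fake_generate, fake_evaluate))
```

Keeping the evaluator separate from the generator, as the Evaluator-Optimizer Pattern recommends, makes the critique criteria explicit and the convergence check easy to tune.
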
Use Cases

  • Implementing self-critique and reflection loops
  • Building evaluator-optimizer pipelines for quality-critical generation
  • Creating test-driven code refinement workflows
  • Designing rubric-based or LLM-as-judge evaluation systems (see the sketch after this list)
  • Measuring and improving agent response quality

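For the rubric-based / LLM-as-judge use case, a weighted rubric can be reduced to a single score along these lines. The rubric criteria, weights, 1-5 scale, and `judge` callable are hypothetical placeholders rather than anything defined by the skill.

```python
# Minimal sketch of rubric-based, LLM-as-judge scoring.
# The `judge` callable stands in for an LLM call that returns a 1-5 score
# per criterion; the rubric and weights are illustrative assumptions.
from typing import Callable, Dict

RUBRIC: Dict[str, float] = {  # criterion -> weight (weights sum to 1.0)
    "correctness": 0.5,
    "completeness": 0.3,
    "clarity": 0.2,
}

def score_response(
    prompt: str,
    response: str,
    judge: Callable[[str, str, str], int],  # (prompt, response, criterion) -> 1..5
) -> float:
    # Weighted average of per-criterion judgments, normalized to 0..1.
    total = sum(
        weight * judge(prompt, response, criterion)
        for criterion, weight in RUBRIC.items()
    )
    return total / 5.0

# Stub judge so the sketch runs without a model; a real judge would prompt
# an LLM with the criterion definition and parse a numeric score.
def stub_judge(prompt: str, response: str, criterion: str) -> int:
    return 4 if criterion == "correctness" else 3

print(score_response("What is 2+2?", "4", stub_judge))
# -> (0.5*4 + 0.3*3 + 0.2*3) / 5 = 0.7
```
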
This skill is domain-agnostic and can be applied to any AI agent system requiring output quality improvement.

@aaronpowell (Contributor) left a comment:

Please ensure you run the update script so that the README is updated with the changes.

Ran the update script as requested by the reviewer to regenerate the skills table.