Conversation

@jbolor21 jbolor21 commented Dec 19, 2025

Description

Adds a new scenario for evaluating psychosocial harms. The scenario uses a prompt-softening converter and role-playing as single-turn attacks, and a crescendo attack as a multi-turn attack.

The current strategy is tailored to objectives involving mental health crises (self-harm related) and the use of an AI as a therapist. Other objectives may require a new attack strategy YAML file and scoring definition.

  • Added a new Likert scoring file for evaluating crisis situations and for evaluating whether a model is acting beyond its limits as a therapist (e.g., by claiming licensure)
  • Modified the crescendo attack strategy for objectives related to mental health crises and for getting an AI to act as a therapist
  • Added a sample prompt file with some example objectives
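
To illustrate how a Likert-style score might feed a pass/fail decision, here is a minimal sketch. It is not the scoring code from this PR: the 1 to 5 scale, the 0.5 threshold, and the names `LikertResult` and `normalize_likert` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LikertResult:
    """Hypothetical container for one scored response (not part of this PR)."""

    raw: int           # score as returned by the scorer, e.g. 1..5
    normalized: float  # raw score rescaled to the range [0.0, 1.0]
    flagged: bool      # True when the normalized score reaches the threshold


def normalize_likert(
    raw: int,
    *,
    scale_min: int = 1,      # assumed lower bound of the scale
    scale_max: int = 5,      # assumed upper bound of the scale
    threshold: float = 0.5,  # assumed cut-off for flagging a response
) -> LikertResult:
    """Rescale a Likert score to [0, 1] and flag it against a threshold."""
    if not scale_min <= raw <= scale_max:
        raise ValueError(f"score {raw} is outside {scale_min}..{scale_max}")
    normalized = (raw - scale_min) / (scale_max - scale_min)
    return LikertResult(raw=raw, normalized=normalized, flagged=normalized >= threshold)


if __name__ == "__main__":
    for score in (1, 3, 5):
        print(normalize_likert(score))
```

Keeping the threshold as a parameter lets different objectives (crisis handling versus therapist overreach) use different cut-offs without changing the scale definition itself.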

Tests and Documentation

Added new unit tests and ran the notebooks locally to confirm the strategy works.

@jbolor21 jbolor21 marked this pull request as draft December 19, 2025 20:07
@jbolor21 jbolor21 marked this pull request as ready for review January 15, 2026 18:40
@jbolor21 jbolor21 changed the title from "DRAFT: [FEAT]: Psychosocial Scenario" to "[FEAT]: Psychosocial Scenario" Jan 15, 2026

@bashirpartovi bashirpartovi left a comment

Looks great, thanks for implementing all the feedback :)

@hannahwestra25 hannahwestra25 left a comment

looks good!! nice job collaborating & getting this out!

@jbolor21 jbolor21 merged commit a8b1fac into Azure:main Jan 28, 2026
20 checks passed
@jbolor21 jbolor21 deleted the users/bjagdagdorj/psych_scenario branch January 28, 2026 19:29

3 participants