feat: Add support for SWE-Bench-style tasks with OpenHands#2
feat: Add support for SWE-Bench-style tasks with OpenHands#2adityasoni9998 wants to merge 2 commits intoApGa:apga/tinker_experimentalfrom
Conversation
| from copy import deepcopy | ||
| from platoon.envs.base import Task | ||
| from platoon.openhands.types import OpenHandsObservation, OpenHandsAction | ||
| from .types import OpenHandsObservation, OpenHandsAction |
There was a problem hiding this comment.
this change doesn't seem necessary?
There was a problem hiding this comment.
Let's remove this file for now if it is not needed yet.
| if self._conversation is not None: | ||
| self._conversation.close() | ||
| self._conversation = None | ||
| # TODO: check if cleaning up workspace manually is required |
There was a problem hiding this comment.
Let's tackle the TODOs before merging changes, if they are quick enough to do.
| @@ -0,0 +1,41 @@ | |||
| # platoon-openhands-rl | |||
There was a problem hiding this comment.
Instead of openhands-rl I wonder if we should name this plugin something like swe-issue-resolution?
| import sys | ||
| import logging | ||
| from datasets import Dataset | ||
| from areal.api.cli_args import load_expr_config |
There was a problem hiding this comment.
I think we can actually remove all of these for now. I switched AreaL code to use prints instead of logging for now.
| from platoon.episode.context import current_trajectory_collection | ||
| from pydantic import SecretStr | ||
| from platoon.visualization.event_sinks import JsonlFileSink | ||
| from .tasks import EVAL_AGENT_SERVER_IMAGE, SDK_SHORT_SHA, ENV_SETUP_COMMANDS, PROMPT_FILENAME |
There was a problem hiding this comment.
let's avoid relative imports, for consistency
| # official_image_name = docker_image_prefix.rstrip("/") | ||
| # official_image_name += f"/sweb.eval.x86_64.{repo}_1776_{name}:latest".lower() | ||
| # logger.debug(f"Official SWE-Bench image: {official_image_name}") | ||
| # return official_image_name |
There was a problem hiding this comment.
let's think of a way to clean this up to make it more modular and extensible. Perhaps can adopt some sort of factory pattern for different datasets.
There was a problem hiding this comment.
Similar for the get_instruction and any other dataset-specific functions below.
| from platoon.envs.base import Task | ||
| from platoon.openhands.types import OpenHandsObservation, OpenHandsAction | ||
| from .types import OpenHandsObservation, OpenHandsAction | ||
| from platoon.utils.openhands_utils import get_actions_for_last_obs |
There was a problem hiding this comment.
Let's move the openhands utils into the openhands plugin
| server_image=agent_server_image, | ||
| target_type="source" if "source" in build_target else "binary", | ||
| ) | ||
| else: #NOTE: Docker workspace not supported yet since Babel doesn't allow docker access |
There was a problem hiding this comment.
Can we add docker as well? Babel is specific to CMU, so need to restrict this for the general user.
| instruction = template.render(context) | ||
| return instruction | ||
|
|
||
| def prepare_workspace(instance: dict, task: Task) -> BaseWorkspace: |
There was a problem hiding this comment.
It might be better to move all the helper functions like this and others (e.g., prepare_llm) into utils (either openhands plugin utils if it is relevant to all plugins that might leverage openhands or utils for this plugin, if it is issue-solving-specific).
That way if someone wants to add another custom rollout file they can import from those utils.
No description provided.