45 changes: 41 additions & 4 deletions .cursor/commands/qa.md
@@ -58,13 +58,22 @@ Here are all valid language + template combinations:
| typescript | openai-computer-use | ts-openai-cua | ts-openai-cua | Yes | OPENAI_API_KEY |
| typescript | gemini-computer-use | ts-gemini-cua | ts-gemini-cua | Yes | GOOGLE_API_KEY |
| typescript | claude-agent-sdk | ts-claude-agent-sdk | ts-claude-agent-sdk | Yes | ANTHROPIC_API_KEY |
| typescript | yutori-computer-use | ts-yutori-cua | ts-yutori-cua | Yes | YUTORI_API_KEY |
| python | sample-app | py-sample-app | python-basic | No | - |
| python | captcha-solver | py-captcha-solver | python-captcha-solver | No | - |
| python | browser-use | py-browser-use | python-bu | Yes | OPENAI_API_KEY |
| python | anthropic-computer-use | py-anthropic-cua | python-anthropic-cua | Yes | ANTHROPIC_API_KEY |
| python | openai-computer-use | py-openai-cua | python-openai-cua | Yes | OPENAI_API_KEY |
| python | openagi-computer-use | py-openagi-cua | python-openagi-cua | Yes | OAGI_API_KEY |
| python | claude-agent-sdk | py-claude-agent-sdk | py-claude-agent-sdk | Yes | ANTHROPIC_API_KEY |
| python | yutori-computer-use | py-yutori-cua | python-yutori-cua | Yes | YUTORI_API_KEY |

> **Yutori Modes:** The `yutori-computer-use` template supports two modes, and both should be tested:
> - `computer_use` (default): Uses Kernel's Computer Controls API with full VM screenshots
> - `playwright`: Uses Playwright over a CDP WebSocket for viewport-only screenshots (optimized for the n1 model)
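Because each Yutori app is invoked once per mode, the runtime test count is larger than the app count: 17 deployed apps, two of which (`ts-yutori-cua` and `python-yutori-cua`) run in both modes, give 19 test cases. A minimal Python sketch of that expansion; `expand_cases` and the `other-app-N` names are hypothetical placeholders for the rest of the table, not part of the Kernel CLI:

```python
from __future__ import annotations

# Each Yutori app is invoked once per mode; every other app runs once.
YUTORI_MODES = ("computer_use", "playwright")


def expand_cases(apps: dict[str, tuple[str, ...]]) -> list[tuple[str, str | None]]:
    """Return (app_name, mode) pairs; mode is None for apps without modes."""
    cases: list[tuple[str, str | None]] = []
    for app, modes in apps.items():
        if modes:
            cases.extend((app, mode) for mode in modes)
        else:
            cases.append((app, None))
    return cases


# Hypothetical placeholders for the 15 non-Yutori apps in the table.
apps: dict[str, tuple[str, ...]] = {f"other-app-{i}": () for i in range(15)}
apps["ts-yutori-cua"] = YUTORI_MODES
apps["python-yutori-cua"] = YUTORI_MODES

print(len(apps), len(expand_cases(apps)))  # 17 19
```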

### Create Commands

@@ -80,6 +89,7 @@ Run each of these (they are non-interactive when all flags are provided):
../bin/kernel create -n ts-openai-cua -l typescript -t openai-computer-use
../bin/kernel create -n ts-gemini-cua -l typescript -t gemini-computer-use
../bin/kernel create -n ts-claude-agent-sdk -l typescript -t claude-agent-sdk
../bin/kernel create -n ts-yutori-cua -l typescript -t yutori-computer-use

# Python templates
../bin/kernel create -n py-sample-app -l python -t sample-app
@@ -89,6 +99,7 @@ Run each of these (they are non-interactive when all flags are provided):
../bin/kernel create -n py-openai-cua -l python -t openai-computer-use
../bin/kernel create -n py-openagi-cua -l python -t openagi-computer-use
../bin/kernel create -n py-claude-agent-sdk -l python -t claude-agent-sdk
../bin/kernel create -n py-yutori-cua -l python -t yutori-computer-use
```

## Step 5: Deploy Each Template
@@ -176,6 +187,15 @@ echo "ANTHROPIC_API_KEY=<value from human>" > .env
cd ..
```

**ts-yutori-cua** (needs YUTORI_API_KEY):

```bash
cd ts-yutori-cua
echo "YUTORI_API_KEY=<value from human>" > .env
../bin/kernel deploy index.ts --env-file .env
cd ..
```

**py-browser-use** (needs OPENAI_API_KEY):

```bash
@@ -221,6 +241,15 @@ echo "ANTHROPIC_API_KEY=<value from human>" > .env
cd ..
```

**py-yutori-cua** (needs YUTORI_API_KEY):

```bash
cd py-yutori-cua
echo "YUTORI_API_KEY=<value from human>" > .env
../bin/kernel deploy main.py --env-file .env
cd ..
```
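The two Yutori deploy blocks differ only in directory and entry point (`index.ts` for TypeScript, `main.py` for Python). A sketch of that pattern, assuming the `ts-`/`py-` directory prefix is enough to pick the entry point; `deploy_plan` is hypothetical, and `kernel_bin` should point at whichever CLI build you are testing:

```python
from __future__ import annotations

# Hypothetical helper mirroring the manual deploy blocks above.
# Assumption: the QA directory prefix (ts-/py-) selects the entry point.
ENTRY_POINTS = {"ts-": "index.ts", "py-": "main.py"}


def deploy_plan(qa_dir: str, key_name: str, key_value: str,
                kernel_bin: str = "kernel") -> tuple[str, list[str]]:
    """Return the .env line to write inside qa_dir and the deploy argv to run there."""
    for prefix, entry_point in ENTRY_POINTS.items():
        if qa_dir.startswith(prefix):
            env_line = f"{key_name}={key_value}"
            argv = [kernel_bin, "deploy", entry_point, "--env-file", ".env"]
            return env_line, argv
    raise ValueError(f"unrecognized QA directory prefix: {qa_dir!r}")


env_line, argv = deploy_plan("py-yutori-cua", "YUTORI_API_KEY", "<value from human>")
print(env_line)        # YUTORI_API_KEY=<value from human>
print(" ".join(argv))  # kernel deploy main.py --env-file .env
```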

## Step 6: Provide Invoke Commands

Once all deployments are complete, present the human with these invoke commands to test manually:
@@ -235,6 +264,8 @@ kernel invoke ts-magnitude mag-url-extract --payload '{"url": "https://en.wikipe
kernel invoke ts-openai-cua cua-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 5 articles"}'
kernel invoke ts-gemini-cua gemini-cua-task --payload '{"startingUrl": "https://www.magnitasks.com/", "instruction": "Click the Tasks option in the left-side bar, and move the 5 items in the To Do and In Progress items to the Done section of the Kanban board? You are done successfully when the items are moved."}'
kernel invoke ts-claude-agent-sdk agent-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 3 stories"}'
kernel invoke ts-yutori-cua cua-task --payload '{"query": "Go to http://magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "computer_use"}'
kernel invoke ts-yutori-cua cua-task --payload '{"query": "Go to http://magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "playwright"}'

# Python apps
kernel invoke python-basic get-page-title --payload '{"url": "https://www.google.com"}'
@@ -244,11 +275,13 @@ kernel invoke python-anthropic-cua cua-task --payload '{"query": "Go to http://m
kernel invoke python-openai-cua cua-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 5 articles"}'
kernel invoke python-openagi-cua openagi-default-task -p '{"instruction": "Navigate to https://agiopen.org and click the What is Computer Use? button"}'
kernel invoke py-claude-agent-sdk agent-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 3 stories"}'
kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to http://magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "computer_use"}'
kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to http://magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "playwright"}'
```
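The paired Yutori commands above differ only in the payload's `mode` field. If you generate these commands instead of typing them, serializing the payload with `json.dumps` and quoting each argument with `shlex.quote` avoids hand-escaping nested quotes (for example, an apostrophe in a query). A minimal sketch; `invoke_command` is hypothetical, while the payload keys `query`, `record_replay`, and `mode` come from the commands above:

```python
import json
import shlex


def invoke_command(app: str, action: str, payload: dict) -> str:
    """Build a copy-pasteable `kernel invoke` command with a shell-quoted JSON payload."""
    parts = ["kernel", "invoke", app, action, "--payload", json.dumps(payload)]
    return " ".join(shlex.quote(part) for part in parts)


# One command per Yutori mode, same query otherwise.
for mode in ("computer_use", "playwright"):
    print(invoke_command(
        "python-yutori-cua",
        "cua-task",
        {"query": "Navigate to https://example.com and describe the page",
         "record_replay": True,
         "mode": mode},
    ))
```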

## Step 7: Automated Runtime Testing (Optional)

**STOP and ask the human:** "Would you like me to automatically invoke all 15 templates and report back on their runtime status?"
**STOP and ask the human:** "Would you like me to automatically invoke all 19 test cases and report back on their runtime status?"

If the human agrees, invoke each template using the Kernel CLI and collect results. Present findings in this format:

@@ -268,13 +301,17 @@ If the human agrees, invoke each template using the Kernel CLI and collect results
| ts-openai-cua | ts-openai-cua | | |
| ts-gemini-cua | ts-gemini-cua | | |
| ts-claude-agent-sdk | ts-claude-agent-sdk | | |
| ts-yutori-cua | ts-yutori-cua | | mode: computer_use |
| ts-yutori-cua | ts-yutori-cua | | mode: playwright |
| py-sample-app | python-basic | | |
| py-captcha-solver | python-captcha-solver | | |
| py-browser-use | python-bu | | |
| py-anthropic-cua | python-anthropic-cua | | |
| py-openai-cua | python-openai-cua | | |
| py-openagi-cua | python-openagi-cua | | |
| py-claude-agent-sdk | py-claude-agent-sdk | | |
| py-yutori-cua | python-yutori-cua | | mode: computer_use |
| py-yutori-cua | python-yutori-cua | | mode: playwright |
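For the optional automated run, each table row can be filled by invoking the app and inspecting the result. A sketch, assuming a zero exit code from `kernel invoke` means the app started and returned a result; `run_case` and `format_row` are hypothetical, non-zero exits are left as raw exit codes for you to map onto the status values below, and the `run` parameter lets a stub replace `subprocess.run` in tests:

```python
import json
import subprocess
from typing import Callable


def format_row(qa_dir: str, app: str, returncode: int, output: str, note: str = "") -> str:
    """Render one results-table row; only exit code 0 is reported as SUCCESS."""
    status = "SUCCESS" if returncode == 0 else f"exit code {returncode}"
    lines = output.strip().splitlines()
    first_line = lines[0] if lines else ""
    notes = "; ".join(part for part in (note, first_line) if part)
    # Escape pipes so CLI output cannot break the Markdown table.
    notes = notes.replace("|", "\\|")
    return f"| {qa_dir} | {app} | {status} | {notes} |"


def run_case(qa_dir: str, app: str, action: str, payload: dict, note: str = "",
             run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Invoke one app through the Kernel CLI and return its results-table row."""
    # An argv list bypasses the shell, so the JSON payload needs no extra quoting.
    argv = ["kernel", "invoke", app, action, "--payload", json.dumps(payload)]
    proc = run(argv, capture_output=True, text=True)
    output = proc.stdout if proc.returncode == 0 else proc.stderr
    return format_row(qa_dir, app, proc.returncode, output, note)
```

Passing `note="mode: playwright"` (or `"mode: computer_use"`) reproduces the Notes column for the Yutori rows.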

Status values:
- **SUCCESS**: App started and returned a result
@@ -287,9 +324,9 @@ Notes should include brief error messages for failures or confirmation of succes
- [ ] Built CLI with `make build`
- [ ] Created QA directory
- [ ] Got KERNEL_API_KEY from human
- [ ] Created all 15 template variations
- [ ] Got required API keys from human (OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, OAGI_API_KEY)
- [ ] Deployed all 15 apps
- [ ] Created all 17 template variations
- [ ] Got required API keys from human (OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, OAGI_API_KEY, YUTORI_API_KEY)
- [ ] Deployed all 17 apps
- [ ] Provided invoke commands to human for manual testing
- [ ] (Optional) Ran automated runtime testing and reviewed results

3 changes: 3 additions & 0 deletions .gitignore
@@ -38,3 +38,6 @@ report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
# Finder (MacOS) folder config
.DS_Store
kernel

# QA testing directories
qa-*
18 changes: 18 additions & 0 deletions pkg/create/templates.go
@@ -18,6 +18,7 @@ const (
	TemplateStagehand = "stagehand"
	TemplateOpenAGIComputerUse = "openagi-computer-use"
	TemplateClaudeAgentSDK = "claude-agent-sdk"
	TemplateYutoriComputerUse = "yutori-computer-use"
)

type TemplateInfo struct {
@@ -84,6 +85,11 @@ var Templates = map[string]TemplateInfo{
		Description: "Implements a Claude Agent SDK browser automation agent",
		Languages: []string{LanguageTypeScript, LanguagePython},
	},
	TemplateYutoriComputerUse: {
		Name: "Yutori n1 Computer Use",
		Description: "Implements a Yutori n1 computer use agent",
		Languages: []string{LanguageTypeScript, LanguagePython},
	},
}

// GetSupportedTemplatesForLanguage returns a list of all supported template names for a given language
@@ -108,6 +114,8 @@ func GetSupportedTemplatesForLanguage(language string) TemplateKeyValues {
return 1
case TemplateGeminiComputerUse:
return 2
case TemplateYutoriComputerUse:
return 3
default:
return 10
}
@@ -200,6 +208,11 @@ var Commands = map[string]map[string]DeployConfig{
			NeedsEnvFile: true,
			InvokeCommand: `kernel invoke ts-claude-agent-sdk agent-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 3 stories"}'`,
		},
		TemplateYutoriComputerUse: {
			EntryPoint: "index.ts",
			NeedsEnvFile: true,
			InvokeCommand: `kernel invoke ts-yutori-cua cua-task --payload '{"query": "Navigate to https://example.com and describe the page"}'`,
		},
	},
	LanguagePython: {
		TemplateSampleApp: {
@@ -237,6 +250,11 @@ var Commands = map[string]map[string]DeployConfig{
			NeedsEnvFile: true,
			InvokeCommand: `kernel invoke py-claude-agent-sdk agent-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 3 stories"}'`,
		},
		TemplateYutoriComputerUse: {
			EntryPoint: "main.py",
			NeedsEnvFile: true,
			InvokeCommand: `kernel invoke python-yutori-cua cua-task --payload '{"query": "Navigate to https://example.com and describe the page"}'`,
		},
	},
}

65 changes: 65 additions & 0 deletions pkg/templates/python/yutori-computer-use/README.md
@@ -0,0 +1,65 @@
# Kernel Python Sample App - Yutori n1 Computer Use

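This is a Kernel application that implements a prompt loop using Yutori's n1 computer use model. By default (`computer_use` mode) it drives the browser through Kernel's Computer Controls API with full VM screenshots; set `"mode": "playwright"` in the payload to use Playwright over CDP with viewport-only screenshots instead.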

[n1](https://yutori.com/blog/introducing-navigator) is Yutori's pixels-to-actions LLM that predicts browser actions from screenshots.

## Setup

1. Get your API keys:
   - **Kernel**: [dashboard.onkernel.com](https://dashboard.onkernel.com)
   - **Yutori**: [yutori.com](https://yutori.com)

2. Deploy the app:
   ```bash
   kernel login
   cp .env.example .env  # Add your YUTORI_API_KEY
   kernel deploy main.py --env-file .env
   ```

## Usage

```bash
kernel invoke python-yutori-cua cua-task --payload '{"query": "Navigate to https://example.com and describe the page"}'
```

## Recording Replays

> **Note:** Replay recording is only available to Kernel users on paid plans.

Add `"record_replay": true` to your payload to capture a video of the browser session:

```bash
kernel invoke python-yutori-cua cua-task --payload '{"query": "Navigate to https://example.com", "record_replay": true}'
```

When enabled, the response will include a `replay_url` field with a link to view the recorded session.
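If you capture the result in a script, the link can be read back out. A sketch, assuming the action's result is available as a JSON object string with a top-level `replay_url` key; `extract_replay_url` is hypothetical, and it returns `None` when the key is absent (for example, when `record_replay` was not set):

```python
import json
from typing import Optional


def extract_replay_url(raw_result: str) -> Optional[str]:
    """Return replay_url from a JSON result string, or None when it is absent."""
    try:
        result = json.loads(raw_result)
    except json.JSONDecodeError:
        return None
    if not isinstance(result, dict):
        return None
    url = result.get("replay_url")
    return url if isinstance(url, str) else None
```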

## Viewport Configuration

Yutori n1 recommends a **1280×800 (WXGA, 16:10)** viewport for best grounding accuracy. Kernel's closest supported viewport is **1200×800 at 25Hz**, which this template uses by default.

> **Note:** n1 outputs coordinates in a 1000×1000 relative space, which are automatically scaled to the actual viewport dimensions. The slight width difference (1200 vs 1280) should have minimal impact on accuracy.

See [Kernel Viewport Documentation](https://www.kernel.sh/docs/browsers/viewport) for all supported configurations.
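A minimal sketch of that scaling, assuming a simple linear map with rounding and clamping (the template's actual rounding may differ, and `scale_to_viewport` is a hypothetical name). Because n1 coordinates are relative, the same prediction lands at the same fractional position on either viewport width:

```python
from __future__ import annotations

N1_COORD_SPACE = 1000  # n1 coordinates range over a 1000x1000 relative grid


def scale_to_viewport(x: float, y: float, width: int = 1200, height: int = 800) -> tuple[int, int]:
    """Map an n1 (x, y) point to viewport pixels, clamped to the visible area."""
    px = round(x / N1_COORD_SPACE * width)
    py = round(y / N1_COORD_SPACE * height)
    # Clamp so an edge value of 1000 lands on the last pixel, not one past it.
    return min(max(px, 0), width - 1), min(max(py, 0), height - 1)


print(scale_to_viewport(500, 500))             # (600, 400) on Kernel's 1200x800
print(scale_to_viewport(500, 500, 1280, 800))  # (640, 400) on n1's recommended 1280x800
```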

## n1 Supported Actions

| Action | Description |
|--------|-------------|
| `click` | Left mouse click at coordinates |
| `scroll` | Scroll page in a direction |
| `type` | Type text into focused element |
| `key_press` | Send keyboard input |
| `hover` | Move mouse without clicking |
| `drag` | Click-and-drag operation |
| `wait` | Pause for UI to update |
| `refresh` | Reload current page |
| `go_back` | Navigate back in history |
| `goto_url` | Navigate to a URL |
| `stop` | End task with final answer |
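A loop that executes n1's predictions typically routes each action to a handler by name. A sketch of that routing for the actions listed above, assuming each action arrives as a dict whose `"action"` key holds the name; that key, the `"answer"` field, and the handlers are hypothetical, so check the n1 API documentation for the real response schema:

```python
from typing import Any, Callable, Dict

# Action names from the table above.
N1_ACTIONS = frozenset({
    "click", "scroll", "type", "key_press", "hover", "drag",
    "wait", "refresh", "go_back", "goto_url", "stop",
})

Handler = Callable[[Dict[str, Any]], Any]


def dispatch(action: Dict[str, Any], handlers: Dict[str, Handler]) -> Any:
    """Validate an action's name and call the handler registered for it."""
    name = action.get("action")  # hypothetical field name; check the n1 response schema
    if name not in N1_ACTIONS:
        raise ValueError(f"unsupported n1 action: {name!r}")
    handler = handlers.get(name)
    if handler is None:
        raise NotImplementedError(f"no handler registered for {name!r}")
    return handler(action)


# Hypothetical "answer" field: a `stop` action ends the loop with a final answer.
handlers = {"stop": lambda a: a.get("answer")}
print(dispatch({"action": "stop", "answer": "done"}, handlers))  # done
```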

## Resources

- [Yutori n1 API Documentation](https://docs.yutori.com/reference/n1)
- [Kernel Documentation](https://www.kernel.sh/docs/quickstart)
7 changes: 7 additions & 0 deletions pkg/templates/python/yutori-computer-use/_gitignore
@@ -0,0 +1,7 @@
__pycache__/
*.py[cod]
*$py.class
.env
*.log
.venv/
venv/