Skip to content

Conversation

@dprevoznik
Copy link
Contributor

@dprevoznik dprevoznik commented Jan 21, 2026

Add Yutori n1 Computer Use CLI Templates

This PR adds new CLI templates for Yutori's n1 computer use model, enabling users to quickly scaffold browser automation projects using Kernel's infrastructure.

New Templates

  • TypeScript: kernel create --template ts-yutori-cua
  • Python: kernel create --template python-yutori-cua

Features

Both templates include:

  • Agentic sampling loop with n1's OpenAI-compatible API
  • Computer tool mapping n1 actions (click, type, scroll, drag, hover, key_press, wait, refresh, go_back, goto_url, stop) to Kernel's Computer Controls API
  • Coordinate scaling from n1's 1000×1000 relative space to actual viewport dimensions
  • Session management with replay recording support

Dual Screenshot Modes

Mode Description
computer_use (default) Uses Kernel's Computer Controls screenshot API (stable)
playwright Uses CDP WebSocket connection for viewport-only screenshots without browser chrome, optimized for n1's training data per Yutori's documentation

Implementation Details

  • Model: n1-preview-2025-11 outputs coordinates in 1000×1000 space
  • Viewport: 1200×800 at 25Hz (closest to Yutori's recommended 1280×800)

With Playwright Mode for viewport-only screenshots

kernel invoke ts-yutori-cua cua-task --payload '{"query": "...", "mode": "playwright"}'

Files Changed

  • pkg/templates/typescript/yutori-computer-use/ - TypeScript template
  • pkg/templates/python/yutori-computer-use/ - Python template
  • pkg/create/templates.go - Template registration

Closes KERNEL-742


Note

Introduces ready-to-deploy Yutori n1 computer use templates with dual screenshot modes and full browser session/replay support.

  • New templates: typescript/yutori-computer-use and python/yutori-computer-use with n1 sampling loops, action-to-Computer Controls mapping, coordinate scaling, and optional Playwright CDP mode
  • Session managers added (TS/Py) for Kernel browser lifecycle and replay recording; per-language tooling for screenshots and key/mouse actions
  • Registers yutori-computer-use in pkg/create/templates.go (names, sorting priority, deploy/invoke configs for both languages)
  • QA updates: adds Yutori rows/commands, dual-mode invoke examples, increases test matrix counts; .gitignore ignores qa-* dirs

Written by Cursor Bugbot for commit 7e4ce52. This will update automatically on new commits. Configure here.

dprevoznik and others added 12 commits January 19, 2026 21:34
Add new CLI templates for Yutori's n1 computer use model, enabling users
to quickly scaffold browser automation projects using Kernel's infrastructure.

Templates (TypeScript & Python):
- Agentic sampling loop with n1's OpenAI-compatible API
- Computer tool mapping n1 actions (click, type, scroll, drag, etc.) to
  Kernel's Computer Controls API
- Coordinate scaling from n1's 1000x1000 relative space to actual viewport
- Session management with replay recording support
- read_texts_and_links action using Playwright execution API (with fallback)

Key implementation details:
- n1 requires screenshots sent with role 'observation' (not 'user')
- Model: n1-preview-2025-11 outputs coordinates in 1000x1000 space
- Viewport: 1200x800 at 25Hz (closest to Yutori's recommended 1280x800)
- Navigation actions (refresh, go_back, goto_url) use keyboard shortcuts
  via Computer Controls since n1 doesn't use Playwright directly

Also updated:
- .gitignore: Added qa-* to exclude QA testing directories
- pkg/create/templates.go: Registered new yutori-computer-use templates
- .cursor/commands/qa.md: Added Yutori templates to QA testing matrix

Closes KERNEL-742
Replace page.accessibility.snapshot() with page._snapshotForAI() which is
specifically designed for AI agents and documented in Kernel's MCP server.

The previous implementation used the experimental/deprecated accessibility
API which failed silently and fell back to screenshot-only mode.

_snapshotForAI() returns a structured representation of the page optimized
for LLM consumption, including visible text, interactive elements (links,
buttons, inputs), and page structure - exactly what n1 needs for reading
texts and saving URLs for citation.
Add PlaywrightComputerTool adapter that connects via CDP WebSocket for
browser-only screenshots, optimized for Yutori n1's training data per
their documentation recommendations.

Changes:
- Add PlaywrightComputerTool class (TS + Python) using CDP connection
- Add 'mode' parameter to sampling loop ('computer_use' | 'playwright')
- Default to 'computer_use' mode (stable); 'playwright' is opt-in
- Add configurable viewport dimensions (1200x800)
- Expose cdp_ws_url from session for Playwright connection
- Add playwright-core (TS) and playwright (Python) dependencies

The playwright mode provides viewport-only screenshots without OS UI or
browser chrome, improving n1 model performance per Yutori's docs:
https://docs.yutori.com/reference/n1#screenshot-requirements
Add templates + modes for Yutori to QA file
Fix drag operations that previously weren't working properly on Playwright mode operations.
Use ariaSnapshot instead of the existing method, as ariaSnapshot is stably available in both Python and TypeScript versions.
Issue: The ComputerTool.screenshot() method was a synchronous function, but:
The N1ComputerToolProtocol expected it to be async
The PlaywrightComputerTool.screenshot() was async
The loop.py code tried to await it

Fix:
Changed def screenshot() to async def screenshot()
Updated all handler methods to await self.screenshot() instead of return self.screenshot()
Update default delays for actions and screenshots
… moving. Clarified instructions for both computer_use and playwright modes to enhance user understanding and execution accuracy.
The cleanup removed ~300 lines of redundant inline comments and verbose method docstrings while keeping the useful class-level documentation you restored. The templates now match the minimal-comment style of the existing anthropic/openai templates in the codebase.
#88)

This PR updates the Go SDK to cee2050be3f8136505d41c20c2903dfca2cbc479
and adds CLI commands for new SDK methods.

## SDK Update
- Updated kernel-go-sdk to cee2050be3f8136505d41c20c2903dfca2cbc479

## Coverage Analysis
This PR was generated by performing a full enumeration of SDK methods
and CLI commands.

## New Commands
- `kernel credential-providers list` - List configured external
credential providers
- `kernel credential-providers get <id>` - Get a credential provider by
ID
- `kernel credential-providers create` - Create a new credential
provider (supports 1Password)
- `kernel credential-providers update <id>` - Update a credential
provider's configuration
- `kernel credential-providers delete <id>` - Delete a credential
provider
- `kernel credential-providers test <id>` - Test a credential provider
connection

## Breaking Changes Fixed
- Fixed `browsers.Get()` calls to pass new required `BrowserGetParams`
parameter

Triggered by:
kernel/kernel-go-sdk@cee2050 Reviewer:
@masnwilliams

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Introduces new CLI surfaces and updates for latest SDK.
> 
> - **Agent Auth CLI**: `kernel agents auth` with
`create/get/list/delete`, `invocations {create/get/exchange/submit}`,
and end‑to‑end `run` flow (auto field submission, TOTP, optional live
view); docs and examples added to `README.md`.
> - **Credential Providers CLI**: `kernel credential-providers
{list/get/create/update/delete/test}` (supports 1Password), wired into
root.
> - **Browsers API updates**: adapt to SDK breaking change
(`browsers.Get` now requires `BrowserGetParams`); add `process resize`
and filesystem watch (`fs watch start/stop/events`) commands; tests
updated accordingly.
> - **Dependencies**: bump `kernel-go-sdk` to cee2050… and add
`pquerna/otp`; regenerate `go.sum`.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
0b27df6. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup> <!--
/CURSOR_SUMMARY -->

---------

Co-authored-by: Mason Williams <43387599+masnwilliams@users.noreply.github.com>
Co-authored-by: Cursor Agent <cursor-agent@kernel.sh>
Co-authored-by: Cursor Agent <cursor-agent@onkernel.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
@dprevoznik dprevoznik marked this pull request as ready for review January 21, 2026 23:01
cursor[bot]

This comment was marked as outdated.

@dprevoznik
Copy link
Contributor Author

Working on fixing comments from bugbot then will request review

… and remove unused dependencies from Python and TypeScript templates.
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants