From 827cfd76fe45a65da26653cf5091505fc2b6a61e Mon Sep 17 00:00:00 2001
From: Tyler Fong <tylerf@nvidia.com>
Date: Thu, 29 Jan 2026 10:56:41 -0800
Subject: [PATCH 1/5] more thorough docs based on conversations with clouds

---
 cloudManualExternal.md | 632 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 632 insertions(+)
 create mode 100644 cloudManualExternal.md

diff --git a/cloudManualExternal.md b/cloudManualExternal.md
new file mode 100644
index 0000000..a326c9e
--- /dev/null
+++ b/cloudManualExternal.md
@@ -0,0 +1,632 @@
+# Brev Cloud Provider Integration Guide
+
+**For Cloud Infrastructure Providers Integrating with Brev**
+
+---
+
+## Table of Contents
+
+1. [Integration Overview](#1-integration-overview)
+2. [How Brev Discovers Your Inventory](#2-how-brev-discovers-your-inventory)
+3. [Instance Types: Your SKU Catalog](#3-instance-types-your-sku-catalog)
+4. [Location and Availability Model](#4-location-and-availability-model)
+5. [GPU Normalization](#5-gpu-normalization)
+6. [Credential and Authentication Model](#6-credential-and-authentication-model)
+7. [Provisioning Lifecycle](#7-provisioning-lifecycle)
+8. [Network Requirements](#8-network-requirements)
+9. [SSH and Control Plane Access](#9-ssh-and-control-plane-access)
+10. [Firewall and Security Groups](#10-firewall-and-security-groups)
+11. [Instance Metadata and Tags](#11-instance-metadata-and-tags)
+12. [Error Handling and Status Reporting](#12-error-handling-and-status-reporting)
+13. [Pricing and Billing](#13-pricing-and-billing)
+14. [Common Questions](#14-common-questions)
+
+---
+
+## 1. Integration Overview
+
+### What Does Integration Mean?
+
+When you integrate with Brev, you're allowing Brev's control plane to:
+1. **Sync** your available GPU instance types into Brev's catalog
+2. **Provision** instances on your infrastructure via API calls
+3. **Manage** instance lifecycle (start, stop, terminate) through your API
+4. **Connect** to running instances via SSH to configure them
+
+### What Brev Needs From You (Cloud Provider)
+
+| Requirement | Purpose |
+|-------------|---------|
+| **Instance Type Listing API** | Discover your available SKUs |
+| **Instance Lifecycle APIs** | Create, get, start, stop, terminate |
+| **API Credentials for Brev** | Authenticate Brev's calls to your API |
+| **SSH Key Injection** | Accept SSH public key at VM creation |
+| **SSH Access on Port 22** | Control plane communication to VMs |
+
+### Integration Architecture
+
+### System Architecture Diagram
+
+```
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│                              Brev Control Plane                                 │
+│  ┌───────────────────────────────────────────────────────────────────────────┐  │
+│  │                        Syncer Layer                                       │  │
+│  │  ┌─────────────────────┐    ┌─────────────────────────────┐               │  │
+│  │  │   InstanceSyncer    │    │   InstanceTypeSyncer        │               │  │
+│  │  │ (Real-time state)   │    │ (Catalog sync every 1-5min) │               │  │
+│  │  └──────────┬──────────┘    └──────────────┬──────────────┘               │  │
+│  └─────────────┼──────────────────────────────┼──────────────────────────────┘  │
+│                │                              │                                 │
+└────────────────┼──────────────────────────────┼─────────────────────────────────┘
+                 │                              │
+                 ▼                              ▼
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│                           CLOUD SDK (v1) - This Repo                            │   
+│  ┌────────────────────────────────────────────────────────────────────────────┐ |
+│  │              Provider Implementations                                      │ │
+│  │  ┌─────────┐ ┌───────────┐ ┌─────────▼───┐ ┌───────────┐ ┌──────────────┐  │ │
+│  │  │   A   │ │ │    B      │ │         C   │ │     D     │ │       E      │  │ │ 
+│  │  │ Provider│ │  Provider │ │   Provider  │ │  Provider │ │   Provider   │  │ │
+│  │  └────┬────┘ └─────┬─────┘ └──────┬──────┘ └─────┬─────┘ └──────┬───────┘  │ │
+│  └───────┼────────────┼──────────────┼──────────────┼───────────────┼─────────┘ │
+└──────────┼────────────┼──────────────┼──────────────┼───────────────┼───────────┘
+           │            │              │              │               │
+           ▼            ▼              ▼              ▼               ▼
+┌──────────────────────────────────────────────────────────────────────────────────┐
+│                          CLOUD PROVIDER APIs                                     │
+│                                                                                  │
+└──────────────────────────────────────────────────────────────────────────────────┘
+
+---
+```
+
+## 2. How Brev Discovers Your Inventory
+
+### The Instance Type Syncer
+
+Brev runs a **continuous synchronization process** that periodically queries your API to understand what compute is available. This is not a one-time import but an ongoing reconciliation.
+
+**Sync Behavior:**
+- Polls your instance type listing API at regular intervals (typically every 1-5 minutes)
+- Compares the current catalog with the previous state
+- Updates availability, pricing, and specs as they change
+- Marks types as unavailable when removed from your API
+- Adds new types when they appear
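+
+The reconciliation steps above can be sketched as a merge of two catalog snapshots. This is a minimal illustration, not Brev's syncer implementation: the snapshot shape (a dict keyed by instance type ID), the `CatalogEntry` fields, and the `reconcile` name are all hypothetical.
+
+```python
+from dataclasses import dataclass, replace
+
+
+@dataclass(frozen=True)
+class CatalogEntry:
+    # Hypothetical subset of the per-type fields a syncer would track.
+    price_cents_hr: int
+    is_available: bool
+
+
+def reconcile(previous: dict, current: dict) -> dict:
+    """Merge one poll of the provider API into the previous catalog state."""
+    # New types are added; existing types pick up changed price/availability.
+    merged = dict(current)
+    for type_id, entry in previous.items():
+        if type_id not in current:
+            # Types that vanished from the API are kept but marked unavailable.
+            merged[type_id] = replace(entry, is_available=False)
+    return merged
+```
+
+Keeping vanished types as unavailable, rather than deleting them, mirrors the "marks types as unavailable" behavior listed above.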
+
+### What We Query
+
+We need an API endpoint that returns your available instance types. For each type, we extract:
+
+| Field | What We Need | Example |
+|-------|--------------|---------|
+| **Type identifier** | Your internal name for this SKU | `gpu_8x_a100_sxm4` |
+| **GPU model** | What GPU is in this instance | `A100 SXM4 80GB` |
+| **GPU count** | How many GPUs | `8` |
+| **CPU cores** | vCPU count | `128` |
+| **Memory** | RAM in GB | `1024` |
+| **Storage** | Disk in GB | `2000` |
+| **Regions/Availability** | Where this type can launch | `us-west-1, us-east-2` |
+| **Pricing** | Cost per hour (USD cents) | `3200` (= $32.00/hr) |
+
+### API Patterns We Support
+
+**Pattern A: Locational API (like AWS, GCP).** Your API returns availability per region, so we either query each region separately or use region-specific results that you provide.
+
+```
+GET /regions/us-west-1/instance-types → returns types available in us-west-1
+GET /regions/us-east-2/instance-types → returns types available in us-east-2
+```
+
+**Pattern B: Global API (like Lambda Labs).** Your API returns all types with their regional availability embedded.
+
+```
+GET /instance-types → returns all types with "available_regions": ["us-west-1", "us-east-2"]
+```
+
+Both patterns work. We adapt our sync logic to your API design.
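+
+Whichever pattern your API follows, ingestion reduces to the same set of (instance type, region) pairs. A minimal sketch, assuming hypothetical JSON shapes modeled on the two example endpoints; the `name` and `available_regions` keys are illustrative, not a required schema:
+
+```python
+def from_locational(per_region: dict) -> set:
+    """Pattern A: one listing per region, e.g. {"us-west-1": [{"name": ...}]}."""
+    return {(t["name"], region)
+            for region, types in per_region.items()
+            for t in types}
+
+
+def from_global(types: list) -> set:
+    """Pattern B: a single listing with each type's regions embedded."""
+    return {(t["name"], region)
+            for t in types
+            for region in t["available_regions"]}
+```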
+
+---
+
+## 3. Instance Types: Your SKU Catalog
+
+### What Is an Instance Type to Brev?
+
+Brev treats compute as **inventory**. Each instance type is a **SKU** (Stock Keeping Unit) in your catalog. Users browse your SKUs filtered by GPU, region, price, and availability.
+
+### The Canonical Instance Type Model
+
+When we ingest your instance types, we normalize them to this structure:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `ID` | string | Brev's composite identifier (see below) |
+| `Cloud` | string | Your cloud identifier (e.g., `"lambdalabs"`, `"crusoe"`) |
+| `Type` | string | Your native type name |
+| `Location` | string | Primary region identifier |
+| `SubLocation` | string | Availability zone (or `"noSub"` if N/A) |
+| `AvailableAzs` | []string | All zones where this type is available |
+| `GPU` | string | Normalized GPU model name |
+| `GPUCount` | int | Number of GPUs |
+| `CPUCores` | int | vCPU count |
+| `MemoryMB` | int | RAM in megabytes |
+| `StorageMB` | int | Disk in megabytes |
+| `PriceHr` | int | Price in cents per hour |
+| `IsAvailable` | bool | Currently launchable |
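+
+The table maps naturally onto a record type. The sketch below mirrors most of it in Python and converts GB-denominated provider fields into the megabyte fields. The provider-side keys (`type`, `vcpus`, `memory_gb`, `zone`, ...) are hypothetical, and treating 1 GB as 1024 MB is an assumption to confirm with Brev during integration.
+
+```python
+from dataclasses import dataclass
+
+MB_PER_GB = 1024  # Assumption: binary units; confirm the convention with Brev.
+
+
+@dataclass
+class InstanceType:
+    # Mirrors most of the canonical table; ID and AvailableAzs are omitted here.
+    cloud: str
+    type: str
+    location: str
+    sub_location: str
+    gpu: str
+    gpu_count: int
+    cpu_cores: int
+    memory_mb: int
+    storage_mb: int
+    price_hr: int  # US cents per hour
+    is_available: bool
+
+
+def normalize(cloud: str, location: str, raw: dict) -> InstanceType:
+    """Convert one hypothetical provider listing entry into the canonical shape."""
+    return InstanceType(
+        cloud=cloud,
+        type=raw["type"],
+        location=location,
+        sub_location=raw.get("zone") or "noSub",  # no AZ concept -> "noSub"
+        gpu=raw["gpu"],  # GPU-name normalization (Section 5) is not shown here
+        gpu_count=int(raw["gpu_count"]),
+        cpu_cores=int(raw["vcpus"]),
+        memory_mb=int(raw["memory_gb"]) * MB_PER_GB,
+        storage_mb=int(raw["storage_gb"]) * MB_PER_GB,
+        price_hr=int(raw["price_cents_hr"]),
+        is_available=bool(raw.get("available", True)),  # assumed default
+    )
+```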
+
+### The Instance Type ID Format
+
+Brev generates a unique ID for each instance type using this pattern:
+
+```
+{location}-{subLocation}-{type}
+```
+
+**Examples:**
+- `us-west-1-us-west-1a-gpu_1x_a100` (locational cloud with AZs)
+- `us-east-noSub-1x_a100_80gb_sxm4` (global cloud, no sublocation concept)
+- `eu-central-1-noSub-h100_8x` (locational region, but you don't expose AZs)
+
+**Why This Matters:**
+This ID is how Brev tracks inventory. When provisioning, this ID connects the request to the correct SKU in your catalog.
+
+### The "noSub" Convention
+
+If your cloud doesn't have sub-locations (availability zones), we use the literal string `"noSub"` as a placeholder. This keeps the ID format consistent across all providers.
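+
+Building the ID, including the `noSub` fallback, can be sketched in a few lines; `instance_type_id` is a hypothetical helper name, not part of Brev's SDK:
+
+```python
+from typing import Optional
+
+NO_SUB = "noSub"
+
+
+def instance_type_id(location: str, sub_location: Optional[str], type_name: str) -> str:
+    """Build an ID of the form {location}-{subLocation}-{type}.
+
+    Clouds without availability zones get the literal "noSub" placeholder.
+    """
+    return f"{location}-{sub_location or NO_SUB}-{type_name}"
+```
+
+Because region names themselves contain hyphens (`us-west-1`), splitting an ID on `-` does not reliably recover its parts; in this sketch the ID is only ever built from known components, never parsed back apart.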
+
+---
+
+## 4. Location and Availability Model
+
+### Location Hierarchy
+
+Brev uses a two-tier location model:
+
+```
+Location (Region)
+└── SubLocation (Availability Zone)
+```
+
+**Examples:**
+
+| Your Term | Brev Location | Brev SubLocation |
+|-----------|---------------|------------------|
+| AWS `us-west-2a` | `us-west-2` | `us-west-2a` |
+| GCP `us-central1-a` | `us-central1` | `us-central1-a` |
+| Lambda Labs `us-tx-1` | `us-tx-1` | `noSub` |
+| Your DC `phoenix-dc1` | `phoenix-dc1` | `noSub` |
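+
+The first two rows follow the standard AWS and GCP zone-naming conventions, so the split can be derived mechanically. A minimal sketch, assuming those conventions (AWS appends a zone letter; GCP appends `-<letter>`) and treating any other identifier as a region without sub-locations; `split_zone` is hypothetical and does not handle special cases such as AWS Local Zones:
+
+```python
+import re
+
+
+def split_zone(provider: str, zone: str) -> tuple:
+    """Return (Location, SubLocation) for a provider's zone or region identifier."""
+    if provider == "aws":
+        # us-west-2a -> us-west-2 (strip the trailing zone letter)
+        m = re.fullmatch(r"(.+\d)([a-z])", zone)
+        if m:
+            return m.group(1), zone
+    elif provider == "gcp":
+        # us-central1-a -> us-central1 (strip the trailing "-<letter>")
+        m = re.fullmatch(r"(.+)-([a-z])", zone)
+        if m:
+            return m.group(1), zone
+    # Clouds without zones: the identifier is the Location; SubLocation is "noSub".
+    return zone, "noSub"
+```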
+
+### How Availability Is Tracked
+
+For each instance type, we track:
+
+1. **AvailableAzs**: List of all sub-locations where this type exists
+2. **IsAvailable**: Boolean indicating if it's currently launchable
+
+**Availability Meaning:**
+- `IsAvailable: true` + `AvailableAzs: ["us-west-1a", "us-west-1b"]` = Can launch in either AZ
+- `IsAvailable: false` = Type exists but is currently out of stock or disabled
+
+### Region Normalization
+
+We typically use your region identifiers as-is. If you use custom region names (`phoenix-main`, `denver-gpu-cluster`), those become the Location value.
+
+---
+
+## 5. GPU Normalization
+
+### Why GPU Normalization Matters
+
+Users search for GPUs by model. They want "H100", not "NVIDIA H100 80GB HBM3 SXM5 Accelerator", so we normalize your GPU descriptions to standard names.
+
+### The GPU Taxonomy
+
+Brev normalizes GPUs to these canonical identifiers:
+
+| Your Description | Brev GPU |
+|------------------|----------|
+| `NVIDIA H100 80GB HBM3` | `H100_SXM5` or `H100_PCIE` |
+| `NVIDIA A100 SXM4 80GB` | `A100_SXM4_80GB` |
+| `NVIDIA A100 PCIe 40GB` | `A100_PCIE_40GB` |
+| `NVIDIA A10` | `A10` |
+| `NVIDIA L40S` | `L40S` |
+| `AMD MI300X` | `MI300X` |
+
+### What We Parse
+
+From your GPU field/description, we extract:
+- **Model family**: H100, A100, L40S, etc.
+- **Form factor**: SXM vs PCIe (affects interconnect and performance)
+- **Memory size**: 40GB vs 80GB variants
+- **Generation**: SXM4 vs SXM5
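+
+A token-based parser illustrates the kind of extraction involved. This is a heuristic sketch, not Brev's parser; the `KNOWN_FAMILIES` set only lists models from the table above, and `parse_gpu` is a hypothetical name.
+
+```python
+import re
+from dataclasses import dataclass
+from typing import Optional
+
+# Families named in the table above; extend as your catalog requires.
+KNOWN_FAMILIES = {"H100", "A100", "A10", "L40S", "MI300X"}
+
+
+@dataclass
+class ParsedGPU:
+    family: Optional[str]
+    form_factor: Optional[str]  # e.g. "SXM4", "SXM5", "PCIE"
+    memory_gb: Optional[int]
+
+
+def parse_gpu(description: str) -> ParsedGPU:
+    """Heuristically extract family, form factor, and memory from a GPU string."""
+    family: Optional[str] = None
+    form_factor: Optional[str] = None
+    memory_gb: Optional[int] = None
+    for token in description.upper().split():
+        if token in KNOWN_FAMILIES and family is None:
+            family = token
+        elif token.startswith("SXM") or token == "PCIE":
+            form_factor = token
+        elif m := re.fullmatch(r"(\d+)GB", token):
+            memory_gb = int(m.group(1))
+    return ParsedGPU(family, form_factor, memory_gb)
+```
+
+The `NVIDIA H100 80GB HBM3` row shows the limit of parsing: nothing in that string names the form factor, which is why the table lists two possible results for it.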
+
+### Providing Clean GPU Data
+
+The cleaner your GPU data, the better the user experience. Ideally provide:
+- `gpu_model`: `"H100"` or `"A100"`
+- `gpu_memory_gb`: `80`
+- `gpu_variant`: `"SXM5"` or `"PCIe"`
+
+If you only provide a description string, we'll parse it, but structured data is preferred.
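+
+For example, a listing entry carrying those three structured fields could be serialized as below. The surrounding `type` key and the `GPUInfo` dataclass are illustrative, not a required schema, and reading `gpu_memory_gb` as per-GPU memory is an assumption.
+
+```python
+import json
+from dataclasses import dataclass, asdict
+
+
+@dataclass
+class GPUInfo:
+    gpu_model: str      # model family, e.g. "H100"
+    gpu_memory_gb: int  # assumed to be per-GPU memory
+    gpu_variant: str    # form factor / generation, e.g. "SXM5" or "PCIe"
+
+
+def listing_entry(type_name: str, gpu: GPUInfo) -> str:
+    """Serialize a hypothetical instance-type entry with structured GPU fields."""
+    return json.dumps({"type": type_name, **asdict(gpu)})
+```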
+
+---
+
+## 6. Credential and Authentication Model
+
+### How Brev Authenticates to Your API
+
+Brev stores credentials for your cloud provider and uses them to make API calls. This is a direct relationship between **Brev's control plane** and **your cloud API**.
+
+### What You Need to Provide
+
+| Requirement | Details |
+|-------------|---------|
+| **API Credentials** | API key, token, or service account for Brev to use |
+| **Authentication Method** | How Brev authenticates (API key header, OAuth, etc.) |
+| **Required Permissions** | List instance types, create/get/start/stop/terminate instances |
+
+### Credential Exchange Process
+
+1. **You provide** API credentials to Brev during integration setup
+2. **Brev stores** credentials securely (encrypted at rest)
+3. **Brev uses** credentials to call your API for sync and provisioning
+
+### Credential Types
+
+Providers define their own credential struct with whatever fields they need. Examples from existing providers:
+
+| Provider | Credential Fields |
+|----------|-------------------|
+| **Lambda Labs** | `APIKey` |
+| **Shadeform** | `APIKey` |
+| **FluidStack** | `APIKey` |
+| **AWS** | `AccessKeyID`, `SecretAccessKey` |
+| **Nebius** | `ServiceAccountKey` (JSON), `TenantID` |
+| **Launchpad** | `APIToken`, `APIURL` |
+
+Your credential struct just needs to implement the `CloudCredential` interface.
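+
+As an illustration only, an API-key credential could look like the sketch below. The method names (`provider_id`, `apply`) and the bearer-token header are assumptions made for this example; they are not the actual method set of the SDK's `CloudCredential` interface.
+
+```python
+from abc import ABC, abstractmethod
+from dataclasses import dataclass
+
+
+class CloudCredential(ABC):
+    """Hypothetical stand-in for the SDK interface of the same name."""
+
+    @abstractmethod
+    def provider_id(self) -> str:
+        """Identifier of the cloud this credential belongs to."""
+
+    @abstractmethod
+    def apply(self, headers: dict) -> None:
+        """Attach authentication to an outgoing API request's headers."""
+
+
+@dataclass(repr=False)
+class APIKeyCredential(CloudCredential):
+    cloud: str
+    api_key: str
+
+    def provider_id(self) -> str:
+        return self.cloud
+
+    def apply(self, headers: dict) -> None:
+        # Assumed scheme: bearer token in the Authorization header.
+        headers["Authorization"] = f"Bearer {self.api_key}"
+
+    def __repr__(self) -> str:
+        # Keep the secret out of logs and tracebacks.
+        return f"APIKeyCredential(cloud={self.cloud!r}, api_key='***')"
+```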
+
+### SSH Keys (Separate from API Credentials)
+
+For each VM launch, Brev provides an SSH public key in the create request. **You need to:**
+1. Accept an SSH public key parameter in your create instance API
+2. Install that key in the default user's `~/.ssh/authorized_keys` on the VM
+3. Ensure SSHD is running on port 22
+
+Brev generates and manages these SSH keys; you only need to accept and install them.
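+
+On your side, installation usually happens through cloud-init or an image agent. Whatever the mechanism, the end state should resemble this minimal sketch (the `install_authorized_key` helper is hypothetical): the key is appended once, `~/.ssh` is mode 0700, and `authorized_keys` is mode 0600, because OpenSSH's default `StrictModes` check can refuse keys whose files are writable by other users.
+
+```python
+import os
+from pathlib import Path
+
+
+def install_authorized_key(home: Path, public_key: str) -> Path:
+    """Append one OpenSSH public key to <home>/.ssh/authorized_keys, idempotently."""
+    key = public_key.strip()
+    if not key or "\n" in key:
+        raise ValueError("expected a single-line OpenSSH public key")
+    ssh_dir = home / ".ssh"
+    ssh_dir.mkdir(mode=0o700, exist_ok=True)
+    os.chmod(ssh_dir, 0o700)  # mkdir's mode is masked by umask, so set it explicitly
+    auth = ssh_dir / "authorized_keys"
+    content = auth.read_text(encoding="utf-8") if auth.exists() else ""
+    if key not in (line.strip() for line in content.splitlines()):
+        # Start on a fresh line if the existing file lacks a trailing newline.
+        prefix = "" if not content or content.endswith("\n") else "\n"
+        with open(auth, "a", encoding="utf-8") as f:
+            f.write(prefix + key + "\n")
+    os.chmod(auth, 0o600)
+    # When this runs as root, the files must also be chowned to the login user.
+    return auth
+```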
+
+---
+
+## 7. Provisioning Lifecycle
+
+### Instance States
+
+Brev tracks instances through these states:
+
+| State | Meaning |
+|-------|---------|
+| `pending` | Create request sent, waiting for VM |
+| `running` | Instance is up and accessible |
+| `stopping` | Stop request sent |
+| `stopped` | Instance stopped but not terminated |
+| `terminating` | Terminate request sent |
+| `terminated` | Instance terminated |
+| `failed` | Provisioning failed |
+
+### The Provisioning Flow
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│ BREV CONTROL PLANE                                    │ YOUR CLOUD         │
+├───────────────────────────────────────────────────────┼────────────────────┤
+│ 1. Receives provision request                         │                    │
+│ 2. Calls your Create Instance API                     │                    │
+│    (with SSH public key)                              │───────────────────▶│
+│                                                       │ 3. Creates VM      │
+│                                                       │    Installs SSH key│
+│                                                       │    Returns ID      │
+│                                                       │◀───────────────────│
+│ 4. Polls your Get Instance API until "running"        │◀──────────────────▶│
+│ 5. Gets public IP from your API                       │                    │
+│ 6. SSHs to VM, configures environment                 │                    │
+│ 7. Instance ready                                     │                    │
+└───────────────────────────────────────────────────────┴────────────────────┘
+```
+
+**Your responsibility:** Create VM, install SSH key, return ID/status, respond to polling.
+**Brev's responsibility:** Orchestration, SSH key generation, VM configuration.
+
+### What Your Create API Should Return
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `instance_id` | Yes | Your unique identifier |
+| `status` | Yes | Current state |
+| `public_ip` | When running | IPv4 address for SSH |
+| `region` | Yes | Where it launched |
+| `instance_type` | Yes | What SKU was provisioned |
+
+### Polling vs Webhooks
+
+Most integrations use **polling**: Brev periodically calls your Get Instance API until the status is `running`. If you support webhooks for state changes, they can reduce API load.
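+
+A minimal sketch of such a polling loop, assuming `get_status` wraps your Get Instance API and returns one of the state strings from the table above. The 5-second interval and 10-minute timeout are illustrative values, not Brev's actual settings, and `sleep` is injectable so the loop can be tested without waiting.
+
+```python
+import time
+from typing import Callable
+
+
+def wait_until_running(get_status: Callable[[], str],
+                       timeout_s: float = 600.0,
+                       interval_s: float = 5.0,
+                       sleep: Callable[[float], None] = time.sleep) -> str:
+    """Poll until the instance reports "running", fails, or the timeout expires."""
+    deadline = time.monotonic() + timeout_s
+    while True:
+        status = get_status()
+        if status == "running":
+            return status
+        if status in ("failed", "terminated"):
+            raise RuntimeError(f"instance entered terminal state {status!r}")
+        if time.monotonic() >= deadline:
+            raise TimeoutError(f"still {status!r} after {timeout_s}s")
+        sleep(interval_s)
+```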
+
+---
+
+## 8. Network Requirements
+
+### Critical Requirement: Public IP with SSH Access
+
+Every instance **must** have a publicly routable IP address with port 22 (SSH) accessible. This is how Brev's control plane communicates with the instance.
+
+### Network Configuration at Launch
+
+When provisioning, we pass:
+- **SSH public key**: Key to install in `authorized_keys`
+- **Firewall rules**: Ports to open (see Section 10)
+
+### IP Assignment
+
+| Scenario | Requirement |
+|----------|-------------|
+| **Ideal** | Public IPv4 assigned automatically at launch |
+| **Acceptable** | Public IP available via API after instance starts |
+| **Not Supported** | NAT-only instances with no public ingress |
+
+### IPv6
+
+IPv6-only instances are not currently supported. We require IPv4 for SSH connectivity.
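+
+Checking which of an instance's addresses meets these rules needs only Python's standard `ipaddress` module. A minimal sketch (the `pick_public_ipv4` helper is hypothetical); `is_global` already excludes RFC 1918 private ranges, loopback, link-local, and carrier-grade NAT shared space:
+
+```python
+import ipaddress
+from typing import Iterable, Optional
+
+
+def pick_public_ipv4(addresses: Iterable[str]) -> Optional[str]:
+    """Return the first publicly routable IPv4 address, or None if there is none."""
+    for raw in addresses:
+        try:
+            ip = ipaddress.ip_address(raw)
+        except ValueError:
+            continue  # skip malformed entries
+        # IPv6 is skipped because SSH connectivity requires IPv4.
+        if ip.version == 4 and ip.is_global:
+            return str(ip)
+    return None  # NAT-only or IPv6-only: not supported
+```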
+
+---
+
+## 9. SSH and Control Plane Access
+
+### Why SSH Is Critical
+
+SSH (port 22) is Brev's **control channel**. After your VM is running, Brev connects via SSH to:
+
+1. **Configure the environment**: Install Brev agent, set up development tools
+2. **Enable connections**: Set up connection paths for users
+3. **Manage instance**: Execute commands, transfer files
+
+### What You Must Support
+
+| Requirement | Details |
+|-------------|---------|
+| **Accept SSH key in create request** | Your API must accept an SSH public key parameter |
+| **Install key in VM** | Key goes in default user's `~/.ssh/authorized_keys` |
+| **SSHD running on port 22** | Standard SSH daemon; the default configuration is fine |
+| **Port 22 reachable** | Public IP with port 22 open |
+
+### SSH User
+
+We typically connect as:
+- `root` (if permitted)
+- `ubuntu` (common on Ubuntu images)
+- Whatever default user your images provide
+
+Let us know your default SSH user during integration setup.
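+
+A quick external check for the last two requirements is to open a TCP connection to port 22 and read the server's identification line, which an SSH server sends first (RFC 4253, section 4.2; it begins with `SSH-`). A minimal sketch with a hypothetical `ssh_banner` helper; it verifies reachability only, not key installation:
+
+```python
+import socket
+
+
+def ssh_banner(host: str, port: int = 22, timeout_s: float = 5.0) -> str:
+    """Connect to host:port and return the SSH identification string."""
+    with socket.create_connection((host, port), timeout=timeout_s) as sock:
+        with sock.makefile("rb") as stream:
+            # Servers may send other lines before the identification string.
+            for _ in range(20):
+                line = stream.readline(256)
+                if not line:
+                    break
+                text = line.decode("ascii", errors="replace").rstrip("\r\n")
+                if text.startswith("SSH-"):
+                    return text
+    raise ConnectionError(f"no SSH identification string from {host}:{port}")
+```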
+
+---
+
+## 10. Firewall and Security Groups
+
+### Brev's Firewall Model
+
+Brev uses a provider-agnostic firewall model that maps to your security group / firewall implementation:
+
+**Ingress Rules** (inbound traffic):
+```
+Port(s)     Protocol    Source
+22          TCP         0.0.0.0/0    # SSH - REQUIRED
+443         TCP         0.0.0.0/0    # HTTPS (optional)
+8080        TCP         0.0.0.0/0    # User app (optional)
+```
+
+**Egress Rules** (outbound traffic):
+```
+Port(s)     Protocol    Destination
+*           *           0.0.0.0/0    # Allow all outbound
+```
+
+### Minimum Required Ports
+
+| Port | Protocol | Direction | Purpose |
+|------|----------|-----------|---------|
+| **22** | TCP | Inbound | SSH (mandatory) |
+
+All other ports are configurable based on workload needs.
+
+### Mapping to Your System
+
+Your firewall / security group implementation should:
+1. Accept our firewall rules in the create request (or apply defaults)
+2. Ensure port 22 is open for Brev's control plane
+3. Allow additional ports to be specified for applications
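+
+This mapping can be sketched as a normalization step that guarantees the mandatory SSH rule while passing application ports through unchanged. The `IngressRule` fields and `with_required_ssh` name are hypothetical, and opening TCP/22 to `0.0.0.0/0` follows the example rules above.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class IngressRule:
+    from_port: int
+    to_port: int
+    protocol: str = "tcp"
+    source: str = "0.0.0.0/0"
+
+    def covers(self, port: int, protocol: str) -> bool:
+        return self.protocol == protocol and self.from_port <= port <= self.to_port
+
+
+SSH_RULE = IngressRule(22, 22)
+
+
+def with_required_ssh(rules: list) -> list:
+    """Return the requested rules, prepending an SSH rule if none opens TCP/22 to all."""
+    if any(r.covers(22, "tcp") and r.source == "0.0.0.0/0" for r in rules):
+        return list(rules)
+    return [SSH_RULE, *rules]
+```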
+
+---
+
+## 11. Instance Metadata and Tags
+
+### Tags We Set
+
+Brev may set tags/labels on instances for identification:
+
+| Tag | Value | Purpose |
+|-----|-------|---------|
+| `brev-instance-id` | Brev's internal ID | Cross-reference |
+| `Name` | User-specified | Display name |
+
+### Tag Requirements
+
+Your API should support:
+- Setting tags at instance creation
+- Updating tags on running instances
+- Querying instances by tag (helpful but not required)
+
+If you don't support tags, we track the mapping on our side.
+
+---
+
+## 12. Error Handling and Status Reporting
+
+### Error Categories
+
+| Category | Examples | How to Report |
+|----------|----------|---------------|
+| **Out of Stock** | No capacity in region | Return a specific error code |
+| **Quota Exceeded** | Hit account limit | Return a quota error |
+| **Invalid Request** | Bad instance type | Return a validation error |
+| **Auth Failed** | Bad API key | Return 401/403 |
+| **Internal Error** | Your system issue | Return 500 with details |
+
+### Preferred Error Format
+
+We prefer errors that include:
+- **Error code**: Machine-readable identifier
+- **Message**: Human-readable description
+- **Region** (if relevant): Where the failure occurred
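+
+A client can then map responses onto the categories above. A minimal sketch, assuming a JSON body with `code`, `message`, and `region` keys; the codes `no_capacity` and `quota_exceeded` are placeholders, not codes Brev requires:
+
+```python
+import json
+from dataclasses import dataclass
+from enum import Enum
+from typing import Optional
+
+
+class Category(Enum):
+    OUT_OF_STOCK = "out_of_stock"
+    QUOTA_EXCEEDED = "quota_exceeded"
+    INVALID_REQUEST = "invalid_request"
+    AUTH_FAILED = "auth_failed"
+    INTERNAL = "internal"
+
+
+@dataclass
+class ProviderError:
+    code: str              # machine-readable identifier
+    message: str           # human-readable description
+    region: Optional[str]  # where the failure occurred, if relevant
+
+
+# Hypothetical error codes; substitute the ones your API actually returns.
+CODE_CATEGORIES = {
+    "no_capacity": Category.OUT_OF_STOCK,
+    "quota_exceeded": Category.QUOTA_EXCEEDED,
+}
+
+
+def classify(status: int, body: str) -> tuple:
+    """Map an HTTP status and error body onto the error categories."""
+    try:
+        data = json.loads(body)
+    except ValueError:
+        data = None
+    if not isinstance(data, dict):
+        data = {"message": body}
+    err = ProviderError(data.get("code", ""), data.get("message", ""), data.get("region"))
+    if status in (401, 403):
+        return Category.AUTH_FAILED, err
+    if err.code in CODE_CATEGORIES:
+        # Checked before 5xx so capacity errors are not mistaken for outages.
+        return CODE_CATEGORIES[err.code], err
+    if status >= 500:
+        return Category.INTERNAL, err
+    return Category.INVALID_REQUEST, err
+```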
+
+### Out of Stock Handling
+
+"Out of stock" is common with GPUs. Ideal handling:
+1. Your API returns a clear "no capacity" error
+2. We mark that type as temporarily unavailable in that region
+3. The syncer will re-check availability on the next poll
+
+---
+
+## 13. Pricing and Billing
+
+### How Pricing Works
+
+Brev displays your prices. We need:
+
+| Field | Format | Example |
+|-------|--------|---------|
+| **Hourly price** | Cents (integer) | `3200` = $32.00/hr |
+| **Currency** | USD assumed | - |
+
+### Billing
+
+Billing arrangements are handled separately during the integration partnership setup.
+
+### Price Sync
+
+Prices sync along with instance types. When you update pricing in your system, we pick it up in the next sync cycle.
+
+---
+
+## 14. Common Questions
+
+### "What credentials do you need from us?"
+
+We need API credentials that allow Brev to:
+- List your available instance types
+- Create, get, start, stop, and terminate instances
+- Optionally: update tags, modify firewall rules
+
+This is typically an API key or service account.
+
+### "Do you need access to our admin console?"
+
+No. We only need API access. All operations go through your public API.
+
+### "What images/OS should our VMs run?"
+
+We work best with:
+- **Ubuntu 22.04 or 24.04** (preferred)
+- **CUDA pre-installed** (for GPU instances)
+- **Python 3.10+** available
+- **SSHD running** on port 22
+
+Custom images can work, but Ubuntu with CUDA is the smoothest path.
+
+### "How do you handle the SSH keys?"
+
+For each VM:
+1. Brev generates an SSH key pair
+2. Brev passes the public key in the create request
+3. You install it in the VM's `authorized_keys`
+4. Brev connects using the private key
+
+You don't manage these keys; just accept them at VM creation.
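+
+For reference, a public key arrives as a single OpenSSH `authorized_keys` line: a key type, a base64 blob, and an optional comment. The blob starts with the key type encoded as an SSH wire `string` (4-byte length plus bytes, RFC 4251), which gives a cheap sanity check before installing. A minimal sketch, assuming an Ed25519 key (the key type Brev uses is not specified here); `encode_ed25519` only formats an existing raw public key, since generating one requires a crypto library, and option prefixes on `authorized_keys` lines are not handled.
+
+```python
+import base64
+import struct
+
+
+def _ssh_string(data: bytes) -> bytes:
+    # SSH wire "string": 4-byte big-endian length followed by the bytes.
+    return struct.pack(">I", len(data)) + data
+
+
+def encode_ed25519(raw_public_key: bytes, comment: str = "") -> str:
+    """Format a 32-byte Ed25519 public key as an authorized_keys line."""
+    if len(raw_public_key) != 32:
+        raise ValueError("Ed25519 public keys are 32 bytes")
+    blob = _ssh_string(b"ssh-ed25519") + _ssh_string(raw_public_key)
+    line = "ssh-ed25519 " + base64.b64encode(blob).decode("ascii")
+    return f"{line} {comment}" if comment else line
+
+
+def looks_like_valid_key(line: str) -> bool:
+    """Check that the declared key type matches the type embedded in the blob."""
+    parts = line.strip().split()
+    if len(parts) < 2:
+        return False
+    try:
+        blob = base64.b64decode(parts[1], validate=True)
+    except ValueError:  # binascii.Error subclasses ValueError
+        return False
+    if len(blob) < 4:
+        return False
+    (n,) = struct.unpack(">I", blob[:4])
+    return blob[4:4 + n] == parts[0].encode("ascii", errors="replace")
+```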
+
+### "What if we don't have public IPs?"
+
+Public IP with SSH access is required for the standard integration. Alternatives:
+- **VPN/Private connectivity**: Custom integration needed
+- **Bastion host**: Brev can SSH through a jump box
+- **Cloudflare tunnel**: Instance calls out, no inbound needed
+
+These require additional integration work.
+
+### "How do you handle multi-GPU interconnect (NVLink, etc.)?"
+
+We track GPU configuration but don't currently differentiate NVLink vs PCIe interconnect in the UI. If you have multiple variants (NVLink cluster vs standalone), surface them as different instance types.
+
+### "What about bare metal vs VMs?"
+
+Both work. From Brev's perspective, if it has an IP and SSH access, it's an instance. Bare metal instances are provisioned the same way.
+
+### "How do we test the integration?"
+
+Typical integration process:
+1. **Staging environment**: Brev tests against your sandbox/dev API
+2. **Test credentials**: You provide test account with limited quota
+3. **Validation**: We verify create, get, stop, start, terminate
+4. **Production**: Enable in Brev's catalog
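+
+The validation step can be pictured as a scripted pass through the lifecycle. A minimal sketch against a hypothetical `Lifecycle` interface (not the Cloud SDK's actual method names); a real run against your sandbox would poll between steps, since state changes are asynchronous.
+
+```python
+from typing import Protocol
+
+
+class Lifecycle(Protocol):
+    # Hypothetical adapter surface; real method names and signatures differ.
+    def create(self, instance_type: str, ssh_public_key: str) -> str: ...
+    def get_status(self, instance_id: str) -> str: ...
+    def stop(self, instance_id: str) -> None: ...
+    def start(self, instance_id: str) -> None: ...
+    def terminate(self, instance_id: str) -> None: ...
+
+
+def validate_lifecycle(cloud: Lifecycle, instance_type: str, key: str) -> list:
+    """Run create, stop, start, terminate, recording the status after each step."""
+    seen = []
+    instance_id = cloud.create(instance_type, key)
+    try:
+        seen.append(cloud.get_status(instance_id))
+        cloud.stop(instance_id)
+        seen.append(cloud.get_status(instance_id))
+        cloud.start(instance_id)
+        seen.append(cloud.get_status(instance_id))
+    finally:
+        # Always clean up so a failed run does not leak billable test instances.
+        cloud.terminate(instance_id)
+    seen.append(cloud.get_status(instance_id))
+    return seen
+```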
+
+### "What SLA/uptime do you expect from our API?"
+
+Your API should be:
+- **Available**: 99%+ uptime for instance operations
+- **Responsive**: typical response times under 5 seconds
+- **Consistent**: Idempotent operations where possible
+
+Sync polling is resilient to brief outages: we retry and recover.
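+
+On the client side, resilience to brief outages typically means retrying with exponential backoff and jitter. The sketch below shows the idea; the attempt count, delays, and retrying only `ConnectionError`/`TimeoutError` are illustrative assumptions, not Brev's settings. Retrying is only safe for idempotent operations, which is one reason idempotency matters here.
+
+```python
+import random
+import time
+from typing import Callable, TypeVar
+
+T = TypeVar("T")
+
+
+def with_retries(call: Callable[[], T],
+                 attempts: int = 5,
+                 base_delay_s: float = 0.5,
+                 max_delay_s: float = 30.0,
+                 sleep: Callable[[float], None] = time.sleep) -> T:
+    """Retry transient failures using exponential backoff with full jitter."""
+    if attempts < 1:
+        raise ValueError("attempts must be at least 1")
+    for attempt in range(attempts):
+        try:
+            return call()
+        except (ConnectionError, TimeoutError):
+            if attempt == attempts - 1:
+                raise  # out of attempts: surface the last error
+            # Ceiling doubles per attempt (0.5s, 1s, 2s, ...) up to max_delay_s.
+            ceiling = min(max_delay_s, base_delay_s * 2 ** attempt)
+            sleep(random.uniform(0, ceiling))
+    raise RuntimeError("unreachable")
+```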
+
+### "What does Brev do on the VMs after launch?"
+
+After Brev creates a VM via your API:
+1. Brev SSHs into the VM using the key we provided at creation
+2. Brev installs a lightweight agent and configures the environment
+3. Brev sets up connection paths
+
+---
+
+## Next Steps
+
+To begin integration:
+
+1. **Share your API documentation**: Instance types, lifecycle, auth
+2. **Provide API credentials**: For Brev to access your API
+3. **Technical call**: We'll align on specifics
+4. **Implementation**: We build the adapter in our Cloud SDK
+5. **Testing**: Validate end-to-end flow
+6. **Launch**: Enable in Brev's catalog
+
+Contact the Brev team to start the integration process.
+
+---
+
+## Glossary
+
+| Term | Definition |
+|------|------------|
+| **Cloud Provider (You)** | Your company, providing GPU compute infrastructure |
+| **Brev Control Plane** | Brev's system that syncs inventory and provisions instances |
+| **Instance Type** | A SKU representing a compute configuration (CPU, GPU, RAM, etc.) |
+| **Location** | Primary region identifier (e.g., `us-west-2`) |
+| **SubLocation** | Availability zone within a region (e.g., `us-west-2a`) |
+| **noSub** | Placeholder when your cloud doesn't have availability zones |
+| **Syncer** | Brev's continuous process that polls your API for inventory |
+| **Cloud SDK** | Brev's internal layer that adapts to different cloud provider APIs |
+| **InstanceTypeID** | Brev's composite identifier: `{location}-{subLocation}-{type}` |
+| **SSH Key Injection** | Your API accepting Brev's SSH public key at VM creation |
+
+---
+
+*Document version: 2.0*
+*For Brev integration partners*

From 6f01c2e60e227ed969dfe0998b5b82c599b3a053 Mon Sep 17 00:00:00 2001
From: Tyler Fong <tylerf@nvidia.com>
Date: Thu, 29 Jan 2026 11:10:37 -0800
Subject: [PATCH 2/5] remove confusing diagram

---
 cloudManualExternal.md | 23 -----------------------
 1 file changed, 23 deletions(-)

diff --git a/cloudManualExternal.md b/cloudManualExternal.md
index a326c9e..7fb143f 100644
--- a/cloudManualExternal.md
+++ b/cloudManualExternal.md
@@ -315,29 +315,6 @@ Brev tracks instances through these states:
 | `terminated` | Instance terminated |
 | `failed` | Provisioning failed |
 
-### The Provisioning Flow
-
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│ BREV CONTROL PLANE                                    │ YOUR CLOUD         │
-├───────────────────────────────────────────────────────┼────────────────────┤
-│ 1. Receives provision request                         │                    │
-│ 2. Calls your Create Instance API                     │                    │
-│    (with SSH public key)                              │───────────────────▶│
-│                                                       │ 3. Creates VM      │
-│                                                       │    Installs SSH key│
-│                                                       │    Returns ID      │
-│                                                       │◀───────────────────│
-│ 4. Polls your Get Instance API until "running"        │◀──────────────────▶│
-│ 5. Gets public IP from your API                       │                    │
-│ 6. SSHs to VM, configures environment                 │                    │
-│ 7. Instance ready                                     │                    │
-└───────────────────────────────────────────────────────┴────────────────────┘
-```
-
-**Your responsibility:** Create VM, install SSH key, return ID/status, respond to polling.
-**Brev's responsibility:** Orchestration, SSH key generation, VM configuration.
-
 ### What Your Create API Should Return
 
 | Field | Required | Description |

From ed5a40cd1455820acb9e006ac96f3e560675dc84 Mon Sep 17 00:00:00 2001
From: Tyler Fong <tylerf@nvidia.com>
Date: Thu, 29 Jan 2026 14:32:45 -0800
Subject: [PATCH 3/5] fixed from comments

---
 cloudManualExternal.md | 1141 ++++++++++++++++++++++++++++------------
 1 file changed, 805 insertions(+), 336 deletions(-)

diff --git a/cloudManualExternal.md b/cloudManualExternal.md
index 7fb143f..2455df0 100644
--- a/cloudManualExternal.md
+++ b/cloudManualExternal.md
@@ -8,18 +8,17 @@
 
 1. [Integration Overview](#1-integration-overview)
 2. [How Brev Discovers Your Inventory](#2-how-brev-discovers-your-inventory)
-3. [Instance Types: Your SKU Catalog](#3-instance-types-your-sku-catalog)
-4. [Location and Availability Model](#4-location-and-availability-model)
+3. [Instance Types: Your Compute Catalog](#3-instance-types-your-compute-catalog)
+4. [Location Model](#4-location-model)
 5. [GPU Normalization](#5-gpu-normalization)
 6. [Credential and Authentication Model](#6-credential-and-authentication-model)
-7. [Provisioning Lifecycle](#7-provisioning-lifecycle)
-8. [Network Requirements](#8-network-requirements)
-9. [SSH and Control Plane Access](#9-ssh-and-control-plane-access)
-10. [Firewall and Security Groups](#10-firewall-and-security-groups)
-11. [Instance Metadata and Tags](#11-instance-metadata-and-tags)
-12. [Error Handling and Status Reporting](#12-error-handling-and-status-reporting)
-13. [Pricing and Billing](#13-pricing-and-billing)
-14. [Common Questions](#14-common-questions)
+7. [Instance Lifecycle Operations](#7-instance-lifecycle-operations)
+8. [SSH Connectivity](#8-ssh-connectivity)
+9. [Firewall and Security Groups](#9-firewall-and-security-groups)
+10. [Instance Metadata and Tags](#10-instance-metadata-and-tags)
+11. [Error Handling and Status Reporting](#11-error-handling-and-status-reporting)
+12. [Pricing and Billing](#12-pricing-and-billing)
+13. [Common Questions](#13-common-questions)
 
 ---
 
@@ -37,58 +36,87 @@ When you integrate with Brev, you're allowing Brev's control plane to:
 
 | Requirement | Purpose |
 |-------------|---------|
-| **Instance Type Listing API** | Discover your available SKUs |
+| **Instance Type Listing API** | Discover your available instance types |
 | **Instance Lifecycle APIs** | Create, get, start, stop, terminate |
 | **API Credentials for Brev** | Authenticate Brev's calls to your API |
 | **SSH Key Injection** | Accept SSH public key at VM creation |
-| **SSH Access on Port 22** | Control plane communication to VMs |
+| **SSH Access** | Control plane communication to VMs |
 
 ### Integration Architecture
 
 ### System Architecture Diagram
 
 ```
-┌─────────────────────────────────────────────────────────────────────────────────┐
-│                              Brev Control Plane                                 │
-│  ┌───────────────────────────────────────────────────────────────────────────┐  │
-│  │                        Syncer Layer                                       │  │
-│  │  ┌─────────────────────┐    ┌─────────────────────────────┐               │  │
-│  │  │   InstanceSyncer    │    │   InstanceTypeSyncer        │               │  │
-│  │  │ (Real-time state)   │    │ (Catalog sync every 1-5min) │               │  │
-│  │  └──────────┬──────────┘    └──────────────┬──────────────┘               │  │
-│  └─────────────┼──────────────────────────────┼──────────────────────────────┘  │
-│                │                              │                                 │
-└────────────────┼──────────────────────────────┼─────────────────────────────────┘
-                 │                              │
-                 ▼                              ▼
-┌─────────────────────────────────────────────────────────────────────────────────┐
-│                           CLOUD SDK (v1) - This Repo                            │   
-│  ┌────────────────────────────────────────────────────────────────────────────┐ |
-│  │              Provider Implementations                                      │ │
-│  │  ┌─────────┐ ┌───────────┐ ┌─────────▼───┐ ┌───────────┐ ┌──────────────┐  │ │
-│  │  │   A   │ │ │    B      │ │         C   │ │     D     │ │       E      │  │ │ 
-│  │  │ Provider│ │  Provider │ │   Provider  │ │  Provider │ │   Provider   │  │ │
-│  │  └────┬────┘ └─────┬─────┘ └──────┬──────┘ └─────┬─────┘ └──────┬───────┘  │ │
-│  └───────┼────────────┼──────────────┼──────────────┼───────────────┼─────────┘ │
-└──────────┼────────────┼──────────────┼──────────────┼───────────────┼───────────┘
-           │            │              │              │               │
-           ▼            ▼              ▼              ▼               ▼
-┌──────────────────────────────────────────────────────────────────────────────────┐
-│                          CLOUD PROVIDER APIs                                     │
-│                                                                                  │
-└──────────────────────────────────────────────────────────────────────────────────┘
-
----
+┌────────────────────────────────────────────────────────────────────────────────────┐
+│                              Brev Control Plane (dev-plane)                        │
+│                                                                                    │
+│  ┌──────────────────────────────────┐    ┌──────────────────────────────────────┐  │
+│  │         Syncer Layer             │    │     Instance Service Layer           │  │
+│  │    (Continuous Reconciliation)   │    │       (User-Triggered Actions)       │  │
+│  │                                  │    │                                      │  │
+│  │  ┌────────────────────────────┐  │    │  ┌────────────────────────────────┐  │  │
+│  │  │  InstanceTypeSyncer        │  │    │  │  Instance Lifecycle            │  │  │
+│  │  │  ─────────────────────     │  │    │  │  ─────────────────────         │  │  │
+│  │  │  Calls:                    │  │    │  │  Calls:                        │  │  │
+│  │  │  • GetInstanceTypes()      │  │    │  │  • CreateInstance()            │  │  │
+│  │  │  • GetLocations()          │  │    │  │  • TerminateInstance()         │  │  │
+│  │  │  • GetInstanceTypePollTime │  │    │  │  • StopInstance()              │  │  │
+│  │  │                            │  │    │  │  • StartInstance()             │  │  │
+│  │  │  Interval: 1-5 min         │  │    │  │                                │  │  │
+│  │  └────────────┬───────────────┘  │    │  └──────────────┬─────────────────┘  │  │
+│  │               │                  │    │                 │                    │  │
+│  │  ┌────────────┴───────────────┐  │    │  ┌──────────────┴─────────────────┐  │  │
+│  │  │  InstanceSyncer            │  │    │  │  Instance State & Queries      │  │  │
+│  │  │  ─────────────────────     │  │    │  │  ─────────────────────         │  │  │
+│  │  │  Calls:                    │  │    │  │  Calls:                        │  │  │
+│  │  │  • ListInstances()         │  │    │  │  • GetInstance()               │  │  │
+│  │  │                            │  │    │  │  • ListInstances()             │  │  │
+│  │  │  Interval: 5 sec           │  │    │  │  • AddFirewallRulesToInstance()│  │  │
+│  │  └────────────┬───────────────┘  │    │  │  • ResizeInstanceVolume()      │  │  │
+│  │               │                  │    │  │  • UpdateInstanceTags()        │  │  │
+│  └───────────────┼──────────────────┘    │  └──────────────┬─────────────────┘  │  │
+│                  │                       └─────────────────┼────────────────────┘  │
+│                  │                                         │                       │
+└──────────────────┼─────────────────────────────────────────┼───────────────────────┘
+                   │                                         │
+                   │       ┌─────────────────────────────────┘
+                   │       │
+                   ▼       ▼
+┌────────────────────────────────────────────────────────────────────────────────────┐
+│                           CLOUD SDK (v1) - This Repo                               │
+│                                                                                    │
+│  ┌──────────────────────────────────────────────────────────────────────────────┐  │
+│  │                         CloudClient Interface                                │  │
+│  │  (Composed of: CloudCredential, CloudBase, CloudQuota, CloudStopStart,       │  │
+│  │   CloudReboot, CloudResizeVolume, CloudModifyFirewall, CloudInstanceTags...) │  │
+│  └──────────────────────────────────────────────────────────────────────────────┘  │
+│                                                                                    │
+│  ┌──────────────────────────────────────────────────────────────────────────────┐  │
+│  │                        Provider Implementations                              │  │
+│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐            │  │
+│  │  │ Lambda   │ │ Fluidstk │ │ Shadefrm │ │  Nebius  │ │  Your    │            │  │
+│  │  │ Labs     │ │          │ │          │ │          │ │ Provider │   • • •    │  │
+│  │  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘            │  │
+│  └───────┼────────────┼────────────┼────────────┼────────────┼──────────────────┘  │
+│          │            │            │            │            │                     │
+└──────────┼────────────┼────────────┼────────────┼────────────┼─────────────────────┘
+           │            │            │            │            │
+           ▼            ▼            ▼            ▼            ▼
+┌────────────────────────────────────────────────────────────────────────────────────┐
+│                            CLOUD PROVIDER APIs                                     │
+│                                                                                    │
+│  Each provider's native REST/gRPC API for instance management                      │
+└────────────────────────────────────────────────────────────────────────────────────┘
 ```
 
 ## 2. How Brev Discovers Your Inventory
 
 ### The Instance Type Syncer
 
-Brev runs a **continuous synchronization process** that periodically queries your API to understand what compute is available. This isn't a one-time import—it's an ongoing reconciliation.
+Brev runs a **continuous synchronization process** that periodically queries your API to understand what compute is available.
 
 **Sync Behavior:**
-- Polls your instance type listing API at regular intervals (typically every 1-5 minutes)
+- Calls your `GetInstanceTypes()` implementation at an interval you set via `GetInstanceTypePollTime()` (default: 1 minute; existing implementations use 1-5 minutes depending on provider needs)
 - Compares current catalog to previous state
 - Updates availability, pricing, and specs as they change
 - Marks types as unavailable when removed from your API
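+
+The reconciliation step can be pictured as a merge keyed by instance type ID. The sketch below is illustrative only and is not Brev's syncer code. `InstanceType` is a hypothetical three-field stand-in for `v1.InstanceType`, and `PriceCents` is invented for the example.
+
+```go
+package main
+
+// InstanceType is a hypothetical stand-in for v1.InstanceType that keeps only
+// the fields needed to illustrate reconciliation.
+type InstanceType struct {
+    ID          string
+    PriceCents  int64 // invented for this sketch
+    IsAvailable bool
+}
+
+// reconcile merges the latest sync result into the previous catalog.
+// Entries present in latest replace their previous versions, which picks up
+// availability, price, and spec changes. Entries that disappeared from the
+// provider are kept but marked unavailable instead of being deleted.
+func reconcile(previous map[string]InstanceType, latest []InstanceType) map[string]InstanceType {
+    next := make(map[string]InstanceType, len(previous)+len(latest))
+    seen := make(map[string]bool, len(latest))
+    for _, it := range latest {
+        next[it.ID] = it
+        seen[it.ID] = true
+    }
+    for id, it := range previous {
+        if !seen[id] {
+            it.IsAvailable = false // removed from the provider's API
+            next[id] = it
+        }
+    }
+    return next
+}
+```
+
+Because the merge is keyed by ID, an ID that changes between syncs looks like one type disappearing and a new one appearing.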
@@ -96,122 +124,321 @@ Brev runs a **continuous synchronization process** that periodically queries you
 
 ### What We Query
 
-We need an API endpoint that returns your available instance types. For each type, we extract:
-
-| Field | What We Need | Example |
-|-------|--------------|---------|
-| **Type identifier** | Your internal name for this SKU | `gpu_1x_a100_sxm4` |
-| **GPU model** | What GPU is in this instance | `A100 SXM4 80GB` |
-| **GPU count** | How many GPUs | `8` |
-| **CPU cores** | vCPU count | `128` |
-| **Memory** | RAM in GB | `1024` |
-| **Storage** | Disk in GB | `2000` |
-| **Regions/Availability** | Where this type can launch | `us-west-1, us-east-2` |
-| **Pricing** | Cost per hour (USD cents) | `3200` (= $32.00/hr) |
-
-### API Patterns We Support
-
-**Pattern A: Locational API (like AWS, GCP)**
-Your API returns different availability per region. We query each region separately or you provide region-specific results.
-
-```
-GET /regions/us-west-1/instance-types → returns types available in us-west-1
-GET /regions/us-east-2/instance-types → returns types available in us-east-2
+We need an API endpoint that returns your available instance types. For each type, we map your data to the `v1.InstanceType` struct (defined in `cloud/v1/instancetype.go`):
+
+**Core Instance Type Fields:**
+
+| Struct Field | Type | Description | Example |
+|--------------|------|-------------|---------|
+| `Type` | `string` | Your internal type name | `"gpu_8x_a100_80gb_sxm4"` |
+| `Location` | `string` | Region identifier | `"us-west-1"` |
+| `VCPU` | `int32` | vCPU count | `128` |
+| `MemoryBytes` | `Bytes` | RAM (use `v1.NewBytes()`) | `v1.NewBytes(1024, v1.Gibibyte)` |
+| `BasePrice` | `*currency.Amount` | Hourly price in USD | `currency.NewAmountFromInt64(3200, "USD")` (= $32.00/hr) |
+| `IsAvailable` | `bool` | Currently launchable | `true` |
+
+**GPU Details (`SupportedGPUs []GPU`):**
+
+| Struct Field | Type | Description | Example |
+|--------------|------|-------------|---------|
+| `Count` | `int32` | Number of GPUs | `8` |
+| `Name` | `string` | GPU model name | `"A100"` |
+| `MemoryBytes` | `Bytes` | VRAM per GPU | `v1.NewBytes(80, v1.Gibibyte)` |
+| `NetworkDetails` | `string` | Interconnect type | `"SXM4"`, `"PCIe"` |
+| `Manufacturer` | `Manufacturer` | GPU vendor | `v1.ManufacturerNVIDIA` |
+
+**Storage Details (`SupportedStorage []Storage`):**
+
+| Struct Field | Type | Description | Example |
+|--------------|------|-------------|---------|
+| `SizeBytes` | `Bytes` | Disk size | `v1.NewBytes(2000, v1.Gibibyte)` |
+| `Type` | `string` | Storage type | `"ssd"`, `"nvme"` |
+| `PricePerGBHr` | `*currency.Amount` | Additional storage cost | `nil` (if included in base price) |
+
+**Example: Converting Provider Data to `v1.InstanceType`**
+
+From Lambda Labs implementation (`cloud/v1/providers/lambdalabs/instancetype.go`):
+
+```go
+it := v1.InstanceType{
+    Location:      location,
+    Type:          instType.Name, // e.g. "gpu_8x_a100_80gb_sxm4"
+    SupportedGPUs: []v1.GPU{{
+        Count:          8,
+        Name:           "A100",
+        MemoryBytes:    v1.NewBytes(80, v1.Gibibyte),
+        NetworkDetails: "SXM4",
+        Manufacturer:   v1.ManufacturerNVIDIA,
+    }},
+    SupportedStorage: []v1.Storage{{
+        Type:      "ssd",
+        SizeBytes: v1.NewBytes(instType.Specs.StorageGib, v1.Gibibyte),
+    }},
+    VCPU:        instType.Specs.Vcpus,
+    MemoryBytes: v1.NewBytes(instType.Specs.MemoryGib, v1.Gibibyte),
+    BasePrice:   &amount,
+    IsAvailable: isAvailable,
+    Provider:    CloudProviderID,
+    Cloud:       CloudProviderID,
+}
+it.ID = v1.MakeGenericInstanceTypeID(it)  // Generate ID using helper (or set your own)
 ```
 
-**Pattern B: Global API (like Lambda Labs)**
-Your API returns all types with their regional availability embedded.
-
+### API Type Declaration
+
+When implementing the Cloud SDK, you declare how Brev's control plane should query your integration via `GetAPIType()`:
+
+| API Type | Meaning | Control Plane Behavior |
+|----------|---------|------------------------|
+| `APITypeGlobal` | Your `GetInstanceTypes()` returns all regions in one call | Brev calls once with `locations = ["all"]` |
+| `APITypeLocational` | Your `GetInstanceTypes()` is region-scoped | Brev iterates over `GetLocations()` results |
+
+**You handle the mapping internally.** The SDK doesn't call your API directly—your implementation does. Whether your cloud's native API is regional, global, or something else entirely, you write the conversion logic in `GetInstanceTypes()`.
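+
+The control plane performs this dispatch, not your code. To illustrate the table above, the hypothetical function below computes which location arguments each `GetInstanceTypes()` call would receive. The `APIType` constants are local stand-ins for the SDK's, and issuing one call per region in locational mode is an assumption.
+
+```go
+package main
+
+// APIType is a local stand-in for the SDK's API type values.
+type APIType int
+
+const (
+    APITypeGlobal APIType = iota
+    APITypeLocational
+)
+
+// locationsToQuery returns the location argument for each GetInstanceTypes()
+// call. A global API gets a single call with "all". A locational API gets one
+// call per region (assumed here to be one region per call).
+func locationsToQuery(apiType APIType, regions []string) [][]string {
+    if apiType == APITypeGlobal {
+        return [][]string{{"all"}}
+    }
+    calls := make([][]string, 0, len(regions))
+    for _, r := range regions {
+        calls = append(calls, []string{r})
+    }
+    return calls
+}
+```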
+
+**Example: Global API (Lambda Labs)**
+Lambda Labs' API returns all instance types with regional availability embedded. The SDK implementation fetches once and expands to per-region `v1.InstanceType` entries:
+
+```go
+// Simplified from cloud/v1/providers/lambdalabs/instancetype.go
+func (c *LambdaLabsClient) GetInstanceTypes(ctx context.Context, args v1.GetInstanceTypeArgs) ([]v1.InstanceType, error) {
+    resp, err := c.client.InstanceTypes(ctx) // Single API call returns all types
+    if err != nil {
+        return nil, err
+    }
+
+    // `locations` is the provider's region list; how it is fetched is elided here
+    var instanceTypes []v1.InstanceType
+    for _, instType := range resp.Data {
+        // Emit one entry per region, flagged by whether that region has capacity
+        for _, region := range locations {
+            isAvailable := slices.Contains(instType.RegionsWithCapacityAvailable, region.Name)
+            instanceTypes = append(instanceTypes, convertToV1(region.Name, instType, isAvailable))
+        }
+    }
+    return instanceTypes, nil
+}
 ```
-GET /instance-types → returns all types with "available_regions": ["us-west-1", "us-east-2"]
+
+**Example: Locational API (Nebius)**
+Nebius requires per-region quota checks. The SDK implementation iterates regions internally:
+
+```go
+// Simplified from cloud/v1/providers/nebius/instancetype.go
+func (c *NebiusClient) GetInstanceTypes(ctx context.Context, args v1.GetInstanceTypeArgs) ([]v1.InstanceType, error) {
+    platforms, err := c.sdk.Compute().Platform().List(ctx, c.projectID)
+    if err != nil {
+        return nil, err
+    }
+
+    // `locations` and `quotaMap` are obtained elsewhere; elided in this excerpt
+    var instanceTypes []v1.InstanceType
+    for _, platform := range platforms {
+        for _, location := range locations {
+            // Check quota per region
+            isAvailable := c.checkQuotaAvailability(platform, location.Name, quotaMap)
+            instanceTypes = append(instanceTypes, convertToV1(location.Name, platform, isAvailable))
+        }
+    }
+    return instanceTypes, nil
+}
 ```
 
-Both patterns work. We adapt our sync logic to your API design.
+**Key point:** You decide how to call your cloud's API. Brev only cares that `GetInstanceTypes()` returns properly formatted `v1.InstanceType` entries with accurate `Location` and `IsAvailable` fields.
 
 ---
 
-## 3. Instance Types: Your SKU Catalog
+## 3. Instance Types: Your Compute Catalog
 
 ### What Is an Instance Type to Brev?
 
-Brev treats compute as **inventory**. Each instance type is a **SKU** (Stock Keeping Unit) in your catalog. Users browse your SKUs filtered by GPU, region, price, and availability.
+Brev treats compute as **inventory**. Each instance type represents a distinct compute configuration in your catalog. Users browse your instance types filtered by GPU, region, price, and availability.
 
 ### The Canonical Instance Type Model
 
-When we ingest your instance types, we normalize them to this structure:
+When we ingest your instance types, we normalize them to the `v1.InstanceType` struct. Here are the key fields (see `cloud/v1/instancetype.go` for the complete definition):
 
 | Field | Type | Description |
 |-------|------|-------------|
-| `ID` | string | Brev's composite identifier (see below) |
-| `Cloud` | string | Your cloud identifier (e.g., `"lambdalabs"`, `"crusoe"`) |
-| `Type` | string | Your native type name |
-| `Location` | string | Primary region identifier |
-| `SubLocation` | string | Availability zone (or `"noSub"` if N/A) |
-| `AvailableAzs` | []string | All zones where this type is available |
-| `GPU` | string | Normalized GPU model name |
-| `GPUCount` | int | Number of GPUs |
-| `CPUCores` | int | vCPU count |
-| `MemoryMB` | int | RAM in megabytes |
-| `StorageMB` | int | Disk in megabytes |
-| `PriceHr` | int | Price in cents per hour |
-| `IsAvailable` | bool | Currently launchable |
-
-### The Instance Type ID Format
-
-Brev generates a unique ID for each instance type using this pattern:
+| `ID` | `InstanceTypeID` | Stable, unique identifier (you define the format—see below) |
+| `Cloud` | `string` | Your cloud identifier (e.g., `"lambdalabs"`, `"crusoe"`) |
+| `Provider` | `string` | Provider identifier (often same as `Cloud`) |
+| `Type` | `string` | Your native type name |
+| `Location` | `string` | Primary region identifier |
+| `SubLocation` | `string` | Availability zone (optional; `MakeGenericInstanceTypeID()` reads `AvailableAzs[0]` instead, falling back to `"noSub"`) |
+| `AvailableAzs` | `[]string` | All zones where this type is available |
+| `SupportedGPUs` | `[]GPU` | GPU details (see `GPU` struct below) |
+| `VCPU` | `int32` | vCPU count |
+| `MemoryBytes` | `Bytes` | RAM (use `v1.NewBytes()` helper) |
+| `SupportedStorage` | `[]Storage` | Storage options (see `Storage` struct) |
+| `BasePrice` | `*currency.Amount` | Hourly price in USD |
+| `IsAvailable` | `bool` | Currently launchable |
+| `Stoppable` | `bool` | Can instances be stopped/resumed |
+| `Rebootable` | `bool` | Can instances be rebooted |
+
+**The `GPU` struct** (`cloud/v1/instancetype.go`):
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `Count` | `int32` | Number of GPUs |
+| `Name` | `string` | GPU model name (e.g., `"A100"`, `"H100"`) |
+| `Type` | `string` | Full GPU type (e.g., `"A100.SXM4"`) |
+| `MemoryBytes` | `Bytes` | VRAM per GPU |
+| `MemoryDetails` | `string` | Memory type: `"HBM"`, `"GDDR"`, etc. |
+| `NetworkDetails` | `string` | Interconnect: `"PCIe"`, `"SXM4"`, `"SXM5"` |
+| `Manufacturer` | `Manufacturer` | `ManufacturerNVIDIA`, `ManufacturerIntel`, etc. |
+
+**The `Storage` struct** (`cloud/v1/storage.go`):
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `Count` | `int32` | Number of disks |
+| `SizeBytes` | `Bytes` | Disk size |
+| `Type` | `string` | Storage type (e.g., `"ssd"`, `"nvme"`) |
+| `PricePerGBHr` | `*currency.Amount` | Additional storage cost (if applicable) |
+| `IsEphemeral` | `bool` | Lost on stop/terminate |
+
+### Instance Type ID
+
+The `ID` field must be a **stable, unique identifier** for each instance type across all regions. You control the format.
+
+**Requirements:**
+- **Stable**: The same instance type must return the same ID on every sync
+- **Unique**: No two instance types can share an ID
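+- **Deterministic**: Derive the ID only from the type's own attributes (such as location, zone, and type name), never from list order, timestamps, or random values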
+
+**Option 1: Use the Helper Function**
+
+The SDK provides `MakeGenericInstanceTypeID()` which generates IDs using this pattern:
 
 ```
 {location}-{subLocation}-{type}
 ```
 
-**Examples:**
-- `us-west-1-us-west-1a-gpu_1x_a100` (locational cloud with AZs)
-- `us-east-noSub-1x_a100_80gb_sxm4` (global cloud, no sublocation concept)
-- `eu-central-1-noSub-h100_8x` (locational region, but you don't expose AZs)
+If `AvailableAzs` is empty, the helper uses `"noSub"` as the sublocation placeholder; otherwise it uses `AvailableAzs[0]`.
 
-**Why This Matters:**
-This ID is how Brev tracks inventory. When provisioning, this ID connects the request to the correct SKU in your catalog.
+```go
+// Set all fields first, then call the helper at the END
+it := v1.InstanceType{
+    Location: "us-west-1",
+    Type:     "gpu_1x_a100",
+    // ... other fields
+}
+it.ID = v1.MakeGenericInstanceTypeID(it)  // Result: "us-west-1-noSub-gpu_1x_a100"
+```
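+
+To reason about which IDs your integration will produce, you can reimplement the pattern in a few lines. This is a hypothetical standalone sketch, not the SDK helper. `makeInstanceTypeID` is an invented name, and taking the sublocation from `AvailableAzs[0]` follows the helper comparison table in this section.
+
+```go
+package main
+
+import "fmt"
+
+// noSub is the placeholder used when an instance type lists no AZs.
+const noSub = "noSub"
+
+// makeInstanceTypeID mirrors the documented {location}-{subLocation}-{type}
+// pattern. The sublocation is the first entry of availableAzs, or "noSub" when
+// the list is empty.
+func makeInstanceTypeID(location string, availableAzs []string, typeName string) string {
+    sub := noSub
+    if len(availableAzs) > 0 {
+        sub = availableAzs[0]
+    }
+    return fmt.Sprintf("%s-%s-%s", location, sub, typeName)
+}
+```
+
+Only the location, the first AZ, and the type name feed into the result. That is what keeps these IDs stable across syncs.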
 
-### The "noSub" Convention
+**Option 2: Define Your Own Format**
 
-If your cloud doesn't have sub-locations (availability zones), we use the literal string `"noSub"` as a placeholder. This keeps the ID format consistent across all providers.
+If you prefer a different ID format, set `ID` directly:
 
----
+```go
+// Shadeform uses: {cloud}_{instanceType}_{region}
+it := v1.InstanceType{
+    ID:       v1.InstanceTypeID("massedcompute_L40_desmoines-usa-1"),
+    Location: "desmoines-usa-1",
+    Type:     "massedcompute_L40",
+    // ... other fields
+}
+```
+
+**Why Stability Matters:**
 
-## 4. Location and Availability Model
+Brev uses this ID to track inventory and match provisioning requests. If your IDs change between syncs, Brev loses the ability to correlate instance types correctly.
 
-### Location Hierarchy
+### CRITICAL: ID Consistency Between InstanceType and Instance
 
-Brev uses a two-tier location model:
+> **Warning**: This is the most common cause of integration failures. Instance types may sync successfully but instances fail to provision or appear "orphaned."
 
+When Brev provisions an instance, it looks up the corresponding instance type using the instance's `InstanceTypeID`. **These IDs must match exactly.**
+
+**A Common Problem:**
+
+The SDK has two helper functions that generate IDs differently:
+
+| Function | Used For | SubLocation Source |
+|----------|----------|-------------------|
+| `MakeGenericInstanceTypeID()` | InstanceType structs | `AvailableAzs[0]` (first AZ) |
+| `MakeGenericInstanceTypeIDFromInstance()` | Instance structs | `SubLocation` field |
+
+If the instance type's `AvailableAzs[0]` and the instance's `SubLocation` don't match, the IDs diverge and lookup fails.
+
+**The Mistakes:**
+
+```go
+// WRONG - Manually setting InstanceTypeID
+inst := &v1.Instance{
+    InstanceType:   "gpu-h100-8x",
+    InstanceTypeID: v1.InstanceTypeID("gpu-h100-8x"),  // BUG: Missing location!
+}
+
+// WRONG - Inconsistent SubLocation vs AvailableAzs
+instanceType := v1.InstanceType{
+    Location:     "us-east-1",
+    SubLocation:  "us-east-1a",      // Set to "us-east-1a"
+    AvailableAzs: []string{"us-east-1b"},  // But AZs has "us-east-1b"!
+}
 ```
-Location (Region)
-└── SubLocation (Availability Zone)
+
+**The Fix:**
+
+1. **For InstanceType**: Set all fields first, then call `MakeGenericInstanceTypeID()` at the END
+2. **For Instance**: Set all fields first, then call `MakeGenericInstanceTypeIDFromInstance()` at the END
+3. **Ensure consistency**: The instance's `SubLocation` must equal `AvailableAzs[0]` on its instance type. If you also set `SubLocation` on the `InstanceType`, keep it equal to `AvailableAzs[0]` as well.
+
+```go
+// CORRECT - InstanceType
+it := v1.InstanceType{
+    Location:     "us-east-1",
+    AvailableAzs: []string{"us-east-1a"},
+    Type:         "gpu-h100-8x",
+    // ... other fields
+}
+it.ID = v1.MakeGenericInstanceTypeID(it)  // LAST
+
+// CORRECT - Instance
+inst := &v1.Instance{
+    Location:     "us-east-1",
+    SubLocation:  "us-east-1a",  // Matches the AZ
+    InstanceType: "gpu-h100-8x",
+    // ... other fields
+}
+inst.InstanceTypeID = v1.MakeGenericInstanceTypeIDFromInstance(*inst)  // LAST
 ```
 
-**Examples:**
+**Symptoms of ID Mismatch:**
+- Instance types sync successfully but don't appear in the Brev catalog
+- `CreateInstance` succeeds but subsequent operations fail
+- "instance type not found" errors during provisioning
+- Instances appear "orphaned" (no associated instance type)
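+
+You can catch these symptoms in your own tests before they reach Brev. Derive both IDs the way the helpers do, then flag any instance whose ID does not resolve to a synced instance type. The sketch below is hypothetical: it reimplements the ID pattern on trimmed-down structs instead of calling the SDK. It also assumes the instance-side helper falls back to `"noSub"` when `SubLocation` is empty.
+
+```go
+package main
+
+import "fmt"
+
+// Trimmed-down stand-ins for the SDK structs, keeping only ID inputs.
+type InstanceType struct {
+    Location     string
+    AvailableAzs []string
+    Type         string
+}
+
+type Instance struct {
+    CloudID      string
+    Location     string
+    SubLocation  string
+    InstanceType string
+}
+
+// subOrNoSub substitutes the "noSub" placeholder for an empty zone.
+func subOrNoSub(s string) string {
+    if s == "" {
+        return "noSub"
+    }
+    return s
+}
+
+// typeID derives an instance type's ID from its first available AZ.
+func typeID(it InstanceType) string {
+    sub := ""
+    if len(it.AvailableAzs) > 0 {
+        sub = it.AvailableAzs[0]
+    }
+    return fmt.Sprintf("%s-%s-%s", it.Location, subOrNoSub(sub), it.Type)
+}
+
+// instanceTypeID derives the ID an instance points at from its SubLocation.
+func instanceTypeID(inst Instance) string {
+    return fmt.Sprintf("%s-%s-%s", inst.Location, subOrNoSub(inst.SubLocation), inst.InstanceType)
+}
+
+// findOrphans returns the CloudIDs of instances whose derived instance type ID
+// matches no synced instance type.
+func findOrphans(types []InstanceType, instances []Instance) []string {
+    known := make(map[string]bool, len(types))
+    for _, it := range types {
+        known[typeID(it)] = true
+    }
+    var orphans []string
+    for _, inst := range instances {
+        if !known[instanceTypeID(inst)] {
+            orphans = append(orphans, inst.CloudID)
+        }
+    }
+    return orphans
+}
+```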
 
-| Your Term | Brev Location | Brev SubLocation |
-|-----------|---------------|------------------|
-| AWS `us-west-2a` | `us-west-2` | `us-west-2a` |
-| GCP `us-central1-a` | `us-central1` | `us-central1-a` |
-| Lambda Labs `us-tx-1` | `us-tx-1` | `noSub` |
-| Your DC `phoenix-dc1` | `phoenix-dc1` | `noSub` |
 
-### How Availability Is Tracked
+## 4. Location Model
 
-For each instance type, we track:
+### The Location Hierarchy
 
-1. **AvailableAzs**: List of all sub-locations where this type exists
-2. **IsAvailable**: Boolean indicating if it's currently launchable
+Brev describes where compute resources exist with three related fields:
 
-**Availability Meaning:**
-- `IsAvailable: true` + `AvailableAzs: ["us-west-1a", "us-west-1b"]` = Can launch in either AZ
-- `IsAvailable: false` = Type exists but is currently out of stock or disabled
+| Level | Field | Description | Example |
+|-------|-------|-------------|---------|
+| **Region** | `Location` | Primary geographic region | `"us-west-1"`, `"europe-west4"` |
+| **Availability Zone** | `SubLocation` | Specific zone within a region | `"us-west-1a"`, `"europe-west4-b"` |
+| **Available Zones** | `AvailableAzs` | All zones where this type can launch | `["us-west-1a", "us-west-1b"]` |
+
+> **Note:** The distinction between these fields can be confusing. `Location` is the region, `SubLocation` is a specific zone (used for instances), and `AvailableAzs` lists all zones where an instance type is available (used for instance types).
+
+### The Location Struct
+
+When implementing `GetLocations()`, you return a list of `Location` structs (defined in `cloud/v1/location.go`):
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `Name` | `string` | Region identifier (acts as the ID) |
+| `Description` | `string` | Human-readable name |
+| `Available` | `bool` | Whether the region is currently operational |
+| `Endpoint` | `string` | API endpoint for this region (if applicable) |
+| `Priority` | `int` | Preference order for region selection |
+| `Country` | `string` | ISO 3166-1 alpha-3 country code |
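+
+As an illustration, a provider with a fixed set of data centers might build its `GetLocations()` result from a static list. The sketch below is hypothetical. `Location` is a local copy of the fields in the table above, and the region names, descriptions, and priorities are invented. Country codes use ISO 3166-1 alpha-3 (`USA`, `FIN`).
+
+```go
+package main
+
+// Location is a local copy of the fields listed above (assumption: the real
+// struct lives in cloud/v1/location.go).
+type Location struct {
+    Name        string
+    Description string
+    Available   bool
+    Endpoint    string
+    Priority    int
+    Country     string // ISO 3166-1 alpha-3
+}
+
+// staticLocations is a hypothetical fixed region list. A real implementation
+// might fetch this from the provider's API instead.
+func staticLocations() []Location {
+    return []Location{
+        {Name: "us-west-1", Description: "US West (California)", Available: true, Priority: 1, Country: "USA"},
+        {Name: "eu-north-1", Description: "EU North (Finland)", Available: true, Priority: 2, Country: "FIN"},
+        {Name: "us-east-2", Description: "US East (Ohio)", Available: false, Priority: 3, Country: "USA"},
+    }
+}
+
+// availableLocationNames returns the names of regions marked operational.
+func availableLocationNames(locs []Location) []string {
+    var names []string
+    for _, l := range locs {
+        if l.Available {
+            names = append(names, l.Name)
+        }
+    }
+    return names
+}
+```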
+
+### Availability on Instance Types
 
-### Region Normalization
+Availability is tracked **per instance type** using two fields on the `InstanceType` struct:
 
-We typically use your region identifiers as-is. If you have unique region names (`phoenix-main`, `denver-gpu-cluster`), those become the Location value.
+| Field | Type | Meaning |
+|-------|------|---------|
+| `IsAvailable` | `bool` | Whether this type can currently be launched |
+| `AvailableAzs` | `[]string` | Which availability zones have capacity |
+
+**Interpreting Availability:**
+- `IsAvailable: true` + `AvailableAzs: ["us-west-1a", "us-west-1b"]` = Can launch in either AZ
+- `IsAvailable: false` = Type exists but is currently out of stock or disabled
+- Empty `AvailableAzs` with `IsAvailable: true` = Region-level availability only (no AZ granularity)
 
 ---
 
@@ -219,37 +446,70 @@ We typically use your region identifiers as-is. If you have unique region names
 
 ### Why GPU Normalization Matters
 
-Users search for GPUs by model. They want "H100" not "NVIDIA H100 80GB HBM3 SXM5 Accelerator". We normalize your GPU descriptions to standard names.
+Users search for GPUs by model. They want "H100" not "NVIDIA H100 80GB HBM3 SXM5 Accelerator". Your provider implementation must normalize GPU data into the SDK's structured `GPU` type.
+
+### The GPU Struct
 
-### The GPU Taxonomy
+The Cloud SDK represents GPUs with these fields:
 
-Brev normalizes GPUs to these canonical identifiers:
+```go
+type GPU struct {
+    Name           string           // Base model: "H100", "A100", "L40S"
+    Count          int32            // Number of GPUs
+    Memory         units.Base2Bytes // VRAM per GPU (deprecated, use MemoryBytes)
+    MemoryBytes    Bytes            // VRAM per GPU in structured format
+    MemoryDetails  string           // Memory type: "HBM2", "HBM3", "HBM2e", "GDDR"
+    NetworkDetails string           // Form factor: "PCIe", "SXM", "SXM4", "SXM5"
+    Manufacturer   Manufacturer     // "NVIDIA", "AMD", "Intel"
+    Type           string           // Optional: original type identifier
+}
+```
+
+### Implementer Responsibility
 
-| Your Description | Brev GPU |
-|------------------|----------|
-| `NVIDIA H100 80GB HBM3` | `H100_SXM5` or `H100_PCIE` |
-| `NVIDIA A100 SXM4 80GB` | `A100_SXM4_80GB` |
-| `NVIDIA A100 PCIe 40GB` | `A100_PCIE_40GB` |
-| `NVIDIA A10` | `A10` |
-| `NVIDIA L40S` | `L40S` |
-| `AMD MI300X` | `MI300X` |
+**You are responsible for normalizing GPU data.** Brev does not automatically parse GPU descriptions. Your `GetInstanceTypes` must populate the `GPU` struct.
 
-### What We Parse
+| Field | Example | Notes |
+|-------|---------|-------|
+| `Name` | `"H100"`, `"A100"` | Base model, uppercase |
+| `Count` | `8` | GPUs per instance |
+| `MemoryBytes` | `v1.NewBytes(80, v1.Gibibyte)` | VRAM per GPU |
+| `NetworkDetails` | `"SXM4"`, `"PCIe"` | Form factor |
+| `Manufacturer` | `v1.ManufacturerNVIDIA` | GPU vendor |
 
-From your GPU field/description, we extract:
-- **Model family**: H100, A100, L40S, etc.
-- **Form factor**: SXM vs PCIe (affects interconnect and performance)
-- **Memory size**: 40GB vs 80GB variants
-- **Generation**: SXM4 vs SXM5
+### Provider Examples
 
-### Providing Clean GPU Data
+**Lambda Labs** (`cloud/v1/providers/lambdalabs/instancetype.go:parseGPUFromDescription`)
 
-The cleaner your GPU data, the better the user experience. Ideally provide:
-- `gpu_model`: `"H100"` or `"A100"`
-- `gpu_memory_gb`: `80`
-- `gpu_variant`: `"SXM5"` or `"PCIe"`
+Parses `"8x A100 (40 GB SXM4)"` using regex:
+
+```go
+gpu.Count = int32(count)           // from (\d+)x
+gpu.Name = nameStr                 // from x (.*?) \(
+gpu.MemoryBytes = v1.NewBytes(v1.BytesValue(memoryGiB), v1.Gibibyte)
+gpu.NetworkDetails = networkDetails // remainder after "GB"
+gpu.Manufacturer = "NVIDIA"
+```
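+
+Below is a standalone version of this parsing approach. It is not the Lambda Labs code. The regular expression is an assumption modeled on the capture comments above, and `gpuInfo` is a simplified stand-in for `v1.GPU`. Memory is kept as a plain GiB count, treating the description's "GB" as GiB, as the snippet above does with `v1.Gibibyte`.
+
+```go
+package main
+
+import (
+    "fmt"
+    "regexp"
+    "strconv"
+)
+
+// gpuInfo is a simplified stand-in for v1.GPU.
+type gpuInfo struct {
+    Count          int32
+    Name           string
+    MemoryGiB      int64
+    NetworkDetails string
+}
+
+// descRe matches descriptions shaped like "8x A100 (40 GB SXM4)": count, model
+// name, memory in GB, then an optional form factor before the closing paren.
+var descRe = regexp.MustCompile(`^(\d+)x (.+?) \((\d+) GB\s*(.*)\)$`)
+
+// parseGPUDescription extracts structured GPU fields from a description string.
+func parseGPUDescription(desc string) (gpuInfo, error) {
+    m := descRe.FindStringSubmatch(desc)
+    if m == nil {
+        return gpuInfo{}, fmt.Errorf("unrecognized GPU description %q", desc)
+    }
+    count, err := strconv.ParseInt(m[1], 10, 32)
+    if err != nil {
+        return gpuInfo{}, err
+    }
+    mem, err := strconv.ParseInt(m[3], 10, 64)
+    if err != nil {
+        return gpuInfo{}, err
+    }
+    return gpuInfo{
+        Count:          int32(count),
+        Name:           m[2],
+        MemoryGiB:      mem,
+        NetworkDetails: m[4],
+    }, nil
+}
+```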
 
-If you only provide a description string, we'll parse it, but structured data is preferred.
+**Launchpad** (`cloud/v1/providers/launchpad/instancetype.go:launchpadGpusToGpus`)
+
+Maps structured API fields:
+
+```go
+gpus[i] = v1.GPU{
+    Name:           strings.ToUpper(gp.Family),
+    Count:          gp.Count,
+    MemoryBytes:    v1.NewBytes(v1.BytesValue(gp.MemoryGb), v1.Gigabyte),
+    NetworkDetails: string(gp.InterconnectionType),
+    Manufacturer:   v1.GetManufacturer(gp.Manufacturer),
+}
+```
+
+### Key Points
+
+- `Name`: base model only (`"H100"` not `"NVIDIA H100 80GB"`)
+- `NetworkDetails`: `"SXM"`, `"SXM4"`, `"SXM5"`, or `"PCIe"`
+- `Manufacturer`: the GPU's actual vendor, such as `v1.ManufacturerNVIDIA` (Launchpad maps it with `v1.GetManufacturer()`)
 
 ---
 
@@ -263,9 +523,33 @@ Brev stores credentials for your cloud provider and uses them to make API calls.
 
 | Requirement | Details |
 |-------------|---------|
-| **API Credentials** | API key, token, or service account for Brev to use |
+| **API Credentials** | A JSON-serializable Go struct containing your authentication fields (API key, token, service account, etc.) |
 | **Authentication Endpoint** | How Brev authenticates (API key header, OAuth, etc.) |
-| **Required Permissions** | List instance types, create/get/start/stop/terminate instances |
+
+### Credential Storage Model
+
+Credentials are stored in Brev's control plane database as **raw JSON** (`json.RawMessage`). This means your credential struct must be JSON-serializable with proper struct tags.
+
+**How it works:**
+1. **You define** a credential struct with JSON tags for each field
+2. **Brev stores** the struct as raw JSON bytes in the database (encrypted at rest)
+3. **Brev deserializes** the JSON back into your struct type when making API calls
+
+**Example credential struct:**
+
+```go
+type MyProviderCredential struct {
+    RefID  string            // Set by Brev (the cloud_cred ID)
+    APIKey string `json:"api_key"`
+    Region string `json:"region,omitempty"`  // Optional fields use omitempty
+}
+```
+
+**Key requirements:**
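+- Give every field that must be persisted an explicit `json:"field_name"` tag (exported fields without a tag, such as `RefID`, are still serialized under their Go field name)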
+- The `RefID` field is set by Brev after storage (it's the database record ID)
+- Use `json:"...,omitempty"` for optional fields
+- The struct must implement the `CloudCredential` interface
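+
+The storage model amounts to a JSON round trip through `json.RawMessage`. This sketch uses only `encoding/json`. `storeCredential` and `loadCredential` are hypothetical names for steps Brev performs internally, and attaching `RefID` after decoding is an assumption based on the description above.
+
+```go
+package main
+
+import "encoding/json"
+
+// MyProviderCredential mirrors the example struct above.
+type MyProviderCredential struct {
+    RefID  string // no json tag: encoding/json emits it under "RefID"
+    APIKey string `json:"api_key"`
+    Region string `json:"region,omitempty"`
+}
+
+// storeCredential serializes the credential to raw JSON bytes.
+func storeCredential(c MyProviderCredential) (json.RawMessage, error) {
+    b, err := json.Marshal(c)
+    return json.RawMessage(b), err
+}
+
+// loadCredential decodes stored JSON back into the struct, then attaches the
+// database record ID.
+func loadCredential(raw json.RawMessage, refID string) (MyProviderCredential, error) {
+    var c MyProviderCredential
+    if err := json.Unmarshal(raw, &c); err != nil {
+        return MyProviderCredential{}, err
+    }
+    c.RefID = refID
+    return c, nil
+}
+```
+
+With an empty `Region`, the stored JSON is `{"RefID":"","api_key":"sk-test"}` for an API key of `sk-test`. `omitempty` drops the region, and the untagged `RefID` is still written.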
 
 ### Credential Exchange Process
 
@@ -275,301 +559,486 @@ Brev stores credentials for your cloud provider and uses them to make API calls.
 
 ### Credential Types
 
-Providers define their own credential struct with whatever fields they need. Examples from existing providers:
-
-| Provider | Credential Fields |
-|----------|-------------------|
-| **Lambda Labs** | `APIKey` |
-| **Shadeform** | `APIKey` |
-| **FluidStack** | `APIKey` |
-| **AWS** | `AccessKeyID`, `SecretAccessKey` |
-| **Nebius** | `ServiceAccountKey` (JSON), `TenantID` |
-| **Launchpad** | `APIToken`, `APIURL` |
+Providers define their own credential struct with whatever fields they need. The struct fields use JSON tags that determine the field names in the stored JSON.
+
+| Provider | Struct Fields | JSON Fields |
+|----------|---------------|-------------|
+| **Lambda Labs** | `APIKey string` | `api_key` |
+| **Shadeform** | `APIKey string` | `api_key` |
+| **FluidStack** | `APIKey string` | `api_key` |
+| **AWS** | `AccessKeyID`, `SecretAccessKey` | `access_key_id`, `secret_access_key` |
+| **Nebius** | `ServiceAccountKey`, `TenantID` | `service_account_key`, `tenant_id` |
+| **Launchpad** | `APIToken`, `APIURL` | `api_token`, `api_url` |
+
+**Complete credential struct example (from Launchpad):**
+
+```go
+type LaunchpadCredential struct {
+    RefID    string            // No json tag; set by Brev after storage
+    APIToken string `json:"api_token"`
+    APIURL   string `json:"api_url"`
+}
+
+var _ v1.CloudCredential = &LaunchpadCredential{}  // Compile-time interface check
+
+func (c *LaunchpadCredential) Validate() error {
+    return validation.ValidateStruct(c,
+        validation.Field(&c.APIToken, validation.Required),
+        validation.Field(&c.APIURL, validation.Required),
+    )
+}
+```
 
-Your credential struct just needs to implement the `CloudCredential` interface.
+Your credential struct must implement the `CloudCredential` interface, which requires these methods:
+
+```go
+type CloudCredential interface {
+    MakeClient(ctx context.Context, location string) (CloudClient, error)
+    GetTenantID() (string, error)
+    GetReferenceID() string
+    GetAPIType() APIType
+    GetCapabilities(ctx context.Context) (Capabilities, error)
+    GetCloudProviderID() CloudProviderID
+}
+```
 
 ### SSH Keys (Separate from API Credentials)
 
-For each VM launch, Brev provides an SSH public key in the create request. **You need to:**
-1. Accept an SSH public key parameter in your create instance API
-2. Install that key in the VM's default user `~/.ssh/authorized_keys`
-3. Ensure SSHD is running on port 22
+SSH keys are passed at instance creation time via the `PublicKey` field in `CreateInstanceAttrs`.
+
+Your implementation must:
+1. Accept this public key in your create instance API
+2. Install it in the VM's default user `~/.ssh/authorized_keys` before the instance becomes accessible
 
-Brev generates and manages these SSH keys—you just need to accept and install them.
+Brev generates a unique SSH key pair for each instance. The control plane retains the private key and uses it to connect after creation.
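+
+If your create API has no SSH key parameter but does accept cloud-init user data, one option is to install the key through cloud-config. This sketch assumes that setup. `buildSSHUserData` is a hypothetical helper, and if Brev also passes `UserDataBase64`, you would need to merge the two documents, which is not shown here.
+
+```go
+package main
+
+import (
+    "encoding/base64"
+    "errors"
+    "strings"
+)
+
+// buildSSHUserData returns base64-encoded cloud-config that asks cloud-init to
+// add publicKey to the default user's authorized_keys through the top-level
+// ssh_authorized_keys key.
+func buildSSHUserData(publicKey string) (string, error) {
+    key := strings.TrimSpace(publicKey)
+    if key == "" {
+        return "", errors.New("empty SSH public key")
+    }
+    // YAML single-quoted scalar: the only escape needed is doubling quotes.
+    quoted := "'" + strings.ReplaceAll(key, "'", "''") + "'"
+    cfg := "#cloud-config\nssh_authorized_keys:\n  - " + quoted + "\n"
+    return base64.StdEncoding.EncodeToString([]byte(cfg)), nil
+}
+```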
 
 ---
 
-## 7. Provisioning Lifecycle
+## 7. Instance Lifecycle Operations
 
-### Instance States
+This section describes each lifecycle operation, its requirements, and expected behavior. Not all operations are required—providers declare their capabilities via `GetCapabilities()`.
 
-Brev tracks instances through these states:
+### Lifecycle States
+
+The SDK defines these states in `LifecycleStatus` (from `cloud/v1/instance.go`):
 
 | State | Meaning |
 |-------|---------|
-| `pending` | Create request sent, waiting for VM |
-| `running` | Instance is up and accessible |
-| `stopping` | Stop request sent |
-| `stopped` | Instance stopped but not terminated |
-| `terminating` | Terminate request sent |
-| `terminated` | Instance terminated |
-| `failed` | Provisioning failed |
+| `pending` | Create initiated, VM provisioning |
+| `running` | Instance is up with a public IP |
+| `stopping` | Stop requested, shutting down |
+| `stopped` | Powered off, storage preserved |
+| `suspending` | Suspend requested |
+| `suspended` | Hibernated state |
+| `terminating` | Terminate requested |
+| `terminated` | Instance destroyed |
+| `failed` | Provisioning or operation failed |
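+
+Most provider APIs report status in their own vocabulary, so `GetInstance` and `ListInstances` have to translate it into these states. The following is a minimal sketch. It assumes a provider whose status strings are `booting`, `active`, `shutting_down`, `off`, `deleting`, `deleted`, and `error`; all of these names are hypothetical. `LifecycleStatus` and its constants are local stand-ins for the SDK type, so the example compiles on its own:
+
+```go
+package main
+
+import (
+    "fmt"
+    "strings"
+)
+
+// LifecycleStatus is a local stand-in for the SDK's LifecycleStatus type.
+type LifecycleStatus string
+
+const (
+    StatusPending     LifecycleStatus = "pending"
+    StatusRunning     LifecycleStatus = "running"
+    StatusStopping    LifecycleStatus = "stopping"
+    StatusStopped     LifecycleStatus = "stopped"
+    StatusTerminating LifecycleStatus = "terminating"
+    StatusTerminated  LifecycleStatus = "terminated"
+    StatusFailed      LifecycleStatus = "failed"
+)
+
+// providerStatusMap maps hypothetical provider status strings to Brev states.
+var providerStatusMap = map[string]LifecycleStatus{
+    "booting":       StatusPending,
+    "active":        StatusRunning,
+    "shutting_down": StatusStopping,
+    "off":           StatusStopped,
+    "deleting":      StatusTerminating,
+    "deleted":       StatusTerminated,
+    "error":         StatusFailed,
+}
+
+// toLifecycleStatus translates a provider status (case-insensitively) and
+// returns an error for unknown values instead of guessing.
+func toLifecycleStatus(providerStatus string) (LifecycleStatus, error) {
+    status, ok := providerStatusMap[strings.ToLower(providerStatus)]
+    if !ok {
+        return "", fmt.Errorf("unknown provider status %q", providerStatus)
+    }
+    return status, nil
+}
+
+func main() {
+    status, err := toLifecycleStatus("ACTIVE")
+    fmt.Println(status, err) // running <nil>
+}
+```
+
+Returning an error for an unrecognized status is a design choice. It surfaces new provider states during testing instead of silently misreporting them.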
+
+### Create Instance (Required)
+
+**Interface:** `CloudCreateTerminateInstance.CreateInstance(ctx, CreateInstanceAttrs) (*Instance, error)`
+
+**Contract:**
+- On success: Return an `*Instance` with a valid `CloudID`. The instance must exist in your system.
+- On error: Return an error **and ensure no instance was created**. Brev will not attempt cleanup on errors.
+
+**Key input fields from `CreateInstanceAttrs`:**
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `RefID` | `string` | Yes | Brev's reference ID; use for idempotency |
+| `InstanceType` | `string` | Yes | Your instance type name |
+| `Location` | `string` | Yes | Region to launch in |
+| `SubLocation` | `string` | No | Specific availability zone |
+| `PublicKey` | `string` | Yes | SSH public key (OpenSSH format) |
+| `Name` | `string` | No | Display name for the instance |
+| `ImageID` | `string` | No | OS image; use your default if empty |
+| `DiskSize` | `units.Base2Bytes` | No | Boot disk size |
+| `FirewallRules` | `FirewallRules` | No | Ports to open (SSH port is always required) |
+| `Tags` | `Tags` | No | Key-value metadata |
+| `UserDataBase64` | `string` | No | Cloud-init or startup script |
+
+**Key output fields on `Instance`:**
+
+| Field | When Required | Description |
+|-------|---------------|-------------|
+| `CloudID` | Always | Your unique instance identifier |
+| `Status.LifecycleStatus` | Always | Current state (`pending` or `running`) |
+| `Location` | Always | Region where launched |
+| `InstanceType` | Always | Instance type that was provisioned |
+| `PublicIP` | When running | Public IPv4 for SSH access |
+| `SSHUser` | Always | Username for SSH (e.g., `ubuntu`, `root`) |
+| `SSHPort` | Always | SSH port (typically `22`) |
+| `RefID` | Always | Echo back the input `RefID` |
+
+**Example flow (simplified from the Lambda Labs implementation):**
+
+```go
+// 1. Register the SSH key with your API, named after the RefID
+keyPairName := attrs.RefID
+_, err := c.addSSHKey(ctx, openapi.AddSSHKeyRequest{
+    Name:      keyPairName,
+    PublicKey: &attrs.PublicKey,
+})
+if err != nil {
+    return nil, err
+}
+
+// 2. Launch the instance with the key
+resp, err := c.launchInstance(ctx, openapi.LaunchInstanceRequest{
+    RegionName:       attrs.Location,
+    InstanceTypeName: attrs.InstanceType,
+    SshKeyNames:      []string{keyPairName},
+})
+if err != nil {
+    return nil, err
+}
+
+// 3. Return instance details
+return c.GetInstance(ctx, v1.CloudProviderInstanceID(resp.Data.InstanceIds[0]))
+```
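+
+`RefID` is the idempotency key. If `CreateInstance` is retried, for example after a client-side timeout, the retry should return the instance already created for that `RefID` and not launch a second VM. The sketch below illustrates this pattern against a hypothetical in-memory `fakeCloud`. `Instance` is a simplified stand-in for the SDK type. A real provider would look up the `RefID` through its own API, using a tag filter, a key name, or something similar.
+
+```go
+package main
+
+import (
+    "fmt"
+    "sync"
+)
+
+// Instance is a simplified stand-in for the SDK's Instance type.
+type Instance struct {
+    CloudID string
+    RefID   string
+}
+
+// fakeCloud is a hypothetical in-memory provider used for illustration.
+type fakeCloud struct {
+    mu      sync.Mutex
+    byRefID map[string]*Instance
+    nextID  int
+}
+
+func newFakeCloud() *fakeCloud {
+    return &fakeCloud{byRefID: map[string]*Instance{}}
+}
+
+// CreateInstance returns the existing instance for refID if one was already
+// created, so a retried request never launches a duplicate VM.
+func (c *fakeCloud) CreateInstance(refID string) (*Instance, error) {
+    if refID == "" {
+        return nil, fmt.Errorf("RefID is required")
+    }
+    c.mu.Lock()
+    defer c.mu.Unlock()
+    if existing, ok := c.byRefID[refID]; ok {
+        return existing, nil
+    }
+    c.nextID++
+    inst := &Instance{CloudID: fmt.Sprintf("vm-%d", c.nextID), RefID: refID}
+    c.byRefID[refID] = inst
+    return inst, nil
+}
+
+func main() {
+    cloud := newFakeCloud()
+    first, _ := cloud.CreateInstance("ref-123")
+    retry, _ := cloud.CreateInstance("ref-123")
+    fmt.Println(first.CloudID, retry.CloudID) // vm-1 vm-1
+}
+```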
 
-### What Your Create API Should Return
+### Terminate Instance (Required)
 
-| Field | Required | Description |
-|-------|----------|-------------|
-| `instance_id` | Yes | Your unique identifier |
-| `status` | Yes | Current state |
-| `public_ip` | When running | IPv4 address for SSH |
-| `region` | Yes | Where it launched |
-| `instance_type` | Yes | What SKU was provisioned |
+**Interface:** `CloudCreateTerminateInstance.TerminateInstance(ctx, instanceID) error`
 
-### Polling vs Webhooks
+**Contract:**
+- Initiate instance termination. Storage may or may not be preserved (provider-dependent).
+- Return `nil` on success, even if the instance is already terminated.
+- The instance should eventually reach `terminated` state.
 
-Most integrations use **polling**—Brev periodically calls your Get Instance API until status is `running`. If you support webhooks for state changes, that can reduce API load.
+**Idempotency:** Should succeed if called multiple times on the same instance.
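+
+You can implement the idempotency rule by treating "not found" from your API as success. The following is a minimal sketch. It assumes your client wraps a missing-instance response in a sentinel error. `errInstanceNotFound` stands in for `v1.ErrInstanceNotFound`. `deleteVM` and its `live` map are hypothetical substitutes for your real delete call.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+)
+
+// errInstanceNotFound stands in for v1.ErrInstanceNotFound in this sketch.
+var errInstanceNotFound = errors.New("instance not found")
+
+// deleteVM is a hypothetical provider call; it fails for unknown IDs.
+func deleteVM(live map[string]bool, id string) error {
+    if !live[id] {
+        return fmt.Errorf("delete %s: %w", id, errInstanceNotFound)
+    }
+    delete(live, id)
+    return nil
+}
+
+// TerminateInstance treats "already gone" as success so repeated calls are safe.
+func TerminateInstance(live map[string]bool, id string) error {
+    err := deleteVM(live, id)
+    if errors.Is(err, errInstanceNotFound) {
+        return nil
+    }
+    return err
+}
+
+func main() {
+    live := map[string]bool{"vm-1": true}
+    fmt.Println(TerminateInstance(live, "vm-1")) // <nil>
+    fmt.Println(TerminateInstance(live, "vm-1")) // <nil>
+}
+```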
 
----
+### Stop Instance (Optional)
 
-## 8. Network Requirements
+**Capability:** `CapabilityStopStartInstance`
 
-### Critical Requirement: Public IP with SSH Access
+**Interface:** `CloudStopStartInstance.StopInstance(ctx, instanceID) error`
 
-Every instance **must** have a publicly routable IP address with port 22 (SSH) accessible. This is how Brev's control plane communicates with the instance.
+**Contract:**
+- Power off the instance while preserving storage.
+- Return `nil` once the stop operation is initiated.
+- Instance should transition: `running` → `stopping` → `stopped`
 
-### Network Configuration at Launch
+**When to implement:** Only if your platform supports instances that can stop and perserve storae. Lambda Labs does not support this, but Nebius does.
 
-When provisioning, we pass:
-- **SSH public key**: Key to install in `authorized_keys`
-- **Firewall rules**: Ports to open (see Section 10)
+### Start Instance (Optional)
+
+**Capability:** `CapabilityStopStartInstance`
+
+**Interface:** `CloudStopStartInstance.StartInstance(ctx, instanceID) error`
+
+**Contract:**
+- Power on a previously stopped instance.
+- Return `nil` once the start operation is initiated.
+- Instance should transition: `stopped` → `pending` → `running`
 
-### IP Assignment
+**Note:** If you implement `StopInstance`, you must also implement `StartInstance`.
 
-| Scenario | Requirement |
-|----------|-------------|
-| **Ideal** | Public IPv4 assigned automatically at launch |
-| **Acceptable** | Public IP available via API after instance starts |
-| **Not Supported** | NAT-only instances with no public ingress |
 
-### IPv6
+### Get Instance (Required)
 
-IPv6-only instances are not currently supported. We require IPv4 for SSH connectivity.
+**Interface:** `CloudInstanceReader.GetInstance(ctx, instanceID) (*Instance, error)`
+
+**Contract:**
+- Return current state of the instance.
+- Return `ErrResourceNotFound` if the instance doesn't exist.
+
+### List Instances (Required)
+
+**Interface:** `CloudInstanceReader.ListInstances(ctx, ListInstancesArgs) ([]Instance, error)`
+
+**Contract:**
+- Return all instances matching the filter criteria.
+- Used by the Instance Syncer to reconcile state (called every ~5 seconds).
+
+### Capability Declaration
+
+Your credential's `GetCapabilities()` must return the capabilities you support:
+
+```go
+func (c *MyCredential) GetCapabilities(ctx context.Context) (v1.Capabilities, error) {
+    return v1.Capabilities{
+        v1.CapabilityCreateInstance,           // Required
+        v1.CapabilityTerminateInstance,        // Required
+        v1.CapabilityCreateTerminateInstance,  // Required (composite)
+        // Optional:
+        v1.CapabilityStopStartInstance,        // If you support stop/start
+        v1.CapabilityRebootInstance,           // If you support reboot
+        v1.CapabilityTags,                     // If you support instance tags
+        v1.CapabilityModifyFirewall,           // If you support dynamic firewall rules
+        v1.CapabilityResizeInstanceVolume,     // If you support volume resizing
+    }, nil
+}
+```
+
+Brev checks capabilities before calling optional methods. If you don't declare a capability, Brev won't attempt that operation.
 
 ---
 
-## 9. SSH and Control Plane Access
+## 8. SSH Connectivity
+
+### Core Requirement
 
-### Why SSH Is Critical
+Brev's control plane must be able to connect to your instances via SSH using the provided keys. This is the **only hard requirement** for network connectivity.
 
-SSH (port 22) is Brev's **control channel**. After your VM is running, Brev connects via SSH to:
+After your VM is running, Brev connects via SSH to:
 
 1. **Configure the environment**: Install Brev agent, set up development tools
-2. **Enable connections**: Set up connection paths for users
-3. **Manage instance**: Execute commands, transfer files
+2. **Enable connections**: Set up tunnels and connection paths for users
+3. **Manage instance**: Execute commands, transfer files, health checks
 
-### What You Must Support
+### What You Provide at Launch
 
-| Requirement | Details |
-|-------------|---------|
-| **Accept SSH key in create request** | Your API must accept an SSH public key parameter |
-| **Install key in VM** | Key goes in default user's `~/.ssh/authorized_keys` |
-| **SSHD running on port 22** | Standard SSH daemon, default config is fine |
-| **Port 22 reachable** | Public IP with port 22 open |
+When provisioning, we pass:
+- **SSH public key**: Key to install in `authorized_keys` (via `CreateInstanceAttrs.PublicKey`)
+- **Firewall rules**: Ports to open (see Section 9)
+
+### Instance Requirements
+
+Your instances must return these fields so Brev can connect:
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `SSHUser` | Yes | Username for SSH (e.g., `ubuntu`, `root`, `ec2-user`) |
+| `SSHPort` | Yes | SSH port (commonly `22`, but can be any port) |
+| `PublicIP` | Yes | Publicly routable address for SSH connection |
+
+**Note:** `PublicIP` is the required field, but a publicly resolvable DNS name also works in practice. What matters is that Brev can reach your instance over SSH.
 
 ### SSH User
 
-We typically connect as:
-- `root` (if permitted)
-- `ubuntu` (common on Ubuntu images)
-- Whatever default user your images provide
+Brev connects as the default user your image provides:
+
+| Image | Default User |
+|-------|--------------|
+| Ubuntu | `ubuntu` |
+| Debian | `admin` or `debian` |
+| Amazon Linux | `ec2-user` |
+| Custom | Whatever you configure |
 
-Let us know your default SSH user during integration setup.
+### Runtime Requirements
+
+| Requirement | Details |
+|-------------|---------|
+| **SSHD running** | On the port specified by `Instance.SSHPort` |
+| **Port publicly reachable** | No NAT or firewall blocking inbound SSH |
+| **Key installed** | The public key from `CreateInstanceAttrs.PublicKey` in `authorized_keys` |
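+
+You can check these requirements from outside your network. Dial the address and port that an `Instance` reports, then read the SSH server's identification string. RFC 4253, section 4.2, requires the server to send this string on connect. The following is an illustrative probe, not an SDK function, and `sshBanner` is a hypothetical helper. It only shows that SSHD is reachable. To confirm that the key is installed, you need a real SSH client such as `golang.org/x/crypto/ssh`.
+
+```go
+package main
+
+import (
+    "bufio"
+    "fmt"
+    "net"
+    "strconv"
+    "strings"
+    "time"
+)
+
+// sshBanner connects to host:port and returns the SSH identification string.
+// It proves SSHD is reachable; it does not authenticate.
+func sshBanner(host string, port int, timeout time.Duration) (string, error) {
+    addr := net.JoinHostPort(host, strconv.Itoa(port))
+    conn, err := net.DialTimeout("tcp", addr, timeout)
+    if err != nil {
+        return "", fmt.Errorf("dial %s: %w", addr, err)
+    }
+    defer conn.Close()
+    if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
+        return "", err
+    }
+    reader := bufio.NewReader(conn)
+    // RFC 4253 allows a server to send other lines before the version string.
+    for i := 0; i < 10; i++ {
+        line, err := reader.ReadString('\n')
+        if err != nil {
+            return "", fmt.Errorf("read banner from %s: %w", addr, err)
+        }
+        line = strings.TrimRight(line, "\r\n")
+        if strings.HasPrefix(line, "SSH-") {
+            return line, nil
+        }
+    }
+    return "", fmt.Errorf("no SSH identification string from %s", addr)
+}
+
+func main() {
+    // Replace with the PublicIP and SSHPort your Instance reports.
+    banner, err := sshBanner("203.0.113.10", 22, 5*time.Second)
+    fmt.Println(banner, err)
+}
+```
+
+`203.0.113.10` is a documentation-only address (RFC 5737). Substitute a real instance's `PublicIP` and `SSHPort`.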
 
 ---
 
-## 10. Firewall and Security Groups
+## 9. Firewall and Security Groups
 
-### Brev's Firewall Model
+**Can you dynamically expose ports at instance creation?** Yes, if you support user-data or have a native firewall API.
 
-Brev uses a provider-agnostic firewall model that maps to your security group / firewall implementation:
+**Can you modify firewall rules after creation without SSH/reboot?** Only if you have a native API. Most GPU clouds don't.
 
-**Ingress Rules** (inbound traffic):
-```
-Port(s)     Protocol    Source
-22          TCP         0.0.0.0/0    # SSH - REQUIRED
-443         TCP         0.0.0.0/0    # HTTPS (optional)
-8080        TCP         0.0.0.0/0    # User app (optional)
-```
 
-**Egress Rules** (outbound traffic):
+### SDK Structures
+
+```go
+type FirewallRules struct {
+    IngressRules []FirewallRule
+    EgressRules  []FirewallRule
+}
+
+type FirewallRule struct {
+    FromPort int32
+    ToPort   int32
+    IPRanges []string // CIDR notation
+}
 ```
-Port(s)     Protocol    Destination
-*           *           0.0.0.0/0    # Allow all outbound
+
+Passed via `CreateInstanceAttrs.FirewallRules`.
+
+### If You Have a Native API
+
+Use it. Implement `CloudModifyFirewall` for post-creation changes:
+
+```go
+type CloudModifyFirewall interface {
+    AddFirewallRulesToInstance(ctx context.Context, args AddFirewallRulesToInstanceArgs) error
+    RevokeSecurityGroupRules(ctx context.Context, args RevokeSecurityGroupRuleArgs) error
+}
 ```
 
-### Minimum Required Ports
+Add `CapabilityModifyFirewall` to your capabilities.
+
+### If You Only Have User-Data
+
+Inject UFW commands at boot. See `cloud/v1/providers/shadeform/ufw.go`.
+
+```go
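+// Core pattern (simplified)
+commands := []string{
+    "#!/bin/bash", // cloud-init executes user-data as a script only when it starts with a shebang
+    "ufw --force reset",
+    "ufw default deny incoming",
+    "ufw default allow outgoing",
+    "ufw allow 22/tcp", // always keep SSH open for the control plane
+}
+for _, rule := range firewallRules.IngressRules {
+    for _, cidr := range rule.IPRanges {
+        if rule.FromPort == rule.ToPort {
+            commands = append(commands, fmt.Sprintf("ufw allow in from %s to any port %d", cidr, rule.FromPort))
+        } else {
+            // ufw requires an explicit protocol for port ranges; TCP is assumed here
+            commands = append(commands, fmt.Sprintf("ufw allow in from %s to any port %d:%d proto tcp", cidr, rule.FromPort, rule.ToPort))
+        }
+    }
+}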
+commands = append(commands, "ufw --force enable")
+
+// Base64 encode and pass as user-data
+script := strings.Join(commands, "\n")
+encoded := base64.StdEncoding.EncodeToString([]byte(script))
+```
 
-| Port | Protocol | Direction | Purpose |
-|------|----------|-----------|---------|
-| **22** | TCP | Inbound | SSH (mandatory) |
+**Do not** declare `CapabilityModifyFirewall`. If your client type still has to satisfy `CloudModifyFirewall`, return `ErrNotImplemented` from those methods.
 
-All other ports are configurable based on workload needs.
+### If You Only Have IP Allowlists
 
-### Mapping to Your System
+See `cloud/v1/providers/launchpad/instance_create.go`. You can only restrict by source IP, not port. Extract `/32`s from the rules and pass to your API:
 
-Your firewall / security group implementation should:
-1. Accept our firewall rules in the create request (or apply defaults)
-2. Ensure port 22 is open for Brev's control plane
-3. Allow additional ports to be specified for applications
+```go
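+ips := []string{}
+for _, rule := range firewallRules.IngressRules {
+    for _, cidr := range rule.IPRanges {
+        _, ipNet, err := net.ParseCIDR(cidr)
+        if err != nil {
+            continue // skip malformed entries instead of dereferencing a nil ipNet
+        }
+        ones, bits := ipNet.Mask.Size()
+        if ones == bits { // single-host ranges only (/32 for IPv4, /128 for IPv6)
+            ips = append(ips, ipNet.IP.String())
+        }
+    }
+}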
+```
 
 ---
 
-## 11. Instance Metadata and Tags
+## 10. Instance Metadata and Tags
 
-### Tags We Set
+Brev uses tags to track and correlate instances. Your API must support setting tags at creation and reading them back.
 
-Brev may set tags/labels on instances for identification:
+### Required Tags
 
-| Tag | Value | Purpose |
-|-----|-------|---------|
-| `brev-instance-id` | Brev's internal ID | Cross-reference |
-| `Name` | User-specified | Display name |
+| Tag | Purpose |
+|-----|---------|
+| `RefID` | Instance correlation and idempotency |
+| `CloudCredRefID` | Identifies which credential created the instance |
 
-### Tag Requirements
+### Optional Tags
 
-Your API should support:
-- Setting tags at instance creation
-- Updating tags on running instances
-- Querying instances by tag (helpful but not required)
+| Tag | Purpose |
+|-----|---------|
+| `Name` | Display name (implementer-dependent) |
 
-If you don't support tags, we track the mapping on our side.
+Additional custom tags may also be passed through.
 
 ---
 
-## 12. Error Handling and Status Reporting
+## 11. Error Handling and Status Reporting
 
 ### Error Categories
 
-| Category | Examples | How to Report |
-|----------|----------|---------------|
-| **Out of Stock** | No capacity in region | Return specific error code |
-| **Quota Exceeded** | Hit account limit | Return quota error |
-| **Invalid Request** | Bad instance type | Return validation error |
-| **Auth Failed** | Bad API key | Return 401/403 |
-| **Internal Error** | Your system issue | Return 500 with details |
+Your provider implementation should translate API errors into the standard error constants defined in [`v1/errors.go`](v1/errors.go):
+
+| Category | Examples | What to Return |
+|----------|----------|----------------|
+| **Out of Stock** | No capacity in region | `v1.ErrInsufficientResources` |
+| **Quota Exceeded** | Hit account limit | `v1.ErrOutOfQuota` |
+| **Resource Not Found** | Instance/image doesn't exist | `v1.ErrResourceNotFound`, `v1.ErrInstanceNotFound`, `v1.ErrImageNotFound` |
+| **Service Unavailable** | API temporarily down | `v1.ErrServiceUnavailable` |
+| **Auth Failed** | Bad API key | An error carrying the HTTP 401/403 response |
+| **Internal Error** | Your system issue | An error carrying the HTTP 500 response details |
+
+**Reference:** See [`v1/errors.go`](v1/errors.go) for the full list of error constants:
+
+```go
+var (
+	ErrInsufficientResources = errors.New("zone has insufficient resources to fulfill the request, InsufficientCapacity")
+	ErrOutOfQuota            = errors.New("out of quota in the region fulfill the request, InsufficientQuota")
+	ErrImageNotFound         = errors.New("image not found")
+	ErrDuplicateFirewallRule = errors.New("duplicate firewall rule")
+	ErrInstanceNotFound      = errors.New("instance not found")
+	ErrResourceNotFound      = errors.New("resource not found")
+	ErrServiceUnavailable    = errors.New("api is temporarily unavailable")
+)
+```
 
-### Preferred Error Format
+### Out of Stock Handling
 
-We prefer errors that include:
-- **Error code**: Machine-readable identifier
-- **Message**: Human-readable description
-- **Region** (if relevant): Where the failure occurred
+"Out of stock" is common with GPUs. Your implementation should return `v1.ErrInsufficientResources`:
 
-### Out of Stock Handling
+1. Your API returns your specific "no capacity" error
+2. Your provider translates this to `v1.ErrInsufficientResources`
+3. Brev marks that type as temporarily unavailable in that region
+4. The syncer will re-check availability on the next poll
+
+**Example from Shadeform provider** ([`v1/providers/shadeform/instance.go`](v1/providers/shadeform/instance.go)):
+
+```go
+if shadeformErrorResponse.ErrorCode == outOfStockErrorCode {
+    return v1.ErrInsufficientResources
+}
+```
+
+**Example from Lambda Labs provider** ([`v1/providers/lambdalabs/errors.go`](v1/providers/lambdalabs/errors.go)):
 
-"Out of stock" is common with GPUs. Ideal handling:
-1. Your API returns a clear "no capacity" error
-2. We mark that type as temporarily unavailable in that region
-3. The syncer will re-check availability on the next poll
+```go
+if strings.Contains(e.Error(), "Not enough capacity") || strings.Contains(e.Error(), "insufficient-capacity") {
+    return v1.ErrInsufficientResources
+}
+```
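+
+The examples above return the bare sentinel. Another option is to wrap the sentinel with `%w`. This keeps the provider's original message for logs, and `errors.Is` still matches the sentinel. This only helps if the control plane compares errors with `errors.Is` rather than `==`. That is an assumption; confirm it with the Brev team. If you are unsure, returning the bare sentinel is the safer choice. In this sketch, `apiError` and the codes `"out_of_stock"` and `"quota_exceeded"` are hypothetical. The two sentinels are local stand-ins for `v1.ErrInsufficientResources` and `v1.ErrOutOfQuota`.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+    "net/http"
+)
+
+// Local stand-ins for v1.ErrInsufficientResources and v1.ErrOutOfQuota.
+var (
+    ErrInsufficientResources = errors.New("insufficient resources (stand-in)")
+    ErrOutOfQuota            = errors.New("out of quota (stand-in)")
+)
+
+// apiError is a hypothetical error returned by a provider's HTTP client.
+type apiError struct {
+    StatusCode int
+    Code       string
+    Message    string
+}
+
+func (e *apiError) Error() string {
+    return fmt.Sprintf("provider API %d %s: %s", e.StatusCode, e.Code, e.Message)
+}
+
+// translateError maps provider errors onto the sentinels. Wrapping with %w
+// keeps the provider's message while errors.Is still matches the sentinel.
+func translateError(e *apiError) error {
+    switch {
+    case e.Code == "out_of_stock":
+        return fmt.Errorf("%w: %v", ErrInsufficientResources, e)
+    case e.Code == "quota_exceeded":
+        return fmt.Errorf("%w: %v", ErrOutOfQuota, e)
+    case e.StatusCode == http.StatusUnauthorized || e.StatusCode == http.StatusForbidden:
+        return fmt.Errorf("authentication failed: %w", e)
+    default:
+        return e
+    }
+}
+
+func main() {
+    err := translateError(&apiError{StatusCode: 409, Code: "out_of_stock", Message: "no capacity"})
+    fmt.Println(errors.Is(err, ErrInsufficientResources)) // true
+}
+```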
 
 ---
 
-## 13. Pricing and Billing
+## 12. Pricing and Billing
 
 ### How Pricing Works
 
-Brev displays your prices. We need:
+Brev displays your prices via `InstanceType.BasePrice` (see [`v1/instancetype.go`](v1/instancetype.go)).
 
-| Field | Format | Example |
-|-------|--------|---------|
-| **Hourly price** | Cents (integer) | `3200` = $32.00/hr |
-| **Currency** | USD assumed | - |
+| Field | Type | Notes |
+|-------|------|-------|
+| **BasePrice** | `*currency.Amount` | From [`github.com/bojanz/currency`](https://pkg.go.dev/github.com/bojanz/currency#Amount) |
+| **Currency** | Up to implementer | Most providers use `"USD"` |
 
 ### Billing
 
 Billing arrangements are handled separately during the integration partnership setup.
 
-### Price Sync
-
-Prices sync along with instance types. When you update pricing in your system, we pick it up in the next sync cycle.
 
 ---
 
-## 14. Common Questions
-
-### "What credentials do you need from us?"
-
-We need API credentials that allow Brev to:
-- List your available instance types
-- Create, get, start, stop, and terminate instances
-- Optionally: update tags, modify firewall rules
-
-This is typically an API key or service account.
+## 13. Common Questions
 
 ### "Do you need access to our admin console?"
 
-No. We only need API access. All operations go through your public API.
+No. We only need programmatic API access. All operations go through your public API—see Section 6 for credential details.
 
 ### "What images/OS should our VMs run?"
 
-We work best with:
-- **Ubuntu 22.04 or 24.04** (preferred)
-- **CUDA pre-installed** (for GPU instances)
-- **Python 3.10+** available
-- **SSHD running** on port 22
-
-Custom images can work, but Ubuntu with CUDA is the smoothest path.
-
-### "How do you handle the SSH keys?"
-
-For each VM:
-1. Brev generates an SSH key pair
-2. Brev passes the public key in the create request
-3. You install it in the VM's `authorized_keys`
-4. Brev connects using the private key
+| Requirement | Details |
+|-------------|---------|
+| **OS** | Ubuntu 22.04 (preferred) or 24.04 |
 
-You don't manage these keys—just accept them at VM creation.
+Custom images work if they meet these requirements. The SDK validates image compatibility via `ValidateInstanceImage()`.
 
 ### "What if we don't have public IPs?"
 
-Public IP with SSH access is required for the standard integration. Alternatives:
-- **VPN/Private connectivity**: Custom integration needed
-- **Bastion host**: Brev can SSH through a jump box
-- **Cloudflare tunnel**: Instance calls out, no inbound needed
-
-These require additional integration work.
-
-### "How do you handle multi-GPU interconnect (NVLink, etc.)?"
+Public IP with SSH access is required for standard integration. Bastion/jump host routing is supported (see `InternalPortMappings` in the `Instance` struct). Other alternatives (VPN, Cloudflare tunnels) require custom integration work.
 
-We track GPU configuration but don't currently differentiate NVLink vs PCIe interconnect in the UI. If you have multiple variants (NVLink cluster vs standalone), surface them as different instance types.
+### "How do you track GPU interconnect (NVLink, SXM, PCIe)?"
 
-### "What about bare metal vs VMs?"
+We track interconnect type via the `GPU.NetworkDetails` field. Your implementation should populate this with values like `"PCIe"`, `"SXM"`, `"SXM4"`, or `"SXM5"`. If you have multiple variants (e.g., PCIe vs SXM versions of the same GPU), surface them as separate instance types.
 
-Both work. From Brev's perspective, if it has an IP and SSH access, it's an instance. Bare metal instances are provisioned the same way.
-
-### "How do we test the integration?"
-
-Typical integration process:
-1. **Staging environment**: Brev tests against your sandbox/dev API
-2. **Test credentials**: You provide test account with limited quota
-3. **Validation**: We verify create, get, stop, start, terminate
-4. **Production**: Enable in Brev's catalog
 
 ### "What SLA/uptime do you expect from our API?"
 
-Your API should be:
-- **Available**: 99%+ uptime for instance operations
-- **Responsive**: <5 second response times typical
-- **Consistent**: Idempotent operations where possible
+| Requirement | Target |
+|-------------|--------|
+| **Availability** | 99%+ uptime |
+| **Response time** | < 5 seconds typical |
+| **Idempotency** | Supported where possible |
 
-Sync polling is resilient to brief outages—we retry and recover.
+The Instance Syncer is resilient to brief outages—it retries and recovers automatically.
 
 ### "What does Brev do on the VMs after launch?"
 
-After Brev creates a VM via your API:
-1. Brev SSHs into the VM using the key we provided at creation
-2. Brev installs a lightweight agent and configures the environment
-3. Brev sets up connection paths
+After `CreateInstance` returns successfully:
+
+1. **SSH connection**: Brev waits for SSH to become available (up to 10 minutes via `ValidateInstanceSSHAccessible`)
+2. **Key bootstrapping**: Brev adds admin keys to `authorized_keys` via SSH
+3. **Agent setup**: Brev installs a lightweight agent for tunnel management and environment configuration
+
+You don't need to do anything special—just ensure the SSH public key from `CreateInstanceAttrs.PublicKey` is installed before the instance becomes accessible.
 
 ---
 
@@ -594,13 +1063,13 @@ Contact the Brev team to start the integration process.
 |------|------------|
 | **Cloud Provider (You)** | Your company, providing GPU compute infrastructure |
 | **Brev Control Plane** | Brev's system that syncs inventory and provisions instances |
-| **Instance Type** | A SKU representing a compute configuration (CPU, GPU, RAM, etc.) |
+| **Instance Type** | A compute configuration (CPU, GPU, RAM, storage, etc.) |
 | **Location** | Primary region identifier (e.g., `us-west-2`) |
 | **SubLocation** | Availability zone within a region (e.g., `us-west-2a`) |
-| **noSub** | Placeholder when your cloud doesn't have availability zones |
+| **noSub** | Placeholder used by `MakeGenericInstanceTypeID()` when no availability zone exists |
 | **Syncer** | Brev's continuous process that polls your API for inventory |
 | **Cloud SDK** | Brev's internal layer that adapts to different cloud provider APIs |
-| **InstanceTypeID** | Brev's composite identifier: `{location}-{subLocation}-{type}` |
+| **InstanceTypeID** | Stable, unique identifier for an instance type (format defined by implementer) |
 | **SSH Key Injection** | Your API accepting Brev's SSH public key at VM creation |
 
 ---

From 3cd548aac48e64397976011116e648c31ef102c3 Mon Sep 17 00:00:00 2001
From: Tyler Fong <tylerf@nvidia.com>
Date: Fri, 30 Jan 2026 09:42:01 -0800
Subject: [PATCH 4/5] fixed capability tag documentation

---
 cloudManualExternal.md | 81 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 69 insertions(+), 12 deletions(-)

diff --git a/cloudManualExternal.md b/cloudManualExternal.md
index 2455df0..440d18c 100644
--- a/cloudManualExternal.md
+++ b/cloudManualExternal.md
@@ -758,7 +758,7 @@ func (c *MyCredential) GetCapabilities(ctx context.Context) (v1.Capabilities, er
         // Optional:
         v1.CapabilityStopStartInstance,        // If you support stop/start
         v1.CapabilityRebootInstance,           // If you support reboot
-        v1.CapabilityTags,                     // If you support instance tags
+        v1.CapabilityTags,                     // If your API supports instance tags/labels (see Section 10)
         v1.CapabilityModifyFirewall,           // If you support dynamic firewall rules
         v1.CapabilityResizeInstanceVolume,     // If you support volume resizing
     }, nil
@@ -767,6 +767,8 @@ func (c *MyCredential) GetCapabilities(ctx context.Context) (v1.Capabilities, er
 
 Brev checks capabilities before calling optional methods. If you don't declare a capability, Brev won't attempt that operation.
 
+> **Note on `CapabilityTags`:** This capability is optional, but `RefID` and `CloudCredRefID` data is **required** regardless. If your API doesn't support tags, you must use an alternative mechanism to store and retrieve this data. See [Section 10: Instance Metadata and Tags](#10-instance-metadata-and-tags) for details and examples.
+
 ---
 
 ## 8. SSH Connectivity
@@ -904,22 +906,77 @@ for _, rule := range firewallRules.IngressRules {
 
 ## 10. Instance Metadata and Tags
 
-Brev uses tags to track and correlate instances. Your API must support setting tags at creation and reading them back.
+Brev uses metadata to track and correlate instances. The control plane requires certain data to be persisted with instances and retrievable later.
+
+### Required Instance Data
+
+These values **MUST** be stored with the instance and returned in `GetInstance`/`ListInstances`:
+
+| Field | Purpose |
+|-------|---------|
+| `RefID` | Instance correlation and idempotency (passed in `CreateInstanceAttrs.RefID`) |
+| `CloudCredRefID` | Identifies which credential created the instance (from `GetReferenceID()`) |
+
+### The `CapabilityTags` Capability
+
+If your cloud provider's API supports instance tagging/labeling, declare `v1.CapabilityTags` in your capabilities:
+
+```go
+func (c *MyCredential) GetCapabilities(ctx context.Context) (v1.Capabilities, error) {
+    return v1.Capabilities{
+        v1.CapabilityCreateInstance,
+        v1.CapabilityTerminateInstance,
+        v1.CapabilityTags,  // Declare this if your API supports tags/labels
+        // ...plus any other capabilities you support (see Section 7)
+    }, nil
+}
+```
+
+**When `CapabilityTags` is declared:**
+- Store `RefID`, `CloudCredRefID`, and any additional tags via `CreateInstanceAttrs.Tags`
+- The control plane will call `UpdateInstanceTags()` to add metadata after creation
+- `ListInstances()` should support filtering via `TagFilters` for efficient queries
+
+**Example (Shadeform with tags):**
+```go
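+// At creation - store RefID and CloudCredRefID as "key=value" tags
+refIDTag := fmt.Sprintf("refID=%s", attrs.RefID)
+cloudCredRefIDTag := fmt.Sprintf("cloudCredRefID=%s", c.GetReferenceID())
+tags := []string{refIDTag, cloudCredRefIDTag}
+
+// When reading back - parse the "key=value" strings returned by your API
+// (tags here is the []string read from the provider's instance response)
+tagMap := make(map[string]string, len(tags))
+for _, tag := range tags {
+    if key, value, ok := strings.Cut(tag, "="); ok {
+        tagMap[key] = value
+    }
+}
+refID := tagMap["refID"]
+cloudCredRefID := tagMap["cloudCredRefID"]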
+```
+
+### Alternative: When Tags Are NOT Supported
+
+If your API doesn't support tags, you **still must** persist and return `RefID` and `CloudCredRefID`. Encode them in fields your API already stores, such as the instance name or the SSH key name:
+
+**Example (Lambda Labs without tags):**
+```go
+// At creation - encode CloudCredRefID in instance name
+name := fmt.Sprintf("%s--%s", c.GetReferenceID(), time.Now().UTC().Format(timeFormat))
+// Use RefID as the SSH key pair name
+keyPairName := attrs.RefID
+
+// When reading back - extract from name and SSH key
+nameParts := strings.Split(instance.Name, "--")
+cloudCredRefID := nameParts[0]
+refID := instance.SshKeyNames[0]
+```
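+
+Splitting on `--` only works if the delimiter never appears in the credential reference ID and every instance name was created by your integration. The following encode/decode pair is a slightly more defensive sketch. `encodeInstanceName`, `decodeCloudCredRefID`, and the timestamp layout are hypothetical choices, not SDK functions.
+
+```go
+package main
+
+import (
+    "fmt"
+    "strings"
+    "time"
+)
+
+const nameDelimiter = "--"
+
+// encodeInstanceName embeds the credential reference ID in the instance name.
+// IDs containing the delimiter are rejected because decoding would be ambiguous.
+func encodeInstanceName(cloudCredRefID string, now time.Time) (string, error) {
+    if cloudCredRefID == "" || strings.Contains(cloudCredRefID, nameDelimiter) {
+        return "", fmt.Errorf("invalid cloudCredRefID %q", cloudCredRefID)
+    }
+    return cloudCredRefID + nameDelimiter + now.UTC().Format("20060102T150405Z"), nil
+}
+
+// decodeCloudCredRefID recovers the credential reference ID and reports
+// whether the name looks like one produced by encodeInstanceName.
+func decodeCloudCredRefID(name string) (string, bool) {
+    refID, _, found := strings.Cut(name, nameDelimiter)
+    if !found || refID == "" {
+        return "", false
+    }
+    return refID, true
+}
+
+func main() {
+    name, _ := encodeInstanceName("cred-abc", time.Date(2026, 1, 30, 9, 42, 0, 0, time.UTC))
+    fmt.Println(name) // cred-abc--20260130T094200Z
+    ref, ok := decodeCloudCredRefID(name)
+    fmt.Println(ref, ok) // cred-abc true
+}
+```
+
+Returning `false` for names that do not match lets `ListInstances` skip instances that were not created through Brev. Parsing those names could misattribute them.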
 
-### Required Tags
+### Recommendation: Use Tags If Possible
 
-| Tag | Purpose |
-|-----|---------|
-| `RefID` | Instance correlation and idempotency |
-| `CloudCredRefID` | Identifies which credential created the instance |
+**Tags are the recommended and easiest integration path.** They provide:
+- Clean separation of metadata from instance properties
+- Efficient server-side filtering via `TagFilters`
+- Full billing/usage tracking capabilities
+- Straightforward implementation
 
-### Optional Tags
+If your cloud API supports any form of instance tagging, labels, or metadata—**use it**.
 
-| Tag | Purpose |
-|-----|---------|
-| `Name` | Display name (implementer-dependent) |
 
-Additional custom tags may also be passed through.
+> **Before implementing a custom solution**, please reach out to the Brev team. We can help design an approach that works reliably with the control plane and avoid edge cases that could cause instance correlation issues.
 
 ---
 

From 9a364712d2c43be2f9ec909d3fd87180fc613d1b Mon Sep 17 00:00:00 2001
From: Tyler Fong <tylerf@nvidia.com>
Date: Mon, 2 Feb 2026 09:33:16 -0800
Subject: [PATCH 5/5] fixed up doc clarification

---
 cloudManualExternal.md | 87 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 81 insertions(+), 6 deletions(-)

diff --git a/cloudManualExternal.md b/cloudManualExternal.md
index 440d18c..7999c56 100644
--- a/cloudManualExternal.md
+++ b/cloudManualExternal.md
@@ -398,6 +398,37 @@ inst.InstanceTypeID = v1.MakeGenericInstanceTypeIDFromInstance(*inst)  // LAST
 - "instance type not found" errors during provisioning
 - Instances appear "orphaned" (no associated instance type)
 
+### Validating Your Instance Type IDs
+
+The SDK provides validation functions to catch ID generation issues early. **Run these in your test suite:**
+
+**1. `ValidateStableInstanceTypeIDs`** - Ensures your instance type IDs are stable and unique:
+```go
+// In your validation tests
+err := v1.ValidateStableInstanceTypeIDs(ctx, client, stableIDs)
+require.NoError(t, err, "ValidateStableInstanceTypeIDs should pass")
+```
+
+This validates:
+- Each instance type ID is unique (no duplicates)
+- Your designated stable IDs exist in the current instance types
+- All instance types have required properties (base price, storage pricing)
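+
+Before you wire up the full validation suite, you may want a fast unit test for the uniqueness property. A plain duplicate check over your generated IDs covers that. This is a stand-alone sketch, not the SDK's `ValidateStableInstanceTypeIDs`. `duplicateIDs` is a hypothetical helper, and the IDs in the example are made up.
+
+```go
+package main
+
+import (
+    "fmt"
+    "sort"
+)
+
+// duplicateIDs returns every instance type ID that appears more than once,
+// sorted so test failures are deterministic.
+func duplicateIDs(ids []string) []string {
+    counts := make(map[string]int, len(ids))
+    for _, id := range ids {
+        counts[id]++
+    }
+    var dups []string
+    for id, n := range counts {
+        if n > 1 {
+            dups = append(dups, id)
+        }
+    }
+    sort.Strings(dups)
+    return dups
+}
+
+func main() {
+    ids := []string{"type-a", "type-b", "type-a"}
+    fmt.Println(duplicateIDs(ids)) // [type-a]
+}
+```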
+
+**2. `ValidateCreateInstance`** - Validates that instance and instance type IDs match:
+```go
+// In your validation tests
+instance, err := v1.ValidateCreateInstance(ctx, client, attrs, selectedType)
+require.NoError(t, err, "ValidateCreateInstance should pass")
+```
+
+This validates (among other things):
+- `instance.InstanceTypeID == selectedType.ID` — **catches ID generation mismatches**
+- `instance.RefID` matches the provided RefID
+- Location and instance type fields are consistent
+
+> **Why this matters:** If `MakeGenericInstanceTypeID()` and `MakeGenericInstanceTypeIDFromInstance()` produce different IDs for the same logical type, the control plane cannot correlate instances with their types. `ValidateCreateInstance` catches this.
+
+See [`internal/validation/suite.go`](internal/validation/suite.go) for the full validation test suite you can use as a reference.
 
 ## 4. Location Model
 
@@ -444,10 +475,6 @@ Availability is tracked **per instance type** using two fields on the `InstanceT
 
 ## 5. GPU Normalization
 
-### Why GPU Normalization Matters
-
-Users search for GPUs by model. They want "H100" not "NVIDIA H100 80GB HBM3 SXM5 Accelerator". Your provider implementation must normalize GPU data into the SDK's structured `GPU` type.
-
 ### The GPU Struct
 
 The Cloud SDK represents GPUs with these fields:
@@ -610,7 +637,7 @@ Your implementation must:
 1. Accept this public key in your create instance API
 2. Install it in the VM's default user `~/.ssh/authorized_keys` before the instance becomes accessible
 
-Brev generates a unique SSH key pair for each instance. The control plane retains the private key and uses it to connect after creation.
+Brev manages SSH keys per user. The public key provided in `CreateInstanceAttrs.PublicKey` belongs to the user, and the control plane retains the corresponding private key to connect after creation.
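+
+If your images boot with cloud-init, one common way to install the key is to pass it in user-data. `ssh_authorized_keys` is a standard cloud-init key that adds entries for the image's default user. The `buildUserData` helper below is hypothetical, and the sketch assumes cloud-init is present on your images.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+    "strings"
+)
+
+// buildUserData returns a #cloud-config document that installs publicKey
+// for the image's default user. Hypothetical helper; adapt to your API.
+func buildUserData(publicKey string) (string, error) {
+    key := strings.TrimSpace(publicKey)
+    // Reject empty or multi-line input so the YAML stays well-formed.
+    if key == "" || strings.ContainsAny(key, "\r\n") {
+        return "", errors.New("public key must be a single non-empty line")
+    }
+    // %q emits a double-quoted string, which is a valid YAML scalar for typical SSH keys.
+    return fmt.Sprintf("#cloud-config\nssh_authorized_keys:\n  - %q\n", key), nil
+}
+```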
 
 ---
 
@@ -713,7 +740,7 @@ return c.GetInstance(ctx, v1.CloudProviderInstanceID(resp.Data.InstanceIds[0]))
 - Return `nil` once the stop operation is initiated.
 - Instance should transition: `running` → `stopping` → `stopped`
 
-**When to implement:** Only if your platform supports instances that can stop and perserve storae. Lambda Labs does not support this, but Nebius does.
+**When to implement:** Only if your platform supports instances that can stop and preserve storage. Lambda Labs does not support this, but Nebius does.
 
 ### Start Instance (Optional)
 
@@ -728,6 +755,54 @@ return c.GetInstance(ctx, v1.CloudProviderInstanceID(resp.Data.InstanceIds[0]))
 
 **Note:** If you implement `StopInstance`, you must also implement `StartInstance`.
 
+### Stop/Start: Three Levels of Control
+
+Stop/start support is controlled at three levels:
+
+| Level | What to Set | Purpose |
+|-------|-------------|---------|
+| **Provider Capability** | `CapabilityStopStartInstance` in `GetCapabilities()` | Indicates your API supports stop/start operations |
+| **Instance Type** | `InstanceType.Stoppable = true/false` | Indicates whether this instance type can be stopped (e.g., spot instances typically cannot) |
+| **Instance** | `Instance.Stoppable = true/false` | Indicates whether this specific instance can be stopped |
+
+**Example - Nebius (supports stop/start):**
+```go
+// In GetCapabilities()
+v1.CapabilityStopStartInstance,  // API supports it
+
+// In GetInstanceTypes() - instance type level
+instanceType := v1.InstanceType{
+    Stoppable: true,  // This type supports stop/start
+    // ...
+}
+
+// In GetInstance()/CreateInstance() - instance level
+instance := v1.Instance{
+    Stoppable: true,  // This instance can be stopped
+    // ...
+}
+```
+
+**Example - Lambda Labs (no stop/start support):**
+```go
+// In GetCapabilities()
+// CapabilityStopStartInstance NOT included
+
+// In GetInstanceTypes()
+instanceType := v1.InstanceType{
+    Stoppable: false,  // Cannot be stopped
+    // ...
+}
+
+// In GetInstance()/CreateInstance()
+instance := v1.Instance{
+    Stoppable: false,  // Cannot be stopped
+    // ...
+}
+```
+
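+The control plane checks all three levels before allowing a stop/start operation. If `CapabilityStopStartInstance` is missing from `GetCapabilities()`, or if `Stoppable` is `false` on either the instance type or the instance, the operation won't be attempted.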
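+
+The three-level check amounts to a logical AND. The following minimal sketch uses hypothetical stand-in types (`capability`, `stopTarget`) instead of the SDK's own capability, `InstanceType`, and `Instance` definitions:
+
+```go
+package main
+
+// capability is a hypothetical stand-in for the SDK's capability values.
+type capability string
+
+const capabilityStopStartInstance capability = "stop-start-instance"
+
+// stopTarget bundles the three signals the control plane consults.
+type stopTarget struct {
+    ProviderCaps  []capability // what GetCapabilities() returned
+    TypeStoppable bool         // InstanceType.Stoppable
+    InstStoppable bool         // Instance.Stoppable
+}
+
+// hasCapability reports whether want appears in caps.
+func hasCapability(caps []capability, want capability) bool {
+    for _, c := range caps {
+        if c == want {
+            return true
+        }
+    }
+    return false
+}
+
+// canStopStart returns true only when all three levels allow the operation.
+func canStopStart(t stopTarget) bool {
+    return hasCapability(t.ProviderCaps, capabilityStopStartInstance) &&
+        t.TypeStoppable &&
+        t.InstStoppable
+}
+```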
+
 
 ### Get Instance (Required)