Skip to content

Conversation

@tylerfong
Copy link

Documentation for third-party cloud providers integrating with Brev's
control plane. Covers the complete integration requirements including
API patterns, instance type modeling, credential configuration, SSH key
injection, provisioning lifecycle, and network/firewall requirements.

@tylerfong tylerfong requested a review from a team as a code owner January 29, 2026 18:58

### The Instance Type Syncer

Brev runs a **continuous synchronization process** that periodically queries your API to understand what compute is available. This isn't a one-time import—it's an ongoing reconciliation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the LLM-speak?

This isn't a one-time import—it's an ongoing reconciliation.

│ │
└──────────────────────────────────────────────────────────────────────────────────┘

---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line look superfluous.

Comment on lines 68 to 71
│ │ ┌─────────┐ ┌───────────┐ ┌─────────▼───┐ ┌───────────┐ ┌──────────────┐ │ │
│ │ │ A │ │ │ B │ │ C │ │ D │ │ E │ │ │
│ │ │ Provider│ │ Provider │ │ Provider │ │ Provider │ │ Provider │ │ │
│ │ └────┬────┘ └─────┬─────┘ └──────┬──────┘ └─────┬─────┘ └──────┬───────┘ │ │
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some visual quirks here (the arrow above "C", the extra "|" in the "A" box, the line under "E").

┌─────────────────────────────────────────────────────────────────────────────────┐
│ Brev Control Plane │
│ ┌───────────────────────────────────────────────────────────────────────────┐ │
│ │ Syncer Layer │ │
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the rest of the integration (direct API calls, not just syncing)?

Brev runs a **continuous synchronization process** that periodically queries your API to understand what compute is available. This isn't a one-time import—it's an ongoing reconciliation.

**Sync Behavior:**
- Polls your instance type listing API at regular intervals (typically every 1-5 minutes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The polling is up to the user -- there is a default, but we should probably state that the poll rate is up to the implementer.

Comment on lines 454 to 459
Brev may set tags/labels on instances for identification:

| Tag | Value | Purpose |
|-----|-------|---------|
| `brev-instance-id` | Brev's internal ID | Cross-reference |
| `Name` | User-specified | Display name |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is up to the implementer.

- Updating tags on running instances
- Querying instances by tag (helpful but not required)

If you don't support tags, we track the mapping on our side.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need tags :(

Comment on lines 478 to 482
| **Out of Stock** | No capacity in region | Return specific error code |
| **Quota Exceeded** | Hit account limit | Return quota error |
| **Invalid Request** | Bad instance type | Return validation error |
| **Auth Failed** | Bad API key | Return 401/403 |
| **Internal Error** | Your system issue | Return 500 with details |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should link to the actual consts here.

### Out of Stock Handling

"Out of stock" is common with GPUs. Ideal handling:
1. Your API returns a clear "no capacity" error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a very specific error we need to do so. We should indicate that here.

| Field | Format | Example |
|-------|--------|---------|
| **Hourly price** | Cents (integer) | `3200` = $32.00/hr |
| **Currency** | USD assumed | - |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to the implementer.


## 10. Instance Metadata and Tags

Brev uses tags to track and correlate instances. Your API must support setting tags at creation and reading them back.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 761 says the Tag Capability is optional, but we require it here. Either move it to required or clarify that tagging and this tagging are different?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep made some clarification. Great Catch

Copy link
Member

@theFong theFong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This great work T. Lots of good details and corner cases to help guide implementation for impelements.

We have a "how to add a provider" doc? Can we augment?


**Option 2: Define Your Own Format**

If you prefer a different ID format, set `ID` directly:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

point to ValidateStableInstanceTypeIDs to get confidence in your type creation


> **Warning**: This is the most common cause of integration failures. Instance types may sync successfully but instances fail to provision or appear "orphaned."

When Brev provisions an instance, it looks up the corresponding instance type using the instance's `InstanceTypeID`. **These IDs must match exactly.**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point to making sure ValidateCreateInstance is run in test suite to gain confidence. There we validate that the types on instance and instance type are the same


### Why GPU Normalization Matters

Users search for GPUs by model. They want "H100" not "NVIDIA H100 80GB HBM3 SXM5 Accelerator". Your provider implementation must normalize GPU data into the SDK's structured `GPU` type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to actually define the normalization more explicitly. We say that it matters and to normalize but we don't define it beyond base model uppercase. Base model is obvious for some but is unclear for others like A100. We should consider providing in code mapping of all the names or a stronger formalization of gpu name = NVIDIA GPU family point to doc.

### Key Points

- `Name`: base model only (`"H100"` not `"NVIDIA H100 80GB"`)
- `NetworkDetails`: `"SXM"`, `"SXM4"`, `"SXM5"`, or `"PCIe"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be useful to also formalize this string

1. Accept this public key in your create instance API
2. Install it in the VM's default user `~/.ssh/authorized_keys` before the instance becomes accessible

Brev generates a unique SSH key pair for each instance. The control plane retains the private key and uses it to connect after creation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not the case, it is per user pub key

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we say that brev manages the ssh key per instance


### Start Instance (Optional)

**Capability:** `CapabilityStopStartInstance`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto ^


Brev checks capabilities before calling optional methods. If you don't declare a capability, Brev won't attempt that operation.

> **Note on `CapabilityTags`:** This capability is optional, but `RefID` and `CloudCredRefID` data is **required** regardless. If your API doesn't support tags, you must use an alternative mechanism to store and retrieve this data. See [Section 10: Instance Metadata and Tags](#10-instance-metadata-and-tags) for details and examples.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super important. I think its kind of irrelevant to Tags, we need to put it in more places(in comments on struct) that the instance refid and cloudcredrefid is mandatory. You just have to find a way to return it

```go
// Core pattern
commands := []string{
"ufw --force reset",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be updated with latest commands. lets point to the new tests that capture this requirement. Or we should just extract this logic into a library (nebius and shadeform) and tell folks to use it if its a user managed firewall.


### Required Instance Data

These values **MUST** be stored with the instance and returned in `GetInstance`/`ListInstances`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let point to the tests that check this I believe its ValidateCreateInstance

**Example (Lambda Labs without tags):**
```go
// At creation - encode CloudCredRefID in instance name
name := fmt.Sprintf("%s--%s", c.GetReferenceID(), time.Now().UTC().Format(timeFormat))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is helpful. We can point to lambda labs example here. The datetime might throw people off. Thats because lambdalabs api does not have create time on its instance struct which is also helpful for us to store.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants