-
Notifications
You must be signed in to change notification settings - Fork 6
Add external cloud provider integration guide #79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
cloudManualExternal.md
Outdated
|
|
||
| ### The Instance Type Syncer | ||
|
|
||
| Brev runs a **continuous synchronization process** that periodically queries your API to understand what compute is available. This isn't a one-time import—it's an ongoing reconciliation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the LLM-speak?
This isn't a one-time import—it's an ongoing reconciliation.
cloudManualExternal.md
Outdated
| │ │ | ||
| └──────────────────────────────────────────────────────────────────────────────────┘ | ||
|
|
||
| --- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line look superfluous.
cloudManualExternal.md
Outdated
| │ │ ┌─────────┐ ┌───────────┐ ┌─────────▼───┐ ┌───────────┐ ┌──────────────┐ │ │ | ||
| │ │ │ A │ │ │ B │ │ C │ │ D │ │ E │ │ │ | ||
| │ │ │ Provider│ │ Provider │ │ Provider │ │ Provider │ │ Provider │ │ │ | ||
| │ │ └────┬────┘ └─────┬─────┘ └──────┬──────┘ └─────┬─────┘ └──────┬───────┘ │ │ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some visual quirks here (the arrow above "C", the extra "|" in the "A" box, the line under "E").
cloudManualExternal.md
Outdated
| ┌─────────────────────────────────────────────────────────────────────────────────┐ | ||
| │ Brev Control Plane │ | ||
| │ ┌───────────────────────────────────────────────────────────────────────────┐ │ | ||
| │ │ Syncer Layer │ │ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the rest of the integration (direct API calls, not just syncing)?
cloudManualExternal.md
Outdated
| Brev runs a **continuous synchronization process** that periodically queries your API to understand what compute is available. This isn't a one-time import—it's an ongoing reconciliation. | ||
|
|
||
| **Sync Behavior:** | ||
| - Polls your instance type listing API at regular intervals (typically every 1-5 minutes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The polling is up to the user -- there is a default, but we should probably state that the poll rate is up to the implementer.
cloudManualExternal.md
Outdated
| Brev may set tags/labels on instances for identification: | ||
|
|
||
| | Tag | Value | Purpose | | ||
| |-----|-------|---------| | ||
| | `brev-instance-id` | Brev's internal ID | Cross-reference | | ||
| | `Name` | User-specified | Display name | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is up to the implementer.
cloudManualExternal.md
Outdated
| - Updating tags on running instances | ||
| - Querying instances by tag (helpful but not required) | ||
|
|
||
| If you don't support tags, we track the mapping on our side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need tags :(
cloudManualExternal.md
Outdated
| | **Out of Stock** | No capacity in region | Return specific error code | | ||
| | **Quota Exceeded** | Hit account limit | Return quota error | | ||
| | **Invalid Request** | Bad instance type | Return validation error | | ||
| | **Auth Failed** | Bad API key | Return 401/403 | | ||
| | **Internal Error** | Your system issue | Return 500 with details | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should link to the actual consts here.
cloudManualExternal.md
Outdated
| ### Out of Stock Handling | ||
|
|
||
| "Out of stock" is common with GPUs. Ideal handling: | ||
| 1. Your API returns a clear "no capacity" error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a very specific error we need to do so. We should indicate that here.
cloudManualExternal.md
Outdated
| | Field | Format | Example | | ||
| |-------|--------|---------| | ||
| | **Hourly price** | Cents (integer) | `3200` = $32.00/hr | | ||
| | **Currency** | USD assumed | - | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to the implementer.
cloudManualExternal.md
Outdated
|
|
||
| ## 10. Instance Metadata and Tags | ||
|
|
||
| Brev uses tags to track and correlate instances. Your API must support setting tags at creation and reading them back. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 761 says the Tag Capability is optional, but we require it here. Either move it to required or clarify that tagging and this tagging are different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep made some clarification. Great Catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This great work T. Lots of good details and corner cases to help guide implementation for impelements.
We have a "how to add a provider" doc? Can we augment?
|
|
||
| **Option 2: Define Your Own Format** | ||
|
|
||
| If you prefer a different ID format, set `ID` directly: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
point to ValidateStableInstanceTypeIDs to get confidence in your type creation
|
|
||
| > **Warning**: This is the most common cause of integration failures. Instance types may sync successfully but instances fail to provision or appear "orphaned." | ||
|
|
||
| When Brev provisions an instance, it looks up the corresponding instance type using the instance's `InstanceTypeID`. **These IDs must match exactly.** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point to making sure ValidateCreateInstance is run in test suite to gain confidence. There we validate that the types on instance and instance type are the same
|
|
||
| ### Why GPU Normalization Matters | ||
|
|
||
| Users search for GPUs by model. They want "H100" not "NVIDIA H100 80GB HBM3 SXM5 Accelerator". Your provider implementation must normalize GPU data into the SDK's structured `GPU` type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to actually define the normalization more explicitly. We say that it matters and to normalize but we don't define it beyond base model uppercase. Base model is obvious for some but is unclear for others like A100. We should consider providing in code mapping of all the names or a stronger formalization of gpu name = NVIDIA GPU family point to doc.
| ### Key Points | ||
|
|
||
| - `Name`: base model only (`"H100"` not `"NVIDIA H100 80GB"`) | ||
| - `NetworkDetails`: `"SXM"`, `"SXM4"`, `"SXM5"`, or `"PCIe"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be useful to also formalize this string
| 1. Accept this public key in your create instance API | ||
| 2. Install it in the VM's default user `~/.ssh/authorized_keys` before the instance becomes accessible | ||
|
|
||
| Brev generates a unique SSH key pair for each instance. The control plane retains the private key and uses it to connect after creation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not the case, it is per user pub key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about we say that brev manages the ssh key per instance
|
|
||
| ### Start Instance (Optional) | ||
|
|
||
| **Capability:** `CapabilityStopStartInstance` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto ^
|
|
||
| Brev checks capabilities before calling optional methods. If you don't declare a capability, Brev won't attempt that operation. | ||
|
|
||
| > **Note on `CapabilityTags`:** This capability is optional, but `RefID` and `CloudCredRefID` data is **required** regardless. If your API doesn't support tags, you must use an alternative mechanism to store and retrieve this data. See [Section 10: Instance Metadata and Tags](#10-instance-metadata-and-tags) for details and examples. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super important. I think its kind of irrelevant to Tags, we need to put it in more places(in comments on struct) that the instance refid and cloudcredrefid is mandatory. You just have to find a way to return it
| ```go | ||
| // Core pattern | ||
| commands := []string{ | ||
| "ufw --force reset", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be updated with latest commands. lets point to the new tests that capture this requirement. Or we should just extract this logic into a library (nebius and shadeform) and tell folks to use it if its a user managed firewall.
|
|
||
| ### Required Instance Data | ||
|
|
||
| These values **MUST** be stored with the instance and returned in `GetInstance`/`ListInstances`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let point to the tests that check this I believe its ValidateCreateInstance
| **Example (Lambda Labs without tags):** | ||
| ```go | ||
| // At creation - encode CloudCredRefID in instance name | ||
| name := fmt.Sprintf("%s--%s", c.GetReferenceID(), time.Now().UTC().Format(timeFormat)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is helpful. We can point to lambda labs example here. The datetime might throw people off. Thats because lambdalabs api does not have create time on its instance struct which is also helpful for us to store.
Documentation for third-party cloud providers integrating with Brev's
control plane. Covers the complete integration requirements including
API patterns, instance type modeling, credential configuration, SSH key
injection, provisioning lifecycle, and network/firewall requirements.