
Conversation

@hawkw (Member) commented Jan 27, 2026

At the time of writing, both Crucible and Propolis have the necessary
machinery to support read-only disks. Propolis can expose a block device
to the guest as read-only, and Crucible VCRs can contain a read-only
flag, which is already necessary in order to support snapshots. However,
the control plane does not know how to create and manage read-only
disks. This branch adds control plane support for read-only virtual
disks.

Theory of Operation

The read-onlyness of read-only disks created by the implementation in
this branch is enforced both at the Propolis level (by making the
virtual block device read-only) and at the Crucible level (by making
the regions themselves read-only). In order to use read-only Crucible
regions to back such disks, all read-only disks are backed by existing
read-only volumes. This way, the regions backing the disk are created
as read-only from the start. If we did not do this, we would need a way
to create read-write regions in order to write the disk's initial data,
and then make those regions read-only afterwards, which would be
substantially more complex and require changes in Crucible.
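To make the layering concrete, here is a much-simplified sketch of the
relevant shapes (the real types are Crucible's VolumeConstructionRequest
and CrucibleOpts, with most fields elided here):

    // Simplified stand-ins for Crucible's real types.
    struct CrucibleOpts {
        read_only: bool, // enforced by every downstairs region
        // ...target addresses, encryption keys, etc.
    }

    enum Vcr {
        Region { opts: CrucibleOpts },
        Volume { sub_volumes: Vec<Vcr>, read_only_parent: Option<Box<Vcr>> },
    }

A read-only disk's VCR points at the existing read-only volume of its
source image or snapshot, so every region it references was created
read-only in the first place, and enforcement does not depend on either
layer alone.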

This means that read-only disks may only be created from existing
images or snapshots, as the read-only volume backing the image or
snapshot is used as the backend for the read-only disk. Eventually, we
could also support creating a read-only disk by importing blocks
directly, by having the saga create a transient read/write disk, take a
snapshot of that disk once it has been finalized, use the snapshot to
create a read-only disk, and then delete the temporary read/write disk
and snapshot. I didn't do this here since it didn't really seem that
useful --- you could just upload a project image instead, and then use
it to make as many read-only disks as you'd like. On the other hand, it
probably never makes sense to allow creating a read-only blank disk,
since...well, you could never make it not be blank. While we could make
such a disk serve reads BLAZINGLY FAST, most Unixes already come with
such a device in the form of /dev/zero, and it's probably unnecessary
to let you make your own Oxide distributed-disk version of /dev/zero.
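For illustration, requesting such a disk looks roughly like this from
the params side (the Snapshot variant's fields match the snippet quoted
later in this conversation; the surrounding DiskCreate shape is
approximate):

    let params = params::DiskCreate {
        identity: IdentityMetadataCreateParams {
            name: "my-ro-disk".parse().unwrap(),
            description: "read-only disk backed by a snapshot".to_string(),
        },
        disk_source: params::DiskSource::Snapshot {
            snapshot_id,     // UUID of the source snapshot
            read_only: true, // new in this branch
        },
        size: snapshot_size, // must equal the parent snapshot's size
    };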

Virtual Resource Accounting

Note that creating a read-only disk currently requires a disk_create
saga, but it's a very small one. Since the read-only volumes always
already exist, we don't actually need a saga to create the new
read-only disk --- it's really just something that exists exclusively in
CockroachDB and doesn't require actually creating new Crucible regions
out there in the Real World. Most of this database work is done in a
transaction, since it's all just in CRDB. However, currently, we must do
virtual resource accounting for every read-only disk, and the current
virtual provisioning CTE was a bit annoying to hack up into something
that can run in that transaction, so we just make a saga that does the
virtual resource accounting in one saga node, and then does "everything
else" in another one.

With that said, it's actually pretty stupid for a read-only disk of
size s to require s bytes of the storage quota, since...it's really
just a pointer to the volume that was already provisioned for the source
image or snapshot, and any number of read-only disks can be created from
that same image or snapshot without actually storing multiple copies of
the data on a physical disk. The only overhead is the handful of
database records representing the read-only disk and its VCR. So, in an
ideal world, we wouldn't count them against the quota. But, currently,
deleting such a disk releases virtual storage, so if we didn't allocate
it at creation time, there would be an unfortunate quota
leak. @jmpesp and I discussed this a bit, and we agreed that if and when
we implement the physical storage accounting described in RFD 639,
this should all change so that the physical disk space is allocated
once, by the image/snapshot, rather than repeatedly by each read-only
disk. But for now, the current behavior of allocating virtual storage to
each read-only disk is unfortunately correct.
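To put numbers on it: under the current scheme, ten read-only disks
created from a single 4 GiB image would charge 40 GiB against the
project's virtual storage quota, even though the underlying data is
stored only once (modulo Crucible's replication).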

Other Notes

While adding tests for this change, I made some changes to the
resource_helpers::create_disk_from_snapshot function in order to

  1. allow passing the read_only flag to the params::DiskCreate it
    constructs, and
  2. always use the parent snapshot's size for the child disk.

This necessitated a diff that touched basically all the other tests that
used that function. Sorry about that, those changes should be trivially
ignorable in code review.
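For reference, the updated helper's shape is roughly the following (a
signature sketch; parameter names other than read_only are assumed, and
the body is elided):

    pub async fn create_disk_from_snapshot(
        client: &ClientTestContext,
        project_name: &str,
        disk_name: &str,
        snapshot: &views::Snapshot,
        read_only: bool, // (1) forwarded into the params::DiskCreate it builds
    ) -> Disk {
        // (2) the child disk's size is now always taken from the parent
        //     snapshot, rather than being supplied by the caller.
        todo!()
    }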

if let Some(sub_err) = sub_err.take() {
    err.bail(
        Error::from(sub_err)
            .internal_context("failed to checkout snapshot volume"),
    );
}
Contributor:

could we do something with InlineErrorChain here and the other spots that have the sub err?

@hawkw (Member, Author):

i'm not really sure what that gets us here, since none of the sub errors we get back have complex source() chains in them?
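(For reference, slog-error-chain's InlineErrorChain flattens an error
and its source() chain into a single Display string, roughly:

    use slog_error_chain::InlineErrorChain;

    // given some `sub_err: impl std::error::Error`:
    let msg = InlineErrorChain::new(&sub_err).to_string();

which mostly pays off when the chain is deep.)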

@hawkw (Member, Author):

i think the VolumeGetError and VolumeCreateError types could probably be made much nicer but I wasn't sure if i wanted to mess with the way they formatted themselves in this otherwise-unrelated PR...

@hawkw hawkw marked this pull request as ready for review January 28, 2026 21:04
@hawkw (Member, Author) commented Jan 28, 2026

This change still needs some proper end-to-end testing on a racklette with real Crucibles and Propolises, but I think it should be more or less ready for review. I'll do real end-to-end testing once I get a TUF repo of the latest commit built.

@hawkw hawkw requested a review from jmpesp January 28, 2026 21:13
@jmpesp (Contributor) left a comment:

It's also worth adding a test for read-only region replacement succeeding for these new types of disks (see crucible_replacements.rs). I'm 99.9% positive the existing machinery will just work but it's worth it to check.

@hawkw (Member, Author) commented Jan 28, 2026

> It's also worth adding a test for read-only region replacement succeeding for these new types of disks (see crucible_replacements.rs)

Will do, thanks!

hawkw added a commit that referenced this pull request Jan 28, 2026
@hawkw hawkw requested a review from jmpesp January 29, 2026 23:31
@hawkw (Member, Author) commented Jan 30, 2026

This failure of test_instance_start_creates_networking_state sure looks like a flake; there's a previous issue about a possible flake in that test (#6540), but that was determined to be an actual bug and fixed. I'll make a new ticket for this one.

@hawkw hawkw enabled auto-merge (squash) January 30, 2026 17:57
@hawkw hawkw disabled auto-merge January 30, 2026 17:58
@hawkw hawkw changed the title from "[nexus] create read-only disks from snapshots" to "[nexus] create read-only disks from images or snapshots" Jan 30, 2026
@jmpesp (Contributor) left a comment:

🛳️

@hawkw hawkw mentioned this pull request Jan 30, 2026
@hawkw hawkw enabled auto-merge (squash) January 30, 2026 21:54
@hawkw hawkw merged commit 3dfb80f into main Jan 31, 2026
16 checks passed
@hawkw hawkw deleted the eliza/ro branch January 31, 2026 01:20
Snapshot {
    snapshot_id: Uuid,
    /// If `true`, the disk created from this snapshot will be read-only.
    read_only: bool,
Contributor:

Since 99.99% of the time people will not be provisioning read-only disks, can we make this either Option<bool> or #[serde(default)]?

Contributor:

Agree, was slightly surprised by this while doing the web console implementation but I forgot to bring it up

@hawkw (Member, Author):

Yeah, I'm happy to change this. It seems like adding a #[serde(default)] ought to not require a whole API version bump if it makes it in before the release, but I think it probably does, since the tooling will get mad if the existing generated API spec changes?

Also, as we discussed on Matrix, there is the potential to revisit whether this should be on DiskSource and not DiskBackend or at the top level of the DiskCreate params. @rmustacc pushed back on it being on DiskSource, and I think he was conceptually right that it is a property of the disk rather than of the source...but, in practice, putting it on one of the higher level types means that there are more invalid DiskCreate params that users can construct and which we will reject with a 400 error. I felt like it was worth having the OpenAPI document make it clear which params are valid, as it felt more self-documenting than having it always be part of the API type but invalid in many cases. But, I'm open to being convinced that's wrong, so I wanted to bring it up while we were discussing changing the API again.

Contributor:

  1. First thinking about where the read_only property appears. I think it might make sense to have it on "attach" in the future to enforce r/o semantics on an otherwise r/w disk. But it also makes sense on these particular variants because we're creating a fundamentally distinct disk type i.e. one whose contents are known to be fixed for all time. In other words, I think having read_only on these variants makes sense and in the future we might also have this setting at the "attach" level... probably as an Option<bool> where e.g. read_only = Some(false) on a read-only disk would produce an error?
  2. I can't think of a strong argument for Option<bool> vs. #[serde(default)]. The generated code is no easier or harder on the rust side. I do think that #[serde(default)] is slightly cleaner in that for the Option there could be ambiguity about what None (i.e. an absent) field means.
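(For reference, the #[serde(default)] option on the variant quoted
above would look roughly like this; the enum's serde tag attributes are
an assumption:

    use serde::Deserialize;
    use uuid::Uuid;

    #[derive(Debug, Deserialize, PartialEq)]
    #[serde(tag = "type", rename_all = "snake_case")]
    pub enum DiskSource {
        Snapshot {
            snapshot_id: Uuid,
            /// If `true`, the disk created from this snapshot will be read-only.
            #[serde(default)] // an absent field deserializes as `false`
            read_only: bool,
        },
        // ...other variants elided
    }
)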

@david-crespo (Contributor) commented Jan 31, 2026:

It would be nice to have the types enforce the right combination of things, but I think I agree with rm that it's confusing to make it sound like read-only is a property of the source, especially since it is at top level in the Disk view. Ideally, we would be able to put the field at top level and still have the types enforce that it is only allowed to be true when the disk source is an image or snapshot. I can write that type in TypeScript, but we don't have a nice way of expressing it in OpenAPI, let alone generating the right type from that in all the SDKs. So I think I lean toward putting it at top level on the create body and letting 400s and doc comments clarify the rules. But I am fine with either way.

On the new API version question, it’s fine if it requires that — seems like we are resigned to essentially any change requiring that.

hawkw added a commit that referenced this pull request Jan 31, 2026
@ahl rightly [points out][1] that the common case for the `disk_create`
APIs is to create read-write disks, and it seems unfortunate to require
all calls to create a read-write disk from an image or snapshot to
require a `"read_only": false` field in the JSON body. Thus, this commit
adds `#[serde(default)]` so that disk source params _without_ a
`"read_only"` are the same as `"read_only": false`.

Unfortunately, we didn't catch this until _after_ #9731 merged, so
fixing it the proper way requires a whole new API version. Since the
conversion between the Rust params types in `nexus-types` is trivial, it
might have been possible to bend the rules a bit here and just tweak the
previously-generated API version to make the field optional --- clients
built against the slightly older version of the API would still always
be accepted, since they would _always_ send the field. But,
bending the rules makes me scared I'm gonna get in trouble, so I did it
the proper way, at the expense of a whole lot of code that does
basically nothing.
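(A quick illustration of the equivalence this commit describes, reusing
the sketch above; the "snapshot" tag is an assumption about the wire
format:

    let with_field: DiskSource = serde_json::from_value(serde_json::json!({
        "type": "snapshot",
        "snapshot_id": snapshot_id,
        "read_only": false,
    }))
    .unwrap();
    let without_field: DiskSource = serde_json::from_value(serde_json::json!({
        "type": "snapshot",
        "snapshot_id": snapshot_id,
    }))
    .unwrap();
    // with #[serde(default)], the absent field fills in as `false`
    assert_eq!(with_field, without_field);
)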