Omicron is the control plane for the Oxide system. There are a few different ways to run Omicron depending on what resources you have available and how much of the stack you want to run using real components vs. simulated ones. Generally speaking, it’s easier to get things going and automate testing with simulated components, but of course much of the system’s functionality is missing or faked up when using simulated components.
The most common development configurations are:
- Using one or more simulated Sled Agents. There are no real VMs. All components listen on localhost and talk to each other directly. The automated tests in this repo generally use this kind of deployment. This is the only mode that’s supported on non-illumos systems (i.e., Linux and MacOS).
- Using one or more real Sled Agents (on separate physical machines) with SoftNPU-based Boundary Services. You can provision real VMs this way. Intra-"rack" networking and Boundary Services (i.e., external connectivity) are fully functional using SoftNPU, a software-based simulation of the Tofino device that’s normally responsible for these functions. Within this mode, you can run on real Oxide hardware ("Gimlets") or ordinary server PCs.
These are historically called "simulated" vs. "non-simulated", though they both involve simulation of some parts of a real Oxide system.
Here’s a summary of the tradeoffs:
| | "Simulated" deployment | "Non-simulated" deployment |
|---|---|---|
| Works on | illumos, Linux, MacOS | illumos only |
| Nexus | Real implementation | Real implementation |
| CockroachDB | Real implementation, 1-node cluster | Real implementation, multi-node cluster |
| Internal/External DNS | Real implementation | Real implementation |
| Crucible Pantry | Real implementation | Real implementation |
| Sled Agent | Simulated, multiple okay | Real implementation, requires separate machines for each |
| Propolis | Missing (VMs are faked up by simulated Sled Agent) | Real implementation |
| Dendrite | Stub implementation | Simulated implementation |
| Management Gateway Service (MGS) | Missing | Real implementation |
| Internal ("rack") networking | Localhost | Real implementation on-sled, simulated implementation across sleds (SoftNPU) |
| Boundary services (external connectivity) | Missing. Externally-facing Oxide services listen directly on localhost. The VMs are fake, so external connectivity is moot. | Simulated implementation (SoftNPU) |
| Wicket | Missing | Not used |
| Management network | Missing | Not used |
To run the simulated control plane, refer to the guide on running simulated Omicron. The rest of this document describes how to run a single-Sled Omicron with a real Sled Agent and SoftNPU-based Boundary Services.
There are other configurations besides the ones mentioned here:
- A real Oxide rack: real hardware with real implementations of all components.
- A Gimlet in a real Oxide rack, deployed as a single system in isolation from the rest of the rack.
- A Gimlet on a bench (i.e., not in a rack, which requires separate hardware to connect power and networking) connected to a Sidecar (the Oxide switch), also on a bench.
- A Gimlet on a bench with no Sidecar.
This document doesn’t describe how to deploy these cases.
Two modes of external networking are described here:
1. You’ll be plugging your Omicron system into an existing IPv4 network (e.g., a home network or a lab network) from which you can allocate a few IP addresses that won’t be used elsewhere on the network. The existing network will be used as the "external" network for the system, meaning that externally-facing services (like the API and console) will use IPs on this network. If you want to be able to reach the internet from Instances, there must be a gateway on this network that provides access to the internet.
   As an example, in this doc:
   - The network is 192.168.1.0/24.
   - The network’s gateway is 192.168.1.199.
   - You can carve off a range from 192.168.1.20 to 192.168.1.40 for use by the rack:
     - 192.168.1.20 - 192.168.1.29 for externally-facing services provided by the system,
     - 192.168.1.30 as the address used by SoftNPU on the external network, and
     - 192.168.1.31 - 192.168.1.40 as the addresses used for Instances that need some public IPs (even if just for NAT to the internet).
2. Alternatively, you’ll set up an "external" network that only exists on your test machine. If you go this route, we’ll use 192.168.1.0/24 and all the same other details as in the case above, for convenience; this also happens to match what is in the non-gimlet.toml file. In this mode, you’ll need to create your made-up network, give the global zone an IP address on it, and set up IPv4 forwarding and address translation (NAT) so that the NTP zone and any Instances can reach the outside world. We’ll use 192.168.1.199 for the GZ interface.
Note: In the two map lines, replace igb0 with the name of your machine’s physical interface that connects to the outside world.
$ pfexec dladm create-etherstub -t fake_external_stub0
$ pfexec dladm create-vnic -t -l fake_external_stub0 fake_external0
$ pfexec ipadm create-if -t fake_external0
$ pfexec ipadm create-addr -t -T static --address 192.168.1.199 fake_external0/external
$ echo "map igb0 192.168.1.0/24 -> 0/32 portmap tcp/udp auto" > /tmp/ipnat.conf
$ echo "map igb0 192.168.1.0/24 -> 0/32" >> /tmp/ipnat.conf
$ pfexec cp /tmp/ipnat.conf /etc/ipf/ipnat.conf
$ pfexec routeadm -e ipv4-forwarding -u
$ pfexec svcadm enable ipfilter
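If you need to adapt the two map lines to a different interface, a small helper can print them for you. This is a minimal POSIX sh sketch; the function name ipnat_rules is hypothetical (not part of Omicron or illumos), and it only emits the same two rules used above with your interface and network substituted.

```shell
# Hypothetical helper (not part of Omicron): print the two ipnat rules used
# above for a given physical interface ($1) and fake external network ($2).
# The first rule translates TCP/UDP traffic with port mapping; the second
# rule catches everything else (e.g. ICMP) with address-only translation.
ipnat_rules() {
    printf 'map %s %s -> 0/32 portmap tcp/udp auto\n' "$1" "$2"
    printf 'map %s %s -> 0/32\n' "$1" "$2"
}

# Example: generate the rules for a machine whose uplink is ixgbe0.
# ipnat_rules ixgbe0 192.168.1.0/24 > /tmp/ipnat.conf
```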
Other network configurations are possible but beyond the scope of this doc.
When making this choice, note that in order to use the system once it’s set up, you will need to be able to access it from a web browser. If you go with option 2 here, you may need to use an SSH tunnel (see: Setting up an SSH tunnel for console access) or the like to do this.
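Before going further, it can help to sanity-check your address plan. The following is a minimal POSIX sh sketch under the example layout above; the helpers ip_to_int, ip_in_range, and ranges_overlap are hypothetical and not part of Omicron's tooling. It checks that the services range and the Instance range don’t overlap, and that the gateway doesn’t fall inside the range reserved for the rack.

```shell
# Hypothetical helpers for checking an IPv4 address plan (not part of Omicron).

# Convert a dotted-quad IPv4 address to an integer. The body runs in a
# subshell so that changing IFS does not leak into the caller.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if address $1 lies within the inclusive range $2..$3.
ip_in_range() (
    addr=$(ip_to_int "$1")
    first=$(ip_to_int "$2")
    last=$(ip_to_int "$3")
    [ "$addr" -ge "$first" ] && [ "$addr" -le "$last" ]
)

# Succeed if the inclusive ranges $1..$2 and $3..$4 share any address.
ranges_overlap() (
    a1=$(ip_to_int "$1"); a2=$(ip_to_int "$2")
    b1=$(ip_to_int "$3"); b2=$(ip_to_int "$4")
    [ "$a1" -le "$b2" ] && [ "$b1" -le "$a2" ]
)

# Example plan from this doc: services .20-.29, SoftNPU .30, Instances .31-.40.
if ranges_overlap 192.168.1.20 192.168.1.29 192.168.1.31 192.168.1.40; then
    echo "services and Instance ranges overlap" >&2
fi
if ip_in_range 192.168.1.199 192.168.1.20 192.168.1.40; then
    echo "gateway falls inside the rack's range" >&2
fi
```

With the example plan, neither warning is printed.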
Omicron packages (discussed in more detail below) are associated with a particular machine type, which is one of:
- gimlet (real Oxide hardware deployed in a real Oxide rack with a bunch of other Gimlets that together form a multi-sled system)
- gimlet-standalone (real Oxide server hardware deployed in a real Oxide rack, but running as a separate single-node system)
- non-gimlet (some kind of PC running as a single-machine "rack"; this can potentially also be used for a Gimlet running on the bench)
The main differences are the configuration files used for the Sled Agent and Rack Setup Service (RSS).
The steps below will install several executables that will need to be in your PATH. You can set that up first using:
$ source env.sh
(You’ll want to do this in the future in every shell where you work in this workspace.)
Then install prerequisite software with the following script:
$ ./tools/install_prerequisites.sh
You need to do this step once per workspace and potentially again each time you fetch new changes. If the script reports any PATH problems, you’ll need to correct those before proceeding.
This script expects that you are both attempting to compile code and execute it on the same machine. If you’d like to have a different machine for a "builder" and a "runner", you can use the two more fine-grained scripts:
# To be invoked on the machine building Omicron
$ ./tools/install_builder_prerequisites.sh
# To be invoked on the machine running Omicron
$ ./tools/install_runner_prerequisites.sh
Again, if these scripts report any PATH problems, you’ll need to correct those before proceeding.
The rest of these instructions assume that you’re building and running Omicron on the same machine.
The Sled Agent supports operation on both:
- a Gimlet (i.e., real Oxide hardware), and
- an ordinary PC running illumos that’s been set up to look like a Gimlet using cargo xtask virtual-hardware create (described next).
This script also sets up a "softnpu" zone to implement Boundary Services. SoftNPU simulates the Tofino device that’s used in real systems. Just like Tofino, it can implement sled-to-sled networking, but that’s beyond the scope of this doc.
If you’re running on a PC and using either of the networking configurations mentioned above, you can usually just run this script with a few arguments set. These arguments tell SoftNPU about your local network. You will need the gateway for your network as well as the whole range of IPs that you’ve carved out for the Oxide system (see External networking above):
# --gateway-ip: the gateway IP address for your local network (see above)
# --pxa-start:  the first IP address your Oxide cluster can use (see above)
# --pxa-end:    the last IP address your Oxide cluster can use (see above)
$ cargo xtask virtual-hardware create \
    --gateway-ip 192.168.1.199 \
    --pxa-start 192.168.1.20 \
    --pxa-end 192.168.1.40
If you’re using the fake sled-local external network mentioned above, then you’ll need to set --physical-link:
--physical-link fake_external_stub0 # The etherstub for the fake external network
If you’re using an existing external network, you likely don’t need to specify anything here because the script will choose one. You can specify a particular one if you want, though:
--physical-link igb0 # The physical link for your external network.
If you’re running on a bench Gimlet, you may not need (or want) most of what cargo xtask virtual-hardware create does, but you do still need SoftNPU. You can tweak what resources are created with the --scope flag.
Later, you can clean up the resources created by cargo xtask virtual-hardware create with:
$ cargo xtask virtual-hardware destroy
If you’ve done all this before and Omicron is still running, these resources will be in use and this script will fail. Uninstall Omicron (see below) before running this script.
Setting up TLS certificates is optional, and you can skip this step. In that case, the externally-facing services (API and console) will run on insecure HTTP.
You can generate a self-signed TLS certificate chain with:
$ cargo xtask cert-dev create ./smf/sled-agent/$MACHINE/initial-tls- '*.sys.oxide.test'
The relevant configuration files are in ./smf/sled-agent/$MACHINE. Start with config-rss.toml in one of those directories. There are only a few parts you need to review:
[[internal_services_ip_pool_ranges]]
first = "192.168.1.20"
last = "192.168.1.29"
This is a range of IP addresses on your external network that Omicron can assign to externally-facing services (like DNS and the API). You’ll need to change these if you’ve picked different addresses for your external network. See External networking above for more on this.
You will also need to update route information if your $GATEWAY_IP differs from the default.
The below example demonstrates a single static gateway route; in-depth explanations for testing with BGP can be found in the Network Preparations guide and the Configuring BGP guide:
# Configuration to bring up boundary services and make Nexus reachable from the
# outside. This block assumes that you're following option (2) above: putting
# your Oxide system on an existing network that you control.
[rack_network_config]
# An internal-only IPv6 address block which contains AZ-wide services.
# This does not need to be changed.
rack_subnet = "fd00:1122:3344:0100::/56"
# A range of IP addresses used by Boundary Services on the network. In a real
# system, these would be addresses of the uplink ports on the Sidecar. With
# softnpu, only one address is used.
infra_ip_first = "192.168.1.30"
infra_ip_last = "192.168.1.30"
# Configurations for BGP routers to run on the scrimlets.
# This array can typically be safely left empty for home/local use,
# otherwise this is a list of { asn: u32, originate: ["<v4 network>"] }
# structs which will be inserted when Nexus is started by sled-agent.
# See the 'Network Preparations' guide linked above.
bgp = []
[[rack_network_config.ports]]
# Routes associated with this port.
# NOTE: The below `nexthop` should be set to $GATEWAY_IP for your configuration
routes = [{nexthop = "192.168.1.199", destination = "0.0.0.0/0"}]
# Addresses associated with this port.
# For softnpu, an address within the "infra" block above that will be used for
# the softnpu uplink port. You can just pick the first address in that pool.
addresses = [{address = "192.168.1.30/24"}]
# Name of the uplink port. This should always be "qsfp0" when using softnpu.
port = "qsfp0"
# The speed of this port.
uplink_port_speed = "40G"
# The forward error correction mode for this port.
uplink_port_fec = "none"
# Switch to use for the uplink. For single-rack deployments this can be
# "switch0" (upper slot) or "switch1" (lower slot). For single-node softnpu
# and dendrite stub environments, use "switch0"
switch = "switch0"
# Neighbors we expect to peer with over BGP on this port.
# see: common/src/api/internal/shared.rs – BgpPeerConfig
bgp_peers = []
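If your gateway differs from 192.168.1.199, one way to update the route is a plain text substitution. This sketch assumes the config uses the exact nexthop = "<ip>" spelling from the example above; set_nexthop is a hypothetical helper, not part of Omicron, and it prints the edited file to stdout rather than editing in place, so you can review the result first.

```shell
# Hypothetical helper (not part of Omicron): replace every nexthop = "..."
# value in a config-rss.toml-style file ($2) with a new gateway IP ($1),
# printing the result to stdout. This is text substitution, not TOML parsing.
set_nexthop() {
    sed "s/nexthop = \"[^\"]*\"/nexthop = \"$1\"/g" "$2"
}

# Example (review the output before replacing the original file):
# set_nexthop 10.0.0.1 smf/sled-agent/non-gimlet/config-rss.toml > /tmp/config-rss.toml
```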
In some configurations (not the one described here), it may be necessary to update smf/sled-agent/$MACHINE/config.toml:
# An optional data link from which we extract a MAC address.
# This is used as a unique identifier for the bootstrap address.
#
# If empty, this will be equivalent to the first result from:
# $ dladm show-phys -p -o LINK
# data_link = "igb0"

# On a multi-sled system, transit-mode Maghemite runs in the `oxz_switch` zone
# to configure routes between sleds. This runs over the Sidecar's rear ports
# (whether simulated with SoftNPU or not). On a Gimlet deployed in a rack,
# tfportd will create the necessary links and Maghemite will be configured to
# use those. But on non-Gimlet systems, you need to specify physical links to
# be passed into the `oxz_switch` zone for this purpose. You can skip this if
# you're deploying a single-sled system.
# switch_zone_maghemite_links = ["ixgbe0", "ixgbe1"]
The omicron-package tool builds Omicron and bundles all required files into packages that can be copied to another system (if necessary) and installed there. This tool acts on package-manifest.toml, which describes the contents of the packages.
Packages have a notion of "build targets", which are used to select between different variants of certain components. For example, the Sled Agent can be built for a real Oxide system, for a standalone Gimlet, or for a non-Gimlet system. This choice is represented by the --machine setting here:
$ cargo run --release --bin omicron-package -- target create --help
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.55s
Running `target/release/omicron-package target create --help`
Error: Creates a new build target, and sets it as "active"
Usage: omicron-package target create [OPTIONS] --preset <PRESET>
Options:
-p, --preset <PRESET>
The preset to use as part of the build (use `dev` for development).
Presets are defined in the `target.preset` section of the config. The other configurations are layered on top of
the preset.
-i, --image <IMAGE>
The image to use for the target.
If specified, this configuration is layered on top of the preset.
Possible values:
- standard: A typical host OS image
- trampoline: A recovery host OS image, intended to bootstrap a Standard image
-m, --machine <MACHINE>
The kind of machine to build for
Possible values:
- gimlet: Use sled agent configuration for a Gimlet
- gimlet-standalone: Use sled agent configuration for a Gimlet running in isolation
- non-gimlet: Use sled agent configuration for a device emulating a Gimlet
-s, --switch <SWITCH>
The switch to use for the target
Possible values:
- asic: Use the "real" Dendrite, that attempts to interact with the Tofino
- stub: Use a "stub" Dendrite that does not require any real hardware
- softnpu: Use a "softnpu" Dendrite that uses the SoftNPU asic emulator
-r, --rack-topology <RACK_TOPOLOGY>
Specify whether nexus will run in a single-sled or multi-sled environment.
Set single-sled for dev purposes when you're running a single sled-agent. Set multi-sled if you're running with
multiple sleds. Currently this only affects the crucible disk allocation strategy- VM disks will require 3
distinct sleds with `multi-sled`, which will fail in a single-sled environment. `single-sled` relaxes this
requirement.
Possible values:
- multi-sled: Use configurations suitable for a multi-sled deployment, such as dogfood and production racks
- single-sled: Use configurations suitable for a single-sled deployment, such as CI and dev machines
-c, --clickhouse-topology <CLICKHOUSE_TOPOLOGY>
Specify whether clickhouse will be deployed as a replicated cluster or single-node configuration.
Replicated cluster configuration is an experimental feature to be used only for testing.
Possible values:
- replicated-cluster: Use configurations suitable for a replicated ClickHouse cluster deployment
- single-node: Use configurations suitable for a single-node ClickHouse deployment
-h, --help
Print help (see a summary with '-h')
Setting up a target is typically done by selecting a preset. Presets are defined in package-manifest.toml under [target.preset].
For development purposes, the recommended preset is dev. This preset sets up a build target for a non-Gimlet machine with simulated (but fully functional) external networking:
$ cargo run --release --bin omicron-package -- -t default target create -p dev
Finished release [optimized] target(s) in 0.66s
Running `target/release/omicron-package -t default target create -p dev`
Created new build target 'default' and set it as active
To customize the target beyond the preset, use the other options (for example, --image). These options will override the settings in the preset.
Note: The target create command will set the new target as active and thus let you omit the -t flag in subsequent commands.
To kick off the build and package everything up, you can run:
$ cargo run --release --bin omicron-package -- package
This will package up all the packages defined in the manifest that are selected by the active build target. Packaging involves building software from this repo, downloading prebuilt pieces from elsewhere, and assembling the results into tarballs. The final artifacts will be placed in a target directory of your choice (by default, out/) ready to be unpacked as services.
Note: Running in release mode isn’t strictly required, but improves the performance of the packaging tools significantly.
Note: Instead of package you can also use the check subcommand to essentially run cargo check without building or creating packages.
To install the services on a target machine:
$ cargo build --release --bin omicron-package
$ pfexec ./target/release/omicron-package install
Warning: Do not use … If you’ve done this already and you wish to recover, run … from the root of this repository.
This command installs an SMF service called svc:/oxide/sled-agent:default, which itself starts the other required services. This will take a few minutes. You can watch the progress by looking at the Sled Agent log:
$ tail -F $(svcs -L sled-agent)
(You may want to pipe that to looker for better readability.)
You can also list the zones that have been created so far:
# View zones managed by Omicron (prefixed with "oxz_"):
$ zoneadm list -cnv
# View logs for a service:
$ pfexec tail -f $(pfexec svcs -z oxz_nexus_<UUID> -L nexus)
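Zone names include a UUID, so typing them by hand is tedious. A minimal sketch, assuming the oxz_<service>_<UUID> naming shown above and the one-name-per-line output of zoneadm list: the hypothetical omicron_zone_for helper (not part of Omicron) picks the first matching zone. Note that a short prefix such as crucible would also match oxz_crucible_pantry_ zones.

```shell
# Hypothetical helper (not part of Omicron): read zone names, one per line,
# on stdin and print the first one whose name starts with oxz_<service>_.
omicron_zone_for() {
    grep "^oxz_${1}_" | head -n 1
}

# Example usage on the Omicron machine:
# nexus_zone=$(zoneadm list -c | omicron_zone_for nexus)
# pfexec tail -f "$(pfexec svcs -z "$nexus_zone" -L nexus)"
```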
At this point, the system should be up and running! You should be able to reach the external API and web console from your external network. But how? The URL for the API and console will be:
- http:// or https:// (depending on whether you provided TLS certificates in the steps above)
- recovery (assuming you did not change the default recovery Silo name)
- .sys.
- oxide.test (assuming you did not change the delegated DNS domain).
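Put together, the URL is simply these parts joined. A trivial sketch with a hypothetical silo_url helper (not part of Omicron), assuming the .sys. component shown above:

```shell
# Hypothetical helper (not part of Omicron): assemble a Silo URL.
# Arguments: scheme (http or https), Silo name, delegated DNS domain.
silo_url() {
    printf '%s://%s.sys.%s\n' "$1" "$2" "$3"
}

# silo_url https recovery oxide.test   # prints https://recovery.sys.oxide.test
```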
This won’t be in public DNS, though. You’d need to be using the deployed system’s external DNS servers as your DNS server for things to "just work".[2] You can query them directly:
$ dig recovery.sys.oxide.test @192.168.1.20 +short
192.168.1.22
192.168.1.23
192.168.1.24
Where did 192.168.1.20 come from? That’s an external address of the external DNS server. We knew that because it’s listed in the external_dns_ips array in the config-rss.toml file we’re using.
Having looked this up, the easiest thing will be to use http://192.168.1.22 for your URL (replacing http with https if you used a certificate, and replacing that IP if needed). If you’ve set up networking right, you should be able to reach this from your web browser. You may have to instruct the browser to accept a self-signed TLS certificate. See also Connecting securely with TLS using the CLI.
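To script this lookup instead of copying an address by hand, you can take the first address dig returns. A minimal sketch: first_ipv4 is a hypothetical helper (not part of Omicron) that keeps only dotted-quad IPv4 lines from dig +short output, so any CNAME targets are skipped, and prints the first one.

```shell
# Hypothetical helper (not part of Omicron): print the first IPv4 address
# from `dig +short` output on stdin, skipping non-address lines.
first_ipv4() {
    grep -E '^[0-9]+(\.[0-9]+){3}$' | head -n 1
}

# Example, using the external DNS server address from config-rss.toml:
# nexus_ip=$(dig recovery.sys.oxide.test @192.168.1.20 +short | first_ipv4)
# echo "http://$nexus_ip"
```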
If you set up a fake external network (method 2 in External networking), one way to be able to access the console of your deployment is by setting up an SSH tunnel. Console access is required to use the CLI for device authentication. The following is an example of how to access the console with an SSH tunnel.
Nexus serves the console, so first get a Nexus IP from the instructions above.
In this example, Omicron is running on the lab machine dunkin. Usually, you’ll want to set up the tunnel from the machine where you run a browser to the machine running Omicron. In this example, one would run this on the machine running the browser:
$ ssh -L 1234:192.168.1.22:80 dunkin.eng.oxide.computer
The above command configures ssh to bind to TCP port 1234 on the machine running the browser, forward packets through the ssh connection, and redirect them to 192.168.1.22 port 80 as seen from the other side of the connection.
Now you should be able to access the console from the browser on this machine via something like http://127.0.0.1:1234, using the port from the ssh command.
Follow the instructions to set up the Oxide CLI. See the previous section to find the URL for the API. Then you can start the login flow with:
$ oxide auth login --host http://192.168.1.22
Opened this URL in your browser:
http://192.168.1.22/device/verify
Enter the code: CXKX-KPBK
Assuming you haven’t already logged in, this page will bring you to the recovery silo login. The username and password are defined in config-rss.toml and default to:
username: recovery
password: oxide
Once logged in, enter the 8-character code to complete the login flow. In a few moments the CLI should show you’re logged in.
Note: If you’re using an SSH tunnel, you will either need to change the …
Setting resource quotas is required before you can begin uploading images, provisioning instances, etc. In this example we’ll update the recovery silo so we can provision instances directly from it:
$ oxide silo quotas update \
--silo fa12b74d-30f8-4d5a-bc0e-4d229f13c6e5 \
--cpus 9999999999 \
--memory 999999999999999999 \
--storage 999999999999999999
# example response
{
"cpus": 9999999999,
"memory": 999999999999999999,
"silo_id": "fa12b74d-30f8-4d5a-bc0e-4d229f13c6e5",
"storage": 999999999999999999
}
An IP pool is needed to provide external connectivity to Instances. The addresses you use here should be addresses you’ve reserved from the external network (see [_external_networking]).
Here we will first create an IP pool for the recovery silo:
$ oxide ip-pool create --name "default" --description "default ip-pool"
# example response
{
"description": "default ip-pool",
"id": "1c3dfa5c-7b00-46ff-987a-4e59e512b250",
"name": "default",
"time_created": "2024-01-16T22:51:54.679751Z",
"time_modified": "2024-01-16T22:51:54.679751Z"
}
Now we will associate (link) the pool with the recovery silo.
$ oxide ip-pool silo link --pool default --is-default true --silo recovery
# example response
{
"ip_pool_id": "1c3dfa5c-7b00-46ff-987a-4e59e512b250",
"is_default": true,
"silo_id": "5c0aca09-d7ee-4be6-b7b1-060655659f74"
}
Now we will add an address range to the pool:
$ oxide ip-pool range add --pool default --first $IP_POOL_START --last $IP_POOL_END
# example response
{
"id": "6209516e-2b38-4cbd-bff4-688ffa39d50b",
"ip_pool_id": "1c3dfa5c-7b00-46ff-987a-4e59e512b250",
"range": {
"first": "192.168.1.35",
"last": "192.168.1.40"
},
"time_created": "2024-01-16T22:53:43.179726Z"
}
First, create a Project:
$ oxide project create --name=myproj --description demo
Create a Project Image that will be used as initial disk contents.
This can be the alpine.iso image that ships with propolis:
$ oxide api /v1/images?project=myproj --method POST --input - <<EOF
{
"name": "alpine",
"description": "boot from propolis zone blob!",
"os": "linux",
"version": "1",
"source": {
"type": "you_can_boot_anything_as_long_as_its_alpine"
}
}
EOF
Or an ISO / raw disk image / etc hosted at a URL:
$ oxide api /v1/images --method POST --input - <<EOF
{
"name": "crucible-tester-sparse",
"description": "boot from a url!",
"os": "debian",
"version": "9",
"source": {
"type": "url",
"url": "http://[fd00:1122:3344:101::15]/crucible-tester-sparse.img",
"block_size": 512
}
}
EOF
You’ll need the id $IMAGE_ID of the image you just created. You can fetch that with oxide image view --image $IMAGE_NAME.
Now, create a Disk from that Image. The disk size must be a multiple of 1 GiB and at least as large as the image size. The example below creates a disk using the image made from the alpine ISO that ships with propolis, and sets the size to the next 1 GiB multiple of the original alpine source:
$ oxide api /v1/disks?project=myproj --method POST --input - <<EOF
{
"name": "alpine",
"description": "alpine.iso blob",
"block_size": 512,
"size": 1073741824,
"disk_source": {
"type": "image",
"image_id": "$IMAGE_ID"
}
}
EOF
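The rounding rule above is easy to get wrong by hand. A minimal POSIX sh sketch of it, using a hypothetical round_up_to_gib helper (not part of Omicron or the Oxide CLI) that rounds a byte count up to the next multiple of 1 GiB (1073741824 bytes). It does not check the "at least as large as the image" requirement; feed it the image size to get a valid minimum.

```shell
# Hypothetical helper (not part of Omicron): round a byte count ($1) up to
# the next multiple of 1 GiB, the granularity Disk sizes must use.
round_up_to_gib() (
    gib=1073741824   # 1024 * 1024 * 1024
    echo $(( ($1 + gib - 1) / gib * gib ))
)

# round_up_to_gib 1            # prints 1073741824
# round_up_to_gib 1073741825   # prints 2147483648
# size=$(round_up_to_gib "$(wc -c < ./my-image.raw)")   # hypothetical local file
```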
Now we’re ready to create an Instance, attaching the alpine disk created above:
$ oxide api /v1/instances?project=myproj --method POST --input - <<EOF
{
"name": "myinst",
"description": "my inst",
"hostname": "myinst",
"memory": 1073741824,
"ncpus": 2,
"disks": [
{
"type": "attach",
"name": "alpine"
}
],
"external_ips": [{"type": "ephemeral"}]
}
EOF
To uninstall all Omicron services from a machine:
$ cargo build --release --bin omicron-package
$ pfexec ./target/release/omicron-package uninstall
Once all the Omicron services are uninstalled, you can also remove the previously created virtual hardware as mentioned above:
$ cargo xtask virtual-hardware destroy
If you provided TLS certificates during setup, you can connect securely to the API. But you’ll need to be accessing it via its DNS name. That’s usually hard because in development, you’re not using a real top-level domain that’s in public DNS. Both curl(1) and the Oxide CLI provide (identical) flags that can help here:
$ curl -i --resolve recovery.sys.oxide.test:443:192.168.1.22 --cacert ./smf/sled-agent/$MACHINE/initial-tls-cert.pem https://recovery.sys.oxide.test
$ oxide --resolve recovery.sys.oxide.test:443:192.168.1.22 --cacert ./smf/sled-agent/$MACHINE/initial-tls-cert.pem auth login --host https://recovery.sys.oxide.test
In a real rack, two of the Gimlets (referred to as Scrimlets) will be connected directly to the switch (Sidecar). Those sleds will thus be configured with a switch zone (oxz_switch) used to manage the switch. The sled_mode option in Sled Agent’s config indicates whether the sled it’s running on is potentially a Scrimlet or a Gimlet.
The relevant config will be in smf/sled-agent/$MACHINE/config.toml.
# Identifies whether sled agent treats itself as a scrimlet or a gimlet.
#
# If this is set to "scrimlet", the sled agent treats itself as a scrimlet.
# If this is set to "gimlet", the sled agent treats itself as a gimlet.
# If this is set to "auto":
# - On illumos, the sled automatically detects whether or not it is a scrimlet.
# - On all other platforms, the sled assumes it is a gimlet.
sled_mode = "scrimlet"
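To check which sled_mode a given config.toml sets without opening it, a text match is usually enough. This sketch uses a hypothetical toml_string_value helper (not part of Omicron) that prints the value from the first uncommented key = "value" line for a key. It is not a TOML parser and does not understand tables, arrays, or escape sequences.

```shell
# Hypothetical helper (not part of Omicron): print the string value of the
# first uncommented `key = "value"` line for key $1 in file $2.
toml_string_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$2" | head -n 1
}

# toml_string_value sled_mode smf/sled-agent/non-gimlet/config.toml
```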
Once Sled Agent has been configured to run as a Scrimlet (whether explicitly or implicitly), it will attempt to create and start the switch zone. This will depend on the switch type that was specified in the build target:
- asic implies we’re running on a real Gimlet and are directly attached to the Tofino ASIC.
- stub provides a stubbed-out switch implementation that doesn’t require any hardware.
- softnpu provides a simulated switch implementation that runs the same P4 program as the ASIC, but in software.
For the purposes of local development, the softnpu switch is used. Unfortunately, Omicron does not currently automatically configure the switch with respect to external networking, so you’ll need to manually do so.
The components of Omicron are deployed into separate zones that act as separate hosts on the network, each with their own address. Since this network is private to the deployment, we can use the same IPv6 prefix in all development deployments and even hardcode the IPv6 addresses of each component. If you’d like to modify these values to suit your local network, you can modify them within the smf/ subdirectory.
| Service | Endpoint |
|---|---|
| Sled Agent: Bootstrap | Derived from MAC address of physical data link. |
| Sled Agent: Dropshot API | |
| Switch Zone | |
| Cockroach DB | |
| Nexus: Internal API | |
| Oximeter | |
| Clickhouse | |
| Crucible Downstairs 1 | |
| Crucible Downstairs 2 | |
| Crucible Downstairs 3 | |
| Internal DNS Service | |
| External DNS | |
| External DNS | |
| Nexus: External API | |
| Nexus: External API | |
| Nexus: External API | |
Note that Sled Agent runs in the global zone and is responsible for bringing up all the other services and allocating VNICs and IPv6 addresses to them.
Host images for both the standard Omicron install and the trampoline/recovery install are built as a part of CI. To build them locally, first run the CI script:
$ ./.github/buildomat/jobs/package.sh
This will create a /work directory with a few tarballs in it. Building a host image requires a checkout of helios; the instructions below use $HELIOS_PATH for the path to this repository.
To build a standard host image:
$ ./tools/build-host-image.sh -B $HELIOS_PATH /work/global-zone-packages.tar.gz
To build a recovery host image:
$ ./tools/build-host-image.sh -R $HELIOS_PATH /work/trampoline-global-zone-packages.tar.gz
oximeter is the program used to collect metrics from producers in the control plane. Normally, the producers register themselves with nexus, which creates a durable assignment between the producer and an oximeter collector in the database. That allows components to survive restarts while still producing metrics.
To ease development, oximeter can be run in "standalone" mode. In this case, a mock nexus server is started, with only the minimal subset of the internal API needed to register producers and collectors. Neither CockroachDB nor ClickHouse is required, although ClickHouse can be used if one wants to see how data is inserted into the database.
To run oximeter in standalone mode, use:
$ cargo run --bin oximeter -- standalone
The producer should still register with nexus as normal, which is usually done with an explicit IP address and port. This defaults to [::1]:12221.
When run this way, oximeter will print the samples it collects from the producers to its logs, like so:
Sep 26 17:48:56.006 INFO sample: Sample { measurement: Measurement { timestamp: 2023-09-26T17:48:56.004565890Z, datum: CumulativeF64(Cumulative { start_time: 2023-09-26T17:48:45.997404777Z, value: 10.007154703 }) }, timeseries_name: "virtual_machine:cpu_busy", target: FieldSet { name: "virtual_machine", fields: {"instance_id": Field { name: "instance_id", value: Uuid(564ef6df-d5f6-4204-88f7-5c615859cfa7) }, "project_id": Field { name: "project_id", value: Uuid(2dc7e1c9-f8ac-49d7-8292-46e9e2b1a61d) }} }, metric: FieldSet { name: "cpu_busy", fields: {"cpu_id": Field { name: "cpu_id", value: I64(0) }} } }, component: results-sink, collector_id: 78c7c9a5-1569-460a-8899-aada9ad5db6c, component: oximeter-standalone, component: nexus-standalone, file: oximeter/collector/src/lib.rs:280
Sep 26 17:48:56.006 INFO sample: Sample { measurement: Measurement { timestamp: 2023-09-26T17:48:56.004700841Z, datum: CumulativeF64(Cumulative { start_time: 2023-09-26T17:48:45.997405187Z, value: 10.007154703 }) }, timeseries_name: "virtual_machine:cpu_busy", target: FieldSet { name: "virtual_machine", fields: {"instance_id": Field { name: "instance_id", value: Uuid(564ef6df-d5f6-4204-88f7-5c615859cfa7) }, "project_id": Field { name: "project_id", value: Uuid(2dc7e1c9-f8ac-49d7-8292-46e9e2b1a61d) }} }, metric: FieldSet { name: "cpu_busy", fields: {"cpu_id": Field { name: "cpu_id", value: I64(1) }} } }, component: results-sink, collector_id: 78c7c9a5-1569-460a-8899-aada9ad5db6c, component: oximeter-standalone, component: nexus-standalone, file: oximeter/collector/src/lib.rs:280
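When many samples scroll by, it can help to tally them. A minimal sketch, assuming the log format shown above: the hypothetical count_samples_by_timeseries pipeline (not part of oximeter) extracts each timeseries_name from log lines on stdin and counts how often each one appears.

```shell
# Hypothetical pipeline (not part of oximeter): count standalone-oximeter
# sample log lines on stdin by timeseries name. Lines without a
# timeseries_name field are ignored.
count_samples_by_timeseries() {
    sed -n 's/.*timeseries_name: "\([^"]*\)".*/\1/p' | sort | uniq -c
}

# Example, if the logs were saved to a file (hypothetical name):
# count_samples_by_timeseries < oximeter.log
```

For the two sample lines above, this would report a count of 2 for virtual_machine:cpu_busy.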