Concepts

Fluence GPU Cloud is a marketplace of GPU compute from providers in data centers around the world. You can deploy three types of GPU workloads — containers, VMs, and bare metal instances — and manage them through the console or the API.

Workload types

GPU Cloud supports three deployment types. Each suits a different use case:

	Container	VM	Bare Metal
What it is	A containerized workload running on a shared host with full GPU access	A virtual machine with GPU passthrough	A dedicated physical server with direct GPU access
Image	Container image (Docker)	OS image	OS image
Connectivity	Domain with forwarded ports	Direct SSH via public IP	Direct SSH via public IP
Updatable	Yes — image, start command, env vars, SSH keys	Rename only	Rename only
Best for	ML inference, training jobs, GPU-accelerated apps packaged as containers	Workloads that need full OS control with GPU access	Maximum performance with no virtualization overhead

Containers are lightweight deployments ideal for running GPU workloads packaged as Docker images. You specify a container image, optional start command, environment variables, and exposed ports. The platform assigns a domain and maps your ports to externally accessible forwarded ports — for example, your port 22 may be forwarded to 30694.

VMs provide a full virtual machine with GPU passthrough. You choose an OS image and connect directly via SSH over a public IP.

Bare metal instances give you a dedicated physical server with direct GPU access and no virtualization overhead. The deployment flow and management are identical to VMs — the only difference is the underlying hardware isolation.

Instance lifecycle

Each instance goes through the following statuses:

Status	Description
`Initiated`	Instance is being provisioned
`Active`	Instance is running and accessible
`Failed`	Provisioning failed — funds are returned to your account
`Ended`	Instance was terminated by the user or the system

Provisioning typically takes 2–3 minutes. Once active, connection details become available — a domain with forwarded ports for containers, or an SSH connection for VMs and bare metal.

Billing model

GPU Cloud uses hourly pre-paid billing in USD. Billing periods are fixed hourly intervals (e.g., 15:00–16:00, 16:00–17:00 UTC) regardless of when the instance was started. At each billing period, one hour of rent is charged from the instance's balance.

Instance balance

Every instance has its own reserved balance that covers ongoing rent. When you deploy an instance, the system deducts an amount equivalent to 3 hours of rent from your account balance and transfers it to the instance's reserved balance. One hour is charged immediately for the current billing period, while the remaining 2 hours stay as a reserve.

If provisioning fails, the full amount is returned to your account balance.

Automatic top-ups

The system maintains a reserve of 2 hours of rent on each instance's balance. When the reserve drops below this target — typically after each hourly charge — the system automatically tops it up from your account balance.

Insufficient funds

If your account balance cannot cover a top-up, the system keeps retrying. Your instance continues running through any billing period that has already been paid for. Termination happens only when the next charge fails because the reserved balance has been fully spent — at that point the instance is deleted.

Refunds on termination

When an instance is stopped or terminated — whether by you or the system — all unused funds remaining on the instance's reserved balance are returned to your account balance.

Workload types​

Instance lifecycle​

Billing model​

Instance balance​

Automatic top-ups​

Insufficient funds​

Refunds on termination​