nos allows you to schedule Pods requesting fractions of GPUs. The GPUs are automatically partitioned into slices that can be requested by individual containers. In this way, GPUs are shared among multiple Pods increasing the overall utilization.
The GPUs partitioning is performed automatically in real-time based on the requests of the Pods in your cluster.
nos constantly watches the pending Pods and finds the best possible GPU partitioning configuration
to schedule the highest number of the ones requesting fractions of GPUs.
You can think of
nos as a Cluster Autoscaler for GPUs: instead of adjusting the number of nodes and GPUs, it dynamically partitions them to maximize their utilization, leading to spare GPU capacity. Then, you can schedule more Pods or reduce the number of GPU nodes needed, reducing infrastructure costs.