nos is the open-source module for running AI workloads on Kubernetes in an optimized way,
increasing GPU utilization, cutting down infrastructure costs and improving workloads performance.
Currently, the available features are:
Dynamic GPU partitioning: allow to schedule Pods requesting fractions of GPU. GPU partitioning is performed automatically in real-time based on the Pods pending and running in the cluster, so that Pods can request only the resources that are strictly necessary and GPUs are always fully utilized.
Elastic Resource Quota management: increase the number of Pods running on the cluster by allowing namespaces to borrow quotas of reserved resources from other namespaces as long as they are not using them.