Machine Learning (ML) has graduated in recent years from being a buzzword to being applied in many areas, such as health science, finance, and intelligent systems. It allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values. In recent years, the emergence of deep learning has driven the adoption of machine learning applications in broader and deeper aspects of our lives.

Machine Learning workloads brought the requirement for Graphics Processing Units (GPUs) to the datacenter. A GPU is essentially designed to render high-resolution images and graphics. This type of task does not require the frequent context switching that a Central Processing Unit (CPU) performs; CPUs switch between multiple tasks quickly to support more generalized operations.
In contrast, GPUs focus on breaking complex tasks down into many identical computations, smaller subtasks that can be performed continuously and in parallel. Large data sets help to expand and refine what an algorithm can do, so the more data, the better these algorithms can learn from it. This is particularly true with deep-learning algorithms and neural networks, where parallel computing can support complex, multi-step processes.

VxRail provides a platform for deploying machine learning workloads. It offers end-to-end support for machine learning workloads on vSphere, from training to inferencing to managing the deployment of machine learning applications. VxRail supports a wide range of GPUs for machine learning and virtual desktop workloads.


In vSphere, NVIDIA GPUs can be accessed in two ways:

1. Virtualized Mode: For GPU virtualization, NVIDIA developed NVIDIA GRID vGPU, a GPU virtualization solution that allows multiple VMs to share the same physical GPU. It also enables well-known virtualization benefits, such as cloning a VM or suspending and resuming a VM. The NVIDIA GRID vGPU Manager is installed in ESXi to virtualize the underlying physical GPUs. The graphics memory of the physical GPU is divided into equal chunks, and those chunks are given to each VM. The vGPU profile assigned to the VM determines the amount of graphics memory it consumes (see the example after this list).






2. Passthrough Mode: Passthrough on vSphere (also known as VMware DirectPath I/O) allows the guest operating system in a virtual machine (VM) direct access to the server's physical PCIe devices that are controlled by the hypervisor layer. With passthrough, each VM is assigned one or more GPUs as PCI devices. Passthrough provides exclusive access to the GPU when you want a VM to have one or multiple physical GPUs for the heavy computational needs of the applications running inside it. Since the guest OS bypasses the virtualization layer to access the GPUs, the overhead of using passthrough mode is low. There is no GPU sharing among VMs when using this mode.
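
For reference, once the NVIDIA vGPU Manager is installed on a host (covered in the next section), the vGPU profiles available for its GPUs can be listed from the ESXi shell. This is a minimal sketch; the exact profile names depend on the GPU model and the NVIDIA vGPU software release:

    # List the vGPU types supported by the GPUs in this host
    nvidia-smi vgpu -s

    # List the vGPU types that can still be created, given the
    # vGPU instances already running on each physical GPU
    nvidia-smi vgpu -c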




Configuring Virtualized Mode in VxRail


First, we need to set the graphics type of the GPU to “Shared Direct” to allow NVIDIA GRID vGPU to work.
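
This is typically done in the vSphere Client under the host’s graphics settings, but it can also be set from the ESXi shell. A minimal sketch, assuming the host default is being changed for all GPUs; in esxcli terms the “Shared Direct” graphics type is called SharedPassthru:

    # Set the default graphics type to Shared Direct (SharedPassthru)
    esxcli graphics host set --default-type SharedPassthru

    # Restart the X.Org service so the new graphics type takes effect
    /etc/init.d/xorg restart

    # Confirm the graphics type reported for each GPU
    esxcli graphics device list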






Next, we install the NVIDIA GRID vGPU Manager on the ESXi host to virtualize the GPUs. It is installed as a VMware Installation Bundle (VIB).
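
As a sketch, the VIB can be installed from the ESXi shell with the host in maintenance mode; the datastore path and bundle filename below are placeholders and will differ by NVIDIA vGPU software version:

    # Put the host into maintenance mode before installing the VIB
    esxcli system maintenanceMode set --enable true

    # Install the NVIDIA vGPU Manager VIB (placeholder path and filename)
    esxcli software vib install -v /vmfs/volumes/datastore1/NVIDIA-VMware-vGPU-Manager.vib

    # Reboot so the vGPU Manager loads; exit maintenance mode once the host is back up
    reboot
    esxcli system maintenanceMode set --enable false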




We can verify the installation of the NVIDIA driver by running the nvidia-smi command.
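
For example, the following checks confirm that the VIB is present and that the driver can see the physical GPUs; the exact output varies with GPU model and driver version:

    # Confirm the NVIDIA VIB is installed on the host
    esxcli software vib list | grep -i nvidia

    # Query the physical GPUs through the NVIDIA driver
    nvidia-smi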






After the driver installation is complete, we can add the GPU device to a VM and assign the respective vGPU profile.
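
Once the VM is powered on with a vGPU profile assigned, the assignment can be checked from the host. A rough sketch, assuming the vGPU Manager from the previous step is running:

    # List active vGPU instances and the VMs that own them
    nvidia-smi vgpu

    # Show per-vGPU details such as the profile name and framebuffer size
    nvidia-smi vgpu -q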



Configuring Passthrough Mode in VxRail


First, we need to enable the GPU on our ESXi server for passthrough.
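
This is normally done in the vSphere Client from the host’s PCI Devices settings. On recent ESXi releases the same change can also be scripted from the shell; a sketch, assuming the GPU’s PCI address is identified first (the address shown is only an example):

    # Identify the GPU's PCI address
    lspci | grep -i nvidia

    # Enable passthrough for that device and apply the change
    # (the pcipassthru namespace is available on recent ESXi releases)
    esxcli hardware pci pcipassthru set -d 0000:3b:00.0 -e true -a

    # Verify the device is now marked as passthrough-enabled
    esxcli hardware pci pcipassthru list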







After passthrough has been enabled, the GPU can be added to the VM as a PCI device.



Assign the GPU via DirectPath I/O, or via Dynamic DirectPath I/O if the same GPU is present in other hosts.
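
When a data-center GPU is passed through, the VM’s memory generally needs to be fully reserved, and GPUs with large memory require the 64-bit MMIO settings shown below in the VM’s advanced configuration (.vmx). This is a sketch based on commonly documented parameters; the size value is an example and should cover the total memory of the GPUs being passed through:

    pciPassthru.use64bitMMIO = "TRUE"
    pciPassthru.64bitMMIOSizeGB = "64"

After the VM powers on, the GPU appears to the guest OS as a regular PCIe device, and the appropriate NVIDIA driver is installed inside the guest.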