Guide: KVM Host/Guest Performance Optimizations

Introduction

Here are some optimizations you can apply to improve the performance of your KVM guest. This section focuses on host-side optimizations; follow the guest section in the post below to improve the guest OS, including Windows-specific tuning.


Optimizing the Host:
When following my VFIO guide, make sure you set the isolcpus parameter on the GRUB kernel line to the CPU cores you want to dedicate to the VM.

Grub Line Options:
Code:
isolcpus=2-7
nohz_full=2-7
rcu_nocbs=2-7
iommu=pt
transparent_hugepage=never
pcie_acs_override=downstream,multifunction
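After editing the kernel line, regenerate the GRUB config and reboot, then confirm the parameters actually took effect (a quick check, assuming a standard GRUB install):
Code:
# grub-mkconfig -o /boot/grub/grub.cfg
$ cat /proc/cmdline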
Set the CPU governor to "performance" on the host, and increase the frequency on all cores if needed.
Install cpupower on the host. You can optionally install the cpupower-gui package as well.
Code:
# pacman -S cpupower
$ cpupower frequency-info
# cpupower frequency-set -f <clock_freq>
# cpupower frequency-set -g performance
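To make the governor stick across reboots on Arch, the cpupower package reads /etc/default/cpupower via a bundled systemd service. Set governor='performance' in that file, then enable the service:
Code:
# systemctl enable --now cpupower.service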
Setting hugepages on the kernel line can help latency on the guest, as the VM gets its own "share" of the RAM and doesn't have to share or trade with the host. Set the value "20" to a value that makes sense for your system. I would not go below 12.
Code:
default_hugepagesz=1G hugepagesz=1G hugepages=20
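After a reboot, verify the pool was actually reserved:
Code:
$ grep Huge /proc/meminfo
The guest also has to be told to use the pool. A minimal sketch of the relevant libvirt XML (the <memoryBacking> element is standard libvirt; it goes next to your <memory> element):
Code:
<memoryBacking>
  <hugepages/>
</memoryBacking>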
Keep all host processes limited to a set of cores. For example, I force my host to use just 2 cores (0-1). This improves performance for the guest, as cores 2-7 are used only by the guest and never shared with the host.

For systemd distros, edit the file below:
Code:
[ech0@ech0-arch ~]$ sudo nano /etc/systemd/system.conf
...
[Manager]
CPUAffinity=0-1
...
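This takes effect on the next boot (or after a systemctl daemon-reexec). You can verify that PID 1 is confined with taskset, which should report something like:
Code:
$ taskset -cp 1
pid 1's current affinity list: 0,1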

Preemptible Kernel (Low-Latency Desktop):
This must be enabled at kernel compile time. Basically, it sacrifices a very small amount of throughput (overall performance) for the lowest latency possible.
If low jitter is really important to you, you may want to use both a 1000 Hz tick and PREEMPT. In most cases, though, this is not necessary.
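For reference, these are the standard Kconfig symbols behind those menuconfig entries:
Code:
CONFIG_PREEMPT=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000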


Using a Zen-VFIO kernel:
This is basically the above, but already packaged on the AUR. You will just need to compile it yourself or use one of the precompiled kernels (link below). Do NOT use an AUR helper to compile this! Compile using makepkg.
Code:
$ mkdir /tmp/compile && cd /tmp/compile
# pacman -Syyu git
$ git clone https://aur.archlinux.org/linux-zen-vfio.git
$ cd linux-zen-vfio && makepkg -sci
Note: This can take anywhere from 30 minutes to an hour, or even longer, depending on your CPU/SSD speeds. To speed up the process you can force makepkg to use all available cores.

Add the following option in /etc/makepkg.conf:
Code:
MAKEFLAGS="-j$(nproc)"

If you are not on Arch, or you don't want to compile it yourself, you can grab a precompiled build here


Clocksource
Make sure ‘tsc’ is set as the clock source on the host. You can check this via:
Code:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
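If something else (e.g. hpet) is current, first check what the kernel has available:
Code:
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
If tsc is listed, you can force it with standard kernel parameters on the GRUB line (tsc=reliable additionally keeps a wrongly-flagged TSC from being demoted):
Code:
clocksource=tsc tsc=reliable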
more host optimizations coming soon....
 
Optimizing the Guest:

First, make sure you are using OVMF for your guest OS. This has the biggest effect on latency, especially for Windows guests.
  • When the VM creation wizard asks you to name your VM (final step before clicking "Finish"), check the "Customize before install" checkbox.
  • In the "Overview" section, set your firmware to Q35/UEFI. If Q35 or UEFI is missing make sure you followed the steps in my VFIO guide here.
  • Make sure you pass through entire USB controllers, if your IOMMU grouping allows it, instead of each USB device individually. This greatly improves VM latency and I/O latency
  • Make sure you are using VIRTIO for all of your hardware, including passthrough disks and network cards.
  • Pass through an entire SSD for Windows instead of an image file. If an image must be used, use a RAW format image
  • CPU Pinning: Guide here
  • Use MSI_Utility to greatly reduce VM input/output latency. Guide here
You can also set emulatorpin and iothreadpin to use the cores that are NOT pinned to the VM.
For example:

Code:
...
<vcpu placement="static">6</vcpu>
  <iothreads>1</iothreads>
  <cputune>
    <vcpupin vcpu="0" cpuset="2"/>
    <vcpupin vcpu="1" cpuset="3"/>
    <vcpupin vcpu="2" cpuset="4"/>
    <vcpupin vcpu="3" cpuset="5"/>
    <vcpupin vcpu="4" cpuset="6"/>
    <vcpupin vcpu="5" cpuset="7"/>
    <emulatorpin cpuset='0,1'/>
    <iothreadpin iothread='1' cpuset='0-1'/>
  </cputune>
...
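For the iothreadpin above to actually matter, at least one virtio disk has to be assigned to that I/O thread via its driver element. A sketch (the source path is a placeholder for your own disk):
Code:
<disk type="block" device="disk">
  <!-- iothread="1" ties this disk to the pinned I/O thread -->
  <driver name="qemu" type="raw" cache="none" io="native" iothread="1"/>
  <source dev="/dev/disk/by-id/your-ssd-here"/>
  <target dev="vda" bus="virtio"/>
</disk>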
AMD Ryzen CPU model for optimal performance:
Code:
<cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
    <!-- add additional cpu features here-->
</cpu>
For QEMU 3.1 and below, “EPYC” is preferred over “host-passthrough”:
Code:
<cpu mode='custom' match='exact' check='none'>
    <model fallback='allow'>EPYC</model>
    <topology sockets='1' cores='4' threads='2'/>
    <feature policy='require' name='topoext'/>
    <!-- add additional cpu features here-->
</cpu>

Hyper-V Optimal Settings for Performance:
Code:
<hyperv>
   <relaxed state='on'/>
   <vapic state='on'/>
   <spinlocks state='on' retries='8191'/>
   <vpindex state='on'/>
   <synic state='on'/>
   <stimer state='on'/>
   <reset state='on'/>
   <vendor_id state='on' value='1234567890ab'/> <!-- nvidia error code 43 prevention -->
   <frequencies state='on'/>
</hyperv>
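For context, the <hyperv> block lives inside the domain's <features> element:
Code:
<features>
  <acpi/>
  <apic/>
  <hyperv>
    ...
  </hyperv>
</features>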
Increasing Network Performance for Passthrough NICs:
Note: This only works for PCIe NICs that are passed through to the VM as individual devices, NOT onboard NICs that reside on the motherboard. For a complete list of NIC hardware that supports SR-IOV, see here.

Switching the NIC from VIRTIO to SR-IOV will give near-native performance. See below for an XML example:
Code:
<interface type="direct">
  <mac address="52:54:00:2d:18:2f"/>
  <!-- dev should name the SR-IOV VF's network device -->
  <source dev="eno1" mode="passthrough"/>
  <model type="virtio"/>
  <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</interface>
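The VFs have to exist before they can be attached. A sketch using the standard sysfs knob (assuming the physical function is eno1 and its driver supports SR-IOV):
Code:
# echo 4 > /sys/class/net/eno1/device/sriov_numvfs
$ ip link show eno1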

Clocksource
Make sure ‘tsc’ is used as the clock source inside the guest as well. You can steer this from your XML.
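A sketch of a common <clock> section for Windows guests: the legacy timers are disabled or de-emphasized so the guest falls back to TSC-based timing, and hypervclock complements the <hyperv> features above (all standard libvirt timer names; tune to your setup):
Code:
<clock offset="localtime">
  <timer name="rtc" tickpolicy="catchup"/>
  <timer name="pit" tickpolicy="delay"/>
  <timer name="hpet" present="no"/>
  <timer name="hypervclock" present="yes"/>
</clock>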

Windows-specific optimizations here

more guest optimizations coming soon....
 