Help Arch fails to boot after re-configuring initramfs

Bogi

New member
Jun 18, 2020
3
2
3
Hello. I would appreciate some help in resolving this issue if at all possible. I have installed Arch Linux on my NVMe drive. I have 2 GPUs (RX 5700 XT and 1050 Ti) and I would like to pass through the RX 5700 XT. I am also using an AMD Ryzen 7 3700X and an MSI Tomahawk MAX motherboard. I have enabled IOMMU and SVM Mode for virtualization in the BIOS.

Output of: dmesg | grep -i -e DMAR -e IOMMU

Code:
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=acc166ed-e8d6-4d76-a5cd-d329f84967bc rw amd_iommu=on loglevel=3 quiet
[    0.092663] Kernel command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=acc166ed-e8d6-4d76-a5cd-d329f84967bc rw amd_iommu=on loglevel=3 quiet
[    1.293963] iommu: Default domain type: Translated
[    1.390515] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    1.390570] pci 0000:00:01.0: Adding to iommu group 0
[    1.390723] pci 0000:00:01.1: Adding to iommu group 1
[    1.390828] pci 0000:00:01.3: Adding to iommu group 2
[    1.390964] pci 0000:00:02.0: Adding to iommu group 3
[    1.391072] pci 0000:00:03.0: Adding to iommu group 4
[    1.391215] pci 0000:00:03.1: Adding to iommu group 5
[    1.391317] pci 0000:00:04.0: Adding to iommu group 6
[    1.391442] pci 0000:00:05.0: Adding to iommu group 7
[    1.391578] pci 0000:00:07.0: Adding to iommu group 8
[    1.391683] pci 0000:00:07.1: Adding to iommu group 8
[    1.391712] pci 0000:00:08.0: Adding to iommu group 9
[    1.391853] pci 0000:00:08.1: Adding to iommu group 9
[    1.391874] pci 0000:00:08.2: Adding to iommu group 9
[    1.391895] pci 0000:00:08.3: Adding to iommu group 9
[    1.391912] pci 0000:00:14.0: Adding to iommu group 10
[    1.392013] pci 0000:00:14.3: Adding to iommu group 10
[    1.392046] pci 0000:00:18.0: Adding to iommu group 11
[    1.392179] pci 0000:00:18.1: Adding to iommu group 11
[    1.392197] pci 0000:00:18.2: Adding to iommu group 11
[    1.392215] pci 0000:00:18.3: Adding to iommu group 11
[    1.392231] pci 0000:00:18.4: Adding to iommu group 11
[    1.392247] pci 0000:00:18.5: Adding to iommu group 11
[    1.392264] pci 0000:00:18.6: Adding to iommu group 11
[    1.392281] pci 0000:00:18.7: Adding to iommu group 11
[    1.392302] pci 0000:01:00.0: Adding to iommu group 12
[    1.392417] pci 0000:03:00.0: Adding to iommu group 13
[    1.392562] pci 0000:03:00.1: Adding to iommu group 13
[    1.392589] pci 0000:03:00.2: Adding to iommu group 13
[    1.392607] pci 0000:20:00.0: Adding to iommu group 13
[    1.392625] pci 0000:20:01.0: Adding to iommu group 13
[    1.392642] pci 0000:20:04.0: Adding to iommu group 13
[    1.392668] pci 0000:22:00.0: Adding to iommu group 13
[    1.392693] pci 0000:25:00.0: Adding to iommu group 13
[    1.392711] pci 0000:25:00.1: Adding to iommu group 13
[    1.392734] pci 0000:26:00.0: Adding to iommu group 14
[    1.392842] pci 0000:27:00.0: Adding to iommu group 15
[    1.393014] pci 0000:28:00.0: Adding to iommu group 16
[    1.393036] pci 0000:28:00.0: Using iommu direct mapping
[    1.393062] pci 0000:28:00.1: Adding to iommu group 17
[    1.393165] pci 0000:29:00.0: Adding to iommu group 8
[    1.393184] pci 0000:2a:00.0: Adding to iommu group 9
[    1.393200] pci 0000:2a:00.1: Adding to iommu group 9
[    1.393217] pci 0000:2a:00.3: Adding to iommu group 9
[    1.393234] pci 0000:2a:00.4: Adding to iommu group 9
[    1.393253] pci 0000:30:00.0: Adding to iommu group 9
[    1.393272] pci 0000:31:00.0: Adding to iommu group 9
[    1.393407] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    1.395028] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    1.435530] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
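A log like this is easier to read when the group sizes are summarised. The following is only a minimal sketch: the helper name `iommu_group_sizes` is made up for illustration, and it assumes the kernel prints the "Adding to iommu group N" lines in the format shown above.

```shell
#!/bin/sh
# iommu_group_sizes: reads kernel log text on stdin and prints
# "GROUP COUNT" for every IOMMU group, sorted by group number.
# (Hypothetical helper, not part of any package.)
iommu_group_sizes() {
    awk '/Adding to iommu group/ { n[$NF]++ }          # last field is the group number
         END { for (g in n) print g, n[g] }' | sort -n
}
```

Usage would be `dmesg | iommu_group_sizes` (reading the kernel log may require root). For the log above, group 13 holds nine devices, while the RX 5700 XT functions at 28:00.0 and 28:00.1 each sit alone in groups 16 and 17.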
I have also edited /etc/default/grub accordingly:

Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Arch"
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on loglevel=3 quiet"
GRUB_CMDLINE_LINUX=""
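Changes to this file only reach the kernel after the GRUB configuration has been regenerated (on a default Arch install that is `grub-mkconfig -o /boot/grub/grub.cfg`; the output path is an assumption about the setup). A small sketch for confirming that a parameter actually made it onto the running kernel's command line; the helper name `cmdline_has` is hypothetical.

```shell
#!/bin/sh
# cmdline_has PARAM [FILE]
# Succeeds if PARAM appears as a whole word in FILE (default: /proc/cmdline).
# Matching whole words means "iommu=on" does not match "amd_iommu=on".
cmdline_has() {
    # Put each whitespace-separated word on its own line, then look
    # for an exact, fixed-string, whole-line match.
    tr -s ' \t' '\n\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example, `cmdline_has amd_iommu=on && echo "parameter is active"` checks the running kernel. The dmesg output above already shows `amd_iommu=on` on the command line, so in this thread that part was working.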
The issue comes up when I make changes to /etc/mkinitcpio.conf. Once I add the modules there and reboot, the system freezes on the line "Starting version 245.6-4-arch" after passing the GRUB selection.

Modules section of /etc/mkinitcpio.conf

Code:
# MODULES
# The following modules are loaded before any boot hooks are
# run.  Advanced users may wish to specify all system modules
# in this array.  For instance:
#     MODULES=(piix ide_disk reiserfs)
MODULES=(vfio_pci vfio vfio_iommu_type1 vfio_virqfd)
I have also created /etc/modprobe.d/vfio.conf:
Code:
options vfio-pci ids=1002:731f,1002:ab38
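Whether these IDs really end up bound to vfio-pci can be read from the "Kernel driver in use" line that `lspci -k` prints for each device. The sketch below only parses that output; the helper name `bound_drivers` is made up, and it assumes the usual lspci layout (device line starting with the bus address, indented detail lines below it).

```shell
#!/bin/sh
# bound_drivers: reads `lspci -nnk` output on stdin and prints
# "SLOT DRIVER" for every device that has a driver bound to it.
# (Hypothetical helper for illustration.)
bound_drivers() {
    awk '
        /^[0-9a-f]/             { slot = $1 }          # device header line: remember the bus address
        /Kernel driver in use:/ { print slot, $NF }    # indented detail line: last field is the driver
    '
}
```

Something like `lspci -nnk -d 1002:731f | bound_drivers` should then report `vfio-pci` for the GPU once the binding works, and likewise for `1002:ab38`. Note that the initramfs has to be regenerated (for example with `mkinitcpio -P`) before changes to mkinitcpio.conf take effect at boot.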
Executing the script from the Arch Wiki to find the bus IDs for the GPU and its components within the IOMMU groups:

Bash:
#!/bin/bash
# List every IOMMU group and the PCI devices it contains.
shopt -s nullglob
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU Group ${g##*/}:"
    for d in "$g"/devices/*; do
        echo -e "\t$(lspci -nns "${d##*/}")"
    done
done
gives the following (only showing the AMD GPU, since that is the only one I want to pass through):

Code:
IOMMU Group 14:
        26:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c1)
IOMMU Group 15:
        27:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
IOMMU Group 16:
        28:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c1)
IOMMU Group 17:
        28:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
It does look a bit weird that each controller is alone in its own group, but I believe that should not matter? I only passed the VGA controller and the HDMI Audio controller. I also tried passing one or both PCI bridges, but the result was always the same. Can anyone please help me resolve this issue? Thank you! Cheers!
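For what it's worth, a device sitting alone in its group is the easy case, since everything that shares an IOMMU group has to be handed to the guest together. A minimal sketch for listing what else shares a group with a given device, using the `iommu_group` link in sysfs; the helper name `group_peers` is hypothetical.

```shell
#!/bin/sh
# group_peers ADDR [ROOT]
# Prints the PCI addresses that share an IOMMU group with ADDR,
# excluding ADDR itself. ROOT defaults to /sys/bus/pci/devices.
# (Hypothetical helper for illustration.)
group_peers() {
    addr=$1
    root=${2:-/sys/bus/pci/devices}
    for peer in "$root/$addr"/iommu_group/devices/*; do
        [ -e "$peer" ] || continue          # glob matched nothing
        name=${peer##*/}
        [ "$name" = "$addr" ] || echo "$name"
    done
}
```

With the groups listed above, `group_peers 0000:28:00.0` would be expected to print nothing, because the VGA function is the only member of group 16.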
 

ech0

Administrator
Staff member
May 26, 2020
35
0
6
First thanks for signing up on the forum! I really appreciate it. I will help you out the best I can.

I just want to make sure: did you go into the BIOS and check that the proper PCIe slot is set as the boot device, if your motherboard supports that? Mine does, and I have to set it to PCIe slot 2, as that is the slot my host GPU uses.

It sounds to me, as I have done this myself, like you blacklisted your 5700 XT properly and bound it to the vfio driver, but your system is trying to boot from it. I get the same "Starting version 245.6-4-arch" hang after passing the GRUB selection otherwise.

Also, I don't know if you only have 1 monitor like I do, but if you do, make sure you changed the input on your monitor to the GPU your host is supposed to be using (and selected in BIOS as boot device). You will actually get to the GRUB screen and to that "Starting version..." message off of the 5700 XT before the VFIO driver kicks in and takes control of the GPU.

~ech0
 

Bogi

I have been able to fix the issue. Here's exactly how I did it, in case anyone else has the problem with the freeze on boot. I am using the MSI Tomahawk MAX B450 motherboard, and in the BIOS I had to disable CSM mode and switch to EFI completely (by default CSM is enabled: Legacy + EFI). When CSM mode is enabled, a dedicated GPU is used during POST, so when the time comes for Arch to hijack it and hand it to VFIO, the system freezes because the card is already in use. When CSM mode is disabled, the motherboard's graphics are initialized first for POST, leaving 1 or more GPUs (in my case 2) completely free and passable by Arch. For information about other motherboard brands and their respective BIOS settings, check out "Ryzen mainboards: selection of primary graphics device"; it helped me greatly in solving this issue.
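One way to see which card the firmware picked for POST is the `boot_vga` attribute that the kernel exposes in sysfs for display adapters. This is only a sketch under that assumption; the helper name `boot_vga_devices` is made up for illustration.

```shell
#!/bin/sh
# boot_vga_devices [DIR]
# Prints the PCI address of every device whose boot_vga attribute is 1,
# i.e. the adapter the firmware used as primary display at boot.
# DIR defaults to /sys/bus/pci/devices. (Hypothetical helper.)
boot_vga_devices() {
    dir=${1:-/sys/bus/pci/devices}
    for dev in "$dir"/*; do
        # Only display adapters expose boot_vga; skip everything else.
        [ -r "$dev/boot_vga" ] || continue
        if [ "$(cat "$dev/boot_vga")" = "1" ]; then
            echo "${dev##*/}"
        fi
    done
}
```

If `boot_vga_devices` prints the address of the card meant for passthrough (0000:28:00.0 here), the firmware is still initialising that card first, and the BIOS setting is worth revisiting.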


I hope this will help anyone who faces a similar issue as I did.
 

ech0

Thank you for providing the solution! When I posted last night I was running off 3 hours of sleep in 36+ hours and had to rush to bed. But I was going to mention CSM mode for my next post. I have to do the same on my Z390 board.

I will likely update my VFIO guide with the minor detail today. If there is anything else I should add to the guide for AMD let me know. I do not have an AMD system to test with but will be building one in the next few months.
 

Bogi

I would like to point out a few minor things that I found to be different from the majority of other guides out there, including the official Wiki pages. For AMD (Ryzen in particular), I have noticed that the line in the dmesg output which other guides show as clearly saying the IOMMU is on instead looks like this: "[    1.390515] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported". Also, AMD-V is listed in the BIOS as SVM Mode, which may confuse someone searching for AMD-V in their BIOS (at least that's how it is in my case).
 