A curious case of display hotplugging

Hi there!

I (try to) maintain the workstations my fellow grad students and I use. We are currently having some issues with display hotplugging over DisplayPort, and I am hoping someone here might have some advice on what to try next.

We use Dell OptiPlex 7060 workstations running EL 9.3 with kernel version 5.14.0-362. Each has a pair of DP connectors on the motherboard, driving one or two external monitors. If I unplug the 1080p monitors and then hotplug them back in, the graphical display doesn’t return and the monitors read ‘No signal’.

The issue:

  • Happens when unplugging either the DP cable or the AC cable of the display monitors, or when switching the display with a KVM switch.
  • Only happens when the ‘last’ connected monitor is unplugged. (With dual displays, unplugging just one doesn’t cause an issue.)
  • Does not occur when turning off the monitor normally with the power button.
  • Happens regardless of GNOME vs KDE, or Wayland vs X11.
  • Happens even in a basic Linux console, with the graphical target disabled via systemctl set-default multi-user.target
  • Happens on several identical machines, all with the same OS config.
  • Does not happen on two other identical machines, which are running Fedora 38.
  • Can be fixed by rebooting the machines, inconsistently by restarting the desktop environment over SSH, or inconsistently by switching TTYs when running a console with no desktop environment.

I’ve gathered the following information to try and debug:

  1. I can still ssh into the machine, and see my processes running via top and the graphical user logged in via who.
  2. The PCIe device which is driving these two DP connectors is 00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
  3. Running sudo udevadm monitor, I checked to make sure uevents were properly being generated by the kernel.

When I have two monitors plugged in and unplug just one, I see a kernel uevent and a matching udev event:

KERNEL[920.660497] change   /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV  [920.666026] change   /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)

Then when I replug it, the display returns and I see another pair of events:

KERNEL[924.803333] change   /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV  [924.807285] change   /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)

However, if I unplug both cables (or just one, if the system only had one monitor), I see:

KERNEL[955.564511] change   /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV  [955.568427] change   /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
KERNEL[957.001857] change   /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV  [957.003971] change   /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)

…and when I then plug them/it back in, I get nothing! The monitor isn’t detected and I get no display. (On the Fedora machine, I do see replug events and get my display back.)
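In case it helps anyone compare captures between the EL9 and Fedora machines, here is a rough sketch for tallying the drm events in saved udevadm monitor output. The function name count_drm_events and the file name capture.log are made up for illustration; the only assumption about the input is that it looks like the KERNEL[...] and UDEV [...] lines pasted above.

```shell
# Count kernel uevents and processed udev events for the drm subsystem.
# Reads udevadm monitor output on stdin.
# (count_drm_events is a hypothetical helper name, not part of udev.)
count_drm_events() {
    awk '
        /^KERNEL\[/ && /\(drm\)/ { k++ }   # raw kernel uevent
        /^UDEV +\[/ && /\(drm\)/ { u++ }   # event after udev processing
        END { printf "kernel=%d udev=%d\n", k, u }
    '
}

# Example capture, limited to the drm subsystem:
#   sudo udevadm monitor --kernel --udev --subsystem-match=drm | tee capture.log
#   count_drm_events < capture.log
```

Running it once on a capture of the unplug and once on a capture of the replug should make the missing replug events easy to spot, since the second count would stay at zero.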

  4. The system is booted with these kernel parameters:
$ cat /proc/cmdline
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-5.14.0-362.8.1.el9_3.x86_64 root=UUID=fba9b961-aaa4-4ae2-8ab1-0051c443c757 ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=720f1edd-941c-449e-8d17-a6d0433d8d82 rhgb quiet
  5. The following kernel modules are loaded (both before and after hotplugging):
$ lsmod | grep -e drm -e i915
i915                 3796992  10
drm_buddy              20480  1 i915
intel_gtt              28672  1 i915
drm_ttm_helper         16384  1 nouveau
drm_display_helper    200704  2 i915,nouveau
drm_kms_helper        245760  4 drm_display_helper,i915,nouveau
syscopyarea            16384  1 drm_kms_helper
sysfillrect            16384  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
cec                    69632  2 drm_display_helper,i915
ttm                    98304  3 drm_ttm_helper,i915,nouveau
drm                   704512  12 drm_kms_helper,drm_display_helper,drm_buddy,drm_ttm_helper,i915,ttm,nouveau
i2c_algo_bit           16384  3 igb,i915,nouveau
video                  73728  3 dell_wmi,i915,nouveau
  6. And checking the dmesg buffer, I see the following messages after boot (with no additions after hotplugging):
$ dmesg | grep -e i915 -e drm -e fb
[    0.274253] pci 0000:00:02.0: BAR 2: assigned to efifb
[    0.287283] pci 0000:00:1f.4: reg 0x20: [io  0xefa0-0xefbf]
[    0.493034] efifb: probing for efifb
[    0.493039] efifb: framebuffer at 0x80000000, using 9000k, total 9000k
[    0.493040] efifb: mode is 1920x1200x32, linelength=7680, pages=1
[    0.493041] efifb: scrolling: redraw
[    0.493041] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
[    0.496062] fb0: EFI VGA frame buffer device
[    1.543671] ACPI: bus type drm_connector registered
[    1.999687] i915 0000:00:02.0: vgaarb: deactivate vga console
[    2.001763] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    2.002485] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[    2.355823] i915 0000:00:02.0: [drm] [ENCODER:94:DDI A/PHY A] failed to retrieve link info, disabling eDP
[    2.358474] i915 0000:00:02.0: [drm] [ENCODER:110:DDI C/PHY C] is disabled/in DSI mode with an ungated DDI clock, gate it
[    2.358477] i915 0000:00:02.0: [drm] [ENCODER:120:DDI D/PHY D] is disabled/in DSI mode with an ungated DDI clock, gate it
[    2.397412] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
[    2.454382] fbcon: i915drmfb (fb0) is primary device
[    2.528446] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
[    3.460761] systemd[1]: Starting Load Kernel Module drm...
[    3.472272] systemd[1]: modprobe@drm.service: Deactivated successfully.
[    3.472405] systemd[1]: Finished Load Kernel Module drm.
[    3.882919] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
  7. Checking in the sysfs directory /sys/devices, I see that the aforementioned integrated graphics at PCIe address 00:02.0 appears to be set up as card0. When I examine the enabled file for either port before hotplugging, I see the ports are in fact ‘enabled’:
$ cat /sys/devices/pci0000:00/0000:00:02.0/drm/card0/card0-DP-1/enabled 
$ cat /sys/devices/pci0000:00/0000:00:02.0/drm/card0/card0-DP-2/enabled 

But after hotplugging the cables, I see the ports read as ‘disabled’, as though the connected displays were still powered off or unplugged:

$ cat /sys/devices/pci0000:00/0000:00:02.0/drm/card0/card0-DP-1/enabled 
$ cat /sys/devices/pci0000:00/0000:00:02.0/drm/card0/card0-DP-2/enabled 
  8. I checked the BIOS on the machines, but didn’t notice any settings related to the display that seemed suspect.
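For the sysfs check above, a loop like the following could dump every connector in one go instead of running cat on each port by hand. This is only a sketch: dump_connectors is a made-up helper name, and it assumes each connector directory exposes the usual status and enabled files (a "?" is printed if one is missing).

```shell
# Print the status and enabled files for every DRM connector directory.
# The directory to scan defaults to /sys/class/drm; a different path
# (e.g. /sys/devices/pci0000:00/0000:00:02.0/drm/card0) can be passed as $1.
# (dump_connectors is a hypothetical helper name.)
dump_connectors() {
    drm_root="${1:-/sys/class/drm}"
    for conn in "$drm_root"/card*-*; do
        [ -d "$conn" ] || continue          # skip if the glob matched nothing
        name=$(basename "$conn")
        status=$(cat "$conn/status" 2>/dev/null || echo "?")
        enabled=$(cat "$conn/enabled" 2>/dev/null || echo "?")
        printf '%s status=%s enabled=%s\n' "$name" "$status" "$enabled"
    done
}

# Example: run once before unplugging and once after replugging, then diff.
#   dump_connectors > before.txt
#   dump_connectors > after.txt
#   diff before.txt after.txt
```

The status file is included alongside enabled because it would show whether the kernel at least detects the monitor as connected after the replug, even if the port is never re-enabled.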

Any thoughts would be super appreciated!

With X11 there should be additional logs: /var/log/Xorg.?.log*
Possibly also something in /var/log/gdm/

Obviously, if the issue is not in window system but lower, those logs do not reveal anything more.

Your output shows that the nouveau module is loaded too. Doesn’t that mean the system has an NVIDIA card too?

Thank you for the feedback!

Examining the two logs: I see /var/log/gdm is empty, and the Xorg log doesn’t have any additions after the hotplug. (I think this makes sense, as the hotplugging fails even when on the gdm login screen, and also when graphical target is disabled entirely. So it must not be related to the desktop environment?)

Regarding nouveau, some of the machines did have an Nvidia GT 710 PCIe graphics card, but it wasn’t being used, so I actually removed it during my debugging to make sure it wasn’t the problem. In the original message, I should have instead pasted the lsmod output from after removing the Nvidia cards:

$ lsmod | grep -e drm -e i915
i915                 3796992  10
i2c_algo_bit           16384  1 i915
drm_buddy              20480  1 i915
intel_gtt              28672  1 i915
drm_display_helper    200704  1 i915
drm_kms_helper        245760  2 drm_display_helper,i915
syscopyarea            16384  1 drm_kms_helper
sysfillrect            16384  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
cec                    69632  2 drm_display_helper,i915
ttm                    98304  1 i915
drm                   704512  9 drm_kms_helper,drm_display_helper,drm_buddy,i915,ttm
video                  73728  2 dell_wmi,i915

In other words, all my previous tests were actually done without an Nvidia card, and without the nouveau module loaded. Sorry for the confusion!

This did remind me, though, that I hadn’t checked whether the issue also occurs when using the ports of an external graphics card instead. Sure enough, when I put in an Nvidia card, connect Nvidia GPU → HDMI cable → 1080p monitor, boot the machine, and then unplug/replug the cable:

# unplug event
KERNEL[128.615650] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card0 (drm)
UDEV  [128.624982] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card0 (drm)

# replug event
KERNEL[149.020287] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card0 (drm)
UDEV  [149.024238] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card0 (drm)

…and the display comes back to life! So, for some reason, hotplugging works with the external GPU (with nouveau), but not with the integrated graphics (with the i915 driver).
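Since both GPUs showed up as card0 in their respective tests, it may be worth confirming which kernel driver is bound to each card before comparing results. A minimal sketch, assuming the usual sysfs layout where cardN/device/driver is a symlink to the driver directory (card_drivers is a made-up helper name):

```shell
# List each DRM card together with the kernel driver bound to it.
# The directory to scan defaults to /sys/class/drm; override via $1.
# (card_drivers is a hypothetical helper name.)
card_drivers() {
    drm_root="${1:-/sys/class/drm}"
    for card in "$drm_root"/card[0-9]*; do
        # Skip connector entries such as card0-DP-1.
        case "$(basename "$card")" in *-*) continue ;; esac
        [ -e "$card/device/driver" ] || continue
        driver=$(basename "$(readlink "$card/device/driver")")
        printf '%s %s\n' "$(basename "$card")" "$driver"
    done
}

# Example:
#   card_drivers
```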

This isn’t a total solution, as not every machine can have a GPU, but it is another data point in understanding the issue. :thinking: