Hi there!
I (try to) maintain the workstations my fellow grad students and I use. We are currently having some issues with DisplayPort display hotplugging, and I am hoping someone here might have some advice on what to try next.
We use Dell OptiPlex 7060 workstations running EL9.3 with kernel version 5.14.0-362. They each have a pair of DP connectors on the motherboard, with one or two external monitors attached. If I unplug the 1080p monitors and then hotplug them back in, the graphical display doesn’t return and the monitors read ‘No signal’.
The issue:
- Happens when unplugging either the DP cable or the AC cable of the display monitors, or when switching the display with a KVM switch.
- Only happens when the ‘last’ monitor is unplugged. (If there are dual displays, unplugging just one doesn’t cause an issue.)
- Does not occur when turning off the monitor normally with the power button.
- Happens regardless of GNOME vs KDE, or Wayland vs X11.
- Happens even in a basic Linux console, with the graphical target disabled via systemctl set-default multi-user.target
- Happens on several identical machines, all with the same OS config.
- Does not happen on two other identical machines, which are running Fedora 38.
- Can be fixed by rebooting the machines, and only inconsistently by restarting the desktop environment over SSH or by switching TTYs when running a console with no desktop environment.
I’ve gathered the following information to try and debug:
- I can still ssh into the machine, and see my processes running via top and the graphical user logged in via who.
- The PCIe device which is driving these two DP connectors is:
00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
- Running sudo udevadm monitor, I checked to make sure uevents were properly being generated by the kernel. When I have two monitors plugged in and I unplug just one, I see a kernel uevent and a matching udev rule being triggered:
KERNEL[920.660497] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV [920.666026] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
Then when I replug it, the display returns and I see another pair of events:
KERNEL[924.803333] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV [924.807285] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
However, if I unplug both cables (or just one, if the system only had one monitor), I see:
KERNEL[955.564511] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV [955.568427] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
KERNEL[957.001857] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV [957.003971] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
…and when I then plug them/it back in, I get nothing! The monitor isn’t detected and I get no display. (On the Fedora machine, I do see replug events and get my display back.)
- The system is booted with these kernel parameters:
$ cat /proc/cmdline
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-5.14.0-362.8.1.el9_3.x86_64 root=UUID=fba9b961-aaa4-4ae2-8ab1-0051c443c757 ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=720f1edd-941c-449e-8d17-a6d0433d8d82 rhgb quiet
- The following kernel modules are loaded (both before and after hotplugging):
$ lsmod | grep -e drm -e i915
i915 3796992 10
drm_buddy 20480 1 i915
intel_gtt 28672 1 i915
drm_ttm_helper 16384 1 nouveau
drm_display_helper 200704 2 i915,nouveau
drm_kms_helper 245760 4 drm_display_helper,i915,nouveau
syscopyarea 16384 1 drm_kms_helper
sysfillrect 16384 1 drm_kms_helper
sysimgblt 16384 1 drm_kms_helper
cec 69632 2 drm_display_helper,i915
ttm 98304 3 drm_ttm_helper,i915,nouveau
drm 704512 12 drm_kms_helper,drm_display_helper,drm_buddy,drm_ttm_helper,i915,ttm,nouveau
i2c_algo_bit 16384 3 igb,i915,nouveau
video 73728 3 dell_wmi,i915,nouveau
- And checking the dmesg buffer, I see the following messages after boot (with no additions after hotplugging):
$ dmesg | grep -e i915 -e drm -e fb
[ 0.274253] pci 0000:00:02.0: BAR 2: assigned to efifb
[ 0.287283] pci 0000:00:1f.4: reg 0x20: [io 0xefa0-0xefbf]
[ 0.493034] efifb: probing for efifb
[ 0.493039] efifb: framebuffer at 0x80000000, using 9000k, total 9000k
[ 0.493040] efifb: mode is 1920x1200x32, linelength=7680, pages=1
[ 0.493041] efifb: scrolling: redraw
[ 0.493041] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
[ 0.496062] fb0: EFI VGA frame buffer device
[ 1.543671] ACPI: bus type drm_connector registered
[ 1.999687] i915 0000:00:02.0: vgaarb: deactivate vga console
[ 2.001763] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 2.002485] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[ 2.355823] i915 0000:00:02.0: [drm] [ENCODER:94:DDI A/PHY A] failed to retrieve link info, disabling eDP
[ 2.358474] i915 0000:00:02.0: [drm] [ENCODER:110:DDI C/PHY C] is disabled/in DSI mode with an ungated DDI clock, gate it
[ 2.358477] i915 0000:00:02.0: [drm] [ENCODER:120:DDI D/PHY D] is disabled/in DSI mode with an ungated DDI clock, gate it
[ 2.397412] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
[ 2.454382] fbcon: i915drmfb (fb0) is primary device
[ 2.528446] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
[ 3.460761] systemd[1]: Starting Load Kernel Module drm...
[ 3.472272] systemd[1]: modprobe@drm.service: Deactivated successfully.
[ 3.472405] systemd[1]: Finished Load Kernel Module drm.
[ 3.882919] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
- Checking in the sysfs directory /sys/devices, I see that the aforementioned integrated graphics at PCIe address 00:02.0 appears to be set up as card0. When I examine the enabled file for either port before hotplugging, I see the ports are in fact ‘enabled’:
$ cat /sys/devices/pci0000:00/0000:00:02.0/drm/card0/card0-DP-1/enabled
enabled
$ cat /sys/devices/pci0000:00/0000:00:02.0/drm/card0/card0-DP-2/enabled
enabled
But after hotplugging the cables, I see the ports read as ‘disabled’, as though the connected displays were still powered off or unplugged:
$ cat /sys/devices/pci0000:00/0000:00:02.0/drm/card0/card0-DP-1/enabled
disabled
$ cat /sys/devices/pci0000:00/0000:00:02.0/drm/card0/card0-DP-2/enabled
disabled
- I checked the BIOS on the machines, but didn’t notice any settings related to the display that seemed suspect.
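In case it helps anyone compare against a working machine, here is a minimal sketch of how the sysfs check above could be scripted so every connector is captured in one go. The function name drm_connector_state is my own (hypothetical), and it assumes each connector directory exposes the usual status, enabled, and dpms attribute files; any attribute that is missing is printed as n/a. The optional directory argument only exists so the function can be pointed at a fake tree for testing.

```shell
# Print one line per DRM connector: name, then status/enabled/dpms.
# Assumption: connectors appear as card<N>-<NAME> entries under the given
# root (default /sys/class/drm), each with plain-text attribute files.
# The body runs in a subshell ( ... ) so its variables stay local.
drm_connector_state() (
    root="${1:-/sys/class/drm}"
    for dir in "$root"/card*-*; do
        [ -d "$dir" ] || continue      # glob matched nothing, or not a directory
        printf '%s' "${dir##*/}"       # connector name, e.g. card0-DP-1
        for attr in status enabled dpms; do
            if [ -r "$dir/$attr" ]; then
                value="$(cat "$dir/$attr")"
            else
                value="n/a"            # attribute file missing or unreadable
            fi
            printf ' %s=%s' "$attr" "$value"
        done
        printf '\n'
    done
)
```

Running drm_connector_state over SSH once before unplugging and once after replugging, and diffing the two outputs, should show which attributes change (or fail to change) on the affected machines versus the Fedora ones.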
Any thoughts would be super appreciated!