AL9 - snmpd crashes regularly with "error on subcontainer 'ifTable container' remove (-1)" message

yyy · August 26, 2024, 8:16am

Hi everyone, first time here.

I have an AlmaLinux 9 VM that runs a GitLab instance.

Almost every time GitLab runs a CI pipeline, snmpd crashes.

Here is an extract of /var/log/messages

Aug 26 07:37:38 code gitlab-runner <Many GitLab job stuff here>
Aug 26 07:37:47 code snmpd[1277]: systemstats_linux: unexpected header length in /proc/net/snmp. 237 != 224
Aug 26 07:37:47 code systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Aug 26 07:38:47 code snmpd[1277]: systemstats_linux: unexpected header length in /proc/net/snmp. 237 != 224
Aug 26 07:39:47 code snmpd[1277]: systemstats_linux: unexpected header length in /proc/net/snmp. 237 != 224
Aug 26 07:40:47 code snmpd[1277]: systemstats_linux: unexpected header length in /proc/net/snmp. 237 != 224
Aug 26 07:41:47 code snmpd[1277]: systemstats_linux: unexpected header length in /proc/net/snmp. 237 != 224
Aug 26 07:42:06 code snmpd[1277]: error on subcontainer 'ifTable container' remove (-1)
Aug 26 07:42:09 code snmpd[1277]: free(): invalid pointer
Aug 26 07:42:09 code systemd[1]: Created slice Slice /system/systemd-coredump.
Aug 26 07:42:09 code systemd[1]: Started Process Core Dump (PID 423302/UID 0).
Aug 26 07:42:09 code systemd-coredump[423303]: Resource limits disable core dumping for process 1277 (snmpd).
Aug 26 07:42:09 code systemd-coredump[423303]: Process 1277 (snmpd) of user 0 dumped core.
Aug 26 07:42:09 code systemd[1]: systemd-coredump@0-423302-0.service: Deactivated successfully.
Aug 26 07:42:09 code systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT
Aug 26 07:42:09 code systemd[1]: snmpd.service: Failed with result 'core-dump'.
Aug 26 07:42:09 code systemd[1]: snmpd.service: Consumed 2min 16.325s CPU time.
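If you want to check how often this happens on your own box, here is a rough Python sketch that pulls the two relevant snmpd messages out of syslog-style lines. The function name and the regex are my own, and they assume the same "month day time host snmpd[pid]: message" layout as the extract above; adjust them if your log format differs.

```python
import re

# Matches syslog lines shaped like the extract above, e.g.
# "Aug 26 07:42:06 code snmpd[1277]: <message>"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+snmpd\[(\d+)\]:\s+(.*)$"
)

# Substrings of the two messages seen right before the abort in the extract.
CRASH_MARKERS = ("ifTable container", "free(): invalid pointer")


def snmpd_events(lines):
    """Return (timestamp, pid, message) for snmpd lines that contain a crash marker."""
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not an snmpd line, or a different log layout
        timestamp, pid, message = match.groups()
        if any(marker in message for marker in CRASH_MARKERS):
            events.append((timestamp, int(pid), message))
    return events
```

To use it, pass it an open log file, for example `snmpd_events(open("/var/log/messages", errors="replace"))`, and look at how close together the timestamps are.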

The GitLab runner uses Docker intensively for my CI, and dozens of containers are spun up every time.

I see that RHEL has this in their KB, but behind a paywall : RHEL9: snmpd crashes regularly with "error on subcontainer 'ifTable container' remove (-1)" message - Red Hat Customer Portal
The article mentions :

This usually occurs on systems running Docker workloads, hence having many non-persistent veth interfaces configured
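To see whether a machine really has that kind of interface churn, a small Python sketch like the one below can snapshot /sys/class/net and report what appears and disappears. This is only an illustration: the function names are mine, it assumes Linux, and it assumes the Docker host-side interfaces use the default "veth" name prefix. Note that /sys/class/net can also hold a few non-interface entries (for example bonding_masters when bonding is loaded).

```python
import os
import time


def list_interfaces(sys_net="/sys/class/net"):
    """Return the set of entry names under /sys/class/net (Linux only)."""
    return set(os.listdir(sys_net))


def churn(before, after):
    """Return (added, removed) names between two snapshots, each sorted."""
    return sorted(after - before), sorted(before - after)


def watch(interval=1.0, rounds=60):
    """Print interfaces that appear or disappear between successive snapshots."""
    previous = list_interfaces()
    for _ in range(rounds):
        time.sleep(interval)
        current = list_interfaces()
        added, removed = churn(previous, current)
        if added or removed:
            # Assumption: Docker's host-side interfaces start with "veth".
            veth_count = sum(name.startswith("veth") for name in current)
            print(f"added={added} removed={removed} veth_count={veth_count}")
        previous = current
```

Running `watch()` while a CI pipeline is executing should show veth names coming and going if the system matches the situation described in the KB article.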

Has anybody encountered this, or does anyone have a clue ?

Thanks for your help

Regards

yyy · August 26, 2024, 8:46am

Ubuntu seems to have a similar issue, and backported some code to fix it : Bug #720638 “snmpd crashes with segfault (libnetsnmpmibs.so.15.1...” : Bugs : net-snmp package : Ubuntu

yyy · August 26, 2024, 10:19am

A coworker pointed me to workarounds : you can exclude or filter interfaces in your /etc/snmp/snmpd.conf.

This should keep the Docker interfaces, which are created and deleted every second, from crashing snmpd.

I chose to filter (whitelist) rather than exclude, as I’m not certain I can predict the Docker interface names.

Here is the include_ifmib_iface_prefix doc (debian) :
https://manpages.debian.org/unstable/snmpd/snmpd.conf.5.en.html#include_ifmib_iface_prefix

This filters interfaces that are included in the MIB data collection by iface prefix.

As the “real” interfaces I want to monitor are all named “eth[number]” or “enX[number]”, a prefix filter is perfect.

Here is a snippet of my /etc/snmp/snmpd.conf.

# To avoid crashes like "error on subcontainer 'ifTable container' remove" caused by Docker creating and deleting a lot of interfaces.
# Doc : Sets the interface name prefixes to include in the IF-MIB data collection. For servers with a large number of interfaces the IF-MIB processing will take a large chunk of CPU for ioctl calls (on Linux).
#       A set of space separated interface name prefixes will reduce the CPU load for IF-MIB processing
include_ifmib_iface_prefix eth enX
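To sanity-check a prefix list before putting it in snmpd.conf, here is a small Python sketch that splits interface names into "kept" and "dropped". It assumes the directive does a plain, case-sensitive "name starts with prefix" match, which is how I read the man page; it is not taken from the net-snmp source, and the function names are hypothetical.

```python
def included(name, prefixes):
    """True if the interface name starts with any of the given prefixes."""
    return any(name.startswith(prefix) for prefix in prefixes)


def split_interfaces(names, prefixes):
    """Split names into (kept, dropped) according to the prefix list."""
    kept = [name for name in names if included(name, prefixes)]
    dropped = [name for name in names if not included(name, prefixes)]
    return kept, dropped


# Example names only; real Docker veth names carry a random suffix.
kept, dropped = split_interfaces(
    ["lo", "eth0", "enX0", "docker0", "veth3f2a1b"],
    ["eth", "enX"],
)
```

With these example names, "veth3f2a1b" is dropped because it starts with "veth", not "eth", so under that assumption the "eth" prefix should not accidentally pull in the Docker interfaces.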

EDIT : It crashed again

yyy · August 26, 2024, 9:19pm

Annnnnnnd it crashed again

kingmango · August 28, 2024, 12:34pm

Seeing the same behavior. Keep us posted if you find a fix!

yyy · August 29, 2024, 6:19am

No luck yet.

As a workaround, we forced the service to restart automatically.

/etc/systemd/system/snmpd.service.d/override.conf

[Service]
Restart=always

It keeps crashing, but at least I can continue to monitor the server.

yyy · August 29, 2024, 6:37am

There’s a bug referenced for rhel-9.4 : https://issues.redhat.com/browse/RHEL-45595?jql=text%20~%20"snmpd"