Constant 501 on repomd.xml for AlmaLinux 8.10 (in specific circumstances)

Hello,

Starting on Feb 26 2026 we have been experiencing many HTTP 501 error responses when trying to mirror the AlmaLinux 8.10 repository; the failing URLs are always the repomd.xml files.

Some examples:

  • (first occurrence) Mon, 02 Mar 2026 06:30:09 GMT Exception: HTTP-501 from https://repo.almalinux.org/almalinux/8.10/AppStream/x86_64/os/repodata/repomd.xml

  • Mon, 02 Mar 2026 09:29:10 GMT Exception: HTTP-501 from https://repo.almalinux.org/almalinux/8.10/AppStream/x86_64/os/repodata/repomd.xml

  • Mon, 02 Mar 2026 09:29:10 GMT Exception: HTTP-501 from https://repo.almalinux.org/almalinux/8.10/BaseOS/x86_64/os/repodata/repomd.xml

Any known issues there?

The status page only reports “Mirrorlist Maintenance” with no further details or update. :frowning:

Hello. I am unable to reproduce this issue here at the moment.
Repomd.xml on AlmaLinux 8.10 consistently returns HTTP 200 for both GET and HEAD.

[redadmin@www ~]$ for i in {1..200}; do   curl -sS -o /dev/null -w "ip=%{remote_ip} code=%{http_code}\n"     https://repo.almalinux.org/almalinux/8.10/BaseOS/x86_64/os/repodata/repomd.xml || true;   sleep 1; done
ip=151.101.131.52 code=200
ip=151.101.67.52 code=200
ip=151.101.67.52 code=200
ip=151.101.195.52 code=200

If you’re still getting a 501 error, could you please share the following?

  • Time of occurrence

  • Remote IP address reached (e.g., curl -w "%{remote_ip}")

  • Whether or not an HTTP proxy is used

  • Name of the tool/user-agent used to fetch from the mirror

I’m curious if you can recreate the issue now.

I identified an issue with the currently running copy of Nginx here, which had had systemctl reload nginx run quite extensively to push config updates into prod. After a full Nginx restart I can no longer reproduce the issue.

Seems we may have found a bug in Nginx’s config reloading somehow…

PS - any reason you’re using repo.almalinux.org instead of our mirrorlist for dnf? The mirrorlist really is quite good :wink:

Hi, thanks for the answer. I definitely still encounter this:

I have also seen this in the browser, but in that case it’s always hard to know whether Chromium was being “smart” and showing a cached response, because every time I reload, it works.. :roll_eyes:

In the case of GitHub Actions (shown above) I consistently get this error from our Python script. I’ve submitted a PR to add debug info when the 501 is encountered; as soon as it’s merged I’ll rerun and share the outcome.
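The fields collected by that PR look roughly like this. This is a hypothetical sketch, not the actual patch; the `diagnostics` function name and signature are made up for illustration:

```python
import socket
from datetime import datetime, timezone
from urllib.parse import urlparse

# Hypothetical sketch of the debug info gathered when a bad status is
# seen; function name and signature are illustrative, not the real patch.
def diagnostics(url: str, status: int, headers: dict) -> str:
    host = urlparse(url).hostname
    # Resolve all A/AAAA records so we can compare against the IP we hit.
    ips = sorted({ai[4][0] for ai in socket.getaddrinfo(host, 443)})
    return (
        "--- HTTP error diagnostic info ---\n"
        f"  Timestamp (UTC): {datetime.now(timezone.utc)}\n"
        f"  DNS resolved IPs for {host}: {', '.join(ips)}\n"
        f"  HTTP status: {status}\n"
        f"  Response headers: {headers}"
    )
```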

PS: I’ve run a quick test in another repo’s workflow and I only get 200 responses via python, so I suspect it’s something up with that specific workflow. :thinking: I’ll keep this thread updated.

Here is some debugging information:

2026-03-03 07:48:50 ERROR --- HTTP error diagnostic info ---
  Timestamp (UTC): 2026-03-03 07:48:50.141252
  Remote IP (connected): 151.101.131.52
  DNS resolved IPs for repo.almalinux.org: 151.101.131.52, 151.101.195.52, 151.101.3.52, 151.101.67.52, 2a04:4e42:200::820, 2a04:4e42:400::820, 2a04:4e42:600::820, 2a04:4e42::820
  Proxy in use: no
  User-Agent: python-requests/2.32.5
  Response headers: {'Connection': 'keep-alive', 'Content-Length': '0', 'Server': 'nginx', 'Content-Type': 'text/xml', 'Last-Modified': 'Thu, 26 Feb 2026 09:59:07 GMT', 'ETag': 'W/"69a0196b-f36"', 'Cache-Control': 'public, max-age=60', 'Content-Encoding': 'gzip', 'Accept-Ranges': 'bytes', 'Age': '0', 'Date': 'Tue, 03 Mar 2026 07:48:50 GMT', 'Via': '1.1 varnish', 'X-Served-By': 'cache-pao-kpao1770035-PAO', 'X-Cache': 'MISS', 'X-Cache-Hits': '0', 'X-Timer': 'S1772524130.955171,VS0,VE185', 'Vary': 'Accept-Encoding', 'alt-svc': 'h3=":443";ma=86400,h3-29=":443";ma=86400,h3-27=":443";ma=86400'}
  Request headers: {'User-Agent': 'python-requests/2.32.5', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

I’ve created an issue at https://status.almalinux.org/incidents/04e1d93c-541d-4873-a508-ddd09153afea to track this and I’ll also post updates here.

It looks to have begun about a week ago, so I’m reviewing our infra changes from that timeframe (there were several).

All fixed now :slight_smile: Let me know if you see any further issues.

Ha! Thanks for the fix, I confirm that we were able to sync. :raising_hands:

Ha!
By the look of things there might still be some issue on that specific CDN, because we’ve seen double-compressed metadata files. :grimacing:

Context: We’ve been investigating why a repodata/8[...]f-comps.xml file has been written as gzip-encoded instead of text/xml.
We can’t find anything in our sync setup that could cause this (it has been unchanged for weeks), so we’re assuming a reverse proxy/CDN is incorrectly re-encoding the XML in transit. :thinking:
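One quick way to flag such a file: gzip streams always begin with the magic bytes 1f 8b, so a file served as text/xml that starts with them is still compressed. A minimal sketch, using a fabricated stand-in for the suspect file:

```python
import gzip

# Stand-in for a suspect "-comps.xml" downloaded from the mirror: here we
# fabricate a "double-compressed" copy for demonstration purposes.
stored = gzip.compress(b'<?xml version="1.0"?><comps/>')

# Gzip streams begin with the magic bytes 1f 8b, so a file with
# Content-Type: text/xml that starts with them is still gzip-encoded.
print(stored[:2].hex())   # "1f8b" => still compressed
```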

What makes you think it’s double-compressed? I’m not seeing anything to indicate that.

If the request headers say that compressed content is acceptable, we’re still compressing it on the edge before serving it.

[jonathan@jon-home ~]$ curl -s -I https://repo.almalinux.org/almalinux/8/BaseOS/x86_64/os/repodata/3d3be38a8dd0d53588ff1a130db18bb0b2988e68badbdec22413308d7692197f-comps.xml |grep -aE "encoding|type"
content-type: text/xml

versus:

[jonathan@jon-home ~]$ curl -s -i -H "Accept-Encoding: gzip" https://repo.almalinux.org/almalinux/8/BaseOS/x86_64/os/repodata/3d3be38a8dd0d53588ff1a130db18bb0b2988e68badbdec22413308d7692197f-comps.xml |grep -aE "encoding|type"
content-type: text/xml
content-encoding: gzip

I went ahead and flushed the entire CDN cache in case any pre-fix remnants were in there that could cause what you described.

Thanks for that!

It seems this was indeed a bug on our side. :roll_eyes: Here is a quick summary of what happened, just FYI.

When mirroring RPM repository metadata from upstream AlmaLinux servers to our object storage, certain XML files (e.g. *-comps.xml) end up stored with gzipped content despite having a plain .xml extension.

How:

  1. The upstream repository’s repomd.xml references metadata files by path, e.g. repodata/<hash>-comps-BaseOS.x86_64.xml. This is a plain XML file — not a compressed archive.

  2. Our sync tool fetches this file over HTTPS using the Python requests library in streaming mode. The upstream web server applies standard HTTP Content-Encoding: gzip to the response — this is normal HTTP transport compression to reduce bandwidth, entirely separate from file-level compression like .xml.gz.

  3. Under the hood, requests delegates to urllib3 for reading the response body. The response is read with:

    response.raw.read(decode_content=False)
    

    This tells urllib3 to skip Content-Encoding processing and return the raw bytes off the wire — meaning the transport-level gzip is not stripped.

  4. These gzipped bytes are uploaded to S3 under the original key ending in .xml, without any Content-Encoding metadata on the S3 object.

  5. When a YUM client later fetches this file from the mirror, it receives gzipped bytes for a URL ending in .xml, with no Content-Encoding: gzip header to signal that decompression is needed. The client treats it as raw XML, fails to parse it, and the repository appears broken.
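Step 3 can be demonstrated without any network involved by constructing an urllib3 `HTTPResponse` directly. A sketch, with an illustrative payload standing in for the comps file:

```python
import gzip
import io

from urllib3.response import HTTPResponse

# Plain XML as it exists on the upstream server (payload is illustrative).
xml = b'<?xml version="1.0"?><comps></comps>'
# Bytes as they arrive off the wire when the server applies
# transport-level Content-Encoding: gzip.
wire = gzip.compress(xml)

def make_resp() -> HTTPResponse:
    # Build an urllib3 response as if the server had sent
    # Content-Encoding: gzip (no sockets involved).
    return HTTPResponse(
        body=io.BytesIO(wire),
        headers={"Content-Encoding": "gzip"},
        status=200,
        preload_content=False,
    )

# Buggy pattern from the sync tool: raw bytes off the wire, transport
# gzip still in place -- these are the bytes that got uploaded to S3.
assert make_resp().read(decode_content=False) == wire

# Fix: let urllib3 strip the transport encoding, yielding plain XML.
assert make_resp().read(decode_content=True) == xml
```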