RSS is cool! Some RSS feed readers are not (yet)...
Fresh look at RSS after a migration
This blog has had an RSS feed since its inception about 10 years ago. It was (and is) an easy way for readers to quickly discover newly released and updated articles. Although a lot has changed in 10 years, including a migration from WordPress to Hugo, the RSS feed is still available. Recently, as part of the migration, we looked again at all the individual layers that make this blog possible. From the web server configuration up to the final HTML output, everything got a review.
Instead of just copying the old configuration, we set everything up from scratch. A fresh start, questioning all choices. With each change, we looked at what we could tune and improve, especially things that could increase availability and performance. For example, the SSL/TLS configuration settings were updated, including enabling 0-RTT handshakes. The blog was already somewhat static and quick, but there was still room for improvement. This time everything is truly static output, and we let the web server focus on what it is good at: delivering content at high speed! During our analysis we discovered a few things, and that is what this article is about.
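To give an impression, below is a minimal sketch of such TLS tuning in nginx syntax. This is not our literal configuration: the directives are standard nginx (1.15.3 or later for ssl_early_data), but the values are examples.

```nginx
# Minimal TLS tuning sketch (example values, not our literal configuration)
ssl_protocols TLSv1.2 TLSv1.3;   # drop legacy protocol versions
ssl_prefer_server_ciphers off;   # let modern clients pick their preferred cipher
ssl_early_data on;               # enable 0-RTT handshakes (TLS 1.3)
# Note: 0-RTT data can be replayed by an attacker. That is acceptable here,
# as the blog only serves static content via idempotent GET requests.
```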
Bad bots
Like every website on the internet, our logs get spammed by bad bots. We already had some measures implemented, but we decided to optimize this even further. So this means not just blocking bad bots, but also blocking badly behaving clients. Still using the HTTP/1.0 protocol? That’s fine, but not on this website. Not offering to accept compressed data transfers? Sorry, no data for you. After rejecting some of these requests, that is where things got interesting!
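As an illustration, such filtering could look roughly like this in nginx. This is a simplified sketch, not our exact rules; the choice for the 426 status is explained further down in this article.

```nginx
# Sketch: reject badly behaving clients (simplified; placed inside a server block)

# Refuse requests still made over HTTP/1.0
if ($server_protocol = "HTTP/1.0") {
    return 426;  # 426 Upgrade Required
}

# Refuse clients that do not offer to accept compressed responses
if ($http_accept_encoding !~* "gzip|br") {
    return 426;
}
```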
Our rationale
In the initial version of this blog post, we did not mention the rationale behind it. So let’s have a look at that first, before showing some examples.
- Reduce the amount of unnecessary traffic
- Block clients that misbehave (on purpose or by accident)
- Become more sustainable with our digital resources and assets
- Remove any clutter from our log files, to make it easier to monitor requests
- Increase our security posture
- Help the community
  - Inform about this “invisible” issue
  - Point out incorrectly configured clients
  - Report the issues to the (open source) projects
That being said, let’s have a look together at some of the things we recently observed, including our thoughts. As we are in favor of RSS, we will also add the relevant actions that we took to see if things can be improved. Not just for us, but for the whole RSS community. During our journey, we already encountered some negative responses when reporting the issues. This article and all actions are written with the best intentions in mind.
Examples of issues and improvements
Different types of requests from Slackbot
2024-04-13T11:56:26+00:00 200 1.2.3.4 "HEAD /feed/ HTTP/2.0" 0 "-" "Slackbot 1.0 (+https://api.slack.com/robots)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-13T11:56:26+00:00 200 2.3.4.5 "GET /feed/ HTTP/1.1" 19046 "-" "Slackbot 1.0 (+https://api.slack.com/robots)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
This is interesting. It looks like Slack first queries some basic details about the feed by using a HEAD request. The assumption was that this feedback is used to decide whether to pull in the feed (or not). However, if we look at the timing, we see something else. In the very same second that the first request came in, another system does an actual GET. I doubt they got the chance to process the information from the first request before firing off the second. Bad implementation? Not sure. Another interesting thing is that the TLS protocol and ciphers used are the same, but the HEAD request was done with an older HTTP protocol version. Might be related to reducing overhead?
Multiple requests from the same system
Some clients seem to request the feed a few times per minute.
2024-04-13T11:58:13+00:00 200 1.2.3.4 "GET /atom.xml HTTP/1.1" 14697 "https://linux-audit.com/" "Inoreader/1.0 (+http://www.inoreader.com/feed-fetcher; 1 subscribers; )" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-13T11:58:29+00:00 200 1.2.3.4 "GET /atom.xml HTTP/1.1" 14697 "https://linux-audit.com/" "Inoreader/1.0 (+http://www.inoreader.com/feed-fetcher; 1 subscribers; )" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
While this client made only two requests, half of them were wasted. After all, nothing changed in this short time span. It’s not fully clear why this client did this, especially as it is not continuously doing so. If it had been, our rate limit would have kicked in.
Status: cause unclear, more research needed
Actions:
- None, more research needed to see if this is a one-time event or a common issue
Newsboat: Too many requests
With rate-limiting in place, we noticed that the Newsboat client got picked up. Example from the logs:
2024-04-14T09:07:39+00:00 304 1.2.3.4 "GET /feed/ HTTP/2.0" 0 "-" "Newsboat/r2.35 (Linux x86_64)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-14T09:07:39+00:00 304 1.2.3.4 "GET /feed/ HTTP/2.0" 0 "-" "Newsboat/r2.35 (Linux x86_64)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-14T09:07:39+00:00 304 1.2.3.4 "GET /feed/ HTTP/2.0" 0 "-" "Newsboat/r2.35 (Linux x86_64)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-14T09:07:39+00:00 304 1.2.3.4 "GET /feed/ HTTP/2.0" 0 "-" "Newsboat/r2.35 (Linux x86_64)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-14T09:07:40+00:00 429 1.2.3.4 "GET /feed/ HTTP/2.0" 74 "-" "Newsboat/r2.35 (Linux x86_64)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
These requests were interesting, as a 304 was returned for the first four, followed by a 429 error. The HTTP 304 status means that the content was most likely not modified compared with the copy that the client has. This is determined by comparing the Last-Modified header. So bonus points for implementing this, as it saves a lot of unneeded data traffic. With zero bytes being sent, that is a perfect outcome. At the same time, we see that multiple requests are made in the same second. So in the end, the client and web server are still processing useless requests. When the rate limit kicks in, a 429 status is returned and the conversation stops. That is, until the next set of requests.
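For context, this kind of rate limiting can be expressed in nginx roughly as follows. The zone name, rate, and burst values below are illustrative, not our production settings.

```nginx
# Sketch: limit feed requests per client IP, answer excess with 429
limit_req_zone $binary_remote_addr zone=rssfeed:10m rate=2r/m;

server {
    location /feed/ {
        limit_req zone=rssfeed burst=4 nodelay;  # allow a small burst
        limit_req_status 429;                    # 429 Too Many Requests (default is 503)
    }
}
```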
Status: waiting for new Newsboat release
Actions:
- Created issue
- Issue acknowledged
- Should be resolved with a pull request
The issue was reported and one of the developers quickly picked it up. Awesome! This one needs to be monitored; hopefully the next release will no longer be responsible for unneeded requests.
Selfoss: Not supporting data compression
The next one is Selfoss. We had seen it showing up in the logs, nothing special so far. Until we toggled the switch to disallow requests that don’t support compression (the Accept-Encoding header).
2024-04-13T10:05:26+00:00 426 1.2.3.4 "GET /atom.xml HTTP/1.1" 16 "https://linux-audit.com/atom.xml" "Selfoss/2.19 (+https://selfoss.aditu.de)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-13T12:02:24+00:00 426 2.3.4.5 "GET /feed/ HTTP/1.1" 16 "https://linux-audit.com/feed/" "Selfoss/2.19 (+https://selfoss.aditu.de)" TLSv1.2/ECDHE-ECDSA-AES256-GCM-SHA384 0.000 .
So here we have two different clients, about two hours apart from each other. One uses the /atom.xml link, the other the alias /feed/. Same file, different path. The two clients use different TLS protocol versions and ciphers, so I assume that has to do with the underlying operating system and libraries. The Selfoss software itself looks to be the same, judging by the version number. The HTTP protocol used is also the same. Maybe HTTP/1.1 is somewhat outdated, but that’s fine.
The interesting part in this case is that both requests got blocked. This can be seen from the 426 status we returned, telling the client to upgrade. As it is not due to the HTTP protocol version, it must be related to the lack of compression support.
Status: issue most likely solved (depends on Guzzle)
Actions:
- Opened an issue on GitHub
- Project implemented changes to improve this, including upstream in Guzzle (the PHP HTTP client)
With the actions taken by the project, this issue will most likely be resolved in the upcoming update. That’s awesome!
Feedbin: Sometimes supporting data compression?
Like the example with Selfoss, we also came across clients that behave differently per request.
2024-04-13T12:06:35+00:00 200 1.2.3.4 "GET /feed/ HTTP/1.1" 17863 "-" "Feedbin feed-id:MASKED - MASKED subscribers" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-13T12:06:36+00:00 426 1.2.3.4 "GET /web/nginx-log-only-some-requests/ HTTP/1.1" 16 "-" "Down/5.4.1" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
In this example we see that the RSS feed is pulled in. A second later, the latest blog post is retrieved. It came from the same IP address, but with a different user agent. The HTTP protocol version, TLS protocol, and ciphers are all the same. So probably different components are at work: one that tracks RSS feeds, while the other pulls in the data related to the article? Not sure what it is; this needs more research.
Status: more research needed
Actions:
- None so far, need more samples
Tiny Tiny RSS: Not all versions supporting data compression
2024-04-20T18:25:59+00:00 304 1.2.3.4 "GET /feed/ HTTP/2.0" 0 "-" "Tiny Tiny RSS/21.07-73d14338a (http://tt-rss.org/)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-20T22:48:30+00:00 304 2.3.4.5 "GET /feed/ HTTP/2.0" 0 "-" "Tiny Tiny RSS/22.09-d47b8c8 (https://tt-rss.org/)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-21T04:53:18+00:00 304 3.4.5.6 "GET /feed/ HTTP/2.0" 0 "-" "Tiny Tiny RSS/23.09-f489f620 (https://tt-rss.org/)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-21T04:56:30+00:00 426 4.5.6.7 "GET /feed/ HTTP/1.1" 16 "-" "Tiny Tiny RSS/24.03-435c321ca (https://tt-rss.org/)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
After looking at the code, it seems that Tiny Tiny RSS is using Guzzle as its HTTP client. But wait, that is the same component as in Selfoss! So it might be possible that, without any changes to Tiny Tiny RSS itself, it will inherit the Guzzle improvements.
Update: later on, we noticed that earlier versions of TT-RSS did actually use the Accept-Encoding header (and even used the more modern HTTP/2.0 protocol instead of HTTP/1.1).
Status: issue most likely solved (no further action)
Actions:
- Created bug report at their community forum
- Received responses from two developers
- Developer wn_name confirmed they switched to Guzzle, which explains why the older versions perform a different request
- Updated log entries to show some older and newer versions
I hope the project also considers checking whether Guzzle can do HTTP/2.0 requests in the future, to further optimize performance. As one of the replies (about blocking clients that don’t offer data compression) was “tells me he’s some kind of self-important internet weirdo which i’d rather not do anything for.”, I believe any other feedback is not very welcome at the moment. Case closed.
Miniflux: supporting Gzip, but not Brotli (resolved)
In the log we also discovered different file sizes for the feed. Example of a few requests:
2024-04-13T13:30:02+00:00 200 1.2.3.4 "GET /feed/ HTTP/1.1" 17183 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36(KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-13T13:31:22+00:00 200 2.3.4.5 "GET /feed/ HTTP/2.0" 20621 "-" "Mozilla/5.0 (compatible; Miniflux/2.1.2; +https://miniflux.app)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
The first request looks to be a normal browser, while the second one is another RSS reader named Miniflux. It already uses compression, but it needed more data traffic to receive the same file. When we look at the disk, we can see the related file sizes for the feed (at that very moment; it changes daily).
-rw-r--r-- 1 www-data www-data 72474 Apr 13 13:18 atom.xml
-rw-r--r-- 1 www-data www-data 17183 Apr 13 13:18 atom.xml.br
-rw-r--r-- 1 www-data www-data 20621 Apr 13 13:18 atom.xml.gz
This RSS reader is already doing a good job. It uses a modern HTTP protocol version and has data encoding implemented. By using a different compression method, it could save (in this case) 3438 bytes. That doesn’t sound like a lot, but we limited the number of entries in our feed. Many feeds are much bigger in size, and then the differences add up.
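Serving these pre-compressed files is what makes this comparison possible in the first place. A sketch of the relevant nginx directives, assuming the third-party ngx_brotli module is compiled in (gzip_static is part of a standard nginx build):

```nginx
# Serve pre-compressed variants when the client advertises support
gzip_static   on;  # picks up atom.xml.gz next to atom.xml
brotli_static on;  # picks up atom.xml.br (requires the ngx_brotli module)
```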
Status: improved (waiting for new release)
Actions:
- Opened a feature request
- Change has been made with an existing pull request and new PR to add Brotli!
2024-04-19: Brotli support added, now waiting for the next release to have this active
Nextcloud News App
The Nextcloud News app also has interesting behavior: it opens an initial connection to the feed, then pulls in the first 10 URLs. Seeing that these are recently changed pages, they most likely come from the feed.
2024-04-18T11:11:04+00:00 200 1.2.3.4 "GET /feed/ HTTP/1.1" 17777 "-" "NextCloud-News/1.0" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-18T11:11:04+00:00 426 1.2.3.4 "GET /viewing-available-test-categories-in-lynis/ HTTP/2.0" 16 "-" "-" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-18T11:11:04+00:00 426 1.2.3.4 "GET /cheat-sheets/awk/ HTTP/2.0" 16 "-" "-" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-18T11:11:04+00:00 426 1.2.3.4 "GET /contact/ HTTP/2.0" 16 "-" "-" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-18T11:11:04+00:00 426 1.2.3.4 "GET /web/nginx-show-all-configured-virtual-hosts/ HTTP/2.0" 16 "-" "-" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-18T11:11:04+00:00 426 1.2.3.4 "GET /alternative-netstat-ss-tool/ HTTP/2.0" 16 "-" "-" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-18T11:11:04+00:00 426 1.2.3.4 "GET /yum-plugins-available-plugins-and-built-in-security-support/ HTTP/2.0" 16 "-" "-" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-18T11:11:04+00:00 426 1.2.3.4 "GET /ubuntu-server-hardening-guide-quick-and-secure/ HTTP/2.0" 16 "-" "-" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-18T11:11:05+00:00 426 1.2.3.4 "GET /why-linux-security-hardening-scripts-might-backfire/ HTTP/2.0" 16 "-" "-" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-18T11:11:05+00:00 426 1.2.3.4 "GET /what-is-a-security-audit/ HTTP/2.0" 16 "-" "-" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-18T11:11:05+00:00 426 1.2.3.4 "GET /how-to-clear-the-arp-cache-on-linux/ HTTP/2.0" 16 "-" "-" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
The strange behavior is that the feed is pulled in with a different HTTP protocol version. Also, the initial request uses data compression, yet the other 10 requests do not. We refuse to waste bandwidth where it is not needed, so these are blocked with a 426 message.
Status: most likely improved (monitoring)
Actions:
- Opened a feature request
- A similar issue was reported, which seems to indicate that older software versions have this issue
Feed on Feeds (with SimplePie dependency)
Another tool using a different set of protocols: the initial request allows data compression, yet the other two do not.
2024-04-18T18:12:43+00:00 200 1.2.3.4 "GET /feed/ HTTP/2.0" 21552 "https://linux-audit.com/feed/" "FoF SimplePie/1.5.6 (Feed Parser; http://simplepie.org; Allow like Gecko) Build/20230917075900" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-18T18:12:43+00:00 426 1.2.3.4 "GET / HTTP/1.1" 16 "-" "FavIcon/1.0 (Caching Utility; ; Allow like Gecko) Build/20160424000000" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-18T18:12:43+00:00 426 1.2.3.4 "GET /favicon.ico HTTP/1.1" 16 "-" "FavIcon/1.0 (Caching Utility; ; Allow like Gecko) Build/20160424000000" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
Sometimes it looks like projects use different clients to pull in data. But it may also be as simple as the request to an underlying library getting different parameters. Let’s see!
Status: waiting for response on reported issue
Actions:
- Opened an issue. Closed.
- It seems that it is not SimplePie itself, but another project (Feed on Feeds) that uses an older version of SimplePie, so we created a new issue
Feedly (fixed)
With some AWK magic, I also found a consumer of the feed that apparently does not use conditional request headers to check whether the feed changed at all.
2024-04-22T12:11:14+00:00 200 1.2.3.4 "GET /atom.xml HTTP/2.0" 51797 "-" "FeedlyBot/1.0 (http://feedly.com)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-22T12:11:21+00:00 200 1.2.3.4 "GET /atom.xml HTTP/2.0" 51797 "-" "FeedlyBot/1.0 (http://feedly.com)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
2024-04-22T12:11:32+00:00 200 1.2.3.4 "GET /atom.xml HTTP/2.0" 51797 "-" "FeedlyBot/1.0 (http://feedly.com)" TLSv1.3/TLS_AES_256_GCM_SHA384 0.000 .
Actions:
- Contacted them by email (2024-04-22)
- Resolved (2024-05-15)
Solution: They changed the feed URL on their side. When a redirect (301) happens, the state is not correctly stored, resulting in a 200 after the redirect. With the feed URL change on their end, things should be fine now.
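For reference, a permanent redirect of an old feed URL is a one-liner in nginx. The paths below are hypothetical, purely to show the mechanism that tripped up the client:

```nginx
# Hypothetical example: permanently move an old feed path to the new one
location = /old-feed.xml {
    return 301 /atom.xml;  # clients should store and reuse the new URL
}
```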
Conclusion
While most RSS feed readers seem to continue to work properly after adjusting our web server configuration, a few unexpected issues came up. So far, multiple open source projects have used these insights to improve and made changes right away. While one project started to support Brotli compression, others enabled the required Accept-Encoding header to enable data compression. Awesome!
The general attitude towards the suggestions was very positive, from readers of this blog post up to several developers. So far there is only one negative experience, but maybe that was to be expected: no matter how much you try to do something good, there is always someone having a bad day.
What’s next?
In the upcoming months we will continue to monitor our logs and specifically look at the RSS feed. Hopefully more clients can be upgraded to use modern protocols and content encoding, and to reduce the number of requests by using the Last-Modified header. When we receive updates, this blog post will be updated.
I also received a private message with an interesting RSS issue tracker. Might be worth checking now and then!
Tips for a better RSS community
Developers of RSS readers
- Implement the If-Modified-Since (or If-None-Match) request headers to leverage the Last-Modified and ETag response headers
- Use data compression (content encoding) to reduce the file size that needs to be sent
- Implement HTTP/2.0 protocol where possible
Publishers of RSS feeds
- Reduce the number of entries in the feed (based on your publish frequency). More is not always better.
- Compress your feeds, when possible pre-compressed and with different methods (e.g. Gzip and Brotli)
- Implement HTTP/2.0 protocol where possible
- Consider implementing rate limiting to signal misbehaving clients
- Make your feed cacheable (several of these tips are combined in the nginx sketch after this list)
- When possible, provide a Last-Modified header with a value that is consistent
- When possible, set the Expires header (or Cache-Control)
- When possible, provide an ETag header for use with If-None-Match
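Several of these tips come together in a single location block. A sketch for nginx serving a static, pre-generated feed (the expiry value is an example, tune it to your publish frequency):

```nginx
# Sketch: cache-friendly delivery of a static feed file
location = /atom.xml {
    expires 1h;  # emits Expires and Cache-Control: max-age=3600
    etag on;     # on by default for static files, shown for clarity
    # For static files, nginx derives Last-Modified from the file's
    # modification time, so If-Modified-Since and If-None-Match
    # conditional requests receive a 304 Not Modified response.
}
```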
Tips for users of RSS feeds
If you want to be a good net citizen, reduce how often you refresh your feeds. Is it really needed to request them multiple times a day? Consider refreshing the feeds that are not updated daily less frequently than those that are. Also check if your RSS reader is up-to-date, especially after several improvements have been implemented.
Feedback?
Got something to share based on the results that we saw recently? Let us know!