Don't Contribute Anything Relevant in Web Forums Like Reddit

If you're, for example, contributing to a reddit thread about something which is irrelevant or anything with only a short-term relevance, this article does not apply to you right now.

However, as soon as you help somebody solve an interesting issue, summarize your experiences with something, or write anything that might be worth having around in a couple of years as well, you provide potentially high-value content. My message to all those authors is: don't use web-based forums.

In 2022, I talked about this topic at the Grazer Linuxtage and there is a video on the pages of the CCC as well as on YT:

The initial slide of the slide deck for the talk.
My talk about this topic (45min).

In late 2023, I got the opportunity to give a talk at the 37C3 by the CCC in Hamburg. This talk was not recorded but overlaps in most parts with the recorded talk above.

TL;DR: all of the content of closed, centralized services will be lost in the long run. Choose the platform you contribute to wisely now instead of learning through more large data loss events later on.

The longer version is worth your time:

What Do I Mean by Web-Based Forums Here?

In this article, I'm using the term "web-based forums" as an umbrella term for closed, centralized services like Reddit, Hacker News, Slashdot, Facebook, or any other web-based forum where you are able to add comments, articles, and so forth in most cases only after creating an account. Some issues are even true for Lemmy.

Typically, those services don't provide any possibility to extract or synchronize content. They don't offer open APIs that allow users to choose among different and open user interfaces. They are owned and operated by private companies.

Please note that when I mention more or less only reddit as an example in the next sections, this is because reddit is the only web-based forum I'm familiar with to a certain level. This does not mean that reddit is worse than other closed, centralized web-based forums. Not at all.

So What's the Issue With Web-Based Forums?

There is not one issue. There are several things where web-based forums don't qualify for being a platform for quality content. Let's take a look at some of them.

I'm glad you're still reading this article and I hope you bear with me until the end of it. Most people will realize that they have contributed lots and lots of high-value information only when platforms are down for good. And this is what makes me really sad. It is as if you knew that one building of the Library of Alexandria is going to burn down in a few years while people still bring many unique copies of high-quality books to its shelves, unaware that they are destroying knowledge this way.

Issue: No Backup, No Distribution

For reasons and examples stated in this article, any centralized web-based service will go offline some day. Some sooner, some later. Popularity is not even a guarantee that a service gets continued, as you can see with hundreds of (partly) very well known and widely used Google services that were shut down. Nothing will be on the web forever. Most people are not aware of this fact. The books set on this machine are more likely to survive history than all of your reddit/Facebook/... contributions:

A Linotype machine.

So when you begin to be aware of this fact, you might want to think of things you can do to mitigate data loss when services are discontinued or "sunsetted" as some marketing experts say.

You could, for example, back up the data of this service. By providing the information on multiple servers, chances are high that not all of them are lost at the same time.

This requires certain properties. For example, you need to be able to duplicate the service on multiple servers. To be able to do so, you'll need not only the data but also the software that is providing access to the service. When different organizations are running mirrored servers, the data and software have to be shared openly. This can be ensured by using Open Source software or at least open APIs and a business model that does not rely on keeping data and technical things a secret.

All major commercial services such as reddit, Facebook and so forth keep everything a secret that is not absolutely necessary for using their services. Their software is a secret, they offer no open APIs or only very crippled ones, and you don't have the possibility to get to the raw data. So no luck there. You do have a lock-in situation. You might also recognize the term switching costs, which platform owners maximize.

Even with personal blogs, "fragile" as they are, you are able to use the Wayback Machine of the Internet Archive to back up your blog. For example, every page on my blog contains a link to its archive in the page footer. This ensures that you can not only browse the latest version of all of my blog articles in case of a server breakdown. It also enables you to browse all previous versions, which have probably changed over time. Go ahead, try a few "Archive" links of my articles. If any of my articles start with an "Updates:" section, you know for sure that there are older versions accessible via the Internet Archive.

The Wayback Machine does not archive reddit threads. It can not properly back up Facebook pages. It's blinded by corporate secrecy when it comes to archiving content for the upcoming generations:

Why isn't the site I'm looking for in the archive?
Some sites may not be included because the automated crawlers were unaware of their existence at the time of the crawl. It's also possible that some sites were not archived because they were password protected, blocked by robots.txt, or otherwise inaccessible to our automated systems. Site owners might have also requested that their sites be excluded from the Wayback Machine.

Summarizing the things mentioned above: without very good support for data export, service duplication, and open standards, any content you provide in closed web-based services will be lost, just as MySpace already lost twelve years of content, to mention one big example.

Issue: User Interface Dictatorship

If you grew up knowing only centralized web-based forums, you can not imagine the many advantages of having the freedom to choose your preferred user interface. While some people might think this is a minor issue, let me explain a few examples where this makes a huge difference.

The first example starts with something that might only annoy people. With comments like the ones on this thread, you clutter up other people's interface for personal gain. It's selfish and distracts from the information consumption.

The reasons why people are using such reminder bots are twofold. First, they don't use a proper todo management system that would be able to remind them to read a certain article in a few days. They externalize this inability to the web-based forum and all of its other users. I'm working on fixing these educational issues. Secondly, the platform offers no way to use such features without affecting other people's interface.

Consider that people with visual impairment do have special needs. The WHO reports an estimate of 285 million people who are visually impaired, ninety percent of them living in developing countries. Those are not numbers you can simply ignore. It is obvious that they need different kinds of interfaces: a high-contrast interface, highly unusual interface scaling factors, an interface that avoids certain color combinations, or text-to-speech systems and Braille readers that are able to extract the content properly.

If a web-based service - remember from before - does not offer proper open APIs and does not implement said features, all those people simply can not participate and you can not profit from their knowledge and experience.

And even if you think that this is just a minority, I can provide examples where everybody profits from choosing his or her own interface.

Some services are providing interfaces that aren't working properly on small displays or mobile devices in general. In these cases, without any ability to switch to an alternative app or web-page, you are locked out even with perfect eyesight.

#reddit can't be viewed in a mobile browser? #wtf I need to find a way to get rid of reddit somehow. #cloud #decentralization #lockin
Screenshot of a Mastodon toot where I'm complaining that you can't read reddit in a mobile web browser.

When you're using a web-based forum that does not mark or collapse articles you have already read, you need to skim through a thread completely and re-read content to find new postings when re-visiting the thread after a while. Our time should not be spent on senseless tasks like this.

Alternative interfaces might provide advanced rating features based on your personal taste and choice so that you are able to filter out the most relevant articles easily and do not clutter your view with irrelevant articles at all. This is also called "scoring". It can be based on keywords, the amount of personal contributions to a longer thread, friendship relationships from your contact management, and so forth.
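To make the idea of scoring a bit more concrete, here is a minimal Python sketch of what a client-side filter could look like. It is not taken from any particular client: the `Posting` fields, the weights (50 points for a contact, 10 per own contribution) and the function names are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Posting:
    author: str
    subject: str
    body: str
    already_read: bool = False


def score(posting, keywords, friends, own_name, thread_authors):
    """Return a relevance score for one posting (higher is more relevant)."""
    text = f"{posting.subject} {posting.body}".lower()
    points = 0
    # Keyword rules: every keyword found in the text adds its (possibly negative) weight.
    for word, weight in keywords.items():
        if word.lower() in text:
            points += weight
    # Postings by people from my contact list get a bonus (assumed weight).
    if posting.author in friends:
        points += 50
    # Threads I contributed to myself are more interesting to me (assumed weight).
    points += 10 * thread_authors.count(own_name)
    return points


def visible(postings, threshold, **rules):
    """Hide already-read postings and everything scored below the threshold."""
    authors = [p.author for p in postings]
    unread = [p for p in postings if not p.already_read]
    scored = [(score(p, thread_authors=authors, **rules), p) for p in unread]
    # Highest score first; postings below the threshold are dropped.
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [p for s, p in ranked if s >= threshold]
```

The important point is not the particular weights but where this code runs: on your side, with your rules, without changing what anybody else sees. A reminder-bot comment can simply get a large negative weight and vanish from your own view.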

Some people prefer navigating using the keyboard. Either by personal taste or by physical restrictions. If the web-based centralized service only supports mouse-based navigation, you can not use this service.

I could continue with examples like that. The common theme is: when one particular centralized web-based forum does not implement all of those nice features you need or like, you can not use it properly.

In any case, the information should be made public as text and not only as video, sound files or images. This is the only viable way of optimizing for its consumption and making sure that it can be found in the first place.

Issue: Rule Monopoly and Subjective Censorship

When you live in a society with a certain set of (legal) rules, providers of relevant web-based forums have to follow and enforce some of them. However, the issue is that this kind of censorship is and will always be related to a particular culture and society at a specific time.

For example, in Germany and Austria, being a Nazi is punishable by law. In the USA, freedom-loving people think fans of the human monsters that tortured and murdered millions of Jews in the Second World War need the possibility to express their personal "opinion". As you can see, there is a different point of view between the lines when I write about Nazis compared to an author from the USA who values "freedom of speech" higher than "being a die-hard fan of mass murders". It's a very difficult topic you can not enforce with a world-wide service.

You don't have to follow Godwin's law to make a point here. There are countries where child pornography is - within certain degrees - somewhat legal and socially accepted. In mid-Europe we do have a more relaxed point of view related to nudity. In contrast, we do not accept certain levels of brutality and violence like I've seen in some TV productions when I was living in the USA.

So there is an inherent and unsolvable conflict between "enforcing some rules" and "providing a service world-wide". This results in subjective censorship. There are always groups of people who are upset when a service provider regulates its service somehow. While this situation also holds true for open, distributed services, local servers hosting illegal content can easily be taken down by law enforcement, whereas big centralized web-based services often do not react to such requests or need to be forced by law before they cooperate. I don't get why it is practically impossible to upload a nipple while child pornography and other highly problematic content stays online for months even after it got reported.

Rules, rules, rules.

I've got an issue with even less dramatic rules and content. For example, I'm not able to post something to r/privacy when it contains a link to an article on my blog even though I don't make any money with my site. Therefore, readers of reddit will never discuss my privacy-related work with me although I think that my contributions are worth reading.

Issue: User Account Hurdle

With every web-based forum, you need to have yet another user account. While this is acceptable for services you consume on a daily basis, this gets tedious when you just have one quick question in that forum where people talk about this new gadget you just bought.

Of course, you must not share passwords among different services. So you need to curate more and more different account credentials. By now, I probably have credentials for one hundred web-based forums.

Whenever I have just one question I'd pose in a specific web-based forum, I hesitate before creating a new account. I've lost too many nerves on the bad usability of registration processes.

The situation is even worse: when I stumble over a thread in a forum where I know exactly how to solve the issue mentioned and I don't have an account for that forum, I just don't accept that ten to fifteen minute registration effort and learning curve to know how to operate the interface and contribute. It's sad but true.

What to Do About It?

Now that I have explained the most important reasons why centralized, web-based forums aren't a good idea at all, you might want to read about things you can do differently or alternatives to those forums.

Some issues mentioned above could be fixed. Some issues can not be fixed because they are fundamental technical and business/political issues of centralized web-based platforms. Therefore, you need to fix most issues by using a different concept in the first place.

Fixing Web-Based Platforms

In order to overcome some issues, platforms might open up and agree to follow open standards for adding content, getting content out of the platforms, and synchronizing with separate instances.

One example is Lemmy, which is a free, federated reddit alternative. It works similar to email, where users are able to freely choose any provider they want: their local Internet provider, a server they run on their own, web-based email providers like Gmail, and so forth. When you do not like your current instance, you move over to a different one, taking your data with you.

From my current point of view, I would say that the chances of reddit, Facebook, and the others switching to an open approach are zero point zero. On the contrary: they do whatever they can to lock in their users and their data even more. Money can only be made with maximized time on their platform and not somewhere else. So you're the product being sold, not the user.

However, the good news is that we already do have alternatives that have been around for a couple of years or even decades. They have reached a level of maturity most modern platforms will never reach before they collapse for various reasons. So let's take a look at a few of them in the next sections.

Alternative: NNTP

When looking for alternatives, the good news is that we already do have plenty of them.

In contrast to web-based platforms, email as an open and federated/distributed standard is far from being dead despite all the articles that said so. Of course, email is no replacement for web-based platforms. However, there are technologies that are almost as old as email that provided very good forum services for many, many years until the big companies privatized forum content and locked it into their closed services. The most prominent example is the Usenet, or "Newsgroups" as they are called. This is why we need to remember that there was a time before the big web-based platforms where people freely exchanged postings in threads on all kinds of topics elsewhere.

The open standard protocol used for the Usenet is called NNTP and there are tons of great clients speaking NNTP, Thunderbird being one of the most prominent ones. For any type of special need (remember the people with impairments from above!), you can get text-based Usenet clients, mobile clients, professional clients and even web-based NNTP clients. You can choose an interface that reflects your software environment, technical knowledge, level of features, simplicity and taste. This gives you anything from simple features like "hide already read articles" up to fancy stuff for dealing with high-volume Usenet consumption such as scoring.

As a user of the Usenet, you can fetch messages from one or many different servers. So you most probably need only one single account for accessing all major newsgroups worldwide, provided your server is well connected.

With NNTP being an open standard, anybody is able to "back up" or archive Usenet content. For example, this server holds an archive of my local Usenet server (of Graz University of Technology) from 2001 onward and provides a nice search feature.

Update 2022-04-10: Recently, the newsarchive server was discontinued due to lack of public interest and too much hassle with deletion requests by people who are afraid that old postings might get found. However, due to the open nature of the service, you can still browse through the archive here.

Alternative: Private Blogs with Feeds

Another way to publish articles on the Internet is a personal blog. The text you're reading is hosted on my personal blog which is running on my server. I even wrote my own software for blogging.

However, you don't have to do this at all. You can start your personal blog using one of the manifold blogging services out there. This way, you don't have to have much technical knowledge. You just concentrate on writing short or long articles and share them with the world.

If you choose to blog yourself, please do make sure that a few things are working fine. The page should be indexed by the Wayback Machine in order to have a fall-back for your content in case something happens to your server instance. In 2016, they already covered over 477 billion web pages. This page explains how to add your page to the archive and this does it for whole sites. If you can afford it, please do donate a few bucks so that they are able to continue this service.

If you're tech-savvy, you should definitely read the Manifesto for Preserving Content on the Web "This Page is Designed to Last". It describes all necessary things to make sure that your content can be accessed as long as possible. It's not that hard. Actually, it's more about not doing things than about investing extra effort.

In general, you should make sure that your articles are indexed by independent search engines. This way, people are able to locate your thoughts and ideas by querying the Internet in contrast to "be on one single platform whose algorithm decides to show this content". Pages that can be indexed and therefore found on the Internet are part of the free web, in contrast to the Deep Web.

When you're publishing great articles on your blog, you don't want to force your readers to re-visit your page every day in order to find your new articles. There is an awesome solution to this issue as well. Actually, there are two standards that solve this issue. The older and much better known one is RSS. The more modern standard that accomplishes the same is called Atom. Users subscribe to RSS or Atom feeds by adding their URL to the software that deals with those feeds, which is called a news aggregator. From the user perspective, you don't have to care much about the standards since all modern software solutions can deal with both feed types. If both feed standards are provided, choose Atom.
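If you are curious what an aggregator does with such a feed, the following Python sketch shows the core idea: parse the Atom XML and report only the entries that have not been seen before. It uses only the standard library. The embedded feed and the `blog.example.org` URLs are made-up sample data, and `new_entries` is a hypothetical helper name, not part of any real aggregator.

```python
import xml.etree.ElementTree as ET

# XML namespace that Atom elements live in.
ATOM = "{http://www.w3.org/2005/Atom}"

# Made-up sample feed; a real aggregator would download this from the feed URL.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Blog</title>
  <id>https://blog.example.org/</id>
  <updated>2022-04-10T12:00:00Z</updated>
  <entry>
    <title>Second article</title>
    <id>https://blog.example.org/2022/second</id>
    <link href="https://blog.example.org/2022/second"/>
    <updated>2022-04-10T12:00:00Z</updated>
  </entry>
  <entry>
    <title>First article</title>
    <id>https://blog.example.org/2022/first</id>
    <link href="https://blog.example.org/2022/first"/>
    <updated>2022-03-01T08:00:00Z</updated>
  </entry>
</feed>
"""


def new_entries(feed_xml, seen_ids):
    """Return (id, title, link) for every entry whose id is not in seen_ids."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.iter(f"{ATOM}entry"):
        entry_id = entry.findtext(f"{ATOM}id")
        if entry_id in seen_ids:
            continue  # the reader already knows this article
        link = entry.find(f"{ATOM}link")
        href = link.get("href") if link is not None else None
        fresh.append((entry_id, entry.findtext(f"{ATOM}title"), href))
    return fresh
```

Calling `new_entries(SAMPLE_FEED, {"https://blog.example.org/2022/first"})` returns only the second article. The set of seen IDs is stored on the reader's side, so "what is new for me" is decided by the reader's software and not by a platform.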

Logo for RSS and Atom.

This way, people using a web-based aggregator service or local aggregator software are able to get their personal news feed. As a user of aggregators, you take back control. You can even read the articles when being completely offline while taking a train or flying in a plane. I really can not imagine a decent knowledge worker who does not use this great concept.

One final advantage of running personal blogs is that you can keep your privacy and the privacy of your readers. In contrast to centralized, web-based platforms, the access logs won't be analyzed and sold. It's much harder to automatically derive personal profiles from distributed, heterogeneous blog sites than with centralized, closed platforms.

Alternative: Mixed Approaches

Let's assume you are using the Usenet or your personal blog for publishing articles, questions, opinions, whatever. Of course, you can then also post to centralized, closed web-based forums and link to your original content. This way, you get the visibility on those platforms while the content is still archived and can be found with search engines and so forth.

One thing that still persists is the example of certain sub-reddits having rules where postings of people adding links to their personal blog are deleted automatically. As much as I understand some of it with regard to people self-promoting commercial sites, I don't understand it for personal blogs where no commercial interest is involved. As a consequence, I can not participate in and contribute to the privacy subreddit with my thoughts, as I have already mentioned briefly above.

Summary

A hat-tip to everybody who read this far. You may have noticed that it's very important to me to explain the negative implications of centralized, web-based forums. Most implications will affect us only in a couple of years. The urgency of the matter lies in the fact that by the time you realize the implications, it will be too late to save or undo anything.

Therefore, it's necessary to learn about the inevitable data-loss that those services will cause in order to plan for it and deliberately make good decisions starting from now. By distributing content and using open platforms that can be interconnected and share content freely, most of the threats are addressed while getting the advantages of choosing your own interface and so forth.

So let's go ahead and stop dragging books into libraries that are known to be burned down in a couple of years for sure.

After reading this very long article, you have earned a picture of a cat:

A cat named Murli.

Comment by Erik

Erik added a Disqus comment which I would like to include here as well in order to be read by people who do not activate JavaScript or Disqus on my site. I also added links to it:

The indieweb movement calls it POSSE for "Publish (on your) Own Site, Syndicate Elsewhere". Or the other way round: PESOS for "Publish Elsewhere, Syndicate (to your) Own Site". Either way you preserve your own content on your own site.

I wasn't aware of the indieweb movement nor that my suggested approach does have a name. Thank you very much for this. I'm completely on their side.

For a couple of years, I've been following this principle with my engagement on Twitter and Mastodon as well. I post new status updates using my current Mastodon account only and I have set up a cross-posting service to "the bird-site". This way, I enjoy the fresh community interaction of a federated and free platform while keeping the old service fed with messages until I quit Twitter for good. This is a temporary workaround for the chicken-and-egg problem and a valid approach if your Mastodon instance has a rule-set that allows bi-directional cross-posting. I moved from a limited instance to my current Mastodon instance for that. It is truly amazing to see a great federated service that supports moving your account that smoothly.

Comment by Gustavo

Hi Karl,
this is a very good post. I've been moving my own site to Org (from Wordpress.com), and have found plenty of good food for thought here. Thank you very much!

You're welcome.

I have one practical question about storage of content in the Internet Archive/Wayback Machine. I've seen the links you provided, and besides Archive-It, which is a paid subscription service (fair, of course), I've found no way to do it systematically and automatically. How do you do it? I'd love to be able to put this in a script and let systemd take care of scheduling, but I'm afraid something more manual will be required.

Well, I was lucky enough that Archive.org decided to archive my web site periodically. I can not influence the frequency. So I just "blindly" generate the archive.org URL with every new article. If you click on the archive link of a brand new article, you will notice that archive.org has not yet grabbed the content and made it available via their service. After a while (you can see their crawling frequency in older articles of mine), the content appears on archive.org.
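For illustration, "blindly" generating such a link can be as simple as the following Python sketch. It is not the code of my blogging software. It assumes the public Wayback Machine URL pattern `https://web.archive.org/web/*/<page URL>`, which lists all captured snapshots of a page. The function names and the example URL are hypothetical.

```python
import html

# Assumed URL pattern of the Wayback Machine's "all snapshots" overview.
WAYBACK_PREFIX = "https://web.archive.org/web/*/"


def archive_link(page_url):
    """Build the overview link for page_url without asking the archive first."""
    return WAYBACK_PREFIX + page_url


def footer_html(page_url):
    """Return an HTML snippet for the page footer that points to the archive."""
    target = html.escape(archive_link(page_url), quote=True)
    return f'<a href="{target}">Archive</a>'
```

No network request is involved: the link is derived from the article's own URL when the page is generated, which is why it may point to an empty overview until the first crawl has happened.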

So far, that's fine with me. The main thing is that they begin to fetch my content and that older articles are fetched with certainty.

I periodically send them money but I don't have a subscription account so far. If you do have questions about The Wayback Machine and its archiving service, please do read their FAQs.
