- This article is not about what developers do with your data in bad faith and whether telemetry is ethical or not.
- Unfortunately, the sources and examples I provide might not be credible, as telemetry is uncommon on the Linux desktop.
- This article is aimed at projects whose goal is to adapt their software to less technical users.
- Definition and Methods of Gathering User Data
- Problems With Opt-in Telemetry and Asking Users
  - Prone to Inaccurate Data
  - Undiscoverable for Less Technical Users
  - Aggressively Shifting Focus to Ask for Feedback
- Opt-out Telemetry
Telemetry is one of the most controversial topics in the Linux community. Many people believe that telemetry is entirely meaningless, because developers can “just” ask their users. Some people also argue that users can opt into telemetry if they want to participate, but most of them agree that opt-out telemetry shouldn’t exist in the first place.
However, I don’t believe that asking users or relying on opt-in telemetry helps to a degree where developers and designers can form educated conclusions, as both methods share many issues with gathering data accurately. In this article, we’re going to explore the issues around asking users and opt-in telemetry, and then I will explain why opt-out telemetry is a better approach to gathering accurate data and forming educated conclusions.
Definition and Methods of Gathering User Data
Per Wikipedia, telemetry “is used to gather data on the use and performance of applications and application components, e.g. how often certain features are used, measurements of start-up time and processing time, hardware, application crashes, and general usage statistics and/or user behavior. In some cases, very detailed data are reported like individual window metrics, counts of used features, and individual function timings.”
There are three main methods of gathering user data. In this article, they are defined as follows:
- Asking users: when developers explicitly ask users for feedback through surveys or similar.
- Opt-in telemetry: when users explicitly enable telemetry, meaning telemetry is disabled by default.
- Opt-out telemetry: when users explicitly disable telemetry, meaning telemetry is enabled by default.
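To make the difference between the last two definitions concrete, here is a minimal sketch of how an app might decide whether telemetry is active. The names `TelemetryPolicy` and `telemetry_enabled` are hypothetical and do not come from any real project; the only point is that the two policies differ in what happens when the user never touches the setting.

```python
from enum import Enum
from typing import Optional


class TelemetryPolicy(Enum):
    """Hypothetical project-wide policy; not taken from any real app."""
    OPT_IN = "opt-in"    # telemetry is disabled by default
    OPT_OUT = "opt-out"  # telemetry is enabled by default


def telemetry_enabled(policy: TelemetryPolicy,
                      user_choice: Optional[bool] = None) -> bool:
    """Return whether telemetry is active for one user.

    user_choice is None when the user never touched the setting,
    True when they explicitly enabled it, False when they disabled it.
    """
    if user_choice is not None:
        # An explicit choice always wins, whatever the default is.
        return user_choice
    # Otherwise the policy's default applies.
    return policy is TelemetryPolicy.OPT_OUT


# A user who never opens the settings:
print(telemetry_enabled(TelemetryPolicy.OPT_IN))   # False
print(telemetry_enabled(TelemetryPolicy.OPT_OUT))  # True
```

In both cases an explicit choice is respected; the policy only determines what happens for users who never make one.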
Problems With Opt-in Telemetry and Asking Users
There are several problems with opt-in telemetry and asking users: they are prone to inaccurate and misleading data, they’re undiscoverable for less technical users, and if they are discoverable, then they are usually displayed in such a way they aggressively shift users’ focus.
Prone to Inaccurate Data
The main (and harmful) problem with asking users and opt-in telemetry is inaccurate and biased data provided by enthusiasts.
I’m going to take “gnome-info-collect: What we learned” as an example. Recently, GNOME conducted research to gather data from its users. Users were encouraged to participate, but the research was entirely opt-in: they needed to install a package named “gnome-info-collect” and run it. From the data GNOME collected, we got the following results for distributions used:
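| Distro | Number of responses | % of responses |
| --- | --- | --- |
| Fedora | 1376 | 54.69% |
| Arch | 469 | 18.64% |
| Ubuntu | 267 | 10.61% |
| Manjaro | 140 | 5.56% |
| Other | 78 | 3.10% |
| EndeavourOS | 66 | 2.62% |
| Debian | 44 | 1.75% |
| openSUSE | 38 | 1.51% |
| Pop! | 38 | 1.51% |
| Total | 2516 | 100.00% |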
Notice that Fedora Linux users make up the largest group of participants, more than half of all responses, followed by Arch Linux and Ubuntu users. However, the consensus seems to be that Ubuntu is the most used Linux distribution (excluding Android). Ubuntu Desktop (the main Ubuntu in the lineup) ships with GNOME, just like Fedora Workstation, which suggests that most GNOME users are actually Ubuntu users, not Fedora Linux users (or even Arch Linux users). In this case, Fedora Linux and Arch Linux users would be considered vocal minorities: each group contributed more responses than Ubuntu users did, despite having fewer users overall.
Vocal minorities and enthusiasts are not representative of the entire userbase, especially in the context of software designed for less technical users. These data can be harmful, as they are often inaccurate and unrepresentative. Especially in user-interface (UI) and user-experience (UX) research, these data can skew the opinions and conclusions of developers and designers, which could exclusively benefit vocal minorities but harm their target audience(s).
This also makes it difficult to judge which statistics are accurate, or even to form a moderate conclusion, because we can’t tell which data set is representative unless there is a consensus on the matter.
In the above example, since there is a consensus that Ubuntu is the most used distribution, we have a reference point showing that the data provided is not representative. However, in topics that lack consensus, our conclusions may be misguided.
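The vocal-minority effect described in this section can be sketched with a bit of arithmetic. Every number below is hypothetical and chosen only for illustration; none of it comes from GNOME’s data. I assume a made-up userbase split across three distributions and a made-up rate at which each group answers an opt-in call for data, then compute what share of the responses each group ends up with.

```python
# HYPOTHETICAL numbers, for illustration only (not from gnome-info-collect).
# Share of the whole userbase on each distribution:
population_share = {"Ubuntu": 0.60, "Fedora": 0.25, "Arch": 0.15}
# Fraction of each group that answers an opt-in call for data:
response_rate = {"Ubuntu": 0.002, "Fedora": 0.020, "Arch": 0.015}


def expected_sample_share(population_share, response_rate):
    """Expected share of responses per group under self-selection."""
    # Each group's weight is its size multiplied by its willingness to respond.
    weights = {
        group: population_share[group] * response_rate[group]
        for group in population_share
    }
    total = sum(weights.values())
    return {group: weight / total for group, weight in weights.items()}


shares = expected_sample_share(population_share, response_rate)
for group, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{group}: {share:.1%} of responses, "
          f"{population_share[group]:.0%} of users")
```

Under these assumptions, the largest group of users ends up with the smallest share of responses, simply because its members are less likely to hear about the call and answer it.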
Undiscoverable for Less Technical Users
Surveys and opt-in telemetry may be undiscoverable, as users must first be made aware of them in some way.
Opt-in telemetry settings are often so obscure that users will rarely discover them. Once again, this means telemetry is typically enabled by vocal minorities and enthusiasts, like myself, which may bias the results. For example, to enable telemetry in Mozilla Thunderbird, we go through the hamburger menu → Settings → Privacy & Security, then scroll all the way down to “Thunderbird Data Collection and Use”. I only discovered it because I felt like exploring the Settings.
As for asking users, developers need a way to reach them. To quote GNOME’s article:
The people who provided their data with gnome-info-collect were primarily recruited via GNOME’s media channels, including Discourse and Twitter. This means that the data we collected was on a fairly particular subset of GNOME’s users: namely, people who follow our social media channels, are interested enough to respond to our call for help […]
Typically, enthusiasts will participate in surveys and/or opt into telemetry, as they’re active in chat rooms, follow the project on social media and/or its blog, participate in forums, etc. However, there won’t be an easy way to reach less technical users without implementing annoyances. This makes it very difficult to gather data that is accurate and representative of the entire userbase.
Aggressively Shifting Focus to Ask for Feedback
A common tactic for asking users for feedback, and the most “effective” one I know of, is to display popups, banners, notifications, or other elements that shift users’ focus.
For example, LibreOffice sometimes asks for feedback using a banner:
If you’ve ever seen this banner on LibreOffice or other apps, then you have probably clicked the close button, which is a totally normal thing to do. When you open LibreOffice, you don’t open it to participate in surveys; you typically open it to actually use LibreOffice.
When you’re in the middle of using an app and are prompted to shift your focus onto something that is completely unrelated to your intended goal, you will likely do everything you can to get rid of it. If there is a “do not show again” button, you will likely press that as well, so it doesn’t annoy you again.
There are some exceptions, though. Some might want to participate if they have the time. However, this brings us back to data that may be inaccurate and unrepresentative.
Many users may not have the time or be in the mood to participate in surveys for whatever reason, myself included. As pointed out previously, when you open an app, you open it to use it, not to participate in a project. I maintain and contribute to several projects, play video games with friends, spend time with family, etc. Many of us additionally have jobs and chores to do. We may not always feel like spending time filling out a survey, as we’re already exhausted and have very little time to spare. At that time, I was creating a resumé.
At the same time, even if a survey only takes 5 minutes, I personally prefer to participate properly and in the right mood, providing useful information and answering questions correctly. That kind of mood rarely strikes me, so I rarely participate in surveys.
Opt-out Telemetry
Opt-out telemetry, despite its controversies, addresses those problems by including everybody by default, less technical users among them, unlike opt-in telemetry and asking users. This means that less technical users’ data have a better chance of shaping results that projects can use as a reference to form educated conclusions and tailor their software around their target audience(s) and the majority of the userbase.
In addition, since opt-out telemetry is enabled by default, there will be more participants as a result. Of course, users can always opt out if they want, but at least this data gets us much closer to something that is representative and accurate.
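As a rough sketch of why the default matters so much for participation, assume that only a small fraction of users ever find the telemetry setting and change it, and that this fraction is the same under both policies. Both the function `expected_participants` and the numbers are hypothetical; real projects would have to measure these rates themselves.

```python
def expected_participants(users: int, enabled_by_default: bool,
                          change_rate: float) -> int:
    """Estimate how many users end up sending telemetry.

    change_rate is the (assumed) fraction of users who find the
    setting and flip it away from its default value.
    """
    changed = round(users * change_rate)
    if enabled_by_default:
        # Opt-out: everyone participates except those who disable it.
        return users - changed
    # Opt-in: nobody participates except those who enable it.
    return changed


users = 100_000      # hypothetical userbase
change_rate = 0.05   # hypothetical: 5% of users ever change the default

opt_in = expected_participants(users, enabled_by_default=False,
                               change_rate=change_rate)
opt_out = expected_participants(users, enabled_by_default=True,
                                change_rate=change_rate)
print(f"opt-in participants:  {opt_in}")
print(f"opt-out participants: {opt_out}")
```

The sketch ignores who those users are. In practice, the few who change the default are likely to be the same enthusiasts discussed earlier, which is why the opt-out sample is also the one more likely to contain less technical users.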
Opt-in telemetry and asking users for feedback may produce data that is unrepresentative and inaccurate, which can be misleading and lead to misguided conclusions and incorrect decisions. Furthermore, these methods may not reach less technical users, as they don’t follow the projects’ social media, forums, chat, etc. Asking users for feedback can also be obnoxious, because the prompts are designed to shift users’ focus to something that they did not anticipate, which can be distracting. Surveys can also be time-consuming, given the circumstances we may live in.
This is not to say that asking users and opt-in telemetry are bad things. I personally support asking users specifically for feature proposals; in fact, this is what issue trackers are for. However, I do not support asking users for general feedback, because general feedback struggles with representativeness.
In my opinion, inaccurate data are more harmful than no data.