Files are fraught with peril
This is a pseudo-transcript for a talk given at Deconstruct 2019. To make this accessible for people on slow connections as well as people using screen readers, the slides have been replaced by in-line text (the talk has ~120 slides; at an average of 20 kB per slide, that's 2.4 MB. If you think that's trivial, consider that half of Americans still aren't on broadband and the situation is much worse in developing countries).
Let's talk about files! Most developers seem to think that files are easy. Just for example, let's take a look at the top reddit r/programming comments from when Dropbox announced that they were only going to support ext4 on Linux (the most widely used Linux filesystem). For people not familiar with reddit r/programming, I suspect r/programming is the most widely read English language programming forum in the world.
The top comment reads:
I'm a bit confused, why do these applications have to support these file systems directly? Doesn't the kernel itself abstract away from having to know the lower level details of how the files themselves are stored? The only differences I could possibly see between different file systems are file size limitations and permissions, but aren't most modern file systems about on par with each other?
The #2 comment (and the top replies going two levels down) are:
#2: Why does an application care what the filesystem is?
#2: Shouldn't that be abstracted as far as "normal apps" are concerned by the OS?
Reply: It's

Files are fraught with peril, a topic explored in a pseudo-transcript for a talk given at Deconstruct 2019. The presentation aimed to make the content accessible for users on slow connections and those using screen readers by replacing slides with inline text. The talk, spanning around 120 slides, totals 2.4 MB, a figure that may seem trivial, but it's essential to consider the global internet access disparities. Half of Americans still lack broadband, and the situation is even more challenging in developing countries.
The talk delved into the misconception that files are straightforward for developers. To illustrate this, the speaker examined the top comments from r/programming, a widely read English-language programming forum on Reddit, when Dropbox announced it would only support ext4 on Linux, the most widely used Linux filesystem. The top comment questioned why applications need to directly support file systems, arguing that the kernel abstracts these details. The commenter suggested that the only differences between file systems are file size limitations and permissions, with modern systems being largely comparable.
The second comment posed a similar question: "Why does an application care what the filesystem is?" Responses emphasized that the operating system should handle such abstractions. However, a deeper dive revealed that this abstraction is not perfect. A key reply highlighted that each file system has its own bugs and requires specific fixes in the Dropbox codebase. Supporting more file systems necessitates additional testing to ensure functionality.
Further replies questioned Dropbox's need for file system specificity, pointing to existing tools like inotify and distributed storage software that function seamlessly. The confusion stemmed from the assumption that applications should not be concerned with file systems, given the kernel's role in abstraction. Yet, the reality is more complex. File systems can introduce subtle issues that impact application performance and reliability, even if the kernel mitigates many differences.
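To make the "leaky abstraction" point concrete, here is a minimal sketch of how an application might probe a directory for behavior that depends on the underlying file system. It is not taken from the talk or from Dropbox's code; the function name `probe_directory` and the two properties it checks (maximum filename length and case sensitivity) are illustrative assumptions chosen because they are easy to observe through the standard library.

```python
import os
import tempfile


def probe_directory(path):
    """Report a few behaviors that can differ between file systems.

    Hypothetical helper for illustration only; a real application
    would care about many more properties than these two.
    """
    results = {}

    # Maximum filename length as reported by the file system.
    # os.statvfs is only available on POSIX platforms, so guard for it.
    if hasattr(os, "statvfs"):
        results["name_max"] = os.statvfs(path).f_namemax

    # Case sensitivity: create a file with a lowercase name, then check
    # whether the same name in uppercase resolves to an existing file.
    with tempfile.TemporaryDirectory(dir=path) as scratch:
        lower = os.path.join(scratch, "probe.tmp")
        with open(lower, "w") as f:
            f.write("x")
        upper = os.path.join(scratch, "PROBE.TMP")
        results["case_sensitive"] = not os.path.exists(upper)

    return results


if __name__ == "__main__":
    # The values printed depend on the file system backing the temp directory.
    print(probe_directory(tempfile.gettempdir()))
```

The same call can return different answers for two directories on one machine if they live on different file systems, which is the kind of difference the kernel interface does not hide.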
In conclusion, the talk underscored that while developers often overlook the intricacies of file systems, they are far from simple. The abstraction provided by the kernel is leaky, and applications must still account for file system-specific nuances. This awareness is crucial for building robust, cross-platform applications that function reliably across diverse environments. The discussion serves as a reminder that even seemingly mundane aspects like file systems can pose significant challenges, requiring careful consideration and testing.










