Please Offer An Excel Export Option
By Evan Miller
November 30, 2014
I know I know know. Excel is terrible, binary formats are evil, R/Python/open source is the future, et cetera et cetera et cetera.
Still, most people of your website’s users are also Excel users. When they export their data as CSV, they’ll probably just bring it into Excel first to have a look around. You should probably offer an explicit XLS export, and take it seriously. Yes, I’m talking to you.
What does it mean to take XLS seriously? It means supporting the data-transport features of XLS that aren’t present in CSV. CSV is a pretty awful data transport when it comes down to it. It lacks semantics for basic data features as:
- Specifying the text encoding
- Lossless encoding of double-precision numbers
- Dates, times, and durations
- Number formatting
Well, ignore that last one. It does take care to export an error-free, semantically rich XLS document, but I think the effort is worth it.
Data is ultimately meant to be consumed by people. This may come as a shock, but most people are not computer programmers. They are not going to read your CSV with Python and process it with Pandas or Numpy. They are not versed in the libcsv API, nor do they possess ambitions to convert your CSV to JSON in order to build a web mashup. Of the grammar of graphics, they know nothing.
In all likelihood they are going to open your CSV file in Microsoft Excel 2004, or if you’re lucky Microsoft Excel 2007, and they are going to spend hours building pixelated pie charts, bar charts, and 3D line graphs. You might not use Excel to analyze data, but they almost certainly will. So why not accommodate their intentions as well as you can? Isn’t programming fundamentally about helping users achieve their goals?
“But we let our users analyze their data with our online tools!” I am sure your online tools are innovative and beautiful and follow best practices. I am also sure that they implement, at most, 1% of the functionality of Excel. Please stop assuming your homebrew charting interface is the end-all of data analysis. Let the data out of its cage already.
Here are more amazing things of which CSV files are utterly incapable, but which Excel handles with aplomb:
Pretty printing. People want to look at data in tabular form, and they’d like it to be easy on the eyes. You can add some bold, italic, indentation, and tasteful cell merges without detracting from the semantics of the underlying data. It’s easy to rage against over-formatted, under-defined Excel files downloaded from the Department of Spreadsheets, but people derive value from things that they can easily understand, even if your Python script can’t make heads or tails of that wild and krazy XLS file from outer space.
Interactive formulas. This is my favorite feature of the XLS format: Excel lets you show relationships between data. If a value is simply derived from other values in the file, why not encode that relationship? This relationship could be as simple as a percent change, or it could be the p-value from a complex statistical test. Encoding numerical relationships is infinitely preferable to “mystery meat” numbers that pervade most CSV files — not to mention all those XLS files lazily converted from CSV server-side. Excel formulas help people understand where all those numbers come from in a discreet but user-friendly manner.
Pop-up buttons. This may sound trite, but it’s a great feature: In XLS, you can define a set of legal values for a particular cell, and let users choose values from that set. CSV has no such facilities. Compared to the free-for-all nature of CSV, a well-crafted XLS file is an exemplar of restraint and orderliness. Combined with interactive formulas, pop-up buttons let users fulfill the original teleology of PC spreadsheets — creating “What if?” scenarios, and envisioning themselves as all-powerful Gods of the galaxy.
Multiple tables per file. Behold the power of the sheet. A single CSV file can encode, at most, one table. The mighty XLS file can encode an unlimited number of related tables. Howzabout that.
Sure, the XLS format has weaknesses and limitations. The row and column limits are a bummer. I’m not arguing that XLS should replace CSV. But if you have a CSV exporter, it means you want your users to consume data, and so you should probably invest in a high-quality XLS exporter as well. Don’t just run csv2xls and call it a day.
If you’re using a C-derived language, I highly recommend xlslib. I’ve been using the library for a long time, and you’ll find commits in my name implementing formulas, pop-up buttons, and inter-sheet cell references. It’s easy to sniff at Excel, but this stuff is important to most people.
It should be important to you, too.
You’re reading evanmiller.org, a random collection of math, tech, and musings. If you liked this you might also enjoy:
Want to look for statistical patterns in your MySQL, PostgreSQL, or SQLite database? My desktop statistics software Wizard can help you analyze more data in less time and communicate discoveries visually without spending days struggling with pointless command syntax. Check it out!
Statistics the Mac way