Jupyter, Mathematica, and the Future of the Research Paper

The Atlantic has a great article on new ways to share research results. Its three parts make three points:

  1. A graphical user interface (GUI) can facilitate better technical writing.
  2. Wolfram’s proprietary notebook showcased innovative technology, but decades after its introduction, still has few users.
  3. Jupyter is a new open-source alternative that is well on the way to becoming a standard for exchanging research results.

Each is spot on. I had to learn the hard way why so many kept their distance from Mathematica. Now, I’m much more productive with Jupyter. I’m experimenting with, and excited about, its potential as a way to write up research results.

The open question

The article asks why Jupyter succeeded where Mathematica failed. The obvious contrast is between the proprietary world of Wolfram and the open-source model of the software ecosystem that Jupyter mobilizes.

The Mathematica developers claim that the hierarchy afforded by the proprietary model is a better way to organize innovation. To their credit, Mathematica did open up a huge technical lead in the 1990s. (Pay no attention to the preposterous suggestion that it is still the technological leader.) There are, of course, many offsetting examples of visionaries who succeeded by mobilizing an open-source community. Still, Mathematica’s early lead offers some support for the claim that from the perspective of software engineering, the proprietary model may sometimes have its advantages.

The difference that matters

This technical engineering dimension is not the only one we should use to compare the proprietary and open models. There is an independent social dimension, where the metrics assess the interactions between people. Does it increase trust? Does it increase the importance that people attach to a reputation for integrity?

It is along this social dimension that open source unambiguously dominates the proprietary model. Moreover, at a time when trust and truth are in retreat, the social dimension is the one that matters.

Jupyter rewards transparency; Mathematica rationalizes secrecy. Jupyter encourages individual integrity; Mathematica lets individuals hide behind corporate evasion. Jupyter exemplifies the social systems that emerged from the Scientific Revolution and the Enlightenment, systems that make it possible for people to cooperate by committing to objective truth; Mathematica exemplifies the horde of new Vandals whose pursuit of private gain threatens a far greater public loss: the collapse of social systems that took centuries to build.

Membership in an open source community is like membership in the community of science. There is a straightforward process for finding a true answer to any question. People disagree in public conversations. They must explain clearly and listen to those who respond with equal clarity. Members of the community pay more attention to those who have been right in the past, and to those who enhance their reputation for integrity by admitting in public when they are wrong. They shun those who mislead. There is no court of final appeal. The only recourse is to the facts.

It’s a messy process but it works, the only one in all of human history that ever has. No other has ever achieved consensus at scale without recourse to coercion.

In science, anyone can experiment. In open source, anyone can access the facts of the code. Linus Torvalds may supervise a hierarchy that decides what goes into the Linux kernel, but anyone can see what’s there. Because the communities of science and open source accept facts as the ultimate source of truth and use the same public system for resolving disagreements about the facts, they foster the same norms of trust grounded in individual integrity.

The answer to the question and the lesson we should learn

So here is my conjecture about the question the article poses. Mathematica failed, despite technical accomplishments, because the norms of its developers clashed so obviously with the norms of its intended users. Jupyter is succeeding because the norms of the community that is developing it are aligned with the norms of its users.

This answer does not give me much comfort. If Stephen Wolfram’s personality had made him just a bit better at faking both sincere apologies and sincere promises to do better, things might have turned out differently. The clash might not have been apparent to users until it was too late.

The take-away lessons are not to be seduced by promises of shiny technology from some proprietary initiative, even one that seems to have no strings attached; to ignore the personality of the leader of the proprietary effort; to go with the non-proprietary alternative that is fully committed to the open model; and if it doesn’t exist, create it.

Which reminds me. If you are a Julia enthusiast, how do you suppose the investors* in this new language plan to make their big score?

*(Edit Sept. 2021: Removed the link to the Crunchbase entry for Julia Computing because it uses an opaque “data” url.)

My experience with Mathematica

In 2015, I tried to share some research results in a Mathematica notebook. I knew that Wolfram’s proprietary business model made it difficult for anyone to check many of the assertions it made. I anticipated neither the dishonesty that this would facilitate nor the cost in wasted time that it would impose.

Then, I still clung to the belief that for a for-profit corporation, the risk of damage to its reputation would keep dishonesty in check, just as it did for a person. I interpreted examples of corporate dishonesty the same way that I interpreted instances of scientific fraud, as unrepresentative exceptions. I was slow to recognize that under the proprietary software model, dishonesty isn’t a bug; it’s a feature.

I’ve been revising my expectations, but it’s so hard to keep up. I can remember a time when “You opted-in” meant “We tricked you fair and square.” Now it’s little more than a short-hand for the Bart Simpson defense, “I didn’t do it, no one saw me do it, you can’t prove anything!” I even remember how people once accepted the common law principle that a contract is not complete if its terms and conditions are unclear.

So back in 2015, full of naive optimism, I set out to correct something that was wrong in a published paper. (Yes, I know. It captures the point I am trying to make, that the publications of science are different from the content of the internet, which poses a threat that we have been too slow to appreciate.) I needed to present some symbolic calculations to prove that the steady-state approximation that the paper relied on was fatally flawed, and some numerical results, summarized in graphs, which showed that the error it caused was important.

On technical grounds, the Mathematica notebook was the perfect vehicle. It let me interleave typeset text and math with tables and figures that summarized the numerical calculations, and do so in a way that made it easy for anyone to replicate my results. My plan was to distribute a PDF of the static output from one run of the notebook and to invite anyone who wanted to replicate its results to download the notebook and run it using the required Wolfram software.

Now, in my defense, I have to explain that I had used the Mathematica REPL (read, evaluate, print loop) on code and never had any reason to write paragraphs of typeset text as notes to myself. The REPL is quick only if it prints to the screen, so I had rarely tried to print to PDF. (I did save individual graphs as PDFs and this worked just fine.)

This meant that when I embarked on the production of a document that I could share with others, I had not paid any attention to the typography of the typeset text and math in the PDFs that Mathematica generates. As I wrote, the screen version of the notebook interface lived up to its promise; the typeset text and math looked good. But when I tried to print to PDF, I discovered that the built-in article styles had typography that was bad, absurdly bad, so bad that someone must have worked at making it bad. I tried to fix a print style, but gave up. Combinatorial explosion easily overwhelms trial and error via a GUI. I extracted barely acceptable PDF output by making small changes to a screen style and cut my losses.

Wolfram made it hard to share a readable PDF version of a notebook because it wanted someone like me to distribute content in its proprietary file format, the CDF. It offered a free player, analogous to Adobe’s PDF reader, albeit one that required a 1.3 gigabyte download. To keep PDF output from leaking out of Mathematica’s walled garden, this player, like the full Mathematica application, was geared only to on-screen display. The tell that this was an intentional, hidden part of Wolfram’s strategy was that the same people who had been so responsive to other questions when I explored the possibility of using notebooks to share research results went silent when I asked how to print a PDF with reasonable typography. They knew how. This was how they converted notebooks into articles for their in-house Mathematica Journal. It must surely be how Stephen Wolfram produced his books.

Wolfram knew how to do what I wanted to do. It did not want me to be able to do it. It pretended, dishonestly, that I would be able to, and refused, dishonestly, to admit that it did not want me to be able to do it.

I’m happy with Jupyter

I stopped using Mathematica and gave up on notebooks, so it was only recently that I discovered how easy it is to use the Jupyter notebook as a front end for Python libraries. It offers the best REPL I’ve ever used. It does a better job of delivering what Theodore Gray had in mind when he designed the Mathematica notebook. It lets me get quick feedback, via text or graphics, about what happens when I select a line of code and run it.

Python libraries let me replicate everything I wanted to do with Mathematica: Matplotlib for graphics, SymPy for symbolic math, NumPy and SciPy for numerical calculations, pandas for data, and NLTK for natural language processing. Jupyter makes it easy to use LaTeX to display typeset math. With Matplotlib, LaTeX works even in the label text for graphs. (I have not yet tried the major update, JupyterLab, which is still in beta testing.)
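To give a flavor of how these pieces fit together, here is a minimal sketch, not anything taken from my own notebooks. The expression is a made-up example. It assumes SymPy, NumPy, and Matplotlib are installed, and it uses Matplotlib's built-in math rendering for the labels, so it should not need a separate LaTeX installation.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import sympy as sp

# Symbolic step: differentiate a hypothetical example expression with SymPy.
x = sp.symbols("x")
expr = sp.exp(-x) * sp.sin(x)
deriv = sp.diff(expr, x)

# LaTeX source for the result; in a notebook, display(deriv) would typeset it.
deriv_latex = sp.latex(deriv)

# Numerical step: turn the symbolic derivative into a NumPy function.
f = sp.lambdify(x, deriv, "numpy")
xs = np.linspace(0.0, 5.0, 200)
ys = f(xs)

# Graphics step: plot with math typeset in the axis labels.
fig, ax = plt.subplots()
ax.plot(xs, ys)
ax.set_xlabel(r"$x$")
ax.set_ylabel(r"$\frac{d}{dx}\, e^{-x} \sin x$")

# Write the figure to an in-memory buffer; a file name would work the same way.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

In a notebook, each of the three steps can live in its own cell, so it is easy to run one line and see what comes back before moving on.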

I’m more productive. I’m having fun. On both counts, it helps to be able to get an honest answer when I have a question.

I’m frightened by the Vandals

In the larger contest between open and proprietary models, Mathematica versus Jupyter would be a draw if the only concern were their technical accomplishments. In the 1990s, Mathematica opened up an undeniable lead. Now, Jupyter is the unambiguous technical leader.

The tie-breaker is social, not technical. The more I learn about the open source community, the more I trust its members. The more I learn about proprietary software, the more I worry that objective truth might perish from the earth.