Environment as Code

In a previous post, I wrote that Anaconda got in my way when I was trying to learn Python. After I wrote that, I tried to recall the specifics. What was it that I started doing with the official Python tools that I didn’t do with Anaconda? Mostly, I broke the rules and made a mess.


NOTE: Since I published this post, I discovered an error and someone pointed out a second error. I fixed them in the body of the post but to be transparent, I list the changes in the Errata section at the end.


To explain why breaking the rules and making a mess was productive, my first impulse was to invoke the Chaos Monkey at Netflix and the insight from Chaos Engineering: that breaking things is the best way to create incentives that encourage investment in reducing the cost of recovery.

This justifies a particular type of random behavior, but it does not justify breaking such rules as “work in a virtual environment.” To identify the deeper advantage of what I was doing, I thought about where I started and where I ended up: “environment as pet” and “environment as code.”

These names draw on the distinction between servers as “pets” versus “cattle” and the dev-ops concept of “infrastructure as code.” If you haven’t encountered these terms before, DigitalOcean has a good summary:

https://www.digitalocean.com/community/tutorials/what-is-immutable-infrastructure

The Challenge of Global Optimization

One thing that these two mindsets share is that each encourages myopic responses that reinforce the mindset. This is a warning that learning means optimizing on a rugged landscape and that myopic hill-climbing is likely to leave you stuck at the top of a little hill. Most people recognize that environment-as-pet is suboptimal. It is natural, almost inevitable, that people start out in its vicinity because they have relied on a Graphical User Interface (GUI). This means that a fundamental strategy for learning has to be to go beyond local hill-climbing. The only way to learn what’s best is to explore many different hills.

This reveals the real advantage of randomness. It is like simulated annealing as a strategy for global optimization. It gets us to move around and explore widely.

A commitment to this type of randomness will be particularly important in the early stages of the learning process. Rules that make sense for seasoned developers might be harmful for someone with little experience. Seasoned developers have presumably been around long enough to have explored and jumped around on many hills. They also write code that more frequently goes into production, so the cost of any error they might make is higher. For both reasons, they may live according to tight prescriptions and strict taboos that might not serve a young developer.

New learners need to be able to try many strategies. They should experiment and accumulate a wide range of experience. They should expect errors and prepare for them. Otherwise, they risk being trapped early on in the self-reinforcing environment-as-pet mindset that GUIs foster.

For someone with lots to learn, it is crucial that they reduce the cost of error, not because error is inevitable, but rather because error is productive. Or to be precise, because error is an inevitable part of experimentation. A lower cost of errors supports more experiments, hence faster learning.

Environment as Code

Because it is top of mind, I’ll start with a recent decision that shows what environment as code means for me.

While looking for ways to help students on macOS who were trapped by Anaconda, I posed a question to ChatGPT. I don’t rely on it, Copilot, or any of the other LLMs, but I did want to see what the experience would be for a student who tried to follow its suggestions about how to get an official Python to run. One such suggestion was to have Homebrew install a new instance of Python. I generally do not use Homebrew to install Python, but was curious to see what would happen if I started from an Anaconda environment.

When I was done, I didn’t want to use it, but it was too early on my PATH to leave it alone. I could have told Homebrew to uninstall it, but what I did instead was mimic the Chaos Monkey. I deleted /opt/homebrew, the folder with all the software that Homebrew installs. Then I watched to see if I had code in place that reinstalled anything that Homebrew managed that I still needed.

I soon discovered that a build script failed because I had forgotten that it relied on Homebrew for Node. So I added a note reminding myself to install Node before running that script. Environment-as-code doesn’t have to mean automation. Sometimes a script (as in a script that an actor follows) is enough.
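
If a note ever proves too easy to ignore, the same reminder could live at the top of the build script itself. This guard is a hypothetical sketch, not the script I actually use:

#!/bin/bash
# Hypothetical guard: stop early with a reminder if Homebrew's Node is missing.
if ! command -v node >/dev/null 2>&1; then
  echo "Node not found. Run: brew install node" >&2
  exit 1
fi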

This experiment revealed a gap. I was glad I found it. Because I do this kind of thing regularly, I have most of the code in place. It is easy for me to recover. The positive feedback loop is obvious. When it is easy to recover, I’m willing to just try things, which means that I learn and get even better at recovering.

Install a recent version of Anaconda? Sure. An old version? Why not. Use Homebrew to install Python? Sure. Delete my Homebrew directory? Sure.

Environment as Pet

Randy Bias, who first categorized servers as pets or cattle, gave this description of a server as pet:

Servers or server pairs that are treated as indispensable or unique systems that can never be down. Typically, they are manually built, managed, and “hand fed”.

When I first started using the Anaconda distribution of Python, I treated my environment like a pet. I set it up at least partly through a GUI, making a series of changes over time. I knew that I did not have a script I could follow to recreate the current state. This left me afraid that if I broke it, I wouldn’t know how to fix it. Like a devoted pet owner, I thought about cloning it in the vain hope that I could keep it working forever.

An advocate for the Anaconda distribution could surely argue that it is possible to use its package manager, conda, to implement the “environment as code” strategy. Experienced users of Anaconda might chime in saying that this is precisely how they use it.

But even if this is possible in principle, in practice, it is not where most users end up. New users start on the environment-as-pet little hill. By encouraging them to make changes through a GUI, Anaconda keeps them where they are. They will learn to climb the much higher environment-as-code hill only if they try experiments that look difficult and frightening.

In addition to encouraging the use of a GUI, Anaconda makes it hard for a new user to experiment. The first thing a new user learns is to avoid having to repeat its painfully slow installation process. Its rigid insistence on auto-activation makes it extremely difficult for someone to experiment with installing and uninstalling Python. Rigid file formats and the complexity of separately tracking the libraries managed by conda and by pip make it much harder for a new user to start thinking about an environment as something that can easily be described as code. The focus turns inevitably toward keeping an existing install of Anaconda alive and protecting a few virtual environments.
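
To be fair, the auto-activation can be switched off; conda has a setting for it:

conda config --set auto_activate_base false

But a new user rarely knows to look for that setting, which is the point.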

After Anaconda

Anaconda forces every line of code to run inside a virtual environment. Once I broke free, I wanted to learn about the other extreme, so I stopped using virtual environments at all. (Please keep reading. It was a phase.)

I started learning Python in 2019 and always worked with versions greater than 3.0, so I never used the system Python that was once part of macOS. Without virtual environments, every library that I pip-installed ended up in the site-packages folder for some version of Python that I downloaded from python.org instead of in the base environment for conda, which is generally where they had ended up before.

The first big difference was that it was trivial to install an official Python. I ended up with several; at an early stage, python3.4, python3.5, and python3.6. To switch between them, I started keeping an editor window open to my shell profile (at the time, .bash_profile). To use a specific version of Python, I commented out the lines that added the other versions to PATH, saved the profile, and opened a new terminal window.
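
A sketch of what that section of .bash_profile might have looked like; the framework paths are where the python.org installers put each version:

# Uncomment exactly one of these lines, then open a new terminal window.
# export PATH="/Library/Frameworks/Python.framework/Versions/3.4/bin:$PATH"
# export PATH="/Library/Frameworks/Python.framework/Versions/3.5/bin:$PATH"
export PATH="/Library/Frameworks/Python.framework/Versions/3.6/bin:$PATH"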

Even though I was not creating virtual environments, using several versions of Python gave me an incentive to add a requirements.txt file to every project. This was the truly important difference in my workflow. After I had picked my version of Python, I’ll be honest and say that what I did was run

pip3 install -r requirements.txt

Most of the libraries were in the cache, so running this consumed little time and bandwidth.
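
If you have never seen one, a requirements.txt can be nothing more than a list of package names, one per line. These names are placeholders, not a record of any project of mine:

numpy
pandas
requests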

Problems were rare, but when they arose, my first response was to delete the version of Python, reinstall, and rerun pip3. If that didn’t work, I’d switch to a different version of Python. My goal was to learn, not to minimize the probability of an error. I learned a lot about the evolution of Python from version 3.4 to version 3.12 (and counting.)

I stumbled into this environment-as-code workflow without giving it much thought. Positive feedback made it increasingly easy to use, but the main reason I stuck with it was that the user experience was dramatically better. The focus was where it belonged, on the requirements.txt files that told me how to recreate the environments I worked in. They gave me confidence. I experimented and learned faster. I was not afraid.

And mind you, all of this was without using any virtual environments.

Deleting and Reinstalling

Of course, an unruly pile of libraries tended to accumulate in the site-packages folder for each version of Python. But as you might have guessed from the story about deleting /opt/homebrew, there was an easy response. Clean house regularly.

Before continuing, it helps to be specific about terminology. For a specific version such as Python 3.12.5, here is the output you will get if you import the sys module (which is part of the standard library) and run this command:

>>> import sys
>>> sys.version_info
sys.version_info(major=3, minor=12, micro=5, releaselevel='final', serial=0)

In these terms, it didn’t take long to notice that installing a new micro version (for example, 3.7.4) on top of an existing one (3.7.3) does not clean out the site-packages folder, so I adopted the practice of deleting first. If I had python3.7.3 installed, I would delete the subtree at /Library/Frameworks/Python.framework/Versions/3.7 before I installed python3.7.4. If I encountered a problem and there was no new micro release, I would delete the same subtree and reinstall python3.7.3.
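
The delete step was a single command. A sketch, assuming (as is the case for python.org installs) that the framework directory is owned by root, so sudo is needed:

sudo rm -rf /Library/Frameworks/Python.framework/Versions/3.7

The reinstall was a download from python.org and a few clicks in its installer.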

Either way, when I pip-installed the requirements for a project into a clean install of Python, I was working in the functional equivalent of a virtual environment.

requirements.txt as Code

It is worth repeating that this workflow put the emphasis where it belongs, on the code for building a work environment, not on an instantiation of the environment, virtual or otherwise. It is the code that reduces the cost of recovering from error. I created requirements.txt files not because someone told me to but because they were simple to use and made it easier to start over.

If you are not convinced that the code is more important than the virtual environment, consider this thought experiment. Two people have just left the company you run.

  1. One leaves a workstation where every Python library is installed in one of a handful of virtual environments. All the code that the person wrote is in one big directory that has a bunch of *.py files that import from other *.py files. There are no requirements.txt files.

  2. The other person leaves a workstation with no virtual environments but with projects in separate directories, each of which contains a requirements.txt file that lists its abstract dependencies.

If you have to continue the work that these two people were doing, which workstation would make your task easier?

On the second workstation, it would be easy to use the requirements.txt files to create a virtual environment for every project. You’d have to pick a version of Python to use, but it probably wouldn’t matter which one, and if it did, you’d find out soon enough.

On the other workstation? Good luck.
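
On the second workstation, the recovery recipe would be a few commands per project. A minimal sketch, with a hypothetical directory name:

cd some-project
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt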

Back to the Future

As I got more familiar with the Python ecosystem, I eventually decided to try out Python’s venv module and create a virtual environment for some new project. (I don’t remember which.) Because venv is simple, and I already had the requirements.txt files, this was easy to do. It gave me a new variation on the Chaos Monkey. I could delete the virtual environment for a project, then try to create a new one with a different micro version of Python. Or delete and reinstall Python and leave the virtual environment alone. (Have you tried that one?)

Over time, creating virtual environments turned into a habit. Now, I use them always, but not because it would be a disaster if I didn’t. I use them because they are easy to use, somewhat helpful, and now I understand what they do.

I also tried other tools, including virtualenv and pipenv, but kept gravitating back to pip, venv, and requirements.txt files.

I suspect that I came back to them partly because they were so simple and easy to use. To create a virtual environment in a folder, I’d run

python3 -m venv .venv
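
To start working in the new environment, one more command activates it in the current shell:

source .venv/bin/activate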

Because I’m using virtual environments, I now run pip with the command

python -m pip install -r requirements.txt

I leave the 3 off of python because I’ll get an error if I have forgotten to activate.

An Appreciation of the Official Tools

There is a lot of complaining these days about Python packaging. I’m perfectly willing to accept the conclusion that others have reached that going forward, there is room to do better, especially for someone who wants to publish a new library or application.

Nevertheless, I am not convinced that the core developers at python.org and pypi.org mismanaged things. They made trade-offs that came with the inevitable mix of advantages and disadvantages. As I learned about the Python ecosystem over the last five years, I saw many of the advantages. Once I started using the official tools, they were simple to use and encouraged me to experiment and learn. There are, of course, things that a requirements.txt can’t do, but it is incredibly easy to use it to do what it is supposed to do. Same for pip. Same for venv. I don’t think my experience is unique or unrepresentative. Lots of people joined the Python community over the last five years.

On the basis of my experience, and the learned helplessness I see in every student on macOS who gets trapped by Anaconda, I’m skeptical that it would have been easy to come up with solutions that would have dominated the official tools. Remember that one of the key features of those tools was that they were so simple to use. The requirements.txt was a simple text file. The pip command doesn’t require a URL. These kinds of details may seem trivial for someone who understands, but for someone who is struggling to remember and in a hurry, every complication hurts.

It makes me shudder to contemplate an alternative scenario in which I would have been forced to keep struggling with the full complexity of conda before I had a chance to understand what a requirements.txt could do, and in which I would have been prohibited, forever, from even opening a terminal, much less running code, unless I activated a virtual environment.

For the benefit of the many people who will join the Python community in years to come, I hope that the solution to the current dissatisfaction about packaging does not deprive new learners of the chance to use simple tools that encourage experimentation and learning, tools that treat users as “consenting adults” and accept that making a mess is part of learning.

Conclusion

If you are learning Python on macOS, consider these actions:

  1. Delete an official version of Python, then reinstall it.

  2. Delete the virtual environment for one of your projects, then recreate it.

  3. Delete your Homebrew directory, then reinstall the code you use on an as-needed basis.

  4. Back up the settings for your text editor and delete its Application Support folder from ~/Library. (What’s in those cache files anyway?)

  5. Back up your .zprofile and .zshrc files and start with new ones that are empty, adding back only the commands that you want to run after you know what they do. (See the sketch after this list.)

  6. If you use Anaconda or Miniconda, disable it and pick a project to work on using an official distribution and the official tools.
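
For item 5, here is a minimal sketch of the backup step, assuming both files exist (the .bak names are just a convention):

mv ~/.zprofile ~/.zprofile.bak
mv ~/.zshrc ~/.zshrc.bak
touch ~/.zprofile ~/.zshrc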

If any of these possibilities give you a knot in your stomach, think about how much more fun work would be if you were not afraid. Think about the experiments you would do and the things you would learn.

Then perhaps, make some backups and break something. Or violate some taboo. What the heck, pip-install some libraries directly into the site-packages folder for some version of Python. Delete it. Reinstall it. It will be ok.

Errata

  1. The original draft omitted the 3 that must be appended to pip if you are not working inside a virtual environment.

  2. The original version misused the terms “major version” and “minor version” of Python. Thanks to zahlman for contributing a comment on Hacker News that pointed out this error.