The Software Scientist
By Evan Miller
December 14, 2014
You’d think that any programmer with a basic grasp of recursive algorithms would be hesitant to take a job as a software engineer. By and large, programming is a young man’s game, so one of three things usually happens before an engineer turns forty-five:
The engineer is fired, or “managed out”
The engineer becomes a manager
The engineer becomes a kind of nomadic hermit (“Principal Engineer”)
Do a little backwards induction on your life, and you can see why even talented coders often choose other careers.
But there’s a new option opening up, one which I’m going to explore here. The role was hinted at recently in a blog post by lizTheDeveloper, and I’ve witnessed a few of the incarnated specimens out in the wild. As far as I know, they don’t have a name yet, so I’m going to call them what I think they really are: they’re software scientists.
What is software science?
Any engineering organization that’s been around long enough has accumulated a culture — usually oral, sometimes written — about the best ways to do things. Software science attempts to formalize and test the prescriptions of that culture, thereby eliminating its superstitions and increasing the confidence that can be placed in its validated beliefs.
In that sense, software science is a kind of introspected version of The Lean Startup methodology; but instead of studying customer behavior, and thinking up ways to increase revenue, software scientists study the behavior of engineers, and think up ways to improve productivity.
Software scientists do not write production code. Instead, software scientists attempt to discover the principles that lead to the production of good code. This involves studying a mix of technical and human elements; it requires enough math (“Big O math”) to think about problems in a formal way, and yet requires a high degree of emotional intelligence to understand the human factors that may drive or hinder software development.
The consummate example of a software scientist is Fred Brooks, manager of IBM’s System/360 project, and author of The Mythical Man-Month. Brooks famously described the reason why increasing the size of a software team doesn’t always make it more productive: \(N\) employees might do \(O(N)\) work, but the communication overhead is \(O(N^2)\). That simple insight, I think, represents what software science is all about. It’s about applying algorithmic thinking to the processes that result in software.
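To make Brooks's observation concrete, here is a minimal sketch of the arithmetic. It counts the pairwise communication channels among \(N\) people, \(N(N-1)/2\), and subtracts a per-channel cost from linear output. The function names and the values of `work_per_person` and `cost_per_channel` are illustrative assumptions, not figures from Brooks or from any real team.

```python
def communication_channels(n: int) -> int:
    """Number of distinct pairs among n people: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def net_output(n: int, work_per_person: float = 1.0,
               cost_per_channel: float = 0.05) -> float:
    """Toy model: linear work minus quadratic communication overhead.

    Both parameters are hypothetical; they only illustrate the shape
    of the curve, not any measured productivity.
    """
    return n * work_per_person - cost_per_channel * communication_channels(n)


if __name__ == "__main__":
    for n in (1, 5, 10, 20, 40):
        print(n, communication_channels(n), round(net_output(n), 2))
```

Under these made-up parameters, output rises with team size at first and then falls once the quadratic term dominates, which is the qualitative point of the \(O(N)\) versus \(O(N^2)\) comparison.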
Two more recent examples of software scientists in the private sector are Pat Helland from Microsoft/Amazon and Grady Booch from IBM. (Search for their public talks to see what I mean.)
Software science draws on systems and organization theory, but it is not as math-intensive as traditional operations research. Historically, the software scientist has been a kind of manager turned self-trained social science researcher. The practice of software science requires a strong intuitive grasp of psychology, including how it plays out at the individual level (how programmers feel about their work and tools), the organization level (group dynamics, “office politics”, etc.), and the industry level (adoption of technologies, attraction of talent, and so on).
Software scientists have been something of anomalies within their organizations, almost voices in the wilderness. But that’s about to change, I think.
Enter data science
Up to now, software scientists have been self-appointed wise persons, who might give a talk to anyone who will listen, or, in the extreme, publish a book. But they’re starting to discover a new tool in their toolbox, one that makes their findings more insightful and more relevant. They’re starting to discover data science.
That is, rather than rely on intuition and remembered observations to verify their theories, software scientists are gaining the ability to analyze actual data from the organization; this data might include commit histories from the version control system, demographic data from HR, archives from internal mailing lists, logs from internal IRC channels, or anything the software scientist might think of.
Having actual data lets the software scientist throw out bad theories, refine the good ones, and ultimately advise organizations on how to make the most of limited engineering resources.
So there are really two mechanisms by which data science stands to radically increase the influence of software science in industry. The first mechanism is that analyzing actual data improves the science, and thereby makes any resulting prescriptions more effective.
The second mechanism is that the prescribed medicine is more easily digested. With the data visualization tools that have emerged in the last few years, “memes” (here I’m referring to graphs, not advice animals) produced by data scientists can propagate quickly through organizations. By making use of the same tools, software scientists are much better positioned to influence the course of the conversation within an organization than they would be with the traditional long essays and dusty equations.
Software scientists can also fill a kind of “public intellectual” role for the organization, which I’ll cover in the next section.
Why do organizations need software scientists?
Software science fulfills three distinct roles within a software engineering organization.
First, and most obviously, software science fulfills a “decision support” role by offering insights and advice to management about the practice of software engineering. Software scientists are equipped to evaluate the effects that new engineering policies have on overall productivity, and advise on the potential effects of proposed policies. That is, they help to create an environment where engineers can be most productive.
(Relatedly, after new policies have received management approval, software scientists are in a position to “sell” changes to the engineering organization with empirical findings and intellectual arguments, thereby speeding up the adoption of those policies.)
Second, software scientists are ideal candidates to be “public intellectuals” who represent the company by speaking at conferences and writing blog posts. By delivering interesting — and widely applicable — talks about the practice of software engineering, software scientists raise awareness about the company among potential engineering applicants, and also create a desire to work at the company. By writing and speaking publicly, the software scientist offers evidence both that “smart people” work at the company, and that the company’s engineering practices are driven by data and observation rather than intrigue and superstition.
Software science serves a third function, which I alluded to in the beginning of this essay. It provides a much-needed career path for highly talented engineers who want a promotion track that’s not the same-old-same-old (programming), and that’s also not management.
The problem with going into management is that engineers often don’t find the work very interesting, or they aren’t very good at it, or (likely) both. The problem with remaining in engineering is that it is perceived as having lower prestige than management, and attempts to make it high prestige (“Principal Engineer”) tend to create social tension within engineering; the Principals feel a measure of guilt about their theoretical freedom, and the rank and file feel resentment about the “drive-by engineering” performed by the mighty Principals.
Treating the position of software scientist as a viable, alternative promotion track for software engineers provides a convenient solution to the to-manage-or-not-to-manage career dilemma, in at least three ways.
Software scientists have a clearly defined role that is distinct from engineering. The role entails both separate, enhanced privileges (e.g. access to certain organizational data that regular employees do not have) and separate responsibilities (giving talks, writing articles), both of which convey an increase in prestige. Just as important, this role does not have the uncertainty that surrounds the title of Principal Engineer, whose rights and responsibilities are usually vague and nebulous.
Software science provides a way for engineers to continue learning, which is an opportunity prized by engineers. Engineers can leverage their accumulated knowledge to do better software science — so they’re not starting at zero, as they would be with management. When using data science tools, they’ll be able to leverage their programming skills, but they’ll also have an opportunity to grow as they learn enough statistics to do the job of a data scientist.
Software scientists provide aspirational role models for the engineering organization. They are in respected, public, high-profile roles, and they also have a lot of contact with regular engineers in the course of their work (gathering data, testing ideas, giving talks). For younger software engineers who are wondering about their ultimate fate, the position of software scientist provides a concrete goal to work towards — and, for those recursive-thinking worrywarts, a reason to remain in software engineering rather than seek another career that has better perceived long-term prospects.
Of course, like any promotion, the position is not simply a laurel bestowed on employees who have done the most work, and it can’t be filled by any employee of the requisite vintage. There are specific skills required.
What skills are required to be a software scientist?
Software science requires a set of skills that many of the best software engineers will have — but the relative weights assigned to the skills are a bit different than in regular software engineering. The skills required of a software scientist include:
Big-picture, “systems” thinking. Software scientists are the ones who think about how everything fits together. They need to be able to step back and model everything in the organization, not just the usual low-level algorithms.
Mathematical reasoning abilities. They’ll be able to do asymptotic (“Big O”) analysis on the processes that they’re studying. They should be able to build simple mathematical models and analyze them either analytically or via simulation.
Intuition and emotional intelligence. This is something that software scientists have in common with the best managers. They need to be able to look beyond what people say to understand why people do the things they do.
Strong communication skills. They’re going to do a lot of speaking and writing. They should have a knack for writing and at least a little bit of natural charisma as they hit the conference circuit and enter the blogosphere.
Respect from engineers. They’re going to be weighing in on issues that engineers may have strong opinions about. To be taken seriously, they need to be respected as engineers, which is another reason why software scientists should most likely be drawn from the engineering organization rather than recruited from without.
Statistical education. This is currently optional, but I predict that it will become critical as data science becomes an integral part of software science. Having newly anointed software scientists enroll in a one-year program in predictive analytics (or, alternatively, serve a one-year apprenticeship under an existing data science team) would serve as a concrete reward for a successful engineering career, and form an initial step towards becoming a first-class software scientist who is able to test and validate theories about software engineering with data.
This combination of abilities represents a high bar to clear, but then, the software scientist is an unusual position that requires a range of skills to do well. For the role to perform its functions, a small number of highly able practitioners is preferable to a large number of mediocre ones; so the role of software scientist ends up serving as a kind of alternative apotheosis at the top of the software engineering career ladder.
I will hazard a guess and say that a software scientist will command roughly the same amount of respect and compensation accorded to a Principal Engineer; but ultimately their status will depend on their ability to continue providing valuable insights to the organization.
Painting the big picture
I suspect there’s quite a bit of untapped talent for software science currently locked up in engineering organizations. Ideas from “big picture” thinkers in the organization usually aren’t welcome until their authors have mastered the details — that is, acquired a lot of experience — but then when they get a lot of experience, the only real role for a “big picture” person is Lead Architect.
That role is problematic, because the only way to “test” ideas is to place a wager on a working hypothesis, with the fate of an important project as the stakes. Furthermore, if the Big Picture Lead Architect desires to pontificate on larger issues regarding the right way to “do software”, they risk sounding like a blowhard.
Software science provides a formal institutional mechanism for taking these ideas — and these people — seriously, in the process distilling untested opinions into valuable knowledge.
In some sense the only people who have anything to lose from the growth of software science as a profession are self-appointed software gurus and consultants who have invented systems for doing things, but who offer no real evidence that their system is any better than what it replaced. The advantage of having in-house software scientists over hiring what amount to peripatetic witch-doctors is that the in-house team will have a better long-term alignment with the organization’s goals, superior data for validating theories and ideas, and a more localized knowledge of the specific histories and cultural influences that may affect the efficacy of any proposed changes.
The in-house team is also uniquely positioned to supplant the sense of external authority conveyed by consultants — which is often the consultants’ only real economic function — because the ranks of the software scientists will include some of the most respected names in the organization.
Software science has existed, in some form or other, for decades, but under various academic and para-academic umbrellas. I believe the time is ripe to give the discipline a proper name (“software science”), inject it with hard evidence by commandeering the tools of data science, and to formalize it with an institutional role that will both enhance engineering organizations’ self-understanding and alleviate some of the career-path anxiety that software engineers currently face.
Secondarily, the public role of software scientists can both advance the general understanding of the right way to develop software, and help raise awareness about the sponsoring engineering organization that is striving to differentiate itself in the crowded market for software talent. (“That’s somebody I want to work with.”)
Finally, establishing software science as a distinct discipline will, I hope, bring out some of those “big picture” thinkers from the woodwork. The world of software is a fascinating one, but (in my opinion) under-studied. Within engineering organizations, there’s no real role for the big-picture thinker right now; and in the public blogosphere, the big-picture thinkers tend to disappear quickly, either because they realize that they don’t have time to consider issues as carefully as they’d like (a realization usually reached thanks to the tireless efforts of Internet commenters), or else they dislike being known as a household name within their own organizations.
But enough about making the world a better place. If you’re currently a software engineer and you have a hunch that, one day, you’d make an effective software scientist, I’d say go ahead and start the conversation with your boss now. With apologies to Alan Kay, the best way to predict your professional future is to invent it.
Correction: A previous version of this essay used “Coda Hale of Etsy” as an example of a software scientist; he does not work at Etsy, and denies being a software scientist. Grady Booch is a better example.
You’re reading evanmiller.org, a random collection of math, tech, and musings.