The Phoenix Project: Book Overview and Lessons

This article is an excerpt from the Shortform book guide to "The Phoenix Project" by Gene Kim, Kevin Behr, and George Spafford.

Is the book The Phoenix Project worth reading? What can IT departments learn from the business parable?

In The Phoenix Project, Gene Kim, Kevin Behr, and George Spafford present a fictional case study of a business that’s failing because it doesn’t align the work of IT services with the company’s goals. The authors then show how an IT department can turn itself around and get back in the game.

Keep reading for an overview of The Phoenix Project, as well as the lessons you can learn from the authors.

The Phoenix Project Book Recap

A business today lives or dies based on the strength of its IT department. Because information technology is so deeply integrated into everything a business does—from producing goods and services to interacting with customers, processing orders, and even paying its own employees—every company in the modern age has to be proficient in how to optimize IT services. Not doing so risks heavy consequences, up to and including the failure of the business.

The Phoenix Project book was published in 2013 and presents a fictional case study of just such a scenario—an imaginary auto parts manufacturer that’s falling behind its competitors because it’s unable to align the work of IT services with the company’s larger goals. A new business initiative, dubbed “The Phoenix Project,” promises to bring the company into the 21st century by integrating online ordering, in-store sales, inventory management, and marketing campaigns. However, by botching the Phoenix rollout, the auto parts company almost implodes from a disastrous series of technical failures.

The authors of The Phoenix Project are Gene Kim, founder of the digital security company Tripwire; Kevin Behr, who co-founded the IT Process Institute along with Kim; and George Spafford, Vice President Analyst for the business consulting firm Gartner. Together, they use their combined expertise in technology management and business practices to paint a picture of how a company that does everything wrong when it comes to IT can turn itself around, revamp its core practices, and get back in the competitive game.

In this guide, we’ll begin by summarizing the decline and phoenix-like resurrection of the fictional Parts Unlimited’s IT department. We’ll then dissect the story’s central message, first by explaining the authors’ diagnosis of what makes IT processes fall apart, then by describing the three fundamental pillars of IT management that Kim, Behr, and Spafford prescribe as a remedy—speeding up workflow, providing quick feedback, and encouraging a culture of perpetual improvement.

In addition, we’ll provide clarification on IT, business, and production concepts that the authors assume the reader’s already familiar with. We’ll discuss the practice of using fiction as a teaching tool for real-world situations and explore other authors’ ideas on management, leadership, workplace culture, and maximizing workplace productivity.

The Saga of an IT Department

Our story focuses on the character Bill Palmer, a mid-level director at Parts Unlimited who’s promoted to vice president (VP) of IT Operations shortly before the grand rollout of the Phoenix Project, an online sales management tool that’s been years in the making. What he finds, though, is an IT department in utter disarray, stretched to its limit by constant demands and arbitrary deadlines it can’t possibly meet. Right out of the gate, Bill must cope with an emergency payroll issue, an ongoing conflict between IT Operations and the software development team, and a prospective board member who suspects that Bill’s IT department is going about its business all wrong.

On his first day as VP, Bill is thrust into trying to solve a payroll data problem that could result in many workers not receiving their paychecks. The difficulty of resolving the issue is compounded by the lack of communication within the department. After sleepless nights and much departmental overtime, Bill determines that the root of the problem was a system change enacted by a vendor without the department’s knowledge. This clues him in to what the authors say ought to be obvious in hindsight: A process for tracking and approving system changes is essential to IT management.

With the launch of the Phoenix Project on the horizon, Bill also has to mediate between his own department (Ops) and Application Development (Dev). Development accuses Ops of not prioritizing work such as Phoenix because Ops spends all of its time putting out fires like the payroll debacle. Operations fires back that Dev hasn’t left Ops enough time to properly test Phoenix before launch, nor has Dev provided the system specifications and operating instructions Ops will need to roll it out, meaning that Ops will still be fixing Phoenix problems even after the new system has already gone live. Much of that work will have to wait on Brent, the only software engineer who fully understands the company’s systems and can fix them.

While trying to come to grips with the chaos in IT and the impending Phoenix launch, Bill meets Erik Reid, a management expert whom Parts Unlimited is courting to be a board member. Speaking on behalf of the authors, Erik understands Bill’s problems even better than Bill does, but he doesn’t spill his knowledge all at once. Instead, he gives Bill a few guiding pointers, such as the need to understand the different types of work, the danger of bottlenecks, and the three pillars of IT management: fast workflow, quick feedback, and perpetual improvement (all of which we’ll expand on in this guide).

The Phoenix Disaster

In spite of Bill’s apprehensions, the Phoenix Project launch goes even worse than anyone could have imagined. Thanks to the combination of a dysfunctional corporate culture, unrealistic expectations, and a lack of collaboration between Dev and Ops, the Phoenix release crashes not only the company’s online ordering system, but also the ability of their brick-and-mortar stores to make sales or process credit card transactions. The authors use the crisis to illustrate many points of failure—between management and IT, between Dev and Ops, and within IT Operations itself.

Even before the failures begin, Bill goes to the CEO begging for more resources, as well as permission to prioritize Phoenix above any other demands on IT. The CEO refuses to budge, demanding that IT make do with what it has and give equal weight to every request made by the organization. Meanwhile, Ops is unable to get Phoenix to work in a simulated test environment. Nevertheless, since the company’s marketing has already announced Phoenix’s release to the media, IT is forced to move ahead with implementation.

The Phoenix release almost destroys the company, making it hemorrhage money and customers as sales become nearly impossible to complete and users’ credit card data becomes insecure. The CEO lays all the blame on IT and even threatens to outsource the whole department if Bill can’t find a way to meet the company’s bloated expectations. The authors show how this is a point of common ground between Dev and Ops, because both Operations and their counterparts in Development are frequently asked to perform miracles with little or nothing to go on. Bill despairs of finding any solution, and he considers leaving the company altogether.

The Turnaround

Bringing the company back from the brink is a bigger job than IT can handle alone. With some nudging from potential future board member Erik, the CEO admits his own culpability while Bill begins to envision new ways to correct IT’s problems and prevent them in the future. These include the short-term solution of stopping all work on everything except Phoenix and the long-term solution of restructuring to produce smaller, faster product releases that allow for quicker testing and deployment.

Bill suggests stopping any work unrelated to Phoenix. This includes not accepting any new projects into the IT workstream until enough of the work in progress is completed and his team can assess IT’s technical debt—the amount of future work that’s accrued by taking shortcuts and quick fixes in the past. It also gives them time to better manage Brent, the one engineer whose skills are so essential that he acts as a bottleneck to all of IT’s work.

The freeze on new tasks gives IT enough space to correct some of Phoenix’s problems while also giving Bill room to explore how IT’s work impacts the company as a whole. According to the authors, that impact is huge. IT’s functions support every single business goal of a modern corporation in one way or another. From his research into the company’s needs, Bill realizes that the Phoenix Project—one giant platform, years in the making, designed to do everything, everywhere, all at once—was ill-conceived from the start. What’s been needed all along are smaller applications that are faster to design, test, and roll out, and that can therefore be more responsive to customers’ needs in real time.

The Unicorn Project

While Bill keeps some of his team on the job to stop Phoenix from sinking the company’s boat, he devotes the rest of IT’s resources to a new initiative called “Project Unicorn,” which merges Dev and Ops in a collaborative cycle that automates their redundancies and allows them to design, test, and implement software solutions at a rapid pace. Using the Unicorn structure, Dev and Ops are able to bypass Phoenix entirely and deliver on its promised functions by creating smaller, more versatile applications.

After the success of Project Unicorn, Bill builds on IT’s work by launching a program to continually test their systems for weakness and introduce improvements, both to the software they create and to the process by which IT functions. As a result of Bill’s success in IT, the CEO places him on a fast track to become Parts Unlimited’s next Chief Operating Officer (COO). The authors predict that in the future, most corporate COOs will have backgrounds in IT because IT is now so heavily integrated into every business function a company performs.

Work and What Stops It

Throughout their narrative, Kim, Behr, and Spafford illustrate the common workflow challenges that plague IT departments. At the root of these issues, when they occur, is a failure to recognize, prepare for, and manage the two greatest disruptors of productivity—unexpected work and bottlenecks.

In the story, Erik challenges Bill to identify and understand the four types of work IT performs. The first type is business projects initiated by the company or one of its divisions, such as sales, marketing, or human resources. The novel’s titular Phoenix Project is a business initiative on the largest scale. The second type consists of internal IT projects, such as upgrading servers or migrating data. The third type is made up of changes, often minuscule, to databases, app configurations, and lines of code. The authors suggest that such changes are a major source of work and potential problems that, if unmanaged, contribute to the fourth type of work IT does: unexpected work that grinds the system to a halt.

Unexpected work is almost always an emergency, or at least is made to seem that way, and it gets in the way of doing anything else. Unexpected work includes fixing a nonstop flood of technical problems, many of which aren’t prioritized by importance, but rather by how vocal the person complaining about the issue is. What’s worse, unexpected work creates even more work by taking time away from the system testing and preventive maintenance that would stop such problems from arising in the first place. To correct unexpected problems quickly, the solutions implemented are often untested patches and workarounds that build up technical debt in the system, laying the groundwork for more unexpected problems.

The Bottleneck

The authors say that the other cause of work stoppage in any system is the bottleneck (which they refer to as the constraint), defined as the one link in the chain that limits the speed of the entire production process. In the fictional example, the bottleneck is Brent, the lone engineer whose unique and exhaustive knowledge of Parts Unlimited’s computer systems makes him the indispensable go-to guy for every task IT tries to perform. As a result, IT can’t do anything without involving Brent in some way, and so no work gets done any faster than Brent is able to get to it. Because Brent is so overburdened, he doesn’t have time to document his work, which means his knowledge isn’t shared and IT becomes even more reliant on him as a crutch.

Brent is merely one example of the kind of bottlenecks that can constrict a system. The authors list other types of bottlenecks based on those described in The Goal: A Process of Ongoing Improvement by Eliyahu M. Goldratt. These include the creation of test environments for software, installing large amounts of new code, and getting approval for changes from committees. Whatever your system’s bottleneck is, the authors are clear that you can’t push work through your department any faster than your bottleneck will allow. Any attempt to do so will result in a traffic jam of work piling up at one station while the rest of the production line sits idle.
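
The pile-up described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions that don't come from the book: the function name simulate_line, the daily capacities, and the arrival rate are all hypothetical, and each station simply finishes up to its daily capacity and passes that work to the next station.

```python
def simulate_line(capacities, arrivals_per_day, days):
    """Toy model of work flowing through stations in sequence.

    capacities: tasks each station can finish per day (hypothetical numbers).
    Returns (tasks finished overall, tasks left waiting at each station).
    """
    queues = [0] * len(capacities)
    finished = 0
    for _ in range(days):
        incoming = arrivals_per_day
        for i, capacity in enumerate(capacities):
            queues[i] += incoming             # new work joins this station's queue
            done = min(queues[i], capacity)   # a station can't exceed its capacity
            queues[i] -= done
            incoming = done                   # finished work moves downstream
        finished += incoming                  # output of the last station
    return finished, queues


if __name__ == "__main__":
    # The middle station (capacity 3) plays the role of the bottleneck.
    finished, queues = simulate_line([10, 3, 10], arrivals_per_day=8, days=5)
    print("finished:", finished)
    print("waiting at each station:", queues)
```

With these made-up numbers, 40 tasks enter the line over five days but only 15 come out, which is exactly what the middle station can handle. The other 25 sit in its queue, and the last station, which could process 10 tasks a day, only ever receives 3.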

The Pillars of Production

None of the problems of IT are insurmountable, but Kim, Behr, and Spafford argue that addressing them requires completely rethinking how IT work is done. In their fictional case study, they demonstrate how work management principles developed on factory production lines can be applied in an IT environment, where the production of software, databases, and networks can be likened to manufacturing physical products. The three foundational pillars of production can be summed up as 1) fast workflow, 2) quick feedback, and 3) continual improvement.

Because the authors’ point-of-view character is a vice president of IT operations, one might assume that their advice is intended for readers in corporate management positions. However, understanding the principles that follow will be essential for everyone in the production process, since implementing the authors’ recommendations will require buy-in from many people in a company and certainly everyone in IT.

Pillar 1: Fast Workflow

The first pillar, “fast workflow,” may sound like a goal and not a place to start. However, the authors lay out a series of procedures that can establish a faster flow of work through IT from the outset. These include creating a visual tracking method to monitor and schedule work through IT, reducing the paths by which work enters the pipeline, breaking projects down into smaller, independently manageable components, and opening up your bottleneck so that work flows through it as quickly as possible.

To begin with, managing workflow is impossible without a way to monitor its progress. The authors repeatedly recommend using a kanban board (a visual tool that shows the status of tasks as they pass through IT) both for tracking work and for scheduling tasks as they come in. A visual tracking tool lets you document how much time a task takes at each station, and therefore it indicates how much work you can afford to take on. If certain tasks are regularly repeated, then documentation will allow you to plan for exactly how much time they will take. It will also make you aware of how many time-consuming handoffs occur as a task is passed from one station to the next and whether you can reduce those handoffs in order to speed up workflow even more.

As you monitor your department’s workflow, you’ll also become aware of all the different avenues by which work—often unexpected work—enters and confounds the flow of operations. People in many organizations have grown used to calling IT staff directly to fix what they perceive as “urgent” computer problems, interrupting and backlogging whatever projects your staff was meant to be working on. Identifying where and how unexpected work interrupts production is the first step in limiting the amount of work in progress. The authors stress that doing so is vitally important to speeding up IT production.

Another key component of speeding up workflow is to reduce the size of the projects you take on. Huge projects meant to accomplish multiple business goals at once, such as the authors’ disastrous Phoenix Project, are so unwieldy that identifying design problems and errors is that much more time-consuming and difficult. Dividing large projects into small, discrete units can let each part move through design, testing, error-fixing, and rollout in a timely manner, sometimes on the order of days or weeks instead of months or years. If there’s any fundamental flaw in one of the smaller components of a project, it can be caught and corrected that much sooner, before it can have a devastating impact on the whole.

Open Your Bottleneck

The authors insist that to truly speed up workflow through IT, you should make the most efficient use of your bottleneck. By visually monitoring the work flowing through IT, you’ll quickly identify where the bottleneck is, most likely at one overburdened workstation. Though it may be counterintuitive, workstations need idle time in order for work not to pile up. The authors provide a simple formula to estimate how long a task will spend in the queue at any given station (the wait time) based on how much of its time that station spends busy versus idle:

Wait time = Percent of time busy / Percent of time idle

According to this formula, wait times at a station that’s 80% busy and 20% idle (80 / 20 = 4) will be four times as long as at a station that’s 50% busy and 50% idle (50 / 50 = 1).
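
To see how sharply wait times climb as a station approaches full utilization, the formula can be evaluated for a few values. The snippet below is a minimal sketch: the function name wait_time and the sample percentages are illustrative choices, and the result is a relative figure for comparing stations, not a measurement in hours.

```python
def wait_time(busy_percent):
    """Relative wait time: percent of time busy / percent of time idle."""
    if not 0 <= busy_percent < 100:
        # At 100% busy there is no idle time, so the ratio is undefined.
        raise ValueError("busy_percent must be at least 0 and below 100")
    idle_percent = 100 - busy_percent
    return busy_percent / idle_percent


if __name__ == "__main__":
    for busy in (50, 80, 90, 95, 99):
        print(f"{busy}% busy -> relative wait time {wait_time(busy):g}")
```

Going from 50% to 80% busy multiplies the relative wait by four (from 1 to 4), but going from 90% to 99% busy multiplies it by eleven (from 9 to 99). This is why a constantly occupied workstation slows everything that has to pass through it.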

Once you’ve identified your bottleneck, you can arrange your workflow so that you don’t send work to the bottleneck faster than it can handle. Tracking work will also let you know if any of the bottleneck’s work can be automated. If the reason for the bottleneck is unshared skills or knowledge, as in the fictional example of Brent the software engineer, you should document everything the bottleneck does so that knowledge and skills can be shared among your team, eventually leading to sharing of the workload. Once one bottleneck has been opened up, you may find another in your production line. If so, apply the same steps as before.

Pillar 2: Quick Feedback

In order for a streamlined, faster workflow to be beneficial, your system must generate and implement corrective feedback all along the production chain. Feedback cycles, like your projects themselves, should be small and efficient, and they should connect both Development and Operations, so that problems can be identified quickly and resolved in a way that generates new information.

The authors note that systems such as large software packages and sprawling computer networks are far too complex for any one person to fully understand. Therefore, work on those systems must be designed in such a way that errors can be detected and corrected quickly. Small project sizes let you catch and fix problems before they become disasters, but to enable this, your production process must generate feedback at every step in which work is performed. One example of a way to create this kind of feedback is to institute checklists for every task being done, especially at any point in the process where work is passed from one station to another.

Feedback cycles can’t be confined to Operations—they must include Development as well. Developers must know if their code is effective or problematic as quickly as possible, not months down the line after they’ve moved on to other projects. The authors say that feedback from Ops must be incorporated into the very beginning of the Development process, in effect merging the two departments into one cohesive DevOps gestalt. Ideally, the feedback and response time between designing and installing new systems needs to be fast enough to keep up with customer demand, whether those “customers” are clients of your company or other departments inside your organization.

The moment your feedback measures detect a problem, it should be fixed immediately, not patched with a workaround and put off until later. The authors insist that problems should be tackled by the people closest to them as part of their regular responsibilities. More than that, those people should bring in as much help as they can get from their department to resolve the issue. This way, the issue becomes a learning opportunity that generates new knowledge for the organization. By documenting the whole solution process, that knowledge becomes embedded in IT’s procedures and toolkit for future problems.

Pillar 3: Constant Improvement

The last and perhaps most important component of harnessing the full productivity of IT (or of any system) is to create a culture of continual improvement through practice, repetition, experimentation, and useful failure. Using the story of their fictional company’s ill-conceived “Phoenix Project,” the authors provide both a negative example of a company with a toxic workplace culture and a blueprint for a better way to encourage growth, development, and innovation that benefits your business as a whole.

The defining characteristic of a toxic workplace is that employees’ behavior is guided by a constant fear of failure. In such a culture, administrators address mistakes and malfunctions by assigning blame and taking punitive action. As a result, people are discouraged from identifying errors and problems in a system. Feedback is silenced, because who will provide it when the default cultural response is to kill the messenger? Toxic work cultures stifle any improvement, and without constant improvement, a system will stagnate and problems will fester until they become catastrophic.

On the other hand, a productive company culture will encourage people to report problems at once. The authors show that if employees can trust that they won’t be reprimanded—and may in fact be rewarded—for identifying and helping solve issues detrimental to the company, then those employees will feel an ownership stake in seeking out ways to improve the whole system. If they’re encouraged to take risks in the process by trying solutions that may or may not work without fear of reprisal from on high, then your company will foster a culture of innovation in which even failed attempts at improvement are seen as a way to generate knowledge that can be shared across the organization.

Finally, the authors state that businesses should formalize their systems of continual improvement. IT staff can hone new solutions and practices while refining them through repetition and practice. Teams within DevOps can root out flaws in their products by pushing their products’ limits, forcing errors to emerge before they crop up on their own. One way to accomplish this is to deliberately introduce faults and design flaws into the production line so that staff can practice identifying errors, providing feedback, and coming up with resolutions. Not only can you identify systemic deficiencies in this way, but you’ll also promote a culture in which experimentation, risk-taking, and learning become an institutional way of life.
