How did Nvidia CEO Jensen Huang become one of the most powerful figures in the tech and AI industry? In The Thinking Machine, Stephen Witt claims it’s because of one idea: parallel processing. This approach allows computers to break complex problems into smaller pieces and solve all of those pieces at the same time, unlike traditional chips that work through each piece sequentially.
In developing the technology that would power the AI revolution, Nvidia pioneered an approach that challenged the conventional way computers process information. But this breakthrough began with a much more practical problem: making video games look better. Continue reading to learn what parallel processing is and how it grew out of the goal of creating realistic graphics for popular video games.
What Is Parallel Processing?
So what is parallel processing? To understand how it works, it helps to know where the idea originated. When Jensen Huang and his co-founders started Nvidia in 1993, they wanted to build specialized chips to render the complex 3D graphics that games like Quake and Half-Life demanded. To create realistic visual effects, these graphics processing units (GPUs) needed to calculate the color and lighting for thousands of pixels simultaneously. This required a different approach from traditional chips, which operated sequentially, performing one calculation at a time, albeit very quickly.
(Shortform note: The demand for computationally intensive graphics was driven by competition to create more immersive video games. Doom, where players battled demons on Mars, used clever tricks to create the illusion of depth while running on flat, 2D maps. Quake was built from true 3D polygons, enabling players to navigate medieval dungeons they could see from every angle. Half-Life, where users played a scientist at a research facility overrun by aliens, escalated the arms race by adding environmental storytelling that demanded even more processing power. This competition set the stage, but it still took engineers like Huang to see parallel processing as the solution and figure out how to make it work at scale.)
Witt explains that while most computers relied on central processing units (CPUs) that worked through tasks step by step, Nvidia’s GPUs broke complex visual problems into thousands of smaller pieces and solved them all at once. This approach, called parallel processing, wasn’t new in theory—scientists had been experimenting with it for years—but it remained difficult to implement reliably. Most attempts at parallel computing had been commercial failures, but Nvidia succeeded because it had a clear, immediate application to focus on: rendering video game graphics in real time.
(Shortform note: Parallel processing mimics how the human brain computes, using networks of billions of neurons working simultaneously. For decades, researchers had tried to recreate this in computers. By the 1980s, they showed that transistor circuits could mimic the way neural membranes work in the brain. Around the same time, David Rumelhart, James McClelland, and Geoffrey Hinton developed “parallel distributed processing” frameworks, computer models of how the brain processes information with many simple units working together. But these brain-inspired approaches struggled because computers weren’t yet powerful enough to handle the massive parallel calculations they required—exactly the problem Nvidia would solve.)
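To make the sequential-versus-parallel contrast concrete, here's a minimal sketch (not from Witt's book) of the same per-pixel work written both ways: first as a loop a CPU would step through one pixel at a time, then as a GPU kernel in Nvidia's CUDA C, the programming platform discussed later in this guide. The function names and the toy brightness formula are invented purely for illustration.

```c
// Sequential version: the CPU visits the pixels one at a time.
void shade_pixels_cpu(float *brightness, int num_pixels) {
    for (int i = 0; i < num_pixels; ++i) {
        // Toy brightness adjustment; one pixel handled per loop step.
        brightness[i] = brightness[i] * 0.5f + 0.1f;
    }
}

// Parallel version: the GPU launches one lightweight thread per pixel,
// and all of those threads run this same function at roughly the same time.
__global__ void shade_pixels_gpu(float *brightness, int num_pixels) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // which pixel this thread owns
    if (i < num_pixels) {
        brightness[i] = brightness[i] * 0.5f + 0.1f;  // same toy adjustment
    }
}
```

The calculation itself is identical; what changes is how it's dispatched. The CPU version finishes one pixel before starting the next, while a launch such as shade_pixels_gpu<<<blocks, 256>>>(brightness, num_pixels), with blocks chosen to cover every pixel, asks the GPU to run thousands of these tiny threads simultaneously.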
How Researchers Discovered the Broader Potential
Parallel processing’s potential became apparent when researchers discovered that Nvidia’s gaming chips could be repurposed for entirely different computational tasks. In 2000, Ian Buck, a graduate student at Stanford, wanted to create an immersive gaming experience. He chained together 32 of Nvidia’s chips to display a single game, Quake III, across eight large projectors, creating an ultra-high-definition gaming wall that filled an entire room. Witt explains that the calculations required to render just 30 frames of the game would take a human working with pencil and paper about 16,000 years to complete. But Buck’s array of Nvidia cards performed these calculations every single second. For about $20,000, a fraction of what traditional supercomputers cost, he had built a machine with extraordinary processing power.
Buck and other researchers began “hacking” Nvidia’s gaming chips to trick them into solving scientific problems instead of rendering explosions and car chases for video games. They used the cards’ existing programming tools to redirect their parallel processing power toward tasks like financial modeling, weather simulation, and medical imaging. Soon, academics were bulk-purchasing Nvidia’s GeForce cards and repurposing them as affordable scientific instruments. A new market was emerging, and it didn’t escape Huang’s notice: Witt notes that around 2004, Huang began recruiting researchers like Buck to join Nvidia.
| Breaking Down the GPU Programming Barrier: At Stanford, Buck was working on a fundamental problem. GPUs could theoretically handle scientific computing, but researchers had to finagle the chips into solving math problems by disguising calculations as graphics operations—for example, representing scientific data as textures and mathematical operations as pixel shading. Buck developed Brook, a language that allowed engineers to write instructions for processing large amounts of data across many processors, rather than having to think in terms of triangles and pixels. Before Brook, using GPUs for scientific work required mastering complex graphics programming: learning to speak the GPU’s native language rather than expressing mathematical problems directly. Brook represented a new layer of software that translated familiar programming concepts into graphics operations that GPUs understood, enabling scientists to focus on their computational problems. This reflects a fundamental principle in computing: Programming languages exist at different levels of abstraction, with higher-level languages hiding complex technical details to make programming easier. Where others saw only gaming chips that were difficult to program, Buck added a layer of abstraction built specifically for parallel computing, envisioning a new category of affordable supercomputing in which more intuitive software would democratize massive parallel processing power. |
Why Jensen Huang Bet Everything on Parallel Computing
Nvidia’s decision to bet on parallel computing was based on more than creative uses for gaming chips: It was driven by an insight into the future of computing. Huang realized that existing approaches to making computers faster were nearing their physical limits. For decades, chip companies had been shrinking the transistors inside microchips, but they were becoming so small that they’d soon leak electricity and slow computers down. If computers were going to continue advancing, the fundamental approach to computing would have to change. While competitors like Intel continued to focus on making traditional processors faster, Huang saw parallel processing as the path forward.
The hardware to make parallel processing work at scale was only half of the solution. Witt explains that parallel processing was incredibly difficult to program. Writing software for thousands of simultaneous processes required completely different thinking than traditional sequential programming, and most developers found it so challenging that they avoided parallel computing altogether. Nvidia changed this with CUDA (Compute Unified Device Architecture), a software platform the company launched in 2006 to enable developers to use familiar tools, like the C programming language, to access the parallel processing power of GPUs.
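For a rough sense of what that looks like in practice, here's a minimal, generic CUDA C sketch (a standard vector-addition example written for this guide, not code taken from the book or from Nvidia's materials). Nearly everything is ordinary C; the GPU-specific parts are the __global__ keyword, the cudaMalloc/cudaMemcpy calls that move data to and from the graphics card, and the <<<blocks, threads>>> launch syntax.

```c
#include <stdio.h>
#include <cuda_runtime.h>

// The kernel: each GPU thread adds one pair of elements.
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;                 // about a million elements
    size_t bytes = n * sizeof(float);

    // Ordinary C on the CPU side: allocate and fill the inputs.
    float *a = (float *)malloc(bytes);
    float *b = (float *)malloc(bytes);
    float *c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Allocate GPU memory and copy the inputs over.
    float *da, *db, *dc;
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

    // Launch enough threads (in blocks of 256) to cover every element at once.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add<<<blocks, threads>>>(da, db, dc, n);

    // Copy the result back and spot-check it.
    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", c[0]);           // expect 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(a); free(b); free(c);
    return 0;
}
```

The programmer only describes what a single thread should do to a single element; CUDA takes care of spreading that work across the GPU's many cores, which is the layer of difficulty Witt says kept most developers away from parallel programming before 2006.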
| When Hardware Hits Physical Limits, Software Opens New Possibilities: There are two levers to pull to make computers more sophisticated: hardware and software. Moore’s Law, the observation that transistor density doubles every two years, described the steady progress in hardware from the 1960s onward as components became smaller and faster. By around 2005, experts saw that this approach was hitting physical limits, as transistors were becoming so tiny they would soon be unable to function reliably. Some researchers are exploring ideas that abandon traditional silicon altogether, like chemical reactions that process information through oscillating wave patterns, biological systems like slime molds that excel at solving problems, and microfluidic systems that use liquid droplets to perform calculations. But Huang recognized that software provided an alternate solution. Rather than waiting for new types of hardware to be developed, Nvidia created software that could unlock new capabilities from existing silicon chips. While GPUs and CPUs are built from essentially the same materials, it’s the software that determines how those components work together. CUDA enabled GPUs to perform thousands of simultaneous calculations that CPUs simply cannot handle. This software-first approach demonstrated how software innovation can be just as revolutionary as hardware breakthroughs in traditional computing. |
Huang’s belief that parallel processing was the way forward led him to invest heavily in CUDA. He was pursuing what he called a “zero-billion-dollar market,” building technology for customers who didn’t yet exist, in hopes that pushing parallel processing forward (and making it easier to use) would create massive demand. Witt reports that Huang knew competitors might eventually copy Nvidia’s hardware, but they would be starting from scratch on the software side. So Nvidia spent years building tools, libraries, and documentation that made its chips easy to use.
By investing heavily in parallel computing infrastructure and building a comprehensive software ecosystem around it, Nvidia had inadvertently created the perfect platform for the AI revolution that was waiting in the wings. The foundation was now set for Nvidia to become the essential infrastructure provider for AI—but only because it had spent over a decade building capabilities that seemed commercially worthless at the time.
(Shortform note: After Witt finished his book, Chinese company DeepSeek claimed to have found shortcuts to advanced AI, but the controversy that followed showed why Huang’s CUDA strategy has proven so durable. While CUDA works at a high level of abstraction so developers don’t need to understand GPU hardware, DeepSeek’s platform requires programming at a much lower level. Developers have to control memory allocation, thread scheduling, and cache optimization, tasks CUDA handles automatically, but less optimally. DeepSeek also claimed to have trained its AI models cheaply. But OpenAI alleged DeepSeek used “distillation,” learning from OpenAI’s models rather than starting from scratch, suggesting there may be no shortcuts.)
Learn More About Parallel Processing
To better understand parallel processing in its broader context, take a look at Shortform’s guide to the book The Thinking Machine by Stephen Witt.