CUDA proves that Nvidia is a software company


Forgive me for starting with a cliché, a piece of financial jargon that has lately seeped into the tech lexicon, but I'm afraid I have to talk about "moats." The term was popularized decades ago by Warren Buffett to describe a company's durable competitive advantage, and it found its way into Silicon Valley presentations when a memo allegedly leaked from Google, titled "We Have No Moat, and Neither Does OpenAI," worried that open-source AI could pillage the citadels of big tech.

A few years later, the castle walls are still standing. Aside from a brief panic when DeepSeek first appeared, open-source AI models have not significantly outperformed proprietary ones. And yet none of the leading labs, whether OpenAI, Anthropic, or Google, has a moat to speak of.

The company that does have a moat is Nvidia. CEO Jensen Huang has described it as his most precious "treasure." It is not, as you might assume of a chip company, a piece of hardware. It is something called CUDA. What sounds like a chemical compound banned by the Food and Drug Administration may be the only real moat in artificial intelligence.

CUDA technically stands for Compute Unified Device Architecture, but no one bothers to expand the acronym; we just say "KOO-duh." So what does this all-important treasure actually do? If I had to give a one-word answer: parallelism.

Here's a simple example. Suppose we task a machine with filling in a 9×9 multiplication table. A single-core computer executes all 81 multiplications faithfully, one after another. But a nine-core GPU can assign the work so that each core takes a different column (one handles 1×1 through 1×9, another 2×1 through 2×9, and so on) for a nine-fold speedup. Modern GPUs can be smarter still. If one were programmed to recognize commutativity (7×9 = 9×7), it could avoid duplicate work, reducing the 81 multiplications to 45 and nearly halving the workload. When a single training run costs $100 million, every improvement counts.
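The column-per-core split and the commutativity trick above can be sketched in a few lines of Python. This is a toy simulation, not GPU code: the thread pool merely stands in for the nine cores, and the function names are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def column(i, n=9):
    """One 'core' fills in a single column of the times table: i*1 .. i*n."""
    return [(i, j, i * j) for j in range(1, n + 1)]

# Nine workers, one column each: the naive parallel split.
with ThreadPoolExecutor(max_workers=9) as pool:
    table = [entry for col in pool.map(column, range(1, 10)) for entry in col]

# Exploiting commutativity: only compute i*j when i <= j,
# since 7*9 and 9*7 are the same job.
unique = [(i, j, i * j) for i in range(1, 10) for j in range(i, 10)]

print(len(table))   # 81 multiplications the naive way
print(len(unique))  # 45 once duplicates are skipped
```

The point of the sketch is the division of labor, not the arithmetic: each worker gets an independent slice of the table, so no core ever waits on another.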

Nvidia's GPUs were originally designed to render graphics for video games. In the early 2000s, a Stanford PhD student named Ian Buck, who had first gotten into GPUs as a gamer, realized that their architecture could be repurposed for general high-performance computing. He created a programming language called Brook, was hired by Nvidia, and, along with John Nickolls, led the development of CUDA. If AI ushers in an era of a permanent underclass and autonomous weapons, just know that it all happened because someone somewhere playing Doom wanted the demon's scrotum to vibrate at 60 frames per second.

CUDA is not a programming language per se but a "platform." I use this elusive word because, unlike The New York Times (which is also a gaming company), CUDA has over the years become a nested package of software libraries for artificial intelligence. Each function shaves nanoseconds off individual calculations; combined, they make GPUs, in industry parlance, go brrr.

A modern graphics card is not just a circuit board crammed with chips, memory, and fans. It is an elaborate arrangement of cache hierarchies and specialized units called "tensor cores" and "streaming multiprocessors." In this sense, what chip companies sell is like a professional kitchen, and more cores are like more grilling stations. But even a kitchen with 30 grilling stations won't run faster without a chef who can deftly assign tasks, which is what CUDA does with GPU cores.

Extending the metaphor, hand-tuned CUDA libraries optimized for a single array operation are the equivalent of kitchen tools designed for one function and no more (the cherry pitter, the shrimp deveiner): indulgences for home cooks, but not if you have 10,000 shrimp to devein. Which brings us back to DeepSeek. Its engineers went below this already deep layer of abstraction to work directly in PTX, a kind of assembly language for Nvidia's GPUs. Suppose the task is to peel garlic. An unoptimized program says: "Peel the skin off with your fingernails." CUDA can command: "Crush the cloves with the flat of a knife." PTX lets you dictate each sub-instruction: "Lift the blade 2.35 inches above the cutting board, keep it parallel to the equator of the clove, and strike downward with your palm with a force of 36.2 newtons."
