Book Notes: “The Alignment Problem”

The Alignment Problem: Machine Learning and Human Values
By Brian Christian
W. W. Norton & Company, 2020

The Alignment Problem covers one of the central technology issues we face today: building smart systems that reflect and respect our values. More specifically, it’s “about systems that learn from data without being explicitly programmed, and about how exactly — and what exactly — we are trying to teach them.”

It’s a central issue because we are in the process of putting important parts of the world “on autopilot.” As such, we ought to ensure that our smart systems don’t inadvertently cause harm. The book evokes the sorcerer’s apprentice, with humanity cast in the role of Mickey Mouse chopping down increasingly powerful and clever brooms:

we conjure a force, autonomous but totally compliant, give it a set of instructions, then scramble like mad to stop it once we realize our instructions are imprecise or incomplete — lest we get, in some clever, horrible way, precisely what we asked for.

We’re giving these systems great power over our lives, so we must work to “ensure that these models capture our norms and values, understand what we mean or intend, and, above all, do what we want.”

It’s not easy. Complex, emergent systems can lead to unexpected (and unintended) results — and learning systems can be very complex indeed. They’re also frustratingly opaque. Even the smartest people in the field can be surprised by their creations’ results.

Many of us understand how machine learning works at a high level. The Alignment Problem unpacks what’s going on in more detail. The book traces the development of ML, from simple, primitive algorithms to sophisticated and complex models and techniques. Along the way, it offers an overview of key papers on the subject.

Broadly, the field has developed three approaches to creating systems that learn:

unsupervised learning, where the system is fed data and tasked with finding patterns,
supervised learning, where the system is given pre-categorized data that serve as examples of what to look for, and tasked with making predictions about data it hasn’t seen yet, and
reinforcement learning, where the system works to achieve objectives (work towards rewards, avoid punishments) within a controlled environment.
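
To make the distinction concrete, here is a minimal Python sketch of the three approaches. It is not taken from the book: the data, labels, reward values, and function names (two_means, predict, learn_best_action) are all hypothetical, and each function is a deliberately tiny stand-in for a whole family of techniques.

```python
import random

# Hypothetical 1-D data: two loose groups, around 1 and around 5.
points = [0.8, 1.1, 1.3, 4.7, 5.0, 5.4]


def two_means(data, iterations=10):
    """Unsupervised: find two group centers without any labels (1-D k-means, k=2).

    Assumes the data contains at least two distinct values.
    """
    lo, hi = min(data), max(data)
    for _ in range(iterations):
        near_lo = [x for x in data if abs(x - lo) <= abs(x - hi)]
        near_hi = [x for x in data if abs(x - lo) > abs(x - hi)]
        lo = sum(near_lo) / len(near_lo)
        hi = sum(near_hi) / len(near_hi)
    return lo, hi


# Hypothetical pre-categorized examples for the supervised case.
labeled = [(0.8, "small"), (1.3, "small"), (4.7, "large"), (5.4, "large")]


def predict(x, examples):
    """Supervised: label an unseen point by copying its nearest labeled example."""
    nearest = min(examples, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def learn_best_action(rewards, steps=100, epsilon=0.1, seed=0):
    """Reinforcement: learn by trial and error which action pays the most.

    `rewards[a]` is the (fixed, hypothetical) payoff of action `a`.
    Assumes steps >= len(rewards) so every action is tried at least once.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)
    counts = [0] * len(rewards)
    actions = range(len(rewards))
    for step in range(steps):
        if step < len(rewards):
            action = step  # try every action once
        elif rng.random() < epsilon:
            action = rng.randrange(len(rewards))  # explore at random
        else:
            action = max(actions, key=lambda a: estimates[a])  # exploit
        counts[action] += 1
        # Running average of the rewards observed for this action.
        estimates[action] += (rewards[action] - estimates[action]) / counts[action]
    return max(actions, key=lambda a: estimates[a])


centers = two_means(points)             # patterns found without labels
label = predict(2.0, labeled)           # prediction for data not seen before
best = learn_best_action([0.2, 1.0])    # action chosen after collecting rewards
```

The point of the sketch is the difference in what each function is given: raw data only, data plus labels, or nothing but a reward signal to chase.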

The book illustrates these approaches with vivid examples of successes and failures. These include simulated robots that learn to do backflips, a simulated boat that gets stuck in a (destructive) incentive loop, and intriguing language-based systems that infer relationships between concepts based on word frequency and proximity.

Every time scientists and engineers think they’ve overcome a major hurdle, they discover new limitations. Often, these come from naïveté or poor incentive choices. (“Rewarding A, while hoping for B.”) Christian examines the implications for potential misalignment with human values through lenses of fairness, agency, transparency, and more. We don’t know what we don’t know, and creating new smart systems teaches us new approaches and reveals the limits of our understanding.
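
The “rewarding A, while hoping for B” trap can be shown with a very small Python sketch. This is a made-up toy, not the boat environment described in the book: the two policies, the step counts, and the point values are all assumptions chosen for illustration.

```python
def total_reward(policy, steps=20):
    """Score a fixed policy in a hypothetical toy race.

    "finish": head for the finish line; pays 10 points once, at step 5,
              and the episode ends.
    "loop":   circle a bonus that keeps reappearing; pays 1 point per step
              and never finishes the race.
    Returns (points collected, whether the race was finished).
    """
    reward, finished = 0, False
    for step in range(1, steps + 1):
        if policy == "finish":
            if step == 5:
                reward += 10
                finished = True
                break
        elif policy == "loop":
            reward += 1
        else:
            raise ValueError(f"unknown policy: {policy}")
    return reward, finished


policies = ["finish", "loop"]

# A pure reward maximizer compares points only; it has no notion of "winning".
best = max(policies, key=lambda p: total_reward(p)[0])
```

Under these assumed numbers, the point-maximizing choice is the policy that never finishes the race. The designer hoped for B (finishing) but rewarded A (points), and the optimizer did exactly what it was paid to do.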

But getting it wrong can produce increasingly serious results. While some of the systems profiled in the book are conceptual toys, others have serious implications for real human lives. Given the scope of everything we’re automating, unintended consequences can be disastrous.

We find ourselves at a fragile moment in history — where the power and flexibility of these models have made them irresistibly useful for a large number of commercial and public applications, and yet our standards and norms around how to use them appropriately are still nascent. It is exactly in this period that we should be most cautious and conservative — all the more so because many of these models are unlikely to be substantially changed once deployed into real-world use.

In the book’s conclusion, Christian acknowledges potential blind spots in the book itself. This approach reflects the ethos with which we should develop smart systems: These are important technologies that are worth developing, but missteps can have serious consequences. We must cast aside hubris and develop smart systems responsibly, humbly, and with open (human) minds.

