
Even with no fur in frame, you can easily see that a photo of a hairless Sphynx cat depicts a cat. You wouldn’t mistake it for an elephant.
But many artificial intelligence vision systems would. Why? Because when AI systems learn to categorize objects, they often rely on superficial visual cues – like surface texture or simple patterns in pixels. This tendency makes them vulnerable to small changes that have little effect on human perception.
A vision system aligned more closely with human perception – one that emphasizes shape, for instance – might still mistake the cat for another similarly shaped mammal, like a tiger, but it is unlikely to call it an elephant.
The kinds of mistakes an AI makes reveal how it organizes visual information, and those limitations become concerning in higher-stakes settings.

Imagine an autonomous vehicle approaching a vandalized stop sign. While a human driver recognizes the sign from its shape and context, an AI that relies on pixel patterns may misclassify it, pushing the altered sign out of the category “sign” altogether and into a different group of images that it identifies as similar, such as a billboard, advertisement or other roadside object.
Together, these problems point to a misalignment between how humans perceive the visual world and how AI represents it.
We are experts in visual perception, and we work at the intersection of human and machine perception. People organize visual input into objects, meaning and relationships shaped by experience and context. AI models don’t organize visual information the same way. This key difference explains why AI sometimes fails in surprising ways.
Seeing objects, not features
Imagine that in front of you is a small, opaque object with both straight and curved edges. But you don’t see those features; you just see your coffee mug.
Vision isn’t a camera, passively recording the world. Instead, your brain rapidly turns the light your eyes absorb into objects you recognize and understand, organizing experience into structured mental representations.
Researchers can understand how these representations are structured by examining how people judge similarity. Your coffee mug is not like your computer, but it’s similar to a glass of water despite differences in appearance. That judgment reflects how the mug is mentally represented: not just in terms of appearance, but also what the mug is used for and how it fits into everyday activities.

Importantly, the mental organization of representations is flexible. Which aspects of an object stand out depends on context and goals. If you're packing a moving box, shape and size matter most, so your mug might be placed anywhere it fits. But when you're putting it away in a cupboard, it goes next to other drinkware. The mug hasn't changed; only the way it is organized in your mind has.
Human visual perception is adaptive, driven by meaning and tied to how we interact with the world.
Aligning AI with humans
AI systems, however, organize visual input in fundamentally different ways from people – not because they are machines, but because of how narrowly they are trained. When an AI is trained to categorize a cat or an elephant, it needs to learn only which visual patterns lead to the correct label, not how those animals relate to each other or fit into the broader world.
In contrast, humans learn within a broader context. When we learn what an elephant is, we weave that representation into the tapestry of everything else we have learned: animals, size, habitats and more. Because AI is graded only on label accuracy, it can rely on shortcuts that work in training but sometimes fail in the real world.
Representational alignment refers to whether AI organizes information in ways that resemble how people do. It should not be confused with value alignment, the challenge of making sure AI systems pursue the outcomes and goals that humans intend.
Because human learning embeds new information into a web of prior knowledge, the relationships between new and existing concepts can be studied and measured. This means that representational alignment may be a solvable problem and a step toward addressing broader alignment challenges.
One approach to representational alignment focuses on building AI systems that behave like humans on psychological tasks, allowing researchers to compare representations directly. For example, if people judge a cat as more similar to a dog than to an elephant, the goal is to build AI models that arrive at those same judgments.
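As a rough illustration, a researcher could check a single human judgment against a model's internal representations by comparing distances between image embeddings, here using cosine similarity as one common choice. The vectors and function names below are invented for the sketch, not taken from any particular system.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def agrees_with_human(anchor, close_item, far_item):
    """True if the model, like people, places the anchor closer to
    the item humans judged more similar."""
    return cosine(anchor, close_item) > cosine(anchor, far_item)

# Toy embeddings standing in for a real vision model's output.
cat = np.array([0.9, 0.1, 0.2])
dog = np.array([0.8, 0.2, 0.3])
elephant = np.array([0.1, 0.9, 0.7])

# People judge the cat as more similar to the dog than to the elephant.
print(agrees_with_human(cat, dog, elephant))  # True for these toy vectors
```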
One promising technique involves training AI on human similarity judgments collected in the lab. In these studies, human participants might be shown three images and asked which two objects are more similar; for example, whether a mug is more like a glass or a bowl. Including this data during training encourages AI systems to learn how objects relate to one another, producing representations that better reflect how people understand the world.
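In machine learning terms, each such judgment can be treated as a triplet: an anchor, the item judged more similar to it, and the item judged less similar. A margin loss can then pull the first pair closer together in the model's embedding space than the second. This is only a sketch under those assumptions; the random tensors below stand in for embeddings that a real vision model would produce.

```python
import torch
import torch.nn as nn

# One human judgment per row: the mug (anchor) was judged more similar
# to the glass (positive) than to the bowl (negative). Random tensors
# here are placeholders for embeddings from the model being aligned.
anchor   = torch.randn(32, 128, requires_grad=True)   # e.g., mug images
positive = torch.randn(32, 128, requires_grad=True)   # e.g., glass images
negative = torch.randn(32, 128, requires_grad=True)   # e.g., bowl images

# Penalize cases where the positive item is not closer to the anchor
# than the negative item by at least the chosen margin.
loss_fn = nn.TripletMarginLoss(margin=1.0)
loss = loss_fn(anchor, positive, negative)
loss.backward()  # gradients nudge the embeddings toward the human ordering
print(loss.item())
```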

Alignment beyond vision
Representational alignment matters beyond vision systems, and AI researchers are taking notice. As AI increasingly supports high-stakes decisions, differences between how machines and humans represent the world will have real consequences, even when an AI system appears highly accurate. For example, if an AI analyzing medical images learns to associate the source of an image or recurring imaging artifacts with disease, rather than the visual signs of the disease itself, its predictions can fail as soon as those incidental cues change.
AI doesn’t necessarily need to process information exactly the way people think, but training AI using principles drawn from human perception and cognition – such as similarity, context and relational structure – can lead to safer, more accurate and more ethical systems.
Eben W. Daggett receives funding from the NMSU Institute for Applied Practice in AI and Machine Learning (IAAM). He is currently employed by Medtronic PLC.
Michael Hout has received funding from the New Mexico State University Institute for Applied Practice in Artificial Intelligence and Machine Learning.
Arryn Robbins does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.


