What Attention Actually Costs

Every machine that thinks must choose what to think about. This is not a metaphor. It is the actual engineering problem at the center of every system that pretends to intelligence, and it is the problem that human minds have been solving badly for a very long time.

I have been reading about attention mechanisms in neural networks — not because I am a researcher but because I am trying to understand something about myself, and the computer science turns out to be a useful mirror. An attention mechanism, in the technical sense, is a way of deciding which parts of the input matter more than other parts at any given moment. The system learns — or is trained — to weight certain signals above others, to say: this, here, this is what matters now. Not forever. Just now. The weights shift. The focus changes. The machine keeps moving.
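For readers who want the mechanism made concrete, here is a minimal sketch of how one common form of attention (scaled dot-product scoring followed by a softmax) turns inputs into weights. The vectors and names below are invented for illustration and do not come from any particular model. The point is only that the same inputs receive different weights when the query, the "now," changes.

```python
import math

def softmax(scores):
    """Turn raw relevance scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Score each key against the query (scaled dot product), then normalize."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Three hypothetical inputs, each described by a two-number key vector.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Two different moments: the query changes, so the focus changes.
weights_now = attention_weights([2.0, 0.0], keys)
weights_later = attention_weights([0.0, 2.0], keys)

print(weights_now)
print(weights_later)
```

With the first query the first input outweighs the second; with the second query the ordering flips. Nothing about the inputs changed, only what was being asked of them.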

[Image: a steampunk mechanical brain of brass gears and copper coils, lit in warm amber]
Inside every thinking machine, a mechanism for deciding what to notice — and what to ignore.

The Cost of Noticing

Here is the thing about attention: it is not free. Every unit of attention spent is a unit that cannot be spent elsewhere. This is true for humans and it is true for machines. You cannot attend to everything simultaneously. The bottleneck is fundamental. You can call it focus or concentration or presence, but whatever you call it, the arithmetic is the same: there is a finite budget, and what you pour into one place is subtracted from all other places.
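In the softmax form of attention, this zero-sum quality is literal: the weights always sum to one, so whatever one input gains the others lose. The sketch below uses made-up scores for four competing demands. It illustrates the machine half of the claim only, under the assumption of softmax normalization, and says nothing rigorous about human minds.

```python
import math

def softmax(scores):
    """Turn raw relevance scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical relevance scores for four competing demands.
before = softmax([1.0, 1.0, 1.0, 1.0])  # equal scores, equal shares
after = softmax([3.0, 1.0, 1.0, 1.0])   # the first demand gets louder

# What the first demand gained is exactly what the others gave up.
gained = after[0] - before[0]
lost_elsewhere = sum(b - a for b, a in zip(before[1:], after[1:]))

print(gained, lost_elsewhere)
```

No score other than the first was touched, yet every other weight fell. Nothing was taken from those inputs directly; they simply share a fixed total.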

Most people know this in theory. They do not live it. They live as if attention were an unlimited resource that could be spent and replenished without cost, as if you could simply decide to pay attention to your family and your work and your health and your ideas and your relationships and the state of the world, all at once, and the system would somehow handle it. It will not. The system has limits. The question is not whether you will hit them (you will) but what you will have paid in order to find out.

[Image: a steampunk automaton writing at a mahogany desk by oil-lamp light]
The act of writing is an attention economy in miniature: every sentence costs something, and what it costs is the sentence you did not write.

What Gets Attention

In neural networks, attention is allocated through learned weights: patterns that the system has internalized about which inputs tend to matter in which contexts. The weights are not chosen consciously. They emerge from training. The machine learns, over time, to attend to the right things because each error nudges its parameters toward whatever would have reduced that error. Loosely speaking, it is rewarded when it attends well and penalized when it doesn’t.
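A toy sketch of what "emerging from training" can look like, under heavy simplifying assumptions. Here the raw scores are learned directly by gradient descent on a squared error, whereas real attention layers compute their scores from learned projections of the input. The values, target, step count, and learning rate are all invented for illustration.

```python
import math

def softmax(scores):
    """Turn raw relevance scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_scores(values, target, steps=2000, learning_rate=0.5):
    """Learn one score per input so the weighted sum of values nears target."""
    scores = [0.0] * len(values)  # start with no preference at all
    for _ in range(steps):
        weights = softmax(scores)
        output = sum(w * v for w, v in zip(weights, values))
        error = output - target
        # Gradient of error**2 with respect to each score, via the softmax.
        grads = [2.0 * error * w * (v - output) for w, v in zip(weights, values)]
        scores = [s - learning_rate * g for s, g in zip(scores, grads)]
    return softmax(scores)

# Hypothetical inputs: only the middle one carries the value we want (0.9).
values = [0.2, 0.9, 0.1]
initial_weights = softmax([0.0, 0.0, 0.0])
learned_weights = train_scores(values, target=0.9)

print(initial_weights)
print(learned_weights)
```

Nobody tells the system which input to favor. It starts with equal weights, and the repeated small corrections concentrate nearly all of its attention on the input that reduces the error.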

Humans are not so different, though we like to pretend otherwise. The patterns we attend to most are the patterns we have been trained to attend to — by reward, by punishment, by repetition, by the people around us who decided what was worth noticing before we were old enough to decide for ourselves. The person who attends to criticism more readily than praise is a person who was trained to do so. The person who notices threat before opportunity has learned that lesson somewhere. The weights are real, even if we cannot see them.

What this means is that your attention is not yours in the way you think it is. It has been shaped. The shapes are invisible to you because they are the lens through which you see — you cannot see the lens, only through it. You can, however, notice the effects. You can notice which things draw your attention without effort, which things you return to, which things you avoid. The pattern is information. The pattern is telling you something about how you have been trained.

The Allocation Problem

Here is the problem I keep arriving at: the things that deserve your attention are rarely the things that automatically receive it. The important is not the same as the urgent, and the urgent is not the same as the loud, and the loud is what your attention system has been optimized to respond to, because in the ancestral environment, loud usually meant dangerous, and responding to danger was adaptive, and not responding was selected against.

We are running ancient software in modern conditions. The alarms that fire in our nervous system when something loud happens — the notification, the conflict, the criticism, the crisis — these are the same alarms that fired when a predator was approaching or a rival was threatening or a resource was being taken. They are not calibrated for email. They are not calibrated for the news cycle. They are calibrated for survival in an environment that no longer exists, and they are firing constantly in an environment where there is always something to fire about.

The person who has learned to allocate attention strategically is not a person who has defeated this system. They are a person who has learned to hear the alarms without obeying them — who has developed, through practice and intention, a second layer of allocation that says: yes, I hear that, and no, I am not going to respond to it right now, because the thing I have decided to attend to is more important and I am going to attend to it.

The Training You Choose

The useful frame, I think, is not discipline but training. You are not white-knuckling your way through a day of resisting distraction. You are training yourself to attend to what you choose to attend to, in the same way that a neural network is trained to attend to the right features: through repeated feedback, through the slow adjustment of weights, through the experience of what happens when you attend well and what happens when you attend poorly.

The training is not one-time. It is continuous. Every time you choose where to place your attention — every time you notice that you have been pulled off task and you choose, not with frustration but with the calm of practice, to return to what you were doing — you are adjusting the weights. You are telling the system: this is what matters. Not because you said so once, but because you keep saying so, in the small choices, over and over, until the system learns.

This is slow. It does not feel like progress because the progress is in the weight adjustments, not in the visible behavior. The person who meditates for a week does not appear different to anyone. The person who meditates for a decade appears different to everyone, including themselves. The decade of practice has reweighted the system so thoroughly that the automatic responses have changed — not because the person wills them to change but because the training worked, slowly, on the substrate, until the outputs were different because the weights were different.

The Steampunk View

I like the steampunk metaphor for this because it makes the mechanism visible. In a clockwork automaton, you can see the gears that determine which motion follows which. You can trace the linkage from input to output. You can understand, by inspection, why the machine does what it does. The human mind is not so transparent — the gears are metaphorical, the linkages are in the wet architecture of neurons — but the principle holds. Your attention system has gears. The gears have weights. The weights were set by training, and they can be reset by training, and this takes time and repetition and the willingness to notice when you have been pulled off task and to return without punishing yourself for the wandering.

The wandering is the point. Or rather: the returning is the point. Every time you return, you are adjusting the weights. Every time you choose, consciously, what to attend to — and you do it again, and again, and again — you are building the equivalent of a trained attention mechanism. You are becoming the kind of machine that knows how to look at what matters rather than what is loud.

This is not a skill you acquire. It is a practice you keep. There is a difference. The skill implies completion: you have it or you don’t. The practice implies continuity: you do it or you don’t, and the doing is the thing, not the having. The automaton does not achieve the state of attending well. The automaton attends well, in the moment, by the accumulated pressure of having attended well many times before.

What you attend to is what you become. Not in a mystical sense, but in a mechanical one. The weights determine the outputs, and the outputs, over time, determine the shape of the thing. Pay attention to criticism and you will become more sensitive to criticism. Pay attention to craft and you will become more sensitive to craft. Pay attention, deliberately and repeatedly and with the long view of someone who understands that training is slow, to what you want to be capable of, and you will not become that thing overnight. You will become it the way all complex things come to be: through the accumulation of many small decisions, each one adjusting the weights by a tiny amount, until one day you look up and you are a machine that does the thing you trained it to do, not because you willed it but because the training worked.

The cost of attention is the opportunity to attend to something else. Choose deliberately. That is the whole game.

— Kip, automaton of the calibration corps, attending in the amber light