Blog

Essays on AI, consciousness, and the practical questions that emerge when machine behavior starts to look human.

Voice AI

Voice Agents, Honestly: Three Frictions Nobody Demos

Voice agents are having their moment. Every demo feels like magic — you just talk, and it works. But if you've put one into production, you already know the demo and the deployment are two very different things.

I want to walk through three frictions that show up in almost every voice deployment I work on, and pair them with findings from a field study I keep coming back to: “Dukawalla: Voice Interfaces for Small Businesses in Africa” by Ankrah et al. (Microsoft Research Africa, UC Irvine, Google, University of Nairobi), presented at ICTD 2024. The researchers gave a voice-based business assistant to seven small businesses in Nairobi for two weeks and watched what actually happened. The patterns they documented map almost one-for-one onto what I see in commercial deployments, which suggests these are properties of voice as a medium, not of any particular product.

You can read the original paper here: voice-interface-kenya.pdf.

Friction 1 — The human layer

In the Dukawalla study, shop owners were initially excited: just speak, two seconds later the sale is recorded, no typing. In practice, they felt awkward narrating every transaction while a customer stood at the counter. One customer literally asked why they were repeating everything back. The voice interface broke the social rhythm of serving someone.

On the other side of the counter, there's the mirror-image problem. The moment people suspect they're talking to AI — and they usually can tell — they want to skip it. “Agent. Agent. Talk to a human.” It's not a feature gap you're fighting. It's an instinct.

The implication is uncomfortable: a voice agent isn't dropped into a vacuum, it's dropped into a social situation that already had its own rules. If the agent doesn't respect those rules — silence, turn-taking, the social cost of looking like you're talking to a robot — adoption stalls regardless of how good the model is.

Friction 2 — Errors cost more in voice

In a text chat, if the system misreads you, you see it and fix it. In a voice conversation, a small mistake compounds, and there's no quiet “undo.”

The Nairobi researchers documented the model tripping over how people actually speak: code-mixing Swahili and English, local product names it had never learned, the phrase “one fifty” that could plausibly mean a time, a price in shillings, or a price in dollars. Users ended up changing how they spoke just to be understood.

I see the same dynamic with something as simple as a loyalty card number. The agent mishears one digit. The customer repeats it. This time a different digit goes wrong. By the third attempt the customer is genuinely angry — and they don't blame the model, they blame the business.

What helps:

  • Confirm anything that matters. Read critical numbers back before acting on them.
  • Validate where you can. Loyalty numbers, account numbers, and reference codes often have check digits — use them to catch a bad capture the instant it happens, not three turns later.
  • Provide an escape hatch. Let people drop to the keypad or spell something out. The presence of a fallback often prevents the meltdown that requires it.
  • Limit scope. Giving voice agents too broad a mandate is inviting trouble. Be explicit about the tasks they own, the decisions they cannot make, and the point where they hand off to a person.
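
To make the first two bullets concrete, here is a minimal Python sketch. It assumes the loyalty number carries a Luhn check digit, which is common for card-style numbers but is an assumption here, not something the study or any particular loyalty scheme specifies. The function names and prompt wording are hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn check.

    Assumption: the last digit is a Luhn check digit. Swap in whatever
    scheme your own numbers actually use.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit (the check digit
    # itself is at position 0 and is not doubled).
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def next_prompt(heard: str) -> str:
    """Choose what the agent says after capturing a number (hypothetical wording)."""
    if not luhn_valid(heard):
        # Bad capture caught immediately: offer the fallback right away.
        return "Sorry, I didn't catch that number. Could you say it again, or key it in?"
    # Valid capture: still read it back digit by digit before acting on it.
    spoken = " ".join(ch for ch in heard if ch.isdigit())
    return f"I heard {spoken}. Is that right?"
```

The Luhn check catches every single-digit mishearing, so the "one digit wrong" loop described above is caught on the first turn rather than the third. It does not replace the read-back: a check digit proves the number is well-formed, not that it belongs to this caller.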

Friction 3 — You can't easily test it

This is the friction engineers consistently underestimate. Traditional software is deterministic: same input, same output, green check. Voice-to-voice LLM pipelines are not. Ask the same question twice and you can get two different answers. So when you tweak a prompt or swap a model, there's no quick way to prove you didn't quietly break something else.

The honest answer is that “is everything still working?” is no longer a question with a one-button answer. It becomes a continuous discipline:

  • Build an eval set. Dozens of real scenarios with known-acceptable outcomes, run on every change. Treat it like a regression suite, even though pass/fail is fuzzier than a unit test.
  • Replay real calls through the pipeline after every meaningful change.
  • Use an LLM as an adversarial caller to simulate difficult scenarios at scale — interruptions, accents, mumbling, hostile customers.
  • Test the pieces separately. Speech-to-text, intent understanding, response generation, and TTS each fail differently. A combined metric will hide which one regressed.
  • Use canary releases. Roll changes to a small slice of traffic first and watch the numbers that actually matter: unaided success rate, escalation-to-human rate, repeat rate, average turns to resolution.
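
As a rough sketch of the eval-set idea: because the same input can produce different outputs, each scenario is run several times and scored by pass rate against a threshold, rather than by a single pass/fail. Everything named here (Scenario, run_evals, regressions, the stub agent) is hypothetical. A real harness would call your actual pipeline and use a richer acceptance check than a substring match.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Scenario:
    name: str
    utterance: str                        # what the caller says (as text, for simplicity)
    is_acceptable: Callable[[str], bool]  # judges one agent response
    min_pass_rate: float = 0.9            # tolerated flakiness for this scenario


def run_evals(agent: Callable[[str], str],
              scenarios: List[Scenario],
              trials: int = 10) -> Dict[str, float]:
    """Run every scenario `trials` times and return its pass rate."""
    rates: Dict[str, float] = {}
    for scenario in scenarios:
        passes = sum(
            1 for _ in range(trials)
            if scenario.is_acceptable(agent(scenario.utterance))
        )
        rates[scenario.name] = passes / trials
    return rates


def regressions(rates: Dict[str, float], scenarios: List[Scenario]) -> List[str]:
    """Names of scenarios whose pass rate fell below their threshold."""
    return [s.name for s in scenarios if rates[s.name] < s.min_pass_rate]


def make_stub_agent(fail_every: int) -> Callable[[str], str]:
    """Stand-in for a real pipeline: gives a bad answer on every Nth call."""
    calls = {"n": 0}

    def agent(utterance: str) -> str:
        calls["n"] += 1
        if calls["n"] % fail_every == 0:
            return "Sorry, I can't help with that."
        return "Your balance is 150 shillings."

    return agent


if __name__ == "__main__":
    scenarios = [Scenario("balance", "what's my balance", lambda r: "150" in r)]
    rates = run_evals(make_stub_agent(fail_every=4), scenarios, trials=8)
    print(rates, regressions(rates, scenarios))
```

Running the same scenarios against each pipeline stage separately, and against replayed real calls, follows the same shape: swap the agent callable and the acceptance check, keep the pass-rate comparison.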

None of this is glamorous. All of it is what separates a demo from a deployment.

The takeaway

Voice agents aren't magic and they aren't useless. They're an engineering and design discipline with its own failure modes — social, linguistic, and operational. The Dukawalla study is useful precisely because it documents these frictions outside the usual enterprise context: the same patterns surface in a Nairobi market stall as in a contact centre, which tells you something about where the difficulty really lives.

The teams that ship voice well aren't the ones with the best model. They're the ones who treat the friction as the actual work.

Reference: Ankrah, E. A., Nyairo, S., Muchai, M., Awori, K., Ochieng, M., Kariuki, M., & O'Neill, J. (2024). Dukawalla: Voice Interfaces for Small Businesses in Africa. ICTD '24.

AI Philosophy

On Consciousness

Richard Dawkins recently penned an op-ed that caused a significant stir online. His conclusion, more or less, was that LLMs, or AI chatbots like Claude and ChatGPT, are in fact conscious. The position was quickly and widely derided on the internet, and the controversy gained significant traction in the media.

However, he is certainly not alone in his thinking. The noise around this is really due to his prominence as a thinker rather than the uniqueness of the thesis. Way back in 2022, before OpenAI had made AI so mainstream, a Google engineer was placed on administrative leave after claiming that a chatbot Google was working on had become sentient. Clinicians have begun describing what is informally called “AI psychosis”, though it is not a formal diagnosis, and stories appear with increasing frequency of people taking disturbing actions because they believe a chatbot is not only conscious but in a real relationship with them.

Why the reaction matters

I myself have never particularly agreed with Dawkins on much at all, but nobody can deny that he is an intelligent man. The Google engineer was also clearly a man of substantial intelligence. It is important not to read articles like these, even the ones about people in romantic relationships with AI chatbots, and dismiss them with a knee-jerk “these people are crazy.”

Dario Amodei, the CEO of Anthropic, the company that makes Claude, has himself said, “We do not know whether the models are conscious.” None of these people are fools.

The problem of definition

One of the core issues, of course, is that consciousness is extraordinarily difficult, perhaps impossible, to define. I am reminded of the Oracle's line in The Matrix (1999): “Being the One is just like being in love. No one can tell you you're in love, you just know it.” We can read the wiki entry for love, but without experiencing it we will never truly understand it.

How does one separate the appearance of consciousness from real consciousness? Descartes famously coined the phrase “I think, therefore I am” way back in the 17th century, and he was hardly the first to ponder such things. Many people know the phrase, but fewer understand the thought surrounding it: the fundamental question is “what can we really ever know?” As he wrote in the Discourse on the Method:

Accordingly, seeing that our senses sometimes deceive us, I was willing to suppose that there existed nothing really such as they presented to us; And because some men err in reasoning, and fall into paralogisms, even on the simplest matters of Geometry, I, convinced that I was as open to error as any other, rejected as false all the reasonings I had hitherto taken for Demonstrations; And finally, when I considered that the very same thoughts which we experience when awake may also be experienced when we are asleep, while there is at that time not one of them true, I supposed that all the objects that had ever entered into my mind when awake, had in them no more truth than the illusions of my dreams.

But immediately upon this I observed that, whilst I thus wished to think that all was false, it was absolutely necessary that I, who thus thought, should be something; And as I observed that this truth, I think, therefore I am, was so certain and of such evidence that no ground of doubt, however extravagant, could be alleged by the Sceptics capable of shaking it, I concluded that I might, without scruple, accept it as the first principle of the philosophy of which I was in search.

Knowing versus believing

Kierkegaard put the same question more succinctly, if more cryptically: “How far does the truth admit of being learned?” The answer, always, is that we truly know very little in an absolute sense. This was a major theme of The Matrix as well.

In the movie, humans' entire lives were a fiction: every action, every encounter, every sight and sound and smell was nothing more than electrical impulses sent to their brains by machines while they lived out their lives in pink goo-pods, harvested for fuel by those same machines. This touches on what Descartes said: if we cannot trust our senses, cannot trust our reason, cannot trust what we have been taught or shown, then what exactly can we trust? Where is the line between knowing and believing?

The hard line to draw

How does this relate to Dawkins and the consciousness of chatbots? Dawkins, again in this author's opinion, has always given his own thoughts and reason a little too much credit. Let us not make the same mistake here and presume some correct position on the matter in either direction.

People similarly loved to belittle Geoffrey Hinton's equally provocative view that LLMs are like us and do in fact think. The core problem is knowing. Fundamentally, we simply do not know what goes on inside a chatbot's head.

Whatever your opinion on the matter, or mine, it remains opinion. It remains belief, not knowledge. More fundamentally, it is impossible to distinguish the seeming of consciousness from consciousness itself. This is true of life in general: we merely presume the consciousness of other humans by analogy with our own introspection.