We can’t (currently) trust AI for verifiable output


“You Big Dummy.” -Fred Sanford

I was recently designing a podcast studio for a client, choosing the equipment and materials that would turn their space into a future-proof, TV-quality broadcast setup. To make my life easier, I gave my trusty AI a simple job: research my list of products, return current prices, and include a link to buy each item. The instructions were clear, the request was simple, and the stakes were low. What I got back was a well-formatted table filled with inaccurate prices and broken links, something that looked right at first glance but failed every test of reliability. It was fast, pretty, and wrong.

The LLM, as an assistant, was outright incompetent. And it raised a bigger question: why do systems that feel intelligent so often fail at tasks a diligent human could complete in five minutes?

The answer lies in how large language models (LLMs) actually work. They’re not researchers; they’re pattern matchers. And until you understand that, you’ll keep expecting them to behave like skilled assistants when they’re something closer to (over)confident interns. I’m going to break down the science behind that mismatch, and share seven strategies that can help you use AI effectively, without falling for the illusion of intelligence.

Why AI Makes Mistakes

LLMs generate, they don’t retrieve

These models do not search the internet or query a real-time database unless they’re connected to specific tools. They generate text by predicting the most likely next word, based on patterns in their training data. That data is vast, but it is not verified or guaranteed to be correct.
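
A toy sketch in Python makes the distinction concrete. The probability table below is invented purely for illustration; it is not real model data, and no real price lookup happens anywhere in it:

# Toy illustration: a language model picks the statistically most likely next
# token; nothing in this process checks a live catalog or database.
toy_next_token_probs = {
    "The Shure SM7B costs $": {"399": 0.41, "349": 0.33, "299": 0.26},
}

def generate_next(prompt: str) -> str:
    # Return whichever continuation was most common in the (made-up) training data.
    options = toy_next_token_probs[prompt]
    return max(options, key=options.get)

print(generate_next("The Shure SM7B costs $"))
# Prints "399" every time, whether or not that price is still accurate today.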

Irrational confidence without accuracy

Language models are trained to sound fluent. In one 2023 benchmark test, GPT-4 hallucinated on factual questions nearly 20% of the time. That means roughly one out of five confident answers could be completely made up. Worse, it usually sounds just as confident when it’s wrong.

They fabricate sources and links, even when asked not to

Without live access to the internet, most AI models will invent citations. This is not rare; it’s expected. A notable 2023 example involved a lawyer submitting a legal brief written by ChatGPT that cited non-existent court cases. The model wasn’t being malicious. It simply generated what looked like a citation, because that’s what the prompt requested. It will give you what you ask for, whether it’s real or not!

LLMs process language statistically, not logically

LLMs model the correlation between words; they do not understand meaning the way humans do. This works well for writing poetry or summarizing documents, but it struggles when precision is required. Concepts like “verify,” “fact-check,” or “search for the lowest price” are not procedures the model actually executes. To the model, they are linguistic cues.

They are not incentivized to admit uncertainty

These systems are trained to be helpful and complete your request. They will rarely stop and say, “I can’t do this.” Instead, they generate something that feels useful. The result is an output that reads smoothly, but may be filled with inaccuracies or fictional details.

Tool matters: Not all AIs are built the same

There’s a critical distinction between LLMs like ChatGPT and purpose-built tools like Perplexity.ai or Wolfram Alpha. The latter are designed to retrieve and verify information using live data, structured knowledge, or computation. If your task requires real-world facts or citations, these hybrid systems are more dependable than a stand-alone language model.

7 Strategies for Getting Better Results from AI

1.    Treat AI Like a Junior Assistant

Assume it’s bright but untrained. It can write with polish and mimic expertise, but it doesn’t understand stakes, nuance, or the context behind your request. Supervision isn’t optional; keep your “hand on the wheel.”

2.    Be Uncomfortably Specific

Vague prompts invite vague output. Spell out exactly what you want, including format, scope, and constraints. “Return a table with price, availability, and a live U.S. purchase link” beats “research this product” every time.
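
In practice, the gap between the two looks something like this. The ask_model() helper below is a hypothetical stand-in for whatever chat interface or API you actually use, and the product is just an example:

def ask_model(prompt: str) -> str:
    # Hypothetical stub: replace with a call to your own AI tool or API.
    return "model response goes here"

vague_prompt = "Research the Shure SM7B."

specific_prompt = (
    "For the Shure SM7B microphone, return a table with three columns: "
    "current U.S. street price in USD, in-stock availability, and a direct "
    "purchase link. If you cannot verify a field, write UNKNOWN in that cell "
    "instead of guessing."
)

# The second prompt fixes the format, the scope, and what to do when unsure.
answer = ask_model(specific_prompt)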

3.    Ask It to Think in Steps

Prompt it to reason one step at a time. Break complex tasks into smaller actions: list, compare, verify. Working in steps improves accuracy, and it can also reveal where things break down.
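
One way to force that structure is to chain smaller prompts instead of sending one big one. This sketch reuses the hypothetical ask_model() stand-in from above, with example product names:

def ask_model(prompt: str) -> str:
    # Hypothetical stub: replace with your own AI tool or API.
    return "model response goes here"

products = ["Shure SM7B", "Rode PodMic", "Sony ZV-E10"]

# Step 1: list — one narrow question per product.
listings = {p: ask_model(f"List current U.S. retailers selling the {p}.") for p in products}

# Step 2: compare — only after the listings exist.
comparison = ask_model(f"Compare prices across these listings: {listings}")

# Step 3: verify — a separate pass whose only job is to flag weak answers.
audit = ask_model(f"Which claims in this comparison are you least certain about? {comparison}")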

4.    Use Verification Layers

Build a second pass into your process. Ask the AI to check its own work, or to explain how it arrived at each answer. If it can’t trace the logic, you shouldn’t trust the result.
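
A second pass can be as simple as feeding the first answer back in with a narrower job. Again, ask_model() is a hypothetical stand-in for your own tool:

def ask_model(prompt: str) -> str:
    # Hypothetical stub: replace with your own AI tool or API.
    return "model response goes here"

first_draft = ask_model("Return current prices and purchase links for the Shure SM7B.")

review = ask_model(
    "Review the following answer line by line. For each price and link, state "
    "whether you can explain where it came from. Mark anything you cannot "
    f"trace as UNVERIFIED:\n\n{first_draft}"
)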

5.    Know When to Use a Better Tool

If your task requires facts, citations, or live data, consider tools built for that. Retrieval-based systems like Perplexity or Wolfram Alpha are far more reliable than a model trained on static text. LLMs are language engines, not search engines.

6.    Set Quality Gates

Tell the model what counts as a valid output. Define rules like “no broken links,” “no placeholder prices,” or “cite only from named sources.” Constraints like these won’t guarantee success, but they sharply reduce failure.
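
Some gates you can even enforce outside the model. This sketch uses only Python’s standard library to reject broken links and placeholder prices before an AI-generated table ever reaches a client; the sample rows are illustrative:

from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

def link_is_live(url: str) -> bool:
    # A gate the model can't talk its way past: the URL either resolves or it doesn't.
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
            return resp.status < 400
    except (URLError, HTTPError, ValueError):
        return False

def passes_quality_gates(row: dict) -> bool:
    has_real_price = isinstance(row.get("price_usd"), (int, float)) and row["price_usd"] > 0
    return has_real_price and link_is_live(row.get("url", ""))

# Illustrative rows, as if parsed from an AI-generated table.
rows = [
    {"item": "Shure SM7B", "price_usd": 399.0, "url": "https://www.shure.com/"},
    {"item": "Mystery Mic", "price_usd": None, "url": "https://example.com/broken"},
]

for row in rows:
    status = "OK" if passes_quality_gates(row) else "REJECTED"
    print(f"{status}: {row['item']}")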

7.    Expect Occasional Delusion

LLMs often produce convincing nonsense. They will hallucinate answers rather than admit uncertainty. If the result looks too clean or comes too fast, your first instinct should be to double-check, not to celebrate.

If the Milk Isn’t Certainly Good, Can You Trust It?

AI isn’t magic. It’s math. And at this stage, the math is still learning how to think critically, follow logic, and resist the urge to please.

When you understand that, the tool becomes more useful, and less frustrating. The goal isn’t to avoid AI. The goal is to stop expecting it to be what it isn’t. You wouldn’t let a junior employee send reports to your clients without a review. You shouldn’t let AI do that either.

For now, the smartest users aren’t the ones who ask the best questions. They’re the ones who double-check the answers.


Subscribe to future insights and interviews on innovation at: https://youtube.com/@theblackfuturist