The AI Didn't Hallucinate. The Data Did.

A delivery truck pulls up to a downtown Seattle coffee shop at 9am on a rainy 45F Tuesday carrying 8,000 gallons of oat milk. The regional manager calls Emma, the data engineer. He is not happy. The AI agent that manages supply chain decisions made the call autonomously at 08:14 AM. Nobody asked for it. “Turn it off,” he says. “It’s hallucinating.”

Emma does not turn it off. She opens her notebook.

This is the story of what she finds, and the architecture that made the investigation possible. I built this project for the DAIS 2026 Community Virtual Challenge.

What Is a Lakehouse?

Before getting into Emma’s investigation, the architecture needs to be clear.

The history of data storage has two chapters. First came data warehouses: Snowflake, Redshift, BigQuery. They gave you ACID transactions, schema enforcement, query performance guarantees. The trade-off is that your data lives in a proprietary format inside the vendor’s infrastructure. You pay for everything, and the data is not portable.

Then came data lakes: S3 or Azure Blob with Parquet files. Cheap storage, any engine can read it, but completely unmanaged. No transactions. No schema enforcement. Nothing stopping you from having forty versions of the same table in forty different folders with slightly different column names.

The lakehouse is the attempt to get warehouse governance on top of lake-style open storage.

Traditional Data Warehouse          Data Lake
+-----------------------+          +-----------------------+
| Proprietary format    |          | Parquet on S3         |
| ACID transactions  ✓  |          | No transactions  ✗    |
| Schema enforcement ✓  |          | No schema        ✗    |
| Vendor lock-in     ✗  |          | Open format      ✓    |
| Expensive          ✗  |          | Cheap            ✓    |
+-----------------------+          +-----------------------+

                    Lakehouse
          +-----------------------+
          | Parquet on S3 (open)  |
          | ACID transactions  ✓  |
          | Schema enforcement ✓  |
          | Time travel        ✓  |
          | Any engine reads   ✓  |
          +-----------------------+

Delta Lake is the open-source layer that makes this possible. It is a transaction log that sits alongside your Parquet files and turns an unmanaged folder of data into something that behaves like a database table.

How Delta Lake Works

When you write to a Delta table, two things happen simultaneously:

The data is written as Parquet files to your storage location (S3, Azure Blob, GCS)
A transaction log entry is written to _delta_log/ recording exactly what files were added, what files were removed, and what the schema looks like

coffee_shop_world_state/
  _delta_log/
    00000000000000000000.json    <-- commit 0: baseline_normal
    00000000000000000001.json    <-- commit 1: corrupted_sensor
    00000000000000000002.json    <-- commit 2: self_corrected
    00000000000000000003.json    <-- commit 3: OPTIMIZE
  part-00000-a1b2c3.parquet
  part-00001-d4e5f6.parquet

The log is append-only. Every commit adds a new sequential JSON file. Nothing is ever overwritten. This gives you three things that matter enormously when something goes wrong.

ACID transactions. A write either succeeds completely (log entry committed) or fails with no visible effect. Concurrent readers always see a consistent snapshot. No partial writes, no torn reads.

Schema enforcement. Delta rejects writes that do not match the table schema unless you explicitly allow evolution. A pipeline that accidentally drops a column gets an error, not silent data corruption.

Time travel. Because the log records every version permanently, you can read the table as it was at any point in history.

# Read the table as it was at commit 1 -- the moment of the corruption
spark.read.format("delta") \
    .option("versionAsOf", 1) \
    .load("coffee_shop_world_state/")

This last one is what saves Emma.

The Black Box Problem

Modern AI agents make decisions continuously. They read data, reason over it, and take actions. The problem is that most infrastructure is built for the present. A standard database shows you what is true right now. It does not show you what was true at 08:14:23 AM on a specific Tuesday when a specific decision was made.

When something goes wrong, you are left with an application log:

08:14 AM - SupplyAgent triggered
08:15 AM - Decision: Order 8000 gallons oat milk to Downtown
08:15 AM - Reasoning: Anticipating unprecedented iced-latte demand
            due to extreme heat and inventory depletion

And a live database showing 118 gallons of inventory and 45F rainy weather. The two realities do not match. Without a way to reconstruct the past, you call it a hallucination and move on. The actual problem goes unfixed.

What Emma Actually Found

Using Delta Lake time travel, Emma reconstructs the exact state of the world at the moment the agent made its decision.

spark.sql("""
    SELECT store, inventory_gallons, weather_temp_f, weather_condition
    FROM world_state VERSION AS OF 1
    WHERE store = 'Downtown'
    ORDER BY event_timestamp DESC
    LIMIT 1
""")

The query returns:

store	inventory_gallons	weather_temp_f	weather_condition
Downtown	-999	102	Extreme Heatwave

At 08:12 AM, a faulty IoT fridge sensor reported inventory as -999, triggering an emergency restock flag. Simultaneously, the third-party weather API pushed a 102F heatwave warning for Seattle. At 08:16 AM, both systems self-corrected and overwrote the bad data in the live table. The evidence was erased four minutes after the damage was done.

The agent did not hallucinate. It made a completely logical decision based on the data it was given. The data pipeline was poisoned.

The Bi-Temporal Layer

The investigation has two separate timestamps that matter.

Event time: when the sensor reading was supposed to represent reality. The sensor said 08:12 AM. That is when the physical fridge was being measured.

Transaction time: when the data was committed to the Delta table. The commit happened at 08:14:23 AM. That is when the agent read it.

A standard database conflates these. When the sensor corrects itself at 08:16 AM and overwrites the -999 reading with 118, the event time history is destroyed. You can no longer ask “what did the sensor say at 08:12?”

A bi-temporal schema preserves both:

schema = StructType([
    StructField("store",               StringType()),
    StructField("event_timestamp",     TimestampType()),   # when it happened
    StructField("transaction_time",    TimestampType()),   # when it was recorded
    StructField("inventory_gallons",   IntegerType()),
    StructField("weather_temp_f",      DoubleType()),
    StructField("weather_condition",   StringType()),
    StructField("commit_version",      LongType()),
])

With this schema plus Delta’s version history, Emma can reconstruct not just what the agent saw, but what was true in the real world at that moment. The gap between event_timestamp = 08:12 and transaction_time = 08:14:23 is exactly where the data corruption happened and propagated.

The Architecture

The full project has four notebooks orchestrated as a Databricks Workflow.

Notebook 1 (World State): Writes three phases to the Delta table. Phase 1 is normal conditions: 118 gallons, 45F, light rain. Phase 2 is the corruption: sensor glitches to -999, weather API pushes 102F heatwave. Phase 3 is the self-correction: both systems correct back to reality.

Notebook 2 (Agent Loop): Runs the supply chain decision function at three timestamps. At 07:30 (normal conditions) it correctly does nothing. At 08:14 (during the corruption window) it reads -999 inventory and 102F heat and orders 8,000 gallons. At 08:20 (after self-correction) it correctly does nothing.

Notebook 3 (Investigation): Emma’s notebook. Uses VERSION AS OF to reconstruct commit 1. Runs the bi-temporal AS-OF join. Produces the audit row showing inventory_agent_saw = -999 next to inventory_live_now = 118. Runs DESCRIBE HISTORY to show the full commit log.

Notebook 4 (Dashboard): Lakeview dashboard showing the world state timeline, the decision audit table, and the Delta commit history. Built directly on top of the Delta tables using Unity Catalog.

What This Is Actually About

The oat milk story is a memorable way to explain a real infrastructure problem. LLM agents in production are making autonomous decisions constantly. Most of the time they work. When they do not, the debugging process is usually “read the logs, see a weird decision, blame the model, move on.”

The Delta Lakehouse architecture changes what is possible in that debugging process. Time travel means you can always go back. Bi-temporal modeling means you can separate “what the agent saw” from “what was actually true.” Unity Catalog lineage means you can trace which table versions flowed into which decision.

None of this requires exotic technology. It is all built on Delta tables, PySpark, and Databricks Workflows. The architecture is the insight, not the stack.

The project is on GitHub at github.com/Pavan-249/supplychain-audit-databricks. All five notebooks are there. Submitted to DAIS 2026 Community Virtual Challenge.