Building Luca: An AI Agent for Finance and Accounting Workflows That Auditors Actually Trust

How we architected an AI system for finance and accounting that eliminates hallucinations, maintains complete audit trails, and proves AI isn't a black box.

There's an interesting story making the rounds in AI circles: an autonomous agent, tasked with routine accounting work, degraded over time until it resorted to fabricating numbers to force a reconciliation. For anyone who's spent time in finance or compliance, this isn't just a cautionary tale. It's a visceral reminder of why "AI-powered accounting" still triggers skepticism from CFOs, auditors, and boards.

The skepticism is valid. Nearly all early examples of AI for finance use cases have leveraged raw LLMs. A raw LLM is fundamentally probabilistic: it generates outputs based on patterns and probabilities, not fixed rules. Ask it the same question twice, and you might get two slightly different answers.

This flexibility makes LLMs brilliant at tasks like writing, brainstorming, and interpreting ambiguous requests. But finance and accounting demand the opposite: deterministic precision, mathematical certainty, and complete auditability. The same input must always produce the same output, every single time. Pointing a raw LLM at the ledger is like using a creative writing tool to balance your checkbook. The tool is powerful, but it's built for the wrong job.

Some would have you lean into this skepticism. Fear, even. But the way forward for finance and accounting isn't to avoid AI. It's to stop building AI accounting systems the wrong way.

At Leapfin, we've spent the last year building Luca, a specialized AI agent that can analyze financial data, build complex accounting automation, and do it in a way that satisfies even the most skeptical auditor. This isn't theoretical. Luca is in production today, automating complete order-to-cash accounting workflows for companies processing billions of transactions. It handles the most complex revenue recognition policies, generates journal entries for every transaction along the contract lifecycle, performs compliance reviews in 60 seconds that would take human auditors weeks or months, and builds custom automation for complex accounting scenarios that used to require weeks of manual development.

This post explains how we built it. More importantly, it explains the architectural principles that transform AI from a compliance liability into your most powerful compliance weapon.

For more on the "why" behind Leapfin's investment in AI, and the transformation we see happening throughout the Office of the CFO, read our CEO Ray's post, "Speed. Confidence. Control. AI Workflows Are Finance's Next Big Leap."

Want an up close look at our AI capabilities? Join me on October 30th for a live webinar: Stop Coding. Start Closing. AI-Powered Accounting Workflows for Modern Finance. Register here.

The problem: Why raw LLMs are compliance disasters

Before diving into solutions, let's be brutally honest about the problem. I ran a simple experiment that every finance leader should see.

I took this perfectly reasonable revenue recognition prompt and ran it through a standard AI chatbot three times in a row, with no changes:

[Prompt] I have a new B2B SaaS contract. Total value is $120,000 ($100k software, $20k implementation fee). Signed March 15, go-live April 1, 2025. Based on ASC 606, create a monthly revenue recognition schedule for 2025.

Run 1: It amortized the $100k subscription correctly, but treated the $20k implementation fee as a one-time service recognized entirely in April.

Run 2: It decided the implementation is not a distinct performance obligation and correctly amortized the full $120k over the subscription term.

Run 3: It did the same math as Run 2, but formatted the table completely differently and added a long paragraph of legal disclaimers.

Three runs. Three different outputs, and two different accounting treatments. For creative writing, this variability is a feature. For a SOX audit, it's an absolute nightmare.

This is the core risk: probabilistic systems make sophisticated guesses. They don't guarantee repeatability. You cannot build a system of record on top of a guess engine, no matter how sophisticated the guesses are.
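For contrast, here is a minimal Python sketch of what a repeatable version of that calculation could look like. It is illustrative only, not Luca's implementation. It assumes the Run 2 treatment (a single performance obligation), a 12-month term starting at go-live (the prompt does not state the term), and straight-line recognition; the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def straight_line_schedule(total: Decimal, term_months: int) -> list[Decimal]:
    """Split `total` evenly across `term_months`.

    Any rounding remainder is pushed into the final month so the
    schedule always sums back to exactly `total`.
    """
    monthly = (total / term_months).quantize(CENT, rounding=ROUND_HALF_UP)
    amounts = [monthly] * (term_months - 1)
    amounts.append(total - sum(amounts))  # final month absorbs rounding
    return amounts

# Assumption: 12-month term beginning April 2025, full $120,000 amortized.
schedule = straight_line_schedule(Decimal("120000"), term_months=12)

# April through December 2025 is the first nine months of the term.
recognized_2025 = sum(schedule[:9])
print(recognized_2025)
```

Whether the accounting policy chosen here is the right one is a judgment call. The point is that once the policy is fixed in code, every run against the same contract yields the same schedule.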

Leapfin’s architecture: Building trust through constraints

The insight that unlocked everything for us was realizing that AI doesn't need trust – it needs constraints. The right architecture doesn't hope for reliability; it forces it.

Luca's architecture is built on a principle I call the Architect-Builder separation: a solution-oriented probabilistic AI architect that designs the plan, and a deterministic, mathematical builder that executes it with perfect repeatability at massive scale.

Let me walk through the five layers that make this work.

Layer 1: The foundation – clean, immutable data

Every AI system is only as good as its data foundation. This is where most companies fail before they even start.

Luca doesn't run on a swamp of random spreadsheets and ambiguous CSV exports. It runs on Leapfin's financial data platform, a purpose-built system with four architectural principles that transform messy operational data into a deterministic source of truth.

1. Universal data standardization


Every data source (Stripe, Salesforce, NetSuite, custom billing systems) gets transformed into a universal accounting schema at ingestion. This isn't just cleaning data. It's eliminating ambiguity at the source.
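As a rough illustration of ingestion-time standardization, the Python sketch below maps two differently shaped payloads onto one record type. The payload field names (`amount_cents`, `InvoiceNumber`, `Total`, `CurrencyIsoCode`) and the `Invoice` class are hypothetical stand-ins, not the actual schema of any source system or of Leapfin's platform.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Invoice:
    """One canonical invoice shape, regardless of origin (hypothetical)."""
    source: str
    invoice_id: str
    amount: Decimal   # major currency units
    currency: str     # upper-case ISO 4217 code

def from_processor(raw: dict) -> Invoice:
    # Hypothetical payment-processor payload: integer amount in cents.
    return Invoice(
        source="processor",
        invoice_id=raw["id"],
        amount=Decimal(raw["amount_cents"]) / 100,
        currency=raw["currency"].upper(),
    )

def from_crm(raw: dict) -> Invoice:
    # Hypothetical CRM payload: amount as a decimal string.
    return Invoice(
        source="crm",
        invoice_id=raw["InvoiceNumber"],
        amount=Decimal(raw["Total"]),
        currency=raw["CurrencyIsoCode"].upper(),
    )

a = from_processor({"id": "inv_1", "amount_cents": 120000, "currency": "usd"})
b = from_crm({"InvoiceNumber": "INV-1", "Total": "1200.00", "CurrencyIsoCode": "USD"})
```

After this step, downstream logic only ever sees `Invoice` objects, so it never needs to know whether an amount originally arrived in cents or as a string.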

"Revenue" means the exact same thing whether it came from your billing system, your CRM, or your payment processor. An "invoice" has the same structure and fields regardless of which system generated it. This standardization means the AI never has to interpret 50 different ways to represent the same concept. There's one language, and everything speaks it.

2. The Connected Accounting Map

Our data isn't just stored in tables. It's structured as a graph where every operational event is explicitly connected to its financial outcome.

This graph encodes causality. A refund connects to an invoice, which connects to a revenue recognition schedule, which connects to specific journal entries, which connect to GL accounts. The AI can trace the complete chain:

"This $5,000 refund triggered a revenue reversal because it canceled an invoice that was 60% recognized, which generated three correcting journal entries that hit these exact accounts."

This isn't just storing data. It's storing the relationships and dependencies that make accounting work. When the AI needs to understand why a number exists, it can walk the graph and see the complete story.
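To make "walking the graph" concrete, here is a small Python sketch of the refund example. The node identifiers, account labels, and edge layout are all made up for illustration and do not reflect how the Connected Accounting Map is actually stored.

```python
# Hypothetical edges: each node points at the nodes it leads to.
EDGES = {
    "refund:R-1": ["invoice:INV-1"],
    "invoice:INV-1": ["rev_schedule:S-1"],
    "rev_schedule:S-1": [
        "journal_entry:JE-1",
        "journal_entry:JE-2",
        "journal_entry:JE-3",
    ],
    "journal_entry:JE-1": ["gl:revenue"],
    "journal_entry:JE-2": ["gl:deferred_revenue"],
    "journal_entry:JE-3": ["gl:accounts_receivable"],
}

def trace(start: str, edges: dict[str, list[str]]) -> list[str]:
    """Depth-first walk returning every node reachable from `start`."""
    seen: set[str] = set()
    order: list[str] = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Reverse so children are visited in their listed order.
        stack.extend(reversed(edges.get(node, [])))
    return order

chain = trace("refund:R-1", EDGES)
```

Because the relationships are explicit edges rather than implied joins, the explanation for a number is a path that can be listed, stored, and shown to an auditor.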

3. Event sourcing and state timeline

We don't store "current balance." We store the complete history of every state change as a sequence of immutable events.

Every modification appends a new event with a timestamp. Nothing is ever deleted or overwritten. This means we can reconstruct the exact state of any entity at any point in time. Want to know what Invoice #12345 looked like on March 15th at 2:47 PM? We can show you, precisely.

This is critical for auditing. When an auditor asks "why did this number change?", we don't have to guess or try to reconstruct history from logs. We have the actual historical state, preserved permanently.
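The idea can be sketched in a few lines of Python. This is a toy in-memory log, not the production storage model; the class names, field names, and the year 2025 are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Event:
    entity_id: str
    recorded_at: datetime
    changes: dict  # field name -> new value

class EventLog:
    """Append-only log: events are added, never edited or removed."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def state_at(self, entity_id: str, as_of: datetime) -> dict:
        """Rebuild an entity's state by replaying events up to `as_of`."""
        state: dict = {}
        for event in sorted(self._events, key=lambda e: e.recorded_at):
            if event.entity_id == entity_id and event.recorded_at <= as_of:
                state.update(event.changes)
        return state

log = EventLog()
log.append(Event("invoice:12345", datetime(2025, 3, 1, 9, 0),
                 {"amount": 5000, "status": "open"}))
log.append(Event("invoice:12345", datetime(2025, 3, 15, 16, 0),
                 {"status": "paid"}))

# State as of March 15th at 2:47 PM, before the payment event was recorded.
snapshot = log.state_at("invoice:12345", datetime(2025, 3, 15, 14, 47))
```

There is no "current balance" column to drift out of sync: the current state is simply the replay with `as_of` set to now.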

4. Bi-temporal data model

We track two timelines for every piece of data: when something happened in the real world (business time) and when we learned about it (system time).

An invoice might be dated March 1st (business time), but we received it on March 15th (system time), and then corrected an error on March 20th (another system time). All three timestamps are preserved.

This handles late-arriving data and corrections without destroying the audit trail. The AI can reason about both "what should have happened" and "what we knew at the time," which is essential for handling the messy reality of accounting where information arrives late, invoices get corrected, and mistakes need to be fixed without rewriting history.
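A minimal Python sketch of the two timelines, using the invoice example above. The record layout, the amounts, and the year 2025 are illustrative assumptions, not the actual data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class InvoiceVersion:
    invoice_id: str
    business_date: date  # when it happened in the real world
    recorded_on: date    # when the system learned about this version
    amount_cents: int

# The original record and its later correction are both kept.
VERSIONS = [
    InvoiceVersion("INV-1", date(2025, 3, 1), date(2025, 3, 15), 100_000),
    InvoiceVersion("INV-1", date(2025, 3, 1), date(2025, 3, 20), 120_000),
]

def as_known_on(versions: list[InvoiceVersion], invoice_id: str,
                system_date: date) -> Optional[InvoiceVersion]:
    """Return the latest version the system had recorded by `system_date`."""
    known = [
        v for v in versions
        if v.invoice_id == invoice_id and v.recorded_on <= system_date
    ]
    return max(known, key=lambda v: v.recorded_on, default=None)

before_receipt = as_known_on(VERSIONS, "INV-1", date(2025, 3, 10))
before_fix = as_known_on(VERSIONS, "INV-1", date(2025, 3, 16))
after_fix = as_known_on(VERSIONS, "INV-1", date(2025, 3, 31))
```

The correction never overwrites the earlier version, so "what did we believe on March 16th?" and "what do we believe now?" are both answerable from the same data.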

Why this matters

These four principles create something crucial: a deterministic source of truth before the AI ever sees a single transaction.

When operational systems send us messy data, we force it into this structure immediately. The AI doesn't have to guess which field represents revenue or figure out how a payment relates to an invoice. It operates in a world where ambiguity has already been eliminated.

The data lives in Snowflake, structured specifically for accounting use cases. But here's the critical architectural choice: this isn't just a data warehouse. It's an immutable ledger where every state change is preserved, every relationship is explicit, and every question has a provable answer.

This foundation is what makes everything else possible. Without it, you're just building a very expensive guess engine.

Layer 2: The architect – AI that designs, never executes

Luca uses Claude as its core LLM, but with radical constraints on what it can actually do. The AI's job is to be the brilliant architect, not the builder.

When you ask Luca to build a revenue recognition workflow, here's what happens:

Step 1: Context loading

We give Claude complete context about Leapfin's architecture: our Universal Accounting Records model, our Connected Accounting Map (the graph that relates every operational event to its financial outcome), and the company-specific details like chart of accounts, integration sources, and business workflows.

Step 2: Plain English understanding

The AI interprets your intent from conversational language. "Handle revenue recognition for SaaS contracts with implementation fees" becomes a structured plan: identify performance obligations, determine transaction price allocation, create recognition schedules, handle contract modifications.

Step 3: Code generation in a custom DSL

Here's where it gets interesting. Luca doesn't write arbitrary code. It writes code in a custom JavaScript-based Domain Specific Language we built specifically for accounting automation.

We chose JavaScript as a base because LLMs have been pre-trained on massive amounts of it. It's practically their native language. But our version is heavily restricted. We've removed any function that could interact with external systems, access file systems, or perform destructive actions.

The AI can only "speak" in terms of:

  • Reading from our standardized data models
  • Designing accounting calculations
  • Defining business workflows
  • Specifying journal entry structures

This DSL is Luca's jail cell. It's powerful enough to express any accounting logic, but constrained enough that the AI can't accidentally (or intentionally) do anything dangerous.
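The general technique behind a restricted language is an allowlist: anything not explicitly permitted is rejected. The real DSL is JavaScript-based and its internals are not described in this post, so the Python sketch below is only an analogy. It uses Python's standard `ast` module on a toy expression language, and the permitted field names are hypothetical.

```python
import ast

# Only plain arithmetic over known fields is permitted in this toy language.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.Constant, ast.Name, ast.Load,
)
ALLOWED_NAMES = {"invoice_amount", "recognized_pct"}  # hypothetical fields

def violations(source: str) -> list[str]:
    """Return a list of reasons the expression is not allowed (empty if fine)."""
    tree = ast.parse(source, mode="eval")
    problems = []
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            problems.append(f"disallowed construct: {type(node).__name__}")
        elif isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            problems.append(f"unknown reference: {node.id}")
    return problems

ok = violations("invoice_amount * recognized_pct")
blocked = violations("open('ledger.csv')")
```

Function calls, attribute access, and imports are not on the list, so file or network access cannot even be expressed, let alone executed.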

Step 4: Static analysis and validation

Before any code runs, it passes through our custom-built static analyzer. This tool does two critical things:

First, it validates the code syntactically and semantically. It ensures every data reference exists, every calculation is type-safe, and every workflow follows accounting conventions.

Second, it generates an execution plan for our graph engine. It maps out exactly which nodes in our accounting graph will be affected, in what order, and with what dependencies. This plan becomes the blueprint for deterministic execution.

If validation fails, the error feedback goes straight back to Claude with structured context about what went wrong. The AI iterates, self-corrects, and tries again. No human intervention needed for syntax errors or simple logic mistakes.
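The two jobs of the analyzer, checking references and producing an ordered plan, can be approximated with Python's standard `graphlib` module. This is a simplified sketch under stated assumptions: the step names, the `KNOWN_SOURCES` set, and the error format are invented for illustration and are not the analyzer's real interface.

```python
from graphlib import CycleError, TopologicalSorter

KNOWN_SOURCES = {"invoice", "contract"}  # hypothetical standardized models

def analyze(steps: dict[str, set[str]]) -> dict:
    """`steps` maps each workflow step to the names it reads from."""
    known = KNOWN_SOURCES | set(steps)
    errors = [
        {"step": step, "error": "unknown_reference", "name": dep}
        for step, deps in sorted(steps.items())
        for dep in sorted(deps)
        if dep not in known
    ]
    if errors:
        # Structured feedback that could be handed back to the model.
        return {"ok": False, "errors": errors}
    try:
        order = list(TopologicalSorter(steps).static_order())
    except CycleError as exc:
        return {"ok": False, "errors": [{"error": "cycle", "nodes": exc.args[1]}]}
    # Keep only workflow steps; data sources are inputs, not work to do.
    return {"ok": True, "plan": [name for name in order if name in steps]}

good = analyze({
    "schedule": {"invoice", "contract"},
    "allocation": {"schedule"},
    "journal_entries": {"allocation"},
})
bad = analyze({"schedule": {"invoce"}})  # typo in the reference
```

A failed result carries machine-readable errors rather than a stack trace, which is what makes an automated retry loop practical.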

Layer 3: The sandbox – a flight simulator for accounting

Luca's entire development workspace is a completely isolated sandbox. Think of it as a flight simulator for accounting automation.

In this environment, the AI can write code, test it against our validation engine, and see the results, but nothing leaves the sandbox without explicit human approval. It's a training environment with full production context and zero production risk.

The sandbox includes:

  • A copy of the production data model (sanitized)

  • The full workflow engine configuration

  • The complete validation framework

  • Detailed error reporting with suggested fixes

This is where the AI learns through trial and error. It can attempt complex workflows, fail safely, receive structured feedback, and iterate toward a working solution, all without ever touching production data or live accounting processes.

Layer 4: The builder – deterministic workflow execution

Once a workflow is validated and approved, it moves to the builder: our deterministic workflow engine.

This engine is the opposite of the AI. It's not creative. It's not probabilistic. It's not flexible. And that's exactly what makes it trustworthy.

Our workflow engine is graph-based, built on a critical architectural principle: any change propagates through the entire graph, and all affected workflows are re-evaluated automatically.

Here's how it works:

The accounting graph

Every entity in our system (invoices, payments, customers, contracts, GL accounts) is a node in a directed graph. Edges represent relationships and dependencies. When any node changes, we trace its edges to identify every downstream node that might be affected.

Workflow evaluation

Workflows are pure functions: given the same input state, they always produce the same output. No randomness. No external API calls. No probabilistic behavior.

When the graph changes, we identify all workflows that depend on affected nodes and re-evaluate them in dependency order. If Invoice #12345 changes, we re-evaluate its revenue recognition schedule, recalculate any dependent allocations, regenerate associated journal entries, and update any reporting summaries that include it.
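Here is a small Python sketch of that propagation step: find everything downstream of a changed node, then order it so each item is re-evaluated only after the things it depends on. The graph contents are hypothetical, and the real engine is certainly more involved than this.

```python
from collections import deque

# Hypothetical edges: node -> nodes that depend on it.
DEPENDENTS = {
    "invoice:12345": ["schedule:12345"],
    "schedule:12345": ["allocation:12345", "journal_entries:12345"],
    "allocation:12345": ["journal_entries:12345"],
    "journal_entries:12345": ["report:monthly_revenue"],
    "invoice:99999": ["schedule:99999"],
}

def affected_in_order(changed: str, dependents: dict[str, list[str]]) -> list[str]:
    # 1) Breadth-first search for every node downstream of the change.
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        for child in dependents.get(queue.popleft(), []):
            if child not in affected:
                affected.add(child)
                queue.append(child)

    # 2) Kahn's algorithm over the affected subgraph gives dependency order.
    indegree = {node: 0 for node in affected}
    for node in affected:
        for child in dependents.get(node, []):
            indegree[child] += 1
    ready = deque(sorted(n for n in affected if indegree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in dependents.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order

plan = affected_in_order("invoice:12345", DEPENDENTS)
```

Unrelated nodes such as `invoice:99999` never enter the plan, so a single change triggers only the work that actually depends on it.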

Deterministic guarantees

This architecture gives us mathematical guarantees:

  • The same input state always produces the same output
  • Every calculation is traceable to specific input data
  • No hidden state or side effects
  • Complete reproducibility for any historical point in time

The AI designed the instruction set for the workflows. But the workflow engine executes them with mechanical precision. This separation is what transforms AI from a risk into a reliable tool.

Layer 5: The audit trail – immutable evidence

Every action in Luca generates evidence. Not logs that can be deleted. Not mutable database records. Immutable, cryptographically verifiable evidence of what happened, when, and why.

We maintain two types of audit trails:

Execution audit trail

Every time the workflow engine runs, we capture:

  • Input state (snapshot of all graph nodes that fed into the calculation)
  • Workflows evaluated (exact code and version)
  • Output generated (journal entries, state changes, reports)
  • Timestamp and triggering event
  • User or system that initiated the action

This data is written to a custom-built immutable table structure in Snowflake. Records are append-only. Once written, they cannot be modified or deleted, not by users, not by admins, not by the system itself.
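One common way to make an append-only record tamper-evident is a hash chain, where each record's hash covers the previous record's hash. The Python sketch below shows that general technique. It is an assumption for illustration: this post does not say how Leapfin's Snowflake tables achieve verifiability, and the class and field names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record

class AuditLog:
    """In-memory, append-only log where each record commits to the one before."""

    def __init__(self) -> None:
        self._records: list[dict] = []

    def append(self, payload: dict) -> str:
        prev = self._records[-1]["hash"] if self._records else GENESIS
        body = json.dumps(payload, sort_keys=True)  # stable serialization
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self._records.append({"prev": prev, "body": body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; any edited record breaks the chain."""
        prev = GENESIS
        for record in self._records:
            expected = hashlib.sha256((prev + record["body"]).encode()).hexdigest()
            if record["prev"] != prev or record["hash"] != expected:
                return False
            prev = record["hash"]
        return True

log = AuditLog()
log.append({"workflow": "rev_rec", "version": "v1", "trigger": "invoice.updated"})
log.append({"workflow": "rev_rec", "version": "v1", "trigger": "refund.created"})
intact = log.verify()
```

Changing any earlier record alters its hash, which no longer matches what later records committed to, so the edit is detectable even if the storage layer itself were compromised.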

For SOX compliance, this gives auditors exactly what they need: a complete, tamper-proof record of how every number in the financial statements was calculated.

Configuration audit trail

Every change to Luca's workflows is stored in Git. Not just the current version – the complete history of every modification, who made it, when, and why.

When an auditor asks, "Why did the revenue recognition for this contract change in March?" we can show them:

  1. The exact commit that modified the workflow
  2. The code diff showing what changed
  3. The approval workflow (from sandbox to production)
  4. The business justification documented in the commit message
  5. Every transaction affected by the change

This is the opposite of a black box. It's radical transparency.

What Leapfin’s “glass box” architecture enables

The result of this architectural discipline is something that seems impossible: an AI system that's more auditable, more consistent, and more reliable than the manual processes it replaces.

No hallucinations

When people ask about hallucinations, I tell them: Luca doesn't generate numbers. It generates workflows. The numbers come from deterministic execution of those workflows against verified data.

Can the AI make a mistake when designing a workflow? Absolutely. But that mistake is caught by:

  1. Static analysis during development
  2. Human review before production
  3. Validation against expected results in the sandbox
  4. Reconciliation frameworks that flag unexpected outputs

And once a workflow is in production, it executes with mathematical certainty. No variability. No probabilistic weirdness.
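As one example of the kind of check a reconciliation framework might run on workflow output, the Python sketch below flags journal entries whose debits and credits do not match. The entry structure and account names are hypothetical, and this is only one of many possible controls.

```python
from decimal import Decimal

def unbalanced_entries(entries: list[dict]) -> list[str]:
    """Return the ids of journal entries whose debits and credits differ."""
    flagged = []
    for entry in entries:
        debits = sum(
            (line["amount"] for line in entry["lines"] if line["side"] == "debit"),
            Decimal("0"),
        )
        credits = sum(
            (line["amount"] for line in entry["lines"] if line["side"] == "credit"),
            Decimal("0"),
        )
        if debits != credits:
            flagged.append(entry["id"])
    return flagged

entries = [
    {"id": "JE-1", "lines": [
        {"account": "deferred_revenue", "side": "debit", "amount": Decimal("10000.00")},
        {"account": "revenue", "side": "credit", "amount": Decimal("10000.00")},
    ]},
    {"id": "JE-2", "lines": [
        {"account": "deferred_revenue", "side": "debit", "amount": Decimal("10000.00")},
        {"account": "revenue", "side": "credit", "amount": Decimal("9999.99")},
    ]},
]
flagged = unbalanced_entries(entries)
```

A check like this does not prove a workflow is correct, but it turns a class of design mistakes into an explicit, reviewable flag instead of a silent posting.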

Complete auditability

Every number in the financial statements can be traced back to:

  • The specific operational transactions that generated it
  • The exact workflow that processed it
  • The code version that was active
  • The timestamp of execution
  • The input state at that moment

This isn't "trust us, the AI did it." This is "here's the complete mathematical proof of how we got this number."

Speed without sacrificing an ounce of control

In the old world, building a complex revenue recognition workflow took weeks of custom development. With Luca, it takes hours – sometimes minutes.

A finance leader can describe their requirements in plain English. Luca designs the automation, tests it in the sandbox, generates documentation, and presents it for review. Once approved, the deterministic engine executes it at scale.

We've seen accounting teams go from month-end closes taking two weeks to closing in under 48 hours. Not because they're working faster, but because the architecture allows for speed without sacrificing correctness.

Continuous compliance

Luca can act as your compliance reviewer. We've run experiments where we ask it to audit our own revenue recognition workflows against ASC 606.

The results are brutally honest. No softened language. No "it depends." Just direct, factual assessment of gaps and structured recommendations for fixing them.

This isn't a replacement for human auditors. It's preparation for them. When the auditor shows up, you've already identified and fixed the issues. You arrive with confidence, not anxiety.

The hard lessons for building AI for finance

Building Luca taught us lessons that every team building AI for high-stakes domains should internalize.

Lesson 1: The foundation always matters more than the model

We spent six months arguing about which LLM to use. GPT-4? Claude? Llama? We ran benchmarks, compared accuracy, and analyzed cost per token.

Then we realized: the model barely mattered. The bottleneck was always the data foundation.

An AI running on messy, inconsistent data will produce garbage no matter how sophisticated it is. An AI running on clean, structured, immutable data can be reliable even with a less capable model.

We stopped obsessing about model selection and started obsessing about data architecture. That's when everything unlocked.

Lesson 2: Constraints are more powerful than capabilities

Our first prototype gave the AI broad capabilities. It could query databases, call APIs, modify configurations, generate reports.

We thought flexibility was strength. It turned out to be chaos.

The breakthrough came when we radically constrained what the AI could do. Custom DSL. Isolated sandbox. Deterministic execution. No external access.

These constraints didn't limit the AI's usefulness. They made it trustworthy. Sometimes the most powerful capability is what you prevent the system from doing.

Lesson 3: Separation of concerns is everything

The Architect-Builder separation isn't just a nice design pattern. It's the core insight that makes AI viable for accounting.

Probabilistic systems are brilliant at understanding intent and designing solutions. Deterministic systems are brilliant at executing with mathematical precision. Trying to make one system do both jobs is where the horror stories come from.

Keep them separate. Let each do what it does best. The result is greater than either could achieve alone.

Lesson 4: Observability is the real moat

In the age of AI, the companies with the best AI won't win. The companies with the best observability will.

When something goes wrong (and it will), you need to know exactly what happened. Not vague logs. Not aggregate metrics. Complete, granular, immutable records of every decision, every state change, every workflow execution.

This is why our audit trail architecture isn't an afterthought. It's a first-class component of the system. We built observability into the foundation rather than bolting it on at the end.

What's next for AI in Leapfin?

Luca is in production today adding real value for customers. But we're just getting started. The roadmap ahead focuses on three areas:

1. Enhanced validation frameworks

We're building sophisticated reconciliation engines that continuously validate AI-generated workflows against expected accounting outcomes. Think of it as an immune system that detects anomalies before they become problems.

2. Multi-agent orchestration

Complex accounting processes often require multiple specialized agents working together. We're exploring architectures where one agent handles contract interpretation, another handles allocation logic, and a coordinator manages the workflow between them.

3. Learned optimizations

Our workflow engine is deterministic, but that doesn't mean it can't get smarter. We're investigating ways to use ML to optimize execution plans, identify redundant calculations, and suggest workflow refactoring, all while preserving deterministic guarantees.

Closing thoughts

Building AI for accounting isn't about finding a smarter model or writing better prompts. It's about building an architecture that forces reliability.

Raw LLMs are probabilistic and brilliant. Accounting is deterministic and unforgiving. The answer isn't to avoid one or the other. It's to design systems that let each do what it does best.

At Leapfin, we believe the future of finance belongs to teams that control their own systems, understand their own logic, and demand transparency from their tools. We're building the platform that makes that future possible.

Luca isn't perfect. But it's auditable, it's reliable, and it's getting better every day. Most importantly, it's proof that AI in finance and accounting doesn't have to be a black box that hallucinates.

It just has to be built right.

RSVP to get an up close look at Leapfin's AI capabilities

Register for our webinar on October 30 and see how Leapfin is making it easier than ever for Finance and Accounting teams to transform their fragmented operational data and accurately account for and report on revenue at scale. 

You’ll learn how to use AI to:

  • Build complete new automation workflows in minutes, even for complex revenue accounting scenarios – simply by asking
  • Adapt instantly to a business change like new pricing without breaking close or compliance
  • Instantly prepare thorough audit documentation and perform ASC 606 compliance reviews

Register here to save your seat and be first to get the recording.

We're hiring exceptional AI engineers who want to work on hard, important problems at the intersection of machine learning and financial systems. If building the trust layer for AI in finance sounds interesting, reach out.

Follow our journey building AI for accounting on LinkedIn. 

 
