Vision · by Nikhil Jathar

Where AI Belongs in Accounting (And Where It Doesn't)

Three places AI earns its keep in accounting and three places it doesn't. A practitioner framework with the decision tree, not the marketing pitch.

Every accounting software vendor in 2026 markets AI features. The honest version is that some of those features earn their place and some are a chat sidebar bolted onto a 1998 product. This post is the framework we use at ERPClaw to decide which workflows AI actually improves and which it makes worse.

The framework has six categories: three where AI earns its keep, three where it does not. The categories are not about AI capability. They are about the structure of the accounting task and the consequences of getting it wrong.

Three places AI earns its keep

1. Transaction categorization at scale

A small business produces hundreds of bank transactions per month. A growing business produces thousands. Each one needs a category: software subscription, vendor payment, payroll, owner draw, refund.

The traditional approach is a rules engine. Match by vendor name, then by amount band, then by memo text. Rules engines plateau because vendor names drift, memo text varies, and new vendors arrive constantly.
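To make the plateau concrete, here is a minimal sketch of such a rules engine. Everything in it is hypothetical and only illustrates the idea: the `Rule` fields, the `categorize` helper, and the sample vendor strings are assumptions, not ERPClaw's implementation.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One hand-written rule (hypothetical shape). A field left as None is not checked."""
    category: str
    vendor_contains: Optional[str] = None   # case-insensitive substring of the vendor name
    min_amount: Optional[Decimal] = None    # inclusive lower edge of the amount band
    max_amount: Optional[Decimal] = None    # inclusive upper edge of the amount band
    memo_contains: Optional[str] = None     # case-insensitive substring of the memo

    def matches(self, vendor: str, amount: Decimal, memo: str) -> bool:
        # Check vendor name first, then amount band, then memo text.
        if self.vendor_contains is not None and self.vendor_contains.lower() not in vendor.lower():
            return False
        if self.min_amount is not None and amount < self.min_amount:
            return False
        if self.max_amount is not None and amount > self.max_amount:
            return False
        if self.memo_contains is not None and self.memo_contains.lower() not in memo.lower():
            return False
        return True


def categorize(rules, vendor: str, amount: Decimal, memo: str) -> Optional[str]:
    """Return the category of the first matching rule, or None if nothing matches."""
    for rule in rules:
        if rule.matches(vendor, amount, memo):
            return rule.category
    return None


# Made-up rules and vendor names, for illustration only.
RULES = [
    Rule("software subscription", vendor_contains="acme cloud", max_amount=Decimal("500")),
    Rule("payroll", memo_contains="payroll"),
]

print(categorize(RULES, "ACME CLOUD INC", Decimal("49.00"), "monthly plan"))
print(categorize(RULES, "ACMECLD*SUBSCR", Decimal("49.00"), "monthly plan"))
```

The second call falls through every rule because the processor shortened the vendor string. Someone has to notice the miss and write another rule, which is the maintenance cost that makes rules engines plateau.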

A model trained on the business’s own categorization history keeps improving as corrections feed back in. The category space is finite and well-defined. The decision is reversible (re-categorize). The cost of an individual wrong categorization is low. The volume is high enough that human review of every transaction is uneconomic.

This is the canonical AI-belongs case. It is also the case most “AI accounting software” actually delivers on, because it is the easiest to evaluate against ground truth.

2. Anomaly detection

A bookkeeper looking at 4,000 transactions per month is not going to notice that one of them is structurally weird. A model can.

Weird looks like: a payment to a vendor name that has never appeared before, in an amount typical of established vendors; a transaction that fits the pattern of an internal control violation (round numbers, even amounts, or an amount just below an approval threshold); a sequence of payments to a new vendor that escalates in amount over weeks.
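The patterns above can be written down as hand-coded checks. This is a sketch for illustration, not a model: a trained detector would learn these signals rather than hard-code them. The function names, the 5% "near threshold" band, and the round-to-100 test are all assumptions.

```python
from decimal import Decimal


def flag_reasons(vendor, amount, known_vendors, approval_threshold,
                 near_band=Decimal("0.05")):
    """Return the reasons a payment deserves a human look (empty list if none).

    near_band is an assumed tolerance: amounts within 5% below the approval
    threshold count as "just below" it.
    """
    reasons = []
    if vendor not in known_vendors:
        reasons.append("new vendor")
    if amount % Decimal("100") == 0:
        reasons.append("round amount")
    lower_edge = approval_threshold * (1 - near_band)
    if lower_edge <= amount < approval_threshold:
        reasons.append("just below approval threshold")
    return reasons


def is_escalating(amounts, min_length=3):
    """True if a vendor's payments, in date order, strictly increase."""
    return len(amounts) >= min_length and all(
        earlier < later for earlier, later in zip(amounts, amounts[1:])
    )


# Made-up data: a first-time vendor paid a round amount just under a 5,000 limit.
reasons = flag_reasons(
    "Acme Consulting", Decimal("4900.00"),
    known_vendors={"Staples", "City Power"},
    approval_threshold=Decimal("5000"),
)
print(reasons)
```

The output is a list of reasons, not a verdict. The sketch only decides what lands in front of a person.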

The model does not decide that the transaction is fraud. It flags it for human attention. This is Filter mode (in the enterprise decisioning sense): the model separates the work that needs human attention from the work that does not. The human still investigates.

AI earns its keep here because the alternative is not zero detection. It is detection at audit time, three months later, when the fraud has already happened.

3. Draft generation for narrative artifacts

Accounting produces narrative artifacts: month-end commentary, variance explanations, executive summaries, audit responses. A controller writes these on top of structured ledger data. The structured data is the input. The narrative is the output.

A model that reads the ledger and drafts the narrative saves an hour of work per artifact. The controller edits the draft, adds context the model cannot see, and signs. The model never signs the artifact and never sends it.

The value here is not the writing. The value is the time the controller saves. The controller still owns the artifact’s correctness. The model just reduces the time from blank page to first draft.

Three places AI doesn’t belong

1. Anything that signs

Tax returns get signed by a preparer. Audit reports get signed by an auditor. Financial statements get signed by a controller or CFO. Sales tax filings get signed by an officer of the company.

The signature is not a UX detail. It is the legal mechanism by which a human person takes responsibility for the document’s accuracy. The IRS does not accept a model’s signature. The state department of revenue does not accept a model’s signature. A bank evaluating a loan application does not accept a model’s signature.

A model can draft any of these. A model can pre-fill any of these. A model cannot sign any of these. Software vendors that imply otherwise are setting their customers up for a problem at filing season.

2. Source-of-truth reconciliation

Reconciliation answers the question: do our books match the bank, the credit card processor, the merchant of record, the prior-period ledger? When the answer is “no,” reconciliation also answers: what is the difference, and what should we do about it?

The “what is the difference” half can be partly automated. A model can list the unmatched items. But the “what should we do about it” half requires judgment about which side has the right number. The bank might be wrong (processing delay, batch error, missing transaction). The books might be wrong (timing, classification, duplicate entry). The merchant of record might have a different fee structure than expected.
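The easy half is mechanical enough to sketch. The example below assumes each entry is a `(date, amount)` pair and matches on exact equality. Real matching usually tolerates date offsets and fees, so treat the `unmatched` helper as a hypothetical illustration, not a complete matcher.

```python
from collections import Counter
from datetime import date
from decimal import Decimal


def unmatched(book_entries, bank_entries):
    """List entries that appear on one side more often than on the other.

    Counter subtraction keeps only positive counts, so a duplicate book
    entry shows up once as unmatched instead of silently pairing off.
    """
    book_counts = Counter(book_entries)
    bank_counts = Counter(bank_entries)
    only_in_books = list((book_counts - bank_counts).elements())
    only_in_bank = list((bank_counts - book_counts).elements())
    return only_in_books, only_in_bank


# Made-up data: one duplicate in the books, one bank fee never booked.
books = [
    (date(2026, 1, 5), Decimal("120.00")),
    (date(2026, 1, 5), Decimal("120.00")),
    (date(2026, 1, 9), Decimal("45.10")),
]
bank = [
    (date(2026, 1, 5), Decimal("120.00")),
    (date(2026, 1, 9), Decimal("45.10")),
    (date(2026, 1, 12), Decimal("3.50")),
]

only_in_books, only_in_bank = unmatched(books, bank)
print(only_in_books)
print(only_in_bank)
```

The function reports that the two sides differ and where. It cannot say whether the extra book entry is a duplicate or a payment the bank has not processed yet. That question is the hard half.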

Resolving this requires reading documents, calling people, asking questions, knowing the business. A model can support the work; it cannot do the work. Vendors that claim “automated reconciliation” are usually doing the easy half (matching) and either skipping or hand-waving the hard half (resolution).

3. Decisions that depend on what the business does next

Revenue recognition depends on what the customer contract says and what the company commits to deliver. Lease accounting depends on the terms of the lease and the company’s intent. The inventory valuation method depends on what the business has committed to in past statements.

These are not data-extraction problems. They are interpretation problems. The model can read the contract; it cannot decide what the business will do under the contract. The model can read the lease; it cannot decide whether the company will exercise the renewal option. The model can compute LIFO or FIFO or weighted-average; it cannot decide which method is appropriate.
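The computation half of the inventory example is easy to show. The sketch below assumes a simple periodic view: all purchase layers are known, then one sale is costed. The layer data and function names are made up for illustration.

```python
from decimal import Decimal


def _consume(layers, units_sold):
    """Cost units_sold by walking the layers in the order given."""
    remaining = units_sold
    cost = Decimal("0")
    for units, unit_cost in layers:
        if remaining == 0:
            break
        take = min(units, remaining)
        cost += take * unit_cost
        remaining -= take
    if remaining > 0:
        raise ValueError("not enough inventory to cover the sale")
    return cost


def cogs_fifo(layers, units_sold):
    # Oldest purchases are consumed first.
    return _consume(layers, units_sold)


def cogs_lifo(layers, units_sold):
    # Newest purchases are consumed first.
    return _consume(list(reversed(layers)), units_sold)


def cogs_weighted_average(layers, units_sold):
    # Every unit carries the average cost of all units purchased.
    total_units = sum(units for units, _ in layers)
    if units_sold > total_units:
        raise ValueError("not enough inventory to cover the sale")
    total_cost = sum((units * unit_cost for units, unit_cost in layers), Decimal("0"))
    return units_sold * total_cost / total_units


# Purchases in date order: 10 units at 10.00, then 10 units at 12.00. Sell 15.
layers = [(10, Decimal("10.00")), (10, Decimal("12.00"))]
for method in (cogs_fifo, cogs_lifo, cogs_weighted_average):
    print(method.__name__, method(layers, 15))
```

Same purchases, same sale, three different costs of goods sold: 160 under FIFO, 170 under LIFO, 165 under weighted average. Nothing in the arithmetic says which of the three the business should report.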

The accountant’s value here is interpretation. The model is, at best, a faster reader.

The decision tree we use

Three questions, in order.

  1. Is the workflow’s output signed by a human in a legal or regulatory sense? If yes, the model can prepare and draft. It cannot decide. Final decisions stay with the human signer.
  2. Does the workflow’s correct answer depend on facts outside the ledger (contracts, communications, intent, future plans)? If yes, the model can extract candidates but cannot resolve them. A human reconciles.
  3. Is the workflow volume high enough that human review of every instance is uneconomic? If yes, AI in Filter or Recommender mode (with a risk-scored review queue) is the right shape. If no, AI is at best a draft tool and may not earn its keep at all.

The interesting cases are the ones where the answer to question 3 is yes and the answer to question 1 or 2 is also yes. High-volume signed work (sales tax filings) belongs at draft-and-sign, never at auto-file. High-volume interpretation-dependent work (revenue recognition at scale) belongs at recommend-and-confirm, never at auto-book.
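The three questions can be read as a short function. This is one reading of the tree, not a specification: the `Mode` names and the `ai_role` function are hypothetical labels for the outcomes described above.

```python
from enum import Enum


class Mode(Enum):
    DRAFT_AND_SIGN = "model prepares and drafts; the human signer decides"
    RECOMMEND_AND_CONFIRM = "model extracts candidates; a human resolves them"
    FILTER_OR_RECOMMENDER = "model filters or recommends into a risk-scored review queue"
    DRAFT_TOOL_AT_BEST = "model is at best a draft tool"


def ai_role(signed: bool, depends_on_outside_facts: bool, high_volume: bool) -> Mode:
    """Apply the three questions in order; the first 'yes' sets the ceiling."""
    if signed:                      # Question 1: legal or regulatory signature
        return Mode.DRAFT_AND_SIGN
    if depends_on_outside_facts:    # Question 2: contracts, intent, future plans
        return Mode.RECOMMEND_AND_CONFIRM
    if high_volume:                 # Question 3: reviewing every instance is uneconomic
        return Mode.FILTER_OR_RECOMMENDER
    return Mode.DRAFT_TOOL_AT_BEST


# The workflows named in this post, with answers as this post describes them.
print(ai_role(signed=True, depends_on_outside_facts=False, high_volume=True))   # sales tax filings
print(ai_role(signed=False, depends_on_outside_facts=True, high_volume=True))   # revenue recognition at scale
print(ai_role(signed=False, depends_on_outside_facts=False, high_volume=True))  # transaction categorization
```

Order matters. A signed workflow stays at draft-and-sign no matter how high the volume gets, which is why question 1 comes first.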

ERPClaw’s stance

ERPClaw is built on this framework. The AI handles categorization, anomaly detection, draft generation, and the support work around reconciliation. The AI does not sign tax returns, does not auto-resolve reconciliation breaks, and does not pick the revenue recognition method. The signature, the resolution, and the method choice live with the accountant who is licensed to make them.

This is what we mean when we say “AI-native that earns the label.” The AI is not a chat sidebar. It is the way the workflow is structured. But it stops at the boundary where a human signature, a judgment call, or an interpretation lives.

The vendors that blur this boundary do their customers a disservice. The vendors that respect it produce software that an accountant can actually rely on.

If you are evaluating AI accounting software, ask the vendor where their AI signs versus drafts. The honest ones have an answer. The marketing-led ones do not.

Tags: ai-native, ai-accounting, framework, vision