What's actually in a bank PDF statement
~8 min read · Useful even if you never touch CentProof
Most people never look closely at the document their bank emails them every month. It shows up, it has the right balance on it, you file it. But if you sit with one for ten minutes, you find that it's structured, machine-readable, and surprisingly consistent across banks. That structure is the entire reason a Mac app like CentProof can take a stack of PDFs and turn them into a reconciled, searchable, locally stored ledger — without ever asking for your bank password.
The four parts that always appear
Pick any monthly statement — Chase, Wells Fargo, Bank of America, Apple Card, Capital One, Citi, Discover, US Bank. You'll find the same four sections, in the same order:
- Header. Bank name and address, your name and address, the account identifier (usually the last four digits), and the statement period (e.g. June 1, 2026 — June 30, 2026).
- Balance summary. For checking and savings: opening balance, total credits, total debits, closing balance. For credit cards: previous balance, payments and credits, purchases and adjustments, fees, interest, new balance.
- Transactions table. The bulk of the document. One row per transaction, with columns that vary slightly between banks but always include at least date, description, and amount.
- Footer. Disclaimers, dispute instructions, contact info, regulatory legalese. Useful for legal context, mostly irrelevant to the data inside.
The reconciliation identity
The balance summary isn't decorative. It's a mathematical identity that the transactions table has to satisfy exactly, to the cent. There are two variants depending on account type.
Checking / savings:
opening_balance + sum(credits) - sum(debits) = closing_balance
Credit card:
previous_balance + purchases + cash_advances + balance_transfers + fees + interest - payments - refunds = new_balance
If your sum of transactions doesn't match the printed closing balance to the cent, either the bank made an error (rare) or you parsed something wrong (common). For a finance tool, that's a free correctness check: refuse to commit any statement to the database unless the math reconciles. CentProof calls this the reconciliation gate, and we treat any penny of difference as a hard failure.
It's also why "to the cent" matters as a product claim — you either match the bank's printed total exactly, or you don't. There's no spectrum.
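Here is a minimal sketch of both identities in Python. It is an illustration, not CentProof's actual code: the function names (money, deposit_gap, card_gap) are made up for this example. Amounts use Decimal rather than float, because binary floats cannot represent most cent values exactly and an exact-match check needs exact arithmetic.

```python
from decimal import Decimal

ZERO = Decimal("0.00")

def money(text: str) -> Decimal:
    """Parse a printed amount such as '1,200.00' or '$89.99' into a Decimal."""
    return Decimal(text.replace(",", "").replace("$", ""))

def deposit_gap(opening, credits, debits, closing) -> Decimal:
    """Checking/savings: opening + sum(credits) - sum(debits) - closing."""
    computed = opening + sum(credits, ZERO) - sum(debits, ZERO)
    return computed - closing

def card_gap(previous, charges, payments_and_refunds, new_balance) -> Decimal:
    """Credit card: charges cover purchases, cash advances, balance
    transfers, fees, and interest; the second list covers payments and refunds."""
    computed = previous + sum(charges, ZERO) - sum(payments_and_refunds, ZERO)
    return computed - new_balance

# Hypothetical statement values, for illustration only.
gap = deposit_gap(
    opening=money("1,200.00"),
    credits=[money("2,500.00"), money("12.34")],
    debits=[money("89.99"), money("1,450.00")],
    closing=money("2,172.35"),
)
passes_gate = (gap == 0)  # any nonzero gap, even one cent, fails
```

Returning the gap rather than a bare true/false is a deliberate choice here: when the check fails, the size of the difference is often the quickest clue to which row was misparsed.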
What's in a single transaction row
Every transaction row carries the same few facts, dressed up slightly differently per bank:
- Transaction date — when the transaction occurred. For credit cards, you'll often also see a posting date, which is when the bank actually applied it. The two can differ by a few days.
- Raw description — the merchant string the bank received from the payment network, often messy. Real examples: AMZN MKTP US*RT5R86Z01 · STARBUCKS #1234 SEATTLE WA · Goodleap 14 Agnt Pymnt xxxxx0352. This is the field every finance app has to clean up before showing it to a human.
- Amount — usually printed as a positive number in the column it belongs in. The sign comes from which column it's in, not from a minus symbol.
- Direction — debit (money out) or credit (money in). Inferred from the column for checking; inferred from the row's position (purchases vs payments) for credit cards.
- Optional, depending on bank: check number (for written checks), merchant city/state, and reference number.
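The fields above map naturally onto a small record type. The sketch below is one possible shape, not CentProof's schema: the Transaction class, its field names, and the from_columns helper are all assumptions for illustration. It also assumes a checking-style layout with separate withdrawal and deposit columns, where exactly one of the two cells is filled on each row.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional

@dataclass(frozen=True)
class Transaction:
    transaction_date: date
    raw_description: str
    amount: Decimal                      # positive, as printed on the statement
    direction: str                       # "debit" (money out) or "credit" (money in)
    posting_date: Optional[date] = None  # often present on credit cards
    check_number: Optional[str] = None
    reference_number: Optional[str] = None

    @property
    def signed_amount(self) -> Decimal:
        """Negative for debits, positive for credits."""
        return self.amount if self.direction == "credit" else -self.amount

def from_columns(transaction_date: date, raw_description: str,
                 withdrawal_cell: str, deposit_cell: str) -> Transaction:
    """Derive the direction from which column holds the amount."""
    withdrawal_cell = withdrawal_cell.strip()
    deposit_cell = deposit_cell.strip()
    if bool(withdrawal_cell) == bool(deposit_cell):
        raise ValueError("expected exactly one of withdrawal/deposit to be filled")
    if withdrawal_cell:
        amount = Decimal(withdrawal_cell.replace(",", ""))
        return Transaction(transaction_date, raw_description, amount, "debit")
    amount = Decimal(deposit_cell.replace(",", ""))
    return Transaction(transaction_date, raw_description, amount, "credit")

row = from_columns(date(2026, 6, 14), "STARBUCKS #1234 SEATTLE WA", "5.75", "")
```

Keeping the printed amount positive and storing the direction separately mirrors the statement itself, and the signed value is derived only when it is needed for arithmetic.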
What varies between banks
Banks all use the same structure, but the surface differs. This is why generic "import any statement" tools usually disappoint:
- Column order — date first, or amount first, or description in the middle. No standard.
- Date format — MM/DD, MM/DD/YY, MM/DD/YYYY, or "June 14". Same data, four conventions.
- Multi-account statements — Bank of America and Wells Fargo combine checking + savings (sometimes credit) in a single statement file. Chase and Capital One usually don't.
- Page boundaries — long statements wrap transactions across pages. Some banks repeat the column headers on every page; others don't.
- Year inference at year-end — a statement covering Dec 28 → Jan 26 has transactions with two different years. The PDF only prints MM/DD on each row; the parser has to decide which year each row belongs to.
That's why a serious finance tool keeps a parser per bank, updates them when banks change format, and uses reconciliation as the correctness check rather than hoping the parse "looks right."
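The year-inference problem is a good example of how small these per-bank decisions are, and how easy they are to get wrong. One simple approach, sketched below, is to try each year the statement period touches and keep the candidate that falls inside the period. The function name resolve_row_date is hypothetical, and the sketch assumes every row date lies within the printed period; real statements sometimes include rows dated a few days before the period starts, so a production parser would likely need a grace window.

```python
from datetime import date

def resolve_row_date(month_day: str, period_start: date, period_end: date) -> date:
    """Turn a printed 'MM/DD' into a full date using the statement period."""
    month, day = (int(part) for part in month_day.split("/"))
    for year in sorted({period_start.year, period_end.year}):
        try:
            candidate = date(year, month, day)
        except ValueError:
            # Not a real date in this year, e.g. 02/29 in a non-leap year.
            continue
        if period_start <= candidate <= period_end:
            return candidate
    raise ValueError(f"{month_day} does not fall inside the statement period")

# A statement covering Dec 28, 2026 through Jan 26, 2027.
start, end = date(2026, 12, 28), date(2027, 1, 26)
december_row = resolve_row_date("12/30", start, end)
january_row = resolve_row_date("01/05", start, end)
```

Raising an error for a date that fits neither year is intentional: a row that cannot be placed in the period is more likely a parsing mistake than a real transaction, and it should surface rather than be silently assigned a year.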
Why this format makes a local-first app possible
Three properties of the PDF statement combine to make a privacy-led design viable:
- It's durable. The bank already gave it to you. There's no API call to expire, no token to refresh, no agreement to renegotiate. You can re-import the same statement on a new machine ten years from now.
- It carries its own provenance. The file itself is the evidence. Every row in a parsed ledger can be linked back to this file, this page, this position on the page — auditable without any external service.
- It doesn't require credentials. You can analyze every dollar of your financial history without sharing a bank password with anyone, including the software you're using.
Bank-sync workflows (the kind that connect to your account through a service like Plaid) are faster and require less user action. But they're cloud-shaped: someone other than you holds read-only credentials, your transactions live on their servers, and you depend on them staying in business and staying competent. PDF imports are slower per month, but they put the file — and therefore the data — in your possession permanently.
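To make the provenance point concrete, here is a minimal sketch of what linking a ledger row back to "this file, this page, this position" could look like. The Provenance record and its field names are assumptions for illustration, not CentProof's storage format. The file is identified by a SHA-256 hash of its bytes, so the link survives renaming or moving the PDF.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Provenance:
    file_sha256: str   # identifies the exact PDF, independent of its filename
    page: int          # 1-based page number within the PDF
    row_on_page: int   # 1-based position of the row on that page

def fingerprint(pdf_bytes: bytes) -> str:
    """Content hash of the statement file."""
    return hashlib.sha256(pdf_bytes).hexdigest()

# Stand-in bytes; in practice this would be the contents of the PDF file.
sample_bytes = b"%PDF-1.7 example statement bytes"
source = Provenance(file_sha256=fingerprint(sample_bytes), page=2, row_on_page=14)
```

Because the hash depends only on the file's contents, re-importing the same statement on another machine years later produces the same identifier, which is what makes the audit trail independent of any external service.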
How CentProof works against this structure
CentProof is just a careful application of the patterns above. When you drag in a statement:
- The app detects the bank by fingerprinting the document.
- The matching parser reads the header, summary, and transactions.
- Reconciliation runs the appropriate identity above. If the totals differ by even one cent, the import is held for review rather than committed silently.
- If reconciliation passes, the transactions go into a local SQLite database. The original PDF is encrypted and stored alongside it.
- From there, search, tagging, recurring detection, and the local-AI "Ask CentProof" feature all run on this local data. Nothing about the statement contents ever leaves your Mac.
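The gate-then-commit part of that flow can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions, not CentProof's implementation: the table layout, the function names, and the choice to store amounts as integer cents are all hypothetical, and bank detection, PDF parsing, and encryption of the original file are left out.

```python
import sqlite3
from decimal import Decimal

def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions ("
        "id INTEGER PRIMARY KEY, description TEXT NOT NULL, "
        "amount_cents INTEGER NOT NULL)"
    )

def to_cents(amount: Decimal) -> int:
    """Store money as integer cents so the database never sees a float."""
    return int((amount * 100).to_integral_value())

def import_statement(conn, opening: Decimal, closing: Decimal, rows) -> bool:
    """rows is a list of (description, signed Decimal amount) pairs.
    Returns True if committed, False if held back by the reconciliation gate."""
    computed = opening + sum((amount for _, amount in rows), Decimal("0"))
    if computed != closing:
        return False  # hold for review; nothing is written
    with conn:  # one transaction: all rows are inserted, or none are
        conn.executemany(
            "INSERT INTO transactions (description, amount_cents) VALUES (?, ?)",
            [(description, to_cents(amount)) for description, amount in rows],
        )
    return True

conn = sqlite3.connect(":memory:")  # a real app would use a file on disk
create_schema(conn)
rows = [("PAYROLL", Decimal("2500.00")), ("STARBUCKS #1234 SEATTLE WA", Decimal("-5.75"))]
committed = import_statement(conn, Decimal("1200.00"), Decimal("3694.25"), rows)
```

Running the check before opening the database transaction keeps the rule simple: a statement that does not reconcile never touches the ledger at all.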
One concrete takeaway
Even if you never use CentProof: when you open a bank PDF next month, look at the balance summary at the top and try to add the transactions yourself. It will work out to the penny. That property is what makes the document a viable data source. It's why "private finance, proved to the cent" isn't a slogan — it's a description of what the document already does.
Want to try this on your own statements?
CentProof runs locally on your Mac. Free Test Mode imports two statements without an account, login, or credit card. The reconciliation gate either passes or it doesn't — you decide what to do with the result.