LLMWhisperer vs. Mistral OCR: The Best Mistral AI OCR Alternative

Introduction

The rapid evolution of large language models (LLMs) has sparked a wave of innovation in how we process and understand documents. Traditional OCR tools, once limited to plain text extraction, are now being augmented, or replaced entirely, by LLM-powered systems that can interpret complex layouts, extract structured data, and even understand the semantic context of scanned pages.

One of the recent players in this space is Mistral OCR, which has gained attention for its fast, markdown-centric extraction capabilities and its ability to turn PDFs into clean, readable text.

On the other side, we have LLMWhisperer, a component of the Unstract platform, purpose-built to go beyond basic text capture, preserving layout, identifying tables, detecting form fields like checkboxes and radio buttons, and handling challenging elements like handwriting, all while minimizing hallucinations.

Why does this comparison matter now?

As businesses increasingly rely on automation to process invoices, forms, and other document-heavy workflows, the tools they choose must deliver reliable, structured, and trustworthy outputs.

And while Mistral OCR is a good general-purpose solution, we wanted to see how it performs against LLMWhisperer in real-world scenarios where accuracy and structure are critical.

In this article, we’ll put both tools to the test using a series of challenging sample documents. We’ll explore their outputs and wrap up with practical recommendations for which tool to use and when.

Overview of Mistral OCR

Mistral OCR is an Optical Character Recognition (OCR) API developed by Mistral AI, designed to deliver high-fidelity understanding of complex documents. Unlike traditional OCR systems that focus solely on extracting text, Mistral OCR treats documents as multimodal entities: it can parse images, tables, mathematical expressions, and structured layouts.

Here is the GitHub repository where you will find all the code written for this article.

Capabilities and Common Use Cases

Mistral OCR is suited for large-scale document processing where speed and multilingual capabilities are critical.

It supports a wide array of use cases including:

  • Scientific literature parsing – turning PDFs with equations, figures, and charts into structured outputs.
  • Customer support knowledge bases – transforming manuals and technical documentation into indexed, answer-ready content.
  • Cultural heritage preservation – digitizing historical documents and making them AI-readable.
  • Legal and regulatory compliance – extracting structured data from filings, contracts, and scanned forms.

Output Format: Markdown-Centric

One of Mistral OCR’s key design philosophies is delivering output in markdown format, a readable, structured representation of the original document that interleaves text and imagery.

This makes the outputs usable for downstream processing or human-in-the-loop review.

The structured markdown format includes:

  • Headings and subheadings
  • Table representations
  • Embedded figures/images
  • Lists and annotations
  • Equations as LaTeX blocks or images

Limitations

Despite its speed and general versatility, Mistral OCR has some limitations, particularly in enterprise or document automation contexts where fidelity to original formatting and structured data extraction is crucial:

  • Layout fidelity: Markdown output often loses precise spatial alignment of elements, making it unsuitable for pixel-accurate extraction (e.g., scanned forms).
  • No form intelligence: Lacks built-in understanding of checkboxes, radio buttons, and input fields commonly found in forms.
  • Handwriting recognition is limited: The model performs best with printed text; cursive or messy handwriting often fails to produce meaningful output.
  • Potential hallucinations: Like many LLM-based tools, Mistral OCR can sometimes invent structure or mislabel document elements when confident data isn’t present.

Overview of LLMWhisperer

LLMWhisperer is Unstract’s flagship document understanding engine, an advanced, production-grade pipeline purpose-built for structured data extraction from unstructured or messy document inputs.

Positioned at the heart of the Unstract ecosystem, LLMWhisperer powers the most demanding use cases in document automation, compliance, and AI-native data ingestion.

Where traditional OCR tools aim to transcribe what they see, LLMWhisperer goes further: it interprets, structures, and contextualizes information, enabling downstream systems to take meaningful action with minimal human clean-up.

Seamless Integration with Unstract

LLMWhisperer is tightly integrated into the broader Unstract platform, benefiting from upstream capabilities like intelligent document routing, page classification, and post-processing agents.

Get started with Unstract

Unstract is an open-source, no-code platform for automating document-heavy workflows. It converts unstructured data—like invoices, ID cards, or contracts—into clean JSON for easy integration and automation.

This allows organizations to process large volumes of heterogeneous documents, such as scanned PDFs, printed forms, handwritten notes, and mixed-layout contracts, while receiving clean, structured outputs.

This integration expands LLMWhisperer's capabilities to offer multiple structured output formats, depending on the downstream need:

  • JSON: For programmatic consumption in automation or AI pipelines.
  • HTML: For human review with high-fidelity visual formatting.
  • Markdown: Lightweight and developer-friendly text representation.
  • CSV: Ideal for tabular data export and integration with spreadsheets or databases.

Designed for Structured Extraction from Messy Inputs

LLMWhisperer excels in scenarios where traditional OCR models degrade: low-quality scans, inconsistent layouts, partially handwritten content, or form elements with no textual labels.

It uses layout-aware parsing, spatial grouping, and LLM-driven heuristics to not just “read” a document, but to understand its structure and intent.

Key Strengths

  • Layout Preservation: Outputs retain positional relationships between elements (columns, headers, nested blocks), enabling pixel-accurate extraction.
  • Form Field Extraction: Checkboxes, radio buttons, input fields, and handwritten annotations are detected, labeled, and converted into structured fields.
  • Table Fidelity: Complex tables, including nested rows, spanning cells, and irregular shapes, are reconstructed as accurate outputs.
  • Handwriting Recognition: Incorporates specialized handwriting detection and transcription modules, making it reliable for forms filled out by hand.
  • Hallucination Resistance: Built-in guardrails prevent the model from inventing content not present in the original document, a common pitfall in LLM-powered tools.

In essence, LLMWhisperer brings precision, structure, and reliability to the document understanding space, especially when dealing with real-world documents that are messy, inconsistent, or rich in layout and form semantics.

Test Methodology

To compare Mistral OCR and LLMWhisperer, we conducted hands-on evaluations using real-world, challenging documents designed to stress the capabilities of each system.

Our goal was not only to assess raw text extraction but also to evaluate how well each tool preserves document structure, handles non-textual elements, and avoids common LLM pitfalls such as hallucination.

Document Types Used

We selected a diverse set of input documents that reflect common enterprise use cases, including:

  • Scanned Forms: Multi-page PDF forms with mixed content, printed labels, filled-in fields, checkboxes, and signatures.
  • Handwritten Forms: Low-resolution scans of forms with handwritten entries and fill-in-the-blank fields.
  • Complex Tables: Tabular documents with merged headers, irregular row spans, and embedded text blocks (e.g. financial statements).
  • Layout-rich Documents: Structured PDFs with columns, figures, and mixed visual hierarchies.
  • Excel Documents: Excel is still used in many organizations as a source of data.

Evaluation Criteria

We used five key evaluation metrics to score and compare each tool’s performance:

  • Layout Fidelity: Is the spatial arrangement (e.g., columns, alignment, section groupings) accurately preserved? Are elements like headers, footnotes, and sidebars handled intelligently?
  • Table Structure: Are complex tables reconstructed with correct row/column relationships? How are merged cells, nested tables, or split content handled?
  • Radio/Checkbox Detection: Are form controls (checkboxes, radio buttons, toggles) detected and their states captured? Does the output clearly indicate whether a checkbox was selected?
  • Handwriting Accuracy: Is handwritten content transcribed correctly? How well does the tool handle messy or stylized handwriting?
  • Hallucination-Free Output: Does the model introduce any content not present in the original? Are factual additions or assumptions (hallucinations) avoided?
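The hallucination criterion can be partly automated for number-heavy documents. The sketch below is only an illustration and not necessarily the procedure used for this article: it assumes you have a trusted reference transcription of the page, and the helper names `numeric_tokens` and `unsupported_numbers` are hypothetical. It flags numeric tokens that appear in a tool's output but not in the reference.

```python
import re

# Matches integers, thousands-separated numbers, and decimals (e.g. 2022, 70,958, 1.29)
NUMBER_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def numeric_tokens(text):
    """Return the set of numeric tokens found in the text."""
    return set(NUMBER_RE.findall(text))


def unsupported_numbers(reference_text, extracted_text):
    """Return numbers present in the extraction but absent from the reference."""
    return sorted(numeric_tokens(extracted_text) - numeric_tokens(reference_text))


if __name__ == "__main__":
    reference = "Products 70,958 65,083"
    extracted = "Products 70,958 65,038"  # one transposed digit pair
    print(unsupported_numbers(reference, extracted))
```

A check like this only catches numeric drift; invented labels or restructured tables still need a manual review.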

Mistral OCR: Installation and Code Setup

Mistral OCR offers a fast and developer-friendly setup via API access through La Plateforme, Mistral’s unified AI interface.

Installation

To get started with Mistral OCR:

  • Sign up at La Plateforme.
  • Generate an API key from the ‘API Keys’ section.

Code Setup

First, install the necessary library:

pip install mistralai

You should also define an .env file for the API key:

MISTRAL_API_KEY=<YOUR_MISTRAL_API_KEY>

Below is an example of a Python script to send a PDF document to Mistral OCR and receive the structured Markdown output:

import base64
import os
import sys
from mistralai import Mistral
from dotenv import load_dotenv

# Load the environment variables
load_dotenv(".env")

# Function to encode the PDF to base64
def encode_pdf(pdf_path):
    """Encode the pdf to base64."""
    try:
        with open(pdf_path, "rb") as pdf_file:
            return base64.b64encode(pdf_file.read()).decode('utf-8')
    except FileNotFoundError:
        print(f"Error: The file {pdf_path} was not found.")
        return None
    except Exception as e:  # Catch any other error while reading the file
        print(f"Error: {e}")
        return None


# Function to OCR the PDF
def ocr_pdf(base64_pdf):
    """OCR the pdf."""
    api_key = os.getenv("MISTRAL_API_KEY")
    client = Mistral(api_key=api_key)

    ocr_response = client.ocr.process(
        model="mistral-ocr-latest",
        document={
            "type": "document_url",
            "document_url": f"data:application/pdf;base64,{base64_pdf}" 
        },
        include_image_base64=True
    )

    # Return the OCR response
    return ocr_response


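# Main function
if __name__ == "__main__":
    # Get the path to the PDF file from the command line
    if len(sys.argv) < 2:
        sys.exit("Usage: python mistral_ocr_example.py <path-to-pdf>")
    pdf_path = sys.argv[1]

    # Getting the base64 string
    base64_pdf = encode_pdf(pdf_path)
    if base64_pdf is None:
        # encode_pdf has already printed the reason
        sys.exit(1)

    # OCR the PDF
    ocr_response = ocr_pdf(base64_pdf)

    # Print the OCR response
    print(ocr_response)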

The script begins by loading the API key securely from a .env file using the dotenv package.

It then reads the PDF file provided via the command line and encodes it into a base64 string for API submission. This allows sending the file directly without uploading it to external storage.

The ocr_pdf function initializes a Mistral client with the API key and sends the encoded PDF to the mistral-ocr-latest model. It specifies the document type as document_url and includes a flag to return base64-encoded images for layout context.

Finally, the script prints the full OCR response to the terminal. This response includes the extracted text, structured layout data, and optional base64 images.

Mistral OCR returns the document in a Markdown-centric format, preserving elements like tables, headers, and embedded images inline with the text.
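Printing the whole response object is handy for inspection, but most pipelines only need the Markdown. The sketch below assumes the response exposes a `pages` list whose items carry a `markdown` attribute; check this against the current Mistral API reference before relying on it. The `response_to_markdown` helper and the stand-in `fake_response` object are hypothetical and exist only so the example runs without an API call.

```python
from types import SimpleNamespace


def response_to_markdown(ocr_response, separator="\n\n"):
    """Join the per-page Markdown of an OCR response into one string."""
    return separator.join(page.markdown for page in ocr_response.pages)


# Stand-in for a real response; in the script above this would be the
# object returned by client.ocr.process(...).
fake_response = SimpleNamespace(pages=[
    SimpleNamespace(index=0, markdown="# Page one"),
    SimpleNamespace(index=1, markdown="| a | b |"),
])

print(response_to_markdown(fake_response))
```

In the script above, the same helper could be called on `ocr_response` and the result written to a `.md` file for review.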

LLMWhisperer: Installation and Code Setup

Installation

To get started with LLMWhisperer:

  • Sign up for a free LLMWhisperer account.
  • Get an API key from the ‘API Keys’ section.

Code Setup

First, install the necessary package:

pip install llmwhisperer-client

You should also define an .env file for the API key:

LLMWHISPERER_API_KEY=<YOUR_LLMWHISPERER_API_KEY>

Create the script to process a file with LLMWhisperer, in this case using the recent V2 of the API:

import os
from dotenv import load_dotenv
from unstract.llmwhisperer import LLMWhispererClientV2  
from unstract.llmwhisperer.client_v2 import LLMWhispererClientException  
import sys  
  
# Load the API key from the .env file
load_dotenv()

# Function to process a document  
def process_document(file_path):  
    # Initialize the client with your API key  
    client = LLMWhispererClientV2(base_url="https://llmwhisperer-api.us-central.unstract.com/api/v2",  
                                  api_key=os.getenv("LLMWHISPERER_API_KEY"))  
    # Call the sync method with the file path  
    try:  
        result = client.whisper(  
            file_path=file_path,  
            wait_for_completion=True,  
            wait_timeout=200,  
        )  
        return result['extraction']['result_text']  
    except LLMWhispererClientException as e:  
        print(e)  
        return None

# Main function
if __name__ == "__main__":
    # Get the path to the document from the command line
    if len(sys.argv) < 2:
        sys.exit("Usage: python llmwhisperer_example.py <path-to-file>")
    pdf_path = sys.argv[1]

    # Call the function to process the document
    result = process_document(pdf_path)

    # Print the result
    print(result)

The script begins by loading environment variables using python-dotenv, specifically fetching the API key (LLMWHISPERER_API_KEY) required to authenticate with the LLMWhisperer service.

The process_document function initializes an LLMWhispererClientV2 object, configured to point to the official Unstract LLMWhisperer endpoint. It then submits a document (PDF or other type) using the whisper method.

This method runs synchronously, waiting for the model to finish processing the file before returning. If successful, it extracts and returns the structured result_text from the document.

In the script’s main block, the path to a document is passed as a command-line argument. The script then calls process_document() with this path and prints the extracted result.
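For multi-page documents it is often useful to split the returned text into pages. The sample outputs later in this article end each page with a `<<<` line, so the sketch below splits on that marker. Treat the marker as an assumption inferred from those samples and confirm it in the LLMWhisperer documentation; the `split_pages` helper is hypothetical.

```python
def split_pages(result_text, separator="<<<"):
    """Split extracted text into pages on lines that contain only the separator."""
    pages, current = [], []
    for line in result_text.splitlines():
        if line.strip() == separator:
            # A separator line closes the current page
            pages.append("\n".join(current).strip("\n"))
            current = []
        else:
            current.append(line)
    # Keep any trailing text that was not followed by a separator
    if any(line.strip() for line in current):
        pages.append("\n".join(current).strip("\n"))
    return pages


if __name__ == "__main__":
    sample = "Page 1 text\n<<<\nPage 2 text\n<<<\n"
    for number, page in enumerate(split_pages(sample), start=1):
        print(number, repr(page))
```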

Comparative Findings

Let’s now run both scripts against a set of six documents (five PDFs and one Excel file) to test how well LLMWhisperer and Mistral OCR handle the various formats.

Document 1 – Financial Document

For this document type, we will use this PDF: 

First, we process it with Mistral OCR:

python mistral_ocr_example.py files\apple-finance.pdf

Which gives the following output:

|   |  |  |  |  | Apple Inc.  |\n| --- | --- | --- | --- | --- | --- |\n|  CONDENSED CONSOLIDATED STATEMENTS OF OPERATIONS (Unaudited) (In millions, except number of shares which are reflected inthousands and per share amounts) | |   |   |   |   |   |\n|   | Three Months Ended |   | Twelve Months Ended  |   |   |\n|   | September 24, 2022 | September 25, 2021 | September 24, 2022 | September 25, 2021 |   |\n|  Net sales: |  |  |  |  |   |\n|  Products | $70,958 | $65,083 | $316,199 | $297,392 |   |\n|  Services | 19,188 | 18,277 | 78,129 | 68,425 |   |\n|  Total net sales ^{ (1) } | 90,146 | 83,360 | 394,328 | 365,817 |   |\n|  Cost of sales: |  |  |  |  |   |\n|  Products | 46,387 | 42,790 | 201,471 | 192,266 |   |\n|  Services | 5,664 | 5,396 | 22,075 | 20,715 |   |\n|  Total cost of sales | 52,051 | 48,186 | 223,546 | 212,981 |   |\n|  Gross margin | 38,095 | 35,174 | 170,782 | 152,836 |   |\n|  Operating expenses: |  |  |  |  |   |\n|  Research and development | 6,761 | 5,772 | 26,251 | 21,914 |   |\n|  Selling, general and administrative | 6,440 | 5,616 | 25,094 | 21,973 |   |\n|  Total operating expenses | 13,201 | 11,388 | 51,345 | 43,887 |   |\n|  Operating income | 24,894 | 23,786 | 119,437 | 108,949 |   |\n|  Other income/(expense), net | (237) | (538) | (334) | 258 |   |\n|  Income before provision for income taxes | 24,657 | 23,248 | 119,103 | 109,207 |   |\n|  Provision for income taxes | 3,936 | 2,697 | 19,300 | 14,527 |   |\n|  Net income | $20,721 | $20,551 | $99,803 | $94,680 |   |\n|  Earnings per share: |  |  |  |  |   |\n|  Basic | $1.29 | $1.25 | $6.15 | $5.67 |   |\n|  Diluted | $1.29 | $1.24 | $6.11 | $5.61 |   |\n|  Shares used in computing earnings per share: |  |  |  |  |   |\n|  Basic | 16,030,382 | 16,487,121 | 16,215,963 | 16,701,272 |   |\n|  Diluted | 16,118,465 | 16,635,097 | 16,325,819 | 16,864,919 |   |\n|  ^{ (1) } Net sales by reportable segment: |  |  |  |  |   |\n|  Americas | $39,808 | $36,820 | $169,658 | $153,306 |   |\n|  
Europe | 22,795 | 20,794 | 95,118 | 89,307 |   |\n|  Greater China | 15,470 | 14,563 | 74,200 | 68,366 |   |\n|  Japan | 5,700 | 5,991 | 25,977 | 28,482 |   |\n|  Rest of Asia Pacific | 6,373 | 5,192 | 29,375 | 26,356 |   |\n|  Total net sales | $90,146 | $83,360 | $394,328 | $365,817 |   |\n|  ^{ (1) } Net sales by category: |  |  |  |  |   |\n|  iPhone | $42,626 | $38,868 | $205,489 | $191,973 |   |\n|  Mac | 11,508 | 9,178 | 40,177 | 35,190 |   |\n|  iPad | 7,174 | 8,252 | 29,292 | 31,862 |   |\n|  Wearables, Home and Accessories | 9,650 | 8,785 | 41,241 | 38,367 |   |\n|  Services | 19,188 | 18,277 | 78,129 | 68,425 |   |\n|  Total net sales | $90,146 | $83,360 | $394,328 | $365,817 |   |

Then we process it with LLMWhisperer:

python llmwhisperer_example.py files\apple-finance.pdf

And we get the following output:

Apple Inc.

                   CONDENSED CONSOLIDATED STATEMENTS OF OPERATIONS (Unaudited)
              (In millions, except number of shares which are reflected in thousands and per share amounts)

                                                        Three Months Ended        Twelve Months Ended
                                                    September 24, September 25, September 24, September 25,
                                                       2022          2021         2022          2021
Net sales:
 Products                                           $    70,958 $      65,083 $    316,199 $     297,392
 Services                                                19,188        18,277       78,129        68,425
    Total net sales (1)                                  90,146        83,360      394,328       365,817
Cost of sales:
 Products                                                46,387        42,790      201,471       192,266
 Services                                                 5,664         5,396       22,075        20,715
    Total cost of sales                                  52,051        48,186      223,546       212,981
       Gross margin                                      38,095        35,174      170,782       152,836

Operating expenses:
    Research and development                              6,761         5,772       26,251        21,914
    Selling, general and administrative                   6,440         5,616       25,094        21,973
       Total operating expenses                           13,201       11,388       51,345        43,887

Operating income                                         24,894        23,786      119,437       108,949
Other income/(expense), net                                (237)         (538)        (334)          258
Income before provision for income taxes                 24,657        23,248      119,103       109,207
Provision for income taxes                                3,936         2,697       19,300        14,527
Net income                                          $     20,721 $     20,551 $     99,803 $      94,680

Earnings per share:
    Basic                                           $       1.29 $       1.25 $        6.15 $       5.67
    Diluted                                         $       1.29 $       1.24 $       6.11 $        5.61
Shares used in computing earnings per share:
    Basic                                             16,030,382   16,487,121    16,215,963   16,701,272
    Diluted                                           16,118,465   16,635,097    16,325,819   16,864,919

(1) Net sales by reportable segment:
    Americas                                        $    39,808 $      36,820 $    169,658 $     153,306
    Europe                                               22,795        20,794       95,118        89,307
    Greater China                                         15,470       14,563       74,200        68,366
    Japan                                                  5,700        5,991       25,977        28,482
    Rest of Asia Pacific                                   6,373        5,192       29,375        26,356
       Total net sales                              $    90,146 $      83,360 $    394,328 $     365,817

(1) Net sales by category:
    iPhone                                          $     42,626 $     38,868 $    205,489 $     191,973
    Mac                                                   11,508        9,178        40,177       35,190
    iPad                                                   7,174        8,252       29,292        31,862
    Wearables, Home and Accessories                        9,650        8,785        41,241       38,367
    Services                                              19,188       18,277       78,129        68,425
       Total net sales                              $     90,146 $     83,360 $    394,328 $     365,817
<<<

Comparative Analysis

Here is a short comparative analysis of the two outputs:

  • The LLMWhisperer output is cleaner and more readable, preserving the original document’s layout, hierarchy, and logical groupings (e.g. sales, expenses, income).
  • The Mistral OCR output is a single Markdown table with empty extra columns and formatting noise, making it harder to read or use without clean-up.
  • LLMWhisperer’s output suits human review and downstream AI tasks, while Mistral OCR’s output may be better suited to spreadsheet use, and only after significant formatting adjustments.
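To give a sense of the clean-up the Markdown table needs before spreadsheet use, here is a minimal sketch, not a full Markdown parser. It assumes the table arrives as printed above, with literal `\n` sequences between rows, and the `parse_markdown_table` helper is hypothetical. It drops the separator row and the trailing empty cells.

```python
def parse_markdown_table(raw):
    """Turn a pipe-delimited Markdown table into a list of rows (lists of cells)."""
    # The printed response contains literal backslash-n sequences; make them real newlines
    text = raw.replace("\\n", "\n")
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip separator rows such as | --- | --- |
        if cells and all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        # Drop the empty padding cells at the end of the row
        while cells and cells[-1] == "":
            cells.pop()
        if cells:
            rows.append(cells)
    return rows


if __name__ == "__main__":
    sample = r"|  Products | $70,958 | $65,083 |   |\n| --- | --- | --- | --- |\n|  Services | 19,188 | 18,277 |   |"
    for row in parse_markdown_table(sample):
        print(row)
```

The resulting rows can be handed to the standard `csv` module, though merged headers such as "Three Months Ended" still need manual alignment.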

Document 2 – Contract Application

For this document type, we will use this PDF: 

First, we process it with Mistral OCR:

python mistral_ocr_example.py files\contract-application.pdf

Which gives the following output:

# CREDIT APPLICATION AND MASTER SALES AGREEMENT\n\nPlease send your payments to: 19800 MacArthur Blvd., Suite 510 | Irvine | CA | 92612-2480 (949) 999-9337 | AR@pcpipe.com\n\n|  Full Business Legal Name: | Roger Deakins  |\n| --- | --- |\n|  Physical Address: | Roger Deakins  |\n|  Billing Address: |   |\n|  Phone 090-9093-8930 | Buyer\'s Email  |\n|  How long at current address? | Type of Business Entity:  |\n|   | ☑ Sole Proprietorship ☐ Partnership ☐ LLC ☐ Corporation  |\n|  How long under current management? | ☐ Association (nonprofit) ☐ Government or Public Agency ☐ Other  |\n|  Principal Business Activity: | Roger Deakins  |\n|  Owner/Officer | Title  |\n|  Officer | Manager  |\n|  DUNS No.: | Contractor License No. & State:  |\n|  9040332 | 098424  |\n|  If there is a parent company, please provide full legal name: | Vigilant Systems  |\n|  Purchase Order Required? | May we email your invoices?  |\n|  ☑ Yes ☐ No | ☐ Yes ☑ No  |\n|  Accounts Payable (A/P) Contact Name | A/P Contact Phone Number  |\n|  Please feel free to attach Trade References on a separate sheet. You must provide a valid Resale or Tax Exemption Certificate to exclude sales tax from your invoices. Accounts with no activity for three or more years will require a new signed Credit Application. |   |\n\nThe information provided by the applicant (hereinafter, "Buyer") in this Credit Application and Master Sales Agreement (hereinafter, "Agreement") is for the purpose of establishing a commercial credit account with Pacific Corrugated Pipe Company, LLC and its affiliates, subsidiaries, successors, and assignees (hereinafter collectively, "Seller"). Buyer desires to purchase goods and/or services from Seller, and Buyer agrees, in consideration thereof, to be bound by Seller\'s terms and conditions of sale set forth in this Agreement. 
The undersigned warrants and represents that he/she is authorized to enter into this Agreement on behalf of the Buyer, and that all representations above are accurate, complete and truthful. By signing below, Buyer acknowledges that it has read, understands, and agrees to Seller\'s terms and conditions of sale in this Agreement.\n\n|  Signature | Date  |\n| --- | --- |\n|  Print name | Title  |\n\nPACIFIC CORRUGATED PIPE COMPANY CREDIT APPLICATION AND MASTER SALES AGREEMENT Page 1 of 4

Then we process it with LLMWhisperer:

python llmwhisperer_example.py files\contract-application.pdf

And we get the following output:

PACIFIC CORRUGATED
                                                   PIPE COMPANY, LLC.
                                                 A Subsidiary of Lane Enterprises Holdings, LLC

                   CREDIT APPLICATION AND MASTER SALES AGREEMENT

                                             Please send your payments to:
                             19800 MacArthur Blvd., Suite 510 | Irvine | CA | 92612-2480
                                           (949) 999-9337 | AR@pcpipe.com

   Full Business Legal Name:
                                  Roger Deakins
   Physical Address:
                         Roger Deakins
   Billing Address:

   Phone
             090-9093-8930                                   Buyer's Email     rdeakins@ville12.com
   How long at current address?     Type of Business Entity:
                                      [X] Sole Proprietorship [ ] Partnership [ ] LLC [ ] Corporation
   How long under current             [ ] Association (nonprofit) [ ] Government or Public Agency [ ] Other
   management?

   Principal Business Activity:
                               Roger Deakins

        Owner/Officer              Title                           Address                              Phone
           Officer                                                                                345-4443-4332
                                 Manager          89,   farmville road, FL 694903
   DUNS No .:                  Contractor License No. &       Is this a certified Disadvantaged Business Entity (DBE)?
                               State:
          9040332                 098424                                          [X] Yes [ ] No

   If there is a parent company, please provide full legal name:
                               Vigilant Systems
   Purchase Order Required?    May we email your invoices?    If yes, please provide email address:
         [X] Yes [ ] No                 [ ] Yes [X] No

   Accounts Payable (A/P) Contact Name    A/P Contact Phone Number           A/P Contact Email Address

                              Please feel free to attach Trade References on a separate sheet.
            You must provide a valid Resale or Tax Exemption Certificate to exclude sales tax from your invoices.
                Accounts with no activity for three or more years will require a new signed Credit Application.
   The information provided by the applicant (hereinafter, "Buyer") in this Credit Application and Master Sales Agreement
   (hereinafter, "Agreement") is for the purpose of establishing a commercial credit account with Pacific Corrugated Pipe
   Company, LLC and its affiliates, subsidiaries, successors, and assignees (hereinafter collectively, "Seller"). Buyer desires
   to purchase goods and/or services from Seller, and Buyer agrees, in consideration thereof, to be bound by Seller's terms
   and conditions of sale set forth in this Agreement. The undersigned warrants and represents that he/she is authorized to
   enter into this Agreement on behalf of the Buyer, and that all representations above are accurate, complete and truthful.
   By signing below, Buyer acknowledges that it has read, understands, and agrees to Seller's terms and conditions of sale
   in this Agreement.

   Signature                                                               Date

   Print name                                                              Title

PACIFIC CORRUGATED PIPE COMPANY         CREDIT APPLICATION AND MASTER SALES AGREEMENT                          Page 1 of 4
<<<

Comparative Analysis

Here is a short comparison of the two outputs:

  • The LLMWhisperer output closely mirrors the original PDF layout, preserving formatting, structure, and form fields, making it well suited to document automation, review, or AI processing.
  • The Mistral OCR output is dense and Markdown-formatted, with inconsistent table formatting and missing field associations: the buyer’s email, the officer’s address and phone number, and the DBE question do not appear in it. This reduces readability and increases clean-up effort.
  • LLMWhisperer captures visual cues (checkboxes, labels, section breaks) more reliably, giving a more accurate representation of form intent and user input context.
  • Mistral OCR may still be usable for quick raw text extraction, but its output needs significant post-processing before it is structured enough for downstream tasks.
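Both outputs encode checkbox state inline, as `[X]` / `[ ]` in the LLMWhisperer text and as `☑` / `☐` in the Mistral OCR Markdown. The sketch below shows one way to turn such a line into (label, checked) pairs. The regular expression and the `parse_checkboxes` helper are hypothetical and are tuned only to the samples shown above.

```python
import re

# A checkbox marker, followed by its label, up to the next marker or the end of the line
_MARK = r"\[[xX ]\]|[☑☐]"
CHECKBOX_RE = re.compile(rf"({_MARK})\s*(.+?)(?=\s*(?:{_MARK})|$)")


def parse_checkboxes(line):
    """Return (label, checked) pairs for every checkbox found on the line."""
    results = []
    for mark, label in CHECKBOX_RE.findall(line):
        checked = mark in ("[X]", "[x]", "☑")
        results.append((label.strip(), checked))
    return results


if __name__ == "__main__":
    print(parse_checkboxes("[X] Sole Proprietorship [ ] Partnership [ ] LLC [ ] Corporation"))
    print(parse_checkboxes("☑ Yes ☐ No"))
```

A parser like this recovers only the state of each option. Linking an option group back to its question (for example "Purchase Order Required?") still depends on the layout being preserved in the extracted text.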

Document 3 – Handwritten Tax Form

For this document type, we will use this PDF: 

First, we process it with Mistral OCR:

python mistral_ocr_example.py files\handwritten-filled-tax-form-photograph.pdf

Which gives the following output:

Form Number: CA530082\n\nForm 5500-EZ\nAnnual Return of A One-Participant (Owners/Partners and\nTheir Spouses) Retirement Plan or A Foreign Plan\nThis form is required to be filed under section 6058(a) of the Internal Revenue Code.\nCertain foreign retirement plans are also required to file this form (see instructions)\nComplete all entries in accordance with the instructions to the Form 5500-EZ.\nGo to www.ire.gov/Form5500EZ for instructions and the latest information.\nOMB No. 1545-1610\n2023\nThis Form is Open\nto Public Inspection.\nForm 5500-EZ\nPrinted Revenue Service\nFor the calendar plan year 2023 or fiscal plan year beginning (MM/DD/YYYY) 01/02/2022and ending 01/02/2023\nA This return is: (1) the first return filed for the plan (3) the final return filed for the plan\n(2) an amended return (4) a short plan year return (less than 12 months)\nB Check box if filing under [ ] Form 5558 [ ] automatic extension\n[ ] special extension (enter description)\nC If this return is for a foreign plan, check this box (see instructions)\nD If this return is for the IRS Late Filer Penalty Relief Program, check this box\n(Must be filed on a paper Form with the IRS. See instructions).\nE If this is a retroactively adopted plan permitted by SECURE Act section 201, check here\nPart II Basic Plan Information - enter all requested information.\n1a Name of plan\nAnnual Return plan\n1b Three-digit\nplan number (PN) 586\n1c Date plan first became effective\n(MM/DD/YYYY)\n02/05/2022\n2a Employer\'s name Acme Corp Software\nTrade name of business (if different from name of employer)\n2b Employer Identification Number (EIN)\n(Do not enter your Social Security Number)\n735268329\nIn care of name\nMailing address (room, apt., suite no. and street, or P.O. 
box)\n235, Park Street Avenue, FL\nCity or town, state or province, country, and ZIP or foreign postal code (if foreign, see instructions)\nFL 63052\n3a Plan administrator\'s name (if same as employer, enter "Same")\nIn care of name\nMailing address (room, apt., suite no. and street, or P.O. box)\nCity or town, state or province, country, and ZIP or foreign postal code (if foreign, see instructions)\n4 If the employer\'s name, the employer\'s EIN, and/or the plan name has changed since the\nlast return filed for this plan, enter the employer\'s name and EIN, the plan name, and the\nplan number for the last return in the appropriate space provided\na Employer\'s name\n4b EIN 5732900\n4c Plan name\n4d PN\n5a(1) Total number of participants at the beginning of the plan year\n5a(1) 10\na(2) Total number of active participants at the beginning of the plan year\n5a(2) 9\nb(1) Total number of participants at the end of the plan year\n5b(1) 5\nb(2) Total number of active participants at the end of the plan year\n5b(2)\nc Number of participants who terminated employment during the plan year with accrued accrued\nbenefits that were less than 100% vested\n5c 2\nPart III Financial Information\n(1) Beginning of year\n(2) End of year\n6a Total plan assets\n6a $50000 $60000\nb Total plan liabilities\n6b $4000 $5000\nc Net plan assets (subtract line 6b from 6a)\n6c\nFor Privacy Act and Paperwork Reduction Act Notice, see the Instructions for Form 5500-EZ.\nCatalog Number 63263R Form 5500-EZ (2023)

Then we process it with LLMWhisperer:

python llmwhisperer_example.py files\handwritten-filled-tax-form-photograph.pdf

And we get the following output:

Form         Number:              CA530082

   Form 5500-EZ              Annual Return of A One-Participant (Owners/Partners and                       OMB No. 1545-1610
                                  Their Spouses) Retirement Plan or A Foreign Plan
                                This form is required to be filed under section 6058(a) of the Internal Revenue Code. 2023
                                 Certain foreign retirement plans are also required to file this form (see instructions).
   Department of the Treasury    Complete all entries in accordance with the instructions to the Form 5500-EZ. This Form is Open
   Internal Revenue Service
                                  Go to www.irs.gov/Form5500EZ for instructions and the latest information. to Public Inspection.
    Part      Annual Return Identification Information
   For the calendar plan year 2023 or fiscal plan year beginning (MM/DD/YYYY) 01/02/202 Zand ending 01/02/2023
     A   This return is: (1) the first return filed for the
                            [X]                       plan         (3) [ ] the final return filed for the plan
                         (2) [ ] an amended return                 (4) [ ] a short plan year return (less than 12 months)
     B   Check box if filing under [ ] Form 5558 [ ] automatic extension
                                  [ ] special extension (enter description)
     C    If this return is for a foreign plan, check this box (see instructions)                                         [ ]
     D    If this return is for the IRS Late Filer Penalty Relief Program, check this box
         (Must be filed on a paper Form with the IRS. See instructions).                                                  [ ]
     E    If this is a retroactively adopted plan permitted by SECURE Act section 201, check here                         [ ]
   Part II    Basic Plan Information - enter all requested information.
     1a Name of plan                                                                   1b Three-digit
                                                                                          plan number (PN)      586
                                                                                      1c Date plan first became effective
           Annual             Return Plan
                                                                                          (MM/DD/YYYY)
                                                                                           02/05/2022
     2a Employer's name                                                               2b Employer Identification Number (EIN)
                            Acme          Corp Software                                   (Do not enter your Social Security Number)
         Trade name of business (if different from name of employer)                       735268329
                                                                                      2c Employer's telephone number
         In care of name                                                                   011536259
                                                                                      2d Business code (see instructions)
         Mailing address (room, apt., suite no. and street, or P.O. box)
          235,       Park       Street Avenue, FL
         City or town, state or province, country, and ZIP or foreign postal code (if foreign, see instructions)
                  FL        63052
    3a   Plan administrator's name (if same as employer, enter "Same")                3b Administrator's EIN
                                                                                                             532678
         In care of name                                                              3c Administrator's telephone number

        Mailing address (room, apt., suite no. and street, or P.O. box)

        City or town, state or province, country, and ZIP or foreign postal code (if foreign, see instructions)

    4    If the employer's name, the employer's EIN, and/or the plan name has changed since the
        last return filed for this plan, enter the employer's name and EIN, the plan name, and the
        plan number for the last return in the appropriate space provided
     a Employer's name                                                                         4b EIN    5732900

   4c   Plan name                                                                              4d PN

                                                                                               5a(1)     10
   5a(1) Total number of participants at the beginning of the plan year
                                                                                               5a(2)      8
    a(2) Total number of active participants at the beginning of the plan year
                                                                                               5b(1)      5
    b(1) Total number of participants at the end of the plan year
    b(2) Total number of active participants at the end of the plan year                       5b(2)
                                                                                with accrued
    c   Number of participants who terminated employment during the plan year
       benefits that were less than 100% vested                                                 5c      2
 Part III   Financial Information
                                                                                     (1) Beginning of year   (2) End of year
                                                                               6a    $   50000             $   60000
   6a Total plan assets
                                                                               6b    $    4000              $ 5000
    b Total plan liabilities
                                            6a)                                6c
    c Net plan assets (subtract line 6b from
                                                 see the Instructions for Form 5500-EZ. Catalog Number 63263R Form 5500-EZ (2023)
For Privacy Act and Paperwork Reduction Act Notice,
<<<

Comparative Analysis

Here is a short comparison of the two outputs:

  • LLMWhisperer output preserves the original PDF’s structure and layout, including section headers, field labels, checkboxes, and multi-column alignment, giving a near-replica of the official form. This makes it well suited to automation, auditing, or downstream LLM tasks.
  • Mistral OCR output is dense and flattened, losing the form’s visual hierarchy. It extracts most of the raw text, but it drops several fields that appear in the LLMWhisperer output (2c, 2d, and 3b), does not mark the checked box on line A, and misreads some printed text (for example, “www.ire.gov” and “Printed Revenue Service”), which makes parsing difficult.
  • LLMWhisperer handles scanned handwritten forms well, maintaining field relationships and numeric entries (e.g., plan assets, participant counts) even in noisy or low-quality inputs.
  • Mistral OCR remains suitable for basic raw text retrieval, but its output will need heavy post-processing before it can feed structured applications or data extraction pipelines.
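To make the point about checkboxes concrete, here is a minimal sketch of how the `[X]` / `[ ]` markers in layout-preserved text could be turned into structured values. It is not part of either tool: the `checkbox_states` helper and its regular expression are assumptions made for illustration, and they only handle checkboxes whose label follows the box on the same line.

```python
import re

# "[X]" or "[ ]", then a label that ends at the next checkbox,
# at a gap of two or more spaces, or at the end of the line.
CHECKBOX = re.compile(r"\[([ X])\]\s+(.+?)(?=\s+\[[ X]\]|\s{2,}|$)")


def checkbox_states(line):
    """Return (label, is_checked) pairs for every checkbox found on one line."""
    return [(label.strip(), mark == "X") for mark, label in CHECKBOX.findall(line)]


# Two lines copied from the LLMWhisperer output above.
lines = [
    "                         (2) [ ] an amended return                 (4) [ ] a short plan year return (less than 12 months)",
    "     B   Check box if filing under [ ] Form 5558 [ ] automatic extension",
]

for line in lines:
    print(checkbox_states(line))
```

Boxes printed at the end of a line, or labels that wrap across lines (as on line A of this form), would need extra handling that this sketch does not attempt.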

Document 4 – Loan Estimate Form

For this document type, we will use this PDF: 

First, we process it with Mistral OCR:

python mistral_ocr_example.py files\loan-estimate-filled.pdf

Which gives the following output:

# FICUS BANK\n\n4321 Random Boulevard • Somecity, ST 12340\n\nSave this Loan Estimate to compare with your Closing Disclosure.\n\n## Loan Estimate\n\n**DATE ISSUED** 2/15/2013\n\n**APPLICANTS** Michael Jones and Mary Stone 123 Anywhere Street Anytown, ST 12345\n\n**PROPERTY** 456 Somewhere Avenue Anytown, ST 12345\n\n**SALE PRICE** $180,000\n\n**LOAN TERM** 30 years\n\n**PURPOSE** Purchase\n\n**PRODUCT** Fixed Rate\n\n**LOAN TYPE** ☑ Conventional ☐ FHA ☐ VA ☐ ***_***_____\n\n**LOAN ID #** 123456789\n\n**RATE LOCK** ☐ NO ☑ YES, until 4/16/2013 at 5:00 p.m. EDT\n\nBefore closing, your interest rate, points, and lender credits can change unless you lock the interest rate. All other estimated closing costs expire on 3/4/2013 at 5:00 p.m. EDT\n\n## Loan Terms\n\n|   |  | Can this amount increase after closing?  |\n| --- | --- | --- |\n|  **Loan Amount** | **$162,000** | **NO**  |\n|  **Interest Rate** | **3.875%** | **NO**  |\n|  **Monthly Principal & Interest** | **$761.78** | **NO**  |\n\nSee Projected Payments below for your Estimated Total Monthly Payment\n\n**Does the loan have these features?**\n\n**YES** • As high as $3,240 if you pay off the loan during the first 2 years\n\n**Balloon Payment** **NO**\n\n## Projected Payments\n\n|  Payment Calculation | Years 1-7 | Years 8-30  |\n| --- | --- | --- |\n|  Principal & Interest | $761.78 | $761.78  |\n|  Mortgage Insurance | + 82 | + --  |\n|  Estimated Escrow Amount can increase over time | + 206 | + 206  |\n|  Estimated Total Monthly Payment | $1,050 | $968  |\n\n**This estimate includes** ☑ Property Taxes ☐ Homeowner's Insurance\n\n**In escrow?**\n\n**YES** ☐ Other\n\nSee Section G on page 2 for escrowed property costs. You must pay for other property costs separately.\n\n## Estimated Taxes, Insurance & Assessments\n\nAmount can increase over time\n\n**$206** a month\n\n**Estimated Cash to Close**\n\n**$16,054** Includes $5,672 in Loan Costs + $2,382 in Other Costs - $0 in Lender Credits. 
See page 2 for details.\n\n**In Lender Credits.** See page 2 for details.\n\n**In escrow?**\n\n**YES** ☐ Other\n\nSee Section G on page 2 for escrowed property costs. You must pay for other property costs separately.\n\nVisit www.consumerfinance.gov/mortgage-estimate for general information and tools.\n\n**LOAN ESTIMATE**\n\n**PAGE 1 OF 3** • **LOAN ID # 123456789**", images=[], dimensions=OCRPageDimensions(dpi=200, height=2339, width=1653)), OCRPageObject(index=1, markdown="# Closing Cost Details\n\n|  Loan Costs |   |\n| --- | --- |\n|  A. Origination Charges | $\\$ 1,802$  |\n|  .25 \\% of Loan Amount (Points) | $\\$ 405$  |\n|  Application Fee | $\\$ 300$  |\n|  Underwriting Fee | $\\$ 1,097$  |\n\n|  B. Services You Cannot Shop For | $\\$ 672$  |\n| --- | --- |\n|  Appraisal Fee | $\\$ 405$  |\n|  Credit Report Fee | $\\$ 30$  |\n|  Flood Determination Fee | $\\$ 20$  |\n|  Flood Monitoring Fee | $\\$ 32$  |\n|  Tax Monitoring Fee | $\\$ 75$  |\n|  Tax Status Research Fee | $\\$ 110$  |\n\n|  C. Services You Can Shop For | $\\$ 3,198$  |\n| --- | --- |\n|  Pest Inspection Fee | $\\$ 135$  |\n|  Survey Fee | $\\$ 65$  |\n|  Title - Insurance Binder | $\\$ 700$  |\n|  Title - Lender's Title Policy | $\\$ 535$  |\n|  Title - Settlement Agent Fee | $\\$ 502$  |\n|  Title - Title Search | $\\$ 1,261$  |\n\n|  D. TOTAL LOAN COSTS $(A+B+C)$ | $\\$ 5,672$  |\n| --- | --- |\n|  |   |\n\n|  Other Costs |   |\n| --- | --- |\n|  E. Taxes and Other Government Fees | $\\$ 85$  |\n|  Recording Fees and Other Taxes | $\\$ 85$  |\n|  Transfer Taxes | $\\$ 867$  |\n|  F. Prepaids | $\\$ 665$  |\n|  Homeowner's Insurance Premium ( 6 months) | $\\$ 605$  |\n|  Mortgage Insurance Premium ( months) | $\\$ 262$  |\n|  Prepaid Interest ( $\\$ 17.44$ per day for 15 days @ 3.875\\%) |   |\n|  Property Taxes ( months) | $\\$ 413$  |\n|  G. Initial Escrow Payment at Closing | $\\$ 202$  |\n|  Homeowner's Insurance $\\$ 100.83$ per month for 2 mo. 
| $\\$ 211$  |\n|  Mortgage Insurance per month for mo. |   |\n|  Property Taxes $\\$ 105.30$ per month for 2 mo. | $\\$ 1,017$  |\n\n|  H. Other | $\\$ 1,017$  |\n| --- | --- |\n|  Title - Owner's Title Policy (optional) | $\\$ 1,017$  |\n\n|  I. TOTAL OTHER COSTS (E + F + G + H) | $\\$ 2,382$  |\n| --- | --- |\n|  J. TOTAL CLOSING COSTS | $\\$ 8,054$  |\n|  D + I | $\\$ 8,054$  |\n|  Lender Credits |   |\n|  Calculating Cash to Close |   |\n|  Total Closing Costs (J) | $\\$ 8,054$  |\n|  Closing Costs Financed (Paid from your Loan Amount) | $\\$ 0$  |\n|  Down Payment/Funds from Borrower | $\\$ 18,000$  |\n|  Deposit | $-\\$ 10,000$  |\n|  Funds for Borrower | $\\$ 0$  |\n|  Seller Credits | $\\$ 0$  |\n|  Adjustments and Other Credits | $\\$ 0$  |\n|  Estimated Cash to Close | $\\$ 16,054$  |

Then we process it with LLMWhisperer:

python llmwhisperer_example.py files\loan-estimate-filled.pdf

And we get the following output:

FICUS BANK 
4321 Random Boulevard . Somecity, ST 12340                  Save this Loan Estimate to compare with your Closing Disclosure. 

Loan Estimate                                               LOAN TERM    30 years 
                                                            PURPOSE      Purchase 
DATE ISSUED   2/15/2013                                     PRODUCT      Fixed Rate 
APPLICANTS    Michael Jones and Mary Stone                  LOAN TYPE    [X] Conventional [ ] FHA [ ] VA [ ] 
              123 Anywhere Street                           LOAN ID #    123456789 
              Anytown, ST 12345                             RATE LOCK    [ ] NO [X] YES, until 4/16/2013 at 5:00 p.m. EDT 
PROPERTY      456 Somewhere Avenue                                       Before closing, your interest rate, points, and lender credits can 
              Anytown, ST 12345                                          change unless you lock the interest rate. All other estimated 
SALE PRICE    $180,000                                                   closing costs expire on 3/4/2013 at 5:00 p.m. EDT 

 Loan Terms                                                 Can this amount increase after closing?

 Loan Amount                        $162,000                NO

 Interest Rate                      3.875%                  NO

 Monthly Principal & Interest       $761.78                 NO
 See Projected Payments below for your
 Estimated Total Monthly Payment

                                                            Does the loan have these features?

 Prepayment Penalty                                         YES     . As high as $3,240 if you pay off the loan during the
                                                                      first 2 years

 Balloon Payment                                            NO

 Projected Payments

 Payment Calculation                               Years 1-7                                   Years 8-30

  Principal & Interest                              $761.78                                      $761.78

  Mortgage Insurance                          +        82                                  +        -
  Estimated Escrow                            +       206                                  +      206
  Amount can increase over time

  Estimated Total
  Monthly Payment                                  $1,050                                        $968

                                                         This estimate includes                      In escrow?
                                                         [X] Property Taxes                          YES
 Estimated Taxes, Insurance         $206                                                             YES
                                                         [X]
 & Assessments                                              Homeowner's Insurance
 Amount can increase over time      a month              [ ] Other:
                                                         See Section G on page 2 for escrowed property costs. You must pay for other
                                                         property costs separately.

 Costs at Closing

 Estimated Closing Costs            $8,054         Includes $5,672 in Loan Costs + $2,382 in Other Costs - $0
                                                   in Lender Credits. See page 2 for details.

 Estimated Cash to Close            $16,054        Includes Closing Costs. See Calculating Cash to Close on page 2 for details.

                 Visit www.consumerfinance.gov/mortgage-estimate for general information and tools.
LOAN ESTIMATE                                                                                PAGE 1 OF 3 . LOAN ID # 123456789
<<<


Closing Cost Details

Loan Costs                                                Other Costs
A. Origination Charges                         $1,802     E. Taxes and Other Government Fees                $85
 .25 % of Loan Amount (Points)                   $405     Recording Fees and Other Taxes                    $85
Application Fee                                  $300     Transfer Taxes
Underwriting Fee                                $1,097
                                                          F. Prepaids                                      $867
                                                          Homeowner's Insurance Premium ( 6 months)        $605
                                                          Mortgage Insurance Premium ( months)
                                                          Prepaid Interest ( $17.44 per day for 15 days @ 3.875%) $262
                                                          Property Taxes ( months)

                                                          G. Initial Escrow Payment at Closing             $413
                                                          Homeowner's Insurance $100.83 per month for 2 mo. $202
                                                          Mortgage Insurance           per month for mo.
B. Services You Cannot Shop For                 $672      Property Taxes        $105.30 per month for 2 mo. $211
Appraisal Fee                                    $405
Credit Report Fee                                 $30
Flood Determination Fee                           $20
Flood Monitoring Fee                              $32
Tax Monitoring Fee                                $75
Tax Status Research Fee                          $110     H. Other                                       $1,017
                                                          Title - Owner's Title Policy (optional)         $1,017

                                                          I. TOTAL OTHER COSTS (E + F + G + H)           $2,382

C. Services You Can Shop For                   $3,198
                                                          J. TOTAL CLOSING COSTS                         $8,054
Pest Inspection Fee                              $135
                                                           D +                                            $8,054
Survey Fee                                        $65
Title - Insurance Binder                         $700      Lender Credits
Title - Lender's Title Policy                    $535
Title - Settlement Agent Fee                     $502     Calculating Cash to Close
Title - Title Search                            $1,261
                                                           Total Closing Costs (J)                        $8,054
                                                           Closing Costs Financed (Paid from your Loan Amount) $0
                                                           Down Payment/Funds from Borrower              $18,000
                                                           Deposit                                     - $10,000
                                                           Funds for Borrower                                $0
                                                           Seller Credits                                    $0
                                                           Adjustments and Other Credits                     $0

D. TOTAL LOAN COSTS (A + B + C)                $5,672      Estimated Cash to Close                       $16,054

LOAN ESTIMATE                                                                        PAGE 2 OF 3 . LOAN ID # 123456789
<<<

Comparative Analysis

Here is a short comparison of the two outputs:

  • LLMWhisperer output preserves the form’s original layout and structured formatting, including headers, field groupings, checkbox states, and multi-column alignments. This results in a highly legible, form-like digital replica that is well suited to automation, audits, or downstream LLM tasks.
  • Mistral OCR output is flattened into markdown, which captures raw text and some table formatting but loses precise alignment and structure. Checkbox indicators (e.g. ☑, ☐) are retained, but contextual relationships between fields are less clear.
  • LLMWhisperer clearly separates sections like “Loan Terms”, “Projected Payments”, and “Closing Cost Details”, maintaining their hierarchy and labels, which makes it easier to extract structured data directly from the output.
  • Mistral OCR does capture most numerical and textual content, but it interleaves information and occasionally omits or misplaces details. On page 1, the “Prepayment Penalty” label and the “Estimated Closing Costs” figure of $8,054 are missing. On page 2, the amounts in the “Other Costs” table are shifted against their labels (Transfer Taxes is shown as $867, the value LLMWhisperer places against F. Prepaids). This makes post-processing more challenging.
  • LLMWhisperer reproduces form tables more faithfully, e.g., preserving calculations and line items in the “Loan Costs” and “Other Costs” tables, including complex breakdowns and summations.
  • Mistral OCR remains usable for raw content retrieval, but its output lacks the precision required for structured data pipelines and would require significant transformation or rule-based cleaning to be automation-ready.
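One practical way to test whether an extraction is automation-ready is to check that the form's own arithmetic still holds. The sketch below is an illustration only, not part of either tool: it pulls the lettered section totals (A to J) from an excerpt of the LLMWhisperer page-2 output and verifies the sums printed on the form. The `section_totals` and `check_totals` helpers and the regular expression are assumptions made for this example.

```python
import re

# Excerpt of page 2 of the LLMWhisperer output above (lines with lettered totals).
PAGE_2 = """\
A. Origination Charges                         $1,802     E. Taxes and Other Government Fees                $85
                                                          F. Prepaids                                      $867
                                                          G. Initial Escrow Payment at Closing             $413
B. Services You Cannot Shop For                 $672      Property Taxes        $105.30 per month for 2 mo. $211
Tax Status Research Fee                          $110     H. Other                                       $1,017
                                                          I. TOTAL OTHER COSTS (E + F + G + H)           $2,382
C. Services You Can Shop For                   $3,198
                                                          J. TOTAL CLOSING COSTS                         $8,054
D. TOTAL LOAN COSTS (A + B + C)                $5,672      Estimated Cash to Close                       $16,054
"""

# A section letter such as "A. ", then the first dollar amount on the same line.
SECTION_TOTAL = re.compile(r"\b([A-J])\. [^$\n]*?\$([\d,]+)")


def section_totals(text):
    """Map each section letter to its whole-dollar total."""
    return {
        letter: int(amount.replace(",", ""))
        for letter, amount in SECTION_TOTAL.findall(text)
    }


def check_totals(t):
    """Verify the three sums that the form itself states."""
    return {
        "D = A + B + C": t["D"] == t["A"] + t["B"] + t["C"],
        "I = E + F + G + H": t["I"] == t["E"] + t["F"] + t["G"] + t["H"],
        "J = D + I": t["J"] == t["D"] + t["I"],
    }


totals = section_totals(PAGE_2)
for rule, ok in check_totals(totals).items():
    print(rule, "OK" if ok else "MISMATCH")
```

All three checks pass for the LLMWhisperer figures. Applying the same rule to the Mistral OCR table shown earlier (E $85, F $665, G $202, H $1,017) gives $1,969 rather than the stated $2,382, so a check of this kind would flag the shifted rows.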

Document 5 – Logistics Packing List (Skewed Scan)

For this document type, we will use this PDF: 

First, we process it with Mistral OCR:

python mistral_ocr_example.py files\logistics_packing_list_scanned.pdf

Which gives the following output:

# Packing List\n\n**Shipper/Exporter:** Faculty of Arts\n5 Washington Square S, New York, NY 10012, USA\n\n**Commercial Invoice No.:** 894933\n\n**Total number of Packages:** 32\n\n**Total Gross Weight (Lbs):**\n- Total Gross Weight (Kgs): 35\n- Total Net Weight (Lbs): 34343454\n- Total Net Weight (Kgs): 12/12/2023\n- Total Cubic Feet: 1.020\n- Total Cubic Meters: 1.020\n\n**Total Lbs (Kgs):**\n- Total Lbs: 12\n- Total Cubic Meters: 1.020\n\n**Total Number of Packages:** 32\n\n**Total Gross Weight (Lbs):**\n- Total Gross Weight (Kgs): 35\n- Total Net Weight (Lbs): 34343454\n- Total Lbs (Kgs): 12/12/2023\n- Total Cubic Feet: 1.020\n\n**Total Meters:** 1.020\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Number of Packages:** 32\n\n**Total Gross Weight (Lbs):**\n- Total Gross Weight (Kgs): 35\n- Total Net Weight (Lbs): 34343454\n- Total Lbs (Kgs): 12/12/2023\n- Total Cubic Meters: 1.020\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Number of Packages:** 32\n\n**Total Gross Weight (Lbs):**\n- Total Gross Weight (Kgs): 35\n- Total Net Weight (Lbs): 34343454\n- Total Lbs (Kgs): 12/12/2023\n- Total Cubic Feet: 1.020\n\n**Total Meters:** 1.020\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Meters:** 1.020\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Meters:** 1.020\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Meters:** 1.020\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Meters:** 1.020\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Meters:** 1.020\n\n**Total Lbs (Kgs):**\n- 
Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Meters:** 1.020\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Meters:** 1.020\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 
15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs 
(Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- 
Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total 
Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 
12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 
15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n**Total Lbs

Then we process it with LLMWhisperer:

python llmwhisperer_example.py files\logistics_packing_list_scanned.pdf

And we get the following output:

 Packing List 

Shipper/ Exporter: Faculty of Arts Ultimate Consignee: Herald Corp Bill To: Faculty of Arts      Intermediate Consignee 
           5 Washington                      28 OLD                  5 Washington Square S, 
           Square S, New York,               BROMPTON                New York, NY 10012, USA
           NY 10012, USA                     ROAD, SOUTH
                                             KENSINGTON
Commercial Invoice No .: 894933 Total number of Packages: 32     Transportation: AIR CARGO UPS
Order No .:                      Total Gross Weight (Lbs):
                                 Total Gross Weight (Kgs): 35
AWB/BL Number:                   Total Net Weight (Lbs):
Date Of Shipment:
                                 Total net Weight (Kgs):         Conditions of Sale and Terms of Payment: If unpaid after 15 days, ALL
Currency:          34343454      Total Cubic Feet:
Freight:           12/12/2023    Total Cubic Meters: 1020                                        may dispose of the goods
                   Dollars                                                                       per Clause 5.
                                                                                                                 Per package
             Item                                                                                                gross weight
 Shipment            Item Description, Sales Order No., Customer Shipped Packaging
            Number                                                                           Dimensions
 Line No.            PO No.                                 Quantity    Type           Inches      centimeters
                                                                                    L    W   H    L   W    H    LBS. KGS.
12        1          Print packaging, 23445                10          Box          12 12    12                       10
13        2          Print packaging, 345232               10          Box          16 16    16                       20
14        3          Black Ink cartridges, 342900          20          Glass        10 8     8                        15

Country of Origin:
Marks:       USA

Note: These commodities, Technology or software were exported from the United States in accordance with the Export Administration Regulations.
Diversion contrary to U.S. law is prohibited.

Signature:                                      Date:
<<<

Comparative Analysis

Here is a short comparison of the two outputs:

  • LLMWhisperer output accurately reconstructs the original document’s layout, preserving field labels, values, and visual grouping despite skewed input. Key metadata like “Shipper/Exporter”, invoice number, weights, and dimensional metrics are clearly separated and aligned for easy extraction.
  • Mistral OCR output becomes highly repetitive, with the same field-value pairs duplicated dozens of times, most likely because the skewed scan pushed the model into a repetition loop. The layout collapses into a dense, flattened markdown blob with minimal structure and extreme noise.
  • LLMWhisperer captures numerical precision across grouped fields, including net/gross weights, volumetric metrics (cubic feet/meters), and packaging counts. These are preserved in tabular-style formatting, making downstream parsing feasible.
  • Mistral OCR fails to de-duplicate or disambiguate fields, resulting in dozens of repeated blocks like “Total Lbs (Kgs): – Meters: 12 – Gross Weight: 15”, which obscure the real values and make it unclear which field each number belongs to.
  • LLMWhisperer maintains semantic hierarchy and document boundaries, including logical divisions between header metadata and totals. Even with a skewed scan, its output mimics the intended visual structure.
  • Mistral OCR output is unusable in its raw form for automation because of severe duplication, formatting noise, and loss of document structure. Manual cleaning or heavy post-processing would be required to extract reliable data.
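The duplication described above can be caught automatically before the output reaches a downstream parser. The following is a minimal sketch, not part of the article's repository: the function names `repeated_block_ratio` and `looks_degenerate` and the 0.5 threshold are assumptions chosen for illustration. It splits the markdown on blank lines and measures how many blocks repeat an earlier one.

```python
from collections import Counter


def repeated_block_ratio(markdown: str) -> float:
    """Return the share of non-empty blocks that duplicate an earlier block."""
    # Blank lines separate blocks (paragraphs, lists) in markdown output.
    blocks = [b.strip() for b in markdown.split("\n\n") if b.strip()]
    if not blocks:
        return 0.0
    counts = Counter(blocks)
    # Every occurrence beyond the first counts as a duplicate.
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(blocks)


def looks_degenerate(markdown: str, threshold: float = 0.5) -> bool:
    """Flag output where at least `threshold` of the blocks are repeats.

    The 0.5 default is an arbitrary, hypothetical cut-off; tune it on real data.
    """
    return repeated_block_ratio(markdown) >= threshold


# Synthetic sample modelled on the repeated block seen in the output above.
sample = "**Total Lbs (Kgs):**\n- Meters: 12\n- Gross Weight: 15\n\n" * 20
print(looks_degenerate(sample))
```

A check like this does not repair the output; it only signals that a page should be retried or routed to a different extractor.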

Document 6 – Excel File

For this document type, we will use an Excel file (uber.xlsx) that contains an Uber income statement.

First, we process it with MistralOCR:

python mistral_ocr_example.py files\uber.xlsx

Which gives the following output:

File "D:\GitHub\LLMWhisperer-MistralOCR-Comparison\mistral_ocr_example.py", line 52, in <module>
    ocr_response = ocr_pdf(base64_pdf)
                   ^^^^^^^^^^^^^^^^^^^
  File "D:\GitHub\LLMWhisperer-MistralOCR-Comparison\mistral_ocr_example.py", line 30, in ocr_pdf
    ocr_response = client.ocr.process(
                   ^^^^^^^^^^^^^^^^^^^
  File "d:\GitHub\LLMWhisperer-MistralOCR-Comparison\.venv\Lib\site-packages\mistralai\ocr.py", line 129, in process
    raise models.SDKError(
mistralai.models.sdkerror.SDKError: API error occurred: Status 400
{"object":"error","message":"Invalid document type. application/vnd.openxmlformats-officedocument.spreadsheetml.sheet is not supported.","type":"invalid_file","param":null,"code":"1901"}

As the error message shows, Mistral OCR does not support Excel files natively: the API rejects the spreadsheet MIME type with a 400 status.
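One way to avoid a failed API call like this is to check the file extension before uploading. The sketch below is illustrative only: `ASSUMED_OCR_SUFFIXES` is a hypothetical allow-list, not an official list of formats supported by Mistral OCR. The only fact confirmed by the error above is that .xlsx is rejected.

```python
from pathlib import Path

# Hypothetical allow-list for illustration. Only the rejection of .xlsx is
# confirmed by the error message; check the provider's documentation for
# the actual list of supported formats.
ASSUMED_OCR_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def can_send_to_ocr(path: str) -> bool:
    """Return True if the file extension is on the assumed allow-list."""
    return Path(path).suffix.lower() in ASSUMED_OCR_SUFFIXES


for name in ["files/uber.xlsx", "files/logistics_packing_list_scanned.pdf"]:
    status = "send" if can_send_to_ocr(name) else "skip or convert first"
    print(f"{name}: {status}")
```

Files that fail the check can be converted (for example, exported to PDF) or routed to a tool that reads the format directly.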

Then we process it with LLMWhisperer:

python llmwhisperer_example.py files\uber.xlsx

And we get the following output:

Sheet name:Income statements





 Uber Technologies, Inc. (NYSE:UBER) > Financials > Income Statement                                                                                            


                                                                 ───────────────────────────────────────────                 ────────────────────────────────── 

 In Millions of the reported currency, except per share items.  Template:        Standard                                   Restatement:     Latest Filings     

                                                                                                                             ────────────────────────────────── 

                                                                Period Type:     Annual                                     Order:           Latest on Right    

                                                                                                                             ────────────────────────────────── 

                                                                Currency:        Reported Currency                          Conversion:      Today's Spot Rate  

                                                                                                                             ────────────────────────────────── 

                                                                Units:           S&P Capital IQ (Default)                   Decimals:        Capital IQ (Default)
                                                                                                                             ────────────────────────────────── 

                                                                Source:          Capital IQ & Proprietary Da                                                    

                                                                 ───────────────────────────────────────────                                                    



                                                                                  ──────────────────────────                                                    

 Income Statement                                                                                                                                               

                                                                 ─────────────────                                                                              

                                                                    Reclassified              Reclassified     Reclassified     Reclassified     Reclassified   

                                                                       12 months                 12 months        12 months        12 months        12 months              12 months              12 months
 For the Fiscal Period Ending                                        Dec-31-2016               Dec-31-2017      Dec-31-2018      Dec-31-2019      Dec-31-2020            Dec-31-2021            Dec-31-2022
 Currency                                                                    USD                       USD              USD              USD              USD                    USD                    USD

 Revenue                                                                 3,845.0                   7,932.0         10,433.0         13,000.0         11,139.0               17,455.0               31,877.0
 Other Revenue                                                                 -                         -                -                -                -                      -                      -
                                                                 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   Total Revenue                                                         3,845.0                   7,932.0         10,433.0         13,000.0         11,139.0               17,455.0               31,877.0

 Cost Of Goods Sold                                                      3,109.0                   5,514.0          6,302.0          8,363.0          6,801.0               11,228.0               22,072.0
                                                                 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   Gross Profit                                                            736.0                   2,418.0          4,131.0          4,637.0          4,338.0                6,227.0                9,805.0

 Selling General & Admin Exp.                                            2,575.0                   4,564.0          5,036.0          7,925.0          6,144.0                7,105.0                7,892.0
 R & D Exp.                                                                864.0                   1,201.0          1,505.0          4,836.0          2,120.0                2,054.0                2,798.0
 Depreciation & Amort.                                                     320.0                     510.0            426.0            472.0            575.0                  902.0                  947.0
 Other Operating Expense/(Income)                                              -                         -                -                -                -                      -                      -

                                                                 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   Other Operating Exp., Total                                           3,759.0                   6,275.0          6,967.0         13,233.0          8,839.0               10,061.0               11,637.0

   Operating Income                                                    (3,023.0)                 (3,857.0)        (2,836.0)        (8,596.0)        (4,501.0)              (3,834.0)              (1,832.0)

 Interest Expense                                                        (334.0)                   (479.0)          (648.0)          (559.0)          (458.0)                (483.0)                (565.0)
 Interest and Invest. Income                                                22.0                      71.0            104.0            234.0             55.0                   37.0                  139.0
                                                                 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   Net Interest Exp.                                                     (312.0)                   (408.0)          (544.0)          (325.0)          (403.0)                (446.0)                (426.0)

 Income/(Loss) from Affiliates                                                 -                         -           (42.0)           (34.0)           (34.0)                 (37.0)                  107.0
 Currency Exchange Gains (Loss)                                           (91.0)                      42.0           (45.0)           (40.0)          (128.0)                 (67.0)                (147.0)
 Other Non-Operating Inc. (Exp.)                                           208.0                   (129.0)          (428.0)             82.0             59.0                (230.0)                (213.0)
                                                                 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   EBT Excl. Unusual Items                                             (3,218.0)                 (4,352.0)        (3,895.0)        (8,913.0)        (5,007.0)              (4,614.0)              (2,511.0)

 Restructuring Charges                                                         -                         -                -                -          (362.0)                      -                      -
 Impairment of Goodwill                                                        -                         -                -                -                -                      -                      -
 Gain (Loss) On Sale Of Invest.                                                -                         -          1,996.0              2.0        (1,815.0)                1,626.0              (6,822.0)
 Gain (Loss) On Sale Of Assets                                                 -                         -          3,214.0                -            204.0                1,684.0                   14.0
 Asset Writedown                                                               -                   (223.0)          (197.0)                -                -                      -                      -
 Other Unusual Items                                                           -                         -            152.0            444.0                -                  242.0                      -
                                                                 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   EBT Incl. Unusual Items

Comparative Analysis

Here is a short comparison of the two outputs:

  • LLMWhisperer successfully processes Excel files, extracting structured tabular data from the spreadsheet with clear column headers, numeric precision, and financial hierarchy intact. It interprets fiscal periods, currency units, and even metadata like source and reporting standards.
  • Mistral OCR does not support Excel files at all, returning a hard API error indicating that application/vnd.openxmlformats-officedocument.spreadsheetml.sheet is not a supported MIME type. The pipeline halts entirely and provides no output for this file format.
  • LLMWhisperer converts the spreadsheet into a human-readable financial report, preserving layout features like multi-year comparisons, currency formatting, and clearly separated revenue, cost, and profit lines, much like a professional financial statement.
  • Mistral OCR offers no fallback or workaround, so Excel files would have to be converted (e.g., saved as PDF) before extraction can even be attempted, which increases the pre-processing burden.
  • LLMWhisperer provides value for structured financial data ingestion, producing reliable output directly usable for analytics, dashboards, or automated pipelines.
  • Mistral OCR’s lack of Excel compatibility limits its applicability for enterprise use cases involving tabular financial, inventory, or logistics data typically stored in spreadsheets.
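To illustrate why the layout-preserved output is convenient for automated pipelines, here is a minimal sketch of parsing one row of the income statement shown above. The helper names `parse_amount` and `parse_row` are hypothetical, and the sketch assumes that columns are separated by at least two spaces, that parentheses denote negative amounts, and that a lone "-" marks an empty cell.

```python
import re
from typing import Optional


def parse_amount(cell: str) -> Optional[float]:
    """Convert a cell such as '3,845.0', '(3,023.0)' or '-' to a number."""
    cell = cell.strip()
    if cell == "-":
        return None  # Assumed convention: a lone dash means "no value".
    negative = cell.startswith("(") and cell.endswith(")")
    number = float(cell.strip("()").replace(",", ""))
    return -number if negative else number


def parse_row(line: str) -> tuple[str, list[Optional[float]]]:
    """Split a layout-preserved row into its label and numeric cells."""
    # Labels contain single spaces; columns are padded with two or more.
    label, *cells = re.split(r"\s{2,}", line.strip())
    return label, [parse_amount(c) for c in cells]


row = "   Operating Income      (3,023.0)      (3,857.0)      (2,836.0)"
print(parse_row(row))
```

Rows made up of rule characters or header text would need to be filtered out first; this sketch only handles data rows.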

Comparative Findings Summary

Feature / Document Type | Mistral OCR | LLMWhisperer (Unstract)
Layout Preservation | ⚠️ Collapses layout on scans or OCR noise | ✅ Reconstructs layout even in skewed scans
Table Extraction | ⚠️ Struggles with structured tables | ✅ Schema-aware tabular output
Checkboxes / Radios | ⚠️ Detected but lacks structure | ✅ Parsed with semantic structure
Handwriting Support | ⚠️ Very limited, poor accuracy | ✅ Parses mixed print + handwriting
Document Boundaries | ❌ Flattened, lacks clear sections | ✅ Maintains headers, sections, totals
Field Deduplication | ❌ Redundant values repeated excessively | ✅ Clean de-duplication and semantic grouping
Numerical Accuracy | ⚠️ Often ambiguous due to layout collapse | ✅ Preserves precision across columns
Image/Scan Robustness | ⚠️ Struggles with low quality or skew | ✅ Layout and data still reconstructed
Excel File Support | ❌ Not supported (API error) | ✅ Reads .xlsx directly, extracts data
Hallucination Risk | ❌ High: spurious or repeated data | ✅ Controlled, factual extraction

LLMWhisperer (Unstract) consistently outperformed Mistral OCR across all the document types we tested.

It accurately preserves layout, handles skewed scans, extracts tables with schema-awareness, and cleanly parses fields, including checkboxes, handwritten notes, and Excel files. Its output is structured with strong deduplication and numerical precision.

In contrast, Mistral OCR often collapses layout, produces redundant and noisy markdown, fails on Excel input, and struggles with unstructured or low-quality scans. While it does detect checkboxes, it lacks contextual grouping and has a high hallucination risk.

Use Cases and Recommendations

Choosing the right tool depends on the complexity of your documents and the end use of the extracted data.

When to Use Mistral OCR:

  • ✅ For clean, digital documents (e.g., basic PDFs with standard fonts and layout).
  • ⚡ When you need fast Markdown output without additional processing.
  • 📄 Suitable for simple extraction tasks where layout fidelity and field grouping aren’t critical.

When to Choose LLMWhisperer (Unstract):

  • 🧾 For scanned, skewed, or handwritten documents requiring layout-aware parsing.
  • 📊 When tables, checkboxes, or multi-format inputs (PDF, DOCX, XLSX) are involved.
  • 🧠 Ideal for automation pipelines where structured or schema-mapped output is required.
  • 🏗️ Useful in document-heavy domains like logistics, legal, healthcare, and finance where accuracy and data structure are essential.

In summary, use Mistral OCR for fast, simple jobs, and LLMWhisperer when precision, structure, and robustness are key.

Cost and Deployment Considerations

Mistral OCR

  • Deployment: SaaS-only, no On-Premise support.
  • Pricing: Flat-rate model at $1/1,000 pages for OCR, $3/1,000 pages for annotations.
  • Customization: Limited to built-in SaaS capabilities, with minimal tuning options.

Mistral OCR is simple and cost-effective for lightweight needs but lacks deployment versatility.

LLMWhisperer (Unstract)

  • Deployment: Highly flexible, since it supports On-Premise deployments, SaaS plans, and enterprise/private cloud options.
  • Pricing: Tiered per processing mode, ranging from $1 to $15 per 1,000 pages, depending on quality and form understanding.
  • Customization: Full control with On-Premise setups, plus rich API options and structured output formats.

LLMWhisperer supports diverse deployment strategies and pricing models, making it well-suited for both developers and enterprise teams.
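The per-1,000-page rates quoted above translate into a monthly budget with simple arithmetic. The sketch below assumes a hypothetical volume of 50,000 pages; the function name `ocr_cost` is illustrative, and the rates are the ones listed in this section.

```python
def ocr_cost(pages: int, usd_per_1000_pages: float) -> float:
    """Estimate cost in USD for a page count at a per-1,000-page rate."""
    return pages / 1000 * usd_per_1000_pages


pages = 50_000  # Hypothetical monthly volume, chosen only for illustration.

print("Mistral OCR (OCR):        ", ocr_cost(pages, 1.0))
print("Mistral OCR (annotations):", ocr_cost(pages, 3.0))
print("LLMWhisperer (lowest tier): ", ocr_cost(pages, 1.0))
print("LLMWhisperer (highest tier):", ocr_cost(pages, 15.0))
```

At this volume the estimate ranges from $50 to $750 depending on the tool and processing mode, so the choice of mode matters more than the choice of vendor at the low end.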

Conclusion

Mistral OCR is a fast and easy solution for basic text extraction, which makes it ideal for straightforward, low-complexity use cases.

However, for organizations that require accurate layout retention, form parsing, structured outputs, and minimal hallucination, LLMWhisperer stands out as the superior choice.

With enterprise-ready features, On-Premise deployment options, and support for complex document types, LLMWhisperer is purpose-built for production environments and high-stakes automation workflows.

About Author
Nuno Bispo

Nuno Bispo is a Senior Software Engineer with more than 15 years of experience in software development. He has worked in industries such as insurance, banking, and airlines, where he focused on building software with low-code platforms. In recent years, Nuno has been deepening his skills in Python and Django, working as a freelance consultant on many international projects, and writing articles on his blog. Currently, Nuno works as an Integration Architect for a major multinational corporation. He has a degree in Computer Engineering.