Docling vs. LLMWhisperer: The Best Docling Alternative


Introduction

Optical Character Recognition (OCR) and document conversion technologies have come a long way since their inception.

Originally developed to transform printed text into machine-readable formats, OCR has evolved from simple pattern-matching techniques to sophisticated systems capable of handling diverse document types—from neatly printed pages to complex, handwritten notes.

One of the most crucial challenges in this evolution is the preservation of layout: retaining the original structure, formatting, and context of documents is key for many applications, whether it’s for archiving, automated data extraction, or enabling seamless integration with modern language models.

Two notable tools take on this challenge: IBM’s Docling and LLMWhisperer.

Docling is designed to convert documents into markdown while preserving the layout. Its ability to maintain formatting makes it particularly appealing for projects where the visual structure of documents—such as purchase orders or reports—is important. However, Docling tends to struggle with tasks that go beyond digital text, such as parsing scanned documents, handwritten content, or images captured by a camera.

LLMWhisperer, on the other hand, leverages advanced OCR techniques enhanced by deep learning to excel at complex tasks. It not only handles traditional printed text but also demonstrates superior performance in recognizing handwriting and extracting structured data like tables, forms, checkboxes, and radio buttons. Its context-aware approach reduces the need for extensive pre- or post-processing, making it highly versatile across different document types.

The purpose of this article is to explore how these two tools perform across various document scenarios, with one key question in mind: how well does each tool handle layout preservation and OCR accuracy?

In this article, we will:

  • Walk through the setup processes for both IBM’s Docling and LLMWhisperer.
  • Explore and evaluate key features using sample documents: a purchase order, a handwritten document, and a form with checkboxes and radio buttons.
  • Compare the performance and pricing of each solution to help you determine the best fit for parsing diverse formats and data extraction needs.

Let’s dive in and explore how each tool performs, and why LLMWhisperer might just be the superior choice for your next project.

Here’s the GitHub repository where you will find all the code written for this article.


Overview of IBM’s Docling

IBM’s Docling is a powerful tool designed to convert a wide range of documents into markdown while preserving their original layout and structure.

By maintaining visual fidelity, Docling ensures that the converted markdown accurately reflects the formatting nuances—such as headings, bullet points, tables, and columns—essential for documents like purchase orders, reports, or forms.

Docling was developed by IBM Research and released as open source in 2024, in response to the growing need for robust document processing that bridges traditional OCR techniques with modern workflow requirements.

Its development focused on tackling one of OCR’s persistent challenges: preserving layout integrity. Over time, Docling has evolved through community feedback and iterative improvements, leading to a tool that not only converts documents but also retains the original visual context.

Early versions were primarily optimized for digital text, but as users began to work with more diverse input types, enhancements were made—including the exploration of alternative OCR engines—to improve its handling of scanned documents, handwritten notes, and photographed images.

Today, Docling stands out for its ease of converting complex layouts into structured markdown files, making it an indispensable asset for teams looking to integrate document data seamlessly into modern content management and automation systems.

Key Features

  • Markdown Conversion and Layout Preservation: Docling excels at translating complex document layouts into clean, structured markdown files. This capability allows users to maintain headings, lists, tables, and other formatting elements, ensuring that the essential structure of the document remains intact.
  • Ease of Converting Complex Layouts: Whether dealing with multi-column layouts or documents with various formatting nuances, Docling simplifies the conversion process. The resulting markdown is easy to review and further process, making it a practical solution for teams looking to integrate document data into modern workflows.

Limitations

  • Challenges with Non-Digital Inputs: Although Docling performs well with digital text, it faces performance issues when processing scanned documents, handwritten notes, or photographed images. These input types often lead to errors in text extraction or layout misinterpretations.
  • Default OCR Engine Constraints: By default, Docling utilizes EasyOCR for optical character recognition. While effective for many printed text scenarios, EasyOCR may not deliver the desired accuracy for more challenging inputs like handwriting or low-quality scans. Users have the option to experiment with alternative engines, such as Tesseract, which might offer improvements in those areas.

Overview of LLMWhisperer

LLMWhisperer is an advanced OCR solution that leverages deep learning to deliver highly accurate document conversions across a wide range of document types.

Initially conceived as a response to the limitations of traditional OCR systems, it emerged from research into neural networks and natural language processing techniques that could better handle the complex visual and linguistic patterns found in modern documents.

Over time, LLMWhisperer has evolved through iterative improvements and extensive training on diverse datasets, equipping it to handle everything from neatly printed pages to challenging handwritten notes and multilingual content. As a result, even intricate layouts (tables, forms, checkboxes, and radio buttons) are interpreted with high fidelity, preserving the original structure of the document.

The principles behind LLMWhisperer are rooted in context-aware extraction. Unlike traditional OCR methods that often rely on rigid pattern matching, LLMWhisperer uses advanced models to understand the relationship between different parts of a document. This approach not only boosts recognition accuracy but also minimizes the need for extensive pre- or post-processing. As a result, LLMWhisperer can effectively translate complex documents into clean, structured text that integrates seamlessly into modern data workflows.

Today, LLMWhisperer represents a significant leap forward in OCR technology. It stands at the forefront of a new generation of document processing tools that combine the robustness of deep learning with the nuanced understanding required for diverse, real-world applications.

Core Capabilities

  • Advanced OCR for Varied Document Types: LLMWhisperer is tailored to process diverse inputs including scanned documents, handwritten notes, and multilingual texts. Its adaptive approach allows it to manage different fonts, styles, and languages, making it robust in scenarios where traditional OCR methods might falter.
  • Superior Extraction of Structured Data: Beyond simple text recognition, LLMWhisperer excels at extracting structured elements such as tables, forms, checkboxes, and radio buttons. This capability facilitates the conversion of complex documents into data formats that can be directly integrated into modern workflows, reducing the need for manual data reformatting.
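To make the idea concrete: in the loan application output later in this article, LLMWhisperer renders checkboxes and radio buttons as [X] (selected) or [ ] (not selected), followed by the option label. The sketch below assumes that convention and turns such lines into a Python dictionary. parse_checkboxes is a hypothetical helper of our own, not part of the LLMWhisperer client, and the sample lines are abridged from that output.

```python
from __future__ import annotations

import re

# "[X]" or "[ ]" followed by a label. The label ends at the next checkbox
# mark, at a gap of 3+ spaces (the next column), or at the end of the line.
CHECKBOX = re.compile(r"\[([ xX])\]\s*(.+?)(?=\s*\[[ xX]\]|\s{3,}|$)")


def parse_checkboxes(text: str) -> dict[str, bool]:
    """Map each checkbox label to True (checked) or False (unchecked)."""
    states: dict[str, bool] = {}
    for line in text.splitlines():
        for mark, label in CHECKBOX.findall(line):
            states[label.strip()] = mark.lower() == "x"
    return states


sample = """\
[X] U.S. Citizen
[ ] Permanent Resident Alien
[X] Married                Number
[ ] Separated               Ages
Housing [ ] No primary housing expense [ ] Own [X] Rent
"""
print(parse_checkboxes(sample))
```

The column-gap rule is a heuristic for whitespace-aligned layouts; a label that itself contains three or more consecutive spaces would be cut short.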

Technical Advantages

  • Deep Learning for Context-Aware Extraction: LLMWhisperer harnesses the power of deep learning to not only recognize text but also to understand the context and layout of documents. This means that even in cases of overlapping text or noisy backgrounds, the tool can accurately extract and structure the content with minimal errors.
  • Minimal Need for Extensive Pre- or Post-Processing: Traditional OCR tools often require significant image pre-processing (like deskewing or noise reduction) and post-processing to reconstruct document layouts. In contrast, LLMWhisperer’s advanced models inherently grasp the document structure, thereby streamlining the extraction process and reducing the need for additional corrective steps.

Test Methodology

To objectively compare IBM’s Docling and LLMWhisperer, we designed a series of tests using three distinct test documents.

Each document was selected to stress different aspects of the tools—ranging from layout preservation and markdown conversion to advanced OCR capabilities for handwriting and form extraction.

Test Documents Overview

Test Document 1: Simple Purchase Order – This document represents a standard, well-formatted purchase order. The focus here is on layout fidelity and maintaining the document’s inherent structure. 

Test Document 2: Handwritten Document – A scanned handwritten document is employed to evaluate OCR performance. This allows us to assess accuracy, clarity, and consistency in extracting challenging handwritten content. 

Test Document 3: Form with Checkboxes and Radio Buttons – This test document includes various form elements such as checkboxes and radio buttons. The goal is to evaluate how well each tool can extract and preserve the structured data inherent in forms, as well as retain the overall layout integrity.

Evaluation Criteria

Tools and Configuration Details:

  • For Docling, we ran tests using both the default OCR engine (EasyOCR) and a modified setup using Tesseract to determine whether switching engines improves performance on non-standard documents.
  • LLMWhisperer was configured with its default settings optimized for handling complex document structures and multilingual content.

Evaluation Metrics:

  • OCR Accuracy: Assessing how precisely each tool recognizes text, including challenging cases like cursive handwriting and low-quality scans.
  • Layout Fidelity: Measuring the extent to which the original document’s structure—such as headings, columns, and tables—is preserved in the output markdown or text.
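The comparisons below judge OCR accuracy by inspection. If you want a number, one common metric (our suggestion, not the scoring used in these tests) is the character error rate (CER): the Levenshtein edit distance between a hand-typed reference transcript and the tool's output, divided by the length of the reference. A minimal standard-library sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions that turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by reference length (0.0 means a perfect match)."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Title of the handwritten note (Document 2) vs. Docling's EasyOCR reading of it.
print(round(character_error_rate("Only a matter of style?", "Only a mother of style 7"), 3))  # 0.174
```

Assuming the handwritten title reads "Only a matter of style?", the Docling reading costs four edits over 23 characters: two substitutions in "mother", plus an inserted space and a wrong final character. For full pages you would normalise whitespace first, since the two tools break lines differently.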

Comparative Analysis

Let’s first take a look at the Python code that we will use to test the documents.

IBM Docling installation and code setup

Starting with Docling, you will first need to install the corresponding library:

pip install docling

And here is the code using the default EasyOCR engine:

from docling.document_converter import DocumentConverter
import sys


# Function to process a document
def process_document(file_path):
    # Create a converter with Docling's default pipeline options
    converter = DocumentConverter()
    # convert() returns a ConversionResult; its .document holds the parsed file
    result = converter.convert(file_path)
    print(result.document.export_to_markdown())


# Main function
if __name__ == "__main__":
    # Retrieve document name from command line arguments
    if len(sys.argv) != 2:
        print("Usage: python docling_simple.py <document>")
        sys.exit(1)
    # Call the function to process the document
    process_document(sys.argv[1])

Here’s a breakdown of what each part of the code does:

  • Function process_document(file_path):
    • This function takes a single argument, file_path, which is the path to the document that needs to be converted.
    • It creates an instance of DocumentConverter.
    • It calls the convert method on the converter object, passing file_path as an argument. This reads the document and returns a conversion result whose document attribute holds Docling’s structured representation of the file.
    • It prints the converted document in Markdown format using the export_to_markdown method.

Here is the code for using Tesseract OCR:

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    TesseractCliOcrOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption
import sys


# Function to process a document
def process_document(file_path):
    # Use the Tesseract command-line engine and let it detect the language
    ocr_options = TesseractCliOcrOptions(lang=["auto"])
    pipeline_options = PdfPipelineOptions(
        do_ocr=True, ocr_options=ocr_options
    )

    # Apply these pipeline options to PDF inputs
    converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_options=pipeline_options,
            )
        }
    )
    doc = converter.convert(file_path).document
    md = doc.export_to_markdown()
    print(md)


# Main function
if __name__ == "__main__":
    # Retrieve document name from command line arguments
    if len(sys.argv) != 2:
        print(f"Usage: python {sys.argv[0]} <document>")
        sys.exit(1)
    # Call the function to process the document
    process_document(sys.argv[1])

Here’s a detailed explanation of the code:

  • Function process_document(file_path):
    • This function processes a PDF document located at file_path.
    • OCR Options: It sets up OCR options using TesseractCliOcrOptions with the language set to “auto”, which means the OCR engine will automatically detect the language in the document.
    • Pipeline Options: It creates PdfPipelineOptions with OCR enabled, using the previously defined OCR options.
    • Document Conversion: It initializes a DocumentConverter with the specified format options for PDFs.
    • Output: It converts the document, exports the result to Markdown, and prints the Markdown content.

Before using Docling with Tesseract OCR, make sure Tesseract is installed on your system.

Below are instructions for installing it on Windows, macOS, and Linux.

Windows Installation:

  • Download a precompiled Tesseract installer for Windows (the Tesseract project’s documentation on GitHub links to one).
  • Run the installer and follow the on-screen instructions.
  • Add Tesseract to the system path:
    • Open “System Properties” → “Environment Variables” → “Path.”
    • Add the directory where Tesseract is installed (usually C:\Program Files\Tesseract-OCR).
  • Verify the installation by opening the command prompt and running: tesseract --version

macOS Installation:

  • Open the terminal.
  • Install Tesseract using Homebrew: brew install tesseract
    • If you don’t have Homebrew installed, you can run the following command to install it:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

  • Verify the installation by running: tesseract --version

Linux Installation:

  • Open the terminal.
  • Install Tesseract with the following command (Debian/Ubuntu): sudo apt install tesseract-ocr
  • Verify the installation by running: tesseract --version
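Docling’s TesseractCliOcrOptions runs the tesseract executable, so it has to be reachable on PATH from the Python process, not just from your shell. The hypothetical helper below (standard library only, not part of Docling) is a quick check you could run before the conversion script:

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(binary: str = "tesseract") -> Optional[str]:
    """Return the first line of `<binary> --version`, or None if the
    executable is not on PATH or prints nothing."""
    path = shutil.which(binary)
    if path is None:
        return None
    proc = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    # Recent Tesseract releases print the version to stdout; fall back to
    # stderr in case an older build writes it there instead.
    output = proc.stdout.strip() or proc.stderr.strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version() or "tesseract not found on PATH")
```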


LLMWhisperer installation and code setup

For LLMWhisperer, also make sure to install the necessary package:

pip install llmwhisperer-client

And this is the code for processing with LLMWhisperer, using version 2 of the API:

from unstract.llmwhisperer import LLMWhispererClientV2
from unstract.llmwhisperer.client_v2 import LLMWhispererClientException
import sys


# Function to process a document
def process_document(file_path):
    # Initialize the client with your API key
    client = LLMWhispererClientV2(
        base_url="https://llmwhisperer-api.us-central.unstract.com/api/v2",
        api_key="<your_api_key>",
    )
    try:
        # Submit the file and block until extraction finishes (or wait_timeout elapses)
        result = client.whisper(
            file_path=file_path,
            wait_for_completion=True,
            wait_timeout=200,
        )
        print(result["extraction"]["result_text"])
    except LLMWhispererClientException as e:
        print(e)


# Main function
if __name__ == "__main__":
    # Retrieve document name from command line arguments
    if len(sys.argv) != 2:
        print("Usage: python llmwhisperer.py <document>")
        sys.exit(1)
    # Call the function to process the document
    process_document(sys.argv[1])

Here’s a breakdown of the code:

  • Function process_document(file_path):
    • This function processes a document located at file_path.
    • Client Initialization: It initializes the LLMWhispererClientV2 with a specified base URL and API key.
    • API Call: It calls the whisper method on the client, passing file_path together with wait_for_completion=True and wait_timeout=200, so the call waits for processing to finish (up to the timeout) instead of returning immediately.
    • Result Handling: It prints the extracted text, found under result['extraction']['result_text'].
    • Exception Handling: If an exception occurs during the API call, it catches the LLMWhispererClientException and prints the error message.

Don’t forget to replace <your_api_key> with your own API key.

Comparative Findings

Document 1: Purchase Order

Processing the first document with Docling:

python docling_simple.py Purhcase-order.pdf

Returns the following output:

## River Park Inc 222, River view st.

## PURCHASE ORDER

Sacremento, CA 90203 (903) 903-8895 info@reverparinc.com www.river-park-ca.com

Date:

09/04/24

P.O. NUMBER:

784993

<!-- image -->

## CUSTOMER

BILL TO

## DELIVER TO

John Armstrong Hive view Inc 9090, West river avenue, Los Angeles, CA  92802

John Armstrong Hive view Inc 9090, West river avenue, Los Angeles, CA  92802

Simon Jones Hive view Inc 9090, West river avenue, Los Angeles, CA  92802

| START DATE   | CANCEL DATE                                | ORDERED BY                                 | SHIPPED VIA                                | FOB                     | TERMS                   |
|--------------|--------------------------------------------|--------------------------------------------|--------------------------------------------|-------------------------|-------------------------|
| 03/23/2024   | 03/09/24                                   | John Armstrong                             | UPS Express                                | View Park Stores Net 18 | View Park Stores Net 18 |
| Unit         | Description                                | Description                                | Description                                | Unit Price ($)          | Amount ($)              |
| 150          | Stainless Steel 304 Hex Head Screw (M8X35) | Stainless Steel 304 Hex Head Screw (M8X35) | Stainless Steel 304 Hex Head Screw (M8X35) | 3.50                    |                         |
|              |                                            |                                            |                                            | 3.75                    | 525.00 750.00           |
| 200          | Stainless Steel 304 Hex Head Screw (M5X30) | Stainless Steel 304 Hex Head Screw (M5X30) | Stainless Steel 304 Hex Head Screw (M5X30) |                         |                         |
| 100          | Mccoy 50 x 8 wooden screw - Black Finish   | Mccoy 50 x 8 wooden screw - Black Finish   | Mccoy 50 x 8 wooden screw - Black Finish   | 4.00                    | 400.00                  |
| 150          | MS Steel 3/4 Inch Screw                    | MS Steel 3/4 Inch Screw                    | MS Steel 3/4 Inch Screw                    | 3.50                    | 525.00                  |
| 200          | Pan Slotted Self Tapping Screw             | Pan Slotted Self Tapping Screw             | Pan Slotted Self Tapping Screw             | 3.75                    | 750.00                  |
| 100          | SS Round Head Nails                        | SS Round Head Nails                        | SS Round Head Nails                        | 4.00                    | 400.00                  |

Subtotal ($)

Sales Tax (%)

12

Total Amount ($)

APPROVED BY

DATE

AUTHORIZED SIGNATORY

3,350.00

402.00

3,752.00

<!-- image -->


Now let’s process the first document with LLMWhisperer:

python llmwhisperer.py Purhcase-order.pdf 


Returns the following output:

River Park Inc

                                                          PURCHASE ORDER

 222, River view st.

Sacremento, CA 90203                                                  P.O. NUMBER:    784993

 (903) 903-8895

info@reverparinc.com                                                          Date:    09/04/24

 www.river-park-ca.com

 CUSTOMER                           BILL TO                         DELIVER TO

John Armstrong                      John Armstrong                  Simon Jones

Hive view Inc                       Hive view Inc                   Hive view Inc

9090, West river avenue,            9090, West river avenue,        9090, West river avenue,

Los Angeles, CA 92802               Los Angeles, CA 92802           Los Angeles, CA 92802

 START DATE   CANCEL DATE   ORDERED BY       SHIPPED VIA            FOB             TERMS

 03/23/2024   03/09/24      John Armstrong   UPS Express            View Park Stores Net 18

     Unit                           Description                       Unit Price ($) Amount ($)

      150     Stainless Steel 304 Hex Head Screw (M8X35)                       3.50         525.00

      200     Stainless Steel 304 Hex Head Screw (M5X30)                       3.75         750.00

      100     Mccoy 50 x 8 wooden screw - Black Finish                         4.00         400.00

      150     MS Steel 3/4 Inch Screw                                          3.50         525.00

      200     Pan Slotted Self Tapping Screw                                   3.75         750.00

      100     SS Round Head Nails                                              4.00         400.00

 APPROVED BY

                                                                 Subtotal ($)              3,350.00

                                                                Sales Tax (%) 12            402.00

 AUTHORIZED SIGNATORY               DATE

                                                             Total Amount ($)              3,752.00

<<<


CREATED BY

TemplateLAB

@ TemplateLab.com

<<<
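Because LLMWhisperer keeps each line item on a single whitespace-aligned row, a regular expression is often enough to turn the table back into records. The sketch below assumes every line-item row reads quantity, description, unit price, amount, with at least two spaces before the price column, as in the output above; parse_line_items is a hypothetical helper. It also recomputes the subtotal, the 12% tax, and the total, which is a cheap way to catch OCR or layout errors.

```python
from __future__ import annotations

import re
from decimal import Decimal

# quantity, description, unit price, amount; the description is separated
# from the price column by at least two spaces.
LINE_ITEM = re.compile(
    r"^\s*(?P<qty>\d+)\s+(?P<desc>.+?)\s{2,}"
    r"(?P<price>\d+\.\d{2})\s+(?P<amount>[\d,]+\.\d{2})\s*$"
)


def parse_line_items(text: str) -> list[dict]:
    """Return one record per line that looks like a purchase-order item."""
    items = []
    for line in text.splitlines():
        match = LINE_ITEM.match(line)
        if match:  # header, address, and totals lines simply don't match
            items.append({
                "qty": int(match["qty"]),
                "description": match["desc"],
                "unit_price": Decimal(match["price"]),
                "amount": Decimal(match["amount"].replace(",", "")),
            })
    return items


sample = """\
     Unit                           Description                       Unit Price ($) Amount ($)
      150     Stainless Steel 304 Hex Head Screw (M8X35)                       3.50         525.00
      200     Stainless Steel 304 Hex Head Screw (M5X30)                       3.75         750.00
      100     Mccoy 50 x 8 wooden screw - Black Finish                         4.00         400.00
      150     MS Steel 3/4 Inch Screw                                          3.50         525.00
      200     Pan Slotted Self Tapping Screw                                   3.75         750.00
      100     SS Round Head Nails                                              4.00         400.00
"""

items = parse_line_items(sample)
# Rows where quantity x unit price differs from the amount hint at OCR errors.
mismatches = [i for i in items if i["qty"] * i["unit_price"] != i["amount"]]
subtotal = sum((i["amount"] for i in items), Decimal("0"))
tax = (subtotal * Decimal("0.12")).quantize(Decimal("0.01"))  # 12% sales tax, as printed
total = subtotal + tax
print(len(items), mismatches, subtotal, tax, total)  # 6 [] 3350.00 402.00 3752.00
```

The recomputed figures match the Subtotal, Sales Tax, and Total Amount printed on the order, so the extraction is internally consistent.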

Analysis:

Docling:

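  • Markdown Conversion: Produces markdown that fits markdown-centric workflows, but the structure is only partly right: the Description cell is repeated across three columns, the amounts for the first two line items end up merged in a single cell (“525.00 750.00”), and the subtotal, tax, and total values are separated from their labels.
  • Limitation: Relies on markdown syntax, which simplifies formatting but sacrifices nuanced layout details (e.g., alignment, spacing).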

LLMWhisperer:

  • Layout Preservation: Retains the original document structure through whitespace alignment, so tables, headers, and spacing mirror the source.
  • Contextual Accuracy: Maintains positional relationships (e.g., address blocks) critical for automated processing.
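Docling’s markdown tables need a little more care. In its output above, a description that spans three columns is repeated in each of them. The standard-library sketch below (split_row and collapse_spans are hypothetical helpers) parses pipe-table rows and merges runs of identical non-empty neighbouring cells. The trade-off: it will also merge two genuinely equal neighbours, such as the FOB and TERMS cells in the first data row, and it cannot repair rows whose values landed in the wrong cells.

```python
from __future__ import annotations


def split_row(line: str) -> list[str] | None:
    """Return the cells of a markdown pipe-table row, or None for separator
    rows such as |---|---| and for lines that are not table rows.
    (Escaped pipes inside cells are not handled in this sketch.)"""
    line = line.strip()
    if not (line.startswith("|") and line.endswith("|")):
        return None
    cells = [cell.strip() for cell in line[1:-1].split("|")]
    if all(cell and set(cell) <= set("-:") for cell in cells):
        return None
    return cells


def collapse_spans(cells: list[str]) -> list[str]:
    """Drop a non-empty cell when it repeats its left neighbour, which is how
    a cell spanning several columns shows up in the markdown."""
    collapsed: list[str] = []
    for cell in cells:
        if collapsed and cell and cell == collapsed[-1]:
            continue
        collapsed.append(cell)
    return collapsed


header = "| Unit | Description | Description | Description | Unit Price ($) | Amount ($) |"
row = "| 100 | SS Round Head Nails | SS Round Head Nails | SS Round Head Nails | 4.00 | 400.00 |"
print(collapse_spans(split_row(header)))  # ['Unit', 'Description', 'Unit Price ($)', 'Amount ($)']
print(collapse_spans(split_row(row)))     # ['100', 'SS Round Head Nails', '4.00', '400.00']
```

Empty cells are never merged, so a row with blank price and amount columns keeps its column count.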


Document 2: Handwritten Notes

Processing the second document with Docling (with EasyOCR):

python docling_simple.py notes.pdf 

Returns the following output:

## Only a mother of style 7

For eclucational purposes we analuse the opening pages of an Il-page arkicle that fon peared in The American Mathematical Monthl 5 Volume 102 .Number 2 / February 1995 -We have added line numbers in the right margin.

line 4: Since in this article, squares don't cet alternatin colours, it could be argued that the term "chessboard" is misplaced.

line 4. The introduction of the name "B' seems Unnecessary: it is used --in the combination "the board B"~ in The text fer "Figure 1 and in line 7; in both cases \ust "the board" would have done Gne. Th line 77 occurs the \as+ use of Bi, viz. in "X eB", which js dubious since B was a board and not a set; in line 77, L would have preferred "Given a set X of cells a line 7/8: The first Move , like any other, does not deserve a separate discription. The term "step" is redundant. bein OQ move line 8: Why not "Q move consists of" 2 line 40/11; At this slage the italics are przzling, Since GQ move 3s possi ble if,

/G49

{Sr some c,h , cell C64) contains a pebble and cells Cist, 7D and Cé,jt1) are empty . line 10. Vusice the term "positions" fe wheat everywhere else 35 called "cells". Jine 12: Why no} "* After k moves the board has qo pebbles on it." 7 line IZ /\a : In the one sentence, k counts moves , in the other k counts ebles, Since the prose does not indicate the Scope of dummies, this double use of the same kis co litte bi unfargivable. line 14: "ancl we set TWR:= Uy ROR) "We remark of defining

- e the use the verb "to set" when Chhe set!) R can be considered cunfSrtuncake e since "Ris not used on the next two pages, the name seems to be introduced too earl

- e the introduction of the name (r) Seems unnecessary; in the rest of? the paper 1 saw it used once in "an CeR" 4% , where an reachable conficuration ~ woulda hove adume. CNote. Tn the context in question -p 116~ the reachable context can remain anonymous : the quoted occurrence C is the omly occurrence af the identi -fer Cin that cantexk. My Conclusisn is that the reachable conf@gurakien has been

/G50

Now let’s process the second document with LLMWhisperer:

python llmwhisperer.py notes.pdf 

Returns the following output:

EWD1200-0

 Only a matter of style?

   For educational purposes we analyse the

 opening pages of an 11-page article that

appeared in The American Mathematical

Monthly, Volume 102 Number 2 / February 1995.

 We have added line numbers in the right

 margin.

line 4 : Since in this article , squares don't get

 alternating colours , it could be argued that

 the term " chessboard " is misplaced .

line 4 : The introduction of the name " B "

 seems unnecessary : it is used   - in the

 combination " the board   B " - in the text

 for Figure   and in line 71 ; in both cases

just " the board " would have done fine .

 In line 77   occurs the last use of   B ,

 via . in " X CB " , which is dubious since

 B was a    board and not a set ; in line

77 . I would have preferred " Given a set X [X]

 of cells .

 line 7/8 : The first move , being a move

 like any other , does not deserve a separate

 discription . The term " step" is redundant .

 line 8: Why not "a move consists of"?

 line 10/11: At this stage the italics are

 puzzling , since a move is possible if ,

                                                  1

<<<


                                   EWD 1200-1

for some i, j, cell       contains a pebble

and cells ( 1+ 1 , j ) and ( i , j + 1 ) are empty .

line 10 : Twice the term " positions " for

what everywhere else is called " cells " .

line 12: Why not "After k moves the

board has kti    pebbles on it . " ?

line 12/14: In the one sentence, k counts

moves , in the other      k   counts pebbles .

Since the prose does not indicate

scope of dummies, this double use of

the same    k   is a little bit unforgivable .

line 14 : " and we set R :=      R(K) ". We

remark

. the use of the verb " to set " when defining

( the set ! ) R can be considered unfortunate

. since      is not used on the next two

pages , the name seems to be introduced

too early

. the introduction of the name       R seems

unnecessary ; in the rest of the paper I

saw it used once in " any CER " , where

" any reachable configuration " would have

done . ( Note . In the context in question

- p 116 - the reachable context can remain

anonymous : the quoted      occurrence of

 C is the only occurrence of the identi-

fier C in that context . My conclusion is

that the reachable configuration has been

                                                2

<<<
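In both LLMWhisperer outputs above, each page ends with a line containing only <<<. If you need per-page text (for example, to keep the page numbers of the handwritten notes), you can split on that marker. This sketch assumes the marker always sits on its own line, as it does here; split_pages is a hypothetical helper.

```python
from __future__ import annotations


def split_pages(text: str, marker: str = "<<<") -> list[str]:
    """Split LLMWhisperer-style text into pages at lines consisting of marker."""
    pages: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.strip() == marker:  # ignore spaces around the marker
            pages.append("\n".join(current).strip("\n"))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # text after the last marker
        pages.append("\n".join(current).strip("\n"))
    return pages


sample = (
    "EWD1200-0\n"
    " Only a matter of style?\n"
    "                        1\n"
    "<<<\n"
    "                 EWD 1200-1\n"
    "for some i, j, cell       contains a pebble\n"
    "                        2\n"
    "<<<\n"
)
pages = split_pages(sample)
print(len(pages))                        # 2
print(pages[1].splitlines()[0].strip())  # EWD 1200-1
```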

Analysis:

Docling:

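  • OCR Limitations: Struggles with cursive handwriting and unstructured text. The EasyOCR output above misreads the title as “Only a mother of style 7” and garbles many words (“eclucational”, “arkicle”, “przzling”), and switching to Tesseract did not fix this in our tests. The output requires manual cleanup.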
  • Workflow Impact: Markdown’s rigid syntax complicates representing free-form handwritten annotations.

LLMWhisperer:

  • Handwriting Recognition: Recovers the cursive notes almost verbatim using deep learning; the few remaining errors cluster around mathematical notation (e.g., “X CB”, “kti”).
  • Adaptive Output: Preserves line breaks, annotations, and marginalia while interpreting context and delivering reliable text extraction.

Document 3: Form Elements

Processing the third document with Docling:

python docling_simple.py loan-application.pdf

Returns the following output:

Tobe completed by the Lender: Lender Loan No /Universal Loan Identifier

Agency Case No.

## Uniform Residential Loan Application

Verify and complete the information on this application. If you are applying for this loan with others; each additional Borrower must provide information as directed by your Lender .

Section 7:Borrower Information. This section asks about your personal information and your income from employment and other sources; such as retirement; that you want considered to qualify for this loan:

## 1a. Personal Information

Name (First; Middle; Last; Suffix)

IMA

(or Individual Taxpayer Identification Number)

Alternate Names List any names by which you are known or any names under which credit was previously received (First; Middle; Last; Suffix)

Date of Birth

Citizenship

(mmIddlyyyy)

@u.s. Citizen

08 31 1931

Permanent Resident Alien

Non-Permanent Resident Alien

## Type of Credit

List Name(s) of Other Borrower(s) Applying for this Loan (First, Middle, Last; Suffix) Use a separator between names

@Iam applying for individual credit.

Iam applying for joint credit. Total Number of Borrowers:

Each Borrower intends to apply for joint credit. Your initials:

## Marital Status

Dependents (not listed by another Borrower)

Contact Information

Married

Number

Home Phone

Separated

Ages

Unmarried

Cell Phone

(40812 4563

(Single; Divorced; Widowed; Civil Union, Domestic Partnership, Registered Reciprocal Beneficiary Relationship)

Work Phone

Ext.

Email

## Current Address

Street

024 An

Unit #

Los

State

CA

Country

How Long at Current Address? 3 Years

5

Months   Housing

No primary housing expense

Own

Imonth)

Ifat Current Address for LESS than 2 years, list Former Address

Does not apply

Street

Unit #

State

ZIP

Country

Long at Former Address? How

Years

Months

Housing

No primary housing expense

Own

Rent ($

Imonth)

Mailing Address\_

if different from Current Address

Does not apply

Street

Unit #

City

State

ZIP

Country

## 1b. Current EmploymentlSelf-Employment and Income

Does not apply

Employer or Business Name

CAFFIENATED

Phone

(408) 101

8365

## Gross Monthly Income

Street

Unit #

Base

$

Imonth

Les

State

ZIP

Country

Overtime

Imonth

Bonus

Imonth

Position or Title

CEO

Check if this statement applies:

Commission

5

Imonth

Start Date 02 / 04

(mmIddlyyyy)

Iam employed by a family member, property seller, real estate agent; or other party to the transaction.

Military

How in this line of work?  15 Years

5 Months

Entitlements $

Imonth

Check if you are the Business

have an ownership share of less than 25%. Monthly Income (or Loss)

Other

Imonth

Owner or Self-Employed

have an ownership share of 25% or more:

$ 802

Imonth

City

City

City

USA

Uniform Residential Loan Application Freddie Mac Form 65 Fannie Mae Form 1003 Effective 1/2021

<!-- image -->

Now let’s process the third document with LLMWhisperer:

python llmwhisperer.py loan-application.pdf  

Returns the following output:

To be completed by the Lender:

 Lender Loan No./Universal Loan Identifier                                                        Agency Case No.

Uniform       Residential Loan Application

Verify and complete the information on this application. If you are applying for this loan with others, each additional Borrower must provide

information as directed by your Lender.

Section 1: Borrower Information. This section asks about your personal information and your income from

employment and other sources, such as retirement, that you want considered to qualify for this loan.

 1a. Personal Information

Name (First, Middle, Last, Suffix)                                            Social Security Number 175-678-910

  IMA         CARDHOLDER                                                      (or Individual Taxpayer Identification Number)

Alternate Names - List any names by which you are known or any names          Date of Birth            Citizenship

under which credit was previously received (First, Middle, Last, Suffix)      (mm/dd/yyyy)             [X] U.S. Citizen

                                                                               08 /31 / 1977           [ ] Permanent Resident Alien

                                                                                                       [ ] Non-Permanent Resident Alien

Type of Credit                                                                List Name(s) of Other Borrower(s) Applying for this Loan

[X] I am applying for individual credit.                                      (First, Middle, Last, Suffix) - Use a separator between names

[ ] I am applying for joint credit. Total Number of Borrowers:

   Each Borrower intends to apply for joint credit. Your initials:

Marital Status             Dependents (not listed by another Borrower)        Contact Information

[X] Married                Number                                             Home Phone (         )

[ ] Separated               Ages                                              Cell Phone     (408) 123-4567

[ ] Unmarried                                                                 Work Phone    (      1                     Ext.

   (Single, Divorced, Widowed, Civil Union, Domestic Partnership, Registered

   Reciprocal Beneficiary Relationship)                                       Email ima1977@gmail.com

Current Address

Street 1024, SULLIVAN                  STREET                                                                        Unit #

City    LOS ANGELES                                                                State CA      ZIP 90210         Country   USA

How Long at Current Address? 3 Years 5 Months Housing [ ] No primary housing expense [ ] Own [X] Rent ($ 1,300                    /month)

If at Current Address for LESS than 2 years, list Former Address     [X] Does not apply

Street                                                                                                                Unit #

City                                                                               State         ZIP               Country

How Long at Former Address?       Years      Months Housing [ ] No primary housing expense [ ] Own [ ] Rent ($                    /month)

Mailing Address - if different from Current Address [X] Does not apply

Street                                                                                                                Unit #

City                                                                               State         ZIP               Country

 1b. Current Employment/Self-Employment and Income              [ ] Does not apply

                                                                                                           Gross Monthly Income

Employer or Business Name       CAFFIENATED                             Phone (408) 109-8765

                                                                                                           Base       $ 8000       /month

Street 2048, MAIN              STREET                                               Unit #

                                                                                                           Overtime   $            /month

City   LOS     ANGELES                           State CA      ZIP 90210          Country USA

                                                                                                           Bonus      $            /month

Position or Title CEO                                         Check if this statement applies:             Commission $        0.00 /month

Start Date                                                     [ ] I am employed by a family member,

            02/04/2009

                                                                 property seller, real estate agent, or other Military

How long in this line of work? 15 Years 5    Months              party to the transaction.                 Entitlements $          /month

                                                                                                           Other      $            /month

[X] Check if you are the Business [ ] I have an ownership share of less than 25%. Monthly Income (or Loss)

                                                                                                           TOTAL $ 8000            /month

   Owner or Self-Employed         [X] I have an ownership share of 25% or more. $ 8000

Uniform Residential Loan Application

Freddie Mac Form 65 · Fannie Mae Form 1003

Effective 1/2021

<<<


                 DRIVER LICENSE

California

                         CLASS C

            DL /1234568

            EXP 08/31/2014 END NONE

            LNCARDHOLDER

            FNIMA

            2570 24TH STREET

            ANYTOWN. CA 95818

              08/31/1977

            RSTR NONE         08311977

               VETERAN

                SEX F HAIR BRN EYES BRN

    Cardhoca    HGT 5'-05 WGT 125 1b

                               08/31/2009

<<<
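If you post-process this output in code, the lines that contain only `<<<` appear to mark page boundaries (here, between the loan form and the driver license). The sketch below splits the text on that marker. `split_pages` is a hypothetical helper, and treating `<<<` as the page separator is an assumption based on the output above, not a documented guarantee.

```python
def split_pages(text: str, separator: str = "<<<") -> list[str]:
    """Split layout-preserved text into pages.

    Assumption: a page ends at any line whose only content is `separator`,
    as in the LLMWhisperer output shown above.
    """
    pages: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.strip() == separator:
            # A separator line closes the current page.
            pages.append("\n".join(current))
            current = []
        else:
            current.append(line)
    pages.append("\n".join(current))  # text after the last separator, if any
    # Trim blank lines at the edges but keep leading spaces, which carry layout.
    return [p.strip("\n") for p in pages if p.strip()]


sample = """Uniform Residential Loan Application
Effective 1/2021

<<<

             DRIVER LICENSE
California
<<<
"""

pages = split_pages(sample)
print(len(pages))  # 2
print(pages[1].splitlines()[0].strip())
```

Stripping only newlines (not spaces) keeps each page's indentation intact, so the whitespace-based layout survives the split.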

Analysis:

Docling:

  • Structured Data Gaps: Fails to distinguish checked from unchecked checkboxes. Converts form elements to plain text labels, so their selected state is lost.
  • Markdown Constraints: Tables lack positional context (e.g., grouped fields), reducing downstream usability.

LLMWhisperer:

  • Form Element Extraction: Detects checkboxes and radio buttons and marks their state inline as [X] (checked) or [ ] (unchecked).
  • Hierarchical Layout: Maintains visual grouping (e.g., address sections) critical for form processing.
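Because the checkbox state is written inline as `[X]` or `[ ]`, it can be recovered with a regular expression. The following is a minimal sketch, assuming that a label runs until two or more consecutive spaces, the next checkbox marker, or the end of the line. `extract_checkboxes` is a hypothetical helper, and labels that wrap onto the next line (such as the marital-status note in the form above) would need extra handling.

```python
import re

# "[X] label" or "[ ] label". Assumption: a label ends at a run of 2+ spaces,
# just before the next checkbox marker, or at the end of the line.
CHECKBOX_RE = re.compile(
    r"\[(X| )\][ \t]+(.+?)(?=[ \t]{2,}|[ \t]\[[X ]\]|$)",
    re.MULTILINE,
)


def extract_checkboxes(text: str) -> list[tuple[str, bool]]:
    """Return (label, is_checked) pairs in reading order."""
    return [(label.strip(), mark == "X") for mark, label in CHECKBOX_RE.findall(text)]


sample = """[X] I am applying for individual credit.          (First, Middle, Last, Suffix)
[ ] I am applying for joint credit. Total Number of Borrowers:
[X] Married                Number
Housing [ ] No primary housing expense [ ] Own [X] Rent ($ 1,300     /month)
"""

for label, checked in extract_checkboxes(sample):
    print("checked  " if checked else "unchecked", label)
```

The lookahead does not consume the following marker, so several checkboxes on one line (as in the Housing row) are each matched separately.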

Summary of Findings

| Tool | Strengths | Limitations |
|---|---|---|
| Docling | Simple markdown output for digital documents. Lightweight integration for basic workflows. | Markdown syntax limits complex layout preservation. Struggles with non-digital inputs and structured data extraction. |
| LLMWhisperer | Context-aware layout preservation using whitespace/ASCII lines. Robust OCR for handwriting, forms, and multilingual content. | Not optimized for markdown-centric pipelines. |

Key Features and Technical Comparison

Below is a comparative table summarizing the key features and technical differences between IBM’s Docling and LLMWhisperer.

This table highlights how each tool handles layout preservation, structured data, OCR flexibility, and output customization.

| Feature | Docling | LLMWhisperer |
|---|---|---|
| Layout Preservation | Converts to markdown syntax (headings, lists); loses alignment and spacing. | Retains positional context via whitespace/ASCII lines, mimicking the original structure. |
| Structured Data | Treats checkboxes and tables as plain text. | Detects form elements such as checkboxes and marks their state inline ([X] / [ ]). |
| OCR Flexibility | Limited to basic text extraction; requires engine swaps for improvements. | Deep learning adapts to handwriting, low-quality scans, and multilingual text. |
| Output Customization | Markdown-centric (JSON, HTML, and plain-text exports are also available). | Supports ASCII/JSON/markdown-like formatting via API flags. |
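To make "positional context via whitespace" concrete: in the LLMWhisperer output, a label and its value share a line, and separate fields are divided by runs of spaces, so a row such as the current-address City/State/ZIP/Country line can be split into cells with a single regular expression. This is a minimal sketch; `split_columns` is a hypothetical helper, and it assumes single spaces occur only inside a field, which breaks when a value itself contains a wide gap (for example, `1024, SULLIVAN                  STREET`).

```python
import re


def split_columns(line: str) -> list[str]:
    """Split a layout-preserved line into cells at runs of two or more spaces.

    Assumption: single spaces only occur inside a cell, never between cells.
    """
    stripped = line.strip()
    return re.split(r"[ \t]{2,}", stripped) if stripped else []


row = "City    LOS ANGELES        State CA      ZIP 90210         Country   USA"
print(split_columns(row))
# ['City', 'LOS ANGELES', 'State CA', 'ZIP 90210', 'Country', 'USA']
```

A markdown conversion that collapses this spacing would merge the cells into one run of text, which is why the whitespace itself is worth preserving for downstream parsing.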

Use Cases and Recommendations

When to Use Docling:

  • Simple digital documents requiring markdown output (e.g., blogs, wikis).
  • Teams with markdown-dependent workflows and no need for layout fidelity.

When to Choose LLMWhisperer:

  • Complex documents requiring layout-aware processing (forms, invoices, legal docs).
  • Scenarios needing raw OCR accuracy and structural integrity (automation pipelines).
  • Projects involving handwriting, multilingual content, or checkbox/form extraction.
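The recommendations above can be condensed into a simple routing rule for a pipeline that receives mixed inputs. This sketch is illustrative only: `DocumentProfile` and `choose_tool` are hypothetical names, and the document properties are assumed to be known in advance (for example, from metadata or an upstream classifier).

```python
from dataclasses import dataclass


@dataclass
class DocumentProfile:
    """Hypothetical description of an incoming document."""
    born_digital: bool             # has a text layer (not a scan or photo)
    has_handwriting: bool = False
    has_forms: bool = False        # checkboxes, radio buttons, form fields
    multilingual: bool = False
    layout_matters: bool = False   # e.g., invoices, legal docs, automation inputs


def choose_tool(doc: DocumentProfile) -> str:
    """Apply this article's rule of thumb for picking a converter."""
    needs_llmwhisperer = (
        not doc.born_digital
        or doc.has_handwriting
        or doc.has_forms
        or doc.multilingual
        or doc.layout_matters
    )
    # Otherwise the job is a basic markdown conversion of born-digital text.
    return "LLMWhisperer" if needs_llmwhisperer else "Docling"


print(choose_tool(DocumentProfile(born_digital=True)))                  # Docling
print(choose_tool(DocumentProfile(born_digital=True, has_forms=True)))  # LLMWhisperer
```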

Cost and Deployment Considerations

  • Pricing Overview: Docling is open source, which makes it attractive for cost-sensitive projects whose primary need is straightforward markdown conversion. LLMWhisperer is a commercial service whose cost depends on usage; it may come at a higher price point, reflecting its stronger performance in complex scenarios such as scans, handwriting, and forms.
  • Ease-of-Use and Integration: Both tools offer flexible deployment options, but their integration effort differs. Docling is often simpler to set up for basic document conversions and can be integrated into existing workflows through its command-line interface or Python API. LLMWhisperer is consumed through its API, which adds an integration step (an account and API key) but is designed to fit into advanced data processing pipelines.
  • Deployment Options: Consider whether a cloud-based solution or an on-premises deployment better meets your organization’s needs. Docling’s lightweight approach is ideal for quick integrations, while LLMWhisperer’s scalable architecture is designed for enterprises needing high-volume, high-accuracy OCR processing.

Conclusion

Both IBM’s Docling and LLMWhisperer bring unique strengths to the table.

Docling excels at converting born-digital documents into structured markdown that keeps their logical structure (headings, lists, tables), which makes it a good fit for straightforward documents like purchase orders and reports.

In contrast, LLMWhisperer stands out for its advanced OCR capabilities, effectively handling complex challenges such as handwriting recognition, multilingual content, and structured forms with minimal manual intervention.

Final Recommendations

Docling suits teams prioritizing markdown simplicity over layout fidelity. Its limitations in OCR and structured data extraction make it less viable for enterprise-grade document processing.

LLMWhisperer excels in preserving document context and extracting structured data—capabilities critical for automation, compliance, and integration with LLMs. Its adaptive OCR and whitespace-driven layout preservation address the limitations of markdown-centric tools.

Use LLMWhisperer for documents where layout meaning (not just formatting) matters. Reserve Docling for basic markdown conversions of born-digital text.

Future Directions

Docling’s OCR performance on scanned documents, handwritten text, and images could be improved by configuring a more robust engine, such as Tesseract, in its conversion pipeline.

The OCR landscape is rapidly evolving with advancements in deep learning and AI-driven context recognition. Future solutions are likely to offer even more accurate, real-time processing and seamless integration with broader automation pipelines.

As document processing continues to merge with AI and machine learning, both tools are expected to evolve—bringing enhanced accuracy, more intuitive layout preservation, and increased adaptability to varied document types.

If you want to quickly take LLMWhisperer for a test drive, you can check out our free playground.
