Docling vs. LLMWhisperer: The Best Docling Alternative
Introduction
Optical Character Recognition (OCR) and document conversion technologies have come a long way since their inception.
Originally developed to transform printed text into machine-readable formats, OCR has evolved from simple pattern-matching techniques to sophisticated systems capable of handling diverse document types—from neatly printed pages to complex, handwritten notes.
One of the most crucial challenges in this evolution is the preservation of layout: retaining the original structure, formatting, and context of documents is key for many applications, whether it’s for archiving, automated data extraction, or enabling seamless integration with modern language models.
In this context, two notable tools exist: IBM’s Docling and LLMWhisperer.
Docling is designed to convert documents into markdown while preserving the layout. Its ability to maintain formatting makes it particularly appealing for projects where the visual structure of documents—such as purchase orders or reports—is important. However, Docling tends to struggle with tasks that go beyond digital text, such as parsing scanned documents, handwritten content, or images captured by a camera.
LLMWhisperer, on the other hand, leverages advanced OCR techniques enhanced by deep learning to excel at complex tasks. It not only handles traditional printed text but also demonstrates superior performance in recognizing handwriting and extracting structured data like tables, forms, checkboxes, and radio buttons. Its context-aware approach reduces the need for extensive pre- or post-processing, making it highly versatile across different document types.
The purpose of this article is to explore how these two tools perform across various document scenarios. The key question is: how well does each tool handle layout preservation and OCR accuracy?
In this article, we will:
Walk through the setup processes for both IBM’s Docling and LLMWhisperer.
Explore and evaluate key features using sample documents such as a purchase order, a handwritten document, and a form with checkboxes and radio buttons.
Compare the performance and pricing of each solution to help you determine the best fit for parsing diverse formats and data extraction needs.
Let’s dive in and explore how each tool performs, and why LLMWhisperer might just be the superior choice for your next project.
Here’s the GitHub repository where you will find all the code written for this article.
Overview of IBM’s Docling
IBM’s Docling is a powerful tool designed to convert a wide range of documents into markdown while preserving their original layout and structure.
By maintaining visual fidelity, Docling ensures that the converted markdown accurately reflects the formatting nuances—such as headings, bullet points, tables, and columns—essential for documents like purchase orders, reports, or forms.
Originally conceived as a response to the growing need for robust document processing, Docling emerged from early open-source initiatives aimed at bridging traditional OCR techniques with modern workflow requirements.
Its development focused on tackling one of OCR’s constant challenges: preserving layout integrity. Over time, Docling has evolved through community feedback and iterative improvements, leading to a tool that not only converts documents but also retains the original visual context.
Early versions were primarily optimized for digital text, but as users began to work with more diverse input types, enhancements were made—including the exploration of alternative OCR engines—to improve its handling of scanned documents, handwritten notes, and photographed images.
Today, Docling stands out for its ease of converting complex layouts into structured markdown files, making it an indispensable asset for teams looking to integrate document data seamlessly into modern content management and automation systems.
Key Features
Markdown Conversion and Layout Preservation: Docling excels at translating complex document layouts into clean, structured markdown files. This capability allows users to maintain headings, lists, tables, and other formatting elements, ensuring that the essential structure of the document remains intact.
Ease of Converting Complex Layouts: Whether dealing with multi-column layouts or documents with various formatting nuances, Docling simplifies the conversion process. The resulting markdown is easy to review and further process, making it a practical solution for teams looking to integrate document data into modern workflows.
Limitations
Challenges with Non-Digital Inputs: Although Docling performs well with digital text, it faces performance issues when processing scanned documents, handwritten notes, or photographed images. These input types often lead to errors in text extraction or layout misinterpretations.
Default OCR Engine Constraints: By default, Docling utilizes EasyOCR for optical character recognition. While effective for many printed text scenarios, EasyOCR may not deliver the desired accuracy for more challenging inputs like handwriting or low-quality scans. Users have the option to experiment with alternative engines, such as Tesseract, which might offer improvements in those areas.
Overview of LLMWhisperer
LLMWhisperer is an advanced OCR solution that leverages deep learning to deliver highly accurate document conversions across a wide range of document types.
Initially conceived as a response to the limitations of traditional OCR systems, it emerged from research into neural networks and natural language processing techniques that could better handle the complex visual and linguistic patterns found in modern documents.
Over time, LLMWhisperer has evolved through iterative improvements and extensive training on diverse datasets. This evolution has equipped it to manage everything from neatly printed pages to challenging handwritten notes and multilingual content. This means that even intricate layouts—such as tables, forms, checkboxes, and radio buttons—are interpreted with high fidelity, preserving the original structure of the document.
The principles behind LLMWhisperer are rooted in context-aware extraction. Unlike traditional OCR methods that often rely on rigid pattern matching, LLMWhisperer uses advanced models to understand the relationship between different parts of a document. This approach not only boosts recognition accuracy but also minimizes the need for extensive pre- or post-processing. As a result, LLMWhisperer can effectively translate complex documents into clean, structured text that integrates seamlessly into modern data workflows.
Today, LLMWhisperer represents a significant leap forward in OCR technology. It stands at the forefront of a new generation of document processing tools that combine the robustness of deep learning with the nuanced understanding required for diverse, real-world applications.
Core Capabilities
Advanced OCR for Varied Document Types: LLMWhisperer is tailored to process diverse inputs including scanned documents, handwritten notes, and multilingual texts. Its adaptive approach allows it to manage different fonts, styles, and languages, making it robust in scenarios where traditional OCR methods might falter.
Superior Extraction of Structured Data: Beyond simple text recognition, LLMWhisperer excels at extracting structured elements such as tables, forms, checkboxes, and radio buttons. This capability facilitates the conversion of complex documents into data formats that can be directly integrated into modern workflows, reducing the need for manual data reformatting.
Technical Advantages
Deep Learning for Context-Aware Extraction: LLMWhisperer harnesses the power of deep learning to not only recognize text but also to understand the context and layout of documents. This means that even in cases of overlapping text or noisy backgrounds, the tool can accurately extract and structure the content with minimal errors.
Minimal Need for Extensive Pre- or Post-Processing: Traditional OCR tools often require significant image pre-processing (like deskewing or noise reduction) and post-processing to reconstruct document layouts. In contrast, LLMWhisperer’s advanced models inherently grasp the document structure, thereby streamlining the extraction process and reducing the need for additional corrective steps.
Test Methodology
To objectively compare IBM’s Docling and LLMWhisperer, we designed a series of tests using three distinct test documents.
Each document was selected to stress different aspects of the tools—ranging from layout preservation and markdown conversion to advanced OCR capabilities for handwriting and form extraction.
Test Documents Overview
Test Document 1: Simple Purchase Order – This document represents a standard, well-formatted purchase order. The focus here is on layout fidelity and maintaining the document’s inherent structure.
Test Document 2: Handwritten Document – A scanned handwritten document is employed to evaluate OCR performance. This allows us to assess accuracy, clarity, and consistency in extracting challenging handwritten content.
Test Document 3: Form with Checkboxes and Radio buttons – This test document includes various form elements such as checkboxes and radio buttons. The goal is to evaluate how well each tool can extract and preserve the structured data inherent in forms, as well as retain the overall layout integrity.
For Docling, we ran tests using both the default OCR engine (EasyOCR) and a modified setup using Tesseract to determine whether switching engines improves performance on non-standard documents.
LLMWhisperer was configured with its default settings optimized for handling complex document structures and multilingual content.
Evaluation Metrics:
OCR Accuracy: Assessing how precisely each tool recognizes text, including challenging cases like cursive handwriting and low-quality scans.
Layout Fidelity: Measuring the extent to which the original document’s structure—such as headings, columns, and tables—is preserved in the output markdown or text.
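One way to put a number on the OCR accuracy metric is to compare each tool's output against a hand-typed reference transcription. The sketch below is an assumption about how such a score could be computed, not the scoring used for this article. It relies on Python's standard difflib module, and the function names and sample strings are illustrative only.

```python
import difflib


def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout differences do not affect the score."""
    return " ".join(text.split())


def text_similarity(reference: str, ocr_output: str) -> float:
    """Return a similarity ratio in [0, 1]; 1.0 means the normalized texts are identical."""
    matcher = difflib.SequenceMatcher(None, normalize(reference), normalize(ocr_output))
    return matcher.ratio()


if __name__ == "__main__":
    reference = "Only a matter of style?"
    print(text_similarity(reference, "Only a matter of style?"))
    print(text_similarity(reference, "Only a mother of style 7"))
```

Because whitespace is normalized away, this score says nothing about layout fidelity, which still has to be judged by comparing the output against the original page.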
Comparative Analysis
Let’s first take a look at the Python code that we will use to test the documents.
IBM Docling installation and code setup
Starting with Docling, you will first need to install the corresponding library:
pip install docling
And here is the code using the default EasyOCR engine:
from docling.document_converter import DocumentConverter
import sys

# Function to process a document
def process_document(file_path):
    converter = DocumentConverter()
    result = converter.convert(file_path)
    print(result.document.export_to_markdown())

# Main function
if __name__ == "__main__":
    # Retrieve document name from command line arguments
    if len(sys.argv) != 2:
        print("Usage: python docling_simple.py <document>")
        sys.exit(1)
    # Call the function to process the document
    process_document(sys.argv[1])
Here’s a breakdown of what each part of the code does:
Function process_document(file_path):
This function takes a single argument, file_path, which is the path to the document that needs to be converted.
It creates an instance of DocumentConverter.
It calls the convert method on the converter object, passing the file_path as an argument. This method presumably reads the document and converts it into an internal format.
It prints the converted document in Markdown format using the export_to_markdown method.
And here is the code using Tesseract as the OCR engine:
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    TesseractCliOcrOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption
import sys

# Function to process a document
def process_document(file_path):
    # Use the Tesseract command-line engine with automatic language detection
    ocr_options = TesseractCliOcrOptions(lang=["auto"])
    pipeline_options = PdfPipelineOptions(
        do_ocr=True, ocr_options=ocr_options
    )
    converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_options=pipeline_options,
            )
        }
    )
    doc = converter.convert(file_path).document
    md = doc.export_to_markdown()
    print(md)

# Main function
if __name__ == "__main__":
    # Retrieve document name from command line arguments
    if len(sys.argv) != 2:
        print("Usage: python docling_simple.py <document>")
        sys.exit(1)
    # Call the function to process the document
    process_document(sys.argv[1])
Here’s a detailed explanation of the code:
Function process_document(file_path):
This function processes a PDF document located at file_path.
OCR Options: It sets up OCR options using TesseractCliOcrOptions with the language set to "auto", which means the OCR engine will automatically detect the language in the document.
Pipeline Options: It creates PdfPipelineOptions with OCR enabled, using the previously defined OCR options.
Document Conversion: It initializes a DocumentConverter with the specified format options for PDFs.
It converts the document and exports the result to Markdown format, then prints the Markdown content.
Before using Docling with Tesseract OCR, you need to make sure Tesseract is installed on your system.
Below are instructions for installing it on Windows, macOS, and Linux.
Windows Installation:
Download the Tesseract installer for Windows (a precompiled binary) from GitHub.
Run the installer and follow the on-screen instructions.
Add Tesseract to the system path:
Open “System Properties” → “Environment Variables” → “Path.”
Add the directory where Tesseract is installed (usually C:\Program Files\Tesseract-OCR).
Verify the installation by opening the command prompt and running: tesseract --version
MacOS Installation:
Open the terminal.
Install Tesseract using Homebrew: brew install tesseract
If you don’t have Homebrew installed, install it first by following the instructions on the Homebrew website, then rerun the command above.
Verify the installation by running: tesseract --version
Linux Installation:
Open the terminal.
Install Tesseract with the following command (shown for Debian and Ubuntu, which use apt): sudo apt install tesseract-ocr
Verify the installation by running: tesseract --version
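The verification step can also be scripted, which is handy before kicking off a batch of conversions. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and it only checks that a tesseract executable is reachable on the PATH, not that the language data you need is installed.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def tesseract_version() -> Optional[str]:
    """Return the first line printed by `tesseract --version`, or None if unavailable."""
    path = find_tesseract()
    if path is None:
        return None
    completed = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Depending on the build, the version banner may be written to stdout or stderr.
    output = completed.stdout or completed.stderr
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version if version else "Tesseract was not found on PATH")
```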
LLMWhisperer installation and code setup
For LLMWhisperer, also make sure to install the necessary package:
pip install llmwhisperer-client
And this is the code for processing with LLMWhisperer, using the V2 of the API:
from unstract.llmwhisperer import LLMWhispererClientV2
from unstract.llmwhisperer.client_v2 import LLMWhispererClientException
import sys

# Function to process a document
def process_document(file_path):
    # Initialize the client with your API key
    client = LLMWhispererClientV2(
        base_url="https://llmwhisperer-api.us-central.unstract.com/api/v2",
        api_key="<your_api_key>",
    )
    # Call the sync method with the file path
    try:
        result = client.whisper(
            file_path=file_path,
            wait_for_completion=True,
            wait_timeout=200,
        )
        print(result['extraction']['result_text'])
    except LLMWhispererClientException as e:
        print(e)

# Main function
if __name__ == "__main__":
    # Retrieve document name from command line arguments
    if len(sys.argv) != 2:
        print("Usage: python llmwhisperer.py <document>")
        sys.exit(1)
    # Call the function to process the document
    process_document(sys.argv[1])
Here’s a breakdown of the code:
Function process_document(file_path):
This function processes a document located at file_path.
Client Initialization: It initializes the LLMWhispererClientV2 with a specified base URL and API key.
API Call: It calls the whisper method on the client, passing the file_path and other parameters to wait for the completion of the processing.
Result Handling: It prints the extracted text from the result dictionary.
Exception Handling: If an exception occurs during the API call, it catches the LLMWhispererClientException and prints the error message.
Don’t forget to replace <your_api_key> with your API Key.
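If you would rather not paste the key into the script, one option is to read it from an environment variable. This is a small sketch of that idea. The helper name and the variable name LLMWHISPERER_API_KEY are assumptions chosen for illustration, so adjust them to whatever your environment uses.

```python
import os


def load_api_key(env_var: str = "LLMWHISPERER_API_KEY") -> str:
    """Read the API key from an environment variable and fail early if it is missing."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the script."
        )
    return api_key
```

The returned value can then be passed as the api_key argument when the client is created, which keeps the key out of version control.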
Comparative Findings
Document 1: Purchase Order
Processing the first document with Docling:
python docling_simple.py Purhcase-order.pdf
Returns the following output:
## River Park Inc 222, River view st.
## PURCHASE ORDER
Sacremento, CA 90203 (903) 903-8895 info@reverparinc.com www.river-park-ca.com
Date:
09/04/24
P.O. NUMBER:
784993
<!-- image -->
## CUSTOMER
BILL TO
## DELIVER TO
John Armstrong Hive view Inc 9090, West river avenue, Los Angeles, CA 92802
John Armstrong Hive view Inc 9090, West river avenue, Los Angeles, CA 92802
Simon Jones Hive view Inc 9090, West river avenue, Los Angeles, CA 92802
| START DATE | CANCEL DATE | ORDERED BY | SHIPPED VIA | FOB | TERMS |
|--------------|--------------------------------------------|--------------------------------------------|--------------------------------------------|-------------------------|-------------------------|
| 03/23/2024 | 03/09/24 | John Armstrong | UPS Express | View Park Stores Net 18 | View Park Stores Net 18 |
| Unit | Description | Description | Description | Unit Price ($) | Amount ($) |
| 150 | Stainless Steel 304 Hex Head Screw (M8X35) | Stainless Steel 304 Hex Head Screw (M8X35) | Stainless Steel 304 Hex Head Screw (M8X35) | 3.50 | |
| | | | | 3.75 | 525.00 750.00 |
| 200 | Stainless Steel 304 Hex Head Screw (M5X30) | Stainless Steel 304 Hex Head Screw (M5X30) | Stainless Steel 304 Hex Head Screw (M5X30) |
| |
| 100 | Mccoy 50 x 8 wooden screw - Black Finish | Mccoy 50 x 8 wooden screw - Black Finish | Mccoy 50 x 8 wooden screw - Black Finish | 4.00 | 400.00 |
| 150 | MS Steel 3/4 Inch Screw | MS Steel 3/4 Inch Screw | MS Steel 3/4 Inch Screw | 3.50 | 525.00 |
| 200 | Pan Slotted Self Tapping Screw | Pan Slotted Self Tapping Screw | Pan Slotted Self Tapping Screw | 3.75 | 750.00 |
| 100 | SS Round Head Nails | SS Round Head Nails | SS Round Head Nails | 4.00 | 400.00 |
Subtotal ($)
Sales Tax (%)
12
Total Amount ($)
APPROVED BY
DATE
AUTHORIZED SIGNATORY
3,350.00
402.00
3,752.00
<!-- image -->
Now let’s process the first document with LLMWhisperer:
python llmwhisperer.py Purhcase-order.pdf
Returns the following output:
River Park Inc
PURCHASE ORDER
222, River view st.
Sacremento, CA 90203 P.O. NUMBER: 784993
(903) 903-8895
info@reverparinc.com Date: 09/04/24
www.river-park-ca.com
CUSTOMER BILL TO DELIVER TO
John Armstrong John Armstrong Simon Jones
Hive view Inc Hive view Inc Hive view Inc
9090, West river avenue, 9090, West river avenue, 9090, West river avenue,
Los Angeles, CA 92802 Los Angeles, CA 92802 Los Angeles, CA 92802
START DATE CANCEL DATE ORDERED BY SHIPPED VIA FOB TERMS
03/23/2024 03/09/24 John Armstrong UPS Express View Park Stores Net 18
Unit Description Unit Price ($) Amount ($)
150 Stainless Steel 304 Hex Head Screw (M8X35) 3.50 525.00
200 Stainless Steel 304 Hex Head Screw (M5X30) 3.75 750.00
100 Mccoy 50 x 8 wooden screw - Black Finish 4.00 400.00
150 MS Steel 3/4 Inch Screw 3.50 525.00
200 Pan Slotted Self Tapping Screw 3.75 750.00
100 SS Round Head Nails 4.00 400.00
APPROVED BY
Subtotal ($) 3,350.00
Sales Tax (%) 12 402.00
AUTHORIZED SIGNATORY DATE
Total Amount ($) 3,752.00
<<<
CREATED BY
TemplateLAB
@ TemplateLab.com
<<<
Analysis:
Docling:
Markdown Conversion: Produces clean markdown output suitable for basic integration into markdown-centric workflows.
Limitation: Relies on markdown syntax, which simplifies formatting but sacrifices nuanced layout details (e.g., alignment, spacing).
LLMWhisperer:
Layout Preservation: Retains original document structure using ASCII lines and whitespace, ensuring tables, headers, and spacing mirror the source.
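The Docling table above repeats the Description value across several columns, presumably because the source cell spans those columns. If you feed that markdown into a downstream pipeline, a small clean-up pass may be needed. The sketch below is one possible approach with hypothetical helper names: it splits a markdown row into cells and drops a cell when it repeats the value immediately before it.

```python
from typing import List


def parse_markdown_row(row: str) -> List[str]:
    """Split one markdown table row into stripped cell values."""
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


def collapse_repeated_cells(cells: List[str]) -> List[str]:
    """Drop a cell when it repeats the non-empty value immediately before it."""
    collapsed: List[str] = []
    for cell in cells:
        if collapsed and cell and cell == collapsed[-1]:
            continue
        collapsed.append(cell)
    return collapsed


row = (
    "| 150 | MS Steel 3/4 Inch Screw | MS Steel 3/4 Inch Screw "
    "| MS Steel 3/4 Inch Screw | 3.50 | 525.00 |"
)
print(collapse_repeated_cells(parse_markdown_row(row)))
```

Note that this heuristic would also merge two neighbouring columns that legitimately hold the same value, so the column count should be checked against the header row before the result is trusted.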
Live coding session on data extraction from a scanned PDF form with LLMWhisperer
You can also watch this live coding webinar where we explore all the challenges involved in scanned PDF parsing. We’ll also compare the capabilities of different PDF parsing tools to help you understand their strengths and limitations.
Document 2: Handwritten Notes
Processing the second document with Docling (with EasyOCR):
python docling_simple.py notes.pdf
Returns the following output:
## Only a mother of style 7
For eclucational purposes we analuse the opening pages of an Il-page arkicle that fon peared in The American Mathematical Monthl 5 Volume 102 .Number 2 / February 1995 -We have added line numbers in the right margin.
line 4: Since in this article, squares don't cet alternatin colours, it could be argued that the term "chessboard" is misplaced.
line 4. The introduction of the name "B' seems Unnecessary: it is used --in the combination "the board B"~ in The text fer "Figure 1 and in line 7; in both cases
\ust "the board" would have done Gne. Th line 77 occurs the \as+ use of Bi, viz. in "X eB", which js dubious since B was a board and not a set; in line 77, L wou
ld have preferred "Given a set X of cells a line 7/8: The first Move , like any other, does not deserve a separate discription. The term "step" is redundant. bein OQ move line 8: Why not "Q move consists of" 2 line 40/11; At this slage the italics are przzling, Since GQ move 3s possi ble if,
/G49
{Sr some c,h , cell C64) contains a pebble and cells Cist, 7D and Cé,jt1) are empty . line 10. Vusice the term "positions" fe wheat everywhere else 35 called "ce
lls". Jine 12: Why no} "* After k moves the board has qo pebbles on it." 7 line IZ /\a : In the one sentence, k counts moves , in the other k counts ebles, Since
the prose does not indicate the Scope of dummies, this double use of the same kis co litte bi unfargivable. line 14: "ancl we set TWR:= Uy ROR) "We remark of defining
- e the use the verb "to set" when Chhe set!) R can be considered cunfSrtuncake e since "Ris not used on the next two pages, the name seems to be introduced too earl
- e the introduction of the name (r) Seems unnecessary; in the rest of? the paper 1 saw it used once in "an CeR" 4% , where an reachable conficuration ~ woulda hov
e adume. CNote. Tn the context in question -p 116~ the reachable context can remain anonymous : the quoted occurrence C is the omly occurrence af the identi -fer Cin that cantexk. My Conclusisn is that the reachable conf@gurakien has been
/G50
Now let’s process the second document with LLMWhisperer:
python llmwhisperer.py notes.pdf
Returns the following output:
EWD1200-0
Only a matter of style?
For educational purposes we analyse the
opening pages of an 11-page article that
appeared in The American Mathematical
Monthly, Volume 102 Number 2 / February 1995.
We have added line numbers in the right
margin.
line 4 : Since in this article , squares don't get
alternating colours , it could be argued that
the term " chessboard " is misplaced .
line 4 : The introduction of the name " B "
seems unnecessary : it is used - in the
combination " the board B " - in the text
for Figure and in line 71 ; in both cases
just " the board " would have done fine .
In line 77 occurs the last use of B ,
via . in " X CB " , which is dubious since
B was a board and not a set ; in line
77 . I would have preferred " Given a set X [X]
of cells .
line 7/8 : The first move , being a move
like any other , does not deserve a separate
discription . The term " step" is redundant .
line 8: Why not "a move consists of"?
line 10/11: At this stage the italics are
puzzling , since a move is possible if ,
1
<<<
EWD 1200-1
for some i, j, cell contains a pebble
and cells ( 1+ 1 , j ) and ( i , j + 1 ) are empty .
line 10 : Twice the term " positions " for
what everywhere else is called " cells " .
line 12: Why not "After k moves the
board has kti pebbles on it . " ?
line 12/14: In the one sentence, k counts
moves , in the other k counts pebbles .
Since the prose does not indicate
scope of dummies, this double use of
the same k is a little bit unforgivable .
line 14 : " and we set R := R(K) ". We
remark
. the use of the verb " to set " when defining
( the set ! ) R can be considered unfortunate
. since is not used on the next two
pages , the name seems to be introduced
too early
. the introduction of the name R seems
unnecessary ; in the rest of the paper I
saw it used once in " any CER " , where
" any reachable configuration " would have
done . ( Note . In the context in question
- p 116 - the reachable context can remain
anonymous : the quoted occurrence of
C is the only occurrence of the identi-
fier C in that context . My conclusion is
that the reachable configuration has been
2
<<<
Analysis:
Docling:
OCR Limitations: Struggles with cursive handwriting and unstructured text even with Tesseract. Output requires manual cleanup.
LLMWhisperer:
Handwriting Recognition: Accurately parses cursive and mixed handwriting styles using deep learning.
Adaptive Output: Preserves line breaks, annotations, and marginalia while interpreting context and delivering reliable text extraction.
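In the LLMWhisperer output above, each page ends with a line containing only <<<. Assuming that marker is the page separator, which is inferred from the sample output rather than from the API documentation, the extracted text can be split into per-page chunks with a few lines of Python. The function name is hypothetical.

```python
from typing import List


def split_pages(result_text: str, separator: str = "<<<") -> List[str]:
    """Split extracted text into pages on lines that consist only of the separator."""
    pages: List[str] = []
    current: List[str] = []
    for line in result_text.splitlines():
        if line.strip() == separator:
            pages.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    # Keep any text that follows the last separator.
    trailing = "\n".join(current).strip()
    if trailing:
        pages.append(trailing)
    return pages


sample = "EWD1200-0\nOnly a matter of style?\n1\n<<<\nEWD 1200-1\nfor some i, j\n2\n<<<\n"
for number, page in enumerate(split_pages(sample), start=1):
    print(f"page {number}: {len(page.splitlines())} lines")
```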
Document 3: Form Elements
Processing the third document with Docling:
python docling_simple.py loan-application.pdf
Returns the following output:
Tobe completed by the Lender: Lender Loan No /Universal Loan Identifier
Agency Case No.
## Uniform Residential Loan Application
Verify and complete the information on this application. If you are applying for this loan with others; each additional Borrower must provide information as directed by your Lender .
Section 7:Borrower Information. This section asks about your personal information and your income from employment and other sources; such as retirement; that you want considered to qualify for this loan:
## 1a. Personal Information
Name (First; Middle; Last; Suffix)
IMA
(or Individual Taxpayer Identification Number)
Alternate Names List any names by which you are known or any names under which credit was previously received (First; Middle; Last; Suffix)
Date of Birth
Citizenship
(mmIddlyyyy)
@u.s. Citizen
08 31 1931
Permanent Resident Alien
Non-Permanent Resident Alien
## Type of Credit
List Name(s) of Other Borrower(s) Applying for this Loan (First, Middle, Last; Suffix) Use a separator between names
@Iam applying for individual credit.
Iam applying for joint credit. Total Number of Borrowers:
Each Borrower intends to apply for joint credit. Your initials:
## Marital Status
Dependents (not listed by another Borrower)
Contact Information
Married
Number
Home Phone
Separated
Ages
Unmarried
Cell Phone
(40812 4563
(Single; Divorced; Widowed; Civil Union, Domestic Partnership, Registered Reciprocal Beneficiary Relationship)
Work Phone
Ext.
Email
## Current Address
Street
024 An
Unit #
Los
State
CA
Country
How Long at Current Address? 3 Years
5
Months Housing
No primary housing expense
Own
Imonth)
Ifat Current Address for LESS than 2 years, list Former Address
Does not apply
Street
Unit #
State
ZIP
Country
Long at Former Address? How
Years
Months
Housing
No primary housing expense
Own
Rent ($
Imonth)
Mailing Address\_
if different from Current Address
Does not apply
Street
Unit #
City
State
ZIP
Country
## 1b. Current EmploymentlSelf-Employment and Income
Does not apply
Employer or Business Name
CAFFIENATED
Phone
(408) 101
8365
## Gross Monthly Income
Street
Unit #
Base
$
Imonth
Les
State
ZIP
Country
Overtime
Imonth
Bonus
Imonth
Position or Title
CEO
Check if this statement applies:
Commission
5
Imonth
Start Date 02 / 04
(mmIddlyyyy)
Iam employed by a family member, property seller, real estate agent; or other party to the transaction.
Military
How in this line of work? 15 Years
5 Months
Entitlements $
Imonth
Check if you are the Business
have an ownership share of less than 25%. Monthly Income (or Loss)
Other
Imonth
Owner or Self-Employed
have an ownership share of 25% or more:
$ 802
Imonth
City
City
City
USA
Uniform Residential Loan Application Freddie Mac Form 65 Fannie Mae Form 1003 Effective 1/2021
<!-- image -->
Now let’s process the third document with LLMWhisperer:
python llmwhisperer.py loan-application.pdf
Returns the following output:
To be completed by the Lender:
Lender Loan No./Universal Loan Identifier Agency Case No.
Uniform Residential Loan Application
Verify and complete the information on this application. If you are applying for this loan with others, each additional Borrower must provide
information as directed by your Lender.
Section 1: Borrower Information. This section asks about your personal information and your income from
employment and other sources, such as retirement, that you want considered to qualify for this loan.
1a. Personal Information
Name (First, Middle, Last, Suffix) Social Security Number 175-678-910
IMA CARDHOLDER (or Individual Taxpayer Identification Number)
Alternate Names - List any names by which you are known or any names Date of Birth Citizenship
under which credit was previously received (First, Middle, Last, Suffix) (mm/dd/yyyy) [X] U.S. Citizen
08 /31 / 1977 [ ] Permanent Resident Alien
[ ] Non-Permanent Resident Alien
Type of Credit List Name(s) of Other Borrower(s) Applying for this Loan
[X] I am applying for individual credit. (First, Middle, Last, Suffix) - Use a separator between names
[ ] I am applying for joint credit. Total Number of Borrowers:
Each Borrower intends to apply for joint credit. Your initials:
Marital Status Dependents (not listed by another Borrower) Contact Information
[X] Married Number Home Phone ( )
[ ] Separated Ages Cell Phone (408) 123-4567
[ ] Unmarried Work Phone ( 1 Ext.
(Single, Divorced, Widowed, Civil Union, Domestic Partnership, Registered
Reciprocal Beneficiary Relationship) Email ima1977@gmail.com
Current Address
Street 1024, SULLIVAN STREET Unit #
City LOS ANGELES State CA ZIP 90210 Country USA
How Long at Current Address? 3 Years 5 Months Housing [ ] No primary housing expense [ ] Own [X] Rent ($ 1,300 /month)
If at Current Address for LESS than 2 years, list Former Address [X] Does not apply
Street Unit #
City State ZIP Country
How Long at Former Address? Years Months Housing [ ] No primary housing expense [ ] Own [ ] Rent ($ /month)
Mailing Address - if different from Current Address [X] Does not apply
Street Unit #
City State ZIP Country
1b. Current Employment/Self-Employment and Income [ ] Does not apply
Gross Monthly Income
Employer or Business Name CAFFIENATED Phone (408) 109-8765
Base $ 8000 /month
Street 2048, MAIN STREET Unit #
Overtime $ /month
City LOS ANGELES State CA ZIP 90210 Country USA
Bonus $ /month
Position or Title CEO Check if this statement applies: Commission $ 0.00 /month
Start Date [ ] I am employed by a family member,
02/04/2009
property seller, real estate agent, or other Military
How long in this line of work? 15 Years 5 Months party to the transaction. Entitlements $ /month
Other $ /month
[X] Check if you are the Business [ ] I have an ownership share of less than 25%. Monthly Income (or Loss)
TOTAL $ 8000 /month
Owner or Self-Employed [X] I have an ownership share of 25% or more. $ 8000
Uniform Residential Loan Application
Freddie Mac Form 65 · Fannie Mae Form 1003
Effective 1/2021
<<<
DRIVER LICENSE
California
CLASS C
DL /1234568
EXP 08/31/2014 END NONE
LNCARDHOLDER
FNIMA
2570 24TH STREET
ANYTOWN. CA 95818
08/31/1977
RSTR NONE 08311977
VETERAN
SEX F HAIR BRN EYES BRN
Cardhoca HGT 5'-05 WGT 125 1b
08/31/2009
<<<
Analysis:
Docling:
Structured Data Gaps: Fails to distinguish active/inactive checkboxes. Converts form elements to plain text labels, losing interactivity.
LLMWhisperer:
Best suited to scenarios needing raw OCR accuracy and structural integrity, such as automation pipelines.
Also a good fit for projects involving handwriting, multilingual content, or checkbox/form extraction.
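Since the LLMWhisperer output marks checkboxes as [X] and [ ], their state can be recovered with a regular expression. The sketch below is illustrative only: the pattern and function name are assumptions, and it treats everything up to the next marker (or the end of the line) as the label.

```python
import re
from typing import List, Tuple

# Matches "[X]" or "[ ]" followed by label text, stopping before the next marker.
CHECKBOX_PATTERN = re.compile(r"\[( |X)\]\s*(.*?)(?=\s*\[(?: |X)\]|$)")


def extract_checkboxes(line: str) -> List[Tuple[str, bool]]:
    """Return (label, is_checked) pairs for every checkbox marker on one line."""
    return [
        (label.strip(), mark == "X")
        for mark, label in CHECKBOX_PATTERN.findall(line)
    ]


line = "Housing [ ] No primary housing expense [ ] Own [X] Rent ($ 1,300 /month)"
for label, checked in extract_checkboxes(line):
    print(f"{'checked' if checked else 'unchecked'}: {label}")
```

On lines where a checkbox is followed by text from another column, the label will include that extra text. Splitting the line on runs of multiple spaces first would be a reasonable refinement when the original whitespace is available.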
Cost and Deployment Considerations
Pricing Overview: Docling is open source, so it appeals to users looking for a cost-effective solution, especially when the primary need is straightforward markdown conversion and layout retention. In contrast, LLMWhisperer is accessed through an API with an API key, and its advanced OCR capabilities may come at a higher price point, reflecting its stronger performance in complex scenarios.
Ease-of-Use and Integration: Both tools offer flexible deployment options, but their integration complexities differ. Docling is often simpler to set up for basic document conversions and can be easily integrated into existing workflows using its command-line interface. LLMWhisperer, while slightly more complex due to its deep learning foundation, provides robust APIs and is designed to seamlessly fit into advanced data processing pipelines.
Deployment Options: Consider whether a cloud-based solution or an on-premise deployment better meets your organizational needs. Docling’s lightweight approach is ideal for quick integrations, while LLMWhisperer’s scalable architecture is designed for enterprises needing high-volume, high-accuracy OCR processing.
Conclusion
Both IBM’s Docling and LLMWhisperer bring unique strengths to the table.
Docling excels in converting digital documents into structured markdown while faithfully preserving the original layout—making it ideal for straightforward documents like purchase orders and reports.
In contrast, LLMWhisperer stands out for its advanced OCR capabilities, effectively handling complex challenges such as handwriting recognition, multilingual content, and structured forms with minimal manual intervention.
Final Recommendations
Docling suits teams prioritizing markdown simplicity over layout fidelity. Its limitations in OCR and structured data extraction make it less viable for enterprise-grade document processing.
LLMWhisperer excels in preserving document context and extracting structured data—capabilities critical for automation, compliance, and integration with LLMs. Its adaptive OCR and whitespace-driven layout preservation address the limitations of markdown-centric tools.
Use LLMWhisperer for documents where layout meaning (not just formatting) matters. Reserve Docling for basic markdown conversions of born-digital text.
Future Directions
There is potential to enhance IBM’s Docling OCR capabilities by integrating more robust engines like Tesseract, which could improve performance on scanned documents, handwritten texts, and images.
The OCR landscape is rapidly evolving with advancements in deep learning and AI-driven context recognition. Future solutions are likely to offer even more accurate, real-time processing and seamless integration with broader automation pipelines.
As document processing continues to merge with AI and machine learning, both tools are expected to evolve—bringing enhanced accuracy, more intuitive layout preservation, and increased adaptability to varied document types.
If you want to quickly take LLMWhisperer for a test drive, you can check out our free playground.