00:10:40.000 –> 00:10:46.000
Alright, so hello there. Thank you so much, folks, for joining this session today.
00:10:46.000 –> 00:10:52.000
I’m Mahashree, product marketing specialist at Unstract, and I’ll be your host for this session.
00:10:52.000 –> 00:10:58.000
Now, before we get started, here are a few housekeeping points I’d like to quickly run over.
00:10:58.000 –> 00:11:03.000
So firstly, all attendees in this webinar will automatically be on mute throughout the course of this session.
00:11:03.000 –> 00:11:14.000
In case you have any questions, please do drop them in the Q&A tab that you’ll find in the bottom panel of your screen, and one of us from the team will get back to you with answers via text.
00:11:14.000 –> 00:11:23.000
You can also interact with fellow attendees using the chat tab. This is also where you’ll let us know in case you run into any technical difficulties during the session.
00:11:23.000 –> 00:11:30.000
Now, as a final point, I’d request you to leave your feedback on your way out of this webinar, when you’ll be redirected to a feedback form.
00:11:30.000 –> 00:11:35.000
This will really help us improve our webinar experience going forward.
00:11:35.000 –> 00:11:45.000
So with that said, let’s get started with the session. Now, in this webinar, we’re looking to cover everything about document extraction at the cutting edge.
00:11:45.000 –> 00:11:49.000
Now, for those of you who have been observing the space for a while,
00:11:49.000 –> 00:11:58.000
you may know that document extraction has come a long way, from traditional OCR to the advanced large language models that are currently leading the front.
00:11:58.000 –> 00:12:08.000
Now, large language models do bring a transformative impact to document extraction with their solutions for various age-old extraction challenges that we’ve been dealing with.
00:12:08.000 –> 00:12:17.000
Now, this is specifically what we’re interested in in this webinar. We’ll take a look at how LLMs perform when it comes to document extraction challenges.
00:12:17.000 –> 00:12:21.000
And we’ll also take a look at the limitations that they are currently bringing with them.
00:12:21.000 –> 00:12:31.000
And to address these limitations, we’ll be introducing LLMWhisperer, which is Unstract’s text extraction service and preprocessor tool.
00:12:31.000 –> 00:12:34.000
So, for those of you that are new to LLMWhisperer,
00:12:34.000 –> 00:12:44.000
as I mentioned, it is Unstract’s text extraction service and preprocessor, which prepares documents for peak LLM performance and improves the LLM outputs that you get.
00:12:44.000 –> 00:13:00.000
So we’ll get into the demo segment of this webinar, where you’ll see how LLMWhisperer works in practice, and you’ll also get a hands-on view of how extraction outcomes can be improved by this platform.
00:13:00.000 –> 00:13:08.000
So we mentioned a couple of things in the introduction, and just to give some form to what we’ll be covering in this webinar, we have our agenda over here.
00:13:08.000 –> 00:13:14.000
Now, we’ve carefully designed this agenda, keeping in mind the mixed audience that we have in this webinar.
00:13:14.000 –> 00:13:26.000
We’ll start off by addressing the commonly faced document extraction challenges today. This will set context on the fundamental problem that we’re looking to handle.
00:13:26.000 –> 00:13:35.000
Moving on from there, we’ll take a look at how LLMs pose themselves as a promising solution, as well as the limitations that they bring with them.
00:13:35.000 –> 00:13:43.000
From there, we’ll take a look at LLMWhisperer as an enabler to overcome these limitations and everything else that the platform brings to the table.
00:13:43.000 –> 00:14:01.000
And the fourth segment, which is probably one of the most awaited segments of this webinar, is a demo where we’ll compare the outcome you get when you pass documents directly to the top LLMs in the market today versus what happens when you preprocess them using LLMWhisperer.
00:14:01.000 –> 00:14:09.000
So for your understanding, this is a diagram where we’ve shown you what you can expect in the demo segment.
00:14:09.000 –> 00:14:31.000
In the first approach, you directly pass your document through one of the leading LLMs and we’ll take a look at the data extracted. In the second approach, you take the same document, preprocess it first with LLMWhisperer, and then pass it through an LLM. We’ll then compare the outcomes of these two extractions and see what impact LLMWhisperer is actually making.
00:14:31.000 –> 00:14:38.000
So that said, we’ll move on to the opening discussion of this webinar that is document extraction challenges.
00:14:38.000 –> 00:14:45.000
You can see that we’ve outlined five commonly faced challenges over here. The first one being content complexity.
00:14:45.000 –> 00:14:52.000
Now documents come in various content formats and layouts depending on the information they represent.
00:14:52.000 –> 00:14:59.000
So this layout is usually designed keeping in mind the best possible way for it to be consumed by a human.
00:14:59.000 –> 00:15:15.000
But this is not necessarily always the best way to feed this data or information through a system or a machine. So usually we deal with nested tables, or rotated text, or even forms with various radio buttons and checkboxes.
00:15:15.000 –> 00:15:23.000
So these are all easy for human consumption, but when it comes to systems again, it could be quite difficult to extract data from these forms.
00:15:23.000 –> 00:15:43.000
Now, one powerful capability that LLMs bring to the table is that they are able to understand documents very much like a human would. Previous-gen tools had to be deployed specifically for each of these different formats: you would have a specific tool dealing with nested tables or with rotated text,
00:15:43.000 –> 00:15:59.000
and you’d end up using a combination of these tools to extract your data overall. That is a challenge we are overcoming with LLMs, as they understand the context of your documents just like a human would, and they are able to extract data pretty accurately from all these different
00:15:59.000 –> 00:16:09.000
content complexities without any other external tool or help. So we’ll take a look at how this functions in our demo section.
00:16:09.000 –> 00:16:13.000
Moving on to the next extraction challenge, we have scanned documents.
00:16:13.000 –> 00:16:18.000
Now, certainly there are efforts being taken to digitize documents in business operations.
00:16:18.000 –> 00:16:26.000
But the ground reality remains that physical documents or hard copies still remain an integral part of various business functions.
00:16:26.000 –> 00:16:37.000
And what happens with these documents at a later stage is that they’re usually scanned and uploaded into systems for easy maintenance or even for data analysis.
00:16:37.000 –> 00:16:47.000
Now, when these scans are done, we cannot always be guaranteed a good scan. You might have skewed scans or scans with bad lighting or bad resolution, especially when they are taken using a mobile phone.
00:16:47.000 –> 00:17:02.000
So your solution at the cutting edge should really be looking at dealing with these different types of scans and still being able to extract data because ultimately the data these scanned physical documents have is invaluable for the business.
00:17:02.000 –> 00:17:09.000
Moving on to the next challenge, we have handwritten text, which can be considered as a byproduct of the previous point that I spoke about.
00:17:09.000 –> 00:17:15.000
So we spoke about physical documents playing an integral role in business operations.
00:17:15.000 –> 00:17:20.000
And a natural consequence of using physical documents is that we’ll have to deal with handwritten text.
00:17:20.000 –> 00:17:30.000
So this could be forms that are filled in with details by hand, or you could have a physical document with additional notes added by hand. But whatever it is,
00:17:30.000 –> 00:17:42.000
only if we are able to retrieve the information that these handwritten notes represent will we actually end up with a 360-degree view into everything the document has to offer.
00:17:42.000 –> 00:17:48.000
So we’ll again see how LLMs are able to extract data from handwritten notes.
00:17:48.000 –> 00:17:50.000
So moving on to the fourth point, we have data privacy.
00:17:50.000 –> 00:17:57.000
Now, this is again of paramount importance, especially when we are dealing with sensitive data.
00:17:57.000 –> 00:18:01.000
There’s a lot of skepticism as to how well LLMs are able to handle your data.
00:18:01.000 –> 00:18:14.000
Now, while I cannot speak on behalf of the models, what I can say is that a platform like LLMWhisperer is carefully designed keeping in mind the protection of data and the confidentiality that we’ll have to maintain.
00:18:14.000 –> 00:18:24.000
So we’ll again take a look at the various compliances that the platform adheres to and how it is able to ensure data privacy.
00:18:24.000 –> 00:18:36.000
As the final extraction challenge, we have result validation. So, all right: I have my document extraction; I’m able to extract the data from my documents. But how do I validate whether the extracted data is in fact accurate and correct?
00:18:36.000 –> 00:18:43.000
Especially with LLMs, they can hallucinate, or they might end up returning incorrect answers.
00:18:43.000 –> 00:18:57.000
Currently, the LLM models do not offer a way of validating your output. So we’ll again see how LLMWhisperer offers a workaround, and how you can actually arrive at a validation using this platform.
00:18:57.000 –> 00:19:02.000
So with that, folks, we’ve quickly covered the document extraction challenges that we have over here.
00:19:02.000 –> 00:19:14.000
And these are quite age-old challenges; they are not new. But what we are rather interested in, in this session, is the solution that we currently have at the cutting edge for these challenges.
00:19:14.000 –> 00:19:23.000
So this solution is really making a mark and is a differentiator from all the previous tools that we had.
00:19:23.000 –> 00:19:31.000
Now let’s take a look at what makes LLMs a differentiator. So the foremost reason I would say is contextual understanding.
00:19:31.000 –> 00:19:36.000
Now, unlike previous-gen models or tools to extract data from documents,
00:19:36.000 –> 00:19:49.000
LLMs do not just extract your text and treat it as characters. When they extract text from documents, they are able to understand the context of the document and what it is actually trying to convey.
00:19:49.000 –> 00:19:52.000
So this is very much like how a human would consume information.
00:19:52.000 –> 00:19:58.000
And this capability is what is leading them to arrive at more accurate results and outcomes.
00:19:58.000 –> 00:20:14.000
A McKinsey report even states that with LLMs for document extraction, we are able to bring down the extraction error rate by 40%. So that is the number that we are looking at when we deploy LLMs for document extraction.
00:20:14.000 –> 00:20:23.000
Moving on from there, another powerful capability of LLMs is that they are able to handle unstructured document data formats.
00:20:23.000 –> 00:20:44.000
So this means that you do not have to pre-train your model or define any other rules for it to extract data from documents. Even if an LLM is looking at a document for the first time, it is able to extract data, unlike the previous-gen tools we had, where we either had to define rules or perform some sort of training.
00:20:44.000 –> 00:20:53.000
Now, a powerful byproduct benefit that we get from that is that we are able to reduce the human effort involved in document extraction to a very large extent.
00:20:53.000 –> 00:21:05.000
Right now, people can actually enter prompts in natural language. So this also goes on to democratize document extraction among the workforce: even with minimal coding knowledge,
00:21:05.000 –> 00:21:13.000
we are able to extract the data that we want. And this again pulls down the dependency that we initially had on the IT teams.
00:21:13.000 –> 00:21:21.000
And it’s not just the human effort. We bring down the time and the costs involved in document extraction.
00:21:21.000 –> 00:21:27.000
Finally, moving to the last point we have over here, we’re going to be talking about semantic data extraction.
00:21:27.000 –> 00:21:34.000
So what this means is that LLM models are able to understand the relationship between the different text fields that you have in your document.
00:21:34.000 –> 00:21:44.000
So this means that rather than just extracting data points today, we are able to go a mile further and also extract actionable insights.
00:21:44.000 –> 00:21:47.000
So for you to understand this better, let me take an example.
00:21:47.000 –> 00:21:54.000
Let’s say we have a credit card statement. So it would ideally have details like the customer name or the address and so on.
00:21:54.000 –> 00:22:10.000
So with previous gen tools, you would be able to extract, let’s say the customer name. But what we are seeing with LLMs is that we’re able to go a step further and we can enter a prompt that says, give me a summary of this document or retrieve the critical points from this document.
00:22:10.000 –> 00:22:22.000
So this is the extra step that we’re able to take and LLMs being at the cutting edge are only rapidly evolving on a day-to-day basis and bringing more and more use cases under their hood.
00:22:22.000 –> 00:22:30.000
So with that said, we’ve taken a look at the various differentiators that are putting LLMs in the limelight when it comes to document extraction.
00:22:30.000 –> 00:22:35.000
However, if we are looking to deploy these models in our systems for this purpose,
00:22:35.000 –> 00:22:42.000
then it is equally important to take a look at the limitations that they come along with as well.
00:22:42.000 –> 00:22:51.000
So over here you can see we have outlined two major limitations that we observe when we deploy LLMs for document extraction use cases.
00:22:51.000 –> 00:22:55.000
The first one being that LLM outputs are only as good as the input they receive.
00:22:55.000 –> 00:23:02.000
This means that in case your LLM is not able to understand the input document
00:23:02.000 –> 00:23:11.000
in the format that you upload it in, it is most definitely going to give you inaccurate outputs. So this is as simple as the concept of garbage in, garbage out.
00:23:11.000 –> 00:23:21.000
Now, this is where a platform like LLMWhisperer comes into the picture, because with LLMWhisperer, what you can do is prepare your documents in a format that is LLM-ready.
00:23:21.000 –> 00:23:25.000
So we’ll take a look at how it performs this function in a little while.
00:23:25.000 –> 00:23:31.000
Now, the second limitation we have over here is that LLM results cannot be independently validated.
00:23:31.000 –> 00:23:36.000
So we briefly touched upon this when we were covering the extraction challenges.
00:23:36.000 –> 00:23:50.000
Currently, LLMs do not support a way in which you can validate the results. And they are also non-deterministic in nature, giving you an output no matter what prompt you give. And there is no way of verifying this result. So how do you trust it?
00:23:50.000 –> 00:23:59.000
So again, LLMWhisperer offers certain accuracy features that will help you get an understanding of how accurate a result or an outcome is.
00:23:59.000 –> 00:24:05.000
We’ll take a look at this when we cover the LLMWhisperer overview.
00:24:05.000 –> 00:24:12.000
All right, so here’s a quick understanding of LLM Whisperer and what the platform brings.
00:24:12.000 –> 00:24:24.000
So as I mentioned earlier, LLMWhisperer is Unstract’s text extraction and preprocessor tool that prepares documents in an LLM-ready format.
00:24:24.000 –> 00:24:41.000
So how does it do this? One of the defining features of the platform is its capability to preserve the layout of the original document. So, as you can see under layout preserving, what we’re talking about over here is that when you upload your document and preprocess it using LLMWhisperer,
00:24:41.000 –> 00:24:49.000
it is able to retrieve the text from the document while preserving the layout of the text that you see in the original document.
00:24:49.000 –> 00:25:07.000
So this is the capability that really helps LLMs understand the context to the best of their abilities, because, as I mentioned earlier, they consume information in a way that is very similar to humans. So by preserving the layout, what you’re doing is preserving the full context of the original document.
00:25:07.000 –> 00:25:15.000
So this leads to maximum LLM accuracy no matter what the document format is. You could have radio buttons, nested tables, and a lot more.
00:25:15.000 –> 00:25:20.000
We will take a look at how this works following this slide.
00:25:20.000 –> 00:25:30.000
Moving on from there, we also spoke about LLMWhisperer offering certain accuracy-enabling capabilities, which help you verify the extracted outcomes.
00:25:30.000 –> 00:25:49.000
So, of the two major features that the platform has to offer over here, the first is a confidence score. For all the text that is extracted from your document, the platform will give you a confidence score on how confident it is that this is in fact the original text present in the document.
00:25:49.000 –> 00:26:02.000
So you can use this as a benchmark, or as a standard, to understand how accurate the text extraction has been, and you can decide how far you’re willing to trust this extraction.
00:26:02.000 –> 00:26:19.000
And another capability the platform has to offer is that it is able to retrieve the coordinates of individual text. Depending on the original document, it is able to retrieve a bounding box, or the coordinates of where that particular text was present in the original document.
00:26:19.000 –> 00:26:30.000
So with these coordinates, you can even have a human in the loop when you deploy this data in downstream operations, to verify whether the extracted text is in fact accurate.
00:26:30.000 –> 00:26:41.000
Now, we will take a look at how this looks in downstream operations, and you will get an understanding of both the layout-preserving mode as well as the accuracy-enabling capabilities.
00:26:41.000 –> 00:26:47.000
And finally, LLMWhisperer, depending on your business use case, supports flexible deployment options.
00:26:47.000 –> 00:26:54.000
You can deploy it as a cloud offering, via its API, or through a JavaScript or Python client.
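For readers who want to try the client route, here is a minimal sketch of the Python path. It assumes Unstract’s published llmwhisperer-client package and the v2 client class as described in its docs; the exact names and response shape are illustrative, so check the current documentation before relying on them.

```python
# Minimal sketch of the Python-client deployment path. Assumes the
# `llmwhisperer-client` package from Unstract's docs; class and argument
# names may differ in the version you install.
from unstract.llmwhisperer import LLMWhispererClientV2

client = LLMWhispererClientV2(
    base_url="https://llmwhisperer-api.us-central.unstract.com/api/v2",
    api_key="YOUR_API_KEY",  # the single key from the LLMWhisperer dashboard
)

# Submit a document and block until the layout-preserved text is ready.
result = client.whisper(
    file_path="credit_card_statement.pdf",
    wait_for_completion=True,
    wait_timeout=180,
)

# The LLM-ready, layout-preserved text (response shape is illustrative).
print(result["extraction"]["result_text"])
```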
00:26:54.000 –> 00:27:04.000
And finally, to wrap it up, LLMWhisperer is designed to ensure the privacy of your data by being ISO, GDPR, SOC 2, and HIPAA compliant as well.
00:27:04.000 –> 00:27:14.000
So with that said, let’s jump into an LLMWhisperer deep dive where we’ll take a look at how the platform works and what are the capabilities and how it is deployed.
00:27:14.000 –> 00:27:24.000
So for this, I’ll just take you to the LLMWhisperer interface that I have over here. So as you can see, this is my interface.
00:27:24.000 –> 00:27:35.000
And one interesting thing that the platform offers is that when you sign up for LLMWhisperer, you get access to the end-to-end capabilities with no features being limited.
00:27:35.000 –> 00:27:49.000
And all of this for free. You have a daily limit of 100 pages that you can extract text from using this platform, and you can test the various documents that you have in your business.
00:27:49.000 –> 00:27:55.000
So this is where I can upload a document of my own and get started with a text extraction.
00:27:55.000 –> 00:28:06.000
Or you also have various sample documents over here. So you can see there is a range of documents: handwritten forms, forms that are filled in by hand, condensed tables,
00:28:06.000 –> 00:28:22.000
as well as scanned images over here. So just to give you an idea of how this looks, I’m going to open this table. As you can see, this document contains a table that is very condensed, and it has a lot of information which
00:28:22.000 –> 00:28:33.000
would ideally be difficult to handle if I were to pass this directly through an LLM model. But what we are seeing over here is LLMWhisperer’s capability of extracting this text while maintaining its layout.
00:28:33.000 –> 00:28:46.000
And you can also choose whether or not you want the row and column lines to be extracted in the raw view, or the LLM-ready format, which you’ll be feeding to the LLM next.
00:28:46.000 –> 00:28:56.000
So as you can see, the platform is able to maintain the spacing as well as the text that is present in the original document.
00:28:56.000 –> 00:29:01.000
Now just to show you another example for you to get a better picture.
00:29:01.000 –> 00:29:14.000
We have over here a rather badly scanned invoice, where you have bad lighting and the text is not too clear. You even have an oil stain over here; this is a receipt from a burger joint, apparently.
00:29:14.000 –> 00:29:26.000
And you can see how text has been extracted by preserving the layout of the document. So we spoke about how LLM outcomes are only as good as the input that they receive.
00:29:26.000 –> 00:29:41.000
And the best way to send them the input is to preserve the layout of the original document when you extract the text from the document. So this is the input which will make the LLM perform best on this particular document.
00:29:41.000 –> 00:29:57.000
So these are the layout-preserving capabilities of LLMWhisperer. Now, LLMWhisperer again can be deployed in various ways: as an API, a Python client, or a JavaScript client.
00:29:57.000 –> 00:30:08.000
So as you can see, we have a single API key over here, which can be used for multiple document data extractions, and you can also download this as a Postman collection for your deployment.
00:30:08.000 –> 00:30:13.000
So let me just show you an example of how this is done.
00:30:13.000 –> 00:30:23.000
So over here, I have actually set up this call. I’ve uploaded a sample document, so let me give you an idea of what this document looks like.
00:30:23.000 –> 00:30:34.000
So this is the document we’re dealing with: a loan application which is, number one, scanned, and which also has details written in by hand. So you can see that the customer name,
00:30:34.000 –> 00:30:52.000
the social security number, and the marital status (with a checkbox), among various other details, are entered by hand. And on the second page, we even have a disoriented scan. So this is not a very great scan; it has a lot of information which is quite fuzzy.
00:30:52.000 –> 00:31:00.000
Yeah, so let’s see how well the platform is able to extract the text from this document.
00:31:00.000 –> 00:31:15.000
So let me go back to my Postman API call. All right, so over here in this call, I’ve uploaded the document, and I can also set certain parameters to control how I want this to be processed.
00:31:15.000 –> 00:31:28.000
So, as you can see, I can control the median filter or the Gaussian blur, and I can even mark whether or not I want vertical and horizontal lines from tables to be extracted, like in the example you saw earlier.
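For anyone following along outside Postman, here’s roughly what this call looks like as plain HTTP from Python. The endpoint paths, the unstract-key header, and the parameter names follow LLMWhisperer’s public v2 API reference as I understand it; treat them as illustrative assumptions and verify against the current docs.

```python
# Rough sketch of the Postman call above as plain HTTP. Endpoints, header,
# and parameter names are assumptions modeled on LLMWhisperer's v2 API docs.
import time
import requests

BASE = "https://llmwhisperer-api.us-central.unstract.com/api/v2"
HEADERS = {"unstract-key": "YOUR_API_KEY"}

# 1. Submit the scanned loan application, tuning the preprocessing knobs
#    mentioned above (median filter, Gaussian blur, table line marking).
with open("loan_application.pdf", "rb") as f:
    resp = requests.post(
        f"{BASE}/whisper",
        headers=HEADERS,
        params={
            "mode": "high_quality",           # OCR-heavy mode for scans
            "output_mode": "layout_preserving",
            "median_filter_size": 3,
            "gaussian_blur_radius": 1,
            "mark_vertical_lines": "true",    # keep table column lines
            "mark_horizontal_lines": "true",  # keep table row lines
        },
        data=f,
    )
whisper_hash = resp.json()["whisper_hash"]  # handle for the async job

# 2. Poll until processing finishes.
while True:
    status = requests.get(
        f"{BASE}/whisper-status",
        headers=HEADERS,
        params={"whisper_hash": whisper_hash},
    ).json()
    if status.get("status") == "processed":
        break
    time.sleep(2)

# 3. Retrieve the layout-preserved text to feed to the LLM.
text = requests.get(
    f"{BASE}/whisper-retrieve",
    headers=HEADERS,
    params={"whisper_hash": whisper_hash},
)
print(text.text)  # exact payload shape depends on the API version
```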
00:31:28.000 –> 00:31:36.000
So in this case, I’ve already made this call, and you can check the status. We have some data on how long this extraction took,
00:31:36.000 –> 00:31:42.000
and what the status is. So this is a successful extraction.
00:31:42.000 –> 00:31:47.000
And finally, over here, you can see how the text has been extracted from this particular document.
00:31:47.000 –> 00:31:55.000
So you can see the layout is again being preserved, and this was information that was entered by hand. So we have the applicant name,
00:31:55.000 –> 00:32:01.000
the social security number, and we also have the marital status, which was checked in a checkbox.
00:32:01.000 –> 00:32:15.000
So all of this is accurately retrieved. And moving on to the second page, we looked at the disoriented ID card scan and you can see how the details have been retrieved while still preserving the layout.
00:32:15.000 –> 00:32:26.000
So this is ultimately the text, or the document format, that will be passed to your LLM for you to get the data you’re looking to retrieve.
00:32:26.000 –> 00:32:36.000
Now, we also spoke about certain accuracy-enabling capabilities. We spoke about how the system can give you confidence scores on each piece of text extracted,
00:32:36.000 –> 00:32:44.000
and how you can also get access to the coordinates of the text, which can be used in downstream operations for highlighting.
00:32:44.000 –> 00:32:57.000
So let me just make this call again after enabling the highlighting capability, and you’ll be able to see how you get your metadata.
00:32:57.000 –> 00:33:10.000
So I’m uploading the document again, and let’s check the status. Let’s just give it a few seconds to process.
00:33:10.000 –> 00:33:24.000
All right. So now that this is done, you can see how my confidence metadata is retrieved. You have the text extracted, the coordinates of this text, as well as the confidence score that you get:
00:33:24.000 –> 00:33:33.000
how confident the service is about the particular extracted text. And this will give me an idea of whether or not I want to go ahead with this extraction.
00:33:33.000 –> 00:33:43.000
So you have the various texts over here along with the confidence score and the coordinates which form the bounding box on the source document.
00:33:43.000 –> 00:33:53.000
So moving on, we again have the coordinates of each of the lines present in your document. So you can, again, use this information for highlighting purposes.
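To make the human-in-the-loop idea concrete, here’s a small illustrative sketch of filtering extracted lines by their confidence score. The metadata shape (text, confidence, bounding box) is an assumption modeled on what’s shown on screen, not the API’s exact field names.

```python
# Illustrative: flag low-confidence lines for manual review. The field names
# (text, confidence, bbox) are assumptions, not the API's exact schema.
LOW_CONFIDENCE = 0.80

def flag_for_review(lines: list[dict]) -> list[dict]:
    """Return the extracted lines a human should double-check."""
    return [
        {
            "text": line["text"],
            "confidence": line["confidence"],
            "bbox": line["bbox"],  # bounding box on the source page, for highlighting
        }
        for line in lines
        if line["confidence"] < LOW_CONFIDENCE
    ]

sample = [
    {"text": "Applicant: Jane Roe", "confidence": 0.97, "bbox": (1, 40, 120, 310, 140)},
    {"text": "Monthly rent: 1,300", "confidence": 0.62, "bbox": (1, 40, 480, 290, 500)},
]
for item in flag_for_review(sample):
    print(f"review: {item['text']!r} (confidence {item['confidence']:.2f})")
```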
00:33:53.000 –> 00:34:01.000
So this is how your metadata looks. And just to give you an understanding of how this can be deployed in downstream operations,
00:34:01.000 –> 00:34:10.000
let me quickly move to my Unstract platform, where I have deployed this highlighting.
00:34:10.000 –> 00:34:33.000
All right. So this is Unstract. For those of you that are unfamiliar with the platform, this is an end-to-end document extraction, or ETL, platform, where not only can you upload documents and extract data or preprocess them into an LLM-ready format, but you can also specify prompts
00:34:33.000 –> 00:34:40.000
that give an idea of the data you’re looking to extract. And you can also deploy them using various deployment options.
00:34:40.000 –> 00:34:56.000
Now, if I had to get into the details of Unstract, that would be a webinar on its own. So in this case, let’s just see how LLMWhisperer has been able to convert this document into an LLM-ready format and also support highlighting.
00:34:56.000 –> 00:35:01.000
So you can see that this is a credit card statement over here.
00:35:01.000 –> 00:35:05.000
And we have… a few pages to this document.
00:35:05.000 –> 00:35:14.000
So the first step LLMWhisperer would do is to extract the text and maintain the layout. So you have the extracted text over here.
00:35:14.000 –> 00:35:27.000
And even something as small as the logo, as you can see over here, is extracted, maintaining the spacing between the text. And you have the various details of the lines given over here, which will again be used when it comes to highlighting.
00:35:27.000 –> 00:35:34.000
So in this specific project, I have enabled highlighting and just to show you how this works.
00:35:34.000 –> 00:35:42.000
So we have over here a few prompts that contain information on what is the data I’m looking to extract from this document.
00:35:42.000 –> 00:35:48.000
So we’re going to extract the customer name. We’re extracting the address and line items from this credit card statement and so on.
00:35:48.000 –> 00:35:55.000
Now, if I want to know where this address was specifically extracted from in the source document,
00:35:55.000 –> 00:35:59.000
since I have enabled highlighting using the metadata that you saw earlier,
00:35:59.000 –> 00:36:14.000
I just have to click on this output, and you can see how the platform is able to highlight the corresponding spot in the source document using the bounding box and the coordinates. So, for instance, if I click on city,
00:36:14.000 –> 00:36:18.000
you get where the city was fetched from in the original document.
00:36:18.000 –> 00:36:30.000
So this is how you can actually have a human in the loop in case you need to manually review some of your extractions using the metadata that you get from LLMWhisperer.
00:36:30.000 –> 00:36:37.000
Some industries actually require this for compliance, and this is how LLMWhisperer really makes it possible.
00:36:37.000 –> 00:36:44.000
So with that, I think we’ve taken an overview of most of the capabilities that LLMWhisperer has to offer.
00:36:44.000 –> 00:36:56.000
And while this is a bird’s-eye view of everything, you can get a deeper understanding by going through our documentation over here. You even have a getting-started guide that you can go through,
00:36:56.000 –> 00:37:05.000
and you can sign up for the platform and the playground that we looked at, in case you’re looking to check out how this works for your particular business documents.
00:37:05.000 –> 00:37:14.000
Now, LLMWhisperer is free of charge for the first 100 pages that you upload. And in case your usage exceeds that limit,
00:37:14.000 –> 00:37:20.000
you would be charged based on usage and the kind of documents you’re looking to extract text from.
00:37:20.000 –> 00:37:27.000
So these are the various file formats that are supported as input.
00:37:27.000 –> 00:37:35.000
We also support webhooks, so that once you’ve initiated a data extraction, you can be immediately notified upon completion and other operations.
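As a sketch of how that completion notification could be consumed, here’s a minimal webhook receiver. The payload fields used (status, whisper_hash) are assumptions about what the callback delivers; check the webhook documentation for the actual schema.

```python
# Minimal webhook receiver sketch. The JSON fields read here (status,
# whisper_hash) are assumed, not confirmed from the webhook docs.
from flask import Flask, request

app = Flask(__name__)

@app.route("/llmwhisperer/callback", methods=["POST"])
def on_extraction_complete():
    event = request.get_json(silent=True) or {}
    if event.get("status") == "processed":
        print(f"Extraction ready: {event.get('whisper_hash')}")
        # ...fetch the extracted text and kick off the LLM step here...
    return "", 204

if __name__ == "__main__":
    app.run(port=8000)
```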
00:37:35.000 –> 00:37:45.000
And there is extensive documentation on the API deployment, as well as the Python and JavaScript client deployments, that you can go through.
00:37:45.000 –> 00:37:58.000
So with that, let me go back to my presentation. And we’ve taken a look at what LLMWhisperer has to offer and how it can really help you achieve data extraction the way you’d like.
00:37:58.000 –> 00:38:05.000
So, while we have seen the various capabilities and how the platform works in data extraction,
00:38:05.000 –> 00:38:12.000
I thought it would be interesting to actually compare the outcomes using real-world document extraction tasks.
00:38:12.000 –> 00:38:21.000
So in the upcoming segment, what we’ll be seeing is that we’ll be parsing the same document in two different ways.
00:38:21.000 –> 00:38:30.000
So in the first approach that you see over here, we’ll be performing document extraction, retrieving some data from a particular document without preprocessing.
00:38:30.000 –> 00:38:37.000
So we will directly be passing it through some of the leading LLMs in the market today, that is, Gemini 2.5 Flash,
00:38:37.000 –> 00:38:47.000
ChatGPT Plus, and Claude 3.7 Sonnet. In the other approach, we’ll take the same document and process it using our preprocessor, that is, LLMWhisperer.
00:38:47.000 –> 00:38:55.000
And in this case, after pre-processing, we will be sending it to ChatGPT for data extraction.
00:38:55.000 –> 00:39:02.000
So with that said, let me move on and show you the outcomes of this exercise.
00:39:02.000 –> 00:39:14.000
So we’ve actually performed this exercise with a bunch of documents and I’ve just shortlisted a few over here for you to get a good understanding of how these models work with different document formats and types.
00:39:14.000 –> 00:39:31.000
So firstly, we have over here a credit card statement, and it is a pretty clean, neatly formatted document. So you’d have the details any other credit card statement would contain: you have the account
00:39:31.000 –> 00:39:44.000
summary, you have the details of the customer, like their name, their address, the payment information over here, and on the second page, you have a list of the spend line items.
00:39:44.000 –> 00:39:49.000
So let’s take a look at what is the data that we’re looking to extract from this particular document.
00:39:49.000 –> 00:39:56.000
So over here we have the various prompts that we will be giving to these LLM models, as well as when we parse the document through LLMWhisperer.
00:39:56.000 –> 00:40:02.000
So the first prompt will be extracting the customer name from this credit card statement.
00:40:02.000 –> 00:40:13.000
In the second prompt, we’re looking to extract the full address, and we’ve also given it a structured JSON schema. So I want it in this particular schema, with the full address, the city, as well as the zip code.
00:40:13.000 –> 00:40:25.000
Moving on, we’ll be retrieving some payment information. And in the fourth prompt, we’ll be retrieving a simple JSON object with the spend date, the spend description, as well as the amount,
00:40:25.000 –> 00:40:33.000
which will again be formatted to two decimal places. So you can see how I can also define the schema in which I want the data to be extracted.
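To make the schema-in-the-prompt idea concrete, here’s roughly what such a prompt could look like. The field names are illustrative, not the demo’s exact prompts.

```python
# Illustrative prompt embedding the JSON schema the LLM should fill in.
# Field names are examples only, not the demo's exact prompts.
SPEND_ITEMS_PROMPT = """
From the credit card statement text below, extract every spend line item.
Return only a JSON array where each element follows this schema:
  {"spend_date": "YYYY-MM-DD", "spend_description": "string", "amount": 0.00}
The amount must be a number formatted to two decimal places.

Statement text (layout-preserved output from LLMWhisperer):
<statement_text>
"""

# Usage: splice in the preprocessed text before sending the prompt to an LLM.
prompt = SPEND_ITEMS_PROMPT.replace("<statement_text>", "...extracted text...")
```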
00:40:33.000 –> 00:40:42.000
We’ll also be retrieving the opening and closing dates of this statement, and the sum of the previous balance, payment credits, and purchases.
00:40:42.000 –> 00:41:06.000
So firstly, we uploaded this document to Gemini and gave it the prompts, and you can see how Gemini has been able to retrieve the output. This output has been verified, and Gemini did a good job: it was able to retrieve all of the data that we were looking for, and you can see how it’s retrieved this
00:41:06.000 –> 00:41:17.000
keeping in mind the schema that we’d asked for. So we have the customer name, the address, the payment information, and the list of spend line items as well.
00:41:17.000 –> 00:41:27.000
And finally, we have the opening and closing dates as well as the sum. Similarly, Claude was able to do a good job with this particular document.
00:41:27.000 –> 00:41:32.000
And all of this output has been verified. And we see the same with ChatGPT.
00:41:32.000 –> 00:41:41.000
So you might wonder how these models are actually retrieving the data in spite of no preprocessing being performed.
00:41:41.000 –> 00:41:47.000
But one thing to consider is that this is a rather simple document with clean formatting.
00:41:47.000 –> 00:41:53.000
And while we’re not saying that LLMs do not perform on all documents,
00:41:53.000 –> 00:42:00.000
when a document is neatly formatted, you can see, in this case, the models are able to retrieve the data that you’re looking for.
00:42:00.000 –> 00:42:10.000
Now, in our upcoming examples, you’ll take a look at certain more complex documents, and we’ll see how the models work in those cases.
00:42:10.000 –> 00:42:17.000
So, finally, just to show you how LLMWhisperer is deployed in this case: we have the original document over here, and under raw view,
00:42:17.000 –> 00:42:31.000
you can see how LLMWhisperer has created an LLM-ready format by preserving the layout. And we also have the various prompts on the left over here, with the output extracted.
00:42:31.000 –> 00:42:49.000
So let me move on to the next example, where we have a slightly more complex document. This is the same document you’d seen earlier: the loan application. We saw that this is a scanned document with some details entered by hand, and we also had a disoriented scan of the ID card on the second page.
00:42:49.000 –> 00:42:54.000
So let’s take a look at what information we’re looking to retrieve in this case.
00:42:54.000 –> 00:43:05.000
So firstly, we’ll be extracting some personal information: this particular applicant’s social security number, citizenship type, hair color, eye color, and so on.
00:43:05.000 –> 00:43:15.000
If you take a close look, details like the citizenship type or marital status are from the first page, whereas the remaining are from the second page.
00:43:15.000 –> 00:43:20.000
So we will see how it is able to bring these two sets of data together and present them in the same outcome.
00:43:20.000 –> 00:43:26.000
Moving on, we’ll also be looking to extract some customer contact information, applicant address information,
00:43:26.000 –> 00:43:38.000
and whether the applicant is self-employed or a business owner. And finally, we’ll be retrieving the gross income and the rent given by the applicant in this application.
00:43:38.000 –> 00:43:44.000
So firstly, we’ve uploaded this document to Gemini, and we’ve also given it the prompts that you looked at just now.
00:43:44.000 –> 00:44:02.000
We were able to see that Gemini was in fact able to retrieve the output, and at a glance it does look pretty decent. But upon closer inspection, what we noticed is that there were certain incorrect extractions, and incomplete extractions as well.
00:44:02.000 –> 00:44:09.000
For instance, looking at the personal information over here, we have the sex field given as null.
00:44:09.000 –> 00:44:17.000
Now, this is the extraction done by Gemini. However, if I were to compare this with the extraction done when this document was first preprocessed by LLMWhisperer,
00:44:17.000 –> 00:44:21.000
we have over here that the sex is retrieved as female.
00:44:21.000 –> 00:44:33.000
So, just to cross-check this with the original document, let me open it. Upon closer inspection of the scan, you can see that there is in fact a field which gives the details of the sex.
00:44:33.000 –> 00:44:41.000
However, it was not retrieved properly by the LLM model when the document was directly passed through it.
00:44:41.000 –> 00:44:55.000
So this was one inconsistency that we saw with Gemini. And another inconsistency was that if you take a look at the gross monthly income and rent, we have 8,000 and 4,300 respectively.
00:44:55.000 –> 00:45:10.000
However, looking at the document, what we see over here is that, while the gross monthly income is retrieved correctly, the monthly rent is given as 1,300 by the applicant.
00:45:10.000 –> 00:45:16.000
So what the model probably did over there is confuse this one for a four because of the handwriting.
00:45:16.000 –> 00:45:21.000
So these are the issues that we’re really looking to overcome with pre-processing.
00:45:21.000 –> 00:45:37.000
Because while all the data has been extracted, these are outcomes which are incorrect. And when we are handling larger documents or higher volumes of documents, it’s going to be impossible to nitpick and look at each of these outcomes.
00:45:37.000 –> 00:45:43.000
And there is also currently no way to validate whether or not this particular outcome is accurate.
00:45:43.000 –> 00:45:56.000
So, just to see what LLMWhisperer was able to retrieve: you can see over here that we clearly have the gross monthly income as 8,000
00:45:56.000 –> 00:46:04.000
and the rent as 1,300. And that is the data that you see reflected over here in the prompt as well.
00:46:04.000 –> 00:46:25.000
So, moving on to ChatGPT and Claude: what we noticed with these two models is that they were in fact not able to process the scan at all. For instance, we have the prompt over here given to ChatGPT. Let me just upload this document from my local files.
00:46:25.000 –> 00:46:33.000
Now, when I do this, you will see that ChatGPT is throwing an error message; it was not able to read it.
00:46:33.000 –> 00:46:39.000
It could not extract the text from this particular file. So it basically dropped out of the race in this particular case.
00:46:39.000 –> 00:46:46.000
Moving on to Claude, we found a very similar output where Claude was also not able to process the scanned document.
00:46:46.000 –> 00:47:10.000
So this was actually a common observation across the various other documents that we performed this exercise with: Claude and ChatGPT did not process a lot of scanned documents when they were given directly to the models. But what we see with LLMWhisperer in this case, as I mentioned earlier, is that we’ll be using ChatGPT for all our extractions.
00:47:10.000 –> 00:47:16.000
So as you can see over here, we are deploying a GPT model for this particular data extraction.
00:47:16.000 –> 00:47:34.000
And when the same document was uploaded to the GPT model directly, it did not accept the document at all. But because we are able to preprocess it first, you can see that the very same model is actually able to retrieve this information, and pretty accurately.
00:47:34.000 –> 00:47:45.000
So this is the difference in outcome that we see when we use preprocessing, and why we stress again that it is a necessity in this day and age.
00:47:45.000 –> 00:47:50.000
So, moving on to the next document, we have a nested table over here.
00:47:50.000 –> 00:47:56.000
Now, we’ve just picked this document to show you the various document formats and how these models perform.
00:47:56.000 –> 00:48:03.000
So in this case, we have some details on sports facilities and their availability in different schools.
00:48:03.000 –> 00:48:16.000
So you can see two different schools given over here: what classes or grades they have, what the class strength or number of students in each grade is, and whether or not a sports facility is available for these classes.
00:48:16.000 –> 00:48:26.000
So we’ll take a look at the prompts. So we’ll be extracting the name of the school and whether or not they have sports facilities. So this will be a simple yes or no extraction that I’m looking at.
00:48:26.000 –> 00:48:32.000
The second prompt will extract the name of the school along with the classes available, which will be returned as a string array.
00:48:32.000 –> 00:48:42.000
The third prompt is looking to extract the specific strength of grade 10 from XYZ Public School. So this is 45.
00:48:42.000 –> 00:48:50.000
This is the number that we’re looking to extract. And finally, we’ll be extracting the strengths of the classes 11 and 12 from both the schools given.
00:48:50.000 –> 00:49:10.000
So again, when this was first uploaded to Gemini and we passed in the prompts, we were able to get the outcome, but it was incomplete, because Gemini was only able to pull out the outcome for the first prompt given, whereas the remaining prompts were pretty much incomplete. So this was the observation with Gemini.
00:49:10.000 –> 00:49:17.000
Moving on to ChatGPT and Claude, however, we were able to retrieve all of the outcome and we did verify it to be pretty accurate.
00:49:17.000 –> 00:49:26.000
So you see over here, we’ve retrieved even the class strength and all the other outcome that you saw for the different prompts.
00:49:26.000 –> 00:49:38.000
And the same goes for Claude. So another observation I’d like to bring to your notice over here: we did notice that, whilst Gemini was able to retrieve data from scanned documents,
00:49:38.000 –> 00:49:50.000
in most cases when there were condensed tables or nested tables during our exercise, Claude and ChatGPT did outperform Gemini in many instances.
00:49:50.000 –> 00:50:00.000
And just to show you how this parses through LLMWhisperer first: you have the LLM-ready document format and the remaining output that’s retrieved over here.
00:50:00.000 –> 00:50:17.000
Now, as a final document, I just wanted to show you this one to demonstrate the level of complexity that preprocessing can handle. So this is a pretty complex document, because it’s barely legible even to the human eye. You have
00:50:17.000 –> 00:50:34.000
a pretty intricate checked background and also handwritten text, which is not easily legible. But I can see that this is a letter made out to somebody, and there are a few sender details over here as well. And there is a serial number that you’ll find on the bottom right.
00:50:34.000 –> 00:50:48.000
So, using these details, here are the prompts that I gave to the models. We’re looking to extract the text from this letter, the sender details, as well as the serial number that we saw in this particular document.
00:50:48.000 –> 00:51:04.000
So, upon giving it to Gemini, we did notice that it was able to retrieve the text perfectly well, as well as the sender details. However, the serial number was again incorrectly retrieved: you can see that over here, it starts with a 340,
00:51:04.000 –> 00:51:19.000
whereas, looking at the document, you can see it clearly starts with a 240. So this is again an incorrect output. Moving on to ChatGPT and Claude: again, they could not process this particular document, as it was heavily handwritten.
00:51:19.000 –> 00:51:28.000
And Claude gave us a similar response where it found it extremely difficult to decipher since the image quality is poor and the handwriting is also not clearly legible.
00:51:28.000 –> 00:51:43.000
So, again, taking a look at this with LLMWhisperer: after our preprocessing, you can see that all of your output is being extracted, and we even have the serial number extracted accurately.
00:51:43.000 –> 00:51:55.000
Again, this extraction was done using a GPT model, but the only difference made over here is the preprocessing stage that we’ve been looking at for a while.
00:51:55.000 –> 00:52:09.000
So that sums up the various documents I wanted to show you, folks. I wanted to give you an example of how these models work across documents of varying complexity and format.
00:52:09.000 –> 00:52:15.000
Now, moving back to the presentation, here is an outcome analysis of this exercise.
00:52:15.000 –> 00:52:28.000
So you can see the four documents that we’ve just taken a look at. With a fairly simple, neatly formatted document that was digitally native, like the credit card statement, you could see that all the LLMs were able to perform pretty well.
00:52:28.000 –> 00:52:36.000
However, when it came to a scanned document, whether it was a loan application or the letter which was a bad scan.
00:52:36.000 –> 00:52:45.000
We were able to see that ChatGPT and Claude kind of dropped out of the race, whereas Gemini was able to extract data, but it was only partially correct.
00:52:45.000 –> 00:52:59.000
And with nested tables, Gemini did struggle in a few cases, even with the other documents we used for this exercise, whereas ChatGPT and Claude were able to perform fairly better.
00:52:59.000 –> 00:53:08.000
So when it comes to an enterprise use case, you are going to be dealing with documents that come in various layouts and formats.
00:53:08.000 –> 00:53:17.000
And looking at this analysis, it is going to be impossible to deploy a specific LLM for each document format that you encounter.
00:53:17.000 –> 00:53:33.000
Which is why, again, with preprocessing, you can actually make a real difference with the state-of-the-art technology that we have right now when it comes to extracting data from your documents. And that is what we saw with this exercise.
00:53:33.000 –> 00:53:37.000
So that, folks, was the final segment of our webinar today.
00:53:37.000 –> 00:53:58.000
Just to sum it up, we took a look at the various document extraction challenges that we face, how LLMs have risen to the occasion and how they are really differentiating themselves in the market. And we went on to take a look at the necessity for a preprocessor tool like LLMWhisperer. We looked at
00:53:58.000 –> 00:54:10.000
how it can be deployed and the various features it supports. And finally, we even saw a face-off between extracting data from your documents using a preprocessor versus not using one.
00:54:10.000 –> 00:54:23.000
Now, if you would like to explore LLMWhisperer in more detail, you can always sign up for the platform and use the playground that we have to offer for you to test your own business documents.
00:54:23.000 –> 00:54:26.000
And you can also go through the extensive documentation that I had shown you.
00:54:26.000 –> 00:54:42.000
Another way to explore the platform would be to sign up for a personalized demo. This would be a free demo that we offer where one of our experts will be able to sit with you and understand your specific business use cases and see how LLMWhisperer can be deployed in your business.
00:54:42.000 –> 00:54:47.000
So in case you are interested in this demo, please do drop your email IDs in the chat tab.
00:54:47.000 –> 00:54:59.000
And we’ll be able to reach out to you proactively. So with that said, this webinar has come to an end and we will be sending the recording of this session to all our registrants.
00:54:59.000 –> 00:55:18.000
So in case we have any questions, we can take up the Q&A as well.
00:55:18.000 –> 00:55:24.000
All right, I can see that a few questions have already been answered.
00:55:24.000 –> 00:55:29.000
Naren, is this it? I think we can close this session, right?
00:55:29.000 –> 00:55:42.000
Yeah, there’s one other question, just so I understand more of what’s happening under the hood.
00:55:42.000 –> 00:55:54.000
So, Jason, there’s a question on whether you can simulate the workflow by first extracting the text from a PDF, saving it to a file, and then uploading it to a model. Yeah, so LLMWhisperer just does the text extraction.
00:55:54.000 –> 00:56:12.000
But with the Unstract platform, you don’t have to do this; it does it end to end. So the first step is the text extraction, and then that text output is sent to the large language model, and then the structured extraction happens.
00:56:12.000 –> 00:56:20.000
So you don’t have to do this bit by bit; the Unstract platform can actually do this end to end.
00:56:20.000 –> 00:56:29.000
So you can check it out: you can sign up for a free trial of the Unstract platform and try it out. Yeah.
00:56:29.000 –> 00:56:35.000
All right. Thank you, Naren, and thank you, everybody. I think that sums up all the questions that we had.
00:56:35.000 –> 00:56:46.000
So yeah, again, a reminder, please leave your feedback on your way out. And we’re really looking forward to seeing you at our upcoming sessions and events. Thank you so much. Have a great day.
00:56:46.000 –> 00:56:55.000
Thank you.