[00:00:00]
Hi everybody. Hope you’re doing well. I’m Mahashree, Product Marketing Specialist at Unstract, and I’m joining you from Chennai today. Here at Unstract, we speak to customers across different industries on a daily basis, each with their own unique document processing needs. Unstract is a document-agnostic platform and can be deployed across a wide range of use cases.
But one of the major industries that we often deal with is finance,
[00:00:30]
and that is the reason we have this webinar today. In this session, we’ve tried to cover answers to the most common questions we hear about finance document ETL. If you’re part of a finance team, you’ll know that document extraction isn’t just about pulling data from documents and automating workflows.
There are more layers of complexity: the sheer volume of documents finance teams deal with, and the absolute need to get extraction right, because even a small error can have significant
[00:01:00]
downstream consequences. And of course, there’s the pressure of the compliance requirements the industry is driven by.
So over the next few minutes we’ll explore some of the major challenges in finance document processing, and we’ll also look at practical ways to address them using a platform like Unstract. So let’s get started.
So here are a few session essentials I’d like to go over before beginning. So number one, all attendees in this webinar will
[00:01:30]
automatically be on mute throughout the course of the session. In case you have any questions, you can drop them in the Q&A tab at any time during the session. Our team is monitoring the Q&A and will get back to you with answers via text.
You can also interact with fellow attendees in the chat tab. This is also where you can let us know in case you run into any technical difficulties during the webinar. And as a final point, when you exit the session, you’ll be redirected to a feedback form; I request you to leave a review so that we can continue to improve our
[00:02:00]
sessions going forward.
So that said, here is the agenda for today. In this session, as I mentioned earlier, we’ll be covering the most common challenges in finance document ETL, delving into each of these challenges in more depth. We’ll be addressing them with best practices that you can follow to overcome them, along with a live demo of Unstract in action and the various capabilities the platform brings forward to overcome these challenges in finance
[00:02:30]
document ETL. And finally, in case we have any questions remaining, we’ll also move into a live Q&A where one of our experts will be on air to answer your questions. So that said, let’s dive right into the finance document ETL challenges. As you can see, we have three challenges over here.
So we’ve bucketed the overall challenges into three categories. Firstly, we have varying unstructured document volumes.
[00:03:00]
Over here we are talking about the sheer volume of documents that finance teams encounter on a daily basis. Finance teams get documents coming in from various sources: bank statements, cash flow statements, P&L reports and so on.
So how do you handle the variety of documents that you have? Secondly, we have accuracy and compliance requirements as a challenge. And thirdly, we have integrating with core systems. So let me get started with the first
[00:03:30]
challenge, and then we’ll move on and cover each of these one by one. So up first we have varying unstructured document volumes.
As I mentioned earlier, we are dealing with finance reports, balance sheets, cash flow statements, aging reports, and so many other reports and documents that we need to look at on a daily basis and extract data from. So how do you deal with the variety? Each of these documents is going to come in a different layout.
And not only that, they contain
[00:04:00]
a large amount of unstructured data as well. And these documents can change their layout over time. If we take an invoice, for instance, each and every one of your vendors is going to follow a different layout for the invoice. So you have to keep up with the layout.
And again, the format might change with time. So ideally, to function at our best, we should be looking at a one-stop solution for finance document processing. But let’s quickly
[00:04:30]
take a look at the various technologies we have out there for document processing, and compare them with one another to see which one would fit best for the kind of chaos that finance teams end up dealing with when it comes to document extraction.
So over here we have a comparison of the major technologies out there. You might be familiar with these technologies, but for those of you who might be completely new to this,
[00:05:00]
let me just quickly go over them one by one. First up we have OCR, which is one of the oldest document processing technologies.
It’s lightweight to deploy, and it basically only extracts the text from your documents; it provides you the text that was present in the documents, but it does not necessarily understand any of that text. So, for instance, if I were to process an invoice through OCR and I’m looking to extract the invoice number, I would probably get a text dump
[00:05:30]
from the system.
But the system would not identify which one of these pieces of text is the invoice number. So you’ll have to define post-processing, and this makes it really heavy when you start working with it. It’s also very difficult to scale, and OCR is sensitive to document quality as well.
So while it is a best fit for simple digitization tasks like turning documents into text, you cannot look at adopting this in the
[00:06:00]
long run, and especially not for the kind of variety of documents finance teams are dealing with. From traditional OCR, OCR with rules evolved. Over here, the OCR technology was now able to identify the exact data you’re looking to extract out of the text dump it retrieved.
So how this works is that, as a user, you define certain rules for the technology to identify where particular data could be
[00:06:30]
present on a given document. So, for instance, if I take an invoice again, I could define a rule that the invoice number is usually in the top right corner, and the OCR would be able to fetch that for me.
So this would work really well if the documents are highly consistent, but even with minute variations in layout, this system is bound to break, which is why it comes with really high maintenance and it’s not really scalable for document variety.
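To make the brittleness concrete, here’s a minimal sketch of what a zonal OCR rule amounts to. The `ocr_words()` helper, field names, and zone coordinates are all hypothetical, purely for illustration:

```python
# Minimal sketch of rule-based ("zonal") OCR extraction.
# ocr_words(path) is a hypothetical helper returning recognized words
# as [{"text": str, "x": float, "y": float}, ...] in normalized
# page coordinates (0.0 to 1.0).

RULES = {
    # Illustrative rule: for this vendor's layout, the invoice number
    # usually sits in the top-right corner of page one.
    "invoice_number": {"x_min": 0.70, "y_max": 0.15},
}

def extract_field(words, rule):
    """Return the text of all words that fall inside the rule's zone."""
    hits = [
        w["text"]
        for w in words
        if w["x"] >= rule["x_min"] and w["y"] <= rule["y_max"]
    ]
    return " ".join(hits) or None

# invoice_number = extract_field(ocr_words("invoice.pdf"), RULES["invoice_number"])
```

The fragility is visible right in the code: the moment a vendor moves the invoice number, the hard-coded zone silently picks up the wrong text.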
[00:07:00]
Moving on from OCR with rules, we had the emergence of ML models, where machine learning models were fed with a large number of documents.
They would learn from these documents, identify patterns, and figure out where a particular piece of data is likely to be present on a given document. So depending on the kind of training set you use, the ML models were able to handle a larger umbrella of documents than traditional OCR
[00:07:30]
as well as OCR with rules.
But again, it really depends on the kind of documents you include in the training set. If I were to give a completely new or completely unstructured document to the ML model, it would again struggle; it would run into edge cases, and you would probably have to retrain the model for the new layouts you feed it.
So while this was functional to a certain extent, it is again difficult for the kind of documents that we keep encountering on a daily
[00:08:00]
basis, and you’ll have to keep coming back and retraining the model. Currently, when we look at businesses worldwide, most fall under these two categories: OCR with rules and ML models.
They might be using these technologies for document processing, but this is also because these technologies have been around for a while. What we are seeing today, however, is that with large language models entering the space, document processing is really being disrupted, because
[00:08:30]
LLMs today not only extract the text from documents but go beyond that to understand what your text is conveying, what the context of the document really is. These models can handle unseen layouts, and they require minimal setup, with no training or predefined templates at all.
They can be looking at a document for the first time and still be able to understand the context and retrieve the text. This is why LLMs are disrupting the
[00:09:00]
space, and because they emerged more recently, we are seeing more and more businesses exploring LLMs for document processing and slowly transitioning to this technology.
That said, LLMs do come with their own set of limitations. For instance, when you’re dealing with a large set of documents, or when you have documents with hundreds of pages, they may hallucinate or misinterpret data. They can be expensive at scale and can also require a human in the loop for
[00:09:30]
compliance-critical use cases.
So while they do come with their own limitations, when you couple LLMs with a platform like Unstract, you also have access to capabilities with which you can seamlessly overcome these limitations. I would say that out of the various technologies out there, for the kind of document chaos that finance teams end up dealing with, LLMs would be the best match, as they are really useful in handling unstructured data and a high
[00:10:00]
variety of layouts as well.
So if you are a business looking to transition to LLMs or to explore this technology for your particular needs, or if you are currently using one of these other technologies and looking at how you can transition, you can set up a call with one of our experts and we can help you figure it out based on your particular use case.
I’ll ask my team to drop a link to the demo request form, and you can fill it out and we can get
[00:10:30]
started in case you’re looking to transition. So that said, let me move on to covering Unstract. We’ve looked at the various technologies out there, so let me now cover how Unstract drives LLM-powered document processing.
For those of you who are completely new to the platform, let me quickly give you an introduction to what Unstract is all about before we move into the demo segment. Unstract is an
[00:11:00]
LLM-powered unstructured data ETL platform. If I were to bucket the capabilities of the platform, we’d have three major categories:
text extraction, the development phase, and the deployment phase. The first thing the platform does when you upload a document is extract the raw text from it. It does this using a text extractor that you can connect with, and one of the popular choices we see among our customers is LLMWhisperer,
[00:11:30]
which is also Unstract’s in-house text extractor tool.
This works well hand in hand with the platform. It’s able to extract raw text from the document and pre-process it into a format that is consumable by LLMs, so it is known to generate LLM-ready output. LLMWhisperer can also be deployed as a standalone application, depending on your needs.
You can use it in the cloud setup or as an API as well.
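Just to give you a feel for the API route, here’s a minimal sketch of the submit-and-poll flow. The endpoint paths, header name, and status values are assumptions based on the LLMWhisperer v2 docs, so verify them against the current documentation before using this:

```python
# Sketch of calling LLMWhisperer over HTTP to get layout-preserved text.
# Paths, headers, and status values are assumptions from the v2 docs.
import time
import requests

BASE = "https://llmwhisperer-api.us-central.unstract.com/api/v2"
HEADERS = {"unstract-key": "YOUR_API_KEY"}  # placeholder key

def whisper(path: str) -> str:
    # Submit the document for asynchronous extraction.
    with open(path, "rb") as f:
        resp = requests.post(f"{BASE}/whisper", headers=HEADERS, data=f)
    resp.raise_for_status()
    whisper_hash = resp.json()["whisper_hash"]

    # Poll until processing finishes, then retrieve the text.
    while True:
        status = requests.get(
            f"{BASE}/whisper-status",
            headers=HEADERS,
            params={"whisper_hash": whisper_hash},
        ).json()
        if status["status"] == "processed":
            break
        time.sleep(2)

    return requests.get(
        f"{BASE}/whisper-retrieve",
        headers=HEADERS,
        params={"whisper_hash": whisper_hash},
    ).text  # layout-preserved raw text

# print(whisper("balance_sheet.pdf"))
```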
[00:12:00]
Once you extract the text from the documents you upload into the platform, you move on to defining prompts in natural language that help you extract the relevant data from your document.
This is done in a prompt engineering environment called Prompt Studio, where you enter prompts that define, one, the data you’re looking to extract and, two, the schema you want that data extracted in. You can also test these prompts across multiple document variants to see how they’re
[00:12:30]
working.
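Just as a hypothetical illustration of what a Prompt Studio field definition boils down to (the field name, prompt wording, and schema here are made up, not from the demo project):

```text
Field: invoice_summary
Prompt: Extract the invoice number, the invoice date, and the total
        amount due from this document.
Schema (JSON):
{
  "invoice_number": "string",
  "invoice_date": "YYYY-MM-DD",
  "total_amount_due": "number"
}
```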
You also have access to various other accuracy-enabling and cost-saving capabilities within Prompt Studio itself; we’ll quickly go over them in the demo segment. So once you define all the prompts in Prompt Studio and you’re happy with how your data is being extracted, you can then deploy this
particular project using any of the four deployment options we currently offer, choosing whichever one best suits your use case.
[00:13:00]
You can go for an API deployment, an ETL pipeline, a task pipeline, or a human-in-the-loop deployment. And to sum up, some numbers that throw light on where the platform stands today: we have over 5.5k stars on GitHub, a 950-plus member Slack community, and we currently process over 8 million pages per month for paid users alone.
So that said, let me move into the demo segment. I’ll quickly take you through Unstract, how it looks and how you can really
[00:13:30]
get started.
So what you see over here is the Unstract interface. If I were logging in for the first time, I would have to set up certain prerequisite connectors to get started. As I mentioned earlier, this platform is LLM-driven, so you can choose from a range of popular LLMs to set up within the platform and start using.
You can see I have four LLMs set up over here.
[00:14:00]
You’ll similarly have to set up vector DBs, embedding models, and text extractors. This is where you’ll find LLMWhisperer, and you also have access to other text extractors that you can set up. Once this is done, you are good to go, and you can get started with uploading your documents and extracting your data.
Now, as I mentioned earlier, the first step in the process is to extract the text from your documents. You can do this either within Unstract
[00:14:30]
itself or using LLMWhisperer as a standalone application. We’ll explore both in this webinar and see how it’s done. So firstly, let me get into an existing Prompt Studio project that I have over here, which is a finance report for Unilever.
So we have the document over here, which is the financial report, and we also have a bunch of prompts
[00:15:00]
that extract the relevant data I want out of this report. You can go through the prompts over here. The first step, as I mentioned, is to extract the raw text and convert it into a format that is LLM-ready.
You can see how the text extractor was deployed in this particular project under Raw View. This particular document is around 240 pages long, and across all these pages, LLMWhisperer was
[00:15:30]
deployed and we’ve been able to extract the raw text while preserving the layout of the original document.
This layout preservation is what brings attention to LLMWhisperer, because LLMs process the information in documents very much like humans would. So the best way to preserve the complete context of the original document when sending it to the LLMs for data extraction is to preserve the original layout, and that is what
[00:16:00]
LLMWhisperer has been able to do over here with the original document. This is really why it has been a popular choice among customers: it leads to accurate results because it preserves the complete context. So this is how, within Prompt Studio, you can check the output of LLMWhisperer. However, as I mentioned earlier, finance teams deal with a range of documents, and these documents are not always digitally
[00:16:30]
clean.
You might be dealing with scans, documents with handwritten text, skewed scans, scans with bad lighting, disoriented documents; there’s just so much variety that comes into play. So let me move into the LLMWhisperer playground, where you can upload documents of your own once you sign up.
We also have a free playground that you can access, where you can upload around a hundred pages a day and get the
[00:17:00]
text extraction done for free. I’ll just show you a couple of documents in this demo, and you can see how the text is extracted from them.
So in this case, I’m going to upload a document of my own.
Firstly, we have a balance sheet that I’m going to upload. Let me show you how this balance sheet looks.
[00:17:30]
So this is the document we’re dealing with over here. Going back to the platform, I’ve uploaded it, so let’s just see how the results look.
As you can see, LLMWhisperer has been able to retrieve the raw text from this particular document while preserving the original layout. And this is what then moves on into downstream
[00:18:00]
operations, when you deploy an LLM over this context, the extracted text you have over here, for data extraction using prompts.
So this is a balance sheet. Let me explore other types of documents as well. For instance, I have a document over here with some handwritten text: a tax form that is filled in by hand. This is the document we’re dealing with. It is a scanned document, and it has some text fields where values were entered by
[00:18:30]
hand.
You can also see a checkbox over here; a particular box has been checked in this form. If this were an electronic form, this would be a radio button. So let’s see how LLMWhisperer works on this, because when it comes to scanned documents and handwritten text, this is where many systems tend to fumble, or it can be difficult for many systems to extract this
[00:19:00]
accurately.
So over here we again have the layout preserved and the text extracted, and you can see how the checkboxes are handled. We have four checkboxes, and the one that has been checked is marked with an X symbol. This is the context that is ultimately sent to the LLM for it to extract data from.
And just to bring your attention to the various values that were entered by hand, we have the
[00:19:30]
handwritten text extracted over here as well: the name of the plan, the employer’s name, all of this extracted perfectly well. So this preserves the complete context of the document, and it won’t be lost when you later deploy an LLM for extraction.
Just to go over a few other documents, we also have a bunch of pre-uploaded documents that you can access. For instance, in this case we have a receipt with some bad lighting, and we also have an oil
[00:20:00]
stain over here. So let’s just see how the platform works; this is just to give you an idea of how it handles different document types.
The platform can also process documents in different formats: you could be uploading an Excel sheet, images, PDFs, or Word documents and still get the output, which you can see over here. And just a final document that I’d like to go over before continuing.
We have a
[00:20:30]
disoriented ID card scan. In this case it is disoriented, and the text is not really legible either, so let’s just wait and see how the text is extracted.
So this is the output that I have over here. This is how LLMWhisperer works on the cloud deployment, and as I mentioned, it can also be deployed as an API. So let me quickly show you how this works as an API before we move forward.
[00:21:00]
Let me open Postman, and we’ll be uploading an Excel sheet to show you how this works,
just to go for a different document type. This is basically an expense report. Just to show you what this report is, let me open it for you: we have an expense report with some employee details, the expense summary, and a detailed log along with the approval status.
[00:21:30]
So this is a table that we’ll be uploading via Postman, and let’s see how LLMWhisperer works. I’m going to send this document in,
and you can see that it has been extracted successfully. Right here we have the output, with the layout preserved again. So this is how my
[00:22:00]
tables are extracted, and it’s the same when we’re dealing with multi-layer columns and complex tables as well. Once you get to this stage, once the document is pre-processed into an LLM-ready format, the rest of the operations actually become pretty easy.
So with that, we’ve seen how we can handle different document types. No matter what kind of document or layout you’re dealing with, or whether you have unstructured data, you can still extract the relevant text. We’ll
[00:22:30]
now move into the Prompt Studio project to see the stage after this:
how do you define prompts, and how do you extract the relevant data you’re looking for? Let me move back into the platform, to the Prompt Studio project we explored earlier. On the right-hand side I have the report, a pretty lengthy one, and we have defined a couple of prompts on the left.
So in each of these prompts, if you
[00:23:00]
pay close attention, we have defined the data we’re looking to extract along with the schema we want the extraction to follow. In this case, I’m extracting some general information in this prompt, like the company name, the year this report belongs to, the top three performing industries or categories, the overall turnover, and the growth percentage.
We also have some financial highlights extracted over here, and the performance by sector, so what the performance was across each of the sectors,
[00:23:30]
the growth rate comparison, and some data from the balance sheet as well as the cash flow statement present in this financial report. So once I have this data extracted and I’m happy with it, I can move on to deploying this project as a tool.
However, I just wanted to bring your attention to one capability over here, which I have enabled within this project. So if I were to select, let’s
[00:24:00]
say, the number of suppliers, you can see that this output the system has brought me is selectable. And when I actually click on it,
the system immediately highlights the spot where this particular output was fetched from on the original document. This is what’s called source document highlighting, and how it works relies on another capability of LLMWhisperer that I missed covering:
[00:24:30]
LLMWhisperer not only extracts the text from your documents but, when enabled, can also return confidence scoring. For each piece of text you see over here on screen, you can get a score from the system on how confident it is that this is in fact the text present in the original document.
You also get the confidence scoring for each piece of text along with the bounding box coordinates. You can see that we have line coordinates over here, so the bounding box coordinates are for each particular piece of
[00:25:00]
text present on the document. Using these coordinates in downstream operations is how you can perform source document highlighting.
So using those coordinates, when I click on a particular output, the system is able to immediately highlight where that data was fetched from. We do have a demo segment where we’ll dive a little deeper into confidence scoring and bounding boxes, and we’ll see how this works.
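Conceptually, the highlighting boils down to a lookup like the following sketch, where the metadata shape is illustrative rather than the exact LLMWhisperer schema:

```python
# Sketch: map an extracted value back to the page region it came from.
# The line metadata shape here is illustrative, not the exact schema.

def find_source_bbox(value, line_metadata):
    """Return (page, bbox) of the first line containing the value."""
    for line in line_metadata:
        if value in line["text"]:
            return line["page"], line["bbox"]  # (x1, y1, x2, y2)
    return None

lines = [
    {"text": "Number of suppliers: 52,000", "page": 17,
     "bbox": (102, 540, 388, 556)},
]
# A UI layer would scroll to this page and draw a rectangle here.
print(find_source_bbox("52,000", lines))  # (17, (102, 540, 388, 556))
```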
So, just to quickly cover the rest of Prompt Studio,
[00:25:30]
we have a bunch of settings over here that you can explore. We have summarized extraction, which is a cost-saving feature, and LLMChallenge. Highlighting is what you just saw, when we clicked on the output and it highlighted where that particular data was fetched from on the original document.
You can explore these features by going through our extensive documentation, and we’ve even done separate webinars on each of these features to cover
[00:26:00]
them in more depth. For the sake of time in this session, I’ll wrap up my time on Prompt Studio here. In this case, I’m happy with how my data is being fetched from the document,
and I’m okay with deploying this for business use. So once you’re happy with your Prompt Studio project, you can export it as a tool and then use it to create an API deployment, an ETL pipeline, a task pipeline,
[00:26:30]
or also define human in the loop for this particular extraction.
In this case, I’ve already exported this project as a tool. Let me move to the workflows, where I can show you how, once the tool has been exported from Prompt Studio, it can be used to create the deployments we just spoke about. In the workflows over here, we have a bunch of workflows I’ve already created,
but I’m interested in the finance report ETL pipeline that I created for
[00:27:00]
this particular session. ETL pipelines are for when your use case demands that you get the document from a file system. This could be Google Drive, Dropbox, or any of the options we have over here. In this case, you just have to set the folder I’m going to get this particular document from,
whether or not I want to process subfolders, and the maximum number of files I want to process. So once I set this up and authenticate the connection,
[00:27:30]
I have set up the input configuration for the file system. Next, I’ll have to select the tool that I want to deploy on those particular files.
So for whatever documents I’m getting from that configured folder, I can choose from various tools. In this case, we have enabled the Finance Reports tool, which was the Prompt Studio project you just saw. And you also have further tool settings over here, like whether or not you want to enable advanced features like
[00:28:00]
LLM Challenge, highlighting and so on.
So once the document comes in, the tool works on it, the data is extracted, and then you can configure it to be sent to a destination DB. Over here, I have configured the destination DB and the table that I want this particular data to be stored in after extraction.
We can also define a human-in-the-loop layer within
[00:28:30]
the ETL pipeline, which I’ll come to in a little while. So this is basically how the entire document pipeline works. You can either go for ETL pipelines, which is what we saw over here, or you have API deployments as well.
In this case, we have a credit card statement parser, which I’ve used in an API deployment. Here, the document influx is coming in from an
[00:29:00]
application. So we have configured the input as API; documents go through the data extraction powered by the credit card parser tool, and the output data is then sent to another application.
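As a rough sketch of what calling such an API deployment could look like from client code (the URL path, header, and field name are illustrative assumptions; the platform shows the exact endpoint and API key when you create the deployment):

```python
# Hypothetical client call to an Unstract API deployment; the URL,
# auth header, and field name below are placeholders for illustration.
import requests

URL = "https://your-unstract-host/deployment/api/your-org/credit_card_parser/"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder key

with open("statement.pdf", "rb") as f:
    resp = requests.post(URL, headers=HEADERS, files={"files": f})

resp.raise_for_status()
print(resp.json())  # structured fields extracted by the deployed tool
```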
So depending on your use case, you can choose from any of the deployment options we have. This again sums up how you can handle different document variants and test your prompts across multiple documents. So
[00:29:30]
in the Prompt Studio project we saw earlier, we had just one document.
But if I were to explore the credit card parser project over here, you can see that we have multiple documents uploaded into this particular project. These are credit card statements from two or three different banks, so they are bound to follow different layouts and formatting.
However, I have the same prompts over here on the left-hand side. So let’s just check how this works. Over here we are looking at the
[00:30:00]
statement from Bank of America. If I look at the statement from American Express, you can see that the layout has completely changed.
However, the prompts remain the same, and they are still able to successfully retrieve the output from the document. So this is how you can upload multiple documents, even if they have varying layouts, and your prompts still remain generic. That, I think, sums up how you can handle the document chaos in finance teams,
[00:30:30]
with varying layouts, unstructured data, and different document varieties.
So going back to the presentation, let me move on to the next challenge, which is accuracy and compliance requirements.
When we talk about finance document processing, accuracy isn’t just about getting numbers right on a page; it’s about what actually happens when these numbers or the extracted data
[00:31:00]
go wrong. Let’s say there’s a small data extraction error in an invoice, or a misread figure in a financial report.
This can cause downstream consequences. It can lead to payment delays or reporting inaccuracies, and ultimately the firm’s financial reputation is at risk. That’s why accuracy is always the first battle we need to win when it comes to finance document ETL. And on top of that,
[00:31:30]
finance teams work under a strict compliance lens, and regulatory frameworks like IFRS or industry-specific audit standards leave almost zero margin for error. So how do we really close these accuracy and compliance gaps? At Unstract, we think of this as layering safeguards across the pipeline.
We have multiple safeguards, capabilities that you can lay out across your document extraction pipeline, to
[00:32:00]
ensure better accuracy and compliance. The good capabilities we have start right from layout-aware pre-processing.
The reason we consider this an accuracy-enabling capability is that when LLMWhisperer preserves the original layout of the document, you are sending the complete context of the document to the LLM for data extraction. This is how you ensure that the LLM is not missing out on any
[00:32:30]
details that it requires for the extraction to be done.
There are also certain popular capabilities in the market, like human in the loop, or HITL, which you see over here, as well as confidence scoring. We already touched upon these capabilities earlier, so in the demo segment we’ll see how they can be deployed in Unstract.
Finally, in this session we’ll also be exploring LLMChallenge, which is one of Unstract’s native capabilities. So
[00:33:00]
this capability basically implements LLM-as-a-judge. You would have two LLMs working on the same document and the same prompts, and only if they arrive at a consensus on the output is the data given to the user for further processing.
I’ve basically given you an idea of what these different capabilities are and what they do. So let’s actually move into the demo and see how this works in action.
[00:33:30]
So let me move back to the platform.
So again, these capabilities are spread across both LLMWhisperer and Unstract. We have the layout preservation mode, one of the powerful capabilities of LLMWhisperer that we already went over, and we spoke about how LLMWhisperer can also extract confidence scoring as well as bounding box coordinates.
Both of these pieces of data that you get from
[00:34:00]
the tool can be put to use when you’re looking to facilitate human review, and we’ll see how that is done. In Unstract, we’ll be exploring the human-in-the-loop deployment as well as LLMChallenge in this demo. So first up, let me show you how you can extract the confidence scoring and bounding box coordinates from LLMWhisperer.
For this, let me open up Postman again, and we’ll again be uploading a document over here to
[00:34:30]
perform this on. I’m again uploading the tax form you saw earlier, with the handwritten text and the checkbox. Let me send this document in for processing.
We’ll just wait a couple of seconds until this document is processed.
[00:35:00]
What you can basically do with the confidence score you extract from the document is, based on its value, set up thresholds on whether or not you want that particular data to go in for human review. We’ll see how that works with
[00:35:30]
an example. So you can see that the page has been successfully extracted.
Let’s now extract the confidence scoring and the bounding box coordinates. I’ve disabled the text from being extracted; instead, we’ll receive the confidence scores as well as the bounding box coordinates. You can see that we have the confidence metadata over here.
For each piece of text present on the original document, you have a confidence score. This is a score between zero and one, and
[00:36:00]
it gives you an idea of how confident the system is about this particular extraction. In this case, we have 0.6 as the confidence score for this particular text. In downstream operations, what I can do with this data is set a threshold of, let’s say, 0.7, and tell the system that if there is
data that has been extracted with a confidence score lower than 0.7, then I want that data routed for human review. So that is where this
[00:36:30]
data comes into the picture in ensuring better accuracy for the data extracted from your documents. And these bounding box coordinates that you have over here support the source document highlighting you’d seen earlier.
You can see how this has been extracted for all the text, and you have line metadata as well; these are the coordinates for the different lines present in this document.
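As a minimal sketch of that thresholding logic, assuming the confidence metadata arrives as a list of entries with a text and a confidence field (an illustrative shape, not the exact API schema):

```python
# Sketch: route low-confidence extractions to human review.
# The metadata shape is illustrative, not the exact LLMWhisperer schema.

CONFIDENCE_THRESHOLD = 0.7  # tune per use case and compliance needs

def route(confidence_metadata):
    """Split extracted items into auto-accepted vs. needs-review."""
    accepted, needs_review = [], []
    for item in confidence_metadata:
        if item["confidence"] < CONFIDENCE_THRESHOLD:
            needs_review.append(item)
        else:
            accepted.append(item)
    return accepted, needs_review

accepted, needs_review = route([
    {"text": "Plan name: Acme 401(k)", "confidence": 0.93},
    {"text": "Employer EIN: 12-3456789", "confidence": 0.60},
])
print(len(needs_review))  # 1 -> this item goes to the review queue
```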
[00:37:00]
So with this, you can see how LLMWhisperer provides the raw data, and now we’ll move into the Unstract platform
to see how this data can be put to use in a human-in-the-loop review. For that, I’m going to go back into my workflows. Human-in-the-loop can basically be added as a layer in either your API deployment or your ETL pipeline. In this case, let me open the finance report ETL pipeline that we had just
[00:37:30]
seen.
Earlier, I explained that you get the document coming in from a file system, you process it using the Finance Reports tool, and then you send the output data to the relevant destination DB. But when we include human in the loop, before the data goes to the destination DB, it passes through a layer of human review, where you can verify the extraction and make changes wherever required before sending it to the DB.
So how do you configure
[00:38:00]
this? I’m going to click on the output configuration, and you can see that I have the human-in-the-loop feature over here. You can see I’ve already set certain rules; rules are basically how you want to route your documents for human review.
If we look at an enterprise setting where we’re processing hundreds of documents, it becomes practically impossible to perform human review on all of them. So how do you decide what percentage of your documents
[00:38:30]
are actually routed for human review? For that, you can define the percentage
over here; it can be 30% or 40%, whatever suits your business use case best. Once you define this percentage, you can also define the logic: what logic does a document have to satisfy to fall into the 40% bucket for human review? Over here, we have a couple of fields you can define.
You can filter by confidence scoring: in this case, if
[00:39:00]
my general information, which I’ve extracted from the document, has a confidence score of less than 0.7, I want this document routed for human review. Or I can filter by value: in this case, you can see that if my general information contains a specific value, then I want this particular document routed for human review.
And when I have multiple conditions I want to define under rules, I can combine them with AND/OR logic. In this case, since we’ve selected OR, if either of these conditions
[00:39:30]
proves to be true, this particular document will be sent for human review. In this case, I’m going to set the percentage to a hundred because, for this demo, I’ve only uploaded one
document in the finance report folder that this ETL pipeline will fetch documents from; this is just for the sake of the demo.
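Conceptually, the rules you build in this screen amount to something like the following sketch; the field names and operators are hypothetical stand-ins for the UI settings, not Unstract’s actual schema:

```python
# Hypothetical representation of HITL routing rules; the field names
# and operators are illustrative stand-ins for the UI configuration.
import random

RULES = {
    "review_percentage": 100,  # share of documents eligible for review
    "combine": "OR",           # how multiple conditions are combined
    "conditions": [
        {"field": "general_information", "op": "confidence_lt", "value": 0.7},
        {"field": "general_information", "op": "contains", "value": "Unilever"},
    ],
}

def needs_review(doc):
    """Decide whether a processed document goes to the review queue."""
    if random.random() * 100 > RULES["review_percentage"]:
        return False  # not sampled into the review bucket
    checks = []
    for cond in RULES["conditions"]:
        field = doc[cond["field"]]
        if cond["op"] == "confidence_lt":
            checks.append(field["confidence"] < cond["value"])
        elif cond["op"] == "contains":
            checks.append(cond["value"] in field["value"])
    return any(checks) if RULES["combine"] == "OR" else all(checks)
```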
[00:40:00]
Once you’ve set the rules and know how you’re going to route documents for human review, you can move on to the settings.
Over here, you define where the data should go after the human review. Once your documents go in for review and you make changes if necessary, where do you push the data afterwards? You can either push it to the destination DB that you’ve connected, or push it back to the queue, which means the extracted (or altered) data after the human review sits in a queue where you can download it as a CSV or a JSON file.
We
[00:40:30]
again have another webinar, done specifically on human in the loop, and you can go over that to see how each of these steps is done one by one. In this case, I’m going to leave it at the destination DB and save my settings. So once this is done,
what you’ll have to do is run this workflow again; you can then move into the review interface and perform the review, because that is the added layer we’ve now configured. So again,
[00:41:00]
to perform review, you have access permissions you can set. Depending on whom in your organization you want to have the ability to perform the review, you can define access permissions.
The details of that are available in our documentation as well as the webinar we’ve done. In this case, let me just jump to the review, and I’ll show you how this works. I have access to the review interface, and I can choose from a bunch of document classes that are available for review over here.
So in this case,
[00:41:30]
I’m going to look at the finance report ETL, and let me just fetch the document.
All right, so you can see that I have the document over here: the complete document on the left-hand side and all the extracted data on the right. I’d like to pull your attention to specific data which is highlighted differently. As you can see, whenever the confidence score falls below a certain threshold, the system automatically highlights that portion for me.
So this
[00:42:00]
makes it easier for the human reviewer to look first at the data that could be wrong or incorrect, before verifying the rest. In this case, if I select this particular data output, you can see that the system automatically highlights the portion from which it was fetched.
And that is true for any other data. For instance, we have the number of logistics warehouses occupied by
[00:42:30]
Unilever over here. Once I select this, the system automatically highlights that specific portion of the original document. And beyond verifying where the data was fetched from, the reviewer can also edit the data before pushing it into the DB, if necessary.
In this case, we have the number of warehouses as 500. Let’s say the number given in the report is incorrect, and in fact the number is 523. So
[00:43:00]
I can change the value right here and click on save, and you can see that it has been changed. So when this data finally goes and sits in the DB, 523 is the value that will go in.
So once you update the data, I’ll have to finish the review over here, and you can check the queue details. Currently we have one review in progress, which is the document we’re looking at right now. Once I finish this review, you’ll see that it moves along the queue. So right now we
[00:43:30]
have one document which has completed the review.
In Unstract, we follow a two-layer review flow, where you’d have two people working on human review. One would review the document and make changes; then, just to double-check the data before it goes into the DB, you can also enable an approval workflow, where an approver would again oversee the extracted data. They have the same controls over the document
extraction, so they
[00:44:00]
can again alter data if necessary, and once they approve it, that is when it finally goes into the destination DB. But if your use case doesn’t demand a second layer of checking, you can choose to disable the approval workflow as well. In this case, I’ll show you how the approval workflow works.
Once my review is done, if you have access to approve documents, you can click on approve; again, you’d have to choose the document class and click
[00:44:30]
on Fetch Next so that you receive the document.
All right, so you can see in the approval interface that the value has been edited: instead of 500, we have 523 warehouses, which is reflected over here. Again, I have the same controls; I can perform source document highlighting or even change the values as I want before sending this into the DB.
[00:45:00]
In this case, I’m just going to approve this particular document, and once I do, it will be sent into the DB. Let me show you how the DB looks right now: we have a table over here with three rows already added. So when I approve this workflow,
you can see that it has successfully synced into the destination database. So let’s go back to the database; I’ll just refresh the
[00:45:30]
page for you to see. You’ll see that a new row has been appended: a fourth row has been added to the table. And if I just inspect the data I have over here,
you can see the change we made; we’d changed the warehouse count from 500 to 523, and that change is reflected over here. So this is how human review works, and this is how you can add a layer of accuracy to
[00:46:00]
finance document extraction, because even small errors in the extraction could lead to significant consequences: wrong decisions, delays, and just a lot of confusion.
You can currently factor this in either in API deployments or in ETL pipelines. So that said, let me move back to the platform and go over the final accuracy-enabling capability we’d covered, which is LLMChallenge.
[00:46:30]
Let me show you how LLMChallenge works using the same Prompt Studio project.
This is where you test the capability and how it works before you deploy it. LLMChallenge, as we saw earlier, is available under the Prompt Studio settings. What you’ll have to do is set the challenger LLM. Under LLM profile, you can see that we have set up a GPT model as the extractor LLM, which
[00:47:00]
will primarily be running the prompts and extracting the data from your document. What you set
as the challenger LLM is basically another model that will run the same prompts on this particular document, and only if these two models arrive at a consensus is the output given to the user. So once I enable LLMChallenge over here and run the prompts, you will see,
when I inspect a prompt, that we have an LLMChallenge log available for each of the prompts. So when you
[00:47:30]
inspect this particular LLMChallenge log, you can see the conversation between the extractor run and the challenger run. The extractor LLM has basically retrieved this data for me from the document.
What the challenger run does is basically rate the extraction: it gives you a number showing how far it agrees with the particular extraction that was done. We have five given for a couple of extractions, and we also have three and four.
[00:48:00]
So depending on how far the challenger agrees with the extraction outcome, you can use that in downstream operations; only if it agrees is the user actually presented with the data retrieved by the extractor. And this is one of the major ways in which you can overcome LLM hallucinations and misinterpretation of data from documents.
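To make the consensus idea concrete, here’s a minimal LLM-as-a-judge sketch. The call_llm(model, prompt) helper, model names, and threshold are hypothetical; only the 1-to-5 agreement scale mirrors what you saw in the challenge log:

```python
# Minimal LLM-as-a-judge sketch. call_llm(model, prompt) is a
# hypothetical helper that returns the model's text response.

EXTRACTOR = "gpt-4o"       # primary model running the prompts
CHALLENGER = "claude-3-5"  # independent model judging the output
AGREEMENT_THRESHOLD = 4    # accept only strong consensus (1-5 scale)

def extract_with_challenge(document: str, field_prompt: str):
    answer = call_llm(EXTRACTOR, f"{field_prompt}\n\n{document}")

    judge_prompt = (
        "On a scale of 1-5, how strongly do you agree that the answer "
        f"below is correct for this instruction?\n"
        f"Instruction: {field_prompt}\nAnswer: {answer}\n\n{document}\n"
        "Reply with a single integer."
    )
    rating = int(call_llm(CHALLENGER, judge_prompt).strip())

    # Surface the value only on strong agreement; otherwise flag the
    # field for review instead of risking a hallucinated answer.
    return answer if rating >= AGREEMENT_THRESHOLD else None
```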
So with that, folks, we’ve covered all the
[00:48:30]
capabilities that we wanted to cover for accuracy in document extraction workflows. Let me move back to the slides.
So we’ve arrived at the third challenge: integrating with core systems. One of the biggest challenges we see in finance operations today is that you can run into siloed processes. Data often
[00:49:00]
lives in disconnected systems. You would have various tools in play when you’re looking to process
finance documents, and you could have tools for other purposes within finance operations as well. All these tools need to be connected in the document processing pipeline. If they’re not seamlessly connected, and the flow is not smooth, you could have data stuck in different sources,
you could have duplication of information, and you
[00:49:30]
could be chasing information that should already be available at your fingertips. These silos become bottlenecks that drag down efficiency. That is why core systems like ERPs or accounting platforms are important: they act as a single source of truth for your organization’s financial data.
If the extracted document data doesn’t make its way reliably into these systems, you’re left with partial records or inconsistent
[00:50:00]
reporting. This is where integration becomes a game changer: once you have the ability to seamlessly integrate with all these different tools in play, you can automate the entire workflow.
You not only speed up the processes, but you also gain tighter control over them. Instead of scattered or manual entries, everything flows seamlessly into the right place. This again reduces the risk of incorrect data and frees up finance teams to
[00:50:30]
focus on higher-value tasks, leaving the heavy lifting of data processing to the system.
Currently, as we looked at earlier, the platform natively supports integrating with various connectors: in the ETL pipeline, we saw the various file system connectors you could integrate with, and the various databases you could integrate with.
Again, you can use APIs as well. However, if you need to use tools that are
[00:51:00]
not available currently in the platform, or you have more advanced business logic than what is natively available within Unstract, we are also stepping into agentic workflows: we integrate with platforms like n8n, and we also support MCP servers for LLMs, which help you connect with a
wider variety of tools that you might be using in finance operations. We’ll take a look at what they are. So you saw how
[00:51:30]
we had the deployments for the ETL pipeline and API pipeline, and you can actually automate these on the Unstract platform as well. If I look at the ETL pipelines over here, you can see that each of these pipelines has a frequency at which it is triggered on a periodic basis.
You can edit a pipeline and define the cron schedule, specifying how frequently you want this particular flow to be triggered. So that is how you automate the native
[00:52:00]
deployment options that you have present within Unstract.
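For reference, cron schedules use the standard five-field syntax; a couple of illustrative examples (not from the demo):

```text
# minute  hour  day-of-month  month  day-of-week
0 */6 * * *     # trigger the pipeline every six hours
30 1 * * 1-5    # trigger at 01:30 on weekdays only
```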
But again, if you’re dealing with a broader business use case where more complex or advanced logic comes into play, that’s when you’d probably want to go for a platform like n8n or MCP. We again have extensive webinars on each of these topics that you can explore.
I’m sure the team will have dropped the links in chat, so you can look at the webinar links
[00:52:30]
and see this in more detail. But let me just show you how this works for now.
So we have the n8n deployment over here. What n8n basically does is let you seamlessly connect multiple tools and set up a visual flow of how you want your operations to run. In this case, we have processed an invoice. What we have set up over here is this:
[00:53:00]
we are getting the invoice via a Gmail trigger, which means that if a new invoice arrives in the inbox, the system automatically picks it up.
And we have an Unstract node over here that I can set up within n8n; you’ll find the Unstract node among the community nodes in n8n. When I open this node after a run, you can see the input data and the output data after the document passes through this node.
So once you get the output over here, what
[00:53:30]
this particular node does is evaluate whether the incoming document is in fact an invoice or some other junk document. If it is an invoice, we send it further down the workflow; otherwise it is sent to a relevant Slack channel over here.
So this is how you can integrate multiple tools; we are integrating Slack and Gmail as well. Once the document passes through the workflow, we use an LLMWhisperer node to extract the text from it, and once the text is extracted, we are
[00:54:00]
deploying an Unstract tool as an API over here.
This extracts the relevant invoice data, and we again check with QuickBooks whether the vendor details from the invoice are already present in the QuickBooks account. If they are, the invoice continues down the workflow; otherwise, an invoice-error channel is notified.
And once it goes through the workflow, depending on the extracted values, you classify it as urgent, high
[00:54:30]
value, critical, or normal, and the relevant Slack channels are again notified. So this is the overall flow that we have set up. This is just one particular use case we’ve taken up, but this is the kind of advanced
workflow or integration which might not be possible natively within the platform as of now. So you can go for n8n workflows; we do integrate with these platforms for you to facilitate agentic workflows. With that said, let me also go into the MCP
[00:55:00]
servers that you can connect with.
So Unstract as well as LLMWhisperer are available as MCP servers. Once you configure them in your MCP host (and MCP is really gaining steam in the market), you don’t have to open any tool to perform this functionality. All you have to do is enter prompts, and the system itself connects with the relevant servers and the relevant tools and
[00:55:30]
performs the operation for you.
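For context, registering an MCP server with a host usually comes down to a small JSON entry following the common mcpServers convention. Here’s a hypothetical sketch; the command, package name, and environment variable are placeholders, not Unstract’s published configuration:

```python
# Hypothetical MCP host registration for LLMWhisperer; the command,
# package name, and env var are placeholders, not the published config.
import json

config = {
    "mcpServers": {
        "llmwhisperer": {
            "command": "npx",  # placeholder launcher
            "args": ["-y", "llmwhisperer-mcp-server"],  # placeholder package
            "env": {"LLMWHISPERER_API_KEY": "YOUR_API_KEY"},
        }
    }
}

print(json.dumps(config, indent=2))  # paste into your host's config file
```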
So this can be used for more complex workflows as well. In this case, we have a similar workflow where we are downloading attachments: I’ve given a prompt to download attachments from email which contain invoices. And you can see that, since we had integrated with the Gmail server, the system was able to read the emails and download the attachments, which were invoices. Then you can see that we have the Unstract MCP tool deployed over here, which
[00:56:00]
I can again inspect.
So for each of the invoices, this tool was deployed, and using a relevant Prompt Studio project, you can define the different data you’re looking to extract. We have a couple of fields defined over here, and you can inspect these values for each of the invoices as well.
Once this goes through this stage, it’s a very similar workflow to what we saw in n8n: we get the data output and we upload it to Google Sheets using
[00:56:30]
a Google Sheets connector. Once this is done, you can again inspect all of this as well. Finally, we notify the relevant teams on Slack.
For that, you’ll have to connect with a Slack MCP server. I’m just giving you a very brief overview of how this works, but if you’re looking to explore this capability in more detail, the webinar or the documentation is the answer. We also have blogs on these
[00:57:00]
topics if you’re looking to explore this.
So this is how, folks, you can connect and integrate with core systems to facilitate seamless workflows, and that actually brings me to the end of the session. Over the last few minutes, I’ve tried to cover as much as I can on finance document ETL, though the specifics would vary when we look at
[00:57:30]
each and every use case.
So if you’re interested in exploring this in more detail for your particular business, you can set up a call with us; it’s the same link for requesting a demo with the Unstract team. We’ll proactively reach out to you and set up a time that works for you, to see how we can take the journey forward.
So that said, let’s move on to the Q&A, in case we have any questions.
[00:58:00]
So it looks like we’re good to go. We will be sending you the recording of this session shortly, once it’s available. Thank you, everybody, for joining this webinar today. It was really nice to have you over. Hope you had a good session,
[00:58:30]
and we’re really looking forward to seeing you at our upcoming events as well.
Thank you so much.