How LLMs Solve the Hardest Document Processing Problems in Insurance

[00:00:00] 

All right. Hello everybody. Thanks for joining this session today. I'm Mahashree and I'll be the speaker for this webinar. Now, before we get started, I'd like to quickly run you through a few session essentials, or ground rules, for today. This webinar will be in listen-only mode, and all attendees will automatically be on mute.

In case you have any questions, please do drop them in the Q&A tab at any time during this webinar, and

[00:00:30] 

our team will be able to get back to you with the answers via text. You can also use the chat tab to interact with fellow attendees, and it would be wonderful if you could introduce yourselves and let us know where you’re joining from.

And as a final point, when you exit this session, you'll be redirected to a feedback form, where I'd request you to leave a review so that we can continue to improve our webinar experience going forward. So that said, welcome everybody, and let's get started. As the title suggests, today

[00:01:00]

we'll be exploring how LLMs solve some of the hardest document processing problems in insurance.

Now, insurance is a document-heavy industry, with vast volumes of documents to process on almost an everyday basis. And not only do we deal with a huge volume of documents, we are also looking at a large variety: think about multi-page policies, handwritten endorsements, faxed investigation reports, invoices, and a lot more.

And each of these 

[00:01:30] 

documents, again, plays a critical role in different insurance document workflows. So on one hand we have the large volume and variety of documents that we're dealing with, and at the same time, customers also expect fast service, regulators are demanding tighter accuracy, and insurers are constantly under pressure to keep up.

So handling this sort of scale and complexity requires more than what legacy document extraction tools are offering today. It calls for a system that is easy to

[00:02:00]

implement, guarantees accuracy, can easily scale, and also supports compliance requirements. And that is exactly where LLM-driven solutions come in.

So in this webinar, we'll take a look at how Unstract can support you in overcoming various problems that we face on a day-to-day basis in insurance document processing. Let's take a look at the agenda for today. We have a simple three-step approach. Firstly, we'll set the stage by looking at some of the

[00:02:30] 

most common insurance document use cases and the challenges they bring along with them.

So these are use cases that most of you might be dealing with, and we'll particularly be diving into a specific use case, submission intake. We'll see how these challenges are positioned at every stage of this process and how they really impact the workflow as a whole. Following this, once we've set the stage and understood the challenges we're dealing with, we'll be moving on to the

[00:03:00] 

interesting segment of this webinar, where we'll get into a live demonstration of Unstract.

We'll take a look at a range of capabilities that can be deployed to overcome these challenges and how they really help in insurance document ETL. And finally, once the demo is over, we'll be heading into a live Q&A in case we have any questions remaining. An expert will be on air to answer your questions live.

So that said, here are the common use cases in document extraction

[00:03:30] 

that we see in insurance. We have submission intake (I guess I haven't added it over here, but yes, submission intake), the quote and bind process, claims intake and processing, billing and payments, renewals, agent or broker onboarding, and a lot more.

And for each of these document processing workflows, you'd have different types of documents coming in. You could have an ID card scan, which isn't properly formatted, and it could come in when you're

[00:04:00] 

dealing with broker onboarding or licensing, or you could be dealing with loss runs, ACORD forms, and application forms.

When we take a look at submission intake, and even there at application forms or ACORD forms, documents coming into the same document processing workflow might differ in nature. For instance, one customer could send you a document

which has handwritten text, filled in by hand and then scanned, whereas another customer could be sending you the same document in an

[00:04:30] 

electronic form. So there are different varieties of documents even within the same workflow. And this is the kind of complexity we're dealing with, especially in a document-heavy industry like insurance.

So let's move on. We'll take one use case, dive deeper into it, and see the different challenges in this specific use case. I'm just taking one use case here for your understanding, but these are challenges we see repeatedly in various other

[00:05:00] 

workflows as well when it comes to insurance document processing.

So over here we have submission intake, and this is the starting point of insurance. This is where brokers or customers submit their applications, along with a few documents like financials and other risk details, to the insurer, and underwriters review the risk, price it, and decide whether or not to offer coverage.

So here’s how the workflow 

[00:05:30] 

looks overall. You receive a bunch of documents; they could come in a single PDF file, or you could receive them separately via email. You take these documents and route them to different document workflows to extract specific data.

And again, when it comes to an industry like insurance, there is sometimes a compliance requirement to have a human in the loop. So post data extraction, you might want to deploy a human screening

[00:06:00] 

stage, where a person reviews the extracted data and makes any changes if necessary.

And once those changes are completed, you'd send this data on for downstream operations. This could mean sending it to another application, a database, or a data warehouse. Overall, we'll take a look at each of these stages individually, the challenges we face at each stage, and how legacy or traditional

[00:06:30] 

systems deal with these challenges versus what changes when we look at an LLM-based solution.

So firstly, we have receiving documents, and one of the common challenges here is that there is no industry standard for this process. Documents can come in messy packages: the broker collects all the documents, and usually they are put into a single PDF and sent via email or an

[00:07:00] 

application form.

So what you’ll have to do later on is split this document in and, uh, identify the different doc, uh, documents within this single PDF file and then route it for extraction. But anyways, how do you split this single PDF file? So currently we are seeing that many businesses are doing this manually, or you could, they could also be going for rule based systems.

And this is, again, time consuming and error prone, and especially when we are looking at. An, an industry like insurance, we are running, uh, 

[00:07:30] 

on time. So it's very important to shorten the process wherever you can and make sure it's done as quickly as possible. To overcome these limitations, here's what an LLM-enabled solution does. One powerful advantage we have with large language models is that they understand the context of documents. Just like you or I would understand what a document is conveying, the LLM is equally capable of it. So it's able to easily split the

[00:08:00] 

single PDF file into the different documents contained within it.

And all you have to do from your end, ideally, is give a prompt for the LLM to perform this task efficiently. Going further into this challenge, the way a platform like Unstract can better enable you is that we offer a ready-to-use API for this very purpose.

It's called the PDF Splitter API. All you have to do is send your document

[00:08:30] 

into this API, and you get all the documents inside the single file neatly split and provided back to you. This is also how you can skip the development stage that's involved: you don't even have to enter a prompt for the task to be done. Rather, we support you with a range of ready-to-use APIs, and the PDF Splitter API is one of them. We'll see how all of these capabilities work

[00:09:00]

when we move on into the demo.

So once you split the documents that you receive, the next stage would be to route them into different document processing workflows and perform data extraction. What are the challenges here? One of the major challenges comes from the documents themselves, because these documents are mixed:

they could have inconsistent formats or bad-quality scans, especially when you're looking at scans of

[00:09:30] 

historical policy claims. Or, when it comes to insurance, you could have table-heavy documents like policy declarations or coverage summaries. These are all document challenges in themselves that we'll have to overcome.

And there are other challenges, like industry-specific keywords or terminology. These are terms that have a specific meaning depending on the industry you're working with, and the document extraction system should know these terms and what they mean in order to

[00:10:00] 

extract the data efficiently and also interpret it in the right way.

And again, we'd be dealing with unclear or missing fields, which these systems should be able to identify. And finally, even after overcoming all these challenges, you'd have to format the extracted data into a format usable for your particular downstream operations. So these are all the challenges we're dealing with when it comes to document extraction itself.

And if we are looking at legacy systems or traditional systems, how this has 

[00:10:30] 

been working: we usually see businesses deploy rule-based OCR or ML-based systems to handle these documents. Again, these systems come with their own set of limitations. For instance, when you deploy rule-based OCR for document extraction, it comes with rules that are very specific to the document layout and format.

And if your document varies even slightly, the system isn't able to extract data efficiently. Moving on to ML-based systems,

[00:11:00] 

these are somewhat better because they can accommodate a larger pool of documents. However, you'll have to come back and keep retraining the model as and when you get new documents.

So again, this does involve extra effort; you have to keep coming back and maintaining these systems. To cut all of this short, the reason we're seeing a major shift today, with businesses moving from these systems to LLM-

[00:11:30] 

enabled solutions, is that you do not have to perform any training whatsoever on your documents. The model itself understands the document's context, and because of this, it can see a document for the first time and still extract the data efficiently from it, as long as the prompt provided is good enough. That is why it is actually easier to scale, and that's exactly why we're seeing this shift happening today. We'll see how prompt-based extraction works in our demo, where we'll be exploring Prompt

[00:12:00] 

Studio, a prompt engineering environment. Moving on: once you extract the data from the document, as I mentioned earlier, you could have human-in-the-loop screening.

What the human does is validate the extraction against the original document, to ensure it is accurate before it goes for downstream operations. The challenge we see today is that this has mostly been done manually: reviewers usually have a checklist and go through it to perform this validation

[00:12:30] 

manually, which is, again, error-prone as well as time-consuming.

With LLM-enabled solutions, we have automated highlighting capabilities, which immediately highlight the place in the original document where a particular piece of data was fetched from, and this comes with metadata like confidence scores and bounding box coordinates. We'll take a look at this in the demo.
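To make that concrete, review highlighting relies on per-field metadata shaped roughly like the record below. The field names here are illustrative assumptions for this example, not Unstract's actual output schema.

```python
# Illustrative shape of per-field extraction metadata that enables review
# highlighting. Field names and structure are assumptions for this example,
# not the platform's actual schema.
extracted_field = {
    "name": "incident_date",
    "value": "2024-03-20",
    "confidence": 0.62,                 # low score -> worth flagging for review
    "source": {
        "page": 2,
        "bbox": [112, 418, 340, 436],   # x0, y0, x1, y1 on the source page
    },
}

# A reviewer UI can jump straight to source["page"] / source["bbox"] and
# flag any field whose confidence falls below a chosen threshold.
needs_review = extracted_field["confidence"] < 0.8
```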

And finally, once the human reviews the extracted data, it moves for downstream operations

[00:13:00] 

and for that you'd want to connect with various destinations. It could be an application, a database, or a file system. We support a couple of native deployments within the platform itself that you can make use of.

Once you create your project for document extraction, you can deploy it as an API; as an ETL pipeline, if you're looking to push your data into a database or a data warehouse; or as a task pipeline, if you're looking to push your data into

[00:13:30] 

a file system, and we’ll again, uh, support human in the loop deployment as well.

And if your needs exceed whatever is given, uh, in the native deployment. Untraced is also available as a node on n8n uh, so you can deploy it in agent pick workflow platforms as well, and as well as. In the MCP environment, since we do support an NCP server as well. So we will be looking into this, um, once we get into the demo.

And, uh, before that. Here are a couple of file 

[00:14:00] 

systems and databases that we support natively within the platform. With that, let me move into the platform and actually show you how things work. But first, for the sake of those of you who are completely new to Unstract, I just wanted to give you a quick preview of what the platform is all about and what it can do.

So over here, just to wrap up the challenges we've just seen: we saw the various challenges pertaining to the documents, like the volume, the

[00:14:30] 

formatting, the variety you get in insurance, and the messy packages you might be dealing with. I've condensed all the challenges you just saw into three main challenges over here.

And these could be challenges not just in submission intake, but in various other document processing workflows as well. So you have the variety and the volume, which is challenge number one. Then we spoke about human review, which is a largely manual process. So how can you automate this?

How can you make this better and more seamless? And 

[00:15:00] 

again, fragmented workflows. It's not enough for your document extraction system to just accurately extract the data from documents; it should also connect with all the other applications within your ecosystem, so as to support a seamless end-to-end document processing workflow.

So that's exactly what we're trying to achieve. Let me now quickly introduce Unstract to you, and we'll take a look at the key capabilities before we move on into the demo

[00:15:30] 

segment. Unstract is an LLM-powered unstructured data ETL platform. If I had to briefly lay out all the capabilities the platform supports, I'd put them in three main categories:

the text extraction phase, then the development phase, and finally the deployment phase. The text extraction phase covers the first set of processes that happen within the platform. Once you upload your document into the platform, we deploy a text extraction

[00:16:00] 

tool like LLMWhisperer, Unstract's in-house text extractor, to extract the raw text from your document and prepare it in a format that is best consumable by LLMs.

This is one of the key steps, and LLMWhisperer is known to do it pretty well because it preserves the original layout of the document. Now, LLMs understand the context of a document just like humans would, so the best way to feed in the text or raw data would

[00:16:30] 

be to preserve the layout that was present in the original document.

And that is exactly what LLMWhisperer is known to do. LLMWhisperer is also available as a standalone application, depending on our users' needs. Once you extract the raw text, you can then move into the development phase, where you create prompts in a prompt engineering environment called Prompt Studio.

Over here, your prompts contain mainly two details: what data you're looking to

[00:17:00] 

extract from your document, and what schema of extraction you're looking for. All you have to do is specify this in simple natural language and you're good to go. This again brings down the IT dependency we used to have earlier, and you can test multiple document variants.

If you're building a project for, let's say, invoices, then you could have multiple invoices: some scanned, some with handwritten text, some digitally native. You can test multiple documents and see

[00:17:30] 

how these prompts are performing across these documents.

And within Prompt Studio, you have access to various accuracy-enabling capabilities like LLM Challenge and Grammar, which we'll be covering in the demo segment. And finally, once you're happy with the extraction project you've created, with the prompts, the way they're running, and the output you're receiving, you can deploy this project in any of the deployment options that we support.

So we had just looked at these options earlier, and that is basically how the platform functions end to 

[00:18:00] 

end. I think that sums up the platform and what it does. To throw out some numbers on where Unstract stands today: we have 5K+ stars on GitHub, a 950+ member Slack community, and we're currently processing 8 million+ pages per month from paid users alone. As for how you can start using the platform, we have three major editions

[00:18:30] 

available. You could go for the open-source offering, with certain limited features, and explore the platform on your own, or you can sign up for the cloud offering we have, which comes with a free trial period. Unstract is available as an on-prem version as well. And coming to LLMWhisperer, as I mentioned earlier, this is also available as a standalone application.

You can either use it in the cloud, where you can upload a hundred pages for free on a daily basis and get access to

[00:19:00] 

the end-to-end capabilities of the platform, or use it through the Python client or the JavaScript client. And both platforms are ISO, GDPR, SOC 2, and HIPAA compliant as well.

So that said, here are a few select customers from insurance who are currently using the platform. We have Gallagher, CoverForce, and the other customers you see over here as well. So let me move into the platform, folks, and I'll take

[00:19:30] 

you around and show you how it looks and how you can get started.

What you see over here is the Unstract interface, and if you're signing up for the first time, what you'd have to do is set up certain prerequisite connectors, like the LLM model you want to work with. These are connectors that are essential to get started with the work you want to do in the platform.

We support integrating with all the major LLMs, which you can choose from. You can see

[00:20:00]

that I've connected with four models over here. Similarly, you'll choose from a range of vector DBs, embedding models, and text extractors. Over here is where you'd find LLMWhisperer as well, along with the other text extractors in the market.

Once you've set up your connectors and you're happy with them, you can move into Prompt Studio, where you can get started with creating your projects and specifying the prompts

[00:20:30] 

for document data extraction. Now, coming back to the first step we covered when we looked at the submission intake workflow: we were talking about how brokers would send a single file that contains multiple documents. This is a common scenario we see in various other use cases as well. I was talking to you about the ready-to-use APIs we support, which can split out the documents for you without you having to put in any

[00:21:00] 

backend work or any development into it.

To show you how this works, let me start from there. What you see over here is the Unstract interface; however, I also have access to the API Hub. In the API Hub, we're constantly looking to add more and more APIs, and all the APIs you see here are ready to use.

The backend work, the development, is already done by our team, and we have tested it across multiple document variants as well. So

[00:21:30] 

all you have to do is pick the API you want, and then you can download it as a Postman collection, or you can use it over here in the playground.

You get some overview of what this particular API does. This is the PDF Splitter API I was talking about, which basically takes a single file as input and neatly splits out the various documents present within that single file. Let me try that by uploading a sample file to

[00:22:00] 

the playground.

I'm uploading a submissions file which has multiple documents that would typically be given for submission intake. Let me also show you how this document looks.

You can see that this is a 41-page-long

[00:22:30] 

document, and it has a bunch of documents that would typically be present in the file you'd send for submission intake. We have the loss run summary over here, and then a couple of ACORD forms, the application,

and you see that we have a bunch of other documents as well. Once I send this to the API, you'll be able to see how it neatly splits these different documents. We have a

[00:23:00] 

contractor questionnaire over here as well. Yeah, that's about it. So let's wait a few seconds until the API finishes working on it.

Meanwhile, let me show you how the platform works otherwise. So, coming back. Yeah.

While the API works, let me show you around the prompt engineering environment and how it works. We spoke about setting up the

[00:23:30] 

connectors, and once the connectors are set up, you can start developing your prompts. In this case, I won't be creating a new Prompt Studio project, for want of time.

I thought I'd just explore an existing project I have, which extracts data from a number of incident reports. This is how the Prompt Studio environment looks: we have the document over here on the right, and we have a bunch of prompts over here on the left, with

[00:24:00] 

the output given as well, since I've already run this.

As I mentioned earlier, the first step when you upload a document into the platform is to extract the text from it and convert it into a format that is LLM-ready. That is what the text extractor does, and in this case we have deployed LLMWhisperer as the text extractor.

And as you can see, we have the extracted text, the raw view over here. So this is the context that will be 

[00:24:30] 

sent to the LLM for extraction, and you can see how the layout has been preserved. This is basically how you ensure that the LLM is getting the complete context, and it's also a way of ensuring accuracy.
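If you're using LLMWhisperer as a standalone service, that extraction step looks roughly like the sketch below. The host, path, header, and parameter names are illustrative placeholders rather than the exact API, so please check the LLMWhisperer docs before relying on them.

```python
# Rough sketch of calling a layout-preserving text extraction service such as
# LLMWhisperer in standalone mode. The URL, header, and parameter names are
# placeholders, not the exact API contract; consult the official docs.
import requests

API_KEY = "your-api-key"                              # placeholder credential
BASE_URL = "https://example-llmwhisperer-host/api"    # placeholder host

with open("incident_report.pdf", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/whisper",                        # hypothetical endpoint
        headers={"x-api-key": API_KEY},               # header name is assumed
        params={"output_mode": "layout_preserving"},  # assumed parameter
        data=f.read(),
        timeout=120,
    )
resp.raise_for_status()

# Raw text with the original layout retained: this is the context you'd then
# pass to the LLM along with your prompts.
layout_preserved_text = resp.text
print(layout_preserved_text[:500])
```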

You can check this out for the other document variants we've uploaded for this particular project as well. Over here we're exploring one particular incident report, but I have other documents uploaded that I can test these prompts on. We have another incident report over here, which

[00:25:00]

is a scanned document.

It's quite a bad scan, and it has some handwritten text and some text in Hindi as well. We have a couple of pages over here. If I go into the raw view, this is how LLMWhisperer has been able to extract the text while maintaining the layout. And the reason I again stress the importance of this stage is that what we see with LLMs is that they're able to extract data from documents wonderfully well.

[00:25:30] 

However, even the leading models we have today have some difficulty when dealing with scanned documents or documents that contain handwritten text. We've done a separate webinar on this, where we tested LLMWhisperer with multiple documents across different large language models, and we were able to see the difference that having this intermediary step done by the text extractor makes.

There is a

[00:26:00] 

difference between sending a document directly to an LLM versus having this intermediary stage where you extract the text and then send it to the LLM, so you can check that webinar out. But I just wanted to quickly take you through how LLMWhisperer works for the different incident reports I have over here.

You can see that we have incident reports in various formats: a digitally native one, a scan, a bad scan, and over here quite a decent scan. It's just different formats, and you can see how the prompts are

[00:26:30] 

working for all these different documents.

Coming to the prompts: now that you've seen how the text is extracted in a layout-preserved format, you can start creating your prompts. As I mentioned earlier, in the prompt you specify what data you're looking to extract, as well as the schema for extraction.

For instance, if I take this prompt over here, we're extracting basic information from this incident report.

[00:27:00] 

We mention what data we want extracted, along with the formatting details. Over here you can see that I've specified a specific schema for the date.

And as you can see, it has been extracted in that particular schema. This is not just for this particular document: if I click on Coverage, you can see the output of this prompt across all the documents uploaded into this project, and you can see that the date format has

[00:27:30] 

been the same across the different documents.

This is how you can bring your data into one single format. It's as easy as creating simple prompts, and it gets the work done. This is a wonderful way of democratizing document extraction, which is what we're seeing with LLMs. In a given prompt, you'd have the name, the description, and you see the output over here.

And as we explored earlier, you can also check out the output for the other test documents within

[00:28:00] 

this project by clicking on Coverage. You also have access to a bunch of output data types you can choose from for this particular extraction, and you have details on the model you've used, how many tokens were consumed, and the cost and time details as well.
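To give you a feel for it, a prompt and its normalized output might look something like the example below. The prompt wording and field names are made up for illustration; they aren't the demo project's actual prompts.

```python
# Illustrative extraction prompt and the kind of normalized JSON you'd expect
# back. Prompt text and field names are invented for this example; they are
# not the demo project's actual prompts or schema.
prompt = (
    "Extract the basic information from this incident report: "
    "the incident date, the incident type, and the location. "
    "Return the date in ISO format (YYYY-MM-DD)."
)

# Expected shape of the structured output, consistent across document variants:
expected_output = {
    "incident_date": "2024-03-20",
    "incident_type": "Environmental Hazard",
    "location": "Warehouse 7, Riverside Industrial Park",
}
```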

This is the same across all the prompts I have over here. Now, to explore certain other features in Prompt Studio: I spoke about the accuracy-enabling capabilities as well as Grammar. Over

[00:28:30]

here are my Prompt Studio settings. What you see here are the various LLM profiles that you can choose from.

In this case, I just have a single LLM profile. The LLM profile is nothing but the combination you're looking to use in this particular project, because you can connect multiple LLMs, vector DBs, and text extractors within the platform; you define that combination over here.

You can see that I've already defined it. You can also specify chunking, the chunk

[00:29:00] 

size you want to use, especially when you're dealing with lengthy documents. Sometimes, when you're dealing with financial reports for example, you might have a document that runs to hundreds of pages, and this often exceeds the token limit of LLMs.

It might exceed the context limit, the maximum amount of content the LLM can handle in one go. In these cases, what you'll have to do is specify a chunking strategy, which will split the

[00:29:30] 

document and provide the LLM with different chunks at different points in time. That is basically what you'll be defining in the retrieval strategy.

And again, there are different strategies for doing this depending on the document type. We support a couple of retrieval strategies over here, and we have information on which of these strategies are best for which document type, which you can go through at any time.

So we have a bunch of strategies over 

[00:30:00] 

here, and once you specify the chunk size, you can then enable chunking if you're dealing with documents that have a large number of pages. So that said, we've looked at the LLM profile, and I wanted to quickly run through the other capabilities we have over here in the Prompt Studio settings.
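Before that, just to make the chunking idea concrete, here's a rough sketch of splitting a long extracted text into overlapping chunks. This is purely illustrative; the platform's retrieval strategies are more sophisticated than this.

```python
# Purely illustrative: a naive fixed-size chunker with overlap, showing the idea
# behind chunking documents that exceed an LLM's context limit. Real retrieval
# strategies are smarter about boundaries and relevance than this sketch.
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 400) -> list[str]:
    """Split text into overlapping character chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across boundaries
    return chunks

# Each chunk can then be indexed or sent to the LLM separately, instead of
# pushing a multi-hundred-page document through in one go.
```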

One of the other key capabilities I wanted to talk to you about is LLM Challenge, because we earlier

[00:30:30]

spoke about how LLM-enabled solutions have an advantage over legacy systems and how they're able to transform the space; however, it's true that they also have their own set of limitations.

LLMs are known to hallucinate when it comes to document extraction and retrieving any sort of result from your data, and you cannot afford for it to go wrong when you're dealing with different documents and with highly

[00:31:00] 

sensitive data. So, in order to prevent hallucination, we have a capability within Unstract called LLM Challenge, where you basically deploy an LLM-as-a-judge implementation.

Apart from the extractor LLM I have defined over here, you can see that I've defined an Anthropic model; I can also define another LLM, and here I've defined a GPT model. What happens is that both these models work on all the prompts that

[00:31:30] 

you have defined.

Only if they arrive at a consensus on the output from these different models is it given to the user; otherwise, the user receives a null result. You can check out how LLM Challenge works too, since we have the option of inspecting the conversation

that the two LLMs had for each of these extractions. In the LLM Challenge log, what you can see is the extraction run, that is,

[00:32:00] 

the data provided by the extractor LLM, and you have the challenger run as well. The challenger run basically rates the extraction, how far it agrees with the extracted data, and only if it completely agrees with the extraction run is the result given to the user.

So this is how deep you can get into inspecting how LLM Challenge works.
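Conceptually, that consensus check behaves like the sketch below. It's only an illustration of the LLM-as-a-judge pattern, not Unstract's actual implementation.

```python
# Conceptual LLM-as-a-judge sketch: an extractor model produces a value, a
# challenger model independently rates it, and the value is returned only on
# agreement. This illustrates the pattern, not Unstract's implementation.
def run_with_challenge(prompt: str, context: str, extractor, challenger):
    """Return the extracted value only if the challenger agrees; else None."""
    extraction = extractor(f"{prompt}\n\nContext:\n{context}")

    verdict = challenger(
        "Given the same context, is the following extracted value correct? "
        f"Answer AGREE or DISAGREE.\n\nValue: {extraction}\n\nContext:\n{context}"
    )

    # A null result is returned when the two models do not reach consensus.
    return extraction if verdict.strip().upper().startswith("AGREE") else None
```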

[00:32:30] 

And again, we spoke about how it's important for the system to understand industry-specific terminology as well. You can specify that under Grammar, where you can define rows.

Let's say I want to define a word; for instance, I'm just randomly taking 'profits'. If I want to give it a synonym, I can give it one or multiple synonyms. So here I want the system to understand 'profits' as associated with the synonym 'gains'. This is how you can closely control how industry-specific terminology is interpreted as well.

[00:33:00] 

Apart from this, we have a couple of cost-saving features like summarized extraction and single-pass extraction, and we have extensive documentation on all these capabilities; you can also go through the previous webinars. For now, I think we've done a quick wrap-up of Prompt Studio, and I've gone through most of the capabilities I wanted to cover. So this is how Prompt Studio works, and once you're happy with the prompts and how the data

[00:33:30]

is being extracted, you can now export this particular project as a tool and then deploy it using any of the deployment options that you had seen earlier.

You can deploy it as an API, an ETL pipeline, a task pipeline, and so on. In this particular webinar, we'll take a look at how it can be deployed as an ETL pipeline, along with the human-in-the-loop deployment. I've already exported this particular project as a tool and created an ETL pipeline as well.

All the

[00:34:00] 

deployments are created under the Workflows tab you see over here. I have a bunch of workflows; let me open the incident report ETL pipeline. This is how the workflow interface looks. You specify where you're getting the input document from.

In this case, I've connected with a file system, and I can specify which folder this particular document will be present in, whether

[00:34:30] 

or not to process the subfolders, and the maximum number of files to process. Once I specify all of this and the source where I'll be getting the document from, I can then choose from the various tools that I have,

the ones I want to deploy on that particular document. In this case, since I'll be getting incident reports from the source, I have selected the incident reports tool, which is basically all the

[00:35:00] 

uh, processes that you’d seen earlier in the prom studio. So this tool will be able to perform all that and extract the data in the format that you want.

And once you specify this, you again. Need to specify the destination connector. Over here we’ve connected with a database and um. We again have the human in the loop feature over here. So over here is where if you, if you want to enable human in the loop, you can set up how many documents you want 

[00:35:30] 

to be sent for human review over here.

In this case, I've given this as a hundred percent, because in this sample test case I just had one document in my folder. However, you can alter this according to your needs. Basically, when you're dealing with a large influx of documents, you might not want to send all of them for human review.

If I just want to send 40% of those documents, I'd set the percentage over here to 40. But how do I decide which

[00:36:00] 

documents get into the 40% that's sent for human review? That is where we have the conditional logic you can set up over here. It allows you to filter on the data, and depending on the data, you can send your documents for review.

We can filter by confidence score. As I mentioned earlier, when LLMWhisperer extracts the text from your document, it also provides a confidence score for how confident it is in a particular

[00:36:30] 

extraction. So let’s say in this case, I’ve given the incident type and, uh, if it is equal to environmental hazard, so if, uh, so you can filter it out by value as well.

So in this, uh, in this case, I’ve filtered it out by value. So if it is an environmental hazard. Then I want this particular document to go into human review, or I can filter it out by confidence scoring as well. So in this case, if the incident basic info, which is one of the fields that I had defined in the, uh, project for extraction, if the confidence score is less than 0.5, then I want this 

[00:37:00] 

particular document to be sent in for human review.
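Conceptually, those routing rules behave like the small sketch below; it just illustrates the shape of the logic, not the platform's actual rule engine.

```python
# Conceptual sketch of the human-in-the-loop routing rules described above:
# queue a document for review if a flagged value appears, or if a field's
# confidence score is too low. Field names mirror the demo, but the code is
# only an illustration, not Unstract's rule engine.
REVIEW_PERCENTAGE = 40  # at most this share of matching documents goes to review

def matches_review_rules(extracted: dict) -> bool:
    """True if this document's extracted data trips one of the review rules."""
    flagged_value = extracted.get("incident_type") == "Environmental Hazard"
    low_confidence = (
        extracted.get("incident_basic_info", {}).get("confidence", 1.0) < 0.5
    )
    return flagged_value or low_confidence
```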

This is how closely you can control which documents are sent for review. You can further define the settings for where your data goes once the review is completed: once the human performs the review, do you want to send it to the destination DB you've connected with, or keep it in the queue within the platform itself and then download it as a JSON file for downstream operations?

We have these two destinations

[00:37:30] 

currently available post the human review process. If I'm happy with the rules and the settings, I can click on Save Rules, and we'll now see how human in the loop is deployed within Unstract. So this is how the ETL pipeline is set up.

Under ETL pipelines, you'd see all the other ETL pipelines I have, and I also have the option of setting a CRON schedule. In this case I haven't,

[00:38:00] 

but I can set up a CRON schedule to trigger this particular workflow on a periodic basis. This is also how you can automate how often the workflow runs.
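For reference, a CRON schedule is a standard five-field expression; the examples below are generic cron syntax rather than values from the demo.

```python
# Generic examples of CRON expressions for scheduling a pipeline run.
# Standard 5-field syntax: minute hour day-of-month month day-of-week.
EVERY_HOUR      = "0 * * * *"    # at the top of every hour
EVERY_SIX_HOURS = "0 */6 * * *"  # at 00:00, 06:00, 12:00, and 18:00
WEEKDAY_MORNING = "30 8 * * 1-5" # at 08:30, Monday through Friday
```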

With this, let me explore human in the loop, because in this case I have specified human in the loop and I've already run this particular workflow once. What you should be seeing is that we have the document in the

[00:38:30]

folder, which is sent for human review. Now let me quickly open the folder as well and show you what this document is.

Just a minute folks.

[00:39:00] 

This is the document I have sent for human review. It's again an incident report, a document you'd seen earlier within Prompt Studio itself, but in the ETL pipeline I'll be getting it from my file system, deploying the incident report tool on it, and it would then be

[00:39:30] 

sent for human review.

So once, since I’ve already run this workflow, let me go into my review interface.

So this is a review interface and you can choose from the various document classes over here. So these are the various other projects that I have sent for, um, review and let’s find the incident report, ETL Pipeline over here. And once I click on fetch Next, you basically have the document. And what the system does is that it automatically highlights 

[00:40:00] 

the data which has a low confidence score. This might be the data the reviewer wants to focus on first, before going ahead to review the other extracted data. In this case, you can see that the location has a low confidence score, and upon clicking on this data, the system automatically highlights the place in the original document where it was fetched from.

You can see that the site details are given over here; it's basically the same

[00:40:30] 

detail that was given in the document, so I might not want to change it. I can click on any other data over here, and the system automatically highlights where it was fetched from. And not only that, the reviewer also has access to edit this data.

Let's say I do not want the 20th of March to be sent into my database; I can change this data to whichever value is correct, and once I click on Save, this is the data that will be sent, post review, into

[00:41:00] 

the destination DB or for downstream operations. This is how the reviewer works in the review interface supported by Unstract, and you also have access to queue details, so you can check out the various documents that are there.

This particular document is a review in progress. You'll see that once we finish this review, it moves into reviews finished. In Unstract, we support a two-layered review process: one level is done by the reviewer, and depending on the access

[00:41:30] 

permissions you have in the platform in your organization, you might also have approver access, where the approver does a second level of screening and then sends it for downstream operations.

You can choose to have or not have the approver workflow, depending on your needs. In this case, I'll just finish this review, and it should move into the approver's queue. Right now, if I click on the queue details, you can see that the document has moved from review in progress to reviews

[00:42:00] 

finished.

As an approver, what I'd have to do is move into the approver's interface, and again I'll have to choose the document class. As you can see, the date over here has been changed; this was the date I had changed, and we have it over here. So all I have to do is click on Approve, and this should be sent to the destination DB.

[00:42:30] 

As you can see, this is how the output looks once you send your document to the PDF Splitter API. The API has neatly split the documents, and you can download them over here as a zipped file. Upon opening this zipped file, I can see the various documents that this particular PDF contained.

I have the ACORD forms over here, and if I open the contractor questionnaire, you can see that we have this

[00:43:00] 

document neatly split. The same goes for all the other documents contained in the single PDF. This is how you can then route these documents for data extraction in their specific workflows.
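If you'd rather call the splitter from code than from the playground, the flow is roughly the sketch below. The endpoint, header, and field names are placeholders rather than the exact API Hub contract, so check the API documentation.

```python
# Rough sketch of calling a PDF-splitting API from code: upload one combined
# PDF, receive a zip of the individual documents. Endpoint, header, and field
# names are placeholders, not the exact Unstract API Hub contract.
import io
import zipfile

import requests

API_KEY = "your-api-key"                            # placeholder credential
SPLIT_URL = "https://example-api-hub/pdf-splitter"  # placeholder endpoint

with open("submission_package.pdf", "rb") as f:
    resp = requests.post(
        SPLIT_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},  # auth scheme assumed
        files={"file": ("submission_package.pdf", f, "application/pdf")},
        timeout=300,
    )
resp.raise_for_status()

# The response is assumed to be a zip archive of the split documents.
with zipfile.ZipFile(io.BytesIO(resp.content)) as archive:
    archive.extractall("split_documents")
    print(archive.namelist())   # e.g. ACORD forms, loss runs, questionnaires
```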

And that brings me to the last part of this session. This is basically how you use all these different capabilities in the platform. We've covered the two major challenges, which were the document challenges as well as the

[00:43:30] 

human review challenges that we usually see in insurance use cases.

Moving on to fragmented workflows: we saw how the ETL pipeline is supported, and we also have API deployments as well as task pipelines. These are all the native deployments present within the application. However, if your needs exceed this and you want to deploy in an agentic workflow platform like n8n, we have

[00:44:00]

Unstract as well as LLMWhisperer nodes available in n8n.

As you can see, this is a sample workflow that I have; it's a submission intake workflow we've set up in n8n. What happens over here is that we get an email trigger, and this particular workflow basically extracts the text from the email body as well as the attachments. Then we have the LLMWhisperer stage over here.

This node basically extracts all the raw text from the documents 

[00:44:30] 

and combines all of this into a single PDF. You'd have the body of the email as well as the raw text from the documents put into a single file, which is then sent to the Unstract node. I can open this node and explore how it has worked.

We have the input details over here, and we also have the various output details. You can inspect each one of these nodes; for want of time, I'm just exploring the Unstract node, and you can see that we have the details

[00:45:00] 

like the coverage type and the other data you're looking to extract from the various documents you got in submission intake.

And finally, this would be sent to a database or whichever downstream destination you're looking for. This is how we support the n8n deployment of Unstract. I also spoke to you earlier about the MCP deployment. These are different topics, quite elaborate on

[00:45:30] 

their own, and we have blog posts as well as webinars on these different deployment options.

I just wanted to quickly take you through how this works so you get an idea. In this case, I've extracted some data from an invoice using the MCP environment. You can see that I've simply specified prompts and the system has been able to handle this. All I had to do was connect with the relevant tools and servers, and that should be enough for you to get started.

So you can see over 

[00:46:00] 

here that we have the Unstract MCP server. In this case, we are extracting data from an invoice, and you can take a look at the data and the line items that have been extracted, so you can inspect how the tool is working and what exactly is happening in the backend as well.

This is how you can deploy your projects in the MCP environment. And that brings us to the end of this session, folks. We took a look at the major challenges

[00:46:30] 

in insurance document ETL. There were document challenges; we saw how Unstract as well as LLMWhisperer were able to handle different document types, and you do not require any training as such.

And again, we looked at how human review is made easier with the review interface, and finally how fragmented workflows are no longer a problem, because we are expanding and trying to integrate with as many applications as we can, so that you can ensure a seamless workflow

[00:47:00] 

end to end depending on your use case.

If you're looking to explore this in more detail and see how you can customize the platform for your particular needs, we do offer a free personalized demo, where you can have a one-on-one conversation with one of our experts; we'll understand your business requirements and see how the platform can be customized for your needs.

You'll find the link to this demo in the chat. Please do register for it, and we'll be in touch with you.

[00:47:30] 

So that brings us to the end of this session; let's move on into the Q&A in case we have any questions remaining.

I see that we don't really have any questions for today. So thank you, everybody, for joining this session. I hope this was insightful, and I really hope to see you in our upcoming webinars as well. Thank you.