Build Document Workflows That Adapt

Why “Adaptive” Document Workflows Matter

Document automation has improved significantly, but most systems still struggle in real production environments. Documents are not static: invoice layouts change, new fields appear, and vendors update formats without notice. At the same time, business rules evolve, and requirements vary across clients, regions, and policies. What works for one use case or customer more often than not fails for another.

Traditional document extraction relies heavily on static logic. Prompts and rules are tightly coupled to specific layouts and assumptions, and business rules are often hardcoded into the extraction itself. This approach may work initially, but it breaks down at scale. As variations increase, teams end up duplicating prompts, maintaining fragile workflows, and constantly rewriting logic for small changes.

Unstract is shifting this model from hardcoded workflows to adaptive, data-driven pipelines. Extraction logic remains stable, while business context, validation rules, and downstream actions are applied dynamically. This separation allows document workflows to automatically adapt to changing formats, policies, and use cases without constant rework.

In this article, we explore three new advanced Unstract features that enable this adaptability: Post-Processing Webhooks, Custom Data Variables, and Prompt Chaining. Together, they provide developers with fine-grained control over extraction, validation, and workflow orchestration.

Mental Model: How These Features Work Together

To understand how these features fit together, it helps to think of document processing as a pipeline rather than a single extraction step. Each component has a clear responsibility, and together they form an adaptive workflow that can handle change without constant rewrites.

Unstract’s Prompt Studio is responsible for extracting the raw structure from documents. Its job is to identify and pull out fields, values, and relationships, without embedding business rules or assumptions about how the data will be used.

Custom Data provides the runtime business context. It answers questions like why this document is being processed, which rules apply right now, and what thresholds or expectations should be used. This context is injected dynamically, allowing the same prompt to behave differently across clients, regions, or workflows.

Post-Processing Webhooks take the extracted data and make it production-ready. This is where validation, transformation, enrichment, calculations, and system integration happen. Webhooks allow developers to apply deterministic logic and connect extracted data to real systems such as ERPs, databases, or internal services.

Finally, Prompt Chaining connects these steps into multi-stage workflows. Outputs from earlier prompts guide later extraction and validation, enabling conditional logic, cross-document checks, and sequential decision-making, all without reprocessing documents or duplicating prompts.

Conceptually, the flow looks like this:

Build Document Workflows That Adapt with Unstract

This flow keeps extraction, context, and workflow logic cleanly separated, making document pipelines easier to scale, adapt, and maintain.

Feature 1: Post-Processing Webhooks

A Post-Processing Webhook is a developer-controlled step that runs immediately after Prompt Studio completes extraction. Instead of consuming raw extracted JSON directly, Prompt Studio sends the output to a webhook endpoint where additional logic can be applied before the data is finalized.

The webhook receives structured JSON from Prompt Studio and executes custom logic such as validation, transformation, enrichment, calculations, or integration with external systems. Based on this logic, it can return a cleaned and enriched payload, or respond with errors and flags indicating that the document requires review.

From an implementation perspective, post-processing webhooks are typically built as lightweight Python services, using frameworks like FastAPI or Flask. This approach keeps the logic deterministic, testable, and easy to evolve, while allowing developers to enforce real business rules outside of the prompt itself.
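
To make the contract concrete, here is a minimal sketch of such a service. It is illustrative only: the structured_output and highlight_data keys mirror the payload format used in the full examples later in this article, and the /webhook/minimal route name and "processed" flag are placeholders.

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

@app.post("/webhook/minimal")
async def minimal_webhook(request: Request):
    # Prompt Studio posts the extracted data as JSON
    payload = await request.json()
    structured_output = payload.get("structured_output", {})

    # Apply deterministic business logic here: validation, transformation, enrichment, ...
    structured_output["processed"] = True  # placeholder flag for illustration only

    # Return the (possibly modified) structured output back to Unstract
    return JSONResponse(
        status_code=200,
        content={
            "structured_output": structured_output,
            "highlight_data": payload.get("highlight_data", []),
        },
    )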

In practice, webhooks are most commonly used to:

  • Validate or clean extracted data
  • Transform values into normalized or formatted outputs
  • Enrich results with data from external systems or reference sources

For more information, check out the Unstract Post-Processing Webhook Documentation.

Use Case 1 – Purchase Order Validation and Enrichment

Document

For this example, we’ll use the following purchase order document as the input.

Download Document →

What Prompt Studio Extracts

Let’s create a new Prompt Studio project to extract customer information. We’ll begin by extracting the customer details using the following prompt:

Extract the customer, bill to and deliver to sections. Return each as a record (address and postal code should be separate fields).

This returns the following result:

Build Document Workflows That Adapt with Unstract

Make sure the return format is set to JSON. You can enable the new webhook feature by clicking the three vertical dots in the top-right corner of the prompt section, then activating it and entering your webhook endpoint:

Build Document Workflows That Adapt with Unstract

For reference, here is the resulting JSON from the previous prompt execution:

{
    "bill_to": {
        "address": "9090, West river avenue",
        "company": "Hive view Inc",
        "name": "John Armstrong",
        "postal_code": "Los Angeles, CA 92802"
    },
    "customer": {
        "address": "9090, West river avenue",
        "company": "Hive view Inc",
        "name": "John Armstrong",
        "postal_code": "Los Angeles, CA 92802"
    },
    "deliver_to": {
        "address": "9090, West river avenue",
        "company": "Hive view Inc",
        "name": "Simon Jones",
        "postal_code": "Los Angeles, CA 92802"
    }
}

Post-Processing Webhook Actions

Let’s start by looking at validation and cleaning, beginning with the customer data:

def compare_addresses(bill_to: Dict[str, Any], deliver_to: Dict[str, Any]) -> bool:
    """
    Compare billing and delivery addresses.
    Returns True if addresses match, False otherwise.
    """
    bill_address = bill_to.get("address", "").strip().lower()
    bill_postal = bill_to.get("postal_code", "").strip().lower()
   
    deliver_address = deliver_to.get("address", "").strip().lower()
    deliver_postal = deliver_to.get("postal_code", "").strip().lower()
   
    addresses_match = (bill_address == deliver_address and bill_postal == deliver_postal)
    logger.info(f"Addresses match: {addresses_match}")

    return addresses_match

This function, compare_addresses, takes two dictionaries representing billing and delivery addresses and checks whether they match. It compares both the address and postal_code fields after stripping whitespace and converting to lowercase.

For the enrichment example, we can use a CSV file containing customer bank information and look up the relevant details to include in the processed data:

def get_customer_billing_info(bill_to: Dict[str, Any]) -> Dict[str, Any]:
    """
    Get customer billing info.
    Returns a dictionary with customer billing info.
    """
   
    # Read customers billing info from csv file
    with open('customers_billing.csv', 'r') as file:
        reader = csv.reader(file)
        for row in reader:
            # Skip header row
            if row[0] == "contact_name":
                continue
            # Check if company matches
            if row[1].lower().strip() == bill_to.get("company", "").lower().strip():
                return {
                    "contact_name": row[0],
                    "company": row[1],
                    "bank_name": row[2],
                    "bank_account_number": row[3]
                }
    return None

This function, get_customer_billing_info, looks up a customer’s billing information from a CSV file. It reads through customers_billing.csv, finds the row matching the company in the bill_to dictionary, and returns a dictionary containing the customer’s contact name, company, bank name, and bank account number. If no match is found, it returns None.

For reference, a CSV file could be defined like this:

contact_name,company,bank_name,bank_account_number
Alice Johnson,Acme Corp,Bank of America,123456789
Bob Smith,Global Supplies,Citi Bank,987654321
Carol Lee,Tech Solutions,Wells Fargo,456789123
John Financial,Hive view Inc,Bank of America,1234567890

Each row represents a customer, with columns for the contact name, company, bank name, and bank account number.

This allows the webhook to add additional context, such as bank account numbers, bank names, or routing codes, directly into the extracted JSON, making it ready for downstream processing or integration.

Let’s now look at the complete source code for the webhook, implemented as a FastAPI application:

import json
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import JSONResponse
import uvicorn
import logging
from typing import Dict, Any
import csv

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


app = FastAPI(title="Unstract PostProcessing Webhook", version="1.0.0")


def compare_addresses(bill_to: Dict[str, Any], deliver_to: Dict[str, Any]) -> bool:
    """
    Compare billing and delivery addresses.
    Returns True if addresses match, False otherwise.
    """
    bill_address = bill_to.get("address", "").strip().lower()
    bill_postal = bill_to.get("postal_code", "").strip().lower()
   
    deliver_address = deliver_to.get("address", "").strip().lower()
    deliver_postal = deliver_to.get("postal_code", "").strip().lower()
   
    addresses_match = (bill_address == deliver_address and bill_postal == deliver_postal)
    logger.info(f"Addresses match: {addresses_match}")

    return addresses_match

def get_customer_billing_info(bill_to: Dict[str, Any]) -> Dict[str, Any]:
    """
    Get customer billing info.
    Returns a dictionary with customer billing info.
    """
   
    # Read customers billing info from csv file
    with open('customers_billing.csv', 'r') as file:
        reader = csv.reader(file)
        for row in reader:
            # Skip header row
            if row[0] == "contact_name":
                continue
            # Check if company matches
            if row[1].lower().strip() == bill_to.get("company", "").lower().strip():
                return {
                    "contact_name": row[0],
                    "company": row[1],
                    "bank_name": row[2],
                    "bank_account_number": row[3]
                }
    return None


@app.post("/webhook/customer")
async def receive_webhook_customer(request: Request):
    """
    Receive webhook POST requests
   
    This endpoint accepts webhook payloads, compares billing and delivery addresses,
    and returns the structured output with an additional field indicating if addresses match.
    """
    try:
        # Get the raw body as bytes
        body = await request.body()
       
        # Try to parse as JSON
        try:
            payload = await request.json()
            logger.info(f"Received webhook with JSON payload: {payload}")
        except Exception:
            # If not JSON, log as text
            payload = body.decode('utf-8')
            logger.info(f"Received webhook with text payload: {payload}")
            raise HTTPException(status_code=400, detail="Invalid JSON payload")
       
        # Extract structured_output from payload
        structured_output = payload.get("structured_output", {})
       
        # Get bill_to and deliver_to addresses
        bill_to = structured_output.get("bill_to", {})
        deliver_to = structured_output.get("deliver_to", {})
       
        # Compare addresses and set field for address match
        addresses_match = compare_addresses(bill_to, deliver_to)
        structured_output["billing_delivery_address_match"] = addresses_match
       
        # Get customer billing info and set field for customer billing info
        customer_billing_info = get_customer_billing_info(bill_to)
        structured_output["customer_billing_info"] = customer_billing_info

        # Prepare response with structured_output with the additional fields
        response_data = {
            "structured_output": structured_output,
            "highlight_data": payload.get("highlight_data", [])
        }
       
        # Return response
        return JSONResponse(
            status_code=200,
            content=response_data
        )   
    except HTTPException:
        # Re-raise HTTP errors (such as the 400 for invalid JSON) without converting them to 500s
        raise
    except Exception as e:
        logger.error(f"Error processing webhook: {str(e)}")
        raise HTTPException(status_code=500, detail=f"Error processing webhook: {str(e)}")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

It includes the helper functions to:

  • Compare addresses (compare_addresses) between billing and delivery for validation.
  • Enrich customer data (get_customer_billing_info) by looking up bank info from a CSV file.

The endpoint receives extracted customer data, validates addresses, enriches with billing info, and returns the structured output.
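
If you want to exercise the endpoint locally, you can post a sample payload in the same shape Prompt Studio sends. This is a test sketch that assumes the service above is running on localhost:8000 and that customers_billing.csv sits in its working directory:

import requests

# Sample payload mirroring the Prompt Studio output shown earlier
sample_payload = {
    "structured_output": {
        "bill_to": {
            "address": "9090, West river avenue",
            "company": "Hive view Inc",
            "name": "John Armstrong",
            "postal_code": "Los Angeles, CA 92802"
        },
        "deliver_to": {
            "address": "9090, West river avenue",
            "company": "Hive view Inc",
            "name": "Simon Jones",
            "postal_code": "Los Angeles, CA 92802"
        }
    },
    "highlight_data": []
}

response = requests.post("http://localhost:8000/webhook/customer", json=sample_payload)
print(response.json()["structured_output"]["billing_delivery_address_match"])  # True
print(response.json()["structured_output"]["customer_billing_info"])           # enriched from the CSV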

Final Output

We’ve reviewed the Prompt Studio prompts, the extraction process, and the webhook processing code with its endpoints.

Now, let’s look at the results produced when the extraction runs and the post-processing webhook is invoked, and how the final, enriched output differs from the raw Prompt Studio response:

Build Document Workflows That Adapt with Unstract

The JSON returned after the post-processing webhook has been applied:

{
    "bill_to": {
        "address": "9090, West river avenue",
        "company": "Hive view Inc",
        "name": "John Armstrong",
        "postal_code": "Los Angeles, CA 92802"
    },
    "billing_delivery_address_match": true,
    "customer": {
        "address": "9090, West river avenue",
        "company": "Hive view Inc",
        "name": "John Armstrong",
        "postal_code": "Los Angeles, CA 92802"
    },
    "customer_billing_info": {
        "bank_account_number": "1234567890",
        "bank_name": "Bank of America",
        "company": "Hive view Inc",
        "contact_name": "John Financial"
    },
    "deliver_to": {
        "address": "9090, West river avenue",
        "company": "Hive view Inc",
        "name": "Simon Jones",
        "postal_code": "Los Angeles, CA 92802"
    }
}

As you can see, the processed output now includes additional fields representing the address match and customer banking information, enhancing the extracted data with validation and enrichment results.

Use Case 2 – Bank Statement Validation and Transform

Document

For this example, we will use a bank statement document:

Download Document →

What Prompt Studio Extracts

Now, let’s define the prompt to extract the bank statement transaction list details:

Extract the list of items with the individual fields (all lower-case field names) in a record ‘items’. Additionally, return another field with the Purchases and Advances as ‘total’.

Which produces the following results:

Build Document Workflows That Adapt with Unstract

And the corresponding JSON:

{
    "items": [
        {
            "amount": 223.26,
            "date": "11/12/23",
            "description": "Wallmart Purchase"
        },
        {
            "amount": 313.39,
            "date": "12/12/23",
            "description": "Sam vehicle service"
        },
        {
            "amount": 582.96,
            "date": "13/12/23",
            "description": "FEE PRE PAID ELECTRICITY"
        },
        {
            "amount": 125,
            "date": "14/12/23",
            "description": "Interac purchase - 1361 - The Bay"
        },
        {
            "amount": 215,
            "date": "15/12/23",
            "description": "Amazon Audible Book"
        },
        {
            "amount": 656.86,
            "date": "16/12/23",
            "description": "Hawai International Stay"
        },
        {
            "amount": 246,
            "date": "17/12/23",
            "description": "Adobe Inc CC cloud"
        },
        {
            "amount": 571.1,
            "date": "18/12/23",
            "description": "ATM withdrawal"
        },
        {
            "amount": 965,
            "date": "19/12/23",
            "description": "St louis Pharmacy"
        }
    ],
    "total": 3898.57
}

You can enable the webhook for this prompt by following the same process as before. Webhook configuration is defined per prompt.

Post-Processing Webhook Actions

For the individual transactions, we can perform validation by checking that the totals are correct:

def calculate_and_verify_totals(structured_output: Dict[str, Any]) -> Dict[str, Any]:
    """
    Calculate totals from items and verify against provided totals.
    Returns a dictionary with calculated totals and verification result.
    """
    items = structured_output.get("items", [])
   
    # Calculate line item totals
    calculated_total = 0.0
   
    for item in items:
        try:
            # Get amount
            amount = item.get("amount", 0)
           
            # Handle string values that might contain currency symbols or commas
            amount = float(str(amount).replace(",", "").replace("$", "").strip())
           
            # Calculate line total
            calculated_total += amount
        except (ValueError, TypeError) as e:
            logger.warning(f"Error processing item {item}: {str(e)}")
            continue
   
    # Round the calculated total
    calculated_total = round(calculated_total, 2)
    logger.info(f"Calculated total: {calculated_total}")

    # Get the total extracted from the document (default to 0 if missing)
    provided_total = structured_output.get("total", 0)
    logger.info(f"Provided total: {provided_total}")

    # Check if the extracted total matches the calculated total
    is_total_correct = calculated_total == provided_total
    logger.info(f"Total is correct: {is_total_correct}")
   
    return {
        "totals_correct": is_total_correct, 
        "calculated_total": calculated_total,
        "provided_total": provided_total
    }

This function, calculate_and_verify_totals, computes the total of all transactions and compares it against the total extracted from the document. It handles amounts provided as strings, removes currency symbols and commas, and sums each line item’s amount.

For transformation, here’s an example of formatting the amounts as currency strings:

def format_currency_items_totals(structured_output: Dict[str, Any]):
    """
    Format currency items.
    Returns a list of formatted items.
    """

    # Define currency
    currency = "USD"

    # Format items
    items = structured_output.get("items", [])
    for item in items:
        item["amount"] = f"{item['amount']:.2f} {currency}"

    # Format the total (default to 0 if missing)
    total = structured_output.get("total", 0)
    total = f"{total:.2f} {currency}"

    # Set formatted items and totals in structured_output
    structured_output["items"] = items
    structured_output["total"] = total
    return structured_output

This function, format_currency_items_totals, formats all transaction amounts and the total as currency strings. It converts the amount of each item, as well as the total, into a consistent currency format (e.g., 12.34 USD).

Combining all the previous code examples, the full source code is:

import json
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import JSONResponse
import uvicorn
import logging
from typing import Dict, Any
import csv

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


app = FastAPI(title="Unstract PostProcessing Webhook", version="1.0.0")


def calculate_and_verify_totals(structured_output: Dict[str, Any]) -> Dict[str, Any]:
    """
    Calculate totals from items and verify against provided totals.
    Returns a dictionary with calculated totals and verification result.
    """
    items = structured_output.get("items", [])
   
    # Calculate line item totals
    calculated_total = 0.0
   
    for item in items:
        try:
            # Get amount
            amount = item.get("amount", 0)
           
            # Handle string values that might contain currency symbols or commas
            amount = float(str(amount).replace(",", "").replace("$", "").strip())
           
            # Calculate line total
            calculated_total += amount
        except (ValueError, TypeError) as e:
            logger.warning(f"Error processing item {item}: {str(e)}")
            continue
   
    # Round the calculated total
    calculated_total = round(calculated_total, 2)
    logger.info(f"Calculated total: {calculated_total}")

    # Get the total extracted from the document (default to 0 if missing)
    provided_total = structured_output.get("total", 0)
    logger.info(f"Provided total: {provided_total}")

    # Check if the extracted total matches the calculated total
    is_total_correct = calculated_total == provided_total
    logger.info(f"Total is correct: {is_total_correct}")
   
    return {
        "totals_correct": is_total_correct, 
        "calculated_total": calculated_total,
        "provided_total": provided_total
    }


def format_currency_items_totals(structured_output: Dict[str, Any]):
    """
    Format currency items.
    Returns a list of formatted items.
    """

    # Define currency
    currency = "USD"

    # Format items
    items = structured_output.get("items", [])
    for item in items:
        item["amount"] = f"{item['amount']:.2f} {currency}"

    # Format the total (default to 0 if missing)
    total = structured_output.get("total", 0)
    total = f"{total:.2f} {currency}"

    # Set formatted items and totals in structured_output
    structured_output["items"] = items
    structured_output["total"] = total
    return structured_output


@app.post("/webhook/items")
async def receive_webhook_items(request: Request):
    """
    Receive webhook POST requests for items
   
    This endpoint accepts webhook payloads, calculates totals from items,
    verifies them against provided totals, and returns the result.
    """
    try:
        # Get the raw body as bytes
        body = await request.body()
       
        # Try to parse as JSON
        try:
            payload = await request.json()
            logger.info(f"Received webhook with JSON payload: {payload}")
        except Exception:
            # If not JSON, log as text
            payload = body.decode('utf-8')
            logger.info(f"Received webhook with text payload: {payload}")
            raise HTTPException(status_code=400, detail="Invalid JSON payload")
       
        # Extract structured_output from payload
        structured_output = payload.get("structured_output", {})

        # Calculate and verify totals
        totals_verification = calculate_and_verify_totals(structured_output)
        structured_output["totals_verification"] = {
            "totals_correct": totals_verification["totals_correct"],
            "calculated_total": totals_verification["calculated_total"],
            "provided_total": totals_verification["provided_total"]
        }

        # Format currency items and totals
        formatted_structured_output = format_currency_items_totals(structured_output)

        # Prepare response with the formatted structured_output (also contains the totals verification)
        response_data = {
            "structured_output": formatted_structured_output,
            "highlight_data": payload.get("highlight_data", [])
        }
       
        # Return response
        return JSONResponse(
            status_code=200,
            content=response_data
        )   
    except HTTPException:
        # Re-raise HTTP errors (such as the 400 for invalid JSON) without converting them to 500s
        raise
    except Exception as e:
        logger.error(f"Error processing webhook: {str(e)}")
        raise HTTPException(status_code=500, detail=f"Error processing webhook: {str(e)}")


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

It receives structured JSON from Prompt Studio, recalculates the total amount by summing item values, and verifies it against the extracted total to detect inconsistencies.

The service then formats monetary values as currency, enriches the output with a totals_verification section, and returns a clean, production-ready JSON response.

This keeps business logic deterministic, testable, and fully decoupled from the prompt itself.
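
Because the helpers are plain Python functions, they can also be unit-tested in isolation. As a minimal sketch (assuming the service is saved as a module named webhook_app.py, a hypothetical file name):

from webhook_app import calculate_and_verify_totals

def test_totals_verification_matches_document_total():
    structured_output = {
        "items": [{"amount": 10.50}, {"amount": "$ 4.50"}],  # mixed numeric and string amounts
        "total": 15.00,
    }
    result = calculate_and_verify_totals(structured_output)
    assert result["calculated_total"] == 15.0
    assert result["totals_correct"] is True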

Final Output

Executing the prompt and invoking the post-processing webhook returns the following result:

Build Document Workflows That Adapt with Unstract

With the corresponding JSON:

{
    "items": [
        {
            "amount": "223.26 USD",
            "date": "11/12/23",
            "description": "Wallmart Purchase"
        },
        {
            "amount": "313.39 USD",
            "date": "12/12/23",
            "description": "Sam vehicle service"
        },
        {
            "amount": "582.96 USD",
            "date": "13/12/23",
            "description": "FEE PRE PAID ELECTRICITY"
        },
        {
            "amount": "125.00 USD",
            "date": "14/12/23",
            "description": "Interac purchase - 1361 - The Bay"
        },
        {
            "amount": "215.00 USD",
            "date": "15/12/23",
            "description": "Amazon Audible Book"
        },
        {
            "amount": "656.86 USD",
            "date": "16/12/23",
            "description": "Hawai International Stay"
        },
        {
            "amount": "246.00 USD",
            "date": "17/12/23",
            "description": "Adobe Inc CC cloud"
        },
        {
            "amount": "571.10 USD",
            "date": "18/12/23",
            "description": "ATM withdrawal"
        },
        {
            "amount": "965.00 USD",
            "date": "19/12/23",
            "description": "St louis Pharmacy"
        }
    ],
    "total": "3898.57 USD",
    "totals_verification": {
        "calculated_total": 3898.57,
        "provided_total": 3898.57,
        "totals_correct": true
    }
}

As you can see, the currency values are properly formatted, and an additional total verification record is added with the calculated amounts and validation result.

This approach allows you not only to validate the extracted data, but also to transform it outside of the prompt itself, keeping prompts clean while enforcing reliable, production-grade logic.

Feature 2: Custom Data Variables

Custom Data allows developers to inject runtime business context into Prompt Studio without modifying the prompt itself, cleanly separating extraction logic from business rules and configuration.

In this model, the prompt defines how to extract information, what fields to look for and how to interpret the document structure, while Custom Data defines what conditions apply right now, such as thresholds, expectations, policies, or workflow intent.

Custom Data is passed at runtime as a simple JSON object, allowing the same prompt to behave differently depending on the client, region, document purpose, or processing scenario, all without duplicating or rewriting prompt logic.

By moving variability into data instead of text, Custom Data eliminates prompt sprawl and enables scalable, multi-tenant document workflows that are easier to maintain and evolve.

A helpful way to reframe Custom Data is to shift the question from “who is this document about?” to “why is this document being processed?”. Rather than capturing personal attributes or static metadata, Custom Data represents the business context that determines how a document should be interpreted in a given workflow.

In practice, it encodes factors such as business intent, risk profile, and the applicable policy or rule set, directly influencing validation, thresholds, comparisons, and decisions, all without altering the underlying extraction logic.

For more information, check out the Unstract Custom Data Documentation.

Use Case 1 – Air Shipment Validation

Document

For the air shipment validation, we will use the following air waybill as the input document:

Download Document →

Custom Data

Let’s assume we want to use custom data to validate whether the cargo weight falls within predefined limits for a specific customer account and route. We could define data like this:

{
  "account_number": "HY73221",
  "route": {
    "origin": "New York",
    "destination": "London"
  },
  "thresholds": {
    "minimum": 50,
    "maximum": 80,
"unit": "Kg"
  }
}

Prompt Studio

In Prompt Studio, you can define custom data by following these steps:

  1. Open your Prompt Studio project
  2. Click the Settings icon (gear) in the toolbar
  3. Select Custom Data from the menu

From there, you can add and manage the custom data that will be available to your prompts:

Build Document Workflows That Adapt with Unstract

This definition can be used for testing within Prompt Studio. When using the API, you can pass the same information via the custom_data parameter in your request to apply it at runtime.
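
For example, a request to a deployed Unstract API could look roughly like the sketch below. Treat it as a hypothetical illustration: the deployment URL, authorization header, and multipart field names are assumptions, only the custom_data parameter itself is described in the documentation, so check the Unstract API deployment docs for the exact request format.

import json
import requests

# Runtime business context for this request (same values as the Prompt Studio test above)
custom_data = {
    "account_number": "HY73221",
    "route": {"origin": "New York", "destination": "London"},
    "thresholds": {"minimum": 50, "maximum": 80, "unit": "Kg"}
}

# Hypothetical call to a deployed Unstract API; URL, header, and field names are assumptions
response = requests.post(
    "https://<your-unstract-host>/deployment/api/<org_id>/<api_name>/",
    headers={"Authorization": "Bearer <API_KEY>"},
    files={"files": ("airway_bill.pdf", open("airway_bill.pdf", "rb"))},
    data={"custom_data": json.dumps(custom_data)},
)
print(response.json())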

Let’s consider the following prompt, which leverages Custom Data to process and validate the information in the document:

Extract the total shipment weight for customer {{custom_data.account_number}} in the route from {{custom_data.route.origin}} to {{custom_data.route.destination}}.

The total weight should be between {{custom_data.thresholds.minimum}} and {{custom_data.thresholds.maximum}} of unit {{custom_data.thresholds.unit}}.

Return field ‘is_between_thresholds’ with True or False depending on the outcome of the data.

This generates the following output:

Build Document Workflows That Adapt with Unstract

And corresponding JSON:

{
    "is_between_thresholds": true,
    "total_shipment_weight_kg": 72
}

As you can see, by using Custom Data, the validation logic is kept separate from the prompt itself and can be applied dynamically, allowing processing to adapt to specific customers, routes, or any other contextual data without modifying the extraction logic.

Use Case 2 – Medical Document

Document

Let’s look at a second example of using Custom Data, this time with a medical document:

Download Document →

Custom Data

The advantage of using Custom Data is that we can “override” the default reference ranges defined in the report. For example, based on a patient’s history, doctors may expect different thresholds than the standard ranges.

In this case, we can define a custom metric to reflect those expectations:

{
  "metric_check": {
    "name": "Hemoglobin",
    "expected_min": 13.0,
    "expected_max": 17.0,
    "unit": "g/dL"
  }
}

Prompt Studio

As described before, let’s add this Custom Data to the Settings of the Prompt Studio project:

Build Document Workflows That Adapt with Unstract

Reminder: when using the API, you can pass the same information via the custom_data parameter in your request to apply it dynamically at runtime.

Let’s define the prompt that uses the custom metric to extract and return a properly formatted output:

Extract the value and unit for {{custom_data.metric_check.name}} from the lab report.

Compare the extracted value against the expected provided range: {{custom_data.metric_check.expected_min}} to {{custom_data.metric_check.expected_max}} of unit {{custom_data.metric_check.unit}}.

Record a field name ‘is_expected’ with only one of these possibilities:
– below_expected
– within_expected
– above_expected

This generates the following output:

Build Document Workflows That Adapt with Unstract

And corresponding JSON:

{
    "Hemoglobin": {
        "unit": "gm/dL",
        "value": 12.4
    },
    "is_expected": "below_expected"
}

As you can see, by using Custom Data, medical professionals can adjust the expected metrics for a report based on a patient’s history, enabling more accurate diagnostics.

This clearly demonstrates the advantages of separating extraction logic from workflow-specific rules using Custom Data.

Feature 3: Prompt Chaining

Prompt Chaining is a technique for building multi-stage document workflows by connecting multiple prompts in sequence. Instead of treating each extraction as an isolated step, prompts are executed one after another, where the output of one prompt becomes the input to the next.

This approach enables workflows that can reason progressively over documents. Earlier extraction results can influence what happens downstream, allowing for conditional logic, cross-document validation, and multi-stage analysis without reprocessing entire documents or duplicating prompts.

By breaking complex logic into smaller, focused steps, Prompt Chaining makes document workflows easier to understand, more reliable, and better suited for real-world scenarios where decisions depend on earlier results.

For more information, check out the Unstract Prompt Chaining Documentation.

Data Flow Patterns

Prompt Chaining supports two primary patterns for passing data between prompts, depending on whether the output needs additional processing before being reused.

Direct Value Passing: In this pattern, the raw output of a prompt is passed directly into the next prompt using the syntax {{prompt_key}}. This is ideal when the extracted value is already in the required format and can be reused as-is, for example, passing a name, identifier, or simple field between prompts.

Processed Value Passing: In cases where the output needs validation, normalization, or enrichment before being reused, Prompt Chaining supports processed value passing. Here, the prompt output is first sent to a post-processing webhook and then injected into the next prompt using the syntax {{[prompt_key] <post_processing_url>}}. This allows developers to apply deterministic logic between prompt stages while keeping the overall workflow modular and flexible.

Use Case 1 – Loan Application + KYC Verification

Document

For this example, we will use a loan application document to illustrate the use of Direct Value Passing:

Download Document →

Prompts

First, we begin by extracting the applicant’s name from the document:

Extract the applicant’s full name from the loan application. Reply only with the name.

This prompt was defined with the key applicant_name:

Build Document Workflows That Adapt with Unstract

Which returns the following result:

Roger Deakins

We can now reuse this extracted value in the second prompt:

Check if the name “{{applicant_name}}” appears in the driver license documents on page 2 and 3.

Reply with a list of records composed by ‘page’ and ‘match_type’ fields.

For the ‘match_type’ field return only one of these values:
– exact_match
– partial_match
– no_match

We define the prompt key as kyc_name_match:

Build Document Workflows That Adapt with Unstract

The resulting JSON looks like this:

[
    {
        "match_type": "exact_match",
        "page": 2
    },
    {
        "match_type": "no_match",
        "page": 3
    }
]

As you can see, by reusing the applicant name extracted in the first prompt, we can automatically validate the driver’s license details on pages 2 and 3, ensuring they correctly match the applicant.

This approach requires no manual intervention and is fully reusable across different loan application documents and workflows.

Use Case 2 – Bill Of Lading + Payment Lookup

Document

For this example, we will use a bill of lading document to illustrate the use of Processed Value Passing:

Download Document →

Prompts

Let’s define a prompt to extract the booking number and amount:

Extract the booking no and cod amount.

This prompt was defined with the key booking:

Build Document Workflows That Adapt with Unstract

Which returns the following result:

{
    "booking_no": "758993033",
    "cod_amount": "$ 800"
}

We can now reuse this extracted value in the second prompt, using a webhook to process the values:

print the result {{[booking]https://unstract-webhook.developer-service.blog/webhook/amount_due}}

We define the prompt key as amount_due:

Build Document Workflows That Adapt with Unstract

The resulting JSON looks like this:

{
    "amount_due": 450,
    "amount_paid": 350,
    "booking_number": "758993033",
    "cod_amount": 800,
    "payment_date": "2026-01-16"
}

As shown in this example, prompt chaining with processed value passing allows you to combine the output of earlier prompts with data retrieved from external systems.

The webhook exposes a single endpoint at /webhook/amount_due (source code below), which accepts POST requests containing the shipment billing data. When a webhook payload is received, the processing flow is as follows:

  • The webhook extracts the booking_no and cod_amount.
  • It looks up the booking number in the bill_lading_payments.csv file to check for existing payment records.
  • If a match is found, it retrieves the corresponding amount_paid and payment_date.
  • The outstanding balance is then calculated as amount_due = cod_amount - amount_paid.

This pattern provides a high degree of flexibility for validating and enriching extracted data, while keeping prompts generic, reusable, and easy to maintain.

Webhook Source Code

The webhook referenced above is implemented using the following source code:

import json
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import JSONResponse
import uvicorn
import logging
from typing import Dict, Any
import csv

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


app = FastAPI(title="Unstract PostProcessing Webhook", version="1.0.0")


@app.post("/webhook/amount_due")
async def receive_webhook_amount_due(request: Request):
    """
    Receive webhook POST requests for amount due
   
    This endpoint accepts webhook payloads, calculates amount due from totals,
    and returns the result.
    """
    try:
        # Get the raw body as bytes
        body = await request.body()
       
        # Try to parse as JSON
        try:
            payload = await request.json()
            logger.info(f"Received webhook with JSON payload: {payload}")
        except Exception:
            # If not JSON, log as text
            payload = body.decode('utf-8')
            logger.info(f"Received webhook with text payload: {payload}")
            raise HTTPException(status_code=400, detail="Invalid JSON payload")
       
        # The chained prompt output may arrive as a JSON-encoded string; parse it into a dictionary if needed
        if isinstance(payload, str):
            payload = json.loads(payload)

        # Get booking number and cod amount
        booking_number = payload.get("booking_no", "")
        cod_amount = payload.get("cod_amount", 0)

        # Read bill lading payments from csv file
        amount_paid = 0
        payment_date = ""
        with open('bill_lading_payments.csv', 'r') as file:
            reader = csv.reader(file)
            for row in reader:
                # Skip header row
                if row[0] == "booking_no":
                    continue
                # Check if booking number matches
                if row[0] == booking_number:
                    amount_paid = float(row[1].replace("$", "").replace(",", "").strip())
                    payment_date = row[2]
                    break

        # Calculate amount due
        cod_amount = float(str(cod_amount).replace("$", "").replace(",", "").strip())
        amount_due = cod_amount - amount_paid
        response_data = {
            "booking_number": booking_number,
            "cod_amount": cod_amount,
            "amount_due": amount_due,
            "amount_paid": amount_paid,
            "payment_date": payment_date
        }
       
        # Return response
        return JSONResponse(
            status_code=200,
            content=response_data
        )   
    except HTTPException:
        # Re-raise HTTP errors (such as the 400 for invalid JSON) without converting them to 500s
        raise
    except Exception as e:
        logger.error(f"Error processing webhook: {str(e)}")
        raise HTTPException(status_code=500, detail=f"Error processing webhook: {str(e)}")


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

The results were derived from data stored in a CSV file, which simulates an integration with external systems:

booking_no,amount_paid,payment_date
1234567890,1000.00,2026-01-19
3253252352,200.00,2026-01-10
4365347654,300.00,2026-01-05
758993033,350.00,2026-01-16

Why This Approach Scales in Real Systems

This approach scales because it reduces duplication and cleanly isolates change. Instead of creating new prompts for every variation, teams maintain a smaller set of stable prompts and control behaviour through data and workflow logic.

Key reasons it scales well:

  • Fewer prompts: One prompt can serve multiple clients, regions, and use cases by injecting runtime context instead of duplicating logic.
  • Smaller, composable logic blocks: Extraction, validation, transformation, and orchestration are separated, making each piece easier to build, test, and maintain.
  • Clear separation of concerns: Prompts extract data, Custom Data defines context, webhooks enforce rules, and prompt chains handle flow control.

This design also makes change easy to absorb:

  • New clients are onboarded by passing different Custom Data, not by rewriting prompts
  • New rules are added through webhook logic or configuration updates
  • New document formats can be supported without breaking downstream workflows

The result is a document automation system that remains flexible as complexity grows, instead of becoming harder to manage over time.

Conclusion: From Extraction to Intelligent Workflows

These features move Unstract beyond traditional OCR and basic data extraction. Instead of treating documents as isolated inputs, Unstract enables intelligent workflows where extraction, context, validation, and decision-making work together as a single system.

By combining Prompt Studio, Custom Data, Post-Processing Webhooks, and Prompt Chaining, teams can build production-grade document workflows that are flexible, reliable, and easy to evolve.

Developers retain full control over business logic and integrations, while workflows automatically adapt to new formats, rules, and use cases without introducing unnecessary complexity.

If you’re exploring how to design scalable document automation, the best way to understand these concepts is to try them yourself.

Start with the provided sample documents, experiment with different configurations, and build your own adaptive workflows to see how Unstract behaves in real-world scenarios.

About Author

Nuno Bispo

Nuno Bispo is a Senior Software Engineer with more than 15 years of experience in software development. He has worked in various industries such as insurance, banking, and airlines, where he focused on building software using low-code platforms. Currently, Nuno works as an Integration Architect for a major multinational corporation. He has a degree in Computer Engineering.