From Inbox to Database: Automating Document Extraction with Unstract + n8n
Introduction
Accounting firms today are inundated with a continual stream of client documents arriving via email, from vendor invoices to specialized tax-form submissions.
Each morning, staff must sift through dozens or even hundreds of messages to find billable attachments, then painstakingly download and transcribe data into spreadsheets or accounting systems.
This repetitive manual process not only diverts skilled accountants from high-value analysis but also creates bottlenecks that slow turnaround times and inflate labor costs, all while introducing the risk of typos and mis-mapped fields that can cascade into reporting errors and compliance headaches.
n8n addresses the orchestration challenge by providing a self-hosted, open-source workflow automation platform.
Its visual, drag-and-drop interface lets you connect Gmail triggers, branching logic, custom JavaScript (or Python) functions, HTTP request nodes, and database clients into a single, end-to-end pipeline.
With n8n’s powerful branching and retry capabilities, you can filter incoming emails, parse subject lines to identify form types, and route attachments through the appropriate processing paths, all under your data governance.
Complementing n8n’s orchestration is Unstract’s custom extraction API. Within Unstract’s Prompt Studio, you build and refine form-specific extractors (mapping fields like invoice numbers, dates, line-item details, or tax-form codes) and then deploy them as REST endpoints.
When n8n submits a PDF or scanned image to Unstract, the API returns structured JSON suitable for insertion into your cloud database.
By combining n8n’s trigger-and-branch logic with Unstract’s LLM-based document parsing, you eliminate manual keystrokes, accelerate processing, and make data capture from inbox to database far more consistent.
Use Case Deep Dive: Accounting Firm Tax Forms
Scenario Description
Imagine an accounting firm that receives a steady stream of tax-related documents via email, each clearly labelled in the subject line with its form type—say, Form 1040 for income tax and Form 990 for tax exemption.
For example, an incoming message might read “Adam Scott – Form 1040 – Income Tax” or “Acme Corp – Form 990 – Tax exemption”.
Although both arrive as PDF attachments, each form type has its own unique set of fields: Form 1040 has income lines, whereas Form 990 has revenue lines.
Distinguishing these immediately allows us to tailor the downstream processing precisely to each document’s schema.
Desired Workflow Outcomes
The goal is a fully automated pipeline that, based solely on the email subject, auto-routes each attachment into the correct extraction workflow and then persists the resulting structured JSON into separate database tables.
Form 1040 documents should flow through the “Form 1040” extractor and land in the form-1040 table, with columns for each income line item.
Form 990 files should invoke the “Form 990” extractor and populate the form-990 table, capturing revenue lines.
By isolating each form’s data in its table, we maintain clear separation of schemas, simplify analytics, and ensure that reporting or reconciliation queries never have to sift through irrelevant fields.
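The routing rule described above can be sketched as a small lookup table. This is only an illustration of the mapping, not part of the n8n workflow itself: the form types and table names come from the scenario, while the `Route` class, `route_for` function, and the extractor identifiers are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    extractor: str  # name of the Unstract extractor / API deployment (placeholder)
    table: str      # destination Postgres table


# Form types and table names follow the scenario; the extractor names
# are placeholders for whatever you call your own deployments.
ROUTES = {
    "Form 1040": Route(extractor="Form-1040", table="form-1040"),
    "Form 990": Route(extractor="Form-990", table="form-990"),
}


def route_for(form_type: str) -> Route:
    """Return the route for a known form type, or raise for anything else."""
    try:
        return ROUTES[form_type]
    except KeyError:
        raise ValueError(f"No route configured for form type: {form_type!r}") from None


print(route_for("Form 990"))
```

Raising on an unknown form type, rather than falling back to a default, keeps a mislabelled email from landing in the wrong table.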
Prerequisites & Setup
n8n Self-Hosted Installation
To get full control over your data and meet security requirements, you can deploy n8n on your own infrastructure using Docker.
Although a single Docker command with the built-in SQLite database is enough to get started, a more reliable deployment uses Postgres, Redis, and a separate worker.
That can be easily achieved by creating a Docker Compose file:
This Docker Compose file defines three named volumes—db_storage, n8n_storage, and redis_storage—to persist database, n8n, and Redis data across container restarts.
Using a shared YAML anchor (&shared), it standardizes configuration for both n8n and n8n-worker services: always-restart policies, the official docker.n8n.io/n8nio/n8n image, and environment variables that point n8n to a PostgreSQL backend, enable queued execution via Redis, and secure credentials with an encryption key.
It also mounts n8n_storage into each n8n container at /home/node/.n8n so workflows, credentials, and logs aren’t lost when containers restart.
The postgres service runs postgres:16, uses db_storage for its data directory, and automatically initializes schema or seed data via init-data.sh on first startup. A healthcheck (pg_isready) ensures n8n only starts once PostgreSQL is accepting connections.
Meanwhile, the redis service runs redis:6-alpine with its own volume (redis_storage) and a redis-cli ping healthcheck, guaranteeing the queue backend is ready before any jobs are enqueued.
Finally, two n8n services inherit the shared settings:
n8n exposes port 5678 for the web UI and API, linking to Redis and PostgreSQL once they’re healthy.
n8n-worker, launched with command: worker, handles background job processing separately from the main web process.
This separation of web and worker processes, combined with persistent volumes and health-checked dependencies, delivers a robust, production-ready n8n deployment.
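The health-checked startup order described above amounts to "wait until a service accepts connections, then proceed". As a rough sketch of that idea from the host side, the following polls a TCP port until it opens. It assumes n8n is published on localhost port 5678 as described; `wait_for_port` is a hypothetical helper, and a successful TCP connection only shows that the port is open, not that the application has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Assumes the n8n web UI is published on localhost:5678.
    ready = wait_for_port("localhost", 5678, timeout=2.0)
    print("n8n:", "ready" if ready else "not reachable")
```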
On first opening of the URL, you will be asked to set up an owner account:
Skipping the initial pop-ups after login takes you to the Dashboard:
Tax Form Samples
To extract information accurately from the two types of tax forms, we will create two Unstract APIs, each designed specifically for one form type.
Let’s start by looking at a sample of each form type used in this walkthrough.
Form 1040:
Form 990:
To get an initial idea of the extracted information that Unstract and its LLMWhisperer can provide, let’s run one of the files in the LLMWhisperer Playground:
Unstract Prompt Studio Projects
Unstract is an open-source, no-code LLM platform for launching APIs and ETL pipelines that structure unstructured documents.
Unstract’s Prompt Studio is a powerful tool that enables users to design and customize AI-driven prompts for extracting specific information from unstructured data.
In this section, we’ll focus on creating prompts to extract the necessary fields from the tax forms, like income, revenue, and identification.
Visit the Unstract website and create an account. The registration process is straightforward and grants you access to the platform’s features, including the Prompt Studio and LLMWhisperer tools.
Upon signing up, you will receive a 14-day trial that includes $10 in LLM tokens, allowing you to start using the account immediately.
Setting Up Prompts in Prompt Studio
Navigate to the Prompt Studio interface in Unstract and create a new project specific to the first tax form; let’s call it ‘Form-1040’.
Use ‘Manage Documents’ to add the document you want to test against, then write the prompts for it.
Prompts are designed to instruct the AI to focus on specific information fields within the document.
Prompt: “Extract the first name, last name, home address, apt no (if exists), city, state, zip code. Return JSON with these exact field names.”
Note: Remember to set the output format as JSON.
Running the prompt, we get the following JSON:
Prompt: “From the income section, extract the total amount from Form W-2 as total amount, household wages, medical payments, other income, and total income. Return JSON with these exact field names.”
Note: Remember to set the output format as JSON.
Running the prompt, we get the following JSON:
Output Format
The extracted data is organized into structured JSON, as mentioned. The combined output of the different prompts is, for example:
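If the combined output is keyed by prompt name (an assumption about the response shape, worth checking against your own project), flattening it into one record makes the later column mapping simpler. The sketch below uses placeholder values and hypothetical key names loosely based on the two prompts above; `merge_prompt_outputs` is not an Unstract function.

```python
import json


def merge_prompt_outputs(outputs: dict) -> dict:
    """Flatten {prompt_name: {field: value}} into a single record.

    Raises if two prompts emit the same field, so a collision is noticed
    instead of one value silently overwriting the other.
    """
    record: dict = {}
    for prompt_name, fields in outputs.items():
        for field, value in fields.items():
            if field in record:
                raise ValueError(f"Field {field!r} from prompt {prompt_name!r} is already set")
            record[field] = value
    return record


# Placeholder values only; the key names are assumptions, not real output.
example = {
    "personal_info": {"first_name": "<first>", "last_name": "<last>", "zip_code": "<zip>"},
    "income": {"total_amount": "<amount>", "total_income": "<amount>"},
}
print(json.dumps(merge_prompt_outputs(example), indent=2))
```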
Once you’ve set up your Prompt Studio project and fine-tuned your prompts for precise data extraction, the next step is to deploy your Unstract solution as an API.
This deployment enables you to integrate the parsing functionality directly into your applications or systems, including n8n, to support real-time processing and scalable operations.
Creating a Tool
Begin by converting your project into a tool that can be incorporated into a workflow. In your Prompt Studio project, click the Export as tool icon located at the top right corner.
This action will transform your project into a ready-to-use tool.
Creating a Workflow
Next, create a new workflow:
Navigate to BUILD → Workflows.
Click on + New Workflow to start a new workflow.
Then, in the Tools section on the right, locate the tool you just created (e.g., “Form-1040”) and drag and drop it into the workflow editor on the left side:
Creating an API
Now that your workflow is ready, you can transform it into an API. Begin by navigating to MANAGE → API Deployments and clicking on the + API Deployment button to create a new API deployment by selecting the created workflow.
Once the API is set up, you can use the Actions links to manage different aspects of the API.
For example, you can manage the API keys or download a Postman collection for testing:
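Besides Postman, the deployed endpoint can be exercised from a short script. The sketch below only builds the request and does not send it. It rests on several assumptions that you should check against your own deployment's Postman collection: that the API expects a bearer token in the `Authorization` header, that the PDF is uploaded as a multipart form field named `files`, and that the endpoint URL is copied from the API deployment (the URL shown is a placeholder). `build_extraction_request` is a hypothetical helper.

```python
import urllib.request
import uuid


def build_extraction_request(endpoint_url: str, api_key: str,
                             filename: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST that uploads one PDF."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # The field name "files" is an assumption about the API.
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder values; copy the real URL and key from your API deployment.
request = build_extraction_request(
    "https://example.invalid/deployment/api/ORG_ID/API_NAME/",
    api_key="API_KEY",
    filename="form-1040.pdf",
    pdf_bytes=b"%PDF-1.4 placeholder",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=300)
```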
You can now repeat the same steps for the other tax form type.
For example, a possible output for the second tax form is:
When you generate a Gmail app password for the mailbox that n8n will monitor, a pop-up shows the password; copy it to a safe location so you can use it later.
Note: To use an app password, you need to have 2-step authentication enabled in your account.
NeonDB Postgres
Sign up for a free NeonDB project to create a dedicated database for your tax-form data.
Create a new project:
And you will be redirected to the Dashboard:
Click on ‘Connect to your database’:
Select ‘Parameters only’ and click on ‘Copy snippet’. You will need these values later to connect n8n to this Postgres database.
Next, you can create the tables necessary to store the tax form data, starting with the table form-1040:
Add one column for each field in the JSON produced in Unstract’s Prompt Studio. Then click on ‘Review and create’ and confirm by clicking on ‘Create table’.
Repeat the process for the table form-990:
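If you prefer SQL over the table editor, note that a name like form-1040 contains a hyphen, so PostgreSQL requires it to be written as a double-quoted identifier. The following sketch generates such a statement; the column names and types are illustrative assumptions (use the keys your own prompts return), and `quote_ident` and `create_table_sql` are hypothetical helpers.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table: str, columns: dict) -> str:
    """Build a CREATE TABLE statement with a serial id plus the given columns."""
    cols = ['"id" SERIAL PRIMARY KEY']
    cols += [f"{quote_ident(col)} {sql_type}" for col, sql_type in columns.items()]
    return f"CREATE TABLE {quote_ident(table)} (\n  " + ",\n  ".join(cols) + "\n);"


# Column names and types are illustrative, not taken from the real forms.
print(create_table_sql("form-1040", {
    "first_name": "TEXT",
    "last_name": "TEXT",
    "zip_code": "TEXT",
    "total_income": "NUMERIC(14, 2)",
}))
```

Storing the ZIP code as text rather than a number preserves leading zeros.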
PDF Data Extraction: Architecture Overview
High-Level Flow Diagram
Below is a simplified flowchart illustrating the end-to-end pipeline—from incoming email to structured JSON in the database:
Component Responsibilities
n8n
Trigger: Watches a Gmail account for new emails with attachments.
Branching Logic: Parses the subject line to identify “Form 1040” vs. “Form 990” and routes each message accordingly.
API Calls: Sends the PDF attachment to the corresponding Unstract REST endpoint, receives parsed JSON, and then writes data into the target database.
Unstract
Document Parsing: Hosts form-specific extractors built in Prompt Studio to recognize fields unique to each tax form.
JSON Extraction: Exposes each extractor as a secure REST API that converts uploaded PDFs into structured JSON payloads matching your schema.
Database (NeonDB Postgres)
Storage: Houses two tables—form-1040 and form-990—with columns aligned to the JSON fields produced by Unstract.
Schema Enforcement: Ensures data integrity via proper types and constraints, enabling reliable downstream analytics and reporting.
Step-by-Step Implementation
In this section, we describe step by step how to configure each node of the workflow and connect it to the others to process the tax forms.
Click on ‘Create Workflow’. You can rename your workflow to something like ‘Tax Forms’:
In the next sections, we will create and configure each of the necessary nodes.
1. Configure n8n IMAP Trigger
Since this is the first node of the workflow, you can create it by clicking on ‘Add first step’ and selecting ‘Email Trigger (IMAP)’ from the list:
Click on ‘Create new credential’ to associate your app password:
Fill in the required information:
Use the following values:
User -> The email address
Password -> The app password defined previously
Host -> imap.gmail.com
Click ‘Save’ to save the credentials.
n8n then tests the connection automatically. If everything is correct, you will see the success message:
Finally, tick the option to ‘Download Attachments’:
Click ‘Back to canvas’ to return to the workflow editor and add the next node of the workflow.
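To make concrete what the ‘Download Attachments’ option hands to the next node, here is a minimal sketch that pulls PDF attachments out of a raw email using Python's standard `email` package. It is an illustration only, not how n8n implements the trigger; the sample message is built locally so the code runs without a mailbox, and the addresses are placeholders.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def pdf_attachments(raw_message: bytes) -> list:
    """Return (filename, content) pairs for every PDF attached to a raw email."""
    message = BytesParser(policy=policy.default).parsebytes(raw_message)
    found = []
    for part in message.iter_attachments():
        if part.get_content_type() == "application/pdf":
            found.append((part.get_filename() or "attachment.pdf", part.get_content()))
    return found


# Build a sample message locally; the subject follows the scenario's format.
sample = EmailMessage()
sample["Subject"] = "Adam Scott – Form 1040 – Income Tax"
sample["From"] = "client@example.com"
sample["To"] = "intake@example.com"
sample.set_content("Please find my return attached.")
sample.add_attachment(b"%PDF-1.4 placeholder", maintype="application",
                      subtype="pdf", filename="form-1040.pdf")

for name, content in pdf_attachments(sample.as_bytes()):
    print(name, len(content), "bytes")
```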
2. Handle Branching Logic
Use n8n’s Switch node to branch based on form type.
Click on the ‘+’ icon to add a new node after the Email Trigger:
Define the rules by selecting the appropriate field and matching expression:
Note: For simplicity, each rule here is a plain string match that checks whether the subject contains Form 1040 or Form 990. In production, you should use a stricter regular expression.
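As one possible shape for a stricter rule, the sketch below matches the form number as a whole token and rejects subjects that mention both forms. The pattern and the `classify_subject` helper are assumptions for illustration; the same regular expression could be adapted for the Switch node if you choose a regex-based rule.

```python
import re
from typing import Optional

# Matches "Form 1040" or "Form 990" as whole tokens, case-insensitively,
# tolerating whitespace or a hyphen between the word and the number.
FORM_PATTERN = re.compile(r"\bform[\s-]*(1040|990)\b", re.IGNORECASE)


def classify_subject(subject: str) -> Optional[str]:
    """Return "Form 1040" or "Form 990"; None if neither or both are named."""
    numbers = {match.group(1) for match in FORM_PATTERN.finditer(subject)}
    if len(numbers) != 1:
        return None  # no form mentioned, or an ambiguous subject naming both
    return f"Form {numbers.pop()}"


for subject in [
    "Adam Scott – Form 1040 – Income Tax",
    "Acme Corp – Form 990 – Tax exemption",
    "Form 10400 draft",
    "Form 1040 and Form 990 attached",
]:
    print(repr(subject), "->", classify_subject(subject))
```

Unlike a plain "contains" check, the word boundaries keep a subject such as "Form 10400" from being routed to the Form 1040 branch.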
3. Call Unstract Extraction API
In order to use the Unstract APIs in n8n, you need to install the Unstract node, which is part of the community nodes.
Navigate to ‘Settings’ -> ‘Community nodes’ and click on ‘Install a community node’:
Fill in the npm package name for the Unstract node, n8n-nodes-unstract, and click ‘Install’:
After a couple of seconds, the community node for Unstract should be installed:
Return to the workflow, click on the ‘+’ icon after the branch named ‘Form 990’ and select the ‘Unstract’ node from the list:
Then click on ‘Create new credential’ and fill in the corresponding API key and Organization ID mentioned previously:
Additionally, in the node configuration, you also need to fill in the ‘API Deployment Name’, which corresponds to the API Name from the Unstract API definition mentioned previously:
Then repeat the same steps for the Switch branch ‘Form 1040’, including setting a new credential since each API has its own credentials and API name:
4. Store Results in Database
Returning to the workflow editor, the next step is to send the output of the Unstract APIs to a Postgres node.
Click on the ‘+’ icon after the Unstract node in the branch ‘Form 1040’ and select a Postgres node:
For the action, select ‘Insert rows in a table’:
Then click on ‘Create new credential’ and fill in the corresponding parameters from the NeonDB project, as described previously.
Note: Don’t forget to tick the Allow SSL option by scrolling down on this pop-up.
Now a field mapping needs to be configured between the Unstract API result and the corresponding Postgres table.
Select the schema, in this case ‘Public’, and the table ‘form-1040’. The dropdowns will populate with real-time info from the database:
Remove the id field and map each of the other fields and columns by dragging from the left INPUT section and dropping into the corresponding table field.
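The mapping configured in the Postgres node corresponds roughly to a parameterized INSERT that leaves the id column to the database. The sketch below only builds the statement and its parameters. The `%s` placeholders follow the style used by the psycopg drivers, the record keys are hypothetical, and `insert_statement` is not an n8n or Unstract function.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def insert_statement(table: str, record: dict) -> tuple:
    """Build a parameterized INSERT for `record`, leaving `id` to the database."""
    fields = {key: value for key, value in record.items() if key != "id"}
    if not fields:
        raise ValueError("record has no insertable fields")
    columns = ", ".join(quote_ident(key) for key in fields)
    placeholders = ", ".join(["%s"] * len(fields))
    sql = f"INSERT INTO {quote_ident(table)} ({columns}) VALUES ({placeholders})"
    return sql, list(fields.values())


# Placeholder record; the keys are assumptions, not real extractor output.
sql, params = insert_statement(
    "form-1040",
    {"id": 7, "first_name": "<first>", "total_income": "<amount>"},
)
print(sql)
print(params)
```

Passing values as parameters instead of formatting them into the SQL string avoids injection through extracted text.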
Repeat the process for the other branch ‘Form 990’:
5. Complete Workflow
The complete workflow is defined as:
It is triggered by a new email (read over IMAP), parses the email subject, chooses the appropriate branch, sends the attached PDF to the corresponding Unstract API, and inserts the JSON output into the matching Postgres table.
Testing & Validation
A workflow is inactive by default, so to test it you need to click on ‘Test workflow’.
After that, send an email with the appropriate subject, containing either ‘Form 1040’ or ‘Form 990’, and attach the corresponding tax form PDF.
The workflow will execute and process all the nodes.
You can check the workflow executions by selecting ‘Executions’ when in the ‘Editor’.
Each execution shows the path taken through the workflow. Here is the one for ‘Form 1040’:
And here for ‘Form 990’:
Once you have confirmed that the workflow works as expected, you can activate it so it runs automatically. To do this, click the ‘Inactive’ toggle in the editor to switch it to ‘Active’.
Database Verification
As an additional verification, you can check the stored data in the NeonDB Postgres database by querying the inserted data.
Navigate to ‘Tables’ and select one of the tables, in this case, form-1040:
You can also check the form-990 table:
End-to-End Document Automation with n8n and Unstract
By combining n8n’s flexible, self-hosted orchestration with Unstract’s LLM-driven extraction APIs, accounting firms can replace tedious, error-prone manual data entry with a reliable, fully automated pipeline.
Incoming emails are automatically filtered and routed based on simple subject-line logic, PDFs are sent to form-specific extractors built in Prompt Studio, and the resulting structured JSON lands directly in your NeonDB Postgres tables with no manual data entry.
This end-to-end workflow accelerates document processing automation, frees up your team for higher-value work, and ensures consistent data quality.
Once you add retry strategies and failure alerts alongside n8n’s execution logs, you maintain full visibility and control over every step of the pipeline.
As your document types evolve, you can rapidly iterate on new extractors in Unstract and seamlessly plug them into your existing n8n workflows.
Ready to take the next step?
Whether you’re handling two tax-form variants today or scaling to dozens of document types tomorrow, Unstract + n8n delivers the agility, accuracy, and efficiency modern accounting teams demand.
Unstract is a no-code platform to eliminate manual processes involving unstructured data using the power of LLMs. The entire process discussed above can be set up without writing a single line of code. And that’s only the beginning. The extraction you set up can be deployed in one click as an API or ETL pipeline.
With API deployments, you can expose an API to which you send a PDF or an image and get back structured data in JSON format. With an ETL deployment, you can put files into Google Drive, an Amazon S3 bucket, or one of a variety of other sources, and the platform will run extractions and automatically store the extracted data in a database or a warehouse like Snowflake. Unstract is open-source software and is available at https://github.com/Zipstack/unstract.
If you want to quickly try it out, sign up for our free trial.