DocuForge: Adding AI to my paperless-ngx Workflow
How I went from existing paperless-ngx AI tools, to an overengineered 4090-powered orchestrator, and finally to a lightweight OCR and tagging pipeline of my own.
I use paperless-ngx to manage documents at home. It already does the important part well: ingest documents, OCR them, store them, and make them searchable.
After using it for a while, I wanted a little more automation:
- better titles;
- automatic tags;
- correspondents;
- document types;
- better handling of awkward scans and phone pictures;
- less manual sorting for every new document.
So naturally, I thought: this sounds like a job for AI. Also naturally, I ended up with a small robot arm pressing the power button of my PC.
Starting with existing projects
I first looked at paperless-gpt and paperless-ai. They proved the idea was useful: connect paperless-ngx to an LLM, analyse the OCR text, and update the document metadata automatically.
But they did not quite fit what I wanted. Some parts felt heavier than necessary, and when I saw paperless-ai using around 1 GB of RAM while idle, I started thinking about building something smaller.
At first, the obvious solution was to use my main AI machine: a desktop with an RTX 4090. That made inference easy, but it also meant leaving a power-hungry machine running in an office that is already warm enough.
The first version: paperless-orchestrator
The first version of my own workflow was a small API running in a Docker container. I called it paperless-orchestrator.
The idea was simple:
- A document is added to paperless-ngx, either manually or by email ingestion.
- paperless-ngx sends a webhook to the orchestrator.
- The orchestrator queues the document for AI processing.
- When enough documents are waiting, it wakes up the 4090 machine.
- The PC runs inference.
- The document metadata is updated.
The queue mattered because I did not want to start the PC for every single document. I set a threshold: only wake the machine when at least five documents were waiting for processing.
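The queue-and-threshold idea can be sketched in a few lines. This is an illustrative shape, not the orchestrator's actual code; the names `WAKE_THRESHOLD` and `enqueue_document` are placeholders:

```python
from collections import deque

# Hypothetical sketch: names and values are illustrative, not the
# orchestrator's real internals.
WAKE_THRESHOLD = 5  # wake the inference PC only once enough work piles up

pending = deque()

def enqueue_document(doc_id):
    """Queue a document ID; return True when the machine should be woken."""
    pending.append(doc_id)
    return len(pending) >= WAKE_THRESHOLD
```

Each webhook just enqueues; only the call that pushes the queue to the threshold triggers the wake-up.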
Reasonable enough.
Then the hardware got involved.
Wake-on-LAN, except no
My PC has a 10 GbE NIC, which is useful for moving data around quickly, but less useful when it does not support Wake-on-LAN.
So the orchestrator could queue documents, but it had no clean way to wake the inference machine.
The elegant solution would have been to swap the NIC or rethink the architecture. Instead, I used a SwitchBot mini robot arm to press the physical power button.
To make that controllable locally from the rest of the lab, I added an ESP32 Bluetooth proxy for Home Assistant. The orchestrator could then call the Home Assistant API, which would trigger the SwitchBot, which would physically press the PC power button.
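Home Assistant exposes services over its REST API as `POST /api/services/<domain>/<service>`, so the orchestrator's side of this chain is a single authenticated HTTP call. A minimal sketch, assuming the SwitchBot shows up as a button entity; the host, token, and entity ID are placeholders:

```python
import json
import urllib.request

# Placeholders for my setup, not real values.
HA_URL = "http://homeassistant.local:8123"
HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"

def build_press_request(entity_id):
    """Build the API call asking the SwitchBot to press the power button."""
    return urllib.request.Request(
        f"{HA_URL}/api/services/button/press",
        data=json.dumps({"entity_id": entity_id}).encode(),
        headers={
            "Authorization": f"Bearer {HA_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually fire it:
# urllib.request.urlopen(build_press_request("button.switchbot_pc_power"))
```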
This is the sort of thing that makes perfect sense at 11 p.m.
On boot, a PowerShell script launched through Task Scheduler exposed a few local endpoints:
- start Ollama;
- stop Ollama;
- schedule a shutdown.
Overengineered? Yes. Working? Also yes.
Discovering paper-llama
While browsing Reddit, I found paper-llama, a much lighter Python alternative. Its goal was close to mine: connect a private paperless-ngx instance to a private Ollama instance, analyse OCR text, and update document metadata.
I started testing on my M2 Pro MacBook Pro, using smaller models such as Qwen 3.5 9B. That is where I ran into an issue: the model’s thinking budget was eating all of the available context, which made it unusable.
That led to my first pull request on an open-source project. Not a huge change, but still a nice milestone.
The missing fit
After testing more documents, I realised paper-llama still did not fully match my needs. The main issue was model specialisation.
The workflow used the same kind of medium-sized model for several different tasks:
- optional OCR;
- tagging;
- title generation;
- metadata extraction.
That can work if you have enough resources. I wanted something that could run comfortably on my MacBook without going back to the 4090-powered robot-arm contraption.
Small general models such as Gemma 4 e4b were good enough for tagging and title generation, but not great for OCR. Difficult documents, especially phone pictures of receipts, were still a problem.
Then I tested GLM-OCR.
It is tiny compared with the general models I had tried, but very specialised. For OCR, it did an excellent job, even on awkward inputs like poorly framed receipt photos.
That was the missing piece.
Enter DocuForge
DocuForge is my own take on the workflow, inspired by paper-llama but built around my constraints. The core design choice is simple: OCR and tagging should not use the same model.
They are different jobs:
- OCR needs a specialised vision/text extraction model.
- Tagging and title generation need a language model that can reason over the extracted text.
When paperless-ngx receives a new document, it sends a webhook to DocuForge. From there, the pipeline:
- fetches the document;
- runs OCR using GLM-OCR;
- updates or prepares the extracted text;
- queues the document for later metadata processing;
- runs tagging and title generation during a configurable time window.
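The steps above boil down to "OCR now, tag later". A simplified sketch of that shape; the `Document` structure and function names are assumptions, not DocuForge's actual API, and the OCR call is stubbed out:

```python
from dataclasses import dataclass

# Illustrative pipeline shape, not DocuForge's real code.

@dataclass
class Document:
    doc_id: int
    text: str = ""
    tagged: bool = False

tagging_queue: list[Document] = []

def run_ocr(doc):
    """Placeholder for the GLM-OCR call; fills in extracted text."""
    doc.text = f"(extracted text for document {doc.doc_id})"
    return doc

def handle_webhook(doc_id):
    """Fetch, OCR, and queue a document for later metadata processing."""
    doc = Document(doc_id)      # in reality: fetched via the paperless-ngx API
    run_ocr(doc)                # lightweight, runs immediately
    tagging_queue.append(doc)   # heavier tagging waits for the window
    return doc
```

The key point is that nothing in `handle_webhook` touches the language model; the queue decouples cheap ingestion from expensive enrichment.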
The OCR step is lightweight enough that it can run while I am using the Mac. Tagging is different: Qwen 3.5 9B can use close to 10 GB of RAM even with a modest context size, and my MacBook has 16 GB. That is workable, but not something I want fighting for memory during the day.
So DocuForge has a configurable tagging window: a time range during which heavier LLM processing is allowed to run. That gives me a useful balance:
- documents can be ingested immediately;
- OCR can happen quickly;
- heavier tagging waits until a quiet period;
- the MacBook stays usable during the day;
- the 4090 machine can stay off.
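The window check itself is simple. A minimal sketch, assuming an overnight window; the bounds are configuration values, and the hypothetical helper also handles windows that cross midnight:

```python
from datetime import datetime, time

# Assumed configuration values, not DocuForge's real defaults.
WINDOW_START = time(1, 0)   # 01:00
WINDOW_END = time(6, 0)     # 06:00

def in_tagging_window(now, start=WINDOW_START, end=WINDOW_END):
    """True when heavier LLM processing is allowed to run.

    Handles windows that cross midnight, e.g. 22:00-06:00.
    """
    t = now.time()
    if start <= end:
        return start <= t <= end
    return t >= start or t <= end
```

The queue drains only while this returns True; anything left over simply waits for the next window.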
If the Mac is unreachable, DocuForge skips processing and tries again the next day. These are personal documents; nothing explodes if a receipt is tagged tomorrow instead of today. Treating AI enrichment as background work simplified the design.
Why not just use the 4090?
The 4090 is great for local LLM experiments. It is less great as an always-on document-processing appliance:
- Electricity cost: running a large desktop just to classify documents is not ideal.
- Heat: the office is already warm enough.
- Operational complexity: the SwitchBot solution was funny, but not exactly elegant infrastructure.
I still like the absurdity of the robot arm. It worked, and that counts for something. But long term, I prefer the boring solution: smaller models, better task separation, and scheduled background processing.
First Python project
DocuForge is also my first proper Python project.
Most of my professional experience is in TypeScript, NestJS, PostgreSQL, and backend architecture, so Python is not my default environment. This project has been a good excuse to work with a different ecosystem on a concrete problem.
It is not open source yet, but that is the plan once it is stable enough. I want the first public version to be useful, documented, and not just a pile of scripts that only works on my machine.
What I like about this approach
The final architecture is much simpler than where I started. The first version involved:
- paperless-ngx webhooks;
- a custom orchestrator;
- Home Assistant;
- an ESP32 Bluetooth proxy;
- a SwitchBot pressing a physical power button;
- a Windows machine exposing PowerShell-managed endpoints;
- Ollama running on a 4090 desktop.
DocuForge replaces most of that with:
- a webhook;
- a queue;
- a lightweight OCR model;
- a separate tagging model;
- a configurable processing window.
That is still a real system, but it is a much better fit for the problem. The first version was useful because it proved the workflow; once the workflow was proven, the architecture needed to shrink.
What is next
The next steps for DocuForge are:
- finish stabilising the pipeline;
- improve error handling and retries;
- refine prompts for different document types;
- make configuration cleaner;
- write proper documentation;
- open source it.
The goal is not to build a huge AI platform. It is to make paperless-ngx a little smarter while keeping the system private, lightweight, and understandable. Ideally, without needing a robot arm to press a power button.
Although I will admit: I kept it.