Retab, a Paris- and San Francisco-based startup focused on document automation, has come out of stealth with $3.5M (nearly €3M) in pre-seed funding and the launch of its platform.

The round was backed by early-stage investors including VentureFriends, Kima Ventures, and K5 Global. Individual backers include Eric Schmidt (via StemAI); Olivier Pomel, CEO of Datadog; and Florian Douetteau, CEO of Dataiku.

Founded by engineers aiming to improve how large language models process documents, Retab is addressing common challenges developers face when applying AI to paperwork.

The funding will be used to develop the platform, expand the community, and scale infrastructure to support growing demand from vertical AI startups and internal innovation teams.

Next-gen document automation

Retab is an AI agent that builds document extraction pipelines through a developer-focused platform and Software Development Kit (SDK).

Co-founder and CEO Louis de Benoist says, “People keep building demos that look like magic, but break the moment you put them into production. We lived that pain ourselves. Wiring up fragile pipelines just to extract a few fields from a PDF. We built Retab because it’s the developer-first platform we always wished we had.”

Founded by engineers Louis de Benoist, Sacha Ichbiah, and Victor Plaisance, the company helps users convert unstructured documents, such as PDFs and handwritten scans, into structured data.

Users describe their data needs, upload documents, and Retab handles extraction logic, dataset labelling, evaluations, and model selection.

The platform routes tasks to the most suitable model and updates automatically as better models become available. It supports workflows involving contracts, invoices, and compliance documents, with the aim of reducing manual processing.

Retab’s orchestration layer connects with large language models from providers like OpenAI, Google, and Anthropic, serving as an interface between developers and model infrastructure.

The co-founders developed the platform based on their experience creating internal automation tools for logistics workflows. Their early focus on orchestration over output led to the foundation of Retab, which multiple companies now use to extract structured data from complex document inputs.

An operating system for document extraction

Retab’s platform operates through a system of checks and balances designed to ensure consistent output. Its AI agent uses self-optimising schemas that test and refine instructions based on user documents before deployment. 

The platform supports model-agnostic routing, automatically benchmarking models and assigning tasks based on performance goals such as cost, speed, or accuracy.
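As a rough illustration of what goal-based, model-agnostic routing can look like, the sketch below assumes each candidate model has already been benchmarked on a user's documents for accuracy, cost, and latency, and simply picks the best model for the stated goal. The `Benchmark` class, the `route` function, the model names, and all numbers are hypothetical; this is not Retab's API or its actual routing logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    """Hypothetical benchmark result for one model on a user's documents."""
    model: str
    accuracy: float      # fraction of fields extracted correctly (0..1)
    cost_per_doc: float  # USD per document (illustrative)
    latency_s: float     # seconds per document (illustrative)


def route(benchmarks: list[Benchmark], goal: str, min_accuracy: float = 0.0) -> str:
    """Pick a model name for a given goal: "accuracy", "cost" or "speed".

    Models scoring below min_accuracy are excluded before optimising the goal.
    """
    candidates = [b for b in benchmarks if b.accuracy >= min_accuracy]
    if not candidates:
        raise ValueError("no model meets the accuracy floor")
    if goal == "accuracy":
        best = max(candidates, key=lambda b: b.accuracy)
    elif goal == "cost":
        best = min(candidates, key=lambda b: b.cost_per_doc)
    elif goal == "speed":
        best = min(candidates, key=lambda b: b.latency_s)
    else:
        raise ValueError(f"unknown goal: {goal!r}")
    return best.model


# Made-up benchmark numbers for three placeholder models.
results = [
    Benchmark("model-a", accuracy=0.97, cost_per_doc=0.020, latency_s=4.0),
    Benchmark("model-b", accuracy=0.93, cost_per_doc=0.004, latency_s=1.5),
    Benchmark("model-c", accuracy=0.88, cost_per_doc=0.001, latency_s=0.8),
]
print(route(results, "accuracy"))                 # model-a
print(route(results, "cost", min_accuracy=0.90))  # model-b
print(route(results, "speed"))                    # model-c
```

The accuracy floor captures a common trade-off: optimising purely for cost or speed would otherwise always pick the cheapest or fastest model, however unreliable it is.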

Retab also applies guided reasoning, which requires models to follow step-by-step logic, and a k-LLM consensus mechanism, which compares outputs from multiple models to assess uncertainty. Together, these are meant to deliver reliable structured data across varied document types and use cases.

“Retab is the OS for reliably extracting structured data,” says de Benoist. “It wraps the best models in a layer of logic that actually makes them usable with error handling and structured outputs. That’s what devs need if they want to build production apps, not just prototypes.”

Customers in sectors like logistics, finance, and healthcare are using Retab to automate complex document workflows with minimal setup. 

For instance, a trucking company used Retab to meet high accuracy requirements while reducing costs. A financial firm now extracts key insights from lengthy reports in a fraction of the time. Other use cases include claims processing, medical records, and identity verification.

Florian Douetteau, an investor in Retab and co-founder and CEO of Dataiku, says, “The AI-fication of the economy depends on the capability to convert operations based on millions of documents into verified, structured data that autonomous systems can utilise. On a large scale, this process hinges on quality control, cost efficiency, and rapid implementation. The team at Retab understands this thoroughly and is uniquely positioned to solve it for the thousands of AI-first companies that are emerging.”

What’s next?

Retab is expanding its platform to handle data extraction from websites and is introducing integrations with automation tools, including n8n, Zapier, and Dify.

The company aims to become a middleware layer that connects unstructured data with AI agents, with applications spanning documents such as loan files, contracts, and customs records.

With a team of 10 and a growing developer community, Retab is positioning its platform as part of the AI infrastructure stack, designed to help users build and scale data-driven workflows.