How to Extract Tables from Images and PDFs with Python in 5 Minutes

In this tutorial, you will learn how to extract tables from images and PDFs in about five minutes using Python and the Eden AI OCR Table API. Eden AI provides a developer-friendly API for extracting tables from your files.

What is OCR Table API?

Table OCR lets you extract tabular data from PDFs and images in one shot. It combines Optical Character Recognition (OCR) with machine learning models to detect and extract whole tables from images for later analysis, which makes it a valuable tool for businesses that handle large volumes of documents. Other document types such as receipts, invoices, resumes, and IDs often contain similar tabular layouts and can also benefit from Table OCR.

Getting Started with OCR and Table Extraction API

The first step to getting started with OCR Table is to install Python's requests package (for example with pip install requests), which lets you call the Eden AI API.

Next, import Python's json module to read and print the result of the API request. It is part of the standard library, so there is nothing to install.
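
As a quick sanity check before going further, the short sketch below reports whether the requests package can be imported in the current environment. The helper name missing_packages is hypothetical and exists only for this example.

```python
import importlib.util
import json  # standard library: always available, nothing to install


def missing_packages(names=("requests",)):
    """Return the names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("All set: requests is available, and json ships with Python.")
```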

How to extract tables from a PDF or an image with Python

You are now ready to send your file to the Eden AI OCR Table API. It accepts .pdf, .jpg, .jpeg and .png files, as well as documents in many languages.
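
It can help to check the file extension locally before uploading anything. The sketch below does that using only the four formats listed above; the helper name is_supported is hypothetical.

```python
from pathlib import Path

# The four formats listed in the text above.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}


def is_supported(file_path):
    """Return True if the file extension is one of the accepted formats."""
    return Path(file_path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported("invoice.PDF"))
print(is_supported("notes.docx"))
```

The check is case-insensitive and looks only at the extension, not at the actual file contents.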

1. Get an OCR Table API Key on Eden AI

To use OCR Table, you first need to create a free account on Eden AI. You can then get your API key directly from the homepage, along with the free credits offered by Eden AI.
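
Rather than pasting the key into a notebook, one option is to read it from an environment variable. In the sketch below, the variable name EDENAI_API_KEY and the helper build_headers are hypothetical choices for this example, and the Bearer token header is an assumption to confirm in the Eden AI documentation.

```python
import os


def build_headers(api_key=None):
    """Build request headers carrying the API key as a Bearer token."""
    # EDENAI_API_KEY is a hypothetical variable name chosen for this example.
    api_key = api_key or os.environ.get("EDENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set EDENAI_API_KEY or pass api_key explicitly.")
    # Assumed authentication scheme: "Authorization: Bearer <key>".
    return {"Authorization": f"Bearer {api_key}"}
```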

Get your API key for FREE

2. Let’s Extract Tables with OCR

Now that you have imported the packages and have your API key, you can extract the tables in your file. With Eden AI, you can choose among a wide range of engines for OCR Table. The available OCR Table providers are listed in our documentation.

Here is the Python script to write in your notebook:
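
A minimal sketch of such a script follows. The endpoint URL, the form field names (providers, file), and the provider names amazon and microsoft are assumptions to check against the current Eden AI documentation. The optional post parameter is a hypothetical hook added here so the function can be tried without network access.

```python
# Assumed endpoint; check the Eden AI documentation for the current URL.
OCR_TABLES_URL = "https://api.edenai.run/v2/ocr/ocr_tables_async"


def launch_table_job(file_path, api_key, providers=("amazon", "microsoft"), post=None):
    """Upload a file and start an asynchronous table-extraction job."""
    if post is None:
        import requests  # third-party: pip install requests
        post = requests.post
    headers = {"Authorization": f"Bearer {api_key}"}
    data = {"providers": ",".join(providers)}  # assumed form field name
    # Open in binary mode so PDFs and images are uploaded unchanged.
    with open(file_path, "rb") as handle:
        response = post(
            OCR_TABLES_URL,
            headers=headers,
            data=data,
            files={"file": handle},  # assumed multipart field name
            timeout=60,
        )
    response.raise_for_status()  # surface HTTP errors early
    return response.json()
```

A call such as launch_table_job("my_table.pdf", "YOUR_API_KEY") would then return the parsed JSON response as a dictionary.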

In this example, we call two different OCR Table engines. The Eden AI API then returns the results from both providers in its JSON response.

The Eden AI OCR Table API is asynchronous, which means the response to your request contains a job ID rather than the result itself:
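
As a sketch, the ID can be pulled out of the parsed response like this. The field name public_id is an assumption to verify against the documentation, and the value job-123 is a placeholder, not a real ID.

```python
def extract_job_id(launch_response):
    """Return the job ID from the launch response, or raise if it is missing."""
    job_id = launch_response.get("public_id")  # assumed field name
    if not job_id:
        raise ValueError(f"No job ID in response: {launch_response!r}")
    return job_id


# Placeholder response for illustration only, not real API output.
print(extract_job_id({"public_id": "job-123"}))
```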

You then need to send a GET request to check the status of the job (processing, finished, or failed):
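
A minimal sketch of that status check is shown below. It assumes the job is addressed by appending its ID to the launch URL, which should be confirmed in the documentation. The get parameter is a hypothetical hook for testing without network access.

```python
# Assumed endpoint; check the Eden AI documentation for the current URL.
OCR_TABLES_URL = "https://api.edenai.run/v2/ocr/ocr_tables_async"


def get_job(job_id, api_key, get=None):
    """Fetch the current state of a table-extraction job as a dict."""
    if get is None:
        import requests  # third-party: pip install requests
        get = requests.get
    response = get(
        f"{OCR_TABLES_URL}/{job_id}",  # assumed URL layout
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```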

You will first get this response:

Once the job is done (status: finished), you can display the result by printing it:
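
Since the job may take a few seconds, one common approach is to poll until the status is no longer processing, then print the JSON. The sketch below assumes a status field with the value processing while the job runs. The names wait_for_result and pretty are hypothetical, and fetch_job stands for any function that returns the job as a dictionary.

```python
import json
import time


def wait_for_result(fetch_job, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_job() until the job leaves the 'processing' state."""
    for _ in range(max_attempts):
        job = fetch_job()
        if job.get("status") != "processing":  # assumed status value
            return job
        sleep(interval)  # wait before asking again
    raise TimeoutError("Job still processing after the allotted attempts.")


def pretty(job):
    """Format the job dictionary as indented JSON for printing."""
    return json.dumps(job, indent=2, ensure_ascii=False)
```

Typical usage would be print(pretty(wait_for_result(my_fetch_function))). Capping the number of attempts avoids looping forever if a job never completes.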

Here is an example of a result for an OCR Table task:
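
The exact payload depends on the providers you call. As an illustration only, the sketch below uses a hand-written payload (not real API output) whose shape, a table holding rows that hold cells with a text field, is an assumption to verify against the documentation. It shows how such a result could be flattened into plain Python lists for later analysis.

```python
def table_to_rows(table):
    """Convert one table dict into a list of rows, each a list of cell texts."""
    return [
        [cell.get("text", "") for cell in row.get("cells", [])]
        for row in table.get("rows", [])
    ]


# Hand-written sample for illustration, NOT real API output.
sample_table = {
    "rows": [
        {"cells": [{"text": "Item"}, {"text": "Price"}]},
        {"cells": [{"text": "Coffee"}, {"text": "3.50"}]},
    ]
}

for row in table_to_rows(sample_table):
    print(row)
```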

Benefits of using OCR Table API with Eden AI

Using Table Extraction with Eden AI API is quick and easy.

Save time and cost

We offer a unified API for all providers: it is simple and standard to use, lets you switch quickly between providers, and still gives access to each provider's specific features.

Easy to integrate

The JSON output format is the same for all providers thanks to Eden AI's standardisation work. The response elements are also standardised using Eden AI's matching algorithms.
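
One practical consequence is that a single piece of parsing code can handle every provider. The sketch below counts the tables returned per provider. It assumes the results are keyed by provider name, each with a pages list containing tables; these field names are assumptions, and the sample data is hand-written, not real API output.

```python
def count_tables_by_provider(results):
    """Map each provider name to the number of tables it returned."""
    counts = {}
    for provider, payload in results.items():
        pages = payload.get("pages", [])  # assumed field name
        counts[provider] = sum(len(page.get("tables", [])) for page in pages)
    return counts


# Hand-written sample for illustration, NOT real API output.
sample_results = {
    "amazon": {"pages": [{"tables": [{}, {}]}]},
    "microsoft": {"pages": [{"tables": [{}]}]},
}
print(count_tables_by_provider(sample_results))
```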

Customization

With Eden AI, you can integrate a third-party platform: we can quickly develop connectors. To go further and customize your OCR Table request with specific parameters, check out our documentation.

Create your Account on Eden AI