POST /v1/extract
Extract Endpoint
curl --request POST \
  --url https://api.extract.page/v1/extract \
  --header 'Content-Type: application/json' \
  --header 'X-API-KEY: <api-key>' \
  --data '
{
  "url": "<string>",
  "extract_text": true,
  "extract_images": true,
  "ocr": "auto"
}
'

Example response:
{
  "chunks": [
    {
      "page_no": 123,
      "page_content": "",
      "bbox": [
        123
      ],
      "chunk_type": "text",
      "confidence": 123,
      "image_url": "<string>",
      "image_b64": "<string>",
      "image_mime": "<string>",
      "image_width": 123,
      "image_height": 123,
      "cells": [
        {
          "row": 123,
          "col": 123,
          "text": "",
          "row_span": 1,
          "col_span": 1,
          "bbox": [
            123
          ],
          "confidence": 123
        }
      ],
      "n_rows": 123,
      "n_cols": 123
    }
  ]
}

Authorizations

X-API-KEY: string, header, required

Body

application/json

Input to the extraction pipeline.

The hosted API accepts exactly four parameters: url, extract_text, extract_images, and ocr. Server-side guardrails (page limit, maximum size, OCR provider, and image/OCR thresholds) are intentionally not user-configurable.

url: string | null
extract_text: boolean, default: true
extract_images: boolean, default: true
ocr: enum<string>, default: auto. Available options: auto, never, force
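
As a rough illustration of the request body described above, the following Python sketch builds the four-parameter payload and wraps it in a POST request using only the standard library. The endpoint URL, header names, defaults, and ocr options come from this page; the function names (build_payload, build_request, extract) and the 60-second timeout are assumptions made for the example, not part of the API.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.extract.page/v1/extract"
# The three documented values for the "ocr" parameter.
OCR_MODES = ("auto", "never", "force")


def build_payload(url, extract_text=True, extract_images=True, ocr="auto"):
    """Return the JSON body, using the documented defaults."""
    if ocr not in OCR_MODES:
        raise ValueError(f"ocr must be one of {OCR_MODES}, got {ocr!r}")
    return {
        "url": url,
        "extract_text": extract_text,
        "extract_images": extract_images,
        "ocr": ocr,
    }


def build_request(api_key, payload):
    """Wrap the payload in a POST request carrying the X-API-KEY header."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


def extract(api_key, url, **options):
    """Send the request and return the decoded JSON response.

    The 60-second timeout is an arbitrary choice for this sketch.
    """
    request = build_request(api_key, build_payload(url, **options))
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

A call such as extract("<api-key>", "https://example.com/report.pdf", ocr="never") would then return the parsed response dictionary, assuming the request succeeds.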

Response

Successful Response

chunks: Chunk · object[], required
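
One possible way to consume the chunks array is sketched below in Python, using the field names from the example response on this page (page_no, page_content, cells, row, col, text, n_rows, n_cols). The helper names are hypothetical. The sketch also assumes that row and col are zero-based indices and it ignores row_span and col_span; neither detail is stated on this page, so check both against real responses.

```python
from collections import defaultdict


def text_by_page(chunks):
    """Join the non-empty page_content of each chunk, grouped by page_no."""
    pages = defaultdict(list)
    for chunk in chunks:
        content = chunk.get("page_content") or ""
        if content:
            pages[chunk["page_no"]].append(content)
    return {page: "\n".join(parts) for page, parts in sorted(pages.items())}


def cells_to_grid(chunk):
    """Lay a chunk's cells out as an n_rows x n_cols list of strings.

    Assumption: row and col are zero-based. Spans are ignored, so a
    merged cell only fills its top-left position.
    """
    n_rows = chunk.get("n_rows") or 0
    n_cols = chunk.get("n_cols") or 0
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in chunk.get("cells") or []:
        row, col = cell["row"], cell["col"]
        # Skip cells that fall outside the declared table size.
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] = cell.get("text", "")
    return grid
```

Chunks without cells produce an empty grid, so cells_to_grid can be called on every chunk without first checking chunk_type.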