Extract Endpoint

Example request

curl --request POST \
  --url https://api.extract.page/v1/extract \
  --header 'Content-Type: application/json' \
  --header 'X-API-KEY: <api-key>' \
  --data '
{
  "url": "<string>",
  "extract_text": true,
  "extract_images": true,
  "ocr": "auto"
}
'

Example response

{
  "chunks": [
    {
      "page_no": 123,
      "page_content": "",
      "bbox": [
        123
      ],
      "chunk_type": "text",
      "confidence": 123,
      "image_url": "<string>",
      "image_b64": "<string>",
      "image_mime": "<string>",
      "image_width": 123,
      "image_height": 123,
      "cells": [
        {
          "row": 123,
          "col": 123,
          "text": "",
          "row_span": 1,
          "col_span": 1,
          "bbox": [
            123
          ],
          "confidence": 123
        }
      ],
      "n_rows": 123,
      "n_cols": 123
    }
  ]
}

POST /v1/extract

Authorizations

X-API-KEY · string · header · required

Body

application/json

Input to the extraction pipeline.

The hosted API accepts exactly four parameters: url, extract_text, extract_images, and ocr. Server-side guardrails (page limit, maximum size, OCR provider, and image/OCR thresholds) are intentionally not user-configurable.

url · string | null

extract_text · boolean · default: true

extract_images · boolean · default: true

ocr · enum<string> · default: auto

Available options: auto, never, force
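
As a rough illustration of the request body and auth header described above, the following Python sketch builds the POST request with only the standard library. The helper name `build_extract_request` and the client-side check of `ocr` are assumptions made for this example, not part of the API; the endpoint URL, header name, and the four body fields are taken from the curl example.

```python
import json
import urllib.request

# Endpoint URL and allowed ocr values as listed in this reference.
EXTRACT_URL = "https://api.extract.page/v1/extract"
OCR_MODES = ("auto", "never", "force")


def build_extract_request(api_key, url, extract_text=True,
                          extract_images=True, ocr="auto"):
    """Build (but do not send) a POST request for the extract endpoint.

    Hypothetical helper: it only mirrors the four documented body fields
    and rejects an unknown ocr value before any network call is made.
    """
    if ocr not in OCR_MODES:
        raise ValueError(f"ocr must be one of {OCR_MODES}, got {ocr!r}")
    payload = {
        "url": url,  # documented as string | null, so None is passed through
        "extract_text": extract_text,
        "extract_images": extract_images,
        "ocr": ocr,
    }
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # urllib normalizes header-name casing; HTTP header names
            # are case-insensitive, so the server sees the same header.
            "X-API-KEY": api_key,
        },
        method="POST",
    )
```

To actually send the request, pass the returned object to `urllib.request.urlopen` and decode the response body as JSON. Defaults are spelled out in the payload rather than omitted, which keeps the request self-describing at the cost of a few extra bytes.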

Response

Successful Response

chunks · Chunk · object[] · required
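
A minimal sketch of how a client might consume the `chunks` array, based only on the field names visible in the example response. The helper names are hypothetical, and two things are assumptions rather than documented behavior: that `row` and `col` are zero-based, and that a chunk carrying `cells`, `n_rows`, and `n_cols` represents a table.

```python
from collections import defaultdict


def text_by_page(response):
    """Collect non-empty page_content strings per page_no, in response order."""
    pages = defaultdict(list)
    for chunk in response.get("chunks", []):
        content = chunk.get("page_content")
        if content:
            pages[chunk.get("page_no")].append(content)
    return dict(pages)


def cells_to_grid(chunk):
    """Lay a chunk's cells out as an n_rows x n_cols grid of strings.

    Assumes zero-based row/col indices (not confirmed by the reference).
    Cells with row_span or col_span > 1 fill only their anchor slot, and
    cells whose indices fall outside the grid are skipped.
    """
    n_rows = chunk.get("n_rows") or 0
    n_cols = chunk.get("n_cols") or 0
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in chunk.get("cells") or []:
        row, col = cell["row"], cell["col"]
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] = cell.get("text", "")
    return grid
```

If the API turns out to use one-based indices, subtracting 1 from `row` and `col` before the bounds check would be the only change needed.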
