# Upstage Document Parse
Extract structured content from documents using Upstage's Document Parse API.
## Supported Formats
PDF (up to 1000 pages with async), PNG, JPG, JPEG, TIFF, BMP, GIF, WEBP, DOCX, PPTX, XLSX, HWP
## Installation
```bash
openclaw install upstage-document-parse
```
## API Key Setup
1. Get your API key from [Upstage Console](https://console.upstage.ai)
2. Configure the API key:

```bash
openclaw config set skills.entries.upstage-document-parse.apiKey "your-api-key"
```
Or add to `~/.openclaw/openclaw.json`:
```json5
{
  "skills": {
    "entries": {
      "upstage-document-parse": {
        "apiKey": "your-api-key"
      }
    }
  }
}
```
## Usage Examples
Just ask the agent to parse your document:
```
"Parse this PDF: ~/Documents/report.pdf"
"Parse: ~/Documents/report.jpg"
```
---
## Sync API (Small Documents)
Use the sync API for small documents (fewer than 20 pages recommended).
### Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | string | required | Use `document-parse` (latest) or `document-parse-nightly` |
| `document` | file | required | Document file to parse |
| `mode` | string | `standard` | `standard` (text-focused), `enhanced` (complex tables/images), `auto` |
| `ocr` | string | `auto` | `auto` (images only) or `force` (always OCR) |
| `output_formats` | string | `['html']` | `text`, `html`, `markdown` (array format) |
| `coordinates` | boolean | `true` | Include bounding box coordinates |
| `base64_encoding` | string | `[]` | Elements to base64: `["table"]`, `["figure"]`, etc. |
| `chart_recognition` | boolean | `true` | Convert charts to tables (Beta) |
| `merge_multipage_tables` | boolean | `false` | Merge tables across pages (Beta, max 20 pages if true) |
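
The list-valued parameters above are sent as strings in array form, as in `output_formats=['markdown']`. The following is a minimal sketch of how the form fields could be assembled in Python; `format_list` and `build_form_fields` are hypothetical helper names, not part of any Upstage SDK, and the lowercase `true`/`false` rendering of booleans is an assumption based on the `merge_multipage_tables=true` form used in this document.

```python
def format_list(values):
    """Render a list of strings in the ['a', 'b'] form used by the curl examples."""
    return "[" + ", ".join(f"'{v}'" for v in values) + "]"


def build_form_fields(model="document-parse", **options):
    """Return a dict of form fields suitable for requests.post(..., data=...).

    Hypothetical helper: option names are passed through unchanged, so they
    should match the parameter table (mode, ocr, output_formats, ...).
    """
    fields = {"model": model}
    for name, value in options.items():
        if value is None:
            continue  # omit the field so the API default applies
        if isinstance(value, bool):
            # Assumption: booleans are sent as lowercase strings.
            fields[name] = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            fields[name] = format_list(value)
        else:
            fields[name] = str(value)
    return fields


fields = build_form_fields(
    mode="enhanced",
    output_formats=["html", "markdown"],
    base64_encoding=["table"],
)
```

The resulting dict can be passed as `data=fields` alongside `files={"document": f}` in a `requests.post` call.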
### Basic Parsing
```bash
curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@/path/to/file.pdf" \
  -F "model=document-parse"
```
### Extract Markdown
```bash
curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@/path/to/file.pdf" \
  -F "model=document-parse" \
  -F "output_formats=['markdown']"
```
### Enhanced Mode for Complex Documents
```bash
curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@/path/to/file.pdf" \
  -F "model=document-parse" \
  -F "mode=enhanced" \
  -F "output_formats=['html', 'markdown']"
```
### Force OCR for Scanned Documents
```bash
curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@/path/to/file.pdf" \
  -F "model=document-parse" \
  -F "ocr=force"
```
### Extract Table Images as Base64
```bash
curl -X POST "https://api.upstage.ai/v1/document-digitization" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@/path/to/file.pdf" \
  -F "model=document-parse" \
  -F "base64_encoding=['table']"
```
---
## Response Structure
```json
{
  "api": "2.0",
  "model": "document-parse-251217",
  "content": {
    "html": "<h1>...</h1>",
    "markdown": "# ...",
    "text": "..."
  },
  "elements": [
    {
      "id": 0,
      "category": "heading1",
      "content": { "html": "...", "markdown": "...", "text": "..." },
      "page": 1,
      "coordinates": [{"x": 0.06, "y": 0.05}, ...]
    }
  ],
  "usage": { "pages": 1 }
}
```
### Element Categories
`paragraph`, `heading1`, `heading2`, `heading3`, `list`, `table`, `figure`, `chart`, `equation`, `caption`, `header`, `footer`, `index`, `footnote`
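
Because every element carries a `category`, a parsed response can be filtered after the fact, for example to pull out only the tables. The sketch below assumes the response has already been decoded into a Python dict shaped like the structure above; `group_by_category`, `category_text`, and the `sample` data are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def group_by_category(response):
    """Map each element category to the list of elements in that category."""
    grouped = defaultdict(list)
    for element in response.get("elements", []):
        grouped[element["category"]].append(element)
    return dict(grouped)


def category_text(response, category, fmt="markdown"):
    """Return the content of every element of one category in the given format."""
    return [
        element["content"].get(fmt, "")
        for element in response.get("elements", [])
        if element["category"] == category
    ]


# Hypothetical, trimmed-down response used only to exercise the helpers.
sample = {
    "elements": [
        {"id": 0, "category": "heading1", "page": 1,
         "content": {"html": "<h1>Report</h1>", "markdown": "# Report", "text": "Report"}},
        {"id": 1, "category": "table", "page": 1,
         "content": {"html": "<table></table>", "markdown": "| a |", "text": "a"}},
        {"id": 2, "category": "paragraph", "page": 2,
         "content": {"html": "<p>Body</p>", "markdown": "Body", "text": "Body"}},
    ]
}

tables_html = category_text(sample, "table", fmt="html")
headings = category_text(sample, "heading1")
```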
---
## Async API (Large Documents)
Use the async API for documents of up to 1000 pages. Documents are split into batches of up to 10 pages for processing.
### Submit Request
```bash
curl -X POST "https://api.upstage.ai/v1/document-digitization/async" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY" \
  -F "document=@/path/to/file.pdf" \
  -F "model=document-parse" \
  -F "output_formats=['markdown']"
```

Response:

```json
{"request_id": "uuid-here"}
```
### Check Status & Get Results
```bash
curl "https://api.upstage.ai/v1/document-digitization/requests/{request_id}" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY"
```

The response includes a `download_url` for each batch. Results are retained for 30 days.
### List All Requests
```bash
curl "https://api.upstage.ai/v1/document-digitization/requests" \
  -H "Authorization: Bearer $UPSTAGE_API_KEY"
```
### Status Values
- `submitted`: Request received
- `started`: Processing in progress
- `completed`: Ready for download
- `failed`: Error occurred (check `failure_message`)
### Notes
- Results stored for 30 days
- Download URLs expire after 15 minutes (re-fetch status to get new URLs)
- Documents split into batches of up to 10 pages
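
Since download URLs are short-lived, it can help to read them from a freshly fetched status payload right before downloading. The following sketch collects the per-batch URLs from a decoded status response. It assumes the batches are listed under a `batches` key, which this document does not confirm; only `status`, `failure_message`, and `download_url` are named here. `batch_download_urls` is a hypothetical helper.

```python
def batch_download_urls(status_payload):
    """Return the download URLs from a status payload, or [] if not ready yet.

    Raises RuntimeError when the request has failed.
    """
    status = status_payload.get("status")
    if status == "failed":
        raise RuntimeError(status_payload.get("failure_message", "unknown failure"))
    if status != "completed":
        return []  # still `submitted` or `started`
    # Assumption: batches are listed under a "batches" key.
    return [
        batch["download_url"]
        for batch in status_payload.get("batches", [])
        if batch.get("download_url")
    ]


# Hypothetical payload used only to show the expected shape.
example = {
    "status": "completed",
    "batches": [
        {"id": 0, "download_url": "https://example.invalid/batch-0"},
        {"id": 1, "download_url": "https://example.invalid/batch-1"},
    ],
}
urls = batch_download_urls(example)
```

If a download fails because a URL has expired, re-fetching the status and calling the helper again yields new URLs.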
---
## Python Usage
```python
import time

import requests

api_key = "up_xxx"  # replace with your Upstage API key
headers = {"Authorization": f"Bearer {api_key}"}

# Sync: parse a small document and print the Markdown output
with open("doc.pdf", "rb") as f:
    response = requests.post(
        "https://api.upstage.ai/v1/document-digitization",
        headers=headers,
        files={"document": f},
        data={"model": "document-parse", "output_formats": "['markdown']"},
    )
response.raise_for_status()
print(response.json()["content"]["markdown"])

# Async: submit a large document and keep the request ID
with open("large.pdf", "rb") as f:
    r = requests.post(
        "https://api.upstage.ai/v1/document-digitization/async",
        headers=headers,
        files={"document": f},
        data={"model": "document-parse"},
    )
r.raise_for_status()
request_id = r.json()["request_id"]

# Poll until the request reaches a terminal status
while True:
    status = requests.get(
        f"https://api.upstage.ai/v1/document-digitization/requests/{request_id}",
        headers=headers,
    ).json()
    if status["status"] == "completed":
        break
    if status["status"] == "failed":
        # Stop instead of polling forever when processing fails
        raise RuntimeError(status.get("failure_message", "Document parse request failed"))
    time.sleep(5)
```
## LangChain Integration
```python
from langchain_upstage import UpstageDocumentParseLoader

loader = UpstageDocumentParseLoader(
    file_path="document.pdf",
    output_format="markdown",
    ocr="auto",
)
docs = loader.load()
```
---
## Environment Variable (Alternative)
You can also set the API key as an environment variable:
```bash
export UPSTAGE_API_KEY="your-api-key"
```
---
## Tips
- Use `mode=enhanced` for complex tables, charts, and images
- Use `mode=auto` to let the API decide per page
- Use the async API for documents > 20 pages
- Use `ocr=force` for scanned PDFs or images
- `merge_multipage_tables=true` combines split tables (max 20 pages with enhanced mode)
- Results from the async API are available for 30 days
- Server-side timeout: 5 minutes per request (sync API)
- Standard documents process in ~3 seconds
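
The page-count guidance above can be turned into a small routing helper. This is a sketch under the thresholds stated in this document (async for more than 20 pages, 1000 pages maximum); `choose_endpoint` and the constant names are hypothetical, and the page count is assumed to be known by the caller.

```python
BASE_URL = "https://api.upstage.ai/v1/document-digitization"
SYNC_MAX_PAGES = 20      # above this, the tips recommend the async API
ASYNC_MAX_PAGES = 1000   # stated upper limit for async PDF parsing


def choose_endpoint(page_count):
    """Pick the sync or async endpoint URL for a document of `page_count` pages."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if page_count > ASYNC_MAX_PAGES:
        raise ValueError(f"documents over {ASYNC_MAX_PAGES} pages are not supported")
    if page_count > SYNC_MAX_PAGES:
        return BASE_URL + "/async"
    return BASE_URL


short_doc_url = choose_endpoint(5)
long_doc_url = choose_endpoint(250)
```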