Introduction
# DocStrange by Nanonets
Document extraction API — convert PDFs, images, and documents to markdown, JSON, or CSV with per-field confidence scoring.
> **Get your API key:** https://docstrange.nanonets.com/app
## Quick Start
```bash curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \ -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \ -F "[email protected]" \ -F "output_format=markdown" ```
Response: ```json { "success": true, "record_id": "550e8400-e29b-41d4-a716-446655440000", "status": "completed", "result": { "markdown": { "content": "# Invoice\n\n**Invoice Number:** INV-2024-001..." } } } ```
## Setup
### 1. Get Your API Key
```bash # Visit the dashboard https://docstrange.nanonets.com/app ```
Save your API key: ```bash export DOCSTRANGE_API_KEY="your_api_key_here" ```
### 2. OpenClaw Configuration (Optional)
**Recommended: Use environment variables** (most secure): ```json5 { skills: { entries: { "docstrange": { enabled: true, // API key loaded from environment variable DOCSTRANGE_API_KEY }, }, }, } ```
**Alternative: Store in config file** (use with caution): ```json5 { skills: { entries: { "docstrange": { enabled: true, env: { DOCSTRANGE_API_KEY: "your_api_key_here", }, }, }, }, } ```
**Security Note:** If storing API keys in `~/.openclaw/openclaw.json`: - Set file permissions: `chmod 600 ~/.openclaw/openclaw.json` - Never commit this file to version control - Prefer environment variables or your agent's secret store when possible - Rotate keys regularly and limit API key permissions if supported
## Common Tasks
### Extract to Markdown
```bash curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \ -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \ -F "[email protected]" \ -F "output_format=markdown" ```
Access content: `response["result"]["markdown"]["content"]`
### Extract JSON Fields
**Simple field list:** ```bash curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \ -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \ -F "[email protected]" \ -F "output_format=json" \ -F 'json_options=["invoice_number", "date", "total_amount", "vendor"]' \ -F "include_metadata=confidence_score" ```
**With JSON schema:** ```bash curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \ -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \ -F "[email protected]" \ -F "output_format=json" \ -F 'json_options={"type": "object", "properties": {"invoice_number": {"type": "string"}, "total_amount": {"type": "number"}}}' ```
Response with confidence scores: ```json { "result": { "json": { "content": { "invoice_number": "INV-2024-001", "total_amount": 500.00 }, "metadata": { "confidence_score": { "invoice_number": 98, "total_amount": 99 } } } } } ```
### Extract Tables to CSV
```bash curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \ -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \ -F "[email protected]" \ -F "output_format=csv" \ -F "csv_options=table" ```
### Async Extraction (Large Documents)
For documents >5 pages, use async and poll:
**Queue the document:** ```bash curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/async" \ -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \ -F "[email protected]" \ -F "output_format=markdown"
# Returns: {"record_id": "12345", "status": "processing"} ```
**Poll for results:** ```bash curl -X GET "https://extraction-api.nanonets.com/api/v1/extract/results/12345" \ -H "Authorization: Bearer $DOCSTRANGE_API_KEY"
# Returns: {"status": "completed", "result": {...}} ```
## Advanced Features
### Bounding Boxes Get element coordinates for layout analysis: ```bash -F "include_metadata=bounding_boxes" ```
### Hierarchy Output Extract document structure (sections, tables, key-value pairs): ```bash -F "json_options=hierarchy_output" ```
### Financial Documents Mode Enhanced table and number formatting: ```bash -F "markdown_options=financial-docs" ```
### Custom Instructions Guide extraction with prompts: ```bash -F "custom_instructions=Focus on financial data. Ignore headers." -F "prompt_mode=append" ```
### Multiple Formats Request multiple formats in one call: ```bash -F "output_format=markdown,json" ```
## When to Use
### Use DocStrange For: - Invoice and receipt processing - Contract text extraction - Bank statement parsing - Form digitization - Image OCR (scanned documents)
### Don't Use For: - Documents >5 pages with sync (use async) - Video/audio transcription - Non-document images
## Best Practices
| Document Size | Endpoint | Notes | |---------------|----------|-------| | <=5 pages | `/extract/sync` | Immediate response | | >5 pages | `/extract/async` | Poll for results |
**JSON Extraction:** - Field list: `["field1", "field2"]` — quick extractions - JSON schema: `{"type": "object", ...}` — strict typing, nested data
**Confidence Scores:** - Add `include_metadata=confidence_score` - Scores are 0-100 per field - Review fields <80 manually
## Schema Templates
### Invoice ```json { "type": "object", "properties": { "invoice_number": {"type": "string"}, "date": {"type": "string"}, "vendor": {"type": "string"}, "total": {"type": "number"}, "line_items": { "type": "array", "items": { "type": "object", "properties": { "description": {"type": "string"}, "quantity": {"type": "number"}, "price": {"type": "number"} } } } } } ```
### Receipt ```json { "type": "object", "properties": { "merchant": {"type": "string"}, "date": {"type": "string"}, "total": {"type": "number"}, "items": { "type": "array", "items": {"type": "object", "properties": {"name": {"type": "string"}, "price": {"type": "number"}}} } } } ```
## Security & Privacy
### Data Handling
**Important:** Documents uploaded to DocStrange are transmitted to `https://extraction-api.nanonets.com` and processed on external servers.
**Before uploading sensitive documents:** - Review Nanonets' privacy policy and data retention policies: https://docstrange.nanonets.com/docs - Verify encryption in transit (HTTPS) and at rest - Confirm data deletion/retention timelines - Test with non-sensitive sample documents first
**Best practices:** - Do not upload highly sensitive PII (SSNs, medical records, financial account numbers) until you've confirmed the service's security and compliance posture - Use API keys with limited permissions/scopes if available - Rotate API keys regularly (every 90 days recommended) - Monitor API usage logs for unauthorized access - Never log or commit API keys to repositories or examples
### File Size Limits
- **Sync endpoint:** Recommended for documents ≤5 pages - **Async endpoint:** Use for documents >5 pages to avoid timeouts - **Large files:** Consider using `file_url` with publicly accessible URLs instead of uploading large files directly
### Operational Safeguards
- Always use environment variables or secure secret stores for API keys - Never include real API keys in code examples or documentation - Use placeholder values like `"your_api_key_here"` in examples - Set appropriate file permissions on configuration files (600 for JSON configs) - Enable API key rotation and monitor usage through the dashboard
## Troubleshooting
**400 Bad Request:** - Provide exactly one input: `file`, `file_url`, or `file_base64` - Verify API key is valid
**Sync Timeout:** - Use async for documents >5 pages - Poll `/extract/results/{record_id}`
**Missing Confidence Scores:** - Requires `json_options` (field list or schema) - Add `include_metadata=confidence_score`
**Authentication Errors:** - Verify `DOCSTRANGE_API_KEY` environment variable is set - Check API key hasn't expired or been revoked - Ensure no extra whitespace in API key value
## Pre-Publish Security Checklist
Before publishing or updating this skill, verify:
- [ ] `package.json` declares `requiredEnv` and `primaryEnv` for `DOCSTRANGE_API_KEY` - [ ] `package.json` lists API endpoints in `endpoints` array - [ ] All code examples use placeholder values (`"your_api_key_here"`) not real keys - [ ] No API keys or secrets are embedded in `SKILL.md` or `package.json` - [ ] Security & Privacy section documents data handling and risks - [ ] Configuration examples include security warnings for plaintext storage - [ ] File permission guidance is included for config files
## References
- **API Docs:** https://docstrange.nanonets.com/docs - **Get API Key:** https://docstrange.nanonets.com/app - **Privacy Policy:** https://docstrange.nanonets.com/docs (check for privacy/data policy links)