Skip to main content
Document extraction outputs structured field, table, stamp, and handwriting information. This document shows how to trigger extraction and parse core results.

Trigger Extraction

Docflow’s default business workflow is parsing->classification->extraction.
Therefore, after uploading files, extraction results can be obtained by default in the final step.
curl -X POST \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  -F "file=@/path/to/invoice.pdf" \
  "https://docflow.textin.ai/api/app-api/sip/platform/v2/file/upload?workspace_id=<your-workspace-id>&batch_number=<your-batch-number>"
You can also combine with category to specify document category, which will skip automatic classification to match corresponding field templates:
curl -X POST \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  -F "file=@/path/to/invoice.pdf" \
  "https://docflow.textin.ai/api/app-api/sip/platform/v2/file/upload?workspace_id=<your-workspace-id>&category=invoice"

Get and Parse Extraction Results

curl \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  "https://docflow.textin.ai/api/app-api/sip/platform/v2/file/fetch?workspace_id=<your-workspace-id>&batch_number=<your-batch-number>"
Results are returned in JSON structure. result.files[].recognition_status indicates the file recognition status.Extraction-related statuses include:
  • 0: Pending recognition
  • 1: Extraction successful
  • 2: Extraction failed
Field and table data are located in result.files[].data. Key properties include:
  • fields[]: Key-value pairs, each containing key, value, and position[] (can be used for drawing coordinates)
  • items[][]: Table row key-value pair collections
  • stamps[]: Stamp information
  • handwritings[]: Handwriting information

Python Example: Print Fields and Tables

Python
import requests, json

host = "https://docflow.textin.ai"
url = "/api/app-api/sip/platform/v2/file/fetch"

resp = requests.get(
    f"{host}{url}",
    params={"workspace_id": "<your-workspace-id>", "batch_number": "<your-batch-number>"},
    headers={"x-ti-app-id": "<your-app-id>", "x-ti-secret-code": "<your-secret-code>"},
    timeout=60,
)

data = resp.json()
for file in data.get("result", {}).get("files", []):
    print("==>", file.get("name"))
    # Basic fields
    for kv in (file.get("data", {}).get("fields", []) or []):
        print(kv.get("key"), ":", kv.get("value"))

    # Tables (items by row)
    for row in (file.get("data", {}).get("items", []) or []):
        row_dict = {cell.get("key"): cell.get("value") for cell in row}
        print("ROW:", json.dumps(row_dict, ensure_ascii=False))

Linking Page Coordinates for Visualization

By combining files[].pages[] properties (widthheightangledpi) with fields[].position[].vertices, you can accurately draw field bounding boxes on the frontend.Refer to Parsing Result Visualization for details.

Associate Page Coordinates for Visualization

Combining files[].pages[]’s width/height/angle/dpi with fields[].position[].vertices, you can accurately draw field boxes on the frontend. See Parsing Result Visualization for details.

Key Return Field Example (Excerpt)

{
  "code": 200,
  "result": {
    "files": [
      {
        "id": "202412190001",
        "name": "invoice.pdf",
        "format": "pdf",
        "pages": [
          {"angle": 0, "width": 1024, "height": 1448, "dpi": 144}
        ],
        "data": {
          "fields": [
            {
              "key": "Invoice Code",
              "value": "3100231130",
              "position": [
                {"page": 0, "vertices": [0,0,100,0,100,100,0,100]}
              ]
            }
          ],
          "items": [
            [
              {"key": "Goods/Services Name", "value": "*Electronic Computer*Microcomputer Host"},
              {"key": "Specification/Model", "value": "DMS-SC68"}
            ]
          ],
          "tables": [
            {"tableName": "table1", "tableType": "0", "items": []}
          ],
          "stamps": [
            {"page": 0, "text": "National Unified Invoice Supervision Seal", "type": "Other", "color": "Red"}
          ],
          "handwritings": [
            {
              "page": 0,
              "text": "March 1st",
              "position": [{"page": 0, "vertices": [0,0,100,0,100,100,0,100]}]
            }
          ],
          "invoiceVerifyResult": {
            "invoiceVerifyStatus": 0,
            "invoiceVerifyErrorCode": 0,
            "invoiceVerifyCanRetry": 1
          }
        },
        "document": {
          "pages": [
            {
              "angle": 0,
              "width": 1024,
              "height": 1448,
              "lines": [
                {"text": "Electronic Invoice (General Invoice)", "position": [389,45,767,45,767,87,389,87]}
              ]
            }
          ]
        }
      }
    ]
  }
}
I