Creates a vector dataset from a set of files
Allows users to upload files and creates a corresponding vector dataset.
The API returns a dataset_id as soon as creation of the vector dataset starts in the background. To check whether a dataset is ready, use the route /dataset/{dataset_id}.
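Because the dataset is built in the background, a client usually has to poll the status route. The sketch below shows one way to do that; it makes no assumption about the response body of /dataset/{dataset_id}, which this page does not describe. The names `dataset_status_url` and `wait_until_ready`, the `fetch_status` and `is_ready` callables, and the timing defaults are all hypothetical.

```python
import time
from typing import Any, Callable


def dataset_status_url(base_url: str, dataset_id: str) -> str:
    """Build the URL of the status route /dataset/{dataset_id}."""
    return f"{base_url.rstrip('/')}/dataset/{dataset_id}"


def wait_until_ready(
    fetch_status: Callable[[], Any],
    is_ready: Callable[[Any], bool],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
) -> Any:
    """Call fetch_status repeatedly until is_ready accepts its result.

    Both callables are supplied by the caller because the shape of the
    status response is not documented here (assumption).
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        if is_ready(status):
            return status
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"dataset not ready after {max_attempts} attempts")
```

In practice `fetch_status` might wrap a request to `dataset_status_url(base_url, dataset_id)` with the same Authorization header used for the upload, and `is_ready` would inspect whatever your instance returns on that route.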
Base URL
Use the URL of the Autodrive instance that is being used, usually following the pattern: https://app-intelligence-{deployment_customers}.dadosfera.ai/service-auto-drive-api-{instance_id}
How to Find the Base URL
Method 1: GitHub Access
- Access the GitHub repository:
  - Open the project's GitHub repository: https://github.com/dadosfera/demo-generative-document-analyzer-deep-lake/blob/main/prd.azure.deploy_config.yaml
- Look for the desired Autodrive instance and customer:
  - Check that deploy_api is true (if it is false, you cannot access the instance through the API).
  - Get the instance_id and the deployment_customers.

data_app-9999-9999-9999-9999-9999-99999999999:
  instance_id: 9999-9999-9999-9999-9999-9999999999
  deployment_customers:
    - dadosferademo
  app_tier: professional
  language: pt
  temp_cluster: false
  tags:
    - "deployment_env: prd"
    - "version: 1.0.0"
    - "lifecycle: live"
    - "app_tier: professional"
    - "dataapp: autodrive"
    - "language: pt-br"
  logger_level: INFO
  deploy_api: 'true'

The base URL will be https://app-intelligence-{deployment_customers}.dadosfera.ai/service-auto-drive-api-{instance_id}
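The steps of Method 1 can be sketched in code as follows. This is only an illustration: the function name `base_url_from_config` is hypothetical, the entry is assumed to be already parsed into a Python dict, and using the first item of deployment_customers is an assumption for configs that list more than one customer.

```python
def base_url_from_config(entry: dict) -> str:
    """Build the API base URL from one parsed deploy-config entry."""
    # deploy_api is the string 'true' in the example config; accepting a
    # real boolean as well is an assumption.
    if str(entry.get("deploy_api", "false")).lower() != "true":
        raise ValueError("deploy_api is not true: this instance has no API access")
    customer = entry["deployment_customers"][0]  # first listed customer (assumption)
    instance_id = entry["instance_id"]
    return (
        f"https://app-intelligence-{customer}.dadosfera.ai"
        f"/service-auto-drive-api-{instance_id}"
    )


# Values copied from the example config entry.
entry = {
    "instance_id": "9999-9999-9999-9999-9999-9999999999",
    "deployment_customers": ["dadosferademo"],
    "deploy_api": "true",
}
print(base_url_from_config(entry))
```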
Method 2: Access to an Autodrive Instance
- Check the browser's address: the address of the Autodrive instance will look like https://app-intelligence-{deployment_customers}.dadosfera.ai/service-auto-drive-{instance_id}
- Not every Autodrive instance has the API functionality available. Please contact [email protected] to confirm whether your Autodrive instance includes this feature.
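For Method 2, the API base URL can be derived from the browser address by comparing the two URL patterns given on this page. The sketch below does that with a regular expression; the function name `api_base_url_from_browser` is hypothetical, and it assumes the API base URL differs from the browser address only by the `api-` segment, as the patterns suggest.

```python
import re

# Browser address pattern described above:
# https://app-intelligence-{deployment_customers}.dadosfera.ai/service-auto-drive-{instance_id}
_BROWSER_URL = re.compile(
    r"^https://app-intelligence-(?P<customer>.+?)\.dadosfera\.ai"
    r"/service-auto-drive-(?P<instance_id>[^/?#]+)"
)


def api_base_url_from_browser(address: str) -> str:
    """Derive the API base URL from an Autodrive browser address."""
    match = _BROWSER_URL.match(address)
    if match is None:
        raise ValueError(f"not an Autodrive address: {address!r}")
    instance_id = match["instance_id"]
    # If the address already is an API URL, avoid producing "api-api-".
    if instance_id.startswith("api-"):
        instance_id = instance_id[len("api-"):]
    return (
        f"https://app-intelligence-{match['customer']}.dadosfera.ai"
        f"/service-auto-drive-api-{instance_id}"
    )
```

The function only rewrites the address; it cannot tell whether the API is enabled for that instance.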
Endpoint
The endpoint is POST <base url>/upload
Headers
- Authorization: Basic <base64_encoded_credentials>
Parameters
- OCR Method
  - Key: ocr_method
  - Description: Defines the OCR method to be used for processing the files.
  - Type: string
  - Required: Yes
  - Allowed Values: common, premium
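Since ocr_method is required and accepts only two values, a client can validate it before sending the request. A minimal sketch, in which the constant `ALLOWED_OCR_METHODS` and the helper `build_upload_data` are hypothetical names:

```python
# Allowed values as listed in the Parameters section.
ALLOWED_OCR_METHODS = ("common", "premium")


def build_upload_data(ocr_method: str) -> dict:
    """Return the form data for the upload after validating ocr_method."""
    if ocr_method not in ALLOWED_OCR_METHODS:
        raise ValueError(
            f"ocr_method must be one of {ALLOWED_OCR_METHODS}, got {ocr_method!r}"
        )
    return {"ocr_method": ocr_method}
```

The returned dict can be passed as the `data` argument of the upload request.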
API key
To obtain the api_key, you need the instance_id of Autodrive, which can be found directly in the URL, as described in the "Base URL" section.
The API key is the Base64 encoding of admin:instance_id, sent in the Authorization header:
Authorization: Basic <Base64_encode(admin:instance_id)>
Code example to get the api_key
import base64
username = 'admin'
instance_id = '9999-9999-9999-9999-9999-9999999999'
credentials = f"{username}:{instance_id}"
# Encode the credentials with Base64
credentials_bytes = credentials.encode('utf-8')
base64_bytes = base64.b64encode(credentials_bytes)
# Decode to string
base64_credentials = base64_bytes.decode('utf-8')
api_key = base64_credentials
print(api_key)
Example Request
import os
from contextlib import ExitStack

import requests

# Defining variables
base_url = "https://app-intelligence-customer-name.dadosfera.ai/service-auto-drive-api-9999-9999-9999-9999-9999-9999999999"
filepaths = ["document1.pdf", "document2.pdf"]
ocr_method = "common"
# Base64-encoded "admin:<instance_id>" (placeholder value)
api_key = "123456789abcdef123456789abcdef123456789abcdef123456789abcdef1234"

# Form data and headers
data = {"ocr_method": ocr_method}
headers = {"Authorization": f"Basic {api_key}"}

# Uploading the files; ExitStack closes every file handle afterwards
with ExitStack() as stack:
    files = [
        ("files", (os.path.basename(file_path), stack.enter_context(open(file_path, "rb"))))
        for file_path in filepaths
    ]
    response = requests.post(f"{base_url}/upload", headers=headers, files=files, data=data)

print(response.json())
Example Response
{'dataset_id': '99999999-9999-9999-9999-999999999999'}