Creates a vector dataset from a set of files

Allows users to upload files and creates a corresponding vector dataset.

The API returns a dataset_id as soon as creation of the vector dataset starts in the background. To check whether a dataset is ready, use the route /dataset/{dataset_id}.
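
A minimal sketch of the readiness check, using only the Python standard library. It assumes the status route accepts a GET with the same Basic Authorization header as the upload endpoint. The shape of the JSON it returns is not documented on this page, so the sketch hands the parsed body back unchanged. The helper names build_status_request and fetch_dataset_status are illustrative, not part of the API.

```python
import json
import urllib.request


def build_status_request(base_url: str, dataset_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for <base url>/dataset/{dataset_id}."""
    url = f"{base_url.rstrip('/')}/dataset/{dataset_id}"
    # Assumption: the status route uses the same Basic credentials as /upload.
    return urllib.request.Request(url, headers={"Authorization": f"Basic {api_key}"})


def fetch_dataset_status(base_url: str, dataset_id: str, api_key: str, timeout: float = 30.0):
    """Send the request and return the parsed JSON body as-is."""
    request = build_status_request(base_url, dataset_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_dataset_status at intervals until the body indicates the dataset is ready is one way to wait for the background job. Which field signals readiness should be confirmed against a real response.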

Base URL

Use the URL of the Autodrive instance that is being used, usually following the pattern: https://app-intelligence-{deployment_customers}.dadosfera.ai/service-auto-drive-api-{instance_id}

How to Find the Base URL

Method 1: GitHub Access

  1. Access the GitHub Repository:

  2. Locate the desired Autodrive instance and customer:

    • Check that deploy_api is true (if it is false, the instance cannot be accessed through the API).
    • Get the instance_id and the deployment_customers.
    data_app-9999-9999-9999-9999-9999-99999999999:
      instance_id: 9999-9999-9999-9999-9999-9999999999
      deployment_customers:
        - dadosferademo
      app_tier: professional
      language: pt
      temp_cluster: false
      tags:
        - "deployment_env: prd"
        - "version: 1.0.0"
        - "lifecycle: live"
        - "app_tier: professional"
        - "dataapp: autodrive"
        - "language: pt-br"
      logger_level: INFO
      deploy_api: 'true'
    

The base URL will then be https://app-intelligence-{deployment_customers}.dadosfera.ai/service-auto-drive-api-{instance_id}
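
The lookup above can be sketched as a small helper that works on an already-parsed configuration entry. The function name api_base_urls is hypothetical, and producing one URL per item in deployment_customers is an assumption based on the URL pattern. The sample entry stores deploy_api as the quoted string 'true', so the check compares text.

```python
def api_base_urls(entry: dict) -> list[str]:
    """Return one API base URL per customer listed in a config entry."""
    # deploy_api appears as the quoted string 'true' in the sample entry,
    # so compare text rather than relying on a boolean.
    if str(entry.get("deploy_api", "")).strip().lower() != "true":
        raise ValueError("deploy_api is not true: this instance is not reachable by API")
    instance_id = entry["instance_id"]
    return [
        f"https://app-intelligence-{customer}.dadosfera.ai/service-auto-drive-api-{instance_id}"
        for customer in entry["deployment_customers"]
    ]


# Values copied from the sample entry above.
entry = {
    "instance_id": "9999-9999-9999-9999-9999-9999999999",
    "deployment_customers": ["dadosferademo"],
    "deploy_api": "true",
}
print(api_base_urls(entry)[0])
```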


Method 2: Access to an Autodrive Instance

  1. Check the Browser's Address:
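    The browser address of the Autodrive instance looks like: https://app-intelligence-{deployment_customers}.dadosfera.ai/service-auto-drive-{instance_id}. The API base URL uses the same host and instance_id, with service-auto-drive-api- in place of service-auto-drive-.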

  2. Not every Autodrive instance has the API functionality available. Please contact [email protected] to confirm whether your instance includes this feature.
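
As an illustration of Method 2, the sketch below rewrites a browser address into the API base URL by comparing the two URL patterns given on this page. The helper name api_base_url_from_browser is hypothetical, and the rewrite assumes the instance has the API enabled, which the address alone cannot confirm.

```python
import re

BROWSER_SEGMENT = "/service-auto-drive-"
API_SEGMENT = "/service-auto-drive-api-"


def api_base_url_from_browser(address: str) -> str:
    """Turn an Autodrive browser address into the matching API base URL."""
    # Try the API segment first so an address that is already an API URL
    # is returned unchanged instead of gaining a second "api-".
    for segment in (API_SEGMENT, BROWSER_SEGMENT):
        head, found, tail = address.partition(segment)
        if found:
            # Keep only the instance_id: drop any path, query, or fragment after it.
            instance_id = re.split(r"[/?#]", tail, maxsplit=1)[0]
            return f"{head}{API_SEGMENT}{instance_id}"
    raise ValueError("address does not contain /service-auto-drive-")
```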


Endpoint

The endpoint is <base url>/upload, called with the POST method.

Headers

  • Authorization: Basic <base64_encoded_credentials>

Parameters

  • OCR Method
    • Key: ocr_method
    • Description: Defines the OCR method to be used for processing the files.
    • Type: string
    • Required: Yes
    • Allowed Values: common, premium
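
Because ocr_method is required and accepts only two values, it can be worth validating on the client before any file is sent. The following is a small sketch. The names ALLOWED_OCR_METHODS and upload_form_data are illustrative and not part of the API.

```python
ALLOWED_OCR_METHODS = ("common", "premium")


def upload_form_data(ocr_method: str) -> dict:
    """Return the form fields for /upload, rejecting unknown OCR methods."""
    if ocr_method not in ALLOWED_OCR_METHODS:
        allowed = ", ".join(ALLOWED_OCR_METHODS)
        raise ValueError(f"ocr_method must be one of: {allowed} (got {ocr_method!r})")
    return {"ocr_method": ocr_method}
```

The returned dictionary can be passed as the data argument of requests.post alongside the files.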

API key

To obtain the api_key, you need the Autodrive instance_id, which can be found in the URL as described in the "Base URL" section.

The API key is the Base64 encoding of the string admin:instance_id, sent in the Authorization header:
Authorization: Basic <Base64_encode(admin:instance_id)>

Code example to get the api_key

import base64

username = 'admin'
instance_id = '9999-9999-9999-9999-9999-9999999999'
credentials = f"{username}:{instance_id}"

# Encode the credentials with Base64
credentials_bytes = credentials.encode('utf-8')
base64_bytes = base64.b64encode(credentials_bytes)

# Decode to string
base64_credentials = base64_bytes.decode('utf-8')

api_key = base64_credentials
print(api_key)

Example Request

import os
from contextlib import ExitStack

import requests

# Defining variables
base_url = "https://app-intelligence-customer-name.dadosfera.ai/service-auto-drive-api-9999-9999-9999-9999-9999-9999999999"
filepaths = ["document1.pdf", "document2.pdf"]
ocr_method = "common"
api_key = "123456789abcdef123456789abcdef123456789abcdef123456789abcdef1234"

# Form data and headers
data = {'ocr_method': ocr_method}
headers = {
    'Authorization': f"Basic {api_key}"
}

# Open the files, upload them, and close them once the request finishes
with ExitStack() as stack:
    files = [
        ("files", (os.path.basename(path), stack.enter_context(open(path, "rb"))))
        for path in filepaths
    ]
    response = requests.post(f"{base_url}/upload", headers=headers, files=files, data=data)

print(response.json())

Example Response

{'dataset_id': '99999999-9999-9999-9999-999999999999'}
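
A hedged sketch of reading this response defensively. It assumes a successful upload returns a 2xx status with a JSON object containing dataset_id, as in the example above. Error responses are not documented on this page, so any other status is simply raised with the start of the body. The helper name dataset_id_from_upload is illustrative.

```python
import json


def dataset_id_from_upload(status_code: int, body: str) -> str:
    """Extract dataset_id from the raw /upload response, or raise."""
    # Assumption: anything outside 2xx means the upload was not accepted.
    if not 200 <= status_code < 300:
        raise RuntimeError(f"upload failed with HTTP {status_code}: {body[:200]}")
    payload = json.loads(body)
    dataset_id = payload.get("dataset_id") if isinstance(payload, dict) else None
    if not dataset_id:
        raise RuntimeError("upload response has no dataset_id")
    return dataset_id
```

With the requests library this would be called as dataset_id_from_upload(response.status_code, response.text), and the result can then be used in the /dataset/{dataset_id} route.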
