Ask a question to a dataset using an LLM

Ask a question to an existing vector dataset. The response returns the dataset_id and a question_id, which can be used to track the question through the endpoint GET /dataset/{dataset_id}/ai_question/{question_id}.

Base URL

Use the URL of your Autodrive instance, which usually follows the pattern: https://app-intelligence-{deployment_customers}.dadosfera.ai/service-auto-drive-api-{instance_id}

How to Find the Base URL

Method 1: GitHub Access

  1. Access the GitHub Repository: Open the deploy configuration file in the project's GitHub repository: https://github.com/dadosfera/demo-generative-document-analyzer-deep-lake/blob/main/prd.azure.deploy_config.yaml
  2. Look for the desired Autodrive instance and customer:
  • Check that deploy_api is true (if it is false, the instance cannot be accessed through the API).
  • Get the instance_id and the deployment_customers values.
data_app-9999-9999-9999-9999-9999-99999999999:
  instance_id: 9999-9999-9999-9999-9999-9999999999
  deployment_customers:
    - dadosferademo
  app_tier: professional
  language: pt
  temp_cluster: false
  tags:
    - "deployment_env: prd"
    - "version: 1.0.0"
    - "lifecycle: live"
    - "app_tier: professional"
    - "dataapp: autodrive"
    - "language: pt-br"
  logger_level: INFO
  deploy_api: 'true'

The base URL will be https://app-intelligence-{deployment_customers}.dadosfera.ai/service-auto-drive-api-{instance_id}
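As a minimal sketch, the base URL can be assembled from the two values read in the deploy config. The helper name build_base_url is hypothetical, and the values used are the placeholders from the sample config, not real identifiers.

```python
def build_base_url(deployment_customer: str, instance_id: str) -> str:
    """Assemble the API base URL following the documented pattern."""
    return (
        f"https://app-intelligence-{deployment_customer}.dadosfera.ai"
        f"/service-auto-drive-api-{instance_id}"
    )


# Placeholder values copied from the sample deploy config
base_url = build_base_url("dadosferademo", "9999-9999-9999-9999-9999-9999999999")
print(base_url)
```

If deployment_customers lists more than one customer, the helper would be called once per entry.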


Method 2: Access to an Autodrive Instance

  1. Check the Browser's Address:
    The browser address of the Autodrive instance will look like: https://app-intelligence-{deployment_customers}.dadosfera.ai/service-auto-drive-{instance_id}

  2. Not every Autodrive instance has the API functionality available. Please contact [email protected] to confirm whether your Autodrive instance includes this feature.
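The sketch below shows one way to derive the API base URL from the browser address. It assumes the two addresses differ only by the "-api" segment, as the two patterns in this page suggest. The function name api_base_url_from_browser is hypothetical, and the result is only usable if the API is enabled for the instance.

```python
import re

# Matches the browser address pattern described above; an address that
# already contains the "-api" segment is accepted as well.
_AUTODRIVE_URL = re.compile(
    r"^https://app-intelligence-(?P<customer>[^.]+)\.dadosfera\.ai"
    r"/service-auto-drive-(?:api-)?(?P<instance_id>[^/?#]+)"
)


def api_base_url_from_browser(address: str) -> str:
    """Extract the customer and instance_id, then rebuild the API base URL."""
    match = _AUTODRIVE_URL.match(address)
    if match is None:
        raise ValueError(f"Not a recognised Autodrive address: {address!r}")
    return (
        f"https://app-intelligence-{match['customer']}.dadosfera.ai"
        f"/service-auto-drive-api-{match['instance_id']}"
    )
```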

Endpoint

The endpoint is POST <base url>/dataset/{dataset_id}/ai_question

Headers

  • Authorization: Basic <base64_encoded_credentials>

Parameters

  • dataset_id (str, required): Unique identifier of the dataset to which the question will be directed.
  • question (str, required): The string containing the question.
  • metadata_filter (dict, optional): A dictionary containing filters to adjust the search based on the dataset's metadata. Example: {"category": "finance"}
  • distance_metric (str, optional): Defines the distance metric used to calculate relevance. Accepted values:
    • "cos" (default): Cosine similarity.
    • "L2": Euclidean distance.
    • "L1": Manhattan distance.
    • "max": Maximum distance.
    • "dot": Dot product.
  • maximize_marginal_relevance (bool, optional): If True, uses marginal relevance maximization to optimize the diversity of the results. The default value is True.
  • fetch_k (int, optional): The number of possible answers to be retrieved from the dataset. The default is 10.
  • k (int, optional): The number of final answers that will be returned to the user. The default is 3.
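The parameters above can be collected into a request body with a small helper. This is a sketch, not part of the API: the name build_question_body and the client-side validation are assumptions. dataset_id is left out because it travels in the URL path rather than in the body.

```python
from typing import Any, Dict, Optional

# Accepted values for distance_metric, as listed above
ACCEPTED_METRICS = {"cos", "L2", "L1", "max", "dot"}


def build_question_body(
    question: str,
    metadata_filter: Optional[Dict[str, Any]] = None,
    distance_metric: str = "cos",
    maximize_marginal_relevance: bool = True,
    fetch_k: int = 10,
    k: int = 3,
) -> Dict[str, Any]:
    """Build the JSON body, applying the documented defaults."""
    if not question:
        raise ValueError("question is required")
    if distance_metric not in ACCEPTED_METRICS:
        raise ValueError(f"Unsupported distance_metric: {distance_metric!r}")
    body: Dict[str, Any] = {
        "question": question,
        "distance_metric": distance_metric,
        "maximize_marginal_relevance": maximize_marginal_relevance,
        "fetch_k": fetch_k,
        "k": k,
    }
    # Only send the optional filter when one was provided
    if metadata_filter is not None:
        body["metadata_filter"] = metadata_filter
    return body
```

Calling build_question_body("What is the revenue for Q1 2024?") yields a body that relies on every default, which keeps the optional fields explicit in the request.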

API key

To obtain the api_key, you need the instance_id of your Autodrive instance, which can be found directly in the URL, as described in the "Base URL" section.

The API key is the Base64 encoding of the string admin:instance_id, sent in the Authorization header:
Authorization: Basic <Base64_encode(admin:instance_id)>

Code example to get the api_key

import base64

username = 'admin'
instance_id = '9999-9999-9999-9999-9999-9999999999'
credentials = f"{username}:{instance_id}"

# Encode the credentials in Base64
credentials_bytes = credentials.encode('utf-8')
base64_bytes = base64.b64encode(credentials_bytes)

# Convert back to a string
base64_credentials = base64_bytes.decode('utf-8')

api_key = base64_credentials
print(api_key)

Example Request


import requests
import json

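# Define the parameters
base_url = "https://api.example.com"  # Base URL of the API
dataset_id = "abc123"  # Unique identifier returned when the vector dataset was created
question = "What is the revenue for Q1 2024?"  # The question being asked
metadata_filter = {"category": "finance"}  # Optional filter on the dataset's metadata
api_key = "<base64_encoded_credentials>"  # Base64 of "admin:<instance_id>", see "API key"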


# Headers including authentication
headers = {
    'Authorization': f"Basic {api_key}",  # Replace with your actual Authorization
    'Content-Type': 'application/json'  # Specify that the content is in JSON format
}

# Request body
data = {
    "question": question,
    "metadata_filter": metadata_filter,
    "distance_metric": "cos",  # Cosine similarity metric
    "maximize_marginal_relevance": True,  # Optimize for diversity in results
    "fetch_k": 10,  # Number of possible answers to retrieve
    "k": 3  # Number of final answers to return
}

# Make the POST request
response = requests.post(
    url=f"{base_url}/dataset/{dataset_id}/ai_question",  # API endpoint for submitting a question
    headers=headers,
    data=json.dumps(data)  # Convert Python dictionary to JSON
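)

# Inspect the result; the response carries the dataset_id and the question_id
print(response.status_code, response.text)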
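Once the question has been submitted, it can be tracked through GET /dataset/{dataset_id}/ai_question/{question_id}. The following is a sketch under assumptions: the helper names tracking_url and fetch_question are hypothetical, the tracking endpoint is assumed to return JSON, and the standard library is used in place of requests so the block has no third-party dependency.

```python
import json
import urllib.request


def tracking_url(base_url: str, dataset_id: str, question_id: str) -> str:
    """Build the URL of the tracking endpoint for a submitted question."""
    return f"{base_url.rstrip('/')}/dataset/{dataset_id}/ai_question/{question_id}"


def fetch_question(base_url: str, dataset_id: str, question_id: str, api_key: str) -> dict:
    """GET the question record; assumes the endpoint answers with a JSON body."""
    request = urllib.request.Request(
        tracking_url(base_url, dataset_id, question_id),
        headers={"Authorization": f"Basic {api_key}"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, assuming the POST response is JSON with "dataset_id" and "question_id" keys:
#   ids = response.json()
#   record = fetch_question(base_url, ids["dataset_id"], ids["question_id"], api_key)
```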