API Reference.

Our API is OpenAI-compatible: any client or SDK that accepts a base URL + API key works without changes. The base URL is https://api.nan.builders/v1 and authentication is via Bearer token. To get your key, see Getting Started.

Helmcode enterprise service

If you use the Helmcode enterprise service, the API host is api.helmcode.com. Endpoint paths, requests, and responses are otherwise identical.

Endpoints

List of available endpoints. Each links to its section with request, response, and a curl example.

List models

GET /v1/models

Chat completions

POST /v1/chat/completions

Text completions

POST /v1/completions

Embeddings

POST /v1/embeddings

Rerank

POST /v1/rerank

Text-to-speech

POST /v1/audio/speech

Speech-to-text

POST /v1/audio/transcriptions

Responses

POST /v1/responses

Image generation

POST /v1/images/generations

Image editing

POST /v1/images/edits

Authentication

All requests require the Authorization: Bearer <api-key> header. The key is personal and non-transferable — see Getting Started to get yours.

curl https://api.nan.builders/v1/models \
  -H "Authorization: Bearer sk-your-key-here"
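The same authenticated request can be sketched in Python with only the standard library. The helper name `build_request` and the `NAN_API_KEY` environment variable are assumptions made for illustration; the request is built here but not sent.

```python
import os
import urllib.request

BASE_URL = "https://api.nan.builders/v1"  # base URL stated at the top of this page


def build_request(path: str, api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an authenticated request for `path` (hypothetical helper)."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


# NAN_API_KEY is an assumed variable name; the fallback is the placeholder key used above.
req = build_request("/models", os.environ.get("NAN_API_KEY", "sk-your-key-here"))
print(req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Reading the key from the environment keeps it out of source control, which matters because the key is personal.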

GET /v1/models

Returns the list of models available to your key. Published models: deepseek-v4-flash, mimo-v2.5, glm5.2, qwen3.6, gemma4, qwen3-embedding, rerank, kokoro, whisper, and the image model flux-2-klein.

Request

No body. Only the authentication header is required.

Response

{
  "object": "list",
  "data": [
    {
      "id": "qwen3.6",
      "object": "model",
      "created": 1677610602,
      "owned_by": "openai"
    },
    {
      "id": "glm5.2",
      "object": "model",
      "created": 1677610602,
      "owned_by": "openai"
    }
  ]
}
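As a small illustration of consuming this response, the sketch below extracts the model ids from a payload shaped like the one above. The function name `model_ids` is hypothetical, and the payload is a trimmed copy of the sample rather than live data.

```python
def model_ids(listing: dict) -> list[str]:
    """Return the ids of every entry whose object type is "model"."""
    return [m["id"] for m in listing.get("data", []) if m.get("object") == "model"]


# Trimmed copy of the sample response shown above.
payload = {
    "object": "list",
    "data": [
        {"id": "qwen3.6", "object": "model", "created": 1677610602, "owned_by": "openai"},
        {"id": "glm5.2", "object": "model", "created": 1677610602, "owned_by": "openai"},
    ],
}
print(model_ids(payload))
```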

Example

curl https://api.nan.builders/v1/models \
  -H "Authorization: Bearer sk-your-key-here"

POST /v1/chat/completions

The main chat endpoint. OpenAI Chat Completions compatible. Compatible models: deepseek-v4-flash, mimo-v2.5, glm5.2, qwen3.6, and gemma4.

Capabilities by model

deepseek-v4-flash: Chat, streaming, tool calling, reasoning, 1M token context. 500M token monthly quota per member.
mimo-v2.5: Chat, streaming, tool calling, reasoning, vision (image input) and audio (audio input), 1M token context. 500M token monthly quota per member.
glm5.2: Chat, streaming, tool calling, reasoning (emits a reasoning trace), 256K token context. Text only. Focused on coding and long-horizon agentic tasks.
qwen3.6: Chat, streaming, tool calling, vision (image input), reasoning (opt-out, returns reasoning_content in the message).
gemma4: Chat, streaming, vision (image input), reasoning (opt-in).

Request

Field	Type	Description
`model`	string · required	`deepseek-v4-flash`, `mimo-v2.5`, `glm5.2`, `qwen3.6` or `gemma4`.
`messages`	array · required	List of messages `{ role, content }`. `content` can be a string or an array of parts `[{type:"text",text}, {type:"image_url",image_url:{url}}]` for multimodal input.
`max_tokens`	integer · optional	Maximum tokens to generate.
`stream`	boolean · optional	Default `false`. If `true`, the response arrives as SSE.
`tools`	array · optional	Standard OpenAI function calling: `{type:"function",function:{name,description,parameters}}`. Validated only with `qwen3.6`.
`tool_choice`	string \| object · optional	Controls which tool the model can invoke. Standard OpenAI.
`temperature`	number · optional	Default `0.6`.
`top_p`	number · optional	Default `0.95`.

Response

Non-streaming response. finish_reason can be stop, length, or tool_calls.

{
  "id": "chatcmpl-...",
  "created": 1778258163,
  "model": "qwen3.6",
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "...",
        "reasoning_content": "..."
      }
    }
  ],
  "usage": {
    "completion_tokens": 20,
    "prompt_tokens": 17,
    "total_tokens": 37
  }
}

The reasoning_content field is present when the model returns a reasoning trace, as qwen3.6 does by default. Clients that do not need it can ignore it.
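A minimal sketch of reading such a response defensively, treating reasoning_content as optional and checking finish_reason. The helper name `read_choice` and the keys of the dictionary it returns are illustrative choices, not part of the API.

```python
def read_choice(response: dict) -> dict:
    """Summarize the first choice of a non-streaming chat completion."""
    choice = response["choices"][0]
    message = choice["message"]
    return {
        "text": message.get("content") or "",
        "reasoning": message.get("reasoning_content"),  # None when the field is absent
        "truncated": choice["finish_reason"] == "length",
        "wants_tools": choice["finish_reason"] == "tool_calls",
    }


# Hand-written sample in the shape of the response above.
sample_response = {
    "choices": [{
        "finish_reason": "stop",
        "index": 0,
        "message": {"role": "assistant", "content": "Hi", "reasoning_content": "thinking"},
    }]
}
summary = read_choice(sample_response)
print(summary)
```

If `truncated` is true, the answer was cut off by max_tokens, so a caller may want to raise the limit and retry.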

Example

curl https://api.nan.builders/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6",
    "messages": [{"role": "user", "content": "Hola"}],
    "max_tokens": 200
  }'

Streaming

With stream: true, the response is delivered as Server-Sent Events. Each chunk is data: {...}\n\n with the delta in choices[0].delta.content. The stream ends with data: [DONE].

curl https://api.nan.builders/v1/chat/completions \
  -N \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6",
    "messages": [{"role": "user", "content": "Cuéntame un chiste corto"}],
    "stream": true
  }'
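The chunk format described above can be consumed with a small parser. This is a sketch under the stated format only: the function name `iter_deltas` is hypothetical, and the sample lines are hand-written stand-ins for a real stream.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        choices = json.loads(data).get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Hand-written sample stream; real chunks carry more fields.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))
```

In practice the lines would come from an HTTP response read incrementally; the parser itself does not depend on the transport.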

Tool calling

qwen3.6 supports standard OpenAI function calling. When the model decides to invoke a tool, the response includes choices[0].message.tool_calls with {id, type:"function", function:{name, arguments}} and finish_reason: "tool_calls".

curl https://api.nan.builders/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6",
    "messages": [{"role": "user", "content": "¿Qué tiempo hace en Madrid?"}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Gets the current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string"}
            },
            "required": ["city"]
          }
        }
      }
    ]
  }'
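Once tool_calls arrives, the client runs the tool and sends the result back. The sketch below follows the standard OpenAI convention of replying with `role: "tool"` messages keyed by `tool_call_id`; that this API accepts the same follow-up shape is an assumption based on its stated compatibility. `get_weather`, `TOOLS`, and `run_tool_calls` are hypothetical names, and the assistant message is hand-written.

```python
import json


def get_weather(city: str) -> str:
    """Stand-in implementation; a real one would call a weather service."""
    return f"Sunny in {city}"


TOOLS = {"get_weather": get_weather}


def run_tool_calls(message: dict) -> list[dict]:
    """Execute each requested tool and return the tool messages to send back."""
    results = []
    for call in message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        })
    return results


# Hand-written example of an assistant message containing one tool call.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Madrid\"}"},
    }],
}
tool_messages = run_tool_calls(assistant_message)
print(tool_messages)
```

The next request would append `assistant_message` and then `tool_messages` to the original messages list so the model can produce its final answer.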

Vision

mimo-v2.5, qwen3.6 and gemma4 accept multimodal input. The content field of the message changes from string to an array of parts of type text and/or image_url. mimo-v2.5 also accepts input_audio as part of content.

curl https://api.nan.builders/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "¿Qué hay en esta imagen?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/foto.jpg"}}
      ]
    }]
  }'
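For local files, OpenAI-style clients usually embed the image as a base64 data URL in the image_url part. Whether this API accepts data URLs is an assumption based on OpenAI compatibility, not something this page confirms. The helper names `image_part` and `vision_message` are hypothetical.

```python
import base64


def image_part(image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as an image_url content part using a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


def vision_message(question: str, image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Build a user message whose content is a text part plus one image part."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": question}, image_part(image_bytes, mime)],
    }


# Three bytes stand in for a real file read with open(path, "rb").read().
msg = vision_message("What is in this image?", b"\xff\xd8\xff")
print(msg["content"][1]["image_url"]["url"])
```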

Structured outputs

Chat models accept the standard OpenAI response_format field to force valid JSON responses. We support both modes:

json_object: Guarantees that the response is syntactically valid JSON. Does not enforce a structure.
json_schema: Restricts output to a specific JSON Schema. With strict: true the model cannot emit fields outside the schema.

Supported on qwen3.6 and gemma4.

json_object:

curl https://api.nan.builders/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6",
    "messages": [
      {"role": "user", "content": "Devuelve un objeto user con name=Alice y age=30."}
    ],
    "response_format": { "type": "json_object" }
  }'

json_schema (strict):

curl https://api.nan.builders/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6",
    "messages": [
      {"role": "user", "content": "Alice, 30 años."}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "user",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "age":  { "type": "integer" }
          },
          "required": ["name", "age"],
          "additionalProperties": false
        }
      }
    }
  }'

With the openai Python SDK:

import json

from openai import OpenAI

client = OpenAI(
  api_key="sk-your-key-here",
  base_url="https://api.nan.builders/v1"
)

response = client.chat.completions.create(
  model="qwen3.6",
  messages=[{"role": "user", "content": "Alice, 30 años."}],
  response_format={
    "type": "json_schema",
    "json_schema": {
      "name": "user",
      "strict": True,
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "age":  {"type": "integer"}
        },
        "required": ["name", "age"],
        "additionalProperties": False
      }
    }
  }
)

# The schema-constrained output arrives as a JSON string in message.content.
data = json.loads(response.choices[0].message.content)
print(data["name"], data["age"])

Reasoning

All five models generate reasoning and return it in choices[0].message.reasoning_content. The control mechanism varies by model:

Model	Control
`qwen3.6`	`chat_template_kwargs.enable_thinking` · enabled by default
`gemma4`	`chat_template_kwargs.enable_thinking` · disabled by default
`deepseek-v4-flash`	`reasoning_effort`: `low` \| `medium` \| `high` · default `medium`
`mimo-v2.5`	always on · not currently configurable via the API
`glm5.2`	emits `reasoning_content` · focused on agentic coding
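Because the control differs per model, a client that switches models may want one place that maps a model name to the right body fields. The sketch below encodes only what the table states; `reasoning_fields`, `TOGGLE_MODELS`, and `EFFORT_MODELS` are hypothetical names.

```python
from typing import Optional

TOGGLE_MODELS = {"qwen3.6", "gemma4"}   # controlled by chat_template_kwargs.enable_thinking
EFFORT_MODELS = {"deepseek-v4-flash"}   # controlled by reasoning_effort


def reasoning_fields(model: str, thinking: Optional[bool] = None,
                     effort: Optional[str] = None) -> dict:
    """Return the request-body fields that control reasoning for `model`."""
    if model in TOGGLE_MODELS and thinking is not None:
        return {"chat_template_kwargs": {"enable_thinking": thinking}}
    if model in EFFORT_MODELS and effort is not None:
        if effort not in ("low", "medium", "high"):
            raise ValueError(f"unsupported reasoning_effort: {effort}")
        return {"reasoning_effort": effort}
    return {}  # mimo-v2.5 and glm5.2 expose no control in the table above


body = {"model": "qwen3.6", "messages": [{"role": "user", "content": "Hi"}]}
body.update(reasoning_fields("qwen3.6", thinking=False))
print(body)
```

With the openai Python SDK, the `chat_template_kwargs` result would be passed through `extra_body`, while `reasoning_effort` is a regular keyword argument.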

enable_thinking (qwen3.6, gemma4)

Binary toggle. The field goes in the request body as chat_template_kwargs.enable_thinking:

curl https://api.nan.builders/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma4",
    "messages": [{"role": "user", "content": "Qué es 2+2?"}],
    "chat_template_kwargs": { "enable_thinking": true }
  }'

To disable it on qwen3.6, pass { "enable_thinking": false }.

In SDKs such as openai for Python or Node, this field goes inside extra_body:

from openai import OpenAI

client = OpenAI(
  api_key="sk-your-key-here",
  base_url="https://api.nan.builders/v1"
)

response = client.chat.completions.create(
  model="gemma4",
  messages=[{"role": "user", "content": "Qué es 2+2?"}],
  extra_body={"chat_template_kwargs": {"enable_thinking": True}}
)

print(response.choices[0].message.reasoning_content)

reasoning_effort (deepseek-v4-flash)

Standard OpenAI parameter. Accepts low, medium, or high and goes as a top-level body field — not inside extra_body. If not provided, defaults to medium.

curl https://api.nan.builders/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Resuelve paso a paso: 3x + 7 = 22"}],
    "reasoning_effort": "high"
  }'

With the openai Python SDK:

from openai import OpenAI

client = OpenAI(
  api_key="sk-your-key-here",
  base_url="https://api.nan.builders/v1"
)

response = client.chat.completions.create(
  model="deepseek-v4-flash",
  messages=[{"role": "user", "content": "Resuelve paso a paso: 3x + 7 = 22"}],
  reasoning_effort="high"
)

print(response.choices[0].message.reasoning_content)
print(response.choices[0].message.content)

Higher effort means more tokens spent on reasoning and better quality on complex problems, at the cost of higher latency and more of your monthly quota.

mimo-v2.5

MiMo V2.5 always reasons and emits reasoning_content in every response. The upstream Xiaomi service currently ignores both reasoning_effort and enable_thinking, so the reasoning level is not configurable via the API. If you need to control it, use deepseek-v4-flash.

POST /v1/completions

Legacy OpenAI endpoint for text completion. Compatible model: qwen3.6.

Request

Field	Type	Description
`model`	string · required	`qwen3.6`.
`prompt`	string · required	The prompt to complete.
`max_tokens`	integer · optional	Maximum tokens to generate.
`temperature`	number · optional	Default `0.6`.
`top_p`	number · optional	Default `0.95`.
`stream`	boolean · optional	Default `false`.

Response

{
  "id": "cmpl-...",
  "object": "text_completion",
  "created": 1778258166,
  "model": "qwen3.6",
  "choices": [
    {
      "text": "...",
      "index": 0,
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "completion_tokens": 10,
    "prompt_tokens": 5,
    "total_tokens": 15
  }
}

Example

curl https://api.nan.builders/v1/completions \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6",
    "prompt": "The capital of France is",
    "max_tokens": 10
  }'

Notes

This is a legacy OpenAI endpoint. For conversations, use /v1/chat/completions.

POST /v1/embeddings

Generates vector embeddings. Compatible model: qwen3-embedding. Vectors have 4096 dimensions.

Request

Field	Type	Description
`model`	string · required	`qwen3-embedding`.
`input`	string \| array · required	Single text or array of strings to embed.
`encoding_format`	string · optional	`"float"` (default) or `"base64"`.

Response

{
  "object": "list",
  "model": "qwen3-embedding",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0210, 0.0105, -0.0204, "..."]
    }
  ],
  "usage": {
    "prompt_tokens": 3,
    "total_tokens": 3
  }
}

Example

curl https://api.nan.builders/v1/embeddings \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-embedding",
    "input": ["Hola mundo", "Hello world"]
  }'
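A common next step is comparing two returned vectors. The sketch below orders the vectors by their `index` field and computes cosine similarity in plain Python; the function names are hypothetical and the two-dimensional vectors are toy stand-ins for the 4096-dimensional ones the endpoint returns.

```python
import math


def vectors_in_input_order(response: dict) -> list[list[float]]:
    """Return embeddings sorted by `index`, which matches the input order."""
    return [item["embedding"] for item in sorted(response["data"], key=lambda d: d["index"])]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy response with 2-dimensional vectors, listed out of order on purpose.
toy = {"data": [
    {"object": "embedding", "index": 1, "embedding": [0.0, 2.0]},
    {"object": "embedding", "index": 0, "embedding": [3.0, 0.0]},
]}
first, second = vectors_in_input_order(toy)
print(cosine_similarity(first, second))
```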

POST /v1/rerank

Reorders a list of documents by relevance to a query. Compatible model: rerank (Qwen3-Reranker-8B). It completes the RAG stack alongside qwen3-embedding: first retrieve the top-K candidates by embeddings, then reorder them with rerank. Supports 100+ languages, code retrieval, and cross-lingual search. Endpoint alias: /v2/rerank.

Request

Field	Type	Description
`model`	string · required	`rerank`.
`query`	string · required	Query against which each document’s relevance is measured.
`documents`	array · required	Array of strings to rerank. The response returns them sorted from highest to lowest `relevance_score`, each with its original `index`.
`top_n`	integer · optional	Limits the response to the `N` most relevant documents. By default, all are returned.

Response

{
  "id": "score-a032ee5767cab0ee",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.7390941977500916,
      "document": {
        "text": "Paris is the capital of France."
      }
    },
    {
      "index": 1,
      "relevance_score": 0.6002889275550842,
      "document": {
        "text": "Berlin is the capital of Germany."
      }
    },
    {
      "index": 2,
      "relevance_score": 0.12374333292245865,
      "document": {
        "text": "Madrid is the capital of Spain."
      }
    }
  ],
  "meta": {
    "billed_units": {
      "total_tokens": 43
    },
    "tokens": {
      "input_tokens": 43
    }
  }
}

The response includes id, results (an array of {index, relevance_score, document}), and meta with billed_units and token counts. relevance_score is in the range [0, 1]. index refers to the document's original position in the input array.
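Since each result carries the original `index`, the client can map scores back to its own document list. This sketch does that and applies an optional score cutoff; `ranked_documents` and its `min_score` parameter are hypothetical, and the scores are rounded copies of the sample above.

```python
def ranked_documents(documents: list[str], response: dict,
                     min_score: float = 0.0) -> list[tuple[str, float]]:
    """Pair each original document with its score, highest score first."""
    ranked = sorted(response["results"], key=lambda r: r["relevance_score"], reverse=True)
    return [
        (documents[r["index"]], r["relevance_score"])
        for r in ranked
        if r["relevance_score"] >= min_score
    ]


documents = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain.",
]
# Scores rounded from the sample response; listed out of order on purpose.
rerank_response = {"results": [
    {"index": 2, "relevance_score": 0.124},
    {"index": 0, "relevance_score": 0.739},
    {"index": 1, "relevance_score": 0.600},
]}
for text, score in ranked_documents(documents, rerank_response, min_score=0.5):
    print(f"{score:.3f}  {text}")
```

Indexing into the caller's own list avoids depending on the `document.text` echo in the response.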

Example

curl https://api.nan.builders/v1/rerank \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rerank",
    "query": "What is the capital of France?",
    "documents": [
      "Paris is the capital of France and home to the Eiffel Tower.",
      "Berlin is the capital of Germany.",
      "Madrid is the capital of Spain."
    ]
  }'

With the openai Python SDK (using a direct post call, since rerank is not part of the OpenAI client):

import os
from openai import OpenAI

client = OpenAI(
  api_key=os.environ["NAN_API_KEY"],
  base_url="https://api.nan.builders/v1"
)

response = client.post(
  path="/rerank",
  cast_to=object,
  body={
    "model": "rerank",
    "query": "What is the capital of France?",
    "documents": [
      "Paris is the capital of France and home to the Eiffel Tower.",
      "Berlin is the capital of Germany.",
      "Madrid is the capital of Spain.",
    ],
  },
)

for r in response["results"]:
    print(r["index"], r["relevance_score"])

POST /v1/audio/speech

Synthesizes audio from text (text-to-speech). Compatible model: kokoro.

Request

Field	Type	Description
`model`	string · required	`kokoro`.
`input`	string · required	Text to synthesize.
`voice`	string · required	Voice to use. Some options: `af_heart` (English female), `ef_dora` (Spanish female), `em_alex` (Spanish male). See full list.
`response_format`	string · optional	Format of the returned audio. Validated: `mp3` (default), `wav`, `flac`, `aac`, `pcm`, `opus`.
`speed`	number · optional	Default `1.0`.

Response

Binary audio file in the requested format (without JSON wrapper).

Example

curl https://api.nan.builders/v1/audio/speech \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Bienvenido a NaN.",
    "voice": "ef_dora",
    "response_format": "mp3"
  }' \
  -o speech.mp3

POST /v1/audio/transcriptions

Transcribes audio to text (speech-to-text). Compatible model: whisper. The request is multipart/form-data.

Request

Field	Type	Description
`file`	file · required	Audio file to transcribe.
`model`	string · required	`whisper`.
`language`	string · optional	ISO-639-1 code (e.g. `es`, `en`). If not provided, it is detected automatically.
`response_format`	string · optional	Validated: `json` (default) and `verbose_json`. Other values work but return content wrapped in JSON; we recommend only these two.
`timestamp_granularities[]`	string · optional	Only with `verbose_json`. Values: `word` (per-word timestamps) or `segment` (default).
`temperature`	number · optional	Sampling temperature.

Response

Example with response_format=verbose_json:

{
  "text": "Hola, esto es una prueba.",
  "language": "es",
  "task": "transcribe",
  "duration": 1.728,
  "segments": [
    {
      "id": 1,
      "start": 0.0,
      "end": 1.4,
      "text": " Hola, esto es una prueba.",
      "tokens": [50365, 22637, "..."],
      "avg_logprob": -0.059,
      "compression_ratio": 0.806,
      "no_speech_prob": 0.044,
      "temperature": 0.0
    }
  ],
  "words": null
}

If you pass timestamp_granularities[]=word, the words field is populated with [{word, start, end, probability}].

Example

curl https://api.nan.builders/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-key-here" \
  -F "model=whisper" \
  -F "file=@grabacion.mp3" \
  -F "language=es" \
  -F "response_format=verbose_json"

Limitations

Known limitations

Maximum size per request, 25 MB: Size limit for the audio file.
Audio longer than 2 min may return a 524 timeout: We recommend splitting it into segments of ≤ 2 min.
Recommended formats: OGG/Opus and MP3, which give better compression with the same transcription quality.
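The splitting recommendation can be planned before touching any audio tool. The sketch below only computes segment boundaries and checks the size limit; cutting the file itself is left to whatever audio tool you use. The names are hypothetical, and reading "25 MB" as 25,000,000 bytes is a conservative assumption.

```python
MAX_SEGMENT_SECONDS = 120.0
MAX_UPLOAD_BYTES = 25_000_000  # assumption: conservative reading of the 25 MB limit


def plan_segments(duration_s: float, max_len: float = MAX_SEGMENT_SECONDS) -> list[tuple[float, float]]:
    """Return (start, end) pairs in seconds covering the audio in chunks of at most max_len."""
    duration_s = float(duration_s)
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_len, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


def fits_upload(size_bytes: int) -> bool:
    """True when a file of this size is within the assumed upload limit."""
    return size_bytes <= MAX_UPLOAD_BYTES


# A 5-minute recording needs three requests.
print(plan_segments(300))
```

Each segment would then be sent as its own request, and the returned texts concatenated in order.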

POST /v1/responses

OpenAI-style Responses endpoint. Compatible models: qwen3.6 and gemma4.

Request

Field	Type	Description
`model`	string · required	`qwen3.6` or `gemma4`.
`input`	string \| array · required	Single text or array of messages in OpenAI Responses format.
`max_output_tokens`	integer · optional	Default `65536` on `qwen3.6`.
`temperature`	number · optional	Default `0.6`.
`top_p`	number · optional	Default `0.95`.
`instructions`	string · optional	System instructions.

Response

The output array can contain blocks of type reasoning (qwen3.6 only) and message.

{
  "id": "resp_...",
  "created_at": 1778258181,
  "model": "qwen3.6",
  "object": "response",
  "status": "completed",
  "output": [
    {
      "id": "rs_...",
      "type": "reasoning",
      "summary": [],
      "content": [
        { "type": "reasoning_text", "text": "..." }
      ]
    },
    {
      "id": "msg_...",
      "type": "message",
      "role": "assistant",
      "status": "completed",
      "content": [
        { "type": "output_text", "text": "Hola.", "annotations": [] }
      ]
    }
  ],
  "usage": {
    "input_tokens": 17,
    "output_tokens": 118,
    "total_tokens": 135
  }
}
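Because the answer is nested inside typed blocks, a small helper is handy for pulling out the text. This sketch separates reasoning text from the final message; `split_output` is a hypothetical name and the sample is a trimmed, hand-written version of the response above.

```python
def split_output(response: dict) -> tuple[str, str]:
    """Return (reasoning_text, answer_text) gathered from the output blocks."""
    reasoning, answer = [], []
    for block in response.get("output", []):
        for part in block.get("content") or []:
            if block.get("type") == "reasoning" and part.get("type") == "reasoning_text":
                reasoning.append(part["text"])
            elif block.get("type") == "message" and part.get("type") == "output_text":
                answer.append(part["text"])
    return "".join(reasoning), "".join(answer)


# Trimmed sample in the shape shown above; the reasoning text is made up.
sample_response = {
    "output": [
        {"id": "rs_1", "type": "reasoning", "summary": [],
         "content": [{"type": "reasoning_text", "text": "Greeting detected."}]},
        {"id": "msg_1", "type": "message", "role": "assistant", "status": "completed",
         "content": [{"type": "output_text", "text": "Hola.", "annotations": []}]},
    ]
}
thoughts, text = split_output(sample_response)
print(text)
```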

Example

curl https://api.nan.builders/v1/responses \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6",
    "input": "Hola, ¿cómo estás?"
  }'

Notes

Streaming on this endpoint currently delivers a single response.completed event at the end, not incremental chunks. For token-by-token streaming, use /v1/chat/completions with stream: true.

POST /v1/images/generations

Generates images from text (text-to-image). Compatible with OpenAI’s Images API. Compatible model: flux-2-klein (only model available today; the endpoint is designed to add more). The body is JSON.

Request

Field	Type	Description
`prompt`	string · required	Textual description of the image to generate.
`model`	string · optional	Default `flux-2-klein` (the only model available). An unknown model returns `404` (`model_not_found`).
`n`	integer · optional	Number of images to generate, between `1` and `4`. Default `1`. A value greater than `4` returns `400`.
`size`	string · optional	Format `"WIDTHxHEIGHT"` with both sides divisible by 16, each between 256 and 1536, and aspect ratio between 1:3 and 3:1. Standard values such as `1024x1024`, `1536x1024`, or `1024x1536` work. `"auto"` or omitted → `1024x1024`.
`response_format`	string · optional	`"url"` (default) or `"b64_json"`. With `url`, a temporary R2 link is returned, valid for about 60 minutes (same contract as OpenAI). With `b64_json`, the image bytes are returned inline as base64.
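The size rules can be checked client-side before spending a request from the monthly quota. The sketch below encodes the three constraints from the table; `validate_size` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
def validate_size(size: str = "auto") -> tuple[int, int]:
    """Parse "WIDTHxHEIGHT" and enforce the documented constraints; returns (width, height)."""
    if size in ("", "auto"):
        return (1024, 1024)  # documented default
    try:
        width_text, height_text = size.split("x")
        width, height = int(width_text), int(height_text)
    except ValueError:
        raise ValueError(f"size must look like 1024x1024, got {size!r}") from None
    for side in (width, height):
        if side % 16 != 0 or not 256 <= side <= 1536:
            raise ValueError(f"each side must be a multiple of 16 between 256 and 1536, got {side}")
    if width > 3 * height or height > 3 * width:
        raise ValueError("aspect ratio must be between 1:3 and 3:1")
    return (width, height)


print(validate_size("1536x1024"))
```

For example, `256x768` passes because it sits exactly at 1:3, while `256x1024` is rejected for being 1:4.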

Accepted and ignored parameters

For OpenAI SDK compatibility, quality, style, background, moderation, output_format, output_compression, and user are accepted but ignored: Flux does not act on them. In addition, stream: true is not supported and returns 400.

Additional parameters (NaN extensions)

These parameters are not part of OpenAI’s Images API. With the openai SDK, pass them via extra_body.

Field	Type	Description
`seed`	integer · optional	Base seed for reproducibility. Each variant (when `n > 1`) starts from an offset on this value.
`guidance`	number · optional	FLUX guidance scale.

Response

Same wrapper as OpenAI’s Images API. created is the Unix timestamp in seconds.

{
  "created": 1778258200,
  "data": [
    { "url": "https://...r2.../image.png" }
  ]
}

With response_format=b64_json, each element of data is { "b64_json": "..." } instead of { "url": "..." }.
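With b64_json, the client has to decode the payload itself. A minimal sketch, where `decode_images` is a hypothetical helper and the sample payload is a few placeholder bytes rather than a real image:

```python
import base64


def decode_images(response: dict) -> list[bytes]:
    """Decode every b64_json entry in `data` into raw image bytes."""
    return [base64.b64decode(item["b64_json"]) for item in response["data"] if "b64_json" in item]


# Placeholder payload: the bytes below are only a PNG-like prefix, not a full image.
fake_png = b"\x89PNG\r\n"
sample_response = {
    "created": 1778258200,
    "data": [{"b64_json": base64.b64encode(fake_png).decode("ascii")}],
}
images = decode_images(sample_response)
print(len(images), len(images[0]))
# To save the first image: open("image.png", "wb").write(images[0])
```

Unlike the url format, the decoded bytes do not expire, which may matter given that links are only valid for about an hour.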

Example

curl https://api.nan.builders/v1/images/generations \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux-2-klein",
    "prompt": "Un faro al atardecer sobre acantilados, estilo cinemático",
    "size": "1024x1024"
  }'

response_format=b64_json:

curl https://api.nan.builders/v1/images/generations \
  -H "Authorization: Bearer sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux-2-klein",
    "prompt": "Un faro al atardecer sobre acantilados",
    "size": "1024x1024",
    "response_format": "b64_json"
  }'

With the openai Python SDK (the seed and guidance extensions go in extra_body):

from openai import OpenAI

client = OpenAI(
  api_key="sk-your-key-here",
  base_url="https://api.nan.builders/v1"
)

response = client.images.generate(
  model="flux-2-klein",
  prompt="Un faro al atardecer sobre acantilados, estilo cinemático",
  size="1024x1024",
  n=1,
  extra_body={"seed": 42, "guidance": 3.5}
)

print(response.data[0].url)

Rate limits and quota

Image generation does not go through LiteLLM, so the per-key LiteLLM limits (see Rate limits) do not apply to it. Generating images does not consume your chat rpm budget, and vice versa. The limits below are specific to the image endpoints and apply equally to the API and the web console:

Limit	Value	Description
Rate limit	`1 req/s · burst 3`	1 request per second sustained, with a burst of up to 3 (you can fire up to 3 generations back to back without an error). Exceeding it returns `429` (`rate_limit_exceeded`).
Monthly quota	`100 requests / month`	100 requests per month per user (1 request = 1 use, regardless of the value of `n`). Exceeding it returns `429` (`insufficient_quota`). This quota is independent of the 500M tokens/month chat limit.
Tier	`inference`	Requires an inference-tier membership. Community-tier keys receive `403` (`tier_restricted`).
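A client can pace itself to stay under the 1 req/s, burst 3 limit. The sketch below uses a token bucket, which is one common way to model "sustained rate plus burst"; that the server uses the same algorithm is an assumption, and the `TokenBucket` class is illustrative.

```python
import time


class TokenBucket:
    """Client-side limiter: `rate` tokens per second, holding at most `burst` tokens."""

    def __init__(self, rate: float = 1.0, burst: int = 3, clock=time.monotonic):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should wait."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next token is available (0.0 if one is ready)."""
        self._refill()
        return max(0.0, (1 - self.tokens) / self.rate)


bucket = TokenBucket()
print([bucket.try_acquire() for _ in range(4)])
```

A caller would check `try_acquire()` before each image request and sleep for `wait_time()` when it returns False. The monthly quota is a separate counter that this limiter does not track.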

POST /v1/images/edits

Generates an image from one or more reference images (image-to-image). Compatible with OpenAI’s Images API. Compatible model: flux-2-klein. The request is multipart/form-data. The same inference-tier membership and the same monthly quota of 100 requests apply as for /v1/images/generations.

Request

Field	Type	Description
`image` / `image[]`	file · required	One or more reference images (up to 4; extras are discarded). PNG, JPEG, or WebP, each < 25 MB.
`prompt`	string · required	Description of the edit or transformation to apply.
`model`, `n`, `size`, `response_format`	optional	Same behavior as in /v1/images/generations. The `seed` and `guidance` extensions are also accepted (as form fields).

mask not supported

The mask parameter is not supported and returns 400: Flux Klein does not do inpainting.

Response

Same wrapper as /v1/images/generations: { "created": ..., "data": [{ "url": "..." }] } (or { "b64_json": "..." } elements with response_format=b64_json).

Example

curl https://api.nan.builders/v1/images/edits \
  -H "Authorization: Bearer sk-your-key-here" \
  -F "model=flux-2-klein" \
  -F "image[]=@ref.png" \
  -F "prompt=Convierte la escena en invierno con nieve" \
  -F "size=1024x1024"

Errors

Errors follow the standard OpenAI format: HTTP non-2xx status with a JSON body describing the problem.

{
  "error": {
    "message": "...",
    "type": null,
    "param": null,
    "code": "..."
  }
}

Code	Description
400	Invalid parameter. The body includes `param` with the field that failed (e.g. `prompt`, `n`, `size`, `stream`, `mask`, or `image` on the image endpoints). The safety filter returns `content_policy_violation`.
401	`Authorization` header invalid or missing (`invalid_api_key`).
403	Your tier does not have access to the endpoint (`tier_restricted`). Image generation requires inference membership.
404	Model does not exist (`model` field, `model_not_found`).
429	Rate limit exceeded: `rpm_limit` or `max_parallel_requests` (`rate_limit_exceeded`), or monthly quota exhausted (`quota_exceeded` / `insufficient_quota`, such as the 100-request image quota).
500	Internal error (includes upstream model errors).
524	Timeout (typical with large audios on /v1/audio/transcriptions).
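The table suggests that not every failure deserves a retry: a rate-limit 429 clears with time, while an exhausted quota does not. The sketch below is one possible client policy, not behavior prescribed by the API. The names, the choice to retry 500, and the backoff constants are all assumptions.

```python
RETRYABLE_STATUS = {429, 500}  # assumed policy; 524 is better handled by sending shorter audio
NON_RETRYABLE_CODES = {"insufficient_quota", "quota_exceeded"}


def should_retry(status: int, body: dict) -> bool:
    """Decide whether repeating the same request could plausibly succeed."""
    code = (body.get("error") or {}).get("code")
    if code in NON_RETRYABLE_CODES:
        return False  # the monthly quota will not recover within a retry window
    return status in RETRYABLE_STATUS


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds for a zero-based attempt number, capped at `cap`."""
    return min(cap, base * 2 ** attempt)


rate_limited = {"error": {"message": "...", "type": None, "param": None, "code": "rate_limit_exceeded"}}
out_of_quota = {"error": {"message": "...", "type": None, "param": None, "code": "insufficient_quota"}}
print(should_retry(429, rate_limited), should_retry(429, out_of_quota))
```

Inspecting `error.code` rather than the status alone is what separates the two kinds of 429.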

Rate limits

Rate limits per API key

Requests / min: 60 rpm
Max parallel: 5 concurrent

Tokens / min per model

deepseek-v4-flash: 1.5M tpm
mimo-v2.5: 1.5M tpm
qwen3.6: 1.5M tpm
gemma4: 1.5M tpm

Requests / min per model

rerank: 1000 rpm
