What the AI extracts

You have submitted your sources and the job is running. The engine ID came back in the 202 response, and the engine's configuration is filling in. This page answers the question that decides whether you trust the result: what, exactly, is filling in?

"An AI configured my engine" is the sentence that makes an engineer wary, and the wariness is the right instinct. It could mean a black box you cannot inspect. It could mean records scattered across locales you cannot account for. It could mean the agent read a thin source, found nothing, and quietly created almost nothing. So this page is concrete about all three: the agent produces three kinds of configuration, each maps to your locales by a rule you can predict, and the job hands back a summary that names every record it created. The output is ordinary records you can read and edit – not a verdict you have to take on faith.

New to async provisioning? Start with the Async Provisioning API overview for the mental model, and Source types for what makes a source worth submitting. This page is about what comes out the other side.

On this page

The three components
How each maps to a locale
The output summary
Reading a thin summary
Next steps

The three components

The agent reads everything – crawled pages and raw content alike – and creates three kinds of engine configuration. They are not a new, provisioning-only format. They are the exact same primitives you would otherwise create by hand on an engine, which is why everything the agent makes is editable afterward in the dashboard, the same way you would edit anything you created yourself.

| Component | What it looks for | Example |
| --- | --- | --- |
| Brand voices | Tone, style, formality level, writing conventions | "Use formal German (Sie-form). Keep sentences concise and direct." |
| Glossary items | Product names, technical terms, brand-specific translations, non-translatable terms | "Acme" → non-translatable, "workspace" → "Arbeitsbereich" (de) |
| Instructions | Formatting rules, cultural conventions, domain-specific guidelines | "Always format dates as DD.MM.YYYY in German translations." |

These are the three things that make a translation sound like your product rather than a generic rendering – the formality you have chosen, the names you never translate, the date format you always use. The agent's job is to find those decisions wherever they are stated in your sources and write them down as records.

One consequence worth stating plainly, because it sets the ceiling on what you should expect back: the agent extracts what is stated, not what is implied. A source that says a rule out loud yields a record; a source that merely demonstrates good tone without naming a rule yields little. That is a property of the sources, not the engine – Source types covers how to pick sources that say their rules out loud.

How each maps to a locale

A localization engine's configuration is keyed by target locale, so a record is not just what a rule is – it is where the rule applies. The agent assigns each record a locale by a rule you can predict, and the * wildcard is the part worth understanding before you read the output.

Brand voices and instructions use * when they apply across all languages. A tone rule like "keep sentences concise and direct" is not specific to German; it is how your product writes in every language. The agent assigns it the * target locale, and it applies to every locale the engine translates into. A rule that genuinely is language-specific ("use Sie-form in German") is assigned to that locale instead.
Glossary items are created per locale pair, because a translation is always from one language into a specific other one – "workspace" → "Arbeitsbereich" is a fact about German, and only German.
Non-translatable terms are the exception, and they use *. A brand name you never translate – "Acme" – is non-translatable in every language, so it is stored once against * rather than re-entered for each locale pair.

So when you see * in a record the job created, it is not a placeholder or a gap. It means "this applies everywhere" – a global tone rule, a global instruction, or a term that is never translated in any language. A specific locale code means the opposite: this rule is scoped to exactly that language.
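The scoping rule can be sketched in a few lines of Python. This is an illustration of the rule as described here, not the engine's implementation: the `Record` shape, its field names, and the `records_for` helper are hypothetical, and the source side of a glossary locale pair is left out for brevity.

```python
from dataclasses import dataclass

WILDCARD = "*"


@dataclass(frozen=True)
class Record:
    # Hypothetical shape for illustration; real records carry more fields.
    kind: str           # "brand_voice", "glossary_item", or "instruction"
    target_locale: str  # a locale code such as "de", or "*" for every locale
    text: str


def applies_to(record: Record, locale: str) -> bool:
    """A record applies if it is global ("*") or scoped to exactly this locale."""
    return record.target_locale in (WILDCARD, locale)


def records_for(records: list[Record], locale: str) -> list[Record]:
    """Everything the engine would take into account when translating into `locale`."""
    return [r for r in records if applies_to(r, locale)]


records = [
    Record("brand_voice", "*", "Keep sentences concise and direct."),
    Record("brand_voice", "de", "Use formal German (Sie-form)."),
    Record("glossary_item", "de", '"workspace" -> "Arbeitsbereich"'),
    Record("glossary_item", "*", '"Acme" is non-translatable'),
    Record("instruction", "de", "Always format dates as DD.MM.YYYY."),
]

for locale in ("de", "fr"):
    print(locale, [r.text for r in records_for(records, locale)])
```

German picks up all five records. French picks up only the two wildcard ones, which is the point: the global tone rule and the brand name follow the engine into every locale without being re-entered.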

Why the wildcard is a feature, not a default to override

A skeptical reading of * is "the agent didn't bother to figure out which locale this belongs to." It is the reverse. A brand voice or a non-translatable term that is correct in every language should be global – pinning it to one locale would mean it silently fails to apply to the others. The wildcard is how the configuration says "this is true regardless of language," which is exactly what a tone rule or a brand name usually is.

The output summary

When the job completes, it returns a summary that names everything the agent created. This is the receipt: every record, counted and identified, plus a list of anything that failed.

```json
{
  "brandVoices": {
    "count": 3,
    "ids": ["bv_A1b2C3d4", "bv_B2c3D4e5", "bv_C3d4E5f6"]
  },
  "glossaryItems": {
    "count": 12,
    "ids": ["gi_A1b2C3d4", "gi_B2c3D4e5", "..."]
  },
  "instructions": {
    "count": 5,
    "ids": ["ins_A1b2C3d4", "ins_B2c3D4e5", "..."]
  },
  "errors": []
}
```

Each component reports a count and the ids of the records created – bv_ for brand voices, gi_ for glossary items, ins_ for instructions. Those are not opaque acknowledgements; they are the IDs of real records on the engine. You can take any gi_ ID from this list, open that record in the dashboard, and read or change exactly what the agent extracted. The summary is how you go from "the AI did something" to "here are the twenty specific things it did," which is the whole difference between a black box and ordinary records you can read and edit.
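To turn the receipt into a checklist, you can flatten it into one row per record. The sketch below assumes only the field names shown in the summary above (`count`, `ids`, and the three component keys); the `created_records` helper is hypothetical, and the sample is trimmed to IDs that appear in the example rather than the full run.

```python
from typing import Iterator

COMPONENTS = ("brandVoices", "glossaryItems", "instructions")


def created_records(summary: dict) -> Iterator[tuple[str, str]]:
    """Yield (component, record_id) for every record the summary names."""
    for name in COMPONENTS:
        for record_id in summary.get(name, {}).get("ids", []):
            yield name, record_id


# Trimmed sample: same shape as the summary above, fewer records.
summary = {
    "brandVoices": {"count": 3, "ids": ["bv_A1b2C3d4", "bv_B2c3D4e5", "bv_C3d4E5f6"]},
    "glossaryItems": {"count": 2, "ids": ["gi_A1b2C3d4", "gi_B2c3D4e5"]},
    "instructions": {"count": 1, "ids": ["ins_A1b2C3d4"]},
    "errors": [],
}

checklist = list(created_records(summary))
total = sum(summary[name]["count"] for name in COMPONENTS)

print(f"{total} records created")
for component, record_id in checklist:
    print(f"  {component}: {record_id}")
```

Each row is a record you can open in the dashboard, which makes the list a reasonable starting point for a first review pass.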

The summary reaches you through the webhook you set up when you created the job: it arrives as the summary field of the payload your callback URL receives on completion. The WebSocket is a liveness feed, not a delivery channel for this object – it streams crawling and configuring progress. The summary travels with the completion webhook; the WebSocket tells you when to go read it.
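On the receiving side, pulling the summary out of the webhook body is a small step. The sketch below relies on one thing stated on this page, that the completion payload carries a `summary` field. Everything else is an assumption for illustration: the `summary_from_webhook` helper is hypothetical, and so is the `event` key in the failure sample. See Webhook delivery for the real payload shapes.

```python
import json
from typing import Optional


def summary_from_webhook(body: bytes) -> Optional[dict]:
    """Return the summary object from a webhook body, or None if it has none."""
    payload = json.loads(body)
    summary = payload.get("summary")
    return summary if isinstance(summary, dict) else None


# Completion sample: only the `summary` field is taken from this page.
completed = json.dumps({
    "summary": {
        "brandVoices": {"count": 3, "ids": ["bv_A1b2C3d4", "bv_B2c3D4e5", "bv_C3d4E5f6"]},
        "errors": [],
    }
}).encode()

# Failure sample: the "event" key is hypothetical; a failed run carries no summary.
failed = json.dumps({"event": "provisioning.failed"}).encode()

print(summary_from_webhook(completed))
print(summary_from_webhook(failed))
```

Returning `None` instead of raising keeps the handler usable for both outcomes: a missing summary is a signal to look at the failure payload, not a parsing bug.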

A failed item does not fail the job.

If a single record cannot be created, it does not sink the rest. The failure is recorded in the errors array, the records that succeeded are still applied to the engine, and the job still completes. You get a partially configured engine plus a precise list of what to revisit – not an empty engine and a stack trace. The job fails as a whole when the run produces nothing to work from – for example, when every source fails to crawl. That failure case, and its provisioning.failed payload, is covered on the Webhook delivery page.
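A consumer can treat this as a triage step: count what was created, and set aside whatever landed in `errors` for follow-up. In this sketch the `triage` helper is hypothetical, and because the shape of an individual error entry is not shown on this page, entries are passed through untouched rather than parsed.

```python
COMPONENTS = ("brandVoices", "glossaryItems", "instructions")


def triage(summary: dict) -> dict:
    """Split a completed job's summary into what was applied and what to revisit."""
    created = sum(summary.get(name, {}).get("count", 0) for name in COMPONENTS)
    errors = list(summary.get("errors", []))
    return {"created": created, "failed": len(errors), "revisit": errors}


# Sample of a partially successful run. The error entry is an opaque placeholder,
# because its real shape is not documented here.
partial = {
    "brandVoices": {"count": 1, "ids": ["bv_A1b2C3d4"]},
    "glossaryItems": {"count": 2, "ids": ["gi_A1b2C3d4", "gi_B2c3D4e5"]},
    "instructions": {"count": 1, "ids": ["ins_A1b2C3d4"]},
    "errors": ["<error entry>"],
}

result = triage(partial)
print(f"{result['created']} applied, {result['failed']} to revisit")
```

The four records that succeeded still count as applied; the single error becomes a to-do item, not a reason to discard the run.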

Reading a thin summary

The summary tells you not only what was created but, by its counts, whether the run was worth it. A count of 0 for a component is not an error – the summary is well-formed and the engine exists – but it is information. Three brand voices and twelve glossary items is a configured engine. Zeros or near-zeros across the board with an empty errors array is an engine that came back nearly blank: nothing failed, and the agent is telling you it found few rules to extract.

When that happens, the cause is almost always upstream: the sources stated few concrete rules for the agent to lift. The summary is where you notice it; Source types is where you fix it. The honest expectation to carry into your first run is that the receipt only reflects what your sources actually said – a rich summary means rich sources, and a thin one means there was little to find.
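If you provision engines in bulk, a small check can flag thin runs for you. The sketch below is one way to do it: the `thin_components` helper and its `minimum` threshold are assumptions for illustration, not part of the API, and it reads only the `count` fields described above.

```python
COMPONENTS = ("brandVoices", "glossaryItems", "instructions")


def thin_components(summary: dict, minimum: int = 1) -> list[str]:
    """Name the components whose count falls below `minimum`."""
    return [
        name
        for name in COMPONENTS
        if summary.get(name, {}).get("count", 0) < minimum
    ]


# A run that found rules (counts from the example summary; ids omitted here).
rich = {
    "brandVoices": {"count": 3},
    "glossaryItems": {"count": 12},
    "instructions": {"count": 5},
    "errors": [],
}

# A run that came back blank: well-formed, no errors, nothing extracted.
blank = {
    "brandVoices": {"count": 0, "ids": []},
    "glossaryItems": {"count": 0, "ids": []},
    "instructions": {"count": 0, "ids": []},
    "errors": [],
}

print(thin_components(rich))
print(thin_components(blank))
```

A non-empty result is a prompt to revisit the sources, not a failure of the job. The threshold is a judgment call, so pick one that matches how much configuration you expect your sources to state.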

That is why the summary matters as much as the engine: it lets you verify the configuration instead of assuming it. Read the counts, open a few records by their IDs, confirm the agent caught what you expected – ordinary records you can read and edit, with a receipt that tells you precisely what to check.

Next steps

Source types

What makes a source worth submitting – and why a thin summary usually traces back to here.

Webhook delivery

Receive the summary at your callback URL on completion, and the error payload on failure.

Live progress (WebSocket)

Watch crawling and configuring steps live as the engine fills in – then read the summary from the completion webhook.

Translate with your new engine

Once the records are in place, fan content out to every locale through the async Localization API.