AI Copilot for a Crypto Exchange

Designing the UX layer between an LLM and irreversible transactions

Scope

Exchange AI widget. Place crypto orders in natural language, LLM in the transaction path.

Stack

Claude, Figma, Claude Design, FigJam, NotebookLM

Team

Product Designer

Timeline

1 week

The problem

Most "AI features" in fintech are chat windows bolted next to the product: low stakes, low value. The interesting problem is the opposite · putting a probabilistic system inside a flow where the output is an irreversible financial transaction.

That collision defines the entire design space:

An LLM's output is non-deterministic. The same sentence can parse two ways.
A crypto transaction is irreversible. There is no chargeback on-chain.
Therefore the design is the reconciliation layer between the two. Every screen exists to convert a probabilistic guess into a deterministic, user-verified action, or to stop the flow safely when it can't.

The existing manual form flow (amount → review → processing) became the system's deterministic floor: the thing the AI degrades into, never a separate product.

Chapter 1 · Inputs: free text vs. structured orders

Users type intent into a single field: "buy 100 eur of btc to my metamask." The model maps this to a strict order schema (amount, asset, destination, payment method). Four input problems shaped the design.
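A minimal sketch of what such a strict schema could look like. The four field names come from the text; the class name, the types and the missing-field helper are assumptions for illustration, not the shipped schema.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


# Hypothetical schema: the four fields are named in the text,
# but their types and this class are illustrative assumptions.
@dataclass(frozen=True)
class OrderDraft:
    amount: Optional[Decimal]      # fiat amount, e.g. Decimal("100")
    asset: Optional[str]           # ticker, e.g. "BTC"
    destination: Optional[str]     # wallet label or address
    payment_method: Optional[str]  # e.g. "card"

    def missing_fields(self) -> list[str]:
        """Names of the fields the parser could not fill, in schema order."""
        return [name for name, value in vars(self).items() if value is None]

    def is_complete(self) -> bool:
        """True only when every field has a value."""
        return not self.missing_fields()


# "buy 100 eur of btc to my metamask" states three of the four fields.
draft = OrderDraft(
    amount=Decimal("100"),
    asset="BTC",
    destination="metamask",
    payment_method=None,
)
print(draft.missing_fields())
```

Keeping the draft immutable and typed means every later screen works from the same structured object rather than from the raw sentence.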

Bulk intent is a first-class input, not an edge case. This is the real value of replacing a form with a text field. A form processes one order at a time by construction; a sentence has no such limit. "Buy €100 BTC + €50 ETH, send 0.002 BTC to Lena" is one input and three transactions. The parser splits multi-intent input into independent orders and renders them as a batch: each order is priced, validated and confirmed on its own, so one ambiguous or blocked item never stalls the rest. The economics matter too · three manual orders cost three full passes through the form; one sentence costs one. The NL bar doesn't just match the form, it does something the form structurally cannot.

Ambiguity is resolved with one question, never a conversation. "Buy some bitcoin" is missing an amount. The clarification screen asks exactly one targeted question with tappable presets (€50 / €100 / €250) and a custom amount input. No open chat loop · open-ended dialogue invites scope drift and re-parsing risk, and every extra turn is another chance to misread. One question, then straight to the structured order.

Partial parses are kept, not discarded. If the model extracts the asset but not the amount, the user shouldn't retype the whole order. Whatever parsed cleanly carries forward; only the gap is asked about.
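The two rules above (one targeted question, partial parses carried forward) can be sketched as follows. The function names, the dictionary shape and the field order are hypothetical; only the amount presets are taken from the text.

```python
from typing import Optional

# Field order is an assumption; presets for the amount question are from the text.
REQUIRED_FIELDS = ("amount", "asset", "destination", "payment_method")
PRESETS = {"amount": ["€50", "€100", "€250"]}


def next_question(parsed: dict) -> Optional[dict]:
    """Return one targeted question for the first missing field, or None."""
    for field in REQUIRED_FIELDS:
        if parsed.get(field) is None:
            return {
                "field": field,
                "presets": PRESETS.get(field, []),
                "allow_custom": True,
            }
    return None


def apply_answer(parsed: dict, field: str, value) -> dict:
    """Fill only the gap; everything that parsed cleanly carries forward."""
    return {**parsed, field: value}


# "Buy some bitcoin": the asset parsed, the amount did not.
parsed = {"amount": None, "asset": "BTC", "destination": "metamask", "payment_method": "card"}
question = next_question(parsed)
completed = apply_answer(parsed, question["field"], 100)
```

Because the answer is merged into the existing parse rather than triggering a re-parse, the user never retypes what the model already understood.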

The manual form is always one tap away. The NL bar is an accelerant, not a gate. Users who don't trust it, or who get burned once, keep a zero-friction exit via the "Schedule" secondary action.

Chapter 2 · Outputs: never execute raw model output

The core rule of the system: LLM output is a draft, never a command. The parsed order is always rendered as deterministic, editable UI before anything happens.

The parsing screen runs in two visible states. First: a blue sparkle icon with a progress bar, fields populating as the model reads · the user sees their order being assembled in real time rather than waiting on a black-box spinner. Second: a green check once parsing completes, fields filled, with the explicit line "Nothing is executed at this step · you'll review and confirm first." The transition from blue to green does quiet trust work: the model's process is visible, not hidden.

Field-level provenance is the centerpiece design decision. Each row in the parsed order carries a pill tag: Stated (the user said it explicitly) or inferred · edit (the model filled it from context · e.g. destination wallet inferred from history, payment method from last use). This does two things:

It directs verification attention to exactly the fields the model guessed · the highest-risk rows · instead of asking the user to re-read everything. Selective attention beats blanket review; people skim confirmations.
It makes the model's assumptions legible without exposing chain-of-thought. The user sees what was inferred, which is what they need to catch an error.
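As a rough data-model sketch of field-level provenance: each parsed row carries a tag, and the review screen can ask for only the inferred rows. The class and function names are hypothetical; the two tag labels are the ones used in the design.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    STATED = "Stated"             # the user said it explicitly
    INFERRED = "inferred · edit"  # the model filled it from context


# Hypothetical row model for the parsed-order screen.
@dataclass(frozen=True)
class ParsedField:
    name: str
    value: str
    provenance: Provenance


def fields_to_verify(fields: list[ParsedField]) -> list[str]:
    """Names of the rows the model guessed, i.e. where attention should go."""
    return [f.name for f in fields if f.provenance is Provenance.INFERRED]


order = [
    ParsedField("amount", "€100", Provenance.STATED),
    ParsedField("asset", "BTC", Provenance.STATED),
    ParsedField("destination", "MetaMask", Provenance.INFERRED),
    ParsedField("payment_method", "Card", Provenance.INFERRED),
]
print(fields_to_verify(order))
```

The tag records what was assumed, not why, which matches the goal of legibility without exposing chain-of-thought.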

The screen's primary CTA is "Continue" with a secondary "Edit." The footer carries the system's single most important line of microcopy: "AI can misread orders. Nothing executes until you confirm."

The confirm screen adds a second deterministic gate: a hold-to-confirm button with a progress bar rendered beneath it. A single tap is too cheap an action for executing an AI-parsed irreversible transaction; the hold converts confirmation from a reflex into a decision. The quote countdown ("locked 0:47") stays visible, making the time pressure honest rather than hidden.
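The hold gate reduces to a small piece of logic, sketched here. The 1.5 second threshold and both function names are assumptions; the text does not specify a hold duration.

```python
HOLD_SECONDS = 1.5  # assumed threshold; not specified in the design


def hold_progress(held_seconds: float, required: float = HOLD_SECONDS) -> float:
    """Fraction of the hold completed, clamped to the range 0.0 to 1.0."""
    return max(0.0, min(held_seconds / required, 1.0))


def is_confirmed(held_seconds: float, quote_seconds_left: float,
                 required: float = HOLD_SECONDS) -> bool:
    """Confirm only after a full hold, and only while the quote is still locked."""
    return hold_progress(held_seconds, required) >= 1.0 and quote_seconds_left > 0


print(hold_progress(0.75))      # drives the progress bar under the button
print(is_confirmed(0.2, 47.0))  # a quick tap does not execute
print(is_confirmed(1.5, 47.0))  # a full hold with a live quote does
```

Releasing early simply leaves the progress below 1.0, so an accidental tap can never trigger execution.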

Chapter 3 · States: latency and staleness are design material

LLM parsing takes seconds, and crypto quotes decay in seconds. Two states most products hide became explicit screens.

Parsing state · two moments, not one. As described above: in-progress (blue, building) transitions to complete (green, done). The two-state treatment exists because the moment of completion is meaningful · it's when the user can first evaluate what the model understood. Collapsing it into a single spinner discards that moment.

Quote-expired state. The user reviewed slower than the quote lived. Rather than silently re-quoting (a dark pattern) or failing the order, the screen shows the new locked rate with a timestamp badge and an explicit accept or cancel choice. The reassurance that everything else in the order is unchanged is stated directly · price movement is the market's fault; making the user re-enter the order would make it the product's fault.
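One way to model the quote-expired behaviour, as a sketch under assumptions: the Quote class, its time-to-live field and the resolve_expired helper are hypothetical names. The point it illustrates is that an expired quote is never swapped silently, and that accepting a new rate changes only the quote.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


# Hypothetical quote model; the lifetime is an assumed parameter.
@dataclass(frozen=True)
class Quote:
    rate: Decimal     # e.g. EUR per BTC
    locked_at: float  # epoch seconds when the rate was locked
    ttl: float        # quote lifetime in seconds

    def seconds_left(self, now: float) -> float:
        """Remaining lock time, never negative (feeds the countdown badge)."""
        return max(0.0, self.locked_at + self.ttl - now)

    def is_expired(self, now: float) -> bool:
        return self.seconds_left(now) <= 0.0


def resolve_expired(order: dict, new_quote: Quote, accepted: bool) -> Optional[dict]:
    """Explicit choice: accept swaps only the quote, cancel returns None."""
    if not accepted:
        return None
    return {**order, "quote": new_quote}  # every other field is unchanged


old = Quote(Decimal("58000"), locked_at=1000.0, ttl=60.0)
order = {"amount": Decimal("100"), "asset": "BTC", "destination": "metamask", "quote": old}
if old.is_expired(now=1061.0):
    fresh = Quote(Decimal("58120"), locked_at=1061.0, ttl=60.0)
    order = resolve_expired(order, fresh, accepted=True)
```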

Execution state shows stepped progress: card charged (green check) → converting EUR to BTC (blue, active, with a seconds-remaining badge) → sending to destination (pending). The footer reads: "Safe to leave · we'll notify you when BTC lands in your wallet." Irreversible plus in-flight is the moment of maximum anxiety; granular progress with a permission to leave is the cheapest anxiety reducer available.
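The stepped progress can be derived from a single index, as in this sketch. The step identifiers and status strings are illustrative stand-ins for the three steps described above.

```python
# Hypothetical step identifiers mirroring the three steps in the text.
STEPS = ("card_charged", "converting", "sending")


def render_steps(active_index: int) -> list[tuple[str, str]]:
    """Steps before the active one are done, the active one is in flight, the rest wait."""
    statuses = []
    for i, step in enumerate(STEPS):
        if i < active_index:
            status = "done"      # green check
        elif i == active_index:
            status = "active"    # blue, with a seconds-remaining badge
        else:
            status = "pending"
        statuses.append((step, status))
    return statuses


# The state shown in the design: card charged, conversion in flight.
print(render_steps(1))
```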

Chapter 4 · Failures: three classes, three different exits

Failures were classified by recoverability, and each class got a different flow shape.

Recoverable · loops back into the flow. A hallucinated ticker, "buy 100 eur of BTH", is caught by grounding: suggestions come only from the live asset list, never from the model's memory, so the model cannot invent an asset into existence. The screen leads with the error, shows the closest real match with its source stated plainly, and offers a single primary action to continue with the correction. No order was created, so the user re-enters the flow from the corrected parse.
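A minimal sketch of the grounding step, using Python's standard difflib for the closest-match lookup. The asset list, the function name and the similarity cutoff are assumptions; the actual matching method is not described in the text. What matters is that every suggestion is drawn from the live list, so nothing outside it can be proposed.

```python
import difflib

# Stand-in for the exchange's live asset list (assumed contents).
LIVE_ASSETS = ["BTC", "ETH", "SOL"]


def ground_ticker(raw: str, live_assets: list[str], cutoff: float = 0.6) -> dict:
    """Accept a ticker only if it is listed; otherwise suggest listed near-matches."""
    ticker = raw.strip().upper()
    if ticker in live_assets:
        return {"status": "ok", "asset": ticker, "suggestions": []}
    # get_close_matches can only return items from live_assets,
    # so a suggestion can never be an invented asset.
    suggestions = difflib.get_close_matches(ticker, live_assets, n=3, cutoff=cutoff)
    return {"status": "unknown_asset", "asset": None, "suggestions": suggestions}


result = ground_ticker("BTH", LIVE_ASSETS)
print(result["status"], result["suggestions"])
```

Plain string similarity can score two listed tickers equally for the same typo, so a real implementation would likely need a tie-break rule (for example the user's history) before presenting a single closest match.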

Degraded · exits to the deterministic floor. Low parse confidence switches the system to the manual form, prefilled with whatever parsed cleanly, with uncertain fields flagged inline: "check this · parsed from 'a hundred'." The AI fails into the product the user already knows. The fallback isn't an error page, it's the original flow. The "You receive" field is also prefilled · the model carries forward every confident value, leaving only the flagged fields for the user to verify.
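The degraded exit can be sketched as a routing decision on per-field confidence. The 0.8 threshold, the tuple shape and the screen names are assumptions; the flag wording follows the example in the text.

```python
CONFIDENCE_FLOOR = 0.8  # assumed threshold; not specified in the design


def route_parse(fields: dict) -> dict:
    """fields maps a field name to (value, confidence, source_text)."""
    prefill = {name: value for name, (value, _, _) in fields.items()}
    flags = {
        name: f"check this · parsed from '{source}'"
        for name, (_, confidence, source) in fields.items()
        if confidence < CONFIDENCE_FLOOR
    }
    # Any uncertain field sends the user to the manual form, prefilled and flagged.
    screen = "manual_form" if flags else "parsed_order"
    return {"screen": screen, "prefill": prefill, "flags": flags}


parsed = {
    "amount": (100, 0.55, "a hundred"),
    "asset": ("BTC", 0.97, "btc"),
}
result = route_parse(parsed)
print(result["screen"], result["flags"])
```

Nothing is thrown away on the way down: the manual form opens with every parsed value in place, and only the flagged fields ask for attention.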

Terminal · hard stop, no retry on the same address. A destination matching a flagged scam pattern blocks the order entirely. The screen uses the only red in the UI: a red X icon on a pink gradient background, a red-tinted explanation card, and a "Why was this blocked?" link. Everything else in the product is blue or neutral. Red appears exactly once in the palette, and it appears here · where it must read unambiguously as stop. The footer is precise: "Safety checks run on every order, AI-entered or manual, before execution."

Batch failures isolate, never cascade. In a bulk submission, each order carries its own status badge: green "Ready" for parsed and priced, blue "Confirm destination" for orders needing one more input. A blocked order in a batch of three cancels one transaction, not three.
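Batch isolation amounts to evaluating each order independently, as in this sketch. The blocked-destination set is a stand-in for the real safety check, and the "Blocked" label and dictionary shape are assumptions; "Ready" and "Confirm destination" are the badges named above.

```python
def process_batch(orders: list[dict], blocked_destinations: set[str]) -> list[dict]:
    """Give every order its own status; one order's result never affects another's."""
    results = []
    for order in orders:
        destination = order.get("destination")
        if destination in blocked_destinations:
            status = "Blocked"              # terminal for this order only
        elif destination is None:
            status = "Confirm destination"  # needs one more input
        else:
            status = "Ready"                # parsed and priced
        results.append({**order, "status": status})
    return results


# "Buy €100 BTC + €50 ETH, send 0.002 BTC to Lena": one input, three orders.
batch = [
    {"action": "buy", "asset": "BTC", "amount": "€100", "destination": "my wallet"},
    {"action": "buy", "asset": "ETH", "amount": "€50", "destination": None},
    {"action": "send", "asset": "BTC", "amount": "0.002 BTC", "destination": "Lena"},
]
for row in process_batch(batch, blocked_destinations={"Lena"}):
    print(row["asset"], row["status"])
```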

The asymmetry across failure classes is the system's thesis: errors before money moves are conversations; errors that touch safety are walls.
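The classification can be written down as a lookup from failure kind to class to exit, sketched below. The failure-kind strings and screen names are hypothetical; the three classes and their exits are the ones described in this chapter.

```python
from enum import Enum


class FailureClass(Enum):
    RECOVERABLE = "recoverable"  # loops back into the flow
    DEGRADED = "degraded"        # exits to the manual form
    TERMINAL = "terminal"        # hard stop


# Hypothetical failure kinds, one per example in the text.
FAILURE_KINDS = {
    "unknown_asset": FailureClass.RECOVERABLE,
    "low_parse_confidence": FailureClass.DEGRADED,
    "flagged_destination": FailureClass.TERMINAL,
}

EXIT_SCREENS = {
    FailureClass.RECOVERABLE: "corrected_parse",
    FailureClass.DEGRADED: "manual_form",
    FailureClass.TERMINAL: "blocked",
}


def exit_for(kind: str) -> dict:
    """Pick the exit from the failure's class, not from the specific error."""
    failure_class = FAILURE_KINDS[kind]
    return {
        "class": failure_class,
        "screen": EXIT_SCREENS[failure_class],
        "retry_allowed": failure_class is not FailureClass.TERMINAL,
    }


print(exit_for("flagged_destination"))
```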

Closing the loop: the post-trade explainer

The received amount never matches the quoted amount · network fee, spread, price movement. This is the top source of "where did my money go" support tickets on any exchange. The success screen pre-empts it with an itemized breakdown (Quoted · Network fee · Spread · Price movement · Received) and a text link: "Ask AI about this trade." The same model that parsed the order can explain its outcome. The copilot covers the full lifecycle, not just entry.
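The breakdown is simple arithmetic, sketched here with Python's decimal module to avoid floating-point rounding. The figures are invented for illustration, and treating price movement as a signed adjustment in asset units is an assumption about how the breakdown is expressed.

```python
from decimal import Decimal


def received_amount(quoted: Decimal, network_fee: Decimal,
                    spread: Decimal, price_movement: Decimal) -> Decimal:
    """Received = quoted - network fee - spread + price movement (signed)."""
    return quoted - network_fee - spread + price_movement


# Illustrative figures in BTC, not real data.
breakdown = {
    "Quoted": Decimal("0.00170"),
    "Network fee": Decimal("0.00005"),
    "Spread": Decimal("0.00002"),
    "Price movement": Decimal("-0.00001"),  # negative: the price moved against the user
}
breakdown["Received"] = received_amount(
    breakdown["Quoted"],
    breakdown["Network fee"],
    breakdown["Spread"],
    breakdown["Price movement"],
)
for label, value in breakdown.items():
    print(f"{label}: {value}")
```

Showing every term means the rows always reconcile to the received figure, which is the question the support ticket would otherwise ask.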

Design system notes

The visual language is deliberately separated from any existing exchange brand · neutral enough to be product-agnostic, distinct enough to read as a considered system. Blue is the primary action color throughout, signaling action and progress consistently: CTAs, active step indicators, the progress bar on parsing, the quote lock badge.

Red appears exactly once · on the safety block screen (H10). That restraint is a design decision: in a product full of blue interactions, a red screen is impossible to mistake for anything other than a hard stop.

The AI's presence is marked with a quiet sparkle glyph (✦) in the parsing icon and the fallback banner, not a gradient persona or a chat bubble. In a money flow, the AI should read as infrastructure, not a character.

Provenance tags (Stated / inferred · edit) use a pill shape with a subtle border, letting them sit in the layout without competing with the values they annotate.

Success metrics (hypotheses)

A shipped version would be judged on: parse acceptance rate (% of parsed orders confirmed without edits · the model-quality north star), edit rate on inferred fields (validates that provenance targeting works), fallback rate (% of sessions exiting to manual · the trust thermometer over time), clarification depth (should hold at one question maximum), orders per session (the bulk-input payoff: NL sessions should produce measurably more transactions than form sessions), and post-trade support tickets about received-vs-quoted amounts (the explainer's job).
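Three of these metrics can be computed from simple per-session counts, as sketched below. The record shape and key names are hypothetical, and the sample numbers are invented purely to show the arithmetic.

```python
def _rate(numerator: float, denominator: float) -> float:
    """Safe ratio: 0.0 when there is nothing to divide by."""
    return numerator / denominator if denominator else 0.0


def summarize(sessions: list[dict]) -> dict:
    """Each session record: parsed_orders, accepted_unedited, orders, fell_back."""
    parsed = sum(s["parsed_orders"] for s in sessions)
    accepted = sum(s["accepted_unedited"] for s in sessions)
    orders = sum(s["orders"] for s in sessions)
    fallbacks = sum(1 for s in sessions if s["fell_back"])
    return {
        # % of parsed orders confirmed without edits
        "parse_acceptance_rate": _rate(accepted, parsed),
        # % of sessions exiting to the manual form
        "fallback_rate": _rate(fallbacks, len(sessions)),
        # the bulk-input payoff
        "orders_per_session": _rate(orders, len(sessions)),
    }


# Invented sample data.
sample = [
    {"parsed_orders": 3, "accepted_unedited": 2, "orders": 3, "fell_back": False},
    {"parsed_orders": 1, "accepted_unedited": 0, "orders": 1, "fell_back": True},
]
print(summarize(sample))
```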

What I'd take into any AI product

The fallback should be the product you already have, not an apology screen.
Show the model's assumptions, not its reasoning: provenance beats explanation.
Make the parsing process visible · a two-state loading screen does more trust work than a spinner.
Scale confirmation friction to the stakes: a hold gesture for irreversible money, a tap for everything else.
Ground every model suggestion in a source of truth the model cannot override.
Classify failures by recoverability first · the flow shape follows from that, not from error type.
If the input is language, design for plural intent from day one. Batches are where text beats forms.