Human in the Loop

A/B/C/AI: when experiments become infinite and invisible

Published October 3, 2025
“The initial mission was: let’s replace A/B tests altogether from start to finish, let’s replace all of it by using AI.”
Asaf Yanai, CEO of Alison.ai

Forget manual experiments. Artificial intelligence can generate, run and retire design variants automatically. Testing becomes infinite and invisible. This article dives into how AI driven experimentation is transforming A/B testing (and its C, D and beyond variants) for UX/UI designers and business owners.

So, what changes for a designer day to day? You spend less time wiring tests, more time defining constraints and success.

The evolution from A/B to A/B/C/AI

A/B testing is a tried and true method: design two variants, split traffic and pick a winner. This disciplined approach brought scientific thinking into design and marketing, but it has serious limitations:

  • Limited options: traditional experiments compare two variants at a time, which makes it impractical to test multiple elements simultaneously, as noted in this overview of AI in A/B testing.
  • Slow results: a significant amount of data is needed to achieve statistical power. Only 1 in 7 A/B tests leads to a big conversion win, so teams often wait weeks or months before seeing improvements.
  • One size fits all: focusing on average responses overlooks how different segments behave, a pattern described under tailored testing.

AI transforms these constraints. Instead of painstakingly designing and deploying a handful of versions, AI powered systems can ideate, generate and test hundreds or thousands of variants in near real time. Mengying Li and Ankur Goyal of Braintrust describe this shift succinctly: “A/B testing assumes it’s expensive to create variants… AI eliminates this constraint. You can now have 20 variants, or as many variants as you have users, or just one that updates automatically every 30 minutes based on real user feedback”, from their piece on A/B testing vs evals. In other words, experimentation moves from discrete comparisons to continuous adaptation.

But do we really need more than two variants? When the cost of generating and routing variants approaches zero, breadth helps you find depth faster.

Standard vs AI driven testing

| Aspect | Standard A/B testing | AI powered experimentation |
|:---|:---|:---|
| Test options | 2–3 variants | Multiple or infinite variants, updated dynamically, as argued in A/B testing vs evals and in this guide to AI experimentation |
| Duration | Weeks to months | Days to weeks, often continuous, see this section on faster analysis |
| Personalization | One size fits all | Tailored to segments and even individuals via real time personalization |
| Data analysis | Manual and time consuming | Automated, real time analysis and prioritization using AI assisted analysis |
| Optimization | Post test implementation | Ongoing optimization and automatic traffic allocation in the new operating model |
| Agentic UX | AI agents act autonomously via APIs and schemas (approach overview) | Use when tasks can be fully delegated, but always maintain human oversight |

Won’t this burn traffic? Counterintuitively, smarter allocation cuts waste because fewer users see underperformers for long.

What is AI driven A/B testing?

AI driven A/B testing is the use of generative or predictive AI throughout the experimentation workflow. According to Kameleoon’s primer on AI in A/B testing, AI can be applied across four core areas:

  1. Test ideation: AI generates hypotheses, copy and design ideas for new variants, see Kameleoon’s section on ideation from research.
  2. Data analysis and modelling: machine learning models build propensity models, analyze test data and synthesize qualitative research, as outlined under modelling and analysis.
  3. Personalization: AI predicts which variant each visitor is likely to prefer and delivers personalized experiences in real time, described here as hyper personalization.
  4. Process optimization: AI summarizes themes in large data sets and prioritizes the testing backlog, speeding up workflows through workflow optimization.

Generative AI produces text, images, video or code, while predictive AI forecasts outcomes based on historical data, see Kameleoon’s note on generative vs predictive AI. In combination, they enable tools that can conceptualize new designs, build test scripts, allocate traffic dynamically and retire losing variants without human intervention.
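
To make that division of labor concrete, here is a minimal Python sketch of a generate-then-predict loop. The helper functions are hypothetical placeholders, not any vendor’s API: in practice generate_variants would call a generative model and predict_ctr a propensity model trained on past tests.

```python
import random
from dataclasses import dataclass

@dataclass
class Variant:
    copy: str
    predicted_ctr: float

def generate_variants(brief: str, n: int) -> list[str]:
    """Placeholder for a generative model drafting copy from a brief."""
    return [f"{brief} (draft {i})" for i in range(n)]

def predict_ctr(copy: str) -> float:
    """Placeholder for a predictive model scoring expected click rate."""
    return random.uniform(0.01, 0.05)

# Ideate broadly with the generative side, then shortlist with the
# predictive side before any user ever sees a variant.
candidates = [Variant(c, predict_ctr(c))
              for c in generate_variants("Announce free shipping", 20)]
shortlist = sorted(candidates, key=lambda v: v.predicted_ctr, reverse=True)[:5]
for v in shortlist:
    print(f"{v.predicted_ctr:.3f}  {v.copy}")
```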

Is this just rebranding old ideas? Not really, the ingredients are familiar, but automation now spans the whole loop, not just analysis.

Why it matters: benefits and business impact

AI powered experimentation is not a futuristic novelty, it is already driving results.

  • Scaling experimentation: high performing companies are doubling down on testing. More than half invest heavily in client side experimentation, according to Kameleoon’s overview of A/B testing programs. AI accelerates this by orchestrating multiple teams, identifying bottlenecks and producing reports automatically, as covered under scale with AI.
  • Bigger wins through deeper insights: as programs mature, incremental gains become harder to find. AI helps uncover hidden opportunities by analyzing large data sets and suggesting novel ideas, see move beyond low impact tests.
  • Real time personalization: consumers crave tailored experiences. Seventy percent of customers say that understanding their needs affects loyalty. AI driven tests adapt copy, images and flows to individual users, creating hyper personalized journeys via AI personalization.
  • Efficiency and resource savings: Toyota reportedly generated 97 percent more leads with predictive targeting. AI opportunity detection can uncover sub segments where a losing variant performs well, producing an average 15 percent uplift that would otherwise be missed.
  • Accelerated decision making: Google runs over 10,000 A/B tests each year, and AI helps manage this massive testing load. By automating analysis and traffic allocation, teams can focus on strategy instead of statistics.

So, what if we do not have clean data? Start with the decisions you already make, then let AI propose variants while you tighten instrumentation in parallel.

Callout: Embrace AI or fall behind

“If you wait for the market to settle, others will integrate these tools and become more efficient long before you even start thinking about it.”

Craig Sullivan, experimentation expert, quoted in this take on adopt AI or lose ground

How AI generates and retires design variants

The heart of AI driven experimentation is a feedback loop where machines propose, test, learn and evolve. The following Mermaid diagram illustrates this cycle:

```mermaid
flowchart LR
  A[Ideation with generative AI] --> B[Generate multiple design variants]
  B --> C[Dynamic traffic allocation]
  C --> D[Real-time data collection]
  D --> E[Automated analysis & evaluation]
  E -->|Losing variants identified| F[Retire or tweak variants]
  E -->|Winning patterns identified| G[Personalize experiences]
  F --> B
  G --> B
```

Do we lose control if AI keeps shipping variants? You keep control by setting guardrails, metrics and stop conditions, then reviewing changes.

Explanation:

  • Ideation and generation: generative models produce new copy, layouts and visuals based on prompts. Kameleoon explains how to turn paper sketches into high fidelity designs and generate variants from text prompts.
  • Dynamic traffic allocation: instead of splitting users 50/50, AI adjusts the distribution using techniques like multi armed bandits (a minimal sketch follows this list). When the cost of creating variants dissolves, the system can test dozens of versions or even a unique experience per user.
  • Automated analysis: advanced algorithms measure results faster and more accurately. Convert lists features such as automatic traffic allocation and faster significance.
  • Retire and refine: underperforming variants pause automatically, while winning patterns inform the next generation. This creates an invisible yet continuous evolution of the experience.
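
Here is a minimal Thompson sampling sketch of that allocate-and-retire loop, using simulated conversion rates in place of live traffic. The variant names, rates and retirement rule are illustrative assumptions, not a specific platform’s algorithm.

```python
import random

# Illustrative true conversion rates; in production these are unknown.
TRUE_RATES = {"control": 0.040, "variant_b": 0.048, "variant_c": 0.031}

# Beta(1, 1) priors: index 0 counts successes, index 1 counts failures.
stats = {name: [1, 1] for name in TRUE_RATES}

for user in range(20_000):
    # Thompson sampling: draw a rate from each posterior, route to the best draw.
    chosen = max(stats, key=lambda v: random.betavariate(*stats[v]))
    converted = random.random() < TRUE_RATES[chosen]
    stats[chosen][0 if converted else 1] += 1

    # Periodically retire variants whose posterior mean lags far behind.
    if user % 1_000 == 999 and len(stats) > 1:
        means = {v: a / (a + b) for v, (a, b) in stats.items()}
        leader = max(means.values())
        for v, m in means.items():
            if m < 0.5 * leader and sum(stats[v]) > 500:
                del stats[v]  # losing variant paused automatically

for v, (a, b) in stats.items():
    print(f"{v}: posterior mean {a / (a + b):.3f} after {a + b - 2} users")
```

Because the posterior draws favor whichever variant currently looks best, traffic drifts toward winners automatically, which is why smarter allocation wastes fewer users on underperformers.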

Where do prompts live, in code or in the CMS? Either can work, the key is versioning prompts like code and tying them to metrics.
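
One lightweight pattern for that, sketched with an assumed registry structure (the fields and names are hypothetical): each prompt gets an id, a semantic version and the metric it is judged against, so any result can be traced back to the exact prompt that produced it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptVersion:
    id: str        # stable identifier, referenced from experiment configs
    version: str   # bump like code; keep old versions for attribution
    text: str      # the prompt itself
    metric: str    # the success metric this prompt is evaluated against

REGISTRY = [
    PromptVersion(
        id="hero_copy",
        version="1.2.0",
        text="Write a 6-word headline about free shipping for {audience}.",
        metric="hero_cta_click_rate",
    ),
]

def get_prompt(prompt_id: str, version: str) -> PromptVersion:
    """Look up an exact prompt version so results stay attributable."""
    return next(p for p in REGISTRY if p.id == prompt_id and p.version == version)
```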

Stories from the trenches

Mining customer feedback with AI

Dave Mullen, a conversion optimization consultant, uses GPT-3 to summarize large numbers of customer feedback quotes. He asks the model to produce short summaries and clarity scores so he can spot the most actionable comments. In his words, AI lets him scan “survey data that you can summarize and quantify in minutes, when it would have taken hours”, with summaries sitting beside the original quotes, described in this write up on AI for research synthesis.
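
A minimal sketch of that summarize-and-score pass using the OpenAI Python client; the model name, prompt wording and scoring scale are assumptions, not Mullen’s actual setup.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_quote(quote: str) -> str:
    """Ask the model for a one-line summary plus a 1-5 clarity score."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption; any capable chat model works
        messages=[{
            "role": "user",
            "content": (
                "Summarize this customer quote in one sentence, then rate "
                f"its clarity from 1 (vague) to 5 (actionable):\n\n{quote}"
            ),
        }],
    )
    return response.choices[0].message.content

feedback = ["Checkout kept rejecting my card with no explanation."]
for quote in feedback:
    print(summarize_quote(quote), "\n--", quote)  # summary beside the original
```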

So, does this replace user research? No, it speeds the reading so you can spend time on interpretation.

AI as an ideation partner

Iqbal Ali, who collaborates with optimizer Craig Sullivan, describes GPT-3 as a powerful assistant for brainstorming and text mining, but not a magic bullet. He notes that results were initially inconsistent, “with the right pre processing… the results are now much improved”. The true power lies in the API, which allows combining AI with other tools to achieve strong outcomes, as he explains in this piece on prompting and pipelines.

What if ideas feel generic? Tighten your inputs, add context, and ask for three sharply different directions instead of ten similar ones.
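
For instance, a prompt template along these lines (the wording is purely illustrative) bakes in context and forces divergence:

```python
PROMPT = """You are designing a pricing page hero for {product}.
Context: audience = {audience}; brand voice = {brand_voice};
top objection to overcome = {objection}.
Propose THREE sharply different directions, not variations on one idea:
1) trust-led, 2) urgency-led, 3) curiosity-led.
For each give a headline (max 8 words), a subhead and a one-line rationale."""

print(PROMPT.format(product="team analytics tool",
                    audience="engineering managers",
                    brand_voice="plainspoken, confident",
                    objection="looks expensive for small teams"))
```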

Replacing A/B tests altogether

Asaf Yanai, CEO of Alison.ai, openly questions the need for conventional experiments. In a podcast interview he said the company’s mission is to “replace A/B tests altogether from start to finish, let’s replace all of it by using AI”, outlined in this episode of the Tech Optimist podcast. By analyzing creative elements of video ads and predicting performance, their platform aims to cut costs and accelerate learning.

AI freeing designers from drudgery

John Maeda, Microsoft’s VP of Design and AI, sees AI as a tool for creativity. He encourages designers to identify tasks they do not enjoy and use AI to minimize them. “Ask yourself: what do you not actually like doing in your job? [AI will lead to] greater productivity because you are doing what you are most excited about,” he told the WorkLab podcast. Maeda also shares a metaphor from Nobel laureate Herbert Simon, cognition and context are like two blades of scissors, slicing them together produces intelligence, also discussed in the same WorkLab conversation. With large language models providing a powerful cognition blade, designers must bring the context.

So, where do teams start removing drudge work? List your top three repetitive tasks and prototype one AI assist per task.

Designing with ethics and human oversight

AI does not absolve us of responsibility. Tools like ChatGPT can hallucinate, are limited by their training data and can respond inconsistently, cautions summarized in this overview of AI limits in testing. AI driven experiments demand robust data stewardship: user privacy must be protected and biases mitigated, as the section on data safety argues. Harvard Business School researchers warn that standard A/B testing assumes no interaction between user groups, and similar assumptions may creep into AI models if not tested carefully, see this note on interaction effects.

Moreover, AI should augment human judgment, not replace it. Kameleoon encourages teams to use AI as a copilot to identify problems and propose ideas, but stresses that stakeholders must interpret results and align experiments with strategic goals, outlined in their view of AI as copilot. Craig Sullivan also reminds practitioners that waiting too long to adopt AI will leave them behind, as argued in adopt AI or lose ground.

So, when should a human block the rollout? When the metric moves but the experience violates brand, accessibility or ethics, stop and review.
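
In practice those guardrails can live in reviewable configuration. A sketch follows, where every field is a hypothetical example rather than a product’s schema:

```python
GUARDRAILS = {
    "primary_metric": "checkout_conversion_rate",
    "min_sample_per_variant": 1_000,   # never judge a variant too early
    "max_live_variants": 12,           # cap how broadly the system explores
    "max_drop_vs_control": 0.10,       # auto-pause anything 10% worse
    "required_reviews": ["brand", "accessibility", "ethics"],  # human gates
    "rollout_blocked_until_reviewed": True,
}
```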

Frequently asked questions (FAQ)

Does AI replace UX/UI designers or experimentation teams?

No. AI excels at generating options, crunching data and finding patterns, but it lacks human context and empathy. John Maeda urges designers to keep doing the work they find meaningful and use AI to eliminate drudgery, as he shares on the WorkLab podcast. Experimentation experts like Iqbal Ali treat AI outputs as starting points, not final answers, described in AI for research synthesis.

How do multi armed bandits differ from A/B testing?

Traditional A/B tests allocate a fixed percentage of traffic to each variant until the test ends. Multi armed bandit algorithms dynamically shift traffic toward better performing variants, which reduces opportunity cost. Many AI driven platforms incorporate these algorithms to shorten test duration and automatically retire losers, an approach explained in A/B testing vs evals.

Can AI design images and layouts on its own?

Generative models can create wireframes, color palettes and copy variations. Kameleoon notes that AI can turn paper sketches into high fidelity designs and generate test variants from simple prompts. Human designers are still needed to set creative direction, enforce brand guidelines and judge emotional resonance.

What about statistical methods, are frequentist or Bayesian approaches still relevant?

Yes. A/B testing platforms still rely on frequentist and Bayesian statistics to estimate confidence. AI can enhance these methods by analyzing results sooner and using prior data to inform predictions. In complex systems where thousands of variants evolve, online learning algorithms (evals) provide continuous feedback loops that go beyond traditional statistics, discussed in the new operating model.
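
For intuition, here is how a Bayesian readout of a finished test can look, assuming Beta(1, 1) priors and made-up conversion counts: we estimate P(variant beats control) by sampling both posteriors.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    wins = 0
    for _ in range(draws):
        rate_a = random.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = random.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical counts: control 400/10,000 vs variant 460/10,000 conversions.
print(f"P(variant beats control) = {prob_b_beats_a(400, 10_000, 460, 10_000):.1%}")
```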

How do I start using AI for experimentation?

  1. Define clear goals: identify the metric you want to improve and the user problem you are solving.
  2. Choose a tool: consider platforms like Optimizely’s Opal or Unbounce’s Smart Traffic that provide AI assisted segmentation and copy.
  3. Train your team: teach experimentation fundamentals alongside AI basics, data analysis and prompt engineering, guided by this section on team enablement.
  4. Balance AI and human judgment: review AI suggestions critically and consider ethical implications, as recommended under human in the loop.
  5. Iterate and learn: start with one campaign, measure results and expand. Remember that only a fraction of tests produce big wins, but your odds improve when you run more tests faster.

So, what is a good first win? Try auto allocation on a single high traffic page with one clear metric.

Final thoughts

AI heralds a new era where experimentation is no longer constrained by the cost of building variants or the patience to wait for results. When interfaces and content adapt dynamically to each individual, the premise of optimizing for the average user dissolves, a point made in A/B testing vs evals. Instead, designers and business owners will orchestrate systems that generate, test and refine experiences continuously.

This does not make human creativity obsolete, it makes it more valuable. The role of the designer shifts from crafting singular artifacts to curating the rules that guide AI exploration, ensuring that endless variants remain aligned with brand values, accessibility and ethics. The experiments become infinite and invisible, the human purpose behind them must remain clear.

What should I do next week? Pick one flow, define the rule set, and let the system explore while you watch the right metrics.