Our recent exploration into generative analytics uncovered exciting possibilities for the future of business intelligence. We set out with a broad goal: to democratize analytics insights and eliminate bottlenecks by giving users a personal data analyst.
The result was GenBI, an internal proof of concept demonstrating how large language models can sit on top of structured datasets, translate natural language into SQL, and generate accurate charts in seconds.

Most BI tools still assume users know where data lives, how tables relate, and how to shape datasets before creating even the simplest visualization. In reality, most business users know what they want (insight into trends, comparisons, or patterns) but not how to retrieve or prepare that data.
Data teams, as a result, spend a disproportionate amount of their day fielding ad-hoc questions. Many of these questions are simple, repetitive, and ripe for automation. GenBI emerged as our internal experiment to challenge that dynamic: we wanted to see whether anyone in the company could explore data instantly, without opening a BI tool or writing SQL, and whether insights could be generated in seconds rather than days.
This exploration aligns with broader shifts in analytics modernization: natural language interfaces, semantic layers, and LLM-assisted development. Data analytics is particularly well suited for natural-language interaction because non-technical employees often have a clear idea of the insight they need—just not the technical steps required to obtain it.
See a demo of what we built here.
We designed the GenBI frontend to be fast, clean, and intuitive. Built using React/Next.js and Tailwind, the UI uses a two-panel layout that pairs a conversational interface on the left with visualizations and data previews on the right. This allows users to interact naturally while immediately seeing the results of their questions.
A ‘Data Explorer’ panel provides users with a reference for the available data and schemas.

A key design decision was the use of Vega-Lite for visualization rendering. Vega-Lite’s declarative grammar made it ideal for an LLM-driven system: instead of generating code, the model only needs to output a structured JSON specification describing the chart.
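For illustration, a spec for a simple trend chart might look like the sketch below; the dataset and field names are hypothetical, and the frontend injects the query results into the spec before rendering:

```python
# Hypothetical example of a Vega-Lite spec the model could return, written as
# the Python dict the backend would serialize to JSON for the frontend.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "mark": "line",
    "encoding": {
        "x": {"field": "month", "type": "temporal"},
        "y": {"field": "unemployment_rate", "type": "quantitative"},
    },
    # The frontend fills in "data": {"values": rows} with the SQL results.
}
```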

The backend uses DuckDB as its analytical engine, chosen for its simple configuration and zero infrastructure footprint. DuckDB allowed us to run SQL queries directly against local datasets at high speed, which kept the feedback loop tight for experimentation.
Data ingestion was built as a modular pipeline, enabling new sample data CSVs or datasets to be added with minimal friction. The entire system is wrapped in Docker to ensure consistency across developer environments and eliminate “it works on my machine” issues.

The sample data loader itself is only a small snippet.
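A minimal sketch of such a loader, assuming one table per CSV file (the paths and names here are illustrative, not the production code):

```python
import duckdb
from pathlib import Path

def load_sample_data(db_path: str = "genbi.duckdb", data_dir: str = "data") -> None:
    """Create one DuckDB table per CSV file found in data_dir."""
    con = duckdb.connect(db_path)
    for csv_file in sorted(Path(data_dir).glob("*.csv")):
        table = csv_file.stem  # e.g. data/job_postings.csv -> job_postings
        # read_csv_auto infers column names and types directly from the file
        con.execute(
            f"CREATE OR REPLACE TABLE {table} AS "
            f"SELECT * FROM read_csv_auto('{csv_file}')"
        )
    con.close()
```

Because each CSV automatically becomes a table, adding a new sample dataset is just a matter of dropping a file into the data directory.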
Although this prototype uses local DuckDB files for convenience, the architecture is intentionally designed to be extensible. The same query-generation pipeline can be pointed to external systems—whether through ODBC, JDBC, warehouse connectors, or REST-based SQL interfaces. With only minor adjustments to the connection layer, GenBI could connect to Snowflake, Postgres, SQL Server, Fabric Lakehouse, or any enterprise data platform. This flexibility means the natural-language interface can sit on top of both lightweight demo datasets and real operational data with equal ease.
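One way to express that seam, sketched under the assumption that the pipeline only ever needs rows back from a SQL string (the class and method names below are ours, for illustration):

```python
from typing import Any, Protocol

class SqlBackend(Protocol):
    """Anything that can execute a SQL string and return rows as dicts."""
    def run_query(self, sql: str) -> list[dict[str, Any]]: ...

class DuckDBBackend:
    """Local DuckDB implementation of the same interface."""
    def __init__(self, db_path: str = "genbi.duckdb") -> None:
        import duckdb
        self.con = duckdb.connect(db_path)

    def run_query(self, sql: str) -> list[dict[str, Any]]:
        result = self.con.execute(sql)
        columns = [col[0] for col in result.description]
        return [dict(zip(columns, row)) for row in result.fetchall()]

# A Snowflake or Postgres backend would implement the same method using
# snowflake-connector-python or psycopg; nothing upstream needs to change.
```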
For the natural-language reasoning layer, we selected GPT models, with user-selectable models planned on the future roadmap. Much of the system’s reliability comes from strict prompting patterns that guide the model to return SQL, visualization specs, and natural-language summaries in a predictable schema.
Throughout development, we experimented with temperature settings, response templates, and guardrails to ensure the model stayed grounded in the actual data rather than hallucinating fields or unsupported chart types.
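As a sketch of what that call can look like with the OpenAI Python client (the model name, temperature value, and prompt text are placeholder assumptions, not the production configuration):

```python
from openai import OpenAI

# Placeholder system prompt; the real one carries the schema context,
# formatting rules, and guardrails described in this post.
SYSTEM_PROMPT = (
    "You are a data analyst. Reply only with a JSON object containing "
    "'sql', 'vega_lite_spec', and 'summary'. Use only the tables and "
    "columns listed in the schema context."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4o",                          # placeholder model choice
    temperature=0.1,                         # low temperature for predictable output
    response_format={"type": "json_object"}, # forces parseable JSON back
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "How have job postings trended by city?"},
    ],
)
raw_json = completion.choices[0].message.content
```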
We intentionally began with a small set of chart types, primarily line and bar charts, because they map cleanly to common business questions and are easy for the LLM to specify in a consistent Vega-Lite format. Line charts worked well for trends such as unemployment rates, while bar charts supported comparisons like job postings by location.
As the system matured, we explored more complex visual types. Tables and raw data previews were especially helpful for debugging and validation.
Future roadmap additions could cover scatterplots, maps, and multi-series charts, which introduce more nuance and will require additional guardrails: early attempts to automate these chart types often produced misaligned axes or overly complex specifications.
The success of GenBI hinges on thoughtful prompt engineering. The prompts tell the model how to behave, what format to use, what guardrails to follow, and how to remain grounded in the schema. We required the LLM to return a JSON object containing the SQL, the Vega-Lite spec, and the natural-language summary, forcing a deterministic contract between the model and the frontend.
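That contract can be captured in a small schema; here is a sketch using Pydantic, with field names chosen for illustration rather than GenBI’s exact keys:

```python
from pydantic import BaseModel

class GenBIResponse(BaseModel):
    sql: str              # the query to run against DuckDB
    vega_lite_spec: dict  # chart definition; data is injected after the query runs
    summary: str          # natural-language explanation of the result

# Validating the model's raw output against the contract is then one step:
# response = GenBIResponse.model_validate_json(raw_json)
```

If the model’s output fails validation, the request can be retried or rejected instead of sending a malformed spec to the frontend.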
One of the biggest challenges was balancing creative interpretation with structural correctness. Overly strict prompts caused the model to refuse legitimate queries; overly flexible prompts caused hallucinations or invalid field names. The structure of Vega-Lite helped here as well, giving the model a predictable format instead of requiring it to generate custom charting code.
LLMs are powerful, but they do not inherently understand your database. To help the model reason correctly, we constructed a “context prompt” describing the available tables, fields, formats, and relationships. This worked well but is ultimately static. The moment a dataset changes, the prompt becomes outdated.
A more robust system would allow the LLM to introspect the database dynamically and understand tables, fields, formats, and join keys on the fly. This moves the architecture closer to a true semantic layer and highlights the importance of strong data governance practices. Clean naming conventions and consistent structures significantly improve the model’s performance.
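As a sketch of what dynamic introspection could look like against DuckDB (the function and its plain-text output format are assumptions, not a built feature):

```python
import duckdb

def build_schema_context(con: duckdb.DuckDBPyConnection) -> str:
    """Render the live schema as plain text for inclusion in the context prompt."""
    rows = con.execute(
        "SELECT table_name, column_name, data_type "
        "FROM information_schema.columns "
        "ORDER BY table_name, ordinal_position"
    ).fetchall()
    lines: list[str] = []
    current_table = None
    for table, column, dtype in rows:
        if table != current_table:
            lines.append(f"Table {table}:")
            current_table = table
        lines.append(f"  - {column} ({dtype})")
    return "\n".join(lines)
```

Rebuilding this context at query time keeps the prompt in sync with the data, and it is where clean naming pays off: the model sees exactly what the schema exposes.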
Many natural-language questions lead to charts that are technically valid but practically unusable. When someone asks, “How many job postings exist per city?” the model could generate a bar for every city in the dataset—sometimes tens of thousands.
To address this, we introduced rules instructing the model to limit results and aggregate where necessary. Once implemented, chart readability and system usability improved dramatically.
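Prompt rules alone are not foolproof, so a post-processing safety net can back them up. A minimal sketch, assuming a simple row cap (the threshold and pattern are illustrative):

```python
import re

MAX_ROWS = 50  # illustrative cap; the right threshold is a tuning decision

def enforce_row_limit(sql: str, max_rows: int = MAX_ROWS) -> str:
    """Append a LIMIT clause if the generated SQL does not already have one."""
    if re.search(r"\blimit\s+\d+", sql, flags=re.IGNORECASE):
        return sql
    return f"{sql.rstrip().rstrip(';')} LIMIT {max_rows}"
```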
GenBI demonstrates how close we are to a new class of analytics experiences—ones where natural language becomes the interface and insights appear instantly from structured data. Through this project, we confirmed that LLMs can reliably generate SQL, Vega-Lite provides a strong visualization layer, DuckDB offers exceptional performance, and prompt engineering is the connective tissue that makes everything work together.
Generative BI does not replace dashboards or analytics teams. Instead, it enhances them, reducing the volume of ad-hoc requests and giving business users a faster way to get the answers they need. Our next phase will focus on schema introspection, multi-dataset reasoning, and deeper integration with real enterprise data systems.
If your organization is exploring how natural language interfaces can reshape analytics, we’d be excited to share what we’ve learned or help you build something similar.
Ben Frazier is a Senior Consultant on the Data team.
Josh Narang is a Senior Consultant on the Data team.