
Zenbase: How an Open Source Side Project Became a YC-Backed AI Startup Revolutionising LLMs

Built by DSPy core contributors, Zenbase takes the pain out of prompt engineering and model selection—so developers can get back to what they do best: coding.

Backed by Y Combinator, Zenbase was founded by key contributors to DSPy to take the grunt work out of AI development—automating prompt design and model picking so developers can stay in flow. Co-founder Cyrus Nouroozi shares his journey and lessons from building Zenbase. 👇

Co-founders: 2 (Cyrus Nouroozi, Amir Mehr)

Amount Raised: $500K USD (Pre-seed)

Core Technology: Developer tools and cloud infrastructure for automating prompt optimisation and model selection for LLM developers

My Story

Ever since I was a kid, I loved creating things—starting with Legos and model airplanes. Around 12 or 13, that creative energy shifted to computers. I built my own PC, taught myself to code, and made a website for my mom. That’s when I realised I could build things that were useful, not just cool to look at.

At 15, I joined a full-stack bootcamp in Toronto—surrounded by adults—and met two founders who needed a live chat interface. Coincidentally, I had already been building one. They hired me for the summer, and that was my first real startup experience. After I finished Grade 12, they hired me full-time. I took two gap years and became the lead developer at #paid, an influencer marketing startup working with brands like Coca-Cola and Toyota.

After two years, I felt stuck. As an engineer, you build what others tell you—but I wanted to decide what to build. I enrolled at the University of Waterloo but kept working on side projects. One of them was a startup called Conversify, which I left after co-founder issues. Not long after, an acquaintance reached out with an idea for a viral meme app. I helped build it. That app, Wombo.ai, hit 100 million downloads in six months.

But again—bad co-founder decisions. After taking the app from nothing to something, the guy forced me and the other co-founder out and replaced me with three engineers. But honestly, that turned out to be a good thing. It gave me a reset. I asked myself: What do I actually want to be known for? A joke app wasn’t it. So I dove into climate tech in 2022, learning about carbon credits and the voluntary carbon market. I realised it resembled a commodities market—and thought: Why don’t we have futures for carbon?

To make that happen, I explored crypto, since it allows rapid experimentation with financial instruments. I co-founded Eden DAO, raised $120k in donations, and another $100k to buy permanent carbon removals—ranking us above companies like BlackRock and Harvard at the time.

Eventually, I burned out. I went to Burning Man, reconnected with my creative drive, and by early 2023, I became fascinated by AI agents—especially during the Auto-GPT wave. I noticed everyone was trying to build one super-agent. But drawing from systems thinking and metaphors—like multi-core CPUs and human teams—I believed multi-agent systems would be the future.

I built a proof of concept and showed it to Harrison at LangChain. He liked it and invited me to do a guest blog post. That post caught the attention of Guohao Li at Camel AI, a leading multi-agent lab. He invited me to join as a researcher. I worked alongside a PhD student from NUS, Benjamin Lee, on multi-agent Minecraft.

Through Twitter, I discovered DSPy, a framework for building multi-step LLM systems, created by Stanford PhD Omar Khattab. I reached out, met him at Stanford in early 2024, and asked, What’s your biggest problem? He told me. We whiteboarded for hours, and he said, If you solve this, you’ll be a core contributor. So I did.

I helped build multi-LLM support for DSPy. Given my experience with AI apps, I knew how painful prompt engineering was—so DSPy’s automated prompt engineering stood out. I partnered with a friend to offer consulting based on it. After one gig, we felt like we had something strong. We applied to YC, got the interview, got in, and started building Zenbase AI.

It’s been a winding road with false starts and lessons, but one of the biggest: choose your co-founder wisely—you're basically getting married.

The Problem: Prompt Engineering Hell for Developers Working with LLMs

Building with LLMs today is a mess.

  • Prompt engineering is broken. Developers spend hours crafting, tweaking, and testing prompts by hand — it’s uncertain, slow, and doesn’t scale.

  • No feedback loops. There’s no easy way to tell if your prompts are actually improving things. You're flying blind.

  • Model selection is trial and error. Switching between OpenAI, Anthropic, Mistral, and others is painful — even when one is clearly better for your task.

  • Evals are unreliable. Most teams rely on vibe checks, manual review, or basic test cases. Not scalable. Not scientific.

  • Great frameworks are hard to use. Stanford’s DSPy is one of the best open-source LLM optimization libraries out there. But it’s still too academic and not built for production teams.

Even the most sophisticated AI teams (including at Meta, Microsoft, Google) have engineers burning time in prompt hell. The result? Missed ship dates, unreliable apps, and high infra bills — all while business stakeholders ask: “Why is this still not working?”

The Solution: Automated and Optimised Prompt Engineering and Model Selection*

We’re core contributors to Stanford NLP’s DSPy, the #1 LLM optimization repo (16k+ ⭐️), used by engineers at Meta, Microsoft, Amazon, and 40+ others.
We’ve seen firsthand how hard it is to take these ideas to production.

Zenbase turns DSPy’s power into a product.

Zenbase is the production-ready platform that automates prompt engineering and model selection — so developers can focus on building great AI products, not fiddling with prompts. Here's how it works:

  1. Define a function (e.g., summarise legal docs, write outbound emails)

  2. Add a few test cases of what good output looks like

  3. Zenbase finds the best prompt + model using DSPy + our own optimizers

  4. Ship it — with built-in user feedback tracking

  5. Zenbase continuously improves the prompt & model using that feedback

You get automated improvement, traceability, and cross-model flexibility — all without needing to understand prompt tokens or fine-tuning knobs.
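
For readers who want to see the idea in code, here is a minimal sketch of that pattern using the open-source DSPy library Zenbase builds on. It is illustrative only, not Zenbase's API: the signature, metric, and training examples are hypothetical, and DSPy's configuration call varies a little between versions.

```python
# Illustrative sketch only: the signature, metric, and examples below are
# hypothetical, and this uses plain open-source DSPy, not Zenbase's API.
import dspy
from dspy.teleprompt import BootstrapFewShot

# Configure a language model (the exact call depends on your DSPy version).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# 1. Define a function as a typed signature (e.g., summarise legal docs).
class SummariseLegalDoc(dspy.Signature):
    """Summarise a legal document in plain English."""
    document = dspy.InputField()
    summary = dspy.OutputField()

program = dspy.ChainOfThought(SummariseLegalDoc)

# 2. Add a few examples of what good output looks like.
trainset = [
    dspy.Example(
        document="...contract text...",
        summary="...a gold-standard summary...",
    ).with_inputs("document"),
    # ...a handful more examples...
]

# A simple metric the optimiser scores candidate prompts against
# (a placeholder; a real one would check faithfulness, length, etc.).
def summary_quality(example, prediction, trace=None):
    return float(len(prediction.summary) > 0)

# 3. Let the optimiser search for better prompts and demonstrations.
optimizer = BootstrapFewShot(metric=summary_quality)
compiled_program = optimizer.compile(program, trainset=trainset)

# 4. Ship the compiled program; step 5 is feeding real-world feedback
#    back into the trainset/metric and re-compiling.
print(compiled_program(document="...new contract text...").summary)
```

The pitch above is essentially to run this optimise-and-recompile loop as a hosted service, driven by production feedback rather than hand-curated examples.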

What We Offer

  • 🔧 zenbase/core – Open-source Python lib that upgrades your LLM pipelines using DSPy, without needing to rewrite them.

  • ☁️ Zenbase API – Hosted endpoints for creating AI functions that get smarter with real-world usage and feedback.

  • 🏢 On-prem version – For enterprises with data privacy constraints.

Realising We Had a YC-Level Idea

DSPy really blew up last year—from 5,000 to 20,000 GitHub stars. At AI events, you’d meet hardcore DSPy fans hyping it up. So VCs started asking, "What is DSPy?" And here we were—core contributors with 10 years of engineering and startup experience. We weren’t just hobbyists; we were experts in a fast-growing, technical field. Plus, DSPy’s angle—automated prompt engineering—offered a promising dev tool for a new category. So from YC’s lens, we checked the three key boxes:

  1. Hot, emerging space

  2. Deep subject matter expertise

  3. Strong engineering execution

Validating the Idea

The core insight we’re building around is that AI today is static. When you hire someone, they go through onboarding, ask questions, learn, and improve. But with AI products, the behaviour is frozen unless a developer manually updates a prompt. That’s what we’re trying to change—we want developers to be able to build AI that learns on the job.

Did the market explicitly ask for this? No. It’s one of those visionary things—like Henry Ford’s “faster horses” quote. But my intuition has been right before, and I believe this should exist.

We’ve spoken to tons of companies building AI products and developed a mental model of how teams evolve:

  • 0→1: You have an idea, prompt until it “just works,” and ship an MVP.

  • 1→10: You get user feedback, play prompt whack-a-mole, build evals, fix bugs.

  • 10→100: You’re scaling, optimising infra, maybe distilling to custom models.

We’ve met teams at every stage—from pre-seed to Series B. Most are in the 0–10 range. By mapping this journey, we started to see which parts of the AI dev lifecycle are the most painful and manual—and where we could help.

We initially launched an MVP in August and got MRR selling to our YC batchmates. But those weren’t the ideal customers. So in Jan/Feb, we went back to the drawing board. We did more consulting, talked to advisors, updated our mental models—and four weeks ago, we started building in a new direction. Same vision, just more grounded in what people are actually doing.

I wish I could say, “We found a pain point, nailed it, raised our Series B,” but that’s not the reality. We started from research and had to work backwards. That’s hard. We’ve had to detach from the solution, root ourselves in real user problems, and then return to our toolbox to craft the right product experience.

Building Out the MVP

One of the first things we wanted to build was a way for people to create what we call a learning agent. Internally, we talk about something we call a learning function—the idea is, you should be able to create an agent, give it examples of what “good” looks like, and then it should learn on its own: adjust its prompts, its weights, even pick the right model, to best approximate the desired output. Back in YC, I was leading sales and fundraising while my co-founder—solo—built the entire backend and onboarded users. This time around, we’re about 80–90% of the way to finishing our MVP.

The purpose of an MVP is to validate market demand. So if you’re building an MVP, ask yourself: What’s the least amount of work I need to do to test for pull? It doesn’t have to work perfectly—it just has to help people grasp what you’re trying to build. If there’s enough pull, then you go and build it for real.

For instance, two weeks ago, we decided to sprint on a simple demo and a CLI. The website demo uses mock data—based on real simulations we ran—but the interface itself isn’t live or functional. You’re not actually controlling the optimizer; it’s preloaded data. But it looks real and gives users a feel for what the product does. We had a great call on Thursday where we showed this demo, and the feedback was: “This would be huge if true.” That’s the response you want.

Last year, we fell into the trap of trying to perfect the tech too early. We were doing prompt optimisation, but it worked well only in some use cases—not all. That left us unsure: Can we go to market if it only works 60% of the time? But we now realise that’s more of an implementation detail. What matters more these days is user experience. People don’t want a tool that takes a week to learn. They want something they can start using in 30 minutes or less. If you get them in the door with something simple and easy, then you can make it more powerful over time.

Getting the Business Model Right

We deliberately chose to build a high-ticket SaaS business. From the start, we aimed for ACVs in the five- to six-figure range—not something like $50/month. We wanted customers paying $500 to $1,000 a month, or more. And honestly, we just kept quoting higher numbers until someone said no.

In fact, for the same product, we had one customer pay $2,500/month while another paid $500/month. That’s the nature of enterprise SaaS—it’s consultative. You hop on a call, understand the customer's problems, quantify the value, and then test price points until you hit resistance.

Last year, we deliberately kept the business model simple. We avoided usage-based pricing—even though it's arguably better for scalable revenue—because

  1. it was a technical hurdle at the time, and

  2. our early customers were YC startups with low usage, so volume pricing wouldn’t have yielded strong ACV.

So we focused on flat-rate pricing, typically $500–$1,000/month. But in the back of my mind, I’ve been thinking about what an ideal model might look like: probably a hybrid—a base monthly or annual fee plus volume-based pricing based on tokens in/out, maybe with a markup on LLM costs.
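
As a rough illustration of what that hybrid model could look like (the base fee, token prices, and markup below are hypothetical, not Zenbase's actual pricing):

```python
# Hypothetical hybrid pricing sketch: a flat base fee plus a markup on
# token-based LLM costs. All numbers are illustrative, not real pricing.
def monthly_bill(base_fee, tokens_in, tokens_out,
                 cost_per_1k_in, cost_per_1k_out, markup):
    llm_cost = (tokens_in / 1000) * cost_per_1k_in \
             + (tokens_out / 1000) * cost_per_1k_out
    return base_fee + llm_cost * (1 + markup)

# e.g. $500 base, 20M tokens in / 5M out, 20% markup on LLM spend
print(monthly_bill(500, 20_000_000, 5_000_000, 0.005, 0.015, 0.20))  # 710.0
```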

That said, I believe you need to find product-market fit before layering on clever pricing models. And this ties into a broader point—there are different kinds of innovation: technical, business model, go-to-market. You don’t want to innovate on too many at once. It becomes too hard to execute. For example, I’ve had thoughts—given my crypto background—about adding tokenomics. But I’ve seen that rabbit hole. It’s not where I want to go right now.

Personally, I lean more builder than seller. I have done sales—I led sales and fundraising in YC. In fact, back when I was deep in crypto, people assumed I was a sales guy because of how I spoke. They were shocked to find out I’d been a developer for 10 years.

Acquiring the First Customer

Our first paying customer actually came through a referral. I’d become friends with some other YC founders at the retreat in NYC. We had these small groups that met weekly for office hours, and one of my friends in the group heard another founder mention a problem they were facing.

He said, “You should talk to Cyrus.” So they did.

By the end of a 30-minute call, they became our first customer. After that, more started coming in—one after the other. It was a pretty sweet moment.

Our Sales Process

Most of our traction has been inbound, but we did experiment with outbound last August.

I’d previously worked on a startup that automated LinkedIn outbound, so I was familiar with the sales process. For this campaign, I scraped all the DSPy GitHub stargazers using Clay, enriched the data, and filtered it down using AI to find engineers at U.S. companies—higher ACVs—who were CTOs, CXOs, or researchers.
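
The real pipeline ran through Clay, but the first step, pulling a repo's stargazers, can be sketched against the public GitHub REST API. The enrichment and AI filtering are omitted here; the only real detail is the public DSPy repository path.

```python
# Minimal sketch of the first step only: listing a repo's stargazers via
# the public GitHub REST API. The Clay enrichment and AI filtering from
# the pipeline described above are not shown.
import requests

def list_stargazers(owner, repo, token=None):
    headers = {"Accept": "application/vnd.github+json"}
    if token:  # unauthenticated requests hit rate limits quickly
        headers["Authorization"] = f"Bearer {token}"
    users, page = [], 1
    while True:
        resp = requests.get(
            f"https://api.github.com/repos/{owner}/{repo}/stargazers",
            headers=headers,
            params={"per_page": 100, "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        users.extend(user["login"] for user in batch)
        page += 1
    return users

print(len(list_stargazers("stanfordnlp", "dspy")))
```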

We ran a LinkedIn campaign targeting them. The results were actually great:

  • 65% connection rate

  • 40% reply rate

Having a core DSPy contributor on the team probably helped. We booked a bunch of calls—but we learned something important.

Using Sales Feedback In Product Iteration

After talking to a number of prospective clients, we noticed a recurring problem in the feedback we were getting.

The people technical enough to understand DSPy—AI researchers and engineers—didn’t want to be sold a product. They preferred to take the idea and rebuild it themselves. Their attitude was, “This isn’t hard—I can just implement it.” So while we reached smart folks, they didn’t have a pressing need or desire to pay for the solution.

That experience forced us to rethink our value proposition. We started framing optimization in terms of problem–pain–solution. We used a simple structure:

Problem → Pain → Existing Solutions → Why They Fail → What Changed → New Solution → Proof

This helped us get inside the heads of our buyers—especially CEOs and CTOs.

From the CEO’s perspective:

  • Problem: “How do I hit my KPIs as an AI company?”

  • Pain: Uncertainty, high burn, missed revenue, stress

  • Existing Solutions: Consultants, random eval frameworks, fine-tuning (but unclear if it helps)

  • Why They Fail: Unlike web2 tools (e.g. Mixpanel, GA), there’s no clear lever to pull to improve AI performance

From the CTO’s perspective:

  • Problem: Prompt hell and unreliable evals

  • Pain: Missed deadlines, poor UX, costly iteration

  • Existing Solutions: Manual feedback loops, vibe checks, inconsistent data labeling

  • Why They Fail: It’s slow, expensive, lacks traceability, and doesn’t scale

So we zeroed in on the real need: automating the feedback loop for AI apps.

Imagine you launch an AI product, and users give you feedback—thumbs up/down, edits, or even social metrics like likes or views. Today, a human has to parse that, guess what to change in the prompt, and try again.

What we’re building is an AI-powered layer that:

  • Interprets user feedback

  • Understands what’s “good” and what’s not

  • Automatically adjusts prompts to improve performance

That’s our core bet now: turning implicit user signals into improved AI behavior—without burdening the developer.
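
As a hedged sketch of how such a layer could work (not Zenbase's implementation; the feedback-log format, field names, and schedule are assumptions, reusing the summarisation example from earlier), user ratings can be folded into the metric a DSPy optimiser scores against, with the program re-compiled as feedback accumulates:

```python
# Hedged sketch, not Zenbase's implementation: the feedback log format,
# field names, and schedule are assumptions, reusing the earlier
# summarisation example. The idea: user ratings become the optimiser's
# metric, and the program is re-compiled as feedback accumulates.
import dspy
from dspy.teleprompt import BootstrapFewShot

# Assume each production interaction is logged as:
# {"inputs": {"document": "..."}, "output": "...", "thumbs_up": True}
def build_trainset(feedback_log):
    # Keep positively rated outputs as demonstrations of "good".
    return [
        dspy.Example(**item["inputs"], summary=item["output"])
            .with_inputs(*item["inputs"])
        for item in feedback_log
        if item["thumbs_up"]
    ]

def feedback_metric(example, prediction, trace=None):
    # Placeholder: reward matching a positively rated output. A real
    # system would use embeddings or an LLM judge instead.
    return float(prediction.summary.strip() == example.summary.strip())

def reoptimise(program, feedback_log):
    optimizer = BootstrapFewShot(metric=feedback_metric)
    return optimizer.compile(program, trainset=build_trainset(feedback_log))

# e.g. run nightly: program = reoptimise(program, load_feedback_from_db())
```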

Launching and Fundraising

YC’s advice is clear: launch early.
The earlier you launch, the sooner you're on VCs’ radar, the sooner they reach out, and the sooner your CRM fills up.

By the time we started fundraising in early September, we had 80–90 VC meetings already booked. We got good traction in the first two weeks. Then there was a lull for a few weeks—nothing happened. Then, boom—our lead came in, made a few intros, and everything clicked. We closed the round.

Our pitch was simple and punchy:

“Prompt optimization is the next big thing. DSPy is the hottest framework. We’re two startup-y engineers on the DSPy core team. We got into YC to turn this into a product. We’ve got early revenue. Hop in the car if you want to build the future with us.”

We got common pushbacks like:

“How is this defensible? What stops OpenAI or AWS from doing it?”

My response:

AWS isn’t incentivised to make switching to Azure easier—and vice versa. Similarly, OpenAI won’t help you move to Claude. But there’s room for tools that sit on top, stay model-agnostic, and help developers flexibly use the best LLM for the job. That’s where we sit.

After the raise, we took a short break—it was intense. Then reality hit:
The macro was shaky, elections were coming, and we decided we needed to extend our runway.

So we moved operations to Edmonton, Alberta. It’s cheaper than SF, and our team (especially my co-founder) had visa constraints, so this move worked. Being in Edmonton lets us keep costs low and move faster. We’ve now got 38 months of runway.

But we made some mistakes too:

  • We hired too early, before PMF.

  • We over-delegated and had to let go of an intern.

  • We learned the hard way: don’t hire until you find PMF.

Now, the plan is simple:

Launch and ship 20+ ideas with our runway. Swing as often as possible. Stay lean. Move fast.

Once we’ve built something people truly want, I’ll head back to SF to demo it and raise the next round.

Final advice?

  • Use your credits (cloud, infra, tools).

  • Make the money stretch.

  • Seed funding doesn’t mean success. PMF does.

  • Until you find that, keep it light, iterate quickly, and optimise for velocity.

Tracking Product-Market Fit

We’re much closer to the starting line than the finish line. We’ve tracked revenue and usage, but for what we’re building now, those aren’t enough.

What really matters now is whether we can get 3–5 established companies (Series A/B) actively working with us, giving feedback every week. That feedback—positive or negative—signals engagement and perceived value.

The key proxy for PMF right now is:

How invested are users in helping us improve the product?

If they care enough to give consistent feedback and say “This would be amazing if it just did X,” that’s a strong sign. If they’re passive, that’s a red flag.

Also, we’re focusing on companies that already have PMF themselves—not seed-stage startups. Why? Because:

“You don’t optimise something that doesn’t work.”

A company without PMF won’t benefit from prompt optimization tools. But one with real usage and real pain points? That’s where we can deliver value—and validate our own PMF.

Weekly/Monthly Workflow

Since January, we’ve had a structured organizational rhythm to help us stay focused and aligned. Here’s how it works:

🟩 Mondays:

  • We do sprint planning in the morning—my co-founder and I align on the 2-week roadmap, then share it with the team.

  • After that, I use the rest of Monday for “20% time”—space for creativity, prototyping, and chasing new ideas.

🟦 Tuesday to Friday:

  • We follow assigned sprint tasks.

  • Mornings: Standups—focused on team connection and collaboration.

  • Evenings: Async logs—everyone posts:

    • Energy level

    • What they did today

    • What they plan to do tomorrow
      (It helps keep momentum and continuity)

🟨 End of Sprint (Friday):

  • We do a 2-hour retrospective.

  • Structure: “Start / Stop / Continue” + “What went well / What didn’t.”

🟥 Monthly:

  • My co-founder and I do a higher-level reflection on progress and strategy. We ask:

    • Are we headed in the right direction?

    • Do we need to course-correct?

It’s not rigid—just a cadence to keep feedback flowing, ideas alive, and execution grounded.

The Most Important Lesson I Learnt

Choose your co-founder very wisely.

I essentially YOLO-ed into this one because I had a good vibe—he’s a great person—but we hadn’t figured out how we worked together yet. Early on, the split was simple: I handled sales and business, he handled tech. But now, it’s more blended: he’s doing more strategy, I’m doing more coding and customer calls.

A startup is like setting sail without a map—you better know your crew well, or at least be ready to learn quickly together. We’d known each other digitally for a couple of years, and we’d built stuff before, but it was more transactional: I sold, he built. That doesn’t fly in a startup. You can’t just hand things off—everyone has to be involved in everything.

Looking back, if we’d invested in learning how to work together earlier, we’d probably be further along. But now we’re doing that—and we’re lucky to have the runway and time to make it work.

*Editor’s Note: Zenbase has since pivoted from Automated and Optimised Prompt Engineering and Model Selection to focusing on building AI that captures and amplifies human scientific reasoning—unlocking hidden insights in decades of research to transform scientific discovery. Nonetheless, this article remains valuable for its startup teachings and insights, even though it discusses their former focus.