drss

Alessio will be at AWS re:Invent next week and hosting a casual coffee meetup on Wednesday, RSVP here! And subscribe to our calendar for our Singapore, NeurIPS, and all upcoming meetups!

We are still taking questions for our next big recap episode! Submit questions and messages on Speakpipe here for a chance to appear on the show!

If you've been following the AI agents space, you have heard of Lindy AI; while founder Flo Crivello is hesitant to call it "blowing up," when folks like Andrew Wilkinson start obsessing over your product, you're definitely onto something.

In our latest episode, Flo walked us through Lindy's evolution from late 2022 to now, revealing some design choices about agent platform design that go against conventional wisdom in the space.

The Great Reset: From Text Fields to Rails

Remember late 2022? Everyone was "LLM-pilled," believing that if you just gave a language model enough context and tools, it could do anything. Lindy 1.0 followed this pattern:

* Big prompt field ✅

* Bunch of tools ✅

* Prayer to the LLM gods ✅

Fast forward to today, and Lindy 2.0 looks radically different. As Flo put it (~17:00 in the episode): "The more you can put your agent on rails, one, the more reliable it's going to be, obviously, but two, it's also going to be easier to use for the user."

Instead of a giant, intimidating text field, users now build workflows visually:

* Trigger (e.g., "Zendesk ticket received")

* Required actions (e.g., "Check knowledge base")

* Response generation

This isn't just a UI change - it's a fundamental rethinking of how to make AI agents reliable. As Swyx noted during our discussion: "Put Shoggoth in a box and make it a very small, minimal viable box. Everything else should be traditional if-this-then-that software."

The Surprising Truth About Model Limitations

Here's something that might shock folks building in the space: with Claude 3.5 Sonnet, the model is no longer the bottleneck. Flo's exact words (~31:00): "It is actually shocking the extent to which the model is no longer the limit. It was the limit a year ago. It was too expensive. The context window was too small."

Some context: Lindy started when context windows were 4K tokens. Today, their system prompt alone is larger than that. But what's really interesting is what this means for platform builders:

* Raw capabilities aren't the constraint anymore

* Integration quality matters more than model performance

* User experience and workflow design are the new bottlenecks

The Search Engine Parallel: Why Horizontal Platforms Might Win

One of the spiciest takes from our conversation was Flo's thesis on horizontal vs. vertical agent platforms. He draws a fascinating parallel to search engines (~56:00):

"I find it surprising the extent to which a horizontal search engine has won... You go through Google to search Reddit. You go through Google to search Wikipedia... search in each vertical has more in common with search than it does with each vertical."

His argument: agent platforms might follow the same pattern because:

* Agents across verticals share more commonalities than differences

* There's value in having agents that can work together under one roof

* The R&D cost of getting agents right is better amortized across use cases

This might explain why we're seeing early vertical AI companies starting to expand horizontally. The core agent capabilities - reliability, context management, tool integration - are universal needs.

What This Means for Builders

If you're building in the AI agents space, here are the key takeaways:

* Constrain First: Rather than maximizing capabilities, focus on reliable execution within narrow bounds

* Integration Quality Matters: With model capabilities plateauing, your competitive advantage lies in how well you integrate with existing tools

* Memory Management is Key: Flo revealed they actively prune agent memories - even with larger context windows, not all memories are useful

* Design for Discovery: Lindy's visual workflow builder shows how important interface design is for adoption

The Meta Layer

There's a broader lesson here about AI product development. Just as Lindy evolved from "give the LLM everything" to "constrain intelligently," we might see similar evolution across the AI tooling space. The winners might not be those with the most powerful models, but those who best understand how to package AI capabilities in ways that solve real problems reliably.

Full Video Podcast

Flo’s talk at AI Engineer Summit

Chapters

* 00:00:00 Introductions

* 00:04:05 AI engineering and deterministic software

* 00:08:36 Lindys demo

* 00:13:21 Memory management in AI agents

* 00:18:48 Hierarchy and collaboration between Lindys

* 00:21:19 Vertical vs. horizontal AI tools

* 00:24:03 Community and user engagement strategies

* 00:26:16 Rickrolling incident with Lindy

* 00:28:12 Evals and quality control in AI systems

* 00:31:52 Model capabilities and their impact on Lindy

* 00:39:27 Competition and market positioning

* 00:42:40 Relationship between Factorio and business strategy

* 00:44:05 Remote work vs. in-person collaboration

* 00:49:03 Europe vs US Tech

* 00:58:59 Testing the Overton window and free speech

* 01:04:20 Balancing AI safety concerns with business innovation

Show Notes

* Dust

* SB1047

* Seeing Like a State

Transcript

Alessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.

Swyx [00:00:12]: Hey, and today we're joined in the studio by Florent Crivello. Welcome.

Flo [00:00:15]: Hey, yeah, thanks for having me.

Swyx [00:00:17]: Also known as Altimore. I always wanted to ask, what is Altimore?

Flo [00:00:21]: It was the name of my character when I was playing Dungeons & Dragons. Always. I was like 11 years old.

Swyx [00:00:26]: What was your classes?

Flo [00:00:27]: I was an elf. I was a magician elf.

Swyx [00:00:30]: Well, you're still spinning magic. Right now, you're a solo founder and CEO of Lindy.ai. What is Lindy?

Flo [00:00:36]: Yeah, we are a no-code platform letting you build your own AI agents easily. So you can think of we are to LangChain as Airtable is to MySQL. Like you can just pin up AI agents super easily by clicking around and no code required. You don't have to be an engineer and you can automate business workflows that you simply could not automate before in a few minutes.

Swyx [00:00:55]: You've been in our orbit a few times. I think you spoke at our Latent Space anniversary. You spoke at my summit, the first summit, which was a really good keynote. And most recently, like we actually already scheduled this podcast before this happened. But Andrew Wilkinson was like, I'm obsessed by Lindy. He's just created a whole bunch of agents. So basically, why are you blowing up?

Flo [00:01:16]: Well, thank you. I think we are having a little bit of a moment. I think it's a bit premature to say we're blowing up. But why are things going well? We revamped the product majorly. We called it Lindy 2.0. I would say we started working on that six months ago. We've actually not really announced it yet. It's just, I guess, I guess that's what we're doing now. And so we've basically been cooking for the last six months, like really rebuilding the product from scratch. I think I'll list you, actually, the last time you tried the product, it was still Lindy 1.0. Oh, yeah. If you log in now, the platform looks very different. There's like a ton more features. And I think one realization that we made, and I think a lot of folks in the agent space made the same realization, is that there is such a thing as too much of a good thing. I think many people, when they started working on agents, they were very LLM peeled and chat GPT peeled, right? They got ahead of themselves in a way, and us included, and they thought that agents were actually, and LLMs were actually more advanced than they actually were. And so the first version of Lindy was like just a giant prompt and a bunch of tools. And then the realization we had was like, hey, actually, the more you can put your agent on Rails, one, the more reliable it's going to be, obviously, but two, it's also going to be easier to use for the user, because you can really, as a user, you get, instead of just getting this big, giant, intimidating text field, and you type words in there, and you have no idea if you're typing the right word or not, here you can really click and select step by step, and tell your agent what to do, and really give as narrow or as wide a guardrail as you want for your agent. We started working on that. We called it Lindy on Rails about six months ago, and we started putting it into the hands of users over the last, I would say, two months or so, and I think things really started going pretty well at that point. The agent is way more reliable, way easier to set up, and we're already seeing a ton of new use cases pop up.

Swyx [00:03:00]: Yeah, just a quick follow-up on that. You launched the first Lindy in November last year, and you were already talking about having a DSL, right? I remember having this discussion with you, and you were like, it's just much more reliable. Is this still the DSL under the hood? Is this a UI-level change, or is it a bigger rewrite?

Flo [00:03:17]: No, it is a much bigger rewrite. I'll give you a concrete example. Suppose you want to have an agent that observes your Zendesk tickets, and it's like, hey, every time you receive a Zendesk ticket, I want you to check my knowledge base, so it's like a RAG module and whatnot, and then answer the ticket. The way it used to work with Lindy before was, you would type the prompt asking it to do that. You check my knowledge base, and so on and so forth. The problem with doing that is that it can always go wrong. You're praying the LLM gods that they will actually invoke your knowledge base, but I don't want to ask it. I want it to always, 100% of the time, consult the knowledge base after it receives a Zendesk ticket. And so with Lindy, you can actually have the trigger, which is Zendesk ticket received, have the knowledge base consult, which is always there, and then have the agent. So you can really set up your agent any way you want like that.

Swyx [00:04:05]: This is something I think about for AI engineering as well, which is the big labs want you to hand over everything in the prompts, and only code of English, and then the smaller brains, the GPU pours, always want to write more code to make things more deterministic and reliable and controllable. One way I put it is put Shoggoth in a box and make it a very small, the minimal viable box. Everything else should be traditional, if this, then that software.

Flo [00:04:29]: I love that characterization, put the Shoggoth in the box. Yeah, we talk about using as much AI as necessary and as little as possible.

Alessio [00:04:37]: And what was the choosing between kind of like this drag and drop, low code, whatever, super code-driven, maybe like the Lang chains, auto-GPT of the world, and maybe the flip side of it, which you don't really do, it's like just text to agent, it's like build the workflow for me. Like what have you learned actually putting this in front of users and figuring out how much do they actually want to add it versus like how much, you know, kind of like Ruby on Rails instead of Lindy on Rails, it's kind of like, you know, defaults over configuration.

Flo [00:05:06]: I actually used to dislike when people said, oh, text is not a great interface. I was like, ah, this is such a mid-take, I think text is awesome. And I've actually come around, I actually sort of agree now that text is really not great. I think for people like you and me, because we sort of have a mental model, okay, when I type a prompt into this text box, this is what it's going to do, it's going to map it to this kind of data structure under the hood and so forth. I guess it's a little bit blackmailing towards humans. You jump on these calls with humans and you're like, here's a text box, this is going to set up an agent for you, do it. And then they type words like, I want you to help me put order in my inbox. Oh, actually, this is a good one. This is actually a good one. What's a bad one? I would say 60 or 70% of the prompts that people type don't mean anything. Me as a human, as AGI, I don't understand what they mean. I don't know what they mean. It is actually, I think whenever you can have a GUI, it is better than to have just a pure text interface.

Alessio [00:05:58]: And then how do you decide how much to expose? So even with the tools, you have Slack, you have Google Calendar, you have Gmail. Should people by default just turn over access to everything and then you help them figure out what to use? I think that's the question. When I tried to set up Slack, it was like, hey, give me access to all channels and everything, which for the average person probably makes sense because you don't want to re-prompt them every time you add new channels. But at the same time, for maybe the more sophisticated enterprise use cases, people are like, hey, I want to really limit what you have access to. How do you kind of thread that balance?

Flo [00:06:35]: The general philosophy is we ask for the least amount of permissions needed at any given moment. I don't think Slack, I could be mistaken, but I don't think Slack lets you request permissions for just one channel. But for example, for Google, obviously there are hundreds of scopes that you could require for Google. There's a lot of scopes. And sometimes it's actually painful to set up your Lindy because you're going to have to ask Google and add scopes five or six times. We've had sessions like this. But that's what we do because, for example, the Lindy email drafter, she's going to ask you for your authorization once for, I need to be able to read your email so I can draft a reply, and then another time for I need to be able to write a draft for them. We just try to do it very incrementally like that.

Alessio [00:07:15]: Do you think OAuth is just overall going to change? I think maybe before it was like, hey, we need to set up OAuth that humans only want to kind of do once. So we try to jam-pack things all at once versus what if you could on-demand get different permissions every time from different parts? Do you ever think about designing things knowing that maybe AI will use it instead of humans will use it? Yeah, for sure.

Flo [00:07:37]: One pattern we've started to see is people provisioning accounts for their AI agents. And so, in particular, Google Workspace accounts. So, for example, Lindy can be used as a scheduling assistant. So you can just CC her to your emails when you're trying to find time with someone. And just like a human assistant, she's going to go back and forth and offer other abilities and so forth. Very often, people don't want the other party to know that it's an AI. So it's actually funny. They introduce delays. They ask the agent to wait before replying, so it's not too obvious that it's an AI. And they provision an account on Google Suite, which costs them like $10 a month or something like that. So we're seeing that pattern more and more. I think that does the job for now. I'm not optimistic on us actually patching OAuth. Because I agree with you, ultimately, we would want to patch OAuth because the new account thing is kind of a clutch. It's really a hack. You would want to patch OAuth to have more granular access control and really be able to put your sugar in the box. I'm not optimistic on us doing that before AGI, I think. That's a very close timeline.

Swyx [00:08:36]: I'm mindful of talking about a thing without showing it. And we already have the setup to show it. Why don't we jump into a screen share? For listeners, you can jump on the YouTube and like and subscribe. But also, let's have a look at how you show off Lindy. Yeah, absolutely.

Flo [00:08:51]: I'll give an example of a very simple Lindy and then I'll graduate to a much more complicated one. A super simple Lindy that I have is, I unfortunately bought some investment properties in the south of France. It was a really, really bad idea. And I put them on a Holydew, which is like the French Airbnb, if you will. And so I received these emails from time to time telling me like, oh, hey, you made 200 bucks. Someone booked your place. When I receive these emails, I want to log this reservation in a spreadsheet. Doing this without an AI agent or without AI in general is a pain in the butt because you must write an HTML parser for this email. And so it's just hard. You may not be able to do it and it's going to break the moment the email changes. By contrast, the way it works with Lindy, it's really simple. It's two steps. It's like, okay, I receive an email. If it is a reservation confirmation, I have this filter here. Then I append a row to this spreadsheet. And so this is where you can see the AI part where the way this action is configured here, you see these purple fields on the right. Each of these fields is a prompt. And so I can say, okay, you extract from the email the day the reservation begins on. You extract the amount of the reservation. You extract the number of travelers of the reservation. And now you can see when I look at the task history of this Lindy, it's really simple. It's like, okay, you do this and boom, appending this row to this spreadsheet. And this is the information extracted. So effectively, this node here, this append row node is a mini agent. It can see everything that just happened. It has context over the task and it's appending the row. And then it's going to send a reply to the thread. That's a very simple example of an agent.

Swyx [00:10:34]: A quick follow-up question on this one while we're still on this page. Is that one call? Is that a structured output call? Yeah. Okay, nice. Yeah.

Flo [00:10:41]: And you can see here for every node, you can configure which model you want to power the node. Here I use cloud. For this, I use GPT-4 Turbo. Much more complex example, my meeting recorder. It looks very complex because I've added to it over time, but at a high level, it's really simple. It's like when a meeting begins, you record the meeting. And after the meeting, you send me a summary and you send me coaching notes. So I receive, like my Lindy is constantly coaching me. And so you can see here in the prompt of the coaching notes, I've told it, hey, you know, was I unnecessarily confrontational at any point? I'm French, so I have to watch out for that. Or not confrontational enough. Should I have double-clicked on any issue, right? So I can really give it exactly the kind of coaching that I'm expecting. And then the interesting thing here is, like, you can see the agent here, after it sent me these coaching notes, moves on. And it does a bunch of other stuff. So it goes on Slack. It disseminates the notes on Slack. It does a bunch of other stuff. But it's actually able to backtrack and resume the automation at the coaching notes email if I responded to that email. So I'll give a super concrete example. This is an actual coaching feedback that I received from Lindy. She was like, hey, this was a sales call I had with a customer. And she was like, I found your explanation of Lindy too technical. And I was able to follow up and just ask a follow-up question in the thread here. And I was like, why did you find too technical about my explanation? And Lindy restored the context. And so she basically picked up the automation back up here in the tree. And she has all of the context of everything that happened, including the meeting in which I was. So she was like, oh, you used the words deterministic and context window and agent state. And that concept exists at every level for every channel and every action that Lindy takes. So another example here is, I mentioned she also disseminates the notes on Slack. So this was a meeting where I was not, right? So this was a teammate. He's an indie meeting recorder, posts the meeting notes in this customer discovery channel on Slack. So you can see, okay, this is the onboarding call we had. This was the use case. Look at the questions. How do I make Lindy slower? How do I add delays to make Lindy slower? And I was able, in the Slack thread, to ask follow-up questions like, oh, what did we answer to these questions? And it's really handy because I know I can have this sort of interactive Q&A with these meetings. It means that very often now, I don't go to meetings anymore. I just send my Lindy. And instead of going to like a 60-minute meeting, I have like a five-minute chat with my Lindy afterwards. And she just replied. She was like, well, this is what we replied to this customer. And I can just be like, okay, good job, Jack. Like, no notes about your answers. So that's the kind of use cases people have with Lindy. It's a lot of like, there's a lot of sales automations, customer support automations, and a lot of this, which is basically personal assistance automations, like meeting scheduling and so forth.

Alessio [00:13:21]: Yeah, and I think the question that people might have is memory. So as you get coaching, how does it track whether or not you're improving? You know, if these are like mistakes you made in the past, like, how do you think about that?

Flo [00:13:31]: Yeah, we have a memory module. So I'll show you my meeting scheduler, Lindy, which has a lot of memories because by now I've used her for so long. And so every time I talk to her, she saves a memory. If I tell her, you screwed up, please don't do this. So you can see here, oh, it's got a double memory here. This is the meeting link I have, or this is the address of the office. If I tell someone to meet me at home, this is the address of my place. This is the code. I guess we'll have to edit that out. This is not the code of my place. No dogs. Yeah, so Lindy can just manage her own memory and decide when she's remembering things between executions. Okay.

Swyx [00:14:11]: I mean, I'm just going to take the opportunity to ask you, since you are the creator of this thing, how come there's so few memories, right? Like, if you've been using this for two years, there should be thousands of thousands of things. That is a good question.

Flo [00:14:22]: Agents still get confused if they have too many memories, to my point earlier about that. So I just am out of a call with a member of the Lama team at Meta, and we were chatting about Lindy, and we were going into the system prompt that we sent to Lindy, and all of that stuff. And he was amazed, and he was like, it's a miracle that it's working, guys. He was like, this kind of system prompt, this does not exist, either pre-training or post-training. These models were never trained to do this kind of stuff. It's a miracle that they can be agents at all. And so what I do, I actually prune the memories. You know, it's actually something I've gotten into the habit of doing from back when we had GPT 3.5, being Lindy agents. I suspect it's probably not as necessary in the Cloud 3.5 Sunette days, but I prune the memories. Yeah, okay.

Swyx [00:15:05]: The reason is because I have another assistant that also is recording and trying to come up with facts about me. It comes up with a lot of trivial, useless facts that I... So I spend most of my time pruning. Actually, it's not super useful. I'd much rather have high-quality facts that it accepts. Or maybe I was even thinking, were you ever tempted to add a wake word to only memorize this when I say memorize this? And otherwise, don't even bother.

Flo [00:15:30]: I have a Lindy that does this. So this is my inbox processor, Lindy. It's kind of beefy because there's a lot of different emails. But somewhere in here,

Swyx [00:15:38]: there is a rule where I'm like,

Flo [00:15:39]: aha, I can email my inbox processor, Lindy. It's really handy. So she has her own email address. And so when I process my email inbox, I sometimes forward an email to her. And it's a newsletter, or it's like a cold outreach from a recruiter that I don't care about, or anything like that. And I can give her a rule. And I can be like, hey, this email I want you to archive, moving forward. Or I want you to alert me on Slack when I have this kind of email. It's really important. And so you can see here, the prompt is, if I give you a rule about a kind of email, like archive emails from X, save it as a new memory. And I give it to the memory saving skill. And yeah.

Swyx [00:16:13]: One thing that just occurred to me, so I'm a big fan of virtual mailboxes. I recommend that everybody have a virtual mailbox. You could set up a physical mail receive thing for Lindy. And so then Lindy can process your physical mail.

Flo [00:16:26]: That's actually a good idea. I actually already have something like that. I use like health class mail. Yeah. So yeah, most likely, I can process my physical mail. Yeah.

Swyx [00:16:35]: And then the other product's idea I have, looking at this thing, is people want to brag about the complexity of their Lindys. So this would be like a 65 point Lindy, right?

Flo [00:16:43]: What's a 65 point?

Swyx [00:16:44]: Complexity counting. Like how many nodes, how many things, how many conditions, right? Yeah.

Flo [00:16:49]: This is not the most complex one. I have another one. This designer recruiter here is kind of beefy as well. Right, right, right. So I'm just saying,

Swyx [00:16:56]: let people brag. Let people be super users. Oh, right.

Flo [00:16:59]: Give them a score. Give them a score.

Swyx [00:17:01]: Then they'll just be like, okay, how high can you make this score?

Flo [00:17:04]: Yeah, that's a good point. And I think that's, again, the beauty of this on-rails phenomenon. It's like, think of the equivalent, the prompt equivalent of this Lindy here, for example, that we're looking at. It'd be monstrous. And the odds that it gets it right are so low. But here, because we're really holding the agent's hand step by step by step, it's actually super reliable. Yeah.

Swyx [00:17:22]: And is it all structured output-based? Yeah. As far as possible? Basically. Like, there's no non-structured output?

Flo [00:17:27]: There is. So, for example, here, this AI agent step, right, or this send message step, sometimes it gets to... That's just plain text.

Swyx [00:17:35]: That's right.

Flo [00:17:36]: Yeah. So I'll give you an example. Maybe it's TMI. I'm having blood pressure issues these days. And so this Lindy here, I give it my blood pressure readings, and it updates a log that I have of my blood pressure that it sends to my doctor.

Swyx [00:17:49]: Oh, so every Lindy comes with a to-do list?

Flo [00:17:52]: Yeah. Every Lindy has its own task history. Huh. Yeah. And so you can see here, this is my main Lindy, my personal assistant, and I've told it, where is this? There is a point where I'm like, if I am giving you a health-related fact, right here, I'm giving you health information, so then you update this log that I have in this Google Doc, and then you send me a message. And you can see, I've actually not configured this send message node. I haven't told it what to send me a message for. Right? And you can see, it's actually lecturing me. It's like, I'm giving it my blood pressure ratings. It's like, hey, it's a bit high. Here are some lifestyle changes you may want to consider.

Alessio [00:18:27]: I think maybe this is the most confusing or new thing for people. So even I use Lindy and I didn't even know you could have multiple workflows in one Lindy. I think the mental model is kind of like the Zapier workflows. It starts and it ends. It doesn't choose between. How do you think about what's a Lindy versus what's a sub-function of a Lindy? Like, what's the hierarchy?

Flo [00:18:48]: Yeah. Frankly, I think the line is a little arbitrary. It's kind of like when you code, like when do you start to create a new class versus when do you overload your current class. I think of it in terms of like jobs to be done and I think of it in terms of who is the Lindy serving. This Lindy is serving me personally. It's really my day-to-day Lindy. I give it a bunch of stuff, like very easy tasks. And so this is just the Lindy I go to. Sometimes when a task is really more specialized, so for example, I have this like summarizer Lindy or this designer recruiter Lindy. These tasks are really beefy. I wouldn't want to add this to my main Lindy, so I just created a separate Lindy for it. Or when it's a Lindy that serves another constituency, like our customer support Lindy, I don't want to add that to my personal assistant Lindy. These are two very different Lindys.

Alessio [00:19:31]: And you can call a Lindy from within another Lindy. That's right. You can kind of chain them together.

Flo [00:19:36]: Lindys can work together, absolutely.

Swyx [00:19:38]: A couple more things for the video portion. I noticed you have a podcast follower. We have to ask about that. What is that?

Flo [00:19:46]: So this one wakes me up every... So wakes herself up every week. And she sends me... So she woke up yesterday, actually. And she searches for Lenny's podcast. And she looks for like the latest episode on YouTube. And once she finds it, she transcribes the video and then she sends me the summary by email. I don't listen to podcasts as much anymore. I just like read these summaries. Yeah.

Alessio [00:20:09]: We should make a latent space Lindy. Marketplace.

Swyx [00:20:12]: Yeah. And then you have a whole bunch of connectors. I saw the list briefly. Any interesting one? Complicated one that you're proud of? Anything that you want to just share? Connector stories.

Flo [00:20:23]: So many of our workflows are about meeting scheduling. So we had to build some very open unity tools around meeting scheduling. So for example, one that is surprisingly hard is this find available times action. You would not believe... This is like a thousand lines of code or something. It's just a very beefy action. And you can pass it a bunch of parameters about how long is the meeting? When does it start? When does it end? What are the meetings? The weekdays in which I meet? How many time slots do you return? What's the buffer between my meetings? It's just a very, very, very complex action. I really like our GitHub action. So we have a Lindy PR reviewer. And it's really handy because anytime any bug happens... So the Lindy reads our guidelines on Google Docs. By now, the guidelines are like 40 pages long or something. And so every time any new kind of bug happens, we just go to the guideline and we add the lines. Like, hey, this has happened before. Please watch out for this category of bugs. And it's saving us so much time every day.

Alessio [00:21:19]: There's companies doing PR reviews. Where does a Lindy start? When does a company start? Or maybe how do you think about the complexity of these tasks when it's going to be worth having kind of like a vertical standalone company versus just like, hey, a Lindy is going to do a good job 99% of the time?

Flo [00:21:34]: That's a good question. We think about this one all the time. I can't say that we've really come up with a very crisp articulation of when do you want to use a vertical tool versus when do you want to use a horizontal tool. I think of it as very similar to the internet. I find it surprising the extent to which a horizontal search engine has won. But I think that Google, right? But I think the even more surprising fact is that the horizontal search engine has won in almost every vertical, right? You go through Google to search Reddit. You go through Google to search Wikipedia. I think maybe the biggest exception is e-commerce. Like you go to Amazon to search e-commerce, but otherwise you go through Google. And I think that the reason for that is because search in each vertical has more in common with search than it does with each vertical. And search is so expensive to get right. Like Google is a big company that it makes a lot of sense to aggregate all of these different use cases and to spread your R&D budget across all of these different use cases. I have a thesis, which is, it's a really cool thesis for Lindy, is that the same thing is true for agents. I think that by and large, in a lot of verticals, agents in each vertical have more in common with agents than they do with each vertical. I also think there are benefits in having a single agent platform because that way your agents can work together. They're all like under one roof. That way you only learn one platform and so you can create agents for everything that you want. And you don't have to like pay for like a bunch of different platforms and so forth. So I think ultimately, it is actually going to shake out in a way that is similar to search in that search is everywhere on the internet. Every website has a search box, right? So there's going to be a lot of vertical agents for everything. I think AI is going to completely penetrate every category of software. But then I also think there are going to be a few very, very, very big horizontal agents that serve a lot of functions for people.

Swyx [00:23:14]: That is actually one of the questions that we had about the agent stuff. So I guess we can transition away from the screen and I'll just ask the follow-up, which is, that is a hot topic. You're basically saying that the current VC obsession of the day, which is vertical AI enabled SaaS, is mostly not going to work out. And then there are going to be some super giant horizontal SaaS.

Flo [00:23:34]: Oh, no, I'm not saying it's either or. Like SaaS today, vertical SaaS is huge and there's also a lot of horizontal platforms. If you look at like Airtable or Notion, basically the entire no-code space is very horizontal. I mean, Loom and Zoom and Slack, there's a lot of very horizontal tools out there. Okay.

Swyx [00:23:49]: I was just trying to get a reaction out of you for hot takes. Trying to get a hot take.

Flo [00:23:54]: No, I also think it is natural for the vertical solutions to emerge first because it's just easier to build. It's just much, much, much harder to build something horizontal. Cool.

Swyx [00:24:03]: Some more Lindy-specific questions. So we covered most of the top use cases and you have an academy. That was nice to see. I also see some other people doing it for you for free. So like Ben Spites is doing it and then there's some other guy who's also doing like lessons. Yeah. Which is kind of nice, right? Yeah, absolutely. You don't have to do any of that.

Flo [00:24:20]: Oh, we've been seeing it more and more on like LinkedIn and Twitter, like people posting their Lindys and so forth.

Swyx [00:24:24]: I think that's the flywheel that you built the platform where creators see value in allying themselves to you. And so then, you know, your incentive is to make them successful so that they can make other people successful and then it just drives more and more engagement. Like it's earned media. Like you don't have to do anything.

Flo [00:24:39]: Yeah, yeah. I mean, community is everything.

Swyx [00:24:41]: Are you doing anything special there? Any big wins?

Flo [00:24:44]: We have a Slack community that's pretty active. I can't say we've invested much more than that so far.

Swyx [00:24:49]: I would say from having, so I have some involvement in the no-code community. I would say that Webflow going very hard after no-code as a category got them a lot more allies than just the people using Webflow. So it helps you to grow the community beyond just Lindy. And I don't know what this is called. Maybe it's just no-code again. Maybe you want to call it something different. But there's definitely an appetite for this and you are one of a broad category, right? Like just before you, we had Dust and, you know, they're also kind of going after a similar market. Zapier obviously is not going to try to also compete with you. Yeah. There's no question there. It's just like a reaction about community. Like I think a lot about community. Lanespace is growing the community of AI engineers. And I think you have a slightly different audience of, I don't know what.

Flo [00:25:33]: Yeah. I think the no-code tinkerers is the community. Yeah. It is going to be the same sort of community as what Webflow, Zapier, Airtable, Notion to some extent.

Swyx [00:25:43]: Yeah. The framing can be different if you were, so I think tinkerers has this connotation of not serious or like small. And if you framed it to like no-code EA, we're exclusively only for CEOs with a certain budget, then you just have, you tap into a different budget.

Flo [00:25:58]: That's true. The problem with EA is like, the CEO has no willingness to actually tinker and play with the platform.

Swyx [00:26:05]: Maybe Andrew's doing that. Like a lot of your biggest advocates are CEOs, right?

Flo [00:26:09]: A solopreneur, you know, small business owners, I think Andrew is an exception. Yeah. Yeah, yeah, he is.

Swyx [00:26:14]: He's an exception in many ways. Yep.

Alessio [00:26:16]: Just before we wrap on the use cases, is Rick rolling your customers? Like a officially supported use case or maybe tell that story?

Flo [00:26:24]: It's one of the main jobs to be done, really. Yeah, we woke up recently, so we have a Lindy obviously doing our customer support and we do check after the Lindy. And so we caught this email exchange where someone was asking Lindy for video tutorials. And at the time, actually, we did not have video tutorials. We do now on the Lindy Academy. And Lindy responded to the email. It's like, oh, absolutely, here's a link. And we were like, what? Like, what kind of link did you send? And so we clicked on the link and it was a recall. We actually reacted fast enough that the customer had not yet opened the email. And so we reacted immediately. Like, oh, hey, actually, sorry, this is the right link. And so the customer never reacted to the first link. And so, yeah, I tweeted about that. It went surprisingly viral. And I checked afterwards in the logs. We did like a database query and we found, I think, like three or four other instances of it having happened before.

Swyx [00:27:12]: That's surprisingly low.

Flo [00:27:13]: It is low. And we fixed it across the board by just adding a line to the system prompt that's like, hey, don't recall people, please don't recall.

Swyx [00:27:21]: Yeah, yeah, yeah. I mean, so, you know, you can explain it retroactively, right? Like, that YouTube slug has been pasted in so many different corpuses that obviously it learned to hallucinate that.

Alessio [00:27:31]: And it pretended to be so many things. That's the thing.

Swyx [00:27:34]: I wouldn't be surprised if that takes one token. Like, there's this one slug in the tokenizer and it's just one token.

Flo [00:27:41]: That's the idea of a YouTube video.

Swyx [00:27:43]: Because it's used so much, right? And you have to basically get it exactly correct. It's probably not. That's a long speech.

Flo [00:27:52]: It would have been so good.

Alessio [00:27:55]: So this is just a jump maybe into evals from here. How could you possibly come up for an eval that says, make sure my AI does not recall my customer? I feel like when people are writing evals, that's not something that they come up with. So how do you think about evals when it's such like an open-ended problem space?

Flo [00:28:12]: Yeah, it is tough. We built quite a bit of infrastructure for us to create evals in one click from any conversation history. So we can point to a conversation and we can be like, in one click we can turn it into effectively a unit test. It's like, this is a good conversation. This is how you're supposed to handle things like this. Or if it's a negative example, then we modify a little bit the conversation after generating the eval. So it's very easy for us to spin up this kind of eval.

Alessio [00:28:36]: Do you use an off-the-shelf tool which is like Brain Trust on the podcast? Or did you just build your own?

Flo [00:28:41]: We unfortunately built our own. We're most likely going to switch to Brain Trust. Well, when we built it, there was nothing. Like there was no eval tool, frankly. I mean, we started this project at the end of 2022. It was like, it was very, very, very early. I wouldn't recommend it to build your own eval tool. There's better solutions out there and our eval tool breaks all the time and it's a nightmare to maintain. And that's not something we want to be spending our time on.

Swyx [00:29:04]: I was going to ask that basically because I think my first conversations with you about Lindy was that you had a strong opinion that everyone should build their own tools. And you were very proud of your evals. You're kind of showing off to me like how many evals you were running, right?

Flo [00:29:16]: Yeah, I think that was before all of these tools came around. I think the ecosystem has matured a fair bit.

Swyx [00:29:21]: What is one thing that Brain Trust has nailed that you always struggled to do?

Flo [00:29:25]: We're not using them yet, so I couldn't tell. But from what I've gathered from the conversations I've had, like they're doing what we do with our eval tool, but better.

Swyx [00:29:33]: And like they do it, but also like 60 other companies do it, right? So I don't know how to shop apart from brand. Word of mouth.

Flo [00:29:41]: Same here.

Swyx [00:29:42]: Yeah, like evals or Lindys, there's two kinds of evals, right? Like in some way, you don't have to eval your system as much because you've constrained the language model so much. And you can rely on open AI to guarantee that the structured outputs are going to be good, right? We had Michelle sit where you sit and she explained exactly how they do constraint grammar sampling and all that good stuff. So actually, I think it's more important for your customers to eval their Lindys than you evaling your Lindy platform because you just built the platform. You don't actually need to eval that much.

Flo [00:30:14]: Yeah. In an ideal world, our customers don't need to care about this. And I think the bar is not like, look, it needs to be at 100%. I think the bar is it needs to be better than a human. And for most use cases we serve today, it is better than a human, especially if you put it on Rails.

Swyx [00:30:30]: Is there a limiting factor of Lindy at the business? Like, is it adding new connectors? Is it adding new node types? Like how do you prioritize what is the most impactful to your company?

Flo [00:30:41]: Yeah. The raw capabilities for sure are a big limit. It is actually shocking the extent to which the model is no longer the limit. It was the limit a year ago. It was too expensive. The context window was too small. It's kind of insane that we started building this when the context windows were like 4,000 tokens. Like today, our system prompt is more than 4,000 tokens. So yeah, the model is actually very much not a limit anymore. It almost gives me pause because I'm like, I want the model to be a limit. And so no, the integrations are ones, the core capabilities are ones. So for example, we are investing in a system that's basically, I call it like the, it's a J hack. Give me these names, like the poor man's RLHF. So you can turn on a toggle on any step of your Lindy workflow to be like, ask me for confirmation before you actually execute this step. So it's like, hey, I receive an email, you send a reply, ask me for confirmation before actually sending it. And so today you see the email that's about to get sent and you can either approve, deny, or change it and then approve. And we are making it so that when you make a change, we are then saving this change that you're making or embedding it in the vector database. And then we are retrieving these examples for future tasks and injecting them into the context window. So that's the kind of capability that makes a huge difference for users. That's the bottleneck today. It's really like good old engineering and product work.

Swyx [00:31:52]: I assume you're hiring. We'll do a call for hiring at the end.

Alessio [00:31:54]: Any other comments on the model side? When did you start feeling like the model was not a bottleneck anymore? Was it 4.0? Was it 3.5? 3.5.

Flo [00:32:04]: 3.5 Sonnet, definitely. I think 4.0 is overhyped, frankly. We don't use 4.0. I don't think it's good for agentic behavior. Yeah, 3.5 Sonnet is when I started feeling that. And then with prompt caching with 3.5 Sonnet, like that fills the cost, cut the cost again. Just cut it in half. Yeah.

Swyx [00:32:21]: Your prompts are... Some of the problems with agentic uses is that your prompts are kind of dynamic, right? Like from caching to work, you need the front prefix portion to be stable.

Flo [00:32:32]: Yes, but we have this append-only ledger paradigm. So every node keeps appending to that ledger and every filled node inherits all the context built up by all the previous nodes. And so we can just decide, like, hey, every X thousand nodes, we trigger prompt caching again.

Swyx [00:32:47]: Oh, so you do it like programmatically, not all the time.

Flo [00:32:50]: No, sorry. Anthropic manages that for us. But basically, it's like, because we keep appending to the prompt, the prompt caching works pretty well.

Alessio [00:32:57]: We have this small podcaster tool that I built for the podcast and I rewrote all of our prompts because I noticed, you know, I was inputting stuff early on. I wonder how much more money OpenAN and Anthropic are making just because people don't rewrite their prompts to be like static at the top and like dynamic at the bottom.

Flo [00:33:13]: I think that's the remarkable thing about what we're having right now. It's insane that these companies are routinely cutting their costs by two, four, five. Like, they basically just apply constraints. They want people to take advantage of these innovations. Very good.

Swyx [00:33:25]: Do you have any other competitive commentary? Commentary? Dust, WordWare, Gumloop, Zapier? If not, we can move on.

Flo [00:33:31]: No comment.

Alessio [00:33:32]: I think the market is,

Flo [00:33:33]: look, I mean, AGI is coming. All right, that's what I'm talking about.

Swyx [00:33:38]: I think you're helping. Like, you're paving the road to AGI.

Flo [00:33:41]: I'm playing my small role. I'm adding my small brick to this giant, giant, giant castle. Yeah, look, when it's here, we are going to, this entire category of software is going to create, it's going to sound like an exaggeration, but it is a fact it is going to create trillions of dollars of value in a few years, right? It's going to, for the first time, we're actually having software directly replace human labor. I see it every day in sales calls. It's like, Lindy is today replacing, like, we talk to even small teams. It's like, oh, like, stop, this is a 12-people team here. I guess we'll set up this Lindy for one or two days, and then we'll have to decide what to do with this 12-people team. And so, yeah. To me, there's this immense uncapped market opportunity. It's just such a huge ocean, and there's like three sharks in the ocean. I'm focused on the ocean more than on the sharks.

Swyx [00:34:25]: So we're moving on to hot topics, like, kind of broadening out from Lindy, but obviously informed by Lindy. What are the high-order bits of good agent design?

Flo [00:34:31]: The model, the model, the model, the model. I think people fail to truly, and me included, they fail to truly internalize the bitter lesson. So for the listeners out there who don't know about it, it's basically like, you just scale the model. Like, GPUs go brr, it's all that matters. I think it also holds for the cognitive architecture. I used to be very cognitive architecture-filled, and I was like, ah, and I was like a critic, and I was like a generator, and all this, and then it's just like, GPUs go brr, like, just like let the model do its job. I think we're seeing it a little bit right now with O1. I'm seeing some tweets that say that the new 3.5 SONNET is as good as O1, but with none of all the crazy...

Swyx [00:35:09]: It beats O1 on some measures. On some reasoning tasks. On AIME, it's still a lot lower. Like, it's like 14 on AIME versus O1, it's like 83.

Flo [00:35:17]: Got it. Right. But even O1 is still the model. Yeah.

Swyx [00:35:22]: Like, there's no cognitive architecture on top of it.

Flo [00:35:23]: You can just wait for O1 to get better.

Alessio [00:35:25]: And so, as a founder, how do you think about that, right? Because now, knowing this, wouldn't you just wait to start Lindy? You know, you start Lindy, it's like 4K context, the models are not that good. It's like, but you're still kind of like going along and building and just like waiting for the models to get better. How do you today decide, again, what to build next, knowing that, hey, the models are going to get better, so maybe we just shouldn't focus on improving our prompt design and all that stuff and just build the connectors instead or whatever? Yeah.

Flo [00:35:51]: I mean, that's exactly what we do. Like, all day, we always ask ourselves, oh, when we have a feature idea or a feature request, we ask ourselves, like, is this the kind of thing that just gets better while we sleep because models get better? I'm reminded, again, when we started this in 2022, we spent a lot of time because we had to around context pruning because 4,000 tokens is really nothing. You really can't do anything with 4,000 tokens. All that work was throwaway work. Like, now it's like it was for nothing, right? Now we just assume that infinite context windows are going to be here in a year or something, a year and a half, and infinitely cheap as well, and dynamic compute is going to be here. Like, we just assume all of these things are going to happen, and so we really focus, our job to be done in the industry is to provide the input and output to the model. I really compare it all the time to the PC and the CPU, right? Apple is busy all day. They're not like a CPU wrapper. They have a lot to build, but they don't, well, now actually they do build the CPU as well, but leaving that aside, they're busy building a laptop. It's just a lot of work to build these things. It's interesting because, like,

Swyx [00:36:45]: for example, another person that we're close to, Mihaly from Repl.it, he often says that the biggest jump for him was having a multi-agent approach, like the critique thing that you just said that you don't need, and I wonder when, in what situations you do need that and what situations you don't. Obviously, the simple answer is for coding, it helps, and you're not coding, except for, are you still generating code? In Indy? Yeah.

Flo [00:37:09]: No, we do. Oh, right. No, no, no, the cognitive architecture changed. We don't, yeah.

Swyx [00:37:13]: Yeah, okay. For you, you're one shot, and you chain tools together, and that's it. And if the user really wants

Flo [00:37:18]: to have this kind of critique thing, you can also edit the prompt, you're welcome to. I have some of my Lindys, I've told them, like, hey, be careful, think step by step about what you're about to do, but that gives you a little bump for some use cases, but, yeah.

Alessio [00:37:30]: What about unexpected model releases? So, Anthropic released computer use today. Yeah. I don't know if many people were expecting computer use to come out today. Do these things make you rethink how to design, like, your roadmap and things like that, or are you just like, hey, look, whatever, that's just, like, a small thing in their, like, AGI pursuit, that, like, maybe they're not even going to support, and, like, it's still better for us to build our own integrations into systems and things like that. Because maybe people will say, hey, look, why am I building all these API integrations

Flo [00:38:02]: when I can just do computer use and never go to the product? Yeah. No, I mean, we did take into account computer use. We were talking about this a year ago or something, like, we've been talking about it as part of our roadmap. It's been clear to us that it was coming, My philosophy about it is anything that can be done with an API must be done by an API or should be done by an API for a very long time. I think it is dangerous to be overly cavalier about improvements of model capabilities. I'm reminded of iOS versus Android. Android was built on the JVM. There was a garbage collector, and I can only assume that the conversation that went down in the engineering meeting room was, oh, who cares about the garbage collector? Anyway, Moore's law is here, and so that's all going to go to zero eventually. Sure, but in the meantime, you are operating on a 400 MHz CPU. It was like the first CPU on the iPhone 1, and it's really slow, and the garbage collector is introducing a tremendous overhead on top of that, especially a memory overhead. For the longest time, and it's really only been recently that Android caught up to iOS in terms of how smooth the interactions were, but for the longest time, Android phones were significantly slower

Swyx [00:39:07]: and laggier

Flo [00:39:08]: and just not feeling as good as iOS devices. Look, when you're talking about modules and magnitude of differences in terms of performance and reliability, which is what we are talking about when we're talking about API use versus computer use, then you can't ignore that, right? And so I think we're going to be in an API use world for a while.

Swyx [00:39:27]: O1 doesn't have API use today. It will have it at some point, and it's on the roadmap. There is a future in which OpenAI goes much harder after your business, your market, than it is today. Like, ChatGPT, it's its own business. All they need to do is add tools to the ChatGPT, and now they're suddenly competing with you. And by the way, they have a GPT store where a bunch of people have already configured their tools to fit with them. Is that a concern?

Flo [00:39:56]: I think even the GPT store, in a way, like the way they architect it, for example, their plug-in systems are actually grateful because we can also use the plug-ins. It's very open. Now, again, I think it's going to be such a huge market. I think there's going to be a lot of different jobs to be done. I know they have a huge enterprise offering and stuff, but today, ChatGPT is a consumer app. And so, the sort of flow detail I showed you, this sort of workflow, this sort of use cases that we're going after, which is like, we're doing a lot of lead generation and lead outreach and all of that stuff. That's not something like meeting recording, like Lindy Today right now joins your Zoom meetings and takes notes, all of that stuff.

Swyx [00:40:34]: I don't see that so far

Flo [00:40:35]: on the OpenAI roadmap.

Swyx [00:40:36]: Yeah, but they do have an enterprise team that we talk to You're hiring GMs?

Flo [00:40:42]: We did.

Swyx [00:40:43]: It's a fascinating way to build a business, right? Like, what should you, as CEO, be in charge of? And what should you basically hire

Flo [00:40:52]: a mini CEO to do? Yeah, that's a good question. I think that's also something we're figuring out. The GM thing was inspired from my days at Uber, where we hired one GM per city or per major geo area. We had like all GMs, regional GMs and so forth. And yeah, Lindy is so horizontal that we thought it made sense to hire GMs to own each vertical and the go-to market of the vertical and the customization of the Lindy templates for these verticals and so forth. What should I own as a CEO? I mean, the canonical reply here is always going to be, you know, you own the fundraising, you own the culture, you own the... What's the rest of the canonical reply? The culture, the fundraising.

Swyx [00:41:29]: I don't know,

Flo [00:41:30]: products. Even that, eventually, you do have to hand out. Yes, the vision, the culture, and the foundation. Well, you've done your job as a CEO. In practice, obviously, yeah, I mean, all day, I do a lot of product work still and I want to keep doing product work for as long as possible.

Swyx [00:41:48]: Obviously, like you're recording and managing the team. Yeah.

Flo [00:41:52]: That one feels like the most automatable part of the job, the recruiting stuff.

Swyx [00:41:56]: Well, yeah. You saw my

Flo [00:41:59]: design your recruiter here. Relationship between Factorio and building Lindy. We actually very often talk about how the business of the future is like a game of Factorio. Yeah. So, in the instance, it's like Slack and you've got like 5,000 Lindys in the sidebar and your job is to somehow manage your 5,000 Lindys. And it's going to be very similar to company building because you're going to look for like the highest leverage way to understand what's going on in your AI company and understand what levels do you have to make impact in that company. So, I think it's going to be very similar to like a human company except it's going to go infinitely faster. Today, in a human company, you could have a meeting with your team and you're like, oh, I'm going to build a facility and, you know, now it's like, okay,

Swyx [00:42:40]: boom, I'm going to spin up 50 designers. Yeah. Like, actually, it's more important that you can clone an existing designer that you know works because the hiring process, you cannot clone someone because every new person you bring in is going to have their own tweaks

Flo [00:42:54]: and you don't want that. Yeah.

Swyx [00:42:56]: That's true. You want an army of mindless drones

Flo [00:42:59]: that all work the same way.

... [Content truncated due to size limits]

Read full article

Agents @ Work: Lindy.ai