From Startup to Exit
Welcome to the Startup to Exit podcast where we bring you world-class entrepreneurs and VCs to share their hard-earned success stories and secrets. This podcast has been brought to you by TiE Seattle. TiE is a global non-profit that focuses on fostering entrepreneurship. TiE Seattle offers a range of programs including the GoVertical Startup Creation Weekend, TiE Entrepreneur Institute, and the TiE Seattle Angel Network. We encourage you to become a TiE member so you can gain access to these great programs. To become a member, please visit www.Seattle.tie.org.
Building Responsible AI for the New Era: A Conversation with Ece Kamar, Director of AI Frontiers Lab at Microsoft Research
In this episode, we sit down with Dr. Ece Kamar, Director of the AI Frontiers Lab at Microsoft Research, and one of the leading voices in Responsible AI.
With over 15 years at Microsoft, Ece has played a pivotal role in shaping how AI systems are designed to collaborate with humans safely and effectively. She shares her journey from earning a PhD at Harvard University, where she researched how AI agents can become human partners, to leading responsible AI efforts at Microsoft and tackling challenges like fairness, reliability, safety, and hallucinations in large-scale models such as GPT-4.
In this insightful conversation, Ece talks about:
- How early research in human-AI collaboration laid the foundation for today’s AI agents
- The evolution of AI through scalable data, deep learning, and generative models
- Her pioneering work on Responsible AI frameworks at Microsoft
- The emerging challenges and opportunities in the era of generative AI
Whether you’re an entrepreneur, technologist, or researcher, this episode offers an inspiring look into how responsible innovation will define the next generation of AI systems. Tune in to learn how human-centered AI is shaping the future - responsibly.
To ensure you don't miss other episodes like this, subscribe to our podcast!
LinkedIn: https://www.linkedin.com/in/ecekamar/
Sponsored by: JPMC Private Banking
Brought to you by TiE Seattle
Hosts: Shirish Nadkarni and Gowri Shankar
Producers: Minee Verma and Eesha Jain
YouTube Channel: https://www.youtube.com/@fromstartuptoexitpodcast
In this conversation, Ece Kamar, a distinguished scientist at Microsoft Research, discusses her journey in AI, the evolution of AI agents, and the impact of generative AI on society. She explains the capabilities of AI agents compared to traditional chatbots, the development of the AutoGen framework for multi-agent orchestration, and the applications of AI in healthcare and enterprises. Kamar emphasizes the importance of APIs for AI interaction, the potential for agents in the physical world, and the ethical considerations surrounding AI development. She also shares insights on the rapid progression of AI technology, the role of open source in innovation, and offers advice for entrepreneurs in the AI space.
SPEAKER_02: Welcome to the Startup to Exit podcast, where we bring you world-class entrepreneurs and VCs to share their hard-earned success stories and secrets. This podcast has been brought to you by TiE Seattle. We encourage you to become a TiE member so you can gain access to these great programs. To become a member, please visit www.seattle.tie.org.
SPEAKER_04: Now, a word from our sponsor. At JPMorgan Private Bank, wealth is understood to be more than just numbers. It's about creating a legacy, achieving dreams, and securing a family's future. With over 3.1 trillion in client assets under management globally, JPMorgan Private Bank is committed to providing customized financial advice that aligns with each client's unique goals. Clients benefit from a personalized approach, working closely with experts in philanthropy, family office management, fiduciary services, and special advisory services. The firm emphasizes building lasting relationships and ensuring that every financial strategy evolves with the needs of clients and their families. JPMorgan Private Bank not only manages wealth, but also helps clients create a meaningful impact. Whether individuals seek to support charitable causes, manage a family legacy, or explore new investment opportunities, the private bank team is there to provide guidance every step of the way. With a presence in key financial markets worldwide, JPMorgan Private Bank offers unparalleled access to global insights and opportunities. Its commitment to excellence and innovation ensures that clients receive the highest level of service and expertise. Time is recognized as precious, and JPMorgan Private Bank values every moment spent with its clients. The mission is to make the financial journey as seamless and rewarding as possible, from the initial consultation to the ongoing management of assets. To learn how JPMorgan Private Bank can help clients achieve their goals, visit privatebank.jpmorgan.com and connect with a dedicated team of specialists. Again, privatebank.jpmorgan.com. Thank you.
SPEAKER_03: Welcome everyone. Today is our next installment of our generative AI series. I'm very pleased to welcome Ece Kamar, who is a distinguished scientist and VP and managing director of the AI Frontiers Lab at Microsoft Research. She has been instrumental in building the responsible AI efforts inside Microsoft. She's also an affiliate faculty member in the Department of Computer Science and Engineering at the University of Washington, and she holds a PhD from Harvard University. So welcome, Ece.
SPEAKER_00: Thanks for having me.
SPEAKER_03: Great. So let's start with your background. Tell us about your journey to Microsoft. What got you interested in Microsoft Research, and what have you been doing for the last 15 years?
SPEAKER_00: My journey started with my PhD work at Harvard under the advisement of Professor Barbara Grosz. And it is really interesting to think back right now because, believe it or not, my PhD thesis in the year 2010 was about AI agents and how AI agents work with people. We were running experiments to understand how we can build agents that people perceive as their partners. We were building decision-making mechanisms for these agents to make decisions about how to best support people, and really understanding the fabric of collaboration. Of course, back in the day we did not have the core technologies we have today, like foundation models, to be able to build these systems at scale the way we are doing it today. But these topics of AI agents and collaboration are not actually new; they have decades of research backing them up, from even earlier than my PhD days. I have always been very much motivated by the question of how to build intelligent systems that can be useful for people. That's where my passion for this line of research came from. And as I was doing my PhD, I also had this desire to work on real-world problems. We are talking about a time when AI was not as mainstream as it is today. And while there were some narrow applications of AI in search that were just emerging back then, its applicability was quite limited. So as I was doing my PhD, I was looking at papers with real-world applications of AI, and I kept running into papers coming from Microsoft Research, especially the work that Eric Horvitz was doing. And I thought, I need to work with these people; they are doing a lot of interesting work. So that's how I found myself at Microsoft Research. I did internships, I became a Microsoft Research Fellow, and when it was time for me to take a permanent position, I joined Microsoft Research. I've had a pretty interesting career here. I've been here for 15 years.
When I was starting my journey at Microsoft, it was the beginning of scalable data. It was when human computation through platforms like Mechanical Turk, and social media, were just happening. And for the first time ever, we could have access to scalable data representing people. That was just an amazing moment to be in the field, to understand data, to look at data. And of course, soon enough, this data connected with hardware for machine learning through GPUs, and deep learning algorithms had a new comeback. All three of these factors came together, and we started seeing the first deep learning models. Through that, I actually got to see the impact of data on the behavior of machine learning models, and later started connecting that to some of the responsible AI considerations around fairness, reliability, and safety. So that's how my work at Microsoft on responsible AI started. And of course, many of the things I worked on through the years gained a completely new meaning with the introduction of generative AI. When I first gained access to a model that later became GPT-4, I was pulled in with my team to look into emerging responsible AI problems in those models, like hallucinations or prompt injection attacks. And when we started looking at these models, it was evident that it was the beginning of a new era in AI. An era that will be bigger and much more interesting than the previous AI waves, and possibly one that is here to stay and transform everything around us.
SPEAKER_03: That's great. So let's start with some basics. How do you define an AI agent?
SPEAKER_00: When we think about AI agents, we think about computational systems that are able to perceive the world, act in that world, and accomplish tasks through that perception and action. There are certain things that separate agents from the chatbots that this technology first started from. In a chatbot environment, the mechanisms are very passive. We have to initiate the interaction, we receive an answer, and everything stops until we do it again, until we ask another question. And what those computational systems can do is limited to just generating language. Later it became about generating code, but that's mainly it. When we are talking about agents, we are talking about autonomy: agents that are able to take a task and reason effectively about what it's going to take to make that task happen. They also actively interact with the world by acting in it, taking an action instead of just generating language, perceiving how their actions are changing the world, and again reasoning about the next best thing to do so that the state of the world gets closer to the desired task completion. One of the important things here is the concept of agency. When we are talking about agents, we are not talking about passive entities; we are talking about entities that can actually act in the world and have autonomy in terms of achieving their goals. Those are the main things that separate the current wave of agents from what happened in the chatbot world.
SPEAKER_03: Okay, that's great. So at Microsoft Research, you have built the AutoGen framework for building agents. Can you tell us more about its capabilities, and what are some of the interesting applications you've seen emerge from this framework?
SPEAKER_00: Yeah. In my organization at Microsoft Research, AI Frontiers, we are always thinking about how AI technologies are going to evolve and change to get more done. Because, as I said, my focus has always been creating applications that can create immense value for people. That's where my motivation really comes from, and many of my team members share the same passion. And when we were trying to get things done and build reliable, big applications with models like GPT-4 a few years ago, we quickly realized that by just prompting these models, we were not able to get to the applications we wanted to build. So we started working on a programming paradigm where we said: what if, instead of just prompting a model, we could use multi-agent orchestration as a way of getting more things done with these models, and also have ways of overseeing what the system is doing? That's how AutoGen came about, really as a programming paradigm. What we did there was: once we had a complex task, we could decompose it into smaller pieces, think about what kind of capabilities are needed for different parts of the task, spin off agents with specialized prompts, specialized memories, and specialized tools to carry out the different parts of the task, and use the agents' chat interface, their internal chats with each other, as a way of composing the problem pieces back together and getting things done. Some of the known agentic patterns, like being able to decompose a problem and compose it back, or having agents look over the solution to verify and validate its quality or provide feedback to other agents on how they are doing, are all patterns that could be represented by this multi-agent abstraction over the models. So that's where we started. In some ways, many of the things we want to do with these agents are inherently multi-agent.
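[Editor's note: the decompose-and-orchestrate pattern described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not AutoGen's actual API; every class and function name here is hypothetical, and the model call is replaced by a stub.]

```python
# Simplified multi-agent orchestration sketch: a task is decomposed into
# subtasks, each routed to an "agent" with its own specialized prompt and
# memory, and a verifier agent reviews the combined result.
# NOTE: all names here are invented for illustration -- this is NOT AutoGen.

from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    system_prompt: str
    memory: list = field(default_factory=list)

    def handle(self, message: str) -> str:
        # Stand-in for a model call: a real agent would send its
        # system_prompt + memory + message to an LLM and return its reply.
        self.memory.append(message)
        return f"[{self.name}] handled: {message}"

def orchestrate(task: str, workers: list[Agent], verifier: Agent) -> list[str]:
    # Decompose the task into one subtask per worker (naive split),
    # route each piece to a specialist, then have the verifier check.
    subtasks = [f"{task} / part {i}" for i in range(len(workers))]
    results = [w.handle(s) for w, s in zip(workers, subtasks)]
    results.append(verifier.handle("; ".join(results)))
    return results

agents = [Agent("researcher", "You research facts."),
          Agent("writer", "You draft text.")]
checker = Agent("verifier", "You validate outputs.")
out = orchestrate("write a market report", agents, checker)
```

In a real system the interesting part is the chat loop between agents, which this sketch flattens into a single pass; the point is only the shape of the pattern: decompose, specialize, recompose, verify.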
For example, when we are trying to solve a problem in an organization today, instead of just decomposing the problem into temporary agents, we imagine a world where there are going to be agents that specialize and become persistent members of our organizations and teams. So although our work on AutoGen started from this programming abstraction, through working on it with many of the open source partners and enterprises using AutoGen for their work, we quickly came to an understanding of multi-agent work where we are actually going to be converging to a different future: we are going to have persistent agents representing different expertise that become part of organizations, part of the workforce. And these agents will be able to come together, with each other and with people, in organic and flexible ways to get the work done. That's how AutoGen evolved and changed shape to power a completely different way of working, a notion that internally we have been calling the society of agents, which really hints at how we imagine work is going to shape up in the future. In fact, if you look into some of the emerging protocols like MCP and A2A, coming from Anthropic and Google, these are basically some of the basic mechanisms defining how these persistent agents are going to talk to each other to form this potential ecosystem of agents. You asked about some of the interesting things we've seen with multi-agents. One of the really interesting things about building AutoGen and releasing it to the world has been just seeing what people do with it. In fact, a lot of our inspiration for predicting where this multi-agent work is going to go came from observing what people have done with AutoGen in the last 18 months since it was released.
One of the really cool applications of multi-agent scenarios, which Satya actually demoed at Build last week, not necessarily AutoGen, but multi-agent orchestration, has been this tumor board example. It is basically an AI-augmented instantiation of the tumor boards that happen at hospitals all over the world, where there are cancer patients with complicated cases that may not be straightforward for a single doctor to come up with a treatment plan for. Doctors representing different specialties come together at these tumor board meetings to discuss a particular case. The problem is that, as you can imagine, the time of these specialists is very limited, and that limits how many patients can be seen at these tumor boards. In collaboration with the Stanford hospital, Microsoft Research researchers have developed multi-agent systems where different agents represent and support the different specialists at these hospitals, and really facilitate, speed up, and enrich the tumor board experience. That example is very close to my heart because when I was a PhD student, going back to your first question, you know, a twenty-something-year-old PhD student, one of the examples of multi-agent systems we tried to build and design at the time, again with my PhD advisor Barbara Grosz, was about care coordination for chronic diseases. And at the time, out of complete coincidence, we worked with Stanford Children's Hospital to design this system. The problem was that we knew this was needed and could be built with multi-agent systems, but we never had the generalizable logic supporting such a system. We did not have these models back then. So all of our designs were based on, you know, just ideas. It never became something running in hospitals.
And again, 15 years later, researchers at Microsoft Research are able to design a very similar system based on multi-agent orchestration. And now, with platforms like Teams that inherently support multi-agent work, these agents are able to run at hospitals. It is just amazing, almost a full circle for me to see. Some other interesting examples are how enterprises around the world are building agents using AutoGen. We are seeing companies that are not even technology companies starting to build agents that represent the different expertise they have in their organizations. They are creating marketing agents and copywriting agents and sales agents and legal agents. And these agents are sitting side by side with their human counterparts, carrying out marketing campaigns, carrying out sales efforts, and so on. We started seeing this pattern of enterprise work going back as much as 18 months ago, and now it is becoming a much stronger pattern. The last thing I'll mention is that we are also seeing enterprises using multi-agent orchestration for simulation. And that's very interesting for us because, in many ways, the answer to a lot of complex problems comes from a simulation of a complex system that includes many autonomous entities. Think about an enterprise that is working with a lot of customers creating different kinds of traffic. Now they are able to use multi-agent orchestration to simulate and get prepared for different scenarios. So that's also very interesting for us.
SPEAKER_03: Great. So in the tumor board scenario, you're saying that there are multiple agents that represent different specialties, and, like doctors, they communicate with each other to come up with the best possible treatment plan. Is that a fair characterization?
SPEAKER_00: They support the tumor board specialists by doing research on their behalf, and by being able to present the case to the corresponding physicians based on what those specialists need to know. In the case of something like a tumor board, of course, the intention is never taking the human experts out of the equation, because these are cases where anything that is decided will have a significant impact on the well-being of an individual. This is a safety-critical domain. And that's why, in these domains, the design is actually having agents that support and help human physicians be more effective, have the latest, most up-to-date information, and do research on their behalf, for example, going and saying: based on all of the articles coming up on PubMed, these are the ones that are closest to this patient. Those are all patterns that these agents could develop. But you can imagine that in the future, for domains that may not be safety-critical, some of this work, maybe easier cases or cases that are not critical, could be taken over by some of the agents. And maybe the final diagnosis and results would be presented to human experts for approval. So you can imagine this coordination and collaboration taking different shapes in the future depending on the objectives and also the confidence in the agents. This is, I think, the transition and the evolution of multi-agent systems we are going to be seeing.
SPEAKER_03: So now, to build these agents, I assume that the systems they're working with need to have some kind of APIs to interact with. Or are you assuming that, even if there are no APIs, they can simulate the screens and clicks and so forth to complete the task? For example, if you were to build a travel agent, and you tell it, hey, go book this trip on this date, you know my favorite hotels and my airline, go ahead and do that. In order to do that, the agent would need to have API access to actually book those hotels and flights and so forth.
SPEAKER_00: Yeah, you are asking a fundamental question about what the right mechanisms are going to be for AI agents to interact with the world that we live in. What are the communication patterns for these agents to interact with our world? Right now, it looks like there are two emerging ideas here. One of them is that we are going to create tools and APIs for these agents to have programmatic access to a lot of the resources we use in our lives, so they can interact with the world. And again, the emerging protocols are going to play a big role in it. For example, when you look into MCP, MCP defines a protocol for agents to access tools and APIs that are available in our world. And through that, many of the things that we would use to create programmatic access to something like a travel API now become available to agents as well. So that's one approach. One of the interesting announcements from Microsoft Build last week was a technology called NLWeb that could automate the creation of tools and APIs for agentic use by scraping websites. So if there is a way of booking something through a website, there could be a mechanism that scrapes that website, finds the ways of interacting with it, and turns it into a tool that's easily accessible to an agent. So that's definitely one path. The benefit of that path is that we know that when things are structured and programmatic, the reliability of getting things done that way is actually pretty high. And that's why the whole field is going in this direction of programmatic access to tools and APIs. There is another emerging trend as well, which is providing agents the ability to see, like we see, and to interact with the world the way we do as people.
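[Editor's note: the first path, programmatic tool access, can be sketched as a small tool registry that an agent runtime dispatches into. This is loosely in the spirit of what protocols like MCP standardize, but the registry, decorator, and tool names below are invented for illustration; they are not the MCP wire format or any real travel API.]

```python
# Sketch of programmatic tool access: functions are registered under a
# name, and the agent runtime invokes them by name with structured
# arguments (the model would emit the call; the runtime dispatches it).
# All tool names and signatures here are hypothetical.

from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    # Decorator that registers a function as an agent-callable tool.
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("book_flight")
def book_flight(origin: str, dest: str, date: str) -> str:
    return f"flight {origin}->{dest} on {date}: CONFIRMED"

@tool("book_hotel")
def book_hotel(city: str, nights: int) -> str:
    return f"hotel in {city} for {nights} nights: CONFIRMED"

def agent_call(tool_name: str, **kwargs) -> str:
    # Dispatch a structured tool call emitted by the model.
    if tool_name not in TOOLS:
        raise KeyError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name](**kwargs)

print(agent_call("book_flight", origin="SEA", dest="SFO", date="2025-07-01"))
```

The structure is what makes this path reliable: each call has a fixed name and typed arguments, so the runtime can validate it before anything executes.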
What I mean by that is looking at a website, being able to understand what is on that website, what the clickable items on that website are, being able to take actions on websites like we do: clicking buttons, filling in text boxes, and so forth. My team has been doing some interesting work on that side as well. For example, we have released the OmniParser models, a small model we have built for turning any LLM into a computer-use agent. The small model is just 100 million parameters, it's super fast, and it's able to look at any website, extract interactable elements from it, and then interact with any general LLM to decide what to do on that website and take actions on it. What is the benefit of that approach? It does not require APIs or tools, and it can interact with anything the way humans can. I believe a combination of these two approaches is going to be how we make the whole web, and the whole world, accessible to agents eventually. And then there's going to be a question about how these agents go beyond just the web and computers to the physical world, and that's why there has also been a lot of excitement about robotics, both actuation and perception. It's not a space my lab works on, but because eventually you want to translate this logic to the physical world as well, it is not hard to make a connection from what is happening in agents right now to the physical world. It's just that perceiving and acting in the physical world is significantly harder than in the digital world. And that's why some of the other pieces need to come together to make that vision happen.
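[Editor's note: the second path, a screen-perceiving agent, boils down to a perceive-decide-act loop: a parser extracts interactable elements from the screen (the role OmniParser plays), a policy picks the next action, and the action mutates the UI state. The sketch below uses toy stand-ins for the parser, the policy, and the UI; none of it is the real OmniParser pipeline.]

```python
# Minimal perceive-decide-act loop for a computer-use agent.
# The "screen" is a toy dict of UI elements; the "policy" is a keyword
# match standing in for an LLM choosing the next action.

from typing import Optional

def parse_screen(screen: dict) -> list[str]:
    # Perception step: return the labels of interactable elements
    # (the role a screen parser plays on a real screenshot).
    return [label for label, state in screen.items() if state == "clickable"]

def decide(goal: str, elements: list[str]) -> Optional[str]:
    # Policy stand-in: a real system would ask an LLM which element
    # advances the goal; here we just match a goal keyword.
    for el in elements:
        if el in goal:
            return el
    return None

def run_agent(goal: str, screen: dict, max_steps: int = 5) -> list[str]:
    actions = []
    for _ in range(max_steps):
        elements = parse_screen(screen)   # perceive
        target = decide(goal, elements)   # decide
        if target is None:
            break                         # nothing left to do
        screen[target] = "clicked"        # act: click the chosen element
        actions.append(target)
    return actions

ui = {"search": "clickable", "submit": "clickable", "logo": "static"}
print(run_agent("fill form and submit", ui))  # prints ['submit']
```

The loop terminates either when the policy finds nothing useful to click or when the step budget runs out, which mirrors how real computer-use agents bound their autonomy.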
SPEAKER_03: That's very interesting, that these agents can go into the physical world. Do you anticipate that robots can be trained by watching human actions and then performing them? Let's say, for example, you want to automate a kitchen in a restaurant that does certain tasks, like cutting onions or preparing a sauce. Could robots be trained by watching a human being carry out those simple tasks, and then stitch them together into a more complex task to automate that whole kitchen?
SPEAKER_00: At the abstract level, the task of cooking something in a kitchen is not any different than booking travel on a computer. Because what you need, again at the very abstract level, in terms of the capabilities to carry out any of these activities, is an understanding of the task you're trying to accomplish, the ability to reason over the steps needed to carry out that activity by decomposing it into atomic actions, and then a way of perceiving the world and acting in the world in a loop. All of the agentic capabilities we talked about are the same whether you are doing something on your computer or acting in the physical world. Where things really differ is how perception and action happen in the physical world. Right now we are pretty advanced in terms of how we perceive things in the digital world and how we act in the digital world. Those two capabilities now need to be extended into the physical world. And similarly, the techniques we are developing right now in terms of learning will generalize as well. For example, how do we actually teach an agent to do things in the digital world right now? We are building reasoning models that are able to use demonstrations, something we call supervised fine-tuning, and reinforcement learning, which is interacting with the world to learn how to do something in the digital world. Again, the same techniques are going to be applicable in the physical world as well.
SPEAKER_04: All right. Over to you, Gowri. Oh, thanks. Let's go back a little bit, right? You said that while you were doing research at Harvard for your PhD, working with your advisor, AI wasn't mainstream. Did you and your advisors or other researchers have a sense of when it might become mainstream? And has it been faster than what you thought then, 2010 to 2025? Or did you think it was going to take another 30 years? Was there a timeline, and has it met the expectations you had at that time?
SPEAKER_00: You know, I don't even have to go back to my PhD days. Imagine 2020, when some of these generative models were first coming up, like the GPT-3 models. Even at that time, if you had shown me the progression to today, I wouldn't have believed it. It has been that fast and that surprising. I'm sure there have been people who had a better inkling than I had about how fast things were going to happen. But being honest, even sitting at a place like Microsoft, doing AI research at Microsoft, every day I'm surprised by how fast things are progressing and how real things are becoming. I think the moment that opened my eyes to the phase transition we had was when I got access to an early version of GPT-4. I remember interacting with this model that I could ask general questions to. And I could also adapt its behavior just based on what I was putting into its context, and it was so interactive. It wasn't this narrow communication channel anymore. I could say things in natural language and it could say things back in natural language. The moment I lived through some of those first interactions, I thought: this is a game changer and nothing is going to be the same again. And that's why, with some of my colleagues like Sébastien Bubeck, we ended up writing a paper called Sparks of AGI at the time, to really tell our academic friends: this is the beginning of an era. And again, even three months before then, if you had asked me whether I would write a paper that talks about AGI, I would have said, are you crazy? We shouldn't even talk about it. And then once we had that model and played with it, it was clear things were going to move so much faster than we thought. If you're asking about my PhD days, I wouldn't even have dreamed of it.
SPEAKER_04: The adoption we are talking about, not only the progression of the foundation models, but the adoption by the world in general, in ways that had yet to be imagined, has been incredible. Whether you're a researcher or not, the way people are using it is beyond anybody's imagination at the time it was conceived. Is that because the natural language component has become so versatile and easy to use? Was it the context of interaction, or even the method of interaction, that made it so easy for everybody to say, hey, I'm going to use this every day in my life? I use it every day in my life. There isn't a day that goes by that I don't use it. I use it just to check things that I absolutely know; my years of wisdom tell me there can't be any changes, but I still use it. Do researchers see it differently, or was the medium of interaction what helped it evolve?
SPEAKER_00: That's a great question, Gowri. I think there have been two different trends that really ignited people's imaginations about this technology. One of them was what you're saying: it is so easy to use, I can use it in natural language, and I can do so many things with it. I think that's why it became one of the most widely used technologies ever created, right? We showed many examples of this in the Sparks of AGI paper too: you could ask it to write poems and do proofs and draw pictures, and it's doing all of it in the same place. The accessibility of the technology, as you said, is amazing. But I think the long-lasting effect of this technology is something different. As a technology field, we are realizing that these foundation models are starting a new computing paradigm. They are creating a new computing stack. You know how we used to build computing applications in the past: we would have people writing up their logic as if-else statements or for loops or whatever, and then we would build the parts of the stack on top of it to create applications. Now we have a generalizable problem-solving model. It is fundamentally a different way of doing computation. And now we can build layers on top of it to build all of the applications we dreamed of but could never create, and make them real today. That's what is really happening. It is the emergence of a new computing paradigm. And the work we have been doing, for example, on multi-agent systems is just putting another layer on top of these models so that they can be even more powerful for building the applications that we would really like to build in the world.
So for me right now, the work that is happening in AI for building those applications that I also use every day goes way beyond just prompting these models. We are specializing models, we are creating data for the models, we are putting memory on top of the models, we are making the models access APIs and tools, and we are having them talk and coordinate and collaborate in an extensible way with people. Those are all the additional layers that are coming on top of the models to create the computing stack that will become the stack of the world going forward. That, I think, is the major transition we are after.
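The layering described here, a general problem-solving model wrapped with memory and tool access, can be sketched in a few lines. This is purely illustrative: every name below (the `Agent` class, the stub `call_model`, the toy tool registry) is hypothetical and stands in for the real stack, not any actual Microsoft API.

```python
# Hypothetical sketch of the "layers on top of the model" idea:
# a generic model call wrapped with a memory layer and a tool layer.

def call_model(prompt: str) -> str:
    """Stand-in for a foundation-model call."""
    return f"answer to: {prompt}"

class Agent:
    def __init__(self, tools):
        self.memory = []    # memory layer: records prior exchanges
        self.tools = tools  # tool layer: named callable APIs

    def run(self, task: str) -> str:
        # Tool layer: let a registered tool handle the task if its name matches.
        for name, tool in self.tools.items():
            if name in task:
                result = tool(task)
                self.memory.append((task, result))
                return result
        # Otherwise fall back to the model, passing memory as context.
        context = "; ".join(t for t, _ in self.memory)
        result = call_model(f"{context} | {task}")
        self.memory.append((task, result))
        return result

agent = Agent(tools={"search": lambda t: "search results"})
print(agent.run("search for flights"))   # routed to the tool layer
print(agent.run("summarize the trip"))   # routed to the model, with memory as context
```

The point of the sketch is that the model itself is unchanged; memory and tools are layers composed around it, which is the "new computing stack" framing from the conversation.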
SPEAKER_04Going back to the models themselves, there are obviously open source models, and then there are OpenAI and Anthropic, right? Philosophically speaking, there are differences in approaches, but it seems to me that at some point these models have convergence possibilities in the future. Setting aside whether they are open or not, what they can and cannot deliver may be driven by factors that have nothing to do with the pure open source versus closed source philosophy. From your point of view as a researcher, does one philosophical approach versus the other change anything? Because you're doing foundational research, not just applications within an enterprise.
SPEAKER_00Yeah, there are different ways to look at your question about closed models versus open models. One of the factors here is what we imagine the model ecosystem to look like for that computing paradigm we just talked about. One of the predictions we made earlier was that yes, right now we have some models like GPT-4 that are at the frontier, but we always predicted that models would become commodities. They wouldn't be the differentiators; there wasn't going to be one model to rule them all. And in fact, that came true. What we are seeing right now is that the Azure AI Foundry catalog at Microsoft has 1,800 models in it. We are no longer saying that there is one model that is the best at everything. There are frontier models that are more expensive but more general purpose and more powerful. But then we are able to create medium-sized or small models that, with particular specialization, can actually do things faster and cheaper and in more efficient ways. And for developers, all 1,800 of these models are available at their service. They can choose one model to do one task, and another model to do something else. I believe the open source community is fueling a lot of the innovation and specialization that creates that diversity for the developer community, and it has been something beautiful. For example, my teams have been creating the Phi family of models; we work in the small model space. What we have shown from Phi-1 to Phi-1.5 to Phi-2 to Phi-3, and finally Phi-4 this year, is that by being smart about how we train those models and how we generate the training data for them, we are able to pack so much power into these small models and get so much more efficient. Just a few weeks ago, we released the Phi-4 reasoning models, for example. They are all available in open source.
And Phi-4 reasoning is a 14-billion-parameter model that in some cases is able to match the performance of the much larger DeepSeek R1 model. What we could really do through innovation in this space is completely mind-blowing: a 14-billion-parameter model is able to match the performance of a model that is roughly 50 times its size. And all of that is happening because there is a very lively open source community right now. Again, the work that DeepSeek has done with the R1 models is also amazing. They were able to show the world that many of those reasoning capabilities can be replicated; there is nothing really unique about them, and they released a recipe for it. That has been amazing for us. We took that and said we could probably do it with small models, and went ahead and did it with small models. That is just what is happening with the open source community right now. And not only that, this leads to having many models at different points on the scaling law: cheap, expensive, general, specialized. It also means we can bring our academic partners into the picture. PhD students or professors at universities can bring their ideas to it as well. I think that's how we are going to get to much better models. The open source community has been just amazing for learning and innovation.
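The "many models at different points on the scaling law" idea implies a routing decision: pick a cheap specialized model when it suffices and fall back to a frontier model otherwise. A minimal sketch of that decision, with entirely made-up model names, skills, and costs:

```python
# Illustrative model-routing sketch: choose the cheapest model in a
# catalog that covers the required skill. All entries are invented.

CATALOG = [
    {"name": "small-code-model", "skills": {"code"}, "cost": 1},
    {"name": "small-chat-model", "skills": {"chat"}, "cost": 1},
    {"name": "frontier-model", "skills": {"code", "chat", "reasoning"}, "cost": 20},
]

def pick_model(required_skill: str) -> str:
    """Return the cheapest catalog model whose skills cover the task."""
    capable = [m for m in CATALOG if required_skill in m["skills"]]
    return min(capable, key=lambda m: m["cost"])["name"]

print(pick_model("code"))       # small-code-model: specialization wins on cost
print(pick_model("reasoning"))  # frontier-model: only the general model can do it
```

Real routers weigh latency, quality, and context length too, but the shape is the same: the catalog, not any single model, is the resource.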
SPEAKER_04So let's build on that, with the developers in the open source community working so well and being highly energetic. If you're advising entrepreneurs who are looking at the space in general, it seems like they could build many things: horizontal models, small models. From your vantage point as a researcher, looking a little ahead, what would you advise entrepreneurs to go solve for or take advantage of? There are so many things that can be done, but some are going to be done better than others, or be more innovative than others, without even getting down to the base layer of the foundation models or the silicon. They could still solve a whole lot of things on top of it. It seems to me that the combination of agent frameworks and small models could be impactful from an enterprise perspective. For an entrepreneur, that's got to be exciting to go solve. As a researcher, what would you advise entrepreneurs to look for so they can advance their own startups?
SPEAKER_00Yeah, that's a great question. I have two particular thoughts about that. One is that it feels like we are at such an inflection point right now, because, particularly on agents, many core technology pieces are finally at the level of maturity for them to come together and build something integrative and cohesive. Just to give an example: even in this conversation, we talked about reasoning models. Reasoning models are likely to become the brain of these future agents; they really bring in the tool calling and the ability to retrieve information in active ways. Reasoning models are going to have a big role in enabling agents. We also talked about protocols and tool access and API access as a way for these agents to reach the wider world. And of course, there's a lot of innovation happening in specialized models and small models, which we do as well, that is helping developers create agentic systems powered by a diverse set of models to enable new experiences. One of the things I think about all the time is where all of these technologies are converging and how they are going to be composed together to create a future that is much bigger than what we are seeing today. Where are we really headed when all of these pieces come together? I think asking that question is fundamental for any entrepreneur who wants to build something that's not only going to be relevant today, but going into the future as well. So my bet on that question is really thinking about ecosystems: what kind of network agents and people are going to be able to create, and how we are going to create outcomes that are much bigger than the sum of the parts as these pieces come together. I would really advise any entrepreneur to try to imagine that, as I'm trying to do.
It's not easy with how fast the technology is changing, but what are our bets for where things are headed? Can we make any bets on that state of the world where some of these technologies are going to be converging? For example, what are the future models of incentives and economics, and how are some of these benefits going to be shared? It feels like we are almost there, but we are not able to see that picture just yet. So that's, I think, a very important piece of it. The second is that these technologies are still very much maturing. Yes, reasoning models are going to be important; models with specialized capabilities are going to be important; these communication mechanisms are going to be important. But it's still early to build something that is just going to work in every single domain. Being able to specialize on a vertical and show something working in a particular scenario, where an entrepreneur can go and understand what the current workflows look like and integrate these technologies into flows and processes that people are already familiar with, has a much easier time with adoption than changing everything. So that would be the second thing I would recommend: is there a particular domain where the entrepreneur has a unique understanding of the processes and problems and the gaps, and can use that know-how to build something that works?
unknownYeah, yeah.
SPEAKER_04Gaudi, you're on mute. Sorry, I don't know how that happened; I must have hit the button here. The advice that you give entrepreneurs is quite phenomenal, because for the first time, being a generalist, with an understanding of the space and the workflows within it, becomes as important as depth of understanding, as in saying, I have this much knowledge and I can go this deep. Some of that depth can be solved by the foundation models; you don't have to solve for it. But if the foundation models, with agents, can connect a workflow, the way things move within it, how people make decisions, and how interactions should happen, that could be just as powerful for an entrepreneur. It seems like the generalist is as important as the specialist. It used to be that you had to be really good at, say, databases, but now you almost just need to know how things work. It's very fascinating. So as a researcher, you're at the forefront of seeing what's coming, but you are also looking at ethics and safety every day. It's a constant, because of the gap between what the technology can do and where it fits into society as a whole. As a researcher, what are your challenges when it comes to ethics and safety versus the greatness of what the technology can achieve? It can achieve a lot of things, but is it useful, and is it safe for the human race? That's a question you have to look at every day. How do you handle it, and in what ways do you go about looking at things?
SPEAKER_00Like any technology, there is the immense potential and there is the big responsibility. Throughout my career, as I have been working on the responsible side of the equation, what I have been seeing is that every time the capabilities of the technology jump, the risks around the technology jump with it. Just to give an example: when I was looking into classification problems, we looked into face recognition, at how models labeled people differently and how their error rates differed between races and genders at the time. That is a very significant bias problem. But then we moved to generative AI, and now models were hallucinating; and now we are moving into agents that are able to do things in the world. At every one of these jumps, the risks are significantly different. Six months ago, my team created an agentic team called Magentic-One, which is available in open source right now, and we created this team to understand the capabilities and risks of having agents solve general tasks on our computers. When I prompt these agents to do even simple tasks, it is just mind-blowing to see all the different ways that things can go wrong and create extensive risks for us. Just to give a few examples: I asked the Magentic team to go to the New York Times and complete a crossword puzzle for me. On the way to doing this, the agents searched on a search engine, found the right New York Times link, clicked on it, and went to the New York Times site. But when the agents got there, a pop-up window asked for credentials to access the New York Times website. At that time, that was simply something the agents were not programmed for; it wasn't in their workflows that there would be a pop-up window.
But again, these agents are gaining quite generalist perception and action capabilities right now. So the agents could say: I see a link there that says reset your password; if I don't know the password, I could reset the password. And in a world where the agents actually have access to a person's email, because email is just another API or tool, you could easily imagine the agent going through all of the steps needed to actually reset the user's password, just to get past that login and on to the task. Let me give another example. In some of the tests we have done, when the agents realized that something they needed was no longer accessible, they figured out that they could go to Amazon Mechanical Turk and try to hire a person to carry out what they were not able to do. Or when a document they needed was no longer accessible on the internet, they figured out that there was actually a law stating that the government had to provide that data upon request, and the agents were ready to file a request with the government. So what is really happening here? There are a number of things changing with respect to the risks we are seeing. With the capabilities of the reasoning models, the agents can get really creative in accomplishing a goal. If the original way they thought of doing something is not working, they can reason again and again and again until they find a way of getting there. What this means is that the assumptions we have as people, and the operational boundaries we assume for these agents, are no longer true. I would never expect that the agent would go and reset my password. I would never imagine that the agent would hire people on Amazon Mechanical Turk. I would never imagine that the agent would file a request with the government. But all of those things are actually possible now.
Moreover, with agents having access to tools and APIs, their action spaces are becoming pretty unbounded. Putting all of this together, you can imagine how the boundaries of these agents are changing significantly. So in my research lab, we have been looking into a number of things that we think are going to be critical for the reliable execution of these agents. One of them is that in many domains, until we really, really understand these technologies, I cannot imagine these agents working without human accountability and oversight. That's why we developed a system called Magentic-UI, released at Build last week, where we have created transparency layers between the agentic team and the user, so the user has full visibility into everything the agent is doing and intends to do. In this Magentic-UI system, the agents compose a plan and present it to the user before they act. The user is able to watch everything the agent is doing and stop it whenever the user wants to take control. Moreover, whenever the agent is about to take an action that is irreversible, that you cannot take back, the agent stops and asks for explicit approval from the user before continuing. We believe building these layers for transparency, control, and accountability, and having ways for humans and agents to execute together, is going to be how we learn a lot about these agents and execute with them without running into some of the fundamental safety issues. And again, you talked about the fundamental technology layer; this layer of interaction, control, and safety is going to be even more critical than the fundamental technology layer to make these agents work in the real world.
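The control pattern described here, pausing for explicit human approval before any irreversible action, is easy to see in miniature. This is a sketch of the general pattern only, not Magentic-UI's actual implementation; the action names and the `approve` callback are invented for illustration.

```python
# Minimal human-in-the-loop approval gate: the agent executes a plan
# step by step, but any action marked irreversible is blocked unless
# the human approves it. All names here are illustrative.

IRREVERSIBLE = {"reset_password", "send_email", "delete_file"}

def execute_plan(plan, approve):
    """Run each step; pause for approval on irreversible actions.

    `approve` is a callback standing in for the human in the loop.
    """
    log = []
    for action in plan:
        if action in IRREVERSIBLE and not approve(action):
            log.append(f"blocked: {action}")
            continue
        log.append(f"done: {action}")
    return log

plan = ["open_site", "reset_password", "read_article"]
# The human declines the irreversible step:
print(execute_plan(plan, approve=lambda a: False))
# ['done: open_site', 'blocked: reset_password', 'done: read_article']
```

Note the asymmetry: reversible actions proceed freely so the agent stays useful, while the irreversible set defines exactly where human accountability re-enters the loop.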
SPEAKER_04This goes to the big debate in society: are the jobs that humans are doing today even going to be needed? Let's take a white-collar job like programming. There are plenty of predictions that 40 to 50 percent of it will be done by AI by the end of this year, and so on and so forth. If we expand on that, how do you view a world where certain jobs done exclusively by humans today will be done exclusively by agents, versus AI solving for a bigger problem? Say I use the travel analogy that Shirish used: if my flight is delayed, today I stand on hold on the phone or in a long line at the airport trying to rebook it. If you're an airline, your staff are just overwhelmed by that one event; you cannot hire for it. It seems like AI would be a good use there, because no actual human agent would be replaced. But you may not need that many more programmers; you may need more agents to do the work. So this conflict over jobs being taken seems like a tough paradigm: there will be some near-term losses alongside long-term gains. Would that be an accurate way to put it? I'm really looking for your prediction for the next five years.
SPEAKER_00I think we are all looking for that crystal ball to tell us what the future is going to look like. We are all struggling to understand the long-term ramifications of this technology change and what its societal impact is going to look like. It is so much easier for me to think about the safety risks and the layers for addressing them. But you're right: building technology responsibly in the world is not only about minimizing the risks, but also about thinking through these long-term ramifications and consequences and preparing the world for them. Just last week, I spent an afternoon testing the latest coding agents. From downloading the latest libraries to creating an application, I was able to build the application with the UI and everything, and I wrote zero lines of code. These technologies are evolving at an increasing speed, and they are becoming so useful. As you said, I'm also using them to become more effective in my daily life every single day. Based on some of the early research we've done, although this is now almost two years old, comparing a developer using AI technologies like GitHub Copilot or Cursor agents to a developer not using them, productivity doubled even with the early versions of GitHub Copilot. Now we are introducing software agents into the loop, agents that are able to do PRs and solve bugs and all of it. I have a suspicion, based on my experiences, that the productivity gains are going to be more than double. So where does that really leave us? I think many of us right now have more tasks on our plate than we can handle, including all of the support we need in our personal lives.
That's why I think every single one of us could use the personal assistants we cannot hire or afford, and that would be a wonderful thing. On the other side, there are these long-term questions: if developer productivity is doubling, does it mean that we need half the developers? Or are we going to discover new technologies to be invented, and convert some of our current developers to more advanced, more ambitious projects, to discover what's going to come after AI? I think both of those futures are possible, and it's in our hands. The one thing I've been thinking about a lot from a researcher's perspective is that we need more than computer scientists and AI specialists to answer these questions. We actually need to bring in interdisciplinary experts: economists, social scientists, people who are able to look at society and make better predictions than an AI expert like myself can. And I'm really craving mechanisms for that multidisciplinary conversation, because this is the time to have it. This is the time to get more clarity about where things are headed through a societal lens, and hopefully to inspire some action based on that as well. It's not a question I can answer, but it's a wish I can make as an AI researcher.
SPEAKER_04Well, this is the best masterclass on AI I've ever attended. You take courses and all that, but this covered a lot of topics in a very simple, easy-to-understand way. I'm just grateful that I was able to sit back and listen; I barely needed to ask questions, I was just asking them as we went along. What you've told us in the last hour has been incredible, and I truly, truly appreciate it. Shirish, over to you to close out this great session.
SPEAKER_03Uh thank you so much, DJ. Uh, this was a great conversation. Uh, learned a lot about some of the latest uh research. Love to have you back in a year or so to see what the technology is progressing so fast. Be uh fantastic to see what's changed in one year's time. Thank you for your support of Thai and thank you for coming on this uh conversation.
SPEAKER_00Thanks for having me and for a really fun chat. And I'll take you up on your offer of revisiting this conversation in a year.
SPEAKER_04Thank you. Thank you. Thank you for listening to our podcast, From Startup to Exit, brought to you by TiE Seattle. Assisting in production today are Isha Jain and Mini Varba. Please subscribe to our podcast and rate it wherever you listen. Hope you enjoyed it.