Breakthroughs in generative AI have created enormous opportunities for humans to learn from computers. We can use them to explain the news, understand historical concepts, fix our coding errors, and so forth. But of course, AI also has to learn from humans. The technology digests enormous amounts of written text, and often relies on human feedback to calibrate its models. Luis von Ahn has been at the forefront of these back-and-forth interactions for years. He is currently the CEO and co-founder of Duolingo, the language learning app, but prior to that, he was one of the original developers of CAPTCHAs, the little puzzles you solve to log into websites and prove that you're a human. And of course, in the process of proving your humanity, you're also training computers to get better at identifying red lights, crosswalks, bicycles and wavy letters. On this episode, we discuss the history of his work, the future of CAPTCHAs, the success of Duolingo and how he is using today's advanced AI models in aid of language learning. This transcript has been lightly edited for clarity.
Key insights from the pod:
The idea behind CAPTCHA — 4:06
Will AI ever be able to fool CAPTCHA? — 6:41
Is CAPTCHA improving AI? — 8:36
How Duolingo uses AI to learn from humans how to better teach languages — 10:43
New pedagogies Duolingo has adapted and developed — 15:41
What has Luis learned about language from optimizing the experience for its users? — 21:42
How Duolingo builds new AI models where off-the-shelf models are falling short — 24:48
What matters most in the arms race of AI? — 26:46
How much will AI reduce costs and replace human workers? — 29:43
Benchmarking AI at Duolingo — 33:30
How does Duolingo take into account context dependency within language? — 35:20
Will AI ever be able to design language learning from scratch? — 38:43
What's the next big challenge for AI? — 40:15
What are the advantages and disadvantages of Duolingo being based in Pittsburgh? — 42:05
---
Tracy Alloway (00:02):
Hello and welcome to another episode of the Odd Lots podcast. I'm Tracy Alloway.
Joe Weisenthal (00:22):
And I'm Joe Weisenthal.
Tracy (00:24):
Joe, you know, I had a life realization recently.
Joe (00:27):
Okay, this should be good, go on.
Tracy (00:30):
It struck me that I am spending a non-negligible amount of my time proving that I am, in fact, a human being.
Joe (00:39):
It's getting harder and harder. I know what you're talking about. So we're talking, you know, you go to a website and you have to enter in a CAPTCHA, and it's like, click all these squares that have a crosswalk on them or a truck, and like, it feels like it's just getting harder. And sometimes I'm like ‘No, trust me. I'm a human.’
Tracy (00:55):
This is it. And every time it happens, I kind of have a moment of self-doubt about whether or not it is just me? Am I particularly bad at picking out all the motorcycles in a set of pictures, or are they just becoming increasingly weird or perhaps increasingly sophisticated in the face of new types of technology?
Joe (01:16):
It's not just you. I've heard this from multiple people, and in fact, prepping for this episode, I heard people talking about exactly this, but you know, it's like a big problem. We did that Worldcoin episode. Like, everyone is trying to figure out how in a world of AI and bots and artificial intelligence, all that stuff, how do you know whether someone you're interacting with is, in fact, a person?
Tracy (01:38):
Yeah. And I'm glad you mentioned AI because obviously part of this dynamic is AI seems to be getting better at solving these particular types of problems, but also they're being used more to train AI models. So at this point, I think we all know why we're constantly trying to identify bikes in a bunch of photos.
But the whole idea behind CAPTCHA is, or was, that humans still have an edge. So there are some things that humans are better able to do versus machines. And one of the things that we used to talk about humans having an edge in was linguistics. So there was this idea that human language was so complex, so nuanced that machines would maybe never be able to fully appreciate all the intricacies and subtleties of the human language. But obviously since the arrival of generative AI and natural language processing, I think there's more of a question mark around that.
Joe (02:38):
Yeah, I mean, look, I think like a typical chatbot right now is probably better than most people at just typing out several paragraphs. It's all sort of like, seems sort of, as they say on the internet, kind of mid-curve to me. It never strikes me as incredibly intelligent, but clearly computers can talk about as well as humans. And so it raises all sorts of interesting questions.
You mentioned that part of CAPTCHA is, like, training computers. A big part of these chatbots is the so-called, like, real-life human feedback, where people say ‘This answer is better than another, this answer is better than another,’ as they refine the models, etc. So I think there's an interesting moment where we're learning from computers and computers are learning from us, maybe collaboratively the two sides, carbon and silicon, working together.
Tracy (03:25):
I think that's a great way of putting it. Also “mid-curve” is such an underappreciated insult. Like, calling people ‘top of the bell curve,’ is one of my favorite things to do online. Anyway, I'm very pleased to say that today we actually have the perfect guest.
We are going to be speaking to someone who was very instrumental in the development of things like CAPTCHA and someone who is doing a lot with AI, particularly in the field of linguistics and language. Right now we're going to be speaking with Luis von Ahn. He is, of course, the CEO and co-founder of Duolingo. So Luis, thank you so much for coming on Odd Lots.
Luis von Ahn (04:04):
Thank you. Thank you for having me.
Tracy (04:06):
So maybe to begin with, talk to us about the idea behind CAPTCHA and why it seems to have become, I don't want to say a significant portion of my life, but I certainly spend a couple minutes every day doing at least one version.
Luis (04:21):
Yeah. So the original CAPTCHA, the idea of a CAPTCHA was it's a test to distinguish humans from computers. The reasons why you may want to distinguish whether you're interacting with a human or a computer online, for example — and this was kind of the original motivation for it — companies offer free email services and you know, they have the problem that if you allow anything to sign up for a free email service, either a computer or human, somebody could write a program to obtain millions of free email accounts. Whereas humans, because they are usually not that patient, cannot get millions of free email accounts for themselves. They can only get one or two. So the original motivation for CAPTCHA was to make a test to make sure that whoever is getting a free email account is actually a human and not a computer program that was written to obtain millions of free email accounts.
So, you know, and the way it worked, there's many kinds of tests. Originally the way it worked was distorted letters. So you would get a bunch of letters that were pre-distorted and you had to type what they were. And the reason that worked is because human beings are very good at reading distorted letters. But at the time, this was more than 20 years ago, computers just could not recognize distorted letters very well.
So that was a great test to determine whether you were talking to a human or a computer. But what happened is over time computers got quite good at this, you know, deciphering distorted text. So it was no longer possible to give an image with distorted text and distinguish a human from a computer because computers pretty much got as good as a human at that point.
These tests started changing to other things. I mean, one of the more popular ones that you see nowadays is kind of clicking on the images of something. So you can see a grid, like a four by four grid, and it may say ‘Click on all the traffic lights or click on all the bicycles,’ etc. And by clicking on them you're showing that you can recognize these things.
And the reason they're getting harder is because computers are getting better and better at deciphering which ones are traffic lights, etc. And by now what you're getting here are the things that we still think computers are not very good at. So the image may be very blurry or you know, you may just get a tiny little corner of it and things like that. So that's why they're getting harder and I expect that to continue happening.
Joe (06:41):
So you founded a company called reCAPTCHA, which you sold to Google several years ago. Is there going to be a point where, I mean, I assume computer vision and their ability to decode images or recognize images is not done improving. I assume it's going to get better, whereas humans' ability to decode images, I doubt it's really getting any better. We've probably been about the same for a couple thousand years now. Is there going to be a point in which it's impossible to create a visual test that humans are better at than computers?
Luis (07:15):
I believe that will happen at some point, yeah. It's very hard to say when exactly, but you know, you can just see at this point it's getting, computers are getting better and better. The other thing that is important to mention is this type of test has extra constraints. It also has to be the case that it's not just that humans can do it. Like really humans should be able to do it pretty quickly. And have enough chance of success.
Joe (07:43):
Quickly and on a mobile phone, on a very small screen, [where] like my thumb is like half the size of the screen.
Luis (07:50):
Yeah, yeah. And it may not be, quickly, I mean, it may take you, I don't know, 30 seconds or a minute, but we cannot make a test that takes you an hour. It is just not, it's just we can't do that. So it has to be quick. It has to be done on a mobile phone. It has to be the case that the computer should be able to grade it, [the] computer should be able to know what the right answer was, even though it can't solve it.
So because of all of these constraints, I mean, my sense is at some point this is just going to be impossible. I mean, we knew this when we started the original CAPTCHA that at some point computers were going to get good enough. But we just had no idea how long it was going to take. And I still don't know how long it's going to take, but I would not be surprised if in five to 10 years there's just not much that you can do that is really quick online to be able to differentiate humans from computers.
Tracy (08:36):
Yeah, that's when we get the eyeball-scanning orbs. But I mean, you mentioned that you can't have a test that takes an hour or something like that, but this kind of begs the question in my mind of why are people using these tests at all? So like, okay, obviously you want to distinguish between humans and robots, but I sometimes get the sense that these are basically free labor for AI training programs, right? So even if you can verify identity in some other way, why not get people on a mass scale to spend two minutes training self-driving cars?
Luis (09:11):
They are. Yeah, I mean, this is what these things are doing. That was the original idea of reCAPTCHA, which was my company. The idea was that you could, at the same time as you were proving that you are a human, you could be doing something that computers could not yet do, and that data could be used to improve computer programs to do it.
So certainly when you're clicking on bicycles or when you're clicking on traffic lights or whatever, that is likely data that is being used. I say “likely,” because I don't know what CAPTCHA you're using. There may be some that are not doing that, but overall that data is being used to improve things like self-driving cars, image recognition programs, etc. So that is happening and that's generally a good thing because that's basically making AI smarter and smarter.
But you know what? We still need it to be the case that it's a good security mechanism. So if at some point just computers can do that, then you know that that's just not a great security mechanism and it's not going to be used. And my sense is if we're going to want to do something, we are going to need something like real identity. I don't know if it's going to be eyeball scanning or whatever, but the nice thing about a CAPTCHA is it doesn't tie you to you. It just proves that you're human. We're probably going to need something that ties you to you. We're probably going to need something that says ‘Well, I just know this is this specific person because,’ you know, whatever, ‘we're scanning their eyeball, we're looking at their fingerprint, whatever it is. And it is actually a real person and it is this person.’
Joe (10:42):
Why don't we sort of zoom out and back up for a second. So currently you are the CEO of Duolingo, the popular language learning app, publicly traded company, [it’s] done much better sort of stock-wise than many companies that came public in 2021. You might have expected, you know, there was a boom when people had a bunch of time on their hands, and [the stock price would have] gone down. You’re also sort of one of the most respected sort of computer scientists, thinkers coming out of Carnegie Mellon University. What is the through line of your work or how would you characterize [it], that connects something like CAPTCHA to language learning at Duolingo?
Luis (11:20):
It’s similar to what you were talking about. I was smiling when you were mentioning that. I mean, I think the general through line is a combination of humans learning from computers and computers learning from humans. And, you know, CAPTCHA had that: while you were typing a CAPTCHA, computers were learning from what you were doing.
In the case of Duolingo, it's really a symbiotic thing that both are learning in that humans are learning a language. And in the case of Duolingo, Duolingo is learning how to teach humans better by interacting with the humans a lot. So, you know, Duolingo just gets better with time because we figure out different ways in which humans are just learning better. You know, humans are getting better with language. And Duolingo is getting better at teaching you languages.
Tracy (12:19):
Joe, have you used Duolingo?
Joe (12:21):
I haven't. Well, okay. I hadn't up until recently. So last week as it turns out, I visited my mother who lives in Guatemala, which, Luis I understand you're from.
Luis (12:33):
Oh wow! Where I’m from!
Joe (12:33):
She's not from there, but she visited a friend there eight years ago and she loved it. And she's like ‘I'm just going to stay.’ And she has a little house, never left. She loves it so much. And so I visited her for the first time at her house near Lake Atitlán. And then I was like ‘Oh, there's a great life and maybe one day I'll even have that house and I should learn Spanish.’ And so I did, partly because of that trip, and partly to prepare for this episode. I downloaded it and have started. I know a little bit of Spanish, not much, like I can, you know, ask for the bill and stuff, but I was like ‘Oh, I should, I should start to learn it.’
Tracy (13:05):
That's funny because I also started learning Spanish right before a trip to Guatemala with Duolingo. And I'm not the best advertisement for the app, I'm afraid. Like, the only thing I remember is basically like ‘Quisiera una habitación para dos personas por dos noches,’ that's all I remember from...
Luis (13:25):
That’s pretty good.
Tracy (13:26):
Oh, thanks. All right, I need to get back on it, but why don't you talk to us a little bit about the opportunity with AI in this sort of language learning space. Because intuitively it would seem like things like chatbots and generative AI and natural language processing and things like that would be an amazing fit for this type of business.
Luis (13:48):
Yeah, it's a really good fit. So you know, we teach languages with Duolingo. Historically, you know, learning a language just has a lot of different components. You’ve got to learn how to read a language. You’ve got to learn some vocabulary. You’ve got to learn how to listen to it. If there's a different writing system, you’ve got to learn the writing system. You’ve got to learn how to have a conversation. There's a lot of different skills that are required in learning a language.
Historically, we have done pretty well in all the skills except for one of them, which is having a multi-turn fluid conversation. So, you know, historically we could teach you vocabulary really well. We could teach you how to listen to a language generally just by getting you to listen a lot to something.
So we could teach you all the things, but being able to practice actual multi-turn conversation was not something that we could do with just a computer. Historically, that needed us to pair you with another human. Now, with Duolingo, we never paired people up with other humans because it turns out a very small fraction of people actually want to be paired with a random person over the internet who speaks a different language. It's just too much. It's kind of too embarrassing for most people. So we never did that.
Tracy (14:57):
It's peligroso too.
Luis (14:58):
Ah, there you go. Well, it may be dangerous but it also, it's just 90% of people are just not extroverted enough to do that. So we always, you know, we did these kind of wonky things to try to emulate short conversations, but we could never do anything like what we can do now. Because with large language models, we really can get you to practice. You know, it may not be a three-hour conversation, but we can get you to practice a multi-turn, 10-minute conversation and it's pretty good. So that's what we're doing with Duolingo. We're using it to help you learn conversational skills a lot better. And that's helping out quite a bit.
Joe (15:41):
There are so many questions I have, I think my mom will really like this episode because in addition to the Guatemala connection, she is a linguist. She speaks like seven languages, including Spanish.
But something that I was curious about, and maybe this is a little bit of random jumping point, you know, I think about like, chess computers, and originally they were sort of trained on a corpus of famous chess games, and then with some computers they got better. And then the new generation essentially relearned chess from just the rules, from first principles. And it turns out that they're way better.
And I'm wondering if you're learning, through the process of building out Duolingo improvement, like are there forms of pedagogy that in language learning, whether it's the need for immersion or the need for rote drills or certain things, that linguists have always thought were necessary components of good language learning that when rebuilding education from the ground up old dictums just turn out to be completely wrong and when you rebuild the process from the beginning, novel forms of pedagogy emerge?
Luis (16:53):
It's a great question, and it's a hard question to answer for the following reason. At least for us, we teach a language from an app. Historically, the way people learn languages is basically by practicing with another human or being in a classroom or whatever. Whereas we teach from an app, the setting is just very different for one key reason, which is that it is so easy to leave the app, whereas leaving a classroom is just not that easy. You kind of have to go, you're usually forced by your parents to go to a classroom. And so generally the thing about learning something by yourself, when you're just learning it through a computer, is that the hardest thing is motivation. It turns out that the pedagogy is important. Of course it is. But much like exercising, what matters the most is that you're actually motivated to do it every day.
So like, is the elliptical better than the step climber or better than the treadmill? Like yeah, there are probably differences, but the reality is what's most important is that you kind of do it often. And so what we have found with Duolingo is that if we're going to teach it with an app, there are a lot of things that historically language teachers or linguists didn't think were the best ways to teach languages. But if you're going to do it with an app, you have to make it engaging. And we've had to do it that way. And we have found that we can do some things significantly better than human teachers and some things not as good because it's a very different system. But again, the most important thing is just to keep you motivated. So examples of things that we've had to do to keep people motivated are “classes,” which is a lesson on Duolingo.
They're not 30 minutes or 45 minutes. They're two and a half minutes. If they're any longer, we start losing people's attention. So stuff like that I think has been really important. Now, I'll say related to your question, one thing that has been amazing is that we start out with language experts, people with PhDs in second language acquisition who tell us how to best teach something, but then it takes it from there and the computer optimizes it. And so the computer starts finding different ways. There are different orderings of things that are actually better than what the people with PhDs in second language acquisition thought. But it's because they just didn't have the data to optimize this. Whereas now, you know, with Duolingo, we have, it's something like 1 billion exercises. It's 1 billion exercises [that] are solved every day by people using Duolingo. And that just has a lot of data that helps us teach better.
Tracy (19:23):
This is exactly what I wanted to ask you, which is how iterative is this technology? So how much is it about the AI model sort of developing off the data that you feed it, and then the AI model improving the outcome for users and thereby generating more data from which it can train?
Luis (19:42):
We're exactly doing that. And in particular, one of the things that we've been able to optimize a lot is which exercise we give to which person. So when you start a lesson on Duolingo, you may think that all lessons are the same for everybody. They're absolutely not. When you use Duolingo, we watch what you do and the computer makes a model of you as a student. So it sees everything you get right, everything you get wrong. And based on that, it starts realizing you're not very good at the past tense or you're not very good at the future tense or whatever. And whenever you start a lesson, it uses that model specifically for you. And it knows that you're not very good at past tense. So it may give you more past tense or it does stuff like that.
And that definitely gets better with more and more data. Now, I'll say another thing that is really important. If we were to give you a lesson only with the things that you're not good at, that would be a horrible lesson because that would be extremely frustrating. It's just basically “Here are the things you're bad at, we're just going to do a lot more of that.” So in addition to that, we have a system that tries to, and it gets better and better over time, it is tuned for every exercise we have on Duolingo that we could give you. It knows the probability that you're going to get that exercise correct. And whenever we are giving you an exercise, we optimize so that we try to only give you exercises that you have about an 80% chance of getting right. And that has been quite good because it turns out 80% is kind of at this zone of maximal development where basically it's not too easy because you're not having a hundred percent chance of getting it right.
If it's too easy, it has two problems. Not only is it boring, that it's too easy, but also you're probably not learning anything if you have a 100% chance of getting it right. And it's also not too hard because humans get frustrated if you're getting things right only 30% of the time. So it turns out that we should give you things that you have an 80% chance of getting right. And that has been really successful and we keep getting better and better at finding that exact exercise that you have an 80% chance of getting right.
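The selection rule Luis describes can be sketched in a few lines. This is only an illustration under stated assumptions, not Duolingo's actual system: the `Exercise` type, the `predict_success` function (which stands in for the learner model by simply looking up past accuracy per skill), and the sample numbers are all hypothetical. The only detail taken from the conversation is the target of roughly an 80% chance of getting the exercise right.

```python
from __future__ import annotations

from dataclasses import dataclass

TARGET_SUCCESS = 0.80  # the "about 80%" figure mentioned in the conversation


@dataclass(frozen=True)
class Exercise:
    name: str
    skill: str  # hypothetical skill label, e.g. "past_tense"


def predict_success(skill_accuracy: dict[str, float], exercise: Exercise) -> float:
    """Hypothetical learner model: reuse the learner's past accuracy on the
    exercise's skill, falling back to 0.5 for skills never seen before."""
    return skill_accuracy.get(exercise.skill, 0.5)


def pick_exercise(
    skill_accuracy: dict[str, float],
    candidates: list[Exercise],
    target: float = TARGET_SUCCESS,
) -> Exercise:
    """Return the candidate whose predicted success is closest to the target."""
    return min(
        candidates,
        key=lambda ex: abs(predict_success(skill_accuracy, ex) - target),
    )


if __name__ == "__main__":
    # Made-up learner: weak on past tense, strong on greetings.
    learner = {"past_tense": 0.55, "future_tense": 0.82, "greetings": 0.99}
    pool = [
        Exercise("Translate 'I ate'", "past_tense"),
        Exercise("Translate 'I will eat'", "future_tense"),
        Exercise("Translate 'Hello'", "greetings"),
    ]
    print(pick_exercise(learner, pool).name)
```

In this toy setup the greetings exercise is skipped as too easy and the past-tense exercise as too frustrating, so the future-tense one is chosen. A real system would presumably estimate the probability per learner and per exercise from far richer data.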
Joe (21:42):
Okay. I have another, I guess I would say theory-of-language question. And I think I read in one of your interviews, as part of the process of making the Duolingo [app] better, you're always A/B testing things, like should people learn vocabulary first? Should people learn adjectives before adverbs or adverbs before verbs, or whatever it is, and that there's this constant process of ‘what is the correct sequence?’
Do rules about the sequence of what you learn differ across languages so that say someone learning Portuguese may have a different optimal path of what to learn first grammatically or vocabulary-wise versus say someone learning Chinese or Polish? Because I'm curious about whether we can uncover deep facts about common grammar and language from the sort of learning sequence that is optimal across languages.
Luis (22:33):
Yes, they definitely vary a lot based on the language that you're learning. And even more so, they also vary based on your native language. So we actually have a different course to learn English for Spanish speakers than the course we have to learn English for Chinese speakers. They are different courses.
And there's a reason for that. It turns out that what's hard for Spanish speakers in learning English is different than what's hard for Chinese speakers in learning English. Typically the things that are common between languages are easy and the things that are very different between languages are hard. So just a stupid example, I mean, when you're learning English from Spanish, there's a couple thousand cognates, that's words that are the same or very close to the same. So you immediately know those. We don't even need to teach you those words if you're learning English from Spanish because you already know them automatically because they're the same word.
That's not quite true from Chinese. Other examples are, you know, for me in particular I started learning German, and for me German was quite hard to learn because Spanish, my native language is Spanish. Spanish just does not have a very developed concept of grammatical cases, whereas German does. But learning German from like, from Russian, that's just not a very hard concept to grasp. So it kind of depends on what concepts your language has.
Also, not exactly concepts, but in terms of pronunciation, everybody says that Spanish pronunciation is really easy. And it's true. Vowels in Spanish are really easy because there's only really about five vowel sounds. It's a little more than that, but it's about five vowel sounds. Whereas, you know, there are other languages that have, you know, 15 vowel sounds. So learning Spanish is easy, but vice versa, if you're a native Spanish speaker, learning the languages that have a lot of vowel sounds is really hard because you can't even hear the difference. You know, it's very funny when you're learning English from, as a native Spanish speaker, you cannot hear the difference between beach and b****. You cannot hear that difference. And, you know, people make funny mistakes because of that.
Tracy (24:37):
I think there were a lot of T-shirts that involved that at one point in time.
Luis (24:43):
Well, because really if you're a native Spanish speaker, you just cannot hear that difference.
Tracy (24:48):
So one thing I wanted to ask you is the type of model that you're actually using. So I believe you're using GPT-4 for some things, like your premium subscription Duolingo Max, but then you've also developed your own proprietary AI model called Birdbrain. And I'm curious about the decision to both use an off-the-shelf solution or platform and to also develop your own model at the same time. How did you end up going down that path?
Luis (25:20):
Yeah, it's a great question. I mean, I think the difference is these are just very different. Before large language models or generative AI became very popular two years ago, there were just different things that AI could be used for. We were not using AI, for example, for practicing conversation, but we were using AI to determine which exercise to give to which person. We built our own. That is the Birdbrain model, [it] is a model that tries to figure out which exercise to give to which person.
You know, the last two years ago, for the last two years, sorry, when people talk about models, they usually mean language models. And it's this specific type of AI model that, what it does is it predicts the next word, given the previous words. That's what a language model does. The large language models are particularly good at doing this. And we did not develop our own large language model. We decided it's a lot easier to just use something like GPT-4, but we have our own model for something else that is not a language model, but it is an AI model to predict what exercise to give to which user. Right, which is a pretty different problem.
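As a rough illustration of "predict the next word, given the previous words," here is a toy sketch. It is not how GPT-4 or any large language model works internally: those use neural networks over long contexts, while this example only counts which word most often follows a single previous word (a bigram table). The function names and the sample sentence are made up for the example.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigram(text: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts: dict[str, Counter], prev_word: str) -> str | None:
    """Return the most frequent follower of prev_word, or None if unseen."""
    followers = counts.get(prev_word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    model = train_bigram("the cat sat on the mat and the cat ran")
    print(predict_next(model, "the"))    # "cat" follows "the" twice, "mat" once
    print(predict_next(model, "zebra"))  # never seen, so None
```

The same idea, scaled up to much longer contexts and learned rather than counted, is what lets a model carry on a conversation one word at a time.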
Joe (26:47):
Speaking of AI, all these, especially the really big companies making an extraordinary show of almost bragging about how much money they give to Jensen Huang at Nvidia, it's like ‘Oh, we just spent, we're spending $20 billion over the next two years to just acquire H100 chips,’ or whatever it is. And it almost seems like there's an arms race.
And then there is also this view that actually the best models will not necessarily be the ones strictly with the access to the most compute, but access to data sets that other models simply don't have. And I'm curious, sort of like, as Duolingo must have an extraordinary amount of proprietary data just from all of your user interactions, in your experience, when you think about who the winners will be in this space, is it going to be the ones that just have the most electricity and energy and chips? Or is it going to be who has access to some sort of data that they can fine tune their model on that the other model can't?
Luis (27:48):
It depends on what you're talking about. You know, certainly we as Duolingo have a lot of data that nobody else has, which is the data on how each person's learning a language. I mean that's not data you can find on the web or anything like that. That is just the data that we have that we're generating and we're going to train our own models for that. I don't think there's enough electricity to train a model without this data to be as good as ours, with our data. But it is specifically for language learning. If you're talking about training a general model that is going to be something, a language model that is general for being able to have conversations, etc., usually you can get that from, there's pretty good data out there, you know, YouTube videos that are free or a lot of Reddit conversations or whatever.
There's a lot of data in there, probably power is going to matter. So it depends on what you're going to use your model for. If you're using it for a very specific purpose and you have very specific data for that, that is proprietary, that's going to be better for the specific purposes. But my sense is that both are going to matter; what data you have and also how much electricity you spend.
But I also think that over time, hopefully we're going to get better and better at these algorithms. And if you think about it, the human brain uses something like 30 watts. For the human brain, that’s pretty good. And we don't need, you know, some of these models, people are saying ‘Oh, this uses the amount of electricity that all of New York City uses. We use that to train a model.’ Our brain uses much less electricity than that. And it's pretty good. So my sense is that also over time, hopefully we'll be able to get to the point where we're not as crazy about using electricity as we are today.
Tracy (29:37):
I'm glad our brains are energy efficient. That's nice to know.
Luis (29:41):
Way better than computers!
Tracy (29:43):
We've been talking a lot about the use of AI in the product itself; so improving the experience of learning a language. But one of the things that we hear a lot about nowadays is also angst over the role of AI in the wider economy in terms of the labor force, job security and stuff like that, as companies try to be more efficient. So I guess I'm wondering on the corporate side, how much does AI play into the business model right now in terms of streamlining things like costs or reducing workforce? And I believe there were quite a few headlines around Duolingo on this exact topic late last year.
Luis (30:26):
Yeah. First of all, those headlines were upsetting to me because they were wrong. You know, there were a lot of headlines saying that we had done a massive layoff, that was not actually true. So what is true is that, you know, we really are leaning into AI, you know, it just, it makes sense. This is a very transformative technology. So we are leaning into it. And it is also true that many workflows are a lot more efficient.
And so what happened late last year was that we realized we have full-time employees but we also have some hourly contractors. We realized that we needed fewer hourly contractors. And so for a small fraction of our hourly contractors, we did not renew their contracts because we realized we needed fewer of them for doing some tasks that, honestly, computers were just as good at as a human.
And that may be true for something like an hourly contractor force that was being asked to do, they were basically being asked to do very rote kind of language tasks that computers just got very good at. I think if you're talking about our full-time employees and people who are, who are not necessarily just doing rote repetitive stuff, that's going to take a while to replace. I don't think, and certainly this is not what we want to do as a company, I heard a really good saying recently, which is, ‘Your job's not going to be replaced by AI. It's going to be replaced by somebody who knows how to use AI.’ So what we're seeing in the company, at least for our full-time employees, not that we are able or even want to replace them, what we're seeing is just way more productivity to the point where people are able to concentrate on kind of higher level cognitive tasks rather than rote things.
I don't know, a hundred years ago people were being hired to add numbers or multiply numbers. The original computers were actually humans who were being hired to multiply numbers. We were able to mechanize that and use an actual computer to do that so that people didn't have to do that. Instead they spent time planning something at a higher level rather than having to do the multiplication.
We're seeing something similar to that now. And the other thing that we're seeing that is really amazing, so we are saving costs because a single person can do more, but also we're able to do things much, much faster. And in particular in data creation, I mean, one of the ways in which we teach you how to read is we teach you to read short stories. We used to create and we need to create a lot of short stories.
We used to be able to create short stories at a certain pace. We can now create them like 10 times faster. And what's beautiful about being able to create them 10 times faster is that you can actually make the quality better. Because if you create them once 10 times faster and you don't like it, you can start over and do it again with certain changes and then, oh, you didn't like it? Okay, try it again. So you can try 10 times, whereas before you could only try once. And generally you don't have to try 10 times, you have to try fewer times. So this is able to, at the same time, lower costs for us, but also make the speed faster and the quality better. So we're very happy with that from the corporate side.
Joe (33:30):
Could you talk more about benchmarking AI? Because there's all these tests, right? And you see these websites and they're like, ‘Well, this one got this on the LSATs and this one got this on the SATs.’ And I can never quite tell, and a lot of it seems inscrutable to me. From your perspective, what are sort of your basic approaches to benchmarking different models and determining when it’s like ‘this makes sense’ as some sort of task to employ AI instead of a person doing it?
Luis (33:58):
Yeah, I have felt the same as you have. My sense is a lot of these benchmarks are from marketing teams. What we do internally is two things. First of all, we just try stuff and then we look at it and we look at the very specific — it's nice that an AI can pass the LSAT or whatever, but we are, you know, we're not in the business of passing LSATs. We're in the business of doing whatever it is we're doing — creating short stories or whatever. So whatever task, we just try it and then we judge the quality ourselves.
So far we have found that the quality of the OpenAI models is a little better than everybody else's, but not that much better. I mean, two years ago it was way better. It seems like everybody else is catching up, but so far we have found that that's just when we do our tests, and again, this is just one company, I'm sure that other companies are finding maybe different stuff, but for us, for our specific use-cases, we find time and again that GPT-4 does better. And I don't know, of course everybody's now announcing like there's going to be GPT-5, etc. I don't know how those will be, but that's what we're finding. But you know, generally we just do our own testing.
Joe (35:01):
Yeah. Tracy, I find that so fascinating, especially [since] I think we've talked about this. It definitely seems TBD whether one model would just prove to be head and shoulders better than the others. The way that Google is just head and shoulders above everyone else for 20 years basically, and still is. Like, it's unclear to me whether that'll be the case with AI.
Tracy (35:20):
Right, the idea that we're in, I don't know, the Bing era of chat models and eventually we're all going to migrate to something else. Luis, one thing I wanted to ask you, and this is sort of going back to the very beginning of the conversation and some of the older thoughts around language, there used to be — I don't want to say a consensus — but there used to be some thinking that language was very complicated in many ways, and so much of it was sort of ambiguous or maybe context dependent, that it would be very hard for AI to sort of wrap its head around it. And I'm wondering now, with something like Duolingo, how do your models take into account that sort of context dependency? And I'm thinking specifically about things like Mandarin, where the pronunciation is kind of tricky and a lot of understanding depends on the context in which a particular word is said. So, how do you sort of deal with that?
Luis (36:19):
Yeah, I mean, it's an interesting thing. You know, when you were asking the question, I thought of this thing. I've been around AI since the late nineties and it is just this moving goalpost. I remember everybody just kept on saying ‘Look, if a computer can play chess, surely we all agree it has human level intelligence.’ This is kind of what everybody said. Then it turned out, computers could play chess and nobody agreed that it had human level intelligence. It's just, like, ‘Okay, fine, it can play chess. Next thing.’ And they would just keep coming up with stuff like ‘Surely if a computer can, you know, play the game of Go, or if a computer could do this, then...’ And one of the last few things was, you know, ‘If a computer can, whatever, write poetry so well, or understand text, then surely it is intelligent.’
I mean, at this point, models like GPT-4 are really good at doing things — certainly better than the average human. They may not be as good as the best poet in the world, but certainly better than the average human writing poetry. Certainly better than the average human at almost anything with text manipulation. Actually, if you look at your average human, they're just not particularly good at writing.
Joe (37:23):
Including many professional writers.
Luis (37:25):
Oh, yeah. No, I mean, just, these models are excellent. And in fact, you can write something that is half well-written and you can ask the model to make it better. And it does that. It makes your text better. So, it's this funny thing that just, [with] AI, we keep coming up with things that like ‘If AI can crack that, that's it. That's it.’ You know, I don't know what the next one will be, but, you know, we keep coming up with stuff like that. You know, in terms of the language, it just turns out that language can be mostly captured by these models.
It turns out that if you make a neural network architecture, and this—you know, nobody could have guessed this—but it just turns out that if you make this neural network architecture, that's called a transformer, and you train it with a gazillion pieces of text, it just turns out it pretty much can capture almost any nuance of the language.
Again, nobody could have figured this out, but it just turns out that this is the case. So at this point when you ask about, you know, what we do with context or whatever? It just works. Some of it we do with handwritten rules because we write the rules. But generally, if you're going to use an AI, it just works. And you can ask me why it works and I don't know why it works. I don't think anybody does. But it turns out that the statistics are kind of strong enough there that if you train it with a gazillion pieces of text, it just works.
Joe (38:43):
I just want to go back to, you know, where AI is going and you mentioned that AI can generate thousands or, you know, very rapidly, numerous short stories. And then a human can say ‘Okay, these are the good ones we can improve.’ And so you not only get the efficiency savings, you actually can get a better higher quality for the lessons and so forth.
But, you know, I'm moving up the abstraction layer. Like, will there be a point at some point in the future in which the entire concept of learning a language or the entire sequence is almost entirely something that AI can do from scratch? Again, I'm thinking sort of back to that chess analogy of not having to use the entire history of games to learn, but just knowing the basic rules and then coming up with something further. Like, will AI eventually be able to design the architecture of what it means to learn a language?
Luis (39:36):
I mean, sure. I think at some point AI is going to be able to do pretty much everything.
Joe (39:41):
Right.
Luis (39:41):
It's very hard to know how long this will take. I mean, it's just very hard. And, honestly for our own society, I'm hoping that the process is gradual and not from one day to the next. Because, if we find that at some point AI really goes from, if tomorrow somebody announces ‘Okay, I have an AI that can pretty much do everything perfectly,’ I think this will be a major societal problem. Because we won't know what to do. But if this process takes 20, 30 years at least, we'll be able to, as a society, figure out what to do with ourselves. But yeah, generally, I mean, I think at some point, AI is going to be able to do everything we can.
Tracy (40:15):
What's the big challenge when it comes to AI at the moment? I realize we've been talking a lot about opportunities, but what are some of the issues that you're trying to surmount at the moment? Whether it's something like getting enough compute or securing the best engineers, or I guess being in competition with a number of other companies that are also using AI, maybe in the same business?
Luis (40:41):
I mean, certainly securing good engineers has been a challenge for anything related to engineering for a while. You know, you want the best engineers and there's just not very many of them, so there's a lot of competition. So that's certainly true. You know, in terms of AI in particular, I would say, I don't know. It depends on what you're trying to achieve.
These models are getting better and better. What they're not yet quite exhibiting is actual kind of deduction and understanding as good as we would want them to do. I mean, so you still see really, because of the way they work, I mean these are just predicting the next word. Because of the way they work, you can see them do funky stuff, like they get adding numbers wrong sometimes. Because they're not actually adding numbers, they're just predicting the next word.
And it turns out you can predict a lot of things. So it doesn't quite have a concept of addition. So I think, you know, if what you're looking for is kind of general intelligence, I think there's some amount that's going to be required in terms of actually understanding certain concepts that these models don't yet have. And, you know, my sense is that new ideas are needed for that. I don't know what they are, if I knew I would do them, but new ideas are needed for that.
Joe (41:48):
Yeah, it's still mind-blowing. Like, you see the AI produce some sort of amazing output or explanation and then it'll get wrong like a question of like ‘What weighs more, a kilogram of feathers or a kilogram of steel?’ Like something really [simple].
Luis (42:03):
It's because it doesn't have actual understanding.
Joe (42:05):
Right, there's no actual intuition. I just have one last question. There are not many sort of like cutting-edge tech companies based in Pittsburgh. I understand [that] CMU has historically been a bastion of advanced AI research. I think at one point, Uber bought out like the entire robotics department when it was trying to do self-driving cars. But how do you see that, when it comes to this sort of recruiting of talent, and it's already scarce, what are the advantages and disadvantages of being based in Pittsburgh rather than the Bay Area or somewhere else?
Luis (42:38):
Yeah, Duolingo has been headquartered in Pittsburgh since the beginning. We've loved being there. There are good things and bad things. I mean, certainly a good thing is being close to Carnegie Mellon. Carnegie Mellon produces, you know, some of the best engineers in the world and certainly relating to AI.
Another good thing about being in a city like Pittsburgh is that, two good things. One of them is that people don't leave jobs that easily. And you know, when you're in a place like Silicon Valley, you get these people that leave jobs every 18 months. Our average employee stays around for a very long time and that's actually a major advantage because you don't have to retrain them. They really know how to do the job because they've been doing it for the last seven years. So that's been an advantage.
And I think another advantage that we've had is, in terms of Silicon Valley, there's usually one or two companies that are kind of the darlings of Silicon Valley and everybody wants to work there. And the darling company changes every two [or] three years. And all the good people go there. The good news in Pittsburgh is, that fad type thing doesn't happen.
So there have been times, we're lucky that right now our stock is doing very well. So we're kind of a fad company, but there have been times when we just weren't. But we still were able to get really good talent. So I think that's been really good. You know, on the flip side, of course there are certain roles for which it is hard to hire people in Pittsburgh, particularly product managers are hard to hire in Pittsburgh. So because of that, we have an office in New York and we complement that. We have a pretty large office in New York and we complement that.
Tracy (44:03):
Alright. Luis von Ahn from Duolingo. Thank you so much for coming on Odd Lots. That was great.
Luis (44:08):
Oh, thank you. Excellent.
Tracy (44:21):
Joe, I enjoyed that conversation. You know what I was thinking about when Luis was talking about [how] it's not that AI is going to take your job, it's someone who knows how to use AI is going to take your job, I was thinking about, just before we came on this recording, you were telling me that you used — was it ChatGPT or Claude? — to learn something that I normally do.
Joe (44:40):
Oh yeah. So for those who don't know, we have a weekly Odd Lots newsletter and [it] comes out every Friday. You should go subscribe. And Tracy usually sends an email to one of the guests each week asking what books they recommend. You know, people like reading books. And then she goes into MS Paint and then puts the covers of the four books together.
Tracy (45:00):
My Sistine Chapel.
Joe (45:00):
And I did that because Tracy was out a couple weeks ago. And I am not like, I've never learned Photoshop or even MS Paint, so just like—I'm very dumb—just the process of putting four images together was not something I exactly knew how to do. So I went to Claude and I said, ‘I'm putting together four book images in an MS Paint thing. Please tell me how to do it.’ And it walked through the steps. And I did it, Tracy. You were proud of me, right?
Tracy (45:25):
I was very proud. I do think it's somewhat ironic that the pinnacle of AI usage is teaching someone how to use MS Paint, but that’s fine. I'll take it. No, there's so much to pull out of that conversation. One thing I'll say, and maybe it's a little bit trite, but it does seem like language learning is sort of ground zero for the application of a lot of this natural language and chatbot technology. So it was interesting to come at it from a sort of pure language or linguistics perspective.
Joe (45:56):
Yeah, I mean I feel like we could have talked to Luis for hours just on like theory of language itself, which I find endlessly fascinating and I can only speak one language. I used to be able to speak French, I don’t know if I told you, but I did one semester in Geneva, Switzerland, and I lived with a family that only spoke French and I'd never spoken a word of French before I got there. And after one semester, I came home and I passed out of four years worth of my college requirements from that four months living there. And then I didn't speak French again for 20 years and I lost it all. But, I was going to go somewhere with that. I don't really know.
Tracy (46:33):
It's okay, I too speak multiple languages poorly.
Joe (46:36):
But you know, the other thing I was thinking about, so Duolingo has obviously been around for quite a long time before anyone was talking about generative AI or anything. And one of the things you hear, and it's sort of used pejoratively, is like some company will be called like a ChatGPT wrapper, right? So basically, they're just taking GPT-4, whatever the latest model is, and then building some slick interface to do a specific task on top of it. And what's interesting about Duolingo is it feels like it's backwards, or going in the opposite sequence, where they already had this extremely popular app for language learning and then over time they incorporate more. So rather than starting off as a wrapper for someone else's technology, they already have the audience, they already have the thing, and then they find more ways that the AI can be used to actually rebuild the core app.
Tracy (47:29):
Yeah, that's a really good way of putting it. And also just the iterative nature of all of this technology. So the idea that, you know, you're sort of training it—I know, again, it's sort of an obvious point. But also, I didn't realize how customized a lot of the Duolingo stuff is at this point. And the idea that, if you speak one language, the way you learn, say, German, is going to be completely different to someone who grew up speaking another language.
And I'm very intrigued by the amount of data that something like a Duolingo must have at this point. And, I guess maybe we should have asked Luis about this, but also other business opportunities in terms of like licensing that data or maybe, I don't know, I think they were doing a partnership for a while with Buzzfeed where the CAPTCHA was like actually translating news articles or something.
Joe (48:19):
Right, there was going to be something like that. I think I recall it didn't really take off, but the idea was Buzzfeed would get its news articles translated into Spanish and other languages from the process of Duolingo users learning that process. I forget why it didn't take off, but yeah, absolutely.
Tracy (48:35):
I also find it funny, like in some senses, that we're sort of, I guess, the thing that AI is feeding off of now, right? And like, all those minutes, which I'm sure add up to days eventually, of going through CAPTCHA, it's all sort of unpaid labor for training our future AI overlords.
Joe (48:56):
So he mentioned that he was upset about headlines last year implying that they had laid off a bunch of people due to AI. But he did say that there are people who — they were contractors, so they weren't full-time employees — but it sounds like a very crisp example of AI being able to do a job, even if they were contractors, that was done by humans.
And I'm generally skeptical of most articles that I read where a company says ‘Oh, we're going to like cut all this labor savings and we're going to do AI,’ because I sort of think that is more often a smokescreen for just like a business that wants to cut jobs and make it sound like they're progressive. But here it did sound like an actual example in which there was some form of human labor that is no longer needed because of AI.
Tracy (49:41):
Yes, AI will come for us all. Shall we leave it there?
Joe (49:45):
Let's leave it there.
You can follow Luis von Ahn at @luisvonahn.