The Future of Universal Translation With Philipp Koehn

In this episode, we explore the fascinating world of machine translation with Dr. Philipp Koehn, a professor at Johns Hopkins University. Dr. Koehn discusses the current state of machine translation technology, its challenges, and the potential future where universal translation could become a reality. He shares insights on the impact of deep learning, the integration of speech and text processing, and the cultural implications of advanced translation technologies.
Hey, everybody. Welcome back to the Futurist Society, where, as always, we are talking in the present, but talking about the future. Today, I have a really special guest. I have Dr. Philipp Koehn, who is a professor at Johns Hopkins University in the Department of Computer Science. And he's doing some really interesting things with machine translation, as well as universal translation.
It's something that has always piqued my interest, because growing up, that was one of the things about science fiction that I felt was very integral to conversing with different people.
Thanks so much for talking with us, Professor Koehn. If you could just tell us a little bit about what you're doing at Johns Hopkins and what the implications are for the future.
Introduction to Machine Translation
Yeah. I've been working on machine translation for a quarter-century by now, at various places. So I've been here now at Hopkins for 10 years.
This is a larger group that works on language and speech processing. And that's probably one of the interesting things to talk about today too: language processing and speech processing are now becoming much more integrated, and that opens up many, many opportunities.
I read some of your work in the past. A lot of machine translation was about the ability of computer software to translate text and speech based on the inputs you give it. I think that has really given us a lot of profound insight into old, historic languages and things like that. What are some things that you've noticed machines really aren't able to capture, in comparison to a human being actually translating something?
I mean, humans are still pretty good at translating. We've always been a bit careful not to overstate things and oversell. We always said, if you have a professional human translator, that's always going to be better than a machine. Having said that, you can obviously train machine translation systems on up to 200 languages or more that do a decent job. And as a human, you're not going to learn 200 languages very quickly.
So the main thing I see for the role of machine translation and a lot of the AI applications is that it's more supporting, assisting technology for humans who ultimately make the final judgment calls. And it should only be used kind of completely independently and autonomously when you know what you're doing. You have to be aware it makes mistakes.
Yeah.
So, broad linguistic questions. It is actually surprising that we use pretty much the same technology for any language pair in the world. In linguistics you make quite a big distinction between morphologically rich and poor languages, different writing systems, and different sentence order. And nowadays we just use exactly the same model for everything.
Does it matter at all that, you know, different languages have different… if we're all using the same thing, like, what is that underlying thing? And like, is it noun, verb, adjective, or how does it work?
So, I mean, at a very fundamental level all these languages are the same. They're broken up into sentences, or maybe even multiple clauses in a sentence, but each of these have a structure. There's a verb, that's kind of the most important thing in a sentence. It’s the action you're talking about. And then there's the person who's acting, the things that are being acted upon, and any auxiliary information. And languages differ in where that goes in the sentence and how the roles of different kind of words in the sentence are defined, but ultimately that is the same.
What used to be a big challenge is languages where like the word order is very different. It's always the German example where the verb is at the end and you have to move it all the way to the front. And we used to struggle with that. But with the latest technology, all these deep learning models, they don't seem to care too much about that.
And do they do a pretty good job? I feel like I haven't really dived into using them on a regular basis. I use Google Translate. I recently came back from Japan and I was using it regularly, and I felt like it was kind of clunky. I just wonder how long it's going to take to get to that vision of real-time translation.
Well, when you say real-time translation, you might mean two different things.
One is that, yeah, it should be really quick. But probably you also mean speech translation, where you don't have to type something in, wait for it to appear on the screen, and show it to someone, typing the full sentence before anything useful comes out.
Going all the way to speech translation is still kind of a frontier. Speech translation creates so many more problems with different speaking styles, accents, and dialects. For a long time with a lot of languages, since we're used to the written form, we just assumed there's one language: just one Arabic, just one English. But if you actually look at how people speak these languages, they are vastly divergent, and that makes things so much harder.
So let's just focus on speech translation for a bit, because I feel like that's the most practical for most people listening to this conversation.
Like for example, for me, if we have a patient that comes in that has a different dialect or they're not able to understand English, but I need to make sure that they understand the procedure that they're getting into. We actually have a language line where we call in to somebody else who's a translator and then that person in real time does the translations so that person understands. And even that is a little bit clunky, right? Because I have to talk to the translator, they have to repeat it back to the person, and so that process ends up taking like three times as long as a normal physician's appointment.
When is it going to feel a little bit more natural? And also, who's doing it the best right now? Is it Google? Is it Amazon? I know there are a lot of linguists working at all of these companies because of the voice capabilities of Alexa or Google Home and all of those things.
There's definitely some interest in this from technology companies. Who would you say is doing it best? What is the best example of the Holy Grail of universal translation?
The Evolution of Translation Technology
There's a lot of activity at the moment in the space. We are kind of also at the point where a lot of the big companies that you mentioned, Google or Microsoft and Amazon, are kind of moving away from treating machine translation as kind of a separate problem with a separate team. It just all kind of becomes part of this big language model, big speech model space.
So I don't even want to say who is the best. Everybody's using very, very similar technologies and everybody's using very, very similar data sets. So it's always very competitive and has been always very competitive.
The best systems were always very close to each other. And for a long time, it was even the situation where we in academia could build systems that were state of the art. It’s gotten a bit harder with the kind of computational demands of all the deep learning models.
When is it going to work? I'm always a bit hesitant to promise that. There's the famous line, "In five years, everything will be perfect." That's a great promise to get funding and excitement, but then in five years you realize it's not there yet.
There's a big push right now to work on exactly what you just talked about. We call it simultaneous translation, where it happens while you talk. You don't wait for the end of a sentence. You might wait a second or two, because you have to see a few words to make sense of what the other person is going to say. So you might still be one or two seconds behind the speaker, but then you start translating. That's where a lot of research action is at the moment.
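The policy Dr. Koehn describes, holding back a word or two of context before committing to output, is studied in the research literature as "wait-k" decoding. Here is a minimal sketch, assuming (unrealistically) one target word per source word; `step_translate` is a hypothetical stand-in for a real incremental translation model:

```python
def wait_k_translate(source_words, step_translate, k=2):
    """Toy 'wait-k' simultaneous translation: stay about k words behind.

    `step_translate(heard, j)` is a hypothetical model call that returns
    target word j given the source words heard so far. We assume one
    target word per source word, which real systems do not.
    """
    n = len(source_words)
    output = []
    for j in range(n):
        # Emit target word j only after source word j + k - 1 has been
        # heard (or the utterance has ended).
        heard = source_words[: min(j + k, n)]
        output.append(step_translate(heard, j))
    return output
```

The key trade-off is visible in the `min(j + k, n)` slice: a larger k gives the model more context (better translations of verb-final languages like German) at the cost of a longer lag behind the speaker.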
The only real working systems are still at an experimental stage. I think OpenAI did a demo at the GPT-4o release where they had some audio interpreting, which was along those lines. But it's still really hard for machines to have this natural conversation where not only the words being said are appropriate, but how it is being said is appropriate. That you can have the right emotional response to something, not always a cheery voice no matter what.
So have you tried any of these at all yourself? I know you speak German. Would you say that any of these models have been able to translate German at the level where you were like, “Okay, wow, that's, that's pretty close”?
So we basically make a distinction based on how much data, how much training material we have. And for German, we do really well. There are probably a few dozen languages where the quality is actually pretty amazing: where, if you just look at text translation, you'd have a hard time finding any mistakes at all. It always gets a bit more complicated for speech.
A big problem with speech, especially contemporary speech, is that people make up what they say on the fly. It is actually amazing how ungrammatically we speak.
Yeah.
We write everything in beautiful, nice grammar. But when we talk, it's actually incredibly ungrammatical, and all the sentences run into each other, and it's all kind of a mess. So it's just generally a more difficult type of language to deal with. It's never clear how much you should then clean things up in the translation, or make it easier for the translation.
I found it interesting that you mentioned the doctor-patient conversation translation. I heard from a physician who was working in that space too, like a year ago, and he also said there's a big need to simplify what the doctor says to the patient. The doctor might use all kinds of technical jargon that the patient doesn't understand at all. So the translation is not only into another language, it's also into…
The content of what that is actually.
Yeah. I mean, I feel like even Google Translate... if I were to go back to when I was in college and tell myself, "Oh, hey, by the way, if you use this, you'll be able to go to Japan and converse with people in such a way that you can get around okay." It's mind-blowing that we're here.
I would say that the pace of technological progression has increased such that I almost expect the technology to be right around the corner. Everything is progressing so rapidly, at an exponential rate, when it comes to technology, whether it's machine learning, genetics, biotech, whatever. So it's a pretty profound leap that we've already been able to make with text or speech.
If we can do universal translation for texts, that's amazing. I guess for you as somebody who's in there, other than just like the advancement of the technology, what are some other profound things that come to you when it comes to just the progression? Because you started when there were no funds.
I remember 20 years ago when, for some reason, we had government funding in the US on Chinese-English. And Chinese is hard because it's so different. We didn't have much data. And the first breakthrough was like, "Oh, you can almost read this, this almost makes sense." Like, yeah, these words are correct. That's not so bad. So we were excited about that, and then about making it seamless.
And what you just mentioned. It is also very satisfying that like some technology, which was really not working and had a really bad reputation 20 years ago… someone said MT stands for empty promises. It was just always, “It's going to be done in five years.” and then it wasn't done in five years… I think I first saw that about five years ago when I traveled abroad and people pulled out their cell phone and put something in and showed it to you. And I was like, wow, this is really there.
It's really baked into the culture now. I feel like everybody is using it when I see people abroad. From your perspective, what was the shift? Was it a different method? Because I remember when I was researching some of the stuff you were doing, there was an older method of machine translation, and you wrote a paper comparing different methods. How did you used to do it? And what was the turnaround that made it more effective?
Yeah. I mean, there have been various points where there was a completely different shift in how to do things. When I started, around 2000, the state of the art was still what we call rule-based systems, where people actually wrote down dictionaries and translation rules, tried to figure out the grammar of a sentence, and came up with how the grammar had to be changed. That was state of the art.
So the big revolution around that time was machine learning. Let's collect a lot of translated material and learn from it how to train these models. The initial models were all based on how often a word had been seen translated in a given way. Then we'd say, "Oh, 50 percent of the time it was translated this way, so we're going to use 50 percent probability there." And we combined that with which words make sense in context.
Things that are now well known as language models are based on a score of how well word sequences work. The very first one was really just word translation. The thing that kind of made it work, and this is right about the time where Google Translate started, were so-called phrase-based models where we also translate bigger text chunks. Because sometimes one word translates into two or two words translate into three. And sometimes if you have word groups, they're easier to translate than words in isolation. Words in isolation might be very, very ambiguous. That kind of kept going for about a decade.
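The counting idea described above can be sketched in a few lines. This is a toy illustration, assuming word-aligned pairs are already available; real systems learn the alignments from parallel sentences (e.g. with the IBM models), and phrase-based systems extract multi-word chunks on top of them:

```python
from collections import Counter, defaultdict

def train_word_translation(aligned_pairs):
    """Estimate p(target | source) by relative frequency: literally
    'how often have I seen this word translated this way'."""
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1
    # Normalize each source word's counts into probabilities.
    return {
        src: {tgt: n / sum(tgts.values()) for tgt, n in tgts.items()}
        for src, tgts in counts.items()
    }
```

With pairs like ("Haus", "house") seen three times and ("Haus", "home") once, the model assigns p(house | Haus) = 0.75. In a full system, these translation probabilities are combined with a language model score over the output word sequence, so the decoder prefers translations that also make sense in context.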
The last big breakthrough was eight years ago now, when everything went into deep learning, so neural network architectures. It's just a more sophisticated way of kind of learning the probabilities of how things are…
The Impact of Deep Learning on Translation
So with the advent of deep learning, was that when the accuracy went way up such that it was incorporated into everyday life? Was Google Translate or any of these translate apps available and then now they're just better?
I guess what I'm asking is like, was it a ChatGPT moment where like, Oh my God, everybody was using ChatGPT and now it's in the consciousness, right?
Like my students will use ChatGPT to write up an email or something like that. Now it's out there. Beforehand it wasn't, it was kind of like this esoteric… like maybe you might know about it because you're in the machine learning space. But for someone like me, who's not in that space, it wasn't really available.
So was the real turnaround, would you say, the deep learning or was there something else?
I think it was also earlier. Like I said, Google Translate started around 2006. When they started, they actually used another company's rule-based system, and then in 2007 they rolled out their statistical system.
That was already a point where it was useful and people did use it. I think there was kind of relatively continuous progress over the years. So yes, the big turn to deep learning kind of gave it another good step in progress, but I think it is continuous. Because it's not only the technology, it’s also kind of refinement. There are always some new tricks you kind of put in. There is also more training data that is being collected.
This is also where we spend a lot of time: finding translated text to train our models, assembling it, and putting it in the right format. An interesting thing about deep learning is that all of that didn't really change. Only the machine learning method changed; the entire infrastructure around it stayed the same.
Gotcha.
The way we trained it on translated text, the way we evaluated it, the way we measured progress was really not that different. So it was actually not a big jump for the people who worked on it. Everybody changed to the new thing within about a year.
What you just mentioned, ChatGPT: this is currently the next big change, I would say. It used to be that we built dedicated systems on just translated text data. Now we have these gigantic models that are trained on trillions of words of text in all languages. And if you take these models and then adapt them to machine translation, that currently works better. We're in the transition period towards that. The big drawback is that these models are so gigantic, so expensive to build and use, that that is still a hindrance. But in terms of quality, they have quite a few advantages.
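One simple way to adapt a general-purpose language model to translation, in the spirit of what Dr. Koehn describes, is few-shot prompting: show the model a handful of example translations and ask it to continue the pattern. This is a hedged sketch; the prompt format and function name are illustrative assumptions, not any particular system's API, and serious adaptation typically also fine-tunes on parallel text:

```python
def build_translation_prompt(examples, source_sentence,
                             src_lang="German", tgt_lang="English"):
    """Assemble a few-shot translation prompt for a general-purpose LLM.

    `examples` is a list of (source, target) sentence pairs that
    demonstrate the task; the model is left to complete the final line.
    """
    lines = [f"Translate from {src_lang} to {tgt_lang}."]
    for src, tgt in examples:
        lines.append(f"{src_lang}: {src}\n{tgt_lang}: {tgt}")
    # Leave the target side of the last pair empty for the model to fill.
    lines.append(f"{src_lang}: {source_sentence}\n{tgt_lang}:")
    return "\n\n".join(lines)
```

The appeal is that one giant model trained on all languages replaces many dedicated per-language-pair systems; the drawback, as noted above, is the cost of running such a model for every translation request.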
Yeah, what do you think is most exciting to you right now in this space that is going to be like the next frontier? I feel like, specifically for speech, it's going to be based on computational power, it's going to be based on speed, those kinds of things. But I might be missing something as an outsider.
Is there anything coming down the pipeline that you're like, "Oh, wow, this is going to be something big"?
Yeah, I think the whole merger of text and speech. That's the thing that is currently happening. And it's really interesting because I'm here at an institute at the university called the Center for Language and Speech Processing. We always had colleagues who worked on speech while we worked on language. It's also interesting because that divide is reflected in that the speech people are in electrical and computer engineering, because that is signal processing, waveforms, audio, and all that, and we are in computer science, which is more like words and numbers.
Binary versus analog, right?
Yeah. So that has now merged. We basically use almost the same models. We speak the same language. Previously, they talked about filter banks and Fourier transforms and I was like, “Okay, yeah, you do your thing.”
Now you're basically doing the same stuff. So it's become much more fluid. And that's what I'm actually most excited about, because that is still not where we quite want it to be. It's still not clear if the best way to build a system is to really take speech in one language and train it to produce speech in another language, or to break it up into three parts: speech recognition, machine translation, and speech synthesis. But if you do that, you lose so much, because speech is more than the written text. If you transcribe it, there are a lot of nuances to speech and how something is being said.
There's a reason why we do this as a podcast, not a chat conversation: there's so much more in the speech signal than just in the text. Now we also have video, and there's a lot of value in the video signal too, like facial expressions and gestures. That is all part of communication.
I don't think we fully understand how to properly translate that, or what we even want to do there. If I were now speaking in German or Spanish or whatever language, how do you even translate that? Because there are certain things you say in a certain culture, in a certain language, in a certain way, and there isn't always a proper translation. You can never quite capture that.
Yeah. So it's almost like the low-hanging fruit would be some sort of artificial general intelligence that acts as an intermediary to kind of filter out the proper way to say something.
Even that is not quite clear how to do. Interestingly, there's so much less speech data available than text data. You would think that people speak a lot and don't write very much, but they don't record what they're speaking, while everything you write is automatically recorded. So there's much more in text form.
Historically you didn't have access to speech. There's really not that much there. So the big foundation models for speech are not as big as the foundation models for text, because there's just not that much audio. There's also a lot of privacy concerns with audio.
If you take a sentence, you don't know who wrote it, but if you take a piece of audio, you can probably trace down who said it. And that opens up all kinds of problems.
So that whole space of like, not just translating the meaning and having a monotone voice being the translation, but capturing kind of the whole of what makes up human speech and human communication.
Yeah. It’s a really interesting space to be in. It would be cool to see that kind of progression from where we were so far away from the ideal and now we're so close, right? I don't think that we could say that about any of the fields in medicine. I mean, we're close and like we're building and stuff like that, but that kind of rapid progress is pretty interesting, I'm sure, for you to be a part of.
It's a very hectic time. I think just also because of the large language models and all of these things kind of merged now and everybody's working on the same thing. It has so much excitement. Even, you know, developers or just normal people who use technology in interesting new ways. All of that kind of brings new ideas to the table and creates such a very dynamic environment that, yeah, a lot is happening.
Like I write these research proposals and then I look at a paper, “Oh, that is from like early 2023, it’s already more than six months old. I don’t know if that's still relevant.” It didn’t use to be like that. It is a very fast-paced environment right now.
It's also interesting that it's such an open environment, where for some reason or other a lot of these companies, who spent tens of millions of dollars building these gigantic models, give them away for free and you can use them. So we at the university could never build a model like GPT-4, because it costs tens of millions of dollars in compute. But there are many open-source models out there you can just download, and then the action is really in building on them. And we can still do that: refine them and adapt them to make them work for tasks like translation.
Cultural Implications of Advanced Translation
Hmm. Interesting. Okay, so a million dollar question. I have a daughter and I'm deciding, is it even worth it to have her learn other languages? I feel like this technology is so close. Like… are you a parent? Do you have kids at all?
I have children. Yeah. I'm German but I live in America.
Do you see the value of teaching other languages? Or, because you're in the space, are you more like, you know what, this is going to be like learning how to make vinyl records or like learning how to make fire. You know, like why put your interest into this when you can use like a lighter or something?
I mean, it is interesting because when you learn another language, you also really learn a different way of thinking and a different culture on a much more intimate level.
Yeah.
It always gets a bit lost in translation.
Mm hmm.
I'm still of the generation where we learned Latin in school. I had Latin for five years. I was terrible at it. I had to learn Latin and English in school. So English was useful.
Latin was also interesting, but you kind of learn something about a culture 2000 years ago. When you actually learn a language, it's a different way of thinking and a cultural understanding. Like all that gets lost in translation. So translation is always an approximation.
There was a study a long time ago where they looked at the human translation of various languages, and you could still trace down from the translation what the source language was because people just say things in a certain way in a certain language. So if you translate, you still have to express it in the same way, but you wouldn't say it like that in the other language.
I just want to say that that's a very interesting thing to hear coming from someone who is in the computer science space, because it's a very romantic idea of language.
It's like you learn the type of person by the language that they speak. But I've heard that the personality that's imprinted onto you is also shaped by the language that you learn.
So let me give you an example. My parents spoke Urdu growing up. And so in Urdu, there is a very significant differentiation between people who you're supposed to respect versus people who are just like, you know, it's like colloquial terms. And I feel like it's just ingrained in my head, like, “Oh, this is an adult person and I have to respect that person.” And my wife speaks Pashto, which doesn't really have the same kind of thing and I don't really feel like it's as present. And I've read research papers that say that your language imprints upon you a certain personality. Would you agree with that?
Yeah, language is a very, very important part of the culture. And there are these kind of distinctions that are being made and then expressed in language that matter. And then you, if you translate it, you kind of lose it.
I mean, German has the same distinction between a polite "you" that you use if you talk to anybody official and a more informal one that you use with your friends. And it used to be a big formal thing, maybe not so much anymore, to switch with a friend from the polite you to the informal you. It took my parents 10 or 20 years to finally agree with the neighbors they constantly talked with to use the informal you.
Those distinctions, they matter and they're quite important.
Yeah, so would you advise your kids to still learn languages? Are they learning languages?
Yeah. And also it is just, I don't know how good the technology will ever be. It’s always much more natural if you can actually understand.
Interesting.
And talk in that language rather than being reliant on technology.
Well, I mean, not to be contrarian, but I hope you're wrong.
I hope that there is a time in our lives where we can talk to anybody, any human being, in the same way that you and I are talking. I think that's going to be a powerful day for humanity because there are just so many divisions based on language.
I mean, it's just one more division, as opposed to all of the other stuff that we're dealing with. And I think that some people are really afraid of like the monoculture and they really want to preserve like some small speck of land, like identity and everything like that. But, you know, my favorite utopian vision of science fiction comes from Star Trek, where you have universal translation and people are judged by like the merits of their character. And it's like a true meritocracy, you know? So I hope that the work that you're doing really gets to that natural place.
Yeah, I would agree with you that it's definitely the hope that if this machine translation technology works well enough, it also confirms the validity of all these languages. Because we see in the world such a big push towards English as the universal language.
It doesn't matter in which country you grow up, you're probably going to learn English as a second language because that's just the global language, you just have to speak it. And then you kind of see it creeping in. Like if I go to academic meetings, of course, you speak English. And then, I mean, I'm now in America, so obviously everybody speaks in English here.
But I hear it from academia, where at a German university, you have like a Spanish researcher and a French researcher and of course, everybody then speaks English with each other, because that's what they can agree on.
So there's like a real danger that even with countries with a strong national identity and a strong national language, there's this pull towards English. And hopefully machine translation is a push against that and helps with all these languages.
I mean, it might still end up being English that is that one language that kind of binds us all. But I feel like if we did have universal translation, that one language could fluctuate very easily, right? You know, for example, maybe a thousand years ago, people were speaking Latin as the universal language. It's just right now, for whatever reason, English is the most predominant one. But who knows, in a thousand years it might be German.
I feel like in the grand scheme of history, the language that dominates the culture could be relatively different, or at least I hope so.
But the point that I'm trying to make is that I feel like if we do have universal translation, it wouldn't matter anymore. But that's a very pie-in-the-sky dream and that's part of what this whole podcast is about - getting people inspired about the future.
I do know the perspective from the European Union, which is made up of a lot of countries that speak all different languages that come together. There's a lot of emphasis that the European Union doesn't speak English; it has 24 official languages, and everything is accessible in all 24. And if you're a parliamentarian and go to the European Parliament, you can give your speech in Danish and it's perfectly fine, it's even expected. And there's an array of human interpreters who live-translate your speech into all the other languages. They spend a lot of money enabling that, but it's also important for the way the European Union works. It's all by consensus agreements.
So it is going to be interesting to observe how well that works as a recipe versus the English-centric approach.
Yeah. The Chinese have a saying, may you live in interesting times, right? And I feel like right now it's just like the most interesting time of humanity. So much is changing: culture, technology, all this stuff. And the pace of change is increasing also. So it's certainly going to be interesting watching all of this stuff play out.
You asked earlier about the research atmosphere. If I step back and just watch everything happening and read papers, I'm like, "Wow, this is interesting." If I'm in the middle of it, it's actually really stressful, because it's hard to figure out what you should be working on that is actually still relevant and isn't completely obliterated by some other trend that happened, so that you shouldn't even do it this way anymore.
Yeah. And it's mundane too, right? It's like publish or perish. You have to write papers and you have to ask for grants. At Tufts University, it's the same way. I'm sure it's the same at all different universities. In general, we're so focused on what we do on a daily basis, it's nice to step back as a third-party observer.
If they were to tell me to look at how surgery has advanced, I would sit back and say, you know, actually it has really advanced and I'm excited for what's coming down the pipeline with biotechnology and all this stuff. That's the way that I look at your field.
So I'm not really complaining about it, it's just characterizing it. And I definitely prefer that situation over, you know, people saying it's stagnating.
I mean, it remains to be seen, for instance, whether the current large language model boom and all the AI investment that's happening really pays off.
Will the AI Investment Pay Off?
Do you think it will? I mean, there's so much buzz about AI and I feel like I can't really tell the wheat from the chaff. I feel like there's a real difference – people who are in it have this idea that this is like the next big thing and then there are people who are just going about their daily lives who might not be exposed to it. It's very difficult to tell which one of those is right.
It ultimately comes down to, what is it actually going to be useful for? That was always our mantra in machine translation - What is it useful for? If it's like text translation of web pages and you can figure out what it means, that's already useful. If it makes translators more productive, that is definitely useful. And that's actually where money comes in. And then if you travel abroad and you can pull out your phone, that is a level of usefulness.
You're kind of in the same space with large language models. But I also have to say that even if it turns out to be really useful, it doesn't necessarily mean it's going to make a lot of money. So, machine translation never made a lot of money.
I mean, Google Translate gave it away for free. So the money was really just in building customized systems. And there are not really that many people who pay for customized systems. When people do pay for customized systems, they do it because they're going to save money somewhere else.
So even if it is useful, and machine translation definitely has turned out to be useful, it didn't really make anybody super rich.
I hear you. And, honestly, I hope that that stays the case because a lot of the concern, at least from my perspective and some people who are in the tech space, is that there's too much consolidation of, I guess, power with artificial intelligence. You have like the top four or five companies that really control all of the stakes.
I hope that we have a lot more open-source opportunities, a lot more democratization. I do feel like the internet was super democratic – it allowed somebody who's in their basement to start up a business and make a ton of money from it. I don't know if that's the same thing with artificial intelligence, but I'd be interested in your perspective about that.
I'm fairly optimistic about it. If I decided now that my research group should build a big language model like GPT-4, we could do it in terms of technology and even data resources. The only reason why we can't is the computational cost.
Yes, there are the big companies who can spend tens of millions of dollars (or whatever they spend) on building these and we can't. But in terms of the technology, we know what they're doing. It's not a mystery. It's fairly well published.
And it's actually interesting when you hear their war stories and their reports. They sometimes start out with, “Oh, we should train it a little bit differently. We should try this different and that different.” And they start training and do runs that cost a million dollars a day. And then something doesn't work. Then they throw out all the new ideas and stick to what works.
So there’s not that much technological innovation in like the latest models that come out. It’s just kind of more data, bigger models, just learning from your mistakes.
Have you heard the idea that computation costs will come down, that the power costs will come down? Is that accurate? I've heard people saying that right now it is very much an oligarchy, but over time it should become a little bit easier for smaller companies to get into this.
Yeah, in the midterm, yes.
In the short term, there's a monopoly called NVIDIA that sells graphics cards to computer gamers for $500 and charges AI companies $20,000, because only those cards can actually fit into a server. So there's a lot of profit-taking by certain people right now. That's one reason why it's expensive.
And the other question is: if you can train a bigger model, then it's better. And then you can train an even bigger model. Then it really is just an arms race of who can spend more money on it. Even if it gets cheaper, they're just going to build models twice as big, and then you still have the same cost.
But yeah, in the medium term, I think there's going to be competition in that space for the hardware. There has been continuous progress in hardware. I mean, the cost per computation has been going down forever, and it's still going down. It's just getting outweighed by the amount of computation.
Yeah. It’s the amount of compute. Even though the costs are going down, the amount of compute is still increasing.
And so two things could happen that would really bring down costs dramatically. One is that the whole bigger-and-bigger race stops at some point due to diminishing returns.
Right.
The other is that people find more efficient ways of training these models. It's extremely wasteful how they're currently being trained. There ought to be better ways, and maybe people will find out what those better ways are, and that's also going to bring down the cost.
So, yeah, at the moment that's the situation, but that might look different in like three to five years.
Yeah. Let's see. We live in interesting times. So I'm really excited to see what comes down the pipeline. But listen, thank you so much for being with us.
Sources of Inspiration and Final Thoughts about the Future
I'm going to ask you the three questions that I ask all of my guests, which are kind of general questions to see what people who are in the industry, who are building the future, are actually thinking about.
The first one is where do you gain your inspiration from? I kind of highlighted a little bit about how I gain a lot of inspiration from science fiction. I'm always thinking about how do we make that utopian vision that some of our best science fiction writers have thought about. What about yourself?
What drew you to this in the first place and what inspires you daily?
So, what drew me in the first place: I wanted to do machine learning. And then I just started doing machine learning and realized that it's kind of pointless to do machine learning without having a problem. Text and language processing was a nice area to do machine learning in, because there is a lot of data.
What also motivates me a lot is that we are actually building something that's useful. I mean, we're having this conversation. Everybody has an opinion about it and it's easy to communicate. And, like I said, the experience – you travel abroad and someone pulls out a cell phone, shows you Google Translate, and you feel like, “Yeah, I worked on that kind of stuff.” It's there, it's real. That it actually has an impact on the world is very, very encouraging.
Yeah. I definitely gain some inspiration from the impact I'm delivering on a daily basis. It really drives you through those hard days, when the work just feels mundane or lacks the same kind of interest that you might have had in the beginning. So I really appreciate that.
We've talked a little bit about this technology. Where do you see us in 10 years? Do you think that this is going to be even more baked into the culture? Do you think it's going to be something where I just put an earpiece in and have a natural conversation with someone else?
So I'm not going to promise that in five years we'll solve all problems. You're giving me 10 years now, and I'm still not going to say we're going to solve all problems. But yeah, basically, being able to do this kind of podcast while the two of us speak two different languages: that is so close on the horizon. That should happen at least in the timeframe of 10 years, since you gave me 10 years.
I hope so, I hope you are right.
I would expect it earlier than that. But there's a lot involved in making that happen, even picking up on the nuance of language and the emotion that needs to be translated too. And that's definitely where we're struggling much more at the moment.
Cool.
We’ve already gotten a long way. You kind of assume when you see a foreign text, you can just press a button and you're going to get it in your language.
Or a foreign website, you just press translate. Like you go to a restaurant in Denmark and you want to make sure that you understand the menu. It's so natural these days.
I almost feel a little entitled to it now. I was looking at different vacation destinations and I was like, “Well, am I going to be able to communicate with them the same way I can when I go to France or Germany?” So it's very interesting to see that.
Last question. We have this increasing exponential rate of change in all sorts of different technologies. In your space, in computer science and machine learning, but also in biotechnology and longevity and the space race. All this sort of stuff that's happening.
Aside from your own field, what is it you just can't get enough of? Like, you look at the news articles and you're like, “Oh my God.”
For me, personally, it's robots. I can't wait until the day when I have a humanoid robot that can wash my dishes and fold my laundry. So I'm always ready to go whenever I see something that's very close to that. And honestly, machine translation and human translation were things I was really interested in apart from medicine or biotechnology.
So what about yourself? You're in the computer science space, aside from that, what can't you get enough of?
So I don't really have a ready answer for that. I mean, yeah, robotics and automation.
Ultimately, the dream behind all that is also that, if you automate everything, then we have to work less.
Wow, that's, that's interesting to hear.
So it's really more a political question than a technological question. With all the progress we have and all these things that can be automated, it really should get us to a point where… not just that we're richer and can buy more things, or worse, that only some people get super rich… but to the point where you only work half as much because the robots do enough for you.
I think that it's going to definitely allow for us to come home and not have to worry about chores, right? If we could just remove chores and focus on things that we enjoy doing, I think that would be a really interesting time to live in. But I hope that we get to the point where you're saying, where we can actually just work less in general.
I think about this idea of a 40-hour work week… 10 years ago I don't think we would have been debating its merits, but now people are actually debating the merits of the five-day work week. Now we're thinking about a four-day work week, a 32-hour work week.
Hopefully you're right. I really look forward to that.
Yeah, but ultimately that's a very kind of political question.
Yeah, very, very interesting. So thank you so much for joining us today, Philipp.
Thanks so much to all of the different listeners who are coming in from all over the world. As always, you can really help us by liking and subscribing. And for those of you who are listening on a regular basis, we will see you again in the future.
Thanks everybody!

Philipp Koehn
Professor, Johns Hopkins University | Author
Philipp Koehn has been working on machine translation for the last quarter century and has published over 200 research papers and two textbooks on the topic. He received his PhD from the University of Southern California and was a professor at the University of Edinburgh before joining Johns Hopkins University in 2014. He currently works mainly on machine translation, speech translation, and multilingual aspects of large language models. He has received funding from the US and EU governments, Amazon, Google, Meta, Bloomberg, and others, and has worked closely with industry (Meta, Omniscien Technology, Systran, etc.) to deploy the technology.