Our Q&A with SoundHound AI’s CEO Keyvan Mohajer
SoundHound has morphed from a song-recognition app to a major player in voice AI, and the market loves it.
SoundHound, perhaps best known for the tune-recognition app that went live in 2010, has been around for a while.
But since it went public via a special-purpose acquisition company, or SPAC, in April 2022, its share price has exploded, more than tripling over the past year alone, making the company a favorite of retail traders.
We sat down with the company’s CEO, Keyvan Mohajer, who cofounded the firm in his Stanford University dorm, to drill down on the sentiment surrounding the stock as well as the tangible prospects for the company, which is still losing money.
This conversation has been edited for clarity and length.
Sherwood: How’d you get interested in the tech implications of voice recognition?
Mohajer: When I started my Ph.D. I was looking for the next big change that I could work on. I relied on science fiction. I said, "What do they have in science fiction that we don't have?" In particular, on “Star Trek” or in “Star Wars,” what do they have?
The obvious ones were, like, spaceships and teleportation and so on. But the one that was less obvious was voice AI. In these shows they talked to robots and computers.
And in 2000 we only had some really bad dictation software just for transcribing. And it didn't work.
I decided that voice AI was going to happen for sure. And it was going to happen in my lifetime, and I wanted to be a part of that transformation. I chose my Ph.D. thesis to be in voice and machine learning. I learned everything there was to learn. Then with my cofounders we started SoundHound.
Sherwood: The first time people likely encountered the company was through the SoundHound app. How did the company evolve?
Mohajer: I started the company to build a voice AI company. I went to investors and I told them that. I said, in 20 years we will talk to computers. And I want to build a company around that. And the feedback, at that time, from investors was that 20 years was too long. Maybe today investors have learned that they can invest in longer-term visions, but back then they were investing in, like, one- to three-year visions.
So I took that feedback, went back to the dorm room, and I talked to my cofounder and asked, “What can we do in three years or in one year?” And we came up with this idea of a humming search. The reason was, first of all, it was voice and it was search and it had AI elements. And in those days if you could search something that Google couldn't search, it was a fundable idea.
So we built a demo around the humming search. We used that to raise funding. But from the moment we raised funding, I told my board, my investors, my cofounders, my team, that SoundHound's future is not based in music search. It's based on voice AI.
So we were going to build our own speech-recognition engine. We were going to build our own search engine around it. And I got their blessing to allocate part of the resources of the company to that. And that allocation became bigger and bigger. And then, fortunately, our music app was at the right time. It got 300 million downloads and it funded 10 years of R&D.
And then after 10 years, in 2015, we unveiled our voice AI platform. And it was so well received that we raised hundreds of millions of dollars based on the technology that didn't have any business traction yet. And then obviously after that we had customers and revenue and so on.
Sherwood: Can you flesh out the current business structure? As I understand it, there are three pillars to the company. First, licensing voice-recognition software to automakers and consumer-device makers. The second pillar focuses on customer-service applications historically around restaurants. But I’m not quite sure I understand the third pillar, which is sort of about more of a longer-term vision.
Mohajer: Yeah. So three pillars. The first pillar is devices. Automotive is a big part of it, but we think eventually voice is going to be the preferred way we interact with devices. It's the most natural way. The reason it's not there yet is historically voice wasn't good there. But now, because of AI and large language models, it's really good. And most devices can afford to include a very small, inexpensive microphone.
Pillar two is AI customer service. We started in restaurants. But when I talk to my team about it, I use an Amazon analogy. Restaurants, to us, are books to Amazon.
Amazon was a bookstore first. They only sold books. For years we knew them as the online bookstore. Now we don't even remember it and they sell everything from A to Z. And now we have expanded beyond restaurants.
Sherwood: And the third pillar? How would you unpack that one?
Mohajer: So, pillars one and two are kind of independent. But pillar three brings them together. We’re going to bring our pillar two customers and make them available to our pillar one users.
So while you’re driving the car, you can order food.
You’re already talking to your car. If you want to pick up some food, and there’s a drive-thru 10 minutes away, why do you actually have to go and arrive there and get in line and wait? Imagine you can just talk to your car, find a restaurant, and place an order.
We deliver value to the end user, that driver. And we deliver value to the restaurant because it’s a new lead for them. And there’s lead-generation revenue, which is a very proven business model, which they share with us, and we will share that revenue with the carmaker who has chosen to adopt us as a system. And that will create a voice-commerce ecosystem that will create a flywheel effect.
Sherwood: That’s interesting. Because it does seem obvious that the killer app for AI would be just to be able to tell some device, “Hey, can you make a lunch reservation for me and Keyvan on this date and find a restaurant near his office?” Or something like that. That seems huge and powerful and obvious.
Mohajer: Yes, that’s exactly the vision. And the vision is undeniably great, but there are a lot of pieces that need to be there: you know, the tech, and so on. But also you need scale. You need scale in devices and you need scale in services.
Sherwood: Dumb question, but by scale, you mean, like, you have to be in all the different car companies? You have to be in all the different restaurants? Is that what you mean?
Mohajer: Not all, but enough. There’s a chicken-and-egg problem. You need both the services and the eyeballs. Because we’ve been at this for so many years, now we’re in millions of cars and in tens of thousands of locations. You know, every Chipotle is on our system. And that’s all you need. Today we have enough scale to make pillar three a reality.
Sherwood: OK, so that’s a very attractive long-term vision. But turning to your financials, while revenue growth has been strong, you’re still posting losses. What are your prospects for breaking even anytime soon?
Mohajer: We constantly try to balance growth versus profitability. This is just an amazing opportunity for SoundHound to move faster, grow faster, capture market share, innovate. That’s the urge to invest more.
But we also want to be financially strong. We want to get to profitability faster, and we have learned to be financially very disciplined. The last thing we shared was we expect our revenue next year to be more than $150 million and I think we have said that we expect to become adjusted EBITDA positive at some point next year. Adjusted EBITDA is when you take out non-cash stock-based compensation and, you know, non-recurring expenses due to M&A and so on.
Sherwood: Turning to competitive threats for a minute, if access to computing power and being able to pay for some of these large language models is the key to the kingdom in AI (and that may not be true), why won't the Googles and the Apples of the world, the richest companies with all the money, just win?
Mohajer: I’ve got to use an analogy. When Apple launched the iPhone, it was all about operating systems. So Android and iOS, but there was a bunch of others, too, that you don't remember. It lasted for one or two years and dominated the news. But a couple of years after that, and for more than a decade, the news was all about apps and apps companies building user experiences on top of these operating systems, companies like WhatsApp and Instagram.
That analogy is really going to happen here. So right now it’s a battle of foundation models and companies building their own and beating different benchmarks. And that’s what you’re going to read in the news for the next one or two years.
The next wave of winners for the 10 years to come is going to be companies that build user experiences, magical user experiences, on top of these foundation models. And SoundHound is going to be one of those winners, in our opinion. But it won’t be that Google eats it all or something like that.
Sherwood: Well, this has been a great conversation. Thanks very much. Final question: Can you share one super hack that's changed your life in terms of productivity?
Mohajer: I never thought of myself as a morning person, but in this role I pushed myself to change. I wake up earlier and earlier. I like to start before the world starts. I like to start when the world is sleeping. Those are my golden hours. I get on top of things, I catch up, I get ahead, and I make my best decisions. That's the best change that I've made, and it has made me a lot more relaxed.
Sherwood: What time do you get up?
Mohajer: Typically it’s before 5 a.m. But there have been days when I start at 1 a.m. But those are rare.