Voices in AI – Episode 46: A Conversation with Peter Cahill

Author: Byron Reese / Source: Gigaom

Today’s leading minds talk AI with host Byron Reese

In this episode, Byron and Peter discuss AI use in consumer and retail businesses.

Byron Reese: This is Voices in AI, brought to you by GigaOm. I’m Byron Reese, and today, our guest is Peter Cahill.

He is the CEO of Voysis. He holds an undergraduate degree in computer science from the Dublin Institute of Technology and a PhD in computer science, in the field of text-to-speech, from University College Dublin. Welcome to the show, Peter.

Peter Cahill: Thanks. Looking forward to it.

Well, I always like to start with the question, what is artificial intelligence?

It’s a tough question. I think, as time passes, it’s getting increasingly difficult to define it. I think some years ago, people would use ‘artificial intelligence’ and ‘pattern matching’ as if they meant essentially the same thing. In more recent years, as technologies have progressed, data sets are many times bigger, computing power is obviously a whole lot better, and the technologies themselves keep developing, so I think these days it can be really hard to draw that line. Some time ago, maybe a year ago, I think, I was chairing a panel on speech synthesis. One of the questions I had for the panelists in general was, in theory, could computers ever speak in a more human way, or in a way better than humans? We’ve seen many of these.

Over time, we’ve seen that computers can do computer vision better than people.

Computers can do speech recognition better than people, and it’s always in a certain context and on a certain data set. But still, we’re starting to see computers outperforming people in various cases. I asked this question to the panel, could computers speak better than people? I think one of the panelists, as far as I recall, said that he believed they could, and what would be realized would be that, if a computer could not just sound perfectly human but also could be more convincing than your average person would be, then the computer would speak better than the person. I think on the back of that, to ask what’s the artificial part of artificial intelligence, it does seem that, as time passes and these technologies continue to progress, that really having a good definition on that just becomes increasingly difficult. I’m afraid I don’t have a good definition for you for it. But I think eventually people will just start referring to it as intelligence.

You know, it’s interesting. When Turing put out the Turing test, he was trying to answer the question, “Can a machine think?” Everybody knows what the Turing test is, can you tell whether you’re talking to a person or a computer? He said something interesting. He said that “if the computer can ever get you to pick it 30% or 40% of the time, you have to say it’s thinking.” You have to ask why wasn’t that 50-50? Of course, the question he was asking is not whether or not a computer could think better than a person, but whether they can think at all. But the interesting question is what you just touched on, which is if the computer ever gets picked 51% of the time, then the conclusion is what you just alluded to. It’s better at seeming human than we are. So, do you think in the context of artificial intelligence – and I don’t want to belabor it. But do you think it’s artificial like artificial turf isn’t grass? Is it really intelligent, or is it able to fake it so well that it seems intelligent? Or, do you find anything meaningful in that distinction?

I think there is a chance that, as our understanding of how the human brain works develops, and as what people currently call artificial intelligence develops alongside it, eventually there may be some overlap. I think I, and a lot of others, don’t really like the term “artificial neural networks,” or neural networks, because they’re quite different to the human brain, even though they may be inspired by how the human brain works. But I wouldn’t be surprised if eventually we ended up at a point of understanding how the human brain works, to the extent that it no longer seems as magically intelligent as it does to us today. I think probably what we will see happening is, as machines get better and better at artificial intelligence, that if something seems too natural or too good, then people would assume maybe that it came from a machine and not a person. Probably a really good example is video games today: we have this artificial intelligence in video games, which is really not intelligent at all. For example, if you take a random first-person shooter type of game, where the artificial intelligence is trying to seem very – they make lots of mistakes, they move very slowly. If you really tried to power a modern video game with really state of the art artificial intelligence, the human player wouldn’t stand a chance, just because the AI would be so accurate and so much faster and so much more strategic in what it was doing. I think we’ll see stuff like that across the spectrum of AI, where machines can be really, really good at what they’re doing and, as time passes, they’ll just continuously get better, whereas people are always starting from scratch.

So, working up the chain from the brain – which you said we may get to a point where we understand it well enough that our intelligence looks like artificial intelligence, if I’m understanding you correctly. There’s a notion above it, which is the mind, and then consciousness. But just talking about the mind for a minute, the mind, there’s all this stuff your brain can do that doesn’t seem like something an organ should be able to do. You have a sense of humor, but your liver does not have a sense of humor. Where does that come from? What do you think? Where do you think these amazing abilities of the brain – and I’m not even talking about consciousness. I’m just talking about things we can do. Where do you think they come from, and do you have even a gut instinct? Are they emergent? What are they?

Yes, obviously it would just be a guess, really. But I would think that, if we end up with AIs that are as complex or even more complex and more capable than the human brain, then we’re going to probably see various artifacts on the side of that, which may resemble these types of things you’re talking about right now. I think maybe to some extent, right now, people draw this distinction between AI and intelligence, because the human brain still has so many unknowns about it. It appears to be almost magic in that way, whereas AI is very well-understood, exactly what it’s doing and why. Even if, say, models are too big to really be able to understand exactly why they’re making certain decisions, the algorithms of them are very well understood.

Let me ask a different question. You know, a lot of people I have on the show – there’s a lot of disagreement about how soon we’re going to get a general intelligence. So, let me just ask a really straightforward question: some people think we’re going to get a general intelligence soon, in 5, 10 or 15 years. Some people think an AGI is as far out as 500 years. Do you have an opinion on that?

Yes, I think as soon as we can put a time on it, it’ll happen incredibly quickly. Right now, today’s technologies are not sufficient to be generally intelligent. But what we’ve seen in AI in general in recent years is that pretty much every company out there is now trying to develop its AI strategy, building out AI teams or working with other companies that work in AI. I think the number of people working in AI as a field has increased dramatically, and that will cause progress to happen far quicker than it would otherwise have happened.

So, let me ask a different variant of the question, which is, do you think we’re on an evolutionary path to build… is the technology evolving where it gets a little better, a little better, a little better, and then one day it’s an AGI? Or is it, like the guest I had on the show yesterday said, “No, what we’re doing today isn’t really anything like an AGI. That’s a whole different piece of technology. We haven’t even started working on that yet”?

Yes, I’d say that’s correct. The leap is not going to be an iteration of what we currently have. But it may just be a very small piece of technology that we don’t currently have which, when combined with everything that we do currently have, makes it possible.

Let’s talk about that. People who think that we’re going to get an AGI relatively soon often think that there is a master algorithm, that there is a generalized unsupervised learner we can build. We can just point it at the internet and it’s going to know all there is to know. Then other people say, “No, intelligence is a kludge. Our brains are only intelligent because we do a thousand different things and they’re all cognitive biases. All this messy spaghetti code is all we really are.” Do you have an opinion on that?

I think currently there’s no algorithms out there that even suggest it could be generally intelligent. I think as it is, even if there was one minor breakthrough in that space, it would have a very dramatic knock-on effect in the world. Then people would start believing it was only a number of years away. As it is right now, if it happened in 5 years, I honestly would not be surprised. If it happened in 15, I wouldn’t be surprised, or if it happened in 50. Right now, we’re at least one major breakthrough away from that happening. But that could happen at any point.

Could it never happen?

In theory, yes. But in practice, I would guess that it will.

One argument says that it may be, just like you’re suggesting, one straightforward breakthrough away. It says that the human genome, which is the formula for building a general intelligence – and it does a whole lot of other stuff – is, say, 700MB. But the part that is different from, say, a chimp’s is just one percent of that, 7MB-ish. The logical leap is that there might just be a small little thing that’s a small amount of code, because even in that 7MB, a bunch of it’s not expressing proteins and all of that. It might just be something really simple. But do you think that that is anything more than an analogy? Is that actually a proof point?

I would expect it to be something along that line. Even today, I think you could take the vast majority of deep learning algorithms and you could represent them all in less than a MB of data. Many of these algorithms are fairly straightforward formulas when they’re implemented in the right way. They do what we currently call deep learning or whatever. I don’t think we’re that many major leaps away from having an artificial general intelligence. Right now, we’re just missing the first step on that path, and once something does emerge, there are going to be thousands, tens of thousands of people all around the globe who will start working on it immediately, so we’ll see a very quick rate of progress as a result, in addition to it just learning by itself, anyway.
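
As a rough check on the genome figures quoted in the question above, the arithmetic can be sketched in a few lines of Python. The base-pair count (about 3.1 billion) and the two-bits-per-base encoding are assumptions introduced here for illustration, not figures from the conversation. The result only suggests that "700MB" and "7MB-ish" are the right order of magnitude.

```python
# Back-of-the-envelope check of the genome sizes mentioned above.
# Assumptions (not from the interview): ~3.1 billion base pairs,
# each base (A, C, G, T) stored in 2 bits, 1 MB = 1,000,000 bytes.

BASE_PAIRS = 3_100_000_000
BITS_PER_BASE = 2
BYTES_PER_MB = 1_000_000


def genome_size_mb(base_pairs: int = BASE_PAIRS, bits_per_base: int = BITS_PER_BASE) -> float:
    """Return the uncompressed size of the genome in megabytes."""
    total_bytes = base_pairs * bits_per_base / 8
    return total_bytes / BYTES_PER_MB


def differing_portion_mb(total_mb: float, fraction: float = 0.01) -> float:
    """Return the size of the portion that differs, given a fraction (1% here)."""
    return total_mb * fraction


if __name__ == "__main__":
    total = genome_size_mb()
    print(f"Whole genome: about {total:.0f} MB")
    print(f"1% of that: about {differing_portion_mb(total):.2f} MB")
```

Under these assumptions the whole genome comes to roughly 775 MB and one percent of it to under 8 MB, which is consistent with the rounded numbers used in the question.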

Okay, so just a couple more questions along these lines and then we’ll get back to the here and now. There’s a group of people – and you know all the names – high profile individuals who say that such a thing is a scary prospect, an existential threat, summoning the demon, the last invention. You know all of it. Then you get the other people: Andrew Ng, who says it’s like worrying about overpopulation on Mars, and Zuckerberg, who says flat out it’s not a threat. Two questions. One, where are you on the fear spectrum, and two, why do you think these people – all very intelligent people – have such wildly different opinions about whether this is a good or bad thing?

I think eventually it will get to a point where it has to become – or at least certain applications of it will have to become – a threat or dangerous in some way. There’s nothing like that on the horizon; that’s really, again, on the path of general intelligence, which nobody has right now. I think eventually it will go that way, as many technologies do. No one really knows how to manage it or handle it. There have been calls by some people to regulate AI in some way, but realistically, AI is a technology. It’s not an industry, and it’s not a product. You can regulate an industry, but it’s very hard to regulate a technology, especially when it’s outside of your own country’s borders. Other countries don’t need to regulate it, and so there’s a very good chance, if it’s going to be developed, it’s probably going to be developed by many countries, not just one, especially within a few years of each other. Even if everybody unanimously agreed that in 100 years’ time it was going to become a threat, I’m not too sure that it could be stopped even now, because there are so many people working on it across different countries all over the world. There’s no regulation in any single country that could stop it. Even right now, regulation isn’t required. The technologies don’t even exist to do it, to begin with.

Let’s talk about you for a minute. Can you bring us up to date? How did Voysis come about? How did you decide to enter into this field? Why did you specialize in text-to-speech? Can you just talk a little bit about your journey?

Sure. I started working in text-to-speech in 2002, so 15 years ago. I think at the time, what really attracted me to it was that it was a very difficult problem. Many people had worked on it for decades, and especially back then, computer voices sounded incredibly robotic. When I looked into it in more detail, what made it even more interesting is that many machine learning problems tend to be classification problems, where you input a large amount of data and then output a small amount of data. For example, if you’re doing image classification today, the size of the data you have in the images is far greater than the final result you get out of the model, which may just tell you this is a picture of a car or something like this. You input huge amounts of data and output something that’s very small.

Text-to-speech is the extreme opposite of that, where the amount of input is just a few characters. From that, the system has to generate this human-sounding waveform. In the case of the human-sounding waveform, if even a small amount of that data is slightly off, the human ear will notice it very, very easily, because we’re completely used to listening to human voices, and we’re not used to listening to distorted signals generated by machines. I guess it’s the opposite of the traditional machine learning problem, in that the system has to be kind of creative: given a very small amount of data, it needs to create a whole lot more. That’s kind of where I started off originally, working on my PhD. After it, I became faculty at the university I was in, and remained faculty for several years.

Then eventually, I resigned as faculty to start Voysis. I had always said I’d like to open a company at some point. I think at that time in particular, we saw the likes of Google, Apple, Microsoft and so on all go on an acquisition spree, and they acquired many of the smaller companies that had this technology, regardless of what country they were from. I think the knock-on effect of that was that there were pretty much no independent providers anymore. Even then, what those companies were going to use these platforms for was very consumer-facing applications like we have today, with Google Home and Amazon Echo. But for other businesses out there who want to have a voice interface in their products, where their users can speak directly to their product and interact with it, pretty much all the companies who could have provided that were acquired by these big platform companies.

That’s really what motivated me to start Voysis. Since then, we’ve built out Voysis as a complete voice AI platform. Normally when we say that, what we mean is that all of the technologies to power these systems – the speech recognition, the text-to-speech, the natural language understanding, the dialogue management and so on – were built in-house, here in Voysis. What we do is partner with select companies that we feel are ready for voice, and whose consumers will benefit greatly from having a voice interface. When we build out products, we tend to do a lot of user studies on how consumers want to interact with these devices, and we build out the whole user experience to deliver really high-quality voice interactions, integrated directly in third-party business products.

Looking at your website, I noticed you have linguists, you have a wide range of specialists in your company, and then watching your demo stuff, it just seems to me that what you’re trying to do, or what the field breaks down into, is four things. I think you just ran through them. One of them is emulating human speech. One of them is simply recognizing the words that I’m saying. The third one is understanding those words, and then the fourth one is managing the dialogue, of what pronouns are standing for what thing and all of that. Did I miss any of it?

No, I’d say that’s it in a nutshell, although in practice, we don’t really draw a line between recognizing words and understanding. In the case of the Voysis platform, what we do is audio would go in, and after it’s passed through several models, the understanding components come out. We never transcribe it into text first, because it’s an approach that I think many companies are moving away from. If you transcribe it into text first, you tend to accumulate error from speech recognition. When you try to understand it, there’s errors in the transcription and you can never really recover from it.
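
One simplified way to see why errors accumulate in a transcribe-then-understand pipeline: if each word is transcribed correctly with some probability, and errors are treated as independent, the chance that an entire utterance reaches the understanding stage intact shrinks with every added word. The sketch below assumes independence and a hypothetical 95% per-word accuracy. Neither comes from the interview, and it is not a description of how Voysis or any other system measures accuracy.

```python
# Illustrative model of error accumulation in a cascaded pipeline
# (speech -> text -> understanding). Assumption: each word is
# transcribed correctly with the same probability, independently.


def clean_transcript_probability(word_accuracy: float, num_words: int) -> float:
    """Probability that every word in an utterance is transcribed correctly."""
    if not 0.0 <= word_accuracy <= 1.0:
        raise ValueError("word_accuracy must be between 0 and 1")
    if num_words < 0:
        raise ValueError("num_words must be non-negative")
    return word_accuracy ** num_words


if __name__ == "__main__":
    # Hypothetical per-word accuracy of 95%.
    for n in (1, 5, 10, 20):
        p = clean_transcript_probability(0.95, n)
        print(f"{n:2d} words: {p:.1%} chance of an error-free transcript")
```

Under these assumptions, a ten-word utterance arrives without a transcription error only about 60% of the time, and a downstream understanding step has no way to recover what was lost.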

Got you. But just as underlying technology, I would love to just look at each one of them in isolation. Let’s do that second one first, which is just understanding what I am saying. I call my airline of choice, and I say my frequent flier number, which unfortunately has an A, an H, and an 8 in it.

Yes.

AAHH88 – you know, that’s not it, and it never gets it. I shouldn’t say that, but if everything’s really quiet, it eventually gets it. Why is it so bad?

There are probably multiple things at play there. If you’re talking to them over a phone line, phone signals are generally quite distorted and it makes it much more difficult for speech recognition to work well. But there’s also a very good chance that the speech recognition engine they’re using behind that was a general speech recognition engine built for any random use case, as opposed to one that was designed to work on telephone calls, maybe even with some knowledge of the use cases around where it was going to be used.

Because it only needs to recognize 36 things, right? 26 letters and ten numbers.

Sure, but that speech recognition engine may not have been built to recognize some things, which is probably why it struggles with it. Historically, most companies – not Voysis, but many others – tend to build a single speech recognition engine that they try to use in many different situations, and that’s generally where accuracy tends to really suffer. Because if you don’t build a system with any context on exactly how it’s going to be used, it’s a much more difficult task to do 100 things well than it is to do one. That’s essentially the Achilles’ heel of it.

I guess also, unlike dialogue, it doesn’t get any clues about what the next letter or number should be from anything prior to it, right?

There is that, but I think in that case, if you’re just listing letters and numbers, there’s not that many of them. That should work quite well, I think.

In the sentence, “The cat ran up the…,” there’s a finite number of things the cat can run up. What I don’t get, as an aside, is I call from the same number every time. You would think they would have mastered caller ID by now. Let’s talk a little bit about understanding. Any time I come across a Turing test, like a chat bot, I always ask the same question, which is, “What’s larger: a nickel or the sun?” I haven’t found any system that can answer that question. Why is that?

Generally, the modern technologies that are used for chat bots are, I think, still relatively immature in comparison to the technologies behind speech recognition and text-to-speech and so on. Chat bots really only work well when they’re custom-designed and custom-built for a particular use case. If you ask them general questions like that, the questions won’t align closely to what they were trained on or built on. As a result of that, you’ll get random answers, essentially,…
