Dialogue | Yu Dong: In many applications of artificial intelligence, speech recognition is an entry point

[This article is reprinted from the "Machine Heart" report of October 31.]

Dr. Yu Dong joined Microsoft in 1998 and is currently a principal researcher at Microsoft Research and an adjunct professor at Zhejiang University. A senior expert in speech recognition and deep learning, he has published two monographs and more than 160 papers, is a named inventor on more than 60 patents, and is one of the originators and authors of Microsoft's CNTK toolkit. He received the IEEE Signal Processing Society Best Paper Award in 2013. He currently serves on the IEEE Speech and Language Processing Technical Committee and has served on the editorial boards of IEEE/ACM Transactions on Audio, Speech, and Language Processing, IEEE Signal Processing Magazine, and other journals.

Reporter: Please talk about the most important open directions in speech recognition.

Yu Dong: In quiet settings with a close-talking microphone, the recognition rate has already crossed the threshold of practical usefulness (see the report on Microsoft's conversational speech recognition reaching the level of human professionals). In some other scenarios, though, the results are not yet good, and those are the open problems in our field. The main directions now are as follows. First, can we further improve recognition in the far field, especially when other speakers interfere? At present, the far-field error rate is about twice the near-field error rate, and far-field recognition cannot be solved by the back-end model alone. Research now focuses on improving the whole system by combining multi-channel signal processing (such as microphone arrays) with back-end processing, optimizing the recognition pipeline end to end, starting from the signal source. In addition, we are studying better recognition algorithms. "Better" here has several aspects. One aspect is simplicity.
The model training process today is fairly complex and involves many steps. Without open-source software and recipes such as HTK and Kaldi, many teams would have to spend a long time building even a decent system, even though DNNs have significantly lowered the barrier to entry. With open-source software and recipes, including deep learning tools such as Microsoft's CNTK (now renamed Microsoft Cognitive Toolkit), things have become easier, but there is room for further simplification. A lot of work is under way in this area, including how to train without alignments or without a dictionary. Current research is mainly based on end-to-end methods, which remove some of the intermediate manual steps and preprocessing that earlier pipelines required. End-to-end results do not yet surpass traditional hybrid systems, but they come close. On the other hand, in recent years we have moved from the simple DNNs used at first to the later ...