Transfer Learning: from Bayesian Adaptation to Teacher-Student Modeling
School of ECE, Georgia Tech, USA
Transfer learning refers to the process of distilling knowledge learned in one task and using it in another, related task. In machine learning, transfer learning and domain adaptation are often treated as synonymous, and both are designed to combat catastrophic forgetting, that is, losing much of what had already been learned during the transfer process. When generative models, such as probability distributions, are used to characterize observed data with a set of parameters to be transferred, a Bayesian formulation is often adopted: knowledge summarized in prior distributions of the parameters is combined with the likelihood of newly observed adaptation data to establish a posterior distribution of the parameters to be optimized. Recently we extended Bayesian adaptation to discriminative models, such as deep neural networks, and obtained similar effectiveness. Another emerging approach, known as teacher-student (T-S) modeling, summarizes what has been learned in a teacher model and transfers it to a student model with a similar or different architecture. An objective function characterizing the discrepancies between the behaviors of the teacher and student models is then optimized for the student model on a set of adaptation data. Generative adversarial networks have also been used to perform adaptation data augmentation. Such a T-S learning framework accommodates a wide variety of scenarios and applications. In this talk, we will present the technical dimensions of transfer learning and highlight its potential opportunities.
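The two ideas in the abstract can be illustrated with a minimal sketch. It is not the speaker's method. It assumes the simplest textbook cases: maximum a posteriori (MAP) adaptation of a single Gaussian mean with a conjugate prior, and a teacher-student discrepancy measured as the KL divergence between softened output distributions. The function names, the prior weight `tau`, and the `temperature` parameter are illustrative assumptions that do not appear in the abstract.

```python
import math


def map_adapt_mean(prior_mean, tau, samples):
    """MAP estimate of a Gaussian mean under a conjugate Gaussian prior.

    `tau` (hypothetical name) is the weight given to the prior mean,
    expressed as an equivalent number of observations. With tau = 0 the
    result is the plain sample mean; a large tau keeps the estimate
    close to the prior.
    """
    n = len(samples)
    return (tau * prior_mean + sum(samples)) / (tau + n)


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, optionally softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ts_kl_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) between output distributions.

    This is one common choice of discrepancy between teacher and
    student behavior; the abstract does not name a specific objective.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    # Prior mean 0.0 weighted as two observations, adapted on two new samples.
    print(map_adapt_mean(0.0, 2.0, [1.0, 1.0]))
    # The discrepancy shrinks as the student logits approach the teacher's.
    print(ts_kl_loss([2.0, 0.0, -1.0], [0.0, 0.0, 0.0]))
    print(ts_kl_loss([2.0, 0.0, -1.0], [1.5, 0.0, -0.5]))
```

In this sketch the prior weight controls how strongly the adapted estimate resists the new data, which is one simple way a Bayesian formulation limits forgetting. In the T-S case the student would be trained to minimize the loss over a set of adaptation data.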
Chin-Hui Lee is a professor at the School of Electrical and Computer Engineering, Georgia Institute of Technology. Before joining academia in 2001, he had accumulated 20 years of industrial experience, ending at Bell Laboratories, Murray Hill, as a Distinguished Member of Technical Staff and Director of the Dialogue Systems Research Department. Dr. Lee is a Fellow of the IEEE and a Fellow of ISCA. He has published over 500 papers and 30 patents, with more than 45,000 citations and an h-index of 80 on Google Scholar. He has received numerous awards, including the Bell Labs President's Gold Award in 1998. He won the SPS's 2006 Technical Achievement Award for “Exceptional Contributions to the Field of Automatic Speech Recognition”. In 2012 he gave an ICASSP plenary talk on the future of automatic speech recognition. In the same year he was awarded the ISCA Medal in scientific achievement for “pioneering and seminal contributions to the principles and practice of automatic speech and speaker recognition”.
Digital Retina – Improvement of Cloud Artificial Vision System from Enlighten of HVS Evolution
Department of Computer Science and Technology, Peking University, China
Edge computing has been a hot topic recently, and the smart city wave seems to be driving more and more video devices in cloud vision systems to be upgraded from traditional video cameras to edge video devices. However, there is debate over how much intelligence the device should have and how much the cloud should keep. The human visual system (HVS) took millions of years to reach its present highly evolved state; it may not be perfect yet, but it is much better than any existing computer vision system. Most artificial visual systems consist of a camera and a computer, like the eye and brain in humans, but with a very low-level pathway between the two parts compared with that of human beings. The human pathway between eye and brain is quite complex, yet energy efficient and comprehensively accurate, having evolved by natural selection. In this talk, I will discuss a new idea, called the digital retina, for improving cloud vision systems with an HVS-like pathway model, making them more efficient and smarter. The digital retina has three key features, and the details will be given in the talk.
Wen Gao is now a Boya Chair Professor at Peking University. He has also served as the president of the China Computer Federation (CCF) since 2016. He received his Ph.D. degree in electronics engineering from the University of Tokyo in 1991. He was with Harbin Institute of Technology from 1991 to 1995 and with the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), from 1996 to 2005. He has been with Peking University since 2006. Prof. Gao works in the areas of multimedia and computer vision, on topics including video coding, video analysis, multimedia retrieval, face recognition, multimodal interfaces, and virtual reality. His most cited contributions are model-based video coding and feature-based object representation. He has published seven books, over 280 papers in refereed journals, and over 700 papers in selected international conferences. He is a Fellow of the IEEE, a Fellow of the ACM, and a member of the Chinese Academy of Engineering.
Applying Deep Learning in Non-native Spoken English Assessment
Automatic Language Teaching and Assessment Institute (ALTA), Cambridge University, UK
Over 1.5 billion people worldwide are using and learning English as an additional language. This has created a high and growing demand for certification of learners' proficiency, for example for entry to university or for jobs. Automatic assessment systems can help meet this need by reducing human assessment effort. They can also enable learners to monitor their progress with informal assessment whenever and wherever they choose. Traditionally, automatic speech assessment systems were based on read speech, so what the candidate said was (mostly) known. To properly assess a candidate's spoken communication ability, however, the candidate needs to be assessed on free, spontaneous speech. The text is, of course, unknown in such speech, and we don't speak in fluent sentences: we hesitate, stop, and restart. In addition, any automatic system has to handle a wide variety of accents and pronunciations from learners across first languages, as well as highly variable audio recording quality. Together, these factors make non-native spoken English assessment a challenging problem. To help meet the challenge, deep learning has been applied to a number of sub-tasks. This talk will look at some examples of how deep learning is helping to create automatic systems capable of assessing free spoken English. These will include: 1) efficient ASR systems, and ensemble combination, for non-native English; 2) prompt-response relevance for off-topic response detection; 3) task-specific phone “distance” features for assessment and L1 detection; 4) grammatical error detection and correction for learner English. Deep learning techniques used in the above include recurrent sequence models, sequence ensemble distillation (teacher-student training), attention mechanisms, and Siamese networks.
Dr. Kate Knill is a Principal Research Associate at the Department of Engineering and the Automatic Language Teaching and Assessment Institute (ALTA), Cambridge University. Kate was sponsored by Marconi Underwater Systems Ltd for her first-class B.Eng. (Jt. Hons) degree in Electronic Engineering and Maths at Nottingham University and holds a PhD in Digital Signal Processing from Imperial College. She has worked for 25 years on spoken language processing, developing automatic speech recognition and text-to-speech synthesis systems in industry and academia. As an individual researcher and as a leader of multi-disciplinary teams, as Languages Manager at Nuance Communications and Assistant Managing Director of Toshiba Research Europe Ltd, Cambridge Research Lab, she has developed speech systems for over 50 languages and dialects. Her current research focuses on applications for non-native spoken English language assessment and learning, and on the detection of speech and language disorders. She is Secretary of the International Speech Communication Association (ISCA) and a member of the Institution of Engineering and Technology (IET) and the Institute of Electrical and Electronics Engineers (IEEE).