Microsoft Tutorials and Lectures

(Open and Free to ICME 2011 attendees)

Technical Challenges of 3D Video Processing

Date/Time: July 11, morning 9:00 - 12:30

Room: JHC.02


In recent years, various three-dimensional video services have become available, and the demand for three-dimensional video processing is growing rapidly. Since 3DTV is considered the next-generation broadcasting service that can deliver realistic and immersive experiences by supporting user-friendly interactions, a number of advanced three-dimensional video technologies have been studied. In this tutorial lecture, after reviewing the current status of 3DTV research activities, we will cover several challenging issues of 3D video processing, such as camera calibration, image rectification, illumination compensation, and color correction. We will also discuss the MPEG activities for 3D video coding, including depth map estimation, prediction structures for multi-view video coding, multi-view video-plus-depth coding, and intermediate view synthesis at virtual viewpoints.


Yo-Sung Ho received the B.S. and M.S. degrees in electronic engineering from Seoul National University, Seoul, Korea, in 1981 and 1983, respectively, and the Ph.D. degree in electrical and computer engineering from the University of California, Santa Barbara, in 1990. He joined ETRI (Electronics and Telecommunications Research Institute), Daejon, Korea, in 1983. From 1990 to 1993, he was with Philips Laboratories, Briarcliff Manor, New York, where he was involved in the development of the Advanced Digital High-Definition Television (AD-HDTV) system. In 1993, he rejoined the technical staff of ETRI and was involved in the development of the Korean DBS Digital Television and High-Definition Television systems. Since September 1995, he has been with the Gwangju Institute of Science and Technology (GIST), where he is currently a Professor in the Information and Communications Department. Since August 2003, he has been Director of the Realistic Broadcasting Research Center (RBRC) at GIST in Korea.

He gave several tutorial lectures at various international conferences, including the IEEE Region Ten Conference (TenCon), the Pacific-Rim Conference on Multimedia (PCM), the 3DTV Conference, the IEEE International Conference on Image Processing (ICIP), and the IEEE International Conference on Multimedia & Expo (ICME) in 2010. He is presently serving as an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT). His research interests include Digital Image and Video Coding, Image Analysis and Image Restoration, Three-dimensional Image Modeling and Representation, Advanced Source Coding Techniques, Three-dimensional Television (3DTV) and Realistic Broadcasting Technologies.

Audio, Video, and Their Joint Processing with Applications to Teleconferencing

Date/Time: July 11, afternoon 13:30 - 18:00

Room: JHC.02


Audio/video processing is the building block for many multimedia systems and applications. In this tutorial, we aim to provide an overview of some recent advances in audio, video and their joint processing, in particular for teleconferencing applications. Key topics covered by this tutorial include sound source localization from compact microphone arrays, 3D spatial sound and multi-channel echo cancellation, various real-time video processing techniques for enhancing conferencing experiences, and a few explorations on the adoption of the soon-to-be-commodity depth sensors in conferencing applications. We hope the techniques presented in this tutorial can sharpen the tools multimedia researchers use, and help build better multimedia systems (not limited to conferencing) in the future.


Cha Zhang is currently a Researcher in the Communication and Collaboration Systems Group at Microsoft Research, Redmond. He received the B.S. and M.S. degrees from Tsinghua University, Beijing, China in 1998 and 2000, respectively, both in Electronic Engineering, and the Ph.D. degree in Electrical and Computer Engineering from Carnegie Mellon University, in 2004. His current research focuses on developing multimedia signal processing techniques for immersive teleconferencing. During his graduate studies at CMU, he worked on various multimedia related projects including sampling and compression of image-based rendering data, 3D model database retrieval and active learning for database annotation, peer-to-peer networking, etc. Dr. Zhang has published more than 40 technical papers and holds numerous U.S. patents. He won the best paper award at ICME 2007, the top 10% award at MMSP 2009, and the best student paper award at ICME 2010. He is the author of two books, Light Field Sampling and Boosting-Based Face Detection and Adaptation, published by Morgan and Claypool in 2006 and 2010, respectively.

Dr. Zhang has been actively involved in various professional activities. He was the Publicity Chair for the International Packet Video Workshop in 2002, the Program Co-Chair for the first Immersive Telecommunication Conference (IMMERSCOM) in 2007, the Steering Committee Co-Chair and Publicity Chair for IMMERSCOM 2009, the Program Co-Chair for the ACM Workshop on Media Data Integration (in conjunction with ACM Multimedia 2009), a Co-organizer of the International Workshop on Hot Topics in 3D in conjunction with ICME 2010, and the Poster & Demo Chair for ICME 2011. He has served as a Technical Program Committee member and Review Committee member for many conferences such as ACM Multimedia, CVPR, ICCV, ECCV, MMSP, ICME, ICPR, and ICWL. He currently serves as an Associate Editor for the Journal of Distance Education Technologies, IPSJ Transactions on Computer Vision and Applications, and ICST Transactions on Immersive Telecommunications.


Zhengyou Zhang received the B.S. degree in electronic engineering from the University of Zhejiang, Hangzhou, China, in 1985, the M.S. degree in computer science from the University of Nancy, Nancy, France, in 1987, and the Ph.D. degree in computer science and the Doctorate of Science from the University of Paris XI, Paris, France, in 1990 and 1994, respectively.

He is a Principal Researcher with Microsoft Research, Redmond, WA, USA, and manages the multimodal collaboration research team. Before joining Microsoft Research in March 1998, he was with INRIA (French National Institute for Research in Computer Science and Control), France, for 11 years and was a Senior Research Scientist from 1991. In 1996-1997, he spent a one-year sabbatical as an Invited Researcher with the Advanced Telecommunications Research Institute International (ATR), Kyoto, Japan. He has published over 200 papers in refereed international journals and conferences, and has coauthored the following books: 3-D Dynamic Scene Analysis: A Stereo Based Approach (Springer-Verlag, 1992); Epipolar Geometry in Stereo, Motion and Object Recognition (Kluwer, 1996); Computer Vision (Chinese Academy of Sciences, 1998, 2003, in Chinese); Boosting-Based Face Detection and Adaptation (Morgan and Claypool, 2010); and Face Geometry and Appearance Modeling (Cambridge University Press, 2010, to appear). He has given a number of keynotes at international conferences.

Dr. Zhang is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), the Founding Editor-in-Chief of the IEEE Transactions on Autonomous Mental Development, an Associate Editor of the International Journal of Computer Vision, and an Associate Editor of Machine Vision and Applications. He served as an Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence from 2000 to 2004 and as an Associate Editor of the IEEE Transactions on Multimedia from 2004 to 2009, among others. He has been on the program committees for numerous international conferences in the areas of autonomous mental development, computer vision, signal processing, multimedia, and human-computer interaction. He served as a Program Co-Chair of the International Conference on Multimedia and Expo (ICME), July 2010, a Program Co-Chair of the ACM International Conference on Multimedia (ACM MM), October 2010, and a Program Co-Chair of the ACM International Conference on Multimodal Interfaces (ICMI), November 2010.

Quality of Experience (QoE) in Multimedia Communications

Date/Time: July 15, morning 9:00 - 12:30

Room: JHC.02


High quality and mobility are key elements of today's media experience. People expect the same level of quality on their portable devices as on their desktop computer or home entertainment system. Under these stringent requirements, the value of media services is no longer confined to the face value of the end products delivered to the customers, but lies in the level of satisfaction and experience they provide. Hence, the ultimate measure of the value of media services is how the end user experiences them, and the customer's Quality of Experience (QoE) must become the ultimate baseline for delivering new media products. The term Quality of Experience is relatively new to multimedia research, and different flavours of QoE definitions are stated throughout the literature. Generally, QoE could be defined as "the overall acceptability of an application or service, as perceived subjectively by the end-user". The term QoE has been coined to differentiate between user-perceived quality and technical quality measures relating to data transport, commonly denoted Quality of Service (QoS).

Today the ultimate challenge faced by service providers is to deliver the maximum Quality of Experience to end-users with an optimal encoding scheme under transmission constraints such as bandwidth and other limitations. To achieve this, video services need to be continuously monitored to ensure that users experience them as being of adequate quality. These quality monitoring procedures must necessarily be automated, since it would obviously be impracticable to have test persons continuously evaluating an entire delivery chain from content provider to end-user. Thus the need for quantifiable quality measurements is understandable. In this tutorial we will discuss the motivations behind QoE and its importance for future multimedia services, existing objective metrics, the current state of QoE research, hypothetical models and their limitations, Quality of Service (QoS) and Quality of Business (QoB) and their relationship to the application provider/content provider and the user, how to maximise quality and revenue, practical solutions, and some potential application domains.


Dr. W.A.C. Fernando (SMIEEE) leads the Video Codec group at the University of Surrey, UK. He has been working in video coding since 1998 and has published more than 210 international refereed journal and proceedings papers in this area. Furthermore, he has published more than 20 international refereed journal and conference papers in video quality assessment and QoE. He was involved in the European project VISNET-II, in which his institute was the leading partner for similar activities. He is a leading researcher in QoE who has contributed to the multimedia research community through several peer-reviewed international journal/conference publications and tutorials at major IEEE conferences. His main research interests are video quality assessment, Quality of Experience (QoE) in multimedia, 3D and multiview video coding/processing, network coding, distributed video coding, and content-aware coding.

Packet Core Network Evolution in regard to Future Internet Research

Date/Time: July 15, afternoon 13:30 - 18:00


This half-day workshop will provide an overview of the Next Generation Mobile Network as defined by the NGMN Alliance and the related 3GPP specifications for the 3GPP Evolved Packet System (EPS), also known as System Architecture Evolution (SAE). In this regard we will look at envisaged Next Generation Mobile Network applications, LTSI requirements, and most importantly the 3GPP standards related to Long Term Evolution (LTE) and the Evolved Packet Core (EPC). In addition, the tutorial will address the potential EPS application domains, namely the IP Multimedia Subsystem (IMS) as well as potential internet service architectures. The tutorial concludes with an introduction to the OpenEPC software toolkit from Fraunhofer FOKUS and Technical University Berlin, which enables rapid Next Generation Mobile Network prototyping for academic and industry research.


Julius Mueller studied computer science at the Freie Universität Berlin and obtained his diploma in 2009. In his university studies he concentrated on computer networks, distributed systems and mobile communications.

He worked as a student researcher at the Fraunhofer Institute FOKUS in the competence center Next Generation Network Infrastructures (NGNI) in the field of optimized service provision in Next Generation Networks (NGNs) and particularly the IP Multimedia Subsystem (IMS). There he also worked on several European projects, such as the EU project Vital++. In this context he also wrote his diploma thesis on NGN/IMS and Peer to Peer (P2P) system integration.

In 2009 he joined the chair "Architektur der Vermittlungsknoten (AV)" at the electrical engineering and computer sciences faculty within the Technische Universität Berlin as a PhD researcher, where he is working within the German BMBF project G-Lab DEEP-G. His scientific work and PhD, supervised by Prof. Thomas Magedanz, focus on the evolution of NGNs towards the Future Internet (FI). In particular, he is investigating Evolved Packet Core (EPC) optimization and Cross-Layer Composition within NGNs and the FI.

Mr. Mueller has experience with workshops and conferences, including ICIN, Marcus Evans, and MobilWare.


Thomas Magedanz (PhD) is a professor in the electrical engineering and computer sciences faculty at the Technical University of Berlin, Germany, leading the chair for next generation networks (Architektur der Vermittlungsknoten – AV) and supervising Master and PhD students. In addition, he is director of the “NGNI” division at the Fraunhofer Institute FOKUS, which provides toolkits for NGN/IMS as well as Next Generation Fixed and Mobile Network/EPC test and development tools for global operators and vendors. Prof. Magedanz is one of the founding members of FOKUS (1988) and a member of the management team. Furthermore, he is principal consultant of Direct Link Consult e. V., a FOKUS Consulting spin-off focussing on professional services, strategic studies and technology coaching.

Prof. Magedanz is a globally recognised technology expert, based on his 18 years of practical experience gained by managing research and development projects in the various fields of today's convergence landscape (namely IT, telecoms, internet and entertainment).

Prof. Magedanz is a senior member of the IEEE, an editorial board member of several journals, and the author of more than 200 technical papers/articles. He is the author of two books on IN standards and IN evolution.

Prof. Magedanz is a globally recognized tutorial and keynote speaker at major academic and industrial workshops, conferences and symposia around the world. Examples include the IEEE IN workshop, IEEE ISS, IEEE MATA, IEEE NOMS, and IEEE IM.
