Department of Computer Science and Technology

Education background

Bachelor of Computer Science, Peking University, Beijing, China, 1999;

Ph.D. in Computer Science, University of Minnesota, USA, 2005.

Social service

China Computer Federation: Member of Foreign Affairs Committee (2008-).

Areas of Research Interest / Research Projects

Data Mining, Machine Learning

Autonomic Computing

National Natural Science Foundation of China: Research on High Performance Topic Driven Clustering for Documents (2007-2010);

National 863 High-Tech Program: Evaluation Techniques for High-Performance Fault-Tolerant Systems (2008-2010).

Research Status

My research group addresses fundamental problems in unsupervised and semi-supervised learning of high-dimensional data, such as documents, biological data, and scientific data. I am also interested in developing novel data mining methods and applying them to autonomic computing problems, such as monitoring system performance, predicting and diagnosing system failures, and building spatial and temporal models of system performance.

My research on unsupervised and semi-supervised learning of high-dimensional data contributes in two ways: 1) analyzing the relationship between clustering performance and criterion functions; 2) proposing a semi-supervised, topic-driven clustering algorithm that incorporates high-level prior knowledge and users' cognitive models of the datasets into clustering. My research on autonomic computing has resulted in two systems: 1) a monitoring system for distributed systems based on spatial and temporal correlation; 2) an HMM- and HSMM-based hard disk failure prediction system.

Honors And Awards

China Scholarship Council: IBM Research Award (2007).

Academic Achievement

[1] Y. Zhao and G. Karypis. Hierarchical Clustering Algorithms for Document Datasets. Data Mining and Knowledge Discovery, vol. 10, no. 2, pp. 141-168, 2005.

[2] Y. Zhao and G. Karypis. Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering. Machine Learning, vol. 55, no. 3, pp. 311-331, 2004.

[3] Y. Zhao, X. Liu, S. Gan and W. Zheng. Predicting Disk Failures with HMM- and HSMM-based Approaches. Proc. Industrial Conf. on Data Mining '10, 2010.

[4] Y. Zhao, Y. Tan, Z. Gong, X. Gu, M. Wamboldt. Self-Correlating Predictive Information Tracking for Large-Scale Production Systems. Proc. 7th Intl. Conf. on Autonomic Computing (ICAC2009), Barcelona, Spain, 2009, pp. 33-42.

[5] Y. Zhao and G. Karypis. Topic-driven Clustering for Document Datasets. Proc. 2005 SIAM Intl. Conf. on Data Mining (SDM05), pp. 358-369.

[6] Y. Zhao and G. Karypis. Evaluation of Hierarchical Clustering Algorithms for Document Datasets. Proc. 11th ACM Conf. on Information and Knowledge Management (CIKM2002), pp. 515-524.