Department of Computer Science and Technology

Education background

Bachelor of Computer Science & Technology, Tsinghua University, Beijing, China, 1999;

Ph.D. in Computer Science & Technology, Tsinghua University, Beijing, China, 2003.

Social service

"Tsinghua-Sohu" Joint Lab of Search Technology: Deputy Director (2007-);

Information Processing and Management (IP&M): Reviewer (2008-);

WSDM 2010-2011, KDD 2010, WWW 2008-2010, EMNLP 2009, IJCNLP 2008: PC Member (2008-2011);

AIRS: PC Member, PC Area Co-Chair (2004-2010);

WICOW 2010 (WWW 2010 Workshop): Program Co-Chair (2010).

Research Interests / Research Projects

Information Retrieval, User Behavior Analysis, Machine Learning

National Natural Science Foundation of China (Key Foundation): Next Generation Information Retrieval (2008-2011);

National Natural Science Foundation of China: Pre-Selection and Retrieval of Topic-Independent High-Quality Web Pages (2006-2008);

Ministry of Education Research Funding: Information Retrieval System for National Essential Courses Database (2007-2008);

"Tsinghua-Sohu" Joint Research Projects: Multiple Topics on Improvements in Search Engine and Analysis of User Behavior (2007-2013);

Microsoft Research Asia Joint Project: Search Engine Evaluation Using Click-through Data (2007).

Research Status

My research interests are Web information retrieval (IR) and user behavior analysis. In most IR tasks, user queries are short and ambiguous, while the known document space is huge and contains complex information. The main problem in IR therefore lies in the mismatch of information representations between the user query space and the known document space. This is what my work is based on. My research contributions are:

In research on IR approaches to better document modeling, we studied the nature of novelty information, focusing on how to find new and non-redundant information for a user's request. We proposed a document refinement approach based on query expansion and a selective pooling-based document matching strategy. For document representation, we proposed an object-based document description. We also proposed a unified generation model for topic relevance and opinion sentiment. This work has been published in important international journals and conferences, such as Journal of IR, SIGIR, and CIKM. One patent has been granted.

We are also leveraging the wisdom of crowds and studying user behavior based on large-scale user log data. Our proposed web page quality estimation and anti-spam algorithms address the robustness and timeliness problems of state-of-the-art technologies. Our automatic search engine performance evaluation approach avoids the time and labor cost of human annotation. Currently, the automatic evaluation service for six main Chinese commercial search engines is provided online daily. We also study how to understand user intentions, build user browsing graphs, and predict users' satisfaction levels. Furthermore, we analyze the reliability of user behavior and propose a model of user reliability and click relevance, which estimates quality using both hot queries/clicks and long-tail queries/clicks. This work has been published in important international journals and conferences, such as JASIST, WWW, WSDM, and CIKM. Eight patents have been filed, four of which have been granted. Our proposed approaches have also been deployed on the Sogou online search engine via the "Tsinghua-Sohu" Joint Research Lab, and have demonstrated good results and high potential in real web applications.

Several demos of our research projects can be found at SearchE Search Engine Performance Evaluation System (http://searche.thuir.cn/), Top News Online Service (http://news.thuir.org), and Sogou Laboratory Homepage (www.sogou.com/labs).

Honors And Awards

Tsinghua University Outstanding Teaching Award (2007).

Academic Achievement

[1] Min Zhang, Xingyao Ye, A generative model to unify topic relevance and lexicon-based sentiment for opinion retrieval, The 31st Annual International ACM SIGIR Conference (SIGIR2008), 20-24 July 2008, Singapore, p411-419.

[2] Canhui Wang, Min Zhang, Shaoping Ma, Liyun Ru, Automatic Online News Issue Construction in Web Environment, the 17th International World Wide Web Conference (WWW2008), Beijing, April, 2008, p457-466.

[3] Yiqun Liu, Min Zhang, Liyun Ru, Shaoping Ma. Data Cleansing for Web Information Retrieval using Query Independent Features. Journal of the American Society for Information Science and Technology (JASIST), Volume 58, No. 12, Pages 1884-1898, 2007.

[4] Le Zhao, Min Zhang, Shaoping Ma, The Nature of Novelty Detection, Information Retrieval, vol. 9, No. 5, pp.521-542, 2006.

[5] Rongwei Cen, Yiqun Liu, Min Zhang, Bo Zhou, Liyun Ru, and Shaoping Ma. Exploring Relevance for Clicks. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM 2009), Nov. 2009. ACM, New York, NY, 1847-1850.

[6] Yiqun Liu, Yijiang Jin, Min Zhang, Shaoping Ma and Liyun Ru. User Browsing Graph: Structure, Evolution and Application. Late breaking result session in Second ACM International Conference on Web Search and Data Mining (WSDM 2009). 2009.2

[7] Canhui Wang, Min Zhang, Liyun Ru, Shaoping Ma, Automatic Online News Topic Ranking Using Media Focus and User Attention Based on Aging Theory, the ACM 17th Conference on Information and Knowledge Management (CIKM 2008), October, 2008, Napa Valley, California, USA. pp1033-1042.

[8] Ruihua Song, Shaoping Ma, Document Refinement Based On Semantic Query Expansion. Journal of Computer, Vol.27, No.10, pp1395-1401, 2004 (in Chinese).

[9] Qing Ma, Min Zhang, Ming Zhou, Masaki Murata, and Hitoshi Isahara, Self-Organizing Chinese and Japanese Semantic Maps, International Conference on Computational Linguistics (COLING02). p1-7, August, 2002. Taiwan.

[10] Jianfeng Gao, Min Zhang, Improving Language Model Size Reduction using Better Pruning Criteria. the Association for Computational Linguistics 40th Anniversary Meeting (ACL2002). p176-182, July, 2002.