Associate Professor and Director of High Performance Computing Institute
Department of Computer Science and Technology
Joined Department: 2003
Ph.D., Electrical Engineering, Tsinghua University, Beijing, China, 2003
Thesis Topic: Studies on Transient Stability Parallel Computing for National Power Grid
Advisors: Prof. HAN Yingduo (Member of the Chinese Academy of Engineering) and Prof. WANG Xinfeng
B.E., Electrical Engineering, Tsinghua University, Beijing, China, 1998
B.E. (minor), Environmental Engineering, Tsinghua University, Beijing, China, 1998
Research Scientist of National Supercomputing Center in Wuxi (2016 - present)
Member of Research Team for High Resolution Data Assimilation and Numerical Weather Model, China Meteorological Administration (2015 - present)
Member of China Computer Federation Information Storage Technical Committee (2008 - present)
Areas of Research Interests / Research Projects
Areas of Research Interests:
To investigate designs, algorithms, models and tools for highly efficient and highly scalable cutting-edge applications (such as earth system models and earthquake models) that can take advantage of emerging multicore and manycore architectures and make full use of current top-ranking supercomputers (such as Sunway TaihuLight and Tianhe-2) and future exascale supercomputers.
To explore experiment design methodologies, algorithms and frameworks for characterizing and reducing the uncertainties, and improving the predictive skill, of complex simulation models (such as earth/climate system models), given the rapid growth in both simulation resolution and the volume of observational and simulation data.
Recent Research Projects:
· Studies on parameter analysis and calibration of earth system modeling
2017-22, sponsored by National Key R&D Program, MOST; sub-task PI
· Ensemble method for seamless climate prediction
2016-21, sponsored by National Key R&D Program, MOST; lead PI
· Parallel Performance Tuning for GRAPES
2015-present, sponsored by China Meteorological Administration; lead PI
· Study on multi-model ensemble coupling
2010-14, sponsored by National Basic Research Program (973), MOST; lead PI
· Heterogeneous algorithm design for Geoscience applications
2016-18, sponsored by National Natural Science Foundation of China; PI
· High performance I/O methods and infrastructures for Geoscience
2013-14, sponsored by National Natural Science Foundation of China; PI
Highly scalable geoscience models for top ranking supercomputers
Achieving sustained performance of 800 TFlops for an atmospheric shallow water model on Tianhe-1A; 1.74 PFlops for an atmospheric 3D Euler equation solver on Tianhe-2; 7.95 PFlops for an atmospheric 3D Euler equation solver with fully-implicit time integration on Sunway TaihuLight (2016 ACM Gordon Bell Prize); refactoring and optimizing CAM-SE on Sunway TaihuLight, which scales to 1.56 million cores and attains 2.81 simulated years per day (25 km horizontal resolution case); an 18.9 PFlops nonlinear earthquake simulation on Sunway TaihuLight with 10,400,000 cores for 18-Hz and 8-meter scenarios (2017 ACM Gordon Bell Prize).
High performance scientific solvers for manycore architectures
A register-oriented tridiagonal algorithm for the Intel MIC architecture, which can outperform both the Intel MKL implementation and the GPU version based on the Nvidia cuSPARSE library; a novel sparse triangular solver, together with a newly introduced sparse-level-tile matrix layout, for the heterogeneous manycore SW26010 CPU that powers Sunway TaihuLight, which can outperform the latest method on the KNC processor on 1,856 matrices and the latest method on the K80 GPU on 1,672 matrices, out of all 2,057 square matrices in the Florida Matrix Collection.
I/O Monitoring and Tuning for Leading Supercomputers
An end-to-end, lightweight I/O resource monitoring and diagnosis system for the 40,960-node Sunway TaihuLight supercomputer, which simultaneously collects and correlates I/O tracing/profiling data from all the compute nodes, forwarding nodes, storage nodes and metadata servers to reveal correlations between system performance bottlenecks, utilization symptoms, and application behaviors; an automatic, application-adaptive scheme for dynamic forwarding resource allocation, which improves applications’ I/O performance by up to 18.9x, eliminates most inter-application I/O interference, and saved over 200 million core-hours during its 11-month test deployment on TaihuLight.
Parameter estimation for earth system models
Building a cost-efficient parameter analysis and parameter estimation framework for earth system models, which introduces a dynamic parameter screening method, a fast optimization algorithm, and a cost-saving scheme using short-term hindcasts based on the Cloud-Associated Parameterizations Testbed of Lawrence Livermore National Laboratory; the framework has been applied to several earth system models: GAMIL2 (9% improvement in a synthesized performance metric), CAM5 (10% improvement) and the CLM-CASA model (12% improvement).
Interactive ensemble framework for ocean-atmospheric coupling analysis
Building an extensible framework to quantitatively analyze the effect of atmospheric noise with the interactive ensemble method; findings include that over 79.9% of the simulated monthly variability of the NAO is caused by atmospheric noise, that the irregular ENSO cycle partly arises from atmospheric noise, and that the interactive ensemble coupled model gives a more reasonable simulation of the relationship between ENSO and SST in the central North Pacific subtropical gyre region.
Honors and Awards
· ACM, Gordon Bell Prize (2017)
· ACM, Gordon Bell Prize (2016)
· Tsinghua University, Advanced Worker of Tsinghua (2016, 2% of faculty at Tsinghua)
· Tsinghua University and Inspur, Computational Earth Science Young Researcher (2013, only 5 in China)
· General Staff Department of People’s Liberation Army, Science and Technology Progress Award, 3rd Class (2013)
· Ministry of Education, The People's Republic of China: Testing Scheme for High Performance Computers, Award for Science and Technology Progress, 1st Class (2009)
· Chinese Institute of Electronics, The People's Republic of China: Testing Scheme for High Performance Computers, Award for Science and Technology Progress, 1st Class (2009)
· Tsinghua University, Award for Excellent Class Teacher, 2nd Class (2007)
· China Computer Federation, Award for Innovation, 2nd Class (2005)
1. Xu Ji, Bin Yang, Tianyu Zhang, Xiaosong Ma, Xiupeng Zhu, Xiyang Wang, Nosayba El-Sayed, Jidong Zhai, Weiguo Liu, Wei Xue*. Automatic, Application-Aware I/O Forwarding Resource Allocation. FAST 2019: 265-279
2. Bin Yang, Xu Ji, Xiaosong Ma, Xiyang Wang, Tianyu Zhang, Xiupeng Zhu, Nosayba El-Sayed, Haidong Lan, Yibo Yang, Jidong Zhai, Weiguo Liu, Wei Xue*. End-to-end I/O Monitoring on a Leading Supercomputer. NSDI 2019: 379-394
3. Heng Lin, Xiaowei Zhu, Bowen Yu, Xiongchao Tang, Wei Xue, Wenguang Chen, Lufei Zhang, Torsten Hoefler, Xiaosong Ma, Xin Liu, Weimin Zheng, and Jingfang Xu. 2018. ShenTu: processing multi-trillion edge graphs on millions of cores in seconds. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18). IEEE Press, Piscataway, NJ, USA, Article 56, 11 pages. (2018 ACM Gordon Bell Prize Finalist)
4. Xiaohui Duan, Ping Gao, Tingjian Zhang, Meng Zhang, Weiguo Liu, Wusheng Zhang, Wei Xue, Haohuan Fu, Lin Gan, Dexun Chen, Xiangxu Meng, and Guangwen Yang. 2018. Redesigning LAMMPS for peta-scale and hundred-billion-atom simulation on Sunway TaihuLight. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18). IEEE Press, Piscataway, NJ, USA, Article 12, 12 pages.
5. Changxi Liu, Biwei Xie, Xin Liu, Wei Xue, Hailong Yang, and Xu Liu. 2018. Towards Efficient SpMV on Sunway Manycore Architectures. In Proceedings of the 2018 International Conference on Supercomputing (ICS '18). ACM, New York, NY, USA, 363-373.
6. Xinliang Wang, Ping Xu, Wei Xue*, Yulong Ao, Chao Yang, Haohuan Fu, Lin Gan, Guangwen Yang, and Weimin Zheng. 2018. A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010. In Proceedings of the 47th International Conference on Parallel Processing (ICPP 2018). ACM, New York, NY, USA, Article 53, 11 pages.
7. Xinliang Wang, Weifeng Liu, Wei Xue*, and Li Wu. 2018. swSpTRSV: a fast sparse triangular solve with sparse level tile layout on Sunway architectures. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '18). ACM, New York, NY, USA, 338-353.
8. Xiongchao Tang, Jidong Zhai, Xuehai Qian, Bingsheng He, Wei Xue, and Wenguang Chen. vSensor: leveraging fixed-workload snippets of programs for performance variance detection. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '18). ACM, New York, NY, USA, 124-136.
9. Tao Zhang, Minghua Zhang, Wuyin Lin, Yanluan Lin, Wei Xue, Haiyang Yu, Juanxiong He, Xiaoge Xin, Hsi-Yen Ma, Shaocheng Xie, Weimin Zheng. Automatic tuning of the Community Atmospheric Model (CAM5) by using short-term hindcasts with an improved downhill simplex optimization method. Geosci. Model Dev., 11, 5189–5201, 2018
10. Haoyu Xu, Tao Zhang, Yiqi Luo, Xin Huang, Wei Xue*. Parameter calibration in global soil carbon models using surrogate-based optimization. Geosci. Model Dev., 11, 3027–3044, 2018
11. Shizhen Xu, Yuanchao Xu, Wei Xue*, Xipeng Shen, Fang Zheng, Xiaomeng Huang, Guangwen Yang. Taming the “Monster”: Overcoming Program Optimization Challenges on SW26010 Through Precise Performance Modeling, IPDPS 2018.
12. Nan Ding, Wei Xue*, Zhenya Song*, Haohuan Fu, Shiming Xu, and Weimin Zheng. An automatic performance model-based scheduling tool for coupled climate system models. Journal of Parallel and Distributed Computing (JPDC), 2018.
13. Haohuan Fu*; Conghui He*; Bingwei Chen; Zekun Yin; Zhenguo Zhang; Wenqiang Zhang; Tingjian Zhang; Wei Xue*; Weiguo Liu; Wanwang Yin; Guangwen Yang; Xiaofei Chen*. 18.9-Pflops nonlinear earthquake simulation on Sunway TaihuLight: enabling depiction of 18-Hz and 8-meter scenarios. International Conference for High Performance Computing, Networking, Storage and Analysis, SC, 2017, IEEE Press, pp. 2:1-12. (2017 ACM Gordon Bell Prize Winner)
14. Xu Ji; Chao Wang; Nosayba El-Sayed; Xiaosong Ma; Youngjae Kim; Sudharshan S. Vazhkudai; Wei Xue; Daniel Sanchez. Understanding Object-level Memory Access Patterns Across the Spectrum. International Conference for High Performance Computing, Networking, Storage and Analysis, SC, 2017, IEEE Press, pp.:1-12.
15. Yanluan Lin*; Wenhao Dong; Minghua Zhang*; Yuanyu Xie; Wei Xue; Jianbin Huang; Yong Luo. Causes of model dry and warm bias over central U.S. and impact on climate projections. Nature Communications 8, 2017.
16. Yulong Ao; Chao Yang*; Xinliang Wang; Wei Xue; Haohuan Fu; Fangfang Liu; Lin Gan; Ping Xu; Wenjing Ma. 26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight. 2017 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2017: 535-544.
17. Yang, Chao*; Xue, Wei*; Fu, Haohuan*; Hongtao You; Xinliang Wang; Yulong Ao; Fangfang Liu; Lin Gan*; Ping Xu; Lanning Wang; Guangwen Yang; Weimin Zheng. 10M-Core Scalable Fully-Implicit Solver for Nonhydrostatic Atmospheric Dynamics. International Conference for High Performance Computing, Networking, Storage and Analysis, SC, 2016: 57-68. (2016 ACM Gordon Bell Prize Winner)
18. Fu, Haohuan; Liao, Junfeng; Xue, Wei*; et al. Refactoring and Optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight Supercomputer. International Conference for High Performance Computing, Networking, Storage and Analysis, SC, 2016: 969-980.
19. Wang, Xinliang; Xue, Wei*; Zhai, Jidong; Xu, Yangtong; Zheng, Weimin; Lin, Haixiang. A fast tridiagonal solver for Intel MIC architecture. 2016 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2016), 2016: 172-181.
20. Yang Yibo; Wang Xiyang; Yang Bin; Liu Weiguo; Xue Wei*. IO trace tool for HPC applications over Sunway TaihuLight Supercomputer. 2016 HPC China Annual Meeting. (In Chinese, Best Paper Award)
21. Wei Xue; Xiaoge Xin; Jie Zhang; Wusheng Zhang; Haiping Wu; Zhenchun Huang; Tao Zhang; Huimin Li; Nan Ding; Huang Huang. Development and testing of a multi-model ensemble coupling framework. Book chapter of Development and Evaluation of High Resolution Climate System Models, Springer, 163-208, 2016.
22. Xue, Wei; Yang, Chao; Fu, Haohuan; Wang, Xinliang; Xu, Yangtong; Liao, Junfeng; Gan, Lin; Lu, Yutong; Ranjan, Rajiv; Wang, Lizhe. Ultra-Scalable CPU-MIC Acceleration of Mesoscale Atmospheric Modeling on Tianhe-2. IEEE TRANSACTIONS ON COMPUTERS, 2015, 64 (8): 2382-2393.
23. Xin, Xiaoge; Xue, Wei; Zhang, Minghua*; et al. How much of the NAO monthly variability is from ocean-atmospheric coupling: results from an interactive ensemble climate model. CLIMATE DYNAMICS, 2015, 44 (3-4): 781-790.
24. Zhang, Tao; Li, Lijuan*; Lin, Yanluan; Xue, Wei*; et al. An automatic and effective parameter optimization method for model tuning. GEOSCIENTIFIC MODEL DEVELOPMENT, 2015, 8 (11): 3579-3591.
25. Gan, Lin; Fu, Haohuan*; Luk, Wayne; Yang, Chao; Xue, Wei; Huang, Xiaomeng; Zhang, Youhui; Yang, Guangwen. Solving the Global Atmospheric Equations through Heterogeneous Reconfigurable Platforms. ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2015, 8 (2), Article 11.
26. Zhang, Jie; Xue, Wei*; Zhang, Minghua; et al. Climate impacts of stochastic atmospheric perturbations on the ocean. INTERNATIONAL JOURNAL OF CLIMATOLOGY, 2014, 34 (15): 3900-3912.
27. Xue, Wei; Yang, Chao; Fu, Haohuan; Wang, Xinliang; Xu, Yangtong; Gan, Lin; Lu, Yutong; Zhu, Xiaoqian. Enabling and scaling a global shallow-water atmospheric model on Tianhe-2. 2014 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2014), 2014.
28. Zou, Yinlong; Xue, Wei*; Liu, Shenshen. A case study of large-scale parallel I/O analysis and optimization for numerical weather prediction system. FUTURE GENERATION COMPUTER SYSTEMS, 2014, 37: 378-389.
29. Shu, Jiwu*; Shen, Zhirong; Xue, Wei. Shield: A stackable secure storage system for file sharing in public storage. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2014, 74 (9): 2872-2883.
30. Shen, Zhirong; Shu, Jiwu; Xue, Wei. Keyword Search with Access Control over Encrypted Data in Cloud Computing. 2014 IEEE 22ND INTERNATIONAL SYMPOSIUM OF QUALITY OF SERVICE (IWQOS), 2014: 87-92.
31. Yang, Chao; Xue, Wei; Fu, Haohuan; Gan, Lin; Li, Linfeng; Xu, Yangtong; Lu, Yutong; Sun, Jiachang; Yang, Guangwen; Zheng, Weimin. A Peta-scalable CPU-GPU Algorithm for Global Atmospheric Simulations. ACM SIGPLAN NOTICES (PPoPP 2013), 2013, 48 (8): 1-11.
32. Shen, Zhirong; Shu, Jiwu; Xue, Wei. Preferred Keyword Search over Encrypted Data in Cloud Computing. 2013 IEEE/ACM 21ST INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS), 2013: 207-212.
33. Xue Wei; Shu Jiwu; Liu Yang; Xue Mao. Corslet: A shared storage system keeping your data private. SCIENCE CHINA-INFORMATION SCIENCES, 2011, 54 (6): 1119-1128.
34. Shu, Jiwu; Xue, Wei*; Zheng, Weimin. A parallel transient stability simulation for power systems. IEEE TRANSACTIONS ON POWER SYSTEMS, 2005, 20 (4): 1709-1717.
35. Xue, Wei; Shu, Jiwu; Wu, Yongwei; Zheng, Weimin. Parallel algorithm and implementation for realtime dynamic simulation of power system. 2005 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, PROCEEDINGS (ICPP 2005), 2005: 137-144.