Linyi Li (李林翼)

Assistant Professor in Computing Science at Simon Fraser University; Director of the TAI Lab at Simon Fraser University

[firstnamelowercase]_[lastnamelowercase]@sfu.ca

Burnaby, British Columbia, Canada (Greater Vancouver area)

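My research is in trustworthy machine learning, with a particular focus on certifiably trustworthy deep learning and trustworthy foundation models, at the intersection of machine learning and computer security. Specifically, I focus on:

Providing certifiable trustworthiness guarantees (such as guarantees of robustness, fairness, and numerical stability) for large-scale deep learning systems;
Understanding and analyzing why deep learning models, especially foundation models, exhibit trustworthiness and alignment flaws;
Evaluating foundation models scientifically and comprehensively.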

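I have published over 30 papers at top-tier machine learning and computer security conferences, including ICML, NeurIPS, ICLR, IEEE S&P, and ACM CCS. My awards include Rising Stars in Data Science, the AdvML Rising Star Award, and the Wing Kai Cheng Fellowship. In 2023, the \(\alpha,\beta\)-CROWN team that I co-led won the 4th International Verification of Neural Networks Competition (VNN-COMP'23). I was a finalist for the 2022 Qualcomm Innovation Fellowship and the 2022 Two Sigma PhD Fellowship.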

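I received my Ph.D. in computer science from the University of Illinois Urbana-Champaign in 2023, where I was fortunate to be advised by Professors Bo Li and Tao Xie. I received my bachelor's degree from the Department of Computer Science and Technology at Tsinghua University, Beijing, in 2018, where I worked with Professor Xiaoying Bai (白晓颖) on automated Web API testing. From 2023 to 2024, I was a senior algorithm researcher at ByteDance (Seattle). I interned at Microsoft twice, in 2019 and 2022 (mentored by Adam Kalai and Neel Sundaresan, respectively), interned at Fujitsu Research of America in 2021 (mentored by Mukul Prasad), and was a summer research visitor at Carnegie Mellon University in 2017 (advised by Matt Fredrikson).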

Learn more about my research, teaching, and lab openings.

Selected Publications

The full publication list is available at TAI Lab - Publication and Google Scholar.

(* denotes co-first authors)

Linyi Li, Shijie Geng, Zhenwen Li, Yibo He, Hao Yu, Ziyue Hua, Guanghan Ning, Siwei Wang, Tao Xie, Hongxia Yang
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models
38th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS 2024 D&B)
[Full Paper] [Conference Version] [Code] [Project Website] [Slides] [BibTex]

@inproceedings{
li2024infibench,
title={InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models},
author={Linyi Li and Shijie Geng and Zhenwen Li and Yibo He and Hao Yu and Ziyue Hua and Guanghan Ning and Siwei Wang and Tao Xie and Hongxia Yang},
booktitle={The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2024},
}

Keywords: LLM, benchmark, code

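Summary: A systematic benchmark for code large language models that evaluates their ability to answer free-form, real-world questions in the code domain. By evaluating over 100 models, we summarize empirical trends and scaling laws for existing open-source code LLMs.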

Linyi Li
Certifiably Trustworthy Deep Learning Systems at Scale
Doctoral Thesis
[Full Paper] [Official Version] [BibTex]

@phdthesis{li2023thesis,
title = {Certifiably Trustworthy Deep Learning Systems at Scale},
author = {Linyi Li},
year = 2023,
month = {Oct},
school = {University of Illinois Urbana-Champaign},
type = {PhD thesis}
}

Keywords: certified machine learning

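Summary: My doctoral thesis. It systematically summarizes the state of research on certifiably trustworthy deep learning. Compared with the earlier SoK paper, the thesis extends beyond robustness to other trustworthiness properties and presents the technical details of representative methods.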

Linyi Li, Tao Xie, Bo Li
SoK: Certified Robustness for Deep Neural Networks
44th IEEE Symposium on Security and Privacy (SP 2023)
[Full Paper] [Conference Version] [Slides] [Code] [SOTA Leaderboard] [BibTex]

@inproceedings{li2023sok,
author={Linyi Li and Tao Xie and Bo Li},
title = {SoK: Certified Robustness for Deep Neural Networks},
booktitle = {44th {IEEE} Symposium on Security and Privacy, {SP} 2023, San Francisco, CA, USA, 22-26 May 2023},
publisher = {{IEEE}},
year = {2023},
}

Keywords: certified machine learning

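Summary: A comprehensive systematization of knowledge on certified robustness for DNNs, covering practical and theoretical implications, findings, main challenges, and future directions, together with an open-source unified toolbox that evaluates more than 20 representative approaches.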

Linyi Li, Yuhao Zhang, Luyao Ren, Yingfei Xiong, Tao Xie
Reliability Assurance for Deep Neural Network Architectures Against Numerical Defects
45th IEEE/ACM International Conference on Software Engineering (ICSE 2023)
[Full Paper] [Conference Version] [Slides] [Code] [BibTex]

@inproceedings{li2023reliability,
author={Linyi Li and Yuhao Zhang and Luyao Ren and Yingfei Xiong and Tao Xie},
title = {Reliability Assurance for Deep Neural Network Architectures Against Numerical Defects},
booktitle = {45th International Conference on Software Engineering, {ICSE} 2023, Melbourne, Australia, 14-20 May 2023},
publisher = {{IEEE/ACM}},
year = {2023},
}

Keywords: certified machine learning, numerical reliability

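Summary: We propose RANUM, an efficient white-box framework for general neural network models that supports certification of numerical reliability (e.g., never outputting NaN or INF), generation of system tests that trigger defects, and generation of fixes. RANUM is the first automated framework for the latter two tasks.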

Mintong Kang*, Linyi Li*, Maurice Weber, Yang Liu, Ce Zhang, Bo Li
Certifying Some Distributional Fairness with Subpopulation Decomposition
Advances in Neural Information Processing Systems (NeurIPS) 2022
[Full Paper] [Conference Version] [Code] [Poster] [BibTex]

@inproceedings{kang2022certifying,
title = {Certifying Some Distributional Fairness with Subpopulation Decomposition},
author = {Mintong Kang and Linyi Li and Maurice Weber and Yang Liu and Ce Zhang and Bo Li},
booktitle = {Advances in Neural Information Processing Systems 35 (NeurIPS 2022)},
year = {2022}
}

Keywords: certified machine learning, fairness

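Summary: A new practical and scalable certification algorithm, based on statistical subpopulation decomposition, that provides fairness guarantees for a given model when the data distribution shifts away from the training distribution.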

Linyi Li, Jiawei Zhang, Tao Xie, Bo Li
Double Sampling Randomized Smoothing
39th International Conference on Machine Learning (ICML 2022)
[Conference Version] [Full Paper] [Code] [BibTex]

@inproceedings{
li2022double,
title={Double Sampling Randomized Smoothing},
author={Linyi Li and Jiawei Zhang and Tao Xie and Bo Li},
booktitle={39th International Conference on Machine Learning (ICML 2022)},
year={2022},
}

Keywords: certified machine learning

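Summary: A tighter certification algorithm for randomized smoothing. It is the first to use statistics from two different distributions to obtain tighter robustness bounds and, under mild conditions, the first to break the well-known curse of dimensionality.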

Fan Wu*, Linyi Li*, Chejian Xu, Huan Zhang, Bhavya Kailkhura, Krishnaram Kenthapadi, Ding Zhao, Bo Li
COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks
10th International Conference on Learning Representations (ICLR 2022)
[Conference Version] [Full Paper] [SOTA Leaderboard] [Code] [BibTex]

@inproceedings{
wu2022copa,
title={{COPA}: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks},
author={Fan Wu and Linyi Li and Chejian Xu and Huan Zhang and Bhavya Kailkhura and Krishnaram Kenthapadi and Ding Zhao and Bo Li},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=psh0oeMSBiF}
}

Keywords: certified machine learning, deep reinforcement learning

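Summary: By aggregating policies trained on partitioned datasets and policies across multiple time steps, we achieve certified robustness of deep reinforcement learning against perturbations of the offline training dataset (i.e., poisoning attacks).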

Zhuolin Yang*, Linyi Li*, Xiaojun Xu, Bhavya Kailkhura, Tao Xie, Bo Li
On the Certified Robustness for Ensemble Models and Beyond
10th International Conference on Learning Representations (ICLR 2022)
[Conference Version] [Full Paper] [Code] [BibTex]

@inproceedings{
yang2022on,
title={On the Certified Robustness for Ensemble Models and Beyond},
author={Zhuolin Yang and Linyi Li and Xiaojun Xu and Bhavya Kailkhura and Tao Xie and Bo Li},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=tUa4REjGjTf}
}

Keywords: certified machine learning

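Summary: Based on a curvature bound for randomized smoothing classifiers, we prove that a large classification probability margin and gradient diversity are necessary and sufficient conditions for certifiably robust ensemble models. By regularizing these two factors, we achieve the best robustness to date against L2-norm-bounded perturbations.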

Zhuolin Yang*, Linyi Li*, Xiaojun Xu*, Shiliang Zuo, Qian Chen, Pan Zhou, Benjamin I. P. Rubinstein, Ce Zhang, Bo Li
TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness
Advances in Neural Information Processing Systems (NeurIPS) 2021
[Conference Version] [Full Paper] [Code] [BibTex]

@inproceedings{yangli2021trs,
title = {TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness},
author = {Zhuolin Yang and Linyi Li and Xiaojun Xu and Shiliang Zuo and Qian Chen and Pan Zhou and Benjamin I. P. Rubinstein and Ce Zhang and Bo Li},
booktitle = {Advances in Neural Information Processing Systems 34 (NeurIPS 2021)},
year = {2021}
}

Keywords: robust machine learning

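Summary: We prove a correlation between model diversity and adversarial transferability under bounded model smoothness. Based on this, we propose a strong regularizer that gives ensemble models the best robustness to date against existing strong attacks.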

Jiawei Zhang*, Linyi Li*, Huichen Li, Xiaolu Zhang, Shuang Yang, Bo Li
Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation
International Conference on Machine Learning (ICML) 2021
[Conference Version] [Full Paper] [Code] [Slides] [BibTex]

@inproceedings{zhangli2021progressive,
title = {Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation},
author = {Zhang, Jiawei and Li, Linyi and Li, Huichen and Zhang, Xiaolu and Yang, Shuang and Li, Bo},
booktitle = {Proceedings of the 38th International Conference on Machine Learning (ICML 2021)},
pages = {12479--12490},
year = {2021},
editor = {Meila, Marina and Zhang, Tong},
volume = {139},
series = {Proceedings of Machine Learning Research},
month = {18--24 Jul},
publisher = {PMLR},
}

Keywords: attacks and defenses in machine learning

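Summary: We systematically analyze the gradient estimators that guide black-box attacks on DNNs, revealing several key factors that enable more accurate gradient estimation with fewer queries. One way to realize these factors is to perform gradient estimation on images at a particular resolution when generating adversarial examples; based on this, our proposed PSBA method achieves the best attack efficiency to date.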

Linyi Li*, Maurice Weber*, Xiaojun Xu, Luka Rimanic, Bhavya Kailkhura, Tao Xie, Ce Zhang, Bo Li
TSS: Transformation-Specific Smoothing for Robustness Certification
ACM Conference on Computer and Communications Security (CCS) 2021
[Conference Version] [Full Paper] [Code] [Slides] [BibTex]

@inproceedings{li2021tss,
title={TSS: Transformation-Specific Smoothing for Robustness Certification},
author={Linyi Li and Maurice Weber and Xiaojun Xu and Luka Rimanic and Bhavya Kailkhura and Tao Xie and Ce Zhang and Bo Li},
year={2021},
booktitle={ACM Conference on Computer and Communications Security (CCS 2021)}
}

Keywords: certified machine learning

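Summary: Transformations such as rotation and scaling are common in the natural world. We propose the first efficient robustness certification approach against natural transformations, based on randomized smoothing, rigorous Lipschitz analysis, and stratified sampling. For the first time, we achieve high certified robustness on the large-scale ImageNet dataset (> 30% certified robust accuracy).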

Huichen Li*, Linyi Li*, Xiaojun Xu, Xiaolu Zhang, Shuang Yang, Bo Li
Nonlinear Projection Based Gradient Estimation for Query Efficient Blackbox Attacks
International Conference on Artificial Intelligence and Statistics (AISTATS) 2021
[Conference Version] [Full Paper] [Code] [BibTex]

@inproceedings{li2020nolinear,
title={Nonlinear Projection Based Gradient Estimation for Query Efficient Blackbox Attacks},
author={Huichen Li and Linyi Li and Xiaojun Xu and Xiaolu Zhang and Shuang Yang and Bo Li},
year={2021},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS 2021)},
series = {Proceedings of Machine Learning Research},
month = {13--15 Apr},
publisher = {PMLR},
}

Keywords: attacks and defenses in machine learning

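Summary: We theoretically analyze the efficiency of black-box attacks that estimate gradients through nonlinear projections, showing that a suitable nonlinear projection can improve attack efficiency.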

Linyi Li, Zhenwen Li, Weijie Zhang, Jun Zhou, Pengcheng Wang, Jing Wu, Guanghua He, Xia Zeng, Yuetang Deng, Tao Xie
Clustering Test Steps in Natural Language toward Automating Test Automation
ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) 2020, Industry Track
[Paper] [Video] [BibTex]

@inproceedings{li2020clustep,
title = {Clustering Test Steps in Natural Language toward Automating Test Automation},
author = {Li, Linyi and Li, Zhenwen and Zhang, Weijie and Zhou, Jun and Wang, Pengcheng and Wu, Jing and He, Guanghua and Zeng, Xia and Deng, Yuetang and Xie, Tao},
booktitle = {Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering {(ESEC/FSE 2020)}},
year = {2020},
doi = {10.1145/3368089.3417067},
url = {https://doi.org/10.1145/3368089.3417067}
}

Keywords: machine learning and software testing

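Summary: We propose an efficient pipeline that clusters test steps written in natural language in order to generate executable test cases; it has been deployed for testing WeChat.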

Linyi Li*, Zexuan Zhong*, Bo Li, Tao Xie
Robustra: Training Provable Robust Neural Networks over Reference Adversarial Space
International Joint Conference on Artificial Intelligence (IJCAI) 2019
[Paper] [Code] [BibTex]

@inproceedings{li2019robustra,
title = {Robustra: Training Provable Robust Neural Networks over Reference Adversarial Space},
author = {Li, Linyi and Zhong, Zexuan and Li, Bo and Xie, Tao},
booktitle = {Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI 2019)},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
pages = {4711--4717},
year = {2019},
month = {7},
doi = {10.24963/ijcai.2019/654},
url = {https://doi.org/10.24963/ijcai.2019/654}
}

Keywords: certified machine learning

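Summary: We propose a training method that achieves certified robustness by regularizing only within the reference adversarial space of a jointly trained model, which eases the optimization difficulty and yields higher certified robustness.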

Miscellaneous

I enjoy traveling, geography, and linguistics, especially Chinese phonology. I admire Yuen Ren Chao (趙元任).

I occasionally take part in programming contests.

I love spicy food 🌶🌶🌶.

I was born and spent my childhood in Zhangjiajie, mainland China, and then spent my teenage years in Changsha.

I am Tujia. In the Tujia language: Ngaf Bifzivkar.

I was on the faculty job market in 2022-2023; here are my research, teaching, and diversity statements.