Linyi Li (李林翼)

Assistant Professor of Computer Science at Simon Fraser University. Director of the TAI Lab at Simon Fraser University.

[firstnamelowercase]_[lastnamelowercase]@sfu.ca

Burnaby, British Columbia, Canada (Greater Vancouver area)

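My research focuses on trustworthy machine learning, in particular certifiably trustworthy deep learning and trustworthy foundation models, at the intersection of machine learning and computer security. Specifically, I work on:

providing certifiable trustworthiness guarantees (such as guarantees of robustness, fairness, and numerical stability) for large-scale deep learning systems;
understanding and analyzing why deep learning, and foundation models in particular, suffers from trustworthiness and alignment deficiencies;
evaluating foundation models scientifically and comprehensively.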

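I have published over 30 papers at top-tier machine learning and computer security conferences, including ICML, NeurIPS, ICLR, IEEE S&P, and ACM CCS. I have received awards including Rising Stars in Data Science, the AdvML Rising Star Award, and the Wing Kai Cheng Fellowship. The \(\alpha,\beta\)-CROWN team, which I co-led in 2023, won the 4th International Verification of Neural Networks Competition (VNN-COMP'23). I was a finalist for the 2022 Qualcomm Innovation Fellowship and the 2022 Two Sigma PhD Fellowship.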

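I received my PhD in 2023 from the Department of Computer Science at the University of Illinois Urbana-Champaign, where I was fortunate to be advised by Professors Bo Li and Tao Xie. I received my bachelor's degree in 2018 from the Department of Computer Science and Technology at Tsinghua University, where I worked on automated Web API testing under Professor Xiaoying Bai. From 2023 to 2024, I was a senior algorithm researcher at ByteDance (Seattle). I interned at Microsoft twice, in 2019 and 2022 (mentored by Adam Kalai and Neel Sundaresan, respectively), interned at Fujitsu Research of America in 2021 (mentored by Mukul Prasad), and was a summer research visitor at Carnegie Mellon University in 2017 (mentored by Matt Fredrikson).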

Learn more: My Research / Teaching / Lab Openings

Selected Publications

The full publication list is available at TAI Lab - Publication and Google Scholar.

(* denotes co-first authors)

Linyi Li, Shijie Geng, Zhenwen Li, Yibo He, Hao Yu, Ziyue Hua, Guanghan Ning, Siwei Wang, Tao Xie, Hongxia Yang
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models
38th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS 2024 D&B)
[Full Version] [Conference Version] [Code] [Project Website] [Slides] [BibTex]

@inproceedings{
li2024infibench,
title={InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models},
author={Linyi Li and Shijie Geng and Zhenwen Li and Yibo He and Hao Yu and Ziyue Hua and Guanghan Ning and Siwei Wang and Tao Xie and Hongxia Yang},
booktitle={The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2024},
}

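Keywords: LLM, benchmark, code

Summary: A systematic evaluation of code large language models that assesses their ability to answer free-form, real-world questions in the code domain. By evaluating over 100 models, we summarize empirical trends and scaling laws for existing open-source code LLMs.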

Linyi Li
Certifiably Trustworthy Deep Learning Systems at Scale
Doctoral Thesis
[Full Version] [Official Version] [BibTex]

@phdthesis{li2023thesis,
title = {Certifiably Trustworthy Deep Learning Systems at Scale},
author = {Linyi Li},
year = 2023,
month = {Oct},
school = {University of Illinois Urbana-Champaign},
type = {PhD thesis}
}

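Keywords: certified machine learning

Summary: My doctoral thesis. It systematically summarizes the state of research on certifiably trustworthy deep learning. Compared with the earlier SoK paper, the thesis extends beyond robustness to other trustworthiness properties and presents the technical details of representative methods.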

Linyi Li, Tao Xie, Bo Li
SoK: Certified Robustness for Deep Neural Networks
44th IEEE Symposium on Security and Privacy (SP 2023)
[Full Version] [Conference Version] [Slides] [Code] [SOTA Leaderboard] [BibTex]

@inproceedings{li2023sok,
author={Linyi Li and Tao Xie and Bo Li},
title = {SoK: Certified Robustness for Deep Neural Networks},
booktitle = {44th {IEEE} Symposium on Security and Privacy, {SP} 2023, San Francisco, CA, USA, 22-26 May 2023},
publisher = {{IEEE}},
year = {2023},
}

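Keywords: certified machine learning

Summary: A comprehensive systematization of knowledge on certified robustness for DNNs, covering practical and theoretical implications, findings, main challenges, and future directions, together with an open-source unified toolbox for evaluating more than 20 representative approaches.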

Linyi Li, Yuhao Zhang, Luyao Ren, Yingfei Xiong, Tao Xie
Reliability Assurance for Deep Neural Network Architectures Against Numerical Defects
45th IEEE/ACM International Conference on Software Engineering (ICSE 2023)
[Full Version] [Conference Version] [Slides] [Code] [BibTex]

@inproceedings{li2023reliability,
author={Linyi Li and Yuhao Zhang and Luyao Ren and Yingfei Xiong and Tao Xie},
title = {Reliability Assurance for Deep Neural Network Architectures Against Numerical Defects},
booktitle = {45th International Conference on Software Engineering, {ICSE} 2023, Melbourne, Australia, 14-20 May 2023},
publisher = {{IEEE/ACM}},
year = {2023},
}

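Keywords: certified machine learning, numerical reliability

Summary: We propose RANUM, an efficient white-box framework for general neural network models that supports certifying numerical reliability (e.g., never outputting NaN or INF), generating system tests that trigger defects, and generating fixes. RANUM is the first automated framework for the latter two tasks.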

Mintong Kang*, Linyi Li*, Maurice Weber, Yang Liu, Ce Zhang, Bo Li
Certifying Some Distributional Fairness with Subpopulation Decomposition
Advances in Neural Information Processing Systems (NeurIPS) 2022
[Full Version] [Conference Version] [Code] [Poster] [BibTex]

@inproceedings{kang2022certifying,
title = {Certifying Some Distributional Fairness with Subpopulation Decomposition},
author = {Mintong Kang and Linyi Li and Maurice Weber and Yang Liu and Ce Zhang and Bo Li},
booktitle = {Advances in Neural Information Processing Systems 35 (NeurIPS 2022)},
year = {2022}
}

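Keywords: certified machine learning, fairness

Summary: A new practical and scalable certification algorithm, based on statistical subpopulation decomposition, that provides fairness guarantees for a given model when the data distribution shifts away from the training distribution.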

Linyi Li, Jiawei Zhang, Tao Xie, Bo Li
Double Sampling Randomized Smoothing
39th International Conference on Machine Learning (ICML 2022)
[Conference Version] [Full Version] [Code] [BibTex]

@inproceedings{
li2022double,
title={Double Sampling Randomized Smoothing},
author={Linyi Li and Jiawei Zhang and Tao Xie and Bo Li},
booktitle={39th International Conference on Machine Learning (ICML 2022)},
year={2022},
}

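Keywords: certified machine learning

Summary: A tighter certification algorithm for randomized smoothing. It is the first to leverage statistics from two different distributions to obtain tighter robustness bounds, and the first to break the well-known curse of dimensionality under mild conditions.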

Fan Wu*, Linyi Li*, Chejian Xu, Huan Zhang, Bhavya Kailkhura, Krishnaram Kenthapadi, Ding Zhao, Bo Li
COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks
10th International Conference on Learning Representations (ICLR 2022)
[Conference Version] [Full Version] [SOTA Leaderboard] [Code] [BibTex]

@inproceedings{
wu2022copa,
title={{COPA}: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks},
author={Fan Wu and Linyi Li and Chejian Xu and Huan Zhang and Bhavya Kailkhura and Krishnaram Kenthapadi and Ding Zhao and Bo Li},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=psh0oeMSBiF}
}

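Keywords: certified machine learning, deep reinforcement learning

Summary: Certified robustness of deep reinforcement learning against perturbations of the offline training dataset (i.e., poisoning attacks), achieved by aggregating policies trained on partitioned datasets and policies across multiple time steps.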

Zhuolin Yang*, Linyi Li*, Xiaojun Xu, Bhavya Kailkhura, Tao Xie, Bo Li
On the Certified Robustness for Ensemble Models and Beyond
10th International Conference on Learning Representations (ICLR 2022)
[Conference Version] [Full Version] [Code] [BibTex]

@inproceedings{
yang2022on,
title={On the Certified Robustness for Ensemble Models and Beyond},
author={Zhuolin Yang and Linyi Li and Xiaojun Xu and Bhavya Kailkhura and Tao Xie and Bo Li},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=tUa4REjGjTf}
}

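Keywords: certified machine learning

Summary: Based on a curvature bound for randomized smoothing classifiers, we prove that a large margin between classification probabilities and gradient diversity are necessary and sufficient conditions for certifiably robust ensemble models. By constraining these two factors, we achieve state-of-the-art robustness under L2-norm perturbations.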

Zhuolin Yang*, Linyi Li*, Xiaojun Xu*, Shiliang Zuo, Qian Chen, Pan Zhou, Benjamin I. P. Rubinstein, Ce Zhang, Bo Li
TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness
Advances in Neural Information Processing Systems (NeurIPS) 2021
[Conference Version] [Full Version] [Code] [BibTex]

@inproceedings{yangli2021trs,
title = {TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness},
author = {Zhuolin Yang and Linyi Li and Xiaojun Xu and Shiliang Zuo and Qian Chen and Pan Zhou and Benjamin I. P. Rubinstein and Ce Zhang and Bo Li},
booktitle = {Advances in Neural Information Processing Systems 34 (NeurIPS 2021)},
year = {2021}
}

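Keywords: robust machine learning

Summary: We prove a correlation between model diversity and adversarial transferability under bounded model smoothness. Based on this, we propose a strong regularizer with which ensemble models achieve state-of-the-art robustness against existing strong attacks.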

Jiawei Zhang*, Linyi Li*, Huichen Li, Xiaolu Zhang, Shuang Yang, Bo Li
Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation
International Conference on Machine Learning (ICML) 2021
[Conference Version] [Full Version] [Code] [Slides] [BibTex]

@inproceedings{zhangli2021progressive,
title = {Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation},
author = {Zhang, Jiawei and Li, Linyi and Li, Huichen and Zhang, Xiaolu and Yang, Shuang and Li, Bo},
booktitle = {Proceedings of the 38th International Conference on Machine Learning (ICML 2021)},
pages = {12479--12490},
year = {2021},
editor = {Meila, Marina and Zhang, Tong},
volume = {139},
series = {Proceedings of Machine Learning Research},
month = {18--24 Jul},
publisher = {PMLR},
}

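Keywords: machine learning attacks and defenses

Summary: We systematically analyze the gradient estimators that guide black-box attacks on DNNs, which reveals several key factors that enable more accurate gradient estimation with fewer queries. One way to realize these factors is to perform gradient estimation on images at a specific resolution when generating adversarial examples. Based on this, our proposed PSBA method achieves state-of-the-art attack efficiency.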

Linyi Li*, Maurice Weber*, Xiaojun Xu, Luka Rimanic, Bhavya Kailkhura, Tao Xie, Ce Zhang, Bo Li
TSS: Transformation-Specific Smoothing for Robustness Certification
ACM Conference on Computer and Communications Security (CCS) 2021
[Conference Version] [Full Version] [Code] [Slides] [BibTex]

@inproceedings{li2021tss,
title={TSS: Transformation-Specific Smoothing for Robustness Certification},
author={Linyi Li and Maurice Weber and Xiaojun Xu and Luka Rimanic and Bhavya Kailkhura and Tao Xie and Ce Zhang and Bo Li},
year={2021},
booktitle={ACM Conference on Computer and Communications Security (CCS 2021)}
}

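Keywords: certified machine learning

Summary: Transformations such as rotation and scaling are common in the natural world. We propose the first efficient robustness certification approach against natural transformations, based on randomized smoothing, rigorous Lipschitz analysis, and stratified sampling. For the first time, we achieve high certified robustness on the large-scale ImageNet dataset (> 30% certified robust accuracy).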

Huichen Li*, Linyi Li*, Xiaojun Xu, Xiaolu Zhang, Shuang Yang, Bo Li
Nonlinear Projection Based Gradient Estimation for Query Efficient Blackbox Attacks
International Conference on Artificial Intelligence and Statistics (AISTATS) 2021
[Conference Version] [Full Version] [Code] [BibTex]

@inproceedings{li2020nolinear,
title={Nonlinear Projection Based Gradient Estimation for Query Efficient Blackbox Attacks},
author={Huichen Li and Linyi Li and Xiaojun Xu and Xiaolu Zhang and Shuang Yang and Bo Li},
year={2021},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS 2021)},
series = {Proceedings of Machine Learning Research},
month = {13--15 Apr},
publisher = {PMLR},
}

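Keywords: machine learning attacks and defenses

Summary: We theoretically analyze the efficiency of black-box attacks based on gradient estimation with nonlinear projections, showing that a suitable nonlinear projection can improve attack efficiency.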

Linyi Li, Zhenwen Li, Weijie Zhang, Jun Zhou, Pengcheng Wang, Jing Wu, Guanghua He, Xia Zeng, Yuetang Deng, Tao Xie
Clustering Test Steps in Natural Language toward Automating Test Automation
ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) 2020, Industry Track
[Paper] [Video] [BibTex]

@inproceedings{li2020clustep,
title = {Clustering Test Steps in Natural Language toward Automating Test Automation},
author = {Li, Linyi and Li, Zhenwen and Zhang, Weijie and Zhou, Jun and Wang, Pengcheng and Wu, Jing and He, Guanghua and Zeng, Xia and Deng, Yuetang and Xie, Tao},
booktitle = {Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering {(ESEC/FSE 2020)}},
year = {2020},
doi = {10.1145/3368089.3417067},
url = {https://doi.org/10.1145/3368089.3417067}
}

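Keywords: machine learning and software testing

Summary: We propose an efficient pipeline that clusters test steps described in natural language in order to generate executable test cases. It has been deployed for testing WeChat.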

Linyi Li*, Zexuan Zhong*, Bo Li, Tao Xie
Robustra: Training Provable Robust Neural Networks over Reference Adversarial Space
International Joint Conference on Artificial Intelligence (IJCAI) 2019
[Paper] [Code] [BibTex]

@inproceedings{li2019robustra,
title = {Robustra: Training Provable Robust Neural Networks over Reference Adversarial Space},
author = {Li, Linyi and Zhong, Zexuan and Li, Bo and Xie, Tao},
booktitle = {Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI 2019)},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
pages = {4711--4717},
year = {2019},
month = {7},
doi = {10.24963/ijcai.2019/654},
url = {https://doi.org/10.24963/ijcai.2019/654}
}

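Keywords: certified machine learning

Summary: We propose a training method that achieves certified robustness by regularizing only within the reference adversarial space of a jointly trained model, which eases optimization and yields higher certified robustness.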

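Miscellaneous

I enjoy traveling, geography, and linguistics, especially Chinese historical phonology. I admire Yuen Ren Chao.

I occasionally take part in programming contests.

I love spicy food 🌶🌶🌶.

I was born and spent my childhood in Zhangjiajie, then spent my teenage years in Changsha.

I am Tujia. In the Tujia language: Ngaf Bifzivkar.

I was on the faculty job market in 2022-2023. Here are my research, teaching, and diversity statements.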