CLC number: TP181
Document code: A
DOI: 10.16157/j.issn.0258-7998.256451
Chinese citation format: 安栋,王媛媛,宋宁宁,等. 强化学习评估指标的系统性分析与优化研究[J]. 电子技术应用,2025,51(10):17-23.
English citation format: An Dong, Wang Yuanyuan, Song Ningning, et al. Systematic analysis and optimization research on reinforcement learning evaluation metrics[J]. Application of Electronic Technique, 2025, 51(10): 17-23.
Systematic analysis and optimization research on reinforcement learning evaluation metrics
An Dong¹, Wang Yuanyuan², Song Ningning³, Dai Chao², Liu Zhiyin²
1. National Computer System Engineering Research Institute of China; 2. China Information Security Research Academy Co., Ltd.; 3. China Electronics Corporation
Abstract: Reinforcement learning evaluation metrics are core tools for measuring agent performance and guiding algorithm optimization, yet in practical applications they face key challenges such as reliance on a single metric, dependence on the environment, and a lack of interpretability. This paper systematically analyzes the classification framework of existing evaluation metrics, proposes a multi-dimensional metric system covering performance, learning process, policy, robustness, and efficiency, and explores its applicability and limitations in different task scenarios (such as sparse rewards and high-dimensional state spaces). The study indicates that traditional metrics tend to overlook safety, efficiency, and alignment with human preferences in complex environments, so evaluation methods that integrate multiple objectives need to be designed around the characteristics of each task. For future research, this paper suggests focusing on multi-objective Pareto optimization, reward modeling based on human feedback, and the quantification of exploration efficiency in sparse-reward environments, so as to make evaluations more comprehensive and interpretable. By combining theoretical analysis with practical cases, this paper provides methodological support for standardizing the reinforcement learning evaluation system and adapting it across fields, thereby promoting its efficient deployment in complex scenarios.
Key words: reinforcement learning; evaluation metrics; explainability; reward
Introduction
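Reinforcement learning, an important branch of machine learning, learns optimal policies through the interaction between an agent and its environment, and has achieved notable results in game AI [1-2], robot control [3-4], autonomous driving [5], biomedicine [6], and other fields. The field is attracting growing attention: Figure 1 shows its growth trend in terms of the number of papers published each year (data from Web of Science™).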