Journal of Tropical Oceanography ›› 2024, Vol. 43 ›› Issue (5): 190-202. doi: 10.11978/2023172 CSTR: 32234.14.2023172

• Marine Investigation and Observation •

Rule set and multilayer perceptron based quality control method for Argo temperature data*

QI Huandong1,2, ZHU Cheng2, LI Xuchun2, JING Xindi2, SONG Derui2,3

  1. College of Information Technology (Shanghai Ocean University), Shanghai 201306, China
  2. National Marine Environmental Monitoring Center, Dalian 116023, China
  3. School of Geographical Sciences (Liaoning Normal University), Dalian 116029, China
  • Received: 2023-11-21 Revised: 2024-01-08 Online: 2024-09-10 Published: 2024-10-10
  • About the author:

    QI Huandong (born 2000), male, from Jiyuan, Henan Province; master's student working on marine data analysis.

    *The authors thank the China Argo Real-time Data Center (http://www.argo.org.cn/) for providing the data.

  • Supported by:
    National Key Research and Development Program of China (2021YFF0704000); National Key Research and Development Program of China (2022YFC3106100)

Abstract:

Ocean temperature data play a crucial role in global ocean observation and climate research, and quality control is essential to ensure the reliability of these data. However, the recall of anomalous data on large datasets remains unsatisfactory. Using Argo temperature data, this paper proposes a quality control method based on a rule set and multilayer perceptron (RS-MLP). First, thirteen machine learning models are compared to select the best-performing one. Next, a rule set consisting of six rule-based quality control check modules is designed. Finally, the RS-MLP method is constructed by integrating the rule set with the selected machine learning model, and its performance is evaluated using Argo data from the South China Sea. On a test set of 351,746 temperature records, RS-MLP achieves a true positive rate (TPR), true negative rate (TNR), and area under the receiver operating characteristic (ROC) curve (AUC) of 94%, 96%, and 95%, respectively. The recall of anomalous data is also stable across depth levels, indicating strong quality control performance.

Key words: Argo, temperature, machine learning, quality control
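
As a rough illustration of the idea summarized in the abstract, the Python sketch below combines rule-based flags with a classifier score and then computes TPR, TNR, and AUC. It is not the authors' implementation. The abstract does not list the six rule modules or say how they are integrated with the MLP, so the two checks shown here (a global range check and a spike check, loosely modelled on common Argo real-time tests), their thresholds, the OR-style combination, and the `scores` values standing in for MLP outputs are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def global_range_check(temp_c: float, lo: float = -2.5, hi: float = 40.0) -> bool:
    """Return True (anomalous) when the value lies outside an assumed valid range."""
    return not (lo <= temp_c <= hi)


def spike_check(profile: Sequence[float], i: int, threshold: float = 6.0) -> bool:
    """Return True (anomalous) when point i spikes relative to both neighbours.

    The threshold is an assumed value; end points are never flagged.
    """
    if i == 0 or i == len(profile) - 1:
        return False
    prev_t, t, next_t = profile[i - 1], profile[i], profile[i + 1]
    test_value = abs(t - (prev_t + next_t) / 2) - abs((next_t - prev_t) / 2)
    return test_value > threshold


def rule_set_flags(profile: Sequence[float]) -> List[bool]:
    """Flag a point if any rule in the (hypothetical) rule set fires."""
    return [
        global_range_check(t) or spike_check(profile, i)
        for i, t in enumerate(profile)
    ]


def combine_flags(rule_flags: Sequence[bool], scores: Sequence[float],
                  threshold: float = 0.5) -> List[bool]:
    """Assumed integration: anomalous if a rule fires OR the model score is high."""
    return [r or s >= threshold for r, s in zip(rule_flags, scores)]


def tpr_tnr(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float]:
    """True positive rate and true negative rate, with 'anomalous' as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy profile (degrees Celsius), made-up labels, and made-up model scores.
profile = [28.0, 27.5, 45.0, 26.0, 25.2, 24.9]
labels = [False, False, True, False, True, False]
scores = [0.1, 0.2, 0.4, 0.3, 0.9, 0.6]  # placeholder for MLP probabilities

rule_flags = rule_set_flags(profile)
combined = combine_flags(rule_flags, scores)
tpr, tnr = tpr_tnr(labels, combined)
model_auc = auc(labels, scores)
print(rule_flags, combined, tpr, tnr, model_auc)
```

In this toy case the rule set catches the gross outlier (45.0 °C) that the placeholder score misses, while the score catches a subtler anomaly that no rule flags. That complementarity is the motivation for pairing a rule set with a learned model, although the actual integration used in RS-MLP may differ.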