
Details

Research on a Parallel Computing Method for the Maximal Information Coefficient (最大互信息系数的并行计算方法研究)     Cited by: 3

Parallel Calculation Method for Maximum Information Coefficient

Document type: Journal article

Chinese title: 最大互信息系数的并行计算方法研究

English title: Parallel Calculation Method for Maximum Information Coefficient

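Authors: Zhu Daoheng (朱道恒)[1]; Li Zhiqiang (李志强)[1,2]

Affiliations: [1] College of Electronic and Information Engineering, Guangdong Ocean University, Zhanjiang 524088; [2] Southern Marine Science and Engineering Guangdong Laboratory, Zhanjiang 524000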

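Year: 2021

Volume: 21

Issue: 34

Pages: 14625

Journal name (Chinese): 科学技术与工程

Journal name (English): Science Technology and Engineering

Indexed in: CSTPCD; Peking University Core Journals (北大核心); Peking University Core Journals 2020 (北大核心2020)

Funding: National Natural Science Foundation of China (42176167, 41676079); Guangdong Ocean University Innovation and Strong School Project (Q18307).

Language: Chinese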

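Chinese keywords: 最大互信息系数 (maximal information coefficient); 并行计算 (parallel computing); 最大互信息系数并行计算 (parallel computing maximal information coefficient, PCMIC); Spark; 消息传递接口 (message passing interface, MPI)

English keywords: maximal information coefficient (MIC); parallel computing; the parallel computing maximal information coefficient (PCMIC); Spark; message passing interface (MPI)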

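Chinese abstract (translated): To address the high time complexity and rapidly growing run time of the maximal information coefficient (MIC) approximation algorithm on large-scale data, a parallel computing maximal information coefficient (PCMIC) method is proposed. Under the Spark and Spark-MPI (message passing interface) computing frameworks, the PCMIC algorithm is used to compute 14 typical correlation relationships in parallel at different data sizes and noise levels. In addition, two representative relationships are selected to test the performance of the PCMIC algorithm in the two frameworks with different numbers of nodes. The results show that, in both frameworks, the PCMIC algorithm retains the generality and equitability of the original MIC approximation algorithm and scales well; that as the data size and the number of nodes increase, the run time of the PCMIC algorithm in both frameworks grows markedly more slowly than that of the MIC approximation algorithm, with parallel speedup and efficiency under Spark-MPI slightly better than under Spark; and that Spark can support the scheduling of MPI tasks, which lays a theoretical and practical foundation for studying the integration of different parallel computing frameworks.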

English abstract: To address the high time complexity and the fast-growing run time of the maximal information coefficient (MIC) approximation algorithm on big data, a parallel computing maximal information coefficient (PCMIC) algorithm is proposed. A total of fourteen typical correlations were computed in parallel using the PCMIC algorithm at different data sizes and noise levels under the Spark and Spark-MPI (message passing interface) computing frameworks, respectively. In addition, two representative correlations were chosen to test the performance of the PCMIC algorithm under the two computing frameworks with different numbers of nodes. The results show the following. First, the PCMIC algorithm is as general and equitable in both frameworks as the original MIC approximation algorithm, and it scales well. Second, as the data size and the number of nodes increase, run time grows significantly more slowly for the PCMIC algorithm in both frameworks than for the MIC approximation algorithm. Moreover, the Spark-MPI framework slightly outperforms Spark in parallel speedup ratio and efficiency. Lastly, Spark is capable of supporting the scheduling of MPI tasks, which lays a theoretical and practical foundation for studying the convergence of different parallel computing frameworks.
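
The abstract compares the two frameworks by parallel speedup ratio and efficiency. As a minimal sketch, assuming the standard definitions (speedup S = T1 / Tp and efficiency E = S / p for p nodes; the record itself does not spell them out), the two metrics can be computed as below. The function names and the timings are hypothetical and are not taken from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Parallel speedup S = T1 / Tp (standard definition, assumed here)."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, nodes: int) -> float:
    """Parallel efficiency E = S / p, where p is the number of nodes."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return speedup(t_serial, t_parallel) / nodes


# Hypothetical run times in seconds, keyed by node count.
# These are illustrative values only, not results from the paper.
timings = {1: 120.0, 2: 64.0, 4: 36.0, 8: 24.0}

t1 = timings[1]
for nodes, tp in sorted(timings.items()):
    s = speedup(t1, tp)
    e = efficiency(t1, tp, nodes)
    print(f"nodes={nodes}  speedup={s:.2f}  efficiency={e:.2f}")
```

An efficiency close to 1 means the added nodes are being used almost fully; values that fall as the node count grows usually indicate communication or scheduling overhead.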
