OSError: (External) Nccl error, unhandled cuda error (at /paddle/paddle/fluid/platform/collective_

Reader submission, 2022-08-26

I recently hit the following error while running a multi-GPU program with Paddle:

Traceback (most recent call last):
  File "train_pairwise.py", line 238, in <module>
    do_train()
  File "train_pairwise.py", line 116, in do_train
    paddle.distributed.init_parallel_env()
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/distributed/parallel.py", line 196, in init_parallel_env
    parallel_helper._init_parallel_ctx()
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/dygraph/parallel_helper.py", line 42, in _init_parallel_ctx
    __parallel_ctx__clz__.init()
OSError: (External) Nccl error, unhandled cuda error (at /paddle/paddle/fluid/platform/collective_helper.cc:100)
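The message "unhandled cuda error" says little on its own. One way to get more detail is to rerun the job with NCCL logging enabled. The sketch below assumes only the standard NCCL environment variable NCCL_DEBUG; the helper name nccl_debug_env is hypothetical and not part of Paddle.

```python
import os


def nccl_debug_env(base_env=None):
    """Return a copy of the environment with NCCL logging switched on.

    nccl_debug_env is a hypothetical helper for illustration only.
    """
    env = dict(os.environ if base_env is None else base_env)
    # NCCL_DEBUG=INFO makes NCCL print its version and initialization steps,
    # which helps show which libnccl was loaded and where setup failed.
    env["NCCL_DEBUG"] = "INFO"
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when relaunching the training script, for example through the paddle.distributed.launch module.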

Solution

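My setup is CUDA 10.2 with Paddle 2.1.3. The fix was to install the NCCL packages built for that CUDA version, locate the installed library, symlink it, and add the symlink's directory to LD_LIBRARY_PATH: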

apt-get install libnccl2=2.5.6-1+cuda10.2 libnccl-dev=2.5.6-1+cuda10.2
find / -name "libnccl.so*"
ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2.5.6 /usr/local/bin/libnccl.so
export LD_LIBRARY_PATH=/usr/local/bin/:$LD_LIBRARY_PATH
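After running the commands above, it can help to confirm that the library is where you expect and that its directory is really on LD_LIBRARY_PATH. The following is a minimal sketch of such a check. The function names find_nccl_libs and on_library_path are hypothetical, and the directories are the ones used in the commands above; adjust them to your system.

```python
import glob
import os


def find_nccl_libs(search_dirs):
    """Return sorted paths of files named libnccl.so* in the given directories."""
    found = []
    for directory in search_dirs:
        found.extend(glob.glob(os.path.join(directory, "libnccl.so*")))
    return sorted(found)


def on_library_path(directory, ld_library_path):
    """Return True if directory is one of the entries of an LD_LIBRARY_PATH string."""
    target = os.path.normpath(directory)
    entries = [entry for entry in ld_library_path.split(":") if entry]
    return any(os.path.normpath(entry) == target for entry in entries)


# Directories taken from the commands above (assumed layout of a Debian/Ubuntu system).
libs = find_nccl_libs(["/usr/lib/x86_64-linux-gnu", "/usr/local/bin"])
print("libnccl files found:", libs)
print("/usr/local/bin on LD_LIBRARY_PATH:",
      on_library_path("/usr/local/bin", os.environ.get("LD_LIBRARY_PATH", "")))
```

An empty list from find_nccl_libs suggests the apt-get step did not install the library into those directories, in which case the output of the find command shows where it actually landed.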

References

[1] OSError: (External) Nccl error, unhandled cuda error (at /paddle/paddle/fluid/platform/collective_helper.cc:100). https://issueexplorer.com/issue/PaddlePaddle/PaddleDetection/4139
