[해결됨] WSL2 CUDA undefined symbol: devicesetgpcclkvfoffset 문제 해결하기

[추가] 현재는 Fix된 이슈임

Windows 11 Insider Preview Build 22000.51이 나온 뒤에는 해결된 문제입니다. 아래 환경에서 테스트하였으니 apt 패키지와 드라이버를 업데이트 해보시기 바랍니다.

  • OS: Windows 11 Insider Preview Build 22000.51
  • Driver: NVIDIA 470.76
  • APT Package Version List
Inst libnvidia-container1 (1.4.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
 Inst libnvidia-container-tools (1.4.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
 Inst nvidia-container-toolkit (1.5.1-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
 Inst nvidia-container-runtime (3.5.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
 Inst nvidia-docker2 (2.6.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [all])
 Conf libnvidia-container1 (1.4.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
 Conf libnvidia-container-tools (1.4.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
 Conf nvidia-container-toolkit (1.5.1-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
 Conf nvidia-container-runtime (3.5.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
 Conf nvidia-docker2 (2.6.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [all])

Environment: Windows 10 Insider Preview Build 21376.1
WSL2 Ubuntu release: 20.04
NVIDIA Forum Links: https://forums.developer.nvidia.com/t/nvidia-container-cli-error/177403/2?u=ji5489

문제 제기

if I run sudo nvidia-container-cli -k -d /dev/tty info

I got this error
where is ‘devicesetgpcclkvfoffset’ symbol?

– WARNING, the following logs are for debugging purposes only –
I0509 06:20:58.858216 1009 nvc.c:372] initializing library context (version=1.4.0, build=704a698b7a0ceec07a48e56c37365c741718c2df)
I0509 06:20:58.858281 1009 nvc.c:346] using root /
I0509 06:20:58.858287 1009 nvc.c:347] using ldcache /etc/ld.so.cache
I0509 06:20:58.858293 1009 nvc.c:348] using unprivileged user 65534:65534
I0509 06:20:58.858311 1009 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0509 06:20:58.891385 1009 dxcore.c:226] Creating a new WDDM Adapter for hAdapter:40000000 luid:1e946dc
I0509 06:20:58.917427 1009 dxcore.c:267] Adding new adapter via dxcore hAdapter:40000000 luid:1e946dc wddm version:3000
I0509 06:20:58.917515 1009 dxcore.c:325] dxcore layer initialized successfully
W0509 06:20:58.918854 1009 nvc.c:397] skipping kernel modules load on WSL
I0509 06:20:58.919404 1010 driver.c:101] starting driver service
E0509 06:20:58.950931 1010 driver.c:168] could not start driver service: load library failed: /usr/lib/wsl/drivers/nv_dispi.inf_amd64_43efafcd74b2efc9/libnvidia-ml.so.1: undefined symbol: devicesetgpcclkvfoffset
I0509 06:20:58.951399 1009 driver.c:203] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request

해결방법

I’m currently installing WSL2 Ubuntu 20.04 within Windows 10 Insider Preview Build 21376.1, and faced same issue like below.

# sudo nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I0512 07:05:58.485519 5592 nvc.c:372] initializing library context (version=1.4.0, build=704a698b7a0ceec07a48e56c37365c741718c2df)
I0512 07:05:58.485581 5592 nvc.c:346] using root /
I0512 07:05:58.485601 5592 nvc.c:347] using ldcache /etc/ld.so.cache
I0512 07:05:58.485604 5592 nvc.c:348] using unprivileged user 65534:65534
I0512 07:05:58.485659 5592 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0512 07:05:58.502183 5592 dxcore.c:226] Creating a new WDDM Adapter for hAdapter:40000000 luid:185673b
I0512 07:05:58.512924 5592 dxcore.c:267] Adding new adapter via dxcore hAdapter:40000000 luid:185673b wddm version:3000
I0512 07:05:58.512956 5592 dxcore.c:325] dxcore layer initialized successfully
W0512 07:05:58.513266 5592 nvc.c:397] skipping kernel modules load on WSL
I0512 07:05:58.513404 5593 driver.c:101] starting driver service
E0512 07:05:58.521931 5593 driver.c:168] could not start driver service: load library failed: /usr/lib/wsl/drivers/nv_dispi.inf_amd64_43efafcd74b2efc9/libnvidia-ml.so.1: undefined symbol: devicesetgpcclkvfoffset
I0512 07:05:58.522047 5592 driver.c:203] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request

and I found out that apt-get experimental repository and stable one is mixed out – failing to install WSL2 one (which is available in experimental repository). as stable package is released, experimental(=WSL2) package should be released, but It wasn’t. see apt-cache madison result below.

# apt-cache madison libnvidia-container1
libnvidia-container1 |    1.4.0-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.3.3-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 | 1.3.3~rc.2-1 | https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/amd64  Packages
libnvidia-container1 | 1.3.3~rc.1-1 | https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.3.2-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.3.1-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages

The point is, we should install libnvidia-container1=1.3.3~rc.2-1 version rather than libnvidia-container1=1.4.0-1. but normal apt-get install nvidia-docker2 command would install 1.4.0-1 version, and It mostly like to fail in WSL2 environment.

I successfully installed older (WSL2-exclusive) packages via apt-get install command below:

apt-get install \
    libnvidia-container1=1.3.3~rc.2-1 \
    libnvidia-container-tools=1.3.3~rc.2-1 \
    nvidia-container-toolkit=1.4.1-1 \
    nvidia-container-runtime=3.4.1-1 \
    nvidia-docker2=2.5.0-1

Those commands will install older packages, and after sudo service docker stop and sudo service docker start, CUDA will work inside docker.

“[해결됨] WSL2 CUDA undefined symbol: devicesetgpcclkvfoffset 문제 해결하기”에 대한 한개의 댓글

답글 남기기

이메일 주소는 공개되지 않습니다. 필수 필드는 *로 표시됩니다