Categories
Machine Learning

Fix VLLM LMDeploy /usr/bin/ld: cannot find -lcuda: No such file or directory

When running LMDeploy, you may get this error:

2025-06-23 10:43:25,185 - lmdeploy - ERROR - base.py:53 - CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpsne1hded/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmpsne1hded/__triton_launcher.cpython-38-x86_64-linux-gnu.so', '-lcuda', '-L/home/dev/miniforge3/envs/lmdeploy/lib/python3.8/site-packages/triton/backends/nvidia/lib', '-L/lib/x86_64-linux-gnu', '-I/home/dev/miniforge3/envs/lmdeploy/lib/python3.8/site-packages/triton/backends/nvidia/include', '-I/tmp/tmpsne1hded', '-I/home/dev/miniforge3/envs/lmdeploy/include/python3.8']' returned non-zero exit status 1.
2025-06-23 10:43:25,185 - lmdeploy - ERROR - base.py:54 - <Triton> check failed!
Please ensure that your device is functioning properly with <Triton>.
You can verify your environment by running `python -m lmdeploy.pytorch.check_env.triton_custom_add`.

Running python -m lmdeploy.pytorch.check_env.triton_custom_add shows the underlying linker error:

❯ python -m lmdeploy.pytorch.check_env.triton_custom_add
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
Traceback (most recent call last):
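The linker resolves -lcuda by looking for a file named libcuda.so in the -L directories and its default search paths, so a quick way to confirm the diagnosis is to check which directories actually contain that file. The sketch below is only an illustration: dirs_containing is a hypothetical helper, and apart from /lib/x86_64-linux-gnu (taken from the -L flag in the failing command) the directories are assumed common locations that may differ on your machine.

```shell
# Hypothetical helper: print every directory from the argument list
# that contains a file with the given name.
dirs_containing() {
  name="$1"
  shift
  for dir in "$@"; do
    if [ -e "$dir/$name" ]; then
      printf '%s\n' "$dir"
    fi
  done
}

# /lib/x86_64-linux-gnu comes from the failing gcc command;
# the other paths are assumptions about typical installs.
dirs_containing libcuda.so \
  /lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu /usr/lib64 /usr/local/cuda/lib64
```

If nothing is printed, none of those directories has an unversioned libcuda.so, which matches the "cannot find -lcuda" message.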

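To solve it, create a symbolic link to the libcuda.so stub that ships with the CUDA toolkit (adjust the CUDA version in the path to match your installation):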

sudo ln -s /usr/local/cuda-12.2/targets/x86_64-linux/lib/stubs/libcuda.so /usr/lib64/libcuda.so
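The cuda-12.2 part of the path depends on which toolkit version is installed. As a rough sketch, assuming the toolkit lives somewhere under /usr/local, the stub can be located first and the ln command printed for review. find_libcuda_stub is a hypothetical helper name, not part of CUDA or LMDeploy.

```shell
# Hypothetical helper: print the first libcuda.so stub found under a root
# directory (defaults to /usr/local, an assumption about the install location).
find_libcuda_stub() {
  root="${1:-/usr/local}"
  find "$root" -path '*/lib/stubs/libcuda.so' 2>/dev/null | sort | head -n 1
}

stub="$(find_libcuda_stub /usr/local)"
if [ -n "$stub" ]; then
  # Only print the command so it can be checked before running it.
  echo "sudo ln -s $stub /usr/lib64/libcuda.so"
else
  echo "no libcuda.so stub found under /usr/local"
fi
```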

Then the check passes:

❯ python -m lmdeploy.pytorch.check_env.triton_custom_add
Done.
Categories
Devops

Fix Google COS GPU Docker unable to create new device filters program

If you got this error, congratulations, the solution is here. The error message itself looks quite complicated:

nvidia-container-cli: mount error: failed to add device rules: unable to generate new device filter program from existing programs: unable to create new device filters program: load program: invalid argument: last insn is not an exit or jmp processed 0 insns (limit 1000000)

It turns out the solution is just to run this, either in your metadata startup script or inside the Google Container-Optimized OS VM:

sysctl -w net.core.bpf_jit_harden=1 
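Before changing anything, it can help to see what the setting currently is. The sketch below reads the value from procfs, where net.core.bpf_jit_harden is exposed as /proc/sys/net/core/bpf_jit_harden. The function name bpf_jit_harden_needs_fix is hypothetical, and the optional file argument exists only so the check can be tried against a test file.

```shell
# Hypothetical helper: exit 0 when the setting holds a value other than 1,
# exit 1 when it is already 1, exit 2 when the file cannot be read.
bpf_jit_harden_needs_fix() {
  file="${1:-/proc/sys/net/core/bpf_jit_harden}"
  [ -r "$file" ] || return 2
  value="$(cat "$file")"
  [ "$value" != "1" ]
}

if bpf_jit_harden_needs_fix; then
  echo "net.core.bpf_jit_harden is not 1; run: sysctl -w net.core.bpf_jit_harden=1"
fi
```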

If you want more, write the setting to a sysctl config file, reload it, and restart Docker:

bash -c "echo net.core.bpf_jit_harden=1 > /etc/sysctl.d/91-nvidia-docker.conf"
sysctl --system
systemctl restart docker
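In a startup script that runs on every boot, it may be nicer to rewrite the file and restart Docker only when something actually changed. This is a minimal sketch of that idea, not a tested COS recipe: ensure_sysctl_conf is a hypothetical helper, and the commented usage reuses the file path and commands shown above.

```shell
# Hypothetical helper: write the setting to a conf file only when its content
# differs. Prints "written" or "unchanged" so the caller can decide to reload.
ensure_sysctl_conf() {
  conf="$1"
  line="net.core.bpf_jit_harden=1"
  if [ -f "$conf" ] && [ "$(cat "$conf")" = "$line" ]; then
    echo unchanged
  else
    printf '%s\n' "$line" > "$conf"
    echo written
  fi
}

# Example usage as root:
# if [ "$(ensure_sysctl_conf /etc/sysctl.d/91-nvidia-docker.conf)" = written ]; then
#   sysctl --system
#   systemctl restart docker
# fi
```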