Categories: LLM

Finetuning on multiple GPU

Here are several ways to run finetuning on multiple GPUs. In my case, I have two RTX 4090 cards doing the training. First, you can use the accelerate module.

For example, I’m using https://github.com/tloen/alpaca-lora/

accelerate config
accelerate launch finetune.py

Or export this variable, either in the terminal or in a Python/ipynb file. If you have 4 GPUs:

export CUDA_VISIBLE_DEVICES=0,1,2,3

We can also select the GPU(s) from inside the Python code:

import os
import torch

# Expose only GPU 7 (as an example) to this process, before CUDA is initialized
gpu_list = [7]
gpu_list_str = ','.join(map(str, gpu_list))
os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_list_str)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
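
For a rough picture of what accelerate does inside the training script, here is a minimal sketch using the Accelerator API (the model, optimizer and data below are placeholders, not the actual alpaca-lora finetune.py):

# Minimal multi-GPU training sketch with HuggingFace Accelerate.
# `accelerate launch` starts one process per GPU; Accelerator() wires up the rest.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(128, 2)                      # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = torch.utils.data.TensorDataset(torch.randn(1024, 128), torch.randint(0, 2, (1024,)))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)

# prepare() moves everything to the right device and shards the dataloader per process
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)                       # use this instead of loss.backward()
    optimizer.step()
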
Categories: LLM

Solve Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

When finetuning a model, you may encounter this error message reporting CUDA Out of Memory (OOM) with the detail:

RuntimeError: CUDA error: out of memory; Compile with TORCH_USE_CUDA_DSA to enable device-side assertions

In my case, I had upgraded the NVIDIA driver from version 525 to 530, and that caused the problem.

So, the solution is to downgrade my NVIDIA driver back to version 525 and use the latest Transformers and Torch installation as in https://www.yodiw.com/install-transformers-pytorch-tensorflow-ubuntu-2023/
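
To confirm which driver version is currently active before or after the downgrade, you can query nvidia-smi, for example from Python (a small check; it only assumes nvidia-smi is on the PATH):

# Print the active NVIDIA driver version reported by nvidia-smi
import subprocess

result = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
print("Driver version:", result.stdout.strip())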

Categories: LLM

Fix AttributeError: ‘DatasetDict’ object has no attribute ‘to’ in Huggingface Dataset

If you load a dataset and want to export it with “to_json” or “to_csv”, or access “features”, and you get this error:

AttributeError: 'DatasetDict' object has no attribute 'to'

The first step is to check the structure of the dataset by printing it.
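
For example, assuming the dataset was loaded with load_dataset (the dataset name below is only a placeholder for whichever dataset you use):

# Load the dataset and print its structure; "your/dataset-name" is a placeholder
from datasets import load_dataset

ds = load_dataset("your/dataset-name")
print(ds)

The output looks like this: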

DatasetDict({
    train: Dataset({
        features: ['id', 'category', 'instruction', 'context', 'response'],
        num_rows: 15015
    })
})

In this case, the DatasetDict has only a train split (usually it also has test and others). Exporting it to JSON will be:

ds['train'].to_json('train.json')

Yes, you need to access the split key from the DatasetDict first.
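
If the DatasetDict has more than one split, a small loop handles them all (a sketch; the file names are arbitrary):

# Export every split in the DatasetDict to JSON and CSV files named after the split
for split_name, split_ds in ds.items():
    split_ds.to_json(f"{split_name}.json")
    split_ds.to_csv(f"{split_name}.csv")
    print(split_name, split_ds.features)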

Categories: LLM

Force running inference / finetuning on a specific CUDA device or GPU

The quick way to make model inference or fine-tuning run on a specific NVIDIA GPU card is to define this variable before executing the script.

For instance, I forced it to run on GPU 1 with:

export CUDA_VISIBLE_DEVICES=1
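
The same thing can be done from inside a Python script, as long as the variable is set before CUDA is initialized (a small sketch):

# Restrict this process to physical GPU 1; must run before CUDA is initialized
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
# cuda:0 now maps to physical GPU 1
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
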
Categories: LLM

What are Passage Retrieval Methods

Passage retrieval methods refer to techniques and algorithms used to retrieve relevant passages or segments of text from a larger document or corpus. These methods are commonly employed in information retrieval systems and question-answering systems, where the goal is to locate specific information within a large amount of text.
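
A common baseline is sparse lexical retrieval such as TF-IDF or BM25, which ranks passages by term overlap with the query; dense retrieval instead embeds the query and passages as vectors and ranks by similarity. Here is a minimal TF-IDF sketch using scikit-learn with toy passages (not tied to any specific system):

# Toy TF-IDF passage retrieval: rank passages by cosine similarity to the query
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Python is a popular programming language for machine learning.",
    "Passage retrieval finds text segments relevant to a user query.",
]
query = "Which city is the Eiffel Tower in?"

vectorizer = TfidfVectorizer()
passage_vectors = vectorizer.fit_transform(passages)
query_vector = vectorizer.transform([query])

scores = cosine_similarity(query_vector, passage_vectors)[0]
for idx in scores.argsort()[::-1]:  # best match first
    print(f"{scores[idx]:.3f}  {passages[idx]}")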

Categories: Machine Learning

How to uninstall CUDA and replace it with a new version

The quick fix for uninstalling the current CUDA installation in Ubuntu (when it was not installed via software packages) is to use the uninstaller. For instance, I use CUDA 11.8 and need to downgrade it to 11.6.

So, I need to find the path and run this command:

sudo /usr/local/cuda-11.8/bin/cuda-uninstaller

Last, we can clean up the entire CUDA folder:

sudo rm -rf /usr/local/cuda

If you have an issue with GCC for the installed CUDA and need to downgrade or upgrade it, you can follow this:

MAX_GCC_VERSION=11
sudo apt install gcc-$MAX_GCC_VERSION g++-$MAX_GCC_VERSION
sudo ln -s /usr/bin/gcc-$MAX_GCC_VERSION /usr/local/cuda/bin/gcc 
sudo ln -s /usr/bin/g++-$MAX_GCC_VERSION /usr/local/cuda/bin/g++
Categories: LLM

Fix CUDA error: no kernel image is available for execution on the device

When generating question-answer pairs from datasets using the project https://github.com/dmis-lab/LIQUID, I got this error:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I believe this error happens because my GPU is an RTX 4090, and Ada Lovelace is not supported by torch 1.12. To solve this, I upgraded torch to the next version, 1.13:

pip install torch==1.13.0

For reference, this is the warning the old installation showed:

NVIDIA GeForce RTX 4090 with CUDA capability sm_89 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 4090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

And it works!
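
To confirm that a given PyTorch build actually ships kernels for your GPU, you can compare the card's compute capability with the architectures the build was compiled for (a quick generic check, not specific to LIQUID):

# Compare the GPU compute capability with the architectures PyTorch was compiled for
import torch

print("CUDA available:", torch.cuda.is_available())
print("Device:", torch.cuda.get_device_name(0))
print("Compute capability:", torch.cuda.get_device_capability(0))  # RTX 4090 reports (8, 9), i.e. sm_89
print("Compiled for:", torch.cuda.get_arch_list())  # should list sm_89 or a compatible architecture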

Categories: LLM

Fix RuntimeError: unscale_() has already been called on this optimizer since the last update().

When finetuning a model using LoRA and the HuggingFace transformers library, I received this error:

RuntimeError: unscale_() has already been called on this optimizer since the last update().

This error happens because of the latest transformers version, transformers-4.31.0.dev0. The solution is to revert back to transformers 4.30.2 with:

pip install transformers==4.30.2
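
If you want the training script to fail fast when an incompatible transformers version sneaks back in, a small runtime guard can help (a sketch; the version bound mirrors the one above):

# Fail early if the installed transformers version is the one known to break unscale_()
import transformers
from packaging import version

if version.parse(transformers.__version__) >= version.parse("4.31.0.dev0"):
    raise RuntimeError(
        f"transformers {transformers.__version__} hits the unscale_() error; "
        "pin transformers==4.30.2"
    )
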
Categories: Machine Learning

Solve unsupported GNU version! gcc versions later than 11 are not supported!

When installing a Python module like AutoGPTQ, you may get this error:

/usr/local/cuda/include/crt/host_config.h:132:2: error: #error -- unsupported GNU version! gcc versions later than 11 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
        132 | #error -- unsupported GNU version! gcc versions later than 11 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
            |  ^~~~~
      error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1
      [end of output]

To solve this, we need to install GCC at the maximum supported version and point CUDA to it:

MAX_GCC_VERSION=11

sudo apt install gcc-$MAX_GCC_VERSION g++-$MAX_GCC_VERSION
sudo ln -s /usr/bin/gcc-$MAX_GCC_VERSION /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-$MAX_GCC_VERSION /usr/local/cuda/bin/g++
Categories: Deep Learning

Increase Fan Speed Nvidia RTX 4090 to reduce high Temperature

An RTX 4090 at full load during machine learning training can produce a lot of heat; it can reach 80-85 degrees Celsius. Using big industrial fans to cool the GPU and opening the PC case can reduce it to around 70 degrees Celsius.

However, before going down that path, you can raise your NVIDIA GPU fan speed from 30% to 90% or even 100%. Here are the steps to do it in Ubuntu.

First, you need to configure X11:

sudo vim /etc/X11/xorg.conf

Add Option "Coolbits" "4" in the NVIDIA Device section:

Section "Device"
     Identifier      "Device0"
     Driver          "nvidia"
     VendorName      "NVIDIA"
     Option          "Coolbits" "4"
EndSection

Reboot your PC to apply the new changes

The second step is to adjust the fan speed. I usually use Psensor to monitor the fan speed. The RTX 4090 has two fans, so you need to tune both of them.