Categories
Ubuntu

Install CUDA 11 on Ubuntu 23.10

To solve a driver or CUDA 11 installation error on Ubuntu 23.10, make sure the installer uses a compatible GCC version. By default, Ubuntu 23.10 ships GCC 13, which does not work when compiling CUDA or the NVIDIA drivers (GCC 10 is required). Installing CUDA 11 matters because TensorFlow has not yet fully adapted to CUDA 12. Without a compatible GCC, the installer fails with:

Failed to verify gcc version. See log at /var/log/cuda-installer.log for details.

The first step is to uninstall any previous NVIDIA and CUDA installations:

sudo apt autoremove cuda* nvidia* --purge

Next, install GCC 10

MAX_GCC_VERSION=10
sudo apt install gcc-$MAX_GCC_VERSION g++-$MAX_GCC_VERSION
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-$MAX_GCC_VERSION $MAX_GCC_VERSION

Next, set GCC 10 as the default by running this command:

sudo update-alternatives --config gcc

Now you are all set! You can start the CUDA 11 installation on Ubuntu 23.10. Make sure to un-check the driver installation option (we will install the driver later):

sudo ./cuda_11.8.0_520.61.05_linux.run

Next, since I use Ubuntu's default NVIDIA driver packages, I revert GCC back to version 13 using the same command, and then install the driver:

sudo update-alternatives --config gcc
sudo apt install nvidia-driver-525

You can repeat the process for cuDNN, TensorRT and the other components by following my previous article here.

Finally, if anything breaks with NVCC, remember to switch the GCC version back to 10, not 13.
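
As a quick sanity check (my own addition, assuming TensorFlow is already installed), you can confirm that TensorFlow sees the GPU once CUDA 11 and the driver are in place:

import tensorflow as tf

# An empty list here means TensorFlow cannot see the GPU / CUDA setup
print(tf.config.list_physical_devices('GPU'))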

Categories
Ubuntu

Fix VSCode Opening Large Files by Increasing Memory

When opening a Netflix dataset of around 1 GB, VSCode crashed. My memory usage was only around 30%, so there was plenty of room to open this 1 GB file.

To fix this, either run VSCode from the terminal with a higher memory limit:

code --max-memory=12288mb

Or right-click the application icon in Ubuntu and replace the launcher command with this:

code  --max-memory=12288mb --unity-launch %F
Categories
ML

Fix Failed to load implementation from:dev.ludovic.netlib.blas.VectorBLAS

If you get the error “Failed to load implementation from:dev.ludovic.netlib.blas.VectorBLAS” when running ALS training, the quick fix for Intel MKL is:

sudo ln -s /opt/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_rt.so /usr/local/lib/libblas.so.3
sudo ln -s /opt/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_rt.so /usr/local/lib/liblapack.so.3

Or follow this: https://spark.apache.org/docs/latest/ml-linalg-guide.html
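
Either way, you can verify the fix with a minimal ALS run; if native BLAS loads correctly, the warning no longer appears in the driver log. This is just a sketch, assuming PySpark is installed and using toy data of my own:

from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

# Tiny toy ratings, just enough to trigger the BLAS code path in ALS
spark = SparkSession.builder.appName("als-blas-check").getOrCreate()
ratings = spark.createDataFrame(
    [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 5.0)],
    ["userId", "movieId", "rating"],
)
model = ALS(userCol="userId", itemCol="movieId", ratingCol="rating", rank=4).fit(ratings)
spark.stop()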

Categories
Tensorflow

Solve “No builder could be found in the directory” in TensorFlow Datasets

When downloading the MovieLens dataset from TensorFlow Datasets, I got this error:

No builder could be found in the directory tensorflow dataset

The quick solution is to upgrade the nightly build of TensorFlow Datasets:

pip install --upgrade tfds-nightly
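
After upgrading, the download should work again. A minimal sketch of re-trying it (the 100k-ratings config name is my assumption; adjust it to the variant you need):

import tensorflow_datasets as tfds

# Re-download the dataset; the builder should now be found
ds = tfds.load('movielens/100k-ratings', split='train')
print(next(iter(ds.take(1))))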
Categories
Machine Learning

Create Custom Metric Accuracy for Trainer Classification

Here is a simple custom metric to measure accuracy, which you can attach to the Trainer through its compute_metrics parameter:

from transformers import Trainer
from evaluate import load

metric = load('accuracy')

def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    # metric.compute already returns a dict like {'accuracy': 0.93}
    return metric.compute(predictions=preds, references=labels)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation'],
    data_collator=data_collator,
    compute_metrics=compute_metrics,  # <----- HERE
    tokenizer=tokenizer
)

trainer.train()
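
Optionally, after training you can confirm the metric shows up in the evaluation output, where Trainer prefixes it with eval_:

metrics = trainer.evaluate()
print(metrics['eval_accuracy'])  # the custom metric, prefixed with eval_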
Categories
Ubuntu

Install WordPress in Ubuntu 22.04 LightSail AWS

Here are quick technical steps to set up a brand new WordPress website using NGINX, PHP-FPM, Let’s Encrypt and MariaDB. After following these steps, you will have a website running on a very cheap AWS Lightsail instance for around $5 / month.

Let’s follow these 7 steps; it takes less than 5 minutes!

  1. Go to AWS Lightsail and launch a new instance.
    Go to https://lightsail.aws.amazon.com/ls/webapp/home/instances and launch a new instance. Create a static IP address and attach it to this instance. Then point your domain’s DNS records to the new IP address, for both www and non-www.

    Then go to Networking and open port “443” to ensure HTTPS is allowed through the firewall.
Categories
Networking

Run Chrome on different network interfaces in Ubuntu

If you have two different internet providers connected to a single PC, most of the time you will want to split the usage between the different network interfaces / adapters.

For instance, you may want one browser for browsing and another for downloading / uploading. There are several solutions for binding applications to specific ethernet devices, such as bind-address, firejail and others, but they may not work easily in Ubuntu.

A quick solution is to leverage the open-source project https://github.com/JsBergbau/BindToInterface

First, clone the project

git clone https://github.com/JsBergbau/BindToInterface

Second, compile it (make sure gcc is already installed):

gcc -nostartfiles -fpic -shared bindToInterface.c -o bindToInterface.so -ldl -D_GNU_SOURCE

Third, get your network interface information with sudo ifconfig (install it if needed) and note the interface names. In my case, I have two: enpf0 and enpf1.

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17xxxx  netmask 255.255.0.0  broadcast 172.17.255.255xxx...

enpf0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.xx.xx  netmask 255.255.255.0  broadcast ...
        ....

enpf1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.xxxx  netmask 255.255.255.0  broadcast ...
        ....

Now, here is a quick way to run Google Chrome or Firefox on a specific network interface with public internet access.

Go inside the GitHub project directory (or copy the .so file wherever you like) and run this command, replacing enpf1 with your own network interface:

BIND_INTERFACE=enpf1 LD_PRELOAD=./bindToInterface.so /usr/bin/google-chrome-stable 

Now you have successfully opened google-chrome bound to a specific network interface.

Categories
Networking

Solve WordPress NGINX (13: Permission denied)

Quick steps to solve this error from WordPress and NGINX:

crit *2 stat() index.php" failed (13: Permission denied) wordpress

Make sure the hosted folder has execute permission:

chmod +x /path/website

If the problem still exists, try adding www-data to your user’s group (ubuntu in my case):

gpasswd -a www-data ubuntu

If the page is not found, try restarting your php-fpm service:

sudo service php8.1-fpm stop
sudo service php8.1-fpm start
Categories
LLM

Finetuning on multiple GPUs

Here are several ways to run fine-tuning across multiple GPUs. In my case, I have two RTX 4090s doing the training. First, you can use the accelerate module.

For example, I’m using https://github.com/tloen/alpaca-lora/:

accelerate config
accelerate launch finetune.py

Alternatively, export this variable either in the terminal or from a Python/ipynb file. If you have 4 GPUs:

export CUDA_VISIBLE_DEVICES=0,1,2,3

We can also set it with a small piece of code:

import os
import torch

gpu_list = [7]
gpu_list_str = ','.join(map(str, gpu_list))
os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_list_str)  # set before any CUDA call
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
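
As a small extra check (my own addition), you can confirm which GPUs PyTorch actually sees after setting CUDA_VISIBLE_DEVICES:

import torch

print(torch.cuda.device_count())  # number of visible GPUs
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))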
Categories
LLM

Solve Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

When fine-tuning a model, you may encounter an error message reporting CUDA Out of Memory (OOM) with this detail:

RuntimeError: CUDA error: out of memory; Compile with TORCH_USE_CUDA_DSA to enable device-side assertions

In my case, I had upgraded the NVIDIA driver from version 525 to 530, which caused this problem.

So the solution was to downgrade my NVIDIA driver back to version 525 and use the latest Transformers and Torch installation, as described in https://www.yodiw.com/install-transformers-pytorch-tensorflow-ubuntu-2023/
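
As a quick sanity check after the downgrade (my own addition, assuming torch is already installed), confirm the CUDA build PyTorch ships with and that the driver is usable again:

import torch

print(torch.__version__)          # PyTorch version
print(torch.version.cuda)         # CUDA version PyTorch was built against
print(torch.cuda.is_available())  # False usually means a driver/toolkit mismatch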