Run GPT-OSS 20B on vLLM with an RTX 4090

Here is a quick way to run OpenAI's gpt-oss-20b on an RTX 4090 GPU with vLLM:

docker run --name vllm --gpus all \
    -v /YOUR_PATH_TO_MODEL/models--gpt-oss-20b:/model \
    -e VLLM_ATTENTION_BACKEND='TRITON_ATTN_VLLM_V1' \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:gptoss \
    --model /model --served-model-name model
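
Once the container is up, you can sanity-check the OpenAI-compatible endpoint with a quick request. This is a minimal sketch assuming the defaults from the command above (port 8000 and served model name "model"); adjust if you changed them:

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "model", "messages": [{"role": "user", "content": "Say hello"}]}'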

You can download the model weights with the Hugging Face CLI:

hf download openai/gpt-oss-20b --local-dir ./
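
If you want the downloaded files to line up with the volume mount in the docker run command above, you can point --local-dir at that same directory instead (a sketch reusing the placeholder path; replace it with your actual location):

hf download openai/gpt-oss-20b --local-dir /YOUR_PATH_TO_MODEL/models--gpt-oss-20b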
