Here is a quick way to run OpenAI GPT OSS 20B in RTX 4090 GPU
docker run --name vllm --gpus all -v /YOUR_PATH_TO_MODEL/models--gpt-oss-20b:/model -e VLLM_ATTENTION_BACKEND='TRITON_ATTN_VLLM_V1' \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:gptoss \
--model /model --served-model-name model
You can download the model with
hf download openai/gpt-oss-20b --local-dir ./