Renting a server
My local GPU is too weak to run the model, so I rented a cloud server on the 优云智算 platform for deployment.
I chose a 40-series GPU with pay-as-you-go billing.
Ubuntu version: 22.04
CUDA version: 12.8
PyTorch version: 2.8.0 (the image default; the steps below install torch 2.6.0 to match the vLLM wheel)
Python version: 3.12

DeepSeek-OCR
Source code: https://github.com/deepseek-ai/DeepSeek-OCR
Model: deepseek-ai/DeepSeek-OCR (on ModelScope)
Deployment steps
1. Clone the project
git clone https://github.com/deepseek-ai/DeepSeek-OCR.git
2. Create a conda virtual environment
conda create -n deepseek-ocr python=3.12.9 -y
conda activate deepseek-ocr
3. Install dependencies
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118
pip install vllm-0.8.5+cu118-cp38-abi3-manylinux1_x86_64.whl
pip install -r requirements.txt
pip install flash-attn==2.7.3 --no-build-isolation
Installing flash-attn online is too slow, so download the prebuilt wheel manually instead: (link)
Select the version: flash_attn-2.7.3+cu12torch2.6cxx11abiFALSE-cp312-cp312-linux_x86_64.whl
Installing vLLM
Select the version: vllm-0.8.5+cu118-cp38-abi3-manylinux1_x86_64.whl
Upload both wheel files to the root directory (/root) and install them:
pip install /root/vllm-0.8.5+cu118-cp38-abi3-manylinux1_x86_64.whl
pip install /root/flash_attn-2.7.3+cu12torch2.6cxx11abiFALSE-cp312-cp312-linux_x86_64.whl
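After installing the wheels, it is worth confirming that the versions actually resolved before moving on. A minimal stdlib check; the distribution names are assumed to match the wheels above (in particular, flash-attn's pip name may be spelled with a hyphen or an underscore depending on the build):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str):
    """Return the installed version of pkg, or None if it is not installed."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# Distribution names assumed from the wheel filenames installed above.
for pkg in ("torch", "torchvision", "vllm", "flash-attn"):
    print(f"{pkg}: {installed_version(pkg) or 'NOT INSTALLED'}")
```

If any line prints NOT INSTALLED, re-run the corresponding pip install step before continuing.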
4. Download the model
Create an llms folder under the root directory:
pip install modelscope
cd llms
Create a download.py file in the llms folder:
# Download the model
from modelscope import snapshot_download
model_dir = snapshot_download('deepseek-ai/DeepSeek-OCR', cache_dir="/root/llms")
Run the script:
python download.py
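Before pointing the config at the model, you can sanity-check that the snapshot landed where expected. A quick stdlib sketch; the required filenames here are assumptions based on typical Hugging Face-style checkpoints, not a definitive list for this model:

```python
from pathlib import Path

def check_model_dir(model_dir: str, required=("config.json",)) -> list:
    """Return the names of required files missing from model_dir."""
    root = Path(model_dir)
    if not root.is_dir():
        return list(required)
    present = {p.name for p in root.iterdir()}
    return [name for name in required if name not in present]

# The cache_dir layout used by the snapshot_download call above.
missing = check_model_dir("/root/llms/deepseek-ai/DeepSeek-OCR",
                          required=("config.json",))
print("missing files:", missing or "none")
```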
Model directory after download:

5. Test the model
Edit config.py to set the model path and the input/output file paths:

# TODO: change modes
# Tiny: base_size = 512, image_size = 512, crop_mode = False
# Small: base_size = 640, image_size = 640, crop_mode = False
# Base: base_size = 1024, image_size = 1024, crop_mode = False
# Large: base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640, crop_mode = True
BASE_SIZE = 1024
IMAGE_SIZE = 640
CROP_MODE = True
MIN_CROPS = 2
MAX_CROPS = 6 # max: 9; If your GPU memory is small, it is recommended to set it to 6.
MAX_CONCURRENCY = 100 # If you have limited GPU memory, lower the concurrency count.
NUM_WORKERS = 64 # image pre-process (resize/padding) workers
PRINT_NUM_VIS_TOKENS = False
SKIP_REPEAT = True
MODEL_PATH = '/root/llms/deepseek-ai/DeepSeek-OCR' # change to your model path
# TODO: change INPUT_PATH
# .pdf: run_dpsk_ocr_pdf.py;
# .jpg, .png, .jpeg: run_dpsk_ocr_image.py;
# Omnidocbench images path: run_dpsk_ocr_eval_batch.py
INPUT_PATH = '/root/DeepSeek-OCR/input/da1444f0cccbb1f1f420e939b9079b3b.jpg'
OUTPUT_PATH = '/root/DeepSeek-OCR/output'
PROMPT = '<image>\n<|grounding|>Convert the document to markdown.'
# PROMPT = '<image>\nFree OCR.'
# TODO commonly used prompts
# document: <image>\n<|grounding|>Convert the document to markdown.
# other image: <image>\n<|grounding|>OCR this image.
# without layouts: <image>\nFree OCR.
# figures in document: <image>\nParse the figure.
# general: <image>\nDescribe this image in detail.
# rec: <image>\nLocate <|ref|>xxxx<|/ref|> in the image.
# '先天下之忧而忧'
# .......
from transformers import AutoTokenizer
TOKENIZER = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
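For intuition on the mode table above: in the native-resolution modes, the vision-token counts published for DeepSeek-OCR (Tiny 64, Small 100, Base 256, Large 400) work out to (base_size / 64)². The sketch below compares the modes; note the formula is inferred from those published numbers rather than taken from the source code, and Gundam's crop mode adds extra tokens per crop on top of the base count:

```python
# Native-resolution modes from the config.py comments above.
MODES = {
    "Tiny":  512,
    "Small": 640,
    "Base":  1024,
    "Large": 1280,
}

def vision_tokens(base_size: int) -> int:
    """Approximate vision-token count for a native mode: (base_size / 64)^2."""
    return (base_size // 64) ** 2

for name, size in MODES.items():
    print(f"{name:<6} base_size={size:<5} ~{vision_tokens(size)} vision tokens")
```

This is why Large costs roughly six times as many vision tokens as Tiny, and why lowering the mode (or MAX_CROPS) is the first lever when GPU memory is tight.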
Run the inference script:
python /root/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm/run_dpsk_ocr_image.py
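With the <|grounding|> prompt, the script writes its results under OUTPUT_PATH. A stdlib helper to see what was produced; the exact extensions (.md for the converted document, images for layout visualizations) are assumptions about the output format:

```python
from pathlib import Path

def list_outputs(output_dir: str, exts=(".md", ".jpg", ".png")):
    """Collect result files under output_dir, grouped by extension."""
    root = Path(output_dir)
    found = {ext: [] for ext in exts}
    if root.is_dir():
        for p in sorted(root.rglob("*")):
            if p.suffix in found:
                found[p.suffix].append(str(p))
    return found

# OUTPUT_PATH from config.py above.
for ext, files in list_outputs("/root/DeepSeek-OCR/output").items():
    print(f"{ext}: {len(files)} file(s)")
```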
6. Results

