SmolVLA Asynchronous Inference: Running a Remote Policy Server with a Local Client

🕒 2025-08-28 📁 lerobot 👤 laumy 🔥 16 views

Overview

This post walks through LeRobot SmolVLA asynchronous inference in practice: the SmolVLA policy server is deployed on an AutoDL instance, while the real-robot client runs on a local laptop.

Environment Setup

First log in to the AutoDL instance where the LeRobot environment was set up earlier (see previous posts; not repeated here). With the LeRobot environment ready, install the SmolVLA extras and gRPC.

# Recommended: upgrade the packaging tools first
python -m pip install -U pip setuptools wheel

# Install LeRobot with the SmolVLA extras (run from the lerobot repo root)
pip install -e ".[smolvla]"

# Install gRPC and related packages
python -m pip install grpcio grpcio-tools protobuf
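
A quick sanity check that the gRPC stack is importable (a minimal sketch; the printed versions depend on what pip resolved):

# Print the installed grpcio and protobuf versions
python -c "import grpc, google.protobuf; print(grpc.__version__, google.protobuf.__version__)"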

Server

Start the policy server on the AutoDL instance:

python src/lerobot/scripts/server/policy_server.py \
  --host=127.0.0.1 \
  --port=8080 \
  --fps=30 \
  --inference_latency=0.033 \
  --obs_queue_timeout=2

The log after a successful start looks like this:

python src/lerobot/scripts/server/policy_server.py   --host=127.0.0.1   --port=8080   --fps=30   --inference_latency=0   --obs_queue_timeout=2
INFO 2025-08-28 10:33:07 y_server.py:384 {'fps': 30,
 'host': '127.0.0.1',
 'inference_latency': 0.0,
 'obs_queue_timeout': 2.0,
 'port': 8080}
INFO 2025-08-28 10:33:07 y_server.py:394 PolicyServer started on 127.0.0.1:8080
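
Before pointing a client at it, you can confirm from a second shell on the server that the process is listening on the loopback port (a sketch; ss comes with iproute2 and may need to be replaced by netstat on some images):

# Expect a LISTEN socket on 127.0.0.1:8080 owned by the python process
ss -lntp | grep 8080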

The log after a client connects:

INFO 2025-08-28 10:40:42 y_server.py:104 Client ipv4:127.0.0.1:45038 connected and ready
INFO 2025-08-28 10:40:42 y_server.py:130 Receiving policy instructions from ipv4:127.0.0.1:45038 | Policy type: smolvla | Pretrained name or path: outputs/smolvla_weigh_08181710/pretrained_model | Actions per chunk: 50 | Device: cuda
Loading  HuggingFaceTB/SmolVLM2-500M-Video-Instruct weights ...
INFO 2025-08-28 10:40:54 odeling.py:1004 We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Reducing the number of VLM layers to 16 ...
Loading weights from local directory
INFO 2025-08-28 10:41:14 y_server.py:150 Time taken to put policy on cuda: 32.3950 seconds
INFO 2025-08-28 10:41:14 ort/utils.py:74 <Logger policy_server (NOTSET)> Starting receiver
INFO 2025-08-28 10:41:14 y_server.py:175 Received observation #0 | Avg FPS: 3.45 | Target: 30.00 | One-way latency: -9.22ms
INFO 2025-08-28 10:41:14 y_server.py:205 Running inference for observation #0 (must_go: True)
INFO 2025-08-28 10:41:15 ort/utils.py:74 <Logger policy_server (NOTSET)> Starting receiver
INFO 2025-08-28 10:41:15 y_server.py:175 Received observation #0 | Avg FPS: 3.45 | Target: 30.00 | One-way latency: -9.58ms

The server listens only on the local loopback interface (127.0.0.1), so it is not exposed to the public internet; the client reaches it securely through an SSH tunnel. To run the server in the background instead, use nohup:

nohup python src/lerobot/scripts/server/policy_server.py \
  --host=127.0.0.1 --port=8080 --fps=30 --inference_latency=0.033 --obs_queue_timeout=2 \
  >/tmp/policy_server.log 2>&1 &

This runs the server in the background, with its output redirected to /tmp/policy_server.log.
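
To watch the background server while the client runs, tail the log file used above:

# Follow the policy server log written by nohup
tail -f /tmp/policy_server.log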

Client

Set up the SSH tunnel

On the local client machine, first establish an SSH local port forward (tunnel):

ssh -p <server-ssh-port> -fN -L 8080:127.0.0.1:8080 <user@server-ssh-host>

For example: ssh -p 20567 -fN -L 8080:127.0.0.1:8080 root@connect.xx.xxx.com

If you prefer not to run the tunnel in the background, run it in the foreground (stop with Ctrl+C):

ssh -p 20567 -N -L 8080:127.0.0.1:8080 root@connect.xx.xxx.com
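
Before starting the client, it helps to probe the forwarded port from the laptop (a sketch; nc/netcat is assumed to be available locally):

# Success here means the tunnel reaches the remote policy server
nc -vz 127.0.0.1 8080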

Run locally

With the tunnel in place, start the robot client on the laptop:

python src/lerobot/scripts/server/robot_client.py \
  --robot.type=so101_follower --robot.port=/dev/ttyACM0 --robot.id=R12252801 \
  --robot.cameras="{ handeye: {type: opencv, index_or_path: 6, width: 320, height: 240, fps: 25}, fixed: {type: opencv, index_or_path: 0, width: 320, height: 240, fps: 25}}" \
  --policy_type=smolvla \
  --pretrained_name_or_path=outputs/smolvla_weigh_08181710/pretrained_model \
  --policy_device=cuda \
  --actions_per_chunk=10 \
  --fps=30 \
  --server_address=localhost:8080

--pretrained_name_or_path is loaded on the server side, so make sure the weight files exist under outputs/smolvla_weigh_08181710/pretrained_model on the server.
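
It is worth confirming on the server that the checkpoint directory actually exists before launching the client (the exact file names depend on how the checkpoint was exported):

# Run on the AutoDL server: list the pretrained policy files
ls outputs/smolvla_weigh_08181710/pretrained_model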

The log from the connected run looks like this:

python src/lerobot/scripts/server/robot_client.py   --robot.type=so101_follower --robot.port=/dev/ttyACM0 --robot.id=R12252801   --robot.cameras="{ handeye: {type: opencv, index_or_path: 6, width: 320, height: 240, fps: 25}, fixed: {type: opencv, index_or_path: 0, width: 320, height: 240, fps: 25}}"   --policy_type=smolvla   --pretrained_name_or_path=outputs/smolvla_weigh_08181710/pretrained_model   --policy_device=cuda   --actions_per_chunk=50 --chunk_size_threshold=0.8  --fps=30   --server_address=localhost:8080 --aggregate_fn_name=average
INFO 2025-08-28 10:40:38 t_client.py:478 {'actions_per_chunk': 50,
 'aggregate_fn_name': 'average',
 'chunk_size_threshold': 0.8,
 'debug_visualize_queue_size': False,
 'fps': 30,
 'policy_device': 'cuda',
 'policy_type': 'smolvla',
 'pretrained_name_or_path': 'outputs/smolvla_weigh_08181710/pretrained_model',
 'robot': {'calibration_dir': None,
           'cameras': {'fixed': {'color_mode': <ColorMode.RGB: 'rgb'>,
                                 'fps': 25,
                                 'height': 240,
                                 'index_or_path': 0,
                                 'rotation': <Cv2Rotation.NO_ROTATION: 0>,
                                 'warmup_s': 1,
                                 'width': 320},
                       'handeye': {'color_mode': <ColorMode.RGB: 'rgb'>,
                                   'fps': 25,
                                   'height': 240,
                                   'index_or_path': 6,
                                   'rotation': <Cv2Rotation.NO_ROTATION: 0>,
                                   'warmup_s': 1,
                                   'width': 320}},
           'disable_torque_on_disconnect': True,
           'id': 'R12252801',
           'max_relative_target': None,
           'port': '/dev/ttyACM0',
           'use_degrees': False},
 'server_address': 'localhost:8080',
 'task': '',
 'verify_robot_cameras': True}
INFO 2025-08-28 10:40:40 a_opencv.py:179 OpenCVCamera(6) connected.
INFO 2025-08-28 10:40:41 a_opencv.py:179 OpenCVCamera(0) connected.
INFO 2025-08-28 10:40:41 follower.py:104 R12252801 SO101Follower connected.
WARNING 2025-08-28 10:40:42 ils/utils.py:54 No accelerated backend detected. Using default cpu, this will be slow.
WARNING 2025-08-28 10:40:42 /policies.py:80 Device 'cuda' is not available. Switching to 'cpu'.
WARNING 2025-08-28 10:40:42 ils/utils.py:54 No accelerated backend detected. Using default cpu, this will be slow.
WARNING 2025-08-28 10:40:42 /policies.py:80 Device 'cuda' is not available. Switching to 'cpu'.
INFO 2025-08-28 10:40:42 t_client.py:121 Initializing client to connect to server at localhost:8080
INFO 2025-08-28 10:40:42 t_client.py:140 Robot connected and ready
INFO 2025-08-28 10:40:42 t_client.py:163 Sending policy instructions to policy server
INFO 2025-08-28 10:41:14 t_client.py:486 Starting action receiver thread...
INFO 2025-08-28 10:41:14 t_client.py:454 Control loop thread starting
INFO 2025-08-28 10:41:14 t_client.py:280 Action receiving thread starting
INFO 2025-08-28 10:41:15 t_client.py:216 Sent observation #0 | 
INFO 2025-08-28 10:41:15 t_client.py:469 Control loop (ms): 288.72
INFO 2025-08-28 10:41:15 t_client.py:216 Sent observation #0 | 
INFO 2025-08-28 10:41:15 t_client.py:469 Control loop (ms): 132.22
INFO 2025-08-28 10:41:15 t_client.py:216 Sent observation #0 | 
INFO 2025-08-28 10:41:15 t_client.py:469 Control loop (ms): 127.84
INFO 2025-08-28 10:41:15 t_client.py:216 Sent observation #0 | 
INFO 2025-08-28 10:41:15 t_client.py:469 Control loop (ms): 123.95
INFO 2025-08-28 10:41:15 t_client.py:216 Sent observation #0 | 
INFO 2025-08-28 10:41:15 t_client.py:469 Control loop (ms): 140.21
INFO 2025-08-28 10:41:15 t_client.py:469 Control loop (ms): 0.54
INFO 2025-08-28 10:41:15 t_client.py:469 Control loop (ms): 0.42

That is the full asynchronous inference workflow over an SSH tunnel.
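
When the session is over, the background processes on both sides can be cleaned up (a sketch; adjust the pkill patterns to match your own command lines):

# On the server: stop the nohup'd policy server
pkill -f policy_server.py

# On the laptop: close the background SSH tunnel started with -fN
pkill -f "ssh -p 20567 -fN -L 8080"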

Reference: https://hugging-face.cn/docs/lerobot/async
