Training YOLO on a dataset with CLI commands
yolo train data=coco128.yaml model=yolov8n.pt epochs=10 lr0=0.01 device='0,1'
yolo task=detect mode=train model=yolov8x.yaml data=mydata.yaml epochs=10 batch=16
yolo task=segment mode=predict model=yolov8x-seg.pt source='/kaggle/input/personpng/1.jpg'
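The parameters above are explained as follows:
- task: the task type; options ['detect', 'segment', 'classify', 'init']
- mode: whether to train, validate, or predict; options ['train', 'val', 'predict']
- model: the YOLOv8 model configuration file; options include yolov8n.yaml, yolov8s.yaml, yolov8m.yaml, yolov8l.yaml, and yolov8x.yaml (or matching pretrained weights such as yolov8n.pt)
- data: the dataset configuration file you created
- epochs: how many times the entire dataset is iterated over during training; lower it to shorten training on a slow GPU.
- lr0: the initial learning rate
- batch: how many images are processed before each weight update (the mini-batch size for gradient descent); lower it if your GPU runs out of memory.
- device: cpu, '0', or '0,1'; selects CPU or GPU training and which GPU IDs to use
- imgsz: the input image size; lower it if your GPU runs out of memory.
- name: the name of the training run; results and weights are saved in a directory with this name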
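To see why lowering imgsz saves memory, here is a simplified sketch assuming a plain square letterbox: scale the longer side to imgsz, keep the aspect ratio, and pad the rest. Ultralytics' real preprocessing (augmentation, stride-based padding at predict time) differs in detail, and the helper names are hypothetical.

```python
def letterbox_shape(h, w, imgsz=640):
    """Simplified square letterbox: returns ((new_h, new_w), (pad_h, pad_w))."""
    r = min(imgsz / h, imgsz / w)  # scale factor that makes the longer side equal imgsz
    new_h, new_w = round(h * r), round(w * r)
    return (new_h, new_w), (imgsz - new_h, imgsz - new_w)  # padding fills the square canvas


def pixel_ratio(imgsz_a, imgsz_b):
    """How many times more pixels an imgsz_a x imgsz_a input has than an imgsz_b x imgsz_b one."""
    return (imgsz_a / imgsz_b) ** 2


if __name__ == "__main__":
    print(letterbox_shape(1080, 1920))  # → ((360, 640), (280, 0)) for a 1080p frame at imgsz=640
    print(pixel_ratio(640, 320))  # → 4.0
```

Activation memory grows roughly with the number of input pixels, so halving imgsz cuts the per-image pixel count to about a quarter.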
Example runs:
yolo train data=/home/jxft/datasets/hyd-action.yaml model=/home/jxft/datasets/yolov8/yolov8n.pt epochs=100 lr0=0.01
yolo train data=/home/yiidata/datasets/hyd-action.yaml model=/home/yiidata/datasets/yolov8/yolov8n.pt epochs=10 lr0=0.01 device='0'
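The data= argument in these runs points at a dataset YAML such as hyd-action.yaml. As a minimal sketch, assuming the usual Ultralytics layout (path, train, val and names keys, with images under images/train and images/val), such a file could be generated as follows. write_dataset_yaml, the dataset root and the class names are hypothetical placeholders, not the real contents of hyd-action.yaml.

```python
import tempfile
from pathlib import Path


def write_dataset_yaml(out_file, root, names, train="images/train", val="images/val"):
    """Write a minimal Ultralytics-style dataset YAML and return its text.

    Assumes plain class names; names containing ':' or '#' would need YAML quoting.
    """
    lines = [f"path: {root}", f"train: {train}", f"val: {val}", "names:"]
    lines += [f"  {i}: {name}" for i, name in enumerate(names)]  # class id -> class name
    text = "\n".join(lines) + "\n"
    Path(out_file).write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    # Hypothetical dataset root and class names, for illustration only
    out = Path(tempfile.mkdtemp()) / "my-action.yaml"
    print(write_dataset_yaml(out, "/path/to/datasets/my-action", ["stand", "sit", "fall"]))
```

The train and val entries are relative to path, so moving the dataset only requires editing one line.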
# yolov7
python train.py --weights=/home/yiidata/datasets/yolov7/yolov7.pt --data=/home/yiidata/datasets/hyd-action.yaml --img-size='640' --epochs=10 --batch-size=1 --device='0'
# Export to ONNX with NMS built into the model; each output row has 7 values
python export.py --weights runs/train/exp/weights/best.pt --grid --end2end --simplify --max-wh 640
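With --end2end and --max-wh, YOLOv7's export puts NMS inside the ONNX graph. As far as I know, each output row is laid out as [batch_id, x0, y0, x1, y1, class_id, score], with box coordinates in the resized input's pixel space. Assuming that layout, here is a sketch that groups the rows per image in plain Python, so it works on the list returned by output.tolist(). The helper name and the 0.25 threshold are hypothetical.

```python
def group_detections(rows, score_thr=0.25):
    """Group end2end NMS rows [batch_id, x0, y0, x1, y1, cls_id, score] by image index."""
    per_image = {}
    for batch_id, x0, y0, x1, y1, cls_id, score in rows:
        if score < score_thr:
            continue  # skip low-confidence detections
        per_image.setdefault(int(batch_id), []).append(
            {"box": (x0, y0, x1, y1), "cls": int(cls_id), "score": float(score)}
        )
    return per_image


if __name__ == "__main__":
    # Made-up rows standing in for a real ONNX output
    rows = [
        [0, 10.0, 20.0, 110.0, 220.0, 0, 0.91],
        [0, 300.0, 40.0, 380.0, 200.0, 2, 0.12],
        [1, 50.0, 60.0, 150.0, 160.0, 1, 0.75],
    ]
    print(group_detections(rows))
```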
Using YOLO from Python code
from ultralytics import YOLO
# Create a new YOLO model from scratch
model = YOLO('yolov8n.yaml')
# Load a pretrained YOLO model (recommended for training)
model = YOLO('yolov8n.pt')
# Train the model using the 'coco128.yaml' dataset for 3 epochs
results = model.train(data='coco128.yaml', epochs=3)
# Evaluate the model's performance on the validation set
results = model.val()
# Perform object detection on an image using the model
results = model('https://ultralytics.com/images/bus.jpg')
# Export the model to ONNX format
success = model.export(format='onnx')
Training script with explicit parameters:
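import os
from ultralytics import YOLO

# Resolve paths relative to this script's directory (replaces the project's own abs_path helper)
base_dir = os.path.dirname(os.path.abspath(__file__))
workers = 1  # number of dataloader worker processes
batch = 8  # batch size
data_name = "TrafficSign"
# Hypothetical dataset config location; point this at your own data YAML
data_path = os.path.join(base_dir, 'datasets', data_name, data_name + '.yaml')
model = YOLO(os.path.join(base_dir, 'weights', 'yolov5nu.pt'), task='detect')  # load the pretrained yolov5nu weights
results = model.train(  # start training
    data=data_path,  # path to the dataset configuration file
    device='cpu',  # train on the CPU
    workers=workers,  # number of worker processes for data loading (1 here)
    imgsz=640,  # input images are 640x640
    epochs=100,  # train for 100 epochs
    batch=batch,  # batch size of 8
    name='train_v5_' + data_name  # name of the training run
)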
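Each training call saves its results in a run directory named after name=. In my understanding, Ultralytics appends 2, 3, ... when that name is already taken (the log below shows runs/detect/train3), so weights can end up in a path like runs/detect/train_v5_TrafficSign2/weights/best.pt. Here is a small hypothetical helper that finds the newest such run, assuming that numbering scheme.

```python
import re
import tempfile
from pathlib import Path


def latest_run(project_dir, name):
    """Return the run dir with the highest suffix among name, name2, name3, ... (None if absent)."""
    pattern = re.compile(re.escape(name) + r"(\d*)")
    best, best_n = None, 0
    for d in Path(project_dir).iterdir():
        m = pattern.fullmatch(d.name)
        if d.is_dir() and m:
            n = int(m.group(1)) if m.group(1) else 1  # the unsuffixed run counts as 1
            if n > best_n:
                best, best_n = d, n
    return best


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())  # stand-in for runs/detect
    for run in ("train_v5_TrafficSign", "train_v5_TrafficSign2", "train_v5_TrafficSign10"):
        (root / run / "weights").mkdir(parents=True)
    print(latest_run(root, "train_v5_TrafficSign") / "weights" / "best.pt")
```

Suffixes are compared as integers, so train10 correctly ranks after train2, which a plain string sort would get wrong.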
Q&A
1. Error: RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm`
Error log:
Transferred 319/355 items from pretrained weights
TensorBoard: Start with 'tensorboard --logdir runs/detect/train3', view at http://localhost:6006/
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
Traceback (most recent call last):
File "/usr/local/bin/yolo", line 8, in <module>
sys.exit(entrypoint())
File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/cfg/__init__.py", line 582, in entrypoint
getattr(model, mode)(**overrides) # default args from model
File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/engine/model.py", line 667, in train
self.trainer.train()
File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/engine/trainer.py", line 198, in train
self._do_train(world_size)
File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/engine/trainer.py", line 312, in _do_train
self._setup_train(world_size)
File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/engine/trainer.py", line 256, in _setup_train
self.amp = torch.tensor(check_amp(self.model), device=self.device)
File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/utils/checks.py", line 655, in check_amp
assert amp_allclose(YOLO("yolov8n.pt"), im)
File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/utils/checks.py", line 642, in amp_allclose
a = m(im, device=device, verbose=False)[0].boxes.data # FP32 inference
File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/engine/model.py", line 176, in __call__
return self.predict(source, stream, **kwargs)
File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/engine/model.py", line 444, in predict
self.predictor.setup_model(model=self.model, verbose=is_cli)
File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/engine/predictor.py", line 297, in setup_model
self.model = AutoBackend(
File "/usr/local/python3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/nn/autobackend.py", line 144, in __init__
model = model.fuse(verbose=verbose)
File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/nn/tasks.py", line 184, in fuse
m.conv = fuse_conv_and_bn(m.conv, m.bn) # update conv
File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/utils/torch_utils.py", line 196, in fuse_conv_and_bn
fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
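On Linux, unsetting the LD_LIBRARY_PATH environment variable fixes it. A likely cause is that LD_LIBRARY_PATH points to a system-wide CUDA/cuBLAS installation that conflicts with the CUDA libraries bundled with PyTorch. The command below only affects the current shell session: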
unset LD_LIBRARY_PATH
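If you would rather not change your shell session, you can launch training from Python with LD_LIBRARY_PATH removed for that child process only. Here is a sketch; the helper names are hypothetical, and the command you pass would mirror the CLI examples above.

```python
import os
import subprocess


def env_without(var="LD_LIBRARY_PATH", base=None):
    """Copy of the environment (os.environ by default) with `var` removed."""
    env = dict(os.environ if base is None else base)
    env.pop(var, None)  # no error if the variable is not set
    return env


def run_without_ld_path(cmd):
    """Run a command (e.g. a 'yolo train ...' argument list) without LD_LIBRARY_PATH."""
    return subprocess.run(cmd, env=env_without(), check=True)


if __name__ == "__main__":
    print("LD_LIBRARY_PATH" in env_without())  # → False
```

Usage: run_without_ld_path(["yolo", "train", "data=coco128.yaml", "model=yolov8n.pt", "epochs=10"]). The parent process's environment is left untouched.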
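2. _pickle.UnpicklingError: STACK_GLOBAL requires str
This error is caused by .cache files in the dataset's labels folder. These .cache files are usually generated by a previous training run. If the data format inside them is incompatible with the current environment, or something went wrong when they were serialized, loading them in the current run fails with this deserialization error.
The fix is to delete all .cache files in the dataset's labels folder. Step by step:
- Locate the .cache files: find the labels folder inside your dataset folder.
- Delete the .cache files: remove every .cache file in the labels folder. If the images folder contains any .cache files, delete those as well.
- Rerun the training script: after the cache files are deleted, start training again; the cache is rebuilt from the label files.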
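The steps above can be scripted. Here is a minimal sketch, assuming the dataset root contains the images and labels folders. It deletes every *.cache file below that root, including any next to the images, and returns the list of removed files. The function name is hypothetical; double-check the root path before running it.

```python
import tempfile
from pathlib import Path


def remove_cache_files(dataset_root):
    """Delete all *.cache files under dataset_root (recursively) and return their paths."""
    removed = []
    for cache in sorted(Path(dataset_root).rglob("*.cache")):  # collect first, then delete
        if cache.is_file():
            cache.unlink()
            removed.append(cache)
    return removed


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())  # stand-in for a real dataset folder
    (root / "labels").mkdir()
    (root / "labels" / "train.cache").write_bytes(b"stale")
    (root / "labels" / "img1.txt").write_text("0 0.5 0.5 0.2 0.2\n")
    print(remove_cache_files(root))  # only the .cache file is removed
```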
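3. YOLOv7: TypeError: No loop matching the specified signature and casting was found for ufunc greater.
This is a numpy version incompatibility. First check the installed numpy version: pip list
Uninstall the current numpy package: pip uninstall numpy
Then reinstall a specific numpy version (1.23.5 for YOLOv7 below; if the error persists after installing it, try other versions):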
# yolo7
pip install numpy==1.23.5
# yolo8
pip install numpy==1.26.4
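Instead of scanning the output of pip list, you can read the installed version from Python and compare it with the pins above. Here is a sketch using the standard library's importlib.metadata (Python 3.8+). The PINS mapping restates the versions from this note, and check_pin is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# numpy versions suggested above for each setup
PINS = {"yolov7": "1.23.5", "yolov8": "1.26.4"}


def installed_version(package):
    """Installed distribution version, or None if the package is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pin(package, wanted):
    """Return (matches, installed, wanted) for a package against a pinned version."""
    have = installed_version(package)
    return have == wanted, have, wanted


if __name__ == "__main__":
    for setup, pin in PINS.items():
        ok, have, want = check_pin("numpy", pin)
        print(f"{setup}: installed numpy={have}, suggested={want}, match={ok}")
```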
4. ImportError: cannot import name 'builder' from 'google.protobuf.internal'
This is a protobuf version incompatibility. First check the installed protobuf version: pip list. Then install the version for your setup:
# yolo7
pip install protobuf==3.20.3
# yolo8
pip install protobuf==3.19.6
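The missing name is a module. To my knowledge, google.protobuf.internal.builder first shipped with protobuf 3.20, and *_pb2.py files generated by a newer protoc import it. This error therefore usually means the installed protobuf is older than the generated code expects. Here is a quick check from Python; both helpers are hypothetical.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def protobuf_has_builder():
    """True if google.protobuf.internal.builder can be found in this environment."""
    try:
        return importlib.util.find_spec("google.protobuf.internal.builder") is not None
    except ModuleNotFoundError:
        return False  # protobuf (or the google namespace package) is not installed


def protobuf_version():
    """Installed protobuf version string, or None if protobuf is not installed."""
    try:
        return version("protobuf")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(f"protobuf={protobuf_version()}, has builder={protobuf_has_builder()}")
```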