I originally had my external library set up in Immich with all ML, including OCR, working properly (I think this was local and CPU-based). Since then I have updated my library and wanted to start over, so I deleted the mount and reconfigured everything. I set up remote ML to point to my main PC (Linux, ROCm 7, amdgpu, Docker, etc.), which runs a Docker container for machine learning listening on port 3003. From all my tests, it's properly listening on that port, it can see my GPU perfectly (rocm-smi), and when jobs are kicked off my GPU spikes to 100% until completion.
This is how I start the container (latest build on GitHub):
docker run --privileged --ipc=host -v /dev:/dev -v /sys:/sys --network=host -it ghcr.io/immich-app/immich-machine-learning:commit-450dfcd99e8f8010fab500a5abc0432128310824-rocm
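For anyone who wants to repeat the "is it listening" check from the Immich host, here is a minimal sketch. It only tests whether a TCP connection to the port succeeds, so it says nothing about whether the ML service behind it is healthy. The function name port_is_open is made up for illustration, and the host address is whatever your ML machine uses.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This is a plain reachability check (an assumption-free TCP connect);
    it does not talk to the Immich ML API at all.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts
        return False
```

Calling port_is_open("<address of the ML machine>", 3003) from the machine running the Immich server should return True if the container started above is reachable over the network.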
However, every time OCR runs on an item, it returns this error:
Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running ConvTranspose node.
Name:'ConvTranspose.0' Status Message: MIOPEN failure 3: miopenStatusBadParm ; GPU=0 ; hostname=-X870E-Taichi-Lite
; file=/code/onnxruntime/onnxruntime/core/providers/rocm/nn/conv_transpose.cc ; line=133 ;
expr=miopenFindConvolutionBackwardDataAlgorithm( GetMiopenHandle(context), s_.x_tensor, x_data, s_.w_desc, w_data,
s_.conv_desc, s_.y_tensor, y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize, false);
MIOpen Error:
/long_pathname_so_that_rpms_can_package_the_debug_info/src/MLOpen/src/ocl/convolutionocl.cpp:869: Buffers cannot be NULL
2025-11-10 20:15:04.333980859 [E:onnxruntime:Default, rocm_call.cc:119 RocmCall] MIOPEN failure 3: miopenStatusBadParm ; GPU=0 ; hostname=-X870E-Taichi-Lite ; file=/code/onnxruntime/onnxruntime/core/providers/rocm/nn/conv_transpose.cc ; line=133 ; expr=miopenFindConvolutionBackwardDataAlgorithm( GetMiopenHandle(context), s_.x_tensor, x_data, s_.w_desc, w_data, s_.conv_desc, s_.y_tensor, y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize, false);
2025-11-10 20:15:04.333990729 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running ConvTranspose node. Name:'ConvTranspose.0' Status Message: MIOPEN failure 3: miopenStatusBadParm ; GPU=0 ; hostname=-X870E-Taichi-Lite ; file=/code/onnxruntime/onnxruntime/core/providers/rocm/nn/conv_transpose.cc ; line=133 ; expr=miopenFindConvolutionBackwardDataAlgorithm( GetMiopenHandle(context), s_.x_tensor, x_data, s_.w_desc, w_data, s_.conv_desc, s_.y_tensor, y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize, false);
[11/10/25 20:15:04] ERROR Exception in ASGI application
Any ideas?
I also posted here in r/immich: