[07/24/25 10:24:27] INFO     Loading textual model 'ViT-B-16-SigLIP-384__webli' to memory
[07/24/25 10:24:27] INFO     Setting execution providers to ['CUDAExecutionProvider', 'CPUExecutionProvider'], in descending order of preference
2025-07-24 10:24:30.816710599 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=dbbeeeb0e665 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/reduction/reduction_ops.cc ; line=571 ; expr=cudnnReduceTensor( CudaKernel::GetCudnnHandle(cuda_stream), reduce_desc, indices_cuda.get(), indices_bytes, workspace_cuda.get(), workspace_bytes, &one, input_tensor, reinterpret_cast(input.Data()), &zero, output_tensor, p_output);
2025-07-24 10:24:30.816818329 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running ReduceL2 node. Name:'ReduceL2_1624' Status Message: CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=dbbeeeb0e665 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/reduction/reduction_ops.cc ; line=571 ; expr=cudnnReduceTensor( CudaKernel::GetCudnnHandle(cuda_stream), reduce_desc, indices_cuda.get(), indices_bytes, workspace_cuda.get(), workspace_bytes, &one, input_tensor, reinterpret_cast(input.Data()), &zero, output_tensor, p_output);
[07/24/25 10:24:30] ERROR    Exception in ASGI application

╭─────── Traceback (most recent call last) ───────╮
│ /app/immich/machine-learning/immich_ml/main.py:177 in predict
│
│   174 │   │   inputs = text
│   175 │   else:
│   176 │   │   raise HTTPException(400, "Either
│ ❱ 177 │   response = await run_inference(inputs
│   178 │   return ORJSONResponse(response)
│   179
│   180
│
│ /app/immich/machine-learning/immich_ml/main.py:200 in run_inference
│
│   197 │   │   response[entry["task"]] = output
│   198 │
│   199 │   without_deps, with_deps = entries
│ ❱ 200 │   await asyncio.gather(*[_run_inference
│   201 │   if with_deps:
│   202 │   │   await asyncio.gather(*[_run_infer
│   203 │   if isinstance(payload, Image):
│
│ /app/immich/machine-learning/immich_ml/main.py:195 in _run_inference
│
│   192 │   │   │   │   message = f"Task {entry['
│         output of {dep}"
│   193 │   │   │   │   raise HTTPException(400,
│   194 │   │   model = await load(model)
│ ❱ 195 │   │   output = await run(model.predict,
│   196 │   │   outputs[model.identity] = output
│   197 │   │   response[entry["task"]] = output
│   198 │
│
│ /app/immich/machine-learning/immich_ml/main.py:213 in run
│
│   210 │   if thread_pool is None:
│   211 │   │   return func(*args, **kwargs)
│   212 │   partial_func = partial(func, *args, *
│ ❱ 213 │   return await asyncio.get_running_loop
│   214
│   215
│   216 async def load(model: InferenceModel) ->
│
│ /usr/lib/python3.11/concurrent/futures/thread.py:58 in run
│
│ /app/immich/machine-learning/immich_ml/models/base.py:61 in predict
│
│    58 │   │   self.load()
│    59 │   │   if model_kwargs:
│    60 │   │   │   self.configure(**model_kwargs
│ ❱  61 │   │   return self._predict(*inputs, **m
│    62 │
│    63 │   @abstractmethod
│    64 │   def _predict(self, *inputs: Any, **mo
│
│ /app/immich/machine-learning/immich_ml/models/clip/textual.py:24 in _predict
│
│    21 │
│    22 │   def _predict(self, inputs: str, langu
│    23 │   │   tokens = self.tokenize(inputs, la
│ ❱  24 │   │   res: NDArray[np.float32] = self.s
│    25 │   │   return serialize_np_array(res)
│    26 │
│    27 │   def _load(self) -> ModelSession:
│
│ /app/immich/machine-learning/immich_ml/sessions/ort.py:49 in run
│
│    46 │   │   input_feed: dict[str, NDArray[np.
│    47 │   │   run_options: Any = None,
│    48 │   ) -> list[NDArray[np.float32]]:
│ ❱  49 │   │   outputs: list[NDArray[np.float32]
│          run_options)
│    50 │   │   return outputs
│    51 │
│    52 │   @property
│
│ /lsiopy/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:220 in run
│
│   217 │   │   if not output_names:
│   218 │   │   │   output_names = [output.name
│   219 │   │   try:
│ ❱ 220 │   │   │   return self._sess.run(output
│   221 │   │   except C.EPFail as err:
│   222 │   │   │   if self._enable_fallback:
│   223 │   │   │   │   print(f"EP Error: {err!s
╰─────────────────────────────────────────────────╯
Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running ReduceL2 node. Name:'ReduceL2_1624' Status Message: CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=dbbeeeb0e665 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/reduction/reduction_ops.cc ; line=571 ; expr=cudnnReduceTensor( CudaKernel::GetCudnnHandle(cuda_stream), reduce_desc, indices_cuda.get(), indices_bytes, workspace_cuda.get(), workspace_bytes, &one, input_tensor, reinterpret_cast(input.Data()), &zero, output_tensor, p_output);
[Nest] 678 - 07/24/2025, 10:24:31 AM  WARN [Api:MachineLearningRepository~iszohiy5] Machine learning request to "http://127.0.0.1:3003" failed with status 500: Internal Server Error
[Nest] 678 - 07/24/2025, 10:24:31 AM ERROR [Api:ErrorInterceptor~iszohiy5] Unknown error: Error: Machine learning request '{"clip":{"textual":{"modelName":"ViT-B-16-SigLIP-384__webli","options":{"language":"nl-NL"}}}}' failed for all URLs
Error: Machine learning request '{"clip":{"textual":{"modelName":"ViT-B-16-SigLIP-384__webli","options":{"language":"nl-NL"}}}}' failed for all URLs
    at MachineLearningRepository.predict (/app/immich/server/dist/repositories/machine-learning.repository.js:98:15)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async MachineLearningRepository.encodeText (/app/immich/server/dist/repositories/machine-learning.repository.js:121:26)
    at async SearchService.searchSmart (/app/immich/server/dist/services/search.service.js:84:25)
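Every Python frame in the traceback is just plumbing: the failure originates inside onnxruntime's CUDAExecutionProvider when the ReduceL2 node calls into cuDNN, and the NestJS errors are downstream fallout from the resulting HTTP 500. Below is a minimal sketch (not from the log) to exercise the same CUDA/cuDNN stack outside Immich; it assumes the onnxruntime-gpu package is installed, and "model.onnx" is a hypothetical stand-in path, e.g. for Immich's cached CLIP textual model:

import onnxruntime as ort

# Hypothetical path: substitute any ONNX model available in the container
# to isolate the CUDA/cuDNN setup from Immich's own request handling.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# If CUDA initialization failed, onnxruntime silently falls back to CPU;
# this prints which providers are actually active for the session.
print("Active providers:", session.get_providers())

# Inspect the model's declared inputs before building a test feed.
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)

If a bare run like this reproduces CUDNN_STATUS_EXECUTION_FAILED, the problem likely lies in the container's CUDA/cuDNN/driver combination rather than in Immich itself.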