[07/24/25 10:24:27] INFO     Loading textual model 'ViT-B-16-SigLIP-384__webli' to memory
[07/24/25 10:24:27] INFO     Setting execution providers to ['CUDAExecutionProvider', 'CPUExecutionProvider'], in descending order of preference
2025-07-24 10:24:30.816710599 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=dbbeeeb0e665 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/reduction/reduction_ops.cc ; line=571 ; expr=cudnnReduceTensor( CudaKernel::GetCudnnHandle(cuda_stream), reduce_desc, indices_cuda.get(), indices_bytes, workspace_cuda.get(), workspace_bytes, &one, input_tensor, reinterpret_cast(input.Data()), &zero, output_tensor, p_output);
2025-07-24 10:24:30.816818329 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running ReduceL2 node. Name:'ReduceL2_1624' Status Message: CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=dbbeeeb0e665 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/reduction/reduction_ops.cc ; line=571 ; expr=cudnnReduceTensor( CudaKernel::GetCudnnHandle(cuda_stream), reduce_desc, indices_cuda.get(), indices_bytes, workspace_cuda.get(), workspace_bytes, &one, input_tensor, reinterpret_cast(input.Data()), &zero, output_tensor, p_output);
[07/24/25 10:24:30] ERROR    Exception in ASGI application

╭─────── Traceback (most recent call last) ───────╮
│ /app/immich/machine-learning/immich_ml/main.py:177 in predict
│
│   174 │   │   inputs = text
│   175 │   else:
│   176 │   │   raise HTTPException(400, "Either
│ ❱ 177 │   response = await run_inference(inputs
│   178 │   return ORJSONResponse(response)
│   179
│   180
│
│ /app/immich/machine-learning/immich_ml/main.py:200 in run_inference
│
│   197 │   │   response[entry["task"]] = output
│   198 │
│   199 │   without_deps, with_deps = entries
│ ❱ 200 │   await asyncio.gather(*[_run_inference
│   201 │   if with_deps:
│   202 │   │   await asyncio.gather(*[_run_infer
│   203 │   if isinstance(payload, Image):
│
│ /app/immich/machine-learning/immich_ml/main.py:195 in _run_inference
│
│   192 │   │   │   │   message = f"Task {entry['
│         output of {dep}"
│   193 │   │   │   │   raise HTTPException(400,
│   194 │   │   model = await load(model)
│ ❱ 195 │   │   output = await run(model.predict,
│   196 │   │   outputs[model.identity] = output
│   197 │   │   response[entry["task"]] = output
│   198 │
│
│ /app/immich/machine-learning/immich_ml/main.py:213 in run
│
│   210 │   if thread_pool is None:
│   211 │   │   return func(*args, **kwargs)
│   212 │   partial_func = partial(func, *args, *
│ ❱ 213 │   return await asyncio.get_running_loop
│   214
│   215
│   216 async def load(model: InferenceModel) ->
│
│ /usr/lib/python3.11/concurrent/futures/thread.py:58 in run
│
│ /app/immich/machine-learning/immich_ml/models/base.py:61 in predict
│
│    58 │   │   self.load()
│    59 │   │   if model_kwargs:
│    60 │   │   │   self.configure(**model_kwargs
│ ❱  61 │   │   return self._predict(*inputs, **m
│    62 │
│    63 │   @abstractmethod
│    64 │   def _predict(self, *inputs: Any, **mo
│
│ /app/immich/machine-learning/immich_ml/models/clip/textual.py:24 in _predict
│
│    21 │
│    22 │   def _predict(self, inputs: str, langu
│    23 │   │   tokens = self.tokenize(inputs, la
│ ❱  24 │   │   res: NDArray[np.float32] = self.s
│    25 │   │   return serialize_np_array(res)
│    26 │
│    27 │   def _load(self) -> ModelSession:
│
│ /app/immich/machine-learning/immich_ml/sessions/ort.py:49 in run
│
│    46 │   │   input_feed: dict[str, NDArray[np.
│    47 │   │   run_options: Any = None,
│    48 │   ) -> list[NDArray[np.float32]]:
│ ❱  49 │   │   outputs: list[NDArray[np.float32]
│          run_options)
│    50 │   │   return outputs
│    51 │
│    52 │   @property
│
│ /lsiopy/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:220 in run
│
│   217 │   │   if not output_names:
│   218 │   │   │   output_names = [output.name
│   219 │   │   try:
│ ❱ 220 │   │   │   return self._sess.run(output
│   221 │   │   except C.EPFail as err:
│   222 │   │   │   if self._enable_fallback:
│   223 │   │   │   │   print(f"EP Error: {err!s
╰─────────────────────────────────────────────────╯
Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running ReduceL2 node. Name:'ReduceL2_1624' Status Message: CUDNN failure 5000: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=dbbeeeb0e665 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/reduction/reduction_ops.cc ; line=571 ; expr=cudnnReduceTensor( CudaKernel::GetCudnnHandle(cuda_stream), reduce_desc, indices_cuda.get(), indices_bytes, workspace_cuda.get(), workspace_bytes, &one, input_tensor, reinterpret_cast(input.Data()), &zero, output_tensor, p_output);
[Nest] 678 - 07/24/2025, 10:24:31 AM  WARN [Api:MachineLearningRepository~iszohiy5] Machine learning request to "http://127.0.0.1:3003" failed with status 500: Internal Server Error
[Nest] 678 - 07/24/2025, 10:24:31 AM ERROR [Api:ErrorInterceptor~iszohiy5] Unknown error: Error: Machine learning request '{"clip":{"textual":{"modelName":"ViT-B-16-SigLIP-384__webli","options":{"language":"nl-NL"}}}}' failed for all URLs
Error: Machine learning request '{"clip":{"textual":{"modelName":"ViT-B-16-SigLIP-384__webli","options":{"language":"nl-NL"}}}}' failed for all URLs
    at MachineLearningRepository.predict (/app/immich/server/dist/repositories/machine-learning.repository.js:98:15)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async MachineLearningRepository.encodeText (/app/immich/server/dist/repositories/machine-learning.repository.js:121:26)
    at async SearchService.searchSmart (/app/immich/server/dist/services/search.service.js:84:25)
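Every Python frame in the traceback is just plumbing: the failure originates inside onnxruntime's CUDAExecutionProvider when the ReduceL2 node calls into cuDNN, and the NestJS errors are downstream fallout from the resulting HTTP 500. Below is a minimal sketch (not from the log) to exercise the same CUDA/cuDNN stack outside Immich; it assumes the onnxruntime-gpu package is installed, and "model.onnx" is a hypothetical stand-in path, e.g. for Immich's cached CLIP textual model:

import onnxruntime as ort

# Hypothetical path: substitute any ONNX model available in the container
# to isolate the CUDA/cuDNN setup from Immich's own request handling.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# If CUDA initialization failed, onnxruntime silently falls back to CPU;
# this prints which providers are actually active for the session.
print("Active providers:", session.get_providers())

# Inspect the model's declared inputs before building a test feed.
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)

If a bare run like this reproduces CUDNN_STATUS_EXECUTION_FAILED, the problem likely lies in the container's CUDA/cuDNN/driver combination rather than in Immich itself.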