2024-09-04 20:46:23 INFO [Dataset 0] train_util.py:2326
INFO caching latents with caching strategy. train_util.py:984
INFO checking cache validity... train_util.py:994
100%|█████████████████████████████████████████████████████| 10000/10000 [00:00<00:00, 485772.33it/s]
54%|███████████████████████████████▌ | 1339/2500 [10:00<08:55, 2.17it/s]
[rank2]:[W904 20:56:21.921312745 socket.cpp:428] [c10d] While waitForInput, poolFD failed with (errno: 0 - Success).
[rank2]: Traceback (most recent call last):
[rank2]:   File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train_network.py", line 446, in
[rank2]:     trainer.train(args)
[rank2]:   File "/home/Ubuntu/apps/kohya_ss/sd-scripts/train_network.py", line 382, in train
[rank2]:     accelerator.wait_for_everyone()
[rank2]:   File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2564, in wait_for_everyone
[rank2]:     wait_for_everyone()
[rank2]:   File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/utils/other.py", line 138, in wait_for_everyone
[rank2]:     PartialState().wait_for_everyone()
[rank2]:   File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/state.py", line 374, in wait_for_everyone
[rank2]:     torch.distributed.barrier()
[rank2]:   File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 79, in wrapper
[rank2]:     return func(*args, **kwargs)
[rank2]:   File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 3936, in barrier
[rank2]:     work = default_pg.barrier(opts=opts)
[rank2]: torch.distributed.DistBackendError: [2] is setting up NCCL communicator and retrieving ncclUniqueId from [0] via c10d key-value store by key '0', but store->get('0') got error: Socket Timeout
[rank2]: Exception raised from doWait at ../torch/csrc/distributed/c10d/TCPStore.cpp:570 (most recent call first):
[rank2]: frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7799c037af86 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
[rank2]: frame #1: + 0x16583cb (0x7799a6e583cb in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank2]: frame #2: c10d::TCPStore::doGet(std::string const&) + 0x32 (0x7799ab50ab82 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank2]: frame #3: c10d::TCPStore::get(std::string const&) + 0xa1 (0x7799ab50bd71 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank2]: frame #4: c10d::PrefixStore::get(std::string const&) + 0x31 (0x7799ab4c07c1 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank2]: frame #5: c10d::PrefixStore::get(std::string const&) + 0x31 (0x7799ab4c07c1 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank2]: frame #6: c10d::PrefixStore::get(std::string const&) + 0x31 (0x7799ab4c07c1 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank2]: frame #7: c10d::PrefixStore::get(std::string const&) + 0x31 (0x7799ab4c07c1 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank2]: frame #8: c10d::ProcessGroupNCCL::broadcastUniqueNCCLID(ncclUniqueId*, bool, std::string const&, int) + 0xaf (0x779971d8ef6f in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
[rank2]: frame #9: c10d::ProcessGroupNCCL::getNCCLComm(std::string const&, c10::Device&, c10d::OpType, int, bool) + 0x114c (0x779971d9ad4c in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
[rank2]: frame #10: + 0x11a31af (0x779971da31af in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
[rank2]: frame #11: c10d::ProcessGroupNCCL::allreduce_impl(at::Tensor&, c10d::AllreduceOptions const&) + 0x10 (0x779971da45e0 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
[rank2]: frame #12: c10d::ProcessGroupNCCL::barrier(c10d::BarrierOptions const&) + 0x69c (0x779971db16ec in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
[rank2]: frame #13: + 0x5cb2ff2 (0x7799ab4b2ff2 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank2]: frame #14: + 0x5cbd7f5 (0x7799ab4bd7f5 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank2]: frame #15: + 0x52dfa0b (0x7799aaadfa0b in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank2]: frame #16: + 0x52dd284 (0x7799aaadd284 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank2]: frame #17: + 0x1adf2b8 (0x7799a72df2b8 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank2]: frame #18: + 0x5cc7764 (0x7799ab4c7764 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank2]: frame #19: + 0x5cc84f5 (0x7799ab4c84f5 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank2]: frame #20: + 0xdb0b68 (0x7799be9b0b68 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
[rank2]: frame #21: + 0x4b00e4 (0x7799be0b00e4 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
[rank2]: frame #22: + 0x15adae (0x58c56f040dae in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #23: _PyObject_MakeTpCall + 0x25b (0x58c56f03752b in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #24: + 0x16952b (0x58c56f04f52b in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #25: _PyEval_EvalFrameDefault + 0x19b6 (0x58c56f02ac16 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #26: _PyFunction_Vectorcall + 0x7c (0x58c56f0416ac in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #27: _PyEval_EvalFrameDefault + 0x2a49 (0x58c56f02bca9 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #28: _PyFunction_Vectorcall + 0x7c (0x58c56f0416ac in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #29: _PyEval_EvalFrameDefault + 0x64e2 (0x58c56f02f742 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #30: + 0x169251 (0x58c56f04f251 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #31: _PyEval_EvalFrameDefault + 0x64e2 (0x58c56f02f742 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #32: _PyFunction_Vectorcall + 0x7c (0x58c56f0416ac in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #33: _PyEval_EvalFrameDefault + 0x6d5 (0x58c56f029935 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #34: _PyFunction_Vectorcall + 0x7c (0x58c56f0416ac in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #35: _PyEval_EvalFrameDefault + 0x8cb (0x58c56f029b2b in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #36: _PyFunction_Vectorcall + 0x7c (0x58c56f0416ac in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #37: _PyEval_EvalFrameDefault + 0x8cb (0x58c56f029b2b in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #38: + 0x140096 (0x58c56f026096 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #39: PyEval_EvalCode + 0x86 (0x58c56f11bf66 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #40: + 0x260e98 (0x58c56f146e98 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #41: + 0x25a79b (0x58c56f14079b in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #42: + 0x260be5 (0x58c56f146be5 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #43: _PyRun_SimpleFileObject + 0x1a8 (0x58c56f1460c8 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #44: _PyRun_AnyFileObject + 0x43 (0x58c56f145d13 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #45: Py_RunMain + 0x2be (0x58c56f13870e in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #46: Py_BytesMain + 0x2d (0x58c56f10edfd in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: frame #47: + 0x29d90 (0x7799c1029d90 in /lib/x86_64-linux-gnu/libc.so.6)
[rank2]: frame #48: __libc_start_main + 0x80 (0x7799c1029e40 in /lib/x86_64-linux-gnu/libc.so.6)
[rank2]: frame #49: _start + 0x25 (0x58c56f10ecf5 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank2]: . This may indicate a possible application crash on rank 0 or a network set up issue.
[rank1]:[W904 20:56:21.952798929 socket.cpp:428] [c10d] While waitForInput, poolFD failed with (errno: 0 - Success).
[rank1]: Traceback (most recent call last):
[rank1]:   File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train_network.py", line 446, in
[rank1]:     trainer.train(args)
[rank1]:   File "/home/Ubuntu/apps/kohya_ss/sd-scripts/train_network.py", line 382, in train
[rank1]:     accelerator.wait_for_everyone()
[rank1]:   File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2564, in wait_for_everyone
[rank1]:     wait_for_everyone()
[rank1]:   File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/utils/other.py", line 138, in wait_for_everyone
[rank1]:     PartialState().wait_for_everyone()
[rank1]:   File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/state.py", line 374, in wait_for_everyone
[rank1]:     torch.distributed.barrier()
[rank1]:   File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 79, in wrapper
[rank1]:     return func(*args, **kwargs)
[rank1]:   File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 3936, in barrier
[rank1]:     work = default_pg.barrier(opts=opts)
[rank1]: torch.distributed.DistBackendError: [1] is setting up NCCL communicator and retrieving ncclUniqueId from [0] via c10d key-value store by key '0', but store->get('0') got error: Socket Timeout
[rank1]: Exception raised from doWait at ../torch/csrc/distributed/c10d/TCPStore.cpp:570 (most recent call first):
[rank1]: frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x75d04a906f86 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
[rank1]: frame #1: + 0x16583cb (0x75d0312583cb in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank1]: frame #2: c10d::TCPStore::doGet(std::string const&) + 0x32 (0x75d03590ab82 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank1]: frame #3: c10d::TCPStore::get(std::string const&) + 0xa1 (0x75d03590bd71 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank1]: frame #4: c10d::PrefixStore::get(std::string const&) + 0x31 (0x75d0358c07c1 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank1]: frame #5: c10d::PrefixStore::get(std::string const&) + 0x31 (0x75d0358c07c1 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank1]: frame #6: c10d::PrefixStore::get(std::string const&) + 0x31 (0x75d0358c07c1 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank1]: frame #7: c10d::PrefixStore::get(std::string const&) + 0x31 (0x75d0358c07c1 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank1]: frame #8: c10d::ProcessGroupNCCL::broadcastUniqueNCCLID(ncclUniqueId*, bool, std::string const&, int) + 0xaf (0x75cffc18ef6f in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
[rank1]: frame #9: c10d::ProcessGroupNCCL::getNCCLComm(std::string const&, c10::Device&, c10d::OpType, int, bool) + 0x114c (0x75cffc19ad4c in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
[rank1]: frame #10: + 0x11a31af (0x75cffc1a31af in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
[rank1]: frame #11: c10d::ProcessGroupNCCL::allreduce_impl(at::Tensor&, c10d::AllreduceOptions const&) + 0x10 (0x75cffc1a45e0 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
[rank1]: frame #12: c10d::ProcessGroupNCCL::barrier(c10d::BarrierOptions const&) + 0x69c (0x75cffc1b16ec in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
[rank1]: frame #13: + 0x5cb2ff2 (0x75d0358b2ff2 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank1]: frame #14: + 0x5cbd7f5 (0x75d0358bd7f5 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank1]: frame #15: + 0x52dfa0b (0x75d034edfa0b in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank1]: frame #16: + 0x52dd284 (0x75d034edd284 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank1]: frame #17: + 0x1adf2b8 (0x75d0316df2b8 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank1]: frame #18: + 0x5cc7764 (0x75d0358c7764 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank1]: frame #19: + 0x5cc84f5 (0x75d0358c84f5 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
[rank1]: frame #20: + 0xdb0b68 (0x75d048db0b68 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
[rank1]: frame #21: + 0x4b00e4 (0x75d0484b00e4 in /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
[rank1]: frame #22: + 0x15adae (0x59ca5a26ddae in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #23: _PyObject_MakeTpCall + 0x25b (0x59ca5a26452b in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #24: + 0x16952b (0x59ca5a27c52b in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #25: _PyEval_EvalFrameDefault + 0x19b6 (0x59ca5a257c16 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #26: _PyFunction_Vectorcall + 0x7c (0x59ca5a26e6ac in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #27: _PyEval_EvalFrameDefault + 0x2a49 (0x59ca5a258ca9 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #28: _PyFunction_Vectorcall + 0x7c (0x59ca5a26e6ac in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #29: _PyEval_EvalFrameDefault + 0x64e2 (0x59ca5a25c742 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #30: + 0x169251 (0x59ca5a27c251 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #31: _PyEval_EvalFrameDefault + 0x64e2 (0x59ca5a25c742 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #32: _PyFunction_Vectorcall + 0x7c (0x59ca5a26e6ac in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #33: _PyEval_EvalFrameDefault + 0x6d5 (0x59ca5a256935 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #34: _PyFunction_Vectorcall + 0x7c (0x59ca5a26e6ac in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #35: _PyEval_EvalFrameDefault + 0x8cb (0x59ca5a256b2b in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #36: _PyFunction_Vectorcall + 0x7c (0x59ca5a26e6ac in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #37: _PyEval_EvalFrameDefault + 0x8cb (0x59ca5a256b2b in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #38: + 0x140096 (0x59ca5a253096 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #39: PyEval_EvalCode + 0x86 (0x59ca5a348f66 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #40: + 0x260e98 (0x59ca5a373e98 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #41: + 0x25a79b (0x59ca5a36d79b in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #42: + 0x260be5 (0x59ca5a373be5 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #43: _PyRun_SimpleFileObject + 0x1a8 (0x59ca5a3730c8 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #44: _PyRun_AnyFileObject + 0x43 (0x59ca5a372d13 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #45: Py_RunMain + 0x2be (0x59ca5a36570e in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #46: Py_BytesMain + 0x2d (0x59ca5a33bdfd in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: frame #47: + 0x29d90 (0x75d04b629d90 in /lib/x86_64-linux-gnu/libc.so.6)
[rank1]: frame #48: __libc_start_main + 0x80 (0x75d04b629e40 in /lib/x86_64-linux-gnu/libc.so.6)
[rank1]: frame #49: _start + 0x25 (0x59ca5a33bcf5 in /home/Ubuntu/apps/kohya_ss/venv/bin/python)
[rank1]: . This may indicate a possible application crash on rank 0 or a network set up issue.
54%|███████████████████████████████▋ | 1342/2500 [10:01<08:33, 2.26it/s]
W0904 20:56:22.573000 127281695294592 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 17387 closing signal SIGTERM
W0904 20:56:22.573000 127281695294592 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 17389 closing signal SIGTERM
W0904 20:56:22.574000 127281695294592 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 17390 closing signal SIGTERM
E0904 20:56:23.539000 127281695294592 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 1 (pid: 17388) of binary: /home/Ubuntu/apps/kohya_ss/venv/bin/python
Traceback (most recent call last):
  File "/home/Ubuntu/apps/kohya_ss/venv/bin/accelerate", line 8, in
    sys.exit(main())
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
    multi_gpu_launcher(args)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 734, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train_network.py FAILED
------------------------------------------------------------
Failures:
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time : 2024-09-04_20:56:22
  host : 0185-dsm-prxmx30120
  rank : 1 (local_rank: 1)
  exitcode : 1 (pid: 17388)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
20:56:24-239719 INFO Training has ended.