I’ve been using RunPod.io for a while now to learn how to build and train LoRAs, but recently I ran into an error that appears shortly after the Kohya_ss script finishes checking latents.
caching latents.
checking cache validity...
100%|██████████| 40/40 [00:00<00:00, 741.78it/s]
caching latents...
0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/PIL/ImageFile.py", line 242, in load
s = read(self.decodermaxblock)
File "/usr/local/lib/python3.10/dist-packages/PIL/PngImagePlugin.py", line 936, in load_read
cid, pos, length = self.png.read()
File "/usr/local/lib/python3.10/dist-packages/PIL/PngImagePlugin.py", line 177, in read
length = i32(s)
File "/usr/local/lib/python3.10/dist-packages/PIL/_binary.py", line 85, in i32be
return unpack_from(">I", c, o)[0]
struct.error: unpack_from requires a buffer of at least 4 bytes for unpacking 4 bytes at offset 0 (actual buffer size is 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/workspace/kohya_ss/./sdxl_train_network.py", line 189, in <module>
trainer.train(args)
File "/workspace/kohya_ss/train_network.py", line 272, in train
train_dataset_group.cache_latents(vae, args.vae_batch_size, args.cache_latents_to_disk, accelerator.is_main_process)
File "/workspace/kohya_ss/library/train_util.py", line 1917, in cache_latents
dataset.cache_latents(vae, vae_batch_size, cache_to_disk, is_main_process)
File "/workspace/kohya_ss/library/train_util.py", line 950, in cache_latents
cache_batch_latents(vae, cache_to_disk, batch, subset.flip_aug, subset.random_crop)
File "/workspace/kohya_ss/library/train_util.py", line 2235, in cache_batch_latents
image = load_image(info.absolute_path) if info.image is None else np.array(info.image, np.uint8)
File "/workspace/kohya_ss/library/train_util.py", line 2184, in load_image
img = np.array(image, np.uint8)
File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 688, in __array_interface__
new["data"] = self.tobytes()
File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 746, in tobytes
self.load()
File "/usr/local/lib/python3.10/dist-packages/PIL/ImageFile.py", line 248, in load
raise OSError("image file is truncated") from e
OSError: image file is truncated
Traceback (most recent call last):
File "/workspace/kohya_ss/venv/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
simple_launcher(args)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
This is a very strange error that took me a long time to figure out how to fix. I tried various iterations and instances of Kohya_ss on RunPod.io but consistently got this error, even in the pre-configured Stable Diffusion Kohya_ss ComfyUI pod.
As the traceback shows ("OSError: image file is truncated"), the error occurs because at least one training image is truncated or corrupted, and by default Pillow refuses to load truncated files. The workaround is to tell Pillow to load them anyway by adding two lines near the top of sdxl_train_network.py. This file is stored in the root of the Kohya_ss installation, so all you need to do is edit it and insert the two highlighted lines shown below.
import argparse
import torch

from PIL import ImageFile               # <-- added line
ImageFile.LOAD_TRUNCATED_IMAGES = True  # <-- added line: let Pillow load truncated images

try:
    import intel_extension_for_pytorch as ipex
    if torch.xpu.is_available():
        from library.ipex import ipex_init
        ipex_init()
except Exception:
    pass

from library import sdxl_model_util, sdxl_train_util, train_util
import train_network
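Allowing Pillow to load truncated images gets training going again, but it is also worth tracking down which file in your dataset is damaged so you can re-export or replace it. Here is a minimal sketch for scanning a folder of training images; the folder path and the list of extensions are assumptions, so adjust them to your own dataset.

import os
from PIL import Image, ImageFile

# Keep strict loading while scanning so truncated files still raise an error.
ImageFile.LOAD_TRUNCATED_IMAGES = False

DATASET_DIR = "/workspace/dataset"  # assumption: point this at your training image folder
EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp")

for root, _, files in os.walk(DATASET_DIR):
    for name in files:
        if not name.lower().endswith(EXTENSIONS):
            continue
        path = os.path.join(root, name)
        try:
            with Image.open(path) as img:
                img.load()  # force a full decode, not just a header read
        except OSError as e:
            print(f"Damaged image: {path} ({e})")

Any file this script flags can be re-saved from the original source or removed from the dataset before you restart training.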
Run the Kohya_ss training script or command line again and you should have success.
If you'd like to support our site, please consider buying us a Ko-fi, grabbing a product, or subscribing. Need a faster GPU? Get access to the fastest GPUs for less than $1 per hour with RunPod.io.