I’ve been using RunPod.io for a while now to learn LoRA training, but recently I hit an error that appears shortly after the Kohya_ss script finishes checking latents.

caching latents.
checking cache validity...
100%|██████████| 40/40 [00:00<00:00, 741.78it/s]
caching latents...
  0%|          | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/PIL/ImageFile.py", line 242, in load
    s = read(self.decodermaxblock)
  File "/usr/local/lib/python3.10/dist-packages/PIL/PngImagePlugin.py", line 936, in load_read
    cid, pos, length = self.png.read()
  File "/usr/local/lib/python3.10/dist-packages/PIL/PngImagePlugin.py", line 177, in read
    length = i32(s)
  File "/usr/local/lib/python3.10/dist-packages/PIL/_binary.py", line 85, in i32be
    return unpack_from(">I", c, o)[0]
struct.error: unpack_from requires a buffer of at least 4 bytes for unpacking 4 bytes at offset 0 (actual buffer size is 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/kohya_ss/./sdxl_train_network.py", line 189, in <module>
  File "/workspace/kohya_ss/train_network.py", line 272, in train
    train_dataset_group.cache_latents(vae, args.vae_batch_size, args.cache_latents_to_disk, accelerator.is_main_process)
  File "/workspace/kohya_ss/library/train_util.py", line 1917, in cache_latents
    dataset.cache_latents(vae, vae_batch_size, cache_to_disk, is_main_process)
  File "/workspace/kohya_ss/library/train_util.py", line 950, in cache_latents
    cache_batch_latents(vae, cache_to_disk, batch, subset.flip_aug, subset.random_crop)
  File "/workspace/kohya_ss/library/train_util.py", line 2235, in cache_batch_latents
    image = load_image(info.absolute_path) if info.image is None else np.array(info.image, np.uint8)
  File "/workspace/kohya_ss/library/train_util.py", line 2184, in load_image
    img = np.array(image, np.uint8)
  File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 688, in __array_interface__
    new["data"] = self.tobytes()
  File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 746, in tobytes
  File "/usr/local/lib/python3.10/dist-packages/PIL/ImageFile.py", line 248, in load
    raise OSError("image file is truncated") from e
OSError: image file is truncated
Traceback (most recent call last):
  File "/workspace/kohya_ss/venv/bin/accelerate", line 8, in <module>
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

This is a very strange error that took me a long time to figure out how to fix. I tried various iterations and instances of Kohya_SS on RunPod.io but consistently got this error, even in the pre-configured "Stable Diffusion Kohya_ss ComfyUI" pod.

The fix is to edit the sdxl_train_network.py file, which needs two lines added near the top. The file is stored in the root of the Kohya_ss installation, so all you need to do is open it and insert the two lines highlighted below.

import argparse

import torch

from PIL import ImageFile               # <-- add this line
ImageFile.LOAD_TRUNCATED_IMAGES = True  # <-- add this line

try:
    import intel_extension_for_pytorch as ipex

    if torch.xpu.is_available():
        from library.ipex import ipex_init

        ipex_init()
except Exception:
    pass
from library import sdxl_model_util, sdxl_train_util, train_util

import train_network
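The two added lines work because Pillow, by default, raises the OSError above whenever an image decoder runs out of data; setting ImageFile.LOAD_TRUNCATED_IMAGES = True tells it to pad the missing region and carry on. Here is a self-contained sketch that reproduces the error condition with a deliberately truncated PNG and shows the flag suppressing it:

```python
import io
from PIL import Image, ImageFile

# Build a small PNG in memory, then cut it in half to mimic a corrupted upload.
buf = io.BytesIO()
Image.new("RGB", (64, 64), "red").save(buf, format="PNG")
truncated = io.BytesIO(buf.getvalue()[: buf.tell() // 2])

# Without this flag, img.load() raises "OSError: image file is truncated".
ImageFile.LOAD_TRUNCATED_IMAGES = True

img = Image.open(truncated)
img.load()       # succeeds: Pillow pads the undecoded pixels instead of raising
print(img.size)  # (64, 64) -- the header survived, so the dimensions are intact
```

Note that the flag hides damage rather than repairing it, so the padded part of the image trains as blank pixels; replacing the broken file is still the cleaner option if you can find it.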

Re-run the Kohya_ss training script (or your command line) and it should now complete successfully.