Moving Files in AI Datasets

Updated February 8, 2021

Some AI datasets are organized into directories, with each directory representing a class. A single directory can contain a large number of files. Because of time and memory constraints, you may want to train your model on a smaller number of files.

We detail how to move the first n files from a directory into a new directory.
This is done for each directory (class) in the dataset.

We also show how to get a random sample of n items from a directory.

Example — The keywords Dataset

The dataset “keywords” can be downloaded from:

I created a folder, “Dataset”, on my desktop and put the downloaded folder, “keywords2”, into it.

The dataset contains:

  • yes: 1,500 one-second samples, each containing only the word “yes”.
  • no: 1,500 one-second samples, each containing only the word “no”.
  • unknown: 1,504 one-second samples of other words.
  • noise: 1,500 one-second samples of background or static noise.

The next step is to create destination directories for the smaller number of files you want.

Create Destination Directories
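A minimal sketch of this step, assuming the commands are run from inside the keywords2 folder and that each destination is named after its class plus the sample size (yes400, no400, and so on, the names used by the sampling commands later in this post):

```shell
# Assumption: the current directory is keywords2.
# The <class>400 names are an assumption based on the sampling commands.
for class in yes no unknown noise; do
  mkdir -p "${class}400"   # -p: no error if the directory already exists
done
```

Because of the -p flag, running the loop a second time is harmless.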

Moving the Files

The above zsh mv code was found at

Thanks to:
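For shells other than zsh, the same idea can be sketched with a plain loop. This is not the zsh command referred to above but a portable alternative. The function name move_first_n, the value n = 400, and the <class>400 destination names are all assumptions made for illustration.

```shell
# Hypothetical helper: move the first n .wav files from src into dest.
# "First" means the shell's glob order, which is sorted by file name.
move_first_n() {
  src=$1
  dest=$2
  n=$3
  mkdir -p "$dest"
  count=0
  for f in "$src"/*.wav; do
    [ -e "$f" ] || continue            # the glob matched nothing
    [ "$count" -lt "$n" ] || break     # stop once n files have been moved
    mv -- "$f" "$dest"/
    count=$((count + 1))
  done
}

# Assumed layout: run from inside keywords2. Classes that are not
# present in the current directory are skipped.
for class in yes no unknown noise; do
  if [ -d "$class" ]; then
    move_first_n "$class" "${class}400" 400
  fi
done
```

Note that mv removes the files from the original class directory. Use cp in place of mv if the full dataset should stay intact.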

Random Sampling From a Directory

Start by creating the destination directories.

Get a Random Sample of Files — Copy to New Directory

ronald@Ronalds-MBP keywords2 % ls yes/*.wav | sort -R | tail -400 | xargs -J{} cp {} yes400
ronald@Ronalds-MBP keywords2 % ls no/*.wav | sort -R | tail -400 | xargs -J{} cp {} no400
ronald@Ronalds-MBP keywords2 % ls unknown/*.wav | sort -R | tail -400 | xargs -J{} cp {} unknown400
ronald@Ronalds-MBP keywords2 % ls noise/*.wav | sort -R | tail -400 | xargs -J{} cp {} noise400
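The four commands above differ only in the class name, so they could be folded into a loop. The sketch below is a variant of those commands, not the original code: the function name sample_n is hypothetical, it takes the first n shuffled names with head rather than the last n with tail, and it copies with a read loop instead of xargs so that file names containing spaces are handled.

```shell
# Hypothetical helper: copy a random sample of n .wav files from src to dest.
# Assumes file names contain no newlines and that sort supports -R
# (random order), as the macOS and GNU versions do.
sample_n() {
  src=$1
  dest=$2
  n=$3
  mkdir -p "$dest"
  ls "$src"/*.wav | sort -R | head -n "$n" | while IFS= read -r f; do
    cp -- "$f" "$dest"/
  done
}

# Assumed layout: run from inside keywords2. Classes that are not
# present in the current directory are skipped.
for class in yes no unknown noise; do
  if [ -d "$class" ]; then
    sample_n "$class" "${class}400" 400
  fi
done
```

Since the files are copied rather than moved, the original class directories keep all of their samples.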

Thanks to Naveen at Edge Impulse for the above code:

The above code uses the xargs utility, which builds and executes commands from standard input. For more information see:
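As a small illustration of that behavior, the sketch below feeds three made-up file names to xargs and uses echo as the command, so the command line that xargs builds is printed instead of executed.

```shell
# xargs reads items from standard input and appends them to the command.
# The file names here are hypothetical; echo only prints the result.
printf '%s\n' a.wav b.wav c.wav | xargs echo cp
# prints: cp a.wav b.wav c.wav
```

By default the items are appended at the end, which is a problem for cp because the destination directory has to come last. That is why the sampling commands use -J{}: in the macOS (BSD) version of xargs, this option inserts the items where {} appears, so the destination can follow them. GNU xargs, as found on most Linux systems, does not have the -J option.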
