I am interested in using ImageDataGenerator in Keras for data augmentation. However, it requires that the training and validation directories, each with subdirectories for the classes, be fed in separately, as below (this is from the Keras documentation). I have a single directory with 2 subdirectories for 2 classes (Data/Class1 and Data/Class2). How do I randomly split this into training and validation directories?
    train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
    test_datagen = ImageDataGenerator(rescale=1./255)
    train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')
    validation_generator = test_datagen.flow_from_directory(
        'data/validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')
    model.fit_generator(
        train_generator,
        steps_per_epoch=2000,
        epochs=50,
        validation_data=validation_generator,
        validation_steps=800)
I am interested in re-running my algorithm multiple times with random training and validation data splits.
You can split the data using scikit-learn's (aka sklearn) train_test_split(), using numpy's randn() function, or with the built-in pandas method called sample().
The main idea of holding out part of the dataset as a validation set is to guard against overfitting, i.e., the case where the model becomes really good at classifying the samples in the training set but cannot generalize and make accurate classifications on data it has not seen before.
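The options listed above all come down to shuffling the samples and cutting the shuffled list at a chosen fraction. Here is a minimal sketch of that idea using only Python's standard library. It is a stand-in for those library calls, not a reproduction of any of them, and the function name `split_items` and the parameters `val_fraction` and `seed` are illustrative assumptions.

```python
import random


def split_items(items, val_fraction=0.2, seed=None):
    """Return (train, validation) lists taken from a shuffled copy of items."""
    shuffled = list(items)
    # A seeded generator makes a given split reproducible; seed=None gives
    # a different split on every call.
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    # The first n_val shuffled items become the validation set.
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names, used only to show the call.
train, val = split_items([f"img_{i}.jpg" for i in range(10)], val_fraction=0.2, seed=0)
print(len(train), len(val))
```

Passing a different `seed` on each run gives a different random split while keeping every individual run repeatable.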
Thank you guys! I was able to write my own function to create training and test data sets. Here's the code for anyone who's looking.
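import os
import shutil

import numpy as np

source1 = "/source_dir"
dest11 = "/dest_dir"

# Move roughly 20% of the files from the source directory to the destination.
for f in os.listdir(source1):
    # Each file is moved independently with probability 0.2.
    if np.random.rand() < 0.2:
        shutil.move(os.path.join(source1, f), os.path.join(dest11, f))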
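The snippet above works on a single folder and moves roughly, not exactly, 20% of its files. For the layout in the question (one folder per class) and for repeated runs with different random splits, a variant that copies files into separate train and validation trees may be more convenient. The following is only a sketch under assumptions: the function name `split_dataset`, its parameters, and the `train`/`validation` output folder names are illustrative, and it copies rather than moves so that the original data stays intact.

```python
import random
import shutil
from pathlib import Path


def split_dataset(source_dir, dest_dir, val_fraction=0.2, seed=None):
    """Copy source_dir/<class>/* into dest_dir/train and dest_dir/validation."""
    rng = random.Random(seed)
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    # Treat every subdirectory of source_dir as one class.
    for class_dir in sorted(p for p in source_dir.iterdir() if p.is_dir()):
        # Sort before shuffling so a given seed always yields the same split.
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        rng.shuffle(files)
        n_val = int(round(len(files) * val_fraction))
        for subset, subset_files in (("validation", files[:n_val]),
                                     ("train", files[n_val:])):
            target = dest_dir / subset / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in subset_files:
                shutil.copy2(f, target / f.name)
```

Calling, for example, `split_dataset("Data", "data", seed=0)` would produce `data/train` and `data/validation`, which can then be passed to `flow_from_directory` as in the question. For repeated runs, use a new seed and either a fresh destination directory or a cleaned-out one, since copies from an earlier split are not removed.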