I am converting some code which originally used JPEGs as the input to use Matlab MAT files. The code contains the lines:
train_dataset = tf.data.Dataset.list_files(PATH + 'train/*.mat')
train_dataset = train_dataset.shuffle(BUFFER_SIZE) 
train_dataset = train_dataset.map(load_image_train)
If I loop through the dataset and print() each element before map(), I get a set of tensors with the file paths visible.
However, within the load_image_train function this is not the case; the output of print() is:
Tensor("add:0", shape=(), dtype=string)
I would like to use the scipy.io.loadmat() function to get the data from my mat files but it fails because the path is a tensor and not a string. What does dataset.map() do that appears to make the literal string value no longer visible? How do I extract the string so I can use it as input for scipy.io.loadmat()?
Apologies if this is a stupid question; I am relatively new to TensorFlow and still trying to understand it. A lot of the discussion I can find of related issues only applies to TF v1. Thank you for any help!
In the code below, I am using tf.data.Dataset.list_files to read the file path of an image. In the map function I load the image and apply central_crop, which keeps the central part of the image for a given fraction; here the fraction is chosen by np.random.uniform(0.50, 1.00).
As you rightly mentioned, it is difficult to read the file because the file path is of tf.string type, while load_img or any other function that reads the image file requires a plain Python string.
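To see why the value is not visible, here is a minimal sketch (the file names and the helper name inspect are made up for illustration). Dataset.map traces the mapped function into a graph, so inside it the argument is a symbolic tensor with no concrete value, which is why print() shows something like Tensor("add:0", shape=(), dtype=string).

```python
import tensorflow as tf

seen = []  # records whether the mapped function ran eagerly

def inspect(path):
    # Dataset.map traces this function into a graph, so `path` is a
    # symbolic placeholder here rather than a concrete string value.
    seen.append(tf.executing_eagerly())
    return path

# Hypothetical file names, used only as string elements.
ds = tf.data.Dataset.from_tensor_slices(["a.mat", "b.mat"])
mapped = ds.map(inspect)

# Iterating the dataset outside map() yields concrete (eager) tensors again.
values = [p.numpy().decode("utf-8") for p in mapped]
print(seen, values)
```

Outside map(), each element is an eager tensor, so .numpy() works there; inside map() it does not.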
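So here is how you can do it:
First, wrap the loading function with tf.py_function(load_file_and_process, [x], [tf.float32]), so that it runs eagerly when the dataset is iterated. The tf.py_function documentation has more details.
Then, inside that function, retrieve the Python string from the tf.string tensor using bytes.decode(path.numpy()).
Below is the complete code for your reference. Replace the image path with your own when you run it.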
# %tensorflow_version 2.x  (Colab-only magic; uncomment when running in Colab)
import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt
from tensorflow.keras.preprocessing.image import array_to_img, img_to_array, load_img

def load_file_and_process(path):
    # Runs eagerly inside tf.py_function, so path.numpy() returns the raw bytes.
    image = load_img(bytes.decode(path.numpy()), target_size=(224, 224))
    image = img_to_array(image)
    # Keep a random central fraction (50-100%) of the image.
    image = tf.image.central_crop(image, np.random.uniform(0.50, 1.00))
    return image

train_dataset = tf.data.Dataset.list_files('/content/bird.jpg')
# Wrap the Python function so it can be called from the traced map function.
train_dataset = train_dataset.map(lambda x: tf.py_function(load_file_and_process, [x], [tf.float32]))

for f in train_dataset:  # f holds the single output requested via [tf.float32]
    for l in f:
        image = np.array(array_to_img(l))
        plt.imshow(image)
plt.show()
Output: [image: the centrally cropped picture]

Hope this answers your question. Happy Learning.
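For the MAT files in the question, the same pattern might look like the sketch below. It is only an illustration under assumptions: the variable name "image" stored in the MAT file, the 4x4 shape, and the temporary sample file are all made up so that the example runs on its own; load_image_train and the '*.mat' glob follow the question.

```python
import os
import tempfile

import numpy as np
import scipy.io
import tensorflow as tf

# Write a small MAT file so the sketch is self-contained.
# The key "image" and the 4x4 shape are hypothetical stand-ins for real data.
tmp_dir = tempfile.mkdtemp()
scipy.io.savemat(os.path.join(tmp_dir, "sample.mat"),
                 {"image": np.ones((4, 4), dtype=np.float32)})

def load_mat(path):
    # Runs eagerly inside tf.py_function, so .numpy() is available here.
    mat = scipy.io.loadmat(path.numpy().decode("utf-8"))
    return mat["image"].astype(np.float32)

def load_image_train(path):
    data = tf.py_function(load_mat, [path], tf.float32)
    data.set_shape([4, 4])  # py_function does not carry static shape info
    return data

train_dataset = tf.data.Dataset.list_files(os.path.join(tmp_dir, "*.mat"))
train_dataset = train_dataset.map(load_image_train)

arrays = [a.numpy() for a in train_dataset]
print(arrays[0].shape, arrays[0].dtype)
```

Because tf.py_function returns tensors without a static shape, calling set_shape afterwards helps later steps such as batching, provided every file really has that shape.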