Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to Validate huge data using LazyCollection Laravel

I'm trying to validate huge amount of data using Laravel LazyCollection, I test the code with 15 thousands rows each contains 9 columns to be validated.

The scenario is user upload the excel file, then convert it to array, after that the validation of data begins

The Controller :

class ImportBudget extends Controller
{
    use SpreadsheetTrait;

    public function import(Request $request) 
    {
        // File format validation
        $validatedFile = SpreadsheetTrait::fileHandler($request);
        
        if (!is_array($validatedFile)) return $validatedFile;

        $messages = [
            'required' => 'This field is required, please input.',
            'numeric' => 'Please input number value only.',
            'integer' => 'Please input integer value only.',
            'min' => 'Minimal input value is :min.',
            'required_if' => 'This field is required, please input.',
        ];
        
        // Cars data which contain 30k++ of arrays
        $car = Cache::get('car')->pluck('slug')->toArray();

        // Start Data Validation
        $validatedData = LazyCollection::make(function () use($validatedFile) {
            $data = collect($validatedFile);
            yield $data;
        })->chunk(1000)->each(function ($rows) use ($car, $messages) {
            return Validator::make($rows->toArray(), [
                '*.0' => ['required', function ($attribute, $value, $fail) {
                    if (!in_array($value , config('constants.purposes'))) {
                        $fail('The purpose field is invalid.');
                    }
                }],
                '*.1' => 'required_if:*.0,PRODUCTION-PROJECT',
                '*.2' => 'required',
                '*.3' => 'required',
                '*.4' => 'required',
                '*.5' => 'required',
                '*.6' => 'required',
                '*.7' => ['required', function ($attribute, $value, $fail) use($car) {
                    if (!in_array($value, $curr)) {
                        $fail('The car is invalid.');
                    }
                }],
                '*.8' => 'required|numeric|min:0',
                '*.9' => 'required|integer|min:1',
            ], $messages);
        });
    }
}

The code above resulting error max executing times :

{
    "message": "Maximum execution time of 60 seconds exceeded",
    "exception": "Symfony\\Component\\ErrorHandler\\Error\\FatalError",
    "file": "C:\\laragon\\www\\bulus-jai\\vendor\\laravel\\framework\\src\\Illuminate\\Collections\\Arr.php",
    "line": 115,
    "trace": []
}

Even i add the execution time to 120, the result remain the same

Note that the $car variable contains 30k++ arrays, i think this also make this validation slower aswell, but i don't have any idea how to make it more simpler.

What is the best solution for this ?

Update 1

I tried to switch using my own validation script by create the services , and the result is pretty good (around 5 ~ 10 seconds for 15k rows) :


class BatchValidationServices {

    public static function budgetValidation($validatedFile)
    {
        $requiredFields = [
            0 => true,
            1 => false,
            2 => true,
            3 => true,
            4 => true,
            5 => true,
            6 => true,
            7 => true,
            8 => true,
            9 => true,
            10 => false
        ];

        $curr = collect(Cache::get('curr'))
            ->where('term.status','BUDGETING')
            ->pluck('name')
            ->toArray();
        $item = Cache::get('item')->pluck('item_code')->toArray();
        $car = Cache::get('car')->pluck('slug')->toArray();
        $deliveryPlan = Cache::get('delivery');
        $orderPlan = Cache::get('order');

        $collectedData = LazyCollection::make(function () use($validatedFile) {
            $data = collect($validatedFile);
            yield $data;
        }); 

        $errors = [];

        $collectedData->chunk(1000)
            ->each(function ($collection)  use (
                $orderPlan,
                $deliveryPlan,
                $car,
                $curr,
                $item,
                $requiredFields, &$errors){
                foreach ($collection->toArray() as $array) {
                    foreach ($array as $key => $row) {
    
                        // Validate blank rows
                        for ($i=0; $i < count($requiredFields); $i++) {
                            if ($row[$i] === null &&
                            $requiredFields[$i] === true) {
                                array_push($errors, [$key.'.'.$i => 'This field is required.']);
                            }
                        }
    
                        // Validate purpose validity
                        if (!in_array($row[0], config('constants.purposes'))) array_push($errors, [$key.'.0' => 'Purpose is invalid.']);
    
                        // Validate required preparation item
                        $preparationItems = array_column(config('constants.preparations'), 'item');
    
                        if ($row[0] === 'PRODUCTION-PROJECT' && $row[1] === null) {
                            array_push($errors, [$key.'.1' => 'This field is required if the purpose is PRODUCTION-PROJECT.']);  
                        } elseif ($row[0] === 'PRODUCTION-PROJECT' && $row[1] !== null) {
                            if (!in_array($row[1], $preparationItems)) array_push($errors, [$key.'.1' => 'Production preparation item is invalid.']);
                        }
    
                        // Validate order plan & delivery plan 
                        if (!in_array($row[2], $orderPlan)) array_push($errors, [$key.'.2' => 'Order plan is invalid.']);
                        
                        if (!in_array($row[3], $deliveryPlan)) {
                            array_push($errors, [$key.'.3' => 'Delivery plan is invalid.']);
                        } else {
                            if ($row[3] < $row[2]) array_push($errors, [$key.'.3' => 'Delivery plan should be after or at least in the same period as order plan.']);
                        }
                        
                        // Validate destination-carline
                        if (!in_array($row[4], $car)) array_push($errors, [$key.'.4' => 'Destination-carline is invalid.']);

                        // Validate Origin
                        if (!in_array($row[5], ['DOMESTIC', 'IMPORT'])) array_push($errors, [$key.'.5' => 'Origin supplier is invalid, please choose between DOMESTIC or IMPORT only.']);

                        // Validate Item
                        if(!in_array($row[6], $item)) array_push($errors, [$key.'.6' => 'Item code is invalid.']);

                        // Validate Currency
                        if (!in_array($row[7], $curr)) {
                            array_push($errors, [$key.'.7' => 'Currency is invalid.']);
                        } else {
                            if ($row[5] === 'IMPORT' && $row[7] === 'IDR') array_push($errors, [$key.'.7' => 'IDR currency shouldn\'t be used for IMPORT.']);
                        }

                        // Validate Price
                        if (!is_numeric($row[8])) {
                            array_push($errors, [$key.'.8' => 'Please only input numerical value.']);
                        } else {
                            if ($row[8] <= 0) array_push($errors, [$key.'.8' => 'Value must be greater than 0.']);
                        }

                        // Validate Qty
                        if (!is_integer($row[9])) {
                            array_push($errors, [$key.'.9' => 'Please only input numerical value.']);
                        } else {
                            if ($row[9] <= 0) array_push($errors, [$key.'.9' => 'Value must be greater than 0.']);
                        }
                    }
                }

                return $errors;
            });

        if (count($errors) > 0) {
            return $errors;
        } else {
            return true;
        }
    }
}

However I am still wondering why, when i use the built in Laravel validation, it's taking so long? I'd prefer using the Laravel validation because the code is more readable.

like image 390
owf Avatar asked Sep 05 '25 03:09

owf


1 Answers

Since you have already loaded the entire contents of the spreadsheet into the $validatedFile variable, why make a LazyCollection object? Their only purpose is to save memory by not loading large data sets into memory. Your validation rules using closures can also be cleaned up. This isn't just a cosmetic change: in_array() is notoriously slow.

class ImportBudget extends Controller
{
    use SpreadsheetTrait;

    public function import(Request $request) 
    {
        // File format validation
        $validatedFile = SpreadsheetTrait::fileHandler($request);
        
        if (!is_array($validatedFile)) {
            // this should be throwing an exception of some kind
            return $validatedFile;
        }

        $purposes = config('constants.purposes');

        // Cars data which contain 30k++ of arrays
        $car = Cache::get('car')->pluck('slug');

        $rules = [
            '*.0' => ['required', Rule::in($purposes)],
            '*.1' => ['required_if:*.0,PRODUCTION-PROJECT'],
            '*.2' => ['required'],
            '*.3' => ['required'],
            '*.4' => ['required'],
            '*.5' => ['required'],
            '*.6' => ['required'],
            '*.7' => ['required', Rule::in($car)],
            '*.8' => ['required', 'numeric', 'min:0'],
            '*.9' => ['required', 'integer', 'min:1'],
        ];

        $messages = [
            'required' => 'This field is required, please input.',
            'numeric' => 'Please input number value only.',
            'integer' => 'Please input integer value only.',
            'min' => 'Minimal input value is :min.',
            'required_if' => 'This field is required, please input.',
        ];

        // Start Data Validation
        $validatedData = Validator::make($validatedFile, $rules, $messages));
    }
}

If slug is guaranteed to be unique, speed could be improved by using it as the index of the array:

$car = Cache::get('car')->pluck('id', 'slug');

Then your validation rule becomes a super quick closure that only has to check for the existence of the key:

'*.7' => ['required', fn ($k, $v, $f) => $car[$v] ?? $f("The car in $k is invalid")],
like image 150
miken32 Avatar answered Sep 08 '25 01:09

miken32