Let's start by addressing the elephant in the room: ideally, you shouldn't be posting thousands of rows in a single request. You should instead send many smaller batches, or upload a file somewhere to be processed in chunks with a queue.
But let's be real: we don't live or work in an ideal world, and sometimes you just need an unreasonably large HTTP request full of data to work, and to be validated in a reasonable amount of time (what counts as reasonable is debatable).
I love Laravel, I have always loved Laravel. I'm in the midst of migrating a large API from a JS backend to Laravel, changing as little as possible about the API in the process, because the application that consumes it cannot be rebuilt at this time. Thus, queues and the like for processing the data posted to the endpoint in question are off the table.
Now, there is already validation happening client-side on this data, so the naive approach might be to say that's sufficient and just assume good data has been submitted, but it's never a good plan to trust user input, so that's off the table too – and I won't comment on whether or not that was how it used to be... 😉
What we're working with is a bit of general info for a top-level object, and then an array of items consisting mostly of a location name and address parts. These records will be attached as children to the top level object, for someone to work through and manually process later within the application.
For the most part, our validation rules are little more than required or nullable and string, though there are also a couple of date fields setting a start and end date, which have to adhere to rules around their expected format, ensure the start comes before the end, and confirm that both dates fit within an allowable window. But still, nothing too crazy.
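For context, that window is just a pair of boundary dates computed ahead of time. The $soonest and $oldest variables the rules below reference hold today's date and the start of the month one year ago, respectively (the same values the Data object at the end of this post computes):

// Allowable date window: nothing later than today,
// nothing earlier than the start of the month one year ago.
$soonest = now()->format('Y-m-d');
$oldest = now()->subYear()->startOfMonth()->format('Y-m-d');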
Here are the validation rules I started with in my form request, formatted for clarity and using string notation instead of arrays for brevity:
return [
    'name' => 'required',
    'organization_id' => 'required|exists:organizations,id',
    'tags' => 'array',
    'tags.*.name' => 'required',
    'items' => 'required|array|min:1|max:5000',
    'items.*.name' => 'required|string',
    'items.*.notes' => 'nullable|string',
    'items.*.start_date' => 'required|date_format:Y-m-d|before_or_equal:items.*.end_date|'.
        'before_or_equal:'.$soonest.'|after_or_equal:'.$oldest,
    'items.*.end_date' => 'required|date_format:Y-m-d|after_or_equal:items.*.start_date|'.
        'before_or_equal:'.$soonest.'|after_or_equal:'.$oldest,
    'items.*.street_address' => 'required|string',
    'items.*.city' => 'required|string',
    'items.*.state' => 'required|string',
    'items.*.zip' => 'required|string|min:5|max:10',
    'items.*.tags' => 'array',
    'items.*.tags.*' => 'string',
];
Of course, my first thought was that maybe the date logic was slowing things down, so I tried removing that first... but no joy. On small datasets it does alright, but once you start approaching the max allowed 5,000 items it will consistently time out. If I manually override the max execution time
// Uncap the execution time, or even just set a large duration
ini_set('max_execution_time', 0);
it will complete, but in over a minute on my new M4 Pro Mac mini, and several minutes on my M2 MacBook Air... too slow in any case. Especially when, if we simply omit all of the per-row item validation (items.*.whatever) and just ensure that items is required and is an array of appropriate size, the full request lifecycle is only about 250ms from the time I submit a request with 5,000 items in the payload until it creates all those records and returns a successful response.
Yikes!
250ish milliseconds, vs – best case – 1 minute.
But, as we've already covered, we obviously don't want to just assume the user input is valid and safe, even though we're doing client-side validation in the application that talks to this API. Eventually we may have outside consumption of the API, or someone could manually post to it, assuming they have or can find proper authentication and whatnot. In any case... we need to solve this.
There are several issues on GitHub and elsewhere referencing this problem (which has been around for many years), wherein validation using the asterisk to apply rules to array items is sloooow... and while PRs are welcome, sadly the position of the core team thus far has simply been along the lines of "well, you shouldn't be doing that sort of thing." Which feels like the wrong response when we're talking about a framework that is all about developer experience, etc. But that's a potentially ranty tangent we need not go down.
The good news is - I've found a solution that is gonna work for me!
Thanks to the good folks at Spatie, we have Laravel-Data available, which can be used in place of Form Request Objects, and it supports validation of nested items. So naturally that would be one's first inclination (or maybe just mine) as the next thing to try.
So I replaced my FormRequest object with a Laravel-Data object, added my first chunk of rules, then created another Data object for the items in my items array and applied the relevant rules to that.
Defining items as an array, and noting in the PHPDoc block that it should be an array of my Item Data objects, got validation working, and there was indeed some performance improvement, but not enough: it came down from about a minute to 35-45 seconds.
We'll call it maybe a 40% improvement. That's not bad, but I wondered whether I could do better, because that was still pretty snail-like, and after all, the unvalidated speed was 250ms and we're still way too far off that mark to quit.
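For reference, that intermediate attempt looked roughly like this. The top-level class name and property types here are illustrative rather than my exact code, and the item class is the same one shown in full at the end of this post:

use Spatie\LaravelData\Data;

// Illustrative top-level Data object standing in for the old FormRequest.
class OrderData extends Data
{
    public function __construct(
        public string $name,
        public int $organization_id,
        public ?array $tags, // nested tag rules omitted for brevity
        /** @var array<int, ValidOrderItemData> */
        public array $items,
    ) {}
}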
So, I reverted back to my prior Form Request object, which I already knew performed well if I only validated the top-level stuff, plus a basic check that items was a required array between 1 and 5,000 rows long.
Leaving my validation as just that in the form request object, I added one single line to the start of the method these requests get routed to:
$items = collect($request->validated('items'))
    ->map(
        fn ($item) => ValidOrderItemData::factory()
            ->alwaysValidate()
            ->from($item)
    );
So I'm grabbing a collection from the validated items (we know it's an array of appropriate size), then mapping each one into a Data instance with validation enforced, so the rules are applied to each individual item instead of as array validation across the whole set.
The end result is that the same set of 5,000 records that previously took a minute or more, very probably simply timing out the request in most cases – even if you actually have the ability to override max execution time – now completes in about 2 seconds.
We're still not winning any major speed awards, but we do have validated data, and appropriate validation error message responses... did I forget to mention that if any of those objects fails to instantiate, it'll return a 422 response with validation errors? Anyway, we're validating the user input, and doing so in what I'll consider a reasonable response time given that, as discussed at the beginning, this is definitely a sub-optimal way to ingest this data.
Also of note, in practice the users uploading this data are only sending at most a couple hundred records at a time, so for them this should be plenty fast. (100 completes in about 120ms)
So, here's where we ended up. Rules in the FormRequest object now look like this:
return [
    'name' => 'required',
    'organization_id' => 'required|exists:organizations,id',
    'tags' => 'array',
    'tags.*.name' => 'required',
    'items' => 'required|array|min:1|max:5000',
];
I have the collection map from above as the first line in the invokable controller class that the endpoint is mapped to, and I have this Data object to handle validation of the individual items:
use Spatie\LaravelData\Data;

class ValidOrderItemData extends Data
{
    public function __construct(
        public string $name,
        public ?string $notes,
        public string $start_date,
        public string $end_date,
        public string $street_address,
        public string $city,
        public string $state,
        public string $zip,
        public ?array $tags,
    ) {}

    public static function rules(): array
    {
        $soonest = now()->format('Y-m-d');
        $oldest = now()->subYear()->startOfMonth()->format('Y-m-d');

        return [
            'name' => ['required', 'string'],
            'notes' => ['nullable', 'string'],
            'start_date' => ['required', 'date_format:Y-m-d', 'before_or_equal:end_date', "before_or_equal:$soonest", "after_or_equal:$oldest"],
            'end_date' => ['required', 'date_format:Y-m-d', 'after_or_equal:start_date', "before_or_equal:$soonest", "after_or_equal:$oldest"],
            'street_address' => ['required', 'string'],
            'city' => ['required', 'string'],
            'state' => ['required', 'string'],
            'zip' => ['required', 'string', 'min:5', 'max:10'],
            'tags' => ['array'],
            'tags.*' => ['string'],
        ];
    }
}
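Putting it all together, the invokable controller ends up looking roughly like this. The class and request names here (StoreOrderController, StoreOrderRequest) and the placeholder response are illustrative, not the exact production code:

use App\Data\ValidOrderItemData; // assumed location for the Data class
use Illuminate\Http\JsonResponse;

class StoreOrderController
{
    public function __invoke(StoreOrderRequest $request): JsonResponse
    {
        // The FormRequest has already validated the top-level fields and
        // confirmed that items is an array of 1 to 5,000 rows.
        $items = collect($request->validated('items'))
            ->map(
                fn ($item) => ValidOrderItemData::factory()
                    ->alwaysValidate()
                    ->from($item)
            );

        // Any row that fails its rules throws a ValidationException,
        // which Laravel turns into a 422 response with the messages.

        // ... create the parent record, attach the validated items ...

        return response()->json([], 201); // placeholder success response
    }
}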
...and just like that, I'm validating a large array of data about 30x faster than the wildcard property-name validation rules provide out of the box, and at the end of the day it's not really all that much more effort.