Occasionally one would like to revalidate only a certain subset of entries. For example, if the process crashed around entry 10 000 of 20 000, the following request should revalidate only entries 10 000 -> 20 000, skipping the first 10 000.
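A minimal sketch of what range-based revalidation could look like. The `Entry` type, the `from`/`to` parameters and the `revalidateRange` name are assumptions for illustration, not the real API:

```typescript
// Hypothetical range-based revalidation helper.
type Entry = { id: string };

async function revalidateRange(
  entries: Entry[],
  from = 0,              // inclusive start index, e.g. 10000 after a crash
  to = entries.length,   // exclusive end index, defaults to the full list
): Promise<number> {
  const subset = entries.slice(from, to);
  for (const entry of subset) {
    // await revalidate(entry); // the actual revalidation call would go here
  }
  return subset.length;  // number of entries actually processed
}
```

The `from`/`to` values could be taken from optional query parameters on the existing endpoint, defaulting to the full range when omitted.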
Currently, batch sizes are defined per dictionary. While fine for most cases, changing a batch size requires a code change and a redeploy of the API.
Let's allow an override in the controller endpoint, e.g. an optional batchSize=200 query parameter, which would then be fed to the revalidate method. It should still default to the original config values when not provided.
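A sketch of how the optional override could be parsed, assuming an Express-style controller; `DEFAULT_BATCH_SIZE`, `parseBatchSize` and the route shape are all placeholders, not the actual code:

```typescript
// Hypothetical default; the real value lives in the per-dictionary config.
const DEFAULT_BATCH_SIZE = 250;

// Accepts the raw query-string value; falls back to the configured
// default when the parameter is absent or not a positive integer.
function parseBatchSize(
  raw: string | undefined,
  fallback: number = DEFAULT_BATCH_SIZE,
): number {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Sketch of the controller wiring (names assumed):
// app.post("/revalidate", (req, res) => {
//   const batchSize = parseBatchSize(req.query.batchSize as string | undefined);
//   revalidate({ batchSize });
// });
```

Validating the parameter server-side keeps a typo like batchSize=abc from silently breaking the batching.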
Currently, logs are only output in a way that can be observed from render.com. They could also be exposed via a simple log endpoint to avoid the roundtrip to Render. For example:
Move logging to a utility that preserves the current behavior but also (temporarily) persists the logs.
The persistence mechanism can be as simple as in-memory storage. It will be lost once the instance spins down, but that does not matter.
Expose that in-memory dump via the endpoint, preferably sorted latest first.
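The steps above could be sketched roughly like this; the names (`log`, `logStore`, `latestLogs`) and the entry cap are assumptions:

```typescript
type LogEntry = { timestamp: number; message: string };

const MAX_ENTRIES = 1000;        // cap so memory stays bounded
const logStore: LogEntry[] = [];

// Drop-in replacement for console.log: keeps the current behavior
// (visible in render.com's log view) and also persists in memory.
function log(message: string): void {
  console.log(message);
  logStore.push({ timestamp: Date.now(), message });
  if (logStore.length > MAX_ENTRIES) logStore.shift(); // discard oldest
}

// Latest-first dump, suitable for a simple GET /logs endpoint.
function latestLogs(limit = 100): LogEntry[] {
  return [...logStore].reverse().slice(0, limit);
}
```

Since the store is per-instance and capped, it needs no cleanup job; losing it on restart is acceptable per the note above.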
Currently, the same batch size of 250 entries is used for all dictionaries. However, the time it takes to revalidate the entries can vary quite a bit based on the size of the dataset being processed. For example, the Old Icelandic dictionary is quite fast, with few and concise entries, whereas the Old Swedish dictionary is both larger and generally has larger entries.
Add batch size to the per-dictionary setup. Preliminary values could be:
Old Norse: 250
Old Icelandic: 300
Old Swedish: 200
Old Norwegian: 200
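The per-dictionary setup could carry the preliminary values above in a simple lookup; the config shape, the dictionary keys, and `getBatchSize` are assumptions about how the existing setup might be extended:

```typescript
// Hypothetical per-dictionary batch sizes (preliminary values from above).
const BATCH_SIZES: Record<string, number> = {
  "old-norse": 250,
  "old-icelandic": 300,
  "old-swedish": 200,
  "old-norwegian": 200,
};

// Unknown dictionaries fall back to the current global default of 250.
function getBatchSize(dictionary: string, fallback = 250): number {
  return BATCH_SIZES[dictionary] ?? fallback;
}
```

Keeping the fallback means adding a new dictionary without a tuned value still works with the old global behavior.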
Something seems to have changed in Vercel, as more and more batches are timing out. Batches of up to 250 entries used to take a matter of a few seconds; now they can take more than 10 seconds (the timeout).
As this is not exactly time-critical, decrease the batch sizes even further. For example, 100 seems to work fine for Old Swedish.