bug - multiple timesheet entries created

CATS-634

Related to CATS-639 and possibly CATS-637

data background

There is an assumption built into the ‘shift/timesheet entry’ relationship that it is one-to-one. Very early on, there was potential for multiple timesheet entries relating to one shift, to accommodate shifts that spanned time periods (the initial position was that wheniwork accommodated that and we needed to as well, but it was later determined that wiw did not, and that we did not need to either).

The initial schema was not explicitly changed to prevent that - meaning a ‘timesheet entry’ record points to a shift, not the other way around. Changing it at this point might be possible, but the application code itself assumes one-to-one and behaves as such.
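For reference, a minimal sketch of how the relationship presumably looks at the model level (model and column names here are assumptions for illustration): nothing in this shape stops a second timesheet entry row from pointing at the same shift.

```php
<?php

use Illuminate\Database\Eloquent\Model;

// Assumed model and column names, for illustration only.
class TimesheetEntry extends Model
{
    // Each timesheet entry points at a shift via a shift_id column...
    public function shift()
    {
        return $this->belongsTo(Shift::class);
    }
}

class Shift extends Model
{
    // ...so, schema-wise, a shift can have many timesheet entries, even
    // though the application code treats the relationship as one-to-one.
    public function timesheetEntries()
    {
        return $this->hasMany(TimesheetEntry::class);
    }
}
```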

However, there was at least one situation where a time period was created, and during its creation (which can take several seconds), another request was run to ‘refresh’ the same time period. From ‘normal’ use patterns this seems unlikely, but the audit log shows this appears to be what happened. Andy pushed back, saying ‘create new time period’ doesn’t take any time at all and that there’s no spinner overlay, but that hasn’t been my experience locally. Running locally takes ~3-4 seconds, and prod performance tends to be 2-3x that (I’d expect 10-15 seconds on prod, depending on what other requests/load are in front). The ‘worked pool’ was recently expanded, so it may be harder to even recreate the scenario as it was.

Nov 29 audit logs show time period 72 created, with timesheet entries created during the ‘create new period’ request. They also show timesheet entries relating to the same shift created 30 seconds later during a ‘refresh period’ request (for period 72), and show both requests initiated by the same user. It would be an odd sequence to do intentionally, but someone running multiple tabs plus some ‘muscle memory’ may have led to this. Running locally, and adding loads of ‘sleep’ commands in the execution loop, I was able to recreate the specific problem (multiple timesheet entries for the same shift created by two closely-timed requests).

The problem stems at least partially from the fact that the bulk of the timesheet entry ‘create or update’ processing runs inside a transaction. A concurrent request does not see timesheet entries created inside another long-running request’s open transaction, so it creates duplicates.
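A rough sketch of the pattern that allows the race (method and column names are assumptions): both requests run the ‘does an entry exist?’ check before either transaction commits, so both insert.

```php
<?php

use Illuminate\Support\Facades\DB;

// Illustrative only. $shift is a shift being processed for the period.
// Two requests running this concurrently each see no existing entry,
// because the other request's insert is still inside an uncommitted
// transaction - so both create one.
DB::transaction(function () use ($shift) {
    $exists = TimesheetEntry::where('shift_id', $shift->id)->exists();

    if (! $exists) {
        TimesheetEntry::create([
            'shift_id' => $shift->id,
            // ...other fields
        ]);
    }

    // The rest of the period's long-running processing keeps this
    // transaction open, widening the race window.
});
```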

An explicit ‘unique’ constraint across the relevant columns to prevent the duplicates was considered, but until now we weren’t sure what triggered this (and, to be sure, I’m not 100% sure even now). A unique constraint might also trigger other, non-obvious errors during processing, as the error handling, reporting and logging are not overly robust at this point.
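If we did go that route, the constraint would be something along these lines (table and column names are assumptions, newer anonymous-class migration syntax, and it is only safe to add after the duplicate rows are cleaned up):

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::table('timesheet_entries', function (Blueprint $table) {
            // Assumed columns: at most one entry per shift per time period.
            $table->unique(['shift_id', 'time_period_id']);
        });
    }

    public function down(): void
    {
        Schema::table('timesheet_entries', function (Blueprint $table) {
            $table->dropUnique(['shift_id', 'time_period_id']);
        });
    }
};
```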

impact

The incentives per shift are a ‘count’ of related shift/timesheet entry/incentive records, with an expectation of 1 or 0. This is sent to the client, and there’s a hard boolean check against that 1 or 0, but in some cases we have values of 2 or 4, which causes the incentives not to be displayed.

When multiple timesheet entries are related to a shift, decisions about whether someone has clocked in to that shift (made by looking at the timesheet entry record) do not work as expected.
The data is also recorded in the ShiftTime table, but that data is joined against the shift and timesheet entry, so duplicate entries still cause problems there.
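Roughly what the failing check looks like (a sketch; the real query, relation, and field names differ): the count comes back as 2+ for duplicated shifts, the hard 0/1 check fails, and the incentive silently disappears.

```php
// Sketch only - relation and field names are assumptions.
$shifts = Shift::withCount('incentiveTimesheetEntries')->get();

foreach ($shifts as $shift) {
    // The client does a hard check against 0/1; a duplicated shift can
    // report 2, 4, etc., so this evaluates to false and no incentive
    // is displayed.
    $hasIncentive = ($shift->incentive_timesheet_entries_count === 1);
}
```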

fixes

data fix

The first issue that needs to be addressed is deleting the duplicated records. Ideally these are hard-deleted, not soft-deleted, to ensure they’re not accidentally included in any stray reports.
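Something along these lines could do the cleanup, keeping the earliest entry per shift and hard-deleting the rest (table and column names are assumptions; dry-run it and take a backup first):

```php
<?php

use Illuminate\Support\Facades\DB;

// Assumed table/column names. Keep the lowest id per shift; the query
// builder delete below is a hard delete (it bypasses soft deletes).
$keepIds = DB::table('timesheet_entries')
    ->selectRaw('MIN(id) as id')
    ->groupBy('shift_id')
    ->pluck('id');

DB::table('timesheet_entries')
    ->whereNotIn('id', $keepIds)
    ->delete();
```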

transactions?

One option is taking the creation outside of a transaction in TimesheetService::updateTimesheetsInPeriod.

This adds around 50% to the execution time in local testing. In some cases a refresh already takes multiple minutes (this is when the update call is passed another flag to also update the payroll adjustment records, which is needed most of the time).

Removing the transaction would help reduce or eliminate the ‘concurrency issues’ (because each timesheet entry would be visible to other concurrent processes immediately), but would exacerbate the speed issue, which is already a problem.
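For illustration only (not the current code), the difference is roughly this: one long transaction around the whole loop keeps every row invisible until the end, while dropping the wrapper makes each row visible as soon as its statement commits, at the cost of per-statement commit overhead.

```php
<?php

use Illuminate\Support\Facades\DB;

// Current shape (illustrative): nothing is visible to other requests
// until the whole loop commits.
DB::transaction(function () use ($entries) {
    foreach ($entries as $entry) {
        // create-or-update processing...
    }
});

// Without the wrapper: each create/update commits (and becomes visible)
// immediately, but the extra commits are roughly the ~50% overhead seen
// in local testing.
foreach ($entries as $entry) {
    // create-or-update processing...
}
```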

queue

This points to rebuilding the process to run on a queue. This will entail some reordering of the current updatetimesheetentries loop and extracting the current code into a queueable job. The current call would then queue up all the update jobs individually, and we’ll add a final job, run towards the end of the queue processing, to notify the appropriate administrative users when the work is done. It would also be good to update the UI to display a message indicating users will need to check their email when the job is done.
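A rough shape for the per-user queueable job (class, method, and namespace names here are placeholders, not a final implementation):

```php
<?php

namespace App\Jobs;

use App\Services\TimesheetService; // assumed namespace
use Illuminate\Bus\Batchable;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

// Placeholder job: update the timesheet entries for one user in one time
// period, then that user's timesheet stats, as a single queued unit.
class UpdateTimesheetEntriesForUser implements ShouldQueue
{
    use Batchable, Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(
        public int $userId,
        public int $timePeriodId,
    ) {}

    public function handle(TimesheetService $timesheets): void
    {
        // Both method names are placeholders for the extracted body of the
        // current update loop.
        $timesheets->updateEntriesForUser($this->userId, $this->timePeriodId);
        $timesheets->updateStatsForUser($this->userId, $this->timePeriodId);
    }
}
```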

Separately, a ‘run jobs’ tool to list the start and end of a job would be useful from a ‘general status’ perspective, and something like Laravel Horizon would be more helpful from a debugging standpoint. We can look at those later.

to do

  • create a job to update the timesheet entries for a specific user, then update the timesheet stats for that user’s timesheet (one job)
  • create a ‘done’ job to queue up after the per-user jobs (see the batch sketch after this list)
  • modify the existing updatetimesheetentries() loop to dispatch those jobs
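One way the queueing and the final ‘done’ notification could fit together, using a bus batch so the notification only fires after all per-user jobs finish (a sketch; the job names are the placeholders from above, and batching needs Laravel 8+, the job_batches table, and the Batchable trait on the jobs):

```php
<?php

use App\Jobs\NotifyTimesheetUpdateComplete; // placeholder 'done' job
use App\Jobs\UpdateTimesheetEntriesForUser; // placeholder per-user job
use Illuminate\Bus\Batch;
use Illuminate\Support\Facades\Bus;

// $userIds: collection of user ids in the period; $timePeriod and $actor
// come from the invoking request - all assumptions for illustration.
$jobs = $userIds
    ->map(fn ($userId) => new UpdateTimesheetEntriesForUser($userId, $timePeriod->id))
    ->all();

Bus::batch($jobs)
    ->then(function (Batch $batch) use ($actor, $timePeriod) {
        // Final 'done' step: notify the initiating user / admins.
        NotifyTimesheetUpdateComplete::dispatch($actor, $timePeriod->id);
    })
    ->dispatch();
```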

12/3/21

Changes

  • Keep existing ‘update logic’ in place
  • The 2 methods that invoked the ‘update logic’ method were converted to dispatch an ‘UpdateTimesheets’ job (see the sketch after this list)
  • The job takes an actor (the invoking user, when applicable) so they can be emailed a notification when it completes
  • Added some logging in the update loop
  • Added a ‘last refreshed’ timestamp to time periods, and displayed it
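Roughly how a converted call site would look (a sketch; the actual UpdateTimesheets constructor signature may differ):

```php
use App\Jobs\UpdateTimesheets;

// Instead of running the update logic inline, dispatch the job with the
// acting user so they can be emailed when it completes. Arguments are
// assumptions for illustration.
UpdateTimesheets::dispatch($timePeriod, $request->user());
```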

Need to run

php artisan fix:cats-634-1
after deploy.