One job to delete them all

Automation is wonderful, until you write a script that devours itself and the rest of the neighborhood. Then it's time for lunch, maybe a long walk... a family emergency or something. πŸ˜…


Automation has grown on me over the years, all the more in my current role, as we find new ways for Azure DevOps to make certain tasks less tedious and error-prone. One of my latest creations was using the Create Pull Request task from Shayki Abramczyk to set up a job that watches the "master" branch for any changes.

Whenever one of our teams commits code to "master" (maybe the end of a project, or the result of a hotfix), the rest of the teams should merge that code into their own branches eventually. That means remembering to check for updates periodically, but why not just have DevOps do that for us? Hence, a job that generates PRs and emails interested parties who can review and merge them.
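To give a feel for the shape of it, here's a minimal pipeline sketch. The trigger is standard Azure Pipelines YAML, but the branch names, the reviewer, and the task inputs are placeholders of mine - check the extension's documentation for the inputs it actually expects.

```yaml
# Hypothetical sketch only: watch "master" and open a PR into one team branch.
# Branch names, the reviewer, and the exact task inputs are assumptions.
trigger:
  branches:
    include:
      - master

pool:
  vmImage: 'ubuntu-latest'

steps:
  - checkout: self
    persistCredentials: true   # the PR task needs credentials to talk to the repo

  - task: CreatePullRequest@1
    displayName: Open a PR from master into the team branch
    inputs:
      repoType: 'Azure DevOps'
      sourceBranch: 'master'
      targetBranch: 'team/widgets'            # placeholder team branch
      title: 'Merge the latest master into team/widgets'
      reviewers: 'widgets-team@example.com'   # placeholder reviewer
```

Multiply that by however many team branches need the update, and the "remember to merge master" chore mostly takes care of itself.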

Anyway, you can't make an omelette without breaking the occasional egg, and so it goes with learning things... hopefully not too often though, hah. At a previous place of employment we used Jenkins for automated builds and tests and whatnot. Everyone on the team had the ability to create jobs, but I don't think too many people did. One day I was creating a job and needed to clear out a directory in between a couple of other steps. Can't remember why. One quick "cd" up a few directories, and one quick "rmdir"... easy-peasy. If it seemed risky at all, well, the Jenkins UI suggested that the jobs were individual and siloed, so it made sense (in my head) that one job couldn't affect another.
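If it helps to picture it, the dangerous pattern looked roughly like this - assuming a Unix-style shell step with invented paths; the real command may well have been a Windows rmdir /s, but the shape is the same:

```sh
# What I effectively did (paths invented for illustration):
cd ../../..              # climbs up and out of this job's workspace...
rm -rf old-output        # ...so the delete lands among *other* jobs' directories

# The safer habit: anchor the path to the workspace Jenkins hands you
rm -rf "$WORKSPACE/old-output"
```

Anchoring the path to the job's own workspace means a typo can only ever eat your own files, not everyone else's.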

The job looked good so I hit "run" and waited for it to finish. And waited. Funny, this job should've taken maybe 30 seconds, but it'd been running several minutes. OH. Oh no.

Remember in The Matrix, where Neo ends up in that hallway and can access anything? I kinda ended up there, I guess, but with a mounting panic attack instead of sunglasses and a cool trenchcoat. It dawned on me that maybe the jobs weren't siloed after all and that I'd gone up one level too far. Maybe, just maybe, my job was outside of its little workspace folder and was eating the other jobs.

I immediately hit "cancel" and sat there for a minute, stunned. Never mind how wrong it seemed that that was even possible; it might've been, and it might've just happened. Or maybe I was being paranoid... maybe it was siloed and there was no problem at all and something else had just led to the long runtime and oh hey it's time for a break maybe I'll just take a loooong lunch and hope for the best....

Ugh.

After I calmed down, I snagged my manager as he was walking by, and we went to talk to the devops guys. They verified that yes, something had deleted a portion of the other jobs and their working directories. @#$%

Fortunately (oh so fortunately), one of them had anticipated this possibility and set up some kind of script that committed all the jobs to a GitHub repo every morning. A short while later, the jobs were restored and functional, except for one or two created shortly before my own doomed job.

The reason I bring this up is that, although it was terribly embarrassing, I'm glad I fessed up. I don't like looking stupid - who does? - but it would've been so much worse if I hadn't. Even if somehow there'd been no evidence that I created the bad job, and even with the jobs restorable from a backup, imagine the time and effort that would've been lost trying to figure out why the system had started devouring itself.


Besides, it's a good test of character for the place you work, too! If you make a mistake and your coworkers, manager, whoever, can't show a little sympathy and help out, do you really want to work there? Besides besides, if the mistake was unrecoverable, it's not just your fault. Nearly everything in IT (especially when it comes to software and data) is backupable and duplicatable and recoverable (some of those were real words). If you hadn't made the mistake, then someone else would have ... eventually.

And besides besides besides, you really don't want it hanging over your head, waiting to be found out. Plus it makes a perfect story when you're looking for another job someday, and they ask you (as I was), "So, tell us about a time you made a mistake, and how you handled it..." 😏

One last thing: if you're unfamiliar with CI tools, check out TeamCity, Jenkins, and Travis CI. Right now, I'm digging Azure DevOps and its tight integration with the rest of the codebase, agile cards, etc. If you're using GitHub, there's also GitHub Actions. If you want to see one in action, I created a workflow for one of my repos - you can see the yml file here and check out the run history too.
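And if a workflow file is new to you, the general shape is something like this - the job name and commands are placeholders, not the ones from my repo:

```yaml
# A bare-bones workflow sketch; the job name and commands are placeholders.
name: build

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: make build    # swap in your real build command
      - name: Test
        run: make test     # and your real test command
```

Good luck!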