DIY Runbook Automation Platforms?
I'm looking to propose some runbook automation/enhancement work and am running into some issues. I wanted to get some feedback from others.
I work for a FAANG doing infrastructure support. Our internal toolset is fairly complete, but does not use the externally available cloud setup that we sell to customers. It also doesn't have any sort of ops runbook setup other than wiki/etc. Our internal services expose Python APIs, and we have a lot of hacky scripts written against them.
Our ops runbooks tend to involve an awful lot of 'go run this script and then look at the results and do x', or even worse 'go figure out how to gather Z data, then review it'. I'm leaning towards proposing JupyterHub as a possible platform for this, but there are a few major issues:
- Jupyter is awful with Git thanks to the messy JSON structure, so code reviews/etc are going to be a lot worse. We also use an internal git service that's very opinionated about certain things.
- Jupyter was also built primarily for data science, so a lot of the service either isn't useful to us or isn't aimed at our use case, which means there's a lot of customization we'd need to do.
- For example, I haven't found an easy way to chain multiple runbooks together into a branching decision tree system.
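To make the branching case concrete, here's a minimal sketch in plain Python of the kind of chaining I mean. Everything in it is hypothetical (the `Step` class, `run_runbook`, and the disk checks are made up for illustration, not part of Jupyter or any real API); each step returns an outcome label, and that label decides which step runs next.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One runbook step: run an action, then branch on the outcome it returns."""
    name: str
    action: Callable[[dict], str]  # returns an outcome label, e.g. "ok" or "full"
    branches: Dict[str, str] = field(default_factory=dict)  # outcome -> next step name


def run_runbook(steps: Dict[str, Step], start: str, context: dict,
                max_steps: int = 50) -> List[Tuple[str, str]]:
    """Walk the tree from `start` and return the (step, outcome) trail."""
    trail: List[Tuple[str, str]] = []
    current: Optional[str] = start
    while current is not None:
        if len(trail) >= max_steps:
            raise RuntimeError("runbook exceeded max_steps; possible loop")
        step = steps[current]
        outcome = step.action(context)
        trail.append((step.name, outcome))
        # An outcome with no matching branch ends the runbook.
        current = step.branches.get(outcome)
    return trail


# Hypothetical actions; real ones would call internal Python APIs.
def check_disk(ctx: dict) -> str:
    return "full" if ctx["disk_pct"] > 90 else "ok"


def rotate_logs(ctx: dict) -> str:
    ctx["disk_pct"] -= 30  # pretend rotation freed 30 percentage points
    return "done"


def escalate(ctx: dict) -> str:
    return "paged"


STEPS = {
    "check_disk": Step("check_disk", check_disk, {"full": "rotate_logs"}),
    "rotate_logs": Step("rotate_logs", rotate_logs, {"done": "recheck"}),
    "recheck": Step("recheck", check_disk, {"full": "escalate"}),
    "escalate": Step("escalate", escalate),
}

trail = run_runbook(STEPS, "check_disk", {"disk_pct": 95})
print(trail)
```

The returned trail is the sort of thing that could be pasted back into the ticket as a record of which branch was taken. Doing this across separate notebooks, with review and versioning, is the part I'd expect to have to build.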
I looked into other, more runbook-oriented services, and they all seem to be either much more or much less than we need. For example:
- runme.dev is a cool idea, but it doesn't include a Web UI platform like JupyterHub. We specifically don't want actions like this to run from a personal dev environment, to avoid issues like 'whoops, John's pipeline got misconfigured/stalled/etc and didn't update and he's got the old version of the script with that bug'.
- Rundeck by PagerDuty does way more than we want, and its hookups to various services are ones we can't use. It also takes a very event-driven approach, and I would prefer the flexibility of 'ticket fires, links you to playbook, and you execute it from there', at least for this step of automation.
- Fiberplane is similar to Rundeck. Neat idea, but we don't use any of the services it automatically ties into, and it also has a whole agent system.
The problem is that what I'd really like is basically 'a wiki but with arbitrary code execution', and I can immediately understand why that doesn't exist.
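As a rough illustration of what 'a wiki but with arbitrary code execution' boils down to, here's a toy sketch using only the standard library. It assumes the wiki page is markdown-style text with python-fenced blocks; `run_page` and the sample page are hypothetical. It pulls out each block and runs it with `exec` in a shared namespace, which is exactly the part that makes this scary to host.

```python
import contextlib
import io
import re

# Build the fence marker indirectly so this example can sit inside a fenced block.
FENCE = "`" * 3
BLOCK_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def run_page(page_text: str) -> list:
    """Run each python block on a page in order, sharing one namespace.

    Returns the captured stdout of each block, one string per block.
    """
    namespace: dict = {}
    outputs = []
    for source in BLOCK_RE.findall(page_text):
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            # No sandboxing at all: whoever can edit the page can run anything.
            exec(source, namespace)
        outputs.append(buffer.getvalue())
    return outputs


# Hypothetical wiki page: prose steps interleaved with code blocks.
PAGE = "\n".join([
    "Step 1: gather the data.",
    FENCE + "python",
    "hosts = ['a', 'b', 'c']",
    "print(len(hosts))",
    FENCE,
    "Step 2: review it.",
    FENCE + "python",
    "print(hosts[0])",
    FENCE,
])

outputs = run_page(PAGE)
print(outputs)
```

There's no sandboxing, auth, or audit trail here, which is presumably why nobody ships this as a product and why something like JupyterHub ends up as the starting point.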
My plan is to propose 'yeah we're going to need to build custom solutions to do XYZ on Jupyter', but are there any other more generic platforms that people have used that work with Python?