Reduce overall node provisioning time
Individual issue regarding #38
A significant amount of time has already been saved by running only the playbooks relevant to a scale-up or scale-down (for example, we don't need to re-run the init or cron playbooks when scaling down).
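For context, that split looks roughly like the sketch below: a hypothetical top-level playbook that gates the init/cron plays on a `scale_action` variable. The variable name and the way it gets passed in are assumptions for illustration, not our exact layout.

```yaml
# site.yml - hypothetical top-level playbook; scale_action would be passed in
# (e.g. via --extra-vars from the Terraform wrapper). Names are illustrative.
- name: One-time node initialization (skipped when scaling down)
  ansible.builtin.import_playbook: init.yml
  when: scale_action | default('up') == 'up'

- name: Cron setup (skipped when scaling down)
  ansible.builtin.import_playbook: cron.yml
  when: scale_action | default('up') == 'up'

- name: Cluster membership updates (always runs)
  ansible.builtin.import_playbook: cluster.yml
```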
As we've provisioned and iterated during development, it's become clear that some software currently installed after OS image creation should be baked into the image, given how infrequently it changes. Earlier, when we weren't sure how often we'd update that software, installing it post image baking made sense. Now I'd rather trigger a single 15-minute Packer build and never install it again for the next 40+ machines provisioned, until we decide to update software versions.
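A minimal sketch of what that image-bake step could look like: an Ansible playbook run once by Packer's ansible provisioner at image build time, assuming a Debian-family base image. The package names are placeholders, not our actual install list.

```yaml
# image-bake.yml - run once by Packer's ansible provisioner at image build time,
# so the machines provisioned afterwards never repeat these installs.
# Package names below are placeholders.
- hosts: all
  become: true
  tasks:
    - name: Install rarely-changing packages into the base image
      ansible.builtin.apt:
        name:
          - docker-ce
          - python3-pip
        state: present
        update_cache: true
```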
We can optimize machine provisioning time with a few additional steps:
- Moving some post-Packer installs into the Packer image build
- Pre-downloading Docker images during Packer image creation
- Optimizing the playbooks themselves
- Splitting and grouping playbooks better
- Moving many shell/command actions to built-in Ansible modules (see the sketch after this list)
- Further optimizing which playbooks run for all hosts vs. new hosts vs. scaling in general
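As an illustration of the shell/command-to-module item, here is a hypothetical before/after (not lifted from our playbooks). The module version is idempotent, so re-runs on existing hosts skip work instead of repeating it.

```yaml
# Before: runs every time and can't report changed/ok accurately
- name: Enable and start node exporter
  ansible.builtin.command: systemctl enable --now node_exporter

# After: idempotent, skipped when the unit is already enabled and running
- name: Enable and start node exporter
  ansible.builtin.systemd:
    name: node_exporter
    enabled: true
    state: started
```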
At the moment we're at roughly 10-15 minutes per machine provision for non-GitLab nodes, and around 45 minutes for GitLab install/reconfigure/restore.
The goal is 5 minutes from terraform apply to additional machines being online (initial certs/cluster creation will add 1-3 minutes). With 5-10 minutes shaved off basic nodes, we can also improve GitLab installation by at least 10 minutes by limiting PAT creation and fixed sleeps, getting the install/restore down to under 30 minutes, shooting for 25.
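One way to cut the fixed sleeps is to poll GitLab's readiness endpoint and continue as soon as it responds, rather than waiting a worst-case interval. A sketch, assuming the controller can reach the instance's `/-/readiness` endpoint; the URL, retry counts, and cert handling are illustrative:

```yaml
# Sketch: replace a fixed post-reconfigure sleep with a readiness poll
# (endpoint, retries, and cert handling are assumptions for illustration)
- name: Wait for GitLab to report ready instead of sleeping
  ansible.builtin.uri:
    url: "https://{{ inventory_hostname }}/-/readiness"
    validate_certs: false
    status_code: 200
  register: gitlab_ready
  until: gitlab_ready.status == 200
  retries: 60   # up to ~10 minutes, but returns as soon as GitLab is up
  delay: 10
```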