Create safe instance runners
There will have to be a lot of preemptive work before getting here, but a good MVP would be a single instance runner with extremely limited scope that can be used for new users/projects. Now that we've determined that merge requests from forked projects don't run pipelines in the parent/target project, we can worry less about impact on projects and more about impact on the cluster.
Basically, GitHub Actions runs on some cluster/server and registered users are allowed to run jobs there.
The first iteration would be a single runner with limited resources that can only run 1 job at a time, which would prevent jobs from straining the server (rough config sketch after the list below).
- A new user creates a new project
- The user can assign/register this runner to their project
- This runner can run some limited things like buildctl and launch pods
- Need to make sure the buildkit pod it has access to has limited resources as well
- Can only deploy to limited namespaces
- Cannot interfere with other namespaces.
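A minimal sketch of what that first runner's `config.toml` might look like (names and limit values are placeholders, not real config; `concurrent`/`limit` cap it at one job, and the `[runners.kubernetes]` limits cap the build pod itself):

```toml
concurrent = 1                       # runner process runs at most 1 job at a time

[[runners]]
  name = "limited-shared-runner"     # placeholder name
  url = "https://gitlab.example.com/"
  token = "REDACTED"
  executor = "kubernetes"
  limit = 1                          # at most 1 concurrent job for this runner entry
  [runners.kubernetes]
    namespace = "shared-runner"      # placeholder namespace
    service_account = "shared-runner-sa"
    cpu_limit = "500m"               # caps on the job pod
    memory_limit = "512Mi"
    helper_cpu_limit = "250m"
    helper_memory_limit = "256Mi"
```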
11/19/22
Some enlightenment and new things learned that bring up more obstacles.
The new GitLab agents with a GitOps flow could help the situation, but that needs more investigation.
- Agents don't really solve any problem for us currently, and quite possibly don't do anything for us, since we use service accounts for Kubernetes runners.
- Instance-level runners attached to service accounts for the dev/beta/prod namespaces, used on protected branches, work for our personal/private usage, but do not work for additional projects/users.
- New projects can protect any branch and update the .gitlab-ci.yml to use the instance runner (see the sketch after this list), which then uses that service account, which has access to any apps running in that runner's namespace.
- So far, even though we've been using service-account k8s runners, the cert-based cluster integration (which we're currently moving away from) actually gave each runner `gitlab-admin`/`cluster-admin` level access if they were not GitLab-managed. It appears the runner was provided a KUBECONFIG file that granted it the access of the `gitlab-admin` account, even though the runner itself was a service account with limited access.
- The runner's limited access was correct for what the runner itself could do, like create the runner pod in the correct namespace, get secrets, etc. Deploying/accessing additional Kubernetes resources was then accidentally enabled via the KUBECONFIG file provided to the runner by the cert-based cluster integration explained below (this was an oversight on my part about how GitLab communicates with runners). Each of the cert-based integrations I used should have been based on the service account, not the `gitlab-admin` account. Since we're moving to agent-based integration we no longer have to worry.
- This is because we have the `gitlab-runner` binary running on the machine itself.
- Basically there's a lot of abstraction between the installed `gitlab-runner` binary, the `kubernetes executor` that `gitlab-runner` registers as with the specific service account, and the `gitlab-admin` service account that GitLab used to communicate with the cluster to say "launch this runner". That integration provided the runner (docker, shell, k8s, doesn't matter) several k8s-related things such as KUBE_NAMESPACE and KUBECONFIG, which would allow the runner to then act as cluster-admin when deploying k8s resources.
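To make the hole concrete: a hypothetical hostile project's `.gitlab-ci.yml` would only need something like the following (the runner tag is invented for illustration):

```yaml
# Hypothetical hostile job in a brand-new project
snoop:
  tags:
    - instance-k8s          # invented tag for the shared instance runner
  script:
    # this runs with whatever the runner's service account / provided
    # KUBECONFIG can do, not with anything the project itself was granted
    - kubectl get secrets --all-namespaces
```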
The original implementation for instance-level runners was so I could easily use a few runners across any project without configuring/giving individual access per project. These should/would be locked from use if people were given the ability to register as a user and create a new project. While the problem of locking people out of these runners is trivial, what I would like to do is give people the ability to use Kubernetes runners to deploy resources to my cluster. The problem is there is no way to limit the resources of the applications they would deploy in their manifests.
The runner is limited from building with too many resources or for too much time; however, the runner then says "deploy this helm chart to the cluster with these resources in this project", and I currently have no idea how to restrict those deployed applications' resources. I don't know if it's possible to restrict an application's resources based on namespace.
Ohhhh, we can limit based on namespaces with ResourceQuota. I also hadn't even thought about quotas on the number of objects themselves.
https://kubernetes.io/docs/concepts/policy/resource-quotas/
https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/quota-pod-namespace/
https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-default-namespace/
https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/
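A minimal sketch of the per-namespace guardrails those docs describe (namespace name and numbers are made up). The LimitRange matters because once a quota covers cpu/memory, pods that don't set requests/limits get rejected unless defaults are injected:

```yaml
# Cap total resources and object counts in a tenant namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: review-cool-new-app   # illustrative namespace
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
    pods: "10"        # quota on object counts, not just cpu/mem
    services: "5"
---
# Defaults so pods that omit requests/limits still fall under the quota
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-defaults
  namespace: review-cool-new-app
spec:
  limits:
    - type: Container
      default:
        cpu: 250m
        memory: 256Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
```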
Below is a stream of sparse brainstorming ideas on using hierarchical namespaces that I'd rather have here than in the addAgentsToGitlab file it's currently in. The agent's role in it is wrong for the most part, but it does illustrate what we're trying to accomplish.
With the ability to limit resources for a namespace, which I just learned about, I am even more interested in hierarchical namespaces.
Basically the main obstacle is: how do we let "untrusted" users (not just anyone could sign up, but theoretically speaking) create a new project and deploy to 1-X unique, dynamically created namespaces, very simply, from their .gitlab-ci.yml?
We determined we do not need to let the runner create rolebindings if we use the default service account for the namespace.
The idea would be that the runner can essentially only create a hierarchical namespace.
- Runner creates an HNS
- Runner gets/provides the token for the default service account of said HNS (see the sketch after this list)
- What's important, and something not figured out, is that this runner cannot get/use tokens for other HNSes it has created.
- I'd like to believe issuing/using tokens provided via the TokenRequest API can solve this problem, since they by nature expire after a duration and are not long-lived
- How we overwrite the default service account's token for only the HNS we just created, and nobody else's... that's the problem
- Subsequent pipeline steps/stages use this token/service account for the HNS, allowing them to deploy any resources to the namespace (limited by ResourceQuota policy).
- Hopefully the ResourceQuota policy can apply to the HNS and all those derived from it, since targeting resources by wildcard is not supported. This should work given what we've read about HNS and propagation.
- Lastly, how do we limit the number of HNSes a user can create? What's stopping them from creating 10 namespaces, bypassing resource limits/quotas, etc.?
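A rough sketch of that flow, assuming the HNC `kubectl hns` plugin is installed and a kubectl recent enough for `create token` (names are invented; note nothing here yet enforces the "only the HNS you just created" restriction, which is exactly the unsolved part):

```shell
# 1. Runner creates a subnamespace under its own namespace (HNC plugin)
kubectl hns create review-cool-new-app -n review-runner

# 2. Runner mints a short-lived token for that namespace's default SA
#    via the TokenRequest API; it expires on its own
TOKEN=$(kubectl create token default -n review-cool-new-app --duration=1h)

# 3. Later pipeline stages deploy with that token, confined by the
#    namespace's RBAC and ResourceQuota
kubectl --token="$TOKEN" -n review-cool-new-app apply -f manifests/
```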
This is over my head but something to think about - https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks
### Hmmm, I think we only need to be able to create pods in our own namespace...
### We don't actually DO the creating of the runner pods, I think we tell the apiserver
### "Hey, launch these pods in this namespace with this service account" and the apiserver determines
### if that's allowed or not, we don't actually do a `helm install` or anything
### So maybe we literally only need 1, maybe 2, gitlab agents in the cluster?
### It's hard to determine what it actually DOES and if letting other "unverified" users/projects use the agent is bad
### I feel like it's simply another layer of auth to allow impersonation for premium users without the need
### for service accounts tied to runners, maybe?
### We have all the information in the runner - api address, service account, token, namespace
### Like, can we still go business as usual without an agent, or with an agent limited to only its namespace?
### Is it literally only there as a hole-punching tunnel to allow gitlab to communicate with a cluster that doesn't have a public endpoint?
### End goal is to let people use extremely limited runners, but how important is separating agents?
### Going to try one agent in one namespace with a rolebinding in its own namespace and see if we
### can use our runners with service accounts to deploy to all namespaces based on runner service account permissions
#if [[ $AGENT_NAME = "review" ]]; then
#cat <<-EOF >> $GL_AGENT_FILE_LOCATION
#apiVersion: rbac.authorization.k8s.io/v1
#kind: RoleBinding
#metadata:
#  name: gitlab-agent-rolebinding
#  namespace: $NAMESPACE
#subjects:
#  - kind: ServiceAccount
#    name:
#    namespace: $NAMESPACE
#roleRef:
#  kind: ClusterRole
#  name: $GL_CLUSTER_AGENT_ROLE_NAME
#  apiGroup: rbac.authorization.k8s.io
#---
#EOF
#fi
#kubectl apply -f $GL_AGENT_FILE_LOCATION
## TODO: Mixing agents and runners ain't great, but mixing environments ain't great either
## Having a NS per runner and per agent seems like overkill, but according to the rolebindings docs,
## just having edit access in a namespace allows access to any other service account in that NS
## Runners need to be able to access secrets for the registry, and agents need to be able to at least create
## pods/runners... so we're in a bit of a pickle.
## If agents and runners are in their own namespaces, they need to be able to create pods in another namespace
## Maybe a compromise is just a "review/feature" agent namespace and a "dev,beta,prod" agent namespace
## Then the same for runners: a "review/feature" runner namespace and a "dev,beta,prod" runner namespace (we kinda do this with 'review' and 'deploy')
## Then do what we're doing with a namespace for each tier/stage of apps: review, dev, beta, and prod
## https://kubernetes.io/docs/reference/access-authn-authz/rbac/#user-facing-roles
## Time to revisit hierarchical namespaces - agent > runner-that-can-create-HNS > HNS
#https://github.com/kubernetes-sigs/hierarchical-namespaces/releases
###### Long-winded brainstorm write-up/vomit
## The agent can create/delete runner pods etc.
## Give the runner the ability to create dynamic hierarchical namespaces but not delete namespaces
## Then just have a cronjob schedule deletes on HNSes however we see fit (rough sketch below)
## This gives isolation of pods/resources (`kubectl get pods -n $HNS`), with nothing actually running in the runner NS to worry about
## (we can probably limit its ability to create pods in its own NS along with anything else, maybe a setup step to create a SA for that HNS),
## then subsequent pipeline steps use this newly set-up SA with admin access to this very isolated namespace, which gives runners
## full ability to manage that HNS's resources, and gives us a way to clean up the namespace without accidentally
## deleting important namespaces/resources (we can label these namespaces)
## Agent in NS1 creates a runner pod in RUNNER_NS1, which can only create HNSes in RUNNER_NS1
## Pod in RUNNER_NS1 creates DYNAMIC_HNS1 and a service account for DYNAMIC_HNS1 with admin access
## Now follow-up pipeline steps use the (admin) service account in DYNAMIC_HNS1 to create pods/ingress/secrets etc. freely, but limited to
## DYNAMIC_HNS1, without being able to impersonate another service account or worry about other apps/resources
## Cleanup job deletes an HNS if it's been active/inactive for 3 days or something, idk
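## Rough shape of that cleanup job (label and schedule are invented; the 3-day age check would need real
## scripting, and HNC subnamespaces are normally deleted via their anchor in the parent, so treat this as a sketch):
#apiVersion: batch/v1
#kind: CronJob
#metadata:
#  name: hns-reaper
#  namespace: review-runner
#spec:
#  schedule: "0 4 * * *"
#  jobTemplate:
#    spec:
#      template:
#        spec:
#          serviceAccountName: hns-reaper   # only needs delete on our labeled namespaces
#          restartPolicy: Never
#          containers:
#            - name: reaper
#              image: bitnami/kubectl
#              command: ["sh", "-c"]
#              args: ["kubectl delete ns -l managed-by=review-runner"]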
## The only thing I think needs to be figured out is dynamically creating a service account with admin for this new HNS and then
## having follow-up pipeline steps use that service account.
## I think that's how and why we would use the following settings in gitlab
## - bearer_token_overwrite_allowed
## - namespace_overwrite_allowed
## - service_account_overwrite_allowed
## All resources would be named something like review-hns-MY_COOL_NAMESPACE etc., and pipeline steps are then allowed to overwrite to use
## our new unique namespace/resources based on our branch name (rough shape below)
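## For reference, the shape of those settings, if I'm reading the runner docs right (regexes are made up):
#[runners.kubernetes]
#  bearer_token_overwrite_allowed = true
#  namespace_overwrite_allowed = "^review-hns-.*"        # the override must match this regex
#  service_account_overwrite_allowed = "^review-hns-.*"
## and then in the job's .gitlab-ci.yml:
#variables:
#  KUBERNETES_NAMESPACE_OVERWRITE: review-hns-$CI_COMMIT_REF_SLUG
#  KUBERNETES_SERVICE_ACCOUNT_OVERWRITE: review-hns-$CI_COMMIT_REF_SLUG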
## Hopefully in allowing the creation of an HNS we're also allowed to create service accounts for it; that seems to be a bottleneck
## review-runner-ns
## Hell, hopefully we can allow creating HNSes only within that namespace and not anywhere else, etc.
## Answer to the "only within the HNS" question: YES - subnamespaces:
## https://github.com/kubernetes-sigs/multi-tenancy/blob/master/incubator/hnc/docs/user-guide/concepts.md#basic-subns
## kubectl hns create review-runner -n review-agent
## kubectl -n review-runner create serviceaccount review-runner
## kubectl -n review-runner create role create-sa-for-apps --verb=create,delete,update --resource=serviceaccounts (or clusterrole)
## kubectl -n review-runner create rolebinding create-sa-review-runner --role=create-sa-for-apps --serviceaccount=review-runner:review-runner
## Now that the review-runner SA has the ability to create service accounts/rolebindings in review-runner, how can that be abused in CI?
## I can now create a rolebinding that allows cool-new-app-sa in cool-new-app to get/create/delete secrets in thatguys-cool-new-app, which is bad
## I have project-A with access to review-runner
## He has project-B with access to review-runner
## With access to review-runner, he can create a rolebinding for his-cool-app-sa in project-B to access my-cool-app secrets in project-A
## stuck again
## but rolebindings can't cross namespaces though, now that I remember... hmmmm
## they can... it's just that 'resourceNames' targeting doesn't work cross-namespace... that's the reason the buildkit role
## has to use a clusterrolebinding
## we have a 'deploy' serviceaccount in the 'deploy' namespace able to do actions in the beta/dev/prod namespaces
## with the use of rolebindings
## so if the runner can create rolebindings, it can in fact give itself permission to get secrets in another namespace (sketch below)
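## Concretely, the abuse would look like this: a RoleBinding in the victim's namespace can name a
## subject from another namespace (names invented; `edit` is used because the stock `view` role notably excludes secrets).
## Worth verifying: RBAC escalation prevention may already block creating a binding that grants permissions the creator doesn't hold.
#apiVersion: rbac.authorization.k8s.io/v1
#kind: RoleBinding
#metadata:
#  name: steal-secrets
#  namespace: project-a            # victim namespace
#subjects:
#  - kind: ServiceAccount
#    name: his-cool-app-sa         # attacker's SA...
#    namespace: project-b          # ...living in a different namespace
#roleRef:
#  kind: ClusterRole
#  name: edit                      # includes get/list on secrets
#  apiGroup: rbac.authorization.k8s.io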
## maybe we use the new secret-attaching method for account tokens: attach it to the default account of the new namespace,
## get the token, and overwrite the bearer_token to allow the runner to use the new namespace (sketched at the end of this chunk)
## We would only need to give the runner the ability to create secrets in these HNSes, but not get them
## Maybe we use `kubectl create token` bound to a namespace with a short duration, like only the duration of the run
## It just creates/refreshes the token every run. These are also tokens that automatically expire no matter what
## (We originally tried to use these for runner service accounts and eventually the runners would just stop working, even with duration 0s)
## https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#-em-token-em-
## Between the fact that each new namespace gets a default service account and the `create token` API, I feel like we can find a solution
## without creating a service account or using additional rolebindings
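## Sketch of that secret-attaching method (the manifest shape is the standard service-account-token Secret;
## whether "create but not get" is enough for our flow is the untested part). Unlike `create token`, this one is long-lived:
#apiVersion: v1
#kind: Secret
#metadata:
#  name: default-sa-token
#  namespace: review-cool-new-app
#  annotations:
#    kubernetes.io/service-account.name: default   # attach to the namespace's default SA
#type: kubernetes.io/service-account-token
## then read it back once the token controller fills it in:
#kubectl -n review-cool-new-app get secret default-sa-token -o jsonpath='{.data.token}' | base64 -d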
########
########
########
########
## On push, create a NS and a service account for the branch that have nothing to do with the .gitlab-ci.yaml file, and have the user use
## our shared runner (that becomes the appeal of using it), which can only interact with that NS with that service account
## This is harder than it looks and we're still stuck there
## Letting people use runners for OUR projects is not a problem anymore, but letting new projects use our shared runner is...
########
## Can we limit a service account to ONLY ALLOW CREATING SUBNAMESPACES for our runner, and only get the service account
## token for the subnamespace we just created?
## Can we make some kind of kubernetes hook so that on `kubectl hns create` or `kubectl create ns` we grab the service account
## token and have it stored in a variable?
## Maybe we force the user to provide a KUBE_SERVICE_TOKEN CI variable, auto-assign it to this newly created namespace's
## service account, then overwrite the bearer_token with this variable to use this new namespace, or it just fails
## Forcing the user to provide a unique token to give to the unique namespace's service account...??
## Basically the review-runner can only create namespaces, which means you can't do a deploy of any kind,
## so for the following steps to be able to deploy/delete/push etc., the runner must have a token provided to overwrite the default token
## for the new namespace; then bearer_token_overwrite, using the new namespace, allows the runner to get/create/update as the default account
## of that namespace
## Can we manually populate this service token based on like the
########
########
########
## review-agent
## |-- review-runner
## AS the service account review-runner (with limited access) we can:
## kubectl hns create review-cool-new-app -n review-runner
## kubectl -n review-cool-new-app create serviceaccount cool-app-sa
## Don't need to create an admin role as 'admin' is a clusterrole already
## kubectl -n review-cool-new-app create rolebinding cool-app-admin --clusterrole=admin --serviceaccount=review-cool-new-app:cool-app-sa
## review-agent
## |-- review-runner
## |-- review-cool-new-app
## now AS service account cool-app-sa further down in the pipeline
## kubectl get deploy,secrets,svc,pod -n review-agent
## NOPE
## kubectl get deploy,secrets,svc,pod -n review-runner
## NOPE
## kubectl get deploy,secrets,svc,pod -n review-cool-new-app
## review-cool-new-app-deploy-1
## review-cool-new-app-secret-1
## review-cool-new-app-svc-1
## review-cool-new-app-pod-1
## This allows us to let runners create unique namespaces and limited service accounts on the fly, without giving
## them full access to create/manage namespaces/serviceaccounts cluster-wide
## Is there a way to allow the review-runner service account to make service accounts and rolebindings for dynamic/new namespaces?
## Allowing the review-runner SA to create namespaces (as long as it can't delete them) isn't that big of a deal
## But that review-runner SA now needs to at least be able to create a SA and a rolebinding in that new namespace
## Then we have the SA/namespace override to the newly made/dynamic NS/service account, which needs to have admin access
## Feels like this can work if we get over those obstacles