Issues This KB Resolves
- Accessing external cloud services using workload identity on GCP for notebooks
- Accessing external cloud services using workload identity on GCP for pipeline runs
- Using the skel-controller to annotate multiple service accounts (default-editor or pipeline-runner)
- Bringing data from a data lake to a local volume
- Data versioning and lineage with Rok
The Process
Single namespace
1. Create GCP service account
gcloud iam service-accounts create GSA_NAME \
--project=GSA_PROJECT
2. Grant the service account the appropriate roles
gcloud projects add-iam-policy-binding PROJECT_ID \
--member "serviceAccount:GSA_NAME@GSA_PROJECT.iam.gserviceaccount.com" \
--role "ROLE_NAME"
3. Create the IAM policy binding (default-editor service account, used by notebooks)
gcloud iam service-accounts add-iam-policy-binding GSA_NAME@GSA_PROJECT.iam.gserviceaccount.com \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/default-editor]"
4. Annotate the Kubernetes service account
kubectl annotate serviceaccount default-editor \
--namespace NAMESPACE \
iam.gke.io/gcp-service-account=GSA_NAME@GSA_PROJECT.iam.gserviceaccount.com
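Steps 3 and 4 only work if two strings are formed exactly: the workload identity member and the annotation value. The following is a minimal Python sketch of how they are assembled from the placeholders used above. The helper names and the sample values are hypothetical and exist only for illustration.

```python
def gsa_email(gsa_name: str, gsa_project: str) -> str:
    """Email address of a GCP service account."""
    return f"{gsa_name}@{gsa_project}.iam.gserviceaccount.com"


def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """Member string passed to --member in step 3."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def annotation(gsa_name: str, gsa_project: str) -> str:
    """key=value pair passed to kubectl annotate in step 4."""
    return f"iam.gke.io/gcp-service-account={gsa_email(gsa_name, gsa_project)}"


if __name__ == "__main__":
    # Placeholder values, not real projects or accounts.
    print(workload_identity_member("my-project", "kubeflow-user", "default-editor"))
    print(annotation("my-gsa", "my-project"))
```

Printing both values before running the gcloud and kubectl commands makes it easier to spot a mismatched namespace or project.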
5. Test the configuration. Create the following Pod, replacing KSA_NAME with the annotated Kubernetes service account (default-editor in this example):
apiVersion: v1
kind: Pod
metadata:
  name: workload-identity-test
  namespace: NAMESPACE
spec:
  containers:
  - image: google/cloud-sdk:slim
    name: workload-identity-test
    command: ["sleep","infinity"]
  serviceAccountName: KSA_NAME
  nodeSelector:
    iam.gke.io/gke-metadata-server-enabled: "true"
From a shell inside the Pod, query the metadata server:
curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/
"If the service accounts are correctly configured, the IAM service account email address is listed as the active (and only) identity. This demonstrates that by default, the Pod acts as the IAM service account's authority when calling Google Cloud APIs."
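The same check can be scripted in Python from inside the Pod or a notebook. The sketch below uses only the standard library. The function names are hypothetical, and it assumes the listing endpoint returns one entry per line, each ending in `/`, with a `default/` alias among them.

```python
import urllib.request

METADATA_URL = (
    "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/"
)


def parse_service_accounts(body: str) -> list[str]:
    """Return the account emails from the listing, skipping the 'default' alias."""
    entries = [line.strip().rstrip("/") for line in body.splitlines()]
    return [entry for entry in entries if entry and entry != "default"]


def fetch_service_accounts(timeout: float = 5.0) -> list[str]:
    """Query the metadata server; this only works from inside the cluster."""
    request = urllib.request.Request(
        METADATA_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_service_accounts(response.read().decode("utf-8"))
```

If the binding and annotation are in place, calling `fetch_service_accounts()` from the test Pod should return a list containing only GSA_NAME@GSA_PROJECT.iam.gserviceaccount.com.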
For BigQuery (BQ), a few additional steps are needed from the notebook:
pip install --upgrade 'google-cloud-bigquery[bqstorage,pandas]'
%load_ext google.cloud.bigquery
%%bigquery
SELECT
source_year AS year,
COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 15
You may see errors about accessing BQ without the appropriate roles. If so, grant the missing roles to the GCP service account as in step 2 and rerun the query.
Otherwise, you should see a table built from the publicly available BQ data.
Using the skel-controller to leverage a generic service account across all namespaces
The process is very similar to the one above. The only difference is that we use the rok-tools pod to apply the changes.
- Open a shell in the rok-tools pod and navigate to the skel-resources deploy overlay.
kubectl exec -it rok-tools -- bash
cd ~/ops/deployments/kubeflow/manifests/common/skel-resources/overlays/deploy
- Create a patch for the pipeline-runner service account with the desired annotation. The kustomization below expects it at patches/pipeline-runner.yaml.
vim pipeline-runner.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pipeline-runner
  annotations:
    iam.gke.io/gcp-service-account: GSA_NAME@GSA_PROJECT.iam.gserviceaccount.com
Note: This assumes that GSA_NAME is a static name shared across all pipelines. If you want each namespace's service account mapped to its own GCP service account for more granular control, use the skel templating capabilities, as in the example below.
### skel-templating example
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pipeline-runner
  annotations:
    iam.gke.io/gcp-service-account: '{{ .Namespace|trimPrefix "kubeflow-" }}@GSA_PROJECT.iam.gserviceaccount.com'
- Add the patch definition to ~/ops/deployments/kubeflow/manifests/common/skel-resources/overlays/deploy/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow-skel
resources:
- ../../base
patches:
- patches/pipeline-runner.yaml
- Run the rok-deploy install command from ~/ops/deployments
rok-deploy --apply kubeflow/manifests/common/skel-resources/overlays/deploy
- Verify that the annotation was applied. In the example below, the skel templating example set the annotation on kubeflow-user's pipeline-runner service account to user@GSA_PROJECT.iam.gserviceaccount.com.
kubectl get sa pipeline-runner -n kubeflow-user -o yaml
# OUTPUT
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    iam.gke.io/gcp-service-account: user@GSA_PROJECT.iam.gserviceaccount.com
  creationTimestamp: "2022-04-08T03:47:06Z"
  labels:
    app.kubernetes.io/created-by: skel-controller
    skel.kubeflow.org/reconcile-fingerprint: profiles-deployment-6cc8c668cb-st4gt-1653336120559473145
....
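The mapping performed by the skel templating example can be reproduced outside the cluster to predict which GCP service account each namespace will receive. This is a minimal sketch that imitates `trimPrefix` in plain Python. It is not the skel-controller's implementation, and the function name and sample namespaces are hypothetical.

```python
def gsa_for_namespace(
    namespace: str, gsa_project: str, prefix: str = "kubeflow-"
) -> str:
    """Predict the annotation value the template yields for a namespace."""
    # Mirror trimPrefix: drop the prefix only when the namespace starts with it.
    name = namespace[len(prefix):] if namespace.startswith(prefix) else namespace
    return f"{name}@{gsa_project}.iam.gserviceaccount.com"


if __name__ == "__main__":
    # Sample namespaces for illustration only.
    for ns in ("kubeflow-user", "kubeflow-alice", "team-b"):
        print(ns, "->", gsa_for_namespace(ns, "GSA_PROJECT"))
```

Each predicted account must already exist in GCP and carry its own workloadIdentityUser binding for the matching namespace, otherwise the annotation points at an identity the Pod cannot use.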
- Create the IAM policy binding (pipeline-runner service account, used by pipelines)
gcloud iam service-accounts add-iam-policy-binding user@GSA_PROJECT.iam.gserviceaccount.com \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:PROJECT_ID.svc.id.goog[kubeflow-user/pipeline-runner]"