Install JobSet

This page describes how to install Alauda Build of JobSet on Alauda Container Platform through the Operator Hub. JobSet is shipped as a Helm-based operator: installing it from the Operator Hub deploys the jobset-operator controller; creating a JobSetOperatorCtl instance then deploys the JobSet controller and webhook that reconcile JobSet custom resources.

Upload the Operator Package

Download the JobSet Operator package, for example jobset-operator.ALL.xxxx.tgz.

Use violet to upload the package to the platform repository:

violet push --platform-address=<platform-access-address> --platform-username=<platform-admin> --platform-password=<platform-admin-password> jobset-operator.ALL.xxxx.tgz
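The upload command above can be wrapped in a small helper so the same invocation is reusable across packages. This is a minimal sketch, not part of the product: the function name push_operator_package and the environment variables PLATFORM_ADDRESS, PLATFORM_USERNAME, and PLATFORM_PASSWORD are hypothetical names chosen for this example. Only the violet flags shown above are used.

```shell
# Sketch: reusable wrapper around the `violet push` command shown above.
# PLATFORM_ADDRESS, PLATFORM_USERNAME and PLATFORM_PASSWORD are hypothetical
# variable names for this example; violet itself does not read them.
push_operator_package() {
  package="$1"
  if [ -z "$package" ]; then
    echo "usage: push_operator_package <package.tgz>" >&2
    return 2
  fi
  # Stop with a clear message if a required variable is unset or empty.
  : "${PLATFORM_ADDRESS:?set PLATFORM_ADDRESS}"
  : "${PLATFORM_USERNAME:?set PLATFORM_USERNAME}"
  : "${PLATFORM_PASSWORD:?set PLATFORM_PASSWORD}"
  violet push \
    --platform-address="$PLATFORM_ADDRESS" \
    --platform-username="$PLATFORM_USERNAME" \
    --platform-password="$PLATFORM_PASSWORD" \
    "$package"
}

# Example call, using the placeholder file name from this page:
#   push_operator_package jobset-operator.ALL.xxxx.tgz
```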

Install the Operator

In the Administrator view:

  1. Go to Marketplace / Operator Hub.
  2. Select the destination cluster.
  3. Search for Alauda Build of JobSet (the package name in the marketplace is jobset-operator).
  4. Click Install.
  5. Keep the default installation settings. The operator installs into the jobset-system namespace by default.
  6. Complete the installation.

Confirm the operator controller is running:

kubectl get pods -n jobset-system

The jobset-operator pod should be in the Running state, for example:

NAME                              READY   STATUS    RESTARTS   AGE
jobset-operator-xxxxxxxxxx-xxxxx  1/1     Running   0          1m
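In a script, the manual check above can be replaced by a short polling loop. The following is a rough sketch under stated assumptions: the helper names pods_running and wait_for_pods are hypothetical, the 2-second interval and default of 30 attempts are arbitrary, and the parsing relies on STATUS being the third column of the kubectl get pods output, as in the sample above.

```shell
# Reads `kubectl get pods --no-headers` output on stdin. Succeeds only if at
# least one pod name starts with the given prefix and all such pods are Running.
pods_running() {
  awk -v prefix="$1" '
    index($1, prefix) == 1 { seen++; if ($3 != "Running") bad++ }
    END { if (seen && !bad) exit 0; exit 1 }
  '
}

# Polls the jobset-system namespace until pods_running succeeds for the prefix,
# or gives up after the given number of attempts (default 30, 2 seconds apart).
wait_for_pods() {
  prefix="$1"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if kubectl get pods -n jobset-system --no-headers | pods_running "$prefix"; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  return 1
}

# Example call:
#   wait_for_pods jobset-operator || echo "operator pod is not Running yet" >&2
```

The same helper could be called with a different prefix to check other pods in the namespace.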

Create a JobSetOperatorCtl Instance

Installing the operator alone does not deploy the JobSet controller. The operator reconciles JobSetOperatorCtl resources and deploys the JobSet controller and webhook only when such a resource exists.

Save the following as jobsetoperatorctl.yaml:

apiVersion: jobset-operator.alauda.io/v1
kind: JobSetOperatorCtl
metadata:
  name: jobset
  namespace: jobset-system
spec:
  controller:
    replicas: 1
    leaderElection:
      enable: true
  certManager:
    enable: true
  prometheus:
    enable: false

Apply it:

kubectl apply -f jobsetoperatorctl.yaml

After the operator reconciles the resource, the JobSet controller pod is deployed in the same namespace:

kubectl get pods -n jobset-system

Expected output (the operator pod and the JobSet controller pod side by side):

NAME                                READY   STATUS    RESTARTS   AGE
jobset-operator-xxxxxxxxxx-xxxxx    1/1     Running   0          5m
jobset-controller-xxxxxxxxxx-xxxxx  1/1     Running   0          1m

The operator also registers the JobSet CRD. Verify that it exists:

kubectl get crd jobsets.jobset.x-k8s.io
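For automation, it can be more useful to block until the API server reports the CRD as established than to check only that it exists. A minimal sketch using kubectl wait follows; the function name wait_for_jobset_crd and the 60s default timeout are assumptions made for this example.

```shell
# Blocks until the JobSet CRD reports the Established condition, or fails
# when the timeout (first argument, default 60s) expires.
wait_for_jobset_crd() {
  kubectl wait --for=condition=Established \
    crd/jobsets.jobset.x-k8s.io \
    --timeout="${1:-60s}"
}

# Example call:
#   wait_for_jobset_crd 120s
```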

Next Steps

Once the JobSet controller is running, JobSet custom resources can be submitted to run distributed AI/ML and HPC workloads.

Continue with the Quickstart to run a simple JobSet demo.