diff --git a/episodes/08-CLI-workflows.md b/episodes/08-CLI-workflows.md index 8794f5ad..52a4cc5f 100644 --- a/episodes/08-CLI-workflows.md +++ b/episodes/08-CLI-workflows.md @@ -121,23 +121,23 @@ Create a file called `xgb_job.yaml`: ```yaml # xgb_job.yaml — Vertex AI custom training job config -displayName: cli-xgb-titanic -jobSpec: - workerPoolSpecs: - - machineSpec: - machineType: n1-standard-4 - replicaCount: 1 - containerSpec: - imageUri: us-docker.pkg.dev/vertex-ai/training/xgboost-cpu.2-1:latest - args: - - "--train=gs://doe-titanic/titanic_train.csv" - - "--max_depth=6" - - "--eta=0.3" - - "--subsample=0.8" - - "--colsample_bytree=0.8" - - "--num_round=100" - baseOutputDirectory: - outputUriPrefix: gs://doe-titanic/artifacts/xgb/cli-run/ +# Note: display_name goes on the command line (--display-name), not in this file. +# The --config file describes the job *spec* only, using snake_case field names. +worker_pool_specs: + - machine_spec: + machine_type: n1-standard-4 + replica_count: 1 + container_spec: + image_uri: us-docker.pkg.dev/vertex-ai/training/xgboost-cpu.2-1:latest + args: + - "--train=gs://doe-titanic/titanic_train.csv" + - "--max_depth=6" + - "--eta=0.3" + - "--subsample=0.8" + - "--colsample_bytree=0.8" + - "--num_round=100" +base_output_directory: + output_uri_prefix: gs://doe-titanic/artifacts/xgb/cli-run/ ``` Replace the bucket name and hyperparameters to match your setup. @@ -147,10 +147,45 @@ Replace the bucket name and hyperparameters to match your setup. ```bash gcloud ai custom-jobs create \ --region=us-central1 \ + --display-name=cli-xgb-titanic \ --config=xgb_job.yaml ``` -Vertex AI provisions a VM, runs your training container, and writes outputs to the `baseOutputDirectory`. The job runs on GCP's infrastructure, not on your machine — you can close your terminal and it keeps going. 
+::::::::::::::::::::::::::::::::::::: callout + +### Windows users — line continuation syntax + +The `\` at the end of each line is a **Linux / macOS** line continuation character. It does **not** work in the Windows Command Prompt or in PowerShell. You have three options: + +1. **Put the command on one line** (easiest): + + ``` + gcloud ai custom-jobs create --region=us-central1 --display-name=cli-xgb-titanic --config=xgb_job.yaml + ``` + +2. **Use the `^` continuation character** (Windows CMD): + + ``` + gcloud ai custom-jobs create ^ + --region=us-central1 ^ + --display-name=cli-xgb-titanic ^ + --config=xgb_job.yaml + ``` + +3. **Use the backtick continuation character** (PowerShell): + + ``` + gcloud ai custom-jobs create ` + --region=us-central1 ` + --display-name=cli-xgb-titanic ` + --config=xgb_job.yaml + ``` + +This applies to **all** multi-line commands in this episode, not just this one. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +Vertex AI provisions a VM, runs your training container, and writes outputs to the `base_output_directory`. The job runs on GCP's infrastructure, not on your machine — you can close your terminal and it keeps going.
### GPU example (PyTorch) @@ -158,24 +193,22 @@ For the PyTorch GPU job from Episode 5, the config includes an `acceleratorType` ```yaml # pytorch_gpu_job.yaml -displayName: cli-pytorch-titanic-gpu -jobSpec: - workerPoolSpecs: - - machineSpec: - machineType: n1-standard-8 - acceleratorType: NVIDIA_TESLA_T4 - acceleratorCount: 1 - replicaCount: 1 - containerSpec: - imageUri: us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.2-4.py310:latest - args: - - "--train=gs://doe-titanic/data/train_data.npz" - - "--val=gs://doe-titanic/data/val_data.npz" - - "--epochs=500" - - "--learning_rate=0.001" - - "--patience=50" - baseOutputDirectory: - outputUriPrefix: gs://doe-titanic/artifacts/pytorch/cli-gpu-run/ +worker_pool_specs: + - machine_spec: + machine_type: n1-standard-8 + accelerator_type: NVIDIA_TESLA_T4 + accelerator_count: 1 + replica_count: 1 + container_spec: + image_uri: us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.2-4.py310:latest + args: + - "--train=gs://doe-titanic/data/train_data.npz" + - "--val=gs://doe-titanic/data/val_data.npz" + - "--epochs=500" + - "--learning_rate=0.001" + - "--patience=50" +base_output_directory: + output_uri_prefix: gs://doe-titanic/artifacts/pytorch/cli-gpu-run/ ``` Submit the same way: @@ -183,6 +216,7 @@ Submit the same way: ```bash gcloud ai custom-jobs create \ --region=us-central1 \ + --display-name=cli-pytorch-titanic-gpu \ --config=pytorch_gpu_job.yaml ``` @@ -327,7 +361,7 @@ Using the XGBoost YAML config shown above (adjusted for your bucket name), submi ```bash # Edit xgb_job.yaml with your bucket name, then: -gcloud ai custom-jobs create --region=us-central1 --config=xgb_job.yaml +gcloud ai custom-jobs create --region=us-central1 --display-name=cli-xgb-titanic --config=xgb_job.yaml # Confirm it's running: gcloud ai custom-jobs list --region=us-central1 diff --git a/notebooks/08-CLI-workflows.ipynb b/notebooks/08-CLI-workflows.ipynb index ed3bd13d..9f0a5998 100644 --- a/notebooks/08-CLI-workflows.ipynb +++ 
b/notebooks/08-CLI-workflows.ipynb @@ -180,26 +180,7 @@ "id": "63779dc4", "metadata": {}, "outputs": [], - "source": [ - "# xgb_job.yaml — Vertex AI custom training job config\n", - "displayName: cli-xgb-titanic\n", - "jobSpec:\n", - " workerPoolSpecs:\n", - " - machineSpec:\n", - " machineType: n1-standard-4\n", - " replicaCount: 1\n", - " containerSpec:\n", - " imageUri: us-docker.pkg.dev/vertex-ai/training/xgboost-cpu.2-1:latest\n", - " args:\n", - " - \"--train=gs://doe-titanic/titanic_train.csv\"\n", - " - \"--max_depth=6\"\n", - " - \"--eta=0.3\"\n", - " - \"--subsample=0.8\"\n", - " - \"--colsample_bytree=0.8\"\n", - " - \"--num_round=100\"\n", - " baseOutputDirectory:\n", - " outputUriPrefix: gs://doe-titanic/artifacts/xgb/cli-run/" - ] + "source": "# xgb_job.yaml — Vertex AI custom training job config\n# Note: display_name goes on the command line (--display-name), not in this file.\n# The --config file describes the job *spec* only, using snake_case field names.\nworker_pool_specs:\n - machine_spec:\n machine_type: n1-standard-4\n replica_count: 1\n container_spec:\n image_uri: us-docker.pkg.dev/vertex-ai/training/xgboost-cpu.2-1:latest\n args:\n - \"--train=gs://doe-titanic/titanic_train.csv\"\n - \"--max_depth=6\"\n - \"--eta=0.3\"\n - \"--subsample=0.8\"\n - \"--colsample_bytree=0.8\"\n - \"--num_round=100\"\nbase_output_directory:\n output_uri_prefix: gs://doe-titanic/artifacts/xgb/cli-run/" }, { "cell_type": "markdown", @@ -215,26 +196,13 @@ "cell_type": "markdown", "id": "98dc68ec", "metadata": {}, - "source": [ - "**Run in Cloud Shell / terminal:**\n", - "```bash\n", - "gcloud ai custom-jobs create \\\n", - " --region=us-central1 \\\n", - " --config=xgb_job.yaml\n", - "```" - ] + "source": "**Run in Cloud Shell / terminal:**\n```bash\ngcloud ai custom-jobs create \\\n --region=us-central1 \\\n --display-name=cli-xgb-titanic \\\n --config=xgb_job.yaml\n```\n\n> **Windows users — line continuation syntax**\n>\n> The `\\` at the end of each line is a 
**Linux / macOS** line continuation character. It does **not** work in the Windows Command Prompt or in PowerShell. You have three options:\n>\n> 1. **Put the command on one line** (easiest):\n> ```\n> gcloud ai custom-jobs create --region=us-central1 --display-name=cli-xgb-titanic --config=xgb_job.yaml\n> ```\n> 2. **Use `^`** (Windows CMD):\n> ```\n> gcloud ai custom-jobs create ^\n> --region=us-central1 ^\n> --display-name=cli-xgb-titanic ^\n> --config=xgb_job.yaml\n> ```\n> 3. **Use backtick** (PowerShell):\n> ```\n> gcloud ai custom-jobs create `\n> --region=us-central1 `\n> --display-name=cli-xgb-titanic `\n> --config=xgb_job.yaml\n> ```\n>\n> This applies to **all** multi-line commands in this episode." }
Note that the argument names must match exactly what `train_nn.py` expects (`--train`, `--val`, `--learning_rate`, etc.):" }, { "cell_type": "code", @@ -242,27 +210,7 @@ "id": "9350f9e1", "metadata": {}, "outputs": [], - "source": [ - "# pytorch_gpu_job.yaml\n", - "displayName: cli-pytorch-titanic-gpu\n", - "jobSpec:\n", - " workerPoolSpecs:\n", - " - machineSpec:\n", - " machineType: n1-standard-8\n", - " acceleratorType: NVIDIA_TESLA_T4\n", - " acceleratorCount: 1\n", - " replicaCount: 1\n", - " containerSpec:\n", - " imageUri: us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.2-4.py310:latest\n", - " args:\n", - " - \"--train=gs://doe-titanic/data/train_data.npz\"\n", - " - \"--val=gs://doe-titanic/data/val_data.npz\"\n", - " - \"--epochs=500\"\n", - " - \"--learning_rate=0.001\"\n", - " - \"--patience=50\"\n", - " baseOutputDirectory:\n", - " outputUriPrefix: gs://doe-titanic/artifacts/pytorch/cli-gpu-run/" - ] + "source": "# pytorch_gpu_job.yaml\nworker_pool_specs:\n - machine_spec:\n machine_type: n1-standard-8\n accelerator_type: NVIDIA_TESLA_T4\n accelerator_count: 1\n replica_count: 1\n container_spec:\n image_uri: us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.2-4.py310:latest\n args:\n - \"--train=gs://doe-titanic/data/train_data.npz\"\n - \"--val=gs://doe-titanic/data/val_data.npz\"\n - \"--epochs=500\"\n - \"--learning_rate=0.001\"\n - \"--patience=50\"\nbase_output_directory:\n output_uri_prefix: gs://doe-titanic/artifacts/pytorch/cli-gpu-run/" }, { "cell_type": "markdown", @@ -276,14 +224,7 @@ "cell_type": "markdown", "id": "e2d0ee36", "metadata": {}, - "source": [ - "**Run in Cloud Shell / terminal:**\n", - "```bash\n", - "gcloud ai custom-jobs create \\\n", - " --region=us-central1 \\\n", - " --config=pytorch_gpu_job.yaml\n", - "```" - ] + "source": "**Run in Cloud Shell / terminal:**\n```bash\ngcloud ai custom-jobs create \\\n --region=us-central1 \\\n --display-name=cli-pytorch-titanic-gpu \\\n --config=pytorch_gpu_job.yaml\n```" }, { 
"cell_type": "markdown",