Merged
108 changes: 71 additions & 37 deletions episodes/08-CLI-workflows.md
@@ -121,23 +121,23 @@ Create a file called `xgb_job.yaml`:

```yaml
# xgb_job.yaml — Vertex AI custom training job config
displayName: cli-xgb-titanic
jobSpec:
workerPoolSpecs:
- machineSpec:
machineType: n1-standard-4
replicaCount: 1
containerSpec:
imageUri: us-docker.pkg.dev/vertex-ai/training/xgboost-cpu.2-1:latest
args:
- "--train=gs://doe-titanic/titanic_train.csv"
- "--max_depth=6"
- "--eta=0.3"
- "--subsample=0.8"
- "--colsample_bytree=0.8"
- "--num_round=100"
baseOutputDirectory:
outputUriPrefix: gs://doe-titanic/artifacts/xgb/cli-run/
# Note: display_name goes on the command line (--display-name), not in this file.
# The --config file describes the job *spec* only, using snake_case field names.
worker_pool_specs:
- machine_spec:
machine_type: n1-standard-4
replica_count: 1
container_spec:
image_uri: us-docker.pkg.dev/vertex-ai/training/xgboost-cpu.2-1:latest
args:
- "--train=gs://doe-titanic/titanic_train.csv"
- "--max_depth=6"
- "--eta=0.3"
- "--subsample=0.8"
- "--colsample_bytree=0.8"
- "--num_round=100"
base_output_directory:
output_uri_prefix: gs://doe-titanic/artifacts/xgb/cli-run/
```

Replace the bucket name and hyperparameters to match your setup.
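The camelCase-to-snake_case distinction is easy to get wrong silently, since a mistyped key is just ignored or rejected at submit time. A quick local sanity check is sketched below — a hypothetical helper, not part of the lesson's tooling; the dict literal simply mirrors the YAML above (in practice you would load the file with `yaml.safe_load`).

```python
import re

def assert_snake_case(obj, path="spec"):
    """Recursively check that every mapping key is lower snake_case
    (the convention gcloud expects in --config files)."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if not re.fullmatch(r"[a-z][a-z0-9_]*", key):
                raise ValueError(f"non-snake_case key at {path}: {key!r}")
            assert_snake_case(value, f"{path}.{key}")
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            assert_snake_case(item, f"{path}[{i}]")

# Dict mirroring xgb_job.yaml above
spec = {
    "worker_pool_specs": [{
        "machine_spec": {"machine_type": "n1-standard-4"},
        "replica_count": 1,
        "container_spec": {
            "image_uri": "us-docker.pkg.dev/vertex-ai/training/xgboost-cpu.2-1:latest",
            "args": ["--train=gs://doe-titanic/titanic_train.csv", "--num_round=100"],
        },
    }],
    "base_output_directory": {"output_uri_prefix": "gs://doe-titanic/artifacts/xgb/cli-run/"},
}

assert_snake_case(spec)  # passes silently

# The old camelCase spelling fails the check:
# assert_snake_case({"workerPoolSpecs": []})  # raises ValueError
```

Note that only mapping keys are checked; values such as `n1-standard-4` are free-form and left alone.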
@@ -147,42 +147,76 @@ Replace the bucket name and hyperparameters to match your setup.
```bash
gcloud ai custom-jobs create \
--region=us-central1 \
--display-name=cli-xgb-titanic \
--config=xgb_job.yaml
```

Vertex AI provisions a VM, runs your training container, and writes outputs to the `baseOutputDirectory`. The job runs on GCP's infrastructure, not on your machine — you can close your terminal and it keeps going.
::::::::::::::::::::::::::::::::::::: callout

### Windows users — line continuation syntax

The `\` at the end of each line is a **Linux / macOS** line continuation character. It does **not** work in the Windows Command Prompt. You have three options:

1. **Put the command on one line** (easiest):

```
gcloud ai custom-jobs create --region=us-central1 --display-name=cli-xgb-titanic --config=xgb_job.yaml
```

2. **Use the `^` continuation character** (Windows CMD):

```
gcloud ai custom-jobs create ^
--region=us-central1 ^
--display-name=cli-xgb-titanic ^
--config=xgb_job.yaml
```

3. **Use the backtick continuation character** (PowerShell):

```
gcloud ai custom-jobs create `
--region=us-central1 `
--display-name=cli-xgb-titanic `
--config=xgb_job.yaml
```

This applies to **all** multi-line commands in this episode, not just this one.

::::::::::::::::::::::::::::::::::::::::::::::::::

Vertex AI provisions a VM, runs your training container, and writes outputs to the `base_output_directory`. The job runs on GCP's infrastructure, not on your machine — you can close your terminal and it keeps going.

### GPU example (PyTorch)

For the PyTorch GPU job from Episode 5, the config includes an `acceleratorType` and `acceleratorCount`. Note that the argument names must match exactly what `train_nn.py` expects (`--train`, `--val`, `--learning_rate`, etc.):

```yaml
# pytorch_gpu_job.yaml
displayName: cli-pytorch-titanic-gpu
jobSpec:
workerPoolSpecs:
- machineSpec:
machineType: n1-standard-8
acceleratorType: NVIDIA_TESLA_T4
acceleratorCount: 1
replicaCount: 1
containerSpec:
imageUri: us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.2-4.py310:latest
args:
- "--train=gs://doe-titanic/data/train_data.npz"
- "--val=gs://doe-titanic/data/val_data.npz"
- "--epochs=500"
- "--learning_rate=0.001"
- "--patience=50"
baseOutputDirectory:
outputUriPrefix: gs://doe-titanic/artifacts/pytorch/cli-gpu-run/
worker_pool_specs:
- machine_spec:
machine_type: n1-standard-8
accelerator_type: NVIDIA_TESLA_T4
accelerator_count: 1
replica_count: 1
container_spec:
image_uri: us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.2-4.py310:latest
args:
- "--train=gs://doe-titanic/data/train_data.npz"
- "--val=gs://doe-titanic/data/val_data.npz"
- "--epochs=500"
- "--learning_rate=0.001"
- "--patience=50"
base_output_directory:
output_uri_prefix: gs://doe-titanic/artifacts/pytorch/cli-gpu-run/
```

Submit the same way:

```bash
gcloud ai custom-jobs create \
--region=us-central1 \
--display-name=cli-pytorch-titanic-gpu \
--config=pytorch_gpu_job.yaml
```
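For reference, the argument contract implied above can be reproduced with a minimal `argparse` setup. This is a sketch of what `train_nn.py` might declare, not the actual lesson script; the defaults and help strings here are assumptions.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Flags matching the args list in pytorch_gpu_job.yaml (hypothetical)."""
    p = argparse.ArgumentParser(description="Titanic NN trainer (sketch)")
    p.add_argument("--train", required=True, help="gs:// path to training .npz")
    p.add_argument("--val", required=True, help="gs:// path to validation .npz")
    p.add_argument("--epochs", type=int, default=100)
    p.add_argument("--learning_rate", type=float, default=1e-3)
    p.add_argument("--patience", type=int, default=10, help="early-stopping patience")
    return p

# Mirrors the YAML `args` list exactly; a misspelled flag in the config
# would raise SystemExit here, surfacing in the job logs as an immediate failure.
args = build_parser().parse_args([
    "--train=gs://doe-titanic/data/train_data.npz",
    "--val=gs://doe-titanic/data/val_data.npz",
    "--epochs=500",
    "--learning_rate=0.001",
    "--patience=50",
])
print(args.epochs, args.learning_rate)  # → 500 0.001
```

This is why the episode stresses matching the `args` entries to the script exactly: `argparse` rejects unknown flags rather than ignoring them.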

@@ -327,7 +361,7 @@ Using the XGBoost YAML config shown above (adjusted for your bucket name), submi

```bash
# Edit xgb_job.yaml with your bucket name, then:
gcloud ai custom-jobs create --region=us-central1 --config=xgb_job.yaml
gcloud ai custom-jobs create --region=us-central1 --display-name=cli-xgb-titanic --config=xgb_job.yaml

# Confirm it's running:
gcloud ai custom-jobs list --region=us-central1
71 changes: 6 additions & 65 deletions notebooks/08-CLI-workflows.ipynb
@@ -180,26 +180,7 @@
"id": "63779dc4",
"metadata": {},
"outputs": [],
"source": [
"# xgb_job.yaml — Vertex AI custom training job config\n",
"displayName: cli-xgb-titanic\n",
"jobSpec:\n",
" workerPoolSpecs:\n",
" - machineSpec:\n",
" machineType: n1-standard-4\n",
" replicaCount: 1\n",
" containerSpec:\n",
" imageUri: us-docker.pkg.dev/vertex-ai/training/xgboost-cpu.2-1:latest\n",
" args:\n",
" - \"--train=gs://doe-titanic/titanic_train.csv\"\n",
" - \"--max_depth=6\"\n",
" - \"--eta=0.3\"\n",
" - \"--subsample=0.8\"\n",
" - \"--colsample_bytree=0.8\"\n",
" - \"--num_round=100\"\n",
" baseOutputDirectory:\n",
" outputUriPrefix: gs://doe-titanic/artifacts/xgb/cli-run/"
]
"source": "# xgb_job.yaml — Vertex AI custom training job config\n# Note: display_name goes on the command line (--display-name), not in this file.\n# The --config file describes the job *spec* only, using snake_case field names.\nworker_pool_specs:\n - machine_spec:\n machine_type: n1-standard-4\n replica_count: 1\n container_spec:\n image_uri: us-docker.pkg.dev/vertex-ai/training/xgboost-cpu.2-1:latest\n args:\n - \"--train=gs://doe-titanic/titanic_train.csv\"\n - \"--max_depth=6\"\n - \"--eta=0.3\"\n - \"--subsample=0.8\"\n - \"--colsample_bytree=0.8\"\n - \"--num_round=100\"\nbase_output_directory:\n output_uri_prefix: gs://doe-titanic/artifacts/xgb/cli-run/"
},
{
"cell_type": "markdown",
@@ -215,54 +196,21 @@
"cell_type": "markdown",
"id": "98dc68ec",
"metadata": {},
"source": [
"**Run in Cloud Shell / terminal:**\n",
"```bash\n",
"gcloud ai custom-jobs create \\\n",
" --region=us-central1 \\\n",
" --config=xgb_job.yaml\n",
"```"
]
"source": "**Run in Cloud Shell / terminal:**\n```bash\ngcloud ai custom-jobs create \\\n --region=us-central1 \\\n --display-name=cli-xgb-titanic \\\n --config=xgb_job.yaml\n```\n\n> **Windows users — line continuation syntax**\n>\n> The `\\` at the end of each line is a **Linux / macOS** line continuation character. It does **not** work in the Windows Command Prompt. You have three options:\n>\n> 1. **Put the command on one line** (easiest):\n> ```\n> gcloud ai custom-jobs create --region=us-central1 --display-name=cli-xgb-titanic --config=xgb_job.yaml\n> ```\n> 2. **Use `^`** (Windows CMD):\n> ```\n> gcloud ai custom-jobs create ^\n> --region=us-central1 ^\n> --display-name=cli-xgb-titanic ^\n> --config=xgb_job.yaml\n> ```\n> 3. **Use backtick** (PowerShell):\n> ```\n> gcloud ai custom-jobs create `\n> --region=us-central1 `\n> --display-name=cli-xgb-titanic `\n> --config=xgb_job.yaml\n> ```\n>\n> This applies to **all** multi-line commands in this episode."
},
{
"cell_type": "markdown",
"id": "cd0d7089",
"metadata": {},
"source": [
"Vertex AI provisions a VM, runs your training container, and writes outputs to the `baseOutputDirectory`. The job runs on GCP's infrastructure, not on your machine — you can close your terminal and it keeps going.\n",
"\n",
"### GPU example (PyTorch)\n",
"\n",
"For the PyTorch GPU job from Episode 5, the config includes an `acceleratorType` and `acceleratorCount`. Note that the argument names must match exactly what `train_nn.py` expects (`--train`, `--val`, `--learning_rate`, etc.):"
]
"source": "Vertex AI provisions a VM, runs your training container, and writes outputs to the `base_output_directory`. The job runs on GCP's infrastructure, not on your machine — you can close your terminal and it keeps going.\n\n### GPU example (PyTorch)\n\nFor the PyTorch GPU job from Episode 5, the config includes an `accelerator_type` and `accelerator_count`. Note that the argument names must match exactly what `train_nn.py` expects (`--train`, `--val`, `--learning_rate`, etc.):"
},
{
"cell_type": "code",
"execution_count": null,
"id": "9350f9e1",
"metadata": {},
"outputs": [],
"source": [
"# pytorch_gpu_job.yaml\n",
"displayName: cli-pytorch-titanic-gpu\n",
"jobSpec:\n",
" workerPoolSpecs:\n",
" - machineSpec:\n",
" machineType: n1-standard-8\n",
" acceleratorType: NVIDIA_TESLA_T4\n",
" acceleratorCount: 1\n",
" replicaCount: 1\n",
" containerSpec:\n",
" imageUri: us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.2-4.py310:latest\n",
" args:\n",
" - \"--train=gs://doe-titanic/data/train_data.npz\"\n",
" - \"--val=gs://doe-titanic/data/val_data.npz\"\n",
" - \"--epochs=500\"\n",
" - \"--learning_rate=0.001\"\n",
" - \"--patience=50\"\n",
" baseOutputDirectory:\n",
" outputUriPrefix: gs://doe-titanic/artifacts/pytorch/cli-gpu-run/"
]
"source": "# pytorch_gpu_job.yaml\nworker_pool_specs:\n - machine_spec:\n machine_type: n1-standard-8\n accelerator_type: NVIDIA_TESLA_T4\n accelerator_count: 1\n replica_count: 1\n container_spec:\n image_uri: us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.2-4.py310:latest\n args:\n - \"--train=gs://doe-titanic/data/train_data.npz\"\n - \"--val=gs://doe-titanic/data/val_data.npz\"\n - \"--epochs=500\"\n - \"--learning_rate=0.001\"\n - \"--patience=50\"\nbase_output_directory:\n output_uri_prefix: gs://doe-titanic/artifacts/pytorch/cli-gpu-run/"
},
{
"cell_type": "markdown",
Expand All @@ -276,14 +224,7 @@
"cell_type": "markdown",
"id": "e2d0ee36",
"metadata": {},
"source": [
"**Run in Cloud Shell / terminal:**\n",
"```bash\n",
"gcloud ai custom-jobs create \\\n",
" --region=us-central1 \\\n",
" --config=pytorch_gpu_job.yaml\n",
"```"
]
"source": "**Run in Cloud Shell / terminal:**\n```bash\ngcloud ai custom-jobs create \\\n --region=us-central1 \\\n --display-name=cli-pytorch-titanic-gpu \\\n --config=pytorch_gpu_job.yaml\n```"
},
{
"cell_type": "markdown",
@@ -596,4 +537,4 @@
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}
}