By the end of this workshop, participants will be able to:
- Analyze current AWS costs and identify optimization opportunities using AWS Console
- Navigate Google Cloud Console billing and cost management tools effectively
- Build a compelling business case for SageMaker to Vertex AI migration
- Establish cost baselines and tracking mechanisms for migration success
- AWS Management Console access with billing permissions
- Google Cloud Console access with billing account access
- Basic AWS and Google Cloud knowledge
- Web browser (Chrome, Firefox, Safari, or Edge)
- Use a private browsing session (such as Chrome Incognito or Edge InPrivate) to avoid problems with single sign-on and cached credentials.
- Billing account configured
This comprehensive workshop establishes the financial foundation for migrating AWS SageMaker workloads to Google Cloud Vertex AI. Participants will master cost management tools on both platforms using only web-based console interfaces and develop compelling business cases for migration decisions.
- Duration: 4 hours
- Tools Required: AWS Management Console, web browser, spreadsheet application
- Difficulty: Intermediate
- Master AWS Cost Explorer and AWS Budgets for SageMaker workloads
- Analyze current SageMaker spending patterns and trends
- Identify cost optimization opportunities in existing ML infrastructure
- Create detailed cost breakdown for migration planning
Set up and familiarize yourself with the AWS Cost Explorer interface.
- Open your web browser
- Navigate to the AWS Management Console
- Sign in with your AWS credentials
- In the AWS Console Search Bar, type Cost Explorer
- Select Cost Explorer from the Billing and Cost Management feature
- (Optional) Click Launch Cost Explorer (first-time users may experience up to a 24-hour delay for data population)
Setup Checklist:
- Verify the main dashboard is visible
- Set time range to Last 3 months using the date picker, and select Apply
- Set granularity to Monthly
- Group by Service
- In the Advanced options section, select Show Forecasted values at the bottom of the chart from the additional data settings.
- Explore the dashboard chart area
- Try the built-in chart types (bar, line, stacked)
- Review the filters panel
💡 Pro Tip: Use preconfigured views like "Monthly costs by service" or "RI Utilization" for quick insights. Capture screenshots of your initial dashboard for later comparison.
- In the Filters panel, click Service
- Search and select SageMaker
- Click Apply
- Set the Date range to 6 months
- Change granularity to Daily
- Select Usage Type and review.
- Identify top cost drivers
- Click chart segments to drill down
- Log findings in a spreadsheet. Ask the presenter to share the template and example worksheet.
| Component | Monthly Cost | % of Total | Trend | Instance Types/Notes |
|---|---|---|---|---|
| Training Instances | $X,XXX | XX% | ↑/↓/→ | ml.p3.2xlarge, ml.g5.xlarge |
| Notebook Instances | $XXX | XX% | ↑/↓/→ | Development environments |
| Endpoints | $X,XXX | XX% | ↑/↓/→ | Production model serving |
| Storage (S3) | $XXX | XX% | ↑/↓/→ | Training data, models |
| Data Transfer | $XXX | XX% | ↑/↓/→ | Inter-region, internet |
| Other Services | $XXX | XX% | ↑/↓/→ | Supporting AWS services |
- The following questions should be answered in your analysis:
- Which instance types consume the most budget?
- When is usage the highest?
- Any idle periods or unnecessary resources?
- Ratio between training and inference costs?
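The "% of Total" column in the worksheet can be derived from the monthly cost figures. The following is a minimal sketch, assuming you have typed the component costs from Cost Explorer into a dictionary; the function name `cost_breakdown` and the dollar amounts are hypothetical and only illustrate the arithmetic.

```python
def cost_breakdown(monthly_costs):
    """Return (component, cost, percent_of_total) rows, largest cost first."""
    total = sum(monthly_costs.values())
    if total <= 0:
        raise ValueError("total cost must be positive")
    rows = [
        (name, cost, round(100 * cost / total, 1))
        for name, cost in monthly_costs.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical figures for illustration only; replace with your own values.
example = {
    "Training Instances": 4000.0,
    "Endpoints": 3000.0,
    "Notebook Instances": 500.0,
    "Storage (S3)": 300.0,
    "Data Transfer": 150.0,
    "Other Services": 50.0,
}
for name, cost, pct in cost_breakdown(example):
    print(f"{name:<20} ${cost:>9,.2f} {pct:>5.1f}%")
```

Sorting by cost puts the top cost drivers first, which helps answer the first analysis question directly.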
Important: After you enable hourly granularity, data can take up to 48 hours to populate. AWS charges $0.01 per 1,000 usage records per month for hourly-level data.
Note: This step is optional, and hourly granularity may not be available for all accounts.
- Switch to Hourly granularity for the recent week
- Detect peak usage periods and idle patterns
- Switch back to Daily to document seasonal behavior
- Use Amazon Q Developer for insights like:
- "Which region had the largest cost increase last month?"
Try filters for:
- Region: Concentration of activity
- Usage Type: (e.g., ml.t3.medium)
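Peak periods can also be flagged offline once the daily figures are in your spreadsheet. The sketch below assumes a mapping of date strings to daily cost and treats any day above 1.5 times the median as a peak; both the function name `flag_peak_days` and the 1.5 factor are illustrative assumptions, not Cost Explorer settings.

```python
from statistics import median


def flag_peak_days(daily_costs, factor=1.5):
    """Return (day, cost) pairs whose cost exceeds factor x the median daily cost."""
    if not daily_costs:
        return []
    typical = median(daily_costs.values())
    return [(day, cost) for day, cost in daily_costs.items() if cost > factor * typical]


# Hypothetical daily costs, for illustration only.
daily = {
    "2025-01-01": 100.0,
    "2025-01-02": 110.0,
    "2025-01-03": 300.0,
    "2025-01-04": 90.0,
    "2025-01-05": 105.0,
}
print(flag_peak_days(daily))
```

The median is used instead of the mean so that a single large spike does not raise the reference level it is compared against.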
New Feature: Cost Anomaly Detection enabled by default (as of March 2025)
- In the AWS Console Search Bar, type Cost Anomaly Detection
- Select the service
- Select Get started, and take the tour if available. Do not create a new cost monitor. If needed, cancel to return to the main dashboard.
- Review existing detected anomalies and cost anomaly detection summary.
- Click into cost monitors and Alert subscriptions to review existing configurations.
- Document findings
💡 Pro Tip: Idle weekends or underutilized resources are low-hanging fruit for cost cuts.
- In the AWS Console search bar, type "Budgets"
- Select "AWS Budgets" from the search results
- Click the Create budget button
Budget setup:
- Under Budget setup, choose Customize (advanced)
- Budget type: Cost budget
- Select Next
- Budget name: `SageMaker-ML-Workloads-Monthly`
- Period: Monthly
- Budget renewal type: Recurring budget
- Start month: Choose the current month
Set budget amount:
- Budgeting method: Fixed
- Enter amount: `$[Current monthly SageMaker spend + 20% buffer]`
- Advanced options: Unblended costs
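The budget amount is the current monthly SageMaker spend plus a 20% buffer. A minimal sketch of that arithmetic follows; the helper name `budget_with_buffer` and the example spend are hypothetical.

```python
def budget_with_buffer(current_monthly_spend, buffer=0.20):
    """Return the fixed budget amount: current spend plus a percentage buffer."""
    if current_monthly_spend < 0:
        raise ValueError("spend must be non-negative")
    return round(current_monthly_spend * (1 + buffer), 2)


# Hypothetical current spend of $8,000 per month.
print(budget_with_buffer(8000))
```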
Budget scope (Filters):
- Select Filter specific AWS cost dimensions
- Click Add filter
- Select Service from the dropdown
- Choose Amazon SageMaker
- Click Apply filter
Note: You do not need to create every alert threshold.
- Click Next to proceed to alert configuration
- Select Add an alert threshold
- Set up multiple alerts:
  - Alert 1: Early Warning
    - Threshold: 75%
    - Trigger: Forecasted
    - Recipients: ML team leads' emails
  - Alert 2: Critical Alert
    - Click Add an alert threshold
    - Threshold: 90%
    - Trigger: Actual
    - Recipients: Additional stakeholders
  - Alert 3: Budget Exceeded
    - Click Add an alert threshold
    - Threshold: 100%
    - Trigger: Actual
    - Recipients: Management escalation contacts
- Select Next
- Enhanced Alert Options (2025 Feature):
  - Check AWS User Notifications
  - Configure AWS Chatbot alerts for Slack/Chime if available
- Click Next, review configuration, then Create budget
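To reason about which of the three alerts would fire for a given month, the thresholds can be modeled offline. This is a sketch under stated assumptions: the `Alert` class and `fired_alerts` function are hypothetical helpers that do not call any AWS API, and an alert is treated as firing when spend is strictly greater than its threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    threshold_pct: float
    trigger: str  # "actual" or "forecasted"


# The three alerts from the workshop steps.
ALERTS = [
    Alert("Early Warning", 75, "forecasted"),
    Alert("Critical Alert", 90, "actual"),
    Alert("Budget Exceeded", 100, "actual"),
]


def fired_alerts(budget, actual, forecasted, alerts=ALERTS):
    """Return the names of alerts whose threshold has been exceeded."""
    spend = {"actual": actual, "forecasted": forecasted}
    return [
        alert.name
        for alert in alerts
        if spend[alert.trigger] > budget * alert.threshold_pct / 100
    ]


# Hypothetical month: $10,000 budget, $9,200 spent, $11,000 forecasted.
print(fired_alerts(10000, actual=9200, forecasted=11000))
```

Pairing a forecasted trigger at 75% with actual triggers at 90% and 100% gives an early signal before money is spent and firmer signals once it has been.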
Important: This task is optional. Participants can skip it and proceed to the GCP section, which is the main focus of the workshop.
Development Environment Budget:
- Return to the Budgets dashboard and click Create budget
- Select Customize (advanced) → Cost budget
- Budget name: `ML-Development-Environment`
- Period: Monthly
- Method: Fixed
- Amount: `$500`
- Filters:
  - Tags: `Environment:Development`
  - Service: Amazon SageMaker
Production Environment Budget:
- Click Create budget again
- Budget name: `ML-Production-Environment`
- Period: Monthly
- Method: Fixed
- Amount: `$5,000`
- Filters focused on production resources
- In the budget creation wizard, scroll to Actions
- Click Add action
- Configure automated response:
- Action type: Apply IAM policy or target EC2/RDS instances
- Threshold: 90% actual spend
- Execution: Require approval (recommended) or Automatic
- Define policy or instance targeting criteria
- Navigate to the main Budgets dashboard
- Review all created budgets in list view
- Select each budget's radio button to view its details in the Budget preview panel on the right.
- Check the Thresholds column for status: Green is OK, Yellow is approaching the threshold, and Red is over budget.
- Document the monitoring process
- AWS cost analysis report with detailed SageMaker breakdown
- Configured AWS budgets with alert thresholds and actions
- Optimization recommendations document
- Ongoing monitoring process documentation
- Baseline cost data for GCP migration comparison
- Duration: 4 hours
- Tools Required: Google Cloud Console only, web browser, spreadsheet application
- Difficulty: Intermediate
- Master Google Cloud billing and cost management tools using console interface only
- Understand GCP pricing models for Vertex AI and supporting services
- Implement cost controls and monitoring for GCP ML workloads through console
- Create comprehensive cost comparison framework between AWS and GCP
- Open your web browser
- Navigate to the Google Cloud Console (https://console.cloud.google.com/) and sign in
- In the top left corner of the console, next to the Google Cloud logo, select the project picker. It may display your organization name or a project name, for example cloudlearningsolution or the project MFAv2. It displays No organization if you are not part of an organization.
- In the Select a resource box, select the organization (for example, cloudlearningsolution) if it is not already displayed. Then click All and select your project name. Review the Type column to confirm that you are selecting a project and not an organization or folder. Note: You may need to expand the domain name to see and select your project.
- In the GCP Console Search Bar, type All Products and select All Products from the drop-down options.
- On the All Products page, under the Management section, click Billing
- Select your billing account from the Billing screen. You may need to select My projects to see your billing account.
- Click Account management
- Verify permissions:
- Click or view + Add principal (Note: You may not have permission)
- Confirm your billing role (Billing Account Administrator, User, or Viewer) by reviewing the Role / Principal column. Expand a role name to view its principals, which are the users or services that hold the role.
- Document your access level
- Review account hierarchy in the Account management section:
- Note linked projects
- Check payment settings:
- Click Payment settings
- Review payment methods and automatic payment alerts, but do not edit them or request edits
- Click Overview
- Review current month spending:
- Note total month-to-date spend
- Identify top spending services
- Check spending by viewing the billing report
- Select View report under the Top Services chart
- Analyze service-level breakdown: click Group by (Service) (Note: By default the report groups cost by "Service")
- Interact with the Chart: Hover your pointer over any part of the chart. A tooltip will appear showing costs for each service.
- In the GCP Console Search Bar, type `Reports` and select Reports, which is also available under Cost management
- Familiarize yourself with the Reports interface
- Set time range to Last 90 days
- Group by: Service, Project, Project hierarchy (folder-level), or SKU. Leave as default Service.
- Apply filters: Projects, Services, SKUs, Locations, Labels, Folders & Organizations
- Use the Filters panel to include only the AI/ML services listed below. Note: To show or hide the panel, click the Filters button in the top right corner of the Reports interface (its icon resembles "<|").
- Vertex AI
- Compute Engine
- Cloud Storage
- Artifact Registry
- Take note of the current baseline:
- Note usage patterns
- Save custom view: click Save as new, name it "ML Migration Baseline", and set it as your monitoring view
- Export report data: click Print or select Download CSV for offline AWS/GCP comparison
- In the Reports interface date picker, try:
- Last 30 days for recent patterns
- Last 12 months for long-term trends
- Toggle granularity between Monthly and Daily
- In the GCP Console Search Bar, type `FinOps hub` and select FinOps hub (formerly Savings & credits). In a later task, you will learn how to use the FinOps hub to analyze savings and credits.
- Review subcategories in Cost optimization: committed use discounts and CUD analysis.
Create this table in your spreadsheet for future ML workload planning:
| Service Category | Current Monthly Cost | Projected ML Cost | Migration Notes |
|---|---|---|---|
| Vertex AI Training | $0 | $X,XXX | Based on AWS SageMaker analysis |
| Vertex AI Prediction | $0 | $XXX | Endpoint hosting equivalent |
| Compute Engine | $XXX | $X,XXX | Custom training VMs |
| Cloud Storage | $XXX | $XXX | Data storage migration |
| Networking | $XXX | $XXX | Data transfer and egress |
| Other Services | $XXX | $XXX | Supporting infrastructure |
- Navigate to Budget Creation
  - In the billing console left sidebar, click "Budgets & alerts"
  - Click the "Create budget" button
- Budget Scope and Configuration
  - Budget details:
    - Name: `Vertex-AI-ML-Workloads`
    - Time range: Monthly (recurring)
    - Enhanced options (2025): Monthly, Quarterly, Yearly, or Custom range available
  - Budget scope:
    - Projects: Select "All projects" or specific ML-related projects
    - Services: Click "Add filter" → "Services", then select Vertex AI, Compute Engine, Cloud Storage
    - Optional filters: Labels, folders, or credit filters (leave unfiltered for broader coverage)
- Set Budget Amount
  - Type: "Specified amount"
  - Alternative (2025): "Last period's spend" for dynamic budgeting
  - Amount: `$[Enter estimated monthly ML spend based on AWS analysis]`
  - Currency: Auto-detected based on billing account (verify)
- Configure Alert Thresholds
  - Click "Next" to proceed
  - Set multiple threshold rules:
    - 50% Alert: Early Warning (Actual spend) to billing admins and users
    - 75% Alert: Planning Alert (Actual spend)
    - 90% Alert: Critical (Actual spend)
    - 100% Forecasted Alert (Forecasted spend)
  - Click "Finish" to create the budget
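Before entering the threshold rules, it can help to see what each percentage means in currency terms. The sketch below converts the 50%, 75%, 90%, and 100% rules into amounts for a given budget; `threshold_amounts` is a hypothetical helper and the $5,000 budget is only an example.

```python
def threshold_amounts(budget_amount, percents=(50, 75, 90, 100)):
    """Map each threshold percentage to the spend amount that reaches it."""
    return {pct: round(budget_amount * pct / 100, 2) for pct in percents}


# Hypothetical monthly budget of $5,000.
for pct, amount in threshold_amounts(5000).items():
    print(f"{pct:>3}% -> ${amount:,.2f}")
```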
- Create Development Environment Budget
  - In Budgets & alerts, click "Create budget"
  - Name: `ML-Development-Environment`
  - Time range: Monthly
  - Budget amount: $500
  - Currency: [Verify billing account currency]
  - Scope: Projects for development/testing and Services filter for Vertex AI and Compute Engine
  - Alert thresholds:
    - 50%: Early alert
    - 75%: Mid-cycle planning
    - 90%: Critical
  - Click "Finish"

  Tip: Define projects using naming conventions (e.g., dev-, test-) for governance efficiency.
- Create Production Environment Budget
  - Click "Create budget"
  - Name: `ML-Production-Environment`
  - Time range: Monthly
  - Budget amount: $5,000
  - Currency: [Verify billing account currency]
  - Scope: Projects for production and Services for prediction-related services (Vertex AI, Cloud Storage)
  - Alert thresholds:
    - 25%: Initial spend signal
    - 50%: Midpoint
    - 75%: Escalation
    - 90%: Critical
    - 100% Forecasted: Predictive alert
  - Click "Finish"
- Budget Dashboard Review
  - Return to "Budgets & alerts" in the billing console
  - Review all budgets: Confirm naming, amounts, and scope; ensure thresholds are set
  - Monitor current spend vs. budget
  - Test alert email functionality: temporarily adjust a threshold to 1%, monitor for the alert email, then restore the original threshold

  Note: Alert testing may not generate emails immediately due to notification lags and single-trigger-per-threshold limits.
- Access Google Cloud Pricing Calculator
  - Open a new browser tab
  - Navigate to: Google Cloud Pricing Calculator
  - Select + Add to estimate
  - Review available products: click Sort by most popular, then select Sort by product name from the drop-down menu
- Prepare for ML Workload Estimation
  - Reference your AWS analysis: have your SageMaker cost breakdown available
  - Choose a region that matches your typical AWS usage
- Add Vertex AI Custom Training
  - In the calculator, use Search by product name to search for "Vertex AI training"
  - Configure training parameters based on your AWS analysis:
    - Region: Use the same region as your AWS SageMaker usage
    - Machine type: n1-standard-8 (closest to ml.m5.2xlarge), or a machine type similar to your AWS instance types. Machine types reference: https://cloud.google.com/vertex-ai/docs/training/configure-compute#machine-types
    - In Accelerator Type, select No accelerator. Alternatively, select GPU or TPU if your AWS SageMaker training job uses accelerators.
    - In the Machine Family section, select General purpose. Alternatively, select Compute optimized or Memory optimized based on your AWS instance type.
    - In Series, select N1 or N2 based on your AWS instance type.
    - In Machine Type, select n1-standard-4 or n2-standard-4.
    - Average training job length (hours): [Based on AWS SageMaker analysis] Enter 6
    - Number of training jobs per month: [Based on current frequency] Enter 90
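The two inputs above (6-hour jobs, 90 jobs per month) determine the monthly training hours that the calculator prices. A rough sketch of the arithmetic follows; `monthly_training_cost` is a hypothetical helper, and `HYPOTHETICAL_RATE` is a placeholder, not a published Vertex AI price. Use the rate shown in the Pricing Calculator for your region and machine type.

```python
def monthly_training_cost(job_hours, jobs_per_month, hourly_rate, nodes=1):
    """Return (total_node_hours, estimated_cost) for one month of training jobs."""
    total_hours = job_hours * jobs_per_month * nodes
    return total_hours, round(total_hours * hourly_rate, 2)


# Placeholder rate in USD per node-hour, assumed purely for illustration.
HYPOTHETICAL_RATE = 0.50

hours, cost = monthly_training_cost(6, 90, HYPOTHETICAL_RATE)
print(f"{hours} node-hours, about ${cost:,.2f} per month at the assumed rate")
```

With the workshop inputs this gives 540 node-hours per month, which is a useful sanity check against the hours the calculator reports.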
- Do not add any additional training jobs at this time. The following information is for future consideration:
  - Add GPU configuration if needed:
    - GPU type: NVIDIA V100 (if using GPU instances)
    - Number of GPUs: match AWS configuration
    - GPU hours per month: [Based on AWS usage]
  - The following items can be completed using the same steps as in the Add Vertex AI Custom Training section.
  - Add storage for training:
    - From Service type, select Persistent disk: 100 GB SSD
    - Zonal Standard PD: 100 GB
    - Ensure the same region as your training jobs
    - Snapshot Storage: 100 GiB
    - Total Disks: Enter 1 (default)
    - (Optional) Adjust size based on your storage needs
- (Optional) Configure Multiple Training Scenarios
  - Development training: smaller instances, fewer hours
  - Production training: larger instances, include batch jobs
- Add Vertex AI Prediction Endpoints
  - In the calculator, click "Vertex AI" → "prediction"
- Add Cloud Storage Estimation
  - Search for "Cloud Storage"
  - Configure storage tiers:
    - Standard: [GB] for active data
    - Nearline: [GB] for infrequent data
    - Archive: [GB] for long-term retention
  - Operations:
    - Class A operations (uploads/writes): [Count]
    - Class B operations (downloads/reads): [Count]
- Add Compute Engine for Custom ML Workloads
  - Search for "Compute Engine"
  - Configure VM instances:
- Add Networking Costs
  - Search for "Network"
  - Configure data transfer:
    - Egress to internet: [GB]
    - Inter-region traffic: [GB]
- Review Total Estimate
  - Scroll to the Estimated Costs section
  - Review monthly total
  - Select Open Detailed View to see the breakdown by service
  - Review the Cost Estimate Summary
- Save and Export Estimate
  - Share estimate: click "Share" and save the URL link
  - Export data: click Download CSV for detailed analysis
- Compile Detailed Cost Comparison
Create a comprehensive comparison matrix in your spreadsheet:
| Service Component | AWS SageMaker | GCP Vertex AI | Monthly Difference | Annual Difference | Notes |
|---|---|---|---|---|---|
| Training Compute | $X,XXX | $X,XXX | ±$XXX | ±$X,XXX | Include GPU costs |
| Prediction Endpoints | $X,XXX | $X,XXX | ±$XXX | ±$X,XXX | Auto-scaling comparison |
| Development Environment | $XXX | $XXX | ±$XX | ±$XXX | Notebook instances vs Workbench |
| Storage Costs | $XXX | $XXX | ±$XX | ±$XXX | S3 vs Cloud Storage |
| Data Transfer | $XXX | $XXX | ±$XX | ±$XXX | Egress and inter-region |
| Management Overhead | $XXX | $XXX | ±$XX | ±$XXX | Operational costs |
| Support and SLA | $XXX | $XXX | ±$XX | ±$XXX | Enterprise support levels |
| Total Monthly | $X,XXX | $X,XXX | ±$XXX | ±$X,XXX | Net difference |
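The Monthly Difference and Annual Difference columns follow directly from the two platform columns. The sketch below assumes the difference is taken as GCP minus AWS, so a negative value means GCP is cheaper; the function name `compare_costs` and all figures are hypothetical.

```python
def compare_costs(aws_monthly, gcp_monthly):
    """Return rows of (component, aws, gcp, monthly_diff, annual_diff).

    Differences are GCP minus AWS, so a negative value means GCP is cheaper.
    """
    rows = []
    for component, aws in aws_monthly.items():
        gcp = gcp_monthly[component]
        monthly_diff = round(gcp - aws, 2)
        rows.append((component, aws, gcp, monthly_diff, round(monthly_diff * 12, 2)))
    total_aws = sum(aws_monthly.values())
    total_gcp = sum(gcp_monthly[name] for name in aws_monthly)
    total_diff = round(total_gcp - total_aws, 2)
    rows.append(("Total Monthly", total_aws, total_gcp, total_diff, round(total_diff * 12, 2)))
    return rows


# Hypothetical figures for two of the matrix rows.
aws = {"Training Compute": 4000.0, "Storage Costs": 300.0}
gcp = {"Training Compute": 3400.0, "Storage Costs": 350.0}
for row in compare_costs(aws, gcp):
    print(row)
```

Whichever sign convention you choose, state it in the Notes column so readers do not misread a negative number.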
- Document Key Assumptions
- Usage patterns remain constant
- Similar performance requirements
- Equivalent SLA requirements
Identify cost variables:
- Factors that could increase costs
- Potential for additional savings
- Regional pricing differences
Note service capability differences:
- Features available in one platform but not the other
- Performance differences that might affect costs
- Expand Analysis Beyond Direct Cloud Costs
Create a comprehensive TCO analysis:
| Cost Category | One-Time Costs | Ongoing Monthly Costs | Notes |
|---|---|---|---|
| Direct Cloud Costs | - | $X,XXX | From comparison above |
| Migration Costs | $X,XXX | - | Data transfer, application modification |
| Training and Certification | $X,XXX | - | Team upskilling |
| Operational Changes | $X,XXX | $XXX | New tools and processes |
| Risk Mitigation | $XXX | $XXX | Security and compliance |
| Opportunity Costs | $X,XXX | - | Development delays |
| Total TCO | $X,XXX | $X,XXX | Complete picture |
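To turn the table into a single figure, the one-time costs can be added to the ongoing monthly costs multiplied by the planning horizon. This is a minimal sketch; `total_cost_of_ownership` is a hypothetical helper, the 36-month default is an assumed three-year horizon, and the amounts are illustrative.

```python
def total_cost_of_ownership(one_time_costs, monthly_costs, months=36):
    """Sum one-time costs and ongoing monthly costs over a planning horizon."""
    one_time = sum(one_time_costs.values())
    ongoing = sum(monthly_costs.values()) * months
    return {"one_time": one_time, "ongoing": ongoing, "total": one_time + ongoing}


# Hypothetical inputs, for illustration only.
one_time = {"Migration Costs": 20000, "Training and Certification": 5000}
monthly = {"Direct Cloud Costs": 7000, "Operational Changes": 500}
print(total_cost_of_ownership(one_time, monthly, months=12))
```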
- Calculate Break-Even Analysis
- Determine monthly savings: AWS monthly cost − GCP monthly cost
- Calculate break-even period: Total one-time costs ÷ Monthly savings
- Create scenarios:
- Best case (maximum savings)
- Realistic case (expected savings)
- Worst case (minimal savings)
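The break-even calculation can be run for each scenario. The sketch below takes monthly savings as the AWS monthly cost minus the GCP monthly cost and rounds the break-even period up to whole months; the function name `break_even_months` and every dollar figure are hypothetical.

```python
import math


def break_even_months(one_time_costs, aws_monthly, gcp_monthly):
    """Return whole months needed to recover one-time costs, or None if there are no savings."""
    monthly_savings = aws_monthly - gcp_monthly
    if monthly_savings <= 0:
        return None
    return math.ceil(one_time_costs / monthly_savings)


# Hypothetical scenarios: $25,000 one-time cost, $8,000 current AWS monthly cost.
scenarios = {"Best case": 6000, "Realistic case": 7000, "Worst case": 7800}
for name, gcp_monthly_cost in scenarios.items():
    months = break_even_months(25000, 8000, gcp_monthly_cost)
    print(f"{name}: {months} months")
```

A result of `None` signals that the migration never pays back on direct costs alone, which is worth calling out explicitly in the business case.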
- Google Cloud Console Mobile App Setup (2025 Feature)
- Download Google Cloud Console mobile app from app store
- Sign in with your Google Cloud credentials
- Navigate to billing information features
- Set up mobile notifications for budget alerts
- Test mobile access to cost estimates and billing reports
- Advanced Monitoring Setup
- In Google Cloud Console, navigate to Monitoring
- Create custom dashboards for cost monitoring:
- Click Dashboards → Create Dashboard
- Name: `ML Workload Cost Monitoring`
- Add charts for cost-tracking metrics
- Configure time ranges and aggregation
- Save dashboard for regular monitoring
- Create Cost Review Process Documentation
- Check billing overview dashboard
- Review any budget alerts
- Validate no unexpected resource creation
- Generate and review cost reports
- Analyze spending trends using Reports interface
- Update cost forecasts based on current usage
- Review and adjust budgets if necessary
- Complete budget reconciliation
- Implement identified cost optimizations
- Prepare stakeholder cost summary reports
- Update pricing calculator estimates based on actual usage
- Review and validate cost allocation across projects
- FinOps Team: [Contact information]
- Cloud Operations: [Contact information]
- MLOps Team Lead Strategist: [Contact information]
- Manager: [Contact information]
- Model Ops Director: [Contact information]
- Google Cloud cost analysis report with projected ML costs
- Configured GCP budgets with appropriate alert thresholds
- Comprehensive cost comparison framework between AWS and GCP
- TCO analysis with break-even calculations
- Cost governance framework documentation
- Mobile monitoring setup for ongoing cost management
Technical Competency Validation
- Navigation Proficiency: can independently navigate both AWS and GCP billing consoles
- Data Analysis Skills: successfully extract and analyze cost data from both platforms
- Tool Configuration: properly configure budgets, alerts, and monitoring
- Export Capabilities: ability to export and save cost data for offline analysis
Business Analysis Validation
- Cost Baseline Established: documented current AWS ML spending with detailed breakdown
- Projection Accuracy: realistic GCP cost projections using pricing calculator
- Optimization Identification: specific, actionable cost optimization opportunities
- TCO Understanding: comprehensive understanding of total cost factors
Process Implementation Validation
- Monitoring Setup: working budget alerts and monitoring processes
- Documentation Quality: clear, actionable documentation and runbooks
- Governance Framework: appropriate cost review and approval processes
- Stakeholder Communication: clear articulation of findings and recommendations
Financial Foundation Established
- ✅ AWS Cost Baseline: comprehensive understanding of current SageMaker spending patterns and optimization opportunities
- ✅ GCP Cost Projections: realistic estimates created using official Google Cloud pricing calculator
- ✅ Cost Comparison Framework: detailed AWS vs GCP comparison with total cost of ownership analysis
- ✅ Financial Governance: established cost monitoring, budgeting, and review processes
Technical Competencies Developed
- ✅ Console Mastery: proficient navigation of AWS and GCP billing consoles
- ✅ Cost Analysis Skills: ability to extract insights from billing data and identify trends
- ✅ Monitoring Configuration: working budget alerts and cost monitoring systems
- ✅ Data Export and Analysis: skills to export, analyze, and present cost data effectively
Business Capabilities Enhanced
- ✅ ROI Analysis: clear understanding of migration financial benefits and timeline
- ✅ Risk Assessment: identified and quantified financial risks and mitigation strategies
- ✅ Stakeholder Communication: ability to present compelling business case for migration
- ✅ Decision Support: framework for making informed, data-driven migration decisions
Organizational Impact
- ✅ Process Documentation: clear, actionable cost management procedures
- ✅ Knowledge Transfer: documented processes enable team knowledge sharing
- ✅ Continuous Improvement: framework for ongoing cost optimization and management
- ✅ Strategic Alignment: cost management integrated with broader migration strategy
This runbook provides standardized operational procedures for managing Google Cloud costs for machine learning workloads, specifically designed for teams migrating from AWS SageMaker to Google Cloud Vertex AI. All procedures are console-based and require no programming knowledge.
- Budget variance: <10% monthly variance from planned spend
- Cost optimization: 5% quarterly cost reduction through optimization
- Alert response time: <2 hours for budget threshold alerts
- Monthly reporting: Complete cost analysis within 3 business days of month-end
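The budget variance target can be checked with a small calculation once the month's actual spend is known. A minimal sketch follows, assuming variance is measured relative to planned spend; the helper names `budget_variance_pct` and `within_target` are hypothetical.

```python
def budget_variance_pct(actual, planned):
    """Return the signed variance of actual spend from planned spend, in percent."""
    if planned <= 0:
        raise ValueError("planned spend must be positive")
    return round(100 * (actual - planned) / planned, 1)


def within_target(actual, planned, limit_pct=10.0):
    """True when the absolute variance is below the limit (10% by default)."""
    return abs(budget_variance_pct(actual, planned)) < limit_pct


# Hypothetical month: $5,000 planned, $5,400 actual.
print(budget_variance_pct(5400, 5000), within_target(5400, 5000))
```

The absolute value matters here: a large underspend is also a variance, and often means the forecast needs revisiting.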
⏱️ Estimated Time: 15–20 minutes
1. Billing Overview Dashboard Review
- Navigate to Billing Overview
- Check current month spend vs. budget
- Review "Top spending services"
- Verify daily spend trend vs. historical patterns
- Action Required: If spend exceeds 150%, trigger emergency procedures
2. Budget Alert Status Check
- Go to Budgets & Alerts
- Check budget alert status for all ML workloads
- Action Required: Investigate if any budget exceeds 75% utilization mid-month
3. Cloud Monitoring Dashboard Check
- Access Cloud Monitoring
- Open "ML Workload Cost Monitoring" dashboard
- Review charts and anomalies
- Action Required: If cost deviates >20% from baseline, document and investigate
4. Vertex AI Resource Check
- Navigate to Vertex AI
- Check for running training jobs or idle endpoints
- Confirm Workbench instances are stopped outside hours
- Action Required: Stop unnecessary resources
5. Compute Engine Instance Review
- Navigate to Compute Engine
- Scan for unauthorized/idle VMs
- Action Required: Document and escalate as needed
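The mid-month budget check in step 2 can be sketched as a simple utilization calculation. This assumes you copy spend-to-date and budget amounts from the Budgets & alerts page by hand; `budgets_over_utilization` is a hypothetical helper and the figures are illustrative.

```python
def budgets_over_utilization(budgets, limit_pct=75.0):
    """Return (name, utilization_pct) for budgets above limit_pct.

    budgets maps a budget name to a (spend_to_date, budget_amount) pair.
    """
    flagged = []
    for name, (spend, amount) in budgets.items():
        utilization = round(100 * spend / amount, 1)
        if utilization > limit_pct:
            flagged.append((name, utilization))
    return flagged


# Hypothetical mid-month snapshot.
snapshot = {
    "Vertex-AI-ML-Workloads": (4000, 5000),
    "ML-Development-Environment": (200, 500),
}
print(budgets_over_utilization(snapshot))
```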
- Date: ___________
- Daily Budget Status: ✅ On Track / ⚠️ Warning / 🚨 Alert
- Unexpected Resources Found: Yes / No
- Issues Requiring Follow-up: ___________
- Completed by: ___________
⏱️ Estimated Time: 45–60 minutes
- Generate Detailed Cost Reports
  - Cost Reports → Time Range: "Last 7 days"
  - Group by Service & Project
  - Export to CSV
  - Deliverable: Weekly trend analysis spreadsheet
- Service-Level Deep Dive
  - Filter for Vertex AI
  - Analyze training vs. prediction, Storage, Compute
  - Deliverable: Optimization recommendations
- Usage Pattern Analysis
  - Set to Daily granularity
  - Compare weekend vs. weekday usage
  - Identify off-peak scheduling opportunities
  - Deliverable: Usage optimization schedule
- Budget Performance Review
  - Budgets & Alerts → Compare usage to thresholds
  - Action Required: Adjust budgets if needed
- Forecast Validation and Updates
  - Pricing Calculator → Update saved estimates
  - Deliverable: Updated cost forecast
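The weekend vs. weekday comparison from the usage pattern analysis can be done on the exported daily data. The sketch below assumes a mapping of `datetime.date` to daily cost built from the CSV export; `weekday_weekend_average` is a hypothetical helper and the costs are illustrative.

```python
from datetime import date
from statistics import mean


def weekday_weekend_average(daily_costs):
    """Return (weekday_avg, weekend_avg) for a mapping of date -> daily cost."""
    weekday = [cost for day, cost in daily_costs.items() if day.weekday() < 5]
    weekend = [cost for day, cost in daily_costs.items() if day.weekday() >= 5]
    return (
        mean(weekday) if weekday else 0.0,
        mean(weekend) if weekend else 0.0,
    )


# Hypothetical costs: a Monday, a Tuesday, a Saturday, and a Sunday.
costs = {
    date(2025, 3, 3): 100.0,
    date(2025, 3, 4): 120.0,
    date(2025, 3, 8): 90.0,
    date(2025, 3, 9): 70.0,
}
print(weekday_weekend_average(costs))
```

If the weekend average is close to the weekday average, resources are probably running when nobody is using them, which points to a scheduling opportunity.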
- Week of: ___________
- Total Weekly Spend: $___________
- Variance: ±___%
- Top Cost Driver: ___________
- Key Optimization: ___________
- Next Week Forecast: $___________
- Red Flags: ___________
- Analyst: ___________
⏱️ Estimated Time: 2–3 hours
- Budget Reconciliation
  - Compare actual vs. budgeted for ML services
  - Deliverable: Budget variance report
- Implement Cost Optimizations
  - Right-size, clean up unused resources
  - Deliverable: Optimization log
- Stakeholder Reporting
  - Executive summary + trend charts + ROI update
  - Deliverable: Leadership report
- Cross-Platform Cost Comparison
  - Update AWS vs. GCP matrix
  - Deliverable: Cost-benefit analysis
- Project Cost Allocation Review
  - Group by Project
  - Validate labeling & cost centers
  - Deliverable: Allocation accuracy report
- Month: ___________
- Spend: $___________
- Variance: ±___%
- Top Cost Area: ___________
- Savings Achieved: $___________
- ROI Status: On Track / Behind / Ahead
- Recommendations: ___________
- Prepared by: ___________
- Approved by: ___________
- Distribution: Finance, IT, ML Teams
⏱️ Estimated Time: 4–6 hours
- Use 3-month cost data
- Update actual vs. projected TCO
- Recalculate migration ROI
- Deliverable: Updated 3-Year TCO Model
- Identify under-utilized resources
- Review and optimize scaling policies
- Deliverable: Optimization roadmap
- Forecast seasonal/project needs
- Update estimates for new initiatives
- Deliverable: Next quarter budget
- Quarter: Q___ 20___
- Variance: ±___%
- Migration Savings: $___________
- Top Opportunities:
-
- Impact Assessment: ___________
- Recommendations: ___________
- Committee: ___________
1. Access Billing Console
- Navigate to Billing Overview
- Identify specific services causing overage
- Document exact overage amount and timeframe
2. Resource Usage Investigation
- Check Vertex AI for unexpected training jobs
- Review Compute Engine for unauthorized instances
- Examine Cloud Storage for data transfer spikes
3. Immediate Cost Controls
- Stop non-critical training jobs
- Scale down over-provisioned prediction endpoints
- Implement temporary spending limits if available
4. Stakeholder Notification
- Email finance team with initial findings
- Notify ML team leads of service disruptions
- Escalate to management if overage >25% of monthly budget
5. Root Cause Analysis
- Use Cost Reports to identify the source
- Check recent changes or deployments
- Document the timeline of contributing events
6. Prevention Planning
- Update budget thresholds
- Implement additional monitoring alerts
- Plan process improvements
1. Time-Based Analysis
- Set Cost Reports granularity to Daily, the finest granularity the Reports page offers
- Identify cost spike window
- Correlate with recent deployments or updates
2. Service Identification
- Filter reports by Service
- Investigate unusual usage or anomalies
- Review supporting services for cascading costs
3. Resource Correlation
- Cross-reference with Cloud Monitoring metrics
- Identify resource scaling events or deviations
- Verify performance or configuration changes
4. Mitigation Implementation
- Apply immediate controls
- Enable targeted alerts
- Document the incident
| Alert Level | Escalation Path |
|---|---|
| >10% monthly budget variance | ML Team Lead → Finance Business Partner |
| >25% monthly budget variance | IT Ops Manager → Finance Manager → Program Director |
| >50% monthly budget variance | Finance Director → IT Director → Executive Leadership |
| >100% budget breach (Critical) | Full escalation + External vendor support |
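The escalation matrix can be read as a lookup from variance to the highest applicable path. The sketch below encodes the four rows and returns the first level whose floor is exceeded; `escalation_path` is a hypothetical helper, and treating the ">100% budget breach" row as a variance above 100% is an assumption.

```python
# Levels are checked from most to least severe, so the highest match wins.
ESCALATION_LEVELS = [
    (100, "Full escalation + External vendor support"),
    (50, "Finance Director -> IT Director -> Executive Leadership"),
    (25, "IT Ops Manager -> Finance Manager -> Program Director"),
    (10, "ML Team Lead -> Finance Business Partner"),
]


def escalation_path(variance_pct):
    """Return the escalation path for a monthly budget variance, or None if below 10%."""
    for floor, path in ESCALATION_LEVELS:
        if variance_pct > floor:
            return path
    return None


for variance in (5, 12, 30, 60, 120):
    print(variance, escalation_path(variance))
```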
- Finance Team: finance-ml@sysco.com
- Cloud Operations: cloudops@sysco.com
- Lead MLOps Strategist: david.santana@sysco.com
- Escalation Manager: cost-escalation@sysco.com
- IT Operations Center: (555) 999-0000
- Finance Emergency Line: (555) 999-0001
- Executive On-Call: (555) 999-0002
- Google Cloud Support: Support Case Portal
- Account Manager: [From Google Cloud Console]
- TAM: [If applicable]
Runbook Maintenance
- Last Updated: [Date]
- Version: 1.0
- Next Review Date: [Quarterly]
- Owner: [Cost Management Team]
Change History
| Date | Version | Changes | Approved By |
|---|---|---|---|
| [Date] | 1.0 | Initial runbook creation | [Name] |
- Required Training: Google Cloud Cost Management Fundamentals
- Certification Renewal: Annual
- Training Records: Maintained in [System/Location]
Print and keep at desk for emergencies
- Open Billing Overview → Identify spike
- Review recent resource usage
- Stop non-critical services
- Email finance team findings
- Escalate if >25% budget variance
- ☐ Billing overview dashboard
- ☐ Budget alerts status
- ☐ Monitoring dashboard
- ☐ Resource scan
- ☐ Issue documentation
KEY CONTACTS
- Finance: (555) 123-4567
- CloudOps: (555) 234-5678
- Emergency: (555) 999-0000
The financial discipline and analytical skills you've developed in this module will be essential throughout your cloud migration. Continue reviewing, optimizing, and aligning costs with strategic goals.
Congratulations! You've built a robust financial foundation for your AWS → GCP ML migration. The budgeting, tracking, and governance systems you've implemented will drive success in future modules.
This workshop content is provided under the MIT License. See the LICENSE file for details.
Contributions are welcome! Please review the contributing guidelines before submitting pull requests.
For help with this workshop:
- Create an issue in the repository
- Contact the workshop maintainers
- Review the troubleshooting section above
© 2025 - 2026 ML Migration Workshop Series

