[APPSEC-61865] Add AppSec setup for AWS Lambda #144
Initialize AppSec context around each Lambda invocation, push request and response events through the AppSec gateway, and record security events on the aws.lambda span. The extension handles tag propagation to inferred spans. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adding an Exclude key overrides rubocop's default excludes, which include vendor/**. Re-add it explicitly so CI lint passes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Precompiled FFI binaries have glibc mismatch with Lambda AL2 runtime, causing crashes when AppSec loads the libddwaf chain. Force source compilation and remove precompiled variants. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add appsec-request handler with DD_APPSEC_ENABLED=true and an input event containing Arachni user-agent to trigger WAF detection. Snapshots will be recorded on first AWS deploy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wrap gateway pushes in catch(Datadog::AppSec::Ext::INTERRUPT) in both on_start and on_finish. When WAF decides to block, build a Lambda-shaped response override (statusCode/headers/body) via AppSec::Response. The listener exposes response_override for wrap to short-circuit the handler on request-phase blocks or replace the response on response-phase blocks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Normalize raw AWS event payloads into a standard key set before passing to DataContainer and Request. This removes v1/v2 detection from Request and aligns with dd-trace-rb's simplified WAFAddresses that consume standard keys. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
V2 events were missing the `query` key, causing `server.request.query` WAF address to be empty for API Gateway V2 payloads. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename add_appsec_tags to tag_and_keep, move from create_context to on_start for visibility - Pass cold_start flag from listener through to AppSec.on_start instead of tracking @oneshot_tags_sent module state - Align guard clause with Rack: return unless trace && span - Improve test quality: inline event values, use receive_messages, remove instance_variable_set for @request, relax unrelated assertions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The layer was built on ruby:X.Y (Debian Bookworm) which has libffi.so.8, but Lambda AL2 runtime only has libffi.so.6. Source-compiled FFI linked against .so.8 and crashed at runtime. Switch to public.ecr.aws/lambda/ruby:X.Y as builder so native extensions compile against the same system libraries available at runtime. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Generated with unreleased dd-trace-rb (appsec aws_lambda contrib). These snapshots will pass once the tracer is released and pinned. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
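The blocking flow described in the commit messages above relies on Ruby's catch/throw. A minimal sketch of that control flow, assuming a hypothetical `INTERRUPT` tag in place of Datadog::AppSec::Ext::INTERRUPT and a made-up `fake_gateway_push` in place of the real gateway. The `blocked_response` helper is illustrative and is not AppSec::Response; only the statusCode/headers/body shape comes from the commit message.

```ruby
require 'json'

# Hypothetical stand-in for Datadog::AppSec::Ext::INTERRUPT.
INTERRUPT = :interrupt

# Hypothetical gateway: throws INTERRUPT when the user agent looks like a scanner.
def fake_gateway_push(request)
  if request[:headers]['user-agent'].to_s.include?('Arachni')
    throw INTERRUPT, { status_code: 403 }
  end
  nil
end

# Builds a Lambda-shaped response override (illustrative only).
def blocked_response(params)
  {
    'statusCode' => params[:status_code],
    'headers' => { 'content-type' => 'application/json' },
    'body' => JSON.generate(errors: [{ title: 'Blocked' }])
  }
end

# Returns nil when nothing blocks, or the override hash when the push throws.
def on_start(request)
  interrupt_params = catch(INTERRUPT) { fake_gateway_push(request) }
  blocked_response(interrupt_params) if interrupt_params
end

# Short-circuits the handler block when a request-phase override exists.
def wrap(request)
  on_start(request) || yield
end
```

Because `catch` returns the block's value when nothing is thrown, a non-blocking push yields nil and the handler block runs as usual.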
| RUN MAKEFLAGS="-j$(nproc)" \ | ||
| gem install datadog-lambda --install-dir "/opt/ruby/gems/$runtime" --no-document | ||
| RUN MAKEFLAGS="-j$(nproc)" \ | ||
| gem install datadog -v 2.12 --install-dir "/opt/ruby/gems/$runtime" --no-document |
This will be pinned to the 2.32 release, as the required changes are already merged.
6cf6883 to
442d864
Compare
zarirhamza
left a comment
Nice clean split between this PR (event normalization + gateway push) and the dd-trace-rb side (Contrib::AwsLambda watcher + WAFAddresses). The normalized request schema lines up exactly with what WAFAddresses.from_request consumes — good interface boundary.
Found three bugs in the response-override / cross-invocation state plumbing that are worth fixing before merge. Details inline. Test suite is otherwise strong; just missing a couple of cases that would have caught these.
| cold = @is_cold_start | ||
| @listener&.on_start(event:, request_context: context, cold_start: cold) | ||
| @response = block.call | ||
| @response = @listener&.response_override || block.call |
@response is never reset between invocations — stale-response bug.
@response is a module-level instance variable on Datadog::Lambda. Lambda execution environments reuse the runtime, so it persists across invocations. If invocation N succeeds with response X and invocation N+1's block.call raises a StandardError, the ensure block still runs:
@listener&.on_end(response: @response, request_context: context)
…with @response still set to X from the previous invocation. The AppSec watcher then runs the WAF against the wrong response: potential incorrect blocking decision and a data-leak risk if the previous response carried sensitive data.
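A minimal sketch of how module-level state leaks between calls, assuming a hypothetical `FakeLambda` module in place of Datadog::Lambda and a plain array in place of the listener. Only the state handling is modeled; the `reset:` flag is an illustration device and not part of the PR.

```ruby
# Hypothetical stand-in for Datadog::Lambda; only the state handling is modeled.
module FakeLambda
  @seen = []

  class << self
    attr_reader :seen

    # reset: true clears @response before the handler block runs.
    def wrap(reset: false)
      @response = nil if reset
      @response = yield
    ensure
      # Stands in for @listener&.on_end(response: @response, ...).
      @seen << @response
    end
  end
end

# Invocation N succeeds.
FakeLambda.wrap { { 'body' => 'invocation N' } }

# Invocation N+1 raises without a reset: on_end sees N's response again.
begin
  FakeLambda.wrap { raise 'boom' }
rescue StandardError
  nil
end

# Same failure with a reset: on_end sees nil.
begin
  FakeLambda.wrap(reset: true) { raise 'boom' }
rescue StandardError
  nil
end
```

After these three calls, `FakeLambda.seen` holds the first response twice, followed by nil.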
One-line fix at the top of wrap:
def self.wrap(event, context, &block)
  @listener ||= initialize_listener
  @response = nil
  record_enhanced('invocations', context)
  ...
| # rubocop:enable Metrics/AbcSize | ||
|
|
||
| def on_end(response:, request_context:) | ||
| @response_override = Datadog::Lambda::AppSec.on_finish(response) |
on_end clobbers a request-time response_override with nil from on_finish.
AppSec.on_finish only returns the override hash when a response-time interrupt fires; otherwise its trailing expression is ... if interrupt_params, which evaluates to nil. So this assignment unconditionally clears any request-time override that was set in on_start.
It happens to still produce the right end-user response because wrap does @response = @listener.response_override || @response and the || @response catches the clobber. But @listener.response_override itself ends up nil when it should hold the override hash — anything else that reads it (future caller, telemetry, debug logging) sees the wrong value.
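The difference between an unconditional and a conditional assignment can be seen in isolation. This is a sketch only: `FakeListener`, its `guarded:` flag, and `fake_on_finish` are hypothetical names, and `fake_on_finish` merely mimics an on_finish that returns nil unless a response-time interrupt fired.

```ruby
# Hypothetical listener used to compare the two assignment styles.
class FakeListener
  attr_reader :response_override

  def initialize(guarded:)
    @guarded = guarded
    @response_override = nil
  end

  # Simulates a request-time block setting the override.
  def on_start(blocked:)
    @response_override = { 'statusCode' => 403 } if blocked
  end

  def on_end(response_interrupt: nil)
    finish_override = fake_on_finish(response_interrupt)
    if @guarded
      # Keep the request-time override unless a response-time one exists.
      @response_override = finish_override if finish_override
    else
      # Unconditional assignment: nil overwrites the request-time override.
      @response_override = finish_override
    end
  end

  private

  # Returns a hash only when a response-time interrupt fired, otherwise nil.
  def fake_on_finish(interrupt_params)
    { 'statusCode' => interrupt_params[:status_code] } if interrupt_params
  end
end

unguarded = FakeListener.new(guarded: false)
unguarded.on_start(blocked: true)
unguarded.on_end
unguarded.response_override # nil: the request-time override was lost

guarded = FakeListener.new(guarded: true)
guarded.on_start(blocked: true)
guarded.on_end
guarded.response_override   # still { 'statusCode' => 403 }
```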
Guard the assignment:
def on_end(response:, request_context:)
  finish_override = Datadog::Lambda::AppSec.on_finish(response)
  @response_override = finish_override if finish_override
  ...
| context.export_metrics | ||
| context.export_request_telemetry | ||
|
|
||
| response_override(interrupt_params, headers: @request.headers) if interrupt_params |
@request.headers can NoMethodError on nil.
If on_start activated the AppSec context but then raised before @request = Request.from_normalized(event) ran (e.g. EventNormalizer.normalize blowing up on a malformed payload), @request stays at the @request = nil set at the top of on_start. The outer rescue StandardError swallows the error and does not deactivate the context. On the next invocation, on_finish finds an active context, hits a response-time interrupt, and dereferences nil.headers.
Two fixes worth pairing:
- In on_start, deactivate the context in a rescue before swallowing, so a partially-initialized context doesn't survive into the next invocation:
  rescue StandardError => e
    Datadog::AppSec::Context.deactivate if Datadog::AppSec::Context.active
    Datadog::Utils.logger.debug("failed to start AppSec: #{e}")
  end
- Guard @request.headers in on_finish:
  response_override(interrupt_params, headers: @request&.headers || {}) if interrupt_params
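Both fixes can be tried out in isolation. The sketch below assumes hypothetical `FakeRequest` and `FakeContext` stand-ins for the real Request and Datadog::AppSec::Context, and uses `warn` where the real code would call the library logger.

```ruby
# Hypothetical request object; the real one is built by Request.from_normalized.
FakeRequest = Struct.new(:headers)

# Mirrors the guarded call: fall back to an empty hash when the request is nil.
def headers_for_override(request)
  request&.headers || {}
end

# Hypothetical context tracker standing in for Datadog::AppSec::Context.
module FakeContext
  class << self
    attr_reader :active

    def activate
      @active = true
    end

    def deactivate
      @active = nil
    end
  end
end

# Activates a context, then may fail during normalization.
# The rescue cleans up the context before swallowing the error.
def on_start(event)
  FakeContext.activate
  raise ArgumentError, 'malformed payload' unless event.is_a?(Hash)

  FakeRequest.new(event.fetch('headers', {}))
rescue StandardError => e
  FakeContext.deactivate if FakeContext.active
  warn "failed to start AppSec: #{e}"
  nil
end
```

With a malformed event, `on_start` returns nil and leaves no active context behind, and `headers_for_override(nil)` falls back to an empty hash instead of raising NoMethodError.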
| module AppSec | ||
| # Normalizes API Gateway v1/v2 event payloads into a standard key set. | ||
| module EventNormalizer | ||
| module_function |
Asymmetric event shape between request and response paths.
This module normalizes the API Gateway event into a stable schema (method, path, headers, etc.) before it's pushed to the gateway, but the response path in AppSec.on_finish pushes the raw Lambda handler return value with its native camelCase keys (statusCode, headers). WAFAddresses.from_response reads payload['statusCode'] directly, so it works — but the asymmetry is easy to trip over.
Either a one-line comment here explaining "we deliberately don't normalize responses because the Lambda return value already has a canonical shape", or a thin ResponseNormalizer that's a no-op for now, would save the next maintainer a stare.
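For readers unfamiliar with the two payload formats, the request-side normalization this comment refers to might look roughly like the sketch below. `SketchNormalizer` and its output key names are assumptions for illustration and are not the PR's actual schema. The input field names (httpMethod, rawPath, requestContext.http.method, queryStringParameters) follow the API Gateway v1 and v2 event formats.

```ruby
# Hypothetical normalizer; output key names are illustrative only.
module SketchNormalizer
  module_function

  def normalize(event)
    v2?(event) ? from_v2(event) : from_v1(event)
  end

  # Payload format 2.0 events carry a "version" field set to "2.0".
  def v2?(event)
    event['version'] == '2.0'
  end

  def from_v1(event)
    {
      'method' => event['httpMethod'],
      'path' => event['path'],
      'headers' => downcase_keys(event['headers']),
      'query' => event['queryStringParameters'] || {},
      'body' => event['body']
    }
  end

  def from_v2(event)
    http = event.dig('requestContext', 'http') || {}
    {
      'method' => http['method'],
      'path' => event['rawPath'],
      'headers' => downcase_keys(event['headers']),
      'query' => event['queryStringParameters'] || {},
      'body' => event['body']
    }
  end

  # Header names are case-insensitive, so compare them in lowercase.
  def downcase_keys(headers)
    (headers || {}).transform_keys { |key| key.to_s.downcase }
  end
end
```

The response path has no equivalent step because, as noted above, the handler's return value is pushed with its native statusCode/headers keys.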
| @@ -0,0 +1,397 @@ | |||
| # frozen_string_literal: true | |||
Tests are strong on the unit side, but miss the three bugs flagged in this review. Worth adding:
- Cross-invocation @response: call Datadog::Lambda.wrap twice (first succeeds, second raises) and assert on_end receives nil (or the new invocation's response), not the prior one's. Easy to write with a stubbed listener and two wrap calls in the same it block.
- response_override preservation through on_end: trigger a request-time blocking interrupt, run through on_end, assert @listener.response_override is still the override hash (not clobbered to nil).
- Partial-init context cleanup: stub EventNormalizer.normalize to raise; assert Datadog::AppSec::Context.deactivate was called and the next invocation gets a fresh context.
The integration snapshots cover the happy blocking path well, but each invocation runs in a fresh container so they can't catch cross-invocation state issues.
Context
This is an implementation of the Endpoint Discovery & Correlation from Inferred Spans RFC.
What this PR does
Adds AppSec integration for AWS Lambda invocations. On each invocation the library:
- […] aws.lambda span
Inferred spans and AppSec tag propagation to them are handled by the datadog-lambda-extension (propagate_appsec in span_inferrer.rs), so this PR does not create inferred spans or copy tags; it only fills in the AppSec data on the service-entry span.
Layer build changes
- Switch the builder image from ruby:X.Y (Debian Bookworm) to public.ecr.aws/lambda/ruby:X.Y (AL2/AL2023). The previous image ships libffi.so.8 but the Lambda runtime only has libffi.so.6, so native extensions compiled on Bookworm crash at runtime.
- Add libffi-devel to build deps so FFI compiles from source against the correct system library.
- […] yum, Ruby 3.3+ (AL2023) uses dnf; auto-detected in the Dockerfile.
- […] MAKEFLAGS="-j$(nproc)" on all gem install steps.
Build time improvement
- gem install datadog-lambda
- gem install datadog
- gem install ffi
Types of changes
Test plan
- bundle exec rubocop: 0 offenses
- bundle exec rake test: 75 examples, 0 failures