
[APPSEC-61865] Add AppSec setup for AWS Lambda #144

Open
Strech wants to merge 15 commits into main from appsec-61865-add-appsec-setup

Member

@Strech Strech commented Apr 30, 2026

Context

This is the implementation of the Endpoint Discovery & Correlation from Inferred Spans RFC.

What this PR does

Adds AppSec integration for AWS Lambda invocations. On each invocation the library:

  • Initializes an AppSec context on the aws.lambda span
  • Pushes request/response events through the AppSec gateway
  • Records security events and exports telemetry

Inferred spans and AppSec tag propagation to them are handled by the datadog-lambda-extension (propagate_appsec in span_inferrer.rs), so this PR does not create inferred spans or copy tags — it only fills in the AppSec data on the service-entry span.

Layer build changes

  • Switched the builder base image from ruby:X.Y (Debian Bookworm) to public.ecr.aws/lambda/ruby:X.Y (AL2/AL2023). The previous image ships libffi.so.8, but the Lambda runtime only has libffi.so.6, so native extensions compiled on Bookworm crash at runtime.
  • Added libffi-devel to build deps so FFI compiles from source against the correct system library.
  • AL2/AL2023 compatibility: Ruby 3.2 (AL2) uses yum, Ruby 3.3+ (AL2023) uses dnf — auto-detected in Dockerfile.
  • Pinned FFI to 1.17.4 and forced a recompile from source after the datadog gem install.
  • Parallel native compilation via MAKEFLAGS="-j$(nproc)" on all gem install steps.

Build time improvement

| Step                       | Before  | After  |
|----------------------------|---------|--------|
| gem install datadog-lambda | ~2 min  | ~2 min |
| gem install datadog        | ~6 min  | ~5 min |
| gem install ffi            | ~3 min  | ~2 min |
| Total                      | ~11 min | ~9 min |

Types of changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Test plan

  • bundle exec rubocop — 0 offenses
  • bundle exec rake test — 75 examples, 0 failures
  • Verified zero inferred span references in the diff
  • Integration test snapshots generated on CI (Lambda AL2 base image, all 3 runtimes)
  • Snapshots will pass once dd-trace-rb AppSec aws_lambda contrib is released and pinned

Strech and others added 6 commits April 30, 2026 20:07
Initialize AppSec context around each Lambda invocation, push request and
response events through the AppSec gateway, and record security events on
the aws.lambda span. The extension handles tag propagation to inferred spans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adding an Exclude key overrides rubocop's default excludes, which
includes vendor/**. Re-add it explicitly so CI lint passes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Precompiled FFI binaries have glibc mismatch with Lambda AL2 runtime,
causing crashes when AppSec loads the libddwaf chain. Force source
compilation and remove precompiled variants.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add appsec-request handler with DD_APPSEC_ENABLED=true and an input
event containing Arachni user-agent to trigger WAF detection. Snapshots
will be recorded on first AWS deploy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wrap gateway pushes in catch(Datadog::AppSec::Ext::INTERRUPT) in both
on_start and on_finish. When WAF decides to block, build a Lambda-shaped
response override (statusCode/headers/body) via AppSec::Response.

The listener exposes response_override for wrap to short-circuit the
handler on request-phase blocks or replace the response on response-phase
blocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
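
The interrupt handling described in the commit above can be illustrated with Ruby's built-in catch/throw. This is a minimal sketch under stated assumptions, not the PR's code: INTERRUPT, push_request, and blocking_response are hypothetical stand-ins for Datadog::AppSec::Ext::INTERRUPT, the gateway push, and AppSec::Response, and the "WAF" here is a plain string match.

```ruby
# Hypothetical stand-in for Datadog::AppSec::Ext::INTERRUPT.
INTERRUPT = :appsec_interrupt

# Hypothetical gateway push: throws when the toy "WAF" decides to block.
def push_request(event)
  user_agent = event.dig('headers', 'user-agent').to_s
  throw(INTERRUPT, { status_code: 403 }) if user_agent.include?('Arachni')
  nil
end

# Builds a Lambda-shaped response (statusCode/headers/body) from interrupt params.
def blocking_response(params)
  {
    'statusCode' => params[:status_code],
    'headers' => { 'content-type' => 'text/plain' },
    'body' => 'blocked'
  }
end

# Returns a response override when blocked, nil otherwise.
def on_start(event)
  interrupt_params = catch(INTERRUPT) { push_request(event) }
  blocking_response(interrupt_params) if interrupt_params
end

# Short-circuits the handler on a request-phase block.
def wrap(event)
  on_start(event) || yield
end

blocked = wrap({ 'headers' => { 'user-agent' => 'Arachni/v1.5' } }) { { 'statusCode' => 200 } }
allowed = wrap({ 'headers' => { 'user-agent' => 'curl/8.0' } }) { { 'statusCode' => 200 } }
```

catch returns the value passed to throw when the tag is thrown and the block's own value otherwise, which is why push_request ends with nil on the non-blocking path.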
Normalize raw AWS event payloads into a standard key set before
passing to DataContainer and Request. This removes v1/v2 detection
from Request and aligns with dd-trace-rb's simplified WAFAddresses
that consume standard keys.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
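
A rough sketch of what such a normalization step might look like, assuming the standard key set is method/path/headers/query. The module name and the output keys are illustrative assumptions, not the PR's actual schema; the input field names (httpMethod, rawPath, requestContext.http.method, version) come from the API Gateway v1 and v2 payload formats.

```ruby
# Hypothetical normalizer; output keys are illustrative only.
module EventNormalizerSketch
  module_function

  def normalize(event)
    v2?(event) ? from_v2(event) : from_v1(event)
  end

  # API Gateway HTTP API (payload format 2.0) events carry version "2.0".
  def v2?(event)
    event['version'] == '2.0'
  end

  def from_v1(event)
    {
      'method' => event['httpMethod'],
      'path' => event['path'],
      'headers' => event['headers'] || {},
      'query' => event['queryStringParameters'] || {}
    }
  end

  def from_v2(event)
    http = event.dig('requestContext', 'http') || {}
    {
      'method' => http['method'],
      'path' => event['rawPath'],
      'headers' => event['headers'] || {},
      'query' => event['queryStringParameters'] || {}
    }
  end
end

v1_event = { 'httpMethod' => 'GET', 'path' => '/users', 'queryStringParameters' => { 'id' => '1' } }
v2_event = {
  'version' => '2.0',
  'rawPath' => '/users',
  'queryStringParameters' => { 'id' => '1' },
  'requestContext' => { 'http' => { 'method' => 'GET' } }
}

normalized_v1 = EventNormalizerSketch.normalize(v1_event)
normalized_v2 = EventNormalizerSketch.normalize(v2_event)
```

With both payload versions mapped to the same keys, downstream consumers no longer need to know which API Gateway version produced the event.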
Member

@y9v y9v left a comment


cool stuff!

Strech and others added 2 commits May 6, 2026 14:43
V2 events were missing the `query` key, causing `server.request.query`
WAF address to be empty for API Gateway V2 payloads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename add_appsec_tags to tag_and_keep, move from create_context to
  on_start for visibility
- Pass cold_start flag from listener through to AppSec.on_start instead
  of tracking @oneshot_tags_sent module state
- Align guard clause with Rack: return unless trace && span
- Improve test quality: inline event values, use receive_messages,
  remove instance_variable_set for @request, relax unrelated assertions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Strech Strech marked this pull request as ready for review May 6, 2026 19:04
@Strech Strech requested review from a team as code owners May 6, 2026 19:04
Comment thread Dockerfile Outdated
Comment thread Dockerfile Outdated
Strech and others added 2 commits May 8, 2026 11:34
The layer was built on ruby:X.Y (Debian Bookworm) which has libffi.so.8,
but Lambda AL2 runtime only has libffi.so.6. Source-compiled FFI linked
against .so.8 and crashed at runtime.

Switch to public.ecr.aws/lambda/ruby:X.Y as builder so native extensions
compile against the same system libraries available at runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Generated with unreleased dd-trace-rb (appsec aws_lambda contrib).
These snapshots will pass once the tracer is released and pinned.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread Dockerfile Outdated
RUN MAKEFLAGS="-j$(nproc)" \
gem install datadog-lambda --install-dir "/opt/ruby/gems/$runtime" --no-document
RUN MAKEFLAGS="-j$(nproc)" \
gem install datadog -v 2.12 --install-dir "/opt/ruby/gems/$runtime" --no-document
Member Author


This will be pinned to the 2.32 release, as the required changes are already merged.

@Strech Strech force-pushed the appsec-61865-add-appsec-setup branch from 6cf6883 to 442d864 Compare May 8, 2026 10:54
Contributor

@zarirhamza zarirhamza left a comment


Nice clean split between this PR (event normalization + gateway push) and the dd-trace-rb side (Contrib::AwsLambda watcher + WAFAddresses). The normalized request schema lines up exactly with what WAFAddresses.from_request consumes — good interface boundary.

Found three bugs in the response-override / cross-invocation state plumbing that are worth fixing before merge. Details inline. Test suite is otherwise strong; just missing a couple of cases that would have caught these.

Comment thread lib/datadog/lambda.rb
cold = @is_cold_start
@listener&.on_start(event:, request_context: context, cold_start: cold)
@response = block.call
@response = @listener&.response_override || block.call
Contributor


@response is never reset between invocations — stale-response bug.

@response is a module-level instance variable on Datadog::Lambda. Lambda execution environments reuse the runtime, so it persists across invocations. If invocation N succeeds with response X and invocation N+1's block.call raises a StandardError, the ensure block still runs:

@listener&.on_end(response: @response, request_context: context)

…with @response still set to X from the previous invocation. The AppSec watcher then runs the WAF against the wrong response — potential incorrect blocking decision and a data-leak risk if the previous response carried sensitive data.

One-line fix at the top of wrap:

def self.wrap(event, context, &block)
  @listener ||= initialize_listener
  @response = nil
  record_enhanced('invocations', context)
  ...
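
The stale-state scenario can be reproduced in isolation. The following is a toy sketch, not the library's code: WrapSketch and its reset: flag are hypothetical, and appending to seen stands in for the on_end call in the ensure block.

```ruby
# Toy module that keeps per-invocation state in a module-level instance
# variable, as described above. All names are hypothetical.
module WrapSketch
  class << self
    attr_reader :seen # values that "on_end" received, in order

    def clear
      @seen = []
      @response = nil
    end

    def wrap(reset:)
      @response = nil if reset # the proposed fix
      @response = yield
    ensure
      @seen << @response # stands in for @listener.on_end(response: @response)
    end
  end
end

# Runs a successful invocation followed by a failing one and returns
# what the second invocation's "on_end" saw.
def run_two_invocations(reset:)
  WrapSketch.clear
  WrapSketch.wrap(reset: reset) { { 'statusCode' => 200 } }
  begin
    WrapSketch.wrap(reset: reset) { raise 'handler failed' }
  rescue RuntimeError
    nil # the handler error still propagates to the caller in real code
  end
  WrapSketch.seen.last
end

stale = run_two_invocations(reset: false)
fresh = run_two_invocations(reset: true)
```

Without the reset, the failing invocation hands the previous invocation's response to on_end; with it, on_end receives nil.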

# rubocop:enable Metrics/AbcSize

def on_end(response:, request_context:)
@response_override = Datadog::Lambda::AppSec.on_finish(response)
Contributor


on_end clobbers a request-time response_override with nil from on_finish.

AppSec.on_finish only returns the override hash when a response-time interrupt fires; otherwise its trailing expression is ... if interrupt_params, which evaluates to nil. So this assignment unconditionally clears any request-time override that was set in on_start.

It happens to still produce the right end-user response because wrap does @response = @listener.response_override || @response and the || @response catches the clobber. But @listener.response_override itself ends up nil when it should hold the override hash — anything else that reads it (future caller, telemetry, debug logging) sees the wrong value.

Guard the assignment:

def on_end(response:, request_context:)
  finish_override = Datadog::Lambda::AppSec.on_finish(response)
  @response_override = finish_override if finish_override
  ...
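
To make the difference concrete, here is a small sketch comparing the unguarded and guarded assignments. ListenerSketch is hypothetical, and the finish-phase result is passed in directly instead of calling AppSec.on_finish.

```ruby
# Hypothetical listener; the finish-phase result is passed in as a value.
class ListenerSketch
  attr_reader :response_override

  def on_start(override)
    @response_override = override
  end

  # Unguarded: a nil finish result erases the request-time override.
  def on_end_unguarded(finish_override)
    @response_override = finish_override
  end

  # Guarded: only a real response-time override replaces the existing value.
  def on_end_guarded(finish_override)
    @response_override = finish_override if finish_override
  end
end

request_block = { 'statusCode' => 403 }

unguarded = ListenerSketch.new
unguarded.on_start(request_block)
unguarded.on_end_unguarded(nil)

guarded = ListenerSketch.new
guarded.on_start(request_block)
guarded.on_end_guarded(nil)
```

One consequence of the guard, assuming the listener is reused across invocations: the override then has to be cleared at the start of each invocation, otherwise a block from one invocation could carry over into the next.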

context.export_metrics
context.export_request_telemetry

response_override(interrupt_params, headers: @request.headers) if interrupt_params
Contributor


@request.headers can raise NoMethodError on nil.

If on_start activated the AppSec context but then raised before @request = Request.from_normalized(event) ran (e.g. EventNormalizer.normalize blowing up on a malformed payload), @request stays at the @request = nil set at the top of on_start. The outer rescue StandardError swallows the error and does not deactivate the context. On the next invocation, on_finish finds an active context, hits a response-time interrupt, and dereferences nil.headers.

Two fixes worth pairing:

  1. In on_start, deactivate the context in a rescue before swallowing, so a partially-initialized context doesn't survive into the next invocation:

    rescue StandardError => e
      Datadog::AppSec::Context.deactivate if Datadog::AppSec::Context.active
      Datadog::Utils.logger.debug("failed to start AppSec: #{e}")
    end
  2. Guard @request.headers in on_finish:

    response_override(interrupt_params, headers: @request&.headers || {}) if interrupt_params
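
Both fixes can be exercised together in a self-contained sketch. ContextSketch is a toy registry and not the dd-trace-rb API; AppSecSketch, normalize, and headers_for_override are hypothetical names used only for illustration.

```ruby
# Toy stand-in for an AppSec context registry; not the dd-trace-rb API.
module ContextSketch
  class << self
    attr_reader :active

    def activate
      @active = Object.new
    end

    def deactivate
      @active = nil
    end
  end
end

class AppSecSketch
  def on_start(event)
    @request = nil
    ContextSketch.activate
    @request = normalize(event) # may raise on a malformed payload
  rescue StandardError
    # Drop the half-initialized context so it cannot leak into the next invocation.
    ContextSketch.deactivate if ContextSketch.active
  end

  # Falls back to empty headers when on_start never assigned @request.
  def headers_for_override
    @request&.dig('headers') || {}
  end

  private

  def normalize(event)
    raise ArgumentError, 'malformed payload' unless event.is_a?(Hash)

    { 'headers' => event['headers'] || {} }
  end
end

appsec = AppSecSketch.new
appsec.on_start('not a hash')
context_after_failure = ContextSketch.active
fallback_headers = appsec.headers_for_override
```

The rescue handles the root cause (a context outliving a failed start), while the safe-navigation guard is a second line of defense for any path that still reaches the finish phase with @request unset.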

module AppSec
# Normalizes API Gateway v1/v2 event payloads into a standard key set.
module EventNormalizer
module_function
Contributor


Asymmetric event shape between request and response paths.

This module normalizes the API Gateway event into a stable schema (method, path, headers, etc.) before it's pushed to the gateway, but the response path in AppSec.on_finish pushes the raw Lambda handler return value with its native camelCase keys (statusCode, headers). WAFAddresses.from_response reads payload['statusCode'] directly, so it works — but the asymmetry is easy to trip over.

Either a one-line comment here explaining "we deliberately don't normalize responses because the Lambda return value already has a canonical shape", or a thin ResponseNormalizer that's a no-op for now, would save the next maintainer a stare.

@@ -0,0 +1,397 @@
# frozen_string_literal: true
Contributor


Tests are strong on the unit side, but miss the three bugs flagged in this review. Worth adding:

  • Cross-invocation @response: call Datadog::Lambda.wrap twice — first succeeds, second raises — assert on_end receives nil (or the new invocation's response), not the prior one's. Easy to write with a stubbed listener and two wrap calls in the same it block.
  • response_override preservation through on_end: trigger a request-time blocking interrupt, run through on_end, assert @listener.response_override is still the override hash (not clobbered to nil).
  • Partial-init context cleanup: stub EventNormalizer.normalize to raise; assert Datadog::AppSec::Context.deactivate was called and the next invocation gets a fresh context.

The integration snapshots cover the happy blocking path well, but each invocation runs in a fresh container so they can't catch cross-invocation state issues.


3 participants