
Scrubber eats too much memory#4310

Merged
stuartc merged 7 commits into main from 4307-scrubber-eats-too-much-memory on Jan 15, 2026

Conversation

Collaborator

@midigofrank midigofrank commented Jan 13, 2026

Description

This PR does 2 things:

  • Limits the number of sensitive values the credential body can have. This is configurable via MAX_CREDENTIAL_SENSITIVE_VALUES; the default is 50.
  • Removes duplicate sensitive values before scrubbing, so we avoid unnecessary iterations.
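The safe-key filtering and deduplication described above can be sketched roughly like this (the safe_keys list and variable names here are illustrative, not the actual module's):

```elixir
# Illustrative sketch: collect sensitive values from a credential body,
# drop safe keys and nil values, then de-duplicate so the scrubber does
# not iterate over repeated values.
safe_keys = ~w(user username host port)

body = %{
  "username" => "admin",
  "password" => "s3cret",
  "token" => "s3cret",
  "apiKey" => "abc123",
  "note" => nil
}

sensitive =
  body
  |> Enum.reject(fn {k, v} -> String.downcase(k) in safe_keys or is_nil(v) end)
  |> Enum.map(fn {_k, v} -> v end)
  |> Enum.uniq()

# "s3cret" appears only once even though two keys share it.
```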

Closes #4307

Validation steps

  1. Start your server with MAX_CREDENTIAL_SENSITIVE_VALUES=2 mix phx.server
  2. Try creating a raw JSON credential with more than 2 sensitive keys. See https://github.com/OpenFn/lightning/blob/main/lib/lightning/credentials/sensitive_values.ex#L11 for the list of safe keys. Safe keys should not count toward the limit.

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

Pre-submission checklist

  • I have performed a self-review of my code.
  • I have implemented and tested all related authorization policies. (e.g., :owner, :admin, :editor, :viewer)
  • I have updated the changelog.
  • I have ticked a box in "AI usage" in this PR

@midigofrank midigofrank self-assigned this Jan 13, 2026
@github-project-automation github-project-automation Bot moved this to New Issues in Core Jan 13, 2026

codecov Bot commented Jan 13, 2026

Codecov Report

❌ Patch coverage is 84.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.21%. Comparing base (2fcd819) to head (83062c4).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...ing_web/live/credential_live/raw_body_component.ex 72.72% 3 Missing ⚠️
lib/lightning/credentials/credential_body.ex 90.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4310      +/-   ##
==========================================
- Coverage   89.29%   89.21%   -0.09%     
==========================================
  Files         425      425              
  Lines       19915    19940      +25     
==========================================
+ Hits        17783    17789       +6     
- Misses       2132     2151      +19     


@midigofrank midigofrank force-pushed the 4307-scrubber-eats-too-much-memory branch from a150e2a to 42fb158 Compare January 14, 2026 09:05
@midigofrank midigofrank marked this pull request as ready for review January 14, 2026 13:14
Contributor

@elias-ba elias-ba left a comment


Hey @midigofrank, this is a really good solution for a hard problem. I have a few observations and questions. Could you check them and let me know what you think?

Comment thread lib/lightning/config.ex Outdated

config :lightning, Lightning.Scrubber,
  max_credential_sensitive_values:
    env!("MAX_CREDENTIAL_SENSITIVE_VALUES", :integer, 50)
Contributor


Should we add this to .env.example? And maybe to DEPLOYMENT.md as well?

Collaborator Author


I didn't think it was necessary given the default is already a high value. But yeah, let me include it in the DEPLOYMENT.md docs.

Member


Agreeing with Frank here (disclaimer: I rarely look at or use .env.example). While we usually throw an exhaustive list of options into that file (essentially listing the defaults), it's such a niche option that it's probably fine not to include it.
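For context, a sketch of how an accessor might read the option above back out of app config (module and function names here are assumptions, mirroring the env!/3 declaration in the diff):

```elixir
# Sketch with assumed names: read the scrubber limit from app config,
# falling back to the same default of 50 the PR description mentions.
defmodule MyApp.Config do
  @doc "Returns the configured cap on sensitive values per credential."
  def max_credential_sensitive_values do
    :lightning
    |> Application.get_env(Lightning.Scrubber, [])
    |> Keyword.get(:max_credential_sensitive_values, 50)
  end
end
```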

{%{}, types}
|> Ecto.Changeset.cast(params, [:value])
|> maybe_add_error(valid?, error, touched)
|> validate_sensitive_values_count(body, touched)
Contributor


This parses the body string again via parse_for_validation/1, but it was already parsed in parse_body/1 above. Could we consider passing the already-parsed body_json to avoid double parsing?

Collaborator Author


No, it doesn't. We need the body as a map, and body_json is a binary. It uses the body, and it only parses the body if it is a binary.

Member


I'm with Elias here: it does get parsed twice when it's a string. We enter via validate_body, calling parse_body with a binary, and validate_sensitive_values_count, which calls parse_for_validation. Also, parse_body takes a string and parses it as a map, and then in another match it takes a non-empty map and encodes it as a string; that's hurting my brain!

Member


Thinking some more, while I'm sure it does get parsed twice when it's a string; the previous (i.e. current) implementation in this component smells a bit now around the validation logic.

We don't use CredentialBody.changeset here, I'm not sure why though. That would require a bit of work to change.
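A minimal sketch of the single-parse shape being discussed, with the body normalized to a map exactly once so downstream validations never re-parse the string. All names are illustrative, and the decoder is injected only to keep the sketch self-contained (the real code would call a JSON library directly):

```elixir
# Illustrative only, not the PR's code: accept either a JSON string or
# an already-parsed map, decode at most once, and hand the resulting
# map to every validation step.
defmodule BodySketch do
  # decode is a fun like &Jason.decode/1, injected for the sketch.
  def normalize(body, decode) when is_binary(body) do
    case decode.(body) do
      {:ok, map} when is_map(map) -> {:ok, map}
      _ -> {:error, :invalid_json}
    end
  end

  # Already a map: no parsing needed at all.
  def normalize(body, _decode) when is_map(body), do: {:ok, body}
end
```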

      {:error, "contains too many sensitive keys (250). Max allowed is 200"}
  """
  def validate_sensitive_values_count(body) do
    max_values = Lightning.Config.max_credential_sensitive_values()
Contributor


What happens to existing credentials that already exceed this limit? If a user tries to update an unrelated field on a credential with 100+ sensitive values, will the update fail? Could we consider whether a migration or audit is needed to identify any existing credentials over the limit?

Collaborator Author


Nothing happens to them if they don't touch the body. In the prod db, the highest has 19. It's only the one irregular credential that has 500, and that will be dealt with at the user level.

String.downcase(k) in @safe_keys || is_nil(v)
end)
|> Enum.map(fn {_k, v} -> v end)
|> Enum.uniq()
Contributor


Good optimization. Just noting this subtly changes the return value. Previously duplicate values were included multiple times. Should be fine but worth a sanity check that no code path relied on the duplicate count.

Collaborator Author


Yeah, we're good. The scrubber looks for the values as it scrubs; duplicate values would just mean the scrubber makes extra iterations.
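A toy illustration of the point above: if the scrubber does one replacement pass per listed value, de-duplicating drops the redundant passes without changing the output (scrub here is a stand-in, not the actual Lightning.Scrubber):

```elixir
# Each entry in the list costs a full pass over the text, so a
# duplicate value is pure wasted work: the second pass finds nothing.
values = ["s3cret", "s3cret", "abc123"]

scrub = fn text, vals ->
  Enum.reduce(vals, text, fn v, acc -> String.replace(acc, v, "***") end)
end

text = "password=s3cret key=abc123"

# Two passes instead of three, identical result.
scrub.(text, Enum.uniq(values)) == scrub.(text, values)
```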

@github-project-automation github-project-automation Bot moved this from New Issues to In review in Core Jan 14, 2026
@midigofrank midigofrank requested a review from elias-ba January 15, 2026 08:50
@stuartc stuartc merged commit b2b2596 into main Jan 15, 2026
6 of 8 checks passed
@stuartc stuartc deleted the 4307-scrubber-eats-too-much-memory branch January 15, 2026 14:43
@github-project-automation github-project-automation Bot moved this from In review to Done in Core Jan 15, 2026

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Scrubber Consumes 45+ MB Memory per Run with Large Credentials

3 participants