DoneDeal0 (Owner) commented Jan 1, 2026

🚀 NEW FEATURE: getTextDiff

import { getTextDiff } from "@donedeal0/superdiff";

Compares two texts and returns a structured diff at a character, word, or sentence level.

FORMAT

Input

  previousText: string | null | undefined,
  currentText: string | null | undefined,
  options?: {
    separation?: "character" | "word" | "sentence", // "word" by default
    accuracy?: "normal" | "high", // "normal" by default
    detectMoves?: boolean, // false by default
    ignoreCase?: boolean, // false by default
    ignorePunctuation?: boolean, // false by default
    locale?: Intl.Locale | string // undefined by default
  }
  • previousText: the original text.
  • currentText: the current text.
  • options
    • separation: whether you want a character, word, or sentence based diff.
    • accuracy:
      • normal (default): fastest mode, simple tokenization.
      • high: slower but exact tokenization. Handles all language subtleties (Unicode, emoji, CJK scripts, locale‑aware segmentation when a locale is provided).
    • detectMoves:
      • false (default): optimized for readability. Token moves are ignored, so a single insertion doesn’t cascade and break the equality of the tokens that follow (recommended for UI diffing).
      • true: semantically precise but noisier, because a single insertion shifts all following tokens and breaks their equality.
    • ignoreCase: if true, hello and HELLO are considered equal.
    • ignorePunctuation: if true, hello! and hello are considered equal.
    • locale: the locale of your text. Enables locale‑aware segmentation in high accuracy mode.
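
To make the effect of ignoreCase and ignorePunctuation concrete, here is a minimal sketch of how two tokens could be normalized before being compared. The helpers normalizeToken and tokensMatch are hypothetical names introduced for illustration only; they are not part of superdiff, and the library's actual normalization may differ (for instance, this sketch only strips common ASCII punctuation).

```typescript
// Hypothetical helpers for illustration; NOT part of the superdiff API.
type NormalizeOptions = { ignoreCase?: boolean; ignorePunctuation?: boolean };

function normalizeToken(token: string, options: NormalizeOptions = {}): string {
  let normalized = token;
  if (options.ignoreCase) {
    normalized = normalized.toLowerCase();
  }
  if (options.ignorePunctuation) {
    // Assumption: only common ASCII punctuation is stripped, for brevity.
    normalized = normalized.replace(/[.,;:!?'"()\[\]{}\-]/g, "");
  }
  return normalized;
}

// Two tokens are considered equal when their normalized forms are identical.
function tokensMatch(a: string, b: string, options: NormalizeOptions = {}): boolean {
  return normalizeToken(a, options) === normalizeToken(b, options);
}
```

With this sketch, tokensMatch("hello", "HELLO", { ignoreCase: true }) and tokensMatch("hello!", "hello", { ignorePunctuation: true }) both hold, while the same calls without options do not.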

Output

type TextDiff = {
  type: "text";
  status: "added" | "deleted" | "equal" | "updated";
  diff: {
    value: string;
    index: number | null;
    previousValue?: string;
    previousIndex: number | null;
    status: "added" | "deleted" | "equal" | "moved" | "updated";
  }[];
};
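
As an illustration of how this output shape can be consumed, the sketch below renders a TextDiff as a single annotated string. The type is copied from the definition above; renderInline, the marker syntax, and the sample value are assumptions made for the example, not something the library provides.

```typescript
// Type copied from the Output definition above.
type TextDiff = {
  type: "text";
  status: "added" | "deleted" | "equal" | "updated";
  diff: {
    value: string;
    index: number | null;
    previousValue?: string;
    previousIndex: number | null;
    status: "added" | "deleted" | "equal" | "moved" | "updated";
  }[];
};

// Hypothetical renderer: wraps each token in a marker that reflects its status.
function renderInline(textDiff: TextDiff): string {
  return textDiff.diff
    .map((token) => {
      switch (token.status) {
        case "added":
          return "[+" + token.value + "+]";
        case "deleted":
          return "[-" + token.value + "-]";
        case "updated": {
          const before = token.previousValue !== undefined ? token.previousValue : "";
          return "[-" + before + "-][+" + token.value + "+]";
        }
        case "moved":
          return "[~" + token.value + "~]";
        default:
          return token.value;
      }
    })
    .join(" ");
}

// Hand-written sample that follows the documented shape.
const sample: TextDiff = {
  type: "text",
  status: "updated",
  diff: [
    { value: "The", index: 0, previousIndex: 0, status: "equal" },
    { value: "brown", index: null, previousIndex: 1, status: "deleted" },
    { value: "orange", index: 1, previousIndex: null, status: "added" },
  ],
};

console.log(renderInline(sample)); // The [-brown-] [+orange+]
```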

USAGE

WITHOUT MOVE DETECTION

This is the default output. Token moves are ignored so insertions don’t cascade and break equality. Updates are rendered as two entries (added + deleted). The algorithm uses longest common subsequence (LCS), similar to GitHub diffs.

Input

getTextDiff(
- "The brown fox jumped high",
+ "The orange cat has jumped",
{ detectMoves: false, separation: "word" }
);

Output

{
      type: "text",
+     status: "updated",
      diff: [
        {
          value: "The",
          index: 0,
          previousIndex: 0,
          status: "equal",
        },
-       {
-         value: "brown",
-         index: null,
-         previousIndex: 1,
-         status: "deleted",
-       },
-       {
-         value: "fox",
-         index: null,
-         previousIndex: 2,
-         status: "deleted",
-       },
+       {
+         value: "orange",
+         index: 1,
+         previousIndex: null,
+         status: "added",
+       },
+       {
+         value: "cat",
+         index: 2,
+         previousIndex: null,
+         status: "added",
+       },
+       {
+         value: "has",
+         index: 3,
+         previousIndex: null,
+         status: "added",
+       },
        {
          value: "jumped",
          index: 4,
          previousIndex: 3,
          status: "equal",
        },
-       {
-         value: "high",
-         index: null,
-         previousIndex: 4,
-         status: "deleted",
-       }
      ],
    }
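
For readers unfamiliar with LCS-based diffing, the sketch below shows a textbook longest common subsequence word diff that produces entries in the same shape as the output above. It is an illustration of the general technique under simplifying assumptions (whitespace tokenization, quadratic table), not superdiff's actual implementation; lcsWordDiff and DiffEntry are hypothetical names.

```typescript
type DiffEntry = {
  value: string;
  index: number | null;
  previousIndex: number | null;
  status: "added" | "deleted" | "equal";
};

// Hypothetical helper: textbook LCS diff over space-separated words.
function lcsWordDiff(previousText: string, currentText: string): DiffEntry[] {
  const prev = previousText.split(" ").filter((w) => w.length > 0);
  const curr = currentText.split(" ").filter((w) => w.length > 0);

  // lengths[i][j] = length of the LCS of prev[i..] and curr[j..].
  const lengths: number[][] = [];
  for (let i = 0; i <= prev.length; i++) {
    const row: number[] = [];
    for (let j = 0; j <= curr.length; j++) row.push(0);
    lengths.push(row);
  }
  for (let i = prev.length - 1; i >= 0; i--) {
    for (let j = curr.length - 1; j >= 0; j--) {
      lengths[i][j] =
        prev[i] === curr[j]
          ? lengths[i + 1][j + 1] + 1
          : Math.max(lengths[i + 1][j], lengths[i][j + 1]);
    }
  }

  // Walk both texts, emitting deletions before additions when the choice is tied.
  const result: DiffEntry[] = [];
  let i = 0;
  let j = 0;
  while (i < prev.length && j < curr.length) {
    if (prev[i] === curr[j]) {
      result.push({ value: curr[j], index: j, previousIndex: i, status: "equal" });
      i++;
      j++;
    } else if (lengths[i + 1][j] >= lengths[i][j + 1]) {
      result.push({ value: prev[i], index: null, previousIndex: i, status: "deleted" });
      i++;
    } else {
      result.push({ value: curr[j], index: j, previousIndex: null, status: "added" });
      j++;
    }
  }
  for (; i < prev.length; i++) {
    result.push({ value: prev[i], index: null, previousIndex: i, status: "deleted" });
  }
  for (; j < curr.length; j++) {
    result.push({ value: curr[j], index: j, previousIndex: null, status: "added" });
  }
  return result;
}
```

Applied to the two sentences of this example, the sketch keeps "The" and "jumped" as equal and reports every other word as a deletion or an addition, which is why "jumped" stays equal even though its position changed from 3 to 4.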

WITH MOVE DETECTION

If you prefer a semantically precise diff, activate the detectMoves option. Direct token swaps are considered updated.

Input

getTextDiff(
- "The brown fox jumped high",
+ "The orange cat has jumped",
{ detectMoves: true, separation: "word" }
);

Output

{
      type: "text",
+     status: "updated",
      diff: [
        {
          value: 'The',
          index: 0,
          previousIndex: 0,
          status: 'equal',
        },
+       {
+         value: "orange",
+         index: 1,
+         previousValue: "brown",
+         previousIndex: null,
+         status: "updated",
+       },
+       {
+         value: "cat",
+         index: 2,
+         previousValue: "fox",
+         previousIndex: null,
+         status: "updated",
+       },
+       {
+         value: "has",
+         index: 3,
+         previousIndex: null,
+         status: "added",
+       },
+       {
+         value: "jumped",
+         index: 4,
+         previousIndex: 3,
+         status: "moved",
+       },
-       {
-         value: "high",
-         index: null,
-         previousIndex: 4,
-         status: "deleted",
-       }
      ],
    }
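
The sketch below is one possible way to arrive at the statuses shown in this output: a position-by-position comparison in which a token found elsewhere in the previous text is marked as moved, and a same-position replacement of a token that has disappeared is marked as updated. This is an assumption reverse-engineered from the sample, not the library's algorithm; positionalWordDiff and MoveDiffEntry are hypothetical names, and repeated words are handled naively.

```typescript
type MoveDiffEntry = {
  value: string;
  index: number | null;
  previousValue?: string;
  previousIndex: number | null;
  status: "added" | "deleted" | "equal" | "moved" | "updated";
};

// Hypothetical helper: position-based comparison with naive move detection.
function positionalWordDiff(previousText: string, currentText: string): MoveDiffEntry[] {
  const prev = previousText.split(" ").filter((w) => w.length > 0);
  const curr = currentText.split(" ").filter((w) => w.length > 0);
  const consumed: boolean[] = prev.map(() => false); // previous tokens already accounted for
  const result: MoveDiffEntry[] = [];

  for (let j = 0; j < curr.length; j++) {
    const token = curr[j];
    if (j < prev.length && !consumed[j] && prev[j] === token) {
      // Same token at the same position.
      consumed[j] = true;
      result.push({ value: token, index: j, previousIndex: j, status: "equal" });
      continue;
    }
    const oldPosition = prev.indexOf(token);
    if (oldPosition !== -1 && !consumed[oldPosition]) {
      // The token existed at another position: report it as moved.
      consumed[oldPosition] = true;
      result.push({ value: token, index: j, previousIndex: oldPosition, status: "moved" });
    } else if (j < prev.length && !consumed[j] && curr.indexOf(prev[j]) === -1) {
      // The previous token at this position is gone: report a replacement.
      consumed[j] = true;
      result.push({
        value: token,
        index: j,
        previousValue: prev[j],
        previousIndex: null, // mirrors the sample output above
        status: "updated",
      });
    } else {
      result.push({ value: token, index: j, previousIndex: null, status: "added" });
    }
  }

  // Previous tokens that were never matched are deletions.
  for (let i = 0; i < prev.length; i++) {
    if (!consumed[i]) {
      result.push({ value: prev[i], index: null, previousIndex: i, status: "deleted" });
    }
  }
  return result;
}
```

Because every token is compared against its own position, inserting "has" pushes "jumped" from index 3 to index 4 and turns it into a moved entry, which is the extra noise the detectMoves description refers to.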

📊 BENCHMARK

| Scenario       | Superdiff | diff     |
| -------------- | --------- | -------- |
| 10k words      | 1.13 ms   | 3.68 ms  |
| 100k words     | 21.68 ms  | 45.93 ms |
| 10k sentences  | 2.30 ms   | 5.61 ms  |
| 100k sentences | 21.95 ms  | 62.03 ms |

(Superdiff uses its normal accuracy settings to match diff's behavior)

DoneDeal0 force-pushed the text-diff branch 10 times, most recently from ebfc55a to 6cf30a4 (January 7, 2026 20:35)
DoneDeal0 force-pushed the text-diff branch 3 times, most recently from 943efdf to aee6be8 (January 11, 2026 14:48)
DoneDeal0 self-assigned this (Jan 11, 2026)
DoneDeal0 force-pushed the main branch 10 times, most recently from 42b0ec3 to 8d21774 (January 12, 2026 19:59)
DoneDeal0 changed the title from "[DRAFT] getTextDiff" to "getTextDiff" (Jan 21, 2026)
DoneDeal0 force-pushed the text-diff branch 2 times, most recently from 6ef2035 to 3d054eb (January 26, 2026 19:44)
DoneDeal0 force-pushed the text-diff branch 5 times, most recently from 43173c7 to 0e3ec70 (February 1, 2026 14:17)