diff --git a/docs/codeql/codeql-language-guides/codeql-for-javascript.rst b/docs/codeql/codeql-language-guides/codeql-for-javascript.rst index 35df0b91752b..141acb915b70 100644 --- a/docs/codeql/codeql-language-guides/codeql-for-javascript.rst +++ b/docs/codeql/codeql-language-guides/codeql-for-javascript.rst @@ -18,6 +18,7 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat abstract-syntax-tree-classes-for-working-with-javascript-and-typescript-programs data-flow-cheat-sheet-for-javascript customizing-library-models-for-javascript + migrating-javascript-dataflow-queries - :doc:`Basic query for JavaScript and TypeScript code `: Learn to write and run a simple CodeQL query. @@ -37,4 +38,6 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat - :doc:`Data flow cheat sheet for JavaScript `: This article describes parts of the JavaScript libraries commonly used for variant analysis and in data flow queries. -- :doc:`Customizing library models for JavaScript `: You can model frameworks and libraries that your codebase depends on using data extensions and publish them as CodeQL model packs. +- :doc:`Customizing library models for JavaScript `: You can model frameworks and libraries that your codebase depends on using data extensions and publish them as CodeQL model packs. + +- :doc:`Migrating JavaScript dataflow queries `: Guide on migrating data flow queries to the new data flow library. diff --git a/docs/codeql/codeql-language-guides/migrating-javascript-dataflow-queries.rst b/docs/codeql/codeql-language-guides/migrating-javascript-dataflow-queries.rst new file mode 100644 index 000000000000..e531695d9c01 --- /dev/null +++ b/docs/codeql/codeql-language-guides/migrating-javascript-dataflow-queries.rst @@ -0,0 +1,301 @@ +.. _migrating-javascript-dataflow-queries: + +Migrating JavaScript Dataflow Queries +===================================== + +The JavaScript analysis used to have its own data flow library, which differed from the shared data flow +library used by other languages. This library has now been deprecated in favor of the shared library. + +This article explains how to migrate JavaScript data flow queries to use the shared data flow library, +and some important differences to be aware of. Note that the article on :ref:`analyzing data flow in JavaScript and TypeScript ` +provides a general guide to the new data flow library, whereas this article aims to help with migrating existing queries from the old data flow library. + +Note that the ``DataFlow::Configuration`` class is still backed by the original data flow library, but has been marked as deprecated. +This means data flow queries using this class will continue to work, albeit with deprecation warnings, until the 1-year deprecation period expires in early 2026. +It is recommended that all custom queries are migrated before this time, to ensure they continue to work in the future. + +Data flow queries should be migrated to use ``DataFlow::ConfigSig``-style modules instead of the ``DataFlow::Configuration`` class. +This is identical to the interface found in other languages. +When making this switch, the query will become backed by the shared data flow library instead. That is, data flow queries will only work +with the shared data flow library when they have been migrated to ``ConfigSig``-style, as shown in the following table: + +.. list-table:: Data flow libraries + :widths: 20 80 + :header-rows: 1 + + * - API + - Implementation + * - ``DataFlow::Configuration`` + - Old library (deprecated, to be removed in early 2026) + * - ``DataFlow::ConfigSig`` + - Shared library + +A straightforward translation to ``DataFlow::ConfigSig``-style is usually possible, although there are some complications +that may cause the query to behave differently. +We'll first cover some straightforward migration examples, and then go over some of the complications that may arise. + +Simple migration example +------------------------ + +A simple example of a query using the old data flow library is shown below: + +.. code-block:: ql + + /** @kind path-problem */ + import javascript + import DataFlow::PathGraph + + class MyConfig extends DataFlow::Configuration { + MyConfig() { this = "MyConfig" } + + override predicate isSource(DataFlow::Node node) { ... } + + override predicate isSink(DataFlow::Node node) { ... } + } + + from MyConfig cfg, DataFlow::PathNode source, DataFlow::PathNode sink + where cfg.hasFlowPath(source, sink) + select sink, source, sink, "Flow found" + +With the new style this would look like this: + +.. code-block:: ql + + /** @kind path-problem */ + import javascript + + module MyConfig implements DataFlow::ConfigSig { + predicate isSource(DataFlow::Node node) { ... } + + predicate isSink(DataFlow::Node node) { ... } + } + + module MyFlow = DataFlow::Global; + + import MyFlow::PathGraph + + from MyFlow::PathNode source, MyFlow::PathNode sink + where MyFlow::flowPath(source, sink) + select sink, source, sink, "Flow found" + +The changes can be summarized as: + +- The ``DataFlow::Configuration`` class was replaced with a module implementing ``DataFlow::ConfigSig``. +- The characteristic predicate was removed (modules have no characteristic predicates). +- Predicates such as ``isSource`` no longer have the ``override`` keyword (as they are defined in a module now). +- The configuration module is being passed to ``DataFlow::Global``, resulting in a new module, called ``MyFlow`` in this example. +- The query imports ``MyFlow::PathGraph`` instead of ``DataFlow::PathGraph``. +- The ``MyConfig cfg`` variable was removed from the ``from`` clause. +- The ``hasFlowPath`` call was replaced with ``MyFlow::flowPath``. +- The type ``DataFlow::PathNode`` was replaced with ``MyFlow::PathNode``. + +With these changes, we have produced an equivalent query that is backed by the new data flow library. + +Taint tracking +-------------- + +For configuration classes extending ``TaintTracking::Configuration``, the migration is similar but with a few differences: + +- The ``TaintTracking::Global`` module should be used instead of ``DataFlow::Global``. +- Some predicates originating from ``TaintTracking::Configuration`` should be renamed to match the ``DataFlow::ConfigSig`` interface: + - ``isSanitizer`` should be renamed to ``isBarrier``. + - ``isAdditionalTaintStep`` should be renamed to ``isAdditionalFlowStep``. + +Note that there is no such thing as ``TaintTracking::ConfigSig``. The ``DataFlow::ConfigSig`` interface is used for both data flow and taint tracking. + +For example: + +.. code-block:: ql + + class MyConfig extends TaintTracking::Configuration { + MyConfig() { this = "MyConfig" } + + predicate isSanitizer(DataFlow::Node node) { ... } + predicate isAdditionalTaintStep(DataFlow::Node node1, DataFlow::Node node2) { ... } + ... + } + +The above configuration can be migrated to the shared data flow library as follows: + +.. code-block:: ql + + module MyConfig implements DataFlow::ConfigSig { + predicate isBarrier(DataFlow::Node node) { ... } + predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) { ... } + ... + } + + module MyFlow = TaintTracking::Global; + + +Flow labels and flow states +--------------------------- + +The ``DataFlow::FlowLabel`` class has been deprecated. Queries that relied on flow labels should use the new `flow state` concept instead. +This is done by implementing ``DataFlow::StateConfigSig`` instead of ``DataFlow::ConfigSig``, and passing the module to ``DataFlow::GlobalWithState`` +or ``TaintTracking::GlobalWithState``. See :ref:`using flow state ` for more details about flow state. + +Some changes to be aware of: + +- The 4-argument version of ``isAdditionalFlowStep`` now takes parameters in a different order. + It now takes ``node1, state1, node2, state2`` instead of ``node1, node2, state1, state2``. +- Taint steps apply to all flow states, not just the ``taint`` flow label. See more details further down in this article. + +Barrier guards +-------------- + +The predicates ``isBarrierGuard`` and ``isSanitizerGuard`` have been removed. + +Instead, the ``isBarrier`` predicate must be used to define all barriers. To do this, barrier guards can be reduced to a set of barrier nodes using the ``DataFlow::MakeBarrierGuard`` module. + +For example, consider this data flow configuration using a barrier guard: + +.. code-block:: ql + + class MyConfig extends DataFlow::Configuration { + override predicate isBarrierGuard(DataFlow::BarrierGuardNode node) { + node instanceof MyBarrierGuard + } + .. + } + + class MyBarrierGuard extends DataFlow::BarrierGuardNode { + MyBarrierGuard() { ... } + + override predicate blocks(Expr e, boolean outcome) { ... } + } + +This can be migrated to the shared data flow library as follows: + +.. code-block:: ql + + module MyConfig implements DataFlow::ConfigSig { + predicate isBarrier(DataFlow::Node node) { + node = DataFlow::MakeBarrierGuard::getABarrierNode() + } + .. + } + + class MyBarrierGuard extends DataFlow::Node { + MyBarrierGuard() { ... } + + predicate blocksExpr(Expr e, boolean outcome) { ... } + } + +The changes can be summarized as: +- The contents of ``isBarrierGuard`` have been moved to ``isBarrier``. +- The ``node instanceof MyBarrierGuard`` check was replaced with ``node = DataFlow::MakeBarrierGuard::getABarrierNode()``. +- The ``MyBarrierGuard`` class no longer has ``DataFlow::BarrierGuardNode`` as a base class. We simply use ``DataFlow::Node`` instead. +- The ``blocks`` predicate has been renamed to ``blocksExpr`` and no longer has the ``override`` keyword. + +See :ref:`using flow state ` for examples of how to use barrier guards with flow state. + +Query-specific load and store steps +----------------------------------- + +The predicates ``isAdditionalLoadStep``, ``isAdditionalStoreStep``, and ``isAdditionalLoadStoreStep`` have been removed. There is no way to emulate the original behavior. + +Library models can still contribute such steps, but they will be applicable to all queries. Also see the section on jump steps further down. + +Changes in behavior +-------------------- + +When the query has been migrated to the new interface, it may seem to behave differently due to some technical differences in the internals of +the two data flow libraries. The most significant changes are described below. + +Taint steps now propagate all flow states +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +There's an important change from the old data flow library when using flow state and taint-tracking together. + +When using ``TaintTracking::GlobalWithState``, all flow states can propagate along taint steps. +In the old data flow library, only the ``taint`` flow label could propagate along taint steps. +A straightforward translation of such a query may therefore result in new flow paths being found, which might be unexpected. + +To emulate the old behavior, use ``DataFlow::GlobalWithState`` instead of ``TaintTracking::GlobalWithState``, +and manually add taint steps using ``isAdditionalFlowStep``. The predicate ``TaintTracking::defaultTaintStep`` can be used to access to the set of taint steps. + +For example: + +.. code-block:: ql + + module MyConfig implements DataFlow::StateConfigSig { + class FlowState extends string { + FlowState() { this = ["taint", "foo"] } + } + + predicate isAdditionalFlowStep(DataFlow::Node node1, FlowState state1, DataFlow::Node node2, FlowState state2) { + // Allow taint steps to propagate the "taint" flow state + TaintTracking::defaultTaintStep(node1, node2) and + state1 = "taint" and + state2 = state + } + + ... + } + + module MyFlow = DataFlow::GlobalWithState; + + +Jump steps across function boundaries +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When a flow step crosses a function boundary, that is, it starts and ends in two different functions, it will now be classified as a "jump" step. + +Jump steps can be problematic in some cases. Roughly speaking, the data flow library will "forget" which call site it came from when following a jump step. +This can lead to spurious flow paths that go into a function through one call site, and back out of a different call site. + +If the step was generated by a library model, that is, the step is applicable to all queries, this is best mitigated by converting the step to a flow summary. +For example, the following library model adds a taint step from ``x`` to ``y`` in ``foo.bar(x, y => {})``: + +.. code-block:: ql + + class MyStep extends TaintTracking::SharedTaintStep { + override predicate step(DataFlow::Node node1, DataFlow::Node node2) { + exists(DataFlow::CallNode call | + call = DataFlow::moduleMember("foo", "bar").getACall() and + node1 = call.getArgument(0) and + node2 = call.getCallback(1).getParameter(0) + ) + } + } + +Because this step crosses a function boundary, it becomes a jump step. This can be avoided by converting it to a flow summary as follows: + +.. code-block:: ql + + class MySummary extends DataFlow::SummarizedCallable { + MySummary() { this = "MySummary" } + + override DataFlow::CallNode getACall() { result = DataFlow::moduleMember("foo", "bar").getACall() } + + override predicate propagatesFlow(string input, string output, boolean preservesValue) { + input = "Argument[this]" and + output = "Argument[1].Parameter[0]" and + preservesValue = false // taint step + } + } + +See :ref:`customizing library models for JavaScript ` for details about the format of the ``input`` and ``output`` strings. +The aforementioned article also provides guidance on how to store the flow summary in a data extension. + +For query-specific steps that cross function boundaries, that is, steps added with ``isAdditionalFlowStep``, there is currently no way to emulate the original behavior. +A possible workaround is to convert the query-specific step to a flow summary. In this case it should be stored in a data extension to avoid performance issues, although this also means +that all other queries will be able to use the flow summary. + +Barriers block all flows +~~~~~~~~~~~~~~~~~~~~~~~~ + +In the shared data flow library, a barrier blocks all flows, even if the tracked value is inside a content. + +In the old data flow library, only barriers specific to the ``data`` flow label blocked flows when the tracked value was inside a content. + +This rarely has significant impact, but some users may observe some result changes because of this. + +There is currently no way to emulate the original behavior. + +Further reading +--------------- + +- :ref:`Analyzing data flow in JavaScript and TypeScript ` provides a general guide to the new data flow library. +- :ref:`Using flow state for precise data flow analysis ` provides a general guide on using flow state. diff --git a/javascript/ql/lib/change-notes/2025-01-07-dataflow-deprecation.md b/javascript/ql/lib/change-notes/2025-01-07-dataflow-deprecation.md new file mode 100644 index 000000000000..411a836d653f --- /dev/null +++ b/javascript/ql/lib/change-notes/2025-01-07-dataflow-deprecation.md @@ -0,0 +1,6 @@ +--- +category: deprecated +--- +* Custom data flow queries will need to be migrated in order to use the shared data flow library. Until migrated, such queries will compile with deprecation warnings and run with a + deprecated copy of the old data flow library. The deprecation layer will be removed in early 2026, after which any unmigrated queries will stop working. + See more information in the [migration guide](https://codeql.github.com/docs/codeql-language-guides/migrating-javascript-dataflow-queries). diff --git a/javascript/ql/lib/change-notes/2025-01-07-dataflow.md b/javascript/ql/lib/change-notes/2025-01-07-dataflow.md new file mode 100644 index 000000000000..9c94c47cd377 --- /dev/null +++ b/javascript/ql/lib/change-notes/2025-01-07-dataflow.md @@ -0,0 +1,5 @@ +--- +category: majorAnalysis +--- +* All data flow queries are now using the same underlying data flow library as the other languages analyses, replacing the old one written specifically for JavaScript/TypeScript. + This is a significant change and users may consequently observe differences in the alerts generated by the analysis. diff --git a/javascript/ql/lib/semmle/javascript/dataflow/Configuration.qll b/javascript/ql/lib/semmle/javascript/dataflow/Configuration.qll index bb032596475e..4f6f74561504 100644 --- a/javascript/ql/lib/semmle/javascript/dataflow/Configuration.qll +++ b/javascript/ql/lib/semmle/javascript/dataflow/Configuration.qll @@ -6,10 +6,6 @@ * Additional data flow edges can be specified, and conversely certain nodes or * edges can be designated as _barriers_ that block flow. * - * NOTE: The API of this library is not stable yet and may change in - * the future. - * - * * # Technical overview * * This module implements a summarization-based inter-procedural data flow @@ -76,6 +72,11 @@ private import semmle.javascript.internal.CachedStages private import AdditionalFlowSteps /** + * DEPRECATED. + * Subclasses of this class should be replaced by a module implementing the new `ConfigSig` or `StateConfigSig` interface. + * See the [migration guide](https://codeql.github.com/docs/codeql-language-guides/migrating-javascript-dataflow-queries) for more details. + * + * #### Legacy documentation * A data flow tracking configuration for finding inter-procedural paths from * sources to sinks. * diff --git a/javascript/ql/lib/semmle/javascript/dataflow/TaintTracking.qll b/javascript/ql/lib/semmle/javascript/dataflow/TaintTracking.qll index 7f8094caa4b2..237c7c45dd66 100644 --- a/javascript/ql/lib/semmle/javascript/dataflow/TaintTracking.qll +++ b/javascript/ql/lib/semmle/javascript/dataflow/TaintTracking.qll @@ -8,9 +8,6 @@ * substrings. As for data flow configurations, additional flow edges can be * specified, and conversely certain nodes or edges can be designated as taint * _sanitizers_ that block flow. - * - * NOTE: The API of this library is not stable yet and may change in - * the future. */ import javascript @@ -27,6 +24,13 @@ module TaintTracking { import AdditionalTaintSteps /** + * DEPRECATED. + * Subclasses of this class should be replaced by a module implementing the new `ConfigSig` or `StateConfigSig` interface. + * See the [migration guide](https://codeql.github.com/docs/codeql-language-guides/migrating-javascript-dataflow-queries) for more details. + * + * When migrating a `TaintTracking::Configuration` to `DataFlow::ConfigSig`, use `TaintTracking::Global<...>` instead of `DataFlow::Global<...>`. + * + * #### Legacy documentation * A data flow tracking configuration that considers taint propagation through * objects, arrays, promises and strings in addition to standard data flow. *