Currently, most From Source CI testing jobs are fine, but in a few cases the From Source job builds and tests a non-trivial portion of the repository. Those jobs can take a long time to run (>30 minutes) and commonly fail due to intermittent test flakiness, caused either by timing issues in the tests and resource contention on the runner, or by external issues such as pulling Test Proxy test recordings or Test Proxy timeouts under resource contention. We should look into a potential redesign of the From Source job.
A potential redesign would be to use an initializer stage that generates the job matrix on the fly rather than using a static job matrix. This would allow non-From Source jobs to be injected as-is, as they're relatively narrowly scoped, while allowing the From Source job to be split into many jobs if it would build and test a large portion of the repository. This logic could go as follows:
- Determine the number of projects that will be built and tested using a modified version of Generate FromSource POM.
- If under a certain number of projects, use a single From Source job as we do today. If over a certain number of projects, do the following.
- Select projects to be considered the From Source entry point based on the number of downstream projects they trigger to be tested. For example, azure-storage-common, the common dependency for the azure-storage-* libraries, which are frequently used by other SDKs, could be a root as it triggers many libraries to be built. /sdk/core and /sdk/identity libraries would be excluded as they effectively cause the entire repo to be built.
- Each root project would trigger a From Source job to be injected into the matrix.
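
The steps above could be sketched roughly as follows. This is only an illustration of the splitting logic, not the actual pipeline code: `Project`, `select_roots`, `build_from_source_jobs`, the threshold of 20, and the cap of 3 roots are all hypothetical, and the "remaining" job for projects not covered by any root is an assumption that the proposal does not spell out.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical values: the proposal only says "a certain number of projects".
SPLIT_THRESHOLD = 20
MAX_ROOTS = 3
# Libraries under these paths are never chosen as roots, since they
# effectively cause the entire repo to be built.
EXCLUDED_ROOT_PREFIXES = ("sdk/core/", "sdk/identity/")


@dataclass
class Project:
    name: str
    path: str  # repo path, e.g. "sdk/storage/azure-storage-common"
    downstream: set[str] = field(default_factory=set)  # projects it triggers


def select_roots(projects, max_roots=MAX_ROOTS):
    """Pick the non-excluded projects that trigger the most downstream projects."""
    candidates = [
        p for p in projects
        if p.downstream
        and not p.path.lstrip("/").startswith(EXCLUDED_ROOT_PREFIXES)
    ]
    # Most downstream projects first; name breaks ties deterministically.
    candidates.sort(key=lambda p: (-len(p.downstream), p.name))
    return candidates[:max_roots]


def build_from_source_jobs(projects, threshold=SPLIT_THRESHOLD, max_roots=MAX_ROOTS):
    """Return the From Source entries to inject into the generated job matrix."""
    all_names = {p.name for p in projects}
    if len(projects) <= threshold:
        # Small enough: keep a single From Source job, as today.
        return [{"name": "From Source", "projects": sorted(all_names)}]

    jobs = []
    covered = set()
    for root in select_roots(projects, max_roots):
        # Only include downstream projects that are part of this build.
        scope = ({root.name} | root.downstream) & all_names
        covered |= scope
        jobs.append({"name": f"From Source ({root.name})", "projects": sorted(scope)})

    # Assumption: anything no root covers still gets tested in a catch-all job.
    remaining = all_names - covered
    if remaining:
        jobs.append({"name": "From Source (remaining)", "projects": sorted(remaining)})
    return jobs
```

Scopes of different roots may overlap in this sketch, which matches the expected build and testing overlap between the smaller jobs.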
Making this change would split large From Source jobs into a few smaller From Source jobs. While overall CI time usage would go up, since common functionality would be repeated and library builds and tests may overlap, it would remove long-running and frequently flaky jobs, resulting in a better overall experience.