Cache CLI extractor paths across Actions steps by mario-campos · Pull Request #3950 · github/codeql-action

mario-campos · 2026-06-04T13:53:53Z

Similar to #3943, this PR caches the output of codeql resolve languages, which contains the paths to the various extractors, so that repeated calls to resolveLanguages() run the CLI only once. Additionally, it re-implements resolveExtractor() as a wrapper over resolveLanguages() (to re-use the cached output) rather than shelling out to codeql resolve extractor.

In one experiment, I counted seven instances of shelling out to codeql resolve extractor. When you dig into the code, you can see why: resolveExtractor() is not called often or from many places, but one caller is isTracedLanguage(), which is wrapped by isScannedLanguage(), and these functions are often used in a loop/map over some or all languages. This can explain why we see consecutive executions of codeql resolve extractor.
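
As a rough illustration of the approach described above, here is a minimal sketch and not the code in this PR. The class name `CodeQLSketch`, the injected `runCli` function, and the example paths are all hypothetical. The output shape (language name mapped to a list of extractor paths) is inferred from the diff in this PR.

```typescript
// Assumed shape of `codeql resolve languages` output: language -> extractor paths.
type ResolveLanguagesOutput = Record<string, string[] | undefined>;

class CodeQLSketch {
  // Hypothetical stand-in for shelling out to the CLI, injected so the sketch is self-contained.
  private readonly runCli: () => Promise<ResolveLanguagesOutput>;
  private cached: Promise<ResolveLanguagesOutput> | undefined;

  constructor(runCli: () => Promise<ResolveLanguagesOutput>) {
    this.runCli = runCli;
  }

  // Memoize the promise so repeated (and concurrent) callers share one CLI run.
  // Note: a rejected promise would also be cached in this simplified version.
  resolveLanguages(): Promise<ResolveLanguagesOutput> {
    if (this.cached === undefined) {
      this.cached = this.runCli();
    }
    return this.cached;
  }

  // Wrapper: read the extractor path from the cached output instead of
  // starting another CLI process per language.
  async resolveExtractor(language: string): Promise<string | undefined> {
    const extractors = await this.resolveLanguages();
    return extractors[language]?.[0];
  }
}

// Usage: count how often the stand-in CLI runs while looping over languages.
let cliCalls = 0;
const codeql = new CodeQLSketch(async () => {
  cliCalls++;
  return { java: ["/example/java"], python: ["/example/python"] };
});

async function demo(): Promise<string[]> {
  const paths: string[] = [];
  for (const language of ["java", "python", "java"]) {
    paths.push((await codeql.resolveExtractor(language)) ?? "<none>");
  }
  return paths;
}
```

In this sketch, looping over three languages triggers the stand-in CLI once, which is the effect the PR is after for loops built on isTracedLanguage() and isScannedLanguage().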

Risk assessment

For internal use only. Please select the risk level of this change:

Low risk: Changes are fully under feature flags, or have been fully tested and validated in pre-production environments and are highly observable, or are documentation or test only.

Which use cases does this change impact?

Workflow types:

Advanced setup - Impacts users who have custom CodeQL workflows.
Managed - Impacts users with dynamic workflows (Default Setup, Code Quality, ...).

Products:

Code Scanning - The changes impact analyses when analysis-kinds: code-scanning.
Code Quality - The changes impact analyses when analysis-kinds: code-quality.
Other first-party - The changes impact other first-party analyses.
Third-party analyses - The changes affect the upload-sarif action.

Environments:

Dotcom - Impacts CodeQL workflows on github.com and/or GitHub Enterprise Cloud with Data Residency.
GHES - Impacts CodeQL workflows on GitHub Enterprise Server.
Testing/None - This change does not impact any CodeQL workflows in production.

How did/will you validate this change?

Unit tests - I am depending on unit test coverage (i.e. tests in .test.ts files).
End-to-end tests - I am depending on PR checks (i.e. tests in pr-checks).
Other - Manual/local testing

If something goes wrong after this change is released, what are the mitigation and rollback strategies?

Feature flags - All new or changed code paths can be fully disabled with corresponding feature flags.
Rollback - Change can only be disabled by rolling back the release or releasing a new version with a fix.
Development/testing only - This change cannot cause any failures in production.
Other - Please provide details.

How will you know if something goes wrong after this change is released?

Telemetry - I rely on existing telemetry or have made changes to the telemetry.
- Dashboards - I will watch relevant dashboards for issues after the release. Consider whether this requires this change to be released at a particular time rather than as part of a regular release.
- Alerts - New or existing monitors will trip if something goes wrong with this change.
Other - Please provide details.

Are there any special considerations for merging or releasing this change?

No special considerations - This change can be merged at any time.
Special considerations - This change should only be merged once certain preconditions are met. Please provide details of those or link to this PR from an internal issue.

Merge / deployment checklist

Confirm this change is backwards compatible with existing workflows.
Consider adding a changelog entry for this change.
Confirm the readme and docs have been updated if necessary.

henrymercer

Caching these invocations makes a lot of sense! I have a high level comment and a couple of lower level comments.

The main point is that now that we're caching multiple invocations, it might be a good opportunity to generalise the design. For instance, you could imagine something like:

const versionCache = createPersistedCliCache({ envVar: EnvVar.CODEQL_VERSION_INFO, validate: isVersionInfo });
const resolveLanguagesCache = createPersistedCliCache({ envVar: EnvVar.CODEQL_RESOLVE_LANGUAGES, validate: isResolveLanguagesOutput });

where createPersistedCliCache handles memoising in the Action and persisting between Actions steps with an environment variable.
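
To make the idea concrete, here is one possible shape for such a helper. This is only a sketch under assumptions: the `get(compute)` method, the explicit `env` parameter, the synchronous `compute` callback, and the variable name `SKETCH_CODEQL_VERSION_INFO` are all hypothetical. In a real Action, the write would presumably need to go through the Actions toolkit so that later steps see it, rather than through a plain object.

```typescript
// Hypothetical options, mirroring the call shape suggested above.
interface PersistedCliCacheOptions<T> {
  envVar: string;
  validate: (value: unknown) => value is T;
}

// Stand-in for the environment shared between Actions steps (assumption).
type Env = Record<string, string | undefined>;

function createPersistedCliCache<T>(options: PersistedCliCacheOptions<T>, env: Env) {
  let memo: T | undefined;
  return {
    get(compute: () => T): T {
      // 1. In-process memo: cheapest path within a single step.
      if (memo !== undefined) {
        return memo;
      }
      // 2. Value persisted by an earlier step, accepted only if it validates.
      const raw = env[options.envVar];
      if (raw !== undefined) {
        try {
          const parsed: unknown = JSON.parse(raw);
          if (options.validate(parsed)) {
            memo = parsed;
            return parsed;
          }
        } catch {
          // Malformed JSON: fall through and recompute.
        }
      }
      // 3. Miss: compute once, memoize, and persist for later steps.
      const value = compute();
      memo = value;
      env[options.envVar] = JSON.stringify(value);
      return value;
    },
  };
}

// Usage with a placeholder payload (the version string is made up).
const isVersionInfo = (v: unknown): v is { version: string } =>
  typeof v === "object" &&
  v !== null &&
  typeof (v as { version?: unknown }).version === "string";

const sharedEnv: Env = {};
let runs = 0;
const compute = () => {
  runs++;
  return { version: "0.0.0" };
};

const options = { envVar: "SKETCH_CODEQL_VERSION_INFO", validate: isVersionInfo };
const stepOne = createPersistedCliCache(options, sharedEnv);
stepOne.get(compute);
stepOne.get(compute);

// A fresh cache instance simulates a later step reading the persisted value.
const stepTwo = createPersistedCliCache(options, sharedEnv);
const info = stepTwo.get(compute);
```

Here `compute` runs once across both simulated steps. Validating the persisted value before trusting it means a stale or corrupted variable degrades to a recompute rather than a crash.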

Some smaller things:

- Ideally the cache entry would also depend on getExtraOptionsFromEnv(["resolve","languages"]).
- We should remove the cache in testing-utils.ts like we do for the CodeQL version cache.
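
For the first point, one way to make the entry depend on the extra options is to store them next to the cached value and treat any mismatch as a miss. The sketch below assumes the extra options arrive as a plain string array. `CacheEntry`, `writeEntry`, and `readEntry` are hypothetical names, and `--placeholder-option` is a made-up string, not a real CLI flag.

```typescript
// Hypothetical persisted payload: the cached value plus the options it was computed with.
interface CacheEntry<T> {
  extraOptions: string[];
  value: T;
}

function writeEntry<T>(value: T, extraOptions: string[]): string {
  const entry: CacheEntry<T> = { extraOptions, value };
  return JSON.stringify(entry);
}

// Returns the cached value only when it was computed with the same extra options.
// The cast is unvalidated here; a real implementation would validate the payload.
function readEntry<T>(raw: string | undefined, extraOptions: string[]): T | undefined {
  if (raw === undefined) {
    return undefined;
  }
  const entry = JSON.parse(raw) as CacheEntry<T>;
  const sameOptions =
    entry.extraOptions.length === extraOptions.length &&
    entry.extraOptions.every((option, i) => option === extraOptions[i]);
  return sameOptions ? entry.value : undefined;
}

// Usage: a change in extra options invalidates the entry.
const persisted = writeEntry({ java: ["/example/java"] }, ["--placeholder-option"]);
const hit = readEntry<Record<string, string[]>>(persisted, ["--placeholder-option"]);
const miss = readEntry<Record<string, string[]>>(persisted, []);
```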

henrymercer · 2026-06-05T11:19:28Z

+      // This can be a bit slow due to the JVM startup cost. Instead, get
+      // the extractor path from resolveLanguages(), which caches its output.
+      const extractors = await this.resolveLanguages();
+      return extractors[language][0];


Does this correctly handle language aliases? We do currently normalise languages to their original names, but it would be good to at least document this change of behaviour if not resolve aliases.

Also, we should make sure that extractors[language] is defined and if not, error nicely.
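
A sketch of what both suggestions might look like together, assuming the output of resolveLanguages() maps language names to lists of extractor paths. The function name `extractorPathFor`, the error wording, and the alias table are hypothetical. The alias entries are examples for illustration, not the action's actual table.

```typescript
type ResolveLanguagesOutput = Record<string, string[] | undefined>;
type AliasTable = Record<string, string | undefined>;

// Look up the first extractor path for a language, resolving aliases first
// and failing with a descriptive error instead of an opaque TypeError.
function extractorPathFor(
  extractors: ResolveLanguagesOutput,
  language: string,
  aliases: AliasTable,
): string {
  const canonical = aliases[language] ?? language;
  const first = extractors[canonical]?.[0];
  if (first === undefined) {
    const known = Object.keys(extractors).sort().join(", ");
    throw new Error(
      `No extractor found for language '${language}'. Known languages: ${known}`,
    );
  }
  return first;
}

// Usage with illustrative data.
const exampleAliases: AliasTable = { "c++": "cpp" };
const exampleExtractors: ResolveLanguagesOutput = {
  cpp: ["/example/cpp"],
  java: ["/example/java"],
};

const viaAlias = extractorPathFor(exampleExtractors, "c++", exampleAliases);

let errorMessage = "";
try {
  extractorPathFor(exampleExtractors, "cobol", exampleAliases);
} catch (e) {
  errorMessage = e instanceof Error ? e.message : String(e);
}
```

With this data, the `c++` lookup resolves through the alias to the `cpp` entry, and the unknown language produces an error that lists the languages the CLI did report.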

mario-campos added 2 commits June 4, 2026 08:39

Cache the output of codeql resolve languages

d3127b7

Repeated calls to `resolveLanguages()` will only pay the performance penalty of executing `codeql resolve languages` once.

Reimplement resolveExtractor() as wrapper over resolveLanguages()

e34c66b

By wrapping `resolveLanguages()`, which is memoized, we can avoid executing `codeql resolve extractor` several times over the course of an analysis.

github-actions bot added the size/S label ("Should be easy to review") on Jun 4, 2026

henrymercer reviewed Jun 5, 2026

mario-campos wants to merge 2 commits into main from mario-campos/cache-cli-resolve-langs
