Cache CLI extractor paths across Actions steps #3950
Repeated calls to `resolveLanguages()` will only pay the performance penalty of executing `codeql resolve languages` once.
By wrapping `resolveLanguages()`, which is memoized, we can avoid executing `codeql resolve extractor` several times over the course of an analysis.
henrymercer
left a comment
Caching these invocations makes a lot of sense! I have a high-level comment and a couple of lower-level comments.
The main point is that now that we're caching multiple invocations, it might be a good opportunity to generalise the design. For instance, you could imagine something like:

    const versionCache = createPersistedCliCache({ envVar: EnvVar.CODEQL_VERSION_INFO, validate: isVersionInfo });
    const resolveLanguagesCache = createPersistedCliCache({ envVar: EnvVar.CODEQL_RESOLVE_LANGUAGES, validate: isResolveLanguagesOutput });

where `createPersistedCliCache` handles memoising in the Action and persisting between Actions steps with an environment variable.
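To make the idea concrete, here is a minimal sketch of what such a helper could look like. Everything in it is an assumption for illustration: the `get`/`clear` interface, the explicit `env` parameter, and the JSON encoding are not taken from the Action's code. In a real Action, persisting a value to later steps would also need something like `core.exportVariable` from `@actions/core`; the sketch only writes into the `env` object it is given.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the Action's API.
interface PersistedCliCacheOptions<T> {
  // Name of the environment variable used to persist the value.
  envVar: string;
  // Type guard used to reject stale or malformed persisted values.
  validate: (value: unknown) => value is T;
}

interface PersistedCliCache<T> {
  // Returns the cached value, computing it with `compute` on a miss.
  get(compute: () => Promise<T>): Promise<T>;
  // Drops both the in-memory and the persisted copy (useful in tests).
  clear(): void;
}

function createPersistedCliCache<T>(
  options: PersistedCliCacheOptions<T>,
  env: Record<string, string | undefined>,
): PersistedCliCache<T> {
  let memo: T | undefined;
  return {
    async get(compute) {
      // 1. In-process memoisation.
      if (memo !== undefined) {
        return memo;
      }
      // 2. Value persisted by an earlier step, if it parses and validates.
      const raw = env[options.envVar];
      if (raw !== undefined) {
        try {
          const parsed: unknown = JSON.parse(raw);
          if (options.validate(parsed)) {
            memo = parsed;
            return memo;
          }
        } catch {
          // Malformed JSON: fall through and recompute.
        }
      }
      // 3. Cache miss: run the (expensive) computation and persist the result.
      memo = await compute();
      env[options.envVar] = JSON.stringify(memo);
      return memo;
    },
    clear() {
      memo = undefined;
      delete env[options.envVar];
    },
  };
}
```

Note that this version does not deduplicate concurrent calls: two overlapping `get` calls on a cold cache would both run `compute`. Memoising the promise instead of the value would be one way to address that.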
Some smaller things:
- Ideally the cache entry would also depend on `getExtraOptionsFromEnv(["resolve","languages"])`
- We should remove the cache in `testing-utils.ts` like we do for the CodeQL version cache
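One way the first point could be handled is to store the extra options next to the cached output and treat a mismatch as a cache miss. The sketch below is only an illustration under that assumption: `serializeEntry`, `readEntry`, and the `CachedEntry` shape are hypothetical names, and the `extraOptions` argument stands in for whatever `getExtraOptionsFromEnv(["resolve","languages"])` returns.

```typescript
// Hypothetical sketch: the persisted entry records the options it was computed with.
interface CachedEntry<T> {
  extraOptions: string[];
  output: T;
}

// Encodes the output together with the extra options that produced it.
function serializeEntry<T>(extraOptions: string[], output: T): string {
  const entry: CachedEntry<T> = { extraOptions, output };
  return JSON.stringify(entry);
}

// Returns the cached output only if it was produced with the same extra
// options and passes validation; otherwise returns undefined (a cache miss).
function readEntry<T>(
  raw: string | undefined,
  extraOptions: string[],
  validate: (value: unknown) => value is T,
): T | undefined {
  if (raw === undefined) {
    return undefined;
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return undefined;
  }
  if (typeof parsed !== "object" || parsed === null) {
    return undefined;
  }
  const entry = parsed as Partial<CachedEntry<unknown>>;
  const cachedOptions = entry.extraOptions;
  if (!Array.isArray(cachedOptions)) {
    return undefined;
  }
  const sameOptions =
    cachedOptions.length === extraOptions.length &&
    cachedOptions.every((option, i) => option === extraOptions[i]);
  if (!sameOptions) {
    return undefined;
  }
  const output: unknown = entry.output;
  return validate(output) ? output : undefined;
}
```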

    // This can be a bit slow due to the JVM startup cost. Instead, get
    // the extractor path from resolveLanguages(), which caches its output.
    const extractors = await this.resolveLanguages();
    return extractors[language][0];

Does this correctly handle language aliases? We do currently normalise languages to their original names, but it would be good to at least document this change of behaviour if not resolve aliases.
Also, we should make sure that `extractors[language]` is defined and, if not, fail with a clear error.
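For illustration, a lookup along these lines could cover both points. This is a sketch under stated assumptions, not the Action's implementation: `extractorPathFor` is a hypothetical helper, the `ResolveLanguagesOutput` shape (language name mapped to a list of extractor paths) is inferred from the snippet above, and the alias map is an illustrative parameter rather than the real alias table.

```typescript
// Assumed shape: language name -> list of extractor paths.
type ResolveLanguagesOutput = Record<string, string[] | undefined>;

// Hypothetical helper: resolves an optional alias, then returns the first
// extractor path, throwing a descriptive error if none is available.
function extractorPathFor(
  languages: ResolveLanguagesOutput,
  language: string,
  aliases: Record<string, string | undefined> = {},
): string {
  // Map an alias to its canonical language name, if one is registered.
  const canonical = aliases[language] ?? language;
  const first = languages[canonical]?.[0];
  if (first === undefined) {
    const known = Object.keys(languages).sort().join(", ");
    throw new Error(
      `No extractor found for language '${language}'. ` +
        `Known languages: ${known || "(none)"}.`,
    );
  }
  return first;
}
```

Callers that already normalise languages to their original names could leave the alias map empty, which keeps the lookup a plain guarded index.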
Similar to #3943, this PR caches the output of `codeql resolve languages`, which contains the paths to the various extractors, so that repeated calls to `resolveLanguages()` only invoke the CLI once. Additionally, it re-implements `resolveExtractor()` as a wrapper over `resolveLanguages()` (to re-use the cached output) rather than shelling out to `codeql resolve extractor`.

In one experiment, I counted seven instances of shelling out to `codeql resolve extractor`. When you dig into the code, you can see why: `resolveExtractor()` is not called often or from many places, but one caller is `isTracedLanguage()`, which is wrapped by `isScannedLanguage()`. These functions are often used in a loop/map over all or some languages, which can explain why we see consecutive executions of `codeql resolve extractor`.

Risk assessment
For internal use only. Please select the risk level of this change:
Which use cases does this change impact?
Workflow types:
- `dynamic` workflows (Default Setup, Code Quality, ...).

Products:
- `analysis-kinds: code-scanning`.
- `analysis-kinds: code-quality`.
- `upload-sarif` action.

Environments:
- `github.com` and/or GitHub Enterprise Cloud with Data Residency.

How did/will you validate this change?
- `.test.ts` files
- `pr-checks`

If something goes wrong after this change is released, what are the mitigation and rollback strategies?
How will you know if something goes wrong after this change is released?
Are there any special considerations for merging or releasing this change?
Merge / deployment checklist