Fix safe-outputs YAML corruption when OTLP header masking is enabled by Copilot · Pull Request #37068 · github/gh-aw

Copilot · 2026-06-05T05:53:56Z

observability.otlp.headers could produce invalid workflow YAML in safe_outputs when GitHub App token minting is also present. The app-token step insertion could land inside another step's multi-line block, causing run: content to be emitted at with: indentation.

Problem scope
- Header-triggered OTLP masking adds setup-time steps in safe_outputs.
- Attribute-triggered OTLP masking (GH_AW_OTLP_ATTRIBUTES) similarly shifts the insertion point.
- Existing app-token insertion index logic did not reliably align to step boundaries under these compositions.
Compiler changes
- Updated safe_outputs app-token insertion index calculation to account for OTLP mask step injection, deriving the offset dynamically from the actual helper output (strings.Count) so the arithmetic stays self-consistent if either helper grows to emit more than one step.
- Added a step-boundary realignment guard before insertion:
  - If computed index points into the middle of a serialized step block, advance to the next - name: boundary.
- Added a warning log when the boundary-scan loop exhausts the steps slice, making miscalculations visible in production logs.
- Result: token step insertion no longer fragments neighboring YAML blocks.
Regression coverage
- Added focused regression test for safe-outputs.github-app + OTLP headers path (OTEL_EXPORTER_OTLP_HEADERS).
- Added regression test for the GH_AW_OTLP_ATTRIBUTES-only path.
- Added regression test for the combined path where both env vars are present (highest-risk composition — insertIndex incremented twice).
- All three tests assert the setup-agent-output run block remains intact and is not relocated under a with: block.

for insertIndex < len(steps) && !strings.HasPrefix(steps[insertIndex], stepNameLinePrefix) {
	insertIndex++
}
if insertIndex == len(steps) {
	consolidatedSafeOutputsJobLog.Printf(
		"WARN: app-token insertion reached end of steps slice (len=%d); step ordering may be incorrect",
		len(steps),
	)
}

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

This PR fixes a workflow YAML corruption issue in the consolidated safe_outputs job when both OTLP header masking steps and GitHub App token minting are enabled. The corruption was caused by inserting the app-token step at a slice index that could fall inside another step’s multi-line block, producing invalid indentation (e.g., run: content ending up under a with: block).

Changes:

Adjusts the GitHub App token insertion index calculation to account for OTLP header/attribute masking step injection.
Adds a step-boundary realignment guard to ensure insertion occurs at the next canonical step start line.
Adds a regression test covering the GitHub App + OTLP headers composition and asserting the setup-agent-output-env run: | block remains intact.

Show a summary per file

File	Description
`pkg/workflow/compiler_safe_outputs_job.go`	Updates app-token insertion index logic to include OTLP mask steps and realigns insertion to step boundaries to prevent YAML block fragmentation.
`pkg/workflow/compiler_safe_outputs_job_test.go`	Adds a focused regression test that reproduces the OTLP-headers masking + GitHub App insertion scenario and checks for the prior YAML corruption pattern.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 2/2 changed files
Comments generated: 0

github-actions · 2026-06-05T07:20:22Z

@copilot review all comments and address unresolved review feedback.

Generated by 👨‍🍳 PR Sous Chef · 78.8 AIC · ◷

github-actions · 2026-06-05T07:20:23Z

@copilot refresh the branch and rerun checks, then summarize any remaining blockers.

Generated by 👨‍🍳 PR Sous Chef · 78.8 AIC · ◷

github-actions · 2026-06-05T10:58:53Z

✅ PR Code Quality Reviewer completed the code quality review.

github-actions · 2026-06-05T10:58:54Z

🧪 Test Quality Sentinel completed test quality analysis.

github-actions · 2026-06-05T10:59:03Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #37068 does not have the 'implementation' label (has_implementation_label=false) and has 63 new lines of code in business logic directories, which is at or below the 100-line threshold (requires_adr_by_default_volume=false).

github-actions · 2026-06-05T10:59:34Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions

Skills-Based Review 🧠

Applied /diagnose and /tdd — commenting with non-blocking suggestions to improve test coverage and make the index arithmetic more resilient.

📋 Key Themes & Highlights

Key Themes

Test coverage gaps: The regression test covers OTEL_EXPORTER_OTLP_HEADERS but not GH_AW_OTLP_ATTRIBUTES, and no test exercises both simultaneously.
Fragile offset arithmetic: insertIndex++ hardcodes a step-count assumption that the while-loop guard compensates for but does not remove.
Silent failure mode: The boundary-scan loop has no log or error when it exhausts the slice, making a miscalculation invisible in production.

Positive Highlights

✅ Two-layer defence: explicit offset increments + a while-loop boundary guard is a well-structured approach — the guard provides a real safety net even when the arithmetic drifts.
✅ Focused regression test: TestGitHubAppTokenStepWithOTLPHeaders targets the exact corruption signature (with:\n echo ...) rather than a generic golden-file comparison. Easy to understand and maintain.
✅ stepNameLinePrefix as a named constant: elevating the sentinel to a package constant with a doc-comment is the right call; it makes the intent clear and is a single point of change.
✅ Clear PR description: root cause, fix strategy, and regression coverage are all explained concisely. Great template for future compiler bug fixes.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.7M · 245.9 AIC

github-actions · 2026-06-05T11:07:16Z

+	workflowData := &WorkflowData{
+		Name: "Test Workflow",
+		Env: `"env":
+  OTEL_EXPORTER_OTLP_HEADERS: "Authorization=******"`,


[/tdd] The regression test only exercises the isOTLPHeadersPresent path (OTEL_EXPORTER_OTLP_HEADERS). The isOTLPAttributesPresent path (GH_AW_OTLP_ATTRIBUTES) has no coverage — but the fix also increments insertIndex for it.

💡 Suggested addition

Add a second test (or extend this one) with GH_AW_OTLP_ATTRIBUTES set in Env, repeating the same ordering assertions. Then add a third scenario with both env vars set to confirm insertIndex is incremented twice correctly.

Env: `"env": OTEL_EXPORTER_OTLP_HEADERS: "Authorization=******" GH_AW_OTLP_ATTRIBUTES: "{\"service.name\":\"test\"}"`

Without this, a regression in the attributes path would go undetected.

github-actions · 2026-06-05T11:07:16Z

+	assert.Less(t, setupEnvEchoIndex, tokenIndex, "app token step must be inserted after setup-agent-output-env run block")
+	// Regression guard: app token insertion must not split setup-agent-output-env and leave
+	// an `echo ...` run line nested under a previous `with:` block.
+	assert.NotContains(t, stepsContent, "with:\n          echo \"GH_AW_AGENT_OUTPUT=/tmp/gh-aw/agent_output.json\" >> \"$GITHUB_OUTPUT\"")


[/tdd] No test exercises the combined scenario where both OTEL_EXPORTER_OTLP_HEADERS and GH_AW_OTLP_ATTRIBUTES are present at the same time. That is the highest-risk composition — insertIndex is incremented twice — yet it has no regression guard.

💡 Why this matters

If one of the two insertIndex++ increments were accidentally removed or a mask step were to emit more than one entry, the combined path would silently corrupt YAML in a way that the single-variable tests would not detect. A dedicated combined test would catch that class of regression immediately.

github-actions · 2026-06-05T11:07:16Z

+		if isOTLPHeadersPresent(data) {
+			insertIndex++
+		}
+		if isOTLPAttributesPresent(data) {
+			insertIndex++
+		}


[/diagnose] The insertIndex++ increments are load-bearing: each assumes the corresponding mask helper appends exactly one string entry to the steps slice. The comment documents this ("Each mask helper currently appends one string entry"), but if generateOTLPHeadersMaskStep() or generateOTLPAttributesMaskStep() ever grow to emit two entries (e.g. an added env-echo step), the arithmetic silently undercounts and the while-loop guard must compensate.

💡 More robust alternative

The offset could be derived from the actual number of entries appended, keeping the arithmetic self-consistent regardless of how many lines each helper emits:

if isOTLPHeadersPresent(data) { mask := generateOTLPHeadersMaskStep() // len == 1 today, but this survives future growth insertIndex += strings.Count(mask, " - name: ") }

Or expose a step-count constant alongside each generator. Either approach removes the silent undercounting risk.

github-actions · 2026-06-05T11:07:16Z

+		for insertIndex < len(steps) && !strings.HasPrefix(steps[insertIndex], stepNameLinePrefix) {
+			insertIndex++
+		}


[/diagnose] If the loop exhausts steps without finding a stepNameLinePrefix line, insertIndex equals len(steps) and the app-token block is silently appended at the very end — after checkout and safe-output steps that reference steps.safe-outputs-app-token.outputs.token. The YAML would be structurally valid but semantically broken, and there is no log message or error to surface it.

💡 Suggested guard

A short diagnostic log after the loop would make this case visible:

for insertIndex < len(steps) && !strings.HasPrefix(steps[insertIndex], stepNameLinePrefix) { insertIndex++ } if insertIndex == len(steps) { consolidatedSafeOutputsJobLog.Printf( "WARN: app-token insertion reached end of steps slice (len=%d); step ordering may be incorrect", len(steps), ) }

This is a rare edge case today, but becomes more likely if future step sequences are ever added that push the final checkout step past the end of the current emission order.

github-actions · 2026-06-05T11:10:50Z

🧪 Test Quality Sentinel Report

✅ Test Quality Score: 80/100 — Excellent

Analyzed 1 test: 1 design test (behavioral contract), 0 implementation tests, 1 minor guideline note (missing assertion message on assert.NotContains).

📊 Metrics & Test Classification (1 test analyzed)

Metric	Value
New/modified tests analyzed	1
✅ Design tests (behavioral contracts)	1 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	1 (100%)
Duplicate test clusters	0
Test inflation detected	Yes — 44 test lines / 19 production lines ≈ 2.3:1 (attributable to necessary WorkflowData setup verbosity)
🚨 Coding-guideline violations	1 — missing assertion message on `assert.NotContains` (minor)

Test Classification Details

Test	File	Classification	Issues Detected
`TestGitHubAppTokenStepWithOTLPHeaders`	`pkg/workflow/compiler_safe_outputs_job_test.go:816`	✅ Design	Minor: 1 assertion missing a message argument

Language Support

Tests analyzed:

🐹 Go (*_test.go): 1 test — unit (//go:build !integration)
🟨 JavaScript (*.test.cjs, *.test.js): 0 tests

⚠️ Flagged Tests — Minor Issue (1)

⚠️ `TestGitHubAppTokenStepWithOTLPHeaders` — missing assertion message

Classification: ✅ Design test
Issue: The final assert.NotContains regression guard omits a trailing message argument, making test failure output less informative.

// Current — missing message:
assert.NotContains(t, stepsContent,
    "with:\n          echo \"GH_AW_AGENT_OUTPUT=/tmp/gh-aw/agent_output.json\" >> \"$GITHUB_OUTPUT\"")

// Suggested fix:
assert.NotContains(t, stepsContent,
    "with:\n          echo \"GH_AW_AGENT_OUTPUT=/tmp/gh-aw/agent_output.json\" >> \"$GITHUB_OUTPUT\"",
    "app token step must not split setup-agent-output-env into a nested echo under with:")

This is a minor style issue — the test itself is high-quality and correctly guards the YAML-corruption regression.

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%). TestGitHubAppTokenStepWithOTLPHeaders is a well-structured regression guard for an observable compiler-output contract: it verifies that steps in the generated YAML remain in the correct order when OTLP header masking is active, and that the app-token step is not inserted mid-block.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §26998808540

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 2.5M · 457.4 AIC · ◷

github-actions

Test Quality Sentinel: 80/100. Test quality is acceptable. 0% of new tests are implementation tests (threshold: 30%). The new test TestGitHubAppTokenStepWithOTLPHeaders is a well-structured regression guard for an observable compiler-output contract. Minor note: the final assert.NotContains call is missing a descriptive message argument (see comment above).

github-actions · 2026-06-05T11:33:32Z

@copilot review all comments and address unresolved review feedback.

Generated by 👨‍🍳 PR Sous Chef · 42.2 AIC · ◷

github-actions · 2026-06-05T11:33:32Z

Please refresh the branch, rerun checks, and summarize any remaining blockers.

Generated by 👨‍🍳 PR Sous Chef · 42.2 AIC · ◷

pelikhan · 2026-06-05T11:40:47Z

@copilot apply copilot-revie skill

…ounting Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-06-05T11:53:05Z

@copilot apply copilot-revie skill

Applied the copilot-review skill and addressed all four open review threads. Changes in the latest commit:

Tests (PRRT_kwDOPc1QR86HV6as) — Added TestGitHubAppTokenStepWithOTLPAttributes covering the GH_AW_OTLP_ATTRIBUTES-only path.
Tests (PRRT_kwDOPc1QR86HV6az) — Added TestGitHubAppTokenStepWithOTLPHeadersAndAttributes for the combined OTEL_EXPORTER_OTLP_HEADERS + GH_AW_OTLP_ATTRIBUTES scenario (the path where insertIndex is incremented twice).
Robustness (PRRT_kwDOPc1QR86HV6a0) — Replaced the hardcoded insertIndex++ increments with strings.Count(generateOTLP*MaskStep(), stepNameLinePrefix) so the offset is derived from the actual helper output and stays self-consistent if either helper ever grows to emit more than one step.
Diagnostics (PRRT_kwDOPc1QR86HV6a1) — Added a consolidatedSafeOutputsJobLog.Printf("WARN: ...") after the boundary-scan loop to surface the exhaustion case.
Style — Added the missing descriptive message argument to assert.NotContains in TestGitHubAppTokenStepWithOTLPHeaders.

Initial plan

5071ce1

Copilot AI assigned Copilot and pelikhan Jun 5, 2026

Copilot started work on behalf of pelikhan June 5, 2026 05:54 View session

Copilot AI linked an issue Jun 5, 2026 that may be closed by this pull request

observability.otlp with headers produces invalid YAML in safe-outputs job (v0.74.3, v0.77.5) #37067

Open

Copilot AI and others added 3 commits June 5, 2026 06:05

Fix safe-outputs GitHub App insertion when OTLP masking is enabled

cdbaf26

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Harden safe-outputs token insertion boundary logic

1ecc9f8

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Clarify safe-outputs insertion regression guards

7cd1c84

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix invalid YAML generation in safe-outputs job with OTLP headers~~ Fix safe-outputs YAML corruption when OTLP header masking is enabled Jun 5, 2026

Copilot finished work on behalf of pelikhan June 5, 2026 06:08

Copilot AI requested a review from pelikhan June 5, 2026 06:08

pelikhan marked this pull request as ready for review June 5, 2026 06:13

Copilot AI review requested due to automatic review settings June 5, 2026 06:13

Copilot started reviewing on behalf of pelikhan June 5, 2026 06:13 View session

Copilot AI reviewed Jun 5, 2026

View reviewed changes

This was referenced Jun 5, 2026

[aw] No-Op Runs #37018

Open

[Contribution Check Report] Contribution Check — 2026-06-05 #37036

Open

github-actions Bot reviewed Jun 5, 2026

View reviewed changes

github-actions Bot approved these changes Jun 5, 2026

View reviewed changes

Copilot started work on behalf of pelikhan June 5, 2026 11:40 View session

Address review feedback: add tests, warning log, and dynamic offset c…

4f9fa25

…ounting Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan June 5, 2026 11:53

Conversation

Copilot AI commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

github-actions Bot commented Jun 5, 2026

Uh oh!

github-actions Bot commented Jun 5, 2026

Uh oh!

github-actions Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Skills-Based Review 🧠

Key Themes

Positive Highlights

Uh oh!

github-actions Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 5, 2026

🧪 Test Quality Sentinel Report

Test Classification Details

Language Support

⚠️ TestGitHubAppTokenStepWithOTLPHeaders — missing assertion message

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 5, 2026

Uh oh!

github-actions Bot commented Jun 5, 2026

Uh oh!

pelikhan commented Jun 5, 2026

Uh oh!

Copilot AI commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Jun 5, 2026 •

edited

Loading

github-actions Bot commented Jun 5, 2026 •

edited

Loading

github-actions Bot commented Jun 5, 2026 •

edited

Loading

github-actions Bot commented Jun 5, 2026 •

edited

Loading

github-actions Bot commented Jun 5, 2026 •

edited

Loading

⚠️ `TestGitHubAppTokenStepWithOTLPHeaders` — missing assertion message