Pull requests: NVIDIA/TransformerEngine

Pull requests list

skip test if TE is not compiled with cusolver (label: community-contribution, "PRs from external contributor outside the core maintainers, representing community-driven work.")
#3096 opened Jun 5, 2026 by francesco-bertolotti Contributor
6 of 13 tasks
increased a bit tolerance for pytorch/distributed/run_numerics.py (label: community-contribution)
#3095 opened Jun 5, 2026 by francesco-bertolotti Contributor
6 of 13 tasks
Test failing from .resolve() when TE is installend in a venv (label: community-contribution)
#3094 opened Jun 5, 2026 by francesco-bertolotti Contributor
6 of 13 tasks
NVFP4: cache GEMM-swizzled weight scale factors across micro-batches (label: community-contribution)
#3093 opened Jun 5, 2026 by cael-ling Contributor
3 of 13 tasks
Added thd cudnn guard (label: community-contribution)
#3092 opened Jun 5, 2026 by francesco-bertolotti Contributor
6 of 13 tasks
guarding max_logits fused attention for cudnn < 9.21.0 (label: community-contribution)
#3091 opened Jun 5, 2026 by francesco-bertolotti Contributor
6 of 13 tasks
Make NVTE tensor handle pool size configurable (label: community-contribution)
#3090 opened Jun 5, 2026 by lhb8125 Contributor Draft
[PyTorch] Fix wrong stream capture for cuteDSL delayed wgrad GEMM (label: community-contribution)
#3089 opened Jun 5, 2026 by Wohox Contributor
4 tasks done
fix(topk): fix UB and prevent vector load splitting in standalone_topk (label: community-contribution)
#3088 opened Jun 5, 2026 by solos
5 of 13 tasks
[JAX] Extend tensor inspect utility to dump out tensors in identifiable names
#3086 opened Jun 4, 2026 by tdophung Collaborator
6 of 13 tasks
[JAX] Fix norm workspace on global shapes
#3085 opened Jun 4, 2026 by jberchtold-nvidia Collaborator Draft
8 of 13 tasks
[JAX] MoEBlock tutorial
#3084 opened Jun 4, 2026 by jberchtold-nvidia Collaborator Draft
13 tasks
[JAX] Hopper BF16 grouped GEMM v2 support
#3083 opened Jun 4, 2026 by jberchtold-nvidia Collaborator Draft
8 of 13 tasks
add attention docs
#3081 opened Jun 4, 2026 by sudhakarsingh27 Member Draft
13 tasks
[PyTorch] Add joint forward-backward op fusion pass (label: enhancement, "New feature or request")
#3080 opened Jun 4, 2026 by timmoon10 Member
8 of 13 tasks
[Common] Pack attention arguments as structs
#3079 opened Jun 3, 2026 by cyanguwa Collaborator Draft
13 tasks
[Pytorch] Add variable-K Cutlass GroupGEMM for fine-grained MoE wgrad (label: community-contribution)
#3069 opened Jun 1, 2026 by cassiewilliam Contributor
6 of 8 tasks
Optimize NVFP4 4over6 candidate error path (label: community-contribution)
#3068 opened Jun 1, 2026 by zianglih Contributor
9 of 13 tasks
[PyTorch] Propagate skip_fp8_weight_update in GroupedLinear during FP8 CUDA graph capture (label: community-contribution)
#3065 opened May 31, 2026 by LeSingh1 Contributor
fix unfused padding causal sdpa (label: community-contribution)
#3063 opened May 31, 2026 by hungryGeek16
[JAX] Grouped quant+GEMM custom partitioning rules
#3058 opened May 28, 2026 by jberchtold-nvidia Collaborator
8 of 13 tasks
[Common/PyTorch] bugfix: Token-linear fused RoPE impl. for THD tensors. (label: community-contribution)
#3057 opened May 28, 2026 by plugyawn
7 of 13 tasks
[JAX] [PyT] [Common] Enable D=256 BWD cuDNN fused attn for Blackwell CC 10.x
#3056 opened May 28, 2026 by KshitijLakhani Collaborator
7 of 13 tasks
[PyTorch] [torch.compile] torch.compile support for Linear
#3053 opened May 28, 2026 by pggPL Collaborator Draft
13 tasks