Pull requests: NVIDIA/TransformerEngine
skip test if TE is not compiled with cusolver
community-contribution
PRs from external contributor outside the core maintainers, representing community-driven work.
#3096
opened Jun 5, 2026 by
francesco-bertolotti
Contributor
6 of 13 tasks
increased a bit tolerance for pytorch/distributed/run_numerics.py
community-contribution
#3095
opened Jun 5, 2026 by
francesco-bertolotti
Contributor
6 of 13 tasks
Test failing from .resolve() when TE is installed in a venv
community-contribution
#3094
opened Jun 5, 2026 by
francesco-bertolotti
Contributor
6 of 13 tasks
NVFP4: cache GEMM-swizzled weight scale factors across micro-batches
community-contribution
#3093
opened Jun 5, 2026 by
cael-ling
Contributor
3 of 13 tasks
Added thd cudnn guard
community-contribution
#3092
opened Jun 5, 2026 by
francesco-bertolotti
Contributor
6 of 13 tasks
guarding max_logits fused attention for cudnn < 9.21.0
community-contribution
#3091
opened Jun 5, 2026 by
francesco-bertolotti
Contributor
6 of 13 tasks
Make NVTE tensor handle pool size configurable
community-contribution
[PyTorch] Fix wrong stream capture for cuteDSL delayed wgrad GEMM
community-contribution
#3089
opened Jun 5, 2026 by
Wohox
Contributor
4 tasks done
fix(topk): fix UB and prevent vector load splitting in standalone_topk
community-contribution
#3088
opened Jun 5, 2026 by
solos
5 of 13 tasks
[JAX] Extend tensor inspect utility to dump out tensors in identifiable names
#3086
opened Jun 4, 2026 by
tdophung
Collaborator
6 of 13 tasks
[JAX] Fix norm workspace on global shapes
#3085
opened Jun 4, 2026 by
jberchtold-nvidia
Collaborator
Draft
8 of 13 tasks
[JAX] Hopper BF16 grouped GEMM v2 support
#3083
opened Jun 4, 2026 by
jberchtold-nvidia
Collaborator
Draft
8 of 13 tasks
[PyTorch] Add joint forward-backward op fusion pass
enhancement
New feature or request
#3080
opened Jun 4, 2026 by
timmoon10
Member
8 of 13 tasks
[Pytorch] Add variable-K Cutlass GroupGEMM for fine-grained MoE wgrad
community-contribution
#3069
opened Jun 1, 2026 by
cassiewilliam
Contributor
6 of 8 tasks
Optimize NVFP4 4over6 candidate error path
community-contribution
#3068
opened Jun 1, 2026 by
zianglih
Contributor
9 of 13 tasks
[PyTorch] Propagate skip_fp8_weight_update in GroupedLinear during FP8 CUDA graph capture
community-contribution
#3065
opened May 31, 2026 by
LeSingh1
Contributor
fix unfused padding causal sdpa
community-contribution
#3063
opened May 31, 2026 by
hungryGeek16
[JAX] Grouped quant+GEMM custom partitioning rules
#3058
opened May 28, 2026 by
jberchtold-nvidia
Collaborator
8 of 13 tasks
[Common/PyTorch] bugfix: Token-linear fused RoPE impl. for THD tensors.
community-contribution
#3057
opened May 28, 2026 by
plugyawn
7 of 13 tasks
[JAX] [PyT] [Common] Enable D=256 BWD cuDNN fused attn for Blackwell CC 10.x
#3056
opened May 28, 2026 by
KshitijLakhani
Collaborator
7 of 13 tasks
[PyTorch] Integrate the cuBLAS single GEMM MXFP8 NN, NT support for sm120
#3050
opened May 28, 2026 by
KshitijLakhani
Collaborator
Draft
7 of 13 tasks