Pull requests: ggml-org/llama.cpp
mtmd : add Apple CoreML backend for vision encoding
examples
python
python script changes
#24163
opened Jun 5, 2026 by
tc-mb
Contributor
opencl: improve get_rows, cpy, concat and q6_k flat gemv
ggml
changes relating to the ggml tensor library for machine learning
OpenCL
Issues specific to the OpenCL backend
ui: fix mobile chat form overflow and bust stale bundle cache
examples
server/ui
#24158
opened Jun 5, 2026 by
ServeurpersoCom
Contributor
FIX #16761 - model-loader : add --reclaim-mmap-source to drop dormant mmap pages
#24156
opened Jun 5, 2026 by
markkobo
server : return HTTP 400 on invalid grammar (#24144)
examples
python
python script changes
server
#24154
opened Jun 5, 2026 by
Anuj-Attri
Sycl --split-mode tensor
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#24152
opened Jun 5, 2026 by
Spruill-1
server: add "schema" and validation
examples
server
#24150
opened Jun 4, 2026 by
ngxson
Contributor
ui: add ignore-scripts=true to npmrc
examples
server/ui
#24149
opened Jun 4, 2026 by
ngxson
Contributor
Fix/server prompt cache no consume on load
examples
python
python script changes
server
#24143
opened Jun 4, 2026 by
alainnothere
[PoC] server: support requantizing kv cache
examples
server
#24134
opened Jun 4, 2026 by
wadealexc
HIP: add gfx1152 and gfx1153 to RDNA3.5
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#24129
opened Jun 4, 2026 by
harkgill-amd
CUDA: refactor MMQ kernel configuration
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#24127
opened Jun 4, 2026 by
JohannesGaessler
Contributor
Add ctx-per-slot argument for unified KV cache
examples
server
#24124
opened Jun 4, 2026 by
bartowski1182
Contributor
vulkan: add v_dot2_f32_f16 support in matrix-matrix multiplication and Flash Attention
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#24123
opened Jun 4, 2026 by
0cc4m
Contributor
server: allow missing/null content key in OpenAI Responses API
examples
server
#24121
opened Jun 4, 2026 by
Mrfence97
server: add -pp parameter to force enable/disable pipeline parallelism
#24095
opened Jun 4, 2026 by
dark-penguin
fix: don't build AMX by default with Apple clang
ggml
changes relating to the ggml tensor library for machine learning
#24094
opened Jun 3, 2026 by
banksio
server : guard chat-template thinking probe against apply-time jinja errors
examples
server
#24093
opened Jun 3, 2026 by
palios-taey