Comparing changes
base repository: ClickHouse/ClickBench
base: main
head repository: ClickHouse/ClickBench
compare: add-jd
- 7 commits
- 12 files changed
- 2 contributors
Commits on May 29, 2026
Jd is Jsoftware's high-performance columnar RDBMS, written in C with a deep J integration. Non-commercial use is free; a non-commercial key is auto-installed on first run.

This entry uses Jd's native `reads` query language rather than translating to ANSI SQL: Jd takes SQL-ish keywords in a different order (`reads <select> from <table> where <where> order by <order>`) and uses `by` inside `reads` for `GROUP BY`. queries.sql holds J expressions that wrap `jd 'reads …'` calls plus J operators for the operations Jd's query layer doesn't ship (`LIMIT` via `n {.`, `COUNT(DISTINCT)` via `# ~.` on a column).

* `./install` downloads the J 9.6 runtime zip from github.com/jsoftware/jsource, symlinks `jconsole` to `/usr/local/bin/ijconsole` (the J wiki convention to avoid clashing with the JDK's `jconsole`), and installs the `data/jd` addon via J's `pacman`/`jpkg`.
* `./load` ingests `hits.csv` via Jd's CSV loader (`csvprepare_jd_` + `csvload_jd_`) into a `./db/` directory.
* `./query` pipes one queries.sql line into `ijconsole query.ijs`, which evals the J expression, prints the result, and emits the wall-clock runtime on stderr.

Q29 (REGEXP_REPLACE) and Q43 (DATE_TRUNC(minute, ...)) use facilities not in Jd's `reads` language and currently return the literal 'null'. They can be expressed with a J-side computed column; that is left as a follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
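The one-line-per-query contract can be sketched as a small driver loop. This is only an illustration: the function name `run_queries` is hypothetical, and it assumes the driver treats the last stderr line of each `./query` call as the runtime.

```shell
# Sketch of a driver loop: feed each line of a queries file to a query
# command and keep only the last stderr line (assumed to be the runtime).
run_queries() {
    local queries_file="$1" query_cmd="$2" line
    # The "|| [ -n ... ]" part also handles a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # 2>&1 sends stderr into the pipe; >/dev/null then discards stdout.
        printf '%s\n' "$line" | "$query_cmd" 2>&1 >/dev/null | tail -n1
    done < "$queries_file"
}

# Hypothetical usage: run_queries queries.sql ./query
```

Keeping the result on stdout and the timing on stderr lets the driver discard one stream without parsing the other.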
Commit ef669b3

ClickBench/jd: stage J from jlibrary + bin overlay, gate on x86_64
Local test on aarch64 (c8g.24xlarge) failed with 'Jd binary and J code mismatch - bad install': the data_jd addon's bundled rpi build is a libjd.so from GCC 4.9 (2015) while jd.ijs is v4.48 (2026), and Jd doesn't ship a current aarch64 .so for Graviton-class hosts. The x86_64 build in data_jd/cd/libjd.so is the supported path.

Two real install changes the smoke test also flushed out:

* The build96 zip's `j64/` payload is binaries only and tries to `0!:0 system/util/boot.ijs` at startup, which doesn't exist inside the zip. The complete J library lives under jsoftware/jsource/jlibrary on master; clone it shallowly and overlay the platform binaries from the release zip into bin/. That matches what the Debian package builds locally.
* Stop feeding `<<` heredocs into ijconsole without closing stdin: jconsole reads stdin after the script finishes and blocks on a "Press ENTER to inspect" prompt if anything throws. Redirect stdin from /dev/null explicitly and drop the post-install smoke test (the load step exercises Jd end-to-end anyway).

Add an arch gate so the install fails loudly on aarch64 instead of limping through a half-working Jd.

query.ijs: replace `(1!:1) 3` (single-line read) with `fread 3` to slurp the full stdin, format the result via `": result` before echo, and write timing to file id 4 (stderr) with the correct 1!:2 form.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
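The arch gate and the stdin redirect described in this commit might look roughly like the following. The function names `require_x86_64` and `run_detached_stdin`, and the script name in the usage comment, are hypothetical; the actual `./install` script is not shown here.

```shell
# Fail loudly on unsupported architectures. The arch can be passed in
# explicitly (handy for testing); by default it comes from uname -m.
require_x86_64() {
    local arch="${1:-$(uname -m)}"
    if [ "$arch" != "x86_64" ]; then
        echo "error: unsupported architecture '$arch'; expected x86_64" >&2
        return 1
    fi
}

# Run a command with stdin attached to /dev/null so an interactive
# prompt sees EOF immediately instead of blocking.
run_detached_stdin() {
    "$@" </dev/null
}

# Hypothetical usage (install.ijs is an assumed script name):
#   require_x86_64 && run_detached_stdin ijconsole install.ijs
```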
Commit 874b43e

Commits on May 30, 2026
ClickBench/jd: working install + load + query end-to-end
Local smoke + tiny-CSV load + a few queries.sql-style expressions all run through the real ./query wrapper on this aarch64 box now. What it took:

* Wrap jconsole with faketime '2026-05-10' so Jd's expired evaluation key validates. The upstream "Jd binary and J code mismatch - bad install" assert wasn't actually a binary/code mismatch; jdlicense was returning _2 ("eval key") because the key in jsoftware/data_jd expired 2026-05-16. Backdating fixes the binary path on both x86_64 and aarch64.
* Install J via jlibrary + bin overlay. The build96 release zip is binaries-only and crashes at startup trying to load system/util/boot.ijs; the full library lives in jsoftware/jsource/jlibrary on master. Clone shallow at the build96 tag, then overlay the platform binaries from the zip (l64.zip on x86_64, rpi64.zip on aarch64).
* Install the full Jd dependency chain via pacman. jd.ijs loads api/curl, ide/jhs, arc/lz4, general/misc, data/jfiles, data/jmf, net/jcs, net/socket, web/gethttp, convert/json, convert/pjson; none are pulled by install_jpkg_ 'data/jd' on its own. Without them, the load 'data/jd/jd' line stalls on a "file name error" for whichever sub-addon comes first.
* Open the right database in query.ijs. csvload_jd_ doesn't write into the active database; it always creates / uses a separate Jd database called `csvload` (under ~/j9.6-user/temp/jd/csvload/). query.ijs now opens that, not the previous `sandp` admin scope, so `jd 'reads ... from hits'` finds the table.
* Read all of stdin (1!:1 (3)), strip LF/CR (J's "." rejects them mid-source), then eval. Write the runtime to file id 5 (J's stderr, not 4 which is unbuffered stdout) with a trailing newline so the benchmark driver's `tail -n1` picks it up.
* data-size now points at ~/j9.6-user/temp/jd/csvload, matching where the loader actually wrote.

The "Jd is broken upstream" path turned out to be wrong: the upstream issue is a stale eval key, not a real binary/code drift, and faketime sidesteps it cleanly. The arch gate is gone too: aarch64 works on rpi64.zip + cd/rpi/libjd.so.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
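The faketime wrapping from the first bullet could be sketched as below. This assumes the `faketime` command from libfaketime, invoked as `faketime <timestamp> <program> [args]`; the function name `run_backdated` and the `FAKETIME_BIN` override are hypothetical and exist only to make the sketch easy to exercise.

```shell
# Name of the faketime binary; overridable for testing (assumption).
FAKETIME_BIN="${FAKETIME_BIN:-faketime}"

# Run a command with the clock pinned to a fake date.
# Usage: run_backdated <date> <command> [args...]
run_backdated() {
    local fake_date="$1"
    shift
    # Fail loudly rather than run un-backdated, since the whole point
    # is that the wrapped program must see the earlier date.
    if ! command -v "$FAKETIME_BIN" >/dev/null 2>&1; then
        echo "error: $FAKETIME_BIN not found" >&2
        return 127
    fi
    "$FAKETIME_BIN" "$fake_date" "$@"
}

# Hypothetical usage: run_backdated '2026-05-10' jconsole query.ijs
```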
Commit c0a5f1f

ClickBench/jd: load hits.csv as header-less, rename to canonical cols
The first cloud run with the working install/query plumbing got past the Jd license assert, but then csvload bailed with:

    csv cdef duplicate name: 011 0 ... byte 201

That's `csvload_jd_ 'hits';1` (treat first row as headers) on a header-less hits.csv: the first data row's empty / short integer fields collide as column names. Use `csvload_jd_ 'hits';0` to load with default names (c1..c105), then rename to the canonical ClickBench schema with `csvrename_jd_`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Commit 6ac10cf

ClickBench/jd: explicit cdefs to keep load inside disk budget
The previous load relied on csvload_jd_'s auto-inference, which sampled the first 5000 rows for types and then ran csvscan to widen any byte columns to the full-file max. ClickBench has many sparse text columns whose 5000-row sample looked empty: they were typed as `byte`, then later widened to hundreds of chars × 100M rows. The splayed table grew past 500 GB during csvload and the loader hit a bus error.

Skip csvcdefs/csvscan and write an explicit hits.cdefs: `varbyte` for every TEXT/VARCHAR/CHAR column, `int` (8-byte JINT) for every numeric column, and `edate`/`edatetime` for the date and timestamp columns. Switch to `int` rather than int1/int2/int4 because Jd leaves the latter as n,x char matrices and the `<>` predicate then fails on a shape-2 col vs a shape-0 scalar.

Query adjustments forced by the new types:

- Q23 swaps `min URL,min Title` (Jd has no varbyte aggregator) for `first URL,first Title`, semantically `ANY_VALUE`.
- Q28 (`AVG(LENGTH(URL))`) joins Q29/Q43 in the `'null'` bucket.
- Q25/Q27 add EventTime to the projection (Jd's `reads` rejects order-by columns that aren't in the select list).
- Q5/Q6 use `# ~. ; }. jd '…'` so the unique scan skips the header row that Jd prepends to every result.
- Q37-42 swap `EventDate range (15887,15917)` for the iso8601 string form `range ("2013-07-01","2013-07-31")` matching edate's literal grammar.

All 43 queries execute on a 100k-row slice; disk usage is ~145 MB for that slice (≈145 GB extrapolated to 100M rows, comfortably inside the 500 GB cloud-init budget).
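The disk-usage extrapolation in the last paragraph is plain linear scaling, which can be checked with a few lines of shell arithmetic. The helper name `extrapolate_mb` is hypothetical, and the sketch assumes size grows linearly with row count and uses decimal units (500 GB = 500000 MB).

```shell
# Linearly scale a measured slice size (in MB) up to the full row count.
# Usage: extrapolate_mb <slice_mb> <slice_rows> <full_rows>
extrapolate_mb() {
    local slice_mb="$1" slice_rows="$2" full_rows="$3"
    echo $(( slice_mb * full_rows / slice_rows ))
}

full_mb="$(extrapolate_mb 145 100000 100000000)"
echo "estimated full size: ${full_mb} MB"   # 145000 MB, about 145 GB

# Compare against the 500 GB budget, expressed in MB.
if [ "$full_mb" -lt 500000 ]; then
    echo "within budget"
fi
```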
Commit 61385e9

ClickBench/jd: resolve csvload path correctly when running as root
The 2026-05-30 cloud-init run loaded all 100M rows successfully but bench_main aborted before the query phase with

    bench: data-size after load is '' (<5 GB)

because data-size pointed at ~/j9.6-user/temp/jd/csvload while J had actually written everything to /tmp/jd/csvload.

J 9.6 picks the ~user / ~temp paths from j9.6/bin/profile.ijs: running as a normal user it uses ~/j9.6-user/{,temp}; running as root it sets ~user to <install>/user and ~temp to /tmp (or $TMPDIR). cloud-init runs as root so csvload landed in /tmp.

Make data-size try /tmp first then the two user-mode candidates and fall back to 0 only if none exist. Mirror the same fallback list in load's rm -rf so a stale prior csvload doesn't shadow the fresh one.
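The fallback logic for data-size can be sketched as "first existing candidate wins, else 0". The function name `data_size` is hypothetical, `du -sb` assumes GNU coreutils, and the usage comment lists only the two paths named in this commit message; the remaining user-mode candidate is not spelled out here.

```shell
# Print the size in bytes of the first candidate directory that exists,
# or 0 if none of them do.
data_size() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # du -sb prints "<bytes><TAB><path>"; keep only the byte count.
            du -sb "$dir" | cut -f1
            return 0
        fi
    done
    echo 0
}

# Hypothetical usage, root-mode location first:
#   data_size "${TMPDIR:-/tmp}/jd/csvload" "$HOME/j9.6-user/temp/jd/csvload"
```

Passing the candidates as arguments means the same ordered list can be reused for the cleanup step in load.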
Commit 109a2f7

ClickBench/jd: Q22 swap min URL for first URL like Q23
Q22 came back null in the 2026-05-30 11:29:46 c6a.metal run for the same reason Q23 did: Jd's getagg/<. can't reduce a boxed varbyte column. Apply the same first/ANY_VALUE substitution we made for Q23 in 61385e9.
Commit cb9691f