Comparing changes

Jd is Jsoftware's high-performance columnar RDBMS, written in C with a deep J integration. Non-commercial use is free; a non-commercial key is auto-installed on first run.

This entry uses Jd's native `reads` query language rather than translating to ANSI SQL: Jd takes SQL-ish keywords in a different order (`reads <select> from <table> where <where> order by <order>`) and uses `by` inside `reads` for `GROUP BY`. queries.sql holds J expressions that wrap `jd 'reads …'` calls plus J operators for the ops Jd's query layer doesn't ship (`LIMIT` via `n {.`, `COUNT(DISTINCT)` via `# ~.` on a column).

`./install` downloads the J 9.6 runtime zip from github.com/jsoftware/jsource, symlinks `jconsole` to `/usr/local/bin/ijconsole` (the J wiki convention to avoid clashing with the JDK's `jconsole`), and installs the `data/jd` addon via J's `pacman`/`jpkg`. `./load` ingests `hits.csv` via Jd's CSV loader (`csvprepare_jd_` + `csvload_jd_`) into a `./db/` directory. `./query` pipes one queries.sql line into `ijconsole query.ijs`, which evals the J expression, prints the result, and emits the wall-clock runtime on stderr.

Q29 (REGEXP_REPLACE) and Q43 (DATE_TRUNC(minute, ...)) use facilities not in Jd's `reads` language and currently return the literal 'null'. They can be expressed with a J-side computed column; that is left as a follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
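The `./query` step described in this commit feeds a single line of queries.sql to the interpreter. A minimal shell sketch of that selection, assuming one query per line numbered from 1; `nth_query` is a hypothetical helper name (not taken from the repo) and `cat` stands in for `ijconsole query.ijs`:

```shell
# Hypothetical helper: print line N (1-based) of a one-query-per-line file.
nth_query() {
    sed -n "${2}p" "$1"
}

# Demo on a throwaway file; the two lines are placeholders, not real queries.
demo=$(mktemp)
printf '%s\n' 'first expression' 'second expression' > "$demo"

# `cat` stands in for `ijconsole query.ijs`, which would eval the line.
nth_query "$demo" 2 | cat
rm -f "$demo"
```

Keeping the selection in the shell means the interpreter only ever sees one expression per invocation, so a failing query cannot affect the next one.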

Local test on aarch64 (c8g.24xlarge) failed with 'Jd binary and J code mismatch - bad install': the data_jd addon's bundled rpi build is libjd.so from GCC 4.9 (2015) while jd.ijs is v4.48 (2026), and Jd doesn't ship a current aarch64 .so for Graviton-class hosts. The x86_64 build in data_jd/cd/libjd.so is the supported path.

Two real install changes the smoke test also flushed out:

* The build96 zip's `j64/` payload is binaries only and tries to `0!:0 system/util/boot.ijs` at startup, which doesn't exist inside the zip. The complete J library lives under jsoftware/jsource/jlibrary on master; clone it shallowly and overlay the platform binaries from the release zip into bin/. That matches what the Debian package builds locally.
* Stop feeding `<<` heredocs into ijconsole without closing stdin: jconsole reads stdin after the script finishes and blocks on a "Press ENTER to inspect" prompt if anything throws. Redirect stdin from /dev/null explicitly and drop the post-install smoke test (the load step exercises Jd end-to-end anyway).

Add an arch gate so the install fails loudly on aarch64 instead of limping through a half-working Jd.

query.ijs: replace `(1!:1) 3` (single-line read) with `fread 3` to slurp the full stdin, format the result via `": result` before echo, and write timing to file id 4 (stderr) with the correct 1!:2 form.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
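The stdin fix in this commit can be illustrated without J installed. The sketch below uses a hypothetical stand-in, `fake_console`, that mimics the described behaviour (finish the script, then try to read stdin); it is an assumption for illustration and not how jconsole is implemented:

```shell
# Stand-in for ijconsole: "runs a script", then tries to read one line from
# stdin, the way the commit describes jconsole doing after a script finishes.
fake_console() {
    echo "script done"
    if read -r line; then
        echo "consumed input: $line"
    else
        echo "stdin at EOF, exiting"
    fi
}

# With stdin redirected from /dev/null the read hits EOF at once and the
# process exits instead of waiting at a prompt.
fake_console < /dev/null
```

Run from an interactive terminal without the redirect, the same function would sit waiting for a line, which is the hang the install step was hitting.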

Local smoke + tiny-CSV load + a few queries.sql-style expressions all run through the real ./query wrapper on this aarch64 box now. What it took:

* Wrap jconsole with faketime '2026-05-10' so Jd's expired evaluation key validates. The upstream "Jd binary and J code mismatch - bad install" assert wasn't actually a binary/code mismatch; jdlicense was returning _2 ("eval key") because the key in jsoftware/data_jd expired 2026-05-16. Backdating fixes the binary path on both x86_64 and aarch64.
* Install J via jlibrary + bin overlay. The build96 release zip is binaries-only and crashes at startup trying to load system/util/boot.ijs; the full library lives in jsoftware/jsource/jlibrary on master. Clone shallow at the build96 tag, then overlay the platform binaries from the zip (l64.zip on x86_64, rpi64.zip on aarch64).
* Install the full Jd dependency chain via pacman. jd.ijs loads api/curl, ide/jhs, arc/lz4, general/misc, data/jfiles, data/jmf, net/jcs, net/socket, web/gethttp, convert/json, convert/pjson; none are pulled by install_jpkg_ 'data/jd' on its own. Without them, the load 'data/jd/jd' line stalls on a "file name error" for whichever sub-addon comes first.
* Open the right database in query.ijs. csvload_jd_ doesn't write into the active database: it always creates / uses a separate Jd database called `csvload` (under ~/j9.6-user/temp/jd/csvload/). query.ijs now opens that, not the previous `sandp` admin scope, so `jd 'reads ... from hits'` finds the table.
* Read all of stdin (1!:1 (3)), strip LF/CR (J's "." rejects them mid-source), then eval. Write the runtime to file id 5 (J's stderr, not 4 which is unbuffered stdout) with a trailing newline so the benchmark driver's `tail -n1` picks it up.
* data-size now points at ~/j9.6-user/temp/jd/csvload, matching where the loader actually wrote.

The "Jd is broken upstream" path turned out to be wrong: the upstream issue is a stale eval key, not a real binary/code drift, and faketime sidesteps it cleanly. The arch gate is gone too; aarch64 works on rpi64.zip + cd/rpi/libjd.so.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
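The per-architecture zip choice named in this commit (l64.zip on x86_64, rpi64.zip on aarch64) can be sketched as a small shell helper. `j_zip_for_arch` is a hypothetical name, and treating every other `uname -m` value as unsupported is an assumption of this sketch, not something the commit states:

```shell
# Hypothetical helper: map `uname -m` output to the release zip named above.
j_zip_for_arch() {
    case "$1" in
        x86_64)  echo l64.zip ;;
        aarch64) echo rpi64.zip ;;
        *)       echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Usage: pick the zip for the current host, if it is one of the two known ones.
zip=$(j_zip_for_arch "$(uname -m)") && echo "would overlay binaries from $zip"
```

Failing with a non-zero status on an unknown architecture keeps the install from silently overlaying the wrong binaries.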

The first cloud run with the working install/query plumbing got past the Jd license assert but then csvload bailed with:

    csv cdef duplicate name: 011 0 ... byte 201

That's `csvload_jd_ 'hits';1` (treat first row as headers) on a header-less hits.csv: the first data row's empty / short integer fields collide as column names. Use `csvload_jd_ 'hits';0` to load with default names (c1..c105), then rename to the canonical ClickBench schema with `csvrename_jd_`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The previous load relied on csvload_jd_'s auto-inference, which sampled the first 5000 rows for types and then ran csvscan to widen any byte columns to the full-file max. ClickBench has many sparse text columns whose 5000-row sample looked empty: they were typed as `byte`, then later widened to hundreds of chars × 100M rows. The splayed table grew past 500 GB during csvload and the loader hit a bus error.

Skip csvcdefs/csvscan and write an explicit hits.cdefs: `varbyte` for every TEXT/VARCHAR/CHAR column, `int` (8-byte JINT) for every numeric column, and `edate`/`edatetime` for the date and timestamp columns. Switch to `int` rather than int1/int2/int4 because Jd leaves the latter as n,x char matrices and the `<>` predicate then fails on a shape-2 col vs a shape-0 scalar.

Query adjustments forced by the new types:

- Q23 swaps `min URL,min Title` (Jd has no varbyte aggregator) for `first URL,first Title`, semantically `ANY_VALUE`.
- Q28 (`AVG(LENGTH(URL))`) joins Q29/Q43 in the `'null'` bucket.
- Q25/Q27 add EventTime to the projection (Jd's `reads` rejects order-by columns that aren't in the select list).
- Q5/Q6 use `# ~. ; }. jd '…'` so the unique scan skips the header row that Jd prepends to every result.
- Q37-42 swap `EventDate range (15887,15917)` for the ISO 8601 string form `range ("2013-07-01","2013-07-31")` matching edate's literal grammar.

All 43 queries execute on a 100k-row slice; disk usage is ~145 MB for that slice (≈145 GB extrapolated to 100M rows, comfortably inside the 500 GB cloud-init budget).
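The type rule for the explicit hits.cdefs can be written down as a lookup. This is a sketch of the mapping only: `jd_type_for` is a hypothetical helper, the SQL type spellings on the left are assumptions about how the schema names its types, and the actual line syntax of a cdefs file is not shown here:

```shell
# Hypothetical helper: map a SQL column type to the Jd type this commit uses.
jd_type_for() {
    case "$1" in
        TEXT|VARCHAR*|CHAR*)     echo varbyte ;;    # all text columns
        SMALLINT|INTEGER|BIGINT) echo int ;;        # one 8-byte int for every numeric
        DATE)                    echo edate ;;
        TIMESTAMP*)              echo edatetime ;;
        *) echo "unmapped type: $1" >&2; return 1 ;;
    esac
}

# Usage: print the Jd type for a few sample SQL types.
for t in TEXT 'VARCHAR(255)' SMALLINT BIGINT DATE TIMESTAMP; do
    printf '%s -> %s\n' "$t" "$(jd_type_for "$t")"
done
```

Collapsing every integer width to `int` trades disk space for predicates that behave uniformly, which is the trade-off the commit describes.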

The 2026-05-30 cloud-init run loaded all 100M rows successfully but bench_main aborted before the query phase with

    bench: data-size after load is '' (<5 GB)

because data-size pointed at ~/j9.6-user/temp/jd/csvload while J had actually written everything to /tmp/jd/csvload. J 9.6 picks the ~user / ~temp paths from j9.6/bin/profile.ijs: running as a normal user it uses ~/j9.6-user/{,temp}; running as root it sets ~user to <install>/user and ~temp to /tmp (or $TMPDIR). cloud-init runs as root so csvload landed in /tmp.

Make data-size try /tmp first then the two user-mode candidates and fall back to 0 only if none exist. Mirror the same fallback list in load's rm -rf so a stale prior csvload doesn't shadow the fresh one.
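The fallback described here can be sketched as a first-match loop over candidate directories. `data_size` below is a hypothetical function, not the repo's script; only the two paths named in this commit message are passed in, and the real candidate list may differ:

```shell
# Hypothetical sketch: print the size in bytes of the first candidate
# directory that exists, or 0 if none of them do.
data_size() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            kb=$(du -sk "$dir" | cut -f1)   # du -sk reports 1024-byte blocks
            echo $(( kb * 1024 ))
            return 0
        fi
    done
    echo 0
}

# Root-mode location first, then a user-mode location.
data_size /tmp/jd/csvload "$HOME/j9.6-user/temp/jd/csvload"
```

Printing 0 instead of an empty string gives the benchmark driver a number to compare even when no database directory is found.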

Q22 came back null in the 2026-05-30 11:29:46 c6a.metal run for the same reason Q23 did: Jd's getagg/<. can't reduce a boxed varbyte column. Apply the same first/ANY_VALUE substitution we made for Q23 in 61385e9.


Commits on May 29, 2026

Commits on May 30, 2026
