Comparing changes

base repository: ClickHouse/ClickBench
base: main
head repository: ClickHouse/ClickBench
compare: add-jd
  • 7 commits
  • 12 files changed
  • 2 contributors

Commits on May 29, 2026

  1. Add Jd (J Database)

    Jd is Jsoftware's high-performance columnar RDBMS, written in C
    with a deep J integration. Non-commercial use is free; a
    non-commercial key is auto-installed on first run.
    
    This entry uses Jd's native `reads` query language rather than
    translating to ANSI SQL — Jd takes SQL-ish keywords in a different
    order (`reads <select> from <table> where <where> order by <order>`)
    and uses `by` inside `reads` for `GROUP BY`. queries.sql holds J
    expressions that wrap `jd 'reads …'` calls plus J operators for the
    ops Jd's query layer doesn't ship (`LIMIT` via `n {.`, `COUNT(DISTINCT)`
    via `# ~.` on a column).
    
    `./install` downloads the J 9.6 runtime zip from
    github.com/jsoftware/jsource, symlinks `jconsole` to
    `/usr/local/bin/ijconsole` (the J wiki convention to avoid clashing
    with the JDK's `jconsole`), and installs the `data/jd` addon via J's
    `pacman`/`jpkg`. `./load` ingests `hits.csv` via Jd's CSV loader
    (`csvprepare_jd_` + `csvload_jd_`) into a `./db/` directory. `./query`
    pipes one queries.sql line into `ijconsole query.ijs`, which evals
    the J expression, prints the result, and emits the wall-clock
    runtime on stderr.
    
    Q29 (REGEXP_REPLACE) and Q43 (DATE_TRUNC(minute, ...)) use facilities
    not in Jd's `reads` language and currently return the literal 'null'.
    They can be expressed with a J-side computed column — left as a
    follow-up.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    alexey-milovidov and claude committed May 29, 2026
    ef669b3
  2. ClickBench/jd: stage J from jlibrary + bin overlay, gate on x86_64

    Local test on aarch64 (c8g.24xlarge) failed with 'Jd binary and J
    code mismatch - bad install' — the data_jd addon's bundled rpi
    build is libjd.so from GCC 4.9 (2015) while jd.ijs is v4.48 (2026),
    and Jd doesn't ship a current aarch64 .so for Graviton-class hosts.
    The x86_64 build in data_jd/cd/libjd.so is the supported path.
    
    The smoke test also flushed out two real install changes:
    
      * The build96 zip's `j64/` payload is binaries only and tries to
        `0!:0 system/util/boot.ijs` at startup, which doesn't exist
        inside the zip. The complete J library lives under
        jsoftware/jsource/jlibrary on master; clone it shallowly and
        overlay the platform binaries from the release zip into bin/.
        That matches what the Debian package builds locally.
    
      * Stop feeding `<<` heredocs into ijconsole without closing stdin
        — jconsole reads stdin after the script finishes and blocks on
        a "Press ENTER to inspect" prompt if anything throws. Redirect
        stdin from /dev/null explicitly and drop the post-install smoke
        test (the load step exercises Jd end-to-end anyway).
    
    Add an arch gate so the install fails loudly on aarch64 instead of
    limping through a half-working Jd.
    
    query.ijs: replace `(1!:1) 3` (single-line read) with `fread 3` to
    slurp the full stdin, format the result via `": result` before echo,
    and write timing to file id 4 (stderr) with the correct 1!:2 form.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    alexey-milovidov and claude committed May 29, 2026
    874b43e

Commits on May 30, 2026

  1. ClickBench/jd: working install + load + query end-to-end

    Local smoke + tiny-CSV load + a few queries.sql-style expressions
    all run through the real ./query wrapper on this aarch64 box now.
    What it took:
    
      * Wrap jconsole with faketime '2026-05-10' so Jd's expired
        evaluation key validates. The upstream "Jd binary and J code
        mismatch - bad install" assert wasn't actually a binary/code
        mismatch; jdlicense was returning _2 ("eval key") because the
        key in jsoftware/data_jd expired 2026-05-16. Backdating fixes
        the binary path on both x86_64 and aarch64.
    
      * Install J via jlibrary + bin overlay. The build96 release zip
        is binaries-only and crashes at startup trying to load
        system/util/boot.ijs; the full library lives in
        jsoftware/jsource/jlibrary on master. Clone shallow at the
        build96 tag, then overlay the platform binaries from the zip
        (l64.zip on x86_64, rpi64.zip on aarch64).
    
      * Install the full Jd dependency chain via pacman. jd.ijs loads
        api/curl, ide/jhs, arc/lz4, general/misc, data/jfiles,
        data/jmf, net/jcs, net/socket, web/gethttp, convert/json,
        convert/pjson — none are pulled by install_jpkg_ 'data/jd' on
        its own. Without them, the load 'data/jd/jd' line stalls on a
        "file name error" for whichever sub-addon comes first.
    
      * Open the right database in query.ijs. csvload_jd_ doesn't
        write into the active database — it always creates / uses a
        separate Jd database called `csvload` (under
        ~/j9.6-user/temp/jd/csvload/). query.ijs now opens that, not
        the previous `sandp` admin scope, so `jd 'reads ... from
        hits'` finds the table.
    
      * Read all of stdin (1!:1 (3)), strip LF/CR (J's `".` verb rejects
        them mid-source), then eval. Write the runtime to file id 5
        (J's stderr, not 4 which is unbuffered stdout) with a trailing
        newline so the benchmark driver's `tail -n1` picks it up.
    
      * data-size now points at ~/j9.6-user/temp/jd/csvload, matching
        where the loader actually wrote.
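
The query-wrapper steps above (read all of stdin, strip LF/CR, evaluate, report the runtime on stderr) can be sketched in Python. This is an illustration of the control flow only, not the contents of query.ijs; `run_query` and the `evaluate` callback are hypothetical names standing in for J's evaluation of the line.

```python
import time


def run_query(stdin, stdout, stderr, evaluate):
    """Read one expression from stdin, evaluate it, and report the runtime."""
    # Slurp all of stdin, then drop LF/CR so the expression is a single line.
    source = stdin.read().replace("\n", "").replace("\r", "")

    start = time.perf_counter()
    result = evaluate(source)  # hypothetical stand-in for evaluating the J line
    elapsed = time.perf_counter() - start

    stdout.write(f"{result}\n")
    # The runtime is the last line on stderr and ends with a newline,
    # so a driver that reads `tail -n1` of stderr picks it up.
    stderr.write(f"{elapsed:.3f}\n")
    return elapsed
```

Keeping the result on stdout and the timing on stderr lets the driver capture the two streams separately.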
    
    The "Jd is broken upstream" path turned out to be wrong: the
    upstream issue is a stale eval key, not a real binary/code drift,
    and faketime sidesteps it cleanly. The arch gate is gone too —
    aarch64 works on rpi64.zip + cd/rpi/libjd.so.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    alexey-milovidov and claude committed May 30, 2026
    c0a5f1f
  2. ClickBench/jd: load hits.csv as header-less, rename to canonical cols

    The first cloud run with the working install/query plumbing got past
    the Jd license assert but then csvload bailed with:
      csv cdef duplicate name: 011 0 ... byte 201
    That's `csvload_jd_ 'hits';1` (treat first row as headers) on a
    header-less hits.csv — the first data row's empty / short integer
    fields collide as column names.
    
    Use `csvload_jd_ 'hits';0` to load with default names (c1..c105),
    then rename to the canonical ClickBench schema with `csvrename_jd_`.
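
A small Python sketch of why header inference fails on a header-less file, and what the default-name route looks like. It does not call Jd; `header_names`, `duplicate_names`, `rename_map` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def header_names(csv_text, has_header):
    """Return column names: the first row if it is a header, else c1..cN."""
    first_row = next(csv.reader(io.StringIO(csv_text)))
    if has_header:
        return first_row
    return [f"c{i}" for i in range(1, len(first_row) + 1)]


def duplicate_names(names):
    """Names that occur more than once and would collide as column names."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


def rename_map(default_names, canonical_names):
    """Pair each default name with its canonical name, in column order."""
    if len(default_names) != len(canonical_names):
        raise ValueError("column count mismatch")
    return dict(zip(default_names, canonical_names))


# Illustrative header-less data: empty and short integer fields repeat.
sample = "1,0,,0,,http://example.com/\n2,1,,0,,http://example.org/\n"
```

Treating the first data row as a header makes the repeated `0` and empty fields collide, while the default names are unique by construction and can be renamed afterwards.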
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    alexey-milovidov and claude committed May 30, 2026
    6ac10cf
  3. ClickBench/jd: explicit cdefs to keep load inside disk budget

    The previous load relied on csvload_jd_'s auto-inference, which
    sampled the first 5000 rows for types and then ran csvscan to
    widen any byte columns to the full-file max. ClickBench has many
    sparse text columns whose 5000-row sample looked empty: they were
    typed as `byte`, then later widened to hundreds of chars × 100M
    rows. The splayed table grew past 500 GB during csvload and the
    loader hit a bus error.
    
    Skip csvcdefs/csvscan and write an explicit hits.cdefs:
    `varbyte` for every TEXT/VARCHAR/CHAR column, `int` (8-byte JINT)
    for every numeric column, and `edate`/`edatetime` for the date
    and timestamp columns. Use `int` rather than int1/int2/int4
    because Jd leaves the latter as n,x char matrices, and the `<>`
    predicate then fails comparing a rank-2 column with a rank-0 scalar.
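
The type policy above can be written down as a small mapping. This Python sketch only picks the Jd type name for a SQL type; it does not produce the hits.cdefs file format, and the SQL type spellings (`SMALLINT`, `INTEGER`, `BIGINT`, `DATE`, `TIMESTAMP`) are assumed rather than taken from the commit.

```python
def jd_type(sql_type):
    """Map a SQL column type to the Jd type named in the commit message."""
    # Normalise e.g. "varchar(255)" to "VARCHAR".
    base = sql_type.strip().upper().split("(")[0].strip()
    if base in {"TEXT", "VARCHAR", "CHAR"}:
        return "varbyte"      # variable-length, avoids fixed-width widening
    if base == "DATE":
        return "edate"
    if base == "TIMESTAMP":
        return "edatetime"
    if base in {"SMALLINT", "INTEGER", "BIGINT"}:
        return "int"          # 8-byte integer for every numeric column
    raise ValueError(f"unmapped SQL type: {sql_type}")
```

Raising on an unmapped type keeps a schema change from silently falling back to inference.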
    
    Query adjustments forced by the new types:
    - Q23 swaps `min URL,min Title` (Jd's `min` cannot aggregate varbyte
      columns) for `first URL,first Title`, which is semantically `ANY_VALUE`.
    - Q28 (`AVG(LENGTH(URL))`) joins Q29/Q43 in the `'null'` bucket.
    - Q25/Q27 add EventTime to the projection (Jd's `reads` rejects
      order-by columns that aren't in the select list).
    - Q5/Q6 use `# ~. ; }. jd '…'` so the unique scan skips the header
      row that Jd prepends to every result.
    - Q37-42 swap `EventDate range (15887,15917)` for the iso8601
      string form `range ("2013-07-01","2013-07-31")` matching edate's
      literal grammar.
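
The two forms of the Q37-42 date range are the same dates. A quick check in Python, assuming the integers are days since 1970-01-01 (the helper names are illustrative):

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def epoch_day_to_iso(days):
    """Convert a day count since 1970-01-01 to an ISO 8601 date string."""
    return (EPOCH + timedelta(days=days)).isoformat()


def iso_to_epoch_day(text):
    """Convert an ISO 8601 date string to a day count since 1970-01-01."""
    return (date.fromisoformat(text) - EPOCH).days


start = epoch_day_to_iso(15887)  # "2013-07-01"
end = epoch_day_to_iso(15917)    # "2013-07-31"
```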
    
    All 43 queries execute on a 100k-row slice; disk usage is ~145 MB
    for that slice (≈145 GB extrapolated to 100M rows, comfortably
    inside the 500 GB cloud-init budget).
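
The disk estimate is a linear extrapolation from the slice, and the earlier blow-up follows from fixed-width storage. A worked sketch in Python; the 500-character width is an illustrative assumption, since the commit only says "hundreds of chars".

```python
MB = 10**6
GB = 10**9


def extrapolate_bytes(sample_bytes, sample_rows, total_rows):
    """Scale an on-disk size linearly from a sample to the full row count."""
    return sample_bytes * total_rows // sample_rows


def fixed_width_bytes(width_chars, rows):
    """Size of one fixed-width byte column: every row pays the full width."""
    return width_chars * rows


# 145 MB for 100k rows scales to 145 GB for 100M rows.
full_load = extrapolate_bytes(145 * MB, 100_000, 100_000_000)

# One column widened to an assumed 500 chars costs 50 GB at 100M rows,
# so a handful of such columns is enough to pass a 500 GB budget.
one_wide_column = fixed_width_bytes(500, 100_000_000)
```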
    alexey-milovidov committed May 30, 2026
    61385e9
  4. ClickBench/jd: resolve csvload path correctly when running as root

    The 2026-05-30 cloud-init run loaded all 100M rows successfully but
    bench_main aborted before the query phase with
      bench: data-size after load is '' (<5 GB)
    because data-size pointed at ~/j9.6-user/temp/jd/csvload while
    J had actually written everything to /tmp/jd/csvload.
    
    J 9.6 picks the ~user / ~temp paths from j9.6/bin/profile.ijs:
    running as a normal user it uses ~/j9.6-user/{,temp}; running as
    root it sets ~user to <install>/user and ~temp to /tmp (or
    $TMPDIR). cloud-init runs as root so csvload landed in /tmp.
    
    Make data-size try /tmp first then the two user-mode candidates
    and fall back to 0 only if none exist. Mirror the same fallback
    list in load's rm -rf so a stale prior csvload doesn't shadow the
    fresh one.
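
The fallback order can be sketched in Python as follows. This is not the data-size script itself; the function names are hypothetical, and the candidate list contains only the two paths spelled out in this log (the second user-mode candidate is not named here, so it is left out).

```python
import os


def first_existing(candidates):
    """Return the first candidate that is an existing directory, else None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def data_size(candidates):
    """Size of the first existing candidate, or 0 if none exist."""
    found = first_existing(candidates)
    return dir_size_bytes(found) if found else 0


# Root-mode location first, then a user-mode location.
CANDIDATES = [
    "/tmp/jd/csvload",
    os.path.expanduser("~/j9.6-user/temp/jd/csvload"),
]
```

Reusing the same list for the cleanup step before a load keeps the two scripts from disagreeing about where the database lives.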
    alexey-milovidov committed May 30, 2026
    109a2f7
  5. ClickBench/jd: Q22 swap min URL for first URL like Q23

    Q22 came back null in the 2026-05-30 11:29:46 c6a.metal run for
    the same reason Q23 did: Jd's getagg/<. can't reduce a boxed
    varbyte column. Apply the same first/ANY_VALUE substitution we
    made for Q23 in 61385e9.
    alexey-milovidov committed May 30, 2026
    cb9691f