DuckDB ranged requests

When I go to shell.duckdb.org and run the following query, only 6.5 MB of data is downloaded (the whole file is about 17 MB), thanks to HTTP Range Requests:

SELECT
   site_num, date, tmax
 FROM
   read_parquet('https://jimjam-slam.github.io/test-quarto-duckdb-parquet-ranged/data/acorn-sat.parquet')
 WHERE date = '2021-12-20';

But when I use Quarto’s built-in version of DuckDB-WASM, or Observable’s, I get the following console warning and the whole file is downloaded:

acorn = DuckDBClient.of({
  metadata: FileAttachment("https://jimjam-slam.github.io/test-quarto-duckdb-parquet-ranged/data/acorn-metadata.csv"),
  obs: FileAttachment("https://jimjam-slam.github.io/test-quarto-duckdb-parquet-ranged/data/acorn-sat.parquet")
})

acorn

Let’s grab a subset of the ACORN-SAT data:

acornData = acorn.sql`
  SELECT
    site_num, date, tmax
  FROM
    obs
  WHERE
    date = '2021-12-20'
`
Inputs.table(acornData)

Import a newer DuckDB manually?

Maybe we can import a newer version of DuckDB and use that? I’ve forked the original CRU DuckDB client for Observable and bumped the underlying DuckDB version to 1.29.0:

newDuck = import("https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm@1.29.0/+esm")

newDuck
import { DuckClient } from "@jimjamslam/duck"

DuckClient
import { arrow } from "@jimjamslam/duck"

arrow
Note

I’ve renamed DuckDBClient to DuckClient in this fork so that I can test the two simultaneously; Observable’s static notebook imports don’t seem to support aliases.

Let’s try it again with the new client:

acornNew = DuckClient.of({
  obs: FileAttachment("https://jimjam-slam.github.io/test-quarto-duckdb-parquet-ranged/data/acorn-sat.parquet")
})
acornDataNew = acornNew.sql`
  SELECT
    site_num, date, tmax
  FROM
    obs
  WHERE
    date = '2021-12-20'
`
Inputs.table(acornDataNew)