acorn = DuckDBClient.of({
  metadata: FileAttachment("https://jimjam-slam.github.io/test-quarto-duckdb-parquet-ranged/data/acorn-metadata.csv"),
  obs: FileAttachment("https://jimjam-slam.github.io/test-quarto-duckdb-parquet-ranged/data/acorn-sat.parquet")
})
acorn
DuckDB ranged requests
When I go to shell.duckdb.org and run the following query, only 6.5 MB of data is downloaded (the whole file is about 17 MB), thanks to HTTP Range Requests:
SELECT
  site_num, date, tmax
FROM
  read_parquet('https://jimjam-slam.github.io/test-quarto-duckdb-parquet-ranged/data/acorn-sat.parquet')
WHERE date = '2021-12-20';
But when I use Quarto’s built-in version of DuckDB-WASM, or Observable’s, I get the following console warning and the whole file is downloaded:
falling back to full HTTP read for: https://jimjam-slam.github.io/test-quarto-duckdb-parquet-ranged/data/acorn-sat.parquet
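To rule out the server as the culprit, it can help to test range support directly, independently of DuckDB. The following is a minimal sketch, not part of the original notebook: `probeRangeSupport` and its `fetchImpl` parameter are hypothetical names, and it assumes an environment with `fetch` available. It asks for only the first byte of the file and reports how the server responded. A `206 Partial Content` status means the range was honoured; a `200` means the server ignored the header and would send the whole file.

```javascript
// Hypothetical helper (not from the original notebook): ask for a single
// byte and report whether the server answered with a partial response.
// `fetchImpl` is injectable so the check can be exercised without a network.
async function probeRangeSupport(url, fetchImpl = fetch) {
  const response = await fetchImpl(url, {
    headers: { Range: "bytes=0-0" } // request only the first byte
  });
  return {
    status: response.status,
    // 206 Partial Content: the range was honoured.
    partial: response.status === 206,
    // These headers may be null when the server does not expose them
    // to cross-origin scripts.
    contentRange: response.headers.get("Content-Range"),
    acceptRanges: response.headers.get("Accept-Ranges")
  };
}
```

In a notebook cell, this could be called as `probeRangeSupport("https://jimjam-slam.github.io/test-quarto-duckdb-parquet-ranged/data/acorn-sat.parquet")`. If `partial` comes back `true`, the fallback is more likely down to the client than the host.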
Let’s grab a subset of the ACORN-SAT data:
acornData = acorn.sql`
SELECT
  site_num, date, tmax
FROM
  obs
WHERE
  date = '2021-12-20'
`
Inputs.table(acornData)
Import a newer DuckDB manually?
Maybe we can import a newer version of DuckDB and use that? I’ve forked the original CRU DuckDB client for Observable and bumped the underlying DuckDB-WASM version to 1.29.0:
newDuck = import("https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm@1.29.0/+esm")
newDuck
import { DuckClient } from "@jimjamslam/duck"
DuckClient
import { arrow } from "@jimjamslam/duck"
arrow
I’ve renamed DuckDBClient to DuckClient in this fork so that I can test the two simultaneously; Observable’s static notebook imports don’t seem to support aliases.
Let’s try it again with the new client:
acornNew = DuckClient.of({
  obs: FileAttachment("https://jimjam-slam.github.io/test-quarto-duckdb-parquet-ranged/data/acorn-sat.parquet")
})
acornDataNew = acornNew.sql`
SELECT
  site_num, date, tmax
FROM
  obs
WHERE
  date = '2021-12-20'
`
Inputs.table(acornDataNew)