Loading the Data

Read GeoParquet with DuckDB

Query Land Records .parquet exports with DuckDB's spatial extension — including row-group pruning, columnar projection, and direct reads from object storage.

GeoParquet is the right choice for analytical workloads: columnar storage, row-group statistics for predicate pushdown, and dramatically smaller files than GeoPackage at equivalent fidelity. DuckDB reads it natively when the spatial extension is loaded.

Install / load extensions

INSTALL spatial;
LOAD spatial;

parquet is built in; spatial adds GeoParquet metadata handling and the ST_* functions.

1. Inspect

DuckDB recognizes GeoParquet via the geo metadata block embedded in the file:

SELECT * FROM parquet_metadata('parcels-2026.2.parquet') LIMIT 5;
SELECT * FROM parquet_schema('parcels-2026.2.parquet');

You'll see one row per row-group. Each row-group carries min/max statistics for every column — DuckDB uses these to skip entire row-groups when your WHERE clause is selective enough.

2. Query directly

SELECT lrid, ownername, parceladdr, ST_AsText(geom) AS geom_wkt
FROM read_parquet('parcels-2026.2.parquet')
WHERE statefp = '01' AND countyfp = '003'
LIMIT 10;

The Land Records publisher writes files sorted by (statefp, countyfp) and tunes the row-group size so a single-county filter typically reads only a few row-groups. State-level filters read a few hundred.

3. Aggregations (no geometry needed)

When you don't reference geom, DuckDB only reads the columns you ask for — orders of magnitude faster than loading the same data from GeoPackage:

SELECT
    statefp,
    parcelstate,
    COUNT(*) AS parcels,
    SUM(calcarea) / 1e6 AS total_km2
FROM read_parquet('parcels-2026.2.parquet')
GROUP BY 1, 2
ORDER BY parcels DESC;

This typically completes in seconds on the nationwide file because no row-group containing only irrelevant columns is fetched at all.

4. Spatial filters

geom is stored as WKB. ST_GeomFromWKB reconstructs it; with LOAD spatial DuckDB also recognizes the GeoParquet geo metadata, so ST_* functions accept the column directly:

SELECT lrid, ownername, calcarea
FROM read_parquet('parcels-2026.2.parquet')
WHERE ST_DWithin_Spheroid(
    geom,
    ST_Point(-86.81, 33.52),
    5 * 1609.344
);

For a single-point lookup against the nationwide file, combine a bounding-box prefilter (which prunes row-groups via the bbox column statistics) with the exact geographic predicate:

WITH near AS (
    SELECT lrid, ownername, calcarea, geom
    FROM read_parquet('parcels-2026.2.parquet')
    WHERE centroidx BETWEEN -86.9 AND -86.7
      AND centroidy BETWEEN  33.4 AND  33.6
)
SELECT lrid, ownername, calcarea
FROM near
WHERE ST_DWithin_Spheroid(geom, ST_Point(-86.81, 33.52), 5 * 1609.344);

centroidx and centroidy are stored as scalars in the file, so DuckDB can prune row-groups using their per-row-group min/max stats before it ever decodes any geometry.

5. Reading from object storage

DuckDB can query Parquet directly from GCS or S3 — no download step:

INSTALL httpfs;
LOAD httpfs;

SELECT count(*)
FROM read_parquet('s3://my-bucket/landrecords/parcels-2026.2.parquet')
WHERE statefp = '06';                   -- California

For GCS, point httpfs at gs://... with a configured SECRET:

CREATE SECRET (
    TYPE GCS,
    KEY_ID '<access-key>',
    SECRET '<secret-key>'
);
SELECT count(*) FROM read_parquet('gs://my-bucket/landrecords/parcels-2026.2.parquet');

DuckDB issues Range HTTP requests for only the row-groups that pass the WHERE clause — querying a single state from the nationwide file transfers a few hundred MB rather than the full file.

6. Materialize for repeated heavy use

ATTACH 'landrecords.duckdb' AS lr;
CREATE TABLE lr.parcels AS
SELECT * FROM read_parquet('parcels-2026.2.parquet');

CREATE INDEX parcels_geom_rtree ON lr.parcels USING RTREE (geom);

For purely analytical workloads (no point-in-polygon lookups), the raw Parquet file is often faster than a materialized table because it stays compressed on disk and DuckDB streams through the column projections you ask for.

When to pick GeoParquet over GeoPackage

WorkloadPick GeoParquetPick GeoPackage
Aggregations across many parcels
Reading from S3 / GCS without download
Loading into BigQuery / Snowflake / Athena
Point-in-polygon, repeated geo queries
Editing parcels in QGIS
Joining to multiple in-file attribute tables

On this page