{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Land Records GeoPackage — GeoPandas Walkthrough\n",
    "\n",
    "This notebook accompanies the [Read GeoPackage with GeoPandas](https://landrecords.us/documentation/loading-data/geopackage-geopandas) doc page. It assumes you've downloaded a Land Records nationwide `.gpkg` file (e.g. `parcels-2026.2.gpkg`) and have it sitting next to the notebook.\n",
    "\n",
    "Workflows covered:\n",
    "\n",
    "1. Inspect the file (layers, columns, CRS).\n",
    "2. Read one state with a pushed-down filter.\n",
    "3. Filter by attribute and by bounding box.\n",
    "4. Reproject to an equal-area CRS to compute area in km².\n",
    "5. Plot a county.\n",
    "6. Stream the whole file when you can't hold it in RAM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# pip install \"geopandas>=1.0\" pyogrio shapely matplotlib  # if not already installed\n",
    "import geopandas as gpd\n",
    "import pyogrio\n",
    "import pandas as pd\n",
    "\n",
    "GPKG_PATH = \"parcels-2026.2.gpkg\"\n",
    "LAYER = \"lr_parcel_us\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Inspect"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Layers:\", pyogrio.list_layers(GPKG_PATH))\n",
    "info = pyogrio.read_info(GPKG_PATH, layer=LAYER)\n",
    "print(\"Feature count:\", info[\"features\"])\n",
    "print(\"CRS:          \", info[\"crs\"])\n",
    "print(\"Geometry type:\", info[\"geometry_type\"])\n",
    "print(\"Bounds:       \", info[\"total_bounds\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Read one state\n",
    "\n",
    "Push the `statefp` filter down to GDAL so we never materialize the whole layer. `statefp = '01'` is Alabama."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "al = gpd.read_file(\n",
    "    GPKG_PATH,\n",
    "    layer=LAYER,\n",
    "    where=\"statefp = '01'\",\n",
    ")\n",
    "print(al.shape)\n",
    "al.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Filter by attribute — federally-owned parcels in this state\n",
    "\n",
    "The 2026.2 schema includes `parceltype`, which separates federal/state/local government parcels from privately-owned ones."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "federal = al[al[\"parceltype\"] == \"GOVT FEDERAL\"]\n",
    "print(f\"{len(federal):,} federally-owned parcels in AL\")\n",
    "federal[[\"lrid\", \"ownername\", \"placename\", \"calcarea\"]].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Filter by bounding box — downtown Denver\n",
    "\n",
    "`bbox` is pushed down to GDAL's spatial index. This is the right approach for any radius / box-style spatial selection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "denver_bbox = (-105.11, 39.61, -104.85, 39.91)\n",
    "denver = gpd.read_file(GPKG_PATH, layer=LAYER, bbox=denver_bbox)\n",
    "print(denver.shape)\n",
    "denver.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Reproject for accurate area\n",
    "\n",
    "Never compute area in EPSG:4326. Project to an equal-area CRS first — NAD83 / Conus Albers (EPSG:5070) is the standard for the lower-48."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "denver_alb = denver.to_crs(\"EPSG:5070\")\n",
    "denver_alb[\"sqmi\"] = denver_alb.geometry.area / 2_589_988.11\n",
    "denver_alb[[\"lrid\", \"ownername\", \"parceladdr\", \"sqmi\"]].sort_values(\"sqmi\", ascending=False).head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Plot a county"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "ax = denver.plot(figsize=(10, 10), column=\"parceltype\", legend=True, linewidth=0.1, edgecolor=\"#444\")\n",
    "ax.set_axis_off()\n",
    "ax.set_title(\"Parcels in downtown Denver, colored by ownership category\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Stream the whole file (chunked, no geometry)\n",
    "\n",
    "For nationwide aggregations on a laptop, iterate so you never hold more than one chunk at a time. Dropping geometry (`read_geometry=False`) cuts memory by ~95%."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyogrio import read_dataframe\n",
    "\n",
    "CHUNK = 500_000\n",
    "total_features = info[\"features\"]\n",
    "\n",
    "partial = []\n",
    "for offset in range(0, total_features, CHUNK):\n",
    "    chunk = read_dataframe(\n",
    "        GPKG_PATH,\n",
    "        layer=LAYER,\n",
    "        columns=[\"statefp\", \"countyfp\", \"calcarea\"],\n",
    "        read_geometry=False,\n",
    "        skip_features=offset,\n",
    "        max_features=CHUNK,\n",
    "        use_arrow=True,\n",
    "    )\n",
    "    partial.append(chunk.groupby([\"statefp\", \"countyfp\"], as_index=False)[\"calcarea\"].sum())\n",
    "    print(f\"  processed {offset + len(chunk):,} / {total_features:,}\")\n",
    "\n",
    "totals = pd.concat(partial).groupby([\"statefp\", \"countyfp\"], as_index=False)[\"calcarea\"].sum()\n",
    "totals[\"km2\"] = totals[\"calcarea\"] / 1e6\n",
    "totals.sort_values(\"km2\", ascending=False).head(10)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
