
Most of the parcels in our nationwide dataset begin life as a GIS export: a shapefile or a feature service a county hands us, already drawn as machine-readable geometry. Cornish, Maine, was not one of those jurisdictions. The town publishes its tax maps as eighteen scanned PDF sheets: 30″×24″ AutoCAD plots of property maps originally drafted by hand in 1976 by John E. O’Donnell & Associates. The parcel lines on those sheets are pixels in a raster image, not coordinates. This is the story of how we turned eighteen pictures into 2,547 georeferenced parcels.
What we started with
If you want a sense of the long tail of American land records, Cornish is a good place to look. The town’s tax maps are scans, and the scans carry almost none of the metadata a GIS pipeline normally leans on:
- It’s raster, not vector. The parcel boundaries are drawn lines baked into the image. There is no underlying geometry to read — it has to be traced.
- There’s no coordinate system. No embedded CRS, no grid ticks, no latitude/longitude anywhere on the sheet. The only georeferenced anchor we could trust was the town’s own outline: the U.S. Census county-subdivision boundary (cousubfp 14485, from TIGER tl_2024_23_cousub).
- The sheets don’t overlap. Each detail sheet covers a different slice of town, and no two share a parcel, so there are no common points to stitch them together by.
- The scale is in the drawing, not the file. Each sheet’s plot scale (1″=500′ for rural sheets, 1″=100′ for the village) had to be read off the sheet itself.
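To make the scale point concrete, here is a minimal sketch of the arithmetic that ties a pixel to real-world feet. It assumes each sheet is rendered at its nominal plot size and at the 400 dpi used in the render step of the pipeline; the function names are hypothetical and only illustrate the conversion, not our actual calibration code.

```python
RENDER_DPI = 400  # render resolution quoted in the pipeline description


def feet_per_pixel(feet_per_plot_inch: float, dpi: int = RENDER_DPI) -> float:
    """Ground feet covered by one pixel, given the sheet's plot scale.

    feet_per_plot_inch is the number read off the sheet: 500 for a
    1"=500' rural sheet, 100 for a 1"=100' village sheet.
    """
    return feet_per_plot_inch / dpi


def sheet_extent_feet(width_in: float, height_in: float,
                      feet_per_plot_inch: float) -> tuple[float, float]:
    """Ground extent of a full sheet, assuming it plots at nominal size."""
    return width_in * feet_per_plot_inch, height_in * feet_per_plot_inch


rural_ft_per_px = feet_per_pixel(500)    # 500 / 400 = 1.25 ft per pixel
village_ft_per_px = feet_per_pixel(100)  # 100 / 400 = 0.25 ft per pixel

# A 30" x 24" rural sheet spans 15,000 ft by 12,000 ft on the ground.
rural_extent = sheet_extent_feet(30, 24, 500)
```

The fivefold difference between the two scales is why the scale has to be read per sheet: a line of the same pixel length means 1.25 ft per pixel on a rural sheet but only 0.25 ft per pixel in the village.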
The pipeline
We built a ten-stage pipeline, each stage runnable on its own, that takes the eighteen PDFs in one end and produces a clean, georeferenced parcel fabric. The interesting parts:
- Render & calibrate. Each sheet is rasterized to grayscale at 400 dpi. The pipeline detects the map’s neatline (the printed frame) and reads the true plot scale, so every later measurement is anchored to real-world feet.
- Vectorize with a “road-sealing watershed.” This is the heart of it. Naively flooding the image to find enclosed regions fails, because parcels leak into the road network and merge into one giant blob. So we first seal the road corridors, flood the exterior in from the frame border, recover the interior lots the flood swallowed, and then run a watershed back onto the inked boundary lines. The result is a set of regions that are disjoint by construction — no parcel can overlap another.
- Read the labels. Lot IDs and printed acreages come from the PDF’s vector text layer where it exists, and from Tesseract OCR where it doesn’t, then get attached to the right polygon by a point-in-polygon test.
- Assemble without ground control. The whole-town “Overall” sheet is fit to the Census boundary first. Each detail sheet is then dropped into place at its true scale by chamfer-matching its road network against the assembled whole, seeded by where it sits on the map index.
- Georeference, clip, and validate. A first-wins planar partition turns everything into one disjoint fabric; coordinates are projected from UTM 19N to WGS84; every parcel is intersected with the town boundary and bordering parcels are snapped onto the town line. Finally, every area, containment, and overlap check is run geodesically on the GRS80 ellipsoid — never on flat, projected coordinates — so the acreages are honest.
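The flooding idea behind the vectorize step can be shown on a toy grid. The sketch below is a simplified stand-in, not the production code: it only floods the exterior in from the frame border and then labels whatever enclosed regions are left, and it omits road sealing and the watershed entirely. The grid encoding ('#' for ink, '.' for blank paper) and the name `label_enclosed_regions` are assumptions made for illustration.

```python
from collections import deque


def label_enclosed_regions(grid):
    """Label regions of blank paper that are fully enclosed by ink.

    grid: list of equal-length strings, '#' = inked line, '.' = blank.
    Returns (labels, count), where labels[r][c] is -1 for ink,
    0 for the exterior, and 1..count for each enclosed region.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[-1 if ch == "#" else None for ch in row] for row in grid]

    def flood(start, value):
        # Breadth-first fill over 4-connected blank cells.
        labels[start[0]][start[1]] = value
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] is None:
                    labels[nr][nc] = value
                    queue.append((nr, nc))

    # 1. Flood the exterior in from every blank cell on the frame border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and labels[r][c] is None:
                flood((r, c), 0)

    # 2. Every blank cell still unlabeled is enclosed: one label per region.
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is None:
                count += 1
                flood((r, c), count)
    return labels, count


CLOSED = [
    ".........",
    ".#######.",
    ".#..#..#.",
    ".#..#..#.",
    ".#######.",
    ".........",
]
# Same sheet with a one-cell gap in the right-hand boundary line.
LEAKY = [
    ".........",
    ".#######.",
    ".#..#....",
    ".#..#..#.",
    ".#######.",
    ".........",
]

closed_labels, closed_count = label_enclosed_regions(CLOSED)
leaky_labels, leaky_count = label_enclosed_regions(LEAKY)
print(closed_count, leaky_count)
```

On the closed grid both lots survive as separate regions. On the leaky grid the exterior flood pours through the gap and swallows the right-hand lot, which is a small-scale version of the leak that makes naive flooding fail on real sheets.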

Detail along the western (Parsonsfield) line: bordering parcels are clipped to and snapped onto the Census boundary in red, so their outer edges sit exactly on the town line.
How well did it work?
Cornish covers 14,318 geodesic acres. Against that, the digitized fabric came out clean:
- 2,547 parcels, all bounded by the town and disjoint from one another.
- 0 overlapping pairs — a clean planar coverage, with no double-counted land.
- 100% of parcel area falls inside the town boundary, and 100% of the town perimeter is traced to within 5 meters of the Census line.
- Area accuracy: across the 241 parcels where we could OCR the printed acreage, the median digitized area came within about 3% of the figure on the map.
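One plausible way to compute an area-accuracy figure like the one above is the median absolute percent difference between digitized and printed acreage. The sketch below assumes that definition, which the list does not spell out, and the sample pairs are made-up toy values for illustration, not Cornish data.

```python
from statistics import median


def pct_difference(digitized_acres: float, printed_acres: float) -> float:
    """Absolute difference as a percentage of the printed acreage."""
    return abs(digitized_acres - printed_acres) / printed_acres * 100.0


def median_pct_difference(pairs) -> float:
    """Median of the per-parcel percent differences.

    pairs: iterable of (digitized_acres, printed_acres) tuples.
    """
    return median(pct_difference(d, p) for d, p in pairs)


# Toy values only: (digitized acres, acres printed on the map).
toy_pairs = [
    (10.5, 10.0),  # 5.0 % off
    (4.0, 4.0),    # exact match
    (2.25, 2.0),   # 12.5 % off
]
toy_median = median_pct_difference(toy_pairs)
```

A median is a reasonable choice here because a single misread OCR acreage can produce a wildly wrong "printed" value, and a median is far less sensitive to such outliers than a mean.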
Nationwide parcel coverage isn’t hard because of the big counties — they hand you clean GIS data. It’s hard because of the long tail of jurisdictions like Cornish, where the authoritative record is still a scanned drawing. Building a repeatable pipeline that can take eighteen PDFs and emit a validated, georeferenced parcel layer is exactly the kind of work that lets us keep filling in that tail without compromising on data integrity.
Curious about coverage in your area, or want to put our data to work? Browse the US Nationwide Parcel Dataset, or reach us anytime at hello@landrecords.us — we always want to hear what you’re building.