What data do we have?

CAP includes all official, book-published state and federal United States case law — every volume or case designated as an official report of decisions by a court within the United States.

Our scope includes all state courts, federal courts, and territorial courts for American Samoa, Dakota Territory, Guam, Native American Courts, Navajo Nation, and the Northern Mariana Islands.

Our earliest case is from 1658, and we currently include all volumes published through 2020, with new data releases on a rolling basis at the beginning of each year.

Each volume has been converted into structured, case-level data broken out by majority and dissenting opinion, with human-checked metadata for party names, docket number, citation, and date.
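As a rough illustration of what "case-level data broken out by opinion" can look like in code, here is a minimal sketch. The class and field names (`Case`, `Opinion`, `kind`, and so on) are assumptions made for this example and are not the actual CAP schema; the placeholder case is invented purely to show the shape.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Opinion:
    """One opinion within a case (hypothetical field names)."""
    kind: str    # e.g. "majority" or "dissent"
    author: str
    text: str    # opinion text; in CAP this would be raw OCR output


@dataclass
class Case:
    """Case-level record with the metadata fields named above."""
    party_names: str
    docket_number: str
    citation: str
    decision_date: date
    opinions: List[Opinion] = field(default_factory=list)

    def opinions_of_kind(self, kind: str) -> List[Opinion]:
        """Return only the opinions of the requested kind."""
        return [op for op in self.opinions if op.kind == kind]


# Placeholder record, not a real case.
example = Case(
    party_names="Doe v. Roe",
    docket_number="No. 1",
    citation="1 Example 1",
    decision_date=date(1900, 1, 1),
    opinions=[
        Opinion(kind="majority", author="Judge A", text="..."),
        Opinion(kind="dissent", author="Judge B", text="..."),
    ],
)
dissents = example.opinions_of_kind("dissent")
```

Keeping opinions as a list on the case, rather than as separate top-level records, makes it easy to filter by opinion type while the metadata stays attached to the case as a whole.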

We also offer PDFs with selectable OCR text for each case published up to 2018.

Data sources

Harvard Law School Collection

We created CAP's initial collection by digitizing roughly 40 million pages of court decisions contained in roughly 40,000 bound volumes owned by the Harvard Law School Library.

The Harvard Law School Collection includes volumes published through 2018.

The Harvard Law School Collection was digitized on site at Langdell Hall. Members of our team created metadata for each volume, including a unique barcode, reporter name, title, jurisdiction, publication date and other volume-level information. We then used a high-speed scanner to produce JP2 and TIF images of every page. A vendor then used OCR to extract the text of every case, creating case-level XML files. Key metadata fields, like case name, citation, court and decision date, were corrected for accuracy, while the text of each case was left as raw OCR output. In addition, for cases from volumes not yet in the public domain, our vendor redacted any headnotes.

Harvard Law School Collection scope limits:

The Harvard Law School Collection does not include:

  • Cases not designated as officially published, such as most lower court decisions.
  • Non-published trial documents such as party filings, orders, and exhibits.
  • Parallel versions of cases from regional reporters, unless those cases were designated by a court as official.
  • Cases officially published in digital form, such as recent cases from Illinois, Arkansas, New Mexico, and North Carolina.
  • Copyrighted material such as headnotes, for cases still under copyright.

Fastcase Collection

Our collection is augmented with yearly caselaw donations courtesy of Fastcase.

Fastcase provides yearly updates of all caselaw volumes published more than one year ago that are not yet in the case.law corpus. We currently provide Fastcase cases for volumes published through 2020.

Fastcase volumes are delivered to us in an internal XML/HTML format, and we process each case to match the Harvard Law School Collection data formats, so researchers can write consistent code across both collections.

Fastcase Collection scope limits:

The Fastcase Collection includes only:

  • Cases published in the reporters A.3d, B.R., Cal. Rptr. 3d, F.3d, F. Supp. 3d, N.E.3d, N.W.2d, P.3d, S.Ct., S.E.2d, So.3d, S.W.3d, and U.S.
  • Cases published in volumes during or before 2020.
  • Cases not otherwise published in the Harvard Law School Collection.
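The three inclusion rules above can be sketched as a single check. This is only an illustration: the function name, its parameters, and the idea that you already know whether a case appears in the Harvard Law School Collection are assumptions for the example, not part of our processing pipeline.

```python
# Reporter abbreviations copied from the list above.
FASTCASE_REPORTERS = frozenset({
    "A.3d", "B.R.", "Cal. Rptr. 3d", "F.3d", "F. Supp. 3d", "N.E.3d",
    "N.W.2d", "P.3d", "S.Ct.", "S.E.2d", "So.3d", "S.W.3d", "U.S.",
})
FASTCASE_LAST_VOLUME_YEAR = 2020


def in_fastcase_scope(reporter: str, volume_year: int,
                      in_harvard_collection: bool) -> bool:
    """Return True if a case meets all three Fastcase inclusion rules.

    Hypothetical helper: `in_harvard_collection` is assumed to be known
    by the caller.
    """
    return (
        reporter in FASTCASE_REPORTERS
        and volume_year <= FASTCASE_LAST_VOLUME_YEAR
        and not in_harvard_collection
    )
```

For example, a 2019 F.3d volume that is absent from the Harvard Law School Collection passes the check, while the same reporter in a 2021 volume does not.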

The Fastcase Collection does not include:

  • PDF page images.
  • Copyrighted material such as headnotes.

By the numbers

Here are some TSV-formatted spreadsheets with specific counts from our collection, and links to view those cases in the API:

Data quality

Harvard Law School Collection data is generated by OCR from page scans, using ABBYY FineReader. Case metadata, such as the party names, docket number, citation, and date, has received human review. Case text and general head matter have been generated by machine OCR and have not received human review.

You can report errors of all kinds at our contact form, or view existing issues at our GitHub issue tracker. We particularly welcome volume-level metadata corrections, feature requests, and suggestions for large-scale algorithmic changes. We are not currently able to process individual OCR corrections, but we welcome general suggestions on the OCR correction process.

Data citation

Data made available through the Caselaw Access Project API and bulk download service is citable. View our suggested citation in these standard formats:

APA
Caselaw Access Project. (2018). Retrieved [date], from [url].

MLA
The President and Fellows of Harvard University. "Caselaw Access Project." 2018, [url].

Chicago / Turabian
Caselaw Access Project. "Caselaw Access Project." Last modified [date], [url].
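If you generate citations programmatically, the three formats above can be filled in with a small helper like the sketch below. The function name and parameters are hypothetical; the templates are copied from the suggested citations, with `[date]` and `[url]` as the only substituted parts. The date is passed as preformatted text because each style guide has its own date conventions.

```python
# Templates copied from the suggested citations above.
CITATION_TEMPLATES = {
    "apa": "Caselaw Access Project. (2018). Retrieved {date}, from {url}.",
    "mla": ('The President and Fellows of Harvard University. '
            '"Caselaw Access Project." 2018, {url}.'),
    "chicago": ('Caselaw Access Project. "Caselaw Access Project." '
                'Last modified {date}, {url}.'),
}


def suggested_citation(style: str, url: str, date_text: str = "") -> str:
    """Fill in one of the suggested citation formats.

    `style` is "apa", "mla", or "chicago" (case-insensitive). The MLA
    template has no date slot, so `date_text` is ignored for that style.
    """
    key = style.strip().lower()
    if key not in CITATION_TEMPLATES:
        raise ValueError(f"unknown citation style: {style!r}")
    return CITATION_TEMPLATES[key].format(date=date_text, url=url)


# Illustrative values only.
print(suggested_citation("apa", "https://case.law/", "January 5, 2021"))
```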

Have you used Caselaw Access Project data in your research? Tell us about it.

Usage & access

The CAP data is free for the public to use and access.

Case metadata, such as the case name, citation, court, and date, is freely and openly accessible without limitation. Full case text can be freely viewed or downloaded, but you must register for an account to do so, and currently you may view or download no more than 500 cases per day. In addition, research scholars can qualify for bulk data access by agreeing to certain use and redistribution restrictions. You can request a bulk access agreement by creating an account and then visiting your account page.
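If you script your full-text downloads, it can help to track the 500-case daily limit on your side so a long-running job stops cleanly instead of hitting the limit mid-run. The sketch below is client-side bookkeeping only: the class name and methods are hypothetical, and how and when the service itself resets its daily count is not specified here, so the calendar-day reset is an assumption.

```python
from datetime import date
from typing import Optional

DAILY_CASE_LIMIT = 500  # full-text cases per day, as stated above


class DailyQuota:
    """Local counter for full-text case views or downloads per day."""

    def __init__(self, limit: int = DAILY_CASE_LIMIT) -> None:
        self.limit = limit
        self._day: Optional[date] = None
        self._used = 0

    def try_consume(self, today: date, n: int = 1) -> bool:
        """Reserve n cases for `today`; return False if that would exceed the limit."""
        if today != self._day:
            # Assumption: the count resets when the calendar day changes.
            self._day = today
            self._used = 0
        if self._used + n > self.limit:
            return False
        self._used += n
        return True

    def remaining(self, today: date) -> int:
        """Cases still available for `today` according to the local count."""
        if today != self._day:
            return self.limit
        return self.limit - self._used
```

A download loop would call `try_consume(date.today())` before each full-text request and pause until the next day once it returns `False`.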

Access limitations on full text and bulk data are a component of Harvard’s collaboration agreement with Ravel Law, Inc. (now part of LexisNexis). These limitations will end, at the latest, in February of 2024. In addition, these limitations apply only to cases from jurisdictions that continue to publish their official case law in print form. Once a jurisdiction transitions from print-first publishing to digital-first publishing, these limitations cease. Thus far, Illinois, Arkansas, New Mexico, and North Carolina have made this important and positive shift and, as a result, all historical cases from these jurisdictions are freely available to the public without restriction. We hope many other jurisdictions will follow their example soon.

Press

Friends & Partners

Contributors

  • Anastasia Aizman, Designer and Lead Creative Technologist
  • Kendra Albert, Research Associate
  • Karen Beck, Manager, Historical & Special Collections
  • Zachary Bodnar, Digitization Specialist
  • June Casey, Librarian for Open Access Initiatives & Scholarly Communication
  • Stephen Chapman, Manager, Digital Strategies for Collections
  • Deborah Chase, Digitization Specialist
  • Jack Cushman, Director
  • Kim Dulin, Library Innovation Lab Director
  • Lindsay Dumas, Digital Projects Archivist
  • Kate Edrington, Digitization Specialist
  • Harmony Eidolon, Program Coordinator
  • Kelly Fitzpatrick, Research Associate
  • Kerri Fleming, Digital Projects Archivist
  • Gerard Fowke, 2018 Harvard Law School Library Intern
  • Andy Gu, 2021 Summer Research Assistant
  • Jane Kelly, Historical & Special Collections Assistant
  • Erica Leeman, Digitization Specialist
  • Dustin Lewis, Project Manager
  • Andrew MacTaggart, Senior Digitization Specialist
  • Emily Magagnosc, Digitization Specialist
  • Margaret Peachy, Curator of Digital Collections
  • Lori Schulsinger, Collection Development Coordinator
  • Andy Silva, Developer
  • Ben Steinberg, DevOps
  • Shailin Thomas, Affiliate, Berkman Center for Internet & Society
  • Caroline Walters, Collection Development Librarian for U.S. Law
  • Suzanne Wones, Executive Director, Harvard Law School Library
  • Adam Ziegler, Director
  • Jonathan Zittrain, Harvard Faculty and Law School Library Director

The Caselaw Access Project team cannot help with personal legal research problems or legal representation. Our data is valuable for scholarship, but it is a work in progress and is not kept up to date. Please do not rely on our data set to solve personal legal problems.

Finding a lawyer: see the list of links on the Harvard Law School Library's page Where can I get legal advice?

Alternate databases: if you need to conduct up-to-date research for use in a legal proceeding, consider one of these alternate databases.

Learning to conduct legal research: if you have access to a public law library, its librarians should be able to help you learn legal research skills. The Harvard Law School Library Reference Desk may also be able to offer assistance through its Ask a Librarian service.