I think the actual documents should include a SHA-256 hash, even if they are still linked directly to the DOJ’s website. Recording a hash when each file is first added would make it easy to detect and verify any later changes to that file. Publishing the hashes on the website would also help people who are looking at the files offline or getting them from different sources.
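Recording a hash when a file is first added only takes a few lines. Here is a minimal Python sketch (`sha256_file` is an illustrative name, not an existing tool). It reads the file in chunks so large PDFs never have to fit in memory, and its hex output is the same string `sha256sum` prints for that file:

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_file(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks by default."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

For example, `sha256_file("document.pdf")` on a freshly downloaded copy gives the value to store next to its DOJ link.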
Very good idea. Essential, actually.
There are a lot of dead links to https://www.justice.gov. Maybe see if you can find a third party to host the original files, or link to archive.org copies matched by hash. Decentralized file storage costs a lot, so watch your hosting bill: Jmail got a $46k hosting fee, though the CEO said he would pay it.
Solid suggestion. File integrity verification makes a lot of sense, especially with documents disappearing from the DOJ site. If people are comparing offline copies from different sources, hashes are the only practical way to confirm they match without putting the files side by side.
Hashing all 1.6M+ files is a big lift, but starting with the core indexed documents is doable. Adding it to the roadmap.
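For the bulk pass, a manifest can be built by hashing files in parallel and writing lines in the same text format `sha256sum` uses, so anyone can check it with `sha256sum -c` from the root folder. This is a sketch under those assumptions; `build_manifest`, the `*.pdf` pattern, and the default of 8 worker threads are illustrative choices, not anything the site has described:

```python
from __future__ import annotations

import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of one file, streamed in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str | Path, pattern: str = "*.pdf", workers: int = 8) -> list[str]:
    """Hash every file under `root` matching `pattern`; return sha256sum-style
    lines: '<hex digest>  <path relative to root>' (two spaces)."""
    root = Path(root)
    paths = sorted(p for p in root.rglob(pattern) if p.is_file())
    # CPython's hashlib releases the GIL while hashing large buffers, so a
    # thread pool overlaps disk reads with hashing across files.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = pool.map(sha256_file, paths)
        return [f"{d}  {p.relative_to(root).as_posix()}" for d, p in zip(digests, paths)]
```

For example, `Path("manifest.sha256").write_text("\n".join(build_manifest("pdfs")) + "\n")` writes a file that `sha256sum -c manifest.sha256` can verify when run inside `pdfs/`.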
On the dead links: we’re aware and it’s frustrating. Wayback Machine is the current fallback, and @mall’s thread about tracking removed documents (#102) ties into this nicely. Having hashes would let people verify that archived copies match the originals.
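Matching archived copies by hash can be sketched with the Wayback Machine’s public availability API plus its `id_` URL flag, which serves the archived bytes without the Wayback toolbar or link rewriting. The helper names below are hypothetical. Note that the snapshot the API returns is not necessarily the capture closest to the original release, so a match only shows that this particular snapshot equals the recorded hash:

```python
from __future__ import annotations

import hashlib
import json
import urllib.parse
import urllib.request

AVAILABILITY_API = "https://archive.org/wayback/available"


def closest_snapshot(url: str) -> dict | None:
    """Ask the Wayback Machine which archived snapshot of `url` it would serve."""
    query = urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen(f"{AVAILABILITY_API}?{query}", timeout=30) as resp:
        data = json.load(resp)
    return data.get("archived_snapshots", {}).get("closest")


def raw_snapshot_url(timestamp: str, url: str) -> str:
    """Wayback URL with the `id_` flag: the original archived bytes, unmodified."""
    return f"https://web.archive.org/web/{timestamp}id_/{url}"


def archived_copy_matches(url: str, expected_sha256: str) -> bool | None:
    """True/False if a 200 snapshot exists and does/doesn't match; None otherwise."""
    snap = closest_snapshot(url)
    # Skip missing snapshots and captures of error pages or redirects.
    if not snap or not snap.get("available") or str(snap.get("status")) != "200":
        return None
    with urllib.request.urlopen(raw_snapshot_url(snap["timestamp"], url), timeout=120) as resp:
        body = resp.read()
    return hashlib.sha256(body).hexdigest() == expected_sha256.lower()
```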
Appreciate the hosting cost heads-up too. We’ve been careful about that on our end.
You could compare against the hashes in this repo:
GitHub - rodrigopolo/epstein-doj-library-sha256: SHA256 hashes of the DOJ Disclosures
This repo has been tracking changes and file deletions:
GitHub - beak2825/epstein-files-archive: This is not archiving the files themselves, this is only archiving the server responses, useful for checksum and Last-Modified, and spotlighting the mistakes or things the DOJ is doing that is getting swept under the rug.
A case study in PDF forensics: The Epstein PDFs – PDF Association
There are other ways to detect changes, as I’m sure you know. I’m going to start experimenting with implementing a few things. One more thing people should be aware of: given how popular these documents are, they’re an easy attack vector for malicious PDF files.
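Archiving server responses rather than files can be sketched like this, assuming you record a few response headers per URL with a HEAD request and diff them between runs. `snapshot_headers`, `diff_snapshots`, and the `TRACKED` header list are hypothetical names chosen for illustration; they are not taken from the beak2825 repo. A changed `Last-Modified` or `ETag` is a hint to re-download and re-hash, not proof that the content changed, and not every server sends these headers or answers HEAD requests correctly:

```python
from __future__ import annotations

import urllib.error
import urllib.request

# Headers worth recording per URL (an illustrative choice).
TRACKED = ("Last-Modified", "ETag", "Content-Length", "Content-Type")


def snapshot_headers(url: str) -> dict[str, str]:
    """HEAD the URL and record its status code plus the tracked headers."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            status, headers = resp.status, resp.headers
    except urllib.error.HTTPError as err:
        # 404/410 responses are exactly what we want to log, so keep them.
        status, headers = err.code, err.headers
    snap = {"status": str(status)}
    for name in TRACKED:
        value = headers.get(name)  # header lookup is case-insensitive
        if value is not None:
            snap[name] = value
    return snap


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple[str | None, str | None]]:
    """Return {field: (old_value, new_value)} for every field that changed."""
    keys = sorted(old.keys() | new.keys())
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

Storing each run’s snapshots as JSON and diffing them against the previous run gives a cheap change log; anything flagged can then be re-downloaded and re-hashed.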
Looking into this now
Update: This is now live.
We ingested all 1,380,911 SHA-256 hashes from @rodrigopolo’s repository covering all 12 EFTA datasets, and all deletion/modification tracking data from @beak2825’s archive.
Here is what shipped today:
Hash verification on every document:
- Every document page now shows an integrity badge (green shield = verified, red alert = mismatch, red banner = removed from DOJ)
- Click the badge to see the full SHA-256 hash, dataset, verification timestamp, and modification history
- Documents deleted by the DOJ show a prominent banner explaining the removal
Public Integrity Dashboard:
- /integrity shows real-time stats: total hashes, verification status, DOJ deletions, modifications
- Dataset breakdown table showing exactly how many documents were deleted or modified in each of the 12 datasets
- DS9 spotlight section documenting the 866 deletions and what we know about them
- Full methodology explanation so anyone can independently verify
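A per-dataset breakdown like this boils down to a group-and-count. Here is a minimal sketch, assuming each document has already been labeled with its dataset and a status string (`"verified"`, `"modified"`, and `"deleted"` are placeholder labels; `dataset_breakdown` and `totals` are hypothetical names):

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def dataset_breakdown(records: Iterable[tuple[str, str]]) -> dict[str, Counter[str]]:
    """Group (dataset, status) pairs into one Counter of statuses per dataset."""
    table: dict[str, Counter[str]] = {}
    for dataset, status in records:
        table.setdefault(dataset, Counter())[status] += 1
    return table


def totals(table: dict[str, Counter[str]]) -> Counter[str]:
    """Sum the per-dataset counters into an overall row."""
    overall: Counter[str] = Counter()
    for counts in table.values():
        overall.update(counts)
    return overall
```

Publishing the per-document statuses alongside a script like this would let anyone recompute the table rather than trust the totals.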
Automated monitoring:
- An integrity agent checks the DOJ servers twice daily, verifying a random sample of 50 documents
- Weekly deep verification downloads the full PDFs, recomputes their SHA-256 hashes, and compares them against our baseline
- If any document disappears or changes, an alert is generated automatically
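A spot check like this can be sketched as: sample document IDs, fetch each one, and sort the result into the same buckets the badges use (verified, mismatch, removed). This assumes a baseline of `{doc_id: sha256}` and a caller-supplied `fetch` function returning `(http_status, body)`. `spot_check`, `classify`, and the extra `"error"` bucket for transient failures are hypothetical additions, and the site’s actual agent may work differently (for example, checking existence only and leaving re-hashing to the weekly pass):

```python
from __future__ import annotations

import hashlib
import random
from collections.abc import Callable


def classify(expected_sha256: str, status: int, body: bytes) -> str:
    """Map one fetch result to a badge-style status."""
    if status in (404, 410):
        return "removed"
    if status != 200:
        return "error"  # hypothetical bucket: retry later instead of alerting
    if hashlib.sha256(body).hexdigest() == expected_sha256.lower():
        return "verified"
    return "mismatch"


def spot_check(
    baseline: dict[str, str],
    fetch: Callable[[str], tuple[int, bytes]],
    sample_size: int = 50,
    rng: random.Random | None = None,
) -> list[tuple[str, str]]:
    """Check a random sample of documents; return (doc_id, status) for
    every document that did not verify."""
    rng = rng or random.Random()
    # sample() needs a sequence; sorting also makes seeded runs reproducible.
    ids = rng.sample(sorted(baseline), min(sample_size, len(baseline)))
    alerts = []
    for doc_id in ids:
        status, body = fetch(doc_id)
        result = classify(baseline[doc_id], status, body)
        if result != "verified":
            alerts.append((doc_id, result))
    return alerts
```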
PDF security:
- The PDF viewer now runs magic byte detection and content analysis on every document
- JavaScript detection, embedded file scanning, and suspicious redaction identification
- Security headers (CSP, nosniff) on all PDF responses
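The magic-byte and JavaScript/embedded-file checks can be approximated with a byte scan, in the spirit of tools like Didier Stevens’ pdfid. This is a rough sketch, not the viewer’s actual implementation: it won’t see names hidden inside compressed object streams or written with `#xx` hex escapes (e.g. `/J#61vaScript`), so a clean result is not proof that a file is safe. `looks_like_pdf`, `risky_markers`, and the token list are illustrative choices:

```python
import re

# PDF name tokens commonly associated with active or embedded content.
RISKY_NAMES = (b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile", b"/RichMedia")


def looks_like_pdf(data: bytes) -> bool:
    """Magic-byte check: PDFs carry a '%PDF-' header, and many readers accept
    it anywhere in the first 1024 bytes, so check that window."""
    return b"%PDF-" in data[:1024]


def risky_markers(data: bytes) -> list[str]:
    """Return the risky name tokens that appear as whole PDF names."""
    found = []
    for name in RISKY_NAMES:
        # The lookahead stops '/JS' from matching inside a longer name like '/JSX'.
        if re.search(re.escape(name) + rb"(?![A-Za-z0-9])", data):
            found.append(name.decode())
    return found
```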
The numbers: 892 documents have been deleted from DOJ servers (866 from DS9 and 26 from other datasets), and 32 documents were modified after initial publication. Of the deleted documents, all that survives is 24 fragmentary entity records.
We also published a full investigation into the deletions on the blog: The DOJ Quietly Deleted 892 Epstein Documents
Thanks @pHat for pushing on this. The repos you linked were exactly the data sources we used. If anyone wants to verify independently, clone rodrigopolo’s repo, run sha256sum on any document, and compare the output with the hash listed in the repo.
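The same check can be scripted. This is a minimal sketch, assuming the repo’s hash lists use the standard `sha256sum` text format (`<hex>  <name>`, or `<hex> *<name>` for binary mode); the actual layout isn’t confirmed here, so adjust the parser if it differs. If the format does match, `sha256sum -c <list>` run from the folder holding the PDFs does the same job without any code. `parse_manifest` and `verify` are hypothetical helper names, and lookup is by bare file name, which assumes names are unique:

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def parse_manifest(text: str) -> dict[str, str]:
    """Parse sha256sum-style lines into {file name: lowercase hex digest}."""
    entries = {}
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        digest, sep, rest = raw.partition(" ")
        # The digest is followed by ' ' (text mode) or '*' (binary mode).
        if not sep or rest[:1] not in (" ", "*"):
            raise ValueError(f"not a sha256sum line: {raw!r}")
        entries[rest[1:]] = digest.lower()
    return entries


def verify(path: str | Path, manifest: dict[str, str]) -> bool | None:
    """True if the file matches its listed hash, False if it differs,
    None if its name is not in the manifest."""
    path = Path(path)
    expected = manifest.get(path.name)
    if expected is None:
        return None
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected
```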
This is great. That was added really quickly. Hopefully it will be helpful to others.
/integrity shows that 928 files have been deleted and 0 modified after the initial release of Data Set 9.
This might be an AI hallucination, as it never said anything like that, but the current numbers for Data Set 9 are 401 changed and 866 deleted.
-beak2825