Connecting files?

Hello,

Sometimes when reviewing a document I cross-reference it with Jmail, since names will sometimes appear unredacted there, or I can find more context for the email if it’s part of a chain. This is probably obvious to everyone, but I realised that multiple versions of the same email exist depending on which DOJ drop they came from, which is no doubt inflating the total number of documents to review.

My first thought was ‘Isn’t there some sort of program that can delete duplicates?’, but I know nothing about programming, and I imagine the risk of accidentally deleting something important isn’t worth it in the grand scheme of things. Still, I think that as things stand, all these emails with no context are misleading reviewers into mistagging things.

For example

on its own appears to be sent to a victim, but the subsequent response I found, separate from the document given to review, shows that he’s talking to someone called Steve.

Once again, I know nothing about programming, but is there possibly a way to group emails for review if they share a subject line? If that’s not possible, perhaps there could be a way for reviewers to connect documents that belong to the same chain or are the same file. There is a category for duplicates, but with the sheer number of emails, the chance that the same person will review different versions of the same email and remember it’s a duplicate seems really low, and for that to happen with three people is even more unlikely.


Great observation @Ggg. You are right that the same email can appear in multiple DOJ datasets, sometimes with slightly different redactions or formatting. This is something we have been aware of and working on.

The site already has a document integrity system that computes SHA-256 hashes and can detect duplicates across datasets. We have been expanding this to identify near-duplicates (same content, different redaction levels) as well.
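For anyone curious what that looks like in practice, here is a minimal Python sketch of the general idea. It is not the site's actual code: the function names, the `docs` dictionary of document ID to extracted text, and the 0.8 similarity threshold are all assumptions made for illustration. Exact duplicates are found by comparing SHA-256 hashes; near-duplicates are approximated with the standard library's `difflib.SequenceMatcher`, which is only one of several possible similarity measures.

```python
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher


def sha256_of(text: str) -> str:
    """Return the SHA-256 hex digest of the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def exact_duplicates(docs: dict[str, str]) -> list[list[str]]:
    """Group document IDs whose text hashes to the same SHA-256 value."""
    by_hash = defaultdict(list)
    for doc_id, text in docs.items():
        by_hash[sha256_of(text)].append(doc_id)
    return [sorted(ids) for ids in by_hash.values() if len(ids) > 1]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences matter less."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 for two normalized texts."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def near_duplicates(docs: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return pairs of IDs whose texts differ but are at least `threshold` similar.

    The threshold is a hypothetical value and would need tuning on real data.
    This compares every pair, so it is only practical for small batches.
    """
    ids = sorted(docs)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if docs[a] != docs[b] and similarity(docs[a], docs[b]) >= threshold:
                pairs.append((a, b))
    return pairs
```

Note that nothing is deleted here. The output is just a list of candidate links, which keeps a human in the loop before any two documents are treated as the same email.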

For now, when you are reviewing a document and notice it is a duplicate of something you have already seen, the most helpful thing you can do is note it in the review comments. Something like “Duplicate of EFTA00XXXXXX from Dataset X” helps us link them together.
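As a rough illustration of why a consistent comment format helps, a note written that way can be picked up automatically. The sketch below is an assumption, not how the site actually processes comments: it supposes that document IDs are "EFTA" followed by digits and that the dataset label is a single word or number, and `parse_duplicate_note` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumed comment shape: "Duplicate of EFTA<digits> from Dataset <label>"
_NOTE = re.compile(
    r"duplicate of\s+(EFTA\d+)\s+from\s+dataset\s+(\w+)",
    re.IGNORECASE,
)


def parse_duplicate_note(comment: str) -> Optional[tuple[str, str]]:
    """Return (document_id, dataset_label) if the comment contains a duplicate note."""
    match = _NOTE.search(comment)
    if match is None:
        return None
    # Uppercase the ID so "efta00..." and "EFTA00..." link to the same document.
    return match.group(1).upper(), match.group(2)
```

Comments that do not follow the pattern simply return `None`, so free-form notes would still need a human to read them.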

Connecting related files across datasets (not just duplicates but also email chains, attachments that reference each other, etc.) is one of the bigger features on the roadmap. Your instinct to cross-reference with Jmail is exactly the right approach. Keep it up.
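On the subject-line idea from the original post: grouping by subject is a reasonable first heuristic for finding chains. The Python sketch below shows one way it could work, assuming each email is available as a pair of document ID and subject text. The helper names are hypothetical and this is not a description of the roadmap feature. Unrelated emails can share a subject, so the groups are suggestions for a reviewer rather than confirmed chains.

```python
import re
from collections import defaultdict

# Leading reply/forward markers such as "Re:", "RE: Fwd:", "FW:".
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip reply/forward prefixes, lowercase, and collapse whitespace."""
    stripped = _PREFIX.sub("", subject)
    return " ".join(stripped.lower().split())


def group_by_subject(emails: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each normalized subject to the document IDs that share it.

    Emails with an empty subject are skipped, and only subjects seen on
    more than one document are returned.
    """
    groups = defaultdict(list)
    for doc_id, subject in emails:
        key = normalize_subject(subject)
        if key:
            groups[key].append(doc_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Presenting a group like this to a single reviewer at once would give them the surrounding context that an isolated email lacks.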
