Improving document relationships and clustering

I love the site as it currently stands, but I have a few suggestions that may help people navigate documents more effectively.

Currently, there’s a related documents section, but I believe it could be expanded significantly to categorize corroborating and contradictory evidence. For instance, if a certain tip is not supported by the forensic evidence, that forensic evidence should be shown alongside the tip.

Of course, community investigations can absolutely help with this, but I’m sure there are ways to employ LLMs just to get started. At least, categorizing the current related documents would be quite useful.

Additionally, something like this could help create a “document reliability” measure that tells users whether a document is strongly corroborated evidence or a one-off.
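
To make the “document reliability” idea a bit more concrete, here is a minimal sketch. The relation labels (`corroborates`, `contradicts`) and the scoring formula are purely my assumptions for illustration, not something the site does today.

```python
from collections import Counter

def reliability_score(relations):
    """Return a score in [-1, 1] from a list of relation labels.

    Assumed labels: "corroborates" and "contradicts"; any other label
    (e.g. "related") is ignored. Returns None when a document has no
    labelled relations at all, i.e. it is a one-off.
    """
    counts = Counter(relations)
    supporting = counts["corroborates"]
    contradicting = counts["contradicts"]
    total = supporting + contradicting
    if total == 0:
        return None  # nothing to judge the document against
    return (supporting - contradicting) / total
```

A score near 1 would mean mostly corroborated, near -1 mostly contradicted, and `None` would flag a document that nothing else speaks to yet.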

If you’d like to implement this, I’d be happy to assist with the development or design.


Yes to this. Let’s take the example of a particular house or room described by witnesses or victims: we may find photographs that match the description, and we would like to look at them side-by-side, while scrolling a column on the side with the witness description. We could then tag or heart photos that seem to be linked - say, photos of the same room could be linked together in a cluster.
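
Linking photos into clusters could work roughly like this: every time someone links two photos, they end up in the same group, even if the link is indirect. This is only a sketch using a standard union-find approach; the photo IDs and the `cluster_links` name are hypothetical.

```python
def cluster_links(links):
    """Group items into clusters from a list of (item_a, item_b) links."""
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk up to the root,
        # shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for item in list(parent):
        clusters.setdefault(find(item), set()).add(item)
    return list(clusters.values())
```

So if one user links photo 1 to photo 2 and another links photo 2 to photo 3, all three would show up as one cluster for that room.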

Both great suggestions @blafy and @J4Vic.

Document clustering and relationship mapping are high on the roadmap. Right now the “related documents” section uses basic tag and person overlap, but you are right that it needs to go much deeper. Corroborating vs. contradictory evidence, witness testimony matched to physical evidence, photos matched to witness descriptions: these are all connections that would make the archive far more useful.
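
For anyone curious what “tag and person overlap” means in practice, it can be sketched along these lines. This is an illustration only: the `tags` and `people` field names, the equal weights, and the use of Jaccard similarity are assumptions, not the site's actual code.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def overlap_score(doc_a, doc_b, tag_weight=0.5, person_weight=0.5):
    """Weighted overlap of two documents' tags and named people.

    Each document is assumed to be a dict with "tags" and "people" keys.
    """
    return (tag_weight * jaccard(doc_a["tags"], doc_b["tags"])
            + person_weight * jaccard(doc_a["people"], doc_b["people"]))
```

A score like this only says two documents are about the same things; it cannot tell whether one supports or contradicts the other, which is why the deeper relationship types matter.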

The side-by-side view @J4Vic described (scrolling a witness description next to photos of the location they describe) is a compelling use case. That kind of interface would turn raw document review into actual investigation.

I have been exploring vector embeddings as a way to find semantically similar documents even when they do not share exact keywords. Combined with community-provided tags and links, this could become quite powerful.
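
The core of the embedding approach is simple once each document has a vector: rank the other documents by cosine similarity. The sketch below assumes the vectors already exist (how they are produced is left open), and the `most_similar` helper and its inputs are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_u * norm_v)

def most_similar(query_id, embeddings, top_k=3):
    """Return up to top_k (doc_id, similarity) pairs closest to query_id.

    `embeddings` is assumed to map document IDs to precomputed vectors.
    """
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With a real archive a brute-force scan like this would likely be replaced by a vector index, but the ranking idea stays the same.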

No timeline on this yet, but it is actively being worked on.


Wow, super!! :clap: You’re like a one-person developer-machine.