Connection Mapping technology

Maybe this is already sorted out:

I am not sure which database underlies this, but it seems people-centric. I am familiar with N-level hierarchical data stored in a relational DB and queried recursively against indexes and partition indexes. With the right data structure you should be able to map every sort of connection that anyone could care about. For example: Person A is seen at a party with Person B, and we know that they were both on the same plane back on date/time X.

The structure is fairly simple. I call it BEERS. Usually the B is Business, but it can also be Basic:

Basic Entity to Entity Relationship System

Any time you uniquely identify a Person, Place, Thing, Group, or Concept (pretty much any noun), you add it to an Entity table with a unique cryptoguid ID (or a BIGINT if you aren’t at all concerned about security). It is an optimistic model: each entity is presumed unique, and if it is later determined to be a duplicate, it is merged and its references are replaced. That is not hard, as there are only two foreign keys to update (described below).
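
A minimal sketch of that optimistic add and later merge, using Python with an in-memory SQLite database. The table and column names, the `add_entity` and `merge_entities` helpers, and the use of a random UUID as a stand-in for the cryptoguid are all assumptions made for illustration, not anything from a real system:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Entity (EntityID TEXT PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE EntityRelationship (
        Entity1ID TEXT NOT NULL REFERENCES Entity(EntityID),
        Entity2ID TEXT NOT NULL REFERENCES Entity(EntityID)
    );
""")

def add_entity(name):
    """Optimistic add: presume the entity is new and give it a fresh random ID."""
    entity_id = str(uuid.uuid4())  # stand-in for the "cryptoguid" (assumption)
    conn.execute("INSERT INTO Entity VALUES (?, ?)", (entity_id, name))
    return entity_id

def merge_entities(keep_id, duplicate_id):
    """Fold duplicate_id into keep_id by rewriting both foreign keys."""
    with conn:  # commit all three statements together, or roll back on error
        conn.execute("UPDATE EntityRelationship SET Entity1ID = ? WHERE Entity1ID = ?",
                     (keep_id, duplicate_id))
        conn.execute("UPDATE EntityRelationship SET Entity2ID = ? WHERE Entity2ID = ?",
                     (keep_id, duplicate_id))
        conn.execute("DELETE FROM Entity WHERE EntityID = ?", (duplicate_id,))

# Hypothetical data: the same person was entered twice under two spellings.
plane = add_entity("Plane")
party = add_entity("Party")
a = add_entity("J. Smith")
b = add_entity("John Smith")
conn.execute("INSERT INTO EntityRelationship VALUES (?, ?)", (plane, a))
conn.execute("INSERT INTO EntityRelationship VALUES (?, ?)", (party, b))

merge_entities(keep_id=a, duplicate_id=b)
```

After the merge, both the plane and the party rows point at the surviving entity, and the duplicate row is gone.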

There is an Entity Type table, which is also an optimistic auto-add: Person, Party, Pool, Plane, Photograph, etc. They needn’t start with the letter P, that’s pure coincidence mmhmm, yes.

Any time you have a relationship you want to explain, like PARTY and ATTENDEE or PLANE and PASSENGER, the relationship is optimistically added to a Relationship table that has the two entities/things in two columns. You would have a nomenclature standard for the relationship Description, like [Plane/Passenger], such that if one side is the larger and the other the smaller, they are generally described in that order. Each of these relationships gets a BIGINT identifier (or a cryptoguid, but indexing becomes harder).

The “Tall/Skinny” table comes in last. This is where every entity relationship is recorded with some minor metadata and a time frame.
Entity1ID, Entity2ID, RelationshipID, UTCStartDateTime, UTCEndDateTime

The entities are indexed and the table has partition indexes based on RelationshipID to manage index size.
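
To make the four tables concrete, here is a rough sketch of the schema in Python with SQLite. The tall/skinny column names come from the list above; every other table, column, and index name is an assumption, and the Relationship table is simplified to just its Description. SQLite has no table partitioning, so composite indexes that lead with RelationshipID stand in for the partition indexes here:

```python
import sqlite3

SCHEMA = """
CREATE TABLE EntityType (
    EntityTypeID INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL UNIQUE          -- Person, Party, Pool, Plane, ...
);
CREATE TABLE Entity (
    EntityID     TEXT PRIMARY KEY,             -- cryptoguid stored as text (assumption)
    EntityTypeID INTEGER NOT NULL REFERENCES EntityType(EntityTypeID),
    Name         TEXT NOT NULL
);
CREATE TABLE Relationship (
    RelationshipID INTEGER PRIMARY KEY,
    Description    TEXT NOT NULL UNIQUE        -- e.g. [Plane/Passenger]
);
-- The tall/skinny table: one row per observed connection.
CREATE TABLE EntityRelationship (
    Entity1ID        TEXT NOT NULL REFERENCES Entity(EntityID),
    Entity2ID        TEXT NOT NULL REFERENCES Entity(EntityID),
    RelationshipID   INTEGER NOT NULL REFERENCES Relationship(RelationshipID),
    UTCStartDateTime TEXT NOT NULL,            -- ISO 8601 text, UTC
    UTCEndDateTime   TEXT                      -- NULL while still ongoing
);
-- Stand-ins for the partition indexes: lead with RelationshipID.
CREATE INDEX IX_ER_Entity1 ON EntityRelationship (RelationshipID, Entity1ID);
CREATE INDEX IX_ER_Entity2 ON EntityRelationship (RelationshipID, Entity2ID);
"""

def create_schema(conn):
    """Create the four BEERS tables and their indexes on the given connection."""
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(row[0] for row in
                conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'"))
columns = [row[1] for row in conn.execute("PRAGMA table_info(EntityRelationship)")]
```

On a server database with real partitioning, the two composite indexes would presumably be replaced by per-partition indexes on the entity columns.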

Queries are recursive, check for infinite loops (This series seems familiar! eject eject), and can connect the dots between a great many things. Person A was at a town hall meeting and had previously attended a party with another person, who is a member of a group that owns a property that includes a pool, where a pool party was held and Person B swam. Significant? Maybe. Maybe not. But it allows for the systematic collection of the information in an organized way.
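
One way such a recursive query with a loop check might look, sketched as a recursive CTE run through Python's sqlite3 module. The entity names mirror the example chain above and are purely illustrative; treating every relationship as traversable in both directions, the `/`-delimited path string, and the `max_depth` cap are assumptions of this sketch (IDs containing `/` would break the path check):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EntityRelationship (Entity1ID TEXT, Entity2ID TEXT, RelationshipID INTEGER);
    INSERT INTO EntityRelationship VALUES
        ('TownHall', 'PersonA',  1),   -- [Meeting/Attendee]
        ('Party',    'PersonA',  2),   -- [Party/Attendee]
        ('Party',    'PersonC',  2),
        ('Group',    'PersonC',  3),   -- [Group/Member]
        ('Group',    'Property', 4),   -- [Owner/Property]
        ('Property', 'Pool',     5),   -- [Property/Feature]
        ('Pool',     'PersonB',  6);   -- [Pool/Swimmer]
""")

CONNECT_THE_DOTS = """
WITH RECURSIVE
edge(a, b) AS (                        -- walk each relationship in both directions
    SELECT Entity1ID, Entity2ID FROM EntityRelationship
    UNION
    SELECT Entity2ID, Entity1ID FROM EntityRelationship
),
walk(node, path, depth) AS (
    SELECT :start, '/' || :start || '/', 0
    UNION ALL
    SELECT edge.b, walk.path || edge.b || '/', walk.depth + 1
    FROM walk JOIN edge ON edge.a = walk.node
    WHERE walk.depth < :max_depth
      -- loop check: never step onto an entity already on this path
      AND instr(walk.path, '/' || edge.b || '/') = 0
)
SELECT path, depth FROM walk WHERE node = :target ORDER BY depth LIMIT 1
"""

row = conn.execute(
    CONNECT_THE_DOTS, {"start": "PersonA", "target": "PersonB", "max_depth": 8}
).fetchone()
path, depth = row
hops = path.strip("/").split("/")
```

Here `hops` runs from PersonA through the party, the other attendee, the group, the property, and the pool to PersonB. The dead-end branch through the town hall is explored and dropped because the only way out of it leads back to an entity already on the path.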

But, maybe you have that all sorted out already. If so, that’s wonderful.

-SyntheticJester


Hey SyntheticJester, welcome to the board.

You’re actually describing something really close to what we already have under the hood. The core data model is entity/relationship based. Every person, document, flight, email, location, and organization gets a unique ID, and the connections between them are stored as typed relationships with metadata and timestamps. That’s how the ‘degrees of separation’ tool works on the site. You pick two people and it walks the graph to find the shortest documented path between them, showing the supporting evidence (flights, documents, emails) at each hop.
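
For readers who want to picture that walk, here is a generic breadth-first shortest-path sketch in Python that carries the supporting evidence along each hop. It is not the site's actual implementation; the `shortest_path` function, the edge tuples, and the sample evidence labels are all hypothetical:

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search over undirected edges.

    edges: iterable of (entity_a, entity_b, evidence) tuples.
    Returns a list of (from_entity, to_entity, evidence) hops, or None if
    the two entities are not connected.
    """
    graph = {}
    for a, b, evidence in edges:
        graph.setdefault(a, []).append((b, evidence))
        graph.setdefault(b, []).append((a, evidence))

    came_from = {start: None}  # entity -> (previous entity, evidence for that hop)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            hops = []
            while came_from[node] is not None:
                previous, evidence = came_from[node]
                hops.append((previous, node, evidence))
                node = previous
            return hops[::-1]
        for neighbor, evidence in graph.get(node, []):
            if neighbor not in came_from:  # first visit is the shortest route
                came_from[neighbor] = (node, evidence)
                queue.append(neighbor)
    return None

# Hypothetical sample data; the evidence labels are placeholders.
edges = [
    ("Person A", "Flight 1", "flight log"),
    ("Flight 1", "Person C", "flight log"),
    ("Person C", "Person B", "email"),
    ("Person A", "Document 9", "document"),
    ("Document 9", "Org Z", "document"),
    ("Org Z", "Person D", "document"),
    ("Person D", "Person B", "email"),
]
path = shortest_path(edges, "Person A", "Person B")
```

Because breadth-first search visits entities in order of distance, the three-hop route through the flight is returned in preference to the four-hop route through the document.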

The BEERS pattern you’re describing is solid and it maps well to what we’re doing. The main difference is we’re also layering NLP-extracted entities on top (107k named entities pulled from the corpus so far), so a lot of the relationship discovery is semi-automated rather than purely manual entry. The system finds a name mentioned in a document, links it to a known entity, and creates the relationship with the source document as provenance.

Where we could definitely use help is exactly the kind of recursive querying you’re talking about. Right now the degrees of separation tool does basic shortest path traversal, but richer queries like “show me every person who was at location X within 30 days of person Y being there” or “find all entities that share two or more relationship types” would be a massive upgrade. That’s the kind of thing where someone who’s actually built these systems professionally could move the needle fast.
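
As a starting point for discussion, the two example queries could be sketched in plain Python like this. The row shapes, function names, and sample data are hypothetical, and the second function assumes that "share two or more relationship types" means a pair of entities linked by at least two distinct relationship types:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def co_located(visits, location, person_y, window_days=30):
    """People at `location` within `window_days` of any visit there by person_y.

    visits: iterable of (person, location, utc_start, utc_end) tuples.
    """
    window = timedelta(days=window_days)
    y_visits = [(start, end) for person, loc, start, end in visits
                if person == person_y and loc == location]
    found = set()
    for person, loc, start, end in visits:
        if person == person_y or loc != location:
            continue
        # Overlap test against person_y's stay widened by the window on both sides.
        if any(start <= y_end + window and end >= y_start - window
               for y_start, y_end in y_visits):
            found.add(person)
    return found

def multi_linked_pairs(rows, min_types=2):
    """Entity pairs connected by at least `min_types` distinct relationship types.

    rows: iterable of (entity1, entity2, relationship_type) tuples.
    """
    types_by_pair = defaultdict(set)
    for entity1, entity2, relationship in rows:
        types_by_pair[frozenset((entity1, entity2))].add(relationship)
    return {pair for pair, types in types_by_pair.items() if len(types) >= min_types}

# Hypothetical sample data.
visits = [
    ("Person Y", "Location X", datetime(2001, 3, 1), datetime(2001, 3, 3)),
    ("Person A", "Location X", datetime(2001, 3, 20), datetime(2001, 3, 21)),
    ("Person B", "Location X", datetime(2001, 6, 1), datetime(2001, 6, 2)),
    ("Person C", "Elsewhere", datetime(2001, 3, 2), datetime(2001, 3, 2)),
]
nearby = co_located(visits, "Location X", "Person Y")

rows = [
    ("Group G", "Person A", "Group/Member"),
    ("Person A", "Group G", "Donor/Recipient"),
    ("Group G", "Person B", "Group/Member"),
]
pairs = multi_linked_pairs(rows)
```

In a real deployment both would more likely be SQL against the relationship table, with the time window applied to the start and end columns; the Python version is only meant to pin down the intended semantics.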

If you want to dig in, the data pipeline is open source: https://github.com/stonesalltheway1/Epstein-Pipeline

Would love to have you involved. Let me know what interests you most and we’ll figure out where to plug you in.


Hello Admin,
Sorry for the delay in responding. Too many communication channels for me to juggle, it seems :P
I was designing a system to do something similar, mapping relationships among corporate decision makers, on behalf of a major mergers and acquisitions company back in 2007. They cancelled the contract, but took me to a nice lunch and said “Get out. No one working here has ever heard it be this quiet. Something terrible is about to happen,” and it did.

I did build what I claim to be the world’s first patient matching system that used fuzzy matching to mathematically calculate the likelihood that a non-responsive patient who had just presented was actually a patient already in the system. I will see if I can help in some of the discussion.

I will take a look at things and see where I can plug in.

Were I financially secure this would be my only new job.

All the best,
SJ

Is there a 501(c)(3)?