Despite our efforts to build robust validation criteria on the front end, we often struggle with duplicate master entity profiles. Of particular note, our master person profiles are a mess.
Some of this is due to our inability to impose validation criteria at the point of entry in the disparate systems from which we source person profiles for ingestion into Mark43. Further, the contextual data elements for these profiles are often incomplete or contradictory.
I have structured a few different data lake queries and an analytics dashboard to tease out merge candidates (we have more than 10,000 candidates based on a complete match on first name, last name, and DOB). The ability to do this work within the analytics module is limited because the profile attributes (e.g., name, DOB, SSN) are often provided as contextual elements, and identifying candidates requires an exact 1:1 match on the merge criteria. Since the data for person profiles is spread across the different analytic views (e.g., offense/incident persons, use of force subjects), I typically match on the same first name, last name, and date of birth. However, if I have two profiles that should be merged (1. John Smith 01/01/2000 & 2. John Smyth 01/01/2000), I will not be able to detect them as candidates because the last names are spelled differently. This is further exacerbated when you consider hyphenated last names and every other way personnel can enter or ingest names.
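To illustrate the Smith/Smyth problem, here is a rough Python sketch of fuzzy candidate matching run against an export of profiles. It is only an assumption about how this could be done outside the analytics module, not a Mark43 feature: the field names (`id`, `first`, `last`, `dob`), the sample rows, and the 0.75 threshold are all made up for the example. It groups profiles by exact DOB, then compares names with the standard library's `difflib.SequenceMatcher` after stripping hyphens, spaces, and case.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    # Lowercase and drop anything that is not a letter (hyphens, spaces, etc.).
    return re.sub(r"[^a-z]", "", name.lower())


def similarity(a, b):
    # Ratio between 0.0 and 1.0; 1.0 means the normalized names are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def merge_candidates(profiles, threshold=0.75):
    # Group by exact DOB first so only plausible pairs are compared.
    by_dob = defaultdict(list)
    for profile in profiles:
        by_dob[profile["dob"]].append(profile)

    pairs = []
    for group in by_dob.values():
        for a, b in combinations(group, 2):
            if (similarity(a["first"], b["first"]) >= threshold
                    and similarity(a["last"], b["last"]) >= threshold):
                pairs.append((a["id"], b["id"]))
    return pairs


# Hypothetical export rows; the IDs and names are invented for illustration.
profiles = [
    {"id": 1, "first": "John", "last": "Smith", "dob": "2000-01-01"},
    {"id": 2, "first": "John", "last": "Smyth", "dob": "2000-01-01"},
    {"id": 3, "first": "Jane", "last": "Doe", "dob": "2000-01-01"},
    {"id": 4, "first": "John", "last": "Smith", "dob": "1999-12-31"},
    {"id": 5, "first": "Maria", "last": "Garcia-Lopez", "dob": "1985-03-03"},
    {"id": 6, "first": "Maria", "last": "Garcia Lopez", "dob": "1985-03-03"},
]

candidates = merge_candidates(profiles)
print(candidates)
```

A lower threshold catches more spelling variants but also produces more false positives, so any pairs flagged this way would still need a human review before merging.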
What I have found is that mandatory data elements (e.g., person race) are often recorded as “unknown,” even when previous contacts with the subject recorded a known race. This is not exclusive to the race field, but it is an example. When I stumble upon duplicate master person profiles, I do what I can to merge all the duplicates (sometimes there are 3, 4, or more). While this helps resolve the master person profile attributes, it does not resolve contextual data elements. To be clear, I agree with this logic and do not advocate for updating the contextual information.
Where this presents an issue is within the analytics module and how the data is structured within the analytics views. The person attributes provided are contextual data elements derived from the associated reports, but we know this information is often incomplete or unreliable. Because they are contextual, they are tied to an individual report_id within disparate analytic views (e.g., offense/incident, use of force, arrest, field contact). This means that in order to have a full accounting, a complex merge query needs to be created connecting all these views based upon a match on first name, last name, and DOB (which is unreliable and requires a lot of processing power to render results). This also does not surface orphan profiles (profiles that exist in the tenant but have no association to a report) because the views are constrained to persons associated with reports.
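For anyone who wants to see the shape of that merge query, here is a minimal sketch using an in-memory SQLite database from Python. The table names (`offense_persons`, `use_of_force_subjects`), columns, and rows are hypothetical stand-ins for the analytic views, not the actual Mark43 schema. It stacks the contextual person rows from two views and groups them on first name, last name, and DOB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins for two contextual analytic views.
CREATE TABLE offense_persons (
    report_id INTEGER, first_name TEXT, last_name TEXT, dob TEXT, race TEXT);
CREATE TABLE use_of_force_subjects (
    report_id INTEGER, first_name TEXT, last_name TEXT, dob TEXT, race TEXT);

-- Invented sample rows: same person, inconsistent context data.
INSERT INTO offense_persons VALUES (101, 'John', 'Smith', '2000-01-01', 'Unknown');
INSERT INTO use_of_force_subjects VALUES (202, 'John', 'Smith', '2000-01-01', 'White');
INSERT INTO use_of_force_subjects VALUES (203, 'John', 'Smyth', '2000-01-01', 'White');
""")

# Stack every view, then group on the only keys the context rows share.
rows = conn.execute("""
SELECT first_name, last_name, dob,
       COUNT(*)             AS reports,
       COUNT(DISTINCT race) AS race_values
FROM (
    SELECT report_id, first_name, last_name, dob, race FROM offense_persons
    UNION ALL
    SELECT report_id, first_name, last_name, dob, race FROM use_of_force_subjects
)
GROUP BY first_name, last_name, dob
ORDER BY last_name
""").fetchall()

for row in rows:
    print(row)
```

In this made-up data the Smith and Smyth rows land in separate groups, the Smith group carries two different race values, and a profile with no reports would never appear at all, which are the three limitations described above.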
As a workaround, I would like to propose that Mark43 create analytic views for ALL MASTER ENTITIES that provide the data attributes relevant to analysis. That way, when a master profile is updated (whether through a direct edit, a profile merge, or a subsequent report association), the information displayed for the profile stays consistent. If the data is mapped into the module as discrete views providing the appropriate key fields (e.g., master_person_id, master_item_id), then we will have the ability to leverage the merge feature to join these consistent attributes into different looks. It would also be nice to configure the master_id fields to hyperlink directly to the profile details page so we don’t have to create a custom hyperlink dimension to enable consumers to easily navigate to an item.
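As a rough picture of what a master view would enable, here is a sketch in Python with an in-memory SQLite database. Everything in it is an assumption for illustration: the `master_persons` view does not exist today, and the table names, columns, and rows are invented. The point is that joining on `master_person_id` instead of name and DOB gives one consistent set of attributes per person and keeps orphan profiles in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical master view: one row per master profile.
CREATE TABLE master_persons (
    master_person_id INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, dob TEXT, race TEXT);

-- Hypothetical contextual views, reduced to their key fields.
CREATE TABLE offense_persons (report_id INTEGER, master_person_id INTEGER);
CREATE TABLE field_contact_persons (report_id INTEGER, master_person_id INTEGER);

-- Invented sample rows; profile 2 is an orphan with no reports.
INSERT INTO master_persons VALUES (1, 'John', 'Smith', '2000-01-01', 'White');
INSERT INTO master_persons VALUES (2, 'Jane', 'Doe', '1990-05-05', 'Unknown');
INSERT INTO offense_persons VALUES (101, 1);
INSERT INTO field_contact_persons VALUES (301, 1);
""")

# LEFT JOIN from the master view so profiles without reports are kept.
summary = conn.execute("""
SELECT m.master_person_id, m.last_name, m.race,
       COUNT(r.report_id) AS report_count
FROM master_persons AS m
LEFT JOIN (
    SELECT report_id, master_person_id FROM offense_persons
    UNION ALL
    SELECT report_id, master_person_id FROM field_contact_persons
) AS r ON r.master_person_id = m.master_person_id
GROUP BY m.master_person_id, m.last_name, m.race
ORDER BY m.master_person_id
""").fetchall()

for row in summary:
    print(row)
```

Because the attributes come from the master row rather than from each report, a later merge or edit to the profile would flow through to every look built on the key.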
Examples may include:
Master Person Profiles (Master ID, Names, DOB, Race, Ethnicity, Cautions, Current Home Address)
Master Item Profiles (Master ID, Current Item Status - Stolen/Recovered/etc., most recent description, serial number, color, etc.)
Master Firearm Profiles (Master ID, make, model, caliber, finish, serial number, etc.)
Master Vehicle Profiles (Master ID, VIN, License Plate State, Plate, colors, description, current status, etc.)
Does anyone else struggle with duplicate entity profiles? What workarounds do you use to improve your ability to discern information?