Data mining six degrees of separation
By: Hari Mailvaganam
Figure 1. Overview of the Sphere of Influence of Relationship Mining Software
I (Hari) recently read an article in the Wall Street Journal (registration required) about the introduction of relationship mining software in companies. USA Today has a similar article. These applications help companies ‘mine’ workers’ external personal relationships for business prospects.
The goal of these applications is to scan the company’s repositories of contact information, such as address books, electronic calendars, e-mail correspondence, and instant message contact lists. After scanning the contact information, the software builds maps of all the relationships found in the repositories.
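As a rough illustration of the scan-and-map step, here is a minimal sketch in Python. It is not how any commercial product works; the `repositories` structure, the `build_relationship_map` function, and the sample addresses are all assumptions made for the example.

```python
from collections import defaultdict

def build_relationship_map(repositories):
    """Map each person to the set of people they are connected to.

    `repositories` is a hypothetical structure:
    {employee: {source_name: [contact, ...]}}.
    """
    graph = defaultdict(set)
    for employee, sources in repositories.items():
        for contacts in sources.values():
            for contact in contacts:
                # Normalize so the same person found in two sources
                # collapses into a single node.
                key = contact.strip().lower()
                if key:
                    graph[employee].add(key)
                    graph[key].add(employee)  # treat relationships as undirected
    return dict(graph)

# Invented sample data: two employees and their contact sources.
repositories = {
    "alice": {
        "address_book": ["Bob@Example.com"],
        "email": ["bob@example.com ", "carol@example.org"],
    },
    "dave": {"im_contacts": ["carol@example.org"]},
}
relationship_map = build_relationship_map(repositories)
```

Normalizing each contact before adding it matters more than it looks: the same person often appears in several repositories with slightly different spellings.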
There is no doubt about the potential usefulness of such a product. Most business transactions are initiated through personal introductions and relationships developed over time. Sales cycles can be shortened if a strong relationship with a business prospect is found.
After reading the WSJ article, I started to think about the technical viability of relationship mining software and to evaluate how it involves data mining, if at all.
Most of the challenges of successful relationship mining in an organization involve getting employees to cooperate, so that contact information is stored in the correct format and synchronized frequently with portable devices.
Is there data mining involved?
My first impression was that none of the better-known data mining algorithms are running in most of the commercial relationship mining software currently available. Still, the old adage of data mining cannot be forgotten: data mining is pattern discovery in data. Data patterns are central to discovering relationships and judging their relevance.
With a little tweaking, classification and clustering algorithms will be suitable for relationship mining. To test the idea, I created a version using the Microsoft Clustering algorithm provided by SQL Server 2000. The first part of the process was extracting the contact information from Microsoft Outlook and building a data mining model. I exported the contact information from Outlook to a text file with comma separated values and ran a script to import the data into a SQL 2000 database.
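The export-and-import step can be sketched roughly as follows. This is not the script I used: Python's built-in `sqlite3` stands in for SQL Server 2000, and the column names, the `contacts` table, and the sample rows are assumptions for illustration. The final query is a plain GROUP BY count, not the Microsoft Clustering algorithm.

```python
import csv
import io
import sqlite3

# Hypothetical export; a real Outlook CSV export has many more columns.
OUTLOOK_CSV = """First Name,Last Name,Company,Job Title,E-mail Address
Ann,Lee,Contoso,CEO,ann@contoso.example
Raj,Patel,Contoso,Data Administrator,raj@contoso.example
Mia,Chen,Fabrikam,Sales Manager,mia@fabrikam.example
"""

def import_contacts(csv_text, conn):
    """Load comma-separated contact rows into a `contacts` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts "
        "(first_name TEXT, last_name TEXT, company TEXT, job_title TEXT, email TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (r["First Name"], r["Last Name"], r["Company"], r["Job Title"], r["E-mail Address"])
        for r in reader
    ]
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
imported = import_contacts(OUTLOOK_CSV, conn)

# Simple sanity check on the imported data: contacts per company.
per_company = dict(conn.execute("SELECT company, COUNT(*) FROM contacts GROUP BY company"))
```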
Once the data was imported to SQL Server 2000, it was fairly straightforward to build the data mining model. Running clustering passes through the contact information proved feasible once the data mining model was created. For a commercial product, a data visualization layer would sit above the classification results.
Aside from the data extraction and cleansing process, the other labor-intensive process in relationship mining is setting up context relevancy. In some scenarios, a contact who is the CEO of a business prospect may be more valuable than a contact who is a Data Administrator. However, if the relationship miner intends to sell storage area networks, the Data Administrator contact may be more valuable.
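One simple way to picture context relevancy is a table of weights per selling scenario. The sketch below is only an assumption about how this could be modeled; the scenario names, the weights, and the `rank_contacts` function are hypothetical and would need tuning for any real organization.

```python
# Hypothetical weights: how valuable each job title is in each selling scenario.
ROLE_WEIGHTS = {
    "general": {"ceo": 1.0, "data administrator": 0.3},
    "storage_area_networks": {"ceo": 0.5, "data administrator": 1.0},
}
DEFAULT_WEIGHT = 0.1  # assumed fallback for titles not listed above

def rank_contacts(contacts, scenario):
    """Return contact names ordered from most to least relevant for a scenario.

    `contacts` is a list of (name, job_title) pairs.
    """
    weights = ROLE_WEIGHTS.get(scenario, {})
    scored = [
        (weights.get(title.lower(), DEFAULT_WEIGHT), name)
        for name, title in contacts
    ]
    # sorted() is stable, so equal scores keep their original order.
    return [name for score, name in sorted(scored, key=lambda pair: -pair[0])]

contacts = [("Ann", "CEO"), ("Raj", "Data Administrator")]
general_ranking = rank_contacts(contacts, "general")
san_ranking = rank_contacts(contacts, "storage_area_networks")
```

The same two contacts swap places depending on the scenario, which is why the weights have to be set up per context rather than once for the whole company.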
Figure 2. Simulated Data Visualization of Relationship Mining Search for ‘Steve Ballmer’
There are a number of challenges to setting up the results. This can be illustrated with an example of a search for “Steve Ballmer, CEO of Microsoft” in the relationship mining software.
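At its core, a search like this looks for the shortest chain of introductions between the searcher and the target. A minimal sketch using breadth-first search follows; the `introduction_chain` function and the graph of placeholder names are invented for illustration and do not describe any real person's contacts or any particular product.

```python
from collections import deque

def introduction_chain(graph, start, target):
    """Return the shortest chain of people from start to target, or None."""
    if start == target:
        return [start]
    previous = {start: None}  # who introduced each person we have reached
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor in previous:
                continue  # already reached by a chain at least as short
            previous[neighbor] = person
            if neighbor == target:
                # Walk back through the introductions to rebuild the chain.
                chain = [neighbor]
                while previous[chain[-1]] is not None:
                    chain.append(previous[chain[-1]])
                return chain[::-1]
            queue.append(neighbor)
    return None

# Invented relationship graph with placeholder names.
graph = {
    "me": ["alice", "bob"],
    "alice": ["carol"],
    "bob": [],
    "carol": ["prospect"],
}
chain = introduction_chain(graph, "me", "prospect")
degrees = len(chain) - 1  # number of introductions needed
```

Breadth-first search explores contacts one degree at a time, so the first chain it finds is also one with the fewest introductions.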
Relationship mining can be a useful tool not only in sales and marketing but also in law enforcement and scientific research. However, there is a threshold below which relationship mining would not be cost-effective.
In an organization of, say, 15 to 20 employees, it would be more efficient for the marketing manager to e-mail employees asking whether they have a contact for a business prospect that he is working on.
Source: Data Warehousing Review