Dev Opportunities from Customer Data Chaos

Over the last decade, enterprise IT managers have spent millions on CRM (customer relationship management) projects. But now that the frenzied CRM spending is slowing, many companies are waking up to find they still don't have an easy-to-manage, unified customer data system. See why, even in a budget-constrained era, this customer data chaos is creating opportunities for Java and .NET developers and DBAs who know the customer data territory.

Tags: Customer, Master, Reference Data, Reliability, Enterprise, Customer Data Integration, Management

Siperian Inc.

Mapping The Territory

The dev/DBA opportunities in CRM arise from the following: enterprise spending on CRM has given birth to a wide array of customer-facing "touch points" that connect a company with its customers and partners. These touch points include multiple web sites, service call centers, targeted sales teams, and even demographic email marketing systems.

These "touch points" have become a double-edged sword for many enterprises. While they provide multiple avenues for linking company and customer, they have also scattered what should be a "single view" of the customer across multiple databases and departments.

In fact, in a large enterprise, it is not unusual to have a single customer's data (history, profile, permissions, etc.) stored across more than 30 different data sources. Even worse, those data sources may not agree with one another.

Defining the Opportunity: 'Mastering' Customer Data

Many CxOs instinctively feel that "integration" holds the answer to unifying disparate customer data. Popular integration techniques include enterprise application integration (EAI), the J2EE Connector Architecture (JCA), middleware, SQL-to-XML conversions, and even Java- or .NET-based web services.

But in the rush to unify their data, many IT shops overlook a pre-integration step: establishing an accurate and reliable foundation for their customer data integration projects.

Customer reference data (CRD), which uniquely identifies a customer across different applications and sources, is commonly duplicated and is often in conflict across disparate systems. Conflicts in CRD are usually the root cause of pervasive data quality and reliability problems, leaving companies to backpedal on the very initiatives they hoped would drive value. But all is not lost. While establishing an unquestionable master reference data source remains a key challenge, it is never too late to enter the race.

Current approaches to building a reliable data foundation force-fit technologies such as data cleansing and ETL (extract-transform-load) tools, which are not designed to address the core problem: poor lifecycle management of customer reference data.

Data quality tools are a necessary step in standardizing and cleansing dirty data, but they are inadequate for maintaining data reliability in a business context. For instance, a customer address may be cleansed and verified as a valid postal address yet still be obsolete or inappropriate (e.g., it may have been the shipping address for the previous five years but is no longer current).

In contrast, data reliability requires the capture of business metadata and the creation of business rules that determine the validity of customer reference information in a business context. Further, building a data warehouse using data cleansing and ETL tools can be difficult because such data reliability rules have to be custom coded. The result is an inflexible solution that is laborious to build, hard to maintain, and difficult to extend to new data sources over time.
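To make the distinction between a cleansed address and a reliable one concrete, here is a minimal Python sketch of a single reliability rule. The `AddressRecord` fields, the `is_reliable` function, and the one-year freshness threshold are all hypothetical, chosen only to illustrate the idea; a real rule set would come from the business.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AddressRecord:
    """Hypothetical address record carrying business metadata."""
    street: str
    postal_code: str
    usage: str               # e.g. "shipping" or "billing"
    last_confirmed: date     # when the customer last confirmed this address
    is_postally_valid: bool  # outcome of a separate cleansing/verification step


def is_reliable(record: AddressRecord, today: date, max_age_days: int = 365) -> bool:
    """Reliable means postally valid AND confirmed recently enough.

    The 365-day default is an assumed business rule, not a standard.
    """
    if not record.is_postally_valid:
        return False
    age_days = (today - record.last_confirmed).days
    return age_days <= max_age_days


today = date(2005, 1, 10)
# Both addresses pass postal validation, but only one is still current.
old_shipping = AddressRecord("12 Elm St", "94107", "shipping", date(2000, 3, 1), True)
fresh_shipping = AddressRecord("98 Oak Ave", "94110", "shipping", date(2004, 11, 15), True)

print(is_reliable(old_shipping, today))    # False
print(is_reliable(fresh_shipping, today))  # True
```

The point of the sketch is that the cleansing result is only one input; the rule also needs metadata (here, a confirmation date) that a cleansing tool alone does not supply.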

What is needed is a centralized repository that consolidates all the customer reference data, along with the cross-references to source systems, into a master reference store. This store then becomes the best source of truth about customer profile information for all operational and analytical applications. More importantly, any solution that builds such a repository needs to be able to manage customer reference data throughout its lifecycle.
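As a rough illustration of what such a store holds, the following Python sketch keeps one master record per customer plus the cross-references back to each source system. The class names, method names, and sample keys (`MasterReferenceStore`, `link`, `lookup`, "C-1001", and so on) are invented for this example and do not describe any particular product.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

SourceKey = Tuple[str, str]  # (source system name, key within that system)


@dataclass
class MasterRecord:
    master_id: int
    profile: Dict[str, str]
    # Cross-references to the source records that contributed to this master.
    xrefs: Set[SourceKey] = field(default_factory=set)


class MasterReferenceStore:
    """Hypothetical in-memory master reference store."""

    def __init__(self) -> None:
        self._records: Dict[int, MasterRecord] = {}
        self._xref_index: Dict[SourceKey, int] = {}
        self._ids = itertools.count(1)

    def add_master(self, profile: Dict[str, str]) -> int:
        """Create a master record and return its unique master key."""
        master_id = next(self._ids)
        self._records[master_id] = MasterRecord(master_id, dict(profile))
        return master_id

    def link(self, master_id: int, source_system: str, source_key: str) -> None:
        """Record that a source-system key refers to this master record."""
        self._records[master_id].xrefs.add((source_system, source_key))
        self._xref_index[(source_system, source_key)] = master_id

    def lookup(self, source_system: str, source_key: str) -> Optional[MasterRecord]:
        """Resolve a source-system key to its master record, if any."""
        master_id = self._xref_index.get((source_system, source_key))
        return self._records.get(master_id) if master_id is not None else None


store = MasterReferenceStore()
pat = store.add_master({"name": "Pat Lee", "email": "pat@example.com"})
store.link(pat, "crm", "C-1001")
store.link(pat, "billing", "B-77")

# Two different source keys resolve to the same master record.
print(store.lookup("crm", "C-1001") is store.lookup("billing", "B-77"))  # True
```

A production store would persist this in a database and track history; the sketch only shows the relationship between master keys and cross-reference keys.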

With a foundation of reliable master reference data, companies can identify customers across systems correctly and combine their transaction data accurately for use by customer-facing employees. The following outlines a four-step lifecycle process that architects, developers and DBAs can use to increase the reliability and manageability of their customer data.

Implementing the Fix: Using a Lifecycle Approach to Customer Master Reference Data

The four lifecycle stages of customer master reference data require these critical capabilities:

  • Consolidate: To consolidate the customer reference data effectively, the data architect must begin a multi-phase project that includes building a composite data model, cleansing the data, and creating a master record set.

      1. The composite data model should encapsulate all the source systems that may contribute data to the master store (such as front office, back office, external, custom, or legacy systems). Such a model should be easy to create (ideally, template driven) and must be able to import an industry-standard model or a pre-existing best-practice model. While it may be tempting to standardize on the application-specific data model of a CRM vendor (such as Siebel or SAP), this choice often restricts extensibility, since these vendor models are fixed in advance and do not easily encapsulate the data structures of external third-party or legacy sources.

      2. Next, the data cleansing tasks (such as name standardization and address validation) need to be integral to the process of loading and building the master store. Stand-alone data quality tools are usually sufficient for cleansing and matching but lack tight integration with the critical process of merging records. In most custom solutions today, the matching rules that identify duplicate records for the same customer are custom programmed, making it hard for the IT team to revise or manage them. In addition, matched records need to be either merged automatically or queued for manual inspection, and such solutions typically offer no user interface for this kind of exception handling.

      3. When merging multiple matched records into a master record, it is important to do so at the cell level. Today, merge rules are typically coded or applied at the level of the entire record (e.g., "always take the email from the web site, phone number from the call center system, and address from the billing system"). Besides greatly limiting the extensibility and scalability of the system, this approach has other shortcomings: the rules have no context of time, they do not capture how old a piece of information is, and they cannot evaluate attributes based on their syntax.

    To create a reliable customer master, the solution must have a framework that encapsulates all the variables of data reliability, for each source system, through functions, algorithms, and rules that can be specified through a graphical interface without any custom coding. Without such a framework for consolidation, each added source sharply increases the time spent programming and rewriting rules to bring the new source into the existing code base.

  • Manage: The system should separate data administration from data content management. For instance, the IT team should manage rules, system configuration, and other administrative tasks, while the Data Steward (i.e., the person who understands the data content) should handle the data exception management tasks that require both business context and knowledge of the data. Because Data Stewards are typically less technical, they need a rich and robust graphical user interface. As part of the lifecycle management of data, they also need to be able to audit and research data and drill down to the supporting metadata, including lineage and history.

  • Share: The system must provide views of the customer reference data to all consuming applications and must communicate any changes in the master data to all affected applications. A service-oriented architecture that supports multiple modes of data transfer (batch, scheduled, near real-time, or real-time) is an essential part of the solution because both real-time operational and batch-based analytical systems require the master data. Besides a rich set of web service-based application programming interfaces for accessing the master data, the repository needs to be able to manage unique master record keys and the associated cross-reference keys that link back to source systems.

  • Extend: A scalable solution with a low cost of ownership requires the ability to efficiently extend the solution to other data sources, or to eliminate existing data sources (as applications retire or relevant capabilities are phased out), ideally without any custom code development. The tools provided should make it possible to extend the composite data model while applying previously determined match and merge rules to each new system that is added.
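The cell-level, time-aware merge described under Consolidate can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `SOURCE_TRUST` table, the one-year half-life, and the `trust_score` and `merge_records` functions are hypothetical, and a real system would expose such rules through configuration rather than code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Cell:
    """One attribute value plus the metadata needed to judge it."""
    value: str
    source: str
    last_updated: date


# Assumed base trust per (source, attribute). Adding a new source means
# adding rows to this table, not rewriting the merge logic.
SOURCE_TRUST = {
    ("web", "email"): 0.9,
    ("call_center", "phone"): 0.9,
    ("billing", "address"): 0.9,
}
DEFAULT_TRUST = 0.5
HALF_LIFE_DAYS = 365  # assumed: trust halves for every year of age


def trust_score(attribute: str, cell: Cell, today: date) -> float:
    """Base trust for the source, decayed by the age of the value."""
    base = SOURCE_TRUST.get((cell.source, attribute), DEFAULT_TRUST)
    age_days = max((today - cell.last_updated).days, 0)
    return base * 0.5 ** (age_days / HALF_LIFE_DAYS)


def merge_records(records: List[Dict[str, Cell]], today: date) -> Dict[str, Cell]:
    """Build a master record by picking the most trusted cell per attribute."""
    master: Dict[str, Cell] = {}
    for record in records:
        for attribute, cell in record.items():
            best = master.get(attribute)
            if best is None or trust_score(attribute, cell, today) > trust_score(attribute, best, today):
                master[attribute] = cell
    return master


today = date(2005, 1, 10)
matched = [
    {"email": Cell("pat@example.com", "web", date(2004, 12, 1)),
     "phone": Cell("555-0100", "web", date(2004, 12, 1))},
    {"email": Cell("pat@old.example.com", "call_center", date(2001, 1, 10)),
     "phone": Cell("555-0199", "call_center", date(2001, 1, 10))},
    {"address": Cell("12 Elm St", "billing", date(2004, 6, 1))},
]
master = merge_records(matched, today)
for attribute, cell in sorted(master.items()):
    print(attribute, cell.value, cell.source)
```

In this sample data, a static rule of "always take the phone number from the call center" would pick a four-year-old value. Because the score decays with age, the recently updated web value wins instead, which is the kind of time context the record-level rules lack.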

Conclusion

Bottom line: An enterprise's IT staff needs to learn to walk before it runs when it comes to implementing successful customer data integration efforts. There are opportunities for architects, developers and DBAs who can help their enterprise put in place a foundation that ensures the integrity of customer reference data across all applications. With that foundation, enterprises can maximize the return on their CRM investment and be on the path to realizing the vision of a 360-degree view of every customer.

So, before embarking on time-consuming and expensive customer data integration projects, an enterprise's IT staff should take the time to ensure that the data they integrate is reliable and trustworthy.

Ken Hoang is the Founder and CTO at Siperian Inc., a customer data integration solution provider. Siperian's offering enables enterprises to cost-effectively provide trustworthy customer master data to any system or business user.