Predictive Modeling: New Techniques Will Make It Faster and DEEPER

August 17, 2010

Before a bank offers you a mortgage re-fi, a national charity hits you up for a "gift," or a credit card company dangles a low-interest offer before your eyes, some information-systems worker has probably pegged you as a promising prospect. He probably used predictive modeling to do it. And it wasn't a quick, easy task.

Predictive modeling, the computer-supported process of forecasting things like customer behavior or creditworthiness, is cumbersome enough that there are patented methods outlining the steps. It certainly could be more efficient, and it soon will be, because vendors and practitioners now are forging ways to push some functions out of modeling software tools and back into the company databases and warehouses where customer information resides.

This tight coupling of the modeling software and the databases themselves is called "in-database analytics," and it will streamline the front end of predictive modeling. But what happens when models spawn marketing or other business initiatives? Can companies use modeling methods to make marketing and other business processes even more effective mid- or post-campaign?

Yes, they can, according to information systems professors Michael Goul and Sule Balkan. The scholars are calling their modeling deployment methodology "DEEPER," an acronym representing the steps taken to "design" a campaign based on predictive models, "embed" the offer or analytics into business processes, "empower" employees to support the campaign, measure "performance," "evaluate" results and "retarget" future efforts.

Like front-end predictive modeling techniques, the DEEPER methodology will be all the more efficient and powerful when companies use in-database analytics. Together, in-database analytics and DEEPER could be game-changers for corporate marketing campaign performance and other applications of predictive models.

Why "in" beats "out"

Several methodologies outline the traditional steps of predictive modeling. Among them, you'll find the SAS Institute's SEMMA, which stands for sample, explore, modify, model and assess. Two others are Knowledge Discovery in Databases (KDD) and Cross-Industry Standard Process for Data Mining, which is known as CRISP-DM.

All of these methodologies help business intelligence (BI) analysts construct and deploy models that guide marketing campaigns, aid in fraud detection, support credit scoring and more. And, all of them share some unpleasant traits: they have several steps that are time-consuming, repetitive and laborious, say Balkan and Goul.

Part of the problem stems from the lack of integration between the database being mined and the software tools being used to build the predictive models. "We have to pull things out of the database and put them into these tool suites that, until recently, have been totally separate," Goul explains. The tool suites, he continues, have limitations on how much data they can accommodate, which is why the SEMMA process begins with data sampling.

"Data-set creation takes a lot of time," Balkan notes. "In order to have a predictive model, you must have an event that occurred in the past, such as a transaction. One customer may respond by phone, another by email and a third by direct mail. You may need to go through several databases and compile the data."

Modelers also put time into repetitive naming conventions for these data, missing-value imputations, data-validation tasks, data-merging efforts and the pesky process of culling outliers -- customers who might throw off the accuracy of the model because they exhibit such wildly different behavior from the norm. Plus, modelers have to transform these data into formats the modeling tools can use.

As Balkan explains, different model variables or customer characteristics, such as age and gender, must be transformed or translated into modeling-tool language. Transformation involves a series of easy tasks, she says, "but when you have a team, everyone may be transforming the same variable in different ways. You lose track of which is the best."
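A minimal sketch of the prep chores described above, in plain Python: imputing a missing value, culling an outlier, and agreeing on one transformation for a variable. The records, field names, and thresholds here are invented for illustration; real modeling teams would pull these rows from several databases and use far richer logic.

```python
import statistics

# Hypothetical customer records compiled from several sources.
customers = [
    {"age": 34, "gender": "F", "spend": 120.0},
    {"age": None, "gender": "M", "spend": 95.0},    # missing age
    {"age": 41, "gender": "F", "spend": 15000.0},   # wildly atypical spender
    {"age": 29, "gender": "M", "spend": 80.0},
]

def impute_age(rows):
    """Fill missing ages with the mean of the known ages."""
    known = [r["age"] for r in rows if r["age"] is not None]
    mean_age = statistics.mean(known)
    for r in rows:
        if r["age"] is None:
            r["age"] = mean_age
    return rows

def cull_outliers(rows, key="spend", z=1.5):
    """Drop rows more than z standard deviations from the mean."""
    values = [r[key] for r in rows]
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return [r for r in rows if abs(r[key] - mu) <= z * sigma]

def encode_gender(rows):
    """One agreed-upon transformation: gender as a 0/1 indicator."""
    for r in rows:
        r["gender_f"] = 1 if r["gender"] == "F" else 0
    return rows

prepared = encode_gender(cull_outliers(impute_age(customers)))
```

The point of `encode_gender` is exactly the coordination problem Balkan describes: if every team member encodes the same variable differently, the versions cannot be compared, so the transformation is defined once and shared.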

Plus, such chores are less efficient than they could be. "Since the modeling tools are not attached or logically interconnected with the database, those activities are sort of kludgy," says Goul. Kludgy is IS jargon that builds off the word kludge: a cobbled-together workaround that seems clever but actually is inelegant.

And, these kludgy activities are repetitive. "How do we get new data to build the next version of the model? We have to go through the whole process again," Goul adds.

The truth, the "hole" truth

Meanwhile, each time a modeler pulls a subset of data from the database, he or she is taking a snapshot in time. However, with each customer transaction, the picture changes, which means that any given sample may eventually contain holes.

Balkan spent 10 years building models for a major financial company, and she recalls members of her team reporting different numbers when asked questions along the lines of how many people had responded to a given credit-card offering. "It happens all the time," she says. "The boss wants one report and gets several people sending in numbers that don't match. Everyone may understand a question differently or pull data in different ways."

She continues: "In-database analytics allows you to have a single source of truth." That means what you see in the model is what actually exists in the source data, because the model is performing certain steps of its process right inside that database.
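The "single source of truth" idea can be sketched with Python's built-in `sqlite3` module: instead of each analyst pulling a private extract and counting it their own way, one query runs inside the database and everyone reports the same number. The table and rows below are purely hypothetical.

```python
import sqlite3

# Hypothetical response log living in the database itself.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE responses (customer_id INTEGER, channel TEXT)")
con.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [(1, "phone"), (2, "email"), (3, "mail"), (3, "email")],
)

# One shared in-database query: customer 3 responded twice, but is
# counted once, so every analyst gets the same answer.
(responders,) = con.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM responses"
).fetchone()
```

Because the deduplication happens in the query, "how many people responded?" has exactly one answer, no matter who asks.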

It's faster, too. Goul says the integration of the modeling tool and database facilitates more rapid modeling because it takes advantage of the parallel-processing capabilities inherent in database systems. Parallel processing occurs when a large task is broken up into smaller pieces that are performed concurrently, which gets the whole job done faster. For example, a retail store is using parallel processing when there are multiple clerks manning cash registers to ring up customers' purchases.

With in-database analytics, "You get the ability of the database to do its parallel-processing magic on creating data sorts, and you leverage your modeling tool suite to move the whole process along faster," Goul says.
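The divide-and-combine idea behind that parallel-processing "magic" can be illustrated in a few lines. This sketch uses threads and made-up transaction amounts purely to show the pattern; a real database engine would distribute the chunks across processors or nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up transaction amounts standing in for a large table.
purchases = list(range(1, 10_001))

def chunk_total(chunk):
    """One 'clerk': total up a single slice of the data."""
    return sum(chunk)

def parallel_total(data, workers=4):
    """Split the job into chunks, run them concurrently, merge results."""
    size = len(data) // workers
    chunks = [data[i * size:(i + 1) * size] for i in range(workers - 1)]
    chunks.append(data[(workers - 1) * size:])  # remainder goes to last worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_total, chunks))

total = parallel_total(purchases)
```

Each worker plays the role of one cash-register clerk in Goul's analogy; the final `sum` merges their partial tallies into the answer for the whole store.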

Not surprisingly, many technology vendors are now focused on delivering in-database modeling techniques. Goul and Balkan go a step further. While everyone else has been focused on building a better model through in-database analytics, the two scholars realized that this new, integrated functionality can also support processes that can, and probably should, occur in the heat of a campaign, dramatically improving its effectiveness.

"Building a model is not going to create value for a company unless it is deployed," Balkan says. "And deployment has a lot of steps. When we looked at the literature, examining many sources, we found all the articles stopped with instructions to 'deploy the model.'"

So, Balkan and Goul created DEEPER, their methodology designed to help model builders continue using their systems savvy for start-to-end campaign management.

As Goul explains, DEEPER addresses the steps of campaign management that start once the model is built. "How will we deploy this model? What should it enable to help us be more competitive, make more money or treat our customers better?" he asks. There are a lot of design issues involved in this, and that's the first step of the DEEPER process, which Goul and Balkan call "design." They're talking about the design of a campaign, the offers, the up-sell strategies and similar marketing choices.

Once campaign design is done, managers need to "embed" the campaign into company processes. For example, this may involve training employees on a new product offering or making sure call-center representatives see some kind of up-sell prompt on their computer screens when customers with certain characteristics are on the line.
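The call-center example of "embedding" might reduce to a rule like the following sketch, where a model score decides whether the representative sees a prompt. The function name, threshold, and fields are hypothetical, not from Goul and Balkan's work.

```python
def upsell_prompt(customer):
    """Return an on-screen prompt when a caller matches the target profile.

    'score' stands in for a predictive model's output; the 0.7 cutoff
    is an invented business rule.
    """
    if customer["score"] >= 0.7 and not customer["has_premium_card"]:
        return "Offer the premium card at the intro rate."
    return None
```

Wiring such a check into the call-center screen is what turns a model from a report on an analyst's desk into a live step in a business process.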

Then, analysts need to evaluate their campaign. "Do we have the right model? Do we build a new one? Should we shut down the campaign?" Those are the kinds of questions managers will be seeking to answer in the evaluation phase of the DEEPER process, Goul says.

The "P" in DEEPER stands for performance measurement. According to Balkan, none of the literature on predictive modeling incorporates this step, which might involve, for example, response tracking in a direct-mail campaign. "You have to do it," she says. "You may find the campaign works really well on males age 35-45. Then what?" How do you capitalize on such knowledge quickly?
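Response tracking of the kind Balkan describes is, at bottom, a segmented rate calculation. This sketch uses an invented response log and segment definition to show how a team might spot that males 35-45 are responding well.

```python
# Hypothetical direct-mail response log; rows and rates are invented.
responses = [
    {"gender": "M", "age": 38, "responded": True},
    {"gender": "M", "age": 42, "responded": True},
    {"gender": "F", "age": 30, "responded": False},
    {"gender": "M", "age": 55, "responded": False},
    {"gender": "F", "age": 40, "responded": True},
]

def response_rate(rows, predicate):
    """Share of customers matching `predicate` who responded."""
    segment = [r for r in rows if predicate(r)]
    return sum(r["responded"] for r in segment) / len(segment)

def males_35_45(r):
    return r["gender"] == "M" and 35 <= r["age"] <= 45

segment_rate = response_rate(responses, males_35_45)
overall_rate = response_rate(responses, lambda r: True)
```

Comparing `segment_rate` against `overall_rate` is what surfaces the "works really well on males 35-45" finding that the retargeting step then feeds into the next campaign.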

According to the DEEPER methodology, you do it through the next step -- retargeting -- where you incorporate your campaign results and findings into the design and development of an all-new campaign.

Balkan and Goul maintain that BI professionals are doing a great job using methods like SEMMA or CRISP-DM to guide their model building, but no one has targeted the campaign deployment side of things. "It's another role for in-database analytics," Goul explains.

Already, he and Balkan have been invited to conferences to spread the DEEPER gospel. They also plan to roll it out in business-school curricula, beginning with "Introduction to Information Systems." Those "intro" students may not be sophisticated statisticians and model-builders yet, "but the business model for how we get models deployed is something we have to get into our curriculum as fast as possible," Goul says.

"Predictive analytics is how corporations keep inventories down, reduce fraud, target customers and more," he continues. "We see DEEPER as having a direct link to what undergraduates need to learn to support campaign-driven changes in business processes."

Bottom Line:

  • Predictive modeling helps companies forecast customer behavior, such as who might be the best prospects for certain products and services.
  • Until recently, predictive models have been built in tools that are completely separate from the databases containing the customer information upon which the models are built.
  • Integration of the modeling and database software is beginning to take place, leading to "in-database analytics."
  • In-database analytics will help companies with model building, as well as with campaign deployment if campaign developers and managers use DEEPER, a new methodology developed by two W. P. Carey School professors.