
Read the chapter carefully and then do the work in the document I uploaded. Please just answer the questions in that document.

Sample for RFM Analysis Module 4, Assignment #2
Cust_ID  TSLO  NM_ORD  DOLL_CR  ORDER
1001     2     3       66       1
1002     4     1       26       0
1003     3     2       50       0
1004     12    3       56       1
1005     15    4       83       1
1006     5     9       220      1
1007     3     6       150      1
1008     2     7       155      1
1009     1     2       52       1
1010     2     2       42       1
1011     5     1       26       0
1012     4     2       42       0
1013     2     1       29       1
1014     4     3       77       1
1015     16    1       18       0
1016     18    2       36       0
1017     7     3       60       1
1018     4     7       160      1
1019     3     10      240      1
1020     9     2       38       0
1021     7     2       42       0
1022     4     1       27       0
1023     5     2       35       0
1024     9     4       84       1
1025     10    2       39       0
1026     3     4       86       1
1027     7     2       42       0
1028     1     1       31       1
1029     9     2       40       0
1030     7     2       42       0
1031     5     5       122      1
1032     12    3       55       0
1033     4     2       55       1
1034     3     3       76       0
1035     6     3       77       0
1036     6     6       133      1
1037     12    3       60       0
1038     8     3       60       0
1039     7     4       84       0
1040     5     1       21       0
1041     10    2       38       0
1042     4     5       114      1
1043     10    2       40       0
1044     7     1       20       0
1045     6     1       21       0
1046     5     8       191      1
1047     7     7       158      0
1048     4     5       121      1
1049     2     2       46       1
1050     5     4       85       1
1051     3     10      240      1
1052     2     7       155      0
1053     3     6       150      0
1054     2     3       66       0
1055     1     2       52       1
1056     3     2       50       0
1057     2     2       46       0
1058     2     2       42       1
1059     1     1       31       0
1060     2     1       29       0
1061     4     7       160      0
1062     4     5       121      1
1063     4     5       114      1
1064     3     4       86       0
1065     4     3       77       1
1066     3     3       76       0
1067     4     2       42       1
1068     4     2       55       0
1069     4     1       27       1
1070     4     1       26       0
1071     5     9       220      0
1072     5     8       191      1
1073     6     6       133      0
1074     5     5       122      0
1075     5     4       85       0
1076     6     3       77       1
1077     5     2       35       0
1078     5     1       26       0
1079     5     1       21       0
1080     6     1       21       0
1081     7     7       158      1
1082     7     4       84       0
1083     9     4       84       0
1084     7     3       60       1
1085     8     3       60       0
1086     9     2       38       1
1087     7     2       42       0
1088     7     2       42       0
1089     7     2       42       0
1090     7     1       20       0
1091     15    4       83       0
1092     12    3       56       1
1093     12    3       55       0
1094     12    3       60       1
1095     10    2       39       0
1096     10    2       38       1
1097     18    2       36       0
1098     9     2       40       0
1099     10    2       40       0
1100     16    1       18       0
1101     3     10      240      1
1102     2     7       155      1
1103     3     6       150      1
1104     2     3       66       1
1105     1     2       52       0
1106     3     2       50       0
1107     2     2       46       1
1108     2     2       42       1
1109     1     1       31       1
1110     2     1       29       0
1111     4     7       160      1
1112     4     5       121      0
1113     4     5       114      0
1114     3     4       86       1
1115     4     3       77       1
1116     3     3       76       0
1117     4     2       42       0
1118     4     2       55       0
1119     4     1       27       0
1120     4     1       26       1
1121     5     9       220      1
1122     5     8       191      0
1123     6     6       133      1
1124     5     5       122      1
1125     5     4       85       0
1126     6     3       77       0
1127     5     2       35       0
1128     5     1       26       0
1129     5     1       21       0
1130     6     1       21       1
1131     7     7       158      1
1132     7     4       84       1
1133     9     4       84       1
1134     7     3       60       1
1135     8     3       60       0
1136     9     2       38       0
1137     7     2       42       0
1138     7     2       42       0
1139     7     2       42       1
1140     7     1       20       0
1141     15    4       83       0
1142     12    3       56       0
1143     12    3       55       0
1144     12    3       60       0
1145     10    2       39       0
1146     10    2       38       0
1147     18    2       36       0
1148     9     2       40       0
1149     10    2       40       0
1150     16    1       18       0
EXERCISES – CHAPTER 10
1. Below are two customer records as of 3/1/99. Score these two customers on the Mega Telecom response model displayed in Figure
10.3 of Chapter 10:
a) What are the regression scores for both customers?
b) Which customer is more likely to order the high-speed Internet service from Mega Telecom?
Customer                                          G. Verizon   B. Atlantic
Age                                               26           45
Total # services ordered ever                     5            2
Monthly dollars paid, Sept 1998                   $67          $104
Monthly dollars paid, Oct 1998                    $75          $107
Monthly dollars paid, Nov 1998                    $90          $110
Monthly dollars paid, Dec 1998                    $125         $220
Monthly dollars paid, Jan 1999                    $101         $150
Monthly dollars paid, Feb 1999                    $76          $98
Outside questionnaire: Do you use the Internet?   Yes          No
Non-basic services currently active on:
  Long Distance                                   Yes          Yes
  Wireless                                        Yes          No
  Multiple Lines                                  Yes          No
  Toll-Free Number                                No           No
2. Below is a regression model predicting who is likely to order a new gardening book. Comment on each variable used in
this model regarding issues of multicollinearity or significance. Note: The dependent variable used in developing this
model was the typical binary response indicator (1=order, 0=silent).
R Square      0.097837658
Observations  10,000
Variable      Coefficients   P-value
Intercept      0.245769845   0.003495834
TSLO          -0.024839432   0.049583433
RATIO_PD/PR    0.038928324   0.074394232
NM_ORD         0.062244305   0.004938432
0_6M ORDER    -0.084759345   0.048329232
GARDENING      0.029798843   0.145938432
Where TSLO = time since last order in months
Where RATIO_PD/PR = ratio of total products paid to total promotions
Where NM_ORD = total number of orders ever
Where 0_6M ORDER = 1 if customer placed an order in the past six months, 0 otherwise
Where GARDENING = 1 if customer answered yes to question “Do you enjoy gardening,” 0 otherwise
3. A sample of 25,000 names from the ACME Direct database was test mailed for a new music series product offering. The test had
a response rate of 40% for the initial free shipment. To make transmittal of this sample easy, we have sampled down this file to
only 150 names out of the 25,000 tested. I will email the full 150-name sample data set to you.
Below is a printout of what the Excel file emailed to you should look like for the first 25 customers of the full 150-name
sample.
Cust_ID  TSLO  NM_ORD  DOLL_CR  ORDER
1001     2     3       66       1
1002     4     1       26       0
1003     3     2       50       0
1004     12    3       56       1
1005     15    4       83       1
1006     5     9       220      1
1007     3     6       150      1
1008     2     7       155      1
1009     1     2       52       1
1010     2     2       42       1
1011     5     1       26       0
1012     4     2       42       0
1013     2     1       29       1
1014     4     3       77       1
1015     16    1       18       0
1016     18    2       36       0
1017     7     3       60       1
1018     4     7       160      1
1019     3     10      240      1
1020     9     2       38       0
1021     7     2       42       0
1022     4     1       27       0
1023     5     2       35       0
1024     9     4       84       1
1025     10    2       39       0
The legend for each variable is as follows:
Cust_ID = unique customer id number
TSLO = Elapsed time, in months, since last order
NM_ORD = Total number of orders since coming on file
DOLL_CR = Total dollars credited/spent since coming on file
ORDER = 1 means the customer ordered the music series product offering, 0 means they were silent (did not order)
Perform the following using Excel:
a) Using the full 150 sample data set, run a multiple regression model using all three variables simultaneously (TSLO,
DOLL_CR, and NM_ORD) as your predictors and using the order indicator (ORDER) as the dependent variable.
b) Examine the output. Do you see any problems with the coefficients that may be caused by multicollinearity with your
predictors? If so, run a correlation analysis to confirm. What do you see?
c) If there was a problem with one of the variables being correlated with another, rerun your model without the problem variable.
Does everything appear okay now with respect to the coefficients?
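Excel's regression tool (parts a and c) fits ordinary least squares. For readers who want to cross-check the Excel coefficients, here is a minimal pure-Python sketch that solves the normal equations (X'X)b = X'y directly. It is an illustration, not the exercise's required method: `solve`, `ols_fit`, `SAMPLE`, and `ROWS` are hypothetical names, only the 25 customers printed above are hard-coded (the full 150-name file would be loaded the same way), and it reports coefficients only, with no p values.

```python
"""Minimal OLS sketch for Exercise 3 (pure Python; hypothetical helper names)."""


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_fit(rows, predictors, target="ORDER"):
    """Least-squares coefficients of target on an intercept plus the given predictors."""
    X = [[1.0] + [float(r[p]) for p in predictors] for r in rows]
    y = [float(r[target]) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]  # X'X
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]            # X'y
    return dict(zip(["Intercept", *predictors], solve(xtx, xty)))


# First 25 customers from the printout: (Cust_ID, TSLO, NM_ORD, DOLL_CR, ORDER)
SAMPLE = [
    (1001, 2, 3, 66, 1), (1002, 4, 1, 26, 0), (1003, 3, 2, 50, 0),
    (1004, 12, 3, 56, 1), (1005, 15, 4, 83, 1), (1006, 5, 9, 220, 1),
    (1007, 3, 6, 150, 1), (1008, 2, 7, 155, 1), (1009, 1, 2, 52, 1),
    (1010, 2, 2, 42, 1), (1011, 5, 1, 26, 0), (1012, 4, 2, 42, 0),
    (1013, 2, 1, 29, 1), (1014, 4, 3, 77, 1), (1015, 16, 1, 18, 0),
    (1016, 18, 2, 36, 0), (1017, 7, 3, 60, 1), (1018, 4, 7, 160, 1),
    (1019, 3, 10, 240, 1), (1020, 9, 2, 38, 0), (1021, 7, 2, 42, 0),
    (1022, 4, 1, 27, 0), (1023, 5, 2, 35, 0), (1024, 9, 4, 84, 1),
    (1025, 10, 2, 39, 0),
]
FIELDS = ("Cust_ID", "TSLO", "NM_ORD", "DOLL_CR", "ORDER")
ROWS = [dict(zip(FIELDS, t)) for t in SAMPLE]

if __name__ == "__main__":
    for name, b in ols_fit(ROWS, ["TSLO", "NM_ORD", "DOLL_CR"]).items():
        print(f"{name:>9}: {b: .6f}")
```

For part (c), call `ols_fit` again with the problem predictor removed from the list, for example `ols_fit(ROWS, ["TSLO", "DOLL_CR"])` if NM_ORD were the variable to drop.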
Multiple Regression Modeling
Exhibit 10.16 Enterprise Miner Process Flow Diagram (process flow nodes: Define Sample, Create Validation, Regression, CHAID Tree, Neural Network, Compare Results)
for validation. Next, the analysis sample is fed into the regression node, tree
node, and neural network node. Last, the results of each modeling
technique are fed into the assessment node for comparison via validation
lift charts. No programming code is required. All code is built behind the
scenes upon the execution of this process flow.
On the basis of the results of the lift chart comparisons (see Exhibit
10.17), the analyst can go into, for example, the regression node and
fine-tune the options available. After fine-tuning, the analyst re-executes
the process flow.
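The validation lift charts compared in Exhibit 10.17 rest on a simple calculation. A minimal sketch, assuming each validation name already has a model score and an actual 0/1 response: rank names by score, cut them into equal groups, and index each group's cumulative response rate to the overall rate (100 = average). The function name and the 100-based index are our own conventions, not Enterprise Miner's, and the toy data in the usage block are invented.

```python
def cumulative_lift(scores, responses, groups=10):
    """Cumulative response rate and lift index (100 = average) for each score group.

    Names are ranked by score, highest first, and cut into `groups` roughly
    equal slices (deciles by default).
    """
    if not scores or len(scores) != len(responses):
        raise ValueError("scores and responses must be non-empty and equal length")
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall_rate = sum(r for _, r in ranked) / n
    if overall_rate == 0:
        raise ValueError("no responders: lift is undefined")
    table = []
    for g in range(1, groups + 1):
        depth = max(1, round(n * g / groups))  # number of names in the top g groups
        rate = sum(r for _, r in ranked[:depth]) / depth
        table.append((g, rate, 100 * rate / overall_rate))
    return table


if __name__ == "__main__":
    # Toy validation data, invented for illustration only.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    responses = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    for group, rate, lift in cumulative_lift(scores, responses, groups=5):
        print(f"top {group}/5: response {rate:.0%}, lift {lift:.0f}")
```

Comparing the lift column across the regression, tree, and neural network scores on the same validation names is the numeric version of the chart comparison.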
Data mining tools also offer graphical capabilities. For example,
Exhibit 10.18 shows the distribution of the variable “gender.” Two- and
three-dimensional views can also be created by the Enterprise Miner
showing how the orders distribute for each category of the predictor
variable.
MarketMiner is also built with a GUI and is point-and-click
operated. With MarketMiner, you do not build a process flow diagram,
as in Enterprise Miner, but rather select and save the various analysis
steps you desire in a project folder.
OPTIMAL DATABASE MARKETING
Exhibit 10.17 Enterprise Miner Lift Chart Comparisons (%Response by percentile for the Baseline, Neural, Tree, and Reg models)
Begin by selecting the source of the data via the “Data” tab, as shown
in the following screen print (Exhibit 10.19).
Once the data source is selected, click on the “Transforms” tab as shown
in Exhibit 10.20. In this area of MarketMiner, you create the holdout
sample for validation or training. In addition, this is where you perform the
recoding of your data prior to running any models.
Next, select the various analysis techniques you wish to employ by
clicking on the “Mining Agendas” tab as shown in Exhibit 10.21. In this
area, you have several options, including logistic regression analysis,
tree segmentations, and even cluster analysis as discussed in Chapter 8.
You can learn more about the MarketMiner software package at its
company web site, www.marketminer.com.
Remember, data mining tools do not provide a quick analytical solution.
But they do have the capability, if used properly, to help analysts be more
efficient and effective in their role by providing a suite of analytical tools
at their ready disposal.
Exhibit 10.18 Enterprise Miner Graphing Example (variable histogram of GENDER, by percentage)
Exhibit 10.19 Defining the Analysis Data Set in MarketMiner (Data tab: import a delimited or fixed-width text file)
Exhibit 10.20 Performing Subsampling and Recoding Techniques in MarketMiner (Transforms tab: mapping, feature extraction, and sampling)
Exhibit 10.21 Selecting the Analysis Techniques to Employ in MarketMiner (Mining Agendas tab: strategies include StatNet, Logistic Regression, Decision Tree (C4.5), and K-Nearest Neighbor)
Ensuring That Your Model Holds Up in Rollout
The success or failure of a model has nothing to do with whether it is built
on an internal or external file or whether it was built to predict response or
payment. It depends instead on following certain rules and guidelines of
model building; if you do not adhere to them, your model will fail, regardless
of its type. It is that simple.
You can use six guidelines, some of which we have already discussed, to
ensure stability when building any model. We call these guidelines
Modeling with MUSCLE. Adhering to the MUSCLE guidelines will help
ensure that your models are robust and hold up when applied in rollout.
Materials Treatment Consistency Between Test and Rollout
If you drastically change your offer or creative approach between test and
rollout, the responders identified by your model will no longer be valid in
rollout. For example, a model built to predict which customers are most likely
to order a product via a soft risk-free offer will differ from a model built for
the same product but with a hard offer. The model built with the soft offer will
incorporate some variables that identify marginally performing names that in
all probability will not be attracted to a hard offer. Using this model to select
names to promote with a hard offer is therefore not advised. The model will
be selecting the wrong types of names to be promoted with a hard offer.
Changes in creative approach can also cause a subtle shift in the composition
of responders. When you build a response model, you are not only modeling
responders to your product but also the product offer (price, terms, etc.).
Universe Application
When you build a response model, you are building it on a sample taken
from a particular universe of customers. Therefore, your forecast based on
this response model will only be valid when applied to the same universe of
customers the model was built on. For example, you cannot build a model
on an entire universe of customers, apply the model to only those customers
within that universe who are over the age of 30, and expect the same
forecasted gains. The model will not hold up as originally forecasted.
To guarantee that your universe definitions are consistent between test
and rollout, always check to ensure that the customers your programmer or
list shop is pulling meet your specifications. To verify consistency between
the names defined at test and at rollout, compare the distribution of
regression model scores. For example, if 10% of the test names the model was
built on have a score above 0.2576, you should expect close to the same
percentage of names scoring above 0.2576 at rollout. If not, you probably
have definitional inconsistencies.
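The score-distribution check described above can be scripted. The sketch below is a minimal illustration under stated assumptions: test and rollout names are scored with the same model, 0.2576 is the cutoff from the example, and the two-percentage-point tolerance is our own placeholder, since the text asks only that the shares be close. The function names are hypothetical and the scores in the usage block are invented.

```python
def share_above(scores, cutoff):
    """Fraction of scores strictly above the cutoff."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(1 for s in scores if s > cutoff) / len(scores)


def check_universe(test_scores, rollout_scores, cutoff=0.2576, tolerance=0.02):
    """Compare the share of names scoring above `cutoff` at test and at rollout.

    Returns (test_share, rollout_share, consistent). `tolerance` is an
    assumed allowable gap between the shares, not a value from the text.
    """
    test_share = share_above(test_scores, cutoff)
    rollout_share = share_above(rollout_scores, cutoff)
    return test_share, rollout_share, abs(test_share - rollout_share) <= tolerance


if __name__ == "__main__":
    # Invented scores: 10% of test names vs. 30% of rollout names above the cutoff.
    test = [0.30] + [0.10] * 9
    rollout = [0.30] * 3 + [0.10] * 7
    print(check_universe(test, rollout))
```

A `False` in the last position is the signal the text describes: the rollout names probably were not pulled with the same universe definition as the test names.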
In addition, be aware that your outside list universes may change between
test and rollout without your knowledge. This can happen when list owners change
the way they build their files. If the names on a list you regularly rent were
obtained by the list owner in a different way this month versus last month
(e.g., through a new offer), you can expect the composition of these names to change,
resulting in a difference in response to your promotions. Because you have
no say in how list owners obtain their customers via the “offer,” the best you
can do is to stay informed of any changes in the list owners’ promotional
strategies. If you notice a major change in a list owner’s offer, you may
want to consider building a new response model or, at the very least,
adjusting your forecast. This also applies to compiled lists.
Split the Sample for Validation
Before performing your analysis, a portion of the sample should be set
aside and used to validate the findings, as previously discussed in Chapter 6.
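A validation sample can be set aside with a seeded random split before any modeling begins. A minimal sketch; the 50 percent holdout default, the seed, and the name `split_sample` are assumptions for illustration, not proportions prescribed by the text.

```python
import random


def split_sample(records, holdout_fraction=0.5, seed=2024):
    """Shuffle records reproducibly and return (analysis, validation) lists."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_validation = round(len(shuffled) * holdout_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


if __name__ == "__main__":
    analysis, validation = split_sample(range(1001, 1151))
    print(len(analysis), len(validation))
```

Using a private `random.Random` instance keeps the split reproducible without touching the global random state, so the same names land in the validation half every time the analysis is rerun.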
Correlation Analysis
When you build a regression model, each predictor variable used in the
model must be independent of (that is, not highly correlated with) the others,
as previously discussed in this chapter.
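One way to run this check (and the correlation analysis asked for in Exercise 3(b)) is to compute the Pearson correlation for every pair of predictors. A minimal pure-Python sketch; the 0.7 flag threshold and the function names are assumptions, not a cutoff given in the text.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant column has no defined correlation")
    return sxy / sqrt(sxx * syy)


def flag_correlated(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every predictor pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged
```

Pass the TSLO, NM_ORD, and DOLL_CR columns of the sample as lists keyed by variable name; any flagged pair is a candidate for dropping one of the two variables before refitting.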
Lift and Freeze Customer Attributes at Point-in-Time of the Promotion
For an analysis of past promotional behavior to be valid, the customer
characteristics residing on the sample must reflect the customers’ status at
the time they were promoted, as previously discussed in Chapter 6.
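Freezing attributes in practice means recomputing each predictor from history dated before the promotion rather than reading today's values from the file. A minimal sketch, assuming a hypothetical order history of `(customer_id, order_date, dollars)` tuples; treating a month as 30 days when computing TSLO is our own simplification, and `frozen_attributes` is an illustrative name.

```python
from datetime import date


def frozen_attributes(orders, customer_id, promo_date):
    """TSLO, NM_ORD, and DOLL_CR for one customer as of promo_date.

    `orders` is a hypothetical list of (customer_id, order_date, dollars) tuples.
    Only orders placed strictly before promo_date are counted, so activity
    after the promotion cannot leak into the predictors.
    """
    prior = [(d, amt) for cid, d, amt in orders if cid == customer_id and d < promo_date]
    if not prior:
        return None  # customer had no orders on file at promotion time
    last_order = max(d for d, _ in prior)
    return {
        "TSLO": (promo_date - last_order).days // 30,  # whole 30-day months (simplification)
        "NM_ORD": len(prior),
        "DOLL_CR": sum(amt for _, amt in prior),
    }


if __name__ == "__main__":
    history = [
        (1001, date(1998, 10, 1), 40),
        (1001, date(1999, 1, 15), 26),
        (1001, date(1999, 4, 1), 50),  # placed after the promotion, so ignored
    ]
    print(frozen_attributes(history, 1001, promo_date=date(1999, 3, 1)))
```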
Examine the p Values
All variables used in the final regression model must be significant, as
previously discussed in this chapter.
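This screen amounts to a filter over the regression output. A minimal sketch, assuming a 0.05 significance level (the text does not fix one here); the rows are the coefficients and p values printed in Exercise 2, and `flag_nonsignificant` is a hypothetical helper name. A p-value screen does not detect multicollinearity, which still requires the correlation check.

```python
def flag_nonsignificant(rows, alpha=0.05):
    """Return predictors whose p value exceeds alpha; the intercept is not screened."""
    return [name for name, _coef, p in rows if name != "Intercept" and p > alpha]


# (variable, coefficient, p value) from the Exercise 2 regression output
GARDEN_MODEL = [
    ("Intercept", 0.245769845, 0.003495834),
    ("TSLO", -0.024839432, 0.049583433),
    ("RATIO_PD/PR", 0.038928324, 0.074394232),
    ("NM_ORD", 0.062244305, 0.004938432),
    ("0_6M ORDER", -0.084759345, 0.048329232),
    ("GARDENING", 0.029798843, 0.145938432),
]

if __name__ == "__main__":
    print(flag_nonsignificant(GARDEN_MODEL))
```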
Strict adherence to these guidelines will help increase the odds that your
model will hold up as forecasted in rollout. Of course, this assumes that
occurrences outside your control do not affect your ability to achieve the
forecasted gains. What do we mean by that? Consider the following example: Your
final response model identified Florida residents as a prime target area, and as
such had a high positive coefficient/weight associated with this region.
A devastating hurricane in Florida prior to delivery of the promotion would
cause your model to partially fail in rollout. These Florida names would no
longer perform as expected. Luckily, such exceptions are few and far between.
The bottom line is, if your model does not hold up as forecasted, odds
are that you did something wrong. Either the model used unstable
predictors with high p values, the universe changed between test and
rollout, some of the predictor variables were highly correlated with one
another, the characteristics upon which the model was based did not
reflect the customer’s status at the time of the promotion, or you lacked
a validation sample upon which to check your model.
Chapter Summary
Multiple regression is a more sophisticated technique that database
marketers use to predict customer response. In addition to predicting
customer response to an order, multiple regression modeling can also be
used to predict other customer behavior such as payment, returning a
product, or renewing a subscription. Database marketers have to apply
multiple regression modeling techniques in a specific way to optimize the results
and avoid errors. Preparation of the data prior to analysis is an important
step. Variables must be in a proper form for the regression analysis to yield
meaningful results. Often variables have to be recoded for the analysis. In
addition, certain assumptions have to be met about the data. For example,
strong relationships between predictor variables (multicollinearity) may lead
to an unstable model. Analysts attempt to eliminate multicollinearity from
the model to increase stability. The statistical significance of the model also
has to be examined. The significance reflects the degree of confidence that a
manager can place in the ability of the model to predict behavior. Modeling
on external files is also discussed. This allows a marketer to predict the
response of a list of prospects. This chapter also explores other regression
modeling methods, neural nets, and data mining programs. For experienced
analysts, these techniques can provide additional ways to examine the data
to help develop the most stable and effective predictive model. The chapter
concludes with six guidelines (MUSCLE) for model development to ensure
that the model holds up in rollout.
Review Questions
1. What types of customer behavior can be predicted with multiple
regression modeling?
2. Explain how data are prepared prior to multiple regression analysis.
3. How is multicollinearity detected and eliminated from a regression
model?
4. Discuss how variable significance is interpreted in regression output.
5. Discuss how data mining tools can help the model building process.
What are the possible problems with using these tools?
6. If you are building a payment model predicting which customers
are most likely to pay for a product, describe the sample composition.
7. What are the MUSCLE guidelines attempting to achieve?
Summarize the key issues of the guidelines.
Notes
1. How VIF scores are calculated is not discussed in this book. It is covered in most
intermediate or advanced statistics books.
2. The derivation of the p value and its statistical properties are not discussed but
can be found in any intermediate statistics book.
