- What is the difference between Supervised and Unsupervised data mining?
- Prediction problems where the variables have numeric values are most accurately defined as __.
- Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes?
a) associations
b) visualization
c) classification
d) clustering
- Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features?
- The basic idea behind a(n) __ is that it recursively divides a training set until each division consists entirely or primarily of examples from one class.
- Because of its successful application to retail business problems, association rule mining is commonly called __.
You’ve been hired by the New York Mets baseball team as a junior data scientist to work on a range of marketing projects. Management is currently pushing your group to assist with a season ticket purchasing campaign. Past analysis shows that the likelihood of a customer purchasing season tickets decreases by approximately 30% after opening day. It is therefore critical to get customers to purchase these season ticket packages now.
The best proven method (based on historical data) for converting prospects into season ticket buyers is an individual phone call from the Mets season ticket sales team. The problem is that there are only about 50 salespeople and approximately 10,000 prospects in your CRM system who have been identified as potential season ticket customers. Note that the maximum list a single salesperson can maintain is 100 people, so you will need to determine a prioritized list of around 500 people out of the 10,000. This is the bottleneck, and your job is to resolve it.
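The capacity constraint above amounts to a top-k selection: give each prospect a score, then keep only as many prospects as the sales team can call. The following is a minimal sketch, assuming a per-prospect score (for example, a predicted purchase likelihood) is already available. The function name `prioritize`, the prospect IDs, and the score values are all hypothetical and are not part of the Mets data.

```python
import heapq

def prioritize(scores, list_size):
    """Return the `list_size` prospect IDs with the highest scores.

    `scores` maps a prospect ID to a hypothetical purchase-likelihood
    score (an assumption; the scenario does not define one).
    """
    # nlargest avoids fully sorting all prospects when only the top few are needed
    return heapq.nlargest(list_size, scores, key=scores.get)

# Illustrative scores only; real ones would come from a model or business rule.
example_scores = {"p1": 0.12, "p2": 0.87, "p3": 0.45, "p4": 0.91, "p5": 0.30}
top_prospects = prioritize(example_scores, list_size=3)
print(top_prospects)
```

With the full CRM table, `list_size` would be set to the target list length stated in the scenario, and the resulting IDs could then be split evenly among the salespeople.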
Data
Data at your disposal from the NY Mets Data Warehouse
• Mets Customer CRM Database
o Basic CRM data for existing customers and prospects (e.g., Names, Income, Family and Dependents, Age Band, whether someone has purchased tickets in the past)
• LinkedIn
o Connection information from LinkedIn (e.g., who is connected with whom, Names, Job Titles)
• Stubhub.com
o Prior games purchased and attended by prospective customers (e.g., Names, Games Attended, Ticket Price of past Games, Concessions purchased at the games)
• Mets.com
o Transactional purchase data for merchandise (e.g., Hats, T-shirts, Jerseys)
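Because the CRM, StubHub, and LinkedIn sources all list names, one possible preparation step is to combine the sources into a single record per prospect. The sketch below is only illustrative: it assumes the merchandise data also carries a customer name, that names match exactly across sources (in practice they often will not), and every field name and sample row is hypothetical. LinkedIn connections are left out for brevity.

```python
from collections import defaultdict

def build_profiles(crm_rows, stubhub_rows, merch_rows):
    """Merge per-source rows into one feature dict per prospect name."""
    # Every prospect starts with zero counts so missing activity is explicit
    profiles = defaultdict(lambda: {"games_attended": 0, "merch_spend": 0.0})
    for row in crm_rows:
        profiles[row["name"]].update(
            age_band=row["age_band"], past_buyer=row["past_buyer"]
        )
    for row in stubhub_rows:
        # One StubHub row is assumed to represent one game attended
        profiles[row["name"]]["games_attended"] += 1
    for row in merch_rows:
        profiles[row["name"]]["merch_spend"] += row["amount"]
    return dict(profiles)

# Hypothetical sample rows, not real Mets data
crm = [{"name": "A. Fan", "age_band": "25-34", "past_buyer": True}]
stubhub = [{"name": "A. Fan", "game": "game-1"}, {"name": "A. Fan", "game": "game-2"}]
merch = [{"name": "A. Fan", "amount": 35.0}]

profiles = build_profiles(crm, stubhub, merch)
print(profiles["A. Fan"])
```

A table of this shape, with one row per prospect, is the usual input for both supervised and unsupervised techniques.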
Questions
1) From the information in the data sources above, would you be able to implement a supervised model? If so, what would your label variable be?
2) Explain how you would use at least three different data mining techniques to address this problem, and describe how each one would be applied.