**GCU MIS660 Assignments **

** GCU MIS660 Week 1 Assignment **

**TOPIC 1 AGGREGATING DATA**

The purpose of this assignment is to use a spreadsheet to create a visual representation of a data set.

For this assignment, you will use the “Heights” dataset. In the dataset, the heights (in mm) of n = 199 married couples are recorded. The data comes from a random sample from the much larger population of married couples. Complete each of the steps below to create a visual representation of the dataset.

*Part 1:*

Using Excel functions, calculate the following summary values for each of the three variables:

- Minimum
- First quartile
- Second quartile (Median)
- Third quartile
- Maximum
- Mean
- Range
- Sample standard deviation
- Sample variance
- Coefficient of variation

*Part 2:*

Address each of the following questions in a written Word document.

- On average, are husbands or wives taller? What is the average difference in millimeters between the two genders? Explain your answer.
- How would you interpret the median heights?
- Compare the means and the medians for each dataset. What initial conclusions can be made here regarding the “contour” of each dataset?
- Compare the standard deviation values. Which dataset (husbands or wives) has the most dispersion? What does your conclusion suggest?
- Given the answers in question 1, compare the variability of heights between husbands and wives. Which partner type is more likely to have extremely tall individuals (outliers)?
- Interpret the % coefficient of variation.

*Part 3:*

Your manager has requested some additional information from you regarding the data. Specifically, you have been asked to calculate the differences between “Male Heights” and “Female Heights.” Your manager is only interested in married couples in which the husbands are taller than their wives. Repeat the analyses requested in Part 1 for this new dataset. What conclusions can be drawn here? Include discussion about whether outliers exist in this dataset.

APA format is not required, but solid academic writing is expected.

This assignment uses a grading rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.

You are not required to submit this assignment to LopesWrite.

**GCU MIS660 Week 2 Assignment **

**TOPIC 2 DATA MANIPULATION**

The purpose of this assignment is to use spreadsheet capabilities to perform data manipulation and to explain the process used in the handling of the data.

For this assignment, you will use the “Claims” dataset. In the dataset, the claims data for n = 608 people are recorded. The data derive from a random sample of females diagnosed with ischemic heart disease over 24 months (see Exercise 7.27 in the textbook).

Instead of using urgent care centers, some people rely on the Emergency Room (ER) to address most, if not all, of their medical needs. In fact, someone who has three or more ER visits within 24 months is considered a high ER user. Complete the steps below to execute this assignment.

- Using the dataset and Excel, create a new column titled “High_ER_User” with “Yes” if three or more ER visits; otherwise “No.”
- Duration is measured in days, but 30-day intervals are more appropriate for most reporting purposes. Using Excel, create a new column titled “Duration_Months” by converting the duration into 30-day intervals.
- Many times complications and comorbidities are rare; therefore, these two negative events are summed together. Using Excel, create a new column titled “Comps_Comorbs” by adding complications with comorbidities.
- Many times age is grouped in 10-year intervals. Using Excel’s VLOOKUP function, create a new column titled “Age_Group” with grouped ages of “21-30 yrs,” “31-40 yrs,” and so on for 10-year intervals. The last age group would be “61-70 yrs.” Use a tab titled “Age_Groups” for this task.

Next you will create a pivot table with the data and execute the following (refer to the examples in the resource “Data Manipulation Screenshots”).

- Use “High_ER_User” as a filter to obtain two filtered views of the pivot table.
- Summarize the data to get counts of claims, sum of claims and months, and average of procedures, prescribed drugs, ER visits, and complications/comorbidities.
- Add a calculated field titled “Claims PM” to the pivot table. This calculated field is the sum of claims divided by the sum of duration months and measures the average claim amount per month (PM).

APA format is not required, but solid academic writing is expected.

This assignment uses a grading rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.

You are not required to submit this assignment to LopesWrite.

**GCU MIS660 Week 3 Assignment**

**TOPIC 3 VISUAL REPRESENTATION OF DATA**

The purpose of this assignment is to use pivot tables and pivot charts to aggregate data and to explain the process used for data aggregation.

For this assignment, you will use the “Claims 2” dataset. Use Excel pivot tables and pivot charts for this exercise.

*Part 1:*

Create a dashboard describing the data by age group (e.g., 21-30 yrs, 31-40 yrs, 41-50 yrs, 51-60 yrs, and 61-70 yrs). The dashboard should include the graphs and charts listed in the locations described. The dashboard should be a separate tab in Excel that only includes the five items below. A sample layout is provided below the dashboard description.

**Top Left:**Bar graph showing the average number of ER visits for each of the five age groups. Show the actual average values above each bar.**Middle Left:**Bar graph showing the average number of procedures for each of the five age groups. Show the actual average values above each bar.**Bottom Left:**Bar graph showing the average claim cost for each of the five age groups. Show the actual average values above each bar.**Top Right:**Pie chart showing the percent of the total sum of all claim costs for each of the five age groups. Show the actual percent values of each slice.**Bottom Right:**Line graph showing the percent of each age group that has one or more ER visits. Show the actual percent values of each group. To create this chart, first create a new calculated column, named “Has ER Visit,” that is equal to 1 when the patient has had one or more ER visits; otherwise 0. HINT: The average of a 0-1 column is a percent. Refer to the example in the resource “Visual Representation of Data Screenshot: Preview of the Excel Dashboard.”

*Part 2:*

Interpret the dashboard and the story it is attempting to tell users by writing a 250-word summary that clearly describes the merits of each of the charts used on the dashboard. For example, discuss why a pie chart might be more appropriate than a bar graph for highlighting the information you want key stakeholders to obtain by studying that content on the dashboard. Include specific discussion about why each specific chart is used to illustrate the data it represents.

APA format is not required, but solid academic writing is expected.

This assignment uses a grading rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.

You are not required to submit this assignment to LopesWrite.

**GCU MIS660 Week 4 Assignment**

**TOPIC 4 DATA VISUALIZATION WITH TABLEAU**

The purpose of this assignment is to use data visualization tools to aggregate and depict data and to interpret the data visualization results.

For this assignment, you will use the “Claims 2” dataset. You will use Tableau to replicate the dashboard you created in the Topic 3 assignment. In addition, you will compare and contrast the Excel and Tableau software.

**Part 1:**

Create a dashboard describing the data by age group (e.g., 21-30 yrs, 31-40 yrs, 41-50 yrs, 51-60 yrs, and 61-70 yrs). The dashboard should include the graphs and charts listed in the locations described. The dashboard should be submitted as a Tableau file. A sample layout is provided in the resource, “Data Visualization With Tableau Screenshot: Preview of the Tableau Dashboard.”

**Top Left:**Bar graph showing the average number of ER visits for each of the five age groups. Show the actual average values above each bar.**Middle Left:**Bar graph showing the average number of procedures for each of the five age groups. Show the actual average values above each bar.**Bottom Left:**Bar graph showing the average claim cost for each of the five age groups. Show the actual average values above each bar.**Top Right:**Pie chart showing the percent of the total sum of all claim costs for each of the five age groups. Show the actual percent values of each slice.**Bottom Right:**Line graph showing the percent of each age group that has one or more ER visits. Show the actual percent values of each group. To create this chart, first create a new calculated column, named “Has ER Visit,” that is equal to 1 when the patient has had one or more ER visits; otherwise 0. HINT: The average of a 0-1 column is a percent.

*Part 2:*

In 250 words, compare and contrast the use of Excel and Tableau in data visualization. Include specific discussion about the following in your summary.

- Software ease of use.
- Software visualization capabilities.
- Software limitations.
- Discussion of when each of these software programs is most appropriate for use.

APA format is not required, but solid academic writing is expected.

You are not required to submit this assignment to LopesWrite.

**GCU MIS660 Week 5 Assignment**

**TOPIC 5 BENCHMARK – TELLING THE ANALYTICS STORY**

The purpose of this assignment is to create a data story and communicate findings to key stakeholders.

*Part 1:*

For this assignment, you will use the “Arizona Incomes by Zip Code” dataset. You will use Tableau to create a data story that illustrates the median household incomes of Arizona residents. The data provided includes all zip codes in Arizona. Each record is a unique zip code. The data includes the following columns:

**Zip_Code:**An Arizona zip code.**Metro_Area:**Whether or not the zip code is within the Phoenix-metro area.**City:**The city name of the zip code.**Median_Income:**The median household income of each zip code based on 5-year estimates (2010-2014) from the U.S. Census Bureau.

The marketing manager has asked you to analyze income data for Arizona residents so that leaders in his department can determine the company’s advertising strategy. The marketing manager intends to share this data with other decisions makers and the marketing department staff so that everyone has a thorough understanding of how the income information can be used to determine specific target markets in upcoming advertising campaigns. Because most of these individuals do not have a strong understanding of analytics, they must be able gain the information listed below from studying the charts. In the data story, use visualizations, heat maps, boxplots, etc. to describe the following:

- How do incomes differ across zip codes within the Phoenix-metro area (using geo-mapping)?
- What is the relative difference in incomes across zip codes within the Phoenix-metro area (use a heat map)?
- What are the distribution of incomes across Arizona?
- How do incomes within the Phoenix metro area differ from those outside of that area?

*Part 2:*

Demonstrate the ability to communicate the analytics story to key stakeholders, including the marketing manager, by creating a 6-10 slide PowerPoint presentation (with speaker notes for each slide). Use the charts generated in Tableau to illustrate the data story as it relates to the income of Arizona residents. The slides and speaker notes should address the following for each chart presented.

- What information is the chart providing to stakeholders?
- Why is the information in the chart important to key stakeholders?
- How can this information be used in making decisions about how marketing dollars can be allocated?

Refer to the resource, “Creating Effective PowerPoint Presentations,” located in the Student Success Center, for additional guidance on completing this assignment in the appropriate style.

While APA format is not required for the body of this assignment solid academic writing is expected, and documentation of sources should be presented using APA formatting, guidelines, which can be found in the APA Style Guide, located in the Student Success Center.

This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.

You are not required to submit this assignment to LopesWrite.

Benchmark Information

This benchmark assignment assesses the following programmatic competencies:

MS Business Analytics

1.4: Utilize data visualization techniques to communicate findings.

**GCU MIS660 Week 6 Assignment**

**TOPIC 6 DATA DISTRIBUTIONS**

The purpose of this assignment is to apply data distributions to discrete and continuous data and justify the selection of the distributions.

For this assignment, you will use the “Random Variables” dataset. You will use SPSS to analyze the dataset and address the questions presented. Findings should be presented in a Word document along with the SPSS outputs.

*Part 1:*

Identify if the following random variables are discrete or continuous.

- Number of defected items in a shipment.
- Height of males (in mm) who attend Grand Canyon University.
- Yearly income among all people in the United States.
- Whether or not a high school graduate is accepted into a college.
- Time that it takes for a person to run a mile.
- The number of emergency hospital visits that each person had in the last 12 months.

*Part 2:*

Let *X* be a random variable of the outcome after rolling a six-sided die one time that is *not* fair. In fact, the die is designed to never result in a 1 or 6, while the other outcomes (i.e., 2, 3, 4, and 5) are equally probable.

- What are the individual probabilities for all possible values of
*X*? - What are the cumulative probabilities for all possible values of
*X*? - What is = ?
- What is = ?
- What is = ?

*Part 3:*

The dataset provided consists of the following random variables:

**BMI:**The body mass index of a random set of people.**Distance:**The distance (in feet) that a baseball player hit the ball.**Height:**The height of males (in mm).**Income:**The income (in dollars) of people in a large company.**Pass:**The outcome when taking an exam (1=Pass; 0=Fail).**Wait Time:**The time (in minutes) that it takes when waiting for the train.

Answer each question below. Use SPSS as needed, and include the software outputs as part of the Word document you submit.

- What is a Q-Q plot?
- Given a set of realized values of a random variable, how can a Q-Q plot be used to assess the distribution of the random variable?
- Using histograms and Q-Q plots (except for binomial), match each random variable to one of the following distributions: Binomial (with N=1, P=0.7), Chi-square (with d.f.=20), Exponential, Lognormal, Normal, and Uniform.

APA format is not required, but solid academic writing is expected.

You are not required to submit this assignment to LopesWrite.

**GCU MIS660 Week 7 Assignment **

**TOPIC 7 SIMPLE REGRESSION ANALYSIS**

The purpose of this assignment is to apply simple regression concepts, interpret simple regression analysis models, and justify business predictions based upon the analysis.

For this assignment, you will use the “Trucks” dataset. You will use SPSS to analyze the dataset and address the questions presented. Findings should be presented in a Word document along with the SPSS outputs.

The business characteristics of n = 250 U.S. trucking and delivery companies for calendar year 2011 were recorded. Among the characteristics studied were the number of drivers and the number of trucks (power units) each company employed.

*Part 1:*

Given that the data consists of counts and range of counts is large, a natural log transformation is usually performed to improve the linear model results. Apply a natural log transform to both variables and then plot the Y = log(Trucks) vs. X = log(Drivers).

Is there a linear relationship? Justify your answer by providing the SPSS output as an illustration.

**Part 2:**

Build a simple linear model by regressing Y on X and testing whether or not a relationship exists between the number of drivers and the number of trucks. Address the following questions in your written response:

- After fitting the model, plot the standardized residuals (on vertical axis) vs. the standardize predictions (on horizontal axis). Is there a pattern? How would you interpret the pattern or lack of pattern?
- After fitting the model, derive the normal probability plot and interpret what the plot means.
- What is the coefficient of determination, R2, of the model? How would you interpret the R2?
- What is the estimate of ?1? How would you interpret the estimate of ?1?
- Is the estimate of ?1 significantly different than 0? Assume an ? = 0.01.
- What is a 95% confidence interval for ?1? How would you interpret the 95% confidence interval for ?1?
- If a new trucking and delivery company with 4,900 drivers were to be formed, how many trucks would you expect the company would employ based on the model?

APA format is not required, but solid academic writing is expected.

You are not required to submit this assignment to LopesWrite.

**GCU MIS660 Week 8 Assignment **

**TOPIC 8 MULTIPLE REGRESSION ANALYSIS**

The purpose of this assignment is to apply multiple regression concepts, interpret multiple regression analysis models, and justify business predictions based upon the analysis.

For this assignment, you will use the “Strength” dataset. You will use SPSS to analyze the dataset and address the questions presented. Findings should be presented in a Word document along with the SPSS outputs.

The compressive strength (Y) of concrete is influenced by the mixing proportions and by the time that it is allowed to cure, although the exact relationship between the strength and the components is unknown. The provided data includes the results of n = 1030 concrete strength experiments that include the following:

**Strength**(in MPa): The compressive strength of the concrete.**Age**(in days): The number of days the concrete was allowed to cured.**Coarse_Aggregate**(in kg/m3): The proportion of coarse aggregate in the mix.**Fine_Aggregate**(in kg/m3): The proportion of fine aggregate in the mix.**Cement**(in kg/m3): The proportion of cement in the mix.**Slag**(in kg/m3): The proportion of furnace slag in the mix.**Superplasticizer**(in kg/m3): The proportion of plasticizer in the mix.**Water**(in kg/m3): The proportion of water in the mix.**Ash**(in kg/m3): The proportion of fly ash in the mix.

*Part 1:*

Derive various transformations of compressive strength to determine which transformation, if any, results in a variable that most closely mimics a normal distribution. To do this, plot Q-Q plots after each transformation listed below, and decide which one should be used to build a multiple linear model. Explain your answer and provide the SPSS output as an illustration.

- Strength (no transformation)
- Square root of Strength
- Squared Strength
- (Natural) Log of Strength
- Reciprocal of Strength

*Part 2:*

Based on the transformation selected in Part 1, build a multiple linear regression model with all eight predictors.

- Use t-tests to determine if any of the predictors significantly affect the compressive strength of concrete. Explain why each variable should or should not be included in the model. Assume ? = 0.05. Show the appropriate model results to explain your answer.
- If any predictors from question 1 are found to be not significant, remove them and re-run the model to create a reduced model (RM). Are all the remaining variables still statistically significant? Show the appropriate model results to explain your answer.
- Based on the RM, should there be concern about multicollinearity among the predictors selected? Show the appropriate model results to explain your answer.
- After fitting the RM, derive the residual plot (standardized residuals vs. standardized predicted values) and normal probability plot. Interpret each plot.
- What is the coefficient of determination, R2, of the RM? How would you interpret the R2?
- Based on the RM, what would be the new estimated compressive strength that is currently 50 MPa, after a 10-day increase in curing time? Assume all other predictors are held constant.
- How would you interpret the intercept (constant) in the RM? Does the interpretation make sense given the data you used to build the RM?

*Part 3:*

Given the following components and aging time below, what is the estimated compressive strength based on the RM?

**Age:**50 days**Coarse_Aggregate:**900 kg/m3**Fine_Aggregate:**600 kg/m3**Cement:**300 kg/m3**Slag:**200 kg/m3**Superplasticizer:**7 kg/m3**Water:**190 kg/m3**Ash:**70 kg/m3

*Part 4:*

What is a 95% confidence interval of the estimate in Part 3? How would you interpret the 95% confidence interval? (**Hint:** Use the SPSS scoring wizard to address this question.)

APA format is not required, but solid academic writing is expected.

You are not required to submit this assignment to LopesWrite.