DATA 205 CUNY Queensborough Community College Data Analytics Worksheet

Data 205
Lab Instructor: Steven Kurniawan
Spring 2023
Unit 4 Homework: Data Distributions
(42 points)
Your Name
For this homework, we will use R's built-in data. R comes with several built-in data sets
related to the 50 states of the United States of America. Professor XU has combined these
data sets into a single CSV file named “us_states.csv”. Below is a list of variables in this
data:

name: the full state names.

abb: 2-letter abbreviations for the state names.

region: the geographic region (Northeast, South, North Central, West) that each
state belongs to.

division: the geographic division (New England, Middle Atlantic, South Atlantic,
East South Central, West South Central, East North Central, West North Central,
Mountain, and Pacific) that each state belongs to.

population: population estimate as of July 1, 1975.

income: per capita income (1974).

illiteracy: illiteracy (1970, percent of population).

life_exp: life expectancy in years (1969–71).

murder: murder and non-negligent manslaughter rate per 100,000 population
(1976).

hs_grad: percent high-school graduates (1970).

frost: mean number of days with minimum temperature below freezing
(1931–1960) in capital or large city.

area: land area in square miles.
Exercise 1: Reviewing Variable Labels and Values (14 points)
Let’s start by taking a look at the structure of the U.S. states dataset and what’s included in
it. To do this, we use the str() command.
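The original str() output is not reproduced here. As a minimal sketch, the data frame can be rebuilt from R's built-in state data sets (state.name, state.abb, state.region, state.division, and state.x77), assuming the column names listed above. With the course file itself you would call read.csv("us_states.csv") instead, and the column types it reports may differ from this reconstruction.

```r
# Assumption: us_states.csv was assembled from these built-in data sets
# using the variable names in the list above. The real file may differ.
us_states <- data.frame(
  name       = state.name,
  abb        = state.abb,
  region     = state.region,
  division   = state.division,
  population = state.x77[, "Population"],
  income     = state.x77[, "Income"],
  illiteracy = state.x77[, "Illiteracy"],
  life_exp   = state.x77[, "Life Exp"],
  murder     = state.x77[, "Murder"],
  hs_grad    = state.x77[, "HS Grad"],
  frost      = state.x77[, "Frost"],
  area       = state.x77[, "Area"],
  row.names  = NULL  # use plain row numbers instead of state names
)

# Show the number of observations and variables, plus each variable's type.
str(us_states)
```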
Question 1. How many observations are there in this data set? How many
variables? (2 points)
Question 2. Which variables are nominal? Which variables are interval or ratio?
(12 points)
Exercise 2: Percentages in Tables and Charts (8 points)
In class and in your readings, we’ve covered different measures of dispersion. The most
basic is the percentage, which we can read from tables and from charts. In this exercise,
we’ll go one step further to characterize the dispersion in a distribution.
Here’s a frequency table for the variable geographic division in the U.S. states dataset,
followed by one that reports on percentage and a bar chart.
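The tables and chart are not reproduced here. One way they might be produced is sketched below, using base R's table() and prop.table() for the counts and percentages and ggplot2's geom_bar() for the chart. The data frame is rebuilt from the built-in state.division vector as a stand-in for us_states.csv, and the axis labels are illustrative choices.

```r
library(ggplot2)  # assumes the ggplot2 package is installed

# Stand-in for the course CSV: only the column needed here.
us_states <- data.frame(division = state.division)

# Frequency table: number of states in each geographic division.
division_counts <- table(us_states$division)
print(division_counts)

# Percentage table: share of the 50 states in each division.
division_percent <- round(100 * prop.table(division_counts), 1)
print(division_percent)

# Bar chart of the same counts, flipped so the long labels stay readable.
division_bar <- ggplot(us_states, aes(x = division)) +
  geom_bar() +
  coord_flip() +
  labs(x = "Geographic division", y = "Number of states")
print(division_bar)
```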
Question 1. What is the mode of geographic division, and what is the
percentage of states that are located in this division? (4 points)
Question 2. Which geographic division includes the fewest states?
What is the percentage of states that are located in this division? (2 points)
Question 3. How would you describe the distribution of geographic divisions in
terms of dispersion? Low, medium, or high dispersion? Justify your answer. (2
points)
Exercise 3: Medians and Quartiles (10 points)
The summary() command provides summary statistics on continuous/numeric variables
and reports on the minimum and maximum, the quartiles, and the mean and median. Here
we call this command for the variable income:
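The output is not reproduced here. A minimal sketch of the call, with the income column taken from the built-in state.x77 matrix as a stand-in for us_states.csv:

```r
# Stand-in for the course CSV: only the income column is needed here.
us_states <- data.frame(income = state.x77[, "Income"], row.names = NULL)

# Minimum, first quartile, median, mean, third quartile, and maximum.
income_summary <- summary(us_states$income)
print(income_summary)
```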
We’ll pair this text output with a histogram of income so that we can visualize the shape of
the distribution.
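The histogram is not reproduced here. It might be drawn along these lines with ggplot2; the bin count and axis labels are illustrative assumptions, not values taken from the original worksheet.

```r
library(ggplot2)  # assumes the ggplot2 package is installed

# Stand-in for the course CSV: only the income column is needed here.
us_states <- data.frame(income = state.x77[, "Income"], row.names = NULL)

# Histogram of per capita income; bins = 10 is an arbitrary choice.
income_hist <- ggplot(us_states, aes(x = income)) +
  geom_histogram(bins = 10, color = "white") +
  labs(x = "Per capita income (1974)", y = "Number of states")
print(income_hist)
```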
Finally, we’ll also look at a boxplot for the same variable. You can do this by replacing
geom_histogram() with geom_boxplot() in the plotting command.
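The boxplot is not reproduced here. A sketch of the swapped command, assuming ggplot2 version 3.3.0 or later (which draws a horizontal box when only the x aesthetic is mapped) and the same stand-in data built from state.x77:

```r
library(ggplot2)  # assumes ggplot2 >= 3.3.0 is installed

# Stand-in for the course CSV: only the income column is needed here.
us_states <- data.frame(income = state.x77[, "Income"], row.names = NULL)

# Same mapping as a histogram of income, with the geom swapped for a boxplot.
income_box <- ggplot(us_states, aes(x = income)) +
  geom_boxplot() +
  labs(x = "Per capita income (1974)")
print(income_box)
```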
Because there is only one variable to examine here, R draws the boxplot horizontally
rather than vertically.
Use all three pieces of information (the summary output, the histogram, and the
boxplot) to answer the questions in this section.
Question 1. Describe the distribution of per capita income. Use the following
values in your discussion: range, interquartile range, mean, median, skew, and
outliers. (6 points: one point for each keyword used correctly)
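The range and interquartile range named in Question 1 can also be computed directly rather than read off the summary output. The sketch below uses base R on the built-in state.x77 income column as a stand-in for us_states.csv. The outlier check applies the common 1.5 × IQR convention, which is an assumption about how outliers are defined in this course.

```r
# Stand-in for the income column of the course CSV.
income <- state.x77[, "Income"]

# Range: smallest and largest values, and the distance between them.
income_range  <- range(income)
income_spread <- diff(income_range)

# Interquartile range: third quartile minus first quartile.
income_quartiles <- quantile(income, c(0.25, 0.75))
income_iqr       <- IQR(income)

# Flag values more than 1.5 * IQR beyond the quartiles as potential outliers.
lower_fence <- income_quartiles[[1]] - 1.5 * income_iqr
upper_fence <- income_quartiles[[2]] + 1.5 * income_iqr
income_outliers <- income[income < lower_fence | income > upper_fence]

print(income_range)
print(income_iqr)
print(income_outliers)
```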
Question 2. Compare what you learned about the distribution of per capita
income from the histogram and the boxplot. Which one do you find more
helpful in summarizing the information and why? (4 points)
Exercise 4: Using medians and distributions to compare states in
different geographic divisions (10 points)
In this next exercise, we will again use boxplots, this time to compare the per capita income
for different geographic divisions.
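The text output and boxplot are not reproduced here. As a sketch, the per-division means and standard deviations can be computed in base R with split() and sapply(), and the grouped boxplot drawn with ggplot2. The data frame is rebuilt from the built-in state.division and state.x77 objects as a stand-in for us_states.csv, and the names division_stats and income_by_division are illustrative.

```r
library(ggplot2)  # assumes ggplot2 >= 3.3.0 is installed

# Stand-in for the course CSV: only the two columns needed here.
us_states <- data.frame(
  division  = state.division,
  income    = state.x77[, "Income"],
  row.names = NULL
)

# Mean and standard deviation of per capita income within each division.
income_groups <- split(us_states$income, us_states$division)
division_stats <- data.frame(
  mean_income = sapply(income_groups, mean),
  sd_income   = sapply(income_groups, sd)
)
print(round(division_stats, 1))

# One horizontal boxplot of income per division, on a shared income axis.
income_by_division <- ggplot(us_states, aes(x = income, y = division)) +
  geom_boxplot() +
  labs(x = "Per capita income (1974)", y = "Geographic division")
print(income_by_division)
```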
Question 1. Use the text output on means and standard deviations. Which
geographic division had the highest mean per capita income in 1974? Looking at
its standard deviation, how would you describe the dispersion of its
per capita income relative to other geographic divisions? (2 points)
Question 2. Now compare the income distributions using the boxplot. Which
geographic division had the lowest median per capita income in 1974? How was
the dispersion of the income distribution in this geographic division compared
to other divisions? (3 points)
Question 3. Based on the text output and the boxplot, which geographic division
had the highest level of income inequality in 1974, and why? (3 points)
Question 4. Judging by the distribution of per capita income, which geographic
division would you choose to live in, and why? (2 points)