BSMM-quiz-1

SOLUTIONS

Packages

# check if 'librarian' is installed and if not, install it
if (! "librarian" %in% rownames(installed.packages()) ){
  install.packages("librarian")
}
  
# load packages if not already loaded
librarian::shelf(
  ggplot2, magrittr, tidymodels, tidyverse, rsample, broom, recipes, parsnip, modeldata
)

# set the efault theme for plotting
theme_set(theme_bw(base_size = 18) + theme(legend.position = "top"))

Quiz-1 (part 1)

Q-1

Is this data a tidy dataset?

Region	< $1M	$1 - $5M	$5 - $10M	$10 - $100M	> $100M
N America	$50M	$324M	$1045M	$941M	$1200M
EMEA	$10M	$121M	$77M	$80M	$0M

Delete the wrong answer:

SOLUTION: the answer is No

The values of categories appear as column names, while the corresponding dollar values are spread out across the tables. The Tidy version would have three columns - Region, income_range, and dollar value so that all measurements that go together can be found in a row.

This could be achieved by transforming the original table using tidyr::pivot_longer()

# original table
dat <- tibble::tibble(
  Region = c('N America', 'EMEA')
  , '< $1M' = c('$50M', '$10M')
  , '$1 - $5M' = c('$324M', '$121M')
  , '$5 - $10M' = c('$1045M', '$77M')
  , '$10 - $100M' = c('$941M', '$80M')
  , '> $100M' = c('$1200M', '$0M')
) 
dat |> gt::gt() |> gtExtras::gt_theme_espn()

Region	< $1M	$1 - $5M	$5 - $10M	$10 - $100M	> $100M
N America	$50M	$324M	$1045M	$941M	$1200M
EMEA	$10M	$121M	$77M	$80M	$0M

# transformed table
dat |> 
  tidyr::pivot_longer(-Region, names_to = "range", values_to = "$ amount")|> 
  gt::gt() |> gtExtras::gt_theme_espn()

Region	range	$ amount
N America	< $1M	$50M
N America	$1 - $5M	$324M
N America	$5 - $10M	$1045M
N America	$10 - $100M	$941M
N America	> $100M	$1200M
EMEA	< $1M	$10M
EMEA	$1 - $5M	$121M
EMEA	$5 - $10M	$77M
EMEA	$10 - $100M	$80M
EMEA	> $100M	$0M

Q-2

Which resampling method from the resample:: package randomly partitions the data into V sets of roughly equal size?

SOLUTION: The answer is V-fold cross-validation

V-fold cross-validation (also known as k-fold cross-validation) randomly splits the data into V groups of roughly equal size (called “folds”). In the tidyverse you create a dataset for V-fold cross-validation using rsample::vfold_cv() .

Q-3

If I join the two tables below as follows:

dplyr::????_join(employees, departments, by = "department_id")

which type of join would include employee_name == Moe Syzslak?

SOLUTION: the answer is left_join

Inner join will return all the rows with common department ID, and since Moe Syzslak has a NA department ID, with no match in the department ID table,his name won’t appear in the result of the join.

Right join will return the departments table and all the rows of employees with department ID in common with the departments table, and since Moe Syzslak has a NA department ID, with no match in the department ID table,his name won’t appear in the result of the join.

Left join will return the employees table and all the rows of departments with department ID in common with the employees table. Since Moe Syzslak appears in the employees table his name will appear in the result of the join. Since

See the code below.

inner
left
right
all of the above

Delete the incorrect answers.

employees - This table contains each employee’s ID, name, and department ID.

id	employee_name	department_id
1	Homer Simpson	4
2	Ned Flanders	1
3	Barney Gumble	5
4	Clancy Wiggum	3
5	Moe Syzslak	NA

departments - This table contains each department’s ID and name.

department_id	department_name
1	Sales
2	Engineering
3	Human Resources
4	Customer Service
5	Research And Development

tbl1 <- tibble::tribble(
~id , ~employee_name,   ~department_id
,1  ,'Homer Simpson'    ,4
,2  ,'Ned Flanders'   ,1
,3  ,'Barney Gumble'    ,5
,4  ,'Clancy Wiggum'    ,3
,5  ,'Moe Syzslak'    ,NA
)

tbl2 <- tibble::tribble(
~department_id  ,~department_name
,1  ,"Sales"
,2  ,"Engineering"
,3  ,"Human Resources"
,4  ,"Customer Service"
,5  ,"Research And Development"
)

# left_join
dplyr::left_join(tbl1,tbl2,by = "department_id") |> 
  gt::gt() |> gt::tab_header(title = "Left Join") |> 
  gtExtras::gt_theme_espn()

Left Join
id	employee_name	department_id	department_name
1	Homer Simpson	4	Customer Service
2	Ned Flanders	1	Sales
3	Barney Gumble	5	Research And Development
4	Clancy Wiggum	3	Human Resources
5	Moe Syzslak	NA	NA

# right_join
dplyr::right_join(tbl1,tbl2,by = "department_id") |> 
  gt::gt() |> gt::tab_header(title = "Right Join") |> 
  gtExtras::gt_theme_espn()

Right Join
id	employee_name	department_id	department_name
1	Homer Simpson	4	Customer Service
2	Ned Flanders	1	Sales
3	Barney Gumble	5	Research And Development
4	Clancy Wiggum	3	Human Resources
NA	NA	2	Engineering

# inner_join
dplyr::inner_join(tbl1,tbl2,by = "department_id") |> 
  gt::gt() |> gt::tab_header(title = "Inner Join") |> 
  gtExtras::gt_theme_espn()

Inner Join
id	employee_name	department_id	department_name
1	Homer Simpson	4	Customer Service
2	Ned Flanders	1	Sales
3	Barney Gumble	5	Research And Development
4	Clancy Wiggum	3	Human Resources

Q-4

Recall that the first step of a decision-tree regression model will divide the space of predictors into 2 parts and estimate constant prediction values for each part. For a single predictor, the result of the first step estimates the outcome as:

$\hat{y} =\sum_{i=1}^{2}c_i\times I_{(x\in R_i)}$ such that

$\text{SSE}=\left\{ \sum_{i\in R_{1}}\left(y_{i}-c_{i}\right)^{2}+\sum_{i\in R_{2}}\left(y_{i}-c_{i}\right)^{2}\right\}$

is minimized.

On the first split of a decision tree regression model for the following data:

The first two regions that partition $x$ will be (Delete the wrong answer(s) below):

SOLUTION: the answer is [0,1/2] and (1/2, 2/2]

Since the decision tree is minimizing the SSE at each split, you want to minimize the range (max-min) of y values in each split. You can find a $c_i$ value to minimize the SSE within each split, but a wider range of $y$ values will have a larger SSE than a smaller range of $y$ values, due to the squares, and so the splits should have equal ranges.

Since it looks like $y_i = x_i + e_i$ (where $e_i$ is an error term), equal x ranges will give equal y ranges, so the split should be [0,1/2] and (1/2, 2/2].

Q-5

In an ordinary linear regression, regressing the outcome $y$ on a single predictor $x$ , the regression coefficient can be estimated as:

SOLUTION:

In class we showed that the regression coefficient can be estimated as

$\displaystyle\frac{\text{covar(x,y)}}{\text{var(x)}}$

Quiz-1 (part 2)

Q6

Write code to determine the number of species of penguin in the dataset. How many are there?

SOLUTION: there are 3 penguin species in the dataset

palmerpenguins::penguins |> 
  dplyr::distinct(species)

# A tibble: 3 × 1
  species  
  <fct>    
1 Adelie   
2 Gentoo   
3 Chinstrap

# == OR ==
palmerpenguins::penguins$species |>  
  unique() |> length()

[1] 3

Q7

Execute the following code to read sales data from a csv file.

# read sales data
sales_dat <-
  readr::read_csv("data/sales_data_sample.csv", show_col_types = FALSE) |>
  janitor::clean_names() |> 
  dplyr::mutate(
    orderdate = lubridate::as_date(orderdate, format = "%m/%d/%Y %H:%M")
    , orderdate = lubridate::year(orderdate)
  )

Describe what the group_by step does in the code below, and complete the code to produce a sales summary by year, i.e. a data.frame where productline and orderdate are the columns (one column for each year), while each year column contains the sales for each productline that year.

  sales_dat |> 
    dplyr::group_by(orderdate, productline) |> 
    dplyr::summarize( sales = sum(___) ) |> 
    tidyr::pivot_wider(names_from = ___, values_from = ___)

SOLUTION:

the result of the group_by step is: order first by the values of the orderdate column, and then, within each orderdate value, order the rows by the values of the productline column.
the sales summary table produced by the code is given below

# executed code
sales_dat |> 
    dplyr::group_by(orderdate, productline) |> 
    dplyr::summarize( sales = sum(sales) ) |> 
    tidyr::pivot_wider(names_from = orderdate, values_from = sales)

# A tibble: 7 × 4
  productline        `2003`   `2004`  `2005`
  <chr>               <dbl>    <dbl>   <dbl>
1 Classic Cars     1484785. 1762257. 672573.
2 Motorcycles       370896.  560545. 234948.
3 Planes            272258.  502672. 200074.
4 Ships             244821.  341438. 128178.
5 Trains             72802.  116524.  36917.
6 Trucks and Buses  420430.  529303. 178057.
7 Vintage Cars      650988.  911424. 340739.

Q8

For the data below, it is expected that the response variable $y$ can be described by the independent variables $x1$ and $x2$ . This implies that the parameters of the following model should be estimated and tested per the model:

$y = \beta_0 + \beta_1x1 + \beta_2x2 + \epsilon, \epsilon ∼ \mathcal{N}(0, \sigma^2)$

dat <- tibble::tibble(
  x1=c(0.58, 0.86, 0.29, 0.20, 0.56, 0.28, 0.08, 0.41, 0.22, 0.35, 0.59, 0.22, 0.26, 0.12, 0.65, 0.70, 0.30
        , 0.70, 0.39, 0.72, 0.45, 0.81, 0.04, 0.20, 0.95)
  , x2=c(0.71, 0.13, 0.79, 0.20, 0.56, 0.92, 0.01, 0.60, 0.70, 0.73, 0.13, 0.96, 0.27, 0.21, 0.88, 0.30
        , 0.15, 0.09, 0.17, 0.25, 0.30, 0.32, 0.82, 0.98, 0.00)
  , y=c(1.45, 1.93, 0.81, 0.61, 1.55, 0.95, 0.45, 1.14, 0.74, 0.98, 1.41, 0.81, 0.89, 0.68, 1.39, 1.53
        , 0.91, 1.49, 1.38, 1.73, 1.11, 1.68, 0.66, 0.69, 1.98)
)

Calculate the parameter estimates ( $\hat{\beta}_0$ , $\hat{\beta}_1$ , and $\hat{\beta}_2$ ); in addition find the usual 95% confidence intervals for $\beta_0$ , $\beta_1$ , $\beta_2$ .

SOLUTION:

Using broom::tidy(conf.int = TRUE) with a regression model:

# your code goes here
fit_Q8 <- lm(y ~ ., data = dat)
fit_Q8 |> broom::tidy(conf.int = TRUE) |> 
  gt::gt() |> gtExtras::gt_theme_espn()

term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	0.433547115	0.06598300	6.57058782	1.313239e-06	0.2967067	0.5703875
x1	1.652993451	0.09524539	17.35510141	2.525004e-14	1.4554666	1.8505203
x2	0.003944875	0.07485382	0.05270105	9.584457e-01	-0.1512924	0.1591822

Q9

Using the .resid column created by broom::augment(___, dat) , calculate $\hat{\sigma}^2$ .

SOLUTION: the variance of the residual is 0.0116

broom::augment(fit_Q8, dat) |> 
  dplyr::pull(.resid) |> 
  var()

[1] 0.01164646

Q10

Does the following code train a model on the full training set of the modeldata::ames housing dataset and then evaluate the model using a test set?

Is any step missing?
When the recipe is baked and prepped, do you think all categories will be converted to dummy variables and all numeric predictors will be normalized?

SOLUTION:

# Load the ames housing dataset
data(ames)

# Create an initial split of the data
set.seed(123)
ames_split <- initial_split(ames, prop = 0.8, strata = Sale_Price)
ames_train <- training(ames_split)
ames_test  <- testing(ames_split)

# Create a recipe
ames_recipe <- recipe(Sale_Price ~ ., data = ames_train) |>
  step_log(Sale_Price, base = 10) |>  
  step_dummy(all_nominal_predictors()) |>  
  step_zv(all_predictors()) |> 
  step_normalize(all_numeric_predictors())

# Create a workflow
ames_workflow <- workflow() |>
  add_recipe(ames_recipe) |>
  add_model(lm_spec)

# Fit the model and evaluate on the test set
ames_fit <- ames_workflow |> last_fit(ames_split)

# View the metrics
ames_fit |> collect_metrics()

If you run the code chunk above you will get an error error, “‘lm_spec’ is missing”. A workflow requires a pre-processing step (by formula or recipe) and a model specification. That is what is missing here.
Looking at the data in the ames dataset, there are 40 factor variables and 34 numeric variables.

skim_data <- ames |> skimr::skim()
skim_data$skim_type |> table()


 factor numeric 
     40      34

The author of the recipe (see below) likely wanted the factor variables to be converted to dummy variables, and the numeric variables to be normalized.

However, the order of the steps in the recipe turns the factor variables into dummy variables before the numeric variable were normalized, and dummy variables are numeric. Given the order of the steps, all the dummy variables will be normalized and all variables will be normalized numeric variables.

No dummy variables remain after the bake operation!

ames_recipe |> prep() |> 
  bake(new_data = ames) |> 
  dplyr::glimpse()

Rows: 2,930
Columns: 273
$ Lot_Frontage                                          <dbl> 2.46645944, 0.66…
$ Lot_Area                                              <dbl> 2.553049311, 0.1…
$ Year_Built                                            <dbl> -0.39216369, -0.…
$ Year_Remod_Add                                        <dbl> -1.17030684, -1.…
$ Mas_Vnr_Area                                          <dbl> 0.05161557, -0.5…
$ BsmtFin_SF_1                                          <dbl> -0.97323408, 0.8…
$ BsmtFin_SF_2                                          <dbl> -0.2948976, 0.54…
$ Bsmt_Unf_SF                                           <dbl> -0.26264166, -0.…
$ Total_Bsmt_SF                                         <dbl> 0.066772012, -0.…
$ First_Flr_SF                                          <dbl> 1.24235505, -0.6…
$ Second_Flr_SF                                         <dbl> -0.7768908, -0.7…
$ Gr_Liv_Area                                           <dbl> 0.3012894541, -1…
$ Bsmt_Full_Bath                                        <dbl> 1.0688790, -0.82…
$ Bsmt_Half_Bath                                        <dbl> -0.2447658, -0.2…
$ Full_Bath                                             <dbl> -1.0350779, -1.0…
$ Half_Bath                                             <dbl> -0.7505082, -0.7…
$ Bedroom_AbvGr                                         <dbl> 0.1586909, -1.04…
$ Kitchen_AbvGr                                         <dbl> -0.2142401, -0.2…
$ TotRms_AbvGrd                                         <dbl> 0.3483003, -0.91…
$ Fireplaces                                            <dbl> 2.1485023, -0.92…
$ Garage_Cars                                           <dbl> 0.3157983, -1.00…
$ Garage_Area                                           <dbl> 0.269501338, 1.2…
$ Wood_Deck_SF                                          <dbl> 0.9247677, 0.372…
$ Open_Porch_SF                                         <dbl> 0.21567955, -0.7…
$ Enclosed_Porch                                        <dbl> -0.364656, -0.36…
$ Three_season_porch                                    <dbl> -0.1027822, -0.1…
$ Screen_Porch                                          <dbl> -0.2849706, 1.83…
$ Pool_Area                                             <dbl> -0.06118388, -0.…
$ Misc_Val                                              <dbl> -0.08608582, -0.…
$ Mo_Sold                                               <dbl> -0.42445939, -0.…
$ Year_Sold                                             <dbl> 1.666407, 1.6664…
$ Longitude                                             <dbl> 0.9003850, 0.900…
$ Latitude                                              <dbl> 1.0642009, 1.008…
$ Sale_Price                                            <dbl> 5.332438, 5.0211…
$ MS_SubClass_One_Story_1945_and_Older                  <dbl> -0.2198252, -0.2…
$ MS_SubClass_One_Story_with_Finished_Attic_All_Ages    <dbl> -0.04135376, -0.…
$ MS_SubClass_One_and_Half_Story_Unfinished_All_Ages    <dbl> -0.08027027, -0.…
$ MS_SubClass_One_and_Half_Story_Finished_All_Ages      <dbl> -0.3211104, -0.3…
$ MS_SubClass_Two_Story_1946_and_Newer                  <dbl> -0.4996264, -0.4…
$ MS_SubClass_Two_Story_1945_and_Older                  <dbl> -0.2066988, -0.2…
$ MS_SubClass_Two_and_Half_Story_All_Ages               <dbl> -0.08549097, -0.…
$ MS_SubClass_Split_or_Multilevel                       <dbl> -0.2066988, -0.2…
$ MS_SubClass_Split_Foyer                               <dbl> -0.1283979, -0.1…
$ MS_SubClass_Duplex_All_Styles_and_Ages                <dbl> -0.2055737, -0.2…
$ MS_SubClass_One_Story_PUD_1946_and_Newer              <dbl> -0.2707322, -0.2…
$ MS_SubClass_One_and_Half_Story_PUD_All_Ages           <dbl> -0.02066363, -0.…
$ MS_SubClass_Two_Story_PUD_1946_and_Newer              <dbl> -0.2187562, -0.2…
$ MS_SubClass_PUD_Multilevel_Split_Level_Foyer          <dbl> -0.07469546, -0.…
$ MS_SubClass_Two_Family_conversion_All_Styles_and_Ages <dbl> -0.1399371, -0.1…
$ MS_Zoning_Residential_High_Density                    <dbl> -0.09509974, 10.…
$ MS_Zoning_Residential_Low_Density                     <dbl> 0.5420306, -1.84…
$ MS_Zoning_Residential_Medium_Density                  <dbl> -0.4344558, -0.4…
$ MS_Zoning_A_agr                                       <dbl> -0.02066363, -0.…
$ MS_Zoning_C_all                                       <dbl> -0.09735866, -0.…
$ MS_Zoning_I_all                                       <dbl> -0.02922903, -0.…
$ Street_Pave                                           <dbl> 0.06868034, 0.06…
$ Alley_No_Alley_Access                                 <dbl> 0.2661636, 0.266…
$ Alley_Paved                                           <dbl> -0.1648677, -0.1…
$ Lot_Shape_Slightly_Irregular                          <dbl> 1.4134589, -0.70…
$ Lot_Shape_Moderately_Irregular                        <dbl> -0.1607238, -0.1…
$ Lot_Shape_Irregular                                   <dbl> -0.07174967, -0.…
$ Land_Contour_HLS                                      <dbl> -0.2066988, -0.2…
$ Land_Contour_Low                                      <dbl> -0.1430754, -0.1…
$ Land_Contour_Lvl                                      <dbl> 0.3401764, 0.340…
$ Utilities_NoSeWa                                      <dbl> -0.02066363, -0.…
$ Utilities_NoSewr                                      <dbl> -0.02922903, -0.…
$ Lot_Config_CulDSac                                    <dbl> -0.2577909, -0.2…
$ Lot_Config_FR2                                        <dbl> -0.1793294, -0.1…
$ Lot_Config_FR3                                        <dbl> -0.07174967, -0.…
$ Lot_Config_Inside                                     <dbl> -1.6286599, 0.61…
$ Land_Slope_Mod                                        <dbl> -0.2187562, -0.2…
$ Land_Slope_Sev                                        <dbl> -0.07174967, -0.…
$ Neighborhood_College_Creek                            <dbl> -0.3186783, -0.3…
$ Neighborhood_Old_Town                                 <dbl> -0.287611, -0.28…
$ Neighborhood_Edwards                                  <dbl> -0.2679979, -0.2…
$ Neighborhood_Somerset                                 <dbl> -0.2596688, -0.2…
$ Neighborhood_Northridge_Heights                       <dbl> -0.2384008, -0.2…
$ Neighborhood_Gilbert                                  <dbl> -0.2423742, -0.2…
$ Neighborhood_Sawyer                                   <dbl> -0.2302931, -0.2…
$ Neighborhood_Northwest_Ames                           <dbl> -0.2166052, -0.2…
$ Neighborhood_Sawyer_West                              <dbl> -0.2155231, -0.2…
$ Neighborhood_Mitchell                                 <dbl> -0.2010204, -0.2…
$ Neighborhood_Brookside                                <dbl> -0.1904408, -0.1…
$ Neighborhood_Crawford                                 <dbl> -0.1928346, -0.1…
$ Neighborhood_Iowa_DOT_and_Rail_Road                   <dbl> -0.1916409, -0.1…
$ Neighborhood_Timberland                               <dbl> -0.1550442, -0.1…
$ Neighborhood_Northridge                               <dbl> -0.1607238, -0.1…
$ Neighborhood_Stone_Brook                              <dbl> -0.1351039, -0.1…
$ Neighborhood_South_and_West_of_Iowa_State_University  <dbl> -0.1157946, -0.1…
$ Neighborhood_Clear_Creek                              <dbl> -0.1213469, -0.1…
$ Neighborhood_Meadow_Village                           <dbl> -0.113887, -0.11…
$ Neighborhood_Briardale                                <dbl> -0.1079726, -0.1…
$ Neighborhood_Bloomington_Heights                      <dbl> -0.09956823, -0.…
$ Neighborhood_Veenker                                  <dbl> -0.09041895, -0.…
$ Neighborhood_Northpark_Villa                          <dbl> -0.09278786, -0.…
$ Neighborhood_Blueste                                  <dbl> -0.05853314, -0.…
$ Neighborhood_Greens                                   <dbl> -0.05066948, -0.…
$ Neighborhood_Green_Hills                              <dbl> -0.02922903, -0.…
$ Neighborhood_Landmark                                 <dbl> -0.02066363, -0.…
$ Condition_1_Feedr                                     <dbl> -0.2462976, 4.05…
$ Condition_1_Norm                                      <dbl> 0.403473, -2.477…
$ Condition_1_PosA                                      <dbl> -0.08549097, -0.…
$ Condition_1_PosN                                      <dbl> -0.1176728, -0.1…
$ Condition_1_RRAe                                      <dbl> -0.09735866, -0.…
$ Condition_1_RRAn                                      <dbl> -0.1301046, -0.1…
$ Condition_1_RRNe                                      <dbl> -0.05066948, -0.…
$ Condition_1_RRNn                                      <dbl> -0.05474101, -0.…
$ Condition_2_Feedr                                     <dbl> -0.0654701, -0.0…
$ Condition_2_Norm                                      <dbl> 0.09509974, 0.09…
$ Condition_2_PosA                                      <dbl> -0.02066363, -0.…
$ Condition_2_PosN                                      <dbl> -0.02922903, -0.…
$ Condition_2_RRAe                                      <dbl> -0.02066363, -0.…
$ Condition_2_RRAn                                      <dbl> -0.02066363, -0.…
$ Condition_2_RRNn                                      <dbl> -0.02922903, -0.…
$ Bldg_Type_TwoFmCon                                    <dbl> -0.1415143, -0.1…
$ Bldg_Type_Duplex                                      <dbl> -0.2055737, -0.2…
$ Bldg_Type_Twnhs                                       <dbl> -0.1880208, -0.1…
$ Bldg_Type_TwnhsE                                      <dbl> -0.3029889, -0.3…
$ House_Style_One_and_Half_Unf                          <dbl> -0.08549097, -0.…
$ House_Style_One_Story                                 <dbl> 0.983694, 0.9836…
$ House_Style_SFoyer                                    <dbl> -0.1689205, -0.1…
$ House_Style_SLvl                                      <dbl> -0.2166052, -0.2…
$ House_Style_Two_and_Half_Fin                          <dbl> -0.05474101, -0.…
$ House_Style_Two_and_Half_Unf                          <dbl> -0.08549097, -0.…
$ House_Style_Two_Story                                 <dbl> -0.6547801, -0.6…
$ Overall_Cond_Poor                                     <dbl> -0.06209708, -0.…
$ Overall_Cond_Fair                                     <dbl> -0.1351039, -0.1…
$ Overall_Cond_Below_Average                            <dbl> -0.1892341, -0.1…
$ Overall_Cond_Average                                  <dbl> 0.876672, -1.140…
$ Overall_Cond_Above_Average                            <dbl> -0.4700736, 2.12…
$ Overall_Cond_Good                                     <dbl> -0.3875958, -0.3…
$ Overall_Cond_Very_Good                                <dbl> -0.232341, -0.23…
$ Overall_Cond_Excellent                                <dbl> -0.113887, -0.11…
$ Roof_Style_Gable                                      <dbl> -1.9639943, 0.50…
$ Roof_Style_Gambrel                                    <dbl> -0.07753179, -0.…
$ Roof_Style_Hip                                        <dbl> 2.0815862, -0.48…
$ Roof_Style_Mansard                                    <dbl> -0.06209708, -0.…
$ Roof_Style_Shed                                       <dbl> -0.04135376, -0.…
$ Roof_Matl_CompShg                                     <dbl> 0.1157946, 0.115…
$ Roof_Matl_Membran                                     <dbl> -0.02066363, -0.…
$ Roof_Matl_Metal                                       <dbl> -0.02066363, -0.…
$ Roof_Matl_Tar.Grv                                     <dbl> -0.08292059, -0.…
$ Roof_Matl_WdShake                                     <dbl> -0.05853314, -0.…
$ Roof_Matl_WdShngl                                     <dbl> -0.04135376, -0.…
$ Exterior_1st_AsphShn                                  <dbl> -0.02922903, -0.…
$ Exterior_1st_BrkComm                                  <dbl> -0.0462448, -0.0…
$ Exterior_1st_BrkFace                                  <dbl> 5.7382892, -0.17…
$ Exterior_1st_CemntBd                                  <dbl> -0.2155231, -0.2…
$ Exterior_1st_HdBoard                                  <dbl> -0.4267943, -0.4…
$ Exterior_1st_ImStucc                                  <dbl> -0.02066363, -0.…
$ Exterior_1st_MetalSd                                  <dbl> -0.4232945, -0.4…
$ Exterior_1st_Plywood                                  <dbl> -0.288480, -0.28…
$ Exterior_1st_PreCast                                  <dbl> -0.02066363, -0.…
$ Exterior_1st_Stone                                    <dbl> -0.02066363, -0.…
$ Exterior_1st_Stucco                                   <dbl> -0.1195232, -0.1…
$ Exterior_1st_VinylSd                                  <dbl> -0.7345379, 1.36…
$ Exterior_1st_Wd.Sdng                                  <dbl> -0.3991715, -0.3…
$ Exterior_1st_WdShing                                  <dbl> -0.1461515, -0.1…
$ Exterior_2nd_AsphShn                                  <dbl> -0.03580575, -0.…
$ Exterior_2nd_Brk.Cmn                                  <dbl> -0.09041895, -0.…
$ Exterior_2nd_BrkFace                                  <dbl> -0.1249191, -0.1…
$ Exterior_2nd_CBlock                                   <dbl> -0.02066363, -0.…
$ Exterior_2nd_CmentBd                                  <dbl> -0.2155231, -0.2…
$ Exterior_2nd_HdBoard                                  <dbl> -0.4070422, -0.4…
$ Exterior_2nd_ImStucc                                  <dbl> -0.07469546, -0.…
$ Exterior_2nd_MetalSd                                  <dbl> -0.4232945, -0.4…
$ Exterior_2nd_Other                                    <dbl> -0.02066363, -0.…
$ Exterior_2nd_Plywood                                  <dbl> 3.0517540, -0.32…
$ Exterior_2nd_PreCast                                  <dbl> -0.02066363, -0.…
$ Exterior_2nd_Stone                                    <dbl> -0.0462448, -0.0…
$ Exterior_2nd_Stucco                                   <dbl> -0.1195232, -0.1…
$ Exterior_2nd_VinylSd                                  <dbl> -0.7283491, 1.37…
$ Exterior_2nd_Wd.Sdng                                  <dbl> -0.3854073, -0.3…
$ Exterior_2nd_Wd.Shng                                  <dbl> -0.1689205, -0.1…
$ Mas_Vnr_Type_BrkFace                                  <dbl> -0.6634428, -0.6…
$ Mas_Vnr_Type_CBlock                                   <dbl> -0.02066363, -0.…
$ Mas_Vnr_Type_None                                     <dbl> -1.2199205, 0.81…
$ Mas_Vnr_Type_Stone                                    <dbl> 3.2366522, -0.30…
$ Exter_Cond_Fair                                       <dbl> -0.1550442, -0.1…
$ Exter_Cond_Good                                       <dbl> -0.3393949, -0.3…
$ Exter_Cond_Poor                                       <dbl> -0.02922903, -0.…
$ Exter_Cond_Typical                                    <dbl> 0.3890515, 0.389…
$ Foundation_CBlock                                     <dbl> 1.1411815, 1.141…
$ Foundation_PConc                                      <dbl> -0.8927767, -0.8…
$ Foundation_Slab                                       <dbl> -0.1367326, -0.1…
$ Foundation_Stone                                      <dbl> -0.06868034, -0.…
$ Foundation_Wood                                       <dbl> -0.02922903, -0.…
$ Bsmt_Cond_Fair                                        <dbl> -0.1975475, -0.1…
$ Bsmt_Cond_Good                                        <dbl> 4.9442976, -0.20…
$ Bsmt_Cond_No_Basement                                 <dbl> -0.1728887, -0.1…
$ Bsmt_Cond_Poor                                        <dbl> -0.0462448, -0.0…
$ Bsmt_Cond_Typical                                     <dbl> -2.8539373, 0.35…
$ Bsmt_Exposure_Gd                                      <dbl> 3.0742999, -0.32…
$ Bsmt_Exposure_Mn                                      <dbl> -0.3055009, -0.3…
$ Bsmt_Exposure_No                                      <dbl> -1.3381530, 0.74…
$ Bsmt_Exposure_No_Basement                             <dbl> -0.1767779, -0.1…
$ BsmtFin_Type_1_BLQ                                    <dbl> 3.0895747, -0.32…
$ BsmtFin_Type_1_GLQ                                    <dbl> -0.6481321, -0.6…
$ BsmtFin_Type_1_LwQ                                    <dbl> -0.2333596, -0.2…
$ BsmtFin_Type_1_No_Basement                            <dbl> -0.1728887, -0.1…
$ BsmtFin_Type_1_Rec                                    <dbl> -0.3291359, 3.03…
$ BsmtFin_Type_1_Unf                                    <dbl> -0.6342108, -0.6…
$ BsmtFin_Type_2_BLQ                                    <dbl> -0.1491694, -0.1…
$ BsmtFin_Type_2_GLQ                                    <dbl> -0.1176728, -0.1…
$ BsmtFin_Type_2_LwQ                                    <dbl> -0.1780576, 5.61…
$ BsmtFin_Type_2_No_Basement                            <dbl> -0.1741936, -0.1…
$ BsmtFin_Type_2_Rec                                    <dbl> -0.194022, -0.19…
$ BsmtFin_Type_2_Unf                                    <dbl> 0.4183756, -2.38…
$ Heating_GasA                                          <dbl> 0.1249191, 0.124…
$ Heating_GasW                                          <dbl> -0.09278786, -0.…
$ Heating_Grav                                          <dbl> -0.05853314, -0.…
$ Heating_OthW                                          <dbl> -0.02066363, -0.…
$ Heating_Wall                                          <dbl> -0.05066948, -0.…
$ Heating_QC_Fair                                       <dbl> 5.2487339, -0.19…
$ Heating_QC_Good                                       <dbl> -0.4379219, -0.4…
$ Heating_QC_Poor                                       <dbl> -0.03580575, -0.…
$ Heating_QC_Typical                                    <dbl> -0.6441494, 1.55…
$ Central_Air_Y                                         <dbl> 0.265243, 0.2652…
$ Electrical_FuseF                                      <dbl> -0.1351039, -0.1…
$ Electrical_FuseP                                      <dbl> -0.05066948, -0.…
$ Electrical_Mix                                        <dbl> -0.02066363, -0.…
$ Electrical_SBrkr                                      <dbl> 0.3088293, 0.308…
$ Electrical_Unknown                                    <dbl> -0.02066363, -0.…
$ Functional_Maj2                                       <dbl> -0.06209708, -0.…
$ Functional_Min1                                       <dbl> -0.1506577, -0.1…
$ Functional_Min2                                       <dbl> -0.1621157, -0.1…
$ Functional_Mod                                        <dbl> -0.1119485, -0.1…
$ Functional_Sal                                        <dbl> -0.02922903, -0.…
$ Functional_Sev                                        <dbl> -0.02922903, -0.…
$ Functional_Typ                                        <dbl> 0.2814762, 0.281…
$ Garage_Type_Basment                                   <dbl> -0.1176728, -0.1…
$ Garage_Type_BuiltIn                                   <dbl> -0.2587311, -0.2…
$ Garage_Type_CarPort                                   <dbl> -0.07174967, -0.…
$ Garage_Type_Detchd                                    <dbl> -0.5966212, -0.5…
$ Garage_Type_More_Than_Two_Types                       <dbl> -0.08549097, -0.…
$ Garage_Type_No_Garage                                 <dbl> -0.2423742, -0.2…
$ Garage_Finish_No_Garage                               <dbl> -0.244342, -0.24…
$ Garage_Finish_RFn                                     <dbl> -0.6229746, -0.6…
$ Garage_Finish_Unf                                     <dbl> -0.8406517, 1.18…
$ Garage_Cond_Fair                                      <dbl> -0.1550442, -0.1…
$ Garage_Cond_Good                                      <dbl> -0.06209708, -0.…
$ Garage_Cond_No_Garage                                 <dbl> -0.244342, -0.24…
$ Garage_Cond_Poor                                      <dbl> -0.07469546, -0.…
$ Garage_Cond_Typical                                   <dbl> 0.3154172, 0.315…
$ Paved_Drive_Partial_Pavement                          <dbl> 6.9116757, -0.14…
$ Paved_Drive_Paved                                     <dbl> -3.1286491, 0.31…
$ Pool_QC_Fair                                          <dbl> -0.02922903, -0.…
$ Pool_QC_Good                                          <dbl> -0.03580575, -0.…
$ Pool_QC_No_Pool                                       <dbl> 0.0654701, 0.065…
$ Pool_QC_Typical                                       <dbl> -0.02066363, -0.…
$ Fence_Good_Wood                                       <dbl> -0.1975475, -0.1…
$ Fence_Minimum_Privacy                                 <dbl> -0.3548348, 2.81…
$ Fence_Minimum_Wood_Wire                               <dbl> -0.06209708, -0.…
$ Fence_No_Fence                                        <dbl> 0.4902687, -2.03…
$ Misc_Feature_Gar2                                     <dbl> -0.04135376, -0.…
$ Misc_Feature_None                                     <dbl> 0.1855737, 0.185…
$ Misc_Feature_Othr                                     <dbl> -0.03580575, -0.…
$ Misc_Feature_Shed                                     <dbl> -0.1741936, -0.1…
$ Misc_Feature_TenC                                     <dbl> -0.02066363, -0.…
$ Sale_Type_Con                                         <dbl> -0.04135376, -0.…
$ Sale_Type_ConLD                                       <dbl> -0.09278786, -0.…
$ Sale_Type_ConLI                                       <dbl> -0.05474101, -0.…
$ Sale_Type_ConLw                                       <dbl> -0.0462448, -0.0…
$ Sale_Type_CWD                                         <dbl> -0.06868034, -0.…
$ Sale_Type_New                                         <dbl> -0.2893471, -0.2…
$ Sale_Type_Oth                                         <dbl> -0.05066948, -0.…
$ Sale_Type_VWD                                         <dbl> -0.02066363, -0.…
$ Sale_Type_WD.                                         <dbl> 0.3890515, 0.389…
$ Sale_Condition_AdjLand                                <dbl> -0.0654701, -0.0…
$ Sale_Condition_Alloca                                 <dbl> -0.09735866, -0.…
$ Sale_Condition_Family                                 <dbl> -0.1266697, -0.1…
$ Sale_Condition_Normal                                 <dbl> 0.4619311, 0.461…
$ Sale_Condition_Partial                                <dbl> -0.292798, -0.29…

The code used in this question was written by chatGPT. Use these tools with caution.

Grading (10 pts)

Part	Points
Part 1 - Conceptual	5
Part 2 - Applied	5
Total	10