BSMM-quiz-1

SOLUTIONS

Packages

# check if 'librarian' is installed and if not, install it
if (! "librarian" %in% rownames(installed.packages()) ){
  install.packages("librarian")
}
  
# load packages if not already loaded
librarian::shelf(
  ggplot2, magrittr, tidymodels, tidyverse, rsample, broom, recipes, parsnip, modeldata
)

# set the default theme for plotting
theme_set(theme_bw(base_size = 18) + theme(legend.position = "top"))

Quiz-1 (part 1)

Q-1

Is this data a tidy dataset?

Region      < $1M   $1 - $5M   $5 - $10M   $10 - $100M   > $100M
N America   $50M    $324M      $1045M      $941M         $1200M
EMEA        $10M    $121M      $77M        $80M          $0M

Delete the wrong answer:

SOLUTION: the answer is No

The category values (the dollar ranges) appear as column names, while the corresponding dollar amounts are spread out across those columns. The tidy version would have three columns (Region, income_range, and dollar value) so that all measurements that go together can be found in a single row.

This can be achieved by transforming the original table using tidyr::pivot_longer().

# original table
dat <- tibble::tibble(
  Region = c('N America', 'EMEA')
  , '< $1M' = c('$50M', '$10M')
  , '$1 - $5M' = c('$324M', '$121M')
  , '$5 - $10M' = c('$1045M', '$77M')
  , '$10 - $100M' = c('$941M', '$80M')
  , '> $100M' = c('$1200M', '$0M')
) 
dat |> gt::gt() |> gtExtras::gt_theme_espn()
Region      < $1M   $1 - $5M   $5 - $10M   $10 - $100M   > $100M
N America   $50M    $324M      $1045M      $941M         $1200M
EMEA        $10M    $121M      $77M        $80M          $0M
# transformed table
dat |> 
  tidyr::pivot_longer(-Region, names_to = "range", values_to = "$ amount")|> 
  gt::gt() |> gtExtras::gt_theme_espn()
Region      range         $ amount
N America   < $1M         $50M
N America   $1 - $5M      $324M
N America   $5 - $10M     $1045M
N America   $10 - $100M   $941M
N America   > $100M       $1200M
EMEA        < $1M         $10M
EMEA        $1 - $5M      $121M
EMEA        $5 - $10M     $77M
EMEA        $10 - $100M   $80M
EMEA        > $100M       $0M
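One practical payoff of the tidy layout is that summaries become short pipelines. The sketch below rebuilds the table, converts the dollar strings to numbers with readr::parse_number(), and totals the amounts by region. The column names amount, amount_musd, and total_musd are illustrative choices, not part of the quiz.

```r
# rebuild the original wide table
dat <- tibble::tibble(
  Region = c("N America", "EMEA"),
  `< $1M` = c("$50M", "$10M"),
  `$1 - $5M` = c("$324M", "$121M"),
  `$5 - $10M` = c("$1045M", "$77M"),
  `$10 - $100M` = c("$941M", "$80M"),
  `> $100M` = c("$1200M", "$0M")
)

totals <- dat |>
  tidyr::pivot_longer(-Region, names_to = "range", values_to = "amount") |>
  # "$324M" -> 324 (amounts are in millions of dollars)
  dplyr::mutate(amount_musd = readr::parse_number(amount)) |>
  dplyr::group_by(Region) |>
  dplyr::summarize(total_musd = sum(amount_musd))

totals
```

With the wide table the same total would require naming or selecting every range column explicitly.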

Q-2

Which resampling method from the rsample:: package randomly partitions the data into V sets of roughly equal size?

SOLUTION: The answer is V-fold cross-validation

V-fold cross-validation (also known as k-fold cross-validation) randomly splits the data into V groups of roughly equal size (called “folds”). In tidymodels you create a dataset for V-fold cross-validation using rsample::vfold_cv().
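As a minimal sketch of what "V sets of roughly equal size" means in practice, the code below splits the built-in mtcars data (an arbitrary stand-in dataset, not part of the quiz) into 5 folds and counts the held-out rows in each fold.

```r
set.seed(1)  # the fold assignment is random, so fix the seed for reproducibility
folds <- rsample::vfold_cv(mtcars, v = 5)

# number of held-out (assessment) rows in each fold
fold_sizes <- vapply(
  folds$splits,
  function(s) nrow(rsample::assessment(s)),
  integer(1)
)
fold_sizes

# every row is held out exactly once across the folds
sum(fold_sizes) == nrow(mtcars)
```

Each fold's analysis set is the remaining rows, so a model is fit V times, each time leaving a different fold out for assessment.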

Q-3

If I join the two tables below as follows:

dplyr::????_join(employees, departments, by = "department_id")

which type of join would include employee_name == Moe Syzslak?

SOLUTION: the answer is left_join

Inner join will return only the rows with a department ID common to both tables. Since Moe Syzslak has an NA department ID, with no match in the departments table, his name won’t appear in the result of the join.

Right join will return the departments table and all the rows of employees with a department ID in common with the departments table. Since Moe Syzslak has an NA department ID, with no match in the departments table, his name won’t appear in the result of the join.

Left join will return the employees table and all the rows of departments with a department ID in common with the employees table. Since Moe Syzslak appears in the employees table, his name will appear in the result of the join, with NA for the department name.

See the code below.

  • inner
  • left
  • right
  • all of the above

Delete the incorrect answers.

employees - This table contains each employee’s ID, name, and department ID.

id employee_name department_id
1 Homer Simpson 4
2 Ned Flanders 1
3 Barney Gumble 5
4 Clancy Wiggum 3
5 Moe Syzslak NA

departments - This table contains each department’s ID and name.

department_id department_name
1 Sales
2 Engineering
3 Human Resources
4 Customer Service
5 Research And Development
tbl1 <- tibble::tribble(
~id , ~employee_name,   ~department_id
,1  ,'Homer Simpson'    ,4
,2  ,'Ned Flanders'   ,1
,3  ,'Barney Gumble'    ,5
,4  ,'Clancy Wiggum'    ,3
,5  ,'Moe Syzslak'    ,NA
)

tbl2 <- tibble::tribble(
~department_id  ,~department_name
,1  ,"Sales"
,2  ,"Engineering"
,3  ,"Human Resources"
,4  ,"Customer Service"
,5  ,"Research And Development"
)

# left_join
dplyr::left_join(tbl1,tbl2,by = "department_id") |> 
  gt::gt() |> gt::tab_header(title = "Left Join") |> 
  gtExtras::gt_theme_espn()
Left Join
id employee_name department_id department_name
1 Homer Simpson 4 Customer Service
2 Ned Flanders 1 Sales
3 Barney Gumble 5 Research And Development
4 Clancy Wiggum 3 Human Resources
5 Moe Syzslak NA NA
# right_join
dplyr::right_join(tbl1,tbl2,by = "department_id") |> 
  gt::gt() |> gt::tab_header(title = "Right Join") |> 
  gtExtras::gt_theme_espn()
Right Join
id employee_name department_id department_name
1 Homer Simpson 4 Customer Service
2 Ned Flanders 1 Sales
3 Barney Gumble 5 Research And Development
4 Clancy Wiggum 3 Human Resources
NA NA 2 Engineering
# inner_join
dplyr::inner_join(tbl1,tbl2,by = "department_id") |> 
  gt::gt() |> gt::tab_header(title = "Inner Join") |> 
  gtExtras::gt_theme_espn()
Inner Join
id employee_name department_id department_name
1 Homer Simpson 4 Customer Service
2 Ned Flanders 1 Sales
3 Barney Gumble 5 Research And Development
4 Clancy Wiggum 3 Human Resources
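The three results above can also be checked programmatically. The following sketch rebuilds the two tables (named employees and departments here, as in the question) and tests, for each join type, whether Moe Syzslak survives the join. The helper names joins and has_moe are illustrative.

```r
employees <- tibble::tibble(
  id = 1:5,
  employee_name = c("Homer Simpson", "Ned Flanders", "Barney Gumble",
                    "Clancy Wiggum", "Moe Syzslak"),
  department_id = c(4, 1, 5, 3, NA)
)
departments <- tibble::tibble(
  department_id = c(1, 2, 3, 4, 5),
  department_name = c("Sales", "Engineering", "Human Resources",
                      "Customer Service", "Research And Development")
)

joins <- list(
  inner = dplyr::inner_join,
  left  = dplyr::left_join,
  right = dplyr::right_join
)

# TRUE where the joined result still contains Moe
has_moe <- vapply(
  joins,
  function(join_fn) {
    joined <- join_fn(employees, departments, by = "department_id")
    "Moe Syzslak" %in% joined$employee_name
  },
  logical(1)
)
has_moe
```

Only the left join keeps every row of the first table, which is why it is the only one of the three that retains the employee with the missing department ID.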

Q-4

Recall that the first step of a decision-tree regression model will divide the space of predictors into 2 parts and estimate constant prediction values for each part. For a single predictor, the result of the first step estimates the outcome as:

$$\hat{y} = \sum_{i=1}^{2} c_i \times I_{(x \in R_i)}$$

such that

$$\text{SSE} = \left\{ \sum_{i \in R_{1}} \left(y_{i} - c_{1}\right)^{2} + \sum_{i \in R_{2}} \left(y_{i} - c_{2}\right)^{2} \right\}$$

is minimized.

On the first split of a decision tree regression model for the following data:

The first two regions that partition $x$ will be (delete the wrong answer(s) below):

SOLUTION: the answer is [0,1/2] and (1/2, 2/2]

Since the decision tree minimizes the SSE at each split, you want to minimize the range (max - min) of the y values within each region. You can find a $c_i$ value that minimizes the SSE within each region, but a wider range of $y$ values will have a larger SSE than a narrower range, because the deviations are squared, and so the two regions should have equal ranges.

Since it looks like $y_i = x_i + e_i$ (where $e_i$ is an error term), equal x ranges will give equal y ranges, so the split should be [0,1/2] and (1/2, 2/2].
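The argument can be checked numerically with a brute-force search over candidate split points. The sketch below is not how a tree package implements splitting, and the data are simulated under the assumption that y = x + noise on [0, 1], since the quiz data are not reproduced here. Within each region the SSE-minimizing constant is the mean of y in that region, which is what sse_for_split (an illustrative helper) uses.

```r
set.seed(42)
n <- 200
x <- seq(0, 1, length.out = n)
y <- x + rnorm(n, sd = 0.02)   # assumed data-generating process: y = x + noise

# SSE of a split at s: each region is predicted by its own mean
sse_for_split <- function(s) {
  left  <- y[x <= s]
  right <- y[x > s]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
}

# candidate split points: midpoints between consecutive x values
candidates <- (head(x, -1) + tail(x, -1)) / 2
sse <- vapply(candidates, sse_for_split, numeric(1))

best_split <- candidates[which.min(sse)]
best_split
```

Under these assumptions the minimizing split lands close to 1/2, in line with the answer [0,1/2] and (1/2, 2/2].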

Q-5

In an ordinary linear regression, regressing the outcome $y$ on a single predictor $x$, the regression coefficient can be estimated as:

SOLUTION:

In class we showed that the regression coefficient can be estimated as

$$\frac{\text{cov}(x, y)}{\text{var}(x)}$$
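A quick numerical check of this identity, using two columns of the built-in mtcars data as stand-ins for x and y (an arbitrary choice for illustration):

```r
x <- mtcars$wt
y <- mtcars$mpg

# slope from the covariance / variance formula
slope_formula <- cov(x, y) / var(x)

# slope estimated by ordinary least squares
slope_lm <- unname(coef(lm(y ~ x))["x"])

c(formula = slope_formula, lm = slope_lm)
all.equal(slope_formula, slope_lm)
```

Both cov() and var() use the same n - 1 denominator, so it cancels in the ratio.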

Quiz-1 (part 2)

Q6

Write code to determine the number of species of penguin in the dataset. How many are there?

SOLUTION: there are 3 penguin species in the dataset
palmerpenguins::penguins |> 
  dplyr::distinct(species)
# A tibble: 3 × 1
  species  
  <fct>    
1 Adelie   
2 Gentoo   
3 Chinstrap
# == OR ==
palmerpenguins::penguins$species |>  
  unique() |> length()
[1] 3

Q7

Execute the following code to read sales data from a csv file.

# read sales data
sales_dat <-
  readr::read_csv("data/sales_data_sample.csv", show_col_types = FALSE) |>
  janitor::clean_names() |> 
  dplyr::mutate(
    orderdate = lubridate::as_date(orderdate, format = "%m/%d/%Y %H:%M")
    , orderdate = lubridate::year(orderdate)
  )

Describe what the group_by step does in the code below, and complete the code to produce a sales summary by year, i.e. a data.frame where productline and orderdate are the columns (one column for each year), while each year column contains the sales for each productline that year.

  sales_dat |> 
    dplyr::group_by(orderdate, productline) |> 
    dplyr::summarize( sales = sum(___) ) |> 
    tidyr::pivot_wider(names_from = ___, values_from = ___)
SOLUTION:
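  • the result of the group_by step is: the rows are assigned to groups, one group for each combination of orderdate and productline values. The data itself is not changed or re-ordered by this step; the grouping tells the following summarize step to compute the sum of sales separately within each group (the summarized result is then sorted by the grouping columns).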

  • the sales summary table produced by the code is given below

# executed code
sales_dat |> 
    dplyr::group_by(orderdate, productline) |> 
    dplyr::summarize( sales = sum(sales) ) |> 
    tidyr::pivot_wider(names_from = orderdate, values_from = sales)
# A tibble: 7 × 4
  productline        `2003`   `2004`  `2005`
  <chr>               <dbl>    <dbl>   <dbl>
1 Classic Cars     1484785. 1762257. 672573.
2 Motorcycles       370896.  560545. 234948.
3 Planes            272258.  502672. 200074.
4 Ships             244821.  341438. 128178.
5 Trains             72802.  116524.  36917.
6 Trucks and Buses  420430.  529303. 178057.
7 Vintage Cars      650988.  911424. 340739.
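To see what group_by does on its own, the sketch below applies it to a small made-up table (toy is hypothetical data, not the sales file) and inspects the result before and after summarize.

```r
toy <- tibble::tibble(
  orderdate   = c(2004, 2003, 2004, 2003),
  productline = c("Ships", "Trains", "Ships", "Ships"),
  sales       = c(10, 20, 30, 40)
)

grouped <- toy |> dplyr::group_by(orderdate, productline)

# the rows are in the same order as before; only grouping metadata was added
identical(grouped$sales, toy$sales)
dplyr::group_vars(grouped)
dplyr::n_groups(grouped)

# the aggregation itself happens in summarize(), once per group
summary_tbl <- grouped |>
  dplyr::summarize(sales = sum(sales), .groups = "drop")
summary_tbl
```

The .groups = "drop" argument simply returns an ungrouped table; it is used here to avoid the informational message that summarize prints when several grouping columns are present.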

Q8

For the data below, it is expected that the response variable $y$ can be described by the independent variables $x1$ and $x2$. This implies that the parameters of the following model should be estimated and tested per the model:

$$y = \beta_0 + \beta_1 x1 + \beta_2 x2 + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2)$$

dat <- tibble::tibble(
  x1=c(0.58, 0.86, 0.29, 0.20, 0.56, 0.28, 0.08, 0.41, 0.22, 0.35, 0.59, 0.22, 0.26, 0.12, 0.65, 0.70, 0.30
        , 0.70, 0.39, 0.72, 0.45, 0.81, 0.04, 0.20, 0.95)
  , x2=c(0.71, 0.13, 0.79, 0.20, 0.56, 0.92, 0.01, 0.60, 0.70, 0.73, 0.13, 0.96, 0.27, 0.21, 0.88, 0.30
        , 0.15, 0.09, 0.17, 0.25, 0.30, 0.32, 0.82, 0.98, 0.00)
  , y=c(1.45, 1.93, 0.81, 0.61, 1.55, 0.95, 0.45, 1.14, 0.74, 0.98, 1.41, 0.81, 0.89, 0.68, 1.39, 1.53
        , 0.91, 1.49, 1.38, 1.73, 1.11, 1.68, 0.66, 0.69, 1.98)
)

Calculate the parameter estimates ($\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\beta}_2$); in addition find the usual 95% confidence intervals for $\beta_0$, $\beta_1$, $\beta_2$.

SOLUTION:

Using broom::tidy(conf.int = TRUE) with a regression model:

# fit the regression and tidy the coefficients with 95% confidence intervals
fit_Q8 <- lm(y ~ ., data = dat)
fit_Q8 |> broom::tidy(conf.int = TRUE) |> 
  gt::gt() |> gtExtras::gt_theme_espn()
term estimate std.error statistic p.value conf.low conf.high
(Intercept) 0.433547115 0.06598300 6.57058782 1.313239e-06 0.2967067 0.5703875
x1 1.652993451 0.09524539 17.35510141 2.525004e-14 1.4554666 1.8505203
x2 0.003944875 0.07485382 0.05270105 9.584457e-01 -0.1512924 0.1591822
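The "usual" 95% interval reported in conf.low and conf.high is the estimate plus or minus a t quantile times the standard error. The sketch below reproduces that calculation by hand and compares it with confint(); it uses a model on the built-in mtcars data purely as a stand-in, so the numbers are not those of the quiz.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))
t_crit <- qt(0.975, df = df.residual(fit))  # two-sided 95% interval

manual_ci <- cbind(conf.low = est - t_crit * se, conf.high = est + t_crit * se)
manual_ci

# agrees with the built-in interval
all.equal(unname(manual_ci), unname(confint(fit, level = 0.95)))
```

The residual degrees of freedom are the number of observations minus the number of estimated coefficients.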

Q9

Using the .resid column created by broom::augment(___, dat), calculate $\hat{\sigma}^2$.

SOLUTION: the variance of the residuals is approximately 0.0116
broom::augment(fit_Q8, dat) |> 
  dplyr::pull(.resid) |> 
  var()
[1] 0.01164646
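One caveat worth keeping in mind: var() divides the residual sum of squares by n - 1, whereas the residual variance reported by lm (via sigma()) divides by the residual degrees of freedom, n - p. With 25 observations and 3 coefficients, as in Q8, the two differ by a factor of 24/22. The sketch below illustrates the relationship on a stand-in model fitted to the built-in mtcars data, not on the quiz data.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit)
n <- length(res)
p <- length(coef(fit))    # number of estimated coefficients, intercept included
rss <- sum(res^2)

var(res)        # divides the residual sum of squares by n - 1
rss / (n - 1)

sigma(fit)^2    # divides by the residual degrees of freedom, n - p
rss / (n - p)
```

The first pair matches because the residuals of a model with an intercept have mean zero.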

Q10

Does the following code train a model on the full training set of the modeldata::ames housing dataset and then evaluate the model using a test set?

  1. Is any step missing?

  2. When the recipe is prepped and baked, do you think all categories will be converted to dummy variables and all numeric predictors will be normalized?

SOLUTION:
# Load the ames housing dataset
data(ames)

# Create an initial split of the data
set.seed(123)
ames_split <- initial_split(ames, prop = 0.8, strata = Sale_Price)
ames_train <- training(ames_split)
ames_test  <- testing(ames_split)

# Create a recipe
ames_recipe <- recipe(Sale_Price ~ ., data = ames_train) |>
  step_log(Sale_Price, base = 10) |>  
  step_dummy(all_nominal_predictors()) |>  
  step_zv(all_predictors()) |> 
  step_normalize(all_numeric_predictors())  
# Create a workflow
ames_workflow <- workflow() |>
  add_recipe(ames_recipe) |>
  add_model(lm_spec)

# Fit the model and evaluate on the test set
ames_fit <- ames_workflow |> last_fit(ames_split)

# View the metrics
ames_fit |> collect_metrics()
  1. If you run the code chunk above you will get an error, “‘lm_spec’ is missing”. A workflow requires a pre-processing step (by formula or recipe) and a model specification. The model specification (lm_spec) is never defined, and that is what is missing here.

  2. Looking at the data in the ames dataset, there are 40 factor variables and 34 numeric variables.

skim_data <- ames |> skimr::skim()
skim_data$skim_type |> table()

 factor numeric 
     40      34 

The author of the recipe (see above) likely wanted the factor variables to be converted to dummy variables, and the numeric variables to be normalized.

However, the order of the steps in the recipe turns the factor variables into dummy variables before the numeric variables are normalized, and dummy variables are numeric. Given the order of the steps, the dummy variables are normalized along with the original numeric predictors, so every predictor ends up as a normalized numeric variable.

No 0/1 dummy variables remain after the bake operation!

ames_recipe |> prep() |> 
  bake(new_data = ames) |> 
  dplyr::glimpse()
Rows: 2,930
Columns: 273
$ Lot_Frontage                                          <dbl> 2.46645944, 0.66…
$ Lot_Area                                              <dbl> 2.553049311, 0.1…
$ Year_Built                                            <dbl> -0.39216369, -0.…
$ Year_Remod_Add                                        <dbl> -1.17030684, -1.…
$ Mas_Vnr_Area                                          <dbl> 0.05161557, -0.5…
$ BsmtFin_SF_1                                          <dbl> -0.97323408, 0.8…
$ BsmtFin_SF_2                                          <dbl> -0.2948976, 0.54…
$ Bsmt_Unf_SF                                           <dbl> -0.26264166, -0.…
$ Total_Bsmt_SF                                         <dbl> 0.066772012, -0.…
$ First_Flr_SF                                          <dbl> 1.24235505, -0.6…
$ Second_Flr_SF                                         <dbl> -0.7768908, -0.7…
$ Gr_Liv_Area                                           <dbl> 0.3012894541, -1…
$ Bsmt_Full_Bath                                        <dbl> 1.0688790, -0.82…
$ Bsmt_Half_Bath                                        <dbl> -0.2447658, -0.2…
$ Full_Bath                                             <dbl> -1.0350779, -1.0…
$ Half_Bath                                             <dbl> -0.7505082, -0.7…
$ Bedroom_AbvGr                                         <dbl> 0.1586909, -1.04…
$ Kitchen_AbvGr                                         <dbl> -0.2142401, -0.2…
$ TotRms_AbvGrd                                         <dbl> 0.3483003, -0.91…
$ Fireplaces                                            <dbl> 2.1485023, -0.92…
$ Garage_Cars                                           <dbl> 0.3157983, -1.00…
$ Garage_Area                                           <dbl> 0.269501338, 1.2…
$ Wood_Deck_SF                                          <dbl> 0.9247677, 0.372…
$ Open_Porch_SF                                         <dbl> 0.21567955, -0.7…
$ Enclosed_Porch                                        <dbl> -0.364656, -0.36…
$ Three_season_porch                                    <dbl> -0.1027822, -0.1…
$ Screen_Porch                                          <dbl> -0.2849706, 1.83…
$ Pool_Area                                             <dbl> -0.06118388, -0.…
$ Misc_Val                                              <dbl> -0.08608582, -0.…
$ Mo_Sold                                               <dbl> -0.42445939, -0.…
$ Year_Sold                                             <dbl> 1.666407, 1.6664…
$ Longitude                                             <dbl> 0.9003850, 0.900…
$ Latitude                                              <dbl> 1.0642009, 1.008…
$ Sale_Price                                            <dbl> 5.332438, 5.0211…
$ MS_SubClass_One_Story_1945_and_Older                  <dbl> -0.2198252, -0.2…
$ MS_SubClass_One_Story_with_Finished_Attic_All_Ages    <dbl> -0.04135376, -0.…
$ MS_SubClass_One_and_Half_Story_Unfinished_All_Ages    <dbl> -0.08027027, -0.…
$ MS_SubClass_One_and_Half_Story_Finished_All_Ages      <dbl> -0.3211104, -0.3…
$ MS_SubClass_Two_Story_1946_and_Newer                  <dbl> -0.4996264, -0.4…
$ MS_SubClass_Two_Story_1945_and_Older                  <dbl> -0.2066988, -0.2…
$ MS_SubClass_Two_and_Half_Story_All_Ages               <dbl> -0.08549097, -0.…
$ MS_SubClass_Split_or_Multilevel                       <dbl> -0.2066988, -0.2…
$ MS_SubClass_Split_Foyer                               <dbl> -0.1283979, -0.1…
$ MS_SubClass_Duplex_All_Styles_and_Ages                <dbl> -0.2055737, -0.2…
$ MS_SubClass_One_Story_PUD_1946_and_Newer              <dbl> -0.2707322, -0.2…
$ MS_SubClass_One_and_Half_Story_PUD_All_Ages           <dbl> -0.02066363, -0.…
$ MS_SubClass_Two_Story_PUD_1946_and_Newer              <dbl> -0.2187562, -0.2…
$ MS_SubClass_PUD_Multilevel_Split_Level_Foyer          <dbl> -0.07469546, -0.…
$ MS_SubClass_Two_Family_conversion_All_Styles_and_Ages <dbl> -0.1399371, -0.1…
$ MS_Zoning_Residential_High_Density                    <dbl> -0.09509974, 10.…
$ MS_Zoning_Residential_Low_Density                     <dbl> 0.5420306, -1.84…
$ MS_Zoning_Residential_Medium_Density                  <dbl> -0.4344558, -0.4…
$ MS_Zoning_A_agr                                       <dbl> -0.02066363, -0.…
$ MS_Zoning_C_all                                       <dbl> -0.09735866, -0.…
$ MS_Zoning_I_all                                       <dbl> -0.02922903, -0.…
$ Street_Pave                                           <dbl> 0.06868034, 0.06…
$ Alley_No_Alley_Access                                 <dbl> 0.2661636, 0.266…
$ Alley_Paved                                           <dbl> -0.1648677, -0.1…
$ Lot_Shape_Slightly_Irregular                          <dbl> 1.4134589, -0.70…
$ Lot_Shape_Moderately_Irregular                        <dbl> -0.1607238, -0.1…
$ Lot_Shape_Irregular                                   <dbl> -0.07174967, -0.…
$ Land_Contour_HLS                                      <dbl> -0.2066988, -0.2…
$ Land_Contour_Low                                      <dbl> -0.1430754, -0.1…
$ Land_Contour_Lvl                                      <dbl> 0.3401764, 0.340…
$ Utilities_NoSeWa                                      <dbl> -0.02066363, -0.…
$ Utilities_NoSewr                                      <dbl> -0.02922903, -0.…
$ Lot_Config_CulDSac                                    <dbl> -0.2577909, -0.2…
$ Lot_Config_FR2                                        <dbl> -0.1793294, -0.1…
$ Lot_Config_FR3                                        <dbl> -0.07174967, -0.…
$ Lot_Config_Inside                                     <dbl> -1.6286599, 0.61…
$ Land_Slope_Mod                                        <dbl> -0.2187562, -0.2…
$ Land_Slope_Sev                                        <dbl> -0.07174967, -0.…
$ Neighborhood_College_Creek                            <dbl> -0.3186783, -0.3…
$ Neighborhood_Old_Town                                 <dbl> -0.287611, -0.28…
$ Neighborhood_Edwards                                  <dbl> -0.2679979, -0.2…
$ Neighborhood_Somerset                                 <dbl> -0.2596688, -0.2…
$ Neighborhood_Northridge_Heights                       <dbl> -0.2384008, -0.2…
$ Neighborhood_Gilbert                                  <dbl> -0.2423742, -0.2…
$ Neighborhood_Sawyer                                   <dbl> -0.2302931, -0.2…
$ Neighborhood_Northwest_Ames                           <dbl> -0.2166052, -0.2…
$ Neighborhood_Sawyer_West                              <dbl> -0.2155231, -0.2…
$ Neighborhood_Mitchell                                 <dbl> -0.2010204, -0.2…
$ Neighborhood_Brookside                                <dbl> -0.1904408, -0.1…
$ Neighborhood_Crawford                                 <dbl> -0.1928346, -0.1…
$ Neighborhood_Iowa_DOT_and_Rail_Road                   <dbl> -0.1916409, -0.1…
$ Neighborhood_Timberland                               <dbl> -0.1550442, -0.1…
$ Neighborhood_Northridge                               <dbl> -0.1607238, -0.1…
$ Neighborhood_Stone_Brook                              <dbl> -0.1351039, -0.1…
$ Neighborhood_South_and_West_of_Iowa_State_University  <dbl> -0.1157946, -0.1…
$ Neighborhood_Clear_Creek                              <dbl> -0.1213469, -0.1…
$ Neighborhood_Meadow_Village                           <dbl> -0.113887, -0.11…
$ Neighborhood_Briardale                                <dbl> -0.1079726, -0.1…
$ Neighborhood_Bloomington_Heights                      <dbl> -0.09956823, -0.…
$ Neighborhood_Veenker                                  <dbl> -0.09041895, -0.…
$ Neighborhood_Northpark_Villa                          <dbl> -0.09278786, -0.…
$ Neighborhood_Blueste                                  <dbl> -0.05853314, -0.…
$ Neighborhood_Greens                                   <dbl> -0.05066948, -0.…
$ Neighborhood_Green_Hills                              <dbl> -0.02922903, -0.…
$ Neighborhood_Landmark                                 <dbl> -0.02066363, -0.…
$ Condition_1_Feedr                                     <dbl> -0.2462976, 4.05…
$ Condition_1_Norm                                      <dbl> 0.403473, -2.477…
$ Condition_1_PosA                                      <dbl> -0.08549097, -0.…
$ Condition_1_PosN                                      <dbl> -0.1176728, -0.1…
$ Condition_1_RRAe                                      <dbl> -0.09735866, -0.…
$ Condition_1_RRAn                                      <dbl> -0.1301046, -0.1…
$ Condition_1_RRNe                                      <dbl> -0.05066948, -0.…
$ Condition_1_RRNn                                      <dbl> -0.05474101, -0.…
$ Condition_2_Feedr                                     <dbl> -0.0654701, -0.0…
$ Condition_2_Norm                                      <dbl> 0.09509974, 0.09…
$ Condition_2_PosA                                      <dbl> -0.02066363, -0.…
$ Condition_2_PosN                                      <dbl> -0.02922903, -0.…
$ Condition_2_RRAe                                      <dbl> -0.02066363, -0.…
$ Condition_2_RRAn                                      <dbl> -0.02066363, -0.…
$ Condition_2_RRNn                                      <dbl> -0.02922903, -0.…
$ Bldg_Type_TwoFmCon                                    <dbl> -0.1415143, -0.1…
$ Bldg_Type_Duplex                                      <dbl> -0.2055737, -0.2…
$ Bldg_Type_Twnhs                                       <dbl> -0.1880208, -0.1…
$ Bldg_Type_TwnhsE                                      <dbl> -0.3029889, -0.3…
$ House_Style_One_and_Half_Unf                          <dbl> -0.08549097, -0.…
$ House_Style_One_Story                                 <dbl> 0.983694, 0.9836…
$ House_Style_SFoyer                                    <dbl> -0.1689205, -0.1…
$ House_Style_SLvl                                      <dbl> -0.2166052, -0.2…
$ House_Style_Two_and_Half_Fin                          <dbl> -0.05474101, -0.…
$ House_Style_Two_and_Half_Unf                          <dbl> -0.08549097, -0.…
$ House_Style_Two_Story                                 <dbl> -0.6547801, -0.6…
$ Overall_Cond_Poor                                     <dbl> -0.06209708, -0.…
$ Overall_Cond_Fair                                     <dbl> -0.1351039, -0.1…
$ Overall_Cond_Below_Average                            <dbl> -0.1892341, -0.1…
$ Overall_Cond_Average                                  <dbl> 0.876672, -1.140…
$ Overall_Cond_Above_Average                            <dbl> -0.4700736, 2.12…
$ Overall_Cond_Good                                     <dbl> -0.3875958, -0.3…
$ Overall_Cond_Very_Good                                <dbl> -0.232341, -0.23…
$ Overall_Cond_Excellent                                <dbl> -0.113887, -0.11…
$ Roof_Style_Gable                                      <dbl> -1.9639943, 0.50…
$ Roof_Style_Gambrel                                    <dbl> -0.07753179, -0.…
$ Roof_Style_Hip                                        <dbl> 2.0815862, -0.48…
$ Roof_Style_Mansard                                    <dbl> -0.06209708, -0.…
$ Roof_Style_Shed                                       <dbl> -0.04135376, -0.…
$ Roof_Matl_CompShg                                     <dbl> 0.1157946, 0.115…
$ Roof_Matl_Membran                                     <dbl> -0.02066363, -0.…
$ Roof_Matl_Metal                                       <dbl> -0.02066363, -0.…
$ Roof_Matl_Tar.Grv                                     <dbl> -0.08292059, -0.…
$ Roof_Matl_WdShake                                     <dbl> -0.05853314, -0.…
$ Roof_Matl_WdShngl                                     <dbl> -0.04135376, -0.…
$ Exterior_1st_AsphShn                                  <dbl> -0.02922903, -0.…
$ Exterior_1st_BrkComm                                  <dbl> -0.0462448, -0.0…
$ Exterior_1st_BrkFace                                  <dbl> 5.7382892, -0.17…
$ Exterior_1st_CemntBd                                  <dbl> -0.2155231, -0.2…
$ Exterior_1st_HdBoard                                  <dbl> -0.4267943, -0.4…
$ Exterior_1st_ImStucc                                  <dbl> -0.02066363, -0.…
$ Exterior_1st_MetalSd                                  <dbl> -0.4232945, -0.4…
$ Exterior_1st_Plywood                                  <dbl> -0.288480, -0.28…
$ Exterior_1st_PreCast                                  <dbl> -0.02066363, -0.…
$ Exterior_1st_Stone                                    <dbl> -0.02066363, -0.…
$ Exterior_1st_Stucco                                   <dbl> -0.1195232, -0.1…
$ Exterior_1st_VinylSd                                  <dbl> -0.7345379, 1.36…
$ Exterior_1st_Wd.Sdng                                  <dbl> -0.3991715, -0.3…
$ Exterior_1st_WdShing                                  <dbl> -0.1461515, -0.1…
$ Exterior_2nd_AsphShn                                  <dbl> -0.03580575, -0.…
$ Exterior_2nd_Brk.Cmn                                  <dbl> -0.09041895, -0.…
$ Exterior_2nd_BrkFace                                  <dbl> -0.1249191, -0.1…
$ Exterior_2nd_CBlock                                   <dbl> -0.02066363, -0.…
$ Exterior_2nd_CmentBd                                  <dbl> -0.2155231, -0.2…
$ Exterior_2nd_HdBoard                                  <dbl> -0.4070422, -0.4…
$ Exterior_2nd_ImStucc                                  <dbl> -0.07469546, -0.…
$ Exterior_2nd_MetalSd                                  <dbl> -0.4232945, -0.4…
$ Exterior_2nd_Other                                    <dbl> -0.02066363, -0.…
$ Exterior_2nd_Plywood                                  <dbl> 3.0517540, -0.32…
$ Exterior_2nd_PreCast                                  <dbl> -0.02066363, -0.…
$ Exterior_2nd_Stone                                    <dbl> -0.0462448, -0.0…
$ Exterior_2nd_Stucco                                   <dbl> -0.1195232, -0.1…
$ Exterior_2nd_VinylSd                                  <dbl> -0.7283491, 1.37…
$ Exterior_2nd_Wd.Sdng                                  <dbl> -0.3854073, -0.3…
$ Exterior_2nd_Wd.Shng                                  <dbl> -0.1689205, -0.1…
$ Mas_Vnr_Type_BrkFace                                  <dbl> -0.6634428, -0.6…
$ Mas_Vnr_Type_CBlock                                   <dbl> -0.02066363, -0.…
$ Mas_Vnr_Type_None                                     <dbl> -1.2199205, 0.81…
$ Mas_Vnr_Type_Stone                                    <dbl> 3.2366522, -0.30…
$ Exter_Cond_Fair                                       <dbl> -0.1550442, -0.1…
$ Exter_Cond_Good                                       <dbl> -0.3393949, -0.3…
$ Exter_Cond_Poor                                       <dbl> -0.02922903, -0.…
$ Exter_Cond_Typical                                    <dbl> 0.3890515, 0.389…
$ Foundation_CBlock                                     <dbl> 1.1411815, 1.141…
$ Foundation_PConc                                      <dbl> -0.8927767, -0.8…
$ Foundation_Slab                                       <dbl> -0.1367326, -0.1…
$ Foundation_Stone                                      <dbl> -0.06868034, -0.…
$ Foundation_Wood                                       <dbl> -0.02922903, -0.…
$ Bsmt_Cond_Fair                                        <dbl> -0.1975475, -0.1…
$ Bsmt_Cond_Good                                        <dbl> 4.9442976, -0.20…
$ Bsmt_Cond_No_Basement                                 <dbl> -0.1728887, -0.1…
$ Bsmt_Cond_Poor                                        <dbl> -0.0462448, -0.0…
$ Bsmt_Cond_Typical                                     <dbl> -2.8539373, 0.35…
$ Bsmt_Exposure_Gd                                      <dbl> 3.0742999, -0.32…
$ Bsmt_Exposure_Mn                                      <dbl> -0.3055009, -0.3…
$ Bsmt_Exposure_No                                      <dbl> -1.3381530, 0.74…
$ Bsmt_Exposure_No_Basement                             <dbl> -0.1767779, -0.1…
$ BsmtFin_Type_1_BLQ                                    <dbl> 3.0895747, -0.32…
$ BsmtFin_Type_1_GLQ                                    <dbl> -0.6481321, -0.6…
$ BsmtFin_Type_1_LwQ                                    <dbl> -0.2333596, -0.2…
$ BsmtFin_Type_1_No_Basement                            <dbl> -0.1728887, -0.1…
$ BsmtFin_Type_1_Rec                                    <dbl> -0.3291359, 3.03…
$ BsmtFin_Type_1_Unf                                    <dbl> -0.6342108, -0.6…
$ BsmtFin_Type_2_BLQ                                    <dbl> -0.1491694, -0.1…
$ BsmtFin_Type_2_GLQ                                    <dbl> -0.1176728, -0.1…
$ BsmtFin_Type_2_LwQ                                    <dbl> -0.1780576, 5.61…
$ BsmtFin_Type_2_No_Basement                            <dbl> -0.1741936, -0.1…
$ BsmtFin_Type_2_Rec                                    <dbl> -0.194022, -0.19…
$ BsmtFin_Type_2_Unf                                    <dbl> 0.4183756, -2.38…
$ Heating_GasA                                          <dbl> 0.1249191, 0.124…
$ Heating_GasW                                          <dbl> -0.09278786, -0.…
$ Heating_Grav                                          <dbl> -0.05853314, -0.…
$ Heating_OthW                                          <dbl> -0.02066363, -0.…
$ Heating_Wall                                          <dbl> -0.05066948, -0.…
$ Heating_QC_Fair                                       <dbl> 5.2487339, -0.19…
$ Heating_QC_Good                                       <dbl> -0.4379219, -0.4…
$ Heating_QC_Poor                                       <dbl> -0.03580575, -0.…
$ Heating_QC_Typical                                    <dbl> -0.6441494, 1.55…
$ Central_Air_Y                                         <dbl> 0.265243, 0.2652…
$ Electrical_FuseF                                      <dbl> -0.1351039, -0.1…
$ Electrical_FuseP                                      <dbl> -0.05066948, -0.…
$ Electrical_Mix                                        <dbl> -0.02066363, -0.…
$ Electrical_SBrkr                                      <dbl> 0.3088293, 0.308…
$ Electrical_Unknown                                    <dbl> -0.02066363, -0.…
$ Functional_Maj2                                       <dbl> -0.06209708, -0.…
$ Functional_Min1                                       <dbl> -0.1506577, -0.1…
$ Functional_Min2                                       <dbl> -0.1621157, -0.1…
$ Functional_Mod                                        <dbl> -0.1119485, -0.1…
$ Functional_Sal                                        <dbl> -0.02922903, -0.…
$ Functional_Sev                                        <dbl> -0.02922903, -0.…
$ Functional_Typ                                        <dbl> 0.2814762, 0.281…
$ Garage_Type_Basment                                   <dbl> -0.1176728, -0.1…
$ Garage_Type_BuiltIn                                   <dbl> -0.2587311, -0.2…
$ Garage_Type_CarPort                                   <dbl> -0.07174967, -0.…
$ Garage_Type_Detchd                                    <dbl> -0.5966212, -0.5…
$ Garage_Type_More_Than_Two_Types                       <dbl> -0.08549097, -0.…
$ Garage_Type_No_Garage                                 <dbl> -0.2423742, -0.2…
$ Garage_Finish_No_Garage                               <dbl> -0.244342, -0.24…
$ Garage_Finish_RFn                                     <dbl> -0.6229746, -0.6…
$ Garage_Finish_Unf                                     <dbl> -0.8406517, 1.18…
$ Garage_Cond_Fair                                      <dbl> -0.1550442, -0.1…
$ Garage_Cond_Good                                      <dbl> -0.06209708, -0.…
$ Garage_Cond_No_Garage                                 <dbl> -0.244342, -0.24…
$ Garage_Cond_Poor                                      <dbl> -0.07469546, -0.…
$ Garage_Cond_Typical                                   <dbl> 0.3154172, 0.315…
$ Paved_Drive_Partial_Pavement                          <dbl> 6.9116757, -0.14…
$ Paved_Drive_Paved                                     <dbl> -3.1286491, 0.31…
$ Pool_QC_Fair                                          <dbl> -0.02922903, -0.…
$ Pool_QC_Good                                          <dbl> -0.03580575, -0.…
$ Pool_QC_No_Pool                                       <dbl> 0.0654701, 0.065…
$ Pool_QC_Typical                                       <dbl> -0.02066363, -0.…
$ Fence_Good_Wood                                       <dbl> -0.1975475, -0.1…
$ Fence_Minimum_Privacy                                 <dbl> -0.3548348, 2.81…
$ Fence_Minimum_Wood_Wire                               <dbl> -0.06209708, -0.…
$ Fence_No_Fence                                        <dbl> 0.4902687, -2.03…
$ Misc_Feature_Gar2                                     <dbl> -0.04135376, -0.…
$ Misc_Feature_None                                     <dbl> 0.1855737, 0.185…
$ Misc_Feature_Othr                                     <dbl> -0.03580575, -0.…
$ Misc_Feature_Shed                                     <dbl> -0.1741936, -0.1…
$ Misc_Feature_TenC                                     <dbl> -0.02066363, -0.…
$ Sale_Type_Con                                         <dbl> -0.04135376, -0.…
$ Sale_Type_ConLD                                       <dbl> -0.09278786, -0.…
$ Sale_Type_ConLI                                       <dbl> -0.05474101, -0.…
$ Sale_Type_ConLw                                       <dbl> -0.0462448, -0.0…
$ Sale_Type_CWD                                         <dbl> -0.06868034, -0.…
$ Sale_Type_New                                         <dbl> -0.2893471, -0.2…
$ Sale_Type_Oth                                         <dbl> -0.05066948, -0.…
$ Sale_Type_VWD                                         <dbl> -0.02066363, -0.…
$ Sale_Type_WD.                                         <dbl> 0.3890515, 0.389…
$ Sale_Condition_AdjLand                                <dbl> -0.0654701, -0.0…
$ Sale_Condition_Alloca                                 <dbl> -0.09735866, -0.…
$ Sale_Condition_Family                                 <dbl> -0.1266697, -0.1…
$ Sale_Condition_Normal                                 <dbl> 0.4619311, 0.461…
$ Sale_Condition_Partial                                <dbl> -0.292798, -0.29…
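One possible repair of both issues is sketched below, under the assumption that an ordinary linear regression fitted with the lm engine was intended (the original code does not say which model lm_spec should be). The model specification is defined, and step_normalize is moved ahead of step_dummy so that only the original numeric predictors are normalized. The names ames_recipe_fixed, ames_workflow_fixed, and baked are illustrative.

```r
library(tidymodels)

data(ames, package = "modeldata")
set.seed(123)
ames_split <- initial_split(ames, prop = 0.8, strata = Sale_Price)
ames_train <- training(ames_split)

# the piece missing from the original code: a model specification
# (assumed here to be a linear regression using the "lm" engine)
lm_spec <- linear_reg() |> set_engine("lm")

# normalize the original numeric predictors BEFORE creating dummy variables
ames_recipe_fixed <- recipe(Sale_Price ~ ., data = ames_train) |>
  step_log(Sale_Price, base = 10) |>
  step_normalize(all_numeric_predictors()) |>
  step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors())

ames_workflow_fixed <- workflow() |>
  add_recipe(ames_recipe_fixed) |>
  add_model(lm_spec)

# check the processed training data: dummy columns should now be 0/1
baked <- ames_recipe_fixed |> prep() |> bake(new_data = NULL)
table(baked$Central_Air_Y)
```

The repaired workflow can then be passed to last_fit(ames_split) and collect_metrics() exactly as in the original code.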

The code used in this question was written by ChatGPT. Use these tools with caution.

Grading (10 pts)

Part Points
Part 1 - Conceptual 5
Part 2 - Applied 5
Total 10