Logistic Regression
Description of data set
In this study, character strings were presented aurally to participants
at varying audio volumes.
The file module1-auditory_strings.csv contains the following
data:
- stimulus: character string that was aurally presented
- condition: the volume at which it was presented (1: very quiet to
  100: very loud)
- response_correct: whether the response given by the participant
  was correct or incorrect
- response_time: response time in seconds
Tasks
1. Read the data file module1-auditory_strings.csv into R (assign it to
   a variable called “dat”).
2. Create three new variables in dat:
   - “volume” that contains the volume from “condition” as a numeric
     vector (e.g., 63 for the “condition” volume_63; Hint: You can use
     the function str_split_fixed() from the stringr package)
   - “stimulus_length” that contains the length of the “stimulus”
     variable (Hint: You can use the function str_length() from the
     stringr package)
   - “response_correct” that contains the value 1 when the response
     was correct and 0 otherwise
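
The data preparation in tasks 1 and 2 could be sketched as follows. The three toy rows and the labels "correct"/"incorrect" are assumptions for illustration only; the real file may code the response differently, so check the actual values first.

```r
library(stringr)

# With the real file, task 1 would be:
# dat <- read.csv("module1-auditory_strings.csv")

# Hypothetical toy rows standing in for the real data.
dat <- data.frame(
  stimulus = c("abc", "qwerty", "xy"),
  condition = c("volume_63", "volume_5", "volume_100"),
  response_correct = c("correct", "incorrect", "correct"),
  stringsAsFactors = FALSE
)

# Split "volume_63" at the underscore into two pieces and keep the second.
dat$volume <- as.numeric(str_split_fixed(dat$condition, "_", 2)[, 2])

# Number of characters in each stimulus.
dat$stimulus_length <- str_length(dat$stimulus)

# Recode to 1/0 (assumes correct responses are labeled "correct").
dat$response_correct <- as.integer(dat$response_correct == "correct")

print(dat)
```

Note that the last step overwrites the original “response_correct” column.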
3. Estimate a logistic regression model with “response_correct” as
   target and “volume”, “stimulus_length”, and “response_time” as
   features. How should the coefficients of this model be interpreted?
   In other words, what is the effect of each feature on the target,
   expressed as an odds ratio?
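
One possible way to approach task 3 is sketched below with base R's glm() rather than mlr3. The data frame "sim" is simulated, and its coefficients are made up for illustration; with the real data, "dat" would take its place. The coefficients of a logistic model are on the log-odds scale, so exponentiating them gives odds ratios: the factor by which the odds of a correct response change when a feature increases by one unit and the others are held constant.

```r
set.seed(1)
n <- 500

# Simulated stand-in for dat (hypothetical effects, not the real data).
sim <- data.frame(
  volume = sample(1:100, n, replace = TRUE),
  stimulus_length = sample(2:8, n, replace = TRUE),
  response_time = runif(n, 0.5, 5)
)
eta <- -1 + 0.04 * sim$volume - 0.3 * sim$stimulus_length +
  0.2 * sim$response_time
sim$response_correct <- rbinom(n, 1, plogis(eta))

# Logistic regression: binomial family with the default logit link.
fit <- glm(response_correct ~ volume + stimulus_length + response_time,
           data = sim, family = binomial)

# Coefficients are log-odds; exp() turns them into odds ratios.
odds_ratios <- exp(coef(fit))
print(round(odds_ratios, 3))
```

An odds ratio above 1 means the odds of a correct response rise with the feature, below 1 means they fall.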
4. Using the model from task 3, calculate the predicted probability of
   a correct response for each observation and save it as
   “prob_correct_pred” in dat.
5. Manually calculate the predicted value of “response_correct” using a
   cutoff value of 0.5 for the probabilities calculated in task 4 and
   save it as “response_correct_pred” in dat. Is the result equivalent
   to the default prediction of mlr3’s predict() method?
6. Assess the prediction performance of the model by comparing the
   actual “response_correct” to the predicted “response_correct_pred”
   from task 5 using a contingency table. What is the prediction
   accuracy of the model?
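
Tasks 4 to 6 chain together, so a single sketch may help. It again assumes a base R glm() fit on simulated data ("sim" and its made-up effects are hypothetical); the comparison with mlr3's predict() is left open here.

```r
set.seed(2)
n <- 400

# Simulated stand-in for dat (hypothetical effects, not the real data).
sim <- data.frame(
  volume = sample(1:100, n, replace = TRUE),
  stimulus_length = sample(2:8, n, replace = TRUE),
  response_time = runif(n, 0.5, 5)
)
eta <- -1 + 0.04 * sim$volume - 0.3 * sim$stimulus_length +
  0.2 * sim$response_time
sim$response_correct <- rbinom(n, 1, plogis(eta))

fit <- glm(response_correct ~ volume + stimulus_length + response_time,
           data = sim, family = binomial)

# Task 4: type = "response" returns probabilities instead of log-odds.
sim$prob_correct_pred <- predict(fit, type = "response")

# Task 5: classify as correct (1) when the probability exceeds 0.5.
sim$response_correct_pred <- as.integer(sim$prob_correct_pred > 0.5)

# Task 6: contingency table of actual vs. predicted values.
conf_mat <- table(actual = sim$response_correct,
                  predicted = sim$response_correct_pred)
print(conf_mat)

# Accuracy: share of observations where prediction and truth agree.
accuracy <- mean(sim$response_correct == sim$response_correct_pred)
print(accuracy)
```

The accuracy equals the sum of the table's agreeing cells (actual 0 and predicted 0, actual 1 and predicted 1) divided by the number of observations.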
7. Calculate the predicted probability of a correct response for a
   “stimulus_length” of 3, the mean “volume”, and the first and third
   quartiles of “response_time”. Do the logistic model’s predictions for
   “response_correct” differ between the first and third quartiles of
   “response_time” using a cutoff value of 0.5?
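
Predicting for chosen feature values, as in task 7, amounts to passing a small data frame as newdata. The following is a minimal sketch on simulated data; "sim", "new_obs", and the simulated effects are hypothetical names and values, not part of the exercise data.

```r
set.seed(3)
n <- 400

# Simulated stand-in for dat (hypothetical effects, not the real data).
sim <- data.frame(
  volume = sample(1:100, n, replace = TRUE),
  stimulus_length = sample(2:8, n, replace = TRUE),
  response_time = runif(n, 0.5, 5)
)
eta <- -1 + 0.04 * sim$volume - 0.3 * sim$stimulus_length +
  0.2 * sim$response_time
sim$response_correct <- rbinom(n, 1, plogis(eta))

fit <- glm(response_correct ~ volume + stimulus_length + response_time,
           data = sim, family = binomial)

# First and third quartiles of response_time.
q <- quantile(sim$response_time, probs = c(0.25, 0.75))

# Two rows: the scalars are recycled, only response_time differs.
new_obs <- data.frame(
  stimulus_length = 3,
  volume = mean(sim$volume),
  response_time = unname(q)
)

new_obs$prob <- predict(fit, newdata = new_obs, type = "response")
new_obs$pred <- as.integer(new_obs$prob > 0.5)
print(new_obs)
```

If both rows end up on the same side of 0.5, the predicted class is identical even though the probabilities differ.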
8. Bonus: Also estimate a linear regression model with the same
   specification as in task 3, that is, with “response_correct” as
   target and “volume”, “stimulus_length”, and “response_time” as
   features.
9. Bonus: Why is the linear model estimated in task 8 fundamentally
   wrong? (Hint: Use both models, i.e., linear and logistic regression,
   to predict the probability of a correct response for the following
   new data set “dat_new2” with an extremely high “stimulus_length”, and
   compare the results.)
# New observation with extreme feature values
dat_new2 <- data.frame(
  stimulus_length = 50,
  response_time = 5 * max(dat$response_time),
  volume = mean(dat$volume)
)
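
The comparison suggested in the hint could look like the sketch below. It uses simulated data and base R's lm() and glm(); "sim", "extreme", and the simulated effects are assumptions for illustration, so the numbers will not match those from the real data.

```r
set.seed(4)
n <- 400

# Simulated stand-in for dat (hypothetical effects, not the real data).
sim <- data.frame(
  volume = sample(1:100, n, replace = TRUE),
  stimulus_length = sample(2:8, n, replace = TRUE),
  response_time = runif(n, 0.5, 5)
)
eta <- -1 + 0.04 * sim$volume - 0.3 * sim$stimulus_length +
  0.2 * sim$response_time
sim$response_correct <- rbinom(n, 1, plogis(eta))

# Same specification, once as linear and once as logistic regression.
fit_lin <- lm(response_correct ~ volume + stimulus_length + response_time,
              data = sim)
fit_log <- glm(response_correct ~ volume + stimulus_length + response_time,
               data = sim, family = binomial)

# Extreme observation, built like dat_new2 but from the simulated data.
extreme <- data.frame(
  stimulus_length = 50,
  response_time = 5 * max(sim$response_time),
  volume = mean(sim$volume)
)

p_lin <- predict(fit_lin, newdata = extreme)
p_log <- predict(fit_log, newdata = extreme, type = "response")
print(c(linear = unname(p_lin), logistic = unname(p_log)))
```

The logistic prediction passes through the inverse logit and therefore always lies between 0 and 1. The linear prediction is an unbounded straight-line extrapolation, so for extreme inputs it can fall below 0 or above 1, which cannot be read as a probability.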