class: title-slide
<!-- https://pelledolce.com/wp-content/uploads/abalones.jpg -->
<!-- https://miro.medium.com/max/3478/1*W3e117artUa9v4JrpjX9Gw.jpeg -->
background-image: url("./assets/abalone.png")
background-position: 100% 50%
background-size: 50% 100%

# .white[Abalone Age]
## Group Project
### M09B Early 5
### November 2020

---
exclude: true

---

## Data Description - Abalone

- Dataset concerning .brand-red[**marine snails**], via Marine Research Laboratories, Taroona.

|Variable Name  |Variable Type |Description                                    |
|:--------------|:-------------|:----------------------------------------------|
|Sex            |Factor        |Male, Female, or Infant (Unidentified)         |
|Length         |Continuous    |Longest shell measurement                      |
|Diameter       |Continuous    |Perpendicular to length                        |
|Height         |Continuous    |With meat in shell                             |
|Whole Weight   |Continuous    |Weight of whole abalone (grams)                |
|Shucked Weight |Continuous    |Weight of meat (grams)                         |
|Viscera Weight |Continuous    |Gut weight after bleeding (grams)              |
|Shell Weight   |Continuous    |Shell weight after drying (grams)              |
|Rings          |Integer       |Number of rings; adding 1.5 gives age in years |

.footnote[Table 1: Variable descriptions for the abalone dataset]

---

## Examining outliers

<img src="index_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

.footnote[Figure 1: Examining outliers - abalone rings against height]

---

## Investigation Question, Initial Inspection and Transformations

.brand-red[* Question: Can we generate a model to predict the number of rings an abalone has, purely from easily measured physical attributes?]

.pull-left[
### Initial
.small[.brand-red[Excluding `Sex` factor.]]
]

.pull-right[
### Transformed
.small[.brand-red[Including `Sex`; utilising dummy-coding to construct a contrast matrix.]]
]

.footnote[Figure 2 and Figure 3: Correlation Matrices]

---

.pull-left[## .brand-red[Initial, excluding `sex`]
![](index_files/figure-html/unnamed-chunk-6-1.png)<!-- -->
]

.pull-right[## .brand-red[Transformed, including `sex`]
![](index_files/figure-html/unnamed-chunk-7-1.png)<!-- -->
]

.footnote[Figure 4 and Figure 5: Examination of data outputs before and after transformation]

---

## Model Selection

<table style="border-collapse:collapse; border:none;">
<tr> <th style="border-top: double; text-align:center; font-style:italic; font-weight:normal; padding:0.2cm; border-bottom:1px solid black; text-align:left; "> </th> <th colspan="2" style="border-top: double; text-align:center; font-style:italic; font-weight:normal; padding:0.2cm; border-bottom:1px solid black;">Forward model</th> <th colspan="2" style="border-top: double; text-align:center; font-style:italic; font-weight:normal; padding:0.2cm; border-bottom:1px solid black;">Backward model</th> </tr>
<tr> <td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; border-bottom:1px solid black; text-align:left; ">Predictors</td> <td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; border-bottom:1px solid black; ">Estimates</td> <td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; border-bottom:1px solid black; ">p</td> <td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; border-bottom:1px solid black; ">Estimates</td> <td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; border-bottom:1px solid black; ">p</td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">(Intercept)</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">1.43</td> <td style=" 
padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">1.45</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">log_shell</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">0.11</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">0.11</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">log_shucked</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">-0.19</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">-0.19</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">log_whole</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">0.19</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">0.20</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">sex_i</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">-0.02</td> <td style=" 
padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">-0.01</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">log_viscera</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">-0.03</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">-0.02</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">sqrt_height</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">0.13</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>0.007</strong></td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">0.12</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>0.012</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">log_diam</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "></td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "></td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">0.07</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>0.005</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">log_length</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "></td> <td style=" 
padding:2px;text-align:left;vertical-align:top; text-align:center; "></td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">-0.08</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>0.005</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm; border-top:1px solid;">Observations</td> <td style=" padding:2px;text-align:left;vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:center; border-top:1px solid;" colspan="2">4175</td> <td style=" padding:2px;text-align:left;vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:center; border-top:1px solid;" colspan="2">4175</td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">R<sup>2</sup> / R<sup>2</sup> adjusted</td> <td style=" padding:2px;text-align:left;vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:center;" colspan="2">0.647 / 0.647</td> <td style=" padding:2px;text-align:left;vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:center;" colspan="2">0.648 / 0.647</td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">AIC</td> <td style=" padding:2px;text-align:left;vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:center;" colspan="2">-10882.310</td> <td style=" padding:2px;text-align:left;vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:center;" colspan="2">-10887.886</td> </tr>
</table>

.footnote[Table 2: Results for Forward and Backward AIC model selection]

---

## Assumption Checking - Independence

<img src="./assets/Independence.png" style="height:500px; margin-left:auto;margin-right:auto;display:block;">

.footnote[Figure 6: Tasmania and Bass Strait, showing the five 
survey areas of abalones]

---

## Assumption Checking - Forward

.pull-left[
![](index_files/figure-html/unnamed-chunk-10-1.png)<!-- -->
]

.pull-right[
![](index_files/figure-html/unnamed-chunk-11-1.png)<!-- -->
]

.footnote[Figure 7 and Figure 8: Residuals against Fitted and QQ plot of Forward model]

---

## Assumption Checking - Backward

.pull-left[
![](index_files/figure-html/unnamed-chunk-12-1.png)<!-- -->
]

.pull-right[
![](index_files/figure-html/unnamed-chunk-13-1.png)<!-- -->
]

.footnote[Figure 9 and Figure 10: Residuals against Fitted and QQ plot of Backward model]

---

## Comparing RMSE and MAE

.pull-left[
![](index_files/figure-html/unnamed-chunk-16-1.png)<!-- -->
]

.pull-right[
![](index_files/figure-html/unnamed-chunk-17-1.png)<!-- -->
]

.footnote[Figure 11 and Figure 12: Comparing RMSE and MAE values between forward and backward models]

---

## Final model

.pull-left[
<table style="border-collapse:collapse; border:none;">
<tr> <th style="border-top: double; text-align:center; font-style:italic; font-weight:normal; padding:0.2cm; border-bottom:1px solid black; text-align:left; "> </th> <th colspan="2" style="border-top: double; text-align:center; font-style:italic; font-weight:normal; padding:0.2cm; border-bottom:1px solid black;">Final model</th> </tr>
<tr> <td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; border-bottom:1px solid black; text-align:left; ">Predictors</td> <td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; border-bottom:1px solid black; ">Estimates</td> <td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; border-bottom:1px solid black; ">p</td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">(Intercept)</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">1.45</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">log_whole</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">0.20</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">log_shucked</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">-0.19</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">log_viscera</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">-0.02</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">log_shell</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">0.11</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">log_diam</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">0.07</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>0.005</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">log_length</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">-0.08</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>0.005</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">sqrt_height</td> <td style=" padding:2px;text-align:left;vertical-align:top; 
text-align:center; ">0.12</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>0.012</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; ">sex_i</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; ">-0.01</td> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:center; "><strong>&lt;0.001</strong></td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm; border-top:1px solid;">Observations</td> <td style=" padding:2px;text-align:left;vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:center; border-top:1px solid;" colspan="2">4175</td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">R<sup>2</sup> / R<sup>2</sup> adjusted</td> <td style=" padding:2px;text-align:left;vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:center;" colspan="2">0.648 / 0.647</td> </tr>
<tr> <td style=" padding:2px;text-align:left;vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">AIC</td> <td style=" padding:2px;text-align:left;vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:center;" colspan="2">-10887.886</td> </tr>
</table>
]

.pull-right[
> `\(\widehat{\sqrt{\log(\text{rings})}} = 1.45 + 0.20 \log(\text{whole})\)`
>
> `\(- 0.19 \log(\text{shucked}) - 0.02 \log(\text{viscera})\)`
>
> `\(+ 0.11 \log(\text{shell}) + 0.07 \log(\text{diameter})\)`
>
> `\(- 0.08 \log(\text{length}) + 0.12 \sqrt{\text{height}}\)`
>
> `\(- 0.01 \text{Sex}_{\text{infant}}\)`

## .brand-red[Discussion/Question]

- The model can predict ring count, and therefore age (rings + 1.5 years), from physical measurements.
- About 35% of the variance remains unexplained (1 - adjusted R<sup>2</sup> = 1 - 0.647 = 0.353).
- Further research is needed into other abalone characteristics that could help predict age.
]

.footnote[Table 3: Backward model as Final model]

---

## Citations

- Wickham et al. (2019). Welcome to the tidyverse. 
Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686
- Broman KW (2015). R/qtlcharts: interactive graphics for quantitative trait locus mapping. Genetics 199:359-361. doi:10.1534/genetics.114.172742
- Taiyun Wei and Viliam Simko (2017). R package "corrplot": Visualization of a Correlation Matrix (Version 0.84). Available from https://github.com/taiyun/corrplot
- Barret Schloerke, Di Cook, Joseph Larmarange, Francois Briatte, Moritz Marbach, Edwin Thoen, Amos Elberg and Jason Crowley (2020). GGally: Extension to 'ggplot2'. R package version 2.0.0. https://CRAN.R-project.org/package=GGally
- Yuan Tang, Masaaki Horikoshi, and Wenxuan Li (2016). "ggfortify: Unified Interface to Visualize Statistical Result of Popular R Packages." The R Journal 8.2: 478-489.
- Masaaki Horikoshi and Yuan Tang (2016). ggfortify: Data Visualization Tools for Statistical Analysis Results. https://CRAN.R-project.org/package=ggfortify
- Max Kuhn (2020). caret: Classification and Regression Training. R package version 6.0-86. https://CRAN.R-project.org/package=caret
- Lüdecke D (2020). _sjPlot: Data Visualization for Statistics in Social Science_. R package version 2.8.6. https://CRAN.R-project.org/package=sjPlot
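
---

## Appendix: evaluating the final model

The final-model equation predicts a transformed response, so obtaining a ring count (and an age) means undoing that transform. The sketch below is one way this arithmetic could look. It is written in Python purely for illustration, since the slides show no source code. The helper names `predict_rings` and `rings_to_age` are hypothetical, the natural logarithm is assumed, inputs are assumed to be in the same units used to fit the model (not stated in the slides), and the coefficients are the two-decimal values from Table 3, so results are approximate.

```python
import math

# Rounded estimates copied from Table 3; full-precision values are not shown.
INTERCEPT = 1.45
LOG_COEFS = {
    "whole": 0.20,
    "shucked": -0.19,
    "viscera": -0.02,
    "shell": 0.11,
    "diameter": 0.07,
    "length": -0.08,
}
SQRT_HEIGHT_COEF = 0.12
SEX_INFANT_COEF = -0.01


def predict_rings(measurements, height, is_infant):
    """Hypothetical helper: evaluate the final model, then undo the transform."""
    eta = INTERCEPT
    for name, coef in LOG_COEFS.items():
        eta += coef * math.log(measurements[name])  # natural log is an assumption
    eta += SQRT_HEIGHT_COEF * math.sqrt(height)
    eta += SEX_INFANT_COEF * (1 if is_infant else 0)
    # eta estimates sqrt(log(rings)), so square it and then exponentiate.
    return math.exp(eta ** 2)


def rings_to_age(rings):
    """Age in years, using the rings + 1.5 rule from Table 1."""
    return rings + 1.5


# Made-up measurements for illustration only, not taken from the dataset.
example = {"whole": 0.8, "shucked": 0.35, "viscera": 0.18,
           "shell": 0.24, "diameter": 0.4, "length": 0.5}
rings = predict_rings(example, height=0.14, is_infant=False)
print(round(rings, 1), round(rings_to_age(rings), 1))
```

Because the response is square-rooted after taking the log, the back-transform squares first and exponentiates second; reversing that order would give a different number.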