While working on my PhD, I made heavy use of statistical modeling to better understand the processes I was studying. R was my language of choice, and that of my peers in life science academia. Given R’s primary purpose as a language for statistical computing, it is unparalleled when it comes to building linear models.
As my project progressed, the types of data problems I was working on changed. The volume of data increased, and the goals of each experiment became more complex and varied. I was now working with many more variables, and tasks such as visualizing the patterns in the data became more difficult. I found myself more often interested in making predictions on new data, rather than, or in addition to, understanding the underlying biology itself. Sometimes, the complex relationships in the data were difficult to represent manually with traditional modeling methods. At other times, I simply wanted to know how many distinct groups existed in the data.