-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathREADME.Rmd
142 lines (93 loc) · 4.46 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
---
output:
md_document:
variant: markdown_github
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-",
warning = FALSE,
message=FALSE
)
library(baseballDBR)
```
# BaseballDBR <img src="man/figures/baseballDBR_hex.png" align="right" />
[![Build Status](https://travis-ci.org/keberwein/baseballDBR.png?branch=master)](https://travis-ci.org/keberwein/baseballDBR)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/baseballDBR)](http://www.r-pkg.org/badges/version/baseballDBR)
[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
# Install
* Install from CRAN
```{r eval=FALSE}
install.packages("baseballDBR")
```
* Or, install the latest development version from GitHub:
```{r eval=FALSE}
devtools::install_github("keberwein/baseballDBR")
```
# Gathering Data
The `baseballDBR` package requires data that is formatted similar to the [Baseball Databank](https://github.com/chadwickbureau/baseballdatabank) or Sean Lahman's [Baseball Database](http://www.seanlahman.com/baseball-archive/statistics/). The package also contains the `get_bbdb()` function, which allows us to download the most up-to-date tables directly from the Chadwick Bureau's GitHub repository. For example, we can easily load the "Batting" table into our R environment.
```{r}
library(baseballDBR)
get_bbdb(table = "Batting")
head(Batting)
```
### Use with the Lahman Package
```{r}
library(Lahman)
library(baseballDBR)
Batting <- Lahman::Batting
head(Batting)
```
# Adding Basic Metrics
Simple batting metrics can be easily added to any batting data frame. For example, we can add slugging percentage, on-base percentage and on-base plus slugging. Note that OPS and OBP appears as "NA" for the years before IBB was tracked.
```{r}
library(baseballDBR)
Batting$SLG <- SLG(Batting)
Batting$OBP <- OBP(Batting)
head(Batting, 3)
```
# Advanced Metrics
The package includes a suite of advanced metrics such as wOBA, RAA, and FIP, among others. Many of the advanced metrics require multiple tables. For example, the wOBA metric requires the Batting, Pitching, and Fielding tables in order to establish a player's regular defensive position.
```{r}
library(baseballDBR)
get_bbdb(table = c("Batting", "Pitching", "Fielding"))
Batting$wOBA <- wOBA(Batting, Pitching, Fielding, Fangraphs = T)
head(Batting, 3)
```
The code above uses [Fangraphs](http://www.fangraphs.com/guts.aspx?type=cn) wOBA values. The default behavior is to uses Tom Tango's adapted [SQL formula](http://www.insidethebook.com/ee/index.php/site/article/woba_year_by_year_calculations/). Other options include `Sep.Leagues`, which may act as a buffer to any bias created by the designated hitter.
```{r}
library(baseballDBR)
get_bbdb(table = c("Batting", "Pitching", "Fielding"))
Batting$wOBA <- wOBA(Batting, Pitching, Fielding, Fangraphs = F, Sep.Leagues = T)
head(Batting, 3)
```
We can also produce a data frame that only shows the wOBA multipliers. Notice the Fangraphs wOBA multipliers slightly differ from the Tango multipliers.
```{r}
library(baseballDBR)
get_bbdb(table = c("Batting", "Pitching", "Fielding"))
fangraphs_woba <- wOBA_values(Batting, Pitching, Fielding, Fangraphs=T)
head(fangraphs_woba, 3)
tango_woba <- wOBA_values(Batting, Pitching, Fielding, Fangraphs=F)
head(tango_woba, 3)
```
# Create Local Database
A relational database is not needed to work with these data. However, we may want to store the data to be called more quickly at a later time. We can download all of the tables at once with the `get_bbdb()` function and then write them to an empty schema in our favorite database. The example uses a newly created PostgreSQL instance, but other database tools can be used assuming an appropriate R package exists.
```{r, eval=F}
library(baseballDBR)
library(RPostgreSQL)
# Load all tables into the Global Environment.
get_bbdb(AllTables = TRUE)
# Make a list of all data frames.
dbTables <- names(Filter(isTRUE, eapply(.GlobalEnv, is.data.frame)))
# Load data base drivers and load all data frames in a loop.
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, host= "localhost", dbname= "lahman", user= "YOUR_USERNAME", password = "YOUR_PASSWORD")
for (i in 1:length(dbTables)) {
dbWriteTable(con, name = dbTables[i], value = get0(dbTables[i]), overwrite = TRUE)
}
# Disconnect from database.
dbDisconnect(con)
rm(con, drv)
```