MLB Data Lab

mlb-data-lab is a Python application and library that generates advanced stat summary sheets for MLB players, with analysis and visualizations for a chosen season. Used as a library module, it lets you develop your own features and extend its functionality for custom applications and data processing needs. It builds on the pybaseball module, the MLB-StatsAPI module, and other Python libraries to collect, analyze, and format data for use in reports, dashboards, and other analytical tools.

The project sources its data from MLB and Fangraphs, so the statistics reflect what those sources currently publish. Future updates will expand the application's features and functionality, both as a standalone tool and as a library for integration into other projects.

Sample Summary Sheets

Below are samples of the summary sheets that can be generated by this project. The first sample is a Batting Summary for Riley Greene for the 2024 season. The second sample is a Pitching Summary for Tarik Skubal for the 2024 season.

[Image: Riley Greene Batter Sheet]    [Image: Tarik Skubal Pitcher Sheet]

In addition to the baseball stats you would expect, the summary sheets also include the following "advanced" stats:

Batters            | Pitchers
-------------------+----------------------------------
BB%      UBR       | K/9      Opponent Avg    Swing %
K%       wRC       | BB/9     WHIP            Splits
OBP      wRAA      | K/BB     BABIP
SLG      wOBA      | H/9      LOB%
OPS      wRC+      | HR/9     ERA-
ISO      WAR       | K%       FIP-
Spd      Splits    | BB%      FIP
BABIP              | K-BB%    RS/9
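
For readers less familiar with these abbreviations, the sketch below works through the standard public definitions of a few of the simpler ones (K/9, BB/9, WHIP, K/BB, BABIP, ISO, OPS, K%). It is only an illustration: the function names, the `PitchingLine` container, and the sample numbers are hypothetical and not part of mlb-data-lab, which may simply read these values from its data sources rather than compute them.

```python
from dataclasses import dataclass


@dataclass
class PitchingLine:
    """Raw counting stats for one pitcher (hypothetical container)."""
    innings: float     # true decimal innings, e.g. 6 1/3 innings is 6.333..., not 6.1
    hits: int
    walks: int
    strikeouts: int
    home_runs: int


def per_nine(count: int, innings: float) -> float:
    """Scale a counting stat to a nine-inning rate (K/9, BB/9, H/9, HR/9)."""
    return 9 * count / innings


def whip(line: PitchingLine) -> float:
    """Walks plus hits per inning pitched."""
    return (line.walks + line.hits) / line.innings


def k_per_bb(line: PitchingLine) -> float:
    """Strikeout-to-walk ratio."""
    return line.strikeouts / line.walks


def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    return (hits - home_runs) / (at_bats - strikeouts - home_runs + sac_flies)


def iso(slg: float, avg: float) -> float:
    """Isolated power: slugging percentage minus batting average."""
    return slg - avg


def ops(obp: float, slg: float) -> float:
    """On-base plus slugging."""
    return obp + slg


def rate_pct(count: int, plate_appearances: int) -> float:
    """Percentage of plate appearances ending in an event (K%, BB%)."""
    return 100 * count / plate_appearances


# Made-up sample numbers, not real player data.
sample = PitchingLine(innings=180.0, hits=150, walks=40, strikeouts=200, home_runs=20)
k9 = per_nine(sample.strikeouts, sample.innings)
bb9 = per_nine(sample.walks, sample.innings)
sample_whip = whip(sample)
sample_babip = babip(hits=150, home_runs=20, at_bats=550, strikeouts=120, sac_flies=5)
print(f"K/9={k9:.2f} BB/9={bb9:.2f} WHIP={sample_whip:.3f} BABIP={sample_babip:.3f}")
```

The weighted and park-adjusted stats in the table (wOBA, wRC+, WAR, ERA-, FIP-) depend on season-specific constants and are not reproduced here.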

Project Structure

The mlb-data-lab project is organized as follows:

mlb_stats/
├── README.md              # Project documentation
├── setup.py               # Setup file for packaging and installation
├── requirements.txt       # Dependencies for the project
├── mlb_stats/
│   ├── apis/
│   │   ├── stats_api.py   # API client for fetching MLB stats
│   │   ├── fangraphs_client.py # API client for Fangraphs data
│   ├── components/
│   │   ├── stats_table.py # Class for generating stats tables
│   ├── data/
│   │   ├── 
│   ├── data_viz/
│   │   ├── batting_spray_chart.py
│   │   ├── pitch_break_plot.py
│   │   ├── pitch_breakdown_table.py
│   │   ├── pitch_velocity_distribution_plot.py
│   │   ├── rolling_pitch_usage_plot.py
│   │   ├── plotting.py
│   ├── player/
│   │   ├── player.py
│   │   ├── player_bio.py
│   │   ├── player_info.py
│   │   ├── player_lookup.py
│   ├── stats/
│   │   ├── 
│   ├── summary_sheets/
│   │   ├── batter_summary_sheet.py
│   │   ├── pitcher_summary_sheet.py
│   │   ├── summary_sheet.py
│   │   ├── team_summary_sheet.py
│   ├── team/
│   │   ├── roster.py
│   │   ├── team.py
│   ├── config.py
│   ├── constants.py
│   ├── utils.py
├── scripts/
│   ├── generate_player_summary.py
│   ├── save_statcast_data.py
├── tests/
│   ├── 

Description of Key Directories:

  • mlb_stats/: Core application logic and components.
    • apis/: API clients for retrieving stats from external services like MLB and Fangraphs.
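    • components/: Reusable building blocks, such as the class for generating stats tables.
    • data/:
    • data_viz/: Plotting modules, including the batting spray chart, pitch break plot, pitch breakdown table, pitch velocity distribution plot, and rolling pitch usage plot.
    • player/: Player modules covering the player itself, bio, info, and lookup.
    • stats/:
    • summary_sheets/: Batter, pitcher, and team summary sheets, along with the shared summary sheet module.
    • team/: Team and roster modules.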
  • scripts/: Scripts for generating player summary sheets and saving statcast data.
  • tests/: Unit tests for verifying the functionality of various components and modules.

Installation

To get started with the project, follow these steps:

  1. Clone the repository:
     git clone https://github.com/timothyf/mlb_stats.git
     cd mlb_stats
  2. Set up a Python virtual environment (optional but recommended):
     python3 -m venv venv
     source venv/bin/activate
  3. Install the required dependencies:
     pip install -r requirements.txt

Usage

Generating Player Summary Sheets

The scripts directory contains scripts for basic tasks. To generate player summary sheets, run:

python scripts/generate_player_summary.py [options]

Options:
    --players [1 or more player names]
    --teams [1 or more team names]
    --year [specify a 4-digit year]

Saving Statcast Data

To save Statcast data, run the save_statcast_data.py script in the scripts directory:

python scripts/save_statcast_data.py [options]

Options:
    --players [1 or more player names]
    --teams [1 or more team names]
    --year [specify a 4-digit year]

Examples

Generate a player sheet for Riley Greene

python scripts/generate_player_summary.py --players 'Riley Greene'

Output:
mlb_stats/output/2024/Tigers/batter_summary_riley_greene.png

[Image: Riley Greene Batter Sheet]
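
The output path above appears to follow a year / team / file-name pattern. The sketch below reconstructs that pattern from this single example; `summary_output_path`, its parameters, and the `pitcher` naming are hypothetical and inferred, not taken from the project's code.

```python
from pathlib import PurePosixPath


def summary_output_path(
    player_name: str,
    team_nickname: str,
    year: int,
    kind: str = "batter",            # "pitcher" is an assumed counterpart
    root: str = "mlb_stats/output",  # root taken from the example path above
) -> PurePosixPath:
    """Build a path like <root>/<year>/<team>/<kind>_summary_<first>_<last>.png."""
    # Lowercase the name and join its words with underscores: "Riley Greene" -> "riley_greene".
    slug = "_".join(player_name.lower().split())
    return PurePosixPath(root) / str(year) / team_nickname / f"{kind}_summary_{slug}.png"


path = summary_output_path("Riley Greene", "Tigers", 2024)
print(path)
```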

Generate player sheets for all of the 2024 Detroit Tigers

python scripts/generate_player_summary.py --teams 'Detroit Tigers' --year 2024

Inspiration

This project was inspired by my time working in the R&D department of the Washington Nationals, and the pitching summary project from Thomas Nestico. Here is a link to an article describing his project:

https://medium.com/@thomasjamesnestico/creating-the-perfect-pitching-summary-7b8a981ef0c5

Copyright Notice

This package and its author are not affiliated with MLB or any MLB team. This API wrapper interfaces with MLB's Stats API. Use of MLB data is subject to the notice posted at http://gdx.mlb.com/components/copyright.txt.
