Skip to content

Script that pulls data related to a github repo and visualizes it as timeseries

Notifications You must be signed in to change notification settings

jaydeluca/repo-metrics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Repo Metrics

Random collection of tools to pull and visualize various data about a repository as timeseries metrics.

Functionality:

  • Timeseries file count tracking: for tracking migration projects from one language to another.
  • Pull benchmark data and visualize as timeseries

Setup

A github token is not required but it is recommended as you will get rate limited if you make too many unauthenticated calls

export GITHUB_TOKEN="insert-your-token"
make install

Test / linting

# Make sure you install pytest and ruff
make install

# Tests
make test

# Linting
make lint

File Count Tracking

Arguments

Argument Command Description Example
Repository -r, --repo Repository name. --repo "open-telemetry/opentelemetry-java-instrumentation"
Start Date -s, --start Starting Date in format %Y-%m-%d (will calculate from this date until now) --start "2022-11-15"
Interval -i, --interval Interval (in days) between data points --interval 14
Languages -l, --languages Languages to count (in comma separated list) --languages "java,groovy"
Output -o, --output (Optional) Location where file should be saved --output "./media/example.png"

Example Usage:

In the open-telemetry/opentelemetry-java-instrumentation repository, track the conversion of tests from groovy to java in the instrumentation directory starting from 2022-11-15 with a data point every 2 weeks.

python main.py -r "open-telemetry/opentelemetry-java-instrumentation" -l "groovy,java" -s "2022-11-15" -i 14

Output:

Example

Now just groovy

python main.py -r "open-telemetry/opentelemetry-java-instrumentation" -l "groovy" -s "2022-11-15" -i 14

Output:

Example

Benchmark Visualization

This is very specific to the open-telemetry/opentelemetry-java-instrumentation repo

Arguments

Argument Command Description Example
Repository -r, --repo Repository name. --repo "open-telemetry/opentelemetry-java-instrumentation"
Start Date -s, --start Starting Date in format %Y-%m-%d (will calculate from this date until now) --start "2022-11-15"
Interval -i, --interval Interval (in days) between data points --interval 14

Example Usage:

Chart Min and max heap starting from 2022-02-14 with a data point every 30 days.

python benchmark.py -r "open-telemetry/opentelemetry-java-instrumentation" -s "2022-02-14" -i 30

Output:

Example

Detail Count by Instrumentation module

Arguments

Argument Command Description Example
Repository -r, --repo Repository name. --repo "open-telemetry/opentelemetry-java-instrumentation"
Language -l, --language Language to get count for --language "groovy"
Output -o, --output (Optional) Location where file should be saved --output "./media/example.png"

Example Usage:

In the open-telemetry/opentelemetry-java-instrumentation repository, analyze the files in test directories in the instrumentation directory and outputting counts by module.

python count_by_instrumentation.py -r "open-telemetry/opentelemetry-java-instrumentation" -l "groovy"

Output:

Key File Count Total File Size
servlet 23 143182
aws-sdk 17 127827
jdbc 8 90890
elasticsearch 15 90341
jaxrs 37 72352
vertx 14 56845
ratpack 16 51932
mongo 10 51661
play 15 48986
restlet 22 38226
jaxws 17 21595
kafka 3 21505
twilio-6.6 1 18936
jaxrs-client 5 16067
spymemcached-2.12 1 15630
undertow-1.4 3 12754
hibernate 1 12167
dropwizard 2 10789
rediscala-1.8 1 3898
grails-3.0 3 3201
internal 2 2603
Total 216 911387

Example

Approach

  • Query Github for point in time snapshots based on commits around times spanning a timeframe
    • Get one data point every interval (example: every 14 days)
    • Filter based on some criteria
    • Cache this data locally to avoid repeated api calls
  • Generate Graph to show results over time frame

About

Script that pulls data related to a github repo and visualizes it as timeseries

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published