
Retry for GitHub API

Research Software Engineering

As a software engineering researcher, you may need to mine data from GitHub.com or GitHub Enterprise instances, since GitHub has become a major, perhaps even the major, platform for collaborative software engineering. However, like any distributed system, GitHub is prone to errors, timeouts, and connection issues. Its strict rate limits add another challenge, and handling all of this demands time and effort that, unfortunately, is often undervalued in academia.

To address this, I implemented a GitHubRetry class for the popular Python library requests. The code is licensed under MIT. Feel free to use and adjust it as you please. To see the code in action, please have a look at the accompanying Gist.
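Independently of the Gist, the general idea can be sketched as follows. This is a minimal illustration and not the GitHubRetry implementation itself: the names RateLimitAwareRetry, seconds_until_reset, and build_session are hypothetical, and the retry count, backoff factor, and status codes are assumptions. The sketch subclasses urllib3's Retry (which requests uses under the hood) and, when the Retry-After header is absent, falls back to GitHub's X-RateLimit-Remaining and X-RateLimit-Reset headers to decide how long to wait.

```python
import time
from typing import Mapping, Optional

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def seconds_until_reset(
    headers: Mapping[str, str], now: Optional[float] = None
) -> Optional[float]:
    """Return the seconds to wait if the rate limit is exhausted, else None."""
    if headers.get("X-RateLimit-Remaining") != "0":
        return None
    reset = headers.get("X-RateLimit-Reset")  # UTC epoch seconds
    if reset is None:
        return None
    now = time.time() if now is None else now
    return max(0.0, float(reset) - now)


class RateLimitAwareRetry(Retry):
    """Hypothetical Retry subclass that also honors GitHub's rate limit headers."""

    def get_retry_after(self, response):
        # Prefer the standard Retry-After header when the server sends it.
        retry_after = super().get_retry_after(response)
        if retry_after is not None:
            return retry_after
        # Otherwise wait until the rate limit window resets, if it is exhausted.
        return seconds_until_reset(response.headers)


def build_session(token: Optional[str] = None) -> requests.Session:
    """Create a requests session that retries failed GitHub API calls."""
    retry = RateLimitAwareRetry(
        total=5,           # assumption: at most five retries per request
        backoff_factor=1,  # assumption: exponential backoff between retries
        # Assumption: 403 is included because GitHub may answer rate-limited
        # requests with it; a genuine "forbidden" is then retried needlessly.
        status_forcelist=(403, 429, 500, 502, 503, 504),
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    if token:
        session.headers["Authorization"] = f"Bearer {token}"
    return session
```

A session created with build_session() can then be used like any other requests session, for example session.get("https://api.github.com/rate_limit"). Note that urllib3 does not retry POST requests by default, so GraphQL calls would need allowed_methods adjusted accordingly, and that waiting for a rate limit reset can block for a long time.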

Warning for GraphQL users: Although all HTTP errors can also occur when using the GraphQL API (and are properly handled by the GitHubRetry class), GraphQL errors show up in the response JSON, for example:

{"errors": [...]}

Keep that in mind when using the GraphQL API. Your query might require additional error-handling logic.
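Such additional logic could look roughly like the sketch below. The helper unwrap_graphql and the exception GraphQLError are hypothetical names, and treating every reported error as fatal is an assumption: a GraphQL response may carry partial data alongside errors, which some queries may prefer to keep.

```python
from typing import Any, Dict


class GraphQLError(RuntimeError):
    """Raised when a GraphQL response body reports errors."""


def unwrap_graphql(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the "data" part of a GraphQL response or raise on errors."""
    errors = payload.get("errors")
    if errors:
        # Collect the human-readable messages; fall back to the raw entry.
        messages = "; ".join(
            str(e.get("message", e)) if isinstance(e, dict) else str(e)
            for e in errors
        )
        raise GraphQLError(messages)
    return payload.get("data") or {}
```

With requests, the helper would be applied to the decoded body, for example unwrap_graphql(response.json()), after the HTTP status has been checked.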

Michael Dorner
Author
Software engineering researcher