As a software engineering researcher, you may need to mine data from GitHub.com or its Enterprise instances, as GitHub has become a major, perhaps even the major, platform for collaborative software engineering. However, like any distributed system, GitHub is prone to errors, timeouts, and connection issues. In addition, its strict rate limits pose another challenge, one that demands time and effort which, unfortunately, is often undervalued in academia.
Below, I present a GitHubRetry class for the popular requests Python library. The code is released under the MIT license, so feel free to use and adapt it as you please. To see my code in action, please have a look at the following Gist:
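The Gist itself is not reproduced in this text. Independently of it, here is a minimal sketch of how a class like this could be built on top of urllib3's `Retry`, which requests uses for its retry logic. The internals are an assumption and not necessarily how the Gist's GitHubRetry works: the sketch treats 403 and 429 as retryable, honors a `Retry-After` header when present, and otherwise, when `x-ratelimit-remaining` is `0`, sleeps until the epoch timestamp in `x-ratelimit-reset`, following GitHub's documented rate-limit headers. `make_github_session` is a hypothetical helper added for illustration.

```python
import time

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class GitHubRetry(Retry):
    """Sketch of a Retry that also waits for GitHub's primary rate-limit reset."""

    def get_retry_after(self, response):
        # Prefer an explicit Retry-After header (e.g. sent with secondary rate limits).
        retry_after = super().get_retry_after(response)
        if retry_after is not None:
            return retry_after
        # Primary rate limit exhausted: x-ratelimit-reset is a UTC epoch in seconds.
        if response.headers.get("x-ratelimit-remaining") == "0":
            reset = response.headers.get("x-ratelimit-reset")
            if reset is not None and reset.isdigit():
                # Sleep until the reset time, plus a one-second safety margin.
                return max(0.0, int(reset) - time.time()) + 1.0
        # No rate-limit hint: urllib3 falls back to its exponential backoff.
        return None


def make_github_session(token=None, total=10):
    """Hypothetical helper: a requests.Session that retries via GitHubRetry."""
    retry = GitHubRetry(
        total=total,
        backoff_factor=1.0,  # exponential backoff when no wait time is known
        # 403/429 cover rate limits; the 5xx codes cover transient server errors.
        status_forcelist=(403, 429, 500, 502, 503, 504),
        # POST is allowed because GraphQL queries are sent as POST requests.
        allowed_methods=frozenset({"GET", "HEAD", "OPTIONS", "POST"}),
        raise_on_status=False,  # hand back the last response instead of raising
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    if token:
        session.headers["Authorization"] = f"Bearer {token}"
    return session
```

With such a session, a call like `session.get("https://api.github.com/repos/OWNER/REPO")` (OWNER/REPO being placeholders) transparently waits out rate limits. One trade-off of listing 403 in `status_forcelist` is that genuine permission errors are also retried until `total` is exhausted; a more careful version could inspect the response before deciding to retry.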
Warning for GraphQL users: although all HTTP errors can also occur when using the GraphQL API (and are properly handled by the GitHubRetry class), GraphQL errors are additionally reported in the response JSON body, which an HTTP-level retry mechanism does not inspect, for example:
{"errors": [...]}
Keep that in mind when using the GraphQL API; your queries might require additional error-handling logic.
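A minimal sketch of such additional logic, assuming responses follow the GraphQL convention of a top-level `data` field and an optional `errors` array. `GraphQLError`, `check_graphql_response`, and `run_query` are hypothetical names; `https://api.github.com/graphql` is GitHub's GraphQL endpoint, and `session` can be any `requests.Session`, ideally one configured with a retry adapter such as GitHubRetry.

```python
class GraphQLError(Exception):
    """Hypothetical exception for a GraphQL response that contains "errors"."""

    def __init__(self, errors, data=None):
        # Join the human-readable messages; fall back to str() for odd entries.
        messages = "; ".join(
            e.get("message", str(e)) if isinstance(e, dict) else str(e) for e in errors
        )
        super().__init__(messages)
        self.errors = errors
        self.data = data  # GraphQL may return partial data alongside errors


def check_graphql_response(payload):
    """Return payload["data"], raising GraphQLError if an "errors" array is present."""
    errors = payload.get("errors")
    if errors:
        raise GraphQLError(errors, payload.get("data"))
    return payload.get("data")


def run_query(session, query, variables=None):
    """Send a GraphQL query via POST and check both HTTP and GraphQL-level errors."""
    response = session.post(
        "https://api.github.com/graphql",
        json={"query": query, "variables": variables or {}},
    )
    # HTTP errors left over after the session's retries surface here.
    response.raise_for_status()
    return check_graphql_response(response.json())
```

Raising on any entry in `errors` is deliberately strict. Because the exception keeps the partial `data` as well, the caller can still decide whether an incomplete result is good enough for the analysis at hand.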