This repository was archived by the owner on May 29, 2018. It is now read-only.

Description
Most research software is never cited directly. For example, a paper might cite sklearn but not numpy, or numpy but not BLAS. Consequently, most research software is cited only implicitly, through the packages that depend on it.
To try to fill in the implied citation network, we can extract software dependencies from known repositories. This can take a few forms:
- Python packages that use setuptools declare their dependencies explicitly (typically in the install_requires argument to setup()), and these are stored in a well-structured list that's easy to parse.
- What about R?
- What about MATLAB?
- What about C/C++?
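
For the setuptools case, the extraction step might look something like the sketch below. It assumes the dependencies are written as a literal list in the install_requires argument of a setup() call, and it reads setup.py statically with the standard ast module rather than executing it. The function names extract_install_requires and requirement_name are hypothetical, chosen for illustration; packages that compute their requirements at runtime are simply reported as having none.

```python
import ast
import re


def extract_install_requires(setup_source):
    """Return the literal install_requires list found in a setup.py source string."""
    tree = ast.parse(setup_source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        # Match both `setup(...)` and `setuptools.setup(...)`.
        func = node.func
        name = getattr(func, "id", None) or getattr(func, "attr", None)
        if name != "setup":
            continue
        for kw in node.keywords:
            if kw.arg == "install_requires":
                try:
                    return list(ast.literal_eval(kw.value))
                except ValueError:
                    # Requirements are computed at runtime; not readable statically.
                    return []
    return []


def requirement_name(requirement):
    """Strip version specifiers, e.g. 'numpy>=1.10' -> 'numpy' (simplified)."""
    match = re.match(r"[A-Za-z0-9._-]+", requirement)
    return match.group(0) if match else requirement
```

Parsing the file instead of importing it avoids running arbitrary code from every crawled repository, at the cost of missing dependencies that are built dynamically.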
Alternatively, once we have a list of top-level packages, we can start crawling package management hierarchies:
- Debian/Ubuntu/etc.
- PyPI
- MathWorks File Exchange?
- What about Mac users: Anaconda? Homebrew? MacPorts?
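
Whichever package manager is queried, the crawl itself could be a plain breadth-first traversal. The sketch below is one way to do it under stated assumptions: get_deps is a hypothetical callable that returns the direct dependencies of a package (in practice it would wrap a query to Debian, PyPI, etc.), and TOY_INDEX is made-up data mirroring the sklearn/numpy/BLAS example from the description, not real package metadata.

```python
from collections import deque


def crawl_dependencies(roots, get_deps):
    """Breadth-first crawl; returns {package: sorted list of direct dependencies}."""
    graph = {}
    queue = deque(roots)
    while queue:
        pkg = queue.popleft()
        if pkg in graph:
            continue  # already visited; this also guards against cycles
        deps = sorted(set(get_deps(pkg)))
        graph[pkg] = deps
        queue.extend(d for d in deps if d not in graph)
    return graph


# Toy index standing in for a real package manager query (illustrative only).
TOY_INDEX = {"sklearn": ["numpy"], "numpy": ["blas"], "blas": ["libc"]}
graph = crawl_dependencies(["sklearn"], lambda p: TOY_INDEX.get(p, []))
```

Keeping the lookup behind a single callable means the same traversal could be reused for each package hierarchy by swapping in a different get_deps.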
Once we have a full tree, we'll have to prune it back to some reasonable level. It might be useful to include something like boost, but libc would obviously be a step too far. Where do we draw the line? Can this be automated?
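
One possible automated heuristic, offered only as a starting point, is to drop anything that a large share of the packages depend on directly, on the theory that near-universal dependencies like libc carry little citation signal. The function name prune_ubiquitous, the 0.5 threshold, and the package names in the example are all assumptions made for illustration; where the threshold should sit is exactly the open question above.

```python
def prune_ubiquitous(graph, max_fraction=0.5):
    """Drop packages that more than max_fraction of all packages depend on directly."""
    total = len(graph)
    dependents = {}
    for pkg, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)
    # Packages used by too large a share of the graph are treated as "too basic".
    too_common = {
        dep for dep, users in dependents.items() if len(users) / total > max_fraction
    }
    return {
        pkg: [d for d in deps if d not in too_common]
        for pkg, deps in graph.items()
        if pkg not in too_common
    }


# Hypothetical graph: every package needs libc, only some need boost.
example = {
    "pkg_a": ["boost", "libc"],
    "pkg_b": ["libc"],
    "pkg_c": ["boost", "libc"],
    "pkg_d": ["libc"],
    "boost": ["libc"],
    "libc": [],
}
pruned = prune_ubiquitous(example)
```

In this toy graph libc is removed because five of the six packages depend on it, while boost survives because only two do. A depth cutoff from the top-level packages would be an alternative criterion.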