This repository was archived by the owner on May 29, 2018. It is now read-only.

Building the dependency graph #3

@bmcfee

Most research software does not actually get cited directly. For example, a paper might cite sklearn but not numpy, or numpy but not BLAS, etc. Consequently, most research software is only cited implicitly.

To fill in the implied citation network, we can extract software dependencies from known repositories. This can take a few forms:

  • Python packages that use setuptools define their dependencies explicitly, and these are stored in a well-structured object that's easy to parse.
  • What about R?
  • What about MATLAB?
  • What about C/C++?
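For the setuptools case, a minimal sketch of what "easy to parse" could look like: read a `setup.py` as text and pull out the `install_requires` list without executing the file. The function name `install_requires` and the name-splitting regex are illustrative assumptions, and this only handles lists written as literals; anything computed at runtime is returned as empty.

```python
import ast
import re


def install_requires(setup_source):
    """Return package names from a literal install_requires=[...] in setup.py.

    Hypothetical helper: it inspects the source statically and never runs it.
    """
    tree = ast.parse(setup_source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        # Match both setup(...) and setuptools.setup(...)
        func = node.func
        name = getattr(func, "id", None) or getattr(func, "attr", None)
        if name != "setup":
            continue
        for kw in node.keywords:
            if kw.arg != "install_requires":
                continue
            try:
                specs = ast.literal_eval(kw.value)
            except ValueError:
                return []  # built at runtime; cannot be read statically
            # Keep only the project name: drop version pins, extras, markers.
            return [
                re.split(r"[\s<>=!~;\[(]", spec.strip(), maxsplit=1)[0]
                for spec in specs
            ]
    return []
```

Calling it on the text of a `setup.py` that declares `install_requires=["numpy>=1.8", "scipy"]` would give `["numpy", "scipy"]`, which are exactly the implicit citations we are after.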

Alternatively, once we have a list of top-level packages, we can start crawling package management hierarchies:

  • Debian/Ubuntu/etc.
  • PyPI
  • MathWorks File Exchange?
  • What about Mac users: Anaconda? Homebrew? MacPorts?
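The crawl itself could be a plain breadth-first walk that is agnostic about which package manager answers the "what does X depend on?" question. In the sketch below, `crawl` and `pypi_requires` are hypothetical names. `pypi_requires` assumes PyPI's JSON endpoint (`https://pypi.org/pypi/<name>/json`) and its `requires_dist` field; a Debian or Homebrew backend would need its own lookup function with the same shape.

```python
import json
import re
from collections import deque
from urllib.request import urlopen


def pypi_requires(name):
    """Direct dependencies of a PyPI project, read from its JSON metadata."""
    with urlopen(f"https://pypi.org/pypi/{name}/json") as resp:
        info = json.load(resp)["info"]
    names = set()
    for spec in info.get("requires_dist") or []:
        if "extra ==" in spec:
            continue  # assumption: optional extras are not real dependencies
        names.add(re.split(r"[\s<>=!~;\[(]", spec, maxsplit=1)[0].lower())
    return sorted(names)


def crawl(roots, lookup):
    """Breadth-first crawl; returns {package: [direct dependencies]}."""
    graph = {}
    queue = deque(roots)
    while queue:
        pkg = queue.popleft()
        if pkg in graph:
            continue  # already visited; also guards against cycles
        graph[pkg] = list(lookup(pkg))
        queue.extend(graph[pkg])
    return graph
```

Usage would be something like `crawl(["scikit-learn"], pypi_requires)`. Keeping the lookup as a parameter means the same walk can be tested offline with a dictionary and reused for other package managers.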

Once we have a full tree, we'll have to prune it back to some reasonable level. It might be useful to include something like boost, but libc would obviously be a step too far. Where do we draw the line? Can this be automated?
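One possible way to automate the cut, offered as a starting point rather than an answer: keep only packages within a fixed number of hops from the top-level packages, and drop anything on a hand-maintained stoplist. The function name `prune`, the default depth of 2, and the `libc` stoplist are all assumptions for illustration.

```python
from collections import deque


def prune(graph, roots, max_depth=2, exclude=frozenset({"libc"})):
    """Keep packages within max_depth hops of a root, minus a stoplist.

    graph maps each package to its direct dependencies.
    """
    depth = {}
    queue = deque((root, 0) for root in roots)
    while queue:
        pkg, d = queue.popleft()
        if pkg in depth or pkg in exclude or d > max_depth:
            continue
        depth[pkg] = d  # breadth-first, so this is the shortest distance
        queue.extend((dep, d + 1) for dep in graph.get(pkg, []))
    # Drop edges that point at packages we decided not to keep.
    return {
        pkg: [dep for dep in graph.get(pkg, []) if dep in depth]
        for pkg in depth
    }
```

A fixed depth is crude: whether BLAS makes the cut depends on how far it sits from the cited package, not on how much it matters. It does give a single knob to tune while we figure out a better criterion.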
