Software Heritage: building a community to safeguard the Software Commons

Speaker: Nicolas Dandrimont

Track: Community, diversity, local outreach and social context

Type: Long talk (45 minutes)

Room: Anamudi

Time: Sep 10 (Sun): 11:30

Duration: 0:45

Since its inception in 2015, the Software Heritage project has been regularly archiving over 200 million Free Software projects. To date, this amounts to more than 15 billion unique source code files and more than 3 billion individual commits. The archive supports many version control systems (git, Mercurial, Breezy/Bazaar, Subversion, CVS), hosting platforms, and package managers (Debian, RPM, PyPI, NPM, Rubygems, CTAN, opam, Composer, Bower, and many more), and all of this tooling is developed as Free Software for everyone to see and contribute to.

To sustain its growth, Software Heritage has had to evolve a lot, both in its technical stack and in its organization. We introduced a system of grants that allows third-party contributors to be paid to implement specific features in the archive, which has greatly improved our coverage and user-accessible features. We’re also in the final steps of building an initial network of mirrors of the archive, hosted by third-party organizations, which will increase the resilience of the project.

During this talk, we will showcase the key features that the Software Heritage project offers to the community to help it safeguard the Software Commons: the archive itself, of course, but also Save Code Now, Add Forge Now, the Vault, and the Deposit system, as well as dataset exports and a compressed graph representation, both of which allow the software mining and cybersecurity communities to do large-scale analysis over the whole Software Commons.

We will also lift the veil and show how the technical underpinnings of Software Heritage have evolved (or not) since we last presented it at DebConf: how we efficiently store tens of billions of small files, how we store a graph with hundreds of billions of edges so that it can be updated efficiently, and how we’re putting together working mirrors of all of this.

URLs