GSOC 2022
This page contains information for students and anyone else interested in participating and helping with the program.
AboutCode is a family of FOSS projects to uncover data ... about software code:
- where does the code come from? which software package?
- what is its license? copyright?
- is the code vulnerable, maintained, well coded?
All these are important questions to answer: there are millions of free and open source software components available on the web for reuse.
Knowing where a software package comes from, what its license is and whether it is vulnerable should be a problem of the past, so that everyone can safely consume more free and open source software.
Join us to make it so!
Our tools are used to help detect and report the origin and license of source code, packages and binaries as well as discover software and package dependencies, and in the future track security vulnerabilities, bugs and other important software package attributes. This is a suite of command line tools, web-based and API servers and desktop applications.
- ScanCode Toolkit is a popular command line tool to scan code for licenses, copyrights and packages, used by many organizations and FOSS projects, small and large.
- ScanCode.io is a web-based application and API to run and review scans in rich scripted ScanPipe pipelines.
- ScanCode Workbench is a JavaScript, Electron-based desktop application to review scan results and document your origin and license conclusions.
- AboutCode Toolkit is a command line tool to document and inventory known packages and licenses and generate attribution docs, typically using the results of analyzed and reviewed scans.
- TraceCode Toolkit is a command line tool to find which source code file is used to create a compiled binary by tracing and graphing a build.
- DeltaCode is a command line tool to compare scans and determine if and where there are material differences that affect licensing.
- container-inspector is a command line tool to analyze the code in Docker and container images, and a low-level library to handle this.
- VulnerableCode is a web-based API and database to collect and track known software package vulnerabilities.
- license-expression is a library to parse, analyze, simplify and render boolean license expressions (such as SPDX).
We have also co-founded and contributed to important projects for other organizations:
- Package URL, an emerging standard to reference software packages of all types with simple, readable and concise URLs.
- SPDX, aka Software Package Data Exchange, a spec to document the origin and licensing of packages.
- ClearlyDefined, to review and help FOSS projects improve their licensing and documentation clarity.
Join the chat online or by IRC, introduce yourself and start the discussion: https://gitter.im/aboutcode-org/discuss
For personal issues, you can contact the primary org admin directly: @pombredanne and [email protected]
Please ask questions the smart way: http://www.catb.org/~esr/faqs/smart-questions.html
Discovering the origin of code is a vast topic. We primarily use Python, with some C/C++, Rust and Go for performance-sensitive code. We use Electron and JavaScript for our ScanCode Workbench.
Our domain includes text analysis and processing (for instance for copyright and license detection), parsing (for package manifest formats), binary analysis (to detect the origin and license of binaries, primarily based on the corresponding source code), Web-based tools and APIs (to expose the tools and libraries as Web Services) and low-level data structures for efficient matching (such as Aho-Corasick and other automata).
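To give a feel for the kind of matching data structure mentioned above, here is a minimal, illustrative sketch of an Aho-Corasick automaton in pure Python. It is not the implementation used in ScanCode: the function names `build_automaton` and `find_all` are hypothetical, and it matches on characters for simplicity, which is an assumption made only for this example.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links for a list of pattern strings."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # fail[node] -> node for the longest proper suffix in the trie
    out = [[]]    # out[node] -> patterns that end at this node

    # 1. Insert every pattern into the trie.
    for pattern in patterns:
        node = 0
        for ch in pattern:
            child = goto[node].get(ch)
            if child is None:
                child = len(goto)
                goto[node][ch] = child
                goto.append({})
                fail.append(0)
                out.append([])
            node = child
        out[node].append(pattern)

    # 2. Compute failure links breadth-first, so that shallower nodes
    #    are complete before the deeper nodes that depend on them.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit the matches reachable through the failure link.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every match, in a single pass."""
    goto, fail, out = automaton
    node = 0
    matches = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pattern in out[node]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches


automaton = build_automaton(["MIT", "GPL", "LGPL"])
print(find_all("Licensed under LGPL or MIT", automaton))
# prints [(15, 'LGPL'), (16, 'GPL'), (23, 'MIT')]
```

The point of the structure is that all patterns are searched in one pass over the text, instead of scanning the text once per pattern.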
Incoming students will need the following skills:
- Intermediate to strong Python programming. For some projects, strong C/C++ and/or Rust may be needed.
- Familiarity with git as a version control system
- Ability to set up your own development environment
- An interest in FOSS licensing and software composition analysis.
We are happy to help you get up to speed, but the more you are able to demonstrate ability and skills in advance, the more likely we are to choose your application!
We expect your application to be in the range of 1000 words. Anything less than that will probably not contain enough information for us to determine whether you are the right person for the job. Your proposal should contain at least the following information, plus anything you think is relevant:
- Your name
- Title of your proposal
- Abstract of your proposal
- Detailed description of your idea, including an explanation of why it is innovative and what it will contribute to the project
- Hint: explain your data structures and your planned main processing flows in detail.
- Description of previous work and existing solutions (links to prototypes and bibliography are more than welcome)
- Details of your academic studies, any previous work and internships
- Relevant skills that will help you achieve the goal (programming languages, frameworks)
- Any previous open-source projects (or even previous GSoC) you have contributed to, with links
- Do you plan to have any other commitments during GSoC that may affect your work? Any vacations/holidays? Will you be available full time to work on your project? (Hint: do not bother applying if this is not a serious full time commitment during the GSoC time frame)
Join the chat online or by IRC, introduce yourself and start the discussion: https://gitter.im/aboutcode-org/discuss
The best way to demonstrate your capability is to submit a small patch for an existing or new issue ahead of the project selection.
We will always prefer a project submission where you have submitted a patch over any other submission without a patch.
You can pick any project idea from the list below. If you have other ideas that are not in this list, contact the team first to make sure it makes sense.
Here is a list of candidate project ideas for your consideration. Your own ideas are welcomed too! Please chat about them to increase your chances of success!
- Improve Copyright detection accuracy and speed in ScanCode
- Improve package-specific license detection quality. These are similar projects focused on different package types. The long term goal is to work closely with each package ecosystem to contribute the improved license data upstream.
  - Improve NPM package license detection results
  - Improve PyPI package license detection results
  - Improve Maven package license detection results
  - Improve Ruby package license detection results
  - Improve Go package license detection results
  - Improve Debian package license detection results
  - Improve Perl package license detection results
  - Improve RPM package license detection results
- Improve file classification in ScanCode
- Create docs automatically from scancode data
- Prototype a license detection view for scancode.io
- Integration of alternative code analysis tools and scanners
- Create web application for massive scanning campaign of a whole package ecosystem
See https://github.com/nexB/vulnerablecode/wiki/Project-Ideas for a detailed list of projects.