MeetingMinutes
We meet online on Mondays at 16:00 UTC as a reference. See https://www.timeanddate.com/worldclock/meeting.html to get the time in your timezone.
Join us at https://meet.jit.si/AboutCode
Here are the running meeting notes:
Participants:
- Tushar @tg1999
- Philippe @pombredanne
- Keshav @keshavspace
- Ayan @AyanSinhaMahapatra
- swastik sharma @swastkk
- Jay @35C4n0r
- Shrey Parekh
- Shrijal Acharya
Agenda:
- GSoC
- corrupted advisories
- yaml output
- scancode toolkit reference scans
- packaging and operating system support
- cyclonedx input in scancode.io
Discussion:
- Need to review swastik's PR: https://github.com/nexB/python-inspector/pull/119
- Should we use both cyclonedx libraries, the one from cyclonedx-python and the hoppr library? (Keshav) Short term: work with the hoppr ( https://gitlab.com/hoppr/hoppr-cyclonedx-models/ ) and cyclonedx-python ( https://github.com/CycloneDX/cyclonedx-python ) projects to merge features. We don't use XML and don't care about old versions. The hoppr library covers the last 2 cyclonedx versions, and it uses the JSON schema to create the models. We can start using hoppr/hoppr-cyclonedx-models in scancode.io and then maybe later we can use it in scancode-toolkit too.
- JSON to XML conversion for cyclonedx -> library exists which works as a single executable in linux/windows/mac.
- Advisories which were imported by previous importers are not compatible with the current models. We can delete everything from an importer when we are reimporting from the same importer. There's a problem of stale and outdated data, and there's also a problem of not discarding data that is used elsewhere. We can consider archiving for this, or consider adding a deprecated flag.
- More people are running non-Intel architectures, which don't work. The key thing would be a single executable, like Jono's work on a scancode.io appimage. We should also have app archives for all python versions, which is python 3.7-3.11, on linux/mac/windows. No arm for now, but it would be nice. Another thing would be https://github.com/nexB/scancode-toolkit/issues/3205 If we are using other libraries, we have to write wrappers on them to match the same API. Serializing is another problem. Pyahocorasick is going to be the hardest, as this is a trie structure and saving/loading from disk is not simple.
- https://github.com/nexB/aboutcode/wiki/GSOC-2023 GSoC project ideas were discussed, and we need to further edit this and make all the projects have a clear goal and some detailed instructions to explain them better. Ideas related to vulnerablecode will be discussed in the vulnerablecode call tomorrow, see https://github.com/nexB/vulnerablecode/wiki/WeeklyMeetings
- We uncovered that the scancode yaml output does not produce valid yaml in certain cases where there are license references and/or matched text in the yaml output and the license text has whitespaces/blank lines. For example, this happens in the case of the apache-2.0 license text. The solution can't be just to remove whitespaces as they are important; the check has to be done in saneyaml and we have to produce valid yaml there.
- scancode-toolkit-reference-scan scripts are not working because of the dependency issues present while pip installing older versions, and maybe we should be using git checkout instead of pip install here.
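The "deprecated flag" option from the stale advisories item above could look roughly like the following Python sketch. This is only an illustration: the `Advisory` record, its `deprecated` field, the `reimport` helper, and the importer names and advisory ids are all hypothetical and are not the vulnerablecode models.

```python
from dataclasses import dataclass


@dataclass
class Advisory:
    # Hypothetical, simplified record; not the actual vulnerablecode model.
    importer: str
    advisory_id: str
    deprecated: bool = False


def reimport(stored, importer, fresh_ids):
    """
    Re-import advisories for ``importer``: advisories from that importer that
    are missing from ``fresh_ids`` are flagged as deprecated instead of being
    deleted, and unseen ids are added. Other importers are left untouched.
    """
    fresh_ids = set(fresh_ids)
    known = set()
    for advisory in stored:
        if advisory.importer != importer:
            continue
        known.add(advisory.advisory_id)
        # Stale data is kept so that anything still referencing it keeps working.
        advisory.deprecated = advisory.advisory_id not in fresh_ids
    for new_id in sorted(fresh_ids - known):
        stored.append(Advisory(importer, new_id))
    return stored


# Made-up importer names and advisory ids, for illustration only.
stored = [
    Advisory("importer_a", "ADV-1"),
    Advisory("importer_a", "ADV-2"),
    Advisory("importer_b", "ADV-9"),
]
reimport(stored, "importer_a", ["ADV-2", "ADV-3"])
```

With a flag like this, stale advisories stay queryable for other data that references them, while consumers can filter them out by default.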
Participants:
- Tushar @tg1999
- Jay @35C4n0r
- Philippe @pombredanne
- swastik sharma @swastkk
- Keshav @keshavspace
- Ayan @AyanSinhaMahapatra
- Jono @jyang
- Akhil @lf32
- Heet Dhorajiya
Agenda:
- scancode.io appimage
- dependency issues
- scancode-toolkit release
- GSoC project ideas
- skeleton
Discussion:
- https://github.com/nexB/scancode.io/tree/scancode.io-appimage/etc/scripts/appimage-build
- https://github.com/nexB/skeleton#usage
- Tushar Goel says: assert req is None or isinstance(req, Requirement), req
- https://github.com/nexB/python-inspector/pull/115
- https://github.com/nexB/packvers/issues/2
- https://www.tdcommons.org/dpubs_series/5632/
Participants:
- Tushar @tg1999
- Hritik @Hritik14
- Jay @35C4n0r
- Philippe @pombredanne
- swastik sharma @swastkk
- Keshav @keshavspace
Agenda:
- Hritik - nothing
- Swastik Sharma - SCIO: Issue on SCIO problem with installing with LegacyVersion and SPDX. These are due to https://github.com/pypa/packaging/issues/530 solved with https://github.com/nexB/packvers/ and the SPDX tools updates https://github.com/nexB/scancode-toolkit/pull/3173
- Keshav - VCIO: discuss https://hex.pm/ and Elixir advisory
- Philippe - SCIO/SCTK: SPDX library issues - Get ready for planning next week
- Tushar: - VCIO: About a day away to get all importers migrated for VC - VCIO: made release for VC 31 - VCIO: Will need hex in GH importer alright
- 35C/Ajay - FetchCode: made 2 PRs in FetchCode - question wrt. https://github.com/nexB/scancode-toolkit/issues/3138 A: there are some likely updates in https://github.com/nexB/scancode-toolkit/pull/3150
- Question: what are scancode toolkit plugins?
Participants:
- Tushar @tg1999
- Chirag Bablani
- swastik sharma
- Omkar
- Hritik
Agenda:
- Packaging issues discovered in scancode.io
Discussion:
- Swastik brought up the breaking of the packaging library in scancode.io https://github.com/nexB/scancode.io/issues/576
Participants:
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- akhil @lf32
- ajay @35C4n0r
- Philippe @pombredanne
- swastik sharma
Agenda:
- dark mode in scancode.io
- scantext PR
- nuget inspector
- scancode-toolkit release
Discussion:
- work on scancode license detection follow up PR is almost complete except a few minor improvements, we should be able to do the final review tomorrow, and try to get a beta release out this week.
- more checks in scantext? also tests are not completed. It can be done in the open PR too if these are critical, otherwise can be done in subsequent PRs too.
- dark mode in scancode.io: https://blog.openreplay.com/implementing-dark-mode-with-bulma/ this will also be in vulnerablecode. We can also think about making a common UI repo, as this is reused in scancode.io, vulnerablecode and now also in purlDb.
- On https://github.com/nexB/fetchcode/issues/64: it would be better to introduce some parameters here to do this rather than removing the code.
- minimal first version out of nuget-inspector, we got some issues reported and we have something which works better now, added support for metadata and target frameworks. PR: nexB/nuget-inspector#9
Participants:
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- Jono Yang @jyang
- Philippe @pombredanne
Agenda:
- license sync script bugs
- scancode release
Participants:
- Tushar @tg1999
- C35
- Chirag
- Philippe @pombredanne
Agenda:
- Chirag is looking for good first issues to work on - Tushar and Philippe pointed out that we may have a few bite-sized good first issues possibly with https://github.com/nexB/vulnerablecode/issues/597 - Such as project_kb_msr2019. You should reach out online for extra details.
- Philippe working on paper and design for federated data collection and sharing
- public release of matchcode: Jono and Philippe are working on it. Likely to be done in the purldb repo
- Tushar and Philippe discussed VCIO where we can have conflicting advisory ranges. We may need to keep track of which advisory reports which range
- Philippe went Friday to an event in Brussels https://swforum.eu/events/open-source-workshops-computing-sustainability and met with multiple users and possible backers of our projects.
Participants:
- Tushar @tg1999
- Philippe @pombredanne
- akhil @lf32
Agenda:
- NPM purls parsing - https://github.com/package-url/packageurl-python/pull/106
- Various Funding Proposals for AboutCode projects
- Fixing bugs in SCTK - https://github.com/nexB/scancode-toolkit/issues/3160
- Test latest and greatest version of the dependencies to ensure the dependencies do not fail at run time.
- Releasing a new gem parser to fix https://github.com/nexB/scancode-toolkit/issues/3160
Participants:
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- Philippe @pombredanne
Agenda:
- New license detection: A beta release of version 32. It will be a major change that will need some changes in scancode.io
- New license detection
- Top level detection is implemented
- A lot of license detection process that depends on resource level
- A model object can be de-serialized from JSON.
Participants:
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- omkar @OmkarPh
- Philippe @pombredanne
- @keshavspace
- Jono Yang @jyang
Agenda:
- scancode license PRs, other issues before next release
- nuget inspector dependency resolution
- executable for scancode.io/toolkit
- shared model: experimenting
Participants:
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- Jono Yang @jyang
- Philippe @pombredanne
- @keshavspace
- omkar @OmkarPh
Agenda:
- new repos and releases
- model sync between projects
Discussion:
- new pipeline in scancode.io to check for vulnerabilities
- nexB/purldb with minecode and packagedb
- visitors: fetching data from package indexes
- mappers: transforms data into a scancode package model
- checks if purls actually exist
Participants:
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- Jono Yang @jyang
- Philippe @pombredanne
- keshav @keshavspace
- omkar @OmkarPh
Agenda:
- clarity scoring
- scancode versions/clearlydefined
- unknown references to packages
- matching/repo of scans
- workbench update
Participants:
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- Jono Yang @jyang
- Philippe @pombredanne
- @keshavspace
- omkar @OmkarPh
Agenda:
- vulnerablecode release
- scancode dot release
Discussion:
- PRs to merge for a scancode dot release: 31.2.0 See https://github.com/nexB/scancode-toolkit/milestone/16 All PRs here are ready to merge.
- Vulnerablecode v30.0.0 is released: See https://github.com/nexB/vulnerablecode/releases/tag/v30.0.0 for details.
- Omkar: need Philippe's review on https://github.com/nexB/scancode-workbench/pull/532 to merge.
Participants:
- omkar @OmkarPh
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- ziad @ziadhany
- akhil @lf32
- @keshavspace
- Thomas @tdruez
- Philippe @pombredanne
- Steven @majurg
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: Dependencies-packages page grouped by types; UI updates on Packages view
- ziad's update: git importer
- lf32's update: licensetext UI update
- keshav's update: vulntotal benchmarking
Discussion:
- scantext UI update: We should have a link between the license-expressions on the left and the text and highlighting on the right, as otherwise it can be difficult to connect. One way could be applying a background color on the left same as the text background highlight on the right.
- workbench package view: we should have the heading for dependencies without packages indicate this clearly, and not just be 'other packages'. Also show dependencies and packages by their pURL directly and use the library for parsing and recreating the strings.
- We need to discuss and create a Version Range class for the github importer in vers after looking at the version range examples in more detail.
Participants:
- omkar @OmkarPh
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- ziad @ziadhany
- akhil @lf32
- @keshavspace
- Thomas @tdruez
- Philippe @pombredanne
- Steven @majurg
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: Created packages > dependencies page (Top level packages overview)
- ziad's update: Add support for rust ranges
- lf32's update: improve layout for license details
- keshav's update: Streamline VulnTotal CLI; support JSON and YAML output; add support for grouping vulnerabilities by CVE
Discussion:
- Rust deps requirements doc: https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html
- Demo by keshav of vulntotal CLI: by giving a pURL by CLI, the tool shows the affected and fixed packages and this is grouped by CVE.
- Demo by omkar of workbench prototype: showing top level packages and dependencies, packages and their dependencies nested in the left, json data shown in the right. (Philippe: could be yaml, and show packages by their pURL in the left)
- We need to also return URL links to RULEs in scancode itself as it is problematic to build these URLs elsewhere.
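As background for the Rust ranges work and the Cargo requirements doc linked above: Cargo's default (caret) requirement allows updates that keep the left-most non-zero version component unchanged. A minimal Python sketch of turning a caret requirement into lower/upper bounds is below. The `caret_bounds` name is hypothetical and is not a univers API, and pre-release and build metadata are deliberately not handled.

```python
def caret_bounds(requirement):
    """
    Return (lower, upper) version tuples for a Cargo caret requirement such
    as "^1.2.3" or "0.2", meaning lower <= version < upper.
    Simplified sketch: pre-release and build metadata are not supported.
    """
    text = requirement.strip().lstrip("^")
    given = [int(part) for part in text.split(".")]
    if not 1 <= len(given) <= 3:
        raise ValueError(f"Unsupported requirement: {requirement!r}")

    # Missing components default to zero for the lower bound.
    lower = given + [0] * (3 - len(given))

    # Bump the left-most non-zero component that was given, or the last
    # given component when all of them are zero.
    bump = next((i for i, value in enumerate(given) if value != 0), len(given) - 1)
    upper = lower[:bump] + [lower[bump] + 1] + [0] * (2 - bump)
    return tuple(lower), tuple(upper)


print(caret_bounds("^1.2.3"))
print(caret_bounds("^0.2.3"))
print(caret_bounds("^0.0.3"))
```

A real VersionRange would also need the tilde, wildcard and comparison forms described in the same Cargo document.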
Participants:
- Jono @jyang
- omkar @OmkarPh
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- ziad @ziadhany
- akhil @lf32
- @keshavspace
- @tdruez
- Philippe @pombredanne
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: Created automatic release using github actions & dropzone for files
- lf32's scancode.io update: Highlight license matches
- ziad's update: migrate rust importer
Discussion:
- Rust version ranges are not present in univers. They are semver-like, but semver is for versions, so we have to create a Cargo VersionRange for this.
- Can we create a generic version range for most of the semver cases? Not sure whether we can have a generic
- Download release archives of the new workbench prototype here: https://github.com/OmkarPh/scancode-workbench/releases/tag/v4.0.0betaPowershell2 and give it a try. Please also report issues and give feedback.
- On the scancode.io license text detection project, we can now highlight matches. A few points for improvement: 1. Also support overlapping matches 2. Separate template code from views.py 3. Fix the details page such that it is for one match 4. Make the highlighting continuous, i.e. the stopwords/punctuation/symbols between matched words should be highlighted (but not other unmatched words)
- Discussion on the new LicenseDetection format next week Monday.
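The continuous highlighting point above (fillers between matched words are highlighted, other unmatched words are not) can be sketched in Python as follows. This is a simplified illustration and not the scancode.io implementation: the tokenizer, the `highlight_flags` name and the word-set input are assumptions, and a real version would work from match positions rather than from a set of words.

```python
import re

# Hypothetical tokenizer: a token is either a run of word characters
# or a run of anything else (whitespace, punctuation, symbols).
TOKEN = re.compile(r"\w+|\W+")
WORD = re.compile(r"\w+")


def highlight_flags(text, matched_words, stopwords=frozenset()):
    """
    Return a list of (token, highlighted) pairs. Matched words are
    highlighted. Fillers (stopwords, punctuation, whitespace) are highlighted
    only when the nearest non-filler tokens on both sides are matched words.
    """
    tokens = TOKEN.findall(text)
    kinds = []
    for token in tokens:
        is_word = WORD.fullmatch(token) is not None
        if is_word and token.lower() in matched_words:
            kinds.append("matched")
        elif not is_word or token.lower() in stopwords:
            kinds.append("filler")
        else:
            kinds.append("unmatched")

    # Kind of the nearest non-filler token to the left of each position.
    left, last = [], None
    for kind in kinds:
        left.append(last)
        if kind != "filler":
            last = kind

    # Kind of the nearest non-filler token to the right of each position.
    right, last = [None] * len(kinds), None
    for i in range(len(kinds) - 1, -1, -1):
        right[i] = last
        if kinds[i] != "filler":
            last = kinds[i]

    return [
        (token, kind == "matched"
            or (kind == "filler" and left[i] == "matched" and right[i] == "matched"))
        for i, (token, kind) in enumerate(zip(tokens, kinds))
    ]


flags = highlight_flags(
    "Licensed under the MIT License, see file",
    matched_words={"licensed", "under", "mit", "license"},
    stopwords={"the"},
)
highlighted = "".join(token for token, on in flags if on)
```

Here the stopword "the" and the spaces inside the match are highlighted, while the trailing ", see file" is not.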
Participants:
- Jono @jyang
- omkar @OmkarPh
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- ziad @ziadhany
- akhil @lf32
- Thomas @tdruez
- Avishrant @AvishrantSh
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: Completed menu actions, added release scripts
- keshav's update: added snyk.io DataSource and tests for the same
- lf32's scancode.io update: created charts for licenses; worked on highlighting matches together in a single text (WIP); worked on details page (WIP)
Participants:
- Jono @jyang
- omkar @OmkarPh
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- ziad @ziadhany
- akhil @lf32
- Thomas @tdruez
- Avishrant @AvishrantSh
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: landing page, buttons above functional, shows header information in another page. Looking into newer formats.
- ziad's update: Refactor Gitimporter using fetchcode
- keshav's update: add VulnerableCodeDataSource; add OSS-Index DataSource
- lf32's scancode.io update: worked on highlighting matches together in a single text (WIP)
Discussion:
- workbench progress reviewed. Landing page UI changes made; buttons above the table view for file/copyright/license/package filters are functional now.
- have started supporting newer output format versions in workbench. Header information is now shown in a new tab. The recent 31.x.x releases have more significant data structure changes that we have to support.
- For scantext in scancode.io, a details tab with all details would be nice, along with the highlight tab that is already present and merges the different highlighted matches.
- Have to revamp tabular outputs by deprecating the current csv output and introducing separate csv outputs for files/packages/dependencies. Also introduce xlsx output. See #3043 for more info.
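To illustrate the idea of separate csv outputs per section, here is a minimal Python sketch using only the standard library. It assumes a scan loaded as a dict with top-level `files`, `packages` and `dependencies` lists of already-flat rows; the `write_csv` and `tabular_outputs` names and the sample field values are hypothetical, and real scan data has nested values that would need flattening first.

```python
import csv
import io


def write_csv(rows):
    """
    Serialize a list of flat dicts to CSV text. Columns are the union of all
    keys in first-seen order; missing values are written as empty cells.
    """
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def tabular_outputs(scan):
    """Return one CSV text per top-level section instead of one mixed CSV."""
    sections = ("files", "packages", "dependencies")
    return {name: write_csv(scan.get(name, [])) for name in sections}


# Made-up, already flattened sample data.
scan = {
    "files": [{"path": "a.py", "type": "file"}],
    "packages": [{"purl": "pkg:pypi/foo@1.0"}],
    "dependencies": [],
}
outputs = tabular_outputs(scan)
```

Keeping one table per section avoids the mostly empty columns that result from mixing file, package and dependency rows in a single csv.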
Participants:
- Jono @jyang
- omkar @OmkarPh
- Tushar @tg1999
- Kevin @KevinJi22
- Ayan @AyanSinhaMahapatra
- ziad @ziadhany
- akhil @lf32
- Thomas @tdruez
- steven @majurg
Agenda:
- GSoC updates
- scancode release
GSoC Status:
- omkar's workbench update: Worked on Column filters for table view & SQlite imports (History adjustments)
- kevin's scancode update: modified rule and license validation based on Philippe and Jono's comments
- ziad's update: Refactor Gitimporter using fetchcode
- keshav's update:
- lf32's scancode.io update: fixed nexB/scancode.io#293, failing tests for nexB/scancode.io#450
Discussion:
- trying to keep a clone and update is a premature optimization, better to clone each time (even CIs do this as it can be complicated and problematic otherwise)
- we need to use different colors for each matched text, this could be also different backgrounds. This has to be implemented, have to create a new tokenization func & class that can accommodate multiple matches
- Added license/package and other column buttons. Some issues: wrapping should be word based, some empty lines are present between license keys. Import button could be moved to the left.
- Would be nice if we can build workbench from source for mac/windows/linux. Links for test and builds for workbench: https://github.com/inveniosoftware/intbitset/blob/master/.github/workflows/test-and-build.yml https://github.com/WojciechMula/pyahocorasick/blob/master/.github/workflows/test-and-build.yml https://github.com/nexB/scancode-toolkit/blob/develop/.github/workflows/scancode-release.yml
Participants:
- Jono @jyang
- omkar @OmkarPh
- Tushar @tg1999
- Kevin @KevinJi22
- keshav @keshav-space
- Ayan @AyanSinhaMahapatra
- ziad @ziadhany
- akhil @lf32
- Thomas @tdruez
Agenda:
- GSoC updates
- workbench build
- lf32 issue
- license detection testing
- GSoC evaluation
GSoC Status:
- omkar's workbench update: Filetree customizations, History, Build improvements, Chart view and other minor things
- kevin's scancode update: added validation during index creation time to check new licenses/rules
- ziad's update: npm importer - improver migration
- keshav's update: add GitHub validator add test for GitHub validator enable supported ecosystem listing in CLI
- lf32's scancode.io update: added resource navigation buttons https://github.com/nexB/scancode.io/pull/469 and tried to fix highlighting issues
Discussion:
- Hyperlink upstream package repos in lockfiles, see https://github.com/nexB/scancode.io/issues/403#issuecomment-1194318783 There are limitations in ace editor for hyperlinks and highlighting, but we cannot replace it as it has useful functions. But the license detection app should not use/be constrained by ace editor.
- See examples of building a LicenseIndex from a handful of rules: https://github.com/nexB/scancode-toolkit/blob/develop/tests/licensedcode/test_match.py#L1349
Participants:
- Jono @jyang
- omkar
- Tushar @tg1999
- Kevin @KevinJi22
- Ayan @AyanSinhaMahapatra
- avishrant
- steven @majurg
- ziad
- steven
- lf32
- Thomas @tdruez
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: Completed bar chart section, and fixed sqlite models types
- kevin's scancode update: 1. Added documentation for how to install and use license plugins 2. Expanded feature to include installing and using rules for new licenses 3. Moved licensedcode_test_utils into src and added documentation for how to use it The link to the PR is here: nexB/scancode-toolkit#2979
- ziad's update: Migrate ruby to new importers, add a doctest for fireeye
- lf32's scancode.io update: Working on license view, adding barchart views
Participants:
- Jono @jyang
- omkar
- Tushar @tg1999
- Kevin @KevinJi22
- Ayan @AyanSinhaMahapatra
- ziad
- steven
- lf32
Agenda:
- GSoC updates
- scancode release
GSoC Status:
- omkar's workbench update: Fixed querying issues, worked on path and column selection
- kevin's scancode update: added a CI job that installs licenses and tests license detection
- ziad's update: Add fireeye importer, add GSD test
- keshav's update: off this week for final exams
- lf32's scancode.io update: improved ui for scancode.io license detection view
Discussion:
- licensing issue in https://github.com/nexB/vulnerablecode/issues/792, have to ask dennis. Also ask for the licensing data at https://github.com/mandiant/Vulnerability-Disclosures
- Add license index checks for licenses installed by wheel or folder. We could either keep these checks when reindexing licenses (if it doesn't take more than 20-30 seconds), or else have them as a separate step and recommend running it after adding licenses.
- If we highlight based on start and end line, it will be easier but not as correct. Full lines aren't matched, partial matches are not displayed accurately in this case. We should organize a UI review session.
Participants:
- Jono @jyang
- omkar
- lf32
- Keshav
- Tushar @tg1999
- Kevin @KevinJi22
- thomas
- ziad
- steven
- avishrant
- Ayan @AyanSinhaMahapatra
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: Implemented FileTree & path selection (updates are synced across all components on path change) next: implementation of chart views. Would require feedback session with users.
- kevin's scancode update: implemented functionality to use installed external license plugins in license detection
- ziad's update: opened prs for add GSD importer: https://github.com/nexB/vulnerablecode/pull/787
- keshav's update: opened prs for Deps validator and CLI support: https://github.com/nexB/vulnerablecode/pull/789 and add osv validator: https://github.com/nexB/vulnerablecode/pull/788
- lf32's scancode.io update: Improved Templates for the web app, see https://github.com/nexB/scancode.io/pull/450 for more details and discussions.
Participants:
- Jono @jyang
- omkar
- lf32
- Keshav
- Tushar @tg1999
- Kevin @KevinJi22
- thomas
- ziad
- steven
- avishrant
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: Working on the views and data visualization for the typescript implementation.
- lf32's scancode.io update: start with simple text and highlighting and putting forward a nice UI, don't spend time on supporting binaries just yet. Developed Views and Tweaking around Templates, see https://github.com/nexB/scancode.io/pull/450 for more details and discussions.
- Kevin's update on scancode toolkit external licenses: external licenses are being added successfully from folders. Need to upload to pypi and test that the extra license is being detected after being installed from pypi.
- ziad's update: opened prs for Add support for CWE: https://github.com/nexB/vulnerablecode/pull/782 and add a PyPa importer: https://github.com/nexB/vulnerablecode/pull/780
- keshav's update: opened prs for Deps validator and CLI support: https://github.com/nexB/vulnerablecode/pull/789 and add osv validator: https://github.com/nexB/vulnerablecode/pull/788
Participants:
- Jono @jyang
- omkar
- lf32
- Keshav
- Tushar @tg1999
- Kevin @KevinJi22
Agenda:
- External licenses
- Keshav PR
- workbench project
- Vulnerablecode release
- cyclonedx plugin
GSoC Status:
- kevin is working on https://github.com/nexB/scancode-toolkit/pull/2979 where external licenses are added to the licenseindex, and is now working on getting external licenses installed by wheels https://github.com/nexB/scancode-toolkit/issues/2994 Wondering how to find out which plugins have been installed in the first place; maybe will use the entry_points variable in each package's setup.py, but how do we use that to get all the installed plugins? If plugins are installed, we don't have to specify them by passing options, just use the external licenses.
- omkar worked on updating using all the latest dependencies for the new workbench implementation, has managed to get data for the views, showed demo. Will work on the UI similar to the workbench, table views and other views that are present.
- lf32 has added https://github.com/nexB/scancode.io/pull/450, see discussions and comments for more details, no questions on the call specifically.
- keshav added PR https://github.com/nexB/vulnerablecode/pull/777 to add initial config for vulntotal, to be discussed in detail at https://meet.jit.si/VulnerableCode weekly call.
- ziad having final exams so will start from the following week.
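On kevin's question above about finding which plugins are installed: one possible approach, sketched here in Python, is to list the entry points registered under a given group with the standard library `importlib.metadata`. The group name used below is a made-up placeholder and not necessarily the one scancode-toolkit uses, and `installed_plugins` is a hypothetical helper name.

```python
from importlib import metadata


def installed_plugins(group):
    """
    Return a {name: "module:attr"} mapping of the entry points that installed
    distributions declare under ``group``.
    """
    entry_points = metadata.entry_points()
    if hasattr(entry_points, "select"):
        # Newer Python versions: selectable entry points.
        selected = entry_points.select(group=group)
    else:
        # Older Python versions: a plain dict of group name to entry points.
        selected = entry_points.get(group, [])
    return {entry_point.name: entry_point.value for entry_point in selected}


# Placeholder group name, assumed for illustration only.
plugins = installed_plugins("example_external_license_plugins")
print(plugins)
```

Since any wheel declaring an entry point in the group is discovered this way, no command line option would be needed to enable an installed license plugin.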
Other Discussion:
- vulnerablecode release process started at https://github.com/nexB/vulnerablecode/pull/776 for 30.0.0rc1 - Tushar and Philippe
- Finished upgrading scancode-toolkit in scancode.io - Jono
- The cyclonedx output is failing as the top level attribute packages is required and the cyclonedx output takes all its data from packages. There are two options: 1. we make the cyclonedx plugin dependent on --packages or --system-packages and thus we will see a message if that isn't the case. 2. we just add warnings to the cli and/or to the output file if there are no packages. Option 1 is not feasible as we also do --from-json and then convert to cyclonedx, and there this should only be based on whether the packages top level attribute is present or not. - Jono and Ayan
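Option 2 above (warn instead of fail, based only on whether the top level packages attribute is present) could be sketched in Python as follows. The `cyclonedx_warnings` function and the warning messages are hypothetical and are not the scancode-toolkit plugin code.

```python
def cyclonedx_warnings(scan):
    """
    Return a list of warning messages for a scan loaded as a dict, based only
    on the top-level "packages" attribute. This works the same way for a
    fresh scan and for a scan loaded with --from-json.
    """
    warnings = []
    if "packages" not in scan:
        warnings.append(
            "No top-level 'packages' attribute: the CycloneDX output will "
            "have no components. Run the scan with package detection."
        )
    elif not scan["packages"]:
        warnings.append(
            "Top-level 'packages' is empty: the CycloneDX output will have "
            "no components."
        )
    return warnings


print(cyclonedx_warnings({"files": []}))
print(cyclonedx_warnings({"packages": [{"name": "foo"}]}))
```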
Participants:
- Philippe @pombredanne
- Jono @jyang
- Thomas @tdruez
- omkar
- lf32
- Keshav
- Tushar @tg1999
- Kevin @KevinJi22
- Steven @majurg
Agenda:
- lf32 - Questions about scancode.io and PR
- Philippe - Status on SCTK build issues and resolution, release update.
Discussion:
- The scancode configure script has been updated to run on MacBooks with Apple Silicon. The modifications involved include rerunning scancode using Rosetta via bash when the configure script detects you are running on the native ARM terminal.
- A strange bug with the PyPI package parser has been fixed where the PyPI EndToEnd test would fail seemingly at random. This bug was due to the differences in the walk() function used on Azure's version of Python.
- lf32 had questions about how his project should be displayed/presented on scancode.io. He was wondering if each license text that needed to be scanned should have its own project, as is the current usage in scancode.io. Philippe suggests something simpler, where the license text scanning would be more like using Google Translate, something that just shows us the results and we should not keep it around for long.
Participants:
- Philippe @pombredanne
- Jono @jyang
- Thomas @tdruez
- omkar
- lf32
- Keshav
- catalyn
- Alexander
- Ernest
- Hritik
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
Agenda:
- workbench with omkar
- questions from lf32
- using latest features of scancode in scancode.io
- release of vulnerablecode
- test issue suite vulnerablecode
- kubernetes
Discussion:
- Report attached which is generated from vulnerablecode importers, issue: https://github.com/nexB/vulnerablecode/issues/755 We still don't have bower support, but we query npm instead. It would be interesting to see which CVE/purl mappings we don't have, and what the reason is. It would also be nice to attach a CSV.
- tushar tested and verified all the data sources, should be out this week potentially. proper support of native code in apple M1/intel is pending, scancode.io system package support
- https://github.com/OmkarPh/workbench-prototype has some of the experiments porting to typescript. Need to use skeleton/other files from workbench. All the dependabot PRs should be closed in favour of the one PR to update packages by omkar.
- We don't have a solution yet. We need to recreate the get_installed_packages functions so we can seamlessly get package files in case of system packages.
Participants:
- Philippe @pombredanne
- Jono @jyang
- Thomas @tdruez
- omkar
- Tushar @tg1999
- Hritik
- Keshav
- Ayan @AyanSinhaMahapatra
- lf32
Agenda:
- vulnerablecode release
- pending issues in univers
- system packages scanning porting broken
- bazel pr from Philippe
- talk submit
Discussion:
- GSoC kickoff presentation: https://drive.google.com/file/d/1_KLaAVWVbQeEeGxkSuqwWfy2xeNhv4G5/view?usp=sharing. Will have discussions on the proposals next week.
- https://github.com/bazelbuild/rules_docker/pull/2065 PR by Philippe merged in bazel to add md5sums files list to distroless container
- We need to check out alternative virtual filesystem implementations supporting different storages and options to standardise this across our tools. Some existing implementations: https://github.com/PyFilesystem/pyfilesystem2 and https://github.com/fsspec/filesystem_spec
- Tushar and Philippe to sync on submitting a talk for Open Source Summit (North America 2022), maybe also ask for extension. The deadline is tomorrow.
- Semver issues in univers, see https://github.com/nexB/univers/issues/74 and https://github.com/nexB/univers/pull/69 for more info.
Participants:
- Philippe @pombredanne
- Jono @jyang
- Thomas @tdruez
- omkar
- Tushar @tg1999
- Hritik
- Keshav
- Ayan @AyanSinhaMahapatra
- Alexander
- Ernest
- lf32
Agenda:
- gsoc
- composer versions
- scancode-toolkit release
- scancode.io reuse
- vulnerablecode release
- holder normalisation in summary @jono
- scancode kubernetes packaging
Discussion:
- https://github.com/xerrni/scancode-kube has been created to deploy scancode.io with kubernetes. Please give this a try. Also added https://github.com/nexB/scancode.io/pull/442 to link to this.
- Holders are normalized from a list and because of this original detections can't be referenced correctly. We could remove suffixes when tallying holders here and just use company names. See https://github.com/nexB/scancode-toolkit/issues/2972 for more details.
- Vulnerablecode should have a new release with all the importer-improvers newly ported to the new model; it should be ready by next week. There are some problems because of the redhat API being rate limited. Fixed by https://github.com/nexB/vulnerablecode/pull/757 Per page number can be 1000 instead of 10000 safely here.
- scancode.io docker pipeline is failing because of missing functions in scancode get_installed_files. The codebase/resource model is being reimplemented to be a list of paths, to better facilitate getting a list of files from scancode to scancode.io and assigning files to packages.
Participants:
- Philippe @pombredanne
- Jono @jyang
- omkar
- Tushar @tg1999
- Hritik
- Keshav
- Ayan @AyanSinhaMahapatra
- lf32
Agenda:
- Severity models in vulnerablecode
- PR#69 univers
- New approach commoncode
Discussion:
- severity is tightly related to a reference, but not to the vulnerability. Is there a case where a reference lists more than two vulnerabilities? A reference only exists in the context of a vulnerability; if a reference is reused across vulnerabilities, that doesn't work. It can have the same values but it isn't the same instance. Severity shouldn't be an attribute of a reference; instead, it should exist directly at the vulnerability level.
- Should https://github.com/nexB/univers/pull/69#discussion_r870417205 be put in a different issue? Nginx shouldn't be considered as a base for testing, maybe use npm/nuget/maven/pypi instead. Don't make up test cases for the same btw, real examples are always better.
- Codebase/Resource model has been changed such that codebase now holds a flat dictionary of paths. This enables us to use the new package models in scancode.io by passing the files for a package in a codebase scan. Also making all the related functions for traversing the tree return a list always, and an empty one if it doesn't find anything, essentially not failing. 31.0.0b1 released, feedback needed here.
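The flat dictionary of paths and the "always return a list" behaviour described above can be illustrated with a small Python sketch. This is not the commoncode Codebase API: the `FlatCodebase` class and its method names are hypothetical and only show the idea.

```python
import posixpath


class FlatCodebase:
    """
    Hypothetical, simplified codebase holding resources in a flat dict keyed
    by path, instead of a tree of nested objects.
    """

    def __init__(self, paths):
        self.resources = {}
        for path in paths:
            # Register the path and every ancestor directory.
            while path and path not in self.resources:
                self.resources[path] = {"path": path}
                path = posixpath.dirname(path)

    def children(self, path):
        """Return the direct children of ``path``, or an empty list."""
        return sorted(p for p in self.resources if posixpath.dirname(p) == path)

    def descendants(self, path):
        """Return every path below ``path``, or an empty list."""
        prefix = path.rstrip("/") + "/"
        return sorted(p for p in self.resources if p.startswith(prefix))


codebase = FlatCodebase(["pkg/src/a.py", "pkg/README"])
print(codebase.children("pkg"))
print(codebase.children("missing"))
```

Because lookups for unknown paths return an empty list rather than raising, callers such as a package-to-files assignment step do not need special handling for missing resources.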
Participants:
- Aditya
- Philippe @pombredanne
- Jono @jyang
- omkar
- Tushar @tg1999
- Priya
- Keshav
- Ayan @AyanSinhaMahapatra
Agenda:
- scancode release
- LicenseDetection
- gemversion in univers
- PR#726 in vulnerablecode
- scancode issue
Discussion:
- https://github.com/nexB/vulnerablecode/pull/726 from threatrix.io fixes an infinite loop bug
- https://github.com/nexB/scancode.io/issues/409 We systematically ignore some files/directories (based on distros/type of rootfs); these are not scanned and are tagged as uninteresting from an origin/license/security perspective. scancode.io needs documentation on what is tagged as uninteresting and not scanned.
- https://github.com/nexB/univers/pull/69 was created in univers to patch gemversion, need to revisit constraint validation as there could be duplication.
- scancode has beta releases out in pypi and as github releases. This is automated now and builds and publishes on pushing tags. It is a bit more complicated than the initial implementation in univers done by tushar to push to pypi, as scancode has to be released as archives for the different supported platforms. Will incorporate bugfixes and release a major version soon.
- We should have two file-level buckets, one for LicenseDetections and the other for clues. The LicenseDetection list would be in the licenses list that had LicenseMatch objects before, and there would be another list (possibly license-clues) which would be a list of LicenseMatch objects from which LicenseDetections couldn’t be created. There should also be some rules which would be tagged as clues, which are not detections of a License in the proper sense but could be a reference to a license potentially (like say a link to ghostscript website is a clue, not a detection)
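The two file-level buckets from the last item could look roughly like this. It is only a sketch: the `is_clue` flag, the `license_clues` key, and the simplified classes are assumptions for illustration, and each non-clue match becomes its own detection here, whereas real detection would group related matches:

```python
from dataclasses import dataclass, field


@dataclass
class LicenseMatch:
    license_expression: str
    rule_identifier: str
    is_clue: bool = False  # hypothetical flag for rules tagged as clues


@dataclass
class LicenseDetection:
    license_expression: str
    matches: list = field(default_factory=list)


def bucket_file_matches(matches):
    """Split matches into proper detections and mere clues."""
    licenses, license_clues = [], []
    for match in matches:
        if match.is_clue:
            # Kept as a raw match: no detection could be created from it.
            license_clues.append(match)
        else:
            licenses.append(LicenseDetection(match.license_expression, [match]))
    return {"licenses": licenses, "license_clues": license_clues}
```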
Participants:
- Philippe @pombredanne
- Jono @jyang
- Tushar @tg1999
- hritik
- Keshav
- Ayan @AyanSinhaMahapatra
- steven @majurg
- avishrant
Agenda:
- Scancode released betas and status
- scancode.io sync for the package models
- gsoc mentors
- vulnerablecode PR
- package resources fix assigning to for_packages
- conclusions wrt workbench
- release build automation
- storing version ranges
Discussion:
- conclusions: manually edit scans and conclude results based on research. Was an experiment in workbench. Whether something needs reviews or not, and having concluded/review status. We want to tag things that should be reviewed manually and things that don't need any review. This should be in scancode.io and should be removed from workbench as it's confusing there.
- Vulnerability ID: should we use UUID, VULCOID, or both, and what should be done here? A vulnerability ID is needed; this used to be a UUID, and we should move away from this. We shouldn't worry too much about vulnerablecode instances and should focus on releases instead. We should be able to query by vulnerability ID and aliases.
- Bugs are present in assigning resources to a package in various ecosystems (npm, pypi etc.); we have to report and fix these.
- scancode released, please run a small scan in different kinds of mac/windows systems if available, and report problems if present.
- releases vs tags: release can be automated and require manual intervention, tags are much easier. build and publish to pypi on tag, and then release is optional? or whether to release on tag automatically?
Participants:
- Philippe @pombredanne
- Jono @jyang
- Tushar @tg1999
- hritik
- Viraj Dhanushka
- lf32
Agenda:
- Dependabot PR to update Django
- Fetchcode
- Model Changes
- Release of scancode toolkit
Discussion:
- Merging PRs in scancode toolkit, release is very near
- Model changes on Package License
- Automate release of all the projects
- PR to update Django, we are not using the vulnerable part of Django, we are using hardcoded values. It's a dot version, so we are updating it
- We will be using fetchcode for cloning git repos, we are not supporting incremental clones as of now, but this should be done in future as an enhancement
- Needed review on https://github.com/nexB/vulnerablecode/pull/667
Participants:
- Ayan @AyanSinhaMahapatra
- Philippe @pombredanne
- Jono @jyang
- Tushar @tg1999
- hritik
- Viraj Dhanushka
- lf32
- Keshav
- Omkar
Agenda:
- GSoC proposals
- package files status and release
- vulcoid
- gsoc lf32 scan single license test
- viraj conclusions scancode.io project
- vulnerablecode advisory
- openssl prs raised
Discussion:
- Issue: https://github.com/nexB/vulnerablecode/issues/695
Tried to shorten the vulcoid, does this:
VULCOID-srQji2x-4McmdRSewqgI5Q==
work? Also see https://pypi.org/project/shortuuid/. We want to have numbers, which resemble CVEs but are our own. Could be a hash, but this is not super readable. We could also shorten a hash.
- This year in GSoC there are large/medium projects in terms of how many hours the participant will spend on the project. This has to be discussed and selected carefully based on how much time the participant is planning to spend and how much time the project requires. There is also a new addition this year to extend the project timelines based on discussions between the mentor and the participant if the situation arises.
- conclusions are the process of reviewing detections from license, copyright and extending to any detected results and the process to review some of them for possible errors and thus the UI/models should enable the creation of this workflow and process.
Participants:
- Aditya
- Alexander
- Ayan @AyanSinhaMahapatra
- Chaitanya
- Ernest
- Philippe @pombredanne
- Jono @jyang
- Keshav
- Sujit
- Tushar
Agenda:
- Kubernetes presentation
- Jono tests
- Chaitanya GSoC
- GSoC proposal aditya
- status package files and how it'll go to scancode.io
- vulnerablecode
Discussion:
- Kubernetes presentation slides link
- Will there be improvements for alpine and APKBUILD files? Yes in terms of general
- compare adding regex and other data structures to compare time complexity, rust implementation of aho corasick,
- proper testing on nginx required. reaching a stable state for vulnerablecode is required. people interested in vulnerablecode and CPEs. modified some semantics. code that fetches package versions from API, super complicated and difficult to test and mock. Shouldn't be premature optimization there.
Participants:
- Akshat
- Avishrant
- Ayan @AyanSinhaMahapatra
- Chaitanya
- Hritik
- Philippe @pombredanne
- lf32
- Jono @jyang
- Keshav
- Steven
- Thomas
- Tushar
Agenda:
- vulnerablecode cpe
- nvd reporter
- license text detection webapp
- summary plugin scancode
- vulnerablecode prs
- mypy integration
- package files
- resource codebase
Discussion:
- CPEs are important, how should we reference them in our models. We need to map purl to cpe to add it as packages, which is unsolved. We could add it as references, which seems okay. We don't have any urls associated to CPEs, which is a problem.
- NVD data doesn't have any affected packages
- Inferring a PURL from a CPE (GSoC project): doing search and building some mapping, building an improver.
- Originally the summary plugin would see all license-expressions, authors, copyrights, languages etc. To deduplicate the contents of the primary license/summary plugins we need to access data from other plugins, so should we have a single plugin? If we still want this summarization in the toolkit, it makes sense to merge the license-clarity-score and summary plugins. For the classify plugin, there should still be separate functions.
- resource path, optimized for speed. Difficult to find path. We have to walk the codebase and check paths. We need to be able to find a path quickly.
Participants:
- Ayan @AyanSinhaMahapatra
- Philippe @pombredanne
- Aditya
- Avishrant
- Jono @jyang
- Keshav
- Tushar
Agenda:
- scancode release/packages
- gsoc sessions
- scancode-toolkit summary plugins deprecation
- openssl PR
- false positives/analyzer
- packages in scancode
Discussion:
- Ayan and Philippe: In the LicenseDetection implementation, some detection-level merging (see license and unknown intros) is more accurate than other cases, like merging matches into detections in case of an inaccurate detection. Should these features of the analyzer still be taken into scancode-toolkit proper? Yes, they should be unless they have huge machine learning models behind their functioning, with more expensive calculations and larger requirements. There are also parts of the analyzer which get unique detections out of all the detections, and even though this is slow, it should also be moved into scancode proper. See [RFC False Positives Issue](https://github.com/nexB/scancode-toolkit/issues/2878#issuecomment-1079639973) for more background on this issue.
- Packages work is being completed, should have files satisfactorily for both system packages and application packages. RPM, alpine has support, debian complicated, but has been added with some warts. This would need review tomorrow.
- See https://github.com/nexB/scancode-toolkit/issues/2842#issuecomment-1041910505 where deprecation of various summary plugins has been discussed, in favour of making some options default and making primary license also default. Now, we would add deprecation messages which would be displayed, added to the headers, and also written to stderr for people testing for that in their tests. This would give people a heads up that these options are being deprecated in favour of default options and better summarization with the primary license.
- A data migration should be done for the postgres md5 index issue, see [here](https://github.com/nexB/vulnerablecode/pull/653). Also, computing md5 together for three strings in place of having 3 separate fields would not be less expensive computationally or memory wise at all, and would make some cases more complex. So there is no point doing that.
Participants:
- Ayan @AyanSinhaMahapatra
- Philippe @pombredanne
- Hritik
- Aditya
- Avishrant
- Keshav
- Nashit10
- Tushar
Agenda:
- scancode release/packages
- gsoc sessions
- npm licenseref-LICENSE
- pypi osv
- openssl PR
Discussion:
- Philippe: working on some changes in packages, specifically model/class simplification, would push a branch for review soon.
- Responsibilities for the GSoC session and making slides were divided. scancode: Philippe and Ayan, vulnerablecode: Hritik, Tushar: fetchcode, vers and packageurl. Projects page and ideas page with one or two ideas. Also what to expect pages: Ayan and Tushar. Session could be recorded tomorrow.
- https://github.com/nexB/scancode-toolkit/issues/2872 has to be fixed in code, and not by rules.
Participants:
- Ayan @AyanSinhaMahapatra
- Philippe @pombredanne
- Jono Yang @jyang
Agenda:
- scancode release
- gsoc sessions
- packages
- summary plugin
Discussion:
- we should have classify as default only after file info is also enabled as true always.
- support for system packages i.e. debian, alpine, rpm are also being added to the new package model before this release.
- 31 release prep https://github.com/nexB/scancode-toolkit/pull/2888 is merged.
- There should be gsoc session thursday.
Participants:
- Aditya @adii21-Ux
- Avishrant @AvishrantsSh
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Philippe @pombredanne
- Keshav Space @keshav-space
- Tushar Goel @TG1999
Agenda:
- scancode release
- gsoc sessions
- vulnerablecode status update
- openssl issues
- Some data without CVE in openssl, likely an one-off case. CVE is no longer mandatory in vulnerablecode, it's just an important field. It's an alias now. There is a vulnerability ID for the openssl advisory, but this is just that, there could be multiple advisories in a day. If you would have an alias for this, like openssl-20141015-CVE-CVENAME, the ones without CVE would have the part without it. I.e. OPENSSL-20141015, OPENSSL-20141015-CVE-2001-3567 are examples. See also https://www.openssl.org/news/secadv/ where if there's multiple advisories we have -2 added at last with the ID. Like 20101116 and 20101116-2.
- Tushar is working on the GitHub importer, should be coming soon. OSV importer for python is also being worked on.
- scancode beta release is coming soon, one issue being worked on is native building in one of the dependencies, which has its share of rabbit holes.
- sessions at two times, maybe one in the morning pacific time, early afternoon in india.
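The openssl alias scheme discussed above can be sketched as a small helper. The function name and signature are hypothetical, and the placement of the `-2` style sequence suffix before the CVE part is an assumption, since the notes only show the two parts separately:

```python
def openssl_alias(advisory_date, cve=None, sequence=1):
    """
    Build an alias such as OPENSSL-20141015 or OPENSSL-20141015-CVE-2001-3567.
    `sequence` covers several advisories on the same day (20101116, 20101116-2).
    """
    alias = f"OPENSSL-{advisory_date}"
    if sequence > 1:
        # Assumed ordering: the same-day sequence number follows the date.
        alias += f"-{sequence}"
    if cve:
        # Advisories without a CVE simply omit this part.
        alias += f"-{cve}"
    return alias
```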
Participants:
- Alexander
- Aditya @adii21-Ux
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Jono @jyang
- Purna
- Philippe @pombredanne
- Keshav Space @keshav-space
- Sanket
- Tushar Goel @TG1999
Agenda:
- scancode devel problems
- alpine importers
- improvers doc
- scancode.io alexander
- debian inspector bug
- vulnerablecode
- false positives
- gsoc
Discussion:
- Philippe/Jono: There was this https://github.com/nexB/debian-inspector/issues/25 This was caused from a spelling error (it's license in the spec) "license". Whether we should always use dual spelling for license in index is not clear, but at least this should not fail right away and should be added as a usual license paragraph.
- We need to have a public deployment of vulnerablecode, but also that there's no need to rush unnecessarily, as we want to make sure it's robust and accurate.
- Just do the migration first, 1. enabling more, once we have the pysec thing for osv merged, 2. we can make it generic later, it has been interpreted differently. Even as the OSV includes packageURL, it isn't present in github data. We can make different data formats and import twice sometimes, as this is not ideal but this is a plus for vulnerablecode, it can handle these differences.
- A plan for false positive detections: https://github.com/nexB/scancode-toolkit/issues/2878
- GSoC project ideas priority should be discussed, time set for a meeting to make the project ideas page https://github.com/nexB/aboutcode/wiki/GSOC-2022 more ready.
Participants:
- Aditya @adii21-Ux
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Philippe @pombredanne
- Keshav Space @keshav-space
- Tushar Goel @TG1999
Agenda:
- roadmap wrt. summary plugins
- primary licenses
- openssl wrt. univers
- osv importers in vulnerablecode
- GSoC project IDs
- building native wheels
Discussion
- Hritik/Philippe: want to have a single importer method and wrapper/subclass from there. Importers could consume other data in extra_data, shouldn't discard.
- Keshav/Philippe: openssl has FIPS versioning (some versions are FIPS certified and have fips in version numbers), how to deal with this? Example: <affects base="fips-1.1" version="fips-1.1.1"/>. Could be two separate importers/improvers: 1. fips versions of openssl 2. all other versions of openssl, without fips
- Tushar: vers support for alpine added and new release for vers. New release automation system added for vers, where on release a github action is triggered to test and push wheels to pypi. Will add this to skeleton.
- Ayan: We had summary options which were optional and used to aggregate data on license-expression, authors, copyrights and their counts, along with key-files, facets and details summary. This would be deprecated in favour of a much simpler and default summary option which would have primary data like primary license aggregated. The classify option will also become a default option in the process.
- Philippe: intbitset and pyahocorasick were built from native code; have contributed the wheel builds upstream.
- Philippe: We had a separate third party wheel repo, to avoid supply chain attacks wrt. yanking and changing packages. This crashed last week because we had more jobs running in azure as we have been given extra credits there. Will move away from dreamhost, and/or have this as a second option. https://thirdparty.aboutcode.org/pypi/
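For the FIPS versioning item above, splitting the versions into the two proposed groups could be sketched like this. It assumes FIPS versions always carry a `fips-` prefix, as in the `<affects>` example; the helper name is hypothetical:

```python
def split_fips_versions(versions):
    """Return (fips_versions, regular_versions), preserving input order."""
    fips, regular = [], []
    for version in versions:
        # Assumption: a "fips-" prefix marks a FIPS-certified version.
        if version.lower().startswith("fips-"):
            fips.append(version)
        else:
            regular.append(version)
    return fips, regular
```

Each group could then be handed to its own importer or improver.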
Participants:
- Abhishek
- Aditya
- Alexander
- Aman Mawar
- Avishrant
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Philippe @pombredanne
- Purna Chandra Mansingh
- Jono @jyang
- Karan Vaishnav
- Keshav Space @keshav-space
- Tushar Goel @TG1999
Agenda:
- vulnerablecode issue
- univers for alpine
- skeleton
- azure CI
- GSoC projects
- ONAP scancode
- fetchcode
Discussion:
- New version of packageurl-python released with support for generic URLs
- Web application for a package evaluation, in a specific domain. (GSoC Idea)
- Now, the model relationship is either package introduced/fixed vulnerability. We have both an attribute and a flag which is a wart. Vulnerablecode call tomorrow, should be discussed (11 AM CET).
- Scancode.io running at scancode.onap.eu with https://github.com/nexB/scancode.io/pull/397.
- version handling in gentoo handled by their tool, have to check where the bug is. If it needs to be fixed upstream, we need to do that. Could be an alpine only wart.
- Failing nix tests https://github.com/nexB/vulnerablecode/issues/617.
- Not using the same approach in CI. Azure, GitHub CLI, appveyor, Travis being used in various places. Need to streamline and make that a part of the skeleton.
- scancode.io deployment @ kubernetes, where to host a shard, we could have it in the same repo, or in a separate one. See https://github.com/bitnami/charts
- alpine APKBUILD, fetchcode work needs to be looked at.
- Will be applying to GSoC this week, people asked about mentoring @GSoC.
- A Language server protocol using scancode as a backend, see https://microsoft.github.io/language-server-protocol/ (GSoC Idea)
Participants:
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Philippe @pombredanne
- Purna Chandra Mansingh
- Karan Vaishnav
- Keshav Space @keshav-space
Agenda:
- vulnerablecode priorities
- project ideas for upcoming gsoc
- FOSDEM
- scancode packaging
- mocking in tests
Discussion:
- openssl importers migrate to new, versions are there and explicit, not range. We could also document this process of migration on importers, this could be used and could help. Rebase and merge whenever necessary as the code is changing a lot.
- There's also an openssh vulnerabilities datasource, we should also import from there.
- FOSDEM happened this weekend remotely, in which Philippe co-organized the Software composition and dependency management devroom, and there was a session from him on Package URL and version range. See https://fosdem.org/2022/schedule/event/package_url_and_version_range_spec/
- Scancode has two ways of use, as an application which we download and run. It could also be used as a library from pypi where dependencies are fetched and installed. Problem with the first one and there was a lot of issues reported, these are all added for the next milestone. Now we're looking at https://cibuildwheel.readthedocs.io/en/stable/ which is building across all the combinations of OS, python versions and others. This could be a GSoC project also as there is more work to be done here.
- Which importers should we start with? We are still lacking a bit on documentation on getting started with vulnerablecode. We should start with importers which has high volume data and good quality as well. Some are github, npm, nginx, osv, openssh, gitlab etc. See list at https://github.com/nexB/vulnerablecode/issues/597, this was updated to change the order. Whenever we have importers for one package, it should maybe be there in package URL spec. We need a new version range for openssl, it looks a lot like pypi.
Participants:
- Aditya
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- jono @jyang
- Philippe @pombredanne
- Purna Chandra Mansingh
- Sanket
- Tushar @TG1999
- Karan Vaishnav
Agenda:
- project ideas for upcoming gsoc
- scancode release
- vulnerablecode priorities
- scancode-toolkit and scancode.io directions
- Software Heritage client PR, coding conventions
Discussion:
- It's best to use imperative style for function naming (and commits). We name the function after what is returned, and also have that documented in the docstring. We can give a hint of what it does, but better put it in code as comments or by making the code more readable. Look at other projects for style guides, how the code makes sense. We have to understand that code is written once, but read hundreds of times, so code should be self explanatory and have good docs.
- We were not returning the correct set of packages (and versions) with vulnerabilities, and reporting both false negatives and false positives and in some cases skipping. Major changes were introduced to rectify these and a user complained at https://github.com/nexB/vulnerablecode/issues/597. Now all the importers are disabled except one, as they need to be ported to the new model. After this update previous data has to be dropped completely, as the data is wrong, and we can't figure how. We had moved away from two branches (main and develop) and to a develop branch with tags.
- We need to put together the project ideas list for GSoC this week, some discussed ideas were:
- Update the importers in vulnerablecode (also good introductory issues as small and very clearly verified outcome)
- copyright detection quality: rules (have very large dataset and review manually)
- copyright detection speed: regex is slow uses fsms and fs-automatons (could be something that be improved)
- scancode workbench update and support
- scancode-toolkit API docs
- scancode.io new views for more data visualization and details
- scancode workbench port to python-electron
- list of updates needed on package ecosystems
- Scancode release upcoming, some of the problems are supporting different versions of macOS, and having wheels built for different python versions, OSs and others. Tools to help with that: https://github.com/pypa/cibuildwheel/, also see https://github.com/pypa/cibuildwheel/discussions/1007 and https://github.com/pypa/cibuildwheel/discussions/1006 On intbitset, an alternative is https://github.com/RoaringBitmap/RoaringBitmap.
Participants:
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- jono @jyang
- Philippe @pombredanne
- Tushar @TG1999
- Karan Vaishnav
Agenda:
- GSoC introductory session
- scancode release
- vulnerablecode updates
- scancode-toolkit and scancode.io directions
Discussion:
- Philippe: Will work on the univers release. Btw, one problematic thing there is stars in version numbers, whatever highest number which could be there, pypi also has a star versioned range but defined differently, as star is a string pattern in the case. We have to cope with that. Now in univers we error out if we don't support the cases, explicitly why it doesn't support.
- Hritik: Vulnerablecode, porting improvers to the new model. Have to do a lot as there are a lot of open issues also which need attention, a lot of technical debt. Example: Importer still used importer yielder, should be like improvers where we have a list of explicit classes. We could provide data dumps.
- Philippe: We should evaluate the amount of work here, because a lot of data is not usable and fully useful, and how can we move in the direction of something more useful. Very hard to get where we get the data from. Any order that could be thought of for the importers?
- Hritik: Could look at data sources which are more important, or have more volume of data and used more first.
- Philippe: New data sources could be interesting to look at and write importers for. Also interesting is we don't track the licensing of the data source which is okay for now. If we don't log import operations, which importers are importing when, if we don't know this vulnerability was imported using this process, we wouldn't be able to instill confidence and track and look into issues at scale. One could be an import log, like that of a web server, which logs creation of records, errors, time of running.
- Hritik: We could redirect the debug log to a file instead of a database maybe?
- Philippe: We need to be able to go back to the errors and that would be more actionable. The per record basis logs on the vulnerability/importers/improvers and on the package relations. Useful to do it sooner than later and this can be done in parallel and could be done in a separate thread. But definitely a lesser priority.
- Waiting for google to announce the program, we would organize a session after that, likely one or two where we will present the projects which we are considering.
- Philippe: On the scancode-toolkit side we need to make sure that we correctly report packages, package dependencies and package-like data. Are DLLs/kernel modules a package? Probably not. It is an executable that has metadata. It is misleading to call it a package. All these would be package level data. Also finishing the work on which files are part of a package instance. Everything that detects, collecting packages, dependencies, collecting package instances is all scancode-toolkit. We'll have scancode.io store dependencies, package instances. It already does take package files data, but there isn't much data that scancode returns correctly which needs to be improved. Given a pURL: 1. we need to fetch registry information 2. we need to fetch the packages for the corresponding (could be multiple things like various wheels in pypi, jars, poms in maven) We need to put all this together and it needs to be used in two contexts 1. fetchcode, live scan, 2. More common, but in scancode where as part of a larger complex scan we want to fetch metadata where we want to enrich/compare metadata, fetch metadata, sources and scan them for more enriched data.
- Another direction is everything around simplifying and reporting less data, like in scancode we report line level and technical details which are useful if you want to dive into details. So we need to separate two things: 1. detections with their technical details like line details and other technical things 2. actual data from detections (like just the license expression, as opposed to the whole matches with details). Also have primary things, as opposed to details. Also could be a primary package, keep the secondary but not report as default.
Participants:
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Jono @jyang
- Philippe @pombredanne
- Tushar @TG1999
- Utkarsh
- Aditya
Agenda:
- starting path scancode.io
- packages vs dependencies
- GSoC introductory session
- scancode release
- vulnerablecode updates
Discussion:
- Philippe: Working on a new scancode release, fixing vulnerabilities in lxml and dparse. Will release a smaller point release if the actual release process takes more time. Will also support python version 3.10 and potentially drop support for 3.6 in the newer releases.
- Philippe and Tushar: Need for a GSoC introductory session, after discussion it was decided that it could be after the dates announcement for GSoC.
- Philippe: Need to report dependencies separately from packages, as they are really separate, for example a repository could be a project and not a package and still could have a dependencies file. Also in dependencies there's need for separate processing for fetching package data.
Participants:
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Philippe @pombredanne
- Tushar @TG1999
Agenda:
- unknown licenses in scancode
- dparse security issue
- univers issue
- planning on aboutcode: scancode/package
- planning on aboutcode: vulnerablecode
- organizing: function/package
Discussion:
- Philippe:
Unknown license detection done by akansha has finally been merged after some improvements, it works great now. After the usual license match generation, if there are parts of legalese text that is not matched or weakly matched, we run unknown license detection with a ngram index built from all the license text and then we filter and refine those matches and add them to the License Matches.
Also working on a CVE vulnerability in dparse, which is a scancode dependency that I maintain, got in touch with folks from GitHub dependabot, and they disclosed information about it.
- Hritik:
There's an issue reported: https://github.com/nexB/univers/issues/20, where there's a failure to parse version ranges because it isn't hashable.
- Philippe:
Yes we should have made the constraints a tuple attribute in attrs and then used a converter convert list to tuple. Opens https://github.com/nexB/univers/pull/21 with the correction.
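A sketch of why storing constraints as a tuple fixes hashing. The discussion refers to an attrs converter; this example uses a frozen stdlib dataclass instead to stay dependency-free, and the `VersionRange` class and its fields here are hypothetical, not the actual univers model:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionRange:
    scheme: str
    constraints: tuple = ()

    def __post_init__(self):
        # Convert any iterable (e.g. a list) to a tuple so the instance is
        # hashable. Frozen dataclasses block plain assignment, hence
        # object.__setattr__.
        object.__setattr__(self, "constraints", tuple(self.constraints))


# A list is accepted on input but stored as a tuple, so hashing works.
vrange = VersionRange("pypi", [">=1.0", "<2.0"])
unique_ranges = {vrange, VersionRange("pypi", [">=1.0", "<2.0"])}
```

Because equal instances now hash equally, `unique_ranges` holds a single entry.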
- Ayan:
Shares https://docs.google.com/document/d/1cHAxXZ_VxwEDxRF4BcOXTSSjGp3-_tYLVxXTx2X8oC4/edit?usp=sharing, the Package design doc. Here in the example, we have a simple case of two python manifests a setup.py and a requirements.txt, under the same filepath, so here we could just take the files under that filepath, but in cases where the manifest might be not in the same filepath, we could take the files under that path and have them collected.
Philippe:
Yes, here a setup.py is a manifest which is usually at project root, but the same can't be said for a requirements file, so files for a package could be manifest specific, and is a function of the manifest class itself. But it could also be done for a package instance, where the functions of all the package manifests that it is made out of are used for getting package files, and then there is an aggregation. Also this can't be on the fly, it has to be after a package instance is created out of package manifests.
Asks about planning in scancode/scancode.io, vulnerablecode and other projects to everyone.
Ayan:
We have a lot of code about packages in a lot of places, we have package manifest parsing and other ecosystem specific functions in scancode, then we have some in fetchcode, some code for mining/matching and some code for fetching data by network calls, these are all separated in different places and there should be some efforts towards the unification of this package code, maybe put all things that require network access/download as scancode.io pipelines, and the rest of the core functionality in scancode toolkit, as discussed in planning before.
Hritik:
The work on univers and supporting version ranges has to be completed and there's importers and now improvers concept in vulnerablecode, and shivam worked on timetravel type improvers for the data, and also there was some ideas on using NLP to improve data from text sources these could be looked into.
Philippe:
There's also some planning to be done in function vs packages, where there are a lot of smaller packages like debian inspector, rpm inspector and then smaller package specific functions could be in package inspector, there are other specific packages like fetchcode, then there are repos like univers which implement one function which is a dependency to all the packages which use version ranges, have to think more on which could be better.
Also, there's a question if scancode could get more data from network calls and aggregate more data from lockfiles or such, and would be an optional flag, and only keep tasks which are resource and compute heavy like downloading and scanning to scancode.io, and have all the code related to aggregating and getting data over network calls in scancode itself, and in scancode.io pipelines, import from scancode itself.
(The entire conversation was more of a discussion, opinions were asked and aggregated, this doesn't reflect all that, just what was discussed).
Participants:
- Ayan @AyanSinhaMahapatra
- Philippe @pombredanne
- Tushar @TG1999
Agenda:
- new scancode release
- scancode workbench
- new gsoc page
Discussion:
Philippe:
- New scancode release on the way, with updated license detection. Previously a couple thousand false positive rules were added for license lists commonly seen, but they are being deleted and instead a filter is added which filters the matches. If there's a lot of different licenses detected on consecutive lines, it would be a license list and would be removed from the license matches. Other updates are key phrases can now be defined in rules and support for cyclonedx output. The package manifest PR is also getting merged, which introduces package manifest classes for all manifests.
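The license list filter described above could be sketched as follows. This is not the scancode-toolkit implementation: the `(line_number, license_key)` tuples, the function names, and the threshold of five distinct licenses are all assumptions for illustration:

```python
def is_license_list(run, min_distinct):
    # A run of consecutive lines naming many different licenses looks like a
    # list of licenses rather than an actual license notice.
    return len({key for _, key in run}) >= min_distinct


def filter_license_lists(matches, min_distinct=5):
    """Drop runs of matches on consecutive lines that look like a license list."""
    kept, run = [], []
    for match in sorted(matches):
        line, _ = match
        if run and line - run[-1][0] > 1:
            # A gap in line numbers ends the current run.
            if not is_license_list(run, min_distinct):
                kept.extend(run)
            run = []
        run.append(match)
    if run and not is_license_list(run, min_distinct):
        kept.extend(run)
    return kept
```

A filter of this kind replaces thousands of hand-written false positive rules with one structural check on the matches.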
Tushar:
- There has been a issue reported in the gitter discuss channel for scancode-workbench, is this the same issue reported before?
Ayan:
- The report is for the scancode workbench develop branch and said the issues weren't present in the releases, so likely a new issue.
Philippe:
- Should open a new ticket for that.
Tushar:
- We added a new page for GSoC 2022 Ideas: https://github.com/nexB/aboutcode/wiki/GSOC-2022
Participants:
- Ayan
- Tom
- Jono
- Tushar
- Philippe
Agenda:
- Misc. project updates
- Scancode.io and scancode toolkit codebase and resource roots
- FAQ, QA session community for aspiring contributors
- Workbench problem issue with SCTK
Discussions:
- Misc. project updates: SCTK VC SCIO Univers version control lib
- Scancode.io and scancode toolkit codebase and resource roots: How to reuse SCTK codebase navigation in SCIO? Pending PR, needs discussion.
- FAQ, QA session community for aspiring contributors: We have a lot of questions. How to get started? Having good first issues is useful. We can try it once in early January. Promotion of the event is TBD: LinkedIn, Twitter, some ....
- Workbench problem issue with SCTK: Seems quite manageable. Is a candidate for good first issue.
Participants:
- Ayan
- Tom
- Jono
- Tushar
- Philippe
- Hritik
Topics:
- Status
- FetchCode
- ScanCode Workbench failing on latest SCTK: Ayan volunteered to look into this
- VulnerableCode DB
- Jono @JonoYang
- Harsh @harshagrawal523
- Hritik @Hritik14
- Tushar @TG1999
- Philippe @pombredanne
- status update
- question wrt. GSoC: we will participate? which projects?
Philippe: reviewing PR on SCTK:
- PR: new key phrases in licenses
- PR: from Ayan on package files
Other discussions:
- vulncode-db is shutting down. Maybe we can take over? We will need to collect the data ASAP before it goes dark.
- There are two new implementations of purl: one in Ruby and one in Swift made by a GitHub contributor
- next year GSoC:
- which projects will we have?... TODO: we need to create and update the list of projects.
- FOSDEM:
- accepted as a devroom for Software Composition And Dependencies Management
- Ayan @AyanSinhaMahapatra
- Harsh @harshagrawal523
- Hritik @Hritik14
- Tushar @TG1999
- Philippe @pombredanne
- scancode-toolkit update
Philippe:
- New PR with a lot of new license detection rules https://github.com/nexB/scancode-toolkit/pull/2765
- New WIP PR by folks from softsense https://github.com/softsense/scancode-toolkit/pull/1 and https://github.com/softsense/scancode-toolkit/pull/2. This adds key words to rules: a match is dropped unless these key words are present in the matched text. For example, a GPL rule that contains the word GPL only matches successfully if GPL is present in the match. This could get rid of a lot of false positives.
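As a rough illustration of the key-word check, here is a sketch that drops matches whose matched text does not contain every required key phrase. The `RuleMatch` class and function names are hypothetical and much simpler than the softsense PRs; tokens are compared as whole words so that "LGPL" does not satisfy a "GPL" key phrase.

```python
import re
from dataclasses import dataclass, field


def tokenize(text):
    # Lowercased word tokens; punctuation is ignored.
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(tokens, phrase_tokens):
    # True if phrase_tokens occurs as a contiguous run inside tokens.
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))


@dataclass
class RuleMatch:
    # Hypothetical stand-in: the matched text plus the rule's key phrases.
    matched_text: str
    key_phrases: list = field(default_factory=list)


def drop_matches_missing_key_phrases(matches):
    kept = []
    for match in matches:
        tokens = tokenize(match.matched_text)
        if all(contains_phrase(tokens, tokenize(p)) for p in match.key_phrases):
            kept.append(match)
    return kept
```

Rules without key phrases are unaffected, so the filter only tightens the rules that opt in.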
Ayan:
- Changing the Package classes to the new PackageManifest classes https://github.com/nexB/scancode-toolkit/pull/2748
- Looking into PackageDatabase classes for system package manifests
- Would be working on PackageInstances and their creation next.
Philippe:
- Work on the Univers spec, a common version range syntax for all versioning schemes, which could eventually be moved to PackageURL
Tushar: Waiting on the ONAP people for the PR.
Philippe answering Harsh:
- First make sure you're interested in our projects
- Read through https://aboutcode.readthedocs.io/en/latest/contributing.html for a start
- Look into the projects which interest you the most
- Starting with a beginners issue and trying to solve it would make the most sense
- PRs for small doc typos are not useful at all.
- Ayan @AyanSinhaMahapatra
- Tushar
- Jono @JonoYang
- Philippe @pombredanne
- univers update
- Go port of scancode-toolkit sponsored by interested parties - Initial reactions seem tepid: this is a big undertaking, which is not helped by an unfamiliarity with Go. We would also have to maintain two separate codebases.
- PackageManifest implementation/update by Ayan in scancode-toolkit
- Ayan @AyanSinhaMahapatra
- Jono @JonoYang
- Philippe @pombredanne
scancode TK: package files
Ayan:
- Replacing ecosystem-specific package classes with PackageManifest classes, one for each package manifest type, so each package ecosystem would have one or more PackageManifest classes. There would be standard functions for detecting package manifests and for creating PackageManifest objects from manifest files, overridden for each specific manifest type. This is WIP now, see https://github.com/nexB/scancode-toolkit/tree/2098-top-level-packages
- Next would be adding PackageInstance objects, which are created from one or more package manifests and the files associated with the package instance. Every package ecosystem would have a PackageInstance class, which would override and implement functions to find all other package manifests for an instance given one manifest, and to get all the files for that package instance.
- Functions related to package root are not touched for now, but they would be deprecated. As this top-level list of package instances is really package consolidation, the existing package consolidation has to be looked at after this.
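A small sketch of the "one class per manifest type" design, under the assumption of a base class with a detection function and a parsing function that each subclass overrides. All class, attribute and function names here are hypothetical and are not taken from the WIP branch.

```python
import configparser
import json
from fnmatch import fnmatch
from pathlib import PurePosixPath


class PackageManifest:
    """Base class: one subclass per package manifest type (hypothetical API)."""
    package_type = None
    file_patterns = ()

    @classmethod
    def is_manifest(cls, location):
        # Standard detection: match the file name against known patterns.
        name = PurePosixPath(location).name
        return any(fnmatch(name, pattern) for pattern in cls.file_patterns)

    @classmethod
    def recognize(cls, text):
        # Overridden for each specific manifest type.
        raise NotImplementedError


class NpmPackageJson(PackageManifest):
    package_type = "npm"
    file_patterns = ("package.json",)

    @classmethod
    def recognize(cls, text):
        data = json.loads(text)
        return {"type": cls.package_type, "name": data.get("name"), "version": data.get("version")}


class PythonSetupCfg(PackageManifest):
    package_type = "pypi"
    file_patterns = ("setup.cfg",)

    @classmethod
    def recognize(cls, text):
        parser = configparser.ConfigParser()
        parser.read_string(text)
        meta = parser["metadata"] if parser.has_section("metadata") else {}
        return {"type": cls.package_type, "name": meta.get("name"), "version": meta.get("version")}


MANIFEST_TYPES = (NpmPackageJson, PythonSetupCfg)


def detect_manifest(location, text):
    # Try each manifest type in turn; return parsed data or None.
    for manifest_type in MANIFEST_TYPES:
        if manifest_type.is_manifest(location):
            return manifest_type.recognize(text)
    return None
```

A PackageInstance layer would then group several such manifest results, plus their files, into one reported package.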
Jono:
- Package roots are important in most cases as they give access to all the package resources, and there should be a way to keep doing this
Philippe:
- There exists no package root in a lot of specific package ecosystem cases, and what we need is to be able to get all the resources associated with a particular package instance and being able to tag them as a part of that package instance. The upcoming changes are in that direction.
- Ayan @AyanSinhaMahapatra
- Ishu @ishukhr
- Jono @JonoYang
- Philippe @pombredanne
- Tushar @TG1999
- Hritik @Hritik14
Ayan: There has been a PR from @balakrishna-mukundaraj, https://github.com/nexB/scancode-toolkit/pull/2546 and there have been some installation failures there with a version mismatch. Philippe, could you check this out?
Philippe: There have been some problems since we switched from pinned requirements to version constraints, and this needs to be inspected.
Hritik: On separating import and improve operations and revisit time travel.
There has been a conversation in the packageurl gitter about having a logo, with an initial suggestion from @iamwillbar.
Tushar: Should an issue be added for this and should that be in packageurl-spec or packageurl-python? Ayan: It should be packageurl-spec as that is the main PURL repo, other repos are just tool implementations in different languages. Philippe: Yes, please add an issue.
- Tushar: Adding Black pre-commit hooks to packageurl-python, waiting for PR from @aditirao7 on that
- Philippe: new WIP spec for version ranges notation
- Tushar: PR to add Black to purl Python library needs review
- Jono @JonoYang
- Tom @tdruez
- Philippe @pombredanne
- Tushar @TG1999
- Hritik @Hritik14
This "vers" spec draft is at https://github.com/nexB/univers/blob/386eb32468c75ecac25ec872ea004b3257962946/VERSION-RANGE-SPEC.rst This will be moved to its own proper PR and is to address specific needs in purl and VulnerableCode. See: - https://github.com/package-url/purl-spec/issues/66 - https://github.com/package-url/purl-spec/issues/84 - https://github.com/package-url/purl-spec/pull/93 - https://github.com/nexB/vulnerablecode/issues/119 - https://github.com/nexB/vulnerablecode/issues/140
univers is the implementation done in parallel
https://github.com/package-url/packageurl-python/pull/64 has been submitted by @aditirao7 to add Black style to the purl python library and consider using pre-commit.
We discussed using pre-commit CI to automatically push fixes to the PR branches. None of those present liked this, so we would likely use pre-commit with local git hooks instead and have failures in the CI if the code style is not correct. Tushar @TG1999 and Hritik @Hritik14 will help set this up.
- Summarization and data aggregation: should it be in SCTK vs. SCIO. Or can we use a VirtualCodebase and SCTK plugins across the board?
- Drop Python 3.6 and Ubuntu 16 support
- How to deal with optimized build of Docker images such that lower layers are not rebuilt with each code changes. We need a ticket for this
- project statuses
- hacktoberfest
- we said we would put one project on deck for planning discussion each week... which one this week?: VulnerableCode
- Jono @JonoYang
- Tom @tdruez
- Philippe @pombredanne
- Tushar @TG1999
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Avishrant @AvishrantsSh
- Should it be in SCTK vs. SCIO? Or can we use a VirtualCodebase and SCTK plugins across the board?
- The VirtualCodebase can be useful to walk a filesystem tree in a specific tree order
- Is it worth keeping consolidation SCTK plugins in SCTK? the Codebase model is not great when there is no DB.
- in particular the package pipeline in SCIO would need such features
There was no conclusion yet from the discussion, and ideally we would like to keep summary functions in both. But the programming model for data aggregation in SCTK is really problematic. For instance, to find a file or directory resource that has a certain attribute in a VirtualCodebase, the whole codebase needs to be walked and all resources of the codebase checked. Basically we are badly missing the ability to do queries, something that a DB is fairly good at.
So unless we can find a clean way to get the code working cleanly in both cases, we may deprecate aggregation in SCTK and update its migrated code in SCIO to leverage the DB.
Note, that the issue is not so much the performance (which is poor in SCTK for these features) but rather the programming model that is really painful.
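To make the programming-model contrast concrete, here is a sketch comparing a full walk over in-memory resources with a declarative query over the same data in a database. It uses the standard library `sqlite3` module purely for illustration; the table layout and sample resources are assumptions and do not reflect the SCTK or SCIO models.

```python
import sqlite3

# Assumed sample data: (path, type, license) for each resource.
RESOURCES = [
    ("project", "directory", None),
    ("project/LICENSE", "file", "apache-2.0"),
    ("project/src", "directory", None),
    ("project/src/a.py", "file", "mit"),
    ("project/src/b.py", "file", "apache-2.0"),
]


def find_by_walking(resources, license_key):
    # In-memory model: every resource must be visited and checked.
    found = []
    for path, _type, license_expression in resources:
        if license_expression == license_key:
            found.append(path)
    return sorted(found)


def build_db(resources):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE resource (path TEXT PRIMARY KEY, type TEXT, license TEXT)")
    conn.execute("CREATE INDEX resource_license ON resource (license)")
    conn.executemany("INSERT INTO resource VALUES (?, ?, ?)", resources)
    return conn


def find_by_query(conn, license_key):
    # DB model: state what is wanted and let the database find it.
    rows = conn.execute(
        "SELECT path FROM resource WHERE license = ? ORDER BY path", (license_key,)
    )
    return [path for (path,) in rows]
```

Both functions return the same paths; the difference is that each new question needs a new hand-written walk in the first model, and only a new query in the second.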
- Everyone is A-OK to drop support for 3.6, which is EOL by the end of the year
- We will likely adopt 3.8 as the minimum version, which is the minimum version that Django will move to too.
- Ubuntu 16 is being dropped from Azure and has long been out of maintenance. SCTK and SCIO are now on Ubuntu 20 for core tests, with Ubuntu 18 and other OSes for smoke tests
- We will use Ubuntu 20 or Debian buster as needed as a base OS for core tests.
- ... such that lower layers are not rebuilt with each code change.
  - for now, the way we build most Docker images, where we first copy a project then install it, creates a layer for dependencies that is rebuilt each time the core code changes. In development this means constant rebuilds of everything
  - we want smaller images, faster builds and a way to publish pre-built Docker images
We need a ticket for this: Tom to create this in SCIO
- VulnerableCode:
  - Hritik: working on refactoring with improvers
  - Hritik: how to share data efficiently in a decentralized way: bit-torrent?
  - Philippe: still working on deployment
  - TODO: add Azure pipelines to CI for tests
- ScanCode TK:
  - Ayan: one PR merged on changing the output structure; working to use one class for each package manifest type, rather than one for each package ecosystem
  - Ayan: new reference scans diff and doc for SCTK https://github.com/nexB/scancode-toolkit-reference-scans
- ScanCode.io:
  - Jono: https://github.com/nexB/scancode.io-reference-scans needs some update.
  - Tom: released a new version with the latest TK. Dropped Celery for RQ, which is better at managing tasks.
- ExtractCode:
  - Philippe: Bugs and fixes require a new release
- FetchCode:
  - Pending PRs such as https://github.com/nexB/fetchcode/pull/70 ... which files need special attention? TODO: ask Alexander to set up some live review time or to help focus the review on the specific parts that need attention.
- Package URL:
  - Lots of PRs merged and chatter around OCI images and whether a purl is a location or not.
- already 10 days in, so we need to start fast or it will be too late
- Hritik: project board created in VC. other projects that want to participate should join there
- Ayan: Repos, issues and PR need to be tagged accordingly.
- Hacktoberfest: from @Hritik
- ScanCode.io homepage content
- Package URL for RPM and debs.
- FetchCode pending PRs
- ScanCode.io Keycloak PR
- Recent events presentations
- Jono @JonoYang
- Tom @tdruez
- Philippe @pombredanne
- Tushar @TG1999
- Alexander @aalexanderr
- Ayan @AyanSinhaMahapatra
- need just to tag issues with Hacktoberfest for beginners
- Tushar will look into this, sync with Hritik and report back
- Philippe to work on draft content
- https://github.com/package-url/packageurl-python/issues/62
- could be great for hacktoberfest and there could be some minimal sponsoring available too
- Package URL for RPM and debs.
- https://github.com/nexB/fetchcode/pull/71 : ready to merge and merged
- https://github.com/nexB/fetchcode/pull/70
- https://github.com/nexB/fetchcode/pull/54
- https://github.com/nexB/fetchcode/pull/56
Using CLI tools like wget or curl vs. the standard library needs to be discussed in a ticket. See https://github.com/nexB/fetchcode/issues/72
This may be useful or needed for large files with multipart data.
Alexander is trying to deploy SCIO on a public cloud and wants to gate it behind a login using OpenID Connect: for now with GH, and later using LF as an identity provider.
Auth should be mostly configuration and not for only one specific auth server.
At the LF OSS Summit, we had two presentations that talked of ScanCode.io:
- Alexander and Krzysztof: https://osselc21.sched.com/event/lAR7/virtual-emerging-automated-license-compliance-for-containers-alexander-mazuruk-krzysztof-opasiak-samsung-rd-institute-poland?iframe=no
- Philippe: https://osselc21.sched.com/event/lAMB/virtual-software-composition-analysis-with-free-tools-philippe-ombredanne-aboutcodeorg-and-nexb-inc?iframe=no
Alexander and Krzysztof will also present to the Open Networking Edge + Kubernetes on October 11th: https://events.linuxfoundation.org/open-networking-edge-summit-north-america/program/schedule/
- changes in package/package-manifests reporting
- scancode TK output format documentation with diffs between versions
- @AyanSinhaMahapatra
- @JonoYang
- @tdruez
@AyanSinhaMahapatra:
Some documentation on how the scancode output data changes across versions is needed, as there are upcoming changes to both the package and license data structures. It would be nice to have a collection of sample codebases to scan, with diffs of the results rendered with Sphinx and hosted, so that adopters can make sense of the changes easily. Is there something we can use that, when scanned, would cover and show most scancode features in the data?
@AyanSinhaMahapatra:
Working on reporting package instances at top-level with data from possibly multiple package manifests and with the files present under that package. Design Doc at: https://docs.google.com/document/d/1cHAxXZ_VxwEDxRF4BcOXTSSjGp3-_tYLVxXTx2X8oC4/edit?usp=sharing
@JonoYang:
It would be useful to have:
- npm manifests and node_modules directories
- different python manifests in a same directory
to check these features of having package instances and one instance being created from multiple package manifests data. These should also be there in the samples part to effectively document and show diffs.
- planning process
- scancode TK format changes
- ONAP presentation
- license scanning campaign (Debian and Alpine)
- @aalexanderr
- @pombredanne
- @kopasiak
- @AyanSinhaMahapatra
- @tdruez
- @Hritik14
- @JonoYang
The idea would be to add a simple ROADMAP.rst to each repo. And ensure that each project gets its time in turns in the spotlight during the weekly call so that we can review and update the roadmaps, focusing on one at a time.
@pombredanne:
- Would recognize package manifests
- multiple manifests contribute to making a package
- generally the plan is to decouple low level scan/detections that are tied to a file and/or positions within a file, and conflate several of these in a single reported value while still keeping the details of the per-file and per-line matches.
For instance:
- multiple package manifests form one package and its files
- multiple license detections form one inferred license expression in a given file
- multiple copyright statements may refer to one copyright holder
@kopasiak:
- ONAP is a comprehensive platform for management and automation of network and telco services for easy scaling and monitoring
See https://docs.onap.org/en/latest/guides/onap-developer/architecture/onap-architecture.html#onap-architecture for more docs.
- License compliance is important to ONAP. The project is deployed using 100's of container images, mostly using Alpine Linux.
- Using ScanCode.io will help ensure that compliant and vetted images are used
@kopasiak:
- have some openstack infrastructure which can be used to scan packages
- for Alpine, which versions to scan?
- an estimate of the machine resources required is needed before starting
- whether CPU/RAM/DISK bound
@pombredanne:
- scanning is mostly CPU bound
- Versioning scancode toolkit
- Debian license improvement campaign, possibly also on alpine
- Alpine WIP with maintainers on how to get to a source package
- Docker/container model in SCIO
- @aalexanderr
- @pombredanne
- @Hritik14
- @JonoYang
- @TG1999
- @tdruez
There is a need to create a graph, with dot, of the dependencies of container images.
- there is a need for a new data structure
- and for new data to support it
Alexander will create a ticket for this. He will also enter a ticket to avoid re-scanning what has already been scanned, based on checksums.
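A minimal sketch of checksum-based scan reuse, assuming scan results can be keyed by the SHA1 of the scanned content. The `ScanCache` class and the `scan_func` callable are hypothetical; a real implementation would persist results rather than keep them in memory.

```python
import hashlib


class ScanCache:
    """Reuse scan results for content that was already scanned (illustrative)."""

    def __init__(self, scan_func):
        self.scan_func = scan_func  # hypothetical callable: bytes -> scan result
        self.results = {}           # checksum -> cached scan result

    def scan(self, data):
        checksum = hashlib.sha1(data).hexdigest()
        if checksum not in self.results:
            # Only new content triggers an actual scan.
            self.results[checksum] = self.scan_func(data)
        return self.results[checksum]
```

Since many container images share layers and packages, identical content across images would be scanned only once.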
- Given a binary Alpine package, it is not possible to get to the corresponding source package directly. Each of community, main, non-free, scripts, testing, unmaintained needs to be tried in turn until the package name is found. This is problematic.
- Alexander will get in touch with Alpine maintainers... Mateuz has a pending patch on apktools to fix this.
- The idea of these projects is to organize campaigns to massively improve licensing documentation quality and contribute this upstream.
- first targets are Debian and Alpine.
- This will need some serious sponsoring: TBD with LF projects and other sponsors
- next step: Philippe to draft one pager so we can start engaging possible sponsors.
- calver is not super useful. We are switching back to plain semver. We can start at 22.0.0
- Alexander suggested why not just 30.0.0 instead? This will separate it from calver and make a nice round basis for next semver compatible releases.
- next step: Philippe to draft doc and use the new way on SCTK
- Alexander will be speaking on OSPO conference and on Open kubernetes and will mention ScanCode.io!
- @aalexanderr
- @AyanSinhaMahapatra
- @pombredanne
Agenda:
- PR to FetchCode that is ready to merge
- Versioning data format on ScanCode toolkit
- Design update on package ScanCode models
- Misc: Debian package formats updates
- Adding image id to package model
- pip updates questions
Discussion:
Alexander:
- What remains to be done on pip attribution
- https://github.com/nexB/fetchcode/pull/70/files
Philippe:
- Adding an SPDX license identifier tag to files would be straightforward
Alexander:
- Should we support typing in fetchcode
Philippe:
- It should be enforced and universally applied for it to be useful
- No need to change anything where typing has already been added
Alexander:
- The DCO check is failing on two commits: they contain code from pip, so sign-offs were not added
Philippe:
- It doesn't have to be your code for you to sign off, you just need to have the right to push it
Alexander:
- adding image ID to scancode IO package model
Philippe:
- we should not add anything to our model that is specific to a pipeline, but this would be important
- let's put this in a ticket and also discuss next week with @tdruez
Philippe:
- Debian copyright scanning for structured files currently does not report line numbers. To add this, changes have to be made to debian-inspector, replacing the email module with a new parser with line tracking capabilities.
Alexander:
- Connection alive bug and one ONAP image scanning failed in scancode.io
Philippe:
- These are bugs, and issues should be opened
Ayan:
- Versioning of the Output Data Format for scancode has been introduced. The --future-format flag is now removed as it's hard to implement two supported versions.
- Changes to the package format planned, with new top-level packages (instances) and file level package metadata reporting. See https://github.com/nexB/scancode-toolkit/projects/10 for more details.
- @AvishrantsSh
- @akugarg
- @AyanSinhaMahapatra
- @JonoYang
- @pombredanne
- @TG1999
- @Hritik14
- @tdruez
Agenda:
- GSOC wrap-up
- Data Versioning in ScanCode Toolkit: discussing https://github.com/nexB/scancode-toolkit/issues/2653
- FetchCode session with Samsung: reporting on the discussion
For next week, we will have a 10 to 15 minute session on each GSoC project as a wrap-up, where each GSoC student will present their project and give a quick demo.
GSoC:
- AvishrantsSh: Wrapping GSoC things up, submitted the final version of the evaluation and released a new version of the plugin on PyPI.
- Akanksha: Submitted the final version of evaluation, need help to wrap the LicenseMatch for unknown license detection.
- Hritik: Working on the new improver design for VulnerableCode and project documentation. Discussed imports
- (Pratik could not join)
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @pratikrocks
- @JonoYang
- @pombredanne
- @tdruez
- @TG1999
Agenda: - GSOC status
Akanksha:
- Following file references to other files in licensedcode
- For now, only the same directory is searched; should the whole codebase be searched?
Philippe:
- Looking only in the current directory is fine and should cover most cases
- The other case is "see license in root", and this is complex because finding the root is complicated and depends on context
- need to create a ticket for package-ecosystem-specific referenced file checks
Avishrant:
- working on making all the tests work for the GLC pipeline
- documentation on adding a new pipeline
- Is it okay to have the final report just as a .rst file instead of RTD
Philippe:
- Yes perfectly okay as there is no RTD for the
Hritik:
- working on inference
- Not sure about having different confidence levels, would be inference if not full confidence
Philippe:
- Not sure on the naming of inference, needs refinement
- Would discuss in details in the vulnerablecode meeting tomorrow
Pratik
- Working on documentation, and final report
- Asked if it was okay to have the final report in the wiki
Philippe:
- having it as .rst files in RTD is best, because there are tests and it is better than a separate wiki
Ayan:
- need to remove the old wiki contents and link to corresponding RTD sites in deltacode
Ayan:
- GSoC evaluation forms will open today/tomorrow, deadline on 23rd for students.
- Will follow up on activating RTD for vulnerablecode and deltacode
Philippe:
- have pushed a release prep on fetchcode
- added some issues with fetchcode, on better tracing and other problems
- monorepo vs manyrepo, should have a discussion on this next week
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @pratikrocks
- @JonoYang
- @pombredanne
- @TG1999
Agenda: - GSOC status
Akanksha:
- Following file references to other files in licensedcode
- Added PR for adding referenced_filenames to API, working on feedback that it should be in matched_rule and not resource_attribute
- Added new licenses which were not detected
Avishrant:
- working on adding documentation for the GLC pipeline
Hritik:
- working on importer restructuring (some problems with OVAL-based importers, looking into them)
- added configure files for documentation
Ayan:
- Will follow up on adding RTD page for vulnerablecode
Pratik
- fixing the deltacode documentation , and adding additional documentation for the use of docker image
Ayan:
- We need to review PR https://github.com/nexB/fetchcode/pull/54
Tushar:
- Mostly ready to release as a package
- Will look into issues and ping for discussion
Philippe:
- Will review scancode.io PR which depends on this
Avishrant:
- Received a mail from Google on writing reports; where should they go?
Ayan:
- Will share GSoC reports from previous years
- It is good to have them in RTD or wikis, instead of having blogs/docs present elsewhere, as they are more permanent links. This is beneficial for the project, for the participant to link to, and for future participants.
- It would be nice, but not mandatory, to link to blogs or other documentation on experience and POV, if there are any
Some Previous Reports:
- https://github.com/nexB/aboutcode/blob/master/docs/source/gsoc/gsoc19_final_report.rst
- https://scancode-toolkit.readthedocs.io/en/latest/contribute/gsoc19_final_report.html
- https://scancode-toolkit.readthedocs.io/en/latest/contribute/gsoc17_final_report.html
- https://gist.github.com/sbs2001/26d42784e738c078a97e3904e8833fc6
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @pratikrocks
- @JonoYang
- @pombredanne
Agenda: - GSOC status
Akanksha:
- Working on following file references to other files
- Question on whether existing unknown matches should be replaced with new resolved ones
Philippe:
- There are two cases:
  - when added to the license plugin, matches should not be replaced; new matches are just added
  - in packagedcode, in specific package manifests (like npm), they can be replaced as these are the official specification for declaring the license
Avishrant:
- the glc-pipeline repo is generated from skeleton
- working on packaging the pipeline; there are problems with adding scancode.io as a requirement. Have tried extras_require and installing from git
- adding test cases
Philippe:
- There are various solutions:
  - make scancode.io available on PyPI and then have it in the dependencies
  - install scancode.io locally as a wheel (should do this to test now anyway)
  - have an installation script
Hritik:
- changing the structure of importer (did it for one importer)
- added basic files for documentation
- which distros are/should be supported and how to mention that in docs
Philippe:
- we need to run tests on CI to support distros
Ayan:
- Will add config and other files for basic RTD setup
Pratik
- Adding CSV output option in the deltacode CLI from script
- https://github.com/nexB/deltacode/issues/179 added later
Philippe:
- Usually a good idea to create a ticket first
Philippe:
- new scancode released
- Would make python 3.7-3.9 default as 3.6 nears EOL
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @pratikrocks
- @tdruez
- @TG1999
- @JonoYang
- @pombredanne
Agenda: - GSOC status - fetchcode - scancode-toolkit updates
Avishrant:
- will work on memory issues in go side (at conversions)
- documentation on the pipeline
Philippe:
- important to fix the bugs but more important to finish first
- create a ticket on that and postpone that
Thomas:
- Have published a repo for the glc pipeline, https://github.com/nexB/scancodeio.io-pipeline-glc_scan
Pratik
- having scancode options in deltacode results
- issues pointed by steven (on removing redundant models)
- work on Documentation
Philippe:
- Ping for session, some planning on the fingerprints side
Hritik:
- implemented rate limiters
- have to restructure importers and make it easier to contribute importers
- sorting imports and tests
- docker bug fix (review needed)
- subversion http webdav
Philippe:
- we want to design an authentication service which could be common with scancode.io
- make subversion as a requirement and use xml output
- discussion on subversion
- PR for nixOS packaging was submitted. CI being brittle because of that
Akanksha:
- (by text) could not join today not feeling well
Philippe:
- refactoring in fetchcode
- working on alpine apkbuild parsers
- project versioning (semver vs calver) https://github.com/nexB/scancode-toolkit/issues/2601
Jono:
- Extractcode bug replacing spaces with underscore, added fixes for that
- update package detection for miu files
- new releases for commoncode, extractcode
Ayan:
- working on parsers for cocoapod lockfiles (getting dependencies of xcode projects and link to their specs json)
- getting package objects for parsing podspec.jsons which are present in Cocoapods/Specs
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @majurg
- @pratikrocks
- @tdruez
- @pombredanne
Agenda: - GSOC status
Akanksha:
- working on PR unknown-unknowns, adding unknown matches where there are none, based on n-grams (some blockers, will continue discussion)
- also working on following license references to another file
Ayan:
- It would make sense for now to follow [this comment](https://github.com/nexB/scancode-toolkit/issues/1364#issuecomment-869995820), but just for file references in the same directory, and to implement it as a post-processing step in the license scan plugin (process_codebase function) instead of in a separate post-scan plugin.
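A sketch of what such a post-processing step could look like, assuming a codebase reduced to a dict of path to license data. The dict layout and the `follow_references` name are hypothetical and stand in for the real resource objects; only references to files in the same directory are resolved, and resolved licenses are added next to existing matches rather than replacing them.

```python
import posixpath


def follow_references(codebase):
    """
    For each resource, add the licenses detected in referenced files that
    live in the same directory. `codebase` maps a path to a dict with
    "licenses" and "referenced_filenames" lists (assumed layout).
    """
    for path, resource in codebase.items():
        directory = posixpath.dirname(path)
        for filename in resource.get("referenced_filenames", []):
            target = codebase.get(posixpath.join(directory, filename))
            if target is None:
                # Referenced file is not in the same directory: skip it.
                continue
            for license_key in target["licenses"]:
                if license_key not in resource["licenses"]:
                    resource["licenses"].append(license_key)
    return codebase
```

A file one level down that mentions COPYING stays unresolved here, which matches the "same directory only" scope for now.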
Pratik
- PRs merged and more in review
- work on Documentation
Steven:
- Will add more issues on the specific tasks.
Ayan:
- Updated project board to have only ToDo, In Progress and Done columns, arrange tickets accordingly.
Avishrant:
- rebased on google licenseclassifier upstream
- working on mapping for license (glc handles notices and headers differently than scancode)
- working on fixing bug that is caused by filestreams opened
Philippe:
- We need to focus on having the format conversion and not on modifying tool behaviour
- Binary files and files larger than a given size could be ignored
- Open a ticket in google licenseclassifier with the problem
Hritik:
- Fixing mattermost and mozilla importers
- Rate Limiters
- Opened https://github.com/nexB/vulnerablecode/issues/506
Philippe:
- Open a ticket on API rate limiters, politely
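For context, a client-side rate limiter for importers can be as simple as enforcing a minimum interval between calls. The sketch below is an assumption about the general technique, not the VulnerableCode code; the `RateLimiter` name is hypothetical, and the clock and sleep functions are injectable so the behaviour can be tested without waiting.

```python
import time


class RateLimiter:
    """Allow at most `calls_per_second` calls by spacing them out (illustrative)."""

    def __init__(self, calls_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / calls_per_second
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        # Block until enough time has passed since the previous call.
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

An importer would call `wait()` before each request to the remote API.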
Philippe:
- working on versioning the JSON format of SCTK
- accepting @tdruez's suggestion on having that as an experimental feature; it will not be a default change at first and will be made default in later versions
- Presented at UCSC's CROSS, on Open Source Compliance License Tools. Link - https://www.crowdcast.io/e/open-source-compliance
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @pratikrocks
- @tdruez
- @pombredanne
Agenda: - GSOC status
Evaluation for Phase 1
- Output format changes in SCTK
Pratik:
- test CLI
- work on Documentation
- large PR ready to merge
- TODO: have a session to work on fingerprints formats
Akanksha:
- working on PR work unknowns
- will update to have a single
Avishrant:
- rebased on google licenseclassifier upstream
- working on mapping for license
- working on test cases for the module and now for the pipeline
Hritik:
- Fixing mattermost and mozilla importers
- Found new JSON API to get all mozilla products versions
Philippe:
- discussion of versioning the JSON format of SCTK
- Proposal: add a new top level version format attribute
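To illustrate how a consumer could use a top-level format version, here is a sketch that refuses scans with an unsupported major version. The `output_format_version` attribute name, its placement at the top level and the semver-style value are assumptions made for this example only.

```python
import json

SUPPORTED_MAJOR = 1  # assumed major version this consumer understands


def parse_version(version):
    # Expect a "major.minor.patch" string.
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def load_scan(text):
    scan = json.loads(text)
    version = scan.get("output_format_version")  # hypothetical attribute name
    if version is None:
        raise ValueError("scan has no output format version")
    major, _minor, _patch = parse_version(version)
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported output format version: {version}")
    return scan
```

Tools such as ScanCode.io or Workbench could run a check like this before trying to read the rest of the data.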
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @majurg
- @pratikrocks
- @tdruez
- @TG1999
- @pombredanne
- GSoC project updates
- Other projects
@ ScanCode.io
- Have a working pipeline
- Submitted upstream ticket for Go Classifier
- Rebased modifications for Go Classifier
- are there ways to ignore files like binaries?
  - best would be to have that in ScanCode.io
- Should I use the skeleton?
  - this can wait.
TODO: we need to make a presentation on how to use the skeleton next week
@tdruez:
- Made some tests on the pipeline and have some issues to review
- Working on adding new data structure to license: done
- What is next? Either improve low score detection of licenses or unknown/unknown license detection
  - @ayan and @philippe : unknown/unknown license detection
  - @ayan and @philippe : should be in ScanCode TK
- should "See license" be worked on next?
  - @ayan and @philippe : unknown/unknown license detection is best first
- adding extensive documentation on DeltaCode
  - the wiki part would best be moved into the main repo docs directory
- PR for 1st phase ready but some CI issues on Windows
  - create a ticket as this may be a problem with an outdated skeleton configure.bat file
- Working on importers
  - fixing mozilla importers
  - next is openstack
- Some issues:
  - issues in the way: should I solve them first or later?
  - documentation is weak, especially at the low level of the code
  - adopt doc standard from Linux Kernel
- Timing of the VulnerableCode meeting needs to be worked out
@TG1999 @ FetchCode
- major restructuring of the code reviewed; it needs to be reviewed by a second pair of eyes
@TG1999 in general: we should have smaller PRs when possible. Big ones are hard to review
New Gitter room created for off topic discussion from @Hritik14 request https://gitter.im/aboutcode-org/coffee-room
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @pratikrocks
- @tdruez
- GSoC project updates
- Have been able to make a pipeline with LicenseClassifier
- Working on Multiprocessing and efficiency issues
- Adding copyright detection as LicenseClassifier V2 doesn't have copyrights detected
@tdruez:
- It does not make sense to add functionalities to the projects, we just want to create a pipeline with the project as it is, so no need to work on adding copyright detection
- Should work on documenting the process of adding the pipeline, the issues faced, installing the package and running the pipeline
- Making a branch on scancode.io proper for review and feedback would be better, and point to docs to install the package and run the pipeline
- create Unit tests on running the pipeline
- [sound issues so could not speak, posted status on discuss]
- Hey! @/all I was having some sound issues in today's meeting.
- I was first working on the addition of a new flag in the models definition, which is completed!
- Moving on to the next part, i.e. reporting unknown licenses separately: I have created a PR nexB/scancode-toolkit#2578.
- As Ayan said, instead of having a subsection in licenses itself, we need to have a separate section for "unknown" ones.
- Also I am working in parallel on "Following indirect references" in files.
- Pushed PR: https://github.com/nexB/scancode-toolkit/pull/2578 on reporting unknown licenses separately
@ayan:
- The https://github.com/nexB/scancode-toolkit/pull/2548 PR is almost done; there's one test failure, but it may not be related to what was added (?). I'll check this.
- On #2578, we need to add unknown_licenses as a CodebaseResource, rather than adding it inside licenses.
- Need to sync with Philippe on the design and how to go ahead; will set up a sync meeting for tomorrow
- Please post a status update on the Chat
- Work on virtualcodebase is ready for review and comments on the PR have been addressed
- Working on documenting the changes made
- Also add general docs to be posted in RTD: https://github.com/nexB/deltacode/issues/133
- Working on test speed improvements: https://github.com/nexB/vulnerablecode/pull/490
- General work on vulnerablecode has progressed, would focus on importers
- Working on adding importers, will push PRs on that next
@tdruez:
- Please make sure you leave status updates and post regularly to keep us updated on the work, and let us know about blockers.
- Keep status updates on the main public chat, as others would be able to see them too.
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @JonoYang
- @pratikrocks
- GSoC project updates
- Working on improving license data model definition
- Moving onto reporting known licenses and unknown licenses separately
- Work on virtualcodebase is ready for review
- Working on additional test cases, documenting the changes made, remove unused dependencies from project
- Working on speed improvements
- Begin adding importers, create Contributing.md file
- Worked on scancode.io pipeline for google license classifier
- @Hritik14 asked if we should also discuss documentation updates related to GSoD in the GSoC call
- It would behoove us to combine both calls so we are on the same page regarding documentation
- Reminder that evaluations start on 2021-07-12
- Avishrant @AvishrantsSh
- Shivam @sbs2001
- Tushar @tg1999
- Philippe @pombredanne
- Thomas @tdruez
- Dennis @DennisClark
- Pratick @pratikrocks
- Steven @majurg
- Akanksha @akugarg
- Ayan Mahapatra @AyanSinhaMahapatra
- Hritik @Hritik14
- GSoC projects status
- ScanCode.io integration with VulnerableCode
- Q: We need a project board for each GSoC project
- A: Philippe to send invites as GitHub committers to: Akanksha on ScanCode Toolkit, Pratick on DeltaCode, Avishrant on ScanCode.io
- Working on ScanCode TK license models changes to add "is_unknown" flag. Had questions on models resolved by Ayan.
- Made PR on CommonCode that was merged.
- Another PR for fingerprint support is pending review. Steven will check it out.
- Discussion about options for Python integration for Go: either as a command line subprocess or using a shared library integration (native, cffi or ctypes)
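Note on the command line subprocess option: a minimal sketch of what this could look like from the Python side, assuming the Go tool prints a JSON document on stdout. The function name `run_json_tool` and the stand-in command are hypothetical and only for illustration; the stand-in uses the Python interpreter in place of a real Go binary so the sketch runs anywhere.

```python
import json
import subprocess
import sys


def run_json_tool(command, timeout=60):
    """Run an external command and return its stdout parsed as JSON.

    `command` is a list of arguments, e.g. the path to a Go binary
    followed by its options (hypothetical, not a real tool's CLI).
    """
    completed = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        timeout=timeout,      # do not hang forever on a stuck tool
        check=True,           # raise CalledProcessError on non-zero exit
    )
    return json.loads(completed.stdout)


# Stand-in for the Go binary: a tiny Python one-liner that emits JSON.
STAND_IN = [
    sys.executable,
    "-c",
    'import json; print(json.dumps({"license": "mit", "score": 0.98}))',
]

result = run_json_tool(STAND_IN)
print(result["license"])
```

The subprocess route keeps the two runtimes fully separate at the cost of process startup on each call; the shared library route (native, cffi or ctypes) avoids that cost but needs a build step per platform.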
Some questions:
- Q: I have some issues with ScanCode.io pipelines failing
- A: best is to enter an issue with error log
- Q: Do I need to support multiple OSes?
- A: not needed. For your project this is only Linux
- Working on performance for VulnerableCode, with major performance improvements
- Working on improving test speed
Some questions:
- Q: what should be our main channels of communications?
- A: instant discussions on chat, anything that needs to persist goes in tickets
- Q: GSoC evaluations: do we need daily work log?
- A: Nope. The code and commits are all that's needed, but you are welcome to keep your own if you find it useful
New importers additions/questions from Shivam pending in the chat
Tushar: New contribution for fetching details for Alpine Docker images for https://github.com/nexB/scancode.io/issues/194