repology-updater's Issues

No-lonely mode for repository

Need a way to make specific repositories not produce lonely packages. This will allow incompatible repositories, which have many packages that match nothing in other repos, to still take part in comparison. Useful for OpenSUSE (as long as it's based on binary package lists) and for non-unix repositories like F-Droid and Chocolatey, which have too many non-portable projects.

Add sisyphus support

Code is there; need an easy way to download all .spec files. TODO: contact the Sisyphus maintainers.

Parallel downloading/parsing

Since repositories are independent, it should be easy to fetch and parse them in parallel. This will be especially useful for slow repositories such as pkgsrc (index generation) and Fedora (slow fetching).
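
A minimal sketch of the idea, using a thread pool from the Python standard library. The `process_all` helper and the `process_repo` callable are hypothetical placeholders, not the project's actual API; threads suit slow fetching, while CPU-heavy parsing might call for processes instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_all(repo_names, process_repo, max_workers=4):
    """Run process_repo(name) for every repository concurrently.

    Returns (results, errors): two dicts keyed by repository name.
    A failure in one repository does not affect the others.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_repo, name): name for name in repo_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[name] = exc
    return results, errors
```

Here `process_repo` would wrap the fetch and parse steps for a single repository, so the slowest repositories no longer block the rest.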

Improve pkgsrc support

Currently we use a list of packages instead of parsing pkgsrc, because the latter process (make index) is too slow. Not sure what we can do here for now.

Don't leave partial state after failed fetch

Failing to fetch a single repository shouldn't interrupt the whole fetch; it should only stop that repository from updating. Also, for most repositories a failed fetch should remove their state, so that incomplete state is not parsed.
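
One possible way to avoid partial state is to fetch into a temporary directory and publish it only on success. The sketch below assumes a hypothetical `fetch(directory)` callable per repository. Keeping the previous complete state on failure is an assumption of this sketch; the requirement above is only that incomplete state is never parsed.

```python
import os
import shutil
import tempfile


def fetch_with_rollback(state_dir, fetch):
    """Call fetch(tmp_dir); publish tmp_dir as state_dir only on success.

    On failure the partial temporary directory is removed and any
    previous state_dir is left untouched. Returns True on success.
    """
    parent = os.path.dirname(os.path.abspath(state_dir))
    tmp_dir = tempfile.mkdtemp(prefix='.fetch-', dir=parent)
    try:
        fetch(tmp_dir)
    except Exception:
        # Drop the partial download so it can never be parsed.
        shutil.rmtree(tmp_dir, ignore_errors=True)
        return False
    if os.path.exists(state_dir):
        shutil.rmtree(state_dir)
    os.rename(tmp_dir, state_dir)
    return True
```

The caller can loop over all repositories and simply skip parsing for those where the function returns False.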

Allow multiple packages per repo

Currently, each metapackage allows only a single package per repo (e.g. only one package for FreeBSD). This leads to shadowing and information loss (e.g. when php55, php56 and php70 are merged into a single metapackage). Allow multiple packages per repo, always take the highest version, but keep all the information.
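
A rough sketch of such a structure: every package is kept, and the highest version per repo is computed on top. The `version_key` function is a deliberately naive stand-in for real version comparison, and all names here are hypothetical.

```python
from collections import defaultdict


def version_key(version):
    """Naive comparison key: numeric parts compare as ints, others as strings."""
    return [(0, int(p)) if p.isdigit() else (1, p) for p in version.split('.')]


def group_packages(packages):
    """Group (repo, name, version) tuples by repo, keeping every package.

    Returns {repo: {'best': highest version, 'packages': [(name, version), ...]}}.
    """
    by_repo = defaultdict(list)
    for repo, name, version in packages:
        by_repo[repo].append((name, version))
    return {
        repo: {
            'best': max((v for _, v in items), key=version_key),
            'packages': items,
        }
        for repo, items in by_repo.items()
    }
```

With php55, php56 and php70 from FreeBSD in one metapackage, `best` would be the php70 version while all three entries stay available for display.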

Script for prototype repology.org generation

Before proper dynamic backend is developed, let's just make a static site generator. Required features:

  • Paginated package lists
    • Plain (just packages)
    • Outdated for each repository
    • Absent for each repository
    • Per maintainer
    • Per category
  • Summary page for each package

Website functionality ideas

  • Browse paginated package lists A...Z
  • Browse by category
  • Browse by maintainer
  • Browse outdated packages per repository
  • Browse by packages (packages with best support across distros)
  • Compare repositories
  • Compare maintainers
  • Suggest maintainer with same interests

Package merging rules

Packages are named differently across repos. Need rules to merge differently named packages into a single entity. Need single-package rules (extreme-tuxracer + extremetuxracer) as well as generic rules (FreeBSD: p5-Foo-Bar, Debian: libfoo-bar-perl).
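
As an illustration only, both kinds of rules could be expressed as one ordered table of regular expressions. The rule format, the `perl:` prefix and the `normalize_name` helper are assumptions made for this sketch, not an existing design.

```python
import re

# Hypothetical rule table: (repo, or None for any repo; name pattern; replacement).
RULES = [
    ('freebsd', re.compile(r'^p5-(.+)$'), r'perl:\1'),
    ('debian', re.compile(r'^lib(.+)-perl$'), r'perl:\1'),
    (None, re.compile(r'^extreme-tuxracer$'), 'extremetuxracer'),
]


def normalize_name(repo, name):
    """Return the merged name for a package, using the first matching rule."""
    name = name.lower()
    for rule_repo, pattern, replacement in RULES:
        if rule_repo is not None and rule_repo != repo:
            continue  # rule is restricted to another repo
        if pattern.match(name):
            return pattern.sub(replacement, name)
    return name
```

With these rules, FreeBSD's p5-Foo-Bar and Debian's libfoo-bar-perl both end up under the same merged name.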

Detect dates in versions

E.g. 20[01][0-9]\.?[01][0-9]\.?[0123][0-9]

What to do with this:

  • Convert to a single format (see abcmidi package problem) to fix comparison
  • If all items contain a date, compare these instead (other parts are likely uninformative, e.g. 0.0.20160916 vs. git20160916 vs. 2016.09.16)
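
A small sketch of both ideas, built on the pattern given above. The function names are hypothetical, and the pattern does not check that the month and day are valid, so false positives are possible.

```python
import re

# The pattern from above, with groups for year, month and day.
DATE_RE = re.compile(r'(20[01][0-9])\.?([01][0-9])\.?([0123][0-9])')


def extract_date(version):
    """Return the date found in a version string as 'YYYYMMDD', or None."""
    match = DATE_RE.search(version)
    return ''.join(match.groups()) if match else None


def comparable_versions(versions):
    """If every version contains a date, return the dates; otherwise return the versions unchanged."""
    dates = [extract_date(v) for v in versions]
    return dates if all(dates) else list(versions)
```

Applied to 0.0.20160916, git20160916 and 2016.09.16, every version reduces to the same normalized date, so the three compare as equal.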

Improve pagination

To make pagination more usable, it needs to work with package names (e.g. aa..ak, ak..bc instead of 1, 2)
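
A possible sketch of name-based page labels, assuming fixed-size pages and a two-letter prefix (both arbitrary choices for illustration):

```python
def page_labels(names, page_size):
    """Split sorted names into pages and label each page 'first..last' by prefix."""
    names = sorted(names)
    pages = [names[i:i + page_size] for i in range(0, len(names), page_size)]
    return [f'{page[0][:2]}..{page[-1][:2]}' for page in pages]
```

A real implementation would likely lengthen the prefixes when neighbouring pages share the same first two letters.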

Add Fedora packages

Fetchable via web: https://admin.fedoraproject.org/pkgdb/packages/

Available for testing in newrepos branch. Still TODO:

  • Improve fetching. Sequential fetching takes ~6 hours. Maybe do a parallel fetch.
  • Improve parsing. Need a smarter .spec parser, which will also be useful for other RPM repos.
    • Add substitution support (Package: %foovar%)
    • Fix parsing of multiline fields (%description)
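
Substitution support could start out roughly like the sketch below. It assumes RPM-style `%{name}` references and `%define`/`%global` lines, ignores multiline sections such as %description, and leaves unknown macros untouched. The `parse_spec_fields` name is hypothetical.

```python
import re

DEFINE_RE = re.compile(r'^%(?:define|global)\s+(\w+)\s+(.+)$')
FIELD_RE = re.compile(r'^(\w+):\s*(.+)$')
MACRO_RE = re.compile(r'%\{(\w+)\}')


def parse_spec_fields(spec_text):
    """Collect 'Tag: value' fields, expanding %{macro} references seen so far."""
    macros, fields = {}, {}

    def expand(value):
        # Unknown macros are left as-is rather than guessed.
        return MACRO_RE.sub(lambda m: macros.get(m.group(1), m.group(0)), value)

    for line in spec_text.splitlines():
        define = DEFINE_RE.match(line)
        field = FIELD_RE.match(line)
        if define:
            macros[define.group(1)] = expand(define.group(2).strip())
        elif field:
            tag, value = field.group(1).lower(), expand(field.group(2).strip())
            fields[tag] = value
            macros[tag] = value  # tags such as Name double as %{name}
    return fields
```

Real .spec files also use conditionals and built-in macros, which this sketch does not attempt to handle.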

Gentoo needs more package splits

Need to be split: btf (dev-java and sci-libs)

Actually, a lot more:

find gentoo.git -mindepth 2 -maxdepth 2 -type d |
   grep -Ev 'dev-(perl|python|haskell)' |
   awk -F/ '{print $NF}' |
   sort | uniq -d

ace acl ada amap analog apel atlas attica auctex baloo balsa barcode bbdb bfm binclock bluedevil bson btf build c-support calc calendar cdcover cdrtools charm checkpassword coffee-script color crystal csv daemontools dash dictionary dirdiff docker dolphin ebuild-mode ecb eject elib emacs ess exo fam fcgi ffmpeg fuse gambit gdl git glade glu gnupg gnuplot gom gpgme grip gsasl haskell-mode highline icecream igrep info jack jal jama jde jpeg json kactivities kde-gtk-config kdeplasma-addons kdesu kfilemetadata kglobalaccel khotkeys kinfocenter kmenuedit knewstuff krunner kscreen kstart ksysguard kwin kwrited languagetool launchy lemon libelf libffi libgudev libiconv libintl libkscreen libnet libusb locale lookup lzma magic mailcrypt mailx man mars mash mavros mc mediawiki mew milou mime-types mldonkey mmix mmm-mode modutils mongo mpack mpc msgpack muse mysql nagios nemesis ninja nitrogen notification-daemon nut ocaml openmsx otter pam par pcl pdv picard pkgconfig planner plasma-mediacenter plasma-nm plasma-workspace pmake pms polkit-kde-agent polyglot powerdevil psgml psi python-mode rails re2 redis reduce riece ruby rubygems screen session shadow signify silo ski skkserv slim slurm smack sml-mode snappy spice spin splat splice sqlite3 ssh surf systemsettings szip teco texinfo tf time tokyocabinet tornado tree uclibc udev vc vm w3m xclip xslide xsp yacc zenburn zenirc

Add support for AUR

How do I download a single-file AUR index (the one pacman uses?) or the whole AUR as a single repository? No idea for now. As a last resort, the AUR website may be scraped.

Additional repository support ideas

Please share any ideas on what additional repositories we can support. A description of how to fetch all package data from a specific repository is preferred. Approved repositories with a determined fetching algorithm are split into separate issues and eventually implemented.

Classic *nix package repositories

  • Any RPM repos
    • Fedora (see #36)
    • OpenSUSE (see #44)
    • AltLinux Sisyphus (see #24)
    • Fedora EPEL
  • slackware (#331)
  • homebrew (see #198)
  • DragonFlyBSD's dports (pretty much the same as FreeBSD ports minus some packages which don't build on DragonFly)
  • 💡 VectorLinux

From Fedora release-monitoring

Unsorted

Other platforms

  • NixOS
  • YACP (Yet another cygwin ports) (though project is somewhat inactive)
  • Rosa
  • GuixSD
  • 💡 OpenPandora
  • buckaroo 👍 json recipes, 👎 custom naming (fixed by some rules). Also looks dead already (true, 75% outdatedness).
  • MSYS2 (see #262)

Since these will contain too many irrelevant unique packages, doable as shadow repos:

  • Chocolatey (see #43)
  • F-Droid (only a handful of packages manually whitelisted, too much android garbage)

Upstream repos

Doable as shadow repos as well

  • CPAN (perl packages)
  • PyPI (python packages)
  • RubyGems
  • 💡 Others? (node, etc).
  • 💡 GitHub, for projects which fetch from there. Need feedback loop here (parse normal repos -> get github urls -> parse github urls for latest tags)

New version detectors

  • 💡 FreeBSD's portscout
  • 💡 pkgsrc's thingy
  • 💡 AllMyChanges.com, just for completeness; usefulness is questionable: it's focused on iOS apps, and the part which intersects with *nix software is fairly outdated

...more ideas?

Integrate with vulnerability databases

Mark vulnerable package versions

The plan:

  • Implement harvesting CPE data from upstream repositories
    • ❌ GUIX contains cpe_name (is useless without vendor)
    • ❌ FreeBSD ports define CPE_VENDOR and CPE_PRODUCT, but these are not exposed in INDEX
    • Gentoo contains usable CPE metadata
  • Implement database storage for project → CPE relations
  • Implement fetching and parsing CPE data (https://nvd.nist.gov/vuln/data-feeds#JSON_FEED)
  • Implement setting vulnerable flag for affected packages
    • Match incoming packages against vulnerable version ranges in the database
    • Force project update on new CPE for it (by resetting its hash)
    • Bulk-updating the vulnerable status on all packages looked viable at first, but it is not: binding tables can't be updated properly this way, and they are needed for filtering based on the vulnerable property
  • Implement stub for handling patched vulnerability information from repositories (discussed in #1045)
  • Add vulnerable flag to binding tables to allow using it in project filtering
  • Integrate vulnerability updates into delta update process properly
    • During update, only run on incoming packages
    • When vulnerabilities are updated, queue affected projects in dedicated table, reset their hashes before pushing new packages
  • Add per-maintainer and per-repository vulnerable packages/projects counters
  • When all code is in place, force update for all projects with defined cpe_vendor/cpe_product
  • Add per-project vulnerability flag in order to be able to show "Vulns" project page conditionally
  • Add/ensure that vulnerability counts are saved in repository history, so we can plot graphs
  • Implement vulnerability based history events
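
The step of matching incoming packages against vulnerable version ranges could look roughly like this. The sketch assumes half-open `(start, end)` ranges and purely numeric dotted versions; both are simplifications, and the function names are hypothetical.

```python
def version_tuple(version):
    """Naive key for purely numeric dotted versions, e.g. '1.2.10' -> (1, 2, 10)."""
    return tuple(int(part) for part in version.split('.'))


def is_vulnerable(version, ranges):
    """True if version falls into any [start, end) range; None means unbounded."""
    v = version_tuple(version)
    for start, end in ranges:
        if start is not None and v < version_tuple(start):
            continue  # below this range
        if end is not None and v >= version_tuple(end):
            continue  # at or above the first fixed version
        return True
    return False
```

A real implementation would need the project's own version comparison and the inclusive and exclusive bounds found in the vulnerability feed.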

Support for removed packages

If a package was removed, it was probably problematic and there's little reason to bring it back. For FreeBSD, the MOVED file may be used.

Add unit tests

Unit tests for rules processing and version comparison are done. A complete test is still needed, one which includes parsing and processing fake repository data.

Add OpenSUSE support

Available for testing in newrepos branch. Uses binary package lists, so partially unsuitable for comparison, will be implemented as a shadow repo. Also contains too little info. Investigate a possibility of fetching complete data.

Rules TODO

  • apmod:perl
  • apmod:python
  • apmod:wsgi
  • apr (ignore on freebsd)
  • argus, argus-clients (-sasl on freebsd)
  • asterisk (merge 11, 13 on freebsd)
  • autodia (gentoo wtf)
  • bonnie (pkgsrc wtf)

Extend version ignores

  • Ignore only newer (single version bad)
  • Ignore always (versioning schema totally broken)

Split repo fetching and parsing

These are not tied together. Fetching is more generic and may be common to multiple parsing techniques. Currently, there are just 2 types of fetchers:

  • plain file [with options to gunzip or bunzip it after downloading]
  • git repository
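
A minimal sketch of such a split, with the two fetcher types above as small classes and the parser as any callable that reads the fetched state. All class and function names are hypothetical; the `download` parameter exists only so the network call can be swapped out.

```python
import bz2
import gzip
import subprocess
import urllib.request
from pathlib import Path


def http_get(url):
    """Default downloader: return the response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


class FileFetcher:
    """Download one file, optionally gunzipping or bunzipping it."""

    DECOMPRESS = {None: lambda data: data, 'gz': gzip.decompress, 'bz2': bz2.decompress}

    def __init__(self, url, compression=None, download=http_get):
        self.url, self.compression, self.download = url, compression, download

    def fetch(self, state_path):
        data = self.DECOMPRESS[self.compression](self.download(self.url))
        Path(state_path).write_bytes(data)


class GitFetcher:
    """Clone a git repository, or update an existing clone."""

    def __init__(self, url):
        self.url = url

    def fetch(self, state_path):
        if Path(state_path, '.git').is_dir():
            subprocess.run(['git', '-C', str(state_path), 'pull', '--ff-only'], check=True)
        else:
            subprocess.run(['git', 'clone', '--depth', '1', self.url, str(state_path)], check=True)


def update(fetcher, parser, state_path):
    """Any fetcher can be paired with any parser; the parser only sees state_path."""
    fetcher.fetch(state_path)
    return parser(state_path)
```

Because the parser depends only on the fetched path, the same fetcher can serve several parsing techniques.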

Track package state changes

Track package state changes (such as version updates, new packages). Provide RSS streams of such events, display icons for recently updated/added packages.

Badges support

For a named package, generate a badge listing the repositories it's present in, with version info.

Add chocolatey support

Fetcher is rather trivial: get https://chocolatey.org/api/v2/Packages()?$filter=IsLatestVersion, parse XML, get next page from <link rel="next" href="">. All package info is available in XML, including name, version, tags, comment, author, www.
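
The paging loop described above might be sketched as follows. The Atom namespace is standard; the OData `d:` namespace and the `Version` element name are assumptions about the feed. The `get_page` callable (URL in, XML text out) is a hypothetical hook, so no HTTP client is hard-wired.

```python
import xml.etree.ElementTree as ET

ATOM = '{http://www.w3.org/2005/Atom}'
# OData v2 data namespace; assumed to be what the feed uses.
DATA = '{http://schemas.microsoft.com/ado/2007/08/dataservices}'


def parse_page(xml_text):
    """Return ([(name, version), ...], next_url or None) for one feed page."""
    root = ET.fromstring(xml_text)
    packages = []
    for entry in root.iter(ATOM + 'entry'):
        name = entry.findtext(ATOM + 'title')
        version = entry.findtext(f'.//{DATA}Version')
        packages.append((name, version))
    next_url = None
    for link in root.findall(ATOM + 'link'):
        if link.get('rel') == 'next':
            next_url = link.get('href')
    return packages, next_url


def fetch_all(first_url, get_page):
    """Follow rel="next" links until the last page."""
    packages, url = [], first_url
    while url:
        page, url = parse_page(get_page(url))
        packages.extend(page)
    return packages
```

Other fields (tags, comment, author, www) could be read from each entry in the same way once their element names are confirmed against the real feed.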
