Giter Club home page Giter Club logo

Comments (21)

marcespie avatar marcespie commented on June 20, 2024

I've had a quick look at the parser, the DISTFILES and MASTER_SITES* fields you are using are indeed gone.

We've changed the layout in bsd.port.mk, so that we can have arbitrary DISTFILES.sufx variables, and SITES.sufx that match (also taken the opportunity to shorten the name). Accordingly, there's no longer a fixed list of MASTER_SITES names, thus it's gone from the main table.

I don't know what you use DISTFILES and MASTER_SITES for, but at a guess, you might want to use the "simpler" ROACH_URL and ROACH_SITES variables instead.

On complicated ports with multiple origins, these now should contain the main url that we fetch for the package being built, and the corresponding originating sites (ROACH_URL has all the "fun" parts of OpenBSD-style MASTER_SITES already parsed).

That should actually make the data you report more accurate, hopefully.

If you really need to, you can instead find the full data in subsidiary tables:
_sites contain all the sites for each package path, N is actually the suffix used.
_distfiles contains all the various files that get fetched (but the path{url}sufx transformation was not applied), where type is 0/1/2 depending whether it's a DISTFILES, PATCHFILES, or SUPDISTFILES.

from repology-updater.

AMDmi3 avatar AMDmi3 commented on June 20, 2024

Thanks for the heads up. I'll try to update the parser this week, until then I have to disable openbsd repo.

from repology-updater.

didickman avatar didickman commented on June 20, 2024

I use repology for openbsd very regularly so losing it on repology isn’t ideal. Instead of taking down the entire thing, would it be possible to just show the main master site as a quick stop gap for now? I think that will show the same as before for about 93% of ports as most ports don’t need more than one master site. Example diff here: repology_diff

from repology-updater.

AMDmi3 avatar AMDmi3 commented on June 20, 2024

I'm working on this. For now I came up with this query to return all site+distfile pairs for all ports, which can then be converted to real URLs. Does it look right? I'm particularly puzzled with _distfiles' SUFX vs N but it seems to work with ports using either.

SELECT
    _distfiles.FullPkgPath AS FullPkgPath,
    _sites.Value AS Sites,
    _fetchfiles.Value AS Files
FROM _distfiles
    JOIN _fetchfiles
        ON KeyRef=_Distfiles.Value
    JOIN _sites
        ON _sites.FullPkgPath=_Distfiles.FullPkgPath AND (coalesce(_distfiles.N, _distfiles.SUFX) is not distinct from _sites.N)

from repology-updater.

didickman avatar didickman commented on June 20, 2024

What does repology need exactly? The OpenBSD sqlports package includes a man page which has some light documentation. I'm not sure if you've seen it, but if not I've uploaded it here in case it is helpful: sqlports man page

The database has a table called Meta which might be useful to check for freshness and schema versioning.

sqlite> select * from Meta;
SchemaVersion  Hash                                          CreationDate
-------------  --------------------------------------------  ------------
7.48           e9fy133wnJkyctyUUC3P8MtkwdSShLQHLX+Plp6DZqo=  2023-09-17

Once that's known, the raw tables are the ones starting with underscores. So for example I see repology is using the raw tables today. Is that what you want to do?

Then there are some convenience views which merge the data in the raw tables. Those convenience views do not start with underscores.

The "Ports" view may be very useful to use. But since it's a view it's slow, so the results from the dynamic view are also stored as static data in "PortsQ" which breaks the pattern and is actually a table.

I'm not an expert on sqlports, but if you can share how you need the data for repology I can try to work it out with you. @marcespie is the real expert here. Maybe the PortsQ table is what Marc was suggesting for you to use when he mentioned using the ROACH_SITES and ROACH_URL variables? Here is an example:

sqlite> select fullpkgpath, roach_sites, ROACH_URL from portsq limit 1;
FullPkgPath        ROACH_SITES                                                  ROACH_URL
-----------------  -----------------------------------------------------------  --------------------
archivers/ancient  https://github.com/temisu/ancient/archive/refs/tags/v2.1.1/  ancient-2.1.1.tar.gz

Nevertheless if you can describe what repology needs it might be helpful for those of us on the OpenBSD side to try to help.

from repology-updater.

marcespie avatar marcespie commented on June 20, 2024

from repology-updater.

AMDmi3 avatar AMDmi3 commented on June 20, 2024

What does repology need exactly?

https://repology.org/docs/requirements
TLDR: identifiers/names (portname, pkgname, origin), version, summary, maintainer, categories, licenses, cpe info, homepage and download urls. In the future, depends.

The OpenBSD sqlports package includes a man page which has some light documentation. I'm not sure if you've seen it, but if not I've uploaded it here in case it is helpful: sqlports man page

Thank you, but it contains only the basic info which is already clear from the database structure. It doesn't answer my question regarding SUFX vs N, neither does basic description you've given.

from repology-updater.

AMDmi3 avatar AMDmi3 commented on June 20, 2024

N has to do with the distfiles table containing DISTFILES, SUPDISTFILES
and PATCHFILES.

As I understand, that is _distfiles.Type, not N.

Back to N and SUFX, if my guess is right, this is an example of distfiles-sites linkage via SUFX:

sqlite> select * from _distfiles where fullpkgpath=4448;
FullPkgPath|Value|N|SUFX|Type
4448|18545||.github|0
4448|18546||.github|0
sqlite> select * from _sites where fullpkgpath=4448;
FullPkgPath|N|Value
4448|.github|https://github.com/
4448|.gitlab|https://gitlab.com/
4448|.srht|https://git.sr.ht/

And this is and example of linkage via N:

sqlite> select * from _distfiles where fullpkgpath=13146;
FullPkgPath|Value|N|SUFX|Type
13146|131315|||0
13146|131316|0||0
13146|131317|1||0
13146|131318|2||0
sqlite> select * from _sites where fullpkgpath=13146;
FullPkgPath|N|Value
13146||https://github.com/stumpwm/stumpwm/archive/
13146|0|https://github.com/edicl/cl-ppcre/archive/
13146|1|https://beta.quicklisp.org/archive/clx/2022-07-07/
13146|2|https://beta.quicklisp.org/archive/alexandria/2022-07-07/

coalesce as in the query above should handle both cases, am I right?

As I mentioned MUCH EARLIER in the thread, unless you really want
the full list of all files which are grabbed by the port, in the
new scheme of things, THE MAIN DISTFILE for a given port will be
located at ROACH_URL, grabbable from ROACH_SITE.

I would have already implemented ROACH_* support much earlier if that was suitable, wouldn't I?
Yes, I need the full list of all files.

from repology-updater.

marcespie avatar marcespie commented on June 20, 2024

from repology-updater.

marcespie avatar marcespie commented on June 20, 2024

As forecasted, I just finished removing MASTER_SITES0..9

So you don't need to worry about N, it's gone in sqlports 7.49

from repology-updater.

didickman avatar didickman commented on June 20, 2024

Fantastic! Thank you so much for the quick adaptation!

Following the final bit of work Marc just did, the tweak mentioned in the comment is now required.

diff --git a/repology/parsers/parsers/openbsd.py b/repology/parsers/parsers/openbsd.py
index 076ad7ac..5962b880 100644
--- a/repology/parsers/parsers/openbsd.py
+++ b/repology/parsers/parsers/openbsd.py
@@ -95,7 +95,6 @@ def _iter_sqlports(path: str) -> Iterator[Port]:
         yield Port(**row_dict, distfiles_cursor=distfiles_cursor)


-# TODO: drop _distfiles.N for sqlports >= 7.49
 _DISTFILES_QUERY = """
 SELECT
     _sites.Value AS sites,
@@ -104,7 +103,7 @@ FROM _distfiles
     JOIN _fetchfiles
         ON KeyRef=_Distfiles.Value
     LEFT JOIN _sites
-        ON _sites.FullPkgPath=_Distfiles.FullPkgPath AND (coalesce(_distfiles.N, _distfiles.SUFX) is not distinct from _sites.N)
+        ON _sites.FullPkgPath=_Distfiles.FullPkgPath AND (_distfiles.SUFX is not distinct from _sites.N)
 WHERE
     _distfiles.FullPkgPath = ?
 """

from repology-updater.

marcespie avatar marcespie commented on June 20, 2024

Following-up (somewhat): in the more generic case of all package distributions, we all have different rules about splitting up things and how "unique" things get.

I think I might get some students (!!!!) looking at it this year.

For starters, the number of packages varies widely depending on distributions, but it doesn't mean the number of supported software is that different (depending on supported versions, available options, various splits in packages).

Speaking for OpenBSD, I know the whole pkgpath/fullpkgpath/flavors/subpackages enchilada can be mighty confusing when coming from outside.

I have zero idea what more info could be provided in views from sqlports to make this less confusing... nor do I know where other distros get their info.

from repology-updater.

AMDmi3 avatar AMDmi3 commented on June 20, 2024

Fantastic! Thank you so much for the quick adaptation!

I won't count that as quick, but you're welcome)

Following the final bit of work Marc just did, the tweak mentioned in the comment is now required.

Yesterday sqlports was still at 7.48 so couldn't drop it right away.

from repology-updater.

marcespie avatar marcespie commented on June 20, 2024

from repology-updater.

AMDmi3 avatar AMDmi3 commented on June 20, 2024

For starters, the number of packages varies widely depending on distributions, but it doesn't mean the number of supported software is that different (depending on supported versions, available options, various splits in packages).

Repology primarily counts projects, which is specifically "number of supported software", so apart from a few exceptions subpackages and splits does not count. For instance, despite libxml2 is packaged

CentOS 6:

  pname  | version 
---------+---------
 libxml2 | 2.7.6

ConanCenter:

  pname  | version 
---------+---------
 libxml2 | 2.10.3
 libxml2 | 2.10.4
 libxml2 | 2.11.3
 libxml2 | 2.11.4
 libxml2 | 2.9.10
 libxml2 | 2.9.12
 libxml2 | 2.9.13
 libxml2 | 2.9.14
 libxml2 | 2.9.9

Cygwin:

          pname          | version 
------------------------+---------
  libxml2                | 2.10.4
  libxml2-debuginfo      | 2.10.4
  libxml2-devel          | 2.10.4
  libxml2-doc            | 2.10.4
  python27-libxml2       | 2.10.4
  python36-libxml2       | 2.10.4
  python37-libxml2       | 2.10.4
  python38-libxml2       | 2.10.4
  python39-libxml2       | 2.10.4
  mingw64-i686-libxml2   | 2.9.4
  mingw64-x86_64-libxml2 | 2.9.4

it's counted once for each repo. Of course there are discrepancies, and you can estimate the error rate by percentage of "unique" projects (not matched with any other repository), which is usually below 15%, or by percentage of unique projects which have "related" ones (with matches by url, which mean they can potentialy be merged with something) which is usually below 10%.

Speaking for OpenBSD, I know the whole pkgpath/fullpkgpath/flavors/subpackages enchilada can be mighty confusing when coming from outside.

I don't have a problem with it. There's minor problem with a lot of flavors which Repology cannot deduplicate yet (thus showing a bunch of similar entries on e.g. package versions page), but that does not affect statistics and statuses and will be fixed on Repology side someday.

I have zero idea what more info could be provided in views from sqlports to make this less confusing... nor do I know where other distros get their info.

As said, I don't feel much need to. From what I can see, most unmatched projects in OpenBSD are truly unique or have ambiguous names and have unmatched counterparts in other repos, so there's no OpenBSD specific problems here. Still, it can be improved somewhat by submitting rules to repology or renaming packages in OpenBSD where that makes sense. For instance, I see some potentially misnamed python modules, like devel/py-test-expect may be better called devel/py-pytest-expect to match PyPI.

Depending on our snaps, you'll soon get 7.49. after all, it still takes a day to run a full bulk...

Yes, it's already there and I've deployed updated parser.

from repology-updater.

marcespie avatar marcespie commented on June 20, 2024

Renaming packages is a bitch. We got a mechanism for that, but it is fairly expensive on the updater-side.

from repology-updater.

AMDmi3 avatar AMDmi3 commented on June 20, 2024

Well, just suggesting, it's not fatal as it is.

from repology-updater.

didickman avatar didickman commented on June 20, 2024

Looking at some false positives on OpenBSD, I noticed repology says security/ghidra is out of date.

However that port is marked BROKEN on all archs right now. So it is not built.

Therefore does the below change make sense? It filters out ports that are broken on ALL archs.

This would stop showing what I feel is a false positive for security/ghidra being vulnarable on OpenBSD. (in fact the security issues were noted as part of the reason it was disabled. see: marc.info

This diff purposely does not filter out ports that are only broken on SOME archs.

diff --git a/repology/parsers/parsers/openbsd.py b/repology/parsers/parsers/openbsd.py
index e88fcc33..6357d297 100644
--- a/repology/parsers/parsers/openbsd.py
+++ b/repology/parsers/parsers/openbsd.py
@@ -58,6 +58,8 @@ FROM _Ports
         ON Categories_ordered.FullPkgpath=_Ports.FullPkgpath
     JOIN _Email
         ON _Email.KeyRef=MAINTAINER
+WHERE
+    _Ports.FullPkgPath NOT IN (SELECT DISTINCT FullPkgPath FROM _Broken where Arch IS NULL)
 """

from repology-updater.

didickman avatar didickman commented on June 20, 2024

Another possible false positive.

I can't figure out why repology says Python 3.10.13 in our tree has a vulnerability.

Blindly guessing I see that we have CVE-2023-40217 which states that some affected versions include "[3.10.0, 3.10.13)". To my knowledge a square bracket on the right would mean version <= 3.10.13, while a parenthesis means version < 3.10.13. i.e. strictly less than version 3.10.13.

Is it possible repology isn't marking the affected versions correctly? Or did I miss a vulnerability that's in Python 3.10.13 on my side?

from repology-updater.

AMDmi3 avatar AMDmi3 commented on June 20, 2024

This would stop showing what I feel is a false positive for security/ghidra being vulnarable on OpenBSD

This makes sense, but I don't feel good about hiding information. Repology's target audience is not only maintainers but also upstream, and visibility of broken packages is especially important for these. Also there are maintainers from other repositories who may find even disabled package useful. Showing it as outdated is also a valid call for action. The point that its vulnerability does not in fact affect users because the package is not available is valid though, and it would be fair to at least exclude it from count of vulnerable packages. This requires a bit of development though (#1352).

I can't figure out why repology says Python 3.10.13 in our tree has a vulnerability

You can click an exclamation mark to get a page which highlights CVEs matched with a given version: https://repology.org/project/python/cves?version=3.10.13

from repology-updater.

marcespie avatar marcespie commented on June 20, 2024

from repology-updater.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.