Comments (11)

sellout commented on September 21, 2024

With the fix of haskell-works/hw-string-parse#42, the Hackage parser is happy again. But I’m still interested in making it less fragile.

from repology-updater.

AMDmi3 commented on September 21, 2024

Yes, parsing is broken: https://repology.org/log/3820044
Someone needs to write a proper parser for cabal format.

utdemir commented on September 21, 2024

Thanks for your reply.

a proper parser for cabal format.

That sounds challenging :). It's a weird format.

I see a Python implementation here, but I think it would be too much work to replicate the entire format with it. Do you think there's a way we can reuse the existing Cabal implementation on the Haskell side? Something akin to building a cabal-to-json executable with Haskell, and calling it from Python side?
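
To make the idea concrete, here is a rough sketch of what the Python side might look like. It assumes a converter named `cabal-to-json` that takes a .cabal path and prints one JSON object on stdout; no such executable exists as far as this thread is concerned, and the function name `cabal_to_dict` is made up for illustration.

```python
import json
import subprocess
from typing import Any, Dict, Sequence


def cabal_to_dict(
    cabal_path: str,
    command: Sequence[str] = ("cabal-to-json",),  # hypothetical executable
) -> Dict[str, Any]:
    """Run an external converter on one .cabal file and decode its JSON output."""
    completed = subprocess.run(
        [*command, cabal_path],
        check=True,           # a non-zero exit raises CalledProcessError
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output as str rather than bytes
    )
    return json.loads(completed.stdout)
```

The trade-off is an extra build-time and run-time dependency on a Haskell toolchain in exchange for not re-implementing the format in Python.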

AMDmi3 commented on September 21, 2024

Something akin to building a cabal-to-json executable with Haskell, and calling it from Python side?

I strongly prefer to avoid that.

That sounds challenging :). It's a weird format.

That shouldn't be too challenging as long as there's a spec/PEG somewhere. I wrote that parser on a whim and assumed that format is indentation based, while it turns out not to be and we're currently failing on that. So is there a spec?

sellout commented on September 21, 2024

I wrote that parser on a whim and assumed that format is indentation based, while it turns out not to be and we're currently failing on that. So is there a spec?

I don’t think there is a spec, unfortunately. But the format is indentation-based.

However, I think the specific parsing failures are secondary. The parser seems to mostly work, with some (or at least one) packages hitting edge cases. Failing to parse a single cabal file shouldn’t be a fatal failure of the Hackage parser – it should be able to continue and have a successful run, with some packages left un-updated.

Perhaps a failure threshold (possibly shared across all parsers) would be worthwhile – if a certain percentage of packages are failing to parse, give up because likely some format changed and the entire parser needs a real update.
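
A minimal sketch of that threshold idea, not Repology's actual code: the names `parse_all`, `parse_one`, `TooManyFailures` and the 1% default are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class TooManyFailures(RuntimeError):
    """Raised when the share of unparsable files exceeds the threshold."""


def parse_all(
    paths: Iterable[str],
    parse_one: Callable[[str], Dict],
    max_failure_ratio: float = 0.01,  # assumed default, not a Repology setting
) -> Tuple[List[Dict], List[Tuple[str, str]]]:
    """Parse every file, tolerating isolated failures up to a ratio."""
    parsed: List[Dict] = []
    failures: List[Tuple[str, str]] = []
    for path in paths:
        try:
            parsed.append(parse_one(path))
        except Exception as exc:
            # Record the failure and move on to the next file.
            failures.append((path, f"{type(exc).__name__}: {exc}"))
    total = len(parsed) + len(failures)
    # Give up only when failures are widespread, which suggests a format change.
    if total and len(failures) / total > max_failure_ratio:
        raise TooManyFailures(f"{len(failures)} of {total} files failed to parse")
    return parsed, failures
```

With this shape, a single bad file among thousands ends up in `failures` for logging, while a run where most files fail still aborts.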

sellout commented on September 21, 2024

Since the previously posted log has expired (and I’m not sure it failed in the same way), here’s the latest one: https://repology.org/log/4428851. The tail of that file:

2024-04-23 08:10:54   AspectAG/0.7.0.1/AspectAG.cabal: ERROR: link: "www.fing.edu.uy/~jpgarcia/AspectAG" does not look like an URL (schema missing)
2024-04-23 08:10:58   exception-hierarchy/0.1.0.11/exception-hierarchy.cabal: ERROR: link: "yet" does not look like an URL (schema missing)
2024-04-23 08:10:58   hw-string-parse/0.0.0.5/hw-string-parse.cabal: ERROR: parsing failed (fatal): KeyError: 'name'
2024-04-23 08:10:58 ERROR: KeyError: 'name'

There seem to be a number of non-fatal errors, where the Hackage parser continues happily to the next Cabal file, but then the failure of hw-string-parse escapes the normal failure handling and causes the entire Hackage parse to fail. Unfortunately, my Python is quite weak, so it’s not immediately obvious to me where or how to catch the failure that’s escaping.

AMDmi3 commented on September 21, 2024

It fails on the last mentioned cabal file:

 cabal-version: 2.2

name:                   hw-string-parse
version:                0.0.0.5
x-revision: 2
synopsis:               String parser
description:            Please see README.md
category:               Data, Bit
stability:              Experimental
homepage:               http://github.com/haskell-works/hw-string-parse#readme
bug-reports:            https://github.com/haskell-works/hw-string-parse/issues
author:                 John Ky
maintainer:             [email protected]
copyright:              2016-2021 John Ky
license:                BSD-3-Clause
license-file:           LICENSE
tested-with:            GHC == 9.2.2, GHC == 9.0.2, GHC == 8.10.7, GHC == 8.8.4, GHC == 8.6.5
build-type:             Simple
extra-source-files:     README.md

The parser assumes an indentation-based format, while the indentation in this file is broken (note the leading space before cabal-version).

sellout commented on September 21, 2024

Yeah, that looks like a bug in the Cabal file. The Cabal docs don’t allow whitespace before cabal-version. I’m not sure what Cabal itself does with that.

  1. Discards the field as part of a missing stanza and parses the file as Cabal 1.1 (or whatever version was before cabal-version was required)?
  2. Parses a bit more liberally than the ABNF allows, and reads the intended cabal-version?

In either case, that doesn’t look like an issue with Repology’s Hackage parser, so then the only issue here is that the failure of that Cabal file isn’t contained, but leads to the failure of the entire Hackage parser.

AMDmi3 commented on September 21, 2024

issue here is that the failure of that Cabal file isn’t contained, but leads to the failure of the entire Hackage parser.

This behavior is intentional.

sellout commented on September 21, 2024

This behavior is intentional.

Why is it intentional? It seems like allowing a single broken package to prevent the other 17k packages from being updated is a bit unbalanced.

I understand that Repology would want to be made aware of failures and attempt to correct them, but the logs already contain that data, whether or not the parser actually fails. I see that packages can be “ignored” in some way – would that be the right way to avoid this? (Just asking out of curiosity – I opened an issue against hw-string-parse (linked above), so I’m hoping this particular case can be resolved quickly enough.)

As an aside – I have subscribed to the atom feeds for the stuff I maintain, I wonder if there’s an atom feed for a particular repo’s failures (and warnings). I would happily subscribe to the one for Hackage so I can be proactive about PRs to fix issues.

I know maintaining something like this (and dealing with the tickets) is a lot of work. I’m curious about this one in particular because I would be inclined to submit a PR myself, but clearly the solution I had in mind is not one that would be accepted.

AMDmi3 commented on September 21, 2024

I understand that Repology would want to be made aware of failures and attempt to correct them, but the logs already contain that data, whether or not the parser actually fails.

Repology wants to provide consistent data, and skipping some packages ruins consistency. Also, you cannot expect anyone to search the logs for a missing package when the absence itself would never be noticed.

I see that packages can be “ignored” in some way – would that be the right way to avoid this?

That has nothing to do with parsing; it affects version comparison.

As an aside – I have subscribed to the atom feeds for the stuff I maintain, I wonder if there’s an atom feed for a particular repo’s failures (and warnings). I would happily subscribe to the one for Hackage so I can be proactive about PRs to fix issues.

There are parsing logs and problem lists (there is a per-maintainer problems view as well), which are somewhat similar but not integrated; neither provides feeds.

With the fix of haskell-works/hw-string-parse#42, the Hackage parser is happy again

Great work, thank you!
