Comments (11)
With the fix of haskell-works/hw-string-parse#42, the Hackage parser is happy again. But I’m still interested in making it less fragile.
from repology-updater.
Yes, parsing is broken: https://repology.org/log/3820044
Someone needs to write a proper parser for cabal format.
from repology-updater.
Thanks for your reply.
> a proper parser for cabal format.
That sounds challenging :). It's a weird format.
I am seeing a Python implementation here, but I think it would be too much work to replicate the entire format with it. Do you think there's a way we can reuse the existing Cabal implementation on the Haskell side? Something akin to building a cabal-to-json executable with Haskell, and calling it from Python side?
from repology-updater.
> Something akin to building a cabal-to-json executable with Haskell, and calling it from Python side?
I strongly prefer to avoid that.
> That sounds challenging :). It's a weird format.

That shouldn't be too challenging as long as there's a spec/PEG somewhere. I wrote that parser on a whim and assumed that format is indentation based, while it turns out not to be and we're currently failing on that. So is there a spec?
from repology-updater.
> I wrote that parser on a whim and assumed that format is indentation based, while it turns out not to be and we're currently failing on that. So is there a spec?
I don’t think there is a spec, unfortunately. But the format is indentation-based.
However, I think the specific parsing failures are secondary. The parser seems to mostly work, with some (or at least one) packages hitting edge cases. Failing to parse a single cabal file shouldn’t be a fatal failure of the Hackage parser – it should be able to continue and have a successful run, with some packages left un-updated.
Perhaps a failure threshold (possibly shared across all parsers) would be worthwhile – if a certain percentage of packages are failing to parse, give up because likely some format changed and the entire parser needs a real update.
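A minimal sketch of the threshold idea above, for illustration only. The names `parse_all`, `parse_one`, `TooManyFailures` and the 5% default are all hypothetical; none of them come from repology-updater, and this is not how Repology currently behaves.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class TooManyFailures(Exception):
    """Raised when the share of unparseable files exceeds the threshold."""


def parse_all(
    paths: Iterable[str],
    parse_one: Callable[[str], Dict],
    max_failure_ratio: float = 0.05,  # hypothetical limit, not a Repology setting
) -> Tuple[List[Dict], List[Tuple[str, str]]]:
    """Parse every file and contain per-file errors.

    Returns (packages, failures). Raises TooManyFailures when more than
    max_failure_ratio of the files could not be parsed.
    """
    packages: List[Dict] = []
    failures: List[Tuple[str, str]] = []
    total = 0
    for path in paths:
        total += 1
        try:
            packages.append(parse_one(path))
        except Exception as exc:  # contain the failure to this one file
            failures.append((path, f"{type(exc).__name__}: {exc}"))

    # Give up only when failures are widespread, which hints at a format change.
    if total and len(failures) / total > max_failure_ratio:
        raise TooManyFailures(f"{len(failures)} of {total} files failed to parse")
    return packages, failures
```

With this shape, a single `KeyError: 'name'` would end up in `failures` next to its file path instead of aborting the run, while a large share of failures would still stop the update.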
from repology-updater.
Since the previously posted log has expired (and I’m not sure it failed in the same way), here’s the latest one: https://repology.org/log/4428851. The tail of that file:
2024-04-23 08:10:54 AspectAG/0.7.0.1/AspectAG.cabal: ERROR: link: "www.fing.edu.uy/~jpgarcia/AspectAG" does not look like an URL (schema missing)
2024-04-23 08:10:58 exception-hierarchy/0.1.0.11/exception-hierarchy.cabal: ERROR: link: "yet" does not look like an URL (schema missing)
2024-04-23 08:10:58 hw-string-parse/0.0.0.5/hw-string-parse.cabal: ERROR: parsing failed (fatal): KeyError: 'name'
2024-04-23 08:10:58 ERROR: KeyError: 'name'
There seem to be a number of non-fatal errors, where the Hackage parser continues happily to the next Cabal file, but then the failure of hw-string-parse escapes the normal failure handling and causes the entire Hackage parse to fail. Unfortunately, my Python is quite weak, so it’s not immediately obvious to me where or how to catch the failure that’s escaping.
from repology-updater.
It fails on the last mentioned cabal file:
cabal-version: 2.2
name: hw-string-parse
version: 0.0.0.5
x-revision: 2
synopsis: String parser
description: Please see README.md
category: Data, Bit
stability: Experimental
homepage: http://github.com/haskell-works/hw-string-parse#readme
bug-reports: https://github.com/haskell-works/hw-string-parse/issues
author: John Ky
maintainer: [email protected]
copyright: 2016-2021 John Ky
license: BSD-3-Clause
license-file: LICENSE
tested-with: GHC == 9.2.2, GHC == 9.0.2, GHC == 8.10.7, GHC == 8.8.4, GHC == 8.6.5
build-type: Simple
extra-source-files: README.md
The parser assumes an indentation-based format, while the indentation in this file is broken.
from repology-updater.
Yeah, that looks like a bug in the Cabal file. The Cabal docs don’t allow whitespace before `cabal-version`. I’m not sure what Cabal itself does with that.
- Discards the field as part of a missing stanza and parses the file as Cabal 1.1 (or whatever version was before `cabal-version` was required)?
- Parses a bit more liberally than the ABNF allows, and reads the intended `cabal-version`?
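To illustrate how stray leading whitespace can surface as `KeyError: 'name'`, here is a minimal sketch of a naive indentation-based field reader. It is not Repology's actual parser: `read_top_level_fields` is a hypothetical name, and `SAMPLE` is a made-up reduction of the file above, assuming every top-level field is indented (the real indentation is not visible in this thread). Whether Cabal itself accepts such input is a separate question that this sketch does not answer.

```python
import textwrap
from typing import Dict

# Hypothetical reduction: every top-level field carries stray leading spaces.
SAMPLE = "  cabal-version: 2.2\n  name: hw-string-parse\n  version: 0.0.0.5\n"


def read_top_level_fields(text: str) -> Dict[str, str]:
    """Collect `key: value` pairs from lines that start in column 0."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip() or line[0].isspace():
            continue  # blank, or treated as a continuation/nested line
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


# No line starts in column 0, so nothing is recognised; a later
# strict["name"] lookup would raise KeyError: 'name'.
strict = read_top_level_fields(SAMPLE)

# Removing the common leading whitespace first recovers the fields.
lenient = read_top_level_fields(textwrap.dedent(SAMPLE))
```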
In either case, that doesn’t look like an issue with Repology’s Hackage parser, so then the only issue here is that the failure of that Cabal file isn’t contained, but leads to the failure of the entire Hackage parser.
from repology-updater.
> issue here is that the failure of that Cabal file isn’t contained, but leads to the failure of the entire Hackage parser.
This behavior is intentional.
from repology-updater.
> This behavior is intentional.
Why is it intentional? It seems like allowing a single broken package to prevent the other 17k packages from being updated is a bit unbalanced.
I understand that Repology would want to be made aware of failures and attempt to correct them, but the logs already contain that data, whether or not the parser actually fails. I see that packages can be “ignored” in some way – would that be the right way to avoid this? (Just asking out of curiosity – I opened an issue against hw-string-parse (linked above), so I’m hoping this particular case can be resolved quickly enough.)
As an aside – I have subscribed to the atom feeds for the stuff I maintain, I wonder if there’s an atom feed for a particular repo’s failures (and warnings). I would happily subscribe to the one for Hackage so I can be proactive about PRs to fix issues.
I know maintaining something like this (and dealing with the tickets) is a lot of work. I’m curious about this one in particular because I would be inclined to submit a PR myself, but clearly the solution I had in mind is not one that would be accepted.
from repology-updater.
> I understand that Repology would want to be made aware of failures and attempt to correct them, but the logs already contain that data, whether or not the parser actually fails.

Repology wants to provide consistent data, and skipping some packages ruins consistency. Also, you cannot expect anyone to look for and examine any logs in the case of a missing package, which itself would never be noticed.

> I see that packages can be “ignored” in some way – would that be the right way to avoid this?

That has nothing to do with parsing; it affects version comparison.

> As an aside – I have subscribed to the atom feeds for the stuff I maintain, I wonder if there’s an atom feed for a particular repo’s failures (and warnings). I would happily subscribe to the one for Hackage so I can be proactive about PRs to fix issues.

There are parsing logs and problems (there is a per-maintainer problems view as well), which are somewhat similar yet not integrated, but neither provides feeds.
> With the fix of haskell-works/hw-string-parse#42, the Hackage parser is happy again
Great work, thank you!
from repology-updater.