Comments (18)

warrickball avatar warrickball commented on August 23, 2024 2

I've emailed the maintainers of the Fedora package. 📦

from pydeps.

thebjorn avatar thebjorn commented on August 23, 2024

With Py3.11.3 on windows it "works-for-me" :-)

(dev311) go|c:\srv> python -V
Python 3.11.3

(dev311) go|c:\srv> cd tmp

(dev311) go|c:\srv\tmp> mkdir pydeps191

(dev311) go|c:\srv\tmp> cd pydeps191

(dev311) go|c:\srv\tmp\pydeps191> git clone https://github.com/warrickball/tomso.git
Cloning into 'tomso'...
remote: Enumerating objects: 1839, done.
remote: Counting objects: 100% (271/271), done.
remote: Compressing objects: 100% (139/139), done.
remote: Total 1839 (delta 176), reused 197 (delta 123), pack-reused 1568
Receiving objects: 100% (1839/1839), 29.79 MiB | 11.23 MiB/s, done.
Resolving deltas: 100% (1201/1201), done.

(dev311) go|c:\srv\tmp\pydeps191> cd tomso

(dev311) go|c:\srv\tmp\pydeps191\tomso> pydeps tomso
c:\srv\lib\code\pydeps\pydeps\configs.py:108: UserWarning: Couldn't find a [pydeps] section in your config files 'c:\\srv\\tmp\\pydeps191\\tomso\\setup.cfg' -- or it was empty
  warnings.warn(' '.join("""

(dev311) go|c:\srv\tmp\pydeps191\tomso>

[Image: pydeps dependency graph for tomso]

The last lines of the traceback show:

  File "/home/wball/.local/lib/python3.11/site-packages/pydeps/mf27.py", line 75, in load_module
    co = marshal.load(fp)  # load marshalled code object.
         ^^^^^^^^^^^^^^^^
ValueError: bad marshal data (unknown type code)

which is calling Python's standard marshal.load on a .pyc file... could it be that you have a .pyc file generated by a different Python version lying around?

You can likely find the problem-file with the (undocumented) --debug-mf=2 option - be aware that it produces a significant amount of output...

(fwiw, the double-headed arrows indicate circular imports...)

warrickball avatar warrickball commented on August 23, 2024

Thanks for the quick reply! Here are the last few lines of the debug output before the traceback I posted before:

...
load_module -> Module(name=tty, file='/usr/lib64/python3.11/tty.py', path=None)
    load_module(PKG_DIRECTORY) fqname=pydoc_data, fp=None, pathname=/usr/lib64/python3.11/pydoc_data
        load_package 'pydoc_data' '/usr/lib64/python3.11/pydoc_data'
            load_module(PY_SOURCE) fqname=pydoc_data, fp=fp, pathname=/usr/lib64/python3.11/pydoc_data/__init__.py
        load_module -> Module(name=pydoc_data, file='/usr/lib64/python3.11/pydoc_data/__init__.py', path=['/usr/lib64/python3.11/pydoc_data'])
    load_package -> Module(name=pydoc_data, file='/usr/lib64/python3.11/pydoc_data/__init__.py', path=['/usr/lib64/python3.11/pydoc_data'])
load_module -> Module(name=pydoc_data, file='/usr/lib64/python3.11/pydoc_data/__init__.py', path=['/usr/lib64/python3.11/pydoc_data'])
    load_module(PY_COMPILED) fqname=pydoc_data.topics, fp=fp, pathname=/usr/lib64/python3.11/pydoc_data/topics.pyc
...

Indeed, pydoc_data.topics fails if I try marshal.load, so I'll see if I can follow up on that:

>>> import marshal
>>> f = open('/usr/lib64/python3.11/pydoc_data/topics.pyc', 'rb')
>>> marshal.load(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: bad marshal data (unknown type code)

It's installed as a system library which I might try reinstalling, although it looks like a core Python library (python3-libs) so I'll wait until I don't have other stuff running (e.g. just before I shut down).

(fwiw, the double-headed arrows indicate circular imports...)

Yes, these are within some functions that convert from the objects in one class to the objects in another.

warrickball avatar warrickball commented on August 23, 2024

I'm trying to reverse engineer the .pyc format but have so far noticed that if I skip 4 or 8 bytes, marshal.load fails, whereas it succeeds (though perhaps not meaningfully) if I skip 12 bytes:

>>> import marshal
>>> f = open('/usr/lib64/python3.11/pydoc_data/topics.pyc', 'rb')
>>> f.read(12)
>>> marshal.load(f)
>>> f.close()

compared to

>>> import marshal
>>> f = open('/usr/lib64/python3.11/pydoc_data/topics.pyc', 'rb')
>>> f.read(8)
>>> marshal.load(f)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: bad marshal data (unknown type code)

thebjorn avatar thebjorn commented on August 23, 2024

It should be the Python "magic number":

Python 3.11.3 (tags/v3.11.3:f3909b8, Apr  4 2023, 23:49:59) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import importlib
>>> importlib.util.MAGIC_NUMBER.hex()
'a70d0d0a'

This path /usr/lib64/python3.11/pydoc_data/topics.pyc is curious though, these days Python puts the .pyc files in a __pycache__ directory so that different python versions can live side-by-side.

(dev311) go|C:\Program Files\Python311\Lib\pydoc_data\__pycache__> ll
total 1392
-rw-rw-rw-  1 bjorn 0    163 2023-05-18 13:46 __init__.cpython-311.opt-1.pyc
-rw-rw-rw-  1 bjorn 0    163 2023-05-18 13:46 __init__.cpython-311.opt-2.pyc
-rw-rw-rw-  1 bjorn 0    163 2023-05-18 13:46 __init__.cpython-311.pyc
-rw-rw-rw-  1 bjorn 0 469298 2023-05-18 13:46 topics.cpython-311.opt-1.pyc
-rw-rw-rw-  1 bjorn 0 469298 2023-05-18 13:46 topics.cpython-311.opt-2.pyc
-rw-rw-rw-  1 bjorn 0 469298 2023-05-18 13:46 topics.cpython-311.pyc

(dev311) go|C:\Program Files\Python311\Lib\pydoc_data\__pycache__>
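
For reference, the location where the running interpreter would cache bytecode for a given source file can be queried with importlib.util.cache_from_source. This is just a small sketch; the source path is the (missing) Fedora one from this thread and does not need to exist for the call to work. Note that a bare topics.pyc sitting where the source would be is the older "sourceless" layout, which Python still imports when no .py file is present.

```python
import importlib.util

# Path of the source file as it would appear on the Fedora system discussed
# above (the file does not have to exist for this call).
source = "/usr/lib64/python3.11/pydoc_data/topics.py"

# Ask the running interpreter where it would cache bytecode for that source.
cached = importlib.util.cache_from_source(source)
print(cached)
```

The file name includes the cache tag of the interpreter that runs the snippet (for example cpython-311), which is what lets several Python versions share one directory.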

warrickball avatar warrickball commented on August 23, 2024

This file is in the OS packages and is definitely a bit odd. There is also a __pycache__ subfolder:

$ ls /usr/lib64/python3.11/pydoc_data/__pycache__/
__init__.cpython-311.opt-1.pyc  __init__.cpython-311.opt-2.pyc  __init__.cpython-311.pyc

but it only contains the .pyc files for __init__.py. topics.pyc is only distributed as a .pyc: there is no source topics.py file:

$ find /usr/lib64/python3.11/ -name topics.py
$

It's provided in the python3-libs OS packages, which I tried re-installing but to no avail.

I'll raise this as an issue in the Fedora package. At a glance it looks like the only file provided as a .pyc instead of source file at this level.

thebjorn avatar thebjorn commented on August 23, 2024

Yes, that sounds like a packaging issue, the topics.py file should definitely be present:

(dev311) go|C:\Program Files\Python311\Lib\pydoc_data> ll
total 764
-rw-rw-rw-  1 bjorn 0      0 2023-04-05 00:04 __init__.py
drw-rw-rw-  2 bjorn 0   4096 2023-05-18 13:46 __pycache__
-rw-rw-rw-  1 bjorn 0   1437 2023-04-05 00:04 _pydoc.css
-rw-rw-rw-  1 bjorn 0 770927 2023-04-05 00:04 topics.py

(dev311) go|C:\Program Files\Python311\Lib\pydoc_data>

topics.py starts with:

# -*- coding: utf-8 -*-
# Autogenerated by Sphinx on Tue Apr  4 23:22:02 2023
...

so maybe a sphinx step was omitted?

hroncok avatar hroncok commented on August 23, 2024

Hello. I am one of Fedora's Python maintainers.

topics and encodings are shipped as .pyc-only on purpose. The files are generated and we decided to only ship bytecode to save disk space. See https://src.fedoraproject.org/rpms/python3.11/c/740668aab7abe02f47d7a69e800c61b8b5e52f51

We have never encountered problems with that. It is a supported way to ship Python modules. What is the code here trying to do?

thebjorn avatar thebjorn commented on August 23, 2024

@hroncok The problem seems to be that the .pyc file is not in the correct format (and also not in the expected __pycache__ folder). The code here is trying to read the .pyc file and look for import-opcodes, but it is failing because the magic number in the .pyc file is incorrect for the installed python version.
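
To illustrate what "looking for import-opcodes" means, here is a minimal sketch using the standard dis module. It is not pydeps's actual implementation; the find_imports helper and the sample source string are made up for the example.

```python
import dis
import types


def find_imports(code):
    """Yield module names referenced by IMPORT_NAME instructions.

    Hypothetical helper for illustration only. Recurses into nested code
    objects (functions, classes) found among the constants.
    """
    for instr in dis.get_instructions(code):
        if instr.opname == "IMPORT_NAME":
            yield instr.argval
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from find_imports(const)


# Sample module source, compiled in memory instead of read from a .pyc file.
source = "import os\nfrom json import loads\n\ndef f():\n    import marshal\n"
code = compile(source, "<example>", "exec")

names = sorted(set(find_imports(code)))
print(names)  # ['json', 'marshal', 'os']
```

The same function would work on a code object obtained from a .pyc file, provided the header is skipped correctly before unmarshalling.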

hroncok avatar hroncok commented on August 23, 2024

Yes, it's not in the pycache folder, but why would you think the magic number is incorrect? How do I quickly check the number from Python or shell to see if that's the case?
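
One quick way to check from Python, as a sketch: compare the first four bytes of the file with importlib.util.MAGIC_NUMBER of the interpreter doing the check. The magic_matches helper is a made-up name, and the demo compiles a throwaway file so that the snippet runs anywhere; on the affected system the path to pass would be /usr/lib64/python3.11/pydoc_data/topics.pyc.

```python
import importlib.util
import pathlib
import py_compile
import tempfile


def magic_matches(pyc_path):
    """Return True if pyc_path starts with this interpreter's magic number.

    Hypothetical helper for illustration only.
    """
    with open(pyc_path, "rb") as fp:
        return fp.read(4) == importlib.util.MAGIC_NUMBER


# Demo on a freshly compiled throwaway module.
with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "demo.py")
    src.write_text("x = 1\n")
    pyc = py_compile.compile(str(src), cfile=str(src.with_suffix(".pyc")))
    ok = magic_matches(pyc)

print(ok)
```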

hroncok avatar hroncok commented on August 23, 2024

I see the comments above. Will debug the headers.

However note that I just got back from EuroPython and I am taking some time off computers.

thebjorn avatar thebjorn commented on August 23, 2024

@hroncok No worries, I'm on summer vacation myself :-)

hroncok avatar hroncok commented on August 23, 2024

>>> import pathlib, struct, marshal
>>> pyc1 = pathlib.Path('/usr/lib64/python3.11/encodings/cp1250.pyc')
>>> pyc2 = pathlib.Path('/usr/lib64/python3.11/encodings/__pycache__/cp1125.cpython-311.pyc')
>>> bytes1 = pyc1.read_bytes()
>>> bytes2 = pyc2.read_bytes()
>>> bytes1[:4]
b'\xa7\r\r\n'
>>> bytes2[:4]
b'\xa7\r\r\n'
>>> struct.unpack("<H2B", bytes1[:4])
(3495, 13, 10)

3495 is Python 3.11b4+, see importlib/_bootstrap_external.py

PEP 552 says:

The pyc header currently consists of 3 32-bit words. We will expand it to 4.

That's 16 bytes for Python 3.7+:

>>> marshal.loads(bytes1[16:])
<code object <module> at 0x7fb1a73f9a70, file "/usr/lib64/python3.11/encodings/cp1250.py", line 1>
>>> marshal.loads(bytes2[16:])
<code object <module> at 0x558a998382f0, file "/usr/lib64/python3.11/encodings/cp1125.py", line 1>

This is consistent with files in __pycache__ and with the specification. It also explains why when skipping 4 or 8 bytes, marshal.load fails, whereas it succeeds if we skip 16 bytes (as we should).
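
The full 16-byte header can be decoded as follows. This is a sketch based on the layout described in PEP 552 (magic number, a 32-bit flags word, then either mtime and source size or an 8-byte source hash); read_pyc is a made-up helper name, and the demo compiles a temporary file with timestamp invalidation so that the result is predictable.

```python
import marshal
import pathlib
import py_compile
import struct
import tempfile


def read_pyc(path):
    """Split a Python 3.7+ .pyc file into header fields and code object.

    Hypothetical helper for illustration only.
    """
    data = pathlib.Path(path).read_bytes()
    magic = data[:4]
    (flags,) = struct.unpack("<I", data[4:8])
    if flags & 0x1:
        # Hash-based pyc: the last 8 header bytes hold a hash of the source.
        fields = {"source_hash": data[8:16]}
    else:
        # Timestamp-based pyc: source mtime followed by source size.
        mtime, size = struct.unpack("<II", data[8:16])
        fields = {"mtime": mtime, "source_size": size}
    code = marshal.loads(data[16:])
    return magic, flags, fields, code


with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "demo.py")
    src.write_bytes(b"x = 1\n")  # 6 bytes of source
    pyc = py_compile.compile(
        str(src),
        cfile=str(src.with_suffix(".pyc")),
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    magic, flags, fields, code = read_pyc(pyc)

print(magic.hex(), flags, fields, code.co_names)
```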

What seems to be the problem here?

hroncok avatar hroncok commented on August 23, 2024

I believe this comment is outdated:

pydeps/pydeps/mf27.py

Lines 66 to 69 in 3c1c40b

# a .pyc file is a binary file containing only three things:
# 1. a four-byte magic number
# 2. a four byte modification timestamp, and
# 3. a Marshalled code object

The number of bytes at point 2 depends on the Python version.

  • 16-4==12 on Python 3.7+
  • 12-4==8 on Python 3.3-3.6
  • 8-4==4 on older Pythons

The number is hardcoded here:

fp.read(4) # skip modification timestamp

If you care only for Pythons that are not yet end of life, changing this number to 12 should do the trick.
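
The version-dependent sizes listed above could be captured in a small function. This is only a sketch; pyc_header_size is a made-up name and this is not the code that ended up in pydeps.

```python
import sys


def pyc_header_size(version):
    """Return the total .pyc header size in bytes for a (major, minor) version.

    Hypothetical helper for illustration only.
    """
    if version >= (3, 7):
        return 16  # magic + flags + (mtime and size, or source hash)
    if version >= (3, 3):
        return 12  # magic + mtime + source size
    return 8       # magic + mtime


# Bytes to skip after the 4-byte magic number has already been read.
skip = pyc_header_size(sys.version_info[:2]) - 4
print(skip)
```

With the magic number already consumed, fp.read(skip) would then position the file at the marshalled code object.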

thebjorn avatar thebjorn commented on August 23, 2024

Looks like your analysis is correct (I'm still on vacation, so I haven't investigated why there isn't a problem on windows...).

The native modulefinder calls importlib._bootstrap_external._classify_pyc(data, fqname, {}), but that seems to be a very private api.
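
As a possible alternative to the private function, the public importlib.machinery.SourcelessFileLoader validates the header and returns the code object of a .pyc file. The following is a sketch on a throwaway module, not what pydeps does; the module name demo and its contents are invented for the example.

```python
import importlib.machinery
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create and compile a throwaway module so the example is self-contained.
    src = pathlib.Path(tmp, "demo.py")
    src.write_text("import json\n")
    pyc = py_compile.compile(str(src), cfile=str(src.with_suffix(".pyc")))

    # The loader checks the pyc header itself before unmarshalling the code.
    loader = importlib.machinery.SourcelessFileLoader("demo", pyc)
    code = loader.get_code("demo")

print(code.co_names)
```

Because the loader belongs to the running interpreter, it only accepts files whose magic number matches that interpreter.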

23e0a17 is a (WIP) version that accounts for the different sizes. How to test it isn't immediately obvious to my vacation brain, but I'm sure I'll figure it out soon ;-)

vkottler avatar vkottler commented on August 23, 2024

I'm running into this same problem using CentOS Stream 9 with python3.11 (which is some build of Python 3.11.4).

I don't have this issue for personal projects that run in GitHub CI though (on ubuntu-latest + macos-latest + windows-latest, on Python 3.8 through 3.11).

Seems to corroborate that this is some RHEL thing.

(edit: I'm not caught up with the details on the thread, seems like there's a "why", just noting a +1 to problem bisection)

hroncok avatar hroncok commented on August 23, 2024

Yes, CentOS has the same pyc files.

thebjorn avatar thebjorn commented on August 23, 2024

v1.12.17 should have fixed .pyc header parsing code.
