rubymarshal's People

Contributors

d9pouces, timpaquatte, userunknownfactor

rubymarshal's Issues

Regression and/or missing documentation: cannot parse version from rubygems index

Here's a simple program for parsing the rubygems index that I used successfully with rubymarshal 1.0.3:

#!/usr/bin/env python3
  
import gzip
import requests
import rubymarshal.reader

data = requests.get('https://api.rubygems.org/latest_specs.4.8.gz').content
data = gzip.decompress(data)

for (name, ver, gemplat), _ in zip(rubymarshal.reader.loads(data), range(10)):
    print(name, ver, gemplat)

Its output with 1.0.3:

_ UsrMarshal:Gem::Version(['1.4']) ruby
- UsrMarshal:Gem::Version(['1']) b'ruby'
0mq UsrMarshal:Gem::Version(['0.5.3']) b'ruby'
0xdm5 UsrMarshal:Gem::Version(['0.1.0']) b'ruby'
0xffffff UsrMarshal:Gem::Version(['0.1.0']) b'ruby'
10to1-crack UsrMarshal:Gem::Version(['0.1.3']) b'ruby'
1234567890_ UsrMarshal:Gem::Version(['1.2']) b'ruby'
12_hour_time UsrMarshal:Gem::Version(['0.0.4']) b'ruby'
16watts-fluently UsrMarshal:Gem::Version(['0.3.1']) b'ruby'
189seg UsrMarshal:Gem::Version(['0.0.1']) b'ruby'

With 1.2.6, it looks like this:

_ UsrMarshal({}) ruby
- UsrMarshal({}) ruby
0mq UsrMarshal({}) ruby
0xdm5 UsrMarshal({}) ruby
0xffffff UsrMarshal({}) ruby
10to1-crack UsrMarshal({}) ruby
1234567890_ UsrMarshal({}) ruby
12_hour_time UsrMarshal({}) ruby
16watts-fluently UsrMarshal({}) ruby
189seg UsrMarshal({}) ruby

The nice thing is that the Unicode problem is gone; the bad thing is that the custom object is no longer parsed.

At the very least, this requires a major version bump.

Next, the documentation is unclear, or wrong, about how this can be parsed now. Changing the program the way the example suggests:

#!/usr/bin/env python3
  
import gzip
import requests
import rubymarshal.reader
from rubymarshal.classes import RubyObject, registry


data = requests.get('https://api.rubygems.org/latest_specs.4.8.gz').content
data = gzip.decompress(data)

class GemVersion(RubyObject):
    ruby_class_name = "Gem::Version"

registry.register(GemVersion)

for (name, ver, gemplat), _ in zip(rubymarshal.reader.loads(data), range(10)):
    print(name, ver, gemplat)

doesn't change a thing.

In fact, this cannot work (at least with this data file): ClassRegistry keys class names as strs, but Reader.read reads the class name as Symbol("Gem::Version"), which hashes differently, so self.registry.get(class_name, UsrMarshal) always returns UsrMarshal.

I've solved this by using ver.marshal_dump() instead, but I don't think that's the correct solution.
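To illustrate the str vs Symbol mismatch described above, here is a minimal sketch built on stand-in classes. Symbol, UsrMarshal, GemVersion, registry and lookup are hypothetical stand-ins for this example, not rubymarshal's actual implementations. It shows that a dict keyed by str never matches a Symbol key unless the name is normalised first.

```python
# Stand-ins only: rubymarshal's real Symbol, UsrMarshal and registry may differ.
class Symbol:
    """Hypothetical reader-side wrapper for a Ruby symbol name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        # Equal only to another Symbol, never to a plain str.
        return isinstance(other, Symbol) and other.name == self.name

    def __hash__(self) -> int:
        # Hashed as a tagged tuple; together with __eq__ above,
        # a Symbol never matches a str dict key.
        return hash(("Symbol", self.name))


class UsrMarshal:
    """Stand-in for the default class returned when no match is found."""


class GemVersion(UsrMarshal):
    """Stand-in for a user-registered class."""


# Registry keyed by plain str, as the issue describes for ClassRegistry.
registry = {"Gem::Version": GemVersion}


def lookup(class_name):
    """Hypothetical fix: normalise a Symbol to its str name before the lookup."""
    key = class_name.name if isinstance(class_name, Symbol) else class_name
    return registry.get(key, UsrMarshal)


class_name = Symbol("Gem::Version")
# Direct lookup with the Symbol never matches the str key, so the default wins.
fallback = registry.get(class_name, UsrMarshal)
# Normalising first finds the registered class.
found = lookup(class_name)
```

Normalising inside the lookup is only one option. Making the reader's symbol type compare and hash equal to str would be another, and which one fits is a design decision for the library.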

Tags missing

This repository is missing all version tags. They are crucial for determining which code belongs to which version and what has changed since the last version.

Unrecognized token

I tried rubymarshal on a specific data file, and it seems it couldn't read one of the tokens in the data. (Ruby loads the file without issue, and the file was written by Ruby 2.1+.)

The error log

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "E:\nuriy\appdata\Local\Programs\Python3.6\lib\site-packages\rubymarshal\reader.py", line 192, in load
    return loader.read()
  File "E:\nuriy\appdata\Local\Programs\Python3.6\lib\site-packages\rubymarshal\reader.py", line 78, in read
    result = [self.read() for x in range(num_elements)]
  File "E:\nuriy\appdata\Local\Programs\Python3.6\lib\site-packages\rubymarshal\reader.py", line 78, in <listcomp>
    result = [self.read() for x in range(num_elements)]
  File "E:\nuriy\appdata\Local\Programs\Python3.6\lib\site-packages\rubymarshal\reader.py", line 129, in read
    data = self.read()
  File "E:\nuriy\appdata\Local\Programs\Python3.6\lib\site-packages\rubymarshal\reader.py", line 137, in read
    raise ValueError('token %s is not recognized' % token)
ValueError: token b'!' is not recognized

PokemonData - copie.zip
The file is inside the zip because GitHub doesn't accept uncommon file extensions.
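b'!' is not one of the type bytes that Ruby's marshal.c defines for format 4.8. Ruby itself reads the file, so the error more likely means the Python reader lost its place on an earlier value than that the file uses an unknown type, although that is a guess. The following diagnostic sketch lists the type bytes and checks the version header the way Ruby does (major must be 4, minor at most 8). MARSHAL_TYPES, check_header and describe_token are hypothetical names.

```python
from typing import Tuple

# Type bytes defined in Ruby's marshal.c for Marshal format 4.8.
MARSHAL_TYPES = {
    b"0": "TYPE_NIL", b"T": "TYPE_TRUE", b"F": "TYPE_FALSE",
    b"i": "TYPE_FIXNUM", b"e": "TYPE_EXTENDED", b"C": "TYPE_UCLASS",
    b"o": "TYPE_OBJECT", b"d": "TYPE_DATA", b"u": "TYPE_USERDEF",
    b"U": "TYPE_USRMARSHAL", b"f": "TYPE_FLOAT", b"l": "TYPE_BIGNUM",
    b'"': "TYPE_STRING", b"/": "TYPE_REGEXP", b"[": "TYPE_ARRAY",
    b"{": "TYPE_HASH", b"}": "TYPE_HASH_DEF", b"S": "TYPE_STRUCT",
    b"M": "TYPE_MODULE_OLD", b"c": "TYPE_CLASS", b"m": "TYPE_MODULE",
    b":": "TYPE_SYMBOL", b";": "TYPE_SYMLINK", b"I": "TYPE_IVAR",
    b"@": "TYPE_LINK",
}


def check_header(data: bytes) -> Tuple[int, int]:
    """Validate the 2-byte version header: major must be 4, minor at most 8."""
    if len(data) < 2:
        raise ValueError("data too short for a Marshal header")
    major, minor = data[0], data[1]
    if major != 4 or minor > 8:
        raise ValueError(f"incompatible Marshal format {major}.{minor}")
    return major, minor


def describe_token(token: bytes) -> str:
    """Name a one-byte type token, or flag it as outside the Marshal format."""
    return MARSHAL_TYPES.get(token, f"not a Marshal type byte: {token!r}")
```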

Ruby allows Arrays to be used as Hash keys; Python doesn't

I ran into an error while reading Ruby data:

Traceback (most recent call last):
  File "….py", line 5, in <module>
    d = rubymarshal.reader.load(f)
  File "…/.local/lib/python3.8/site-packages/rubymarshal/reader.py", line 289, in load
    return loader.read()
  File "…/.local/lib/python3.8/site-packages/rubymarshal/reader.py", line 197, in read
    attributes = self.read_attributes()
  File "…/.local/lib/python3.8/site-packages/rubymarshal/reader.py", line 233, in read_attributes
    attr_value = self.read()
  File "…/.local/lib/python3.8/site-packages/rubymarshal/reader.py", line 125, in read
    result[key] = value
TypeError: unhashable type: 'list'

Ruby doesn't (IIRC) distinguish between tuples and arrays. I hacked around it by adding

                if isinstance(key, list): # Workaround for non-hashable lists used as key
                    key = tuple(key)

to the reading code for TYPE_HASH. It's a bit ad hoc, but I'm not sure what a consistent fix would be.
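A somewhat more consistent option than converting only a top-level list key is to freeze the key recursively, so that nested arrays, and hashes used as keys, also become hashable. This is a minimal sketch that assumes decoded Ruby arrays arrive as Python lists and hashes as dicts. make_hashable is a hypothetical helper, not part of rubymarshal.

```python
from typing import Any


def make_hashable(value: Any) -> Any:
    """Recursively turn a decoded Ruby value into something usable as a dict key."""
    if isinstance(value, list):
        # Ruby Array -> tuple, converting nested elements too.
        return tuple(make_hashable(v) for v in value)
    if isinstance(value, dict):
        # Ruby Hash -> frozenset of (key, value) pairs; Ruby's Hash equality
        # ignores insertion order, and so does frozenset equality.
        return frozenset(
            (make_hashable(k), make_hashable(v)) for k, v in value.items()
        )
    # Everything else is returned unchanged.
    return value
```

In the TYPE_HASH branch, this would be applied as `result[make_hashable(key)] = value`. The trade-off is that such keys no longer round-trip as lists, so code that writes the data back would have to map tuples back to Ruby arrays. Objects that define `__eq__` without `__hash__` remain unhashable.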

Include tests in the source distribution

Since the tests are missing from the source tarball distributed on PyPI, there's no way to run them when packaging the module. Please include the tests in the distribution.
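One way to check whether a given sdist ships its tests is to list the tarball's members. This minimal sketch uses only the standard library's tarfile module. It assumes the tests live in a top-level tests/ directory, which the issue does not state. sdist_members and has_tests are hypothetical helper names.

```python
import tarfile
from typing import Iterable, List


def sdist_members(path: str) -> List[str]:
    """List member names of an sdist tarball (compression detected automatically)."""
    with tarfile.open(path, "r:*") as tar:
        return tar.getnames()


def has_tests(names: Iterable[str], test_dir: str = "tests") -> bool:
    """True if any member sits under <project>-<version>/<test_dir>/."""
    for name in names:
        parts = name.split("/")
        # sdist members are rooted in a single "<project>-<version>/" directory.
        if len(parts) > 2 and parts[1] == test_dir:
            return True
    return False
```

Usage, with a hypothetical file name: `has_tests(sdist_members("rubymarshal-x.y.z.tar.gz"))`. If the project builds with setuptools, extra files such as tests are typically added to the sdist through MANIFEST.in.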

Object id/link tracking is not correct

I've been trying to deserialize some Ruby data with links in it, and it seems the object tracking is not correct. I first noticed that the following list is missing "TYPE_USERDEF":

        if token in (
            TYPE_IVAR,
            # TYPE_EXTENDED, TYPE_UCLASS, ????
            TYPE_CLASS,
            TYPE_MODULE,
            TYPE_FLOAT,
            TYPE_BIGNUM,
            TYPE_REGEXP,
            TYPE_ARRAY,
            TYPE_HASH,
            TYPE_STRUCT,
            TYPE_OBJECT,
            TYPE_DATA,
            TYPE_USRMARSHAL,
        ):

But that was not enough, so I checked the Ruby source. In marshal.c, look for the r_entry calls and the r_prepare/r_entry0 pairs in the r_object0 function. The following types are tracked:

TYPE_FLOAT
TYPE_BIGNUM
TYPE_STRING
TYPE_REGEXP
TYPE_ARRAY
TYPE_HASH
TYPE_HASH_DEF
TYPE_STRUCT
TYPE_USERDEF
TYPE_USRMARSHAL
TYPE_OBJECT
TYPE_DATA
TYPE_MODULE_OLD
TYPE_CLASS
TYPE_MODULE

So far so good: the missing ones would be easy to add, and it seems TYPE_IVAR needs to be removed.

However, I then noticed a different inconsistency: in many cases, internal objects are not tracked. For example, the string inside TYPE_USERDEF is read with r_string directly rather than with r_object, so it is not added to the index table. In the Python implementation, it is.

So exactly mimicking Ruby here is more work than I first expected.
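To make the r_string vs r_object distinction concrete, here is a toy reader. It is a sketch and not rubymarshal's code: MiniReader and UserDef are hypothetical names, and it handles only nil, true, false, fixnums, strings, arrays, symbols, links and TYPE_USERDEF (no TYPE_IVAR). It follows the registration points the issue describes for r_object0. A string read as an object gets an index, and so does an array, before its elements are read. The raw TYPE_USERDEF payload, read like r_string, does not get an index; only the resulting userdef object does. read_long is a port of Ruby's r_long integer/length encoding.

```python
from typing import Any, List


class UserDef:
    """Hypothetical container for a TYPE_USERDEF value: class name + raw payload."""

    def __init__(self, class_name: str, payload: bytes) -> None:
        self.class_name = class_name
        self.payload = payload


class MiniReader:
    """Toy Marshal 4.8 reader that fills the object table like r_object0."""

    def __init__(self, data: bytes) -> None:
        if data[:2] != b"\x04\x08":
            raise ValueError("expected Marshal 4.8 header")
        self.data, self.pos = data, 2
        self.objects: List[Any] = []  # object table, targets of '@'
        self.symbols: List[str] = []  # separate symbol table, targets of ';'

    def byte(self) -> int:
        b = self.data[self.pos]
        self.pos += 1
        return b

    def read_long(self) -> int:
        # Port of r_long: the first byte is signed; small values are offset by 5,
        # otherwise it gives the count of little-endian bytes that follow.
        c = self.byte()
        if c > 127:
            c -= 256
        if c == 0:
            return 0
        if 4 < c < 128:
            return c - 5
        if -129 < c < -4:
            return c + 5
        n = abs(c)
        value = int.from_bytes(self.data[self.pos:self.pos + n], "little")
        self.pos += n
        return value if c > 0 else value - (1 << (8 * n))

    def read_bytes(self) -> bytes:
        # Like r_string: length-prefixed raw bytes, NOT entered in the table.
        n = self.read_long()
        raw = self.data[self.pos:self.pos + n]
        self.pos += n
        return raw

    def read_symbol(self) -> str:
        token = chr(self.byte())
        if token == ":":
            name = self.read_bytes().decode()
            self.symbols.append(name)
            return name
        if token == ";":
            return self.symbols[self.read_long()]
        raise ValueError(f"expected symbol, got {token!r}")

    def read(self) -> Any:
        token = chr(self.byte())
        if token in "0TF":
            return {"0": None, "T": True, "F": False}[token]  # never entered
        if token == "i":
            return self.read_long()  # fixnums are never entered
        if token == "@":
            return self.objects[self.read_long()]
        if token == '"':
            s = self.read_bytes()
            self.objects.append(s)  # like r_entry(r_string(arg))
            return s
        if token == "[":
            n = self.read_long()
            arr: List[Any] = []
            self.objects.append(arr)  # entered before its elements
            for _ in range(n):
                arr.append(self.read())
            return arr
        if token == "u":
            name = self.read_symbol()
            payload = self.read_bytes()  # raw payload: no index
            obj = UserDef(name, payload)
            self.objects.append(obj)  # only the resulting object is entered
            return obj
        raise ValueError(f"token {token!r} not handled by this sketch")
```

With the hand-built input b"\x04\x08[\x07u:\x08Foo\x06x@\x06" (an array holding the same userdef object twice), the link @1 resolves to the userdef object. A reader that also registered the payload string would resolve it to b"x" instead.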
