d9pouces / rubymarshal
read and write serialized data from the Ruby Marshal library
License: Do What The F*ck You Want To Public License
Here's a simple program to parse the rubygems index, which I've successfully used with rubymarshal 1.0.3:
#!/usr/bin/env python3
import gzip
import requests
import rubymarshal.reader
data = requests.get('https://api.rubygems.org/latest_specs.4.8.gz').content
data = gzip.decompress(data)
for (name, ver, gemplat), _ in zip(rubymarshal.reader.loads(data), range(10)):
    print(name, ver, gemplat)
Its output with 1.0.3:
_ UsrMarshal:Gem::Version(['1.4']) ruby
- UsrMarshal:Gem::Version(['1']) b'ruby'
0mq UsrMarshal:Gem::Version(['0.5.3']) b'ruby'
0xdm5 UsrMarshal:Gem::Version(['0.1.0']) b'ruby'
0xffffff UsrMarshal:Gem::Version(['0.1.0']) b'ruby'
10to1-crack UsrMarshal:Gem::Version(['0.1.3']) b'ruby'
1234567890_ UsrMarshal:Gem::Version(['1.2']) b'ruby'
12_hour_time UsrMarshal:Gem::Version(['0.0.4']) b'ruby'
16watts-fluently UsrMarshal:Gem::Version(['0.3.1']) b'ruby'
189seg UsrMarshal:Gem::Version(['0.0.1']) b'ruby'
With 1.2.6 it looks like this:
_ UsrMarshal({}) ruby
- UsrMarshal({}) ruby
0mq UsrMarshal({}) ruby
0xdm5 UsrMarshal({}) ruby
0xffffff UsrMarshal({}) ruby
10to1-crack UsrMarshal({}) ruby
1234567890_ UsrMarshal({}) ruby
12_hour_time UsrMarshal({}) ruby
16watts-fluently UsrMarshal({}) ruby
189seg UsrMarshal({}) ruby
The nice thing is that the unicode problem is gone, but the bad thing is that the custom object is no longer parsed.
At the very least, this requires a major version bump.
Next, the documentation is unclear or wrong about how this can be parsed now. Changing the program the way an example suggests:
#!/usr/bin/env python3
import gzip
import requests
import rubymarshal.reader
from rubymarshal.classes import RubyObject, registry
data = requests.get('https://api.rubygems.org/latest_specs.4.8.gz').content
data = gzip.decompress(data)
class GemVersion(RubyObject):
    ruby_class_name = "Gem::Version"
registry.register(GemVersion)
for (name, ver, gemplat), _ in zip(rubymarshal.reader.loads(data), range(10)):
    print(name, ver, gemplat)
doesn't change a thing.
In fact, this cannot work (at least with this data file), because ClassRegistry uses class names in the form of strs, but the class name is read by Reader.read as Symbol("Gem::Version"), which is hashed differently, so self.registry.get(class_name, UsrMarshal) always returns UsrMarshal.
I've worked around this by using ver.marshal_dump() instead, but I don't think it's the correct solution.
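The lookup failure described above can be reproduced without rubymarshal. This is a minimal sketch and not the library's code: `Symbol`, `UsrMarshal`, `GemVersion` and the plain-dict `registry` are hypothetical stand-ins. It assumes only what the report states, namely that the registry is keyed by `str` while the reader passes a symbol object that does not compare equal to that `str`.

```python
class Symbol:
    """Hypothetical stand-in for a parsed Ruby symbol (not rubymarshal's class)."""

    def __init__(self, name):
        self.name = name

    # No __eq__/__hash__ defined, so a Symbol never equals a plain str.


class UsrMarshal:
    """Hypothetical fallback class returned when nothing is registered."""


class GemVersion:
    """Hypothetical user class registered under its Ruby class name."""

    ruby_class_name = "Gem::Version"


# The registry is keyed by plain strings.
registry = {GemVersion.ruby_class_name: GemVersion}

# The reader hands over a Symbol, not a str.
class_name = Symbol("Gem::Version")

# Lookup by Symbol misses and falls back to the default.
found_by_symbol = registry.get(class_name, UsrMarshal)
# Lookup by the symbol's underlying string hits the registered class.
found_by_str = registry.get(class_name.name, UsrMarshal)

print(found_by_symbol.__name__)  # UsrMarshal
print(found_by_str.__name__)  # GemVersion
```

Under these assumptions, normalizing the key to a `str` before the lookup (or registering classes under the symbol type) would make the registration take effect; which of the two fits the library better is a design decision for the maintainer.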
This repository is missing all version tags. These are crucial for determining which code belongs to which version and what has changed since the last version.
I tried rubymarshal with a specific data file, and it looks like it couldn't load a token inside the data. (Ruby loads the file with no issue, and the file was encoded by Ruby 2.1+.)
The error log:
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "E:\nuriy\appdata\Local\Programs\Python3.6\lib\site-packages\rubymarshal\reader.py", line 192, in load
    return loader.read()
  File "E:\nuriy\appdata\Local\Programs\Python3.6\lib\site-packages\rubymarshal\reader.py", line 78, in read
    result = [self.read() for x in range(num_elements)]
  File "E:\nuriy\appdata\Local\Programs\Python3.6\lib\site-packages\rubymarshal\reader.py", line 78, in <listcomp>
    result = [self.read() for x in range(num_elements)]
  File "E:\nuriy\appdata\Local\Programs\Python3.6\lib\site-packages\rubymarshal\reader.py", line 129, in read
    data = self.read()
  File "E:\nuriy\appdata\Local\Programs\Python3.6\lib\site-packages\rubymarshal\reader.py", line 137, in read
    raise ValueError('token %s is not recognized' % token)
ValueError: token b'!' is not recognized
PokemonData - copie.zip
The file is inside the zip since GitHub doesn't accept uncommon extensions.
I ran into an error while reading ruby data:
Traceback (most recent call last):
  File "….py", line 5, in <module>
    d = rubymarshal.reader.load(f)
  File "…/.local/lib/python3.8/site-packages/rubymarshal/reader.py", line 289, in load
    return loader.read()
  File "…/.local/lib/python3.8/site-packages/rubymarshal/reader.py", line 197, in read
    attributes = self.read_attributes()
  File "…/.local/lib/python3.8/site-packages/rubymarshal/reader.py", line 233, in read_attributes
    attr_value = self.read()
  File "…/.local/lib/python3.8/site-packages/rubymarshal/reader.py", line 125, in read
    result[key] = value
TypeError: unhashable type: 'list'
Ruby doesn't (IIRC) distinguish between tuples and arrays, so an array can be used as a hash key. I hacked around it by adding
if isinstance(key, list):  # Workaround for non-hashable lists used as key
    key = tuple(key)
to the reading code for TYPE_HASH. It's a bit ad hoc, but I'm not sure what a consistent fix would be.
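One possible generalization of the workaround above is to convert keys recursively, so that arrays nested inside an array key are handled too. This is only a sketch: `freeze` is a hypothetical helper, not part of rubymarshal, and it does not cover other unhashable keys such as Ruby hashes (Python dicts).

```python
def freeze(value):
    """Recursively turn lists (Ruby arrays) into tuples so they can be dict keys."""
    if isinstance(value, list):
        return tuple(freeze(item) for item in value)
    return value


# Key/value pairs as a reader might produce them for a Ruby hash;
# the first key is an array that itself contains an array.
pairs = [(["a", ["b", 1]], "first"), ("plain", "second")]

result = {}
for key, value in pairs:
    # A raw list key would raise "TypeError: unhashable type: 'list'".
    result[freeze(key)] = value

print(result)  # {('a', ('b', 1)): 'first', 'plain': 'second'}
```

The trade-off is that the loaded keys come back as tuples rather than lists, so writing the data out again would need to treat tuples as Ruby arrays to round-trip.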
Since tests are missing from the source tarball distributed on PyPI, there's no way to use it for packaging the module and running the tests. Please include the tests in the distribution.
@d9pouces could there be a simpler license you could use for your work?
I've been trying to deserialize some Ruby data with links in it, and it seems the object tracking is not correct. I first noticed that the following list is missing "TYPE_USERDEF":
if token in (
    TYPE_IVAR,
    # TYPE_EXTENDED, TYPE_UCLASS, ????
    TYPE_CLASS,
    TYPE_MODULE,
    TYPE_FLOAT,
    TYPE_BIGNUM,
    TYPE_REGEXP,
    TYPE_ARRAY,
    TYPE_HASH,
    TYPE_STRUCT,
    TYPE_OBJECT,
    TYPE_DATA,
    TYPE_USRMARSHAL,
):
But that was not enough. I checked the Ruby source. In the Ruby source file, look for r_entry and r_prepare/r_entry0 pairs in the r_object0 function. The following types are tracked:
TYPE_FLOAT
TYPE_BIGNUM
TYPE_STRING
TYPE_REGEXP
TYPE_ARRAY
TYPE_HASH
TYPE_HASH_DEF
TYPE_STRUCT
TYPE_USERDEF
TYPE_USRMARSHAL
TYPE_OBJECT
TYPE_DATA
TYPE_MODULE_OLD
TYPE_CLASS
TYPE_MODULE
So far so good: the missing ones would be easy to add, and it seems TYPE_IVAR needs to be removed.
However, I then noticed a different inconsistency. In many cases Ruby does not track the internal objects. For example, the string inside TYPE_USERDEF is read using r_string directly and not r_object, so it is not added to the index table. In the Python implementation, it is.
So it is some more work to exactly mimic Ruby here than I first expected.
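To show why one extra entry in the index table matters, here is a toy model of link resolution. It does not parse the real Marshal byte format: the event tuples, `resolve_links` and the `track_inner` flag are all hypothetical, and the only assumption taken from the report is that a link refers to an object by its position in the table of tracked objects.

```python
def resolve_links(events, track_inner):
    """Replay a stream of toy events and return what each link resolves to.

    events: ("obj", name, is_inner) registers an object; ("link", index)
    refers back to the object at that position in the table.
    track_inner: if True, inner objects are registered as well, which is
    the behaviour the report attributes to the Python reader.
    """
    table = []
    resolved = []
    for event in events:
        if event[0] == "obj":
            _, name, is_inner = event
            if track_inner or not is_inner:
                table.append(name)
        else:
            _, index = event
            resolved.append(table[index])
    return resolved


events = [
    ("obj", "userdef object", False),
    ("obj", "userdef payload string", True),  # read via r_string in Ruby
    ("obj", "array", False),
    ("link", 1),  # the writer meant the array
]

print(resolve_links(events, track_inner=False))  # ['array']
print(resolve_links(events, track_inner=True))  # ['userdef payload string']
```

In this model a single wrongly tracked inner object shifts every later index by one, so links written by Ruby resolve to the wrong object or run past the end of the table.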