Comments (5)
Quick test script that runs a lookup 1000 times to compare speed differences. Absolute times will vary by computer, but you can always compare runs on the same machine to show relative differences.
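# Note: %N (nanoseconds) requires GNU date.
start=$( date +"%s.%N" )
for _ in $(seq 1 1000); do
    # Each iteration launches a fresh Python interpreter.
    python3 -m puremagic test/resources/media/test.iso > /dev/null
done
end=$( date +"%s.%N" )
# Elapsed wall-clock time in seconds.
python3 -c "print(${end} - ${start})"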
from puremagic.
Tested the difference between using named tuples and classes with slots for the PureMagic internal structure.
class PureMagic:
    __slots__ = ["byte_match", "offset", "extension", "mime_type", "name"]

    def __init__(self, byte_match, offset, extension, mime_type, name):
        self.byte_match = byte_match
        self.offset = offset
        self.extension = extension
        self.mime_type = mime_type
        self.name = name

    def _asdict(self):
        # Mirrors the namedtuple method of the same name.
        return {
            "byte_match": self.byte_match,
            "offset": self.offset,
            "extension": self.extension,
            "mime_type": self.mime_type,
            "name": self.name,
        }


class PureMagicWithConfidence(PureMagic):
    # As tested. "name" is already a slot on PureMagic, so redeclaring it
    # here is redundant; only "confidence" is new.
    __slots__ = ["name", "confidence"]

    def __init__(self, byte_match, offset, extension, mime_type, name, confidence):
        super().__init__(byte_match, offset, extension, mime_type, name)
        self.name = name  # redundant: already set by super().__init__()
        self.confidence = confidence
versus the current named tuples:
from collections import namedtuple

PureMagic = namedtuple(
    "PureMagic",
    (
        "byte_match",
        "offset",
        "extension",
        "mime_type",
        "name",
    ),
)
PureMagicWithConfidence = namedtuple(
    "PureMagicWithConfidence",
    (
        "byte_match",
        "offset",
        "extension",
        "mime_type",
        "name",
        "confidence",
    ),
)
Named tuples still win: 42.329 seconds vs 43.922 seconds for the classes.
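A comparison like this can also be run inside a single interpreter with `timeit`, which avoids counting process startup. The following is only a sketch: `TupleMagic`, `SlotsMagic`, `ARGS`, and `compare` are illustrative names rather than anything in puremagic, and the sample record values are made up.

```python
import timeit
from collections import namedtuple

FIELDS = ("byte_match", "offset", "extension", "mime_type", "name")

# Illustrative stand-ins for the two candidate structures.
TupleMagic = namedtuple("TupleMagic", FIELDS)


class SlotsMagic:
    __slots__ = FIELDS

    def __init__(self, byte_match, offset, extension, mime_type, name):
        self.byte_match = byte_match
        self.offset = offset
        self.extension = extension
        self.mime_type = mime_type
        self.name = name


# Made-up sample record, used only to have something to construct.
ARGS = (b"CD001", 0x8001, ".iso", "application/x-iso9660-image", "ISO image")


def compare(number=100_000):
    """Time construction and attribute reads for both structures."""
    results = {}
    for label, factory in (("namedtuple", TupleMagic), ("slots", SlotsMagic)):
        build = timeit.timeit(lambda: factory(*ARGS), number=number)
        record = factory(*ARGS)
        read = timeit.timeit(lambda: record.name, number=number)
        results[label] = {"build": build, "read": read}
    return results


if __name__ == "__main__":
    for label, timings in compare().items():
        print(f"{label}: build={timings['build']:.4f}s read={timings['read']:.4f}s")
```

The numbers from a microbenchmark like this are not directly comparable to the 1000-run shell test, since they isolate object construction and attribute access rather than a full lookup.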
Speed-wise, I think it's much of a muchness; modern CPUs are fast enough that there's little difference to be made.
On low-power hardware there might be a more measurable difference, say on a Pi or a low-end x86 system where the sheer horsepower is lacking.
I was worried when I suggested Multi-Match or Regex searches that we would see a noticeable increase in search times. However, on my main desktop, whatever difference there is, is negligible at worst.
Would/could multi-threading the searches be another way to speed up matching? Once the data is in memory, every thread can have a go at identifying it and add to the results pool. This may benefit lower-spec systems by utilising their cores rather than sheer horsepower.
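As a rough sketch of what splitting the search across threads could look like: the signature table, `match_chunk`, and `threaded_match` below are hypothetical and are not puremagic's actual data or API. One caveat worth keeping in mind is that in standard CPython builds the global interpreter lock keeps CPU-bound threads from running Python bytecode in parallel, so a speedup from this approach is far from guaranteed.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical signature table: (byte_match, offset, name).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", 0, "png"),
    (b"%PDF", 0, "pdf"),
    (b"CD001", 0x8001, "iso"),
    (b"PK\x03\x04", 0, "zip"),
]


def match_chunk(data, chunk):
    """Return the names of all signatures in `chunk` that match `data`."""
    return [
        name
        for byte_match, offset, name in chunk
        if data[offset:offset + len(byte_match)] == byte_match
    ]


def threaded_match(data, signatures=SIGNATURES, workers=2):
    """Split the signature table across worker threads and pool the results."""
    chunks = [signatures[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda chunk: match_chunk(data, chunk), chunks))
    return sorted(name for part in parts for name in part)


if __name__ == "__main__":
    print(threaded_match(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))
```

Whether this helps on low-spec hardware would need measuring; the thread setup cost alone could outweigh the matching work for a small signature table.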
A thought I just had: would switching to a monolithic file cause issues of its own once it grows beyond a certain point, from both a code-maintenance and a physical-size standpoint?
Almost all of the time in the benchmark above (#71 (comment)) is spent restarting Python over and over again.
Once Python is launched, performance is quite quick. See 0.6 seconds for 74 string and file tests:
% python -m pytest --cov=puremagic test/
============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-8.2.0, pluggy-1.5.0
rootdir: /home/runner/work/puremagic/puremagic
plugins: cov-5.0.0
collected 74 items
test/test_common_extensions.py ..................... [ 28%]
test/test_main.py ..................................................... [100%]
---------- coverage: platform linux, python 3.12.3-final-0 -----------
Name                    Stmts   Miss  Cover
-------------------------------------------
puremagic/__init__.py       2      0   100%
puremagic/__main__.py       0      0   100%
puremagic/main.py         167      0   100%
-------------------------------------------
TOTAL                     169      0   100%
============================== 74 passed in 0.60s ==============================
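To see how much of a per-run figure is interpreter startup, one option is to time an empty Python process separately from an in-process call. This is a minimal sketch: `average_startup_seconds`, `average_call_seconds`, and `stand_in_lookup` are hypothetical helpers, and the stand-in does a trivial byte comparison rather than a real puremagic lookup.

```python
import subprocess
import sys
import time


def average_startup_seconds(runs=5):
    """Average wall-clock cost of launching Python and exiting immediately."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs


def average_call_seconds(func, runs=1000):
    """Average wall-clock cost of calling `func` in the current process."""
    start = time.perf_counter()
    for _ in range(runs):
        func()
    return (time.perf_counter() - start) / runs


def stand_in_lookup():
    """Hypothetical placeholder for a real lookup: one byte comparison."""
    data = b"\x00" * 0x8001 + b"CD001"
    return data[0x8001:0x8006] == b"CD001"


if __name__ == "__main__":
    print(f"startup:    {average_startup_seconds():.6f}s per run")
    print(f"in-process: {average_call_seconds(stand_in_lookup):.6f}s per call")
```

With puremagic installed, the stand-in could presumably be swapped for something like `lambda: puremagic.from_file(path)` to time real lookups without the restart cost.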