mini_mime's Introduction

MiniMime

Minimal mime type implementation for use with the mail and rest-client gems.

Installation

Add this line to your application's Gemfile:

gem 'mini_mime'

And then execute:

$ bundle

Or install it yourself as:

$ gem install mini_mime

Usage

require 'mini_mime'

MiniMime.lookup_by_filename("a.txt").content_type
# => "text/plain"

MiniMime.lookup_by_extension("txt").content_type
# => "text/plain"

MiniMime.lookup_by_content_type("text/plain").extension
# => "txt"

MiniMime.lookup_by_content_type("text/plain").binary?
# => false

Configuration

If you'd like to add your own mime types, try using custom database files:

MiniMime::Configuration.ext_db_path = "path_to_file_extension_db"
MiniMime::Configuration.content_type_db_path = "path_to_content_type_db"

Check out the default databases for proper formatting and structure hints.
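As a rough sketch of what a custom database might contain: the rows quoted in the issues further down suggest fixed-width, space-padded columns (extension, content type, encoding). Assuming that layout, and assuming each file is sorted by the column it is looked up by, a hypothetical helper (format_db_rows is not part of MiniMime) could produce rows like this. Compare the output against the default databases before relying on it.

```ruby
# Hypothetical helper, not part of MiniMime: pads every column to a fixed
# width and sorts the rows by the given column index.
def format_db_rows(entries, column:)
  widths = entries.transpose.map { |col| col.map(&:length).max + 1 }
  entries.sort_by { |row| row[column] }.map do |row|
    row.each_with_index.map { |field, i| field.ljust(widths[i]) }.join
  end
end

custom = [
  ["txt", "text/plain", "quoted-printable"],
  ["foo", "application/x-foo", "base64"], # made-up type for illustration
]

ext_rows  = format_db_rows(custom, column: 0) # sorted by extension
type_rows = format_db_rows(custom, column: 1) # sorted by content type

puts ext_rows
```

Each list would then be written to its own file, with the two paths assigned to ext_db_path and content_type_db_path as shown above.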

Performance

MiniMime is optimized to minimize memory usage. It keeps a cache of 100 mime type lookups (and 100 misses). There are benchmarks in the bench directory.
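To illustrate the idea of a small bounded cache, here is a minimal sketch. BoundedCache is a hypothetical class written for this example and is not MiniMime's actual implementation; it relies on Ruby hashes preserving insertion order so that the oldest entry can be dropped first.

```ruby
# Hypothetical bounded cache: once it grows past max_size, the oldest
# inserted entry is evicted. Misses can be cached too by storing nil.
class BoundedCache
  def initialize(max_size)
    @max_size = max_size
    @store = {}
  end

  # Returns the cached value, or computes it with the block and stores it.
  def fetch(key)
    return @store[key] if @store.key?(key)

    value = yield(key)
    @store[key] = value
    @store.shift if @store.size > @max_size # Hash#shift removes the oldest pair
    value
  end

  def size
    @store.size
  end
end

cache = BoundedCache.new(100)
computed = 0
3.times { cache.fetch("txt") { computed += 1; "text/plain" } }
150.times { |i| cache.fetch("ext#{i}") { nil } }

puts computed   # the block ran once for "txt"
puts cache.size # never exceeds the configured maximum
```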

Memory stats for requiring mime/types/columnar
Total allocated: 8712144 bytes (98242 objects)
Total retained:  3372545 bytes (33599 objects)

Memory stats for requiring mini_mime
Total allocated: 42625 bytes (369 objects)
Total retained:  8992 bytes (72 objects)
Warming up --------------------------------------
cached content_type lookup MiniMime
                        85.109k i/100ms
content_type lookup MIME::Types
                        17.879k i/100ms
Calculating -------------------------------------
cached content_type lookup MiniMime
                          1.105M (± 4.1%) i/s -      5.532M in   5.014895s
content_type lookup MIME::Types
                        193.528k (± 7.1%) i/s -    965.466k in   5.013925s
Warming up --------------------------------------
uncached content_type lookup MiniMime
                         1.410k i/100ms
content_type lookup MIME::Types
                        18.012k i/100ms
Calculating -------------------------------------
uncached content_type lookup MiniMime
                         14.689k (± 4.2%) i/s -     73.320k in   5.000779s
content_type lookup MIME::Types
                        193.459k (± 6.9%) i/s -    972.648k in   5.050731s

As a general guideline, cached lookups are roughly 6x faster than the MIME::Types equivalent, while uncached lookups are more than 10x slower.
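For reference, the ratios can be worked out from the iterations-per-second figures in the output above. The values below are copied from that output (the MiniMime cached figure is the rounded 1.105M), so the results are approximate.

```ruby
# Iterations per second, taken from the benchmark output above.
cached_mini_mime    = 1_105_000.0
cached_mime_types   = 193_528.0
uncached_mini_mime  = 14_689.0
uncached_mime_types = 193_459.0

cached_speedup    = (cached_mini_mime / cached_mime_types).round(1)
uncached_slowdown = (uncached_mime_types / uncached_mini_mime).round(1)

puts "cached:   #{cached_speedup}x faster"    # 5.7x
puts "uncached: #{uncached_slowdown}x slower" # 13.2x
```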

Note: The benchmarks were run on macOS 10.14.2 with the following versions of Ruby and gems:

  • Ruby 2.6.0
  • mini_mime (1.0.1)
  • mime-types (3.2.2)
  • mime-types-data (3.2018.0812)

Development

MiniMime uses the officially maintained list of mime types in the mime-types-data repo to build the internal database.

To update the database run:

bundle exec rake rebuild_db

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/discourse/mini_mime. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

mini_mime's People

Contributors

actions-user, ahorek, aqualon, byroot, coorasse, cvx, davidtaylorhq, esparta, fryguy, github-actions[bot], gmcgibbon, gogainda, ikaronen-relex, janko, jeremy, odlp, ohbarye, osamasayegh, petergoldstein, radar, samsaffron, tgxworld, timcraft, ybiquitous

mini_mime's Issues

How about updating benchmark report?

Hello maintainers. This is a kind of suggestion.

I was curious whether the performance report in the README (it seems to have been written 2 years ago) is still accurate for the latest versions, so I tried running the benchmark script with the following versions.

Performance Test

  • mini_mime (master)
  • mime-types (3.2.2)
  • mime-types-data (3.2018.0812)
  • Ruby 2.6.0
Memory stats for requiring mime/types/columnar
Total allocated: 8686910 bytes (102917 objects)
Total retained:  3156016 bytes (33593 objects)

Memory stats for requiring mini_mime
Total allocated: 41064 bytes (362 objects)
Total retained:  7156 bytes (60 objects)
Warming up --------------------------------------
cached content_type lookup MiniMime
                        72.481k i/100ms
content_type lookup MIME::Types
                        13.284k i/100ms
Calculating -------------------------------------
cached content_type lookup MiniMime
                        914.838k (± 1.3%) i/s -      4.639M in   5.071456s
content_type lookup MIME::Types
                        140.215k (± 3.4%) i/s -    704.052k in   5.026273s
Warming up --------------------------------------
uncached content_type lookup MiniMime
                         1.329k i/100ms
content_type lookup MIME::Types
                        13.225k i/100ms
Calculating -------------------------------------
uncached content_type lookup MiniMime
                         13.338k (± 1.7%) i/s -     67.779k in   5.083373s
content_type lookup MIME::Types
                        139.626k (± 4.2%) i/s -    700.925k in   5.027074s

The README currently says:

As a general guideline, cached lookups are 2x faster than MIME::Types equivalent. Uncached lookups are 10x slower.

but now that cached lookups seem to be 6x faster, how about updating the report? I'm happy to update the README myself and send a pull request if that's fine with you.

Need clarification on data design / filtering

I’m trying to understand the decisions made when building the content_type_mime.db file, because I’m embarking on a fairly major upgrade to the data in mime-types-data (https://github.com/mime-types/mime-types-data/tree/priority-extensions) where I’ve made it possible to specify relative extension priorities.

As far as I can tell, the ext_mime.db file looks to be mostly OK. It’s helping me find some issues (font/ttf should be the highest priority result for .ttf, which it isn’t).

However, it looks like the prioritization code I have put in place will break content_type_mime.db in unpleasant ways.

I’m using my translation of the conversion code from your own Rakefile (also in the same tree), but here’s some of the odder diffs:

+bin         application/java-vm                                                       base64          
+txt         application/winhlp                                                        base64          
+t           application/x-troff-man                                                   8bit            
+t           application/x-troff-me                                                    base64          
+t           application/x-troff-ms                                                    base64          

The first should be class, but the code at https://github.com/discourse/mini_mime/blob/main/Rakefile#L77-L80 forces the first extension from application/octet-stream (because application/java-vm no longer sorts first for class).

There are a couple of other examples like this, but I’m trying to figure out what the intent for this particular file is in order to provide (a) recommendations for updates to your own code and (b) convert it correctly myself.

Wrong lookup_by_content_type extension?

According to README, it says:

MiniMime.lookup_by_content_type("text/plain").extension
# => "txt"

But when I tried it in irb, it returned a different string.

$ irb
> require 'mini_mime'
=> true

> MiniMime.lookup_by_filename("a.txt")
=> #<MiniMime::Info:0x007fa692a31cc8 @extension="txt", @content_type="text/plain", @encoding="quoted-printable">

> MiniMime.lookup_by_content_type("text/plain")
=> #<MiniMime::Info:0x007fa69383d178 @extension="c", @content_type="text/plain", @encoding="quoted-printable">

> MiniMime.lookup_by_content_type("text/plain").extension
=> "c"

I expected "txt" to be returned, but it isn't.

Shim of the mime-types gem

In ManageIQ, we've found a way to "stub" out any mime-types gem references, and redirect through to mini_mime. This gives us the savings from mini_mime, even for gems that don't yet directly use it (in our case, rest-client, mail, and capybara).
The PR that does this in our app is here: https://github.com/ManageIQ/manageiq/pull/14525/files . Essentially what we've done is create a "fake" mime-types gem, then in our Gemfile point to the fake .gemspec file. The "fake" mime-types gem is basically a thin wrapper around mini_mime with the interface of mime-types.

Is this something you'd be interested in for the mini_mime gem? I know we can't actually publish the fake mime-types gem, but my thought is that it's possible to put the .gemspec and the fake class in mini_mime, and then give instructions in the README on how a user might go about changing their Gemfile to point to the .gemspec file provided. Something roughly like:


├── lib
│   ├── mime-types-redirector
│   │   ├── lib
│   │   │   ├── mime
│   │   │   │   └── types.rb
│   │   │   └── mime-types.rb
│   │   └── mime-types.gemspec
│   ├── mini_mime
│   │   └── version.rb
│   └── mini_mime.rb

README

If you would like all references to mime-types to instead be redirected through mini_mime, include the following in your Gemfile:

gem "mini_mime"
gem "mime-types", :path => File.expand_path("lib/mime-types-redirector", Gem::Specification.find_by_name("mini_mime").gem_dir)

webm and wmv miscategorized as audio instead of video

Redmine uses mini_mime to determine whether an uploaded media file is an audio or a video file, in order to decide whether to use an <audio> or a <video> tag in the HTML.

Unfortunately, both *.webm and *.wmv files are "miscategorized" as audio by mini_mime:

https://github.com/discourse/mini_mime/blob/master/lib/db/ext_mime.db#L1048

webm        audio/webm                                                                base64          

https://github.com/discourse/mini_mime/blob/master/lib/db/ext_mime.db#L1064

wmv         audio/x-ms-wmv                                                            base64          

Though this isn't strictly wrong (webm is a container format for both audio and video; wmv is intended for video, but of course also can contain audio), it is IMO not what one would expect. In this case it's causing Redmine to attempt to play these kinds of video files with an <audio> tag, which forces users to download the files to actually see the video: https://www.redmine.org/issues/31553

For webm, this has been overridden in Redmine, but today I've stumbled over the same issue with wmv. I thought that MIME::Types solved this pretty elegantly by returning an array of matching MIME types, but apparently that library suffers from memory usage issues which led to Redmine replacing it with mini_mime.

I'm not sure what the best solution would be here. I'm pretty new to Ruby and all of these libraries, but from what I can gather, mini_mime uses MIME::Types internally to generate a list of extension -> MIME type mappings, and lets priority_compare decide which MIME type wins when there are several. But there is no real concept of "priority" in MIME::Types in the sense that, e.g., video/webm can have a higher priority than audio/webm. priority_compare essentially just simplifies the MIME types and then compares them alphabetically. This means that audio will always win over video:

irb(main):005:0> MIME::Types.type_for('test.wmv').each { |m| pp m.simplified }; nil
"audio/x-ms-wmv"
"video/x-ms-wmv"
=> nil

IMO this is a fundamental problem which cannot be easily solved, and the proper fix would be to return multiple MIME types for a given extension (#25).
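The effect described above can be reproduced with plain string comparison. This is only a simplified stand-in for priority_compare (which, per the description above, ends up comparing the simplified type names alphabetically); it does not call MIME::Types.

```ruby
# Simplified type names as printed in the irb session above.
wmv_candidates  = ["video/x-ms-wmv", "audio/x-ms-wmv"]
# webm candidates as named in this report.
webm_candidates = ["video/webm", "audio/webm"]

# A purely alphabetical comparison always ranks "audio/..." before "video/...".
wmv_winner  = wmv_candidates.min
webm_winner = webm_candidates.min

puts wmv_winner
puts webm_winner
```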

Using non-standard mimes

Seeing how with mime-types you can do MIME::Types.add(some_new_type) to add non-standard mimes, does this gem have any plans on supporting this feature in some way?

mini_mime vs marcel

Hi, I don't know if this is the right place to post it, but I'm trying to compare mini_mime vs marcel regarding looking up by extension, because I think both gems cover the same space. I was trying to compare the number of extensions registered, the performance and memory consumption of every gem.

Number of extensions:

  • mini_mime: File.open(MiniMime::Configuration.ext_db_path).readlines.count => 1196
  • marcel: Marcel::EXTENSIONS.count => 1243

Regarding memory handling, mini_mime has a hash cache of 200 rows and misses are binary-searched from a file, while marcel loads all records into an in-memory hash. Isn't reading from a file less performant than loading everything into memory? Loading everything into memory obviously consumes more memory, but in my opinion the gain in performance outweighs the memory consumption.
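For readers unfamiliar with the file-based approach, here is a minimal sketch of a binary search over fixed-width, sorted rows. It uses a StringIO in place of a real file, and the lookup method and row widths are made up for this example; it is not MiniMime's code.

```ruby
require "stringio"

# Rows are fixed width and sorted by their first column, so row i starts at
# byte i * row_len and can be reached with a single seek.
def lookup(io, key)
  io.rewind
  row_len = io.readline.bytesize # width of one row, newline included
  low = 0
  high = io.size / row_len - 1

  while low <= high
    mid = (low + high) / 2
    io.seek(mid * row_len)
    fields = io.read(row_len).split
    case key <=> fields[0]
    when 0  then return fields
    when -1 then high = mid - 1
    else         low = mid + 1
    end
  end
  nil
end

rows = [
  %w[css text/css 8bit],
  %w[png image/png base64],
  %w[txt text/plain quoted-printable],
]
db = StringIO.new(rows.map { |ext, type, enc| format("%-6s%-14s%-18s\n", ext, type, enc) }.join)

p lookup(db, "png")
p lookup(db, "gif")
```

Only a handful of rows are read per lookup, which is the memory-for-speed trade-off the question is about.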

I also noticed that both DBs in mini_mime contain similar data. Is there any reason why the two DBs are not merged, with duplicates removed? When I merged both files, the number of rows/extensions was 1210, but I'm not completely sure whether that is due to an error in removing duplicates:

irb(main)> require 'set'
irb(main)> s = Set.new  # assumed; the original report does not show how s was created
irb(main)> File.readlines(MiniMime::Configuration.ext_db_path).each do |line|
irb(main)*     s << line.strip
irb(main)> end
irb(main)> File.readlines(MiniMime::Configuration.content_type_db_path).each do |line|
irb(main)*     s << line.strip
irb(main)> end
irb(main)> s.length
=> 1210

Binary differences between mini_mime and mime-types

👋 I'm converting some code to use mini_mime instead of mime-types and I noticed a strange inconsistency in how types are labelled as binary. Here's the difference:

In mime-types: MIME::Types.of("a.css").first.binary? #=> false
In mini_mime: MiniMime.lookup_by_filename("a.css").binary? #=> true

This also happens with all other types that are labelled as 8bit encoding in the db file. I think this line might be wrong.
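To make the difference concrete, here is a hedged sketch with two possible definitions of "binary". The Entry struct and its method names are hypothetical and exist only for this example; which encodings each library actually treats as binary is an assumption based on the behaviour reported above.

```ruby
# Hypothetical record type for illustration; not MiniMime::Info.
Entry = Struct.new(:extension, :content_type, :encoding) do
  # Wider definition: 8bit counts as binary (matches the reported
  # mini_mime result for a.css).
  def binary_including_8bit?
    %w[base64 8bit].include?(encoding)
  end

  # Narrower definition: only base64 counts as binary (matches the
  # reported mime-types result for a.css).
  def binary_base64_only?
    encoding == "base64"
  end

  # A possible #ascii? as suggested in the side note below.
  def ascii?
    !binary_base64_only?
  end
end

css = Entry.new("css", "text/css", "8bit")

puts css.binary_including_8bit?
puts css.binary_base64_only?
puts css.ascii?
```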

Side note: it might also be cool to add a #ascii? method to check for non-binary encodings.

lookups for csv return text/comma-separated-values over text/csv

I see it in the db, so it's working correctly, but I'm curious why text/comma-separated-values was chosen over text/csv. I'm trying to see if I can get capybara updated to use mini_mime but they have a test that's failing and expecting "Content-type: text/csv". I'm not sure it matters, and I can easily change the test on the Capybara side, but I dug in a little further and found a StackOverflow article stating that the RFC suggests to use text/csv: http://stackoverflow.com/questions/7076042/what-mime-type-should-i-use-for-csv . So, what's the "right" mime type, and if we need to change it, how does that affect users of the gem?

I’ve added `mini_mime` DB conversion to mime-types-data

This may be of interest: mime-types/mime-types-data#47

I pulled the conversion code directly from the Rakefile here and modified it to match the external loader process. The README indicates how people could use this data directly from mime-types-data, which may be an option from a mini_mime perspective as well. If nothing else, you can now copy the latest version from the latest release (after I release this; I am not planning on doing so before November 8).

Random read failures

Hej there. So here's what is happening in our application:

We're sending out some mails via delayed_job and action_mailer like many other apps are likely doing, these mails contain urls to images stored with active_storage. Every now and then building the mail templates is failing with obscure errors happening in mini_mime. These are:

When we try those calls in a console ... everything works. When we stare at the problem ... it disappears :)

It kind of looks vaguely familiar to #37 but that might be misleading.

So, long speech :) Do you have any idea why this could be happening? Any thoughts and ideas are welcome.

Thanks!

Seeking the DB file does not work in a bundled JRuby application, crashes randomly

We have a Rails-based JRuby application deployed as a bundled .jar file, in which we currently use ActionMailer (which uses the Mail gem, which uses MiniMime) to send reports as e-mail attachments. We've had reports of weird random crashes causing these e-mails sometimes not to be sent, and we've managed to narrow down the source of these crashes to MiniMime, and specifically to the code in MiniMime::Db::RandomAccessDb.

Here's an example stack trace from a crash log:

job=399431962505 Sending csv report #2501 to REDACTED failed: undefined method `>' for nil:NilClass
	uri:classloader:/gems/mini_mime-1.0.2/lib/mini_mime.rb:135:in `lookup_uncached'
	uri:classloader:/gems/mini_mime-1.0.2/lib/mini_mime.rb:113:in `block in lookup'
	org/jruby/RubyHash.java:1263:in `fetch'
	uri:classloader:/gems/mini_mime-1.0.2/lib/mini_mime.rb:89:in `fetch'
	uri:classloader:/gems/mini_mime-1.0.2/lib/mini_mime.rb:112:in `block in lookup'
	org/jruby/RubyHash.java:1263:in `fetch'
	uri:classloader:/gems/mini_mime-1.0.2/lib/mini_mime.rb:89:in `fetch'
	uri:classloader:/gems/mini_mime-1.0.2/lib/mini_mime.rb:111:in `lookup'
	uri:classloader:/gems/mini_mime-1.0.2/lib/mini_mime.rb:159:in `lookup_by_extension'
	uri:classloader:/gems/mini_mime-1.0.2/lib/mini_mime.rb:65:in `block in lookup_by_extension'
	org/jruby/ext/thread/Mutex.java:165:in `synchronize'
	uri:classloader:/gems/mini_mime-1.0.2/lib/mini_mime.rb:63:in `lookup_by_extension'
	uri:classloader:/gems/mini_mime-1.0.2/lib/mini_mime.rb:59:in `lookup_by_filename'
	uri:classloader:/gems/mini_mime-1.0.2/lib/mini_mime.rb:6:in `lookup_by_filename'
	uri:classloader:/gems/mail-2.7.1/lib/mail/attachments_list.rb:105:in `set_mime_type'
	uri:classloader:/gems/mail-2.7.1/lib/mail/attachments_list.rb:42:in `[]='
	uri:classloader:/app/mailers/application_mailer.rb:58:in `block in add_files'
	org/jruby/RubyArray.java:1792:in `each'
	uri:classloader:/app/mailers/application_mailer.rb:50:in `add_files'
	uri:classloader:/app/mailers/scheduled_job_mailer.rb:17:in `generate'
	uri:classloader:/app/mailers/application_mailer.rb:11:in `send_mail'

(Yes, I know we're a few versions behind, but the relevant code in v1.0.2 looks the same as in master and I'm pretty sure the issue exists there too.)

Anyway, the weird crashes seem to happen because MiniMime::Db::RandomAccessDb.resolve tries to seek the DB file. But in a bundled app the DB files end up being uri:classloader resources, which are not seekable. Worse, due to a questionable decision by the JRuby devs, attempting to seek them doesn't actually raise an exception but fails silently (and, AFAICT, also causes some internal buffers to be dumped so that the next readline ends up starting from the middle of a line).

Anyway, while the weird silent seek failure is arguably a JRuby bug, even if it's fixed those files will never be seekable (since the underlying Java streams aren't). So this would still need to be fixed on MiniMime's side to make it work in a bundled JRuby app.

Off the top of my head, I can see a couple of possible solutions:

  1. Scrap the current overcomplicated RandomAccessDb implementation entirely and just read the whole database into a hash on the first lookup. It's really not that big, and doing so would make the code much simpler and lookups likely faster.
  2. Test each DB file to see if it's seekable (by trying to seek it, and checking that no exception is raised and that the reported file position actually changes), and switch to a backup implementation (e.g. slurping the whole database into a hash) if it's not. More complicated than option 1, but preserves current behavior in cases where the files are seekable.
  3. Same as option 2, but just read any unseekable DB files into strings and wrap them in a StringIO wrapper. Probably the option involving the least changes to existing code.
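Options 2 and 3 could be sketched roughly as follows. ensure_seekable is a hypothetical helper, not existing MiniMime code, and a pipe stands in for an unseekable classloader resource. After a silent seek failure of the kind described above, the stream position may already be unreliable, so a real fix would likely need to reopen the resource before reading it.

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper: returns io itself when seeking works, otherwise an
# in-memory StringIO holding the rest of its contents.
def ensure_seekable(io)
  io.seek(1, IO::SEEK_SET)
  moved = io.pos == 1 # guards against seeks that fail silently
  io.seek(0, IO::SEEK_SET)
  return io if moved

  StringIO.new(io.read)
rescue SystemCallError, IOError
  # e.g. Errno::ESPIPE when the stream cannot seek at all
  StringIO.new(io.read)
end

# A regular file is seekable and is returned unchanged.
file = Tempfile.new("db")
file.write("css text/css 8bit\n")
file.flush
from_file = ensure_seekable(file)

# A pipe is not seekable, so its contents are slurped into a StringIO.
reader, writer = IO.pipe
writer.write("css text/css 8bit\n")
writer.close
from_pipe = ensure_seekable(reader)

puts from_file.equal?(file)
puts from_pipe.class
```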

Extensions with multiple possible content types

In mime-types, I can do this:

MIME::Types.type_for("a.js") # => [
  #<MIME::Type: application/ecmascript>,
  #<MIME::Type: application/javascript>,
  #<MIME::Type: text/ecmascript>,
  #<MIME::Type: text/javascript>,
  #<MIME::Type: application/x-javascript>
]

but in mini_mime, it assumes only one return. This is specifically annoying for JavaScript because I can do:

MiniMime.lookup_by_filename("a.js") # =>
#<MiniMime::Info:0x000 @extension="js", @content_type="application/ecmascript", @encoding="base64">

which tells me *.js uses base64 encoding. Have these kinds of use cases been considered, and which result is actually correct?

I should also mention there is a definition for application/javascript in the content type DB but you'll never reach it by looking up via extension.
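As a sketch of what a multi-valued lookup could look like, the rows below are grouped by extension so that every candidate content type is kept. The rows are illustrative, taken from the type_for output above, and this is not how MiniMime currently stores its data.

```ruby
# Illustrative (extension, content type) pairs from the output above.
rows = [
  ["js", "application/ecmascript"],
  ["js", "application/javascript"],
  ["js", "text/javascript"],
  ["css", "text/css"],
]

# Group rows by extension, keeping every content type for that extension.
types_by_extension = rows.group_by(&:first).transform_values { |group| group.map(&:last) }

p types_by_extension["js"]
p types_by_extension["css"]
```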

Automatic updates from mime-types-data?

Would it be possible for this to be automatically built when mime-types-data changes?

I'm not sure exactly how it would work, but I think now that this repo is on GitHub Actions, it might be possible to react to a workflow event from the mime-types-data repo, or perhaps less elegantly, just run a scheduled cron on some interval.

I can envision a task that, when run, automatically runs bundle exec rake rebuild_db, and if there are changes creates a pull request.
