Comments (10)

4c0d3r avatar 4c0d3r commented on June 16, 2024

Namer does currently attempt this, but it's ThePornDB's API itself that has problems in some cases. This is a known issue with ThePornDB, but it's still worth tracking.

from namer.

PAHelper2 avatar PAHelper2 commented on June 16, 2024

Okay, here's a nice example:
POVD.19.10.01.Rebel.Lynn.Cleaning.Lady.Creep.XXX.MP4-SDCLiP

It should be dated 19.10.11, but searching for "cleaning lady creep" in the Scene search web UI finds it. Actually, it finds it TWICE, which is weird.

4c0d3r avatar 4c0d3r commented on June 16, 2024

So namer intentionally chooses not to treat a file as a match if it is more than 1 day newer or older than the data in tpdb. This is to prevent false positives. If you have the namer log enabled in your config, it would likely show the correct match at the top, sorted by scene name similarity, but namer errs on the side of safety here. It's intentionally left up to the user to update the file name in these cases, and re-run namer if they want mp4 tags.
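A minimal sketch of that date gate, for illustration only. The function names and the `YY.MM.DD` parsing are assumptions made for this example, not namer's actual code:

```python
from datetime import date


def parse_yy_mm_dd(text: str) -> date:
    """Parse a 'YY.MM.DD' fragment from a file name (assumes years 2000+)."""
    yy, mm, dd = (int(part) for part in text.split("."))
    return date(2000 + yy, mm, dd)


def within_date_tolerance(file_date: date, db_date: date, max_days: int = 1) -> bool:
    """Return True when the two dates differ by at most max_days."""
    return abs((file_date - db_date).days) <= max_days


# The file from this thread is dated 19.10.01, while the scene is 19.10.11.
file_date = parse_yy_mm_dd("19.10.01")
scene_date = parse_yy_mm_dd("19.10.11")
rejected = not within_date_tolerance(file_date, scene_date)
```

With a one-day tolerance, the ten-day gap in the example above is enough to reject the candidate, even when the scene name matches.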

4c0d3r avatar 4c0d3r commented on June 16, 2024

Hmm if this proves to be enough of a problem for certain sites, I could see giving users a site list to ignore date matches.

PAHelper2 avatar PAHelper2 commented on June 16, 2024

I was thinking of something more like a scoring model:

  • matched site +1
  • matched scene name +1
  • matched date +1
  • matched 1 or more performers, +1 per match

And then you need at least a score of 2. Something like that. I know those numbers are not good, but I'm trying to figure out a way where the date could be wrong, but other data you parse is enough to say "ok, the date is wrong, but i'm going to match it to Foo anyways".

But then you'll probably want to write a semi-permanent log that tracks success + score, so you can always go back and see what's happening.
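The scoring idea above can be sketched roughly as follows. This is only an illustration of the proposal, not something namer implements; the `Parsed` type, the exact-match comparisons, and the candidate record are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parsed:
    """Hypothetical container for data parsed from a file name or a database record."""
    site: str
    scene: str
    date: str
    performers: frozenset = frozenset()


def match_score(file_info: Parsed, candidate: Parsed) -> int:
    """+1 for site, +1 for scene name, +1 for date, +1 per shared performer."""
    score = 0
    if file_info.site.lower() == candidate.site.lower():
        score += 1
    if file_info.scene.lower() == candidate.scene.lower():
        score += 1
    if file_info.date == candidate.date:
        score += 1
    shared = {p.lower() for p in file_info.performers} & {p.lower() for p in candidate.performers}
    return score + len(shared)


def is_match(file_info: Parsed, candidate: Parsed, threshold: int = 2) -> bool:
    """Accept the candidate when the score reaches the (arbitrary) threshold."""
    return match_score(file_info, candidate) >= threshold


# Data parsed from the file name in this thread, and a hypothetical database record.
from_file = Parsed("POVD", "Cleaning Lady Creep", "19.10.01", frozenset({"Rebel Lynn"}))
from_db = Parsed("POVD", "Cleaning Lady Creep", "19.10.11", frozenset({"Rebel Lynn"}))
score = match_score(from_file, from_db)  # site + scene + performer, date differs
```

Here the wrong date costs one point, but the site, scene name, and performer still clear the threshold of 2.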

4c0d3r avatar 4c0d3r commented on June 16, 2024

The log in this case can be written next to the file today --- turn on namer log: https://github.com/ThePornDatabase/namer/blob/main/namer.cfg#L50

Today, matching the "scene" is done by comparing against the powerset of the performers and the scene name: a string for each performer, one for each performer combined with every other performer and so on, each with and/or without the scene name. This is because the "scene name" in a file might not be the scene name at all, but just the performers, etc. A tool called RapidFuzz (https://github.com/maxbachmann/RapidFuzz#scorers) then matches the actual file's "scene name" against all those possibilities, and any that match more than 90% are used. This helps with misspellings, extra words, people getting clever with spellings of certain words, etc. A lot of thought has already been put into this. The key, however, is that the match is gated on the date and site matching (the site can have spaces removed, and be a truncation of the actual site), and the date needs to be within 1 day, as stated before. Those checks are done after the fact. I understand it can be frustrating if the scene/site name is an exact match and the date is not and it gets rejected, but that's what the namer.log can help with.
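A rough sketch of the powerset-plus-fuzzy-match idea described above. It is not namer's code: the function names are made up for this example, and Python's standard-library `difflib.SequenceMatcher` stands in for RapidFuzz so the snippet runs without extra dependencies (its scores will differ from RapidFuzz's scorers):

```python
from difflib import SequenceMatcher
from itertools import combinations


def candidate_strings(performers: list, scene_name: str) -> set:
    """Build every non-empty performer combination, with and without the scene name,
    plus the scene name on its own."""
    options = {scene_name}
    for size in range(1, len(performers) + 1):
        for combo in combinations(performers, size):
            joined = " ".join(combo)
            options.add(joined)
            options.add(f"{joined} {scene_name}")
    return options


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (difflib stand-in for RapidFuzz)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(file_scene: str, performers: list, scene_name: str, cutoff: float = 90.0):
    """Return (candidate, score) for the closest candidate above the cutoff, else None."""
    options = candidate_strings(performers, scene_name)
    best = max(options, key=lambda option: similarity(file_scene, option))
    score = similarity(file_scene, best)
    return (best, score) if score > cutoff else None


result = best_match("Rebel Lynn Cleaning Lady Creep", ["Rebel Lynn"], "Cleaning Lady Creep")
```

In this sketch the site and date checks would still run separately afterwards, so a strong name match alone does not make a candidate a match.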

4c0d3r avatar 4c0d3r commented on June 16, 2024

If you read the namer log, it'll help make the algorithm clear.

PAHelper2 avatar PAHelper2 commented on June 16, 2024

Got it! I'll enable and look at the namer log. (I've only been following docker logs -f namer)

And apologies if I'm stating dumb stuff that's already handled much better. I'm just thinking out loud. :)

4c0d3r avatar 4c0d3r commented on June 16, 2024

No worries, this just makes it clear I need to describe the matching algo better in the readme.rst: https://github.com/ThePornDatabase/namer#for-the-curious-how-is-a-match-made. Feel free to take a stab at improving it if you want, once you've read through a few namer.log files (they're short). As the author, it's hard to gauge whether I'm describing it sufficiently.

4c0d3r avatar 4c0d3r commented on June 16, 2024

Supported now:

sites_with_no_date_info =

You have to add each site whose dates should be ignored to this list.
