Comments (14)

azat-d commented on May 22, 2024

Also, there are some overlapping identities between FaceScrub and MS1M. I downloaded the mapping between MIDs and real names from Freebase. Please check the attachment:
mid_to_name.txt.zip
UPD: Aaron Eckhart has the identifier m.03t4cz. This person is present in both the test and the training sets. Obviously, there are other such persons.
UPD2: m.04wp3s: Sam Rockwell, m.014zfs: Bill Cosby, m.02h3tp: Patrick Swayze, etc. All of these identities are in both the training and the test sets. (I've just checked this manually; I believe the overlap is more than 50%.)

from insightface.

azat-d commented on May 22, 2024

Agreed. But according to my test, there is at least a 67.5% overlap. I don't trust any results that are based on celebrity datasets. The most reliable test is the NIST FRVT, which is free for all researchers.

nttstar commented on May 22, 2024

We're running this experiment, and the results will be available in our paper soon; slightly worse, I think (<0.1).
We have already removed 500+ identities from ms1m by checking the similarity between facescrub and ms1m. Please see src/data/dataset_merge.py if you want to know how we remove overlaps.
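
For readers who want a feel for what a similarity-based overlap check might look like, here is a minimal sketch. It is not the contents of src/data/dataset_merge.py. It assumes each identity already has a list of face embeddings from some recognition model, compares per-identity mean embeddings with cosine similarity, and flags training identities above a cutoff. The function names and the 0.5 threshold are hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def find_overlapping_identities(train, test, threshold=0.5):
    """Flag training identities whose mean embedding is close to a test identity.

    train, test: dict mapping identity id -> list of embedding vectors.
    threshold:   hypothetical cosine-similarity cutoff (an assumption, not a
                 value taken from the thread or the repository).
    Returns a dict: train id -> (best matching test id, similarity).
    """
    test_centers = {tid: mean_vector(embs) for tid, embs in test.items()}
    flagged = {}
    for train_id, embs in train.items():
        center = mean_vector(embs)
        best_id, best_sim = None, -1.0
        for test_id, test_center in test_centers.items():
            sim = cosine_similarity(center, test_center)
            if sim > best_sim:
                best_id, best_sim = test_id, sim
        if best_id is not None and best_sim >= threshold:
            flagged[train_id] = (best_id, best_sim)
    return flagged


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real embeddings.
    train = {"train_a": [[1.0, 0.0], [0.9, 0.1]], "train_b": [[0.0, 1.0]]}
    test = {"test_x": [[1.0, 0.05]]}
    for train_id, (test_id, sim) in find_overlapping_identities(train, test).items():
        print(f"{train_id} looks like {test_id} (cosine similarity {sim:.3f})")
```

Unlike name matching, a check of this kind does not depend on label spelling, but its result depends on the embedding model and on the chosen threshold.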

azat-d commented on May 22, 2024

I just wrote a script that checks for matches between the test persons (the subset of FaceScrub that is used in the MegaFace challenge) and the persons in the training set (your cleaned MS1M list). 54 of the 80 persons are in both the training and the test sets:
Stana_Katic m.0fd6sd
Farrah_Fawcett m.01j851
Sam_Rockwell m.04wp3s
Alec_Baldwin m.018ygt
Christopher_Reeve m.0jrny
James_Remar m.05mlqj
Brendan_Fraser m.0227tr
Brianna_Brown m.0gdvdh
Andrea_Bowen m.05dxl5
Tempestt_Bledsoe m.014yqb
Paul_Bettany m.01chc7
Robert_Redford m.0gs1_
Mark_Wahlberg m.0gy6z9
Sarah_Hyland m.0523pz4
Alley_Mills m.0d_3hq
Kit_Harington m.09v4hnq
Victoria_Justice m.07w71b
Robert_Duvall m.015c4g
Edie_Falco m.01dy7j
Peggy_McCay m.05j0x1
Jeremy_Irons m.016ywr
Rebecca_Budig m.03jtgb
Brad_Garrett m.01rcmg
Bill_Cosby m.014zfs
Christel_Khalil m.0719hb
Lindsay_Hartley m.04w9ky
Joanna_Kerns m.0403xb
Emile_Hirsch m.05mkhs
Christine_Lakin m.06wr68
Marilu_Henner m.02pzx7
James_Marsden m.042ly5
Justin_Timberlake m.0j1yf
Adam_Brody m.0214df
Patrick_Swayze m.02h3tp
John_Malkovich m.017r13
Melina_Kanakaredes m.02pbhg
Nadia_Bjorlin m.04vpr3
Ryan_Phillippe m.01ksr1
Fran_Drescher m.01s3kv
Norman_Reedus m.0bs6hr
Robert_Knepper m.07v7p6
Didi_Conn m.04tvm2
Bobbie_Eakes m.03s_t9
Heath_Ledger m.0237fw
Summer_Glau m.039g0_
Emily_Deschanel m.03vd_l
Orlando_Bloom m.09wj5
Daniel_Day-Lewis m.016yvw
Shia_LaBeouf m.04w391
Kimberlin_Brown m.03ff8f
Adrienne_Barbeau m.01z7nj
Dean_Cain m.02qjj7
Erin_Cummings m.063z0nr
Joaquin_Phoenix m.018db8
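
A check like the one described above could be sketched roughly as follows. This is not the actual script, only an illustration of name matching. It assumes the mapping file holds one `MID<TAB>Name` pair per line (the real layout of mid_to_name.txt is not shown in this thread), and the function names and the `m.hypothetical` entry are made up for the example.

```python
def load_mid_to_name(lines):
    """Parse 'MID<TAB>Name' lines into a dict (assumed file layout)."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        mid, _, name = line.partition("\t")
        if name:
            mapping[mid.strip()] = name.strip()
    return mapping


def normalize(name):
    """Lower-case and unify separators so 'Sam_Rockwell' matches 'Sam Rockwell'."""
    return " ".join(name.replace("_", " ").lower().split())


def find_overlap(test_names, train_mids, mid_to_name):
    """Return sorted (test name, MID) pairs for test names that also appear in training."""
    train_by_name = {}
    for mid in train_mids:
        name = mid_to_name.get(mid)
        if name is not None:
            train_by_name.setdefault(normalize(name), mid)
    return sorted(
        (name, train_by_name[normalize(name)])
        for name in test_names
        if normalize(name) in train_by_name
    )


if __name__ == "__main__":
    mapping = load_mid_to_name([
        "m.04wp3s\tSam Rockwell",
        "m.014zfs\tBill Cosby",
        "m.hypothetical\tSomeone Else",  # made-up entry, for illustration only
    ])
    test_names = ["Sam_Rockwell", "Bill_Cosby", "Not_In_Training"]
    train_mids = ["m.04wp3s", "m.014zfs", "m.hypothetical"]
    overlap = find_overlap(test_names, train_mids, mapping)
    for name, mid in overlap:
        print(name, mid)
    print(f"{len(overlap)}/{len(test_names)} test identities found in training")
    # The figure reported in this thread: 54 of 80 identities.
    print(f"54/80 = {54 / 80:.1%}")
```

Name matching only finds identities whose labels agree after normalization, so the count it produces is a lower bound on the true overlap.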

nttstar commented on May 22, 2024

@azat-d I think it is also very difficult to find ALL overlaps by name matching.

nttstar commented on May 22, 2024

@azat-d I have removed 500+ identities from MS1M by comparing it with the FaceScrub dataset, in order to test on MegaFace. For reference, FaceScrub has only 530 identities in total. I believe our result is quite reliable.

azat-d commented on May 22, 2024

The MegaFace test uses only 80 identities from FaceScrub, and I checked YOUR training list against those identities.

azat-d commented on May 22, 2024

And I found that 54 of the 80 identities are in both the test set and your training set.

azat-d commented on May 22, 2024

I'm talking about this training set: https://pan.baidu.com/s/1eTn6O62

azat-d commented on May 22, 2024

Do you mean that there was additional cleaning of this list?

nttstar commented on May 22, 2024

The 500+ identities were removed from my binary packed dataset, not from this clean list. You can check it in our paper; there's about a 0.3% performance drop (98.3% -> 98.0%).
You need to generate features for all 530 identities if you want to upload the result; the 80-identity subset is required only by set-1.

azat-d commented on May 22, 2024

Ok, thank you!

zhenglaizhang commented on May 22, 2024

Great to hear the results on removing overlapping identities. Thank you, guys. I will also take a look at this and may post an update here if there are any new results.

zhenglaizhang commented on May 22, 2024

Closing, as this has been well discussed here.
