theiconic / name-parser
A universal, language-independent name parser PHP library
License: MIT License
If you pass a name like 'Ren\xE9e Osborne' (a byte sequence that is not valid UTF-8), parse() accepts the input but triggers an Error:
Error: Return value of TheIconic\NameParser\Part\AbstractPart::camelcase() must be of the type string, null returned
To prevent this I'm using ...->parse(utf8_encode($nameString)); but you should probably cover this issue in your repo.
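A minimal sketch of the same workaround without utf8_encode(), which is deprecated as of PHP 8.2. The helper name toUtf8() is hypothetical and not part of name-parser, and the sketch assumes that any input that is not valid UTF-8 is ISO-8859-1 and that the mbstring extension is available:

```php
<?php
// Hypothetical helper, not part of name-parser: make sure a string is valid
// UTF-8 before handing it to the parser.
// Assumption: input that is not valid UTF-8 is encoded as ISO-8859-1.
function toUtf8(string $input): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input; // already valid UTF-8, leave untouched
    }

    return mb_convert_encoding($input, 'UTF-8', 'ISO-8859-1');
}

$clean = toUtf8("Ren\xE9e Osborne"); // \xE9 is "é" in ISO-8859-1
```

The result could then be passed to parse() in place of the raw string. Checking first avoids double-encoding input that is already UTF-8, which a blind utf8_encode() call would do.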
This library is amazing! Thanks so much for providing it!
It's the best one I've found.
I was wondering if there is an easy way to initialize the library with configurable settings, such as if I want to change the salutations that I see in https://github.com/theiconic/name-parser/blob/master/src/Language/English.php
E.g. I may want to add "Prof." as a salutation.
Or maybe there are occasions where I'd want to customize the other constant arrays there.
Other than forking your repo and editing the code, I haven't figured it out yet.
Thanks again!
Following up on this question, I would really appreciate a feature that covers my needs.
My idea is that a "vCard function" returns an array whose keys are vCard property names and whose values are the corresponding name (parts).
Steps:
Here is my attempt from last night (draft):
/**
 * Get an array of name properties with the vCard property as key.
 *
 * @param string $realName
 * @return array
 */
public function getNameProperties(string $realName)
{
    $nameParts = explode(',', $realName); // "lastname, firstname"
    if (count($nameParts) == 2) { // it's a person
        $nameParts = $this->parser->parse($realName);
        $salutation = $nameParts->getSalutation();
        $firstName = $nameParts->getFirstname();
        $lastName = $nameParts->getLastname();
        $middleName = $nameParts->getMiddlename();
        $nickName = $nameParts->getNickname();
        $initials = $nameParts->getInitials();
        $suffix = $nameParts->getSuffix();
        // vCard "additional names": middle name and/or initials
        if (!empty($middleName) && empty($initials)) {
            $additionalName = $middleName;
        } elseif (empty($middleName) && !empty($initials)) {
            $additionalName = $initials;
        } elseif (empty($middleName) && empty($initials)) {
            $additionalName = '';
        } else {
            $additionalName = implode(',', [$middleName, $initials]);
        }
        $names = implode(';', [$lastName, $firstName, $additionalName, $salutation, $suffix]);
        $fullName = implode(' ', [$salutation, $firstName, $additionalName, $lastName]);
        if (!empty($suffix)) {
            $fullName .= ', ' . $suffix;
        }
        // Collapse repeated whitespace and trim the ends (empty parts leave gaps)
        $fullName = trim(preg_replace('/\s+/', ' ', $fullName));
        $company = '';
    } else { // it's a company
        $names = '';
        $nickName = '';
        $fullName = $realName;
        $company = $realName;
    }
    return [
        'N' => $names,
        'FN' => $fullName,
        'NICKNAME' => $nickName,
        'ORG' => $company,
    ];
}
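One detail the draft above does not handle: in vCard (RFC 6350) text values, backslashes, commas, semicolons and newlines are escaped, so a name part containing ";" would otherwise shift the N components. A small sketch of that escaping follows. The function names vcardEscape() and buildN() are hypothetical and not part of name-parser:

```php
<?php
// Hypothetical helpers, not part of name-parser.

// Escape one vCard text component. strtr() with an array does a single pass,
// so backslashes added here are not escaped a second time.
function vcardEscape(string $value): string
{
    return strtr($value, [
        '\\' => '\\\\',
        ','  => '\\,',
        ';'  => '\\;',
        "\n" => '\\n',
    ]);
}

// Build the N value: family;given;additional;prefixes;suffixes
function buildN(array $components): string
{
    return implode(';', array_map('vcardEscape', $components));
}

echo buildN(['Smith', 'John', 'Eric', '', '']), PHP_EOL;
```

Note that this sketch escapes every comma, so a component that is meant to carry several values (such as middle name plus initials) would need its values escaped individually and then joined with an unescaped comma.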
The test case:
[
    'Smith, John Eric',
    [
        'lastname' => 'Smith',
        'firstname' => 'John',
        'middlename' => 'Eric',
    ]
]
It doesn't parse Eric as a part; it gets skipped.
I tried making a fix, but gave up after a bit. First, I noticed that the count of parts in MiddlenameMapper is 2, i.e. ["John", "Eric"], so the first condition there makes it quit out.
I changed that check to < 2 to get further. Next, it fails in the mapFrom loop because $k = $start; $k < $length - 1; $k++ becomes $k = 0; 0 < 1, so it only does one iteration and misses the Eric part.
I tried to make that loop not miss the last part, but obviously that would make it fail for every other test case, so I gave up there.
Any idea how this case could be supported? My initial thought is that the fact that the lastname was already parsed should be passed in to MiddlenameMapper, so it knows not to skip the last part, but I'm not sure how that information should be passed down the line.
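The loop-bound problem described above can be illustrated in isolation. This is only a sketch of the idea, not the library's MiddlenameMapper: the function candidateIndexes() and the $lastnameAlreadyMapped flag are hypothetical names introduced here.

```php
<?php
// Illustrative only: mimics the loop bounds described in this issue.
// $lastnameAlreadyMapped is a hypothetical flag, not an existing parameter.
function candidateIndexes(int $start, int $length, bool $lastnameAlreadyMapped = false): array
{
    // Reserve the final part for the lastname unless one was already found.
    $end = $lastnameAlreadyMapped ? $length : $length - 1;

    $indexes = [];
    for ($k = $start; $k < $end; $k++) {
        $indexes[] = $k;
    }

    return $indexes;
}

print_r(candidateIndexes(0, 2));       // visits index 0 only
print_r(candidateIndexes(0, 2, true)); // visits indexes 0 and 1
```

With the flag defaulting to false, the existing behaviour (and therefore the other test cases) would be unaffected. How the flag would actually reach the mapper is the open question raised above.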
For the following book the name gets parsed incorrectly:
https://www.bol.com/nl/p/dans/9200000105098508
Etje Heijdanus-De Boer
Etje
Heijdanus-De Boer
Etje Heijdanus-De
Boer
The following name: "PAUL M LEWIS MR"
returns the following:
Array
(
    [0] => PAUL
    [1] => TheIconic\NameParser\Part\Initial Object
        (
            [value:protected] => M
        )
    [2] => LEWIS
    [3] => TheIconic\NameParser\Part\Salutation Object
        (
            [normalized:protected] => Mr.
            [value:protected] => MR
        )
)
This breaks parsing: getFirstname and getLastname return nothing.
The issue seems to be the following line:
https://github.com/theiconic/name-parser/blob/master/src/Language/English.php#L41
This is probably going to happen with any of the salutations if they come at the end.
Edit:
Another Example:
"SUJAN MASTER"
"JAMES J MA"
"PETER K MA"
The Iconic parser correctly parses a name in the format (name surname), for example:
Giulio Di Marco -> name = Giulio, surname = Di Marco
But when the order of name and surname is inverted (surname name), as in:
Di Marco Giulio -> name = Di, surname = Marco Giulio
The problem seems to be the prefix in the surname.
Any fix?
I wanted to use this library for cleanup of publication authors and map them to a common format like
$name->getLastname() . ', ' . $name->getFirstname() . ' ' . $name->getMiddlename() . ' ' . $name->getInitials()
This works flawlessly for nearly every name. But as my second given name is my primary one, I often don't spell out my first one, like:
Schuler, J. Peter M.
which results in
Array
(
    [firstname] => Peter
    [initials] => J. M.
    [lastname] => Schuler
)
and thus I can't retrieve the correct order of names.
I understand that this is quite a special case, as Germany is one of the few countries where multiple firstnames are possible as well as middle names, and where firstnames can be used. Additionally, up until a few years ago there was a concept of a primary firstname, which in my case is/was my second one, hence that way of writing.
Currently I don't understand all of the parsing, but I might supply a pull request if I get this to work while still fulfilling the other test cases.
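For reference, the target format mentioned in this issue can be assembled so that empty parts do not leave stray spaces. This is a sketch with a hypothetical helper, formatAuthor(), that takes the already-parsed parts as plain strings. It does not solve the ordering problem: it can only emit the parts in the fixed order firstname, middlename, initials.

```php
<?php
// Hypothetical helper: build "Lastname, Firstname Middlename Initials"
// from already-parsed parts, skipping parts that are empty.
function formatAuthor(string $lastname, string $firstname, string $middlename = '', string $initials = ''): string
{
    // Compare against '' explicitly; array_filter() without a callback
    // would also drop the string "0".
    $given = array_filter(
        [$firstname, $middlename, $initials],
        static fn (string $part): bool => $part !== ''
    );
    $givenPart = implode(' ', $given);

    return $givenPart === '' ? $lastname : $lastname . ', ' . $givenPart;
}

echo formatAuthor('Schuler', 'Peter', '', 'J. M.'), PHP_EOL;
```

With the parts from the output above this prints "Schuler, Peter J. M.", which shows why the original position of the leading "J." cannot be recovered from the parsed parts alone.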
Input: Vincent Van Gogh
Expected: "Vincent", "van Gogh"
Actual: "Vincent", "van Gogh" ✅
Input: Mr Vincent Van Gogh
Expected: "Mr", "Vincent", "van Gogh"
Actual: "Mr.", "Vincent", "van Gogh" ✅
Input: Mr Van Gogh
Expected: "Mr", "van Gogh"
Actual: "Mr.", "Van", "Gogh" ❌
I would have expected this to be detected as a salutation and lastname.
If the nickname is after the complete name, the last name is marked as the middle name (and no last name is marked).
Charles Dixon (20th century) is parsed as:
TheIconic\NameParser\Name Object
(
    [parts:protected] => Array
        (
            [0] => TheIconic\NameParser\Part\Firstname Object
                (
                    [value:protected] => Charles
                )
            [1] => TheIconic\NameParser\Part\Middlename Object
                (
                    [value:protected] => Dixon
                )
            [2] => TheIconic\NameParser\Part\Nickname Object
                (
                    [value:protected] => 20th
                )
            [3] => TheIconic\NameParser\Part\Nickname Object
                (
                    [value:protected] => century
                )
        )
)
Found some bugs having to do with capitalized names and multiple middle names. Sharing some examples here, it seems the initials get incorporated somehow.
Names for testing:
The last example should be listed as "Last, First" so "Gabriel" should just be the first name and "Garcia Marques" should be listed as the last name.
Output:
(
    [firstname] => Sofia
    [middlename] => Garcia
    [initials] => D E L A
    [lastname] => Mancha
)
(
    [firstname] => D
    [initials] => A
    [lastname] => Lat
)
(
    [firstname] => Juanita
    [middlename] => Maria
    [initials] => D E
    [lastname] => Sur
)
(
    [firstname] => Garcia Gabriel
    [lastname] => Marques
)
<?php
require_once __DIR__ . '/vendor/autoload.php';

$parser = new TheIconic\NameParser\Parser();
$namesToTest = array(
    'SOFIA GARCIA DE LA MANCHA',
    'DA LAT',
    'JUANITA MARIA DE SUR',
    'Garcia Marques, Gabriel',
);

foreach ( $namesToTest as $input ) {
    $name = $parser->parse( $input );
    echo $name->getSalutation();
    echo $name->getFirstname();
    echo $name->getLastname();
    echo $name->getMiddlename();
    echo $name->getNickname();
    echo $name->getInitials();
    echo $name->getSuffix();
    print_r( $name->getAll() );
    echo $name;
}
While adding the Dutch language, the "van ’t" last name prefix could not be detected.
Adding it to a new language's suffixes does not detect it either, so this has something to do with the parsing internals.
I added a test in draft PR #35.
Expected: "Charlotte", "van ’t", "Wout"
Actual: "Charlotte Van ’T", "Wout"
There are quite a few extra salutations that could be added (such as Lord, Lady, Dame)
Thank you for the v1.2.10 release 🎉
Packagist seems out of sync; v1.2.8 is the last version available.
Not sure if it has anything to do with the GitHub webhook outage of last week, because v1.2.9 also did not come through.
https://packagist.org/packages/theiconic/name-parser
Can you update / sync the repo?
cc @Seldaek 😉
Thanks for your work on this library - much appreciated.
We had a failure today for "yumeng du" as a name.
Looking at the parts array, the 'du' is not ascribed to a part.
I'm guessing that this is because it's one of the recognised prefixes?
I really like how the name-parser can recognise the lastname prefix correctly. Now I would like to be able to get this prefix and the lastname separately.
This could be a setting, maybe related to the language setting: in Dutch the lastname prefix is a separate part of the name, and sorting, for example, goes by last name without the prefix.
Example:
Frank van Delft (Dutch name), is recognised as:
TheIconic\NameParser\Name Object
(
    [parts:protected] => Array
        (
            [0] => TheIconic\NameParser\Part\Firstname Object
                (
                    [value:protected] => Frank
                )
            [1] => TheIconic\NameParser\Part\LastnamePrefix Object
                (
                    [normalized:protected] => van
                    [value:protected] => Van
                )
            [2] => TheIconic\NameParser\Part\Lastname Object
                (
                    [value:protected] => Delft
                )
        )
)
Doing:
echo $name->getFirstname() . "\n";
echo $name->getLastnamePrefix() . "\n";
echo $name->getLastname() . "\n";
Gives:
Frank
van
Delft
The current implementation does not support salutations of more than one word.
Air commodore is an English salutation that consists of more than one word.
What would it take to support more complex salutations?
We find this library useful but there are some things we would like to fix / fixes by others we would like to use. These are mostly in the open PR queue but it looks like the last commit was in 2019.
Are you still maintaining this? Interested in co-maintainers? Happy for this to be forked?
I see support for German, is there anyone working on adding Spanish? Often there are multiple surnames:
Juan de Jesús López Ortíz
GIVENNAME: "Juan"
SURNAME: "de Jesús"
SURNAME: "López"
SURNAME: "Ortíz"
José Guadalupe Jiménez Montoya
GIVENNAME: "José"
GIVENNAME: "Guadalupe"
SURNAME: "Jiménez"
SURNAME: "Montoya"
I'm not sure if this library can support that, but figured I'd ask.
Getting the first name of [A'a Cook] returns A'A. The correct first name is A'a.
Hello there,
I have a name like this:
Nguyễn Quốc Thái
After parsing, the full name I get is "NguyễN QuốC TháI"; notice how the letter case is messed up.
One more thing: Vietnamese names are written as Lastname Middlename Firstname. How should this be handled?
name-parser/src/Part/AbstractPart.php
Line 84 in faaa310
If the mbstring extension is loaded, the following could be used:
return mb_convert_case($matches[0], MB_CASE_TITLE, "UTF-8");
It should be mentioned in the documentation that the program expects to work with UTF-8 strings.
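A small sketch of the suggestion above, using only the built-in mb_convert_case() with an explicit UTF-8 encoding. The wrapper name titleCasePart() is hypothetical and not part of the library, and the fallback branch is only reliable for ASCII input:

```php
<?php
// Hypothetical helper: multibyte-safe title-casing of one name part.
// Assumes UTF-8 input.
function titleCasePart(string $part): string
{
    if (extension_loaded('mbstring')) {
        return mb_convert_case($part, MB_CASE_TITLE, 'UTF-8');
    }

    // Byte-based fallback, only reliable for ASCII input.
    return ucfirst(strtolower($part));
}

echo titleCasePart('NGUYỄN'), ' ', titleCasePart('quốc'), ' ', titleCasePart('THÁI'), PHP_EOL;
```

With mbstring loaded, accented letters are case-converted as whole characters instead of byte by byte.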
Test case:
One common (if slightly archaic) way of listing names is as follows:
Lastname, Firstname (optional middle initial or name), Suffix
for example:
Tiptree, James, Jr.
which the parser parses as:
[
    lastname => string (13) "Tiptree James"
    suffix => string (2) "Jr"
]
and
Miller, Walter M., Jr.
which the parser parses as:
[
    firstname => string (6) "Walter"
    lastname => string (9) "Miller M."
    suffix => string (2) "Jr"
]
Interestingly, if you remove the second comma, the names behave differently.
Tiptree, James Jr. still fails (it drops the suffix):
[
    firstname => string (5) "James"
    lastname => string (7) "Tiptree"
]
Miller, Walter M. Jr. correctly parses into firstname/initial/lastname/suffix, as shown:
[
    firstname => string (6) "Walter"
    lastname => string (6) "Miller"
    initials => string (2) "M."
    suffix => string (2) "Jr"
]
So, definitely a few problems to fix here!