
Comments (5)

romanveaceslav commented on September 23, 2024

To clarify further:
The autonomous_system_organization field in GeoLite2-ASN-Blocks-IPv4.csv and GeoLite2-ASN-Blocks-IPv6.csv is at most 95 characters long, including the enclosing double quotes (") when present. All characters are ASCII, so it is safe to assume the string can be stored without truncation in a 96-byte array, char asfield[96] (95 characters plus the terminating '\0'). I verified this by converting the field to ASCII with iconv without any error, and it is consistent with the MaxMind and RIPE documentation.

According to MaxMind, the CSV files are UTF-8 encoded:
https://dev.maxmind.com/geoip/docs/databases/city-and-country?lang=en

According to RIPE, org-name is ASCII only; see the entry for org-name in
https://apps.db.ripe.net/docs/20.Appendices/01-Appendix-A--Syntax-of-Object-Attributes.html

RIPE does not impose a limit on the length of this field, but MaxMind apparently truncates it to 95 characters, or to 93 characters when it has to be enclosed in double quotes. For example, the org-name for 62.3.160.0 (https://ipinfo.io/AS9112), "Institute of Bioorganic Chemistry Polish Academy of Science, Poznan Supercomputing and Networking Center", is truncated by MaxMind to "Institute of Bioorganic Chemistry Polish Academy of Science, Poznan Supercomputing and Networ", which is 95 characters including the double quotes.
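As a rough sketch of the two checks described above (every byte is ASCII, and the value including any quotes is at most 95 characters so that it plus '\0' fits in 96 bytes), assuming the field value has already been extracted from the CSV line. The names is_ascii, fits_org_name_buf and ORG_NAME_BUF are hypothetical and not part of nfdump:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical buffer size from the discussion: 95 characters + '\0'. */
#define ORG_NAME_BUF 96

/* Returns true if every byte of s is 7-bit ASCII, i.e. the same condition
   under which a UTF-8 -> ASCII conversion with iconv succeeds. */
bool is_ascii(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s > 0x7F)
            return false;
    }
    return true;
}

/* Returns true if s (quotes included, if any) fits in char[ORG_NAME_BUF]
   without truncation: it must be ASCII and at most ORG_NAME_BUF - 1 bytes. */
bool fits_org_name_buf(const char *s)
{
    return is_ascii(s) && strlen(s) <= ORG_NAME_BUF - 1;
}
```

Applied to the truncated MaxMind value quoted above, strlen reports 95 and the check passes; a 96-character value would not fit.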

from nfdump.

romanveaceslav commented on September 23, 2024

My solution, which has been working for me for a couple of weeks, is to replace "char orgName[64];" in struct asV4Node_s and struct asV6Node_s in maxmind.h with
#define orgNameLength 96
char orgName[orgNameLength];

Then create a function csvnsep which, when the field starts with a double quote ("), looks for the terminating combination of a double quote followed by a comma (",).
This function can replace the call to strsep in the functions loadASV4tree and loadASV6tree in maxmind.c, like this:
while ((field = strsep(&l, ",")) != NULL) {
->
while ((field = csvnsep(&l, orgNameLength-1, ',', '"')) != NULL) {

and

            case 2:  // org name
                strncpy(asV4Node.orgName, field, 64);
                asV4Node.orgName[63] = '\0';

->
            case 2:  // org name
                /* VRO: changed to properly process too-long strings and strings containing commas and quotes (") */
                strncpy(asV6Node.orgName, field, orgNameLength);
                asV6Node.orgName[orgNameLength-1] = '\0';
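A minimal, self-contained sketch of the enlarged buffer together with the strncpy-then-terminate pattern used above. struct asNodeSketch and set_org_name are hypothetical stand-ins for asV4Node_s/asV6Node_s and the case 2 branch; the real structs in maxmind.h may have other members:

```c
#include <assert.h>
#include <string.h>

#define orgNameLength 96

/* Hypothetical, trimmed-down node holding only the field discussed here. */
struct asNodeSketch {
    char orgName[orgNameLength];
};

/* Copy a parsed field into node->orgName: strncpy copies at most
   orgNameLength bytes (and does not terminate if the source is that long),
   so the last byte is forced to '\0' afterwards. */
void set_org_name(struct asNodeSketch *node, const char *field)
{
    assert(node != NULL && field != NULL);
    strncpy(node->orgName, field, orgNameLength);
    node->orgName[orgNameLength - 1] = '\0';
}
```

With this pattern, any field of up to 95 characters is stored intact, and anything longer is cut to exactly 95 characters plus '\0'.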

Besides correctly identifying the end of a quoted string, csvnsep also takes a maximum-length parameter, which guarantees that the field is terminated with '\0' at or before that length and that a field starting with a double quote always ends with a double quote, even when truncated.
The function is below; feel free to use it (or not). Note that I have not done professional programming for many years.

#include <string.h>

/*
 * csvnsep: a strsep()-like tokenizer for one CSV line.
 * stringp:          current position in the line; updated to point past the
 *                   returned field, or set to NULL after the last field.
 * max_field_length: truncate the field if it exceeds this value; 0 = no check.
 * delim_char:       field separator, e.g. ','
 * quote_char:       quote character, e.g. '"'
 * A field that starts with quote_char ends at the sequence quote_char followed
 * by delim_char, so delimiters inside the quotes do not split the field.
 */
char *csvnsep(char **stringp, const size_t max_field_length, const char delim_char, const char quote_char)
{
    char *begin, *end, *end_quoted;
    char delim[2];
    char quoted_field_end[3];
    size_t len;

    delim[0] = delim_char;
    delim[1] = '\0';

    quoted_field_end[0] = quote_char;
    quoted_field_end[1] = delim_char;
    quoted_field_end[2] = '\0';

    begin = *stringp;

    if (begin == NULL)
        return NULL;

    if (*begin != quote_char) {
        /* Go the usual strsep way, as in glibc. */
        /* Find the end of the token. */
        end = begin + strcspn(begin, delim);
        if (*end != '\0') {
            /* Terminate the token and set *stringp past the NUL character. */
            *end++ = '\0';
            *stringp = end;
        } else {
            /* No more delimiters; this is the last token. */
            *stringp = NULL;
        }

        /* Check whether the length exceeds max_field_length. */
        if (max_field_length > 0 && strlen(begin) > max_field_length)
            begin[max_field_length] = '\0';
        return begin;
    }

    /* The field begins with a quote char. Search from begin + 1 to handle the
       unlikely case where the field starts with quote_char + delim_char. */
    if ((end_quoted = strstr(begin + 1, quoted_field_end)) != NULL) {
        if (end_quoted[2] == '\0')
            *stringp = NULL;
        else
            *stringp = end_quoted + 2;
        /* Replace the delimiter (comma) with '\0'; the closing quote is kept. */
        end_quoted[1] = '\0';
    } else {
        /* No regular end of field found: either this is the last field or the
           line is malformed; take the rest of the line as the field. */
        *stringp = NULL;
    }

    /* Check whether the length exceeds max_field_length. */
    if (max_field_length > 0 && strlen(begin) > max_field_length)
        begin[max_field_length] = '\0';

    /* Make sure that the field ends with quote_char (restores the closing
       quote if truncation cut it off). */
    len = strlen(begin);
    if (len > 1)
        begin[len - 1] = quote_char;

    return begin;
}
Attachment: csv_simple_parse.zip
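For comparison, a minimal sketch of what plain strsep does with a quoted org name that contains a comma, which is the case csvnsep is meant to handle. The helper split_with_strsep and the sample line (documentation prefix 192.0.2.0/24, documentation ASN 64496, made-up org name) are hypothetical; strsep itself is a BSD/glibc extension rather than ISO C:

```c
#define _DEFAULT_SOURCE /* exposes strsep() in glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split buf on ',' with strsep() and store up to max_fields pointers in
   fields[]; returns the number of fields found. buf must be a writable
   copy of the line because strsep modifies it in place. */
size_t split_with_strsep(char *buf, char *fields[], size_t max_fields)
{
    size_t n = 0;
    char *field;

    assert(fields != NULL || max_fields == 0);
    while (n < max_fields && (field = strsep(&buf, ",")) != NULL)
        fields[n++] = field;
    return n;
}
```

On the line 192.0.2.0/24,64496,"Example Institute, Example Center" this yields four fields instead of three: the org name is split at its comma into "Example Institute (with the opening quote) and  Example Center" (with the closing quote), which is the mis-parse csvnsep avoids.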

phaag commented on September 23, 2024

Thanks for the report. I will use a slightly different approach to fix this; in the end, it should work with a length of 96.

phaag commented on September 23, 2024

Fixed in master repo.

romanveaceslav commented on September 23, 2024

Great, many thanks.
It works as expected: there are no differences between the autonomous_system_organization names produced by geolookup and the originals from MaxMind.

Please keep in mind that MaxMind may in the future add new data fields to the right of the existing ones, as stated on their site:
https://dev.maxmind.com/geoip/docs/databases/asn?lang=en
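One way to stay robust against such additions is to dispatch on the column index and ignore any column beyond the known ones. The sketch below assumes unquoted fields (quoted fields would still need csvnsep or similar); parse_asn_columns and struct asn_record are hypothetical names, not nfdump code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed record; field names are illustrative only. */
struct asn_record {
    unsigned long asn;
    char org[96];
};

/* Parse "network,asn,org[,more...]" in place by column index. Columns after
   the third are skipped, so fields appended on the right do not break
   parsing. Returns 0 on success, -1 if fewer than three columns exist. */
int parse_asn_columns(char *line, struct asn_record *rec)
{
    int col = 0;
    char *p = line;

    assert(line != NULL && rec != NULL);
    while (p != NULL) {
        size_t len = strcspn(p, ",");
        char *next = (p[len] == ',') ? p + len + 1 : NULL;
        p[len] = '\0'; /* terminate the current column */
        switch (col) {
        case 0: /* network: not used in this sketch */
            break;
        case 1:
            rec->asn = strtoul(p, NULL, 10);
            break;
        case 2:
            strncpy(rec->org, p, sizeof rec->org);
            rec->org[sizeof rec->org - 1] = '\0';
            break;
        default: /* unknown trailing column: ignore */
            break;
        }
        col++;
        p = next;
    }
    return col >= 3 ? 0 : -1;
}
```

The same idea applies to the existing switch in loadASV4tree and loadASV6tree: as long as unknown indices fall through to a no-op, extra columns on the right are harmless.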
