Giter Club home page Giter Club logo

Comments (7)

siebenschlaefer avatar siebenschlaefer commented on May 26, 2024 3

@bobahop My point is, this solution invokes UB, it might behave differently on different platforms, it might behave differently on multiple executions on the same machine, it might seemingly work and then do something unexpected on the day of an important presentation. With undefined behavior anything can happen (e.g. the often quoted "nasal demons" or "pregnant cats"). And I don't think we can do much against that.

from c.

siebenschlaefer avatar siebenschlaefer commented on May 26, 2024 1

char copyPhrase[strlen(phrase)]; creates a VLA but doesn't initialize it, it's elements have an "indeterminate value" and reading such an indeterminate value invokes undefined behavior. In practice that can mean two things: Either the program will interpret the underlying bytes as an object of that type (here: char), so whatever the values of those bytes are determines the value of the element of the array. Or the optimizer analyzes the code and prunes execution paths with UB, leading to strange bugs that are hard to figure out.
But the inner for loop loops over all strlen(phrase) elements of copyPhrase, invoking UB.
Also, the officially arrays must have a length greater than 0, otherwise you get UB, too. But zero-length arrays are often available as an implementation-specific extension.

You could fix that by handling an empty phrase as a special case and ending the inner loop when j becomes equal to i.


BTW: How the compiler implements a VLA is not specified, a common approach is to do something similar as alloca(), use the call stack. But in practice the space on the stack is limited. If the length of the input is too large you get a stack overflow. IIRC that has been an attack vector for some exploits.

I strongly recommend only to use variable-length arrays if the size is limited and small. For a deeper dive into VLAs you might want to read The (too) many pitfalls of VLA in C by Jakub Łukasiewicz, Legitimate Use of Variable Length Arrays by Chris Wellons, or this rant by Linus Torvalds.


The condition of both for loops call strlen().

Some compilers are clever enough to figure out that the result of that call will be the same throughout the function and actually call strlen() only once. Other compilers might not be that smart and call strlen() for each iteration of the loop (see stackoverflow and underhanded.xcott.com).

You could store the result of strlen() in a variable before the loop. Or you could write that for loop in a slightly different way:

for (size_t i = 0, len = strlen(phrase); i < len; i++)
// or
for (size_t i = 0; phrase[i]; i++)
// or
for (const char *cp = phrase; *cp; cp++)
    // use *cp

The function passes a char to tolower() and isalpha().

Like all function from <ctype.h> they comes with a catch: They take an int and their behavior is undefined if their argument is neither representable by an unsigned char nor equal to EOF. You should never pass a char or a signed char to them but always cast it to unsigned char first (see also cppreference).


The function returns either 1 or 0.

But since the return type is bool I'd suggest using true and false. IMHO that would state the intent a little bit clearer.


The loop variables i and j have the type int, the conditions of the loops cast the result of strlen() to an int.

Officially an int has at least 16 bits, 15 for the value because it's signed. That might be too few for very long strings. IIRC that has been used as an attack vector, where a malicious actor creates a string that is too long for an int and thereby causes an infinite loop or manages to access the memory outside of the array.

You could use size_t instead because that type is guaranteed to be able to represent the size of all objects. That would have the additional benefit that you wouldn't need the casts anymore.


This solution uses two nested loops, the outer loop that iterates over the phrase and for each letter the inner loop determines whether that letter appears anywhere else in the string.

The runtime complexity of that approach is in O(n * n), i.e. it's quadratic. If the phrase is an isogram with n characters then is_isogram() performs between 3 * (n + 1) * n / 2 and 4 * (n + 1) * n / 2 char comparisons.

A faster and IMHO even simpler implementation of is_isogram() is possible by using an additional data structure that "remembers" whether you've already seen a letter, e.g. an array of length 26 where each element stores whether you've seen a letter before.

from c.

ryanplusplus avatar ryanplusplus commented on May 26, 2024 1

@bobahop My point is, this solution invokes UB, it might behave differently on different platforms, it might behave differently on multiple executions on the same machine, it might seemingly work and then do something unexpected on the day of an important presentation. With undefined behavior anything can happen (e.g. the often quoted "nasal demons" or "pregnant cats"). And I don't think we can do much against that.

💯

This can and does differ from compiler to compiler and OS to OS so unless we test every combination in the online editor (not even remotely plausible), we'll have to rely upon mentors to address this.

from c.

bobahop avatar bobahop commented on May 26, 2024

Thanks for the analysis. I'm aware of problems with the solution. My issue is about passing all of the tests online but failing two tests locally. In other words, I wouldn't expect this to pass the online tests, but it did.

from c.

bobahop avatar bobahop commented on May 26, 2024

BTW, for VLAs, they were introduced in C99 but were made optional in C11, so compilers aren't required to support them. For instance, my Microsoft compiler flags them as an error, while my gnu compiler supports them. I advise against them when mentoring.

from c.

wolf99 avatar wolf99 commented on May 26, 2024

Not necessarily directly related to this issue AFAIK, but FYI that MSVC does not fully support C.
Last time I looked MS stated that they support C only so far as they need to to be able to fully support C++.
As such I would generally advise that anyone on a Windows platform prefer a non-MSVC compiler, or a *nix compiler via WSL.

I would be interested to know if MSVC marks VLAs as errors if you instruct it to use a pre-C11 standard.

from c.

bobahop avatar bobahop commented on May 26, 2024

I did some more testing. Locally in four runs I got 4 failures, then 2, then none, then 1. Online I got three successes in a row, and then got 4 failures. So that supports the explanation. Thanks, I'll close this now.

from c.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.