luzkan / smells Goto Github PK

📗 The Code Smells Catalog

Home Page: https://luzkan.github.io/smells/

License: MIT License

JavaScript 13.64% Python 4.86% TypeScript 81.51%

catalog code-smell code-smell-detection code-smells information learn-to-code learning-resources refactoring resource webpage

smells's Introduction

Bad Code Smells

A Comprehensive Online Catalog

Visit the Website · Preprint · Springer Paper · The Thesis

What's this?
How can I use this?
- Knowledge Browsing
- Data Extraction
Who is the beneficent?
How can I contribute?

What's this?

This repository contains the source of the Code Smell Catalog website that contains the current list of smells along with their:

attributes
categories
types
relationships between them
problems which they cause
refactoring methods
history
examples

How can I use this?

Website

Feel free to visit the website and browse around!

Data

If you would like to scrape the data, clone the repository and run python /data_scraper/main.py.

Who is beneficent?

Everyone

New Programmers

New developers can browse the code smell list in a nice, readable form of articles and read about them to get a good intuition of what might be a bad practice or what they should watch out for.

They can find the descriptions of smells, their potential causation example, and table-formatted, higher-abstraction attributes about the particular code smell (like whether it is a smell that happens within a class or between classes). On top of that - the majority of code smells have examples that are often very significant when one is learning about a new thing.

Developers

It's much easier to handle a code review discussion when someone can place a link directly to the source of his concerns. This could benefit and accelerate the understanding of code smell among developers.

A large proportion of developers may even intuitively know about most of these things without knowing about the issue itself as a named phenomenon. This, again, can improve the overall skills of developers.

Researchers

The data and information about smells are scattered around and it's hard to collect every single smell to holistically execute research on them. Currently, as of 2022, the researched data about different smells is drastically disproportionate. Some Code Smells are almost always taken into account, some rarely, and some are not covered by the research at all - either because they were lost in the information noise or because they never occurred with the appropriate keyword.

This catalog is designed to unify the available data, and standardize the nomenclature (synonyms) and the different perspectives (taxonomies) from which this issue can be examined.

How can I contribute?

If you would like to contribute, you are more than welcome by opening a new discussion in the issues or directly adding changes by opening new merge requests. I suspect there might be some discussions going (I am deeply convinced that in such a huge pile of stuff, I had to make mistakes, even just statistically speaking). 🐈

This is supposed to be as easy as possible for everyone to contribute from the theoretic side - no need to know any programming languages, as the contents of the website can be managed by markdown-like files in the content directory. The content is in a standard markdown format and the key data in the markdown file header in YAML format.

smells's People

Contributors

Stargazers

Watchers

Forkers

calypsow777 rruiter87 splitdiff socialsky-io azeric zeeneddie edwardhoward gabssnake mishaevtikhiev evertonguimaraes fc122 vanzaj

smells's Issues

Idea: comment section

Some code smells is a good ground for discussion or questions and would be great to have comment section on each page

Python code examples don't follow PEP8

Over code examples I see that Python syntax is being used. And also I see the "Python" tag provided by the repository as one of the languages of the project. And since the project is about differing good from bad - I propose that ALL of the Python code examples should follow the PEP8 guidelines.

For example, in the fallacious method name file function names use camelCase. Like "getFoos". It is against the PEP8 functions naming convention:

Function names should be lowercase, with words separated by underscores as necessary to improve readability.

So function names should be represented like "get_foos", using snake_case.

P.S. and may be it would be better to highlight Python as examples language explicitly?

missing reference 35 in global data smell

https://luzkan.github.io/smells/global-data

Side bar remains open

Problem

Side bar remains open and hinders reading.

System 64-bit

Ubuntu 20.04.4
Firefox 103.0
Chrome 103.0.5060.134

Proposed solution

Add a button to collapse the sidebar.

disagreements on "what comments" and factoring out methods

https://luzkan.github.io/smells/what-comment

I don't think the given code example is that bad or that the suggested replacement is an improvement. This will mostly be about the "factor out methods" part and less so about the "what comments" part.

The original code could be read from top to bottom, in order. The replacement code is spread out across the screen in non-linear order, which harms legibility.
A comment can be a grammatical English sentence that explains something. A function/method name must be fairly short and has a limited character set. By turning comments into function names, we are losing our power to express ourselves in natural language.
It may be argued that a comment may go out of sync with the code it describes. But why does this argument not also apply to the name of the function wrt its body?
In the original function, all the report generating/tweaking etc was an implementation detail of the run method, invisible to the outside world. In the replacement, create_report and send_report are now public. This means other people may use those methods independently of run. So, in the first case, we could refactor run with impunity, but in the latter case, we have to maintain backwards compatibility for create_report. Moving internal functionality out into separate methods is not without cost, as it increases the "API surface area". Even if you name them with an underscore prefix _create_report, that's no guarantee someone won't use them anyway, since Python has no real notion of public vs private methods. See also Hyrum's Law.
The example obscures likely complexity behind the ... in the parameter lists. What often happens in real code is that these intermediate stages require a large amount of state to do their job. They can depend on a lot of variables that are defined and mutated at various points in the run method. When you factor out these stages into separate functions, you somehow need to make that state accessible to the new functions. There are three ways of doing this, all of them bad:
- Global variables. These are obviously bad so I won't spend more time on them
- Really long parameter lists: def create_report(self, spam, eggs, ham, apple, banana, orange, grapefruit, ...). These things are unwieldy, and a sign that the function really shouldn't be a function.
  - You can paper it over by bundling up all the variables into a dictionary//dataclass and passing in that "parameter object" as a single argument. But a) you still need to somehow document what keys/fields are on the parameter object, and b) now you need to construct the parameter object at the callsite and destructure it again in the pulled-out method. All this added bureaucracy is incidental complexity, which reduces the signal-to-noise ratio of our code.
- Turn the variables into attributes on self and share the state via the class instance, i.e. self.spam, self.eggs, and so on. This is only slightly better than global variables -- it's still shared mutable state and all the headaches that implies. And again, people might end up relying on those instance attributes in downstream code, or elsewhere in your class. State that was once localized in both space and time, risks becoming entangled across the entire class or wider codebase.

I've worked on codebases where people did this pervasively. Perfectly readable-from-top-to-bottom code was chopped up and scrambled into an indecipherable labyrinth of methods, methods which otherwise served no standalone purpose. The motivation was only that the original method body exceeded some arbitrary length. I came to think of it as an antipattern.

Fundamentally, the underlying fallacy is in not understanding what a function is for. In Python, at least, the syntax:

def some_function(a, b):
    ... # blah blah blah
    return result

Does three things:

it gives a name to a block of code
it lets us reuse that block of code
it introduces a new scope for the duration of that block (and pass in arguments and return a value from it)

However, we do not always need all three things. The Principle of Least Power suggests that we should use the least powerful language construct for our purposes. For example, often we want 2 and 3, but not 1, i.e. an anonymous function. The lambda syntax lets us define anonymous functions in that case, e.g. for making short inline callbacks passed into another function.

In code like the # Creating report section of the original run method, it's very common that we don't have any particular desire to reuse it, because it's some highly specific non-generalizable thing from the middle of some complicated piece of work. In fact it can be harmful to allow reuse, for the reasons given earlier. (Think of it as a kind of information hiding / implementation hiding). Furthermore, it's not clear that naming the code via the function name gives any benefit over describing it in English in a comment, given that we're not using it in more than one place. So, one reason for using a function is ruled out, and the other is equivocal.

It is the third reason, introducing a new scope, that is most attractive. Scopes are a powerful tool for managing state. Any new variables defined in a function body are local to that function, and are destroyed and garbage collected when the function exits, unless explicitly returned to the caller. They also reduce cognitive load -- nobody has to worry that the banana variable defined in function A has anything to do with the banana in function B.

This reasoning suggests the following course of action:

Instead of factoring out the sections of run into public methods, make them into nested functions of run:

class Foo:
    def run(self, ...):
        ...

        def make_report():
            vanilla_report = get_vanilla_report(...)
            tweaked_report = tweaking_report(vanilla_report)
            final_report = format_report(tweaked_report)
            return final_report

        report = make_report()

        def send_report():
            send_report_to_headquarters_via_email(final_report)
            send_report_to_developers_via_chat(final_report)

        send_report()

Note the following features of this code:

You can read it in order, from top to bottom
The nested functions do not need to take any parameters, because their bodies see everything of the enclosing scope (i.e. if run defined a variable a, then make_report can access a). This means there is no trouble with enormous parameter lists or implicit state on self ...
But the vanilla_report and tweaked_report variables are constrained to only exist within the body of make_report. There's no possibility that tweaked_report gets randomly used again 200 lines later on in run. This means, as soon as our eye leaves the body of the nested function, we can safely forget about those variables. This frees up mental capacity to focus on the next thing. Remember, we can only keep 7+/-2 things in mind at once! We reduce cognitive load by limiting the amount of stuff existing in the run namespace.
Nobody will improperly depend on make_report or send_report because they can't be accessed from outside run ...
But we have not lost any flexibility, because we still have the option to factor out make_report, send_report into their own methods if that does turn out to be desirable in the future.

But the proposal is still deficient, because of the redundancy:

def make_report():
    ...
report = make_report()

We are using a function for something that is only ever used once (lexically), which gives rise to the awkward pattern where we define a function then immediately invoke it to assign the result to a variable, both of which have slightly different names: def make_banana(): ...; banana = make_banana(). There is still a tension from using a construct which can potentially be reused, but in fact only using it once. This suggests we use a weaker construct, but what?

We could instead name the function the same as the eventual variable that takes on its return value:

def report():
    ...
report = report()

And this turns out to be a special case of the decorator pattern! i.e. whenever we have

def some_function(...):
    ...

some_function = something(some_function)

We can use the @ syntactic sugar to shorten it:

@something
def some_function(...):
    ...

In our case, we can use this seemingly trivial decorator:

def assign(f):
    return f()

And use it like this:

@assign
def report():
    vanilla_report = get_vanilla_report(...)
    tweaked_report = tweaking_report(vanilla_report)
    final_report = format_report(tweaked_report)
    return final_report

What this does is define the report function, immediately calls it, then assigns the result to a variable also named report. This means the original function is no longer accessible, i.e. it's a function that can only be called once! Now there is no redundancy.

I called it assign because you can think of this decorator in another way -- instead of a weakening of a function, it's an expansion of the assignment statement =. It's as if we could write:

report =
    vanilla_report = get_vanilla_report(...)
    tweaked_report = tweaking_report(vanilla_report)
    final_report = format_report(tweaked_report)
    return final_report

i.e. assignment over multiple lines, with its own scope, that we can early-return from if we want.

We can do something similar for the sending report part, although in that case there's no meaningful return value, so assign is a misnomer. We can make another, even weaker decorator:

def do(f):
    f()

@do
def send_report():
    send_report_to_headquarters_via_email(report)
    send_report_to_developers_via_chat(report)

This makes it clear that we don't care about the return value (the variable send_report will end up with the value None). If send_report needed to do some intermediate computations, it can use local variables to keep track of those and they won't persist afterwards.

The final code is:

class Foo:
    def run(self, ...):
        ...

        # Detailed "what" comment here
        @assign
        def report():
            vanilla_report = get_vanilla_report(...)
            tweaked_report = tweaking_report(vanilla_report)
            final_report = format_report(tweaked_report)
            return final_report

        # Detailed "what" comment here
        @do
        def send_report():
            send_report_to_headquarters_via_email(report)
            send_report_to_developers_via_chat(report)

And it's perfectly fine to include a long "what" comment here, if it is necessary. Comments and function/variable names are both for human beings, not the computer. Or if you still find long "what" comments offensive, you can format it as a docstring for the nested function instead 🙂

Obviously, if we really do need to use a piece of code more than once, and/or we want it to be accessible from the outside (e.g. in tests), then by all means we should factor these parts out into public methods/functions. My argument is that we should think carefully before doing so, and deliberately opt-in to doing it, because there are costs as well as benefits.

To conclude: the advice to "factor out long functions into separate methods" misdiagnoses the problem as one of function length, when really the problem is how much local state the function encompasses. They are correlated but they're not the same. State is the thing that's hard to reason about. We manage state by partitioning it into smaller scopes. In that sense both approaches are the same. But in the approach I've given, the order of execution remains the order things appear on the screen, we don't introduce more publicly-visible methods than we really need. For the kinds of "we need to do a long laundry list of stuff" type functions (there are more of these than the "Clean Code" people want to admit), this way of organizing code is much better in my experience.

I didn't come up with all this myself, I got a lot of the ideas from the video essay Object Oriented Programming Is Bad. The relevant part starts at about 37:14.

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.