
codeclimate-duplication's Introduction

codeclimate-duplication


codeclimate-duplication is an engine that wraps flay and supports Java, Ruby, Python, JavaScript, and PHP. You can run it on the command line using the Code Climate CLI or on our hosted analysis platform.

What is duplication?

The duplication engine's algorithm can be surprising, but it's actually very simple. We have a docs page explaining the algorithm.
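The docs page itself isn't reproduced here, so the following is only a toy model under stated assumptions. flay works on parse trees (s-expressions). We assume that "mass" is the number of nodes in a subtree and that "similar" means the same tree shape once leaf values such as names and literals are ignored. The helpers (mass, shape, each_subtree, find_duplicates) are hypothetical, and unlike flay this sketch does not prune matches nested inside a larger match.

```ruby
# Toy model of flay-style duplicate detection (hypothetical, not the engine's code).
# Trees are nested arrays whose first element is the node type, e.g.
#   [:call, [:lvar, :x], :+, [:lvar, :y]]

# Assumed mass: the number of nodes (arrays) in a subtree.
def mass(node)
  return 0 unless node.is_a?(Array)
  1 + node.drop(1).sum { |child| mass(child) }
end

# Keep node types but replace every leaf value with :_ so only the shape remains.
def shape(node)
  return :_ unless node.is_a?(Array)
  [node.first] + node.drop(1).map { |child| shape(child) }
end

# Visit every subtree (every array) depth-first.
def each_subtree(node, &block)
  return unless node.is_a?(Array)
  block.call(node)
  node.drop(1).each { |child| each_subtree(child, &block) }
end

# Group subtrees at or above the mass threshold by shape. A group found in
# two or more places is a duplicate, "identical" if every copy is equal.
def find_duplicates(files, mass_threshold:)
  groups = Hash.new { |hash, key| hash[key] = [] }
  files.each do |name, tree|
    each_subtree(tree) do |sub|
      groups[shape(sub)] << [name, sub] if mass(sub) >= mass_threshold
    end
  end
  groups.values.select { |locs| locs.size >= 2 }.map do |locs|
    copies = locs.map(&:last)
    {
      kind: copies.uniq.size == 1 ? :identical : :similar,
      files: locs.map(&:first),
      mass: mass(copies.first)
    }
  end
end

add = [:defn, :add, [:args, :x, :y], [:call, [:lvar, :x], :+, [:lvar, :y]]]
sum = [:defn, :sum, [:args, :a, :b], [:call, [:lvar, :a], :+, [:lvar, :b]]]
# One :similar group of mass 5: same shape, different names.
result = find_duplicates({ "a.rb" => add, "b.rb" => sum }, mass_threshold: 5)
```

Raising mass_threshold above 5 makes the report disappear, and lowering it to 3 also reports the smaller inner call subtree, which mirrors the "raise to see less, lower to see more" advice under Mass Threshold.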

Installation

  1. Install the Code Climate CLI, if you haven't already.
  2. You're ready to analyze! cd into your project's folder and run codeclimate analyze. Duplication analysis is enabled by default, so you don't need to do anything else.

Configuring

Mass Threshold

We set useful threshold defaults for the languages we support, but you may want to adjust these settings based on your project's guidelines.

The mass threshold configuration represents the minimum "mass" a code block must have to be analyzed for duplication. If the engine reports too much duplication, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

To adjust this setting, use the top-level checks key in your config file:

checks:
  identical-code:
    config:
      threshold: 25
  similar-code:
    config:
      threshold: 50

Note that to support extra per-language configuration, the YAML structure under the languages key must be a Hash (each language name mapped to its settings) rather than a list.
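The difference between the two YAML shapes can be seen by loading both with Ruby's standard YAML library. The normalize_languages helper below is hypothetical (not part of the engine), and the mass_threshold value of 20 is only an illustrative number; the point is that the list form has no place to hang per-language keys, while the Hash form does.

```ruby
require "yaml"

# Hypothetical helper (not part of the engine): convert the list form of the
# `languages` key into the Hash form, so each language can carry extra keys.
def normalize_languages(languages)
  case languages
  when Array then languages.to_h { |lang| [lang.to_s, {}] }
  when Hash then languages.transform_values { |settings| settings || {} }
  else {}
  end
end

# List form: language names only, with no room for per-language settings.
list_form = YAML.safe_load(<<~YML)
  languages:
    - ruby
    - javascript
YML

# Hash form: each language maps to its own (possibly empty) settings.
hash_form = YAML.safe_load(<<~YML)
  languages:
    ruby:
      mass_threshold: 20
    javascript:
YML

normalize_languages(list_form["languages"])  # => { "ruby" => {}, "javascript" => {} }
normalize_languages(hash_form["languages"])  # => { "ruby" => { "mass_threshold" => 20 }, "javascript" => {} }
```

An empty entry such as `javascript:` loads as nil, which is why the helper substitutes an empty Hash.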

Count Threshold

By default, the duplication engine reports code that has been duplicated in just two locations. You can be less strict by raising a warning only if code is duplicated in three or more locations. To adjust this setting, add a count_threshold key to your config. For instance, to use the default mass_threshold for Ruby but enforce the Rule of Three, you could use this configuration:

plugins:
  duplication:
    enabled: true
    config:
      languages:
        ruby:
          count_threshold: 3

You can also change the default count_threshold for all languages:

plugins:
  duplication:
    enabled: true
    config:
      count_threshold: 3
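How the two placements might interact can be sketched as a lookup chain. This is an assumption, not the engine's documented behavior: we suppose a per-language count_threshold wins over the plugin-wide one, which in turn wins over the default of 2. The function names are hypothetical.

```ruby
# Hypothetical sketch of resolving count_threshold. Assumed precedence:
# per-language value, then plugin-wide value, then the default of 2
# (the README's "duplicated in just two locations" behavior).
DEFAULT_COUNT_THRESHOLD = 2

def count_threshold_for(config, language)
  languages = config["languages"]
  settings = languages.is_a?(Hash) ? languages[language] : nil
  per_language = settings.is_a?(Hash) ? settings["count_threshold"] : nil
  per_language || config["count_threshold"] || DEFAULT_COUNT_THRESHOLD
end

# Report a duplicate only if it occurs in at least count_threshold locations.
def report?(locations, config, language)
  locations.size >= count_threshold_for(config, language)
end

ruby_only = { "languages" => { "ruby" => { "count_threshold" => 3 } } }
everywhere = { "count_threshold" => 3 }
```

With ruby_only, a pair of Ruby duplicates is suppressed while a JavaScript pair is still reported; with everywhere, every language needs three occurrences.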

Custom file name patterns

All engines check only the appropriate files by default, but you can override the default set of patterns. Patterns are run against the project root directory, so you have to use ** to match files in nested directories. Also note that you have to specify all patterns, not only the ones you want to add.

plugins:
  duplication:
    enabled: true
    config:
      languages:
        ruby:
          patterns:
            - "**/*.rb"
            - "**/*.rake"
            - "Rakefile"
            - "**/*.ruby"
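To see why ** matters, the patterns above can be tried with Ruby's File.fnmatch. This is an approximation: we assume the engine's matching behaves like File.fnmatch with File::FNM_PATHNAME, where * stops at / and a leading **/ matches zero or more directories; the engine's real matcher may differ. RUBY_PATTERNS and included? are illustrative names.

```ruby
# Approximation only: assumes the patterns behave like Ruby's File.fnmatch
# with File::FNM_PATHNAME, where "*" stops at "/" and a leading "**/" matches
# zero or more directories. The engine's real matcher may differ.
RUBY_PATTERNS = ["**/*.rb", "**/*.rake", "Rakefile", "**/*.ruby"].freeze

def included?(path, patterns = RUBY_PATTERNS)
  patterns.any? { |pattern| File.fnmatch(pattern, path, File::FNM_PATHNAME) }
end

included?("app/models/user.rb")   # true: "**/" spans app/models/
included?("Rakefile")             # true: exact match at the project root
included?("lib/tasks/Rakefile")   # false: "Rakefile" has no "**/" prefix
File.fnmatch("*.rb", "app/models/user.rb", File::FNM_PATHNAME)  # false: "*" stops at "/"
```

Under this approximation the bare "Rakefile" entry only picks up the root Rakefile, which is the behavior the "use ** for nested directories" advice is warning about.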

Python 3

By default, the Duplication engine will use a Python 2 parser. To enable analysis for Python 3 code, specify the python_version as shown in the example below. This will enable a Python 3 parser and add the .py3 file extension to the list of included file patterns.

plugins:
  duplication:
    enabled: true
    config:
      languages:
        python:
          python_version: 3

Node Filtering

Sometimes structural similarities are reported that you just don't care about. For example, the contents of arrays or hashes might have similar structures and there's little you can do to refactor them. You can specify language-specific filters to ignore any issues that match a pattern. Here is an example that filters simple hashes and arrays:

plugins:
  duplication:
    enabled: true
    config:
      languages:
        ruby:
          filters:
            - "(hash (lit _) (str _) ___)"
            - "(array (str _) ___)"

The pattern syntax is pretty simple. The first pattern, "(hash (lit _) (str _) ___)", means "a hash with a literal key and a string value, followed by anything else (including nothing)": _ matches any single element and ___ matches the rest of the node. You could also specify "(hash ___)" to ignore all hashes altogether.
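To make the _ and ___ rules concrete, here is a toy matcher over nested arrays that stand in for s-expressions. It is a hypothetical sketch, not the sexp_processor matcher the engine actually relies on, and its parser only handles bare words (no quoted strings or other matcher features).

```ruby
# Toy filter-pattern matcher (hypothetical; the engine itself uses sexp_processor).
# Trees and patterns are nested arrays such as [:hash, [:lit, :a], [:str, "A"]].
#   :_   matches exactly one element (a leaf or a whole subtree)
#   :___ matches the remainder of the current node, including nothing
def match?(pattern, tree)
  return true if pattern == :_
  return pattern == tree unless pattern.is_a?(Array)
  return false unless tree.is_a?(Array)

  pattern.each_with_index do |pat, i|
    return true if pat == :___          # rest of this node is ignored
    return false if i >= tree.size      # tree ran out of elements
    return false unless match?(pat, tree[i])
  end
  pattern.size == tree.size             # no ___, so lengths must agree
end

# Parse "(hash (lit _) (str _) ___)" into [:hash, [:lit, :_], [:str, :_], :___].
# Only bare words are supported; quoted strings such as "=" are not.
def parse_pattern(text)
  parse_tokens(text.scan(/\(|\)|[^\s()]+/))
end

def parse_tokens(tokens)
  token = tokens.shift
  raise ArgumentError, "unexpected end of pattern" if token.nil?
  return token.to_sym unless token == "("

  list = []
  list << parse_tokens(tokens) until tokens.first == ")" || tokens.empty?
  raise ArgumentError, "missing )" if tokens.shift.nil?
  list
end
```

For example, a Ruby hash like `{ a: "A", b: "B" }` would appear as `[:hash, [:lit, :a], [:str, "A"], [:lit, :b], [:str, "B"]]`, which the first filter matches, while a hash with string keys does not.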

Visualizing the Parse Tree

Figuring out what to filter is tricky. codeclimate-duplication comes with a configuration option to help with the discovery. Instead of scanning your code and printing out issues for codeclimate, it prints out the parse-trees instead! Just add dump_ast: true and debug: true to your .codeclimate.yml file:

---
plugins:
  duplication:
    enabled: true
    config:
      dump_ast: true
      debug: true
      ... rest of config ...

Then run codeclimate analyze with the CODECLIMATE_DEBUG environment variable set, so that the output written to stderr is shown:

% CODECLIMATE_DEBUG=1 codeclimate analyze

Running that command might output something like:

Sexps for issues:

# 1) ExpressionStatement#4261258897 mass=128:

# 1.1) bogus-examples.js:5

s(:ExpressionStatement,
 :expression,
 s(:AssignmentExpression,
  :"=",
  :left,
  s(:MemberExpression,
   :object,
   s(:Identifier, :EventBlock),
   :property,
   s(:Identifier, :propTypes)),
   ... LOTS more...)
   ... even more LOTS more...)

This is the internal representation of the actual code. If you've reviewed those issues and decided they aren't something you want to address, you can filter them out by writing a pattern string that matches that tree.

Looking at the tree output again, this time flattening it out:

s(:ExpressionStatement, :expression, s(:AssignmentExpression, :"=",:left, ...) ...)

The internal representation (which is Ruby) is different from the pattern language (which is Lisp-like), so first we need to convert s(: to ( and remove all commas and colons:

(ExpressionStatement expression (AssignmentExpression "=" left ...) ...)
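This first conversion step is mechanical, so it can be scripted. The sketch below works on the printed text rather than on Sexp objects, and sexp_text_to_pattern is a hypothetical name. It also collapses newlines, so it accepts the multi-line dump form shown earlier.

```ruby
# Mechanical version of the conversion step (a sketch on the printed text,
# not on Sexp objects): "s(:" becomes "(", commas become spaces, colons are
# dropped, and whitespace (including newlines) collapses to single spaces.
# Caveat: a colon inside a quoted value, e.g. :"::", would be removed too.
def sexp_text_to_pattern(text)
  text.gsub("s(:", "(").tr(",", " ").delete(":").gsub(/\s+/, " ").strip
end

flat = 's(:ExpressionStatement, :expression, s(:AssignmentExpression, :"=",:left, ...) ...)'
sexp_text_to_pattern(flat)
# => (ExpressionStatement expression (AssignmentExpression "=" left ...) ...)
```

The remaining steps (choosing where to use _ and ___) are judgment calls and still need to be done by hand.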

Next, we don't care about expression, so let's get rid of it by replacing it with _, the matcher for any single element:

(ExpressionStatement _ (AssignmentExpression "=" left ...) ...)

The same goes for "=" and left, but we actually don't care about the rest of the AssignmentExpression node at all, so let's use ___, the matcher that ignores the remainder of a node:

(ExpressionStatement _ (AssignmentExpression ___) ...)

And finally, we don't care about what follows in the ExpressionStatement so let's ignore the rest too:

(ExpressionStatement _ (AssignmentExpression ___) ___)

This reads: "Any ExpressionStatement node, with any value and an AssignmentExpression node with anything in it, followed by anything else". There are other ways to write a pattern to match this tree, but this is pretty clear.

Then you can add that filter to your config:

---
plugins:
  duplication:
    enabled: true
    config:
      dump_ast: true
      languages:
        javascript:
          filters:
          - "(ExpressionStatement _ (AssignmentExpression ___) ___)"

Then rerun the analyzer and figure out what the next filter should be. When you are happy with the results, remove the dump_ast config (or set it to false) to go back to normal analysis.

For more information on pattern matching, see the sexp_processor gem, especially sexp.rb.

codeclimate-duplication's People

Contributors

blakewilliams, brynary, camillof, codeclimate-shipbot, dantevvp, dblandin, devvmh, efueger, fede-moya, filipesperandio, gdiggs, giordanoluzardo, hershamin, jakelishman, jpignata, kankje, larkinscott, mange, maxjacobson, mrb, mxie, noahd1, noelia-lencina, nporteschaikin, pbrisbin, pointlessone, rubybrewsday, toddmazierski, wfleming, zenspider

codeclimate-duplication's Issues

Allow setting separate language specific mass thresholds for identical and similar code

I can set separate check mass thresholds with

checks:
  identical-code:
    config:
      threshold: 25
  similar-code:
    config:
      threshold: 50

but those aren't language specific.

I can set a language-specific mass threshold with

plugins:
  duplication:
    config:
      languages:
        javascript:
          mass_threshold: 50

but that sets the threshold for both identical and similar code.

I want to be able to do something like

plugins:
  duplication:
    config:
      languages:
        javascript:
          identical-code:
            mass_threshold: 25
          similar-code:
            mass_threshold: 50

Ignore files that end with -test.js

Hi,
I'm trying to exclude files that end with -test.js, but it's not working. Example: keyBindigns-test.js.
I should mention that the tests are placed next to the implementation files (there is no separate test folder).

.codeclimate.yml

exclude_paths:
- node_modules/
- test-build/
- webpack/
- vendor/
- src/__mocks__
- build/
- src/custom-config.js
- src/runtime-configs.js
- gulpfile.js
- .idea/
- webpack.config.js
- "**-test.js"

I've tried "**-test.js", but it's not working.

Can you help me, please?

f-strings in python 3.6 cause the check to fail

The duplication check doesn't seem to work with python 3.6 features. Namely formatted string literals don't work for me.

It throws a SyntaxError here. Maybe instead of only being able to specify Python version 2 or 3, we could provide an even more specific version, like 3.6?

Any help would be appreciated,
thanks and regards.

Ruby 2.0 Keyword Arguments

This crashes on any file using Ruby 2.0 keyword arguments.

app/models/question.rb:

class Question < ActiveRecord::Base
  #12 more lines
  flexible_slugs_from :prompt, max_length: 200
end
Skipping file ./app/models/question.rb due to exception
(Racc::ParseError) ./app/models/question.rb:13 :: parse error on value ":" (tCOLON) /opt/jruby/lib/ruby/stdlib/racc/parser.rb:539:in `on_error'
/usr/local/bundle/gems/ruby_parser-3.7.2/lib/ruby_parser_extras.rb:1124:in `on_error'
com/headius/racc/Cparse.java:709:in `_racc_do_parse_c'
/opt/jruby/lib/ruby/stdlib/racc/parser.rb:271:in `do_parse'
/usr/local/bundle/gems/ruby_parser-3.7.2/lib/ruby_parser_extras.rb:1036:in `block in process'
org/jruby/ext/timeout/Timeout.java:115:in `timeout'
/usr/local/bundle/gems/ruby_parser-3.7.2/lib/ruby_parser_extras.rb:1024:in `process'
/usr/local/bundle/gems/ruby_parser-3.7.2/lib/ruby_parser_extras.rb:1362:in `block in process'
org/jruby/RubyArray.java:1560:in `each'
/usr/local/bundle/gems/ruby_parser-3.7.2/lib/ruby_parser_extras.rb:1360:in `process'
/usr/src/app/lib/cc/engine/analyzers/ruby/main.rb:27:in `process_file'
/usr/src/app/lib/cc/engine/analyzers/analyzer_base.rb:10:in `run'
/usr/src/app/lib/cc/engine/analyzers/reporter.rb:30:in `block in process_files'
/usr/src/app/lib/cc/engine/analyzers/file_thread_pool.rb:22:in `block in run'
Skipping file ./app/models/concerns/report_validations.rb due to exception
(RubyParser::SyntaxError) Odd number (2) list for Hash. s(:array, s(:call, nil, :action)) /usr/local/bundle/gems/ruby_parser-3.7.2/lib/ruby_parser_extras.rb:128:in `syntax_error'
/usr/local/bundle/gems/ruby_parser-3.7.2/lib/ruby18_parser.rb:5688:in `_reduce_492'
com/headius/racc/Cparse.java:709:in `_racc_do_parse_c'
/opt/jruby/lib/ruby/stdlib/racc/parser.rb:271:in `do_parse'
/usr/local/bundle/gems/ruby_parser-3.7.2/lib/ruby_parser_extras.rb:1036:in `block in process'
org/jruby/ext/timeout/Timeout.java:115:in `timeout'
/usr/local/bundle/gems/ruby_parser-3.7.2/lib/ruby_parser_extras.rb:1024:in `process'
/usr/local/bundle/gems/ruby_parser-3.7.2/lib/ruby_parser_extras.rb:1362:in `block in process'
org/jruby/RubyArray.java:1560:in `each'
/usr/local/bundle/gems/ruby_parser-3.7.2/lib/ruby_parser_extras.rb:1360:in `process'
/usr/src/app/lib/cc/engine/analyzers/ruby/main.rb:27:in `process_file'
/usr/src/app/lib/cc/engine/analyzers/analyzer_base.rb:10:in `run'
/usr/src/app/lib/cc/engine/analyzers/reporter.rb:30:in `block in process_files'
/usr/src/app/lib/cc/engine/analyzers/file_thread_pool.rb:22:in `block in run'
/usr/src/app/node_modules/babylon/lib/parser/location.js:27
  throw err;
  ^

SyntaxError: Unexpected token (126:5)
    at Parser.pp.raise (/usr/src/app/node_modules/babylon/lib/parser/location.js:24:13)
    at Parser.pp.unexpected (/usr/src/app/node_modules/babylon/lib/parser/util.js:91:8)
    at Parser.pp.expect (/usr/src/app/node_modules/babylon/lib/parser/util.js:85:33)
    at Parser.pp.parseCallExpressionArguments (/usr/src/app/node_modules/babylon/lib/parser/expression.js:334:12)
    at Parser.pp.parseSubscripts (/usr/src/app/node_modules/babylon/lib/parser/expression.js:306:29)
    at Parser.pp.parseExprSubscripts (/usr/src/app/node_modules/babylon/lib/parser/expression.js:277:15)
    at Parser.pp.parseMaybeUnary (/usr/src/app/node_modules/babylon/lib/parser/expression.js:247:19)
    at Parser.pp.parseExprOps (/usr/src/app/node_modules/babylon/lib/parser/expression.js:178:19)
    at Parser.pp.parseMaybeConditional (/usr/src/app/node_modules/babylon/lib/parser/expression.js:160:19)
    at Parser.pp.parseMaybeAssign (/usr/src/app/node_modules/babylon/lib/parser/expression.js:123:19)
/usr/src/app/node_modules/babylon/lib/parser/location.js:27
  throw err;
  ^

SyntaxError: Unexpected token (9:19)
    at Parser.pp.raise (/usr/src/app/node_modules/babylon/lib/parser/location.js:24:13)
    at Parser.pp.unexpected (/usr/src/app/node_modules/babylon/lib/parser/util.js:91:8)
    at Parser.pp.parseClassProperty (/usr/src/app/node_modules/babylon/lib/parser/statement.js:786:50)
    at Parser.pp.parseClassBody (/usr/src/app/node_modules/babylon/lib/parser/statement.js:698:34)
    at Parser.pp.parseClass (/usr/src/app/node_modules/babylon/lib/parser/statement.js:641:8)
    at Parser.pp.parseExport (/usr/src/app/node_modules/babylon/lib/parser/statement.js:856:19)
    at Parser.pp.parseStatement (/usr/src/app/node_modules/babylon/lib/parser/statement.js:137:90)
    at Parser.pp.parseBlockBody (/usr/src/app/node_modules/babylon/lib/parser/statement.js:529:25)
    at Parser.pp.parseTopLevel (/usr/src/app/node_modules/babylon/lib/parser/statement.js:31:8)
    at Parser.parse (/usr/src/app/node_modules/babylon/lib/parser/index.js:98:17)
/usr/src/app/node_modules/babylon/lib/parser/location.js:27
  throw err;
  ^

SyntaxError: Unexpected token (8:19)
    at Parser.pp.raise (/usr/src/app/node_modules/babylon/lib/parser/location.js:24:13)
    at Parser.pp.unexpected (/usr/src/app/node_modules/babylon/lib/parser/util.js:91:8)
    at Parser.pp.parseClassProperty (/usr/src/app/node_modules/babylon/lib/parser/statement.js:786:50)
    at Parser.pp.parseClassBody (/usr/src/app/node_modules/babylon/lib/parser/statement.js:698:34)
    at Parser.pp.parseClass (/usr/src/app/node_modules/babylon/lib/parser/statement.js:641:8)
    at Parser.pp.parseExport (/usr/src/app/node_modules/babylon/lib/parser/statement.js:856:19)
    at Parser.pp.parseStatement (/usr/src/app/node_modules/babylon/lib/parser/statement.js:137:90)
    at Parser.pp.parseBlockBody (/usr/src/app/node_modules/babylon/lib/parser/statement.js:529:25)
    at Parser.pp.parseTopLevel (/usr/src/app/node_modules/babylon/lib/parser/statement.js:31:8)
    at Parser.parse (/usr/src/app/node_modules/babylon/lib/parser/index.js:98:17)
/usr/src/app/node_modules/babylon/lib/parser/location.js:27
  throw err;
  ^

SyntaxError: Unexpected token (7:19)
    at Parser.pp.raise (/usr/src/app/node_modules/babylon/lib/parser/location.js:24:13)
    at Parser.pp.unexpected (/usr/src/app/node_modules/babylon/lib/parser/util.js:91:8)
    at Parser.pp.parseClassProperty (/usr/src/app/node_modules/babylon/lib/parser/statement.js:786:50)
    at Parser.pp.parseClassBody (/usr/src/app/node_modules/babylon/lib/parser/statement.js:698:34)
    at Parser.pp.parseClass (/usr/src/app/node_modules/babylon/lib/parser/statement.js:641:8)
    at Parser.pp.parseExport (/usr/src/app/node_modules/babylon/lib/parser/statement.js:856:19)
    at Parser.pp.parseStatement (/usr/src/app/node_modules/babylon/lib/parser/statement.js:137:90)
    at Parser.pp.parseBlockBody (/usr/src/app/node_modules/babylon/lib/parser/statement.js:529:25)
    at Parser.pp.parseTopLevel (/usr/src/app/node_modules/babylon/lib/parser/statement.js:31:8)
    at Parser.parse (/usr/src/app/node_modules/babylon/lib/parser/index.js:98:17)

Excluding files and/or directories

Test code is sometimes marked as duplicate and/or similar, which is usually not a problem for me since I accept some duplication in tests just to get clearer and more focused expectations.

It would be helpful to be able to ignore certain files or directories from this engine so I don't get those kinds of checks done on the tests.

Is that possible?

React's propTypes often marked as duplicates

Hi - a common source of false-positives in our CC duplication results comes from React's propType structures. For example:

  propTypes: {
    playBtnState: React.PropTypes.string.isRequired,
    height: React.PropTypes.number.isRequired,
    drawBack: React.PropTypes.bool
  },

is a 'duplicate' of:

  propTypes: {
    clipStore: React.PropTypes.object.isRequired,
    firstClipId: React.PropTypes.number.isRequired,
    includeDetails: React.PropTypes.bool
  },

Can the engine be improved to avoid this, or is there anything we can do in our codebase that would help? Excluding each fingerprint on a case-by-case basis is getting tiresome.

Duplicate found in array constants

Consider the following code:

COLORS = %w(#D8E304 #04D215 #0D52D1 #8A0CCF #CD0D74 #DDDDDD #333333 #57032A #990000 #FF6600 #FCD202 #B0DE09 #0D8ECF #2A0CD0 #754DEB #999999)
NAMES = %w(albert barbara claire daniel edgar frank george herbert isador javier killian larry michael nancy oscar peter)

The duplication engine returns the following:

$ codeclimate analyze -e duplication
Starting analysis
Running duplication: Done!

== app/models/constants.rb (2 issues) ==
1: Similar code found in 1 other location (mass = 18) [duplication]
2: Similar code found in 1 other location (mass = 18) [duplication]

Analysis complete! Found 2 issues.

This happens also if using array literals instead of the %w macro. If I reduce each array by one the error no longer shows up (mass = 18 is the lowest I could get this to reproduce).

As far as I am concerned this is a false positive -- any reason this is considered duplicate code?

Node 1.5 no longer supported

                              DEPRECATION WARNING                            

  Node.js v5.x is no longer actively supported!

  You will not receive security or critical stability updates for this version.

  You should migrate to a supported version of Node.js as soon as possible.
  Use the installation script that corresponds to the version of Node.js you
  wish to install. e.g.

   * https://deb.nodesource.com/setup_4.x — Node.js v4 LTS "Argon" (recommended)
   * https://deb.nodesource.com/setup_6.x — Node.js v6 Current

Similar code detection overly aggressive

(I'm not sure if this is the right place to file this bug, please redirect me if it isn't)

So I have this analysis here: https://codeclimate.com/github/tomato42/tlslite-ng/compare/extensions-remove-similar#issues it says that there is similar code in two other places (definition of ClientCertTypeExtension, SupportedGroupsExtension and ECPointFormatsExtension classes).

I have had them previously defined as more regular classes, with just initializers, as you can see here: https://codeclimate.com/github/tomato42/tlslite-ng/compare/softer-deduplication-in-extensions (just a previous commit to the one above)

Now, while the code is similar, the values, documentation and behaviour definitely are not. So either the similar-code detection is a little too trigger-happy, or I'm missing some obvious way to deduplicate this code without losing the documentation (which in this case is worth more than the code).

And while I can always just ignore this, it makes the "grades" assessment rather useless, as code similarity, even between such simple parts, makes the rating drop significantly always.

Detects "Identical blocks of code" that are in no way identical

https://codeclimate.com/github/benchpressjs/benchpressjs/issues

If you scroll down to the two "Identical blocks of code found in 2 locations. Consider refactoring." issues, you will see that it detects

  return Promise.try(() => {
    Benchpress.cache[template] = Benchpress.cache[template] || load(template);
    return Benchpress.cache[template];
  }).then((templateFunction) => {
    if (block) {

and

  return Promise.try(() => {
    const cached = compileRenderCache.get(hash);
    if (cached) {
      compileRenderCache.ttl(hash);
      return cached;

as identical when the only thing even similar about them is Promise.try(() => {.

This is super annoying. Is it some kind of diffing or hashing bug? Should I report it on the flay repository?

Only report if there are 3 occurrences of the code?

I've been fiddling with the mass settings on the duplication engine, and it's pretty good, but I find that often I have legitimate reasons for writing code twice; namely, it would be needlessly obfuscating code to refactor a given pair of functions.

Is there a way to configure the "count threshold"? Currently this seems to be hard coded at 2: if code happens twice, it is reported. I'd like to be able to configure something like this:

  duplication:
    enabled: true
    config:
      languages:
        ruby:
          mass_threshold: 10
          count_threshold: 3

This would emulate the "rule of three" in Ruby while being extra-strict about code mass.

If this is interesting to the core team, I could potentially even write a pull request.

Python 3 as default

Currently Python 2 is the default Python version in the duplication engine. Python 2 will be retired on 1 January 2020. Isn't it time to make Python 3 the default?

What does it do?

I've read through the readme and the codeclimate announcement. Still don't really know what this does, though I would assume it detects code duplication? Somehow? Could we add something to the readme that explains it?

Allow configuration of a "number of instances" threshold

In XP, one common practice is to wait for three instances of duplication, so you don't wind up prematurely drying something up into a pattern that doesn't have enough instances to inform the decision.

It would be nice if there was a way to configure a count threshold similar to a mass threshold so I can decide whether 2 or 3 cases is the right time to think about drying up similar code.

Comments count as similar code

In one of my projects, I have a banner comment on top of every file, which is like the following:

/**
 * path/to/file.js
 *
 * @author  Author <[email protected]>
 * @license 2015-16 (C) Author. All Rights Reserved.
 */

For some reason, the above banners in each file are marked as similar code.

Question about hiding mass number

I use CC.com and enabled this engine, whose duplication functionality I like a lot. I couldn't find the mass numbers where they used to be in the old CC duplication analysis (right next to an issue's title, in smaller text). I thought this wasn't implemented yet or was a bug, but then I found pull request #45 and concluded that the mass numbers were removed on purpose.

So I have a question as a user:

This makes the value accessible without unduly cluttering the description with that value.

Was there so much negative feedback about this that @wfleming called it "unduly cluttering the description"? I mean, just look at a random screenshot of this engine's results:

What can I say about it as a user:

Every duplication issue's title has text in bold "Similar code found in N other location(s)", and actually it's rather (if not completely) useless now, because

  • it's the only text duplication issues can have, and there's no way to list all issues, where its title with "Similar code" text would be useful to distinguish duplication issues from other categories;
  • the meaningful info about a number of other locations is duplicated (oh, the irony) right there after the code block.
  • at this point there's no easy way to at least estimate a mass of a chunk of code without additional actions, like
    • click the Read Up button (the mass is written there, OK, but how should I know that in the first place? I only learned it after finding the pull request I mentioned and thinking hard about what "Read Up Contents" is; maybe that's because English is not my native language), then read a load of text just searching for the mass: Blah-blah-blah-blah oh! the mass number!
    • click on "View more" button and then be frightened because an issue may be several screens long.

Why should a user know the mass in the first place? To determine the severity of an issue, and it seems that before this pull request this vital information wasn't even there!

So I'd like you to consider returning the mass number to issue titles or descriptions, or anywhere it could be found without additional actions (like mouse clicks).

I guess for now I'll write a small userscript just for issues to look as they were before (with mass numbers, I mean).

Thanks for your attention, and for all the good you've brought to this world. 👍

undefined local variable or method `config' in /usr/src/app/bin/duplication

When using the codeclimate/codeclimate Docker image, we started getting

/usr/src/app/bin/duplication:16:in `<main>': undefined local variable or method `config' for main:Object (NameError)

about three hours ago (2017-08-30T17:56Z was the first failure).

A downgrade of the codeclimate/codeclimate Docker image didn't help: https://gitlab.com/gitlab-org/gitlab-ce/-/jobs/30809782

I'm assuming that this is caused by a55e616#commitcomment-23983826.

Ignoring some paths

Hi, is there a way to configure the engine to ignore some paths? There is a global Code Climate setting:

---
engines:
  duplication:
    enabled: true
    config:
      languages:
      - ruby
  rubocop:
    enabled: true
exclude_paths:
- spec/*

But I still want other engines to analyze that spec directory, so I tried both:

---
engines:
  duplication:
    enabled: true
    config:
      languages:
      - ruby
      exclude_paths:
      - spec/*
  rubocop:
    enabled: true

and:

---
engines:
  duplication:
    enabled: true
    exclude_paths:
    - spec/*
    config:
      languages:
      - ruby
  rubocop:
    enabled: true

…but no success. Let me know if I'm doing something wrong, or if this is a future feature!

Ruby: Completely different hashes are marked as similar

Two completely different constant hashes are marked as similar code. There is no possible workaround for it as there's nothing more you can do to decrease this "similarity".

Here's a failing test case to show the problem:

diff --git a/spec/cc/engine/analyzers/ruby/main_spec.rb b/spec/cc/engine/analyzers/ruby/main_spec.rb
index 62ed1cd..c35aa84 100644
--- a/spec/cc/engine/analyzers/ruby/main_spec.rb
+++ b/spec/cc/engine/analyzers/ruby/main_spec.rb
@@ -149,6 +149,44 @@ module CC::Engine::Analyzers
           expect(run_engine(engine_conf)).to eq("")
         }.to output(/Skipping file/).to_stderr
       end
+
+      it "does not see hashes as similar" do
+        create_source_file("foo.rb", <<-EORUBY)
+          ANIMALS = {
+            bat: "Bat",
+            bee: "Bee",
+            cat: "Cat",
+            cow: "Cow",
+            dog: "Dog",
+            fly: "Fly",
+            human: "Human",
+            lizard: "Lizard",
+            owl: "Owl",
+            ringworm: "Ringworm",
+            salmon: "Salmon",
+            whale: "Whale",
+          }.freeze
+
+          TRANSPORT = {
+            airplane: "Airplane",
+            bicycle: "Bicycle",
+            bus: "Bus",
+            car: "Car",
+            escalator: "Escalator",
+            helicopter: "Helicopter",
+            lift: "Lift",
+            motorcycle: "Motorcycle",
+            rocket: "Rocket",
+            scooter: "Scooter",
+            skateboard: "Skateboard",
+            truck: "Truck",
+          }.freeze
+        EORUBY
+
+        issues = run_engine(engine_conf).strip.split("\0")
+
+        expect(issues.length).to eq(0)
+      end
     end
 
     describe "#calculate_points(mass)" do

This is a bug that affects a lot of our larger projects where we have different forms of lookup tables in the codebase. I can see reason to regard them as similar when the keys and/or values are similar/same; maybe even when the hashes are inverted (a: 1 -> 1 => :a), so this issue is not related to those cases.

But in cases like this one, where two separate and completely unrelated hashes cause similarity lints to fail, the engine will only push developers toward bad PRs or toward ignoring CodeClimate problems in PRs.

I know the reason behind the issue (the similarity engine only looks at the AST), but I was hoping that hashes could perhaps be made an exception so we don't have to disable the otherwise useful similarity engine.

I'm also open to helping you come up with a patch that fixes the problem, if you are okay with this behavior being changed. I want to bring this up before spending time writing a patch that might just get rejected at the outset. :-)

Support for the Crystal Programming Language

The Crystal syntax is inspired by Ruby, so I believe that it shouldn't be too hard to support the language here. The mass threshold should even be the same. I'm willing to help out and contribute in any way, just tell me where to start.

Duplication Engine Line Counting Error

Description

A customer's analysis is raising a duplication issue for two LOC which each contain only "end".
This issue could potentially be related to #155

Reason (why are we doing this)

From @maxjacobson "Digging into it a little, it looks like the problem is that when the node that contains duplication is of type resbody, which is to say "the body of a rescue clause", our ruby parser doesn't correctly identify its line numbers. Looking into the ruby_parser code, there's a test for this, but it's marked as "skip" because it hasn't been fixed yet: https://github.com/seattlerb/ruby_parser/blob/eeef7355a39efa4cd9c60898e98d97b71906c416/test/test_ruby_parser.rb#L805"

php-parser error with 7.1 syntax

We've got the following line in our PHP 7.1 codebase:

[$foo, $bar] = foo::bar($baz);

The duplication engine fails with:

Skipping file ./path/to/File.php due to exception (CC::Engine::Analyzers::ParserError): `php -d 'display_errors = Off' /usr/src/app/vendor/php-parser/parser.php` exited with code 1:
Parse Error: Syntax error, unexpected '=' on line 48

/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:24:in `block in run'
/usr/local/lib/ruby/2.3.0/timeout.rb:91:in `block in timeout'
/usr/local/lib/ruby/2.3.0/timeout.rb:33:in `block in catch'
/usr/local/lib/ruby/2.3.0/timeout.rb:33:in `catch'
/usr/local/lib/ruby/2.3.0/timeout.rb:33:in `catch'
/usr/local/lib/ruby/2.3.0/timeout.rb:106:in `timeout'
/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:16:in `run'
/usr/src/app/lib/cc/engine/analyzers/php/parser.rb:20:in `parse'
/usr/src/app/lib/cc/engine/analyzers/php/main.rb:28:in `process_file'
/usr/src/app/lib/cc/engine/analyzers/analyzer_base.rb:26:in `run'
/usr/src/app/lib/cc/engine/analyzers/reporter.rb:40:in `block in process_files'
/usr/src/app/lib/cc/engine/analyzers/file_thread_pool.rb:22:in `block (2 levels) in run'

Error when using customized mass_threshold with python language

I am currently in the process of switching from the traditional Code Climate to the Code Climate Platform. To test things, I'm running this on Mac OS 10.11.1 with docker-machine configured appropriately, and I installed codeclimate via brew.

But I have problems running duplication check with the following config (customizing mass_threshold):

# .codeclimate.yml

---
engines:
  duplication:
    enabled: true
    config:
      languages:
        python:
          mass_threshold: 40
  fixme:
    enabled: true
  pep8:
    enabled: true
  radon:
    enabled: true
ratings:
  paths:
    - "**.py"
exclude_paths:
  - "docs/source/conf.py"
  - "tests/**/*"

Running codeclimate analyze fails with the following error:

Starting analysis
Running duplication: Done!
error: (CC::Analyzer::Engine::EngineFailure) engine duplication failed with status 1 and stderr
/usr/lib/ruby/gems/2.2.0/gems/flay-2.6.1/lib/flay.rb:324:in `<': comparison of Fixnum with String failed (ArgumentError)
    from /usr/lib/ruby/gems/2.2.0/gems/flay-2.6.1/lib/flay.rb:324:in `block in process_sexp'
    from /usr/lib/ruby/gems/2.2.0/gems/sexp_processor-4.6.0/lib/sexp.rb:92:in `[]'
    from /usr/lib/ruby/gems/2.2.0/gems/sexp_processor-4.6.0/lib/sexp.rb:92:in `block in deep_each'
    from /usr/lib/ruby/gems/2.2.0/gems/sexp_processor-4.6.0/lib/sexp.rb:117:in `block in each_sexp'
    from /usr/lib/ruby/gems/2.2.0/gems/sexp_processor-4.6.0/lib/sexp.rb:114:in `each'
    from /usr/lib/ruby/gems/2.2.0/gems/sexp_processor-4.6.0/lib/sexp.rb:114:in `each_sexp'
    from /usr/lib/ruby/gems/2.2.0/gems/sexp_processor-4.6.0/lib/sexp.rb:91:in `deep_each'
    from /usr/lib/ruby/gems/2.2.0/gems/flay-2.6.1/lib/flay.rb:322:in `process_sexp'
    from /usr/src/app/lib/cc/engine/analyzers/reporter.rb:32:in `process_sexp'
    from /usr/src/app/lib/cc/engine/analyzers/reporter.rb:18:in `block in run'
    from /usr/src/app/lib/cc/engine/analyzers/reporter.rb:17:in `each'
    from /usr/src/app/lib/cc/engine/analyzers/reporter.rb:17:in `run'
    from /usr/src/app/lib/cc/engine/duplication.rb:32:in `block in run'
    from /usr/src/app/lib/cc/engine/duplication.rb:29:in `each'
    from /usr/src/app/lib/cc/engine/duplication.rb:29:in `run'
    from /usr/src/app/bin/duplication:16:in `<main>'

The problem seems to be related to the mass_threshold setting (or probably to the use of hash), because everything runs fine and also outputs duplication problems as expected when running with the following config:

# .codeclimate.yml

---
engines:
  duplication:
    enabled: true
    config:
      languages:
        - python
  fixme:
    enabled: true
  pep8:
    enabled: true
  radon:
    enabled: true
ratings:
  paths:
    - "**.py"
exclude_paths:
  - "docs/source/conf.py"
  - "tests/**/*"

The error above occurs with the official codeclimate/codeclimate-duplication image provided on Docker Hub, which seems to be some weeks old already.
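The trace above ends in flay comparing an Integer with a String, and the reporter suspects the hash form of `languages`. Assuming that is the cause (it is not confirmed in this report), one defensive pattern is to normalize both YAML shapes into a single structure and coerce thresholds with `Integer()` before they reach any numeric comparison. The `normalize_languages` helper below is hypothetical and is not the engine's code:

```ruby
require "yaml"

# Hypothetical helper: normalize the duplication engine's `languages` value,
# which may be a list of names or a hash of per-language settings, into
# { "language" => { "mass_threshold" => Integer or nil } }.
def normalize_languages(languages)
  pairs =
    case languages
    when Array then languages.map { |name| [name.to_s, {}] }
    when Hash  then languages.map { |name, opts| [name.to_s, opts || {}] }
    else raise ArgumentError, "unsupported languages value: #{languages.inspect}"
    end

  pairs.to_h do |name, opts|
    threshold = opts["mass_threshold"]
    # Integer() turns "40" into 40 and raises on junk, so a String can never
    # reach a numeric `<` comparison later on.
    [name, { "mass_threshold" => threshold.nil? ? nil : Integer(threshold) }]
  end
end

list_form = YAML.safe_load(<<~CONFIG)
  languages:
    - python
CONFIG

hash_form = YAML.safe_load(<<~CONFIG)
  languages:
    python:
      mass_threshold: 40
CONFIG

p normalize_languages(list_form["languages"]).dig("python", "mass_threshold") # => nil
p normalize_languages(hash_form["languages"]).dig("python", "mass_threshold") # => 40
```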

all error in output

I've got errors on every build of my PHP project:

OpenJDK 64-Bit Server VM warning: Using the ParNew young collector with the Serial old collector is deprecated and will likely be removed in a future release
Skipping file ./src/Wiki/Template.php due to exception (CC::Engine::Analyzers::ParserError): `php /usr/src/app/vendor/php-parser/parser.php` exited with code 1:
Parse Error: Syntax error, unexpected T_CONST, expecting T_FUNCTION on line 20

/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:24:in `block in run'
org/jruby/ext/timeout/Timeout.java:117:in `timeout'
/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:16:in `run'
/usr/src/app/lib/cc/engine/analyzers/php/parser.rb:20:in `parse'
/usr/src/app/lib/cc/engine/analyzers/php/main.rb:24:in `process_file'
/usr/src/app/lib/cc/engine/analyzers/analyzer_base.rb:26:in `run'
/usr/src/app/lib/cc/engine/analyzers/reporter.rb:38:in `block in process_files'
/usr/src/app/lib/cc/engine/analyzers/file_thread_pool.rb:22:in `block in run'
Skipping file ./src/Wiki/Models/Mission.php due to exception (CC::Engine::Analyzers::ParserError): `php /usr/src/app/vendor/php-parser/parser.php` exited with code 1:
Parse Error: Syntax error, unexpected T_CONST, expecting T_FUNCTION on line 22

/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:24:in `block in run'
org/jruby/ext/timeout/Timeout.java:117:in `timeout'
/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:16:in `run'
/usr/src/app/lib/cc/engine/analyzers/php/parser.rb:20:in `parse'
/usr/src/app/lib/cc/engine/analyzers/php/main.rb:24:in `process_file'
/usr/src/app/lib/cc/engine/analyzers/analyzer_base.rb:26:in `run'
/usr/src/app/lib/cc/engine/analyzers/reporter.rb:38:in `block in process_files'
/usr/src/app/lib/cc/engine/analyzers/file_thread_pool.rb:22:in `block in run'
Skipping file ./src/Wiki/Models/MissionCost.php due to exception (CC::Engine::Analyzers::ParserError): `php /usr/src/app/vendor/php-parser/parser.php` exited with code 1:
Parse Error: Syntax error, unexpected '?' on line 28

/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:24:in `block in run'
org/jruby/ext/timeout/Timeout.java:117:in `timeout'
/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:16:in `run'
/usr/src/app/lib/cc/engine/analyzers/php/parser.rb:20:in `parse'
/usr/src/app/lib/cc/engine/analyzers/php/main.rb:24:in `process_file'
/usr/src/app/lib/cc/engine/analyzers/analyzer_base.rb:26:in `run'
/usr/src/app/lib/cc/engine/analyzers/reporter.rb:38:in `block in process_files'
/usr/src/app/lib/cc/engine/analyzers/file_thread_pool.rb:22:in `block in run'
Skipping file ./src/Wiki/Models/Skills.php due to exception (CC::Engine::Analyzers::ParserError): `php /usr/src/app/vendor/php-parser/parser.php` exited with code 1:
Parse Error: Syntax error, unexpected T_CONST, expecting T_FUNCTION on line 21

/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:24:in `block in run'
org/jruby/ext/timeout/Timeout.java:117:in `timeout'
/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:16:in `run'
/usr/src/app/lib/cc/engine/analyzers/php/parser.rb:20:in `parse'
/usr/src/app/lib/cc/engine/analyzers/php/main.rb:24:in `process_file'
/usr/src/app/lib/cc/engine/analyzers/analyzer_base.rb:26:in `run'
/usr/src/app/lib/cc/engine/analyzers/reporter.rb:38:in `block in process_files'
/usr/src/app/lib/cc/engine/analyzers/file_thread_pool.rb:22:in `block in run'
Skipping file ./src/Wiki/Models/CrewMember.php due to exception (CC::Engine::Analyzers::ParserError): `php /usr/src/app/vendor/php-parser/parser.php` exited with code 1:
Parse Error: Syntax error, unexpected '?', expecting T_VARIABLE on line 154

/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:24:in `block in run'
org/jruby/ext/timeout/Timeout.java:117:in `timeout'
/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:16:in `run'
/usr/src/app/lib/cc/engine/analyzers/php/parser.rb:20:in `parse'
/usr/src/app/lib/cc/engine/analyzers/php/main.rb:24:in `process_file'
/usr/src/app/lib/cc/engine/analyzers/analyzer_base.rb:26:in `run'
/usr/src/app/lib/cc/engine/analyzers/reporter.rb:38:in `block in process_files'
/usr/src/app/lib/cc/engine/analyzers/file_thread_pool.rb:22:in `block in run'
Skipping file ./src/Wiki/Templates/MissionList.php due to exception (CC::Engine::Analyzers::ParserError): `php /usr/src/app/vendor/php-parser/parser.php` exited with code 1:
Parse Error: Syntax error, unexpected '?' on line 98

/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:24:in `block in run'
org/jruby/ext/timeout/Timeout.java:117:in `timeout'
/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:16:in `run'
/usr/src/app/lib/cc/engine/analyzers/php/parser.rb:20:in `parse'
/usr/src/app/lib/cc/engine/analyzers/php/main.rb:24:in `process_file'
/usr/src/app/lib/cc/engine/analyzers/analyzer_base.rb:26:in `run'
/usr/src/app/lib/cc/engine/analyzers/reporter.rb:38:in `block in process_files'
/usr/src/app/lib/cc/engine/analyzers/file_thread_pool.rb:22:in `block in run'
Skipping file ./src/Wiki/Templates/CrewList.php due to exception (CC::Engine::Analyzers::ParserError): `php /usr/src/app/vendor/php-parser/parser.php` exited with code 1:
Parse Error: Syntax error, unexpected '?' on line 219

/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:24:in `block in run'
org/jruby/ext/timeout/Timeout.java:117:in `timeout'
/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:16:in `run'
/usr/src/app/lib/cc/engine/analyzers/php/parser.rb:20:in `parse'
/usr/src/app/lib/cc/engine/analyzers/php/main.rb:24:in `process_file'
/usr/src/app/lib/cc/engine/analyzers/analyzer_base.rb:26:in `run'
/usr/src/app/lib/cc/engine/analyzers/reporter.rb:38:in `block in process_files'
/usr/src/app/lib/cc/engine/analyzers/file_thread_pool.rb:22:in `block in run'
Skipping file ./src/Wiki/Templates/Mission.php due to exception (CC::Engine::Analyzers::ParserError): `php /usr/src/app/vendor/php-parser/parser.php` exited with code 1:
Parse Error: Syntax error, unexpected '|', expecting T_VARIABLE on line 198

/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:24:in `block in run'
org/jruby/ext/timeout/Timeout.java:117:in `timeout'
/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:16:in `run'
/usr/src/app/lib/cc/engine/analyzers/php/parser.rb:20:in `parse'
/usr/src/app/lib/cc/engine/analyzers/php/main.rb:24:in `process_file'
/usr/src/app/lib/cc/engine/analyzers/analyzer_base.rb:26:in `run'
/usr/src/app/lib/cc/engine/analyzers/reporter.rb:38:in `block in process_files'
/usr/src/app/lib/cc/engine/analyzers/file_thread_pool.rb:22:in `block in run'
Skipping file ./src/Wiki/Templates/MissionCost.php due to exception (CC::Engine::Analyzers::ParserError): `php /usr/src/app/vendor/php-parser/parser.php` exited with code 1:
Parse Error: Syntax error, unexpected T_CONST, expecting T_FUNCTION on line 29

/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:24:in `block in run'
org/jruby/ext/timeout/Timeout.java:117:in `timeout'
/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:16:in `run'
/usr/src/app/lib/cc/engine/analyzers/php/parser.rb:20:in `parse'
/usr/src/app/lib/cc/engine/analyzers/php/main.rb:24:in `process_file'
/usr/src/app/lib/cc/engine/analyzers/analyzer_base.rb:26:in `run'
/usr/src/app/lib/cc/engine/analyzers/reporter.rb:38:in `block in process_files'
/usr/src/app/lib/cc/engine/analyzers/file_thread_pool.rb:22:in `block in run'
Skipping file ./src/Wiki/Api/ExpandTemplates.php due to exception (CC::Engine::Analyzers::ParserError): `php /usr/src/app/vendor/php-parser/parser.php` exited with code 1:
Parse Error: Syntax error, unexpected '?' on line 63

/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:24:in `block in run'
org/jruby/ext/timeout/Timeout.java:117:in `timeout'
/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:16:in `run'
/usr/src/app/lib/cc/engine/analyzers/php/parser.rb:20:in `parse'
/usr/src/app/lib/cc/engine/analyzers/php/main.rb:24:in `process_file'
/usr/src/app/lib/cc/engine/analyzers/analyzer_base.rb:26:in `run'
/usr/src/app/lib/cc/engine/analyzers/reporter.rb:38:in `block in process_files'
/usr/src/app/lib/cc/engine/analyzers/file_thread_pool.rb:22:in `block in run'
Skipping file ./src/Wiki/Api/Parse.php due to exception (CC::Engine::Analyzers::ParserError): `php /usr/src/app/vendor/php-parser/parser.php` exited with code 1:
Parse Error: Syntax error, unexpected '?' on line 204

/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:24:in `block in run'
org/jruby/ext/timeout/Timeout.java:117:in `timeout'
/usr/src/app/lib/cc/engine/analyzers/command_line_runner.rb:16:in `run'
/usr/src/app/lib/cc/engine/analyzers/php/parser.rb:20:in `parse'
/usr/src/app/lib/cc/engine/analyzers/php/main.rb:24:in `process_file'
/usr/src/app/lib/cc/engine/analyzers/analyzer_base.rb:26:in `run'
/usr/src/app/lib/cc/engine/analyzers/reporter.rb:38:in `block in process_files'
/usr/src/app/lib/cc/engine/analyzers/file_thread_pool.rb:22:in `block in run'

Feature Request: different duplication mass_threshold by code path

I have a lot of pages in my app (in a specific directory) that are using a DSL which requires some duplication. I'd like to be able to set a higher threshold for similar/identical code for files in this directory, without raising the threshold for the rest of the codebase.

Thanks!
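One way the requested feature could work is an ordered list of path globs, each carrying its own threshold, checked before the language-wide default. This is a hypothetical sketch: the `PathThreshold` struct, the `threshold_for` helper, the `app/pages/**/*.rb` directory and both threshold values are invented for illustration, and nothing in this report suggests the engine supports such an option.

```ruby
# Hypothetical override rule: a path glob and the mass threshold to use for
# files matching it.
PathThreshold = Struct.new(:pattern, :threshold)

# Returns the threshold of the first matching rule, else the default.
# File::FNM_PATHNAME makes "**/" match zero or more nested directories.
def threshold_for(path, overrides, default)
  rule = overrides.find { |r| File.fnmatch?(r.pattern, path, File::FNM_PATHNAME) }
  rule ? rule.threshold : default
end

overrides = [PathThreshold.new("app/pages/**/*.rb", 80)]

puts threshold_for("app/pages/admin/dashboard.rb", overrides, 40) # => 80
puts threshold_for("app/models/user.rb", overrides, 40)           # => 40
```

Taking the first match keeps the rule order meaningful, so a narrower glob can be listed ahead of a broader one.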

Including/excluding files

What is the recommended way of excluding or including files for the duplication engine? I would like to exclude the specs folder, but only for the duplication engine analysis.

I was actually using the undocumented paths option which was removed some time ago. Is there some alternative way to achieve the same result now?

image is huge

The docker image is huge at 1.3GB. Can it be split up based on language?

Using `beta` channel with JavaScript (ES6+) throws errors

I get errors in my Code Climate build during the duplication process:

ERROR: couldn't find node type RestProperty in Sexp::NODE_NAMES.
ERROR: couldn't find node type SpreadProperty in Sexp::NODE_NAMES.
ERROR: couldn't find node type SpreadElement in Sexp::NODE_NAMES.
ERROR: couldn't find node type ClassExpression in Sexp::NODE_NAMES.
ERROR: couldn't find node type ForOfStatement in Sexp::NODE_NAMES.
ERROR: couldn't find node type AwaitExpression in Sexp::NODE_NAMES.
ERROR: couldn't find node type RestElement in Sexp::NODE_NAMES.

It seems that it doesn't know how to handle newer node types in JavaScript AST.

Other language support

I'm interested in helping build support for other languages.

Is there a basic path that can be taken to get to this point?

I think the basic requirements are some parser that can take a code block and turn it into an AST, right?
Is there a way to plug that into this library?

Any help/guidance is appreciated.

exclude_paths for this specific engine not working

Hi!

I'm trying to make the duplication engine ignore files matching certain patterns, but no matter which way I try, the files are not ignored.

As an example, I have duplication in app/graph/core_api/mutations/image_update.rb.

Neither of these configs work:

[EDIT: Typo in code examples fixed]

engines:
  duplication:
    enabled: true
    config:
      languages:
        ruby:
          exclude_paths:
            - "app/graph/**/*.rb"
            - "app/graph/core_api/mutations/image_update.rb"

engines:
  duplication:
    enabled: true
    config:
      exclude_paths:
        - "app/graph/**/*"
        - "app/graph/core_api/mutations/image_update.rb"
      languages:
        - ruby

The only way I can get it to ignore this file is to add it to the top-level exclude_paths:

engines:
  duplication:
    enabled: true
    config:
      languages:
        - ruby
exclude_paths:
  - "app/graph/**/*.rb"

However, this directory will then be ignored in all other engines which is not what I want.

I tried finding the code that deals with exclude_paths but came up empty. Is it dealing with this at all currently?

I run this command to test it:

docker run \
  --interactive --tty --rm \
  --env CODECLIMATE_CODE="$PWD" \
  --volume "$PWD":/code \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  --volume /tmp/cc:/tmp/cc \
  codeclimate/codeclimate analyze -e duplication

I only run this locally because the first config example is not working in hosted Codeclimate. Is the local and remote version working differently here?

Anything else I can do to help you debug this?
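For reference, this is roughly what the reporter expects an engine-level `exclude_paths` to do: reject any candidate file that matches one of that engine's own globs, leaving the top-level list (and every other engine) untouched. The `apply_excludes` helper is a hypothetical sketch using Ruby's `File.fnmatch?`, not the engine's code; with `File::FNM_PATHNAME`, `**/` spans nested directories, so each pattern from the configs above matches the example file.

```ruby
# Hypothetical helper: drop candidate files that match any of the engine's
# own exclude globs. File::FNM_PATHNAME lets "**/" span nested directories.
def apply_excludes(paths, exclude_patterns)
  paths.reject do |path|
    exclude_patterns.any? { |pattern| File.fnmatch?(pattern, path, File::FNM_PATHNAME) }
  end
end

files = ["app/graph/core_api/mutations/image_update.rb", "app/models/image.rb"]

p apply_excludes(files, ["app/graph/**/*.rb"]) # => ["app/models/image.rb"]
p apply_excludes(files, ["app/graph/**/*"])    # => ["app/models/image.rb"]
```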

(CC::Analyzer::Engine::EngineTimeout) engine duplication ran for 900 seconds

Whenever I run the duplication engine, I run into the following problem.

.codeclimate.yml:

duplication:
  enabled: true
  config:
    languages:
    - ruby
    - javascript

Error :

error: (CC::Analyzer::Engine::EngineTimeout) engine duplication ran for 900 seconds and was killed

Please help. How can I increase the timeout?

Not respecting exclude_paths?

Hi there,
When I run codeclimate locally, the duplication engine always times out. lsof suggests it's spending quite a while reading the contents of my node_modules and tmp directories, which are excluded in exclude_paths. (Those directories appear to be correctly excluded in other engines)

Here's the debug output:

Running duplication: |[DEBUG] engine config: "{\"enabled\":true,\"config\":{\"languages\":[\"ruby\"]},\"exclude_fingerprints\":[\"2250d1464b24f34abf02cbbe872059fe\",\"f0db7ce1dd3e0c6999d697643ef396e7\"],\"include_paths\":[\".autotest\",\".bundle/\",\".cap_tasks\",\".codeclimate.yml\",\".DS_Store\",\".eslintrc\",\".eslintrc-standard\",\".eslintrc-standard-react\",\".gitattributes\",\".gitignore\",\".jshintrc\",\".powenv\",\".powrc\",\".preferred_ips\",\".projectile\",\".rspec\",\".rspec_parallel\",\".rubocop.yml\",\".rubocop_todo.yml\",\".ruby-gemset\",\".ruby-version\",\".sass-cache/\",\".simplecov\",\".spin.rb\",\"app/.DS_Store\",\"app/assets/.DS_Store\",\"app/assets/javascripts/.DS_Store\",\"app/assets/javascripts/.gitignore\",\"app/assets/javascripts/application.js\",\"app/assets/javascripts/embed.js\",\"app/assets/javascripts/embed_player.js\",\"app/assets/javascripts/embed_player_square_ad.js\",\"app/assets/javascripts/extras/\",\"app/assets/javascripts/popup_button.js\",\"app/assets/javascripts/preload.js.erb\",\"app/assets/javascripts/react/\",\"app/assets/stylesheets/\",\"app/controllers/.#payments_controller.rb\",\"app/controllers/.DS_Store\",\"app/controllers/admin/\",\"app/controllers/admin.rb\",\"app/controllers/analytics_controller.rb\",\"app/controllers/api/\",\"app/controllers/api.rb\",\"app/controllers/application_controller.rb\",\"app/controllers/audio_clips_controller.rb\",\"app/controllers/audio_comments_controller.rb\",\"app/controllers/base_controller.rb\",\"app/controllers/blogger_controller.rb\",\"app/controllers/categories_controller.rb\",\"app/controllers/channel_invites_controller.rb\",\"app/controllers/channels_controller.rb\",\"app/controllers/comments_controller.rb\",\"app/controllers/concerns/\",\"app/controllers/facebook_permissions_controller.rb\",\"app/controllers/flipboard_controller.rb\",\"app/controllers/follows_controller.rb\",\"app/controllers/images_controller.rb\",\"app/controllers/messages_controller.rb\",\"app/controllers/notices_controller.rb\",\"app/controllers/page_items_controller.rb\",\"app/controllers/payments_controller.rb\",\"app/controllers/permissions_controller.rb\",\"app/controllers/playlist_memberships_controller.rb\",\"app/controllers/playlists_controller.rb\",\"app/controllers/publishing_controller.rb\",\"app/controllers/ratings_controller.rb\",\"app/controllers/regions_controller.rb\",\"app/controllers/search_controller.rb\",\"app/controllers/services_controller.rb\",\"app/controllers/sessions_controller.rb\",\"app/controllers/sources_controller.rb\",\"app/controllers/static_controller.rb\",\"app/controllers/stats_controller.rb\",\"app/controllers/streams_controller.rb\",\"app/controllers/submissions_controller.rb\",\"app/controllers/tags_controller.rb\",\"app/controllers/text_comments_controller.rb\",\"app/controllers/transcriptions_controller.rb\",\"app/controllers/tumblr_controller.rb\",\"app/controllers/twitter/\",\"app/controllers/uploads_controller.rb\",\"app/controllers/user_assets_controller.rb\",\"app/controllers/users_controller.rb\",\"app/controllers/welcome_controller.rb\",\"app/controllers/widgets_controller.rb\",\"app/controllers/zendesk_session_controller.rb\",\"app/decorators/\",\"app/helpers/\",\"app/indices/\",\"app/mailers/\",\"app/models/\",\"app/presenters/\",\"app/reports/\",\"app/services/\",\"app/views/.DS_Store\",\"app/views/admin/\",\"app/views/analytics/\",\"app/views/api/\",\"app/views/audio_clips/\",\"app/views/audio_comments/\",\"app/views/banners/\",\"app/views/categories/\",\"app/views/channel_invites/\",\"app/views/channel_mailer/\",\"app/views/channels/\",\"app/views/comments/\",\"app/views/flipboard/\",\"app/views/follows/\",\"app/views/layouts/\",\"app/views/messages/\",\"app/views/notices/\",\"app/views/oauth/\",\"app/views/page_header/\",\"app/views/panels/\",\"app/views/payments/\",\"app/views/player/\",\"app/views/playlist_memberships/\",\"app/views/playlists/\",\"app/views/promos/\",\"app/views/publishing/\",\"app/views/ratings/\",\"app/views/regions/\",\"app/views/search/\",\"app/views/services/\",\"app/views/sessions/\",\"app/views/shared/\",\"app/views/sources/\",\"app/views/static/\",\"app/views/stats_mailer/\",\"app/views/streams/\",\"app/views/submissions/\",\"app/views/subscriptions/\",\"app/views/support_mailer/\",\"app/views/tags/\",\"app/views/text_comments/\",\"app/views/twitter/\",\"app/views/uploads/\",\"app/views/user_assets/\",\"app/views/user_mailer/\",\"app/views/users/\",\"app/views/welcome/\",\"app/views/wip/\",\"app/workers/\",\"bin/\",\"Capfile\",\"cert/\",\"clock.rb\",\"config/\",\"config.ru\",\"db/.DS_Store\",\"db/.gitignore\",\"db/amazon_ip_ranges.json\",\"db/geo_ip.dat\",\"db/migrate/\",\"db/schema.rb.orig\",\"db/seeds.rb\",\"db/sphinx/\",\"deploy/\",\"doc/\",\"files_archive/\",\"Gemfile\",\"Gemfile.lock\",\"lib/\",\"Procfile.dev\",\"profiles/\",\"public/.DS_Store\",\"public/.gitignore\",\"public/403.html\",\"public/404.html\",\"public/422.html\",\"public/500.html\",\"public/502.html\",\"public/api/\",\"public/assets-static/\",\"public/crossdomain.xml\",\"public/empxml.html\",\"public/errors/\",\"public/favicon.ico\",\"public/flash/\",\"public/googlef073f5b5cafab306.html\",\"public/i/\",\"public/icons/\",\"public/images/\",\"public/java/\",\"public/jwplayer/\",\"public/nokia_callback.html\",\"public/osd.xml\",\"public/pinterest-5b088.html\",\"public/robots-staging.txt\",\"public/robots.txt\",\"public/tests/\",\"public/wb_1979ce46ecf9652f.txt\",\"public/wordpressres/\",\"Rakefile\",\"report/\",\"script/\",\"scripts/\",\"TAGS\",\"test/\",\"webpack/.DS_Store\",\"webpack/.gitignore\",\"webpack/assets/\",\"webpack/index.html\",\"webpack/karma.conf.js\",\"webpack/npm-shrinkwrap.json\",\"webpack/package.json\",\"webpack/readme\",\"webpack/scripts/\",\"webpack/test/\",\"webpack/test.js\",\"webpack/vendor/README\",\"webpack/webpack-records.json\",\"webpack/webpack.config.base.js\",\"webpack/webpack.config.js\",\"webpack/webpack.production.config.js\",\"webpack/webpack.test.config.js\"]}"
[DEBUG] docker run: ["docker", "run", "--name", "cc-engines-duplication-2f5dc76e-6feb-4b5d-b290-ffd8cff1d662", "--cap-drop", "all", "--label", "com.codeclimate.label=b6a220ac-84c2-42e5-b08b-10ac2ca92021", "--memory", "512000000", "--memory-swap", "-1", "--net", "none", "--rm", "--volume", "/Users/jon/Developer/web:/code:ro", "--volume", "/tmp/cc/02b04a60-58b9-406d-975c-1925ec647418:/config.json:ro", "--user", "9000:9000", "codeclimate/codeclimate-duplication"]

While it's running, lsof suggests it's spending quite a while parsing the contents of, e.g., /code/webpack/node_modules/babel-preset-es2015/. Given the include_paths passed to the engine above, I'm surprised it touches those paths at all.

If I temporarily delete my webpack/node_modules directory and rerun the duplication engine, it completes within a minute or so.

Anything else I can try?
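The reporter's expectation can be stated as a small sketch: only entries listed in `include_paths` are read, with directory entries (those ending in `/`, as in the debug output) walked recursively and file entries taken as-is. Under that behavior, an unlisted sibling such as `webpack/node_modules/` would never be opened. The `expand_include_paths` helper and the temporary directory layout are hypothetical; this is not a description of how the engine actually traverses files.

```ruby
require "find"
require "fileutils"
require "tmpdir"

# Hypothetical helper: expand an include_paths list (directory entries end in
# "/") into the files to analyze, relative to `root`. Only listed entries are
# visited, so unlisted directories such as webpack/node_modules/ are skipped.
def expand_include_paths(root, include_paths, extensions)
  files = include_paths.flat_map do |entry|
    full = File.join(root, entry)
    if entry.end_with?("/") && File.directory?(full)
      Find.find(full).select { |path| File.file?(path) } # recursive walk
    elsif File.file?(full)
      [full]
    else
      [] # listed entry no longer exists on disk
    end
  end

  files.select { |path| extensions.include?(File.extname(path)) }
       .map { |path| path.delete_prefix("#{root}/") }
       .sort
end

Dir.mktmpdir do |root|
  %w[webpack/assets/app.js webpack/node_modules/pkg/index.js webpack/test.js].each do |rel|
    FileUtils.mkdir_p(File.dirname(File.join(root, rel)))
    File.write(File.join(root, rel), "// stub\n")
  end

  p expand_include_paths(root, ["webpack/assets/", "webpack/test.js"], [".js"])
  # => ["webpack/assets/app.js", "webpack/test.js"]
end
```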

Use of undefined constant NaN in JSON.php

There is a problem that manifests itself as a PHP_NOTICE under certain conditions, such as when parsing certain PHP files. (I've already sent a support request for this with more details, via CodeClimate support.)

Notice looks like this:

PHP Notice:  Use of undefined constant NaN - assumed 'NaN' in /usr/src/app/vendor/php-parser/lib/PhpParser/Serializer/JSON.php on line 34

The problem is that the PHP constant NAN is spelled out incorrectly as NaN here: https://github.com/codeclimate/codeclimate-duplication/blob/master/vendor/php-parser/lib/PhpParser/Serializer/JSON.php#L34

Can anyone take a look at this?

Comments count as similar code, take two

I'm having the same issue as described in #104; however, that issue's author disappeared.

I have the following banner at the top of my files, which is being flagged as duplicated code (mass = 96):

/*
 * Embassy
 * Copyright (c) 2016, TechnologyAdvice LLC
 */

This is an open-source project and this issue can be found on the website here: https://codeclimate.com/github/TechnologyAdvice/Embassy/issues

It would be a shame to turn duplication checking off, or set the mass threshold to 97, but there's no other obvious solution at the moment.

Rails params whitelist marked

Two different Rails params whitelists are marked as duplicates with a threshold of 23.

# Controller A
params.require(:model_a).permit(:name, :description, :automatic, other_model_ids: []).tap do |model_a_params|
      model_a_params[:settings] = params[:model_a][:settings]
end

# Controller B
params[:model_b].permit(:happened_at, :other_model_id, :some_type, some_other_models_ids: [])
                         .tap { |whitelisted| whitelisted[:reason] = params[:model_b][:reason] }

There are some similarities, but this shouldn't be a duplication error.

NoMethodError on Python Duplication engine

Shit's on fire, yo. Unchanged code now gets an error from the duplication engine. See the original CC build here. If you look at the commit for that build, the Python file causing the error has not changed since the previous build. The project is here.

OpenJDK 64-Bit Server VM warning: Using the ParNew young collector with the Serial old collector is deprecated and will likely be removed in a future release
NoMethodError error occurred processing file ./lib/client.py: aborting.
NoMethodError error occurred processing file ./lib/py/hello_rb.py: aborting.

NoMethodError: undefined method `fetch' for "":String
          python_version at /usr/src/app/lib/cc/engine/analyzers/python/main.rb:30
                  parser at /usr/src/app/lib/cc/engine/analyzers/python/main.rb:26
            process_file at /usr/src/app/lib/cc/engine/analyzers/python/main.rb:22
                     run at /usr/src/app/lib/cc/engine/analyzers/analyzer_base.rb:23
  block in process_files at /usr/src/app/lib/cc/engine/analyzers/reporter.rb:38
            block in run at /usr/src/app/lib/cc/engine/analyzers/file_thread_pool.rb:22

s-expression errors

Hi all,

I've run into a very strange issue.
I'm receiving this batch of errors:

ERROR: couldn't find node type SpreadElement in Sexp::NODE_NAMES.
ERROR: couldn't find node type SpreadProperty in Sexp::NODE_NAMES.
ERROR: couldn't find node type YieldExpression in Sexp::NODE_NAMES.
ERROR: couldn't find node type ImportNamespaceSpecifier in Sexp::NODE_NAMES.
ERROR: couldn't find node type JSXEmptyExpression in Sexp::NODE_NAMES.

It doesn't matter what I have in the configuration or where the analysis runs (Docker or on the website).

The last time I received these errors, I was using this configuration: Url

Adding extension to JS Analyzer

Hi all, we use [sprockets-es6] in our Rails project which requires all ES6-ish JS files be written with the .es6 extension. As best I can tell, this analyzer only uses .js and .jsx extensions when building its list of files to analyze. Is there any way to add .es6 to that list?

REQ: add option to ignore import statements (php)

I'm using code climate on a huge Laravel/AngularJS app. So far I'm impressed with the code quality analysis; it's close to 'what I know already'.

However, it would be nice to have an option in the duplication engine that allows certain statements to be exact dupes: the namespace and use statements. I have several events defined, each in its own file for clarity at the moment, and they all have the same top lines:

namespace App\Events;
 
use Illuminate\Broadcasting\Channel;
use Illuminate\Queue\SerializesModels;
use Illuminate\Broadcasting\PrivateChannel;

This results in an "Identical code found in 10 other locations (mass = 57)" issue. I'm only just adding these events, and there will be at least 5x more. I prefer to keep them in separate files.
Making the threshold higher would mean not catching those nasty little dupes.

Maybe it's possible to add something like 'ignore namespace and use statements' or a more general 'ignore lines starting with ' or even an 'ignore '.

While I realise that I can just ignore these dupe reports, I don't like the clutter they cause or the fact that I will always have to remember to ignore them. They don't help the overall ratings either, and those ratings are important for code confidence for myself, our team and our customers.

Or did I miss something in the docs?

Style/TrailingComma is no longer a valid cop

› rubocop --version
0.44.1
blimmer:~/code/oss/codeclimate-duplication (master ✓)
› rubocop
Error: The `Style/TrailingComma` cop no longer exists. Please use `Style/TrailingCommaInLiteral` and/or `Style/TrailingCommaInArguments` instead.
(obsolete configuration found in /Users/blimmer/code/oss/codeclimate-duplication/.rubocop.yml, please update it)

Python 3 syntax error

Python 3's type annotations seem to raise a SyntaxError, which results in skipped files:

Skipping file ./tests/test_deck.py due to exception (CC::Engine::Analyzers::ParserError): `python /usr/src/app/lib/cc/engine/analyzers/python/parser.py` exited with code 1:
Traceback (most recent call last):
  File "/usr/src/app/lib/cc/engine/analyzers/python/parser.py", line 34, in <module>
    print json.dumps(to_json(ast.parse(source)))
  File "/usr/lib/python2.7/ast.py", line 37, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 13
    def __init__(self, name: str, specific_string="abc"):
                           ^
SyntaxError: invalid syntax

Full log: https://codeclimate.com/github/iScrE4m/pyCardDeck/builds/41

Run in verbose mode

Is it possible to run this engine using the Code Climate CLI in some sort of verbose mode? My configuration currently fails because of a timeout, even though my setup excludes all paths with unrelated files (e.g. node_modules). I am curious to see which files this engine is actually running through the analyzer, but I can't find any option to show this output.

Configurable Babylon plugins for JS

Right now, the selection of plugins passed to Babylon for parsing JavaScript files appears to be hard-coded. A one-size-fits-all approach doesn't seem ideal here, though — for instance, in our codebase we don't use JSX, but we do have a number of async functions, which are rejected as invalid syntax.

It would be great if the Babylon plugins could be set up as part of the configuration for the duplication engine for a given codebase.
