
xtree's Introduction

Less is More: Minimizing Code Reorganization using XTREE

                                                                           
                                                                           
                                        __.....__           __.....__      
                                    .-''         '.     .-''         '.    
                      .|  .-,.--.  /     .-''"'-.  `.  /     .-''"'-.  `.  
   ____     _____   .' |_ |  .-. |/     /________\   \/     /________\   \ 
  `.   \  .'    / .'     || |  | ||                  ||                  | 
    `.  `'    .' '--.  .-'| |  | |\    .-------------'\    .-------------' 
      '.    .'      |  |  | |  '-  \    '-.____...---. \    '-.____...---. 
      .'     `.     |  |  | |       `.             .'   `.             .'  
    .'  .'`.   `.   |  '.'| |         `''-...... -'       `''-...... -'    
  .'   /    `.   `. |   / |_|                                              
 '----'       '----'`'-'                                                   

 
              _{\ _{\{\/}/}/}__
             {/{/\}{/{/\}(\}{/\} _
            {/{/\}{/{/\}(_)\}{/{/\}  _
         {\{/(\}\}{/{/\}\}{/){/\}\} /\}
        {/{/(_)/}{\{/)\}{\(_){/}/}/}/}
       _{\{/{/{\{/{/(_)/}/}/}{\(/}/}/}
      {/{/{\{\{\(/}{\{\/}/}{\}(_){\/}\}
      _{\{/{\{/(_)\}/}{/{/{/\}\})\}{/\}
     {/{/{\{\(/}{/{\{\{\/})/}{\(_)/}/}\}
      {\{\/}(_){\{\{\/}/}(_){\/}{\/}/})/}
       {/{\{\/}{/{\{\{\/}/}{\{\/}/}\}(_)
      {/{\{\/}{/){\{\{\/}/}{\{\(/}/}\}/}
       {/{\{\/}(_){\{\{\(/}/}{\(_)/}/}\}
         {/({/{\{/{\{\/}(_){\/}/}\}/}(\}
          (_){/{\/}{\{\/}/}{\{\)/}/}(_)
            {/{/{\{\/}{/{\{\{\(_)/}
             {/{\{\{\/}/}{\{\\}/}
              {){/ {\/}{\/} \}\}
              (_)  \.-'.-/
          __...--- |'-.-'| --...__
   _...--"   .-'   |'-.-'|  ' -.  ""--..__
 -"    ' .  . '    |.'-._| '  . .  '   jro
 .  '-  '    .--'  | '-.'|    .  '  . '
          ' ..     |'-_.-|
  .  '  .       _.-|-._ -|-._  .  '  .
              .'   |'- .-|   '.
  ..-'   ' .  '.   `-._.-´   .'  '  - .
   .-' '        '-._______.-'     '  .
        .      ~,
    .       .   |.   .    ' '-.

Submission

Submitted to Information and Software Technology. ARXIV Link: https://arxiv.org/abs/1609.03614v3

Cite As

@misc{1609.03614,
  author  = {Rahul Krishna and Tim Menzies and Lucas Layman},
  title   = {Less is More: Minimizing Code Reorganization using XTREE},
  year    = {2016},
  journal = {Information and Software Technology, submitted},
  eprint  = {arXiv:1609.03614},
}

Authors

Data

Latex Source

Source Code

License

This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means.

(BTW, it would be great to hear from you if you are using this material. But that is optional.)

In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain. We make this dedication for the benefit of the public at large and to the detriment of our heirs and successors. We intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to http://unlicense.org

xtree's People

Contributors

rahlk

xtree's Issues

sorry to fuck up your beautiful dir structure..

... had troubles last submission where the files were in subdirs (stupid build scripts on the journal's side)

so not now, cause i am editing, but the next time you get to this paper's source:

  • no subdirs
  • no unused files

t

Reviewer Tasks (For Rahul)

  1. R1: Does the reached results indicate similar direction?
    **Task: (a) Change section 1.2 to section 2; (b) Refer to content from papers 1, 2, 3; (c) Note the use of developers to validate the prioritization, point to section 3 to say why this is not a good idea.**

Response: You are quite correct that we didn't discuss enough related work on code smell prioritization; please see our new Section 2.1.

  1. Questions regarding defect prediction:

    2.1. R1: Why is defect prediction good to support decisions on code reorganization?
    Task: (a) Add a section on defect prediction; (b) Remark on the relationship between fault proneness and faults in software systems. Use papers 4 and 6.

You're quite correct that we supplied insufficient information on defect prediction and its value to code smells. Please see the new Section 3 in the paper.

The next few issues raise points similar to those addressed with Reviewer 1.

2.2. R2: What does it then mean to “reduce defects in our data sets”?
2.3. R2: In case four new modules are developed, is there then also a log history that shows how the total number of modules in the system correlates with the total number of defects? Are we talking about the total number of defects or only the number of “infected modules”?
2.4 R2: What’s the connection between whether a smell is bad or not and the threshold values of the various code metrics?

In the above, this reviewer is raising points similar to Reviewer 1's. Thanks to these issues, we have added the new Section 3 to address them.

Also, especially for reviewer2, we add the following:

  • The paper http://dl.acm.org/citation.cfm?id=2821501 explicitly claims that there is a connection between threshold values for static code attributes and bad smells.

  • Note that we have also added this note to the new Section 3.

  1. R2: In section 4: “It can be difficult to judge the effects of removing bad smells. Code that is reorganized cannot be assessed just by a rerun of the test suite ...". Isn’t the point about removing bad smells to improve the code without changing the behavior (refactorings)? Why cannot a test suite be run before and after if the behavior is not changed?
    Task: Pretty obvious why. Still, make clear.

  2. R2: Figure 4. This is a good example of researcher bias. Your example is the most favorable example from your point of view. It is one of only two of the eight data sets that improved on both pd and pf.
    Task: Stress the importance of tuning.

  3. R2: The authors state that if there are no historical records of defects, the results of this paper can be used as a guide (which results?). It is referred to Table 8 in the abstract but there is no Table 8. Is it meant to be Figure 8? In case, it’s very hard to understand how that figure could be used. Or is it Figure 1 as stated in the conclusion, but how that Figure 1 compensate for a lack of historical records?
    Task: Explain Fig 8. Note the relevance in case of lack of historical data.

  4. R2: What's the difference between tool, method, and framework?
    Task: Find and remove instances appropriately

  5. R2: What is the relationship between the different code metrics in RQ2?
    Task: More examples

  6. R2: What is the selection criterion for different methods in RQ1?
    Task: There's no selection criterion per se. But try rewording

  7. R2: What is the idea of the following statement? Is the point that a log history of defects has shown that modules with more than 100 loc have more defects (per lines of code?) than smaller modules, and then the action is to reduce the size of that module?

    “This code reorganization will start with some initial code base that is changed to a new code base. For example, if the bad smell is loc > 100 and a code module has 500 lines of code, we reason optimistically that we can change that code metric to 100. Using the secondary verification oracle, we then predict the number of defects in new.”

    Task: Again, pretty obvious. Maybe reword the statement?
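The quoted passage describes a two-step loop: optimistically cap the offending metric at its threshold, then re-score the changed module with a secondary verification oracle. A minimal sketch of that loop, where `apply_plan`, `oracle`, and the toy threshold/rule are all illustrative assumptions rather than XTREE's actual API:

```python
# Hedged sketch of "optimistic plan, then verify": cap a smelly metric at
# its threshold, then re-predict defects with a secondary oracle.
THRESHOLDS = {"loc": 100}          # bad smell from the example: loc > 100

def apply_plan(module):
    """Optimistically set each smelly metric to its threshold value."""
    planned = dict(module)
    for metric, cap in THRESHOLDS.items():
        if planned.get(metric, 0) > cap:
            planned[metric] = cap
    return planned

def oracle(module):
    """Stand-in secondary verification oracle (any defect predictor)."""
    return module["loc"] > 250     # toy rule, for illustration only

old = {"loc": 500}                 # the 500-line module from the example
new = apply_plan(old)              # loc optimistically capped at 100
print(oracle(old), oracle(new))    # defect predicted before the plan, not after
```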

Reviewers Comments

Comments from the editors and reviewers


Editor

  • Evolve the paper to make clear the defect model, hypothesis, limitations, and readability
  • Prepare a letter of changes explaining all modifications made.

Reviewer 1

  • How the results of this work contribute to future research activities in the area?

You're quite correct: there is no future work section in the paper. Our research shows that it is potentially naive to explore thresholds in static code attributes in isolation from each other. Our work clearly demonstrates how changing one thing necessitates changing something else. So, for future work, we recommend that researchers look for tools that recommend changes to sets of code attributes. Candidate techniques in this area might include: 1. association rule learning; 2. thresholds generated across synthesized dimensions, e.g., PCA; 3. techniques that cluster data and look for deltas between the clusters. (Note that we offer XTREE as an example of the third point.)

Additionally, scalable solutions. After this, we will look at applications beyond static code attributes. For example, we have been looking at sentiment analysis in Stack Overflow exchanges to learn the dialog patterns that most select for SO entries.
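The third candidate technique above (cluster the data, then look for deltas between clusters) can be sketched roughly as follows. The centroid-difference approach and all names here are illustrative assumptions, not the paper's algorithm:

```python
# Hypothetical sketch: cluster modules by their static code metrics, then
# report the metric deltas that would move a "defective" cluster toward a
# "cleaner" one.
from statistics import mean

def centroid(rows):
    """Mean value of each metric across a cluster."""
    return [mean(col) for col in zip(*rows)]

def deltas(bad_cluster, good_cluster, names):
    """Metric changes that would move the bad cluster toward the good one."""
    diff = {}
    for name, b, g in zip(names, centroid(bad_cluster), centroid(good_cluster)):
        if b != g:
            diff[name] = g - b   # recommended change (negative = decrease)
    return diff

# Toy data: rows are [loc, cbo] per module.
bad  = [[520, 9], [480, 11]]
good = [[110, 6], [90, 8]]
print(deltas(bad, good, ["loc", "cbo"]))
```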

  • How can practitioners benefit from the results achieved in the paper?
    Thank you for that note. We have added text to the end of Section 1 to address this issue.

  • What are the main limitations of the work?

  • Further discussion at the end indicating how this work complements the body of knowledge about code smells. Does the reached results indicate a similar direction? contradictory? Does This work somehow confirm the results of other studies or points to new directions?
    @rahlk: see — An approach to prioritize code smells for refactoring

  • Why is defect-proneness good to support the decision on code reorganization and/or how it could complement other characteristics.
    See: http://link.springer.com/article/10.1007%2Fs10664-015-9361-0

  • Could we use XTREE as a strategy to prioritize the payment of technical debt items?

  • Is there any reason for not using the GQM template to state the goal of the study and also to define null and alternative hypotheses for the research questions?

Reviewer 2

  • What’s the connection between whether a smell is bad or not and the threshold values of the various code metrics?
  • What does it then mean to “reduce defects in our data sets”? Is it the total number of defects in the data set (system?) or the number of “infected modules” in the data set?
  • What is the idea of the following statement? Is the point that a log history of defects has shown that modules with more than 100 loc have more defects (per lines of code?) than smaller modules, and then the action is to reduce the size of that module?

“This code reorganization will start with some initial code base that is changed to a new code base. For example, if the bad smell is loc > 100 and a code module has 500 lines of code, we reason optimistically that we can change that code metric to 100. Using the secondary verification oracle, we then predict the number of defects in new.”

  • In case four new modules are developed, is there then also a log history that shows how the total number of modules in the system correlates with the total number of defects? Are we talking about total number of defects or only the number of “infected modules”?
  • The RQs in the introduction are at a very low level. Actually, I wouldn’t call them research questions because they are too internal.
  • RQ1 is “which of the methods is most accurate?” without presenting the methods. They are just referred to in three other papers that probably describe them. What are your selection criteria for comparing the tool (or framework or system as you also call it) XTREE with exactly these methods?
  • What’s the difference between tool, framework, and method in this paper?
  • How does this work related to the work by Arcelli Fontana, in particular: Arcelli Fontana, F., Mäntylä, M.V., Zanoni, M. et al. Comparing and experimenting machine learning techniques for code smell detection, Empir Software Eng (2016) 21: 1143. doi:10.1007/s10664-015-9378-4?
  • RQ5 is about the relationship between several code metrics. Improving on one metric may cause degradation on another one. This issue could have been much more spelled out with good examples in the paper.
  • There is one example that reducing LOC may increase coupling. Sure, if you split a module into several smaller modules, or move code from one large module to other, smaller modules, usually the overall coupling will increase. But isn’t that an avoidable consequence that will occur implicitly in the process of module splitting or moving code between modules? The way it’s formulated now gives the impression that the programmers should follow the recommendation of increasing the coupling, that is as if it’s a conscious action.
  • For an outsider, it is difficult to follow the structure and argumentation of much of the paper.
  • In Section 4: “It can be difficult to judge the effects of removing bad smells. Code that is reorganized cannot be assessed just by a rerun of the test suite since such reorganizations may not change the system behavior (e.g., refactorings).” Isn’t the point about removing bad smells to improve the code without changing the behavior (refactorings)? Why cannot a test suite be run before and after if the behavior is not changed?
  • Figure 4 shows the effect of tuning, and the authors write: “The rows marked with a * in Figure 4 show data sets whose performance was improved remarkably by these techniques. For example, in poi, the recall increased by 4% while the false alarm rate dropped by 21%.” This is a good example of researcher bias. Your example is the most favorable example from your point of view. It is one of only two of the eight data sets that improved on both pd and pf.
  • The authors state that if there are no historical records of defects, the results of this paper can be used as a guide (which results?). It is referred to Table 8 in the abstract but there is no Table 8. Is it meant to be Figure 8? In case, it’s very hard to understand how that figure could be used. Or is it Figure 1 as stated in the conclusion, but how that Figure 1 compensate for a lack of historical records?
  • Section 2 is too obvious, so much so that most of Section 2 should be deleted. Figure 1 as part of a related work section may remain.
  • It is stated that this paper reports a case study. I know that within the SW community it is common to use the term case study to denote just a demonstration of an example performed by the researchers. But in more mature disciplines, which we should aim to become a member of, “case study” has a certain meaning (e.g. Yin 2003). It would mean an evaluation in a real software development context. This is not the case here. See also the work by Runeson and Host 2009 on case studies in software engineering.
  • Regarding case study, what kind of work remains before the proposed tool could be used by practitioners?

Fix

end of section 2.2 says

Only sections 3.2.1, 3.2.2, and two-fifths of the results in Figure 6 contain material found in prior papers.

but we have no 3.2.1 or 3.2.2

please fix

Reviewer comments: Dr. M

Reviewer 1

  • Is there any reason for not using the GQM template to state the goal of the study and also to define null and alternative hypotheses for the research questions?

Reviewer 2

  • Section 2 discusses the idea of why not just ask developers about the effect of code smells when the research literature is contradictory. I find it odd to believe that practitioners should be able to resolve inconsistencies in the research literature. As researchers, we should obviously investigate more into why the results are contradictory. The reason may be varying quality of the research or varying contexts. For example, several papers state that there are more problems in software components with a large number of smells than components with a low number of smells. One should dismiss such research if the size of the components is not adjusted for. There are certainly more problems with larger components than smaller components given other properties the same. The authors agree that practitioners should not be expected to resolve contradictions. My point is that it is so obvious that most of Section 2 should be deleted. Figure 1 as part of a related work section may remain.

  • It is stated that this paper reports a case study. I know that within the SW community it is common to use the term case study to denote just a demonstration of an example performed by the researchers. But in more mature disciplines, which we should aim to become a member of, “case study” has a certain meaning (e.g. Yin 2003). It would mean an evaluation in a real software development context. This is not the case here. See also the work by Runeson and Host 2009 on case studies in software engineering. Regarding case study, what kind of work remains before the proposed tool could be used by practitioners?

Reviewer Comments

Issues currently being addressed

  • Evaluating bad smells based on defect prediction is not enough to decide when it should be ignored or not. A bad smell is useful if it helps to identify and remove a maintenance problem.
  • No mention is made of the risks or mitigation of a less than optimal tree
  • need table showing all changes between all pairs of leafX-to-better-leafY. note that if attribute Z is statistically indistinguishable between X,Y then don't list Z
  • Reducing LOC by splitting a long method into smaller ones may increase other metrics, such as coupling among methods and files, but the paper didn't further explain how XTREE could possibly address this problem.
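The requested leafX-to-better-leafY table could be generated along these lines. The issue does not name a statistical test, so a small Cliff's delta effect size stands in here as an assumption; all names are illustrative:

```python
# Sketch: when listing metric changes between a leaf X and a better leaf Y,
# drop any attribute whose values are statistically indistinguishable.
def cliffs_delta(xs, ys):
    """Fraction of pairs where x > y minus fraction where x < y."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

def changed_attributes(leaf_x, leaf_y, names, small=0.147):
    """Attributes worth listing: |delta| above the 'negligible' cutoff."""
    keep = []
    for i, name in enumerate(names):
        xs = [row[i] for row in leaf_x]
        ys = [row[i] for row in leaf_y]
        if abs(cliffs_delta(xs, ys)) > small:
            keep.append(name)
    return keep

# Toy leaves: rows are [loc, cbo]; loc differs clearly, cbo does not.
leaf_x = [[400, 5], [420, 6], [390, 5]]
leaf_y = [[120, 5], [100, 6], [130, 5]]
print(changed_attributes(leaf_x, leaf_y, ["loc", "cbo"]))
```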

Reviewer 1

  • The comparison only focuses on two aspects---effectiveness and verbosity, and XTREE only demonstrated significant improvement of the latter.
  • How much defect and history data is needed to make a reliable prediction?

Reviewer 2

  • None of the motivations appear to provide evidence that by removing code smells developers could reduce defects: the principal assumption of the work.
  • The approach does not propose concrete refactoring strategies. It provides only metric’s thresholds. None of the two research questions presented in the paper are clearly linked to this problem.
  • The paper does not provide a single concrete example of how XTREE is used to validate developer’s bad smells.
  • The use of thresholds is linked to bad smell detection. However, this relation must be made more explicit in the text. (Extracting Relative Thresholds for Source Code Metrics)

Reviewer 3

  • How does XTREE handle overfitting? Is there some limit on the tree being built, either in terms of the number of tree levels backtracked or in terms of final defect probability outcomes?

    (Note: The rest of this reviewer's comment was just reiterating our experiment.)

IST paper... over to you

paper is now committed to github

please

  • fix latex compiles
  • look for YYY or XXX in the source code and fix
  • check reference list at back for repeats or anything garbled
  • proof read
  • look for bullshit in my new text
  • Scream loudly cause I dumped the stability study and replaced it with other rhetoric since it did not seem to be a major point. then get over it.
  • maybe check it all out and latex it locally since my recompiles were getting kinda slow.

Figure label broken

LaTeX Warning: Reference `fig:jur' on page 10 undefined on input line 1236.
LaTeX Warning: Reference `fig:jur' on page 10 undefined on input line 1280.
LaTeX Warning: Reference `fig:jur' on page 11 undefined on input line 1290.
LaTeX Warning: Reference `fig:jur' on page 11 undefined on input line 1330.
LaTeX Warning: Reference `fig:jur' on page 14 undefined on input line 1650.
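These warnings usually mean the `\label{fig:jur}` is missing, misspelled, or placed before its `\caption` (a label before the caption binds to the wrong counter). A hypothetical fix, where the filename and caption are placeholders:

```latex
% Ensure the figure carries the label the \ref calls expect;
% \label must come after (or inside) the \caption.
\begin{figure}
  \centering
  \includegraphics[width=\linewidth]{jur}  % filename is illustrative
  \caption{...}
  \label{fig:jur}
\end{figure}
```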

