
jfstmerge's Introduction

Semistructured 3-Way Merge



What is semistructured merge?

Regular merge tools (such as git merge) perform what is called textual, or unstructured, merge: they simply compare consecutive lines of text.
Although extremely fast, they know nothing about what the developers actually changed in the code, which causes two kinds of problems: conflicts are reported when they shouldn't be (false positives), wasting development time on manual fixes, and actual conflicts are missed by the tool (false negatives), leading to defects that affect users.

For example, imagine that the master branch contains this Java class:

public class Math {

    public int sum(int a, int b) {
        return a + b;
    }

    public boolean isEven(int a) {
        return a % 2 == 0;
    }

}

A developer created a branch named left and swapped sum and isEven positions:

public class Math {

    public boolean isEven(int a) {
        return a % 2 == 0;
    }

    public int sum(int a, int b) {
        return a + b;
    }

}

Meanwhile, another developer created a branch named right on top of master and renamed sum to sumIntegers:

public class Math {

    public int sumIntegers(int a, int b) {
        return a + b;
    }

    public boolean isEven(int a) {
        return a % 2 == 0;
    }

}

Because all three revisions differ in the same consecutive lines, unstructured merge reports a conflict (and its output contains the method twice, as sumIntegers and as sum):

public class Math {

<<<<<<< MINE
=======
    public int sumIntegers(int a, int b) {
        return a + b;
    }

>>>>>>> YOURS
    public boolean isEven(int a) {
        return a % 2 == 0;
    }

    public int sum(int a, int b) {
        return a + b;
    }

}

Semistructured merge, on the other hand, "understands" the changes made by both contributions and produces no conflict:

public class Math {

    public int sumIntegers(int a, int b) {
        return a + b;
    }

    public boolean isEven(int a) {
        return a % 2 == 0;
    }

}

It parses the code completely, building an AST (Abstract Syntax Tree), but keeps the contents of the nodes as text. Whitespace and comments that occur between a node and its preceding one in the code are stored as a prefix of the latter. Nodes are matched if they have the same identifier (we call this superimposition), and their contents are merged using textual merge. The table below lists the identifiers of some Java declarations.

Declaration | Identifier
Classes | Name
Fields | Name
Methods | Signature
Constructors | Signature
Packages | Whole of the statement
Imports | Whole of the statement
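
As a rough illustration of matching nodes by identifier, the sketch below represents each revision as a map from identifier to node content and merges node by node. The class and method names are hypothetical, and the content merge is reduced to the trivial three-way rule (a real run would apply textual merge to diverging contents), so this is only a sketch of the idea, not S3M's implementation.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: each revision is a map "identifier -> node content".
final class SuperimpositionSketch {

    /** Merges three revisions node by node, matching nodes by identifier. */
    static Map<String, String> merge(Map<String, String> left,
                                     Map<String, String> base,
                                     Map<String, String> right) {
        Set<String> ids = new LinkedHashSet<>(left.keySet());
        ids.addAll(right.keySet());

        Map<String, String> merged = new LinkedHashMap<>();
        for (String id : ids) {
            String l = left.get(id);
            String b = base.get(id);
            String r = right.get(id);
            String result;
            if (Objects.equals(l, r)) {
                result = l;                 // both sides agree
            } else if (Objects.equals(l, b)) {
                result = r;                 // only right changed (null means deleted)
            } else if (Objects.equals(r, b)) {
                result = l;                 // only left changed (null means deleted)
            } else {
                // Simplification: a real tool would run textual merge here.
                result = "<<<<<<< MINE\n" + text(l) + "\n=======\n" + text(r) + "\n>>>>>>> YOURS";
            }
            if (result != null) {
                merged.put(id, result);
            }
        }
        return merged;
    }

    private static String text(String s) {
        return s == null ? "" : s;
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("sum(int,int)", "return a + b;");
        base.put("isEven(int)", "return a % 2 == 0;");

        Map<String, String> left = new LinkedHashMap<>();   // methods swapped
        left.put("isEven(int)", "return a % 2 == 0;");
        left.put("sum(int,int)", "return a + b;");

        Map<String, String> right = new LinkedHashMap<>();  // sum renamed
        right.put("sumIntegers(int,int)", "return a + b;");
        right.put("isEven(int)", "return a % 2 == 0;");

        // prints [isEven(int), sumIntegers(int,int)]
        System.out.println(merge(left, base, right).keySet());
    }
}
```

Because the nodes are keyed by identifier rather than by position, swapping the order of two methods on one side does not interfere with a change made to one of them on the other side.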

We use Feature House as base framework for parsing and superimposition and we give more details about the tool in our paper.


Conflict Handlers

Conflict Handlers (or just Handlers) are algorithms that run in sequence after every semistructured merge (if the user so desires), analysing the merge output and refining the result according to the peculiarities of each kind of language construct.

Deletions Handler

Executes when developer A changed the content of an inner class while developer B deleted or renamed it.

If A included a reference to the changed class, the handler keeps both A's and B's classes (if B deleted it, it keeps only A's).
Otherwise, if B included a reference to its renamed class, the handler outputs a conflict between A's and B's classes. Otherwise, the handler merges both classes.

Initialization Blocks Handler

Executes when there's at least one initialization block in the code.

If there's exactly one initialization block in each of A's, B's and base's code, they're merged.
Otherwise, for each initialization block in base, the handler searches for the first of A's and of B's initialization blocks with string similarity greater than or equal to 0.7 and merges them. If there's none, the handler uses an empty string for the merge.
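
The threshold search described above can be sketched as follows. The text does not say which similarity metric is used, so this sketch assumes a normalized Levenshtein similarity; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the similarity metric (normalized Levenshtein) is an assumption.
final class BlockMatcher {

    /** Classic two-row dynamic-programming edit distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]; 1.0 means identical strings. */
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    /** Returns the first candidate at or above the threshold, or "" if there is none. */
    static String firstSimilar(String baseBlock, List<String> candidates, double threshold) {
        for (String candidate : candidates) {
            if (similarity(baseBlock, candidate) >= threshold) {
                return candidate;
            }
        }
        return "";
    }

    public static void main(String[] args) {
        List<String> leftBlocks = Arrays.asList("static { log(); init(); }", "static { x = 2; }");
        // prints static { x = 2; }
        System.out.println(firstSimilar("static { x = 1; }", leftBlocks, 0.7));
    }
}
```

Returning an empty string when nothing matches mirrors the fallback described above, where the handler merges against an empty block.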

Method and Constructor Renaming or Deletion Handler

Executes when a developer renamed or deleted a method or a constructor.

For each method or constructor in base whose signature is not present in A's or B's code, the handler marks it as (1) renamed without body changes, if there's a method or constructor with the same body in the developers' code, or (2) renamed or deleted with body changes otherwise.
Then, for each marked method or constructor in base, the handler searches for the first of A's and of B's methods or constructors that satisfies one of the following conditions: (1) equal body; (2) body string similarity higher than 0.7 and equal signature except for the name; (3) one body is contained in the other. If there's none, the method or constructor is treated as deleted.
Finally, for each triple of methods or constructors produced by the previous search (A's, base's and B's), the handler performs an operation according to the user-chosen variant:

  • Safe (default): applies a decision tree to decide the result.
  • Keep Both Methods: always keeps A's and B's methods in the triple.
  • Merge Methods: runs textual merge on A's and B's methods.

New Element Referencing Edited One Handler

Executes when developer A added a method or field that refers to a method or field edited by developer B.

For each of A's and each of B's methods or fields, if there's an unstructured merge conflict surrounding them and A's element refers to B's, the handler outputs a conflict. Otherwise, the handler keeps both.

Type Ambiguity Error Handler

Executes when developer A or B adds at least one import statement.

For each pair of A's and B's import statements, if both import packages or both import classes with equal names, the handler outputs a conflict if there's a type ambiguity compilation error in the merged code.
Otherwise, if A imports a class and B imports a package, the handler outputs a conflict if unstructured merge reported a conflict surrounding the imports.
If none of these conditions holds, the handler keeps the import statements.
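
The first condition above (both package imports, or class imports with equal names) can be sketched as a simple check on the import statements. This only covers the precondition; deciding whether there is an actual type ambiguity requires compiling the merged code, which is not shown. All names are hypothetical, and skipping identical imports is an assumption of this sketch.

```java
// Hypothetical sketch of the precondition only; the compilation check is not shown.
final class ImportClashSketch {

    /** True for on-demand imports such as "import java.util.*;". */
    static boolean isPackageImport(String importStatement) {
        return importStatement.trim().endsWith(".*;");
    }

    /** Last segment of the imported name, e.g. "List" for "import java.util.List;". */
    static String simpleName(String importStatement) {
        String s = importStatement.trim();
        s = s.substring(0, s.length() - 1);          // drop the trailing ';'
        return s.substring(s.lastIndexOf('.') + 1);
    }

    /** True if the pair is a candidate for a type ambiguity error. */
    static boolean mayBeAmbiguous(String a, String b) {
        if (a.trim().equals(b.trim())) {
            return false;                            // assumption: identical imports never clash
        }
        if (isPackageImport(a) && isPackageImport(b)) {
            return true;                             // two different packages may export the same type
        }
        if (!isPackageImport(a) && !isPackageImport(b)) {
            return simpleName(a).equals(simpleName(b));
        }
        return false;                                // class vs. package is handled by a separate rule
    }

    public static void main(String[] args) {
        System.out.println(mayBeAmbiguous("import java.util.List;", "import java.awt.List;")); // prints true
        System.out.println(mayBeAmbiguous("import java.util.List;", "import java.util.Map;")); // prints false
    }
}
```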

Duplicated Declaration Handler

Executes as a statistical tool, when the unstructured merge result contains a duplicated declaration of a method or field.

The handler counts duplicated declaration compilation errors in the code generated by unstructured merge.


Logging

For research purposes, S3M stores an error log and some statistical logs in the ${HOME}/.jfstmerge directory. The logs are typically encrypted to preserve their integrity, but this can be disabled.


User Guide

Requirements

  • Java 8 (Java version "1.8.0_212" or above)
  • Git (optional): S3M can act as a merge driver for git merge. If you are interested in this feature, make sure Git is installed. The Git documentation has more details about merge drivers.

Git integration (as a merge driver)

  1. Download the binary file;
  2. Add the following lines to your .gitconfig file (typically located in $HOME on Unix or %USERPROFILE% on Windows), replacing pathTo with the path to the binary file on your machine:
[core]
    attributesfile = ~/.gitattributes
[merge "s3m"]
    name = semi_structured_3_way_merge_tool_for_java
    driver = java  -jar "\"pathTo/jFSTMerge.jar\"" %A %O %B -o %A -g
  3. Add the following line to your .gitattributes file (also located in the $HOME / %USERPROFILE% folder; create the file if it doesn't exist):
*.java merge=s3m

Usage

If integrated with Git (as a merge driver), S3M runs automatically every time you invoke the git merge command; no further configuration is required. You can also run it as a standalone tool using the .jar file in the /binary folder. After downloading the jFSTMerge.jar file, use the command below:

java -jar jFSTMerge.jar leftPath basePath rightPath

where leftPath, basePath and rightPath can be either a file or a directory.

Parameters

Parameter | Arity | Type | Description
-f | 3 | String | Specify the files to be merged (mine, base, yours).
-d | 3 | String | Specify the directories to be merged (mine, base, yours).
-o | 1 | String | Destination of the merged content. (default: yours)
-c | 1 | Boolean | Enables or disables cryptography during log generation. (default: true)
-l | 1 | Boolean | Enables or disables logging. (default: true)
--encoding-inference | 1 | Boolean | Tries to infer file encodings to properly merge them. If not enabled, the tool assumes files are encoded in UTF-8. (default: true)
--ignore-space-change | 1 | Boolean | Lines with whitespace changes only are considered as unchanged for the sake of a three-way merge. (default: true)
-r, --renaming | 1 | String | Choose strategy on renaming conflicts. (possible values: SAFE, MERGE, KEEPBOTHMETHODS) (default: SAFE)
-hdd, --handle-duplicate-declarations | 1 | Boolean | Enables or disables Duplicated Declaration Handler. (default: true)
-hib, --handle-initialization-blocks | 1 | Boolean | Enables or disables Initialization Blocks Handler. (default: true)
-hnereo, --handle-new-element-referencing-edited-one | 1 | Boolean | Enables or disables New Element Referencing Edited One Handler. (default: true)
-hmcrd, --handle-method-constructor-renaming-deletion | 1 | Boolean | Enables or disables Method and Constructor Renaming or Deletion Handler. (default: true)
-htae, --handle-type-ambiguity-error | 1 | Boolean | Enables or disables Type Ambiguity Error Handler. (default: true)

Contributor Guide

Contributor Requirements

  • Java 8 (Java version "1.8.0_212" or above)
  • Gradle 4.6

Getting Started

Cloning the repository and setting up a Gradle project should be enough to start contributing.

Build

We use Gradle as the build tool, along with its wrapper. You can build the tool from the command line by running gradlew build. A two-step setup guide for the Eclipse IDE is also available.

Testing

We have a bunch of JUnit classes. They mostly test the behavior of the handlers.
We encourage the usage of

testWhatYoureTesting_givenAConditionIsSatisfied_whenSomeActionHappens_shouldExpectedBehavior

style of method names when writing unit tests.

There are also two special JUnit classes:

  1. one for testing the git merge driver, which serves primarily the installer (see below);
  2. and another that runs periodically as a Cron Job (see below).

Continuous Integration

We run GitHub Actions as CI tool. It runs a typical gradle build, the unique JUnit tests described above and linters for every new or edited file.


Copyright (c) 2016-2019 by the Federal University of Pernambuco.

Paulo Borba
Guilherme Cavalcanti
João Victor

jfstmerge's People

Contributors

albertotrindade, danielmaida, delaevernu, dependabot[bot], gaabs, guilhermejccavalcanti, jvcoutinho, leonardoanjos16, pauloborba


jfstmerge's Issues

Probable resource leak when performing merge revisions

This most likely happens because the BufferedReader is not properly closed in readFileContent, shown below:

public static String readFileContent(File file){
    //StringBuilder content = new StringBuilder();
    String content = "";
    try{
        BufferedReader reader = Files.newBufferedReader(Paths.get(file.getAbsolutePath()), StandardCharsets.UTF_8);
        content = reader.lines().collect(Collectors.joining("\n"));
    }catch(Exception e){
        //System.err.println(e.getMessage());
    }
    return content;
}


Stacktrace observed:
SEVERE: 
br.ufpe.cin.exceptions.PrintException: Unexpected error while performing print routines.
java.io.FileNotFoundException: /home/Gio/.jfstmerge/jfstmerge.statistics (Too many open files)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at org.apache.commons.io.FileUtils.openOutputStream(FileUtils.java:367)
	at org.apache.commons.io.FileUtils.writeStringToFile(FileUtils.java:1928)
	at org.apache.commons.io.FileUtils.write(FileUtils.java:2045)
	at org.apache.commons.io.FileUtils.write(FileUtils.java:2002)
	at br.ufpe.cin.logging.LoggerStatistics.logContext(LoggerStatistics.java:62)
	at br.ufpe.cin.statistics.Statistics.compute(Statistics.java:74)
	at br.ufpe.cin.app.JFSTMerge.mergeFiles(JFSTMerge.java:203)
	at br.ufpe.cin.app.JFSTMerge.mergeDirectories(JFSTMerge.java:131)
	at br.ufpe.cin.app.JFSTMerge.mergeRevisions(JFSTMerge.java:95)
	at br.ufpe.cin.app.JFSTMerge.main(JFSTMerge.java:222)

	at br.ufpe.cin.logging.LoggerStatistics.logContext(LoggerStatistics.java:89)
	at br.ufpe.cin.statistics.Statistics.compute(Statistics.java:74)
	at br.ufpe.cin.app.JFSTMerge.mergeFiles(JFSTMerge.java:203)
	at br.ufpe.cin.app.JFSTMerge.mergeDirectories(JFSTMerge.java:131)
	at br.ufpe.cin.app.JFSTMerge.mergeRevisions(JFSTMerge.java:95)
	at br.ufpe.cin.app.JFSTMerge.main(JFSTMerge.java:222)
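
If the unclosed reader is indeed the cause, one possible fix is to open it in a try-with-resources statement so that it is closed even when reading fails. The sketch below keeps the behaviour of the method quoted above (returning an empty string on error); the enclosing class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.stream.Collectors;

// Hypothetical holder class; only readFileContent mirrors the method quoted in the issue.
final class SafeFileReader {

    static String readFileContent(File file) {
        // try-with-resources closes the reader on both normal and exceptional exit
        try (BufferedReader reader = Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
            return reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException | UncheckedIOException e) {
            // lines() reports read errors as UncheckedIOException
            return "";
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("s3m-demo", ".txt");
        Files.write(tmp.toPath(), "first\nsecond".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFileContent(tmp));
        tmp.delete();
    }
}
```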

MalformedInputException on merge revisions

Observed on revision 88a48f8.

The stack trace and the revision involved are attached below.

java.nio.charset.MalformedInputException: Input length = 1
br.ufpe.cin.exceptions.PrintException: Unexpected error while performing print routines.
Unable to manage merged output file!
	at br.ufpe.cin.printers.Prettyprinter.generateMergedFile(Prettyprinter.java:75)
	at br.ufpe.cin.printers.Prettyprinter.generateMergedTuple(Prettyprinter.java:98)
	at br.ufpe.cin.printers.Prettyprinter.generateMergedScenario(Prettyprinter.java:112)
	at br.ufpe.cin.app.JFSTMerge.mergeRevisions(JFSTMerge.java:139)
	at br.ufpe.cin.app.Main.main(Main.java:39)
An error occurred. See /home/gio/.jfstmerge/jfstmerge.log file for more details.
 Send the log to [email protected] for analysis if preferable.

rev_268bd_64808.zip
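
One plausible cause (an assumption; the attached revision was not analysed here) is a file that is not valid UTF-8 being decoded strictly as UTF-8. The sketch below shows one way to avoid the exception: decode strictly as UTF-8 first and fall back to ISO-8859-1, which accepts any byte sequence. The class name and the choice of fallback charset are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the ISO-8859-1 fallback is an assumption, not the tool's behaviour.
final class LenientDecoder {

    static String decode(byte[] bytes) {
        try {
            // Strict decoding: malformed input raises CharacterCodingException
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Every byte maps to a character in ISO-8859-1, so this cannot fail
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9};   // 'é' in ISO-8859-1, malformed as UTF-8
        System.out.println(decode(latin1).equals("\u00e9"));  // prints true
    }
}
```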

Nightly build with a larger sample

A periodic test could be applied to:

  1. Identify if merge results remain the same. This could be both a summary check at the end, or a more detailed verification.
  2. Guarantee no errors are thrown

The idea is to run tests daily/weekly with a large sample, which is not viable for every build, depending on the sample size.

Inconsistency in textual merge

Textual merge is reporting different results for the same situation. Consider the following merge case:
LEFT:

void m(){
	x = 1;
	y = 0;
	
	int c;
	
	int z = 1;
}

BASE:

void m(){
	x = 0;
	y = 0;
	
	int c;
	
	int z = 0;
}

RIGHT:

void m(){
	x = 0;
	y = 1;
	
	int c;
	
	int z = 2;
}

Textual merge reports either:

void m(){
<<<<<<< 
		x = 0;
		y = 1;
=======
		x = 1;
		y = 0;
>>>>>>>
		
		int c;
		
<<<<<<< 
		int z = 1;
=======
		int z = 2;
>>>>>>> 
	}

or

<<<<<<< MINE
void m(){
		x = 0;
		y = 1;
		
		int c;
		
		int z = 1;
=======
void m(){
		x = 1;
		y = 0;
		
		int c;
		
		int z = 2;
>>>>>>> YOURS
	}

This happens with JGit and command-line git merge, but not with standalone diff3 tools.

Renaming Handler False Positive bug

It was observed that when both LEFT and RIGHT add the same methods, the renaming handler does not match those methods properly, producing false positives. This was observed both on the current master revision and on an older revision (a3a0a2b).

Screenshots (not reproduced here): "parents", showing LEFT | BASE | RIGHT, and "multiple", showing S3M | UNSTRUCTURED.

Revisions file used: graylog-revision.zip

Basic renaming handler might add FNs

This may happen when one developer renames a declaration and changes its body, and the other developer changes the body of the same declaration, in the same area.

Automate the creation of binaries (installer, standalone tool, shelltests)

Every time we update the S3M source code, we need to generate a new jar (which goes to the /binary folder) and a new installer (which goes to the /installer folder). The shelltests are based on the jar in /binary, so that would have to be updated as well.

The shelltests depend on the jFSTMerge.jar from https://github.com/guilhermejccavalcanti/jFSTMerge/tree/master/testfiles/shelltests/config, which is a copy of the one at https://github.com/guilhermejccavalcanti/jFSTMerge/tree/master/binary

This process is manual, which can lead to inconsistencies. Try to configure the tests to take the jar directly from https://github.com/guilhermejccavalcanti/jFSTMerge/tree/master/binary, avoiding this duplication.

We need an automated way of generating binaries.

Add javadoc explaining rename handlers

For each renaming handler, add a one- or two-line comment explaining what that handler does. For instance, for the SafeSingleRenamingHandler it could be "This handler only reports a renaming conflict when unstructured merge reports a conflict with the signature of the renamed declaration".

Possible bug in renaming method signature

Hi there,

I think I've spotted a bug in RenamingConflictHandler/removeSignature: the regex may match the signature incorrectly in some cases.

For example:

public void startActivityAsCaller(Intent intent, @Nullable Bundle options,
            boolean ignoreTargetSecurity, int userId) {
        if (mParent != null) {
            throw new RuntimeException("Can't be called from a child");
        }
        options = transferSpringboardActivityOptions(options);
        Instrumentation.ActivityResult ar =
                mInstrumentation.execStartActivityAsCaller(
                        this, mMainThread.getApplicationThread(), mToken, this,
                        intent, -1, options, ignoreTargetSecurity, userId);
        if (ar != null) {
            mMainThread.sendActivityResult(
                mToken, mEmbeddedID, -1, ar.getResultCode(),
                ar.getResultData());
        }
        cancelInputsAndStartExitTransition(options);
    }

FileManger.getStringContentIntoSingleLineNoSpacing() correctly converts it into:

publicvoidstartActivityAsCaller(Intentintent,@NullableBundleoptions,booleanignoreTargetSecurity,intuserId){if(mParent!=null){thrownewRuntimeException("Can'tbecalledfromachild");}options=transferSpringboardActivityOptions(options);Instrumentation.ActivityResultar=mInstrumentation.execStartActivityAsCaller(this,mMainThread.getApplicationThread(),mToken,this,intent,-1,options,ignoreTargetSecurity,userId);if(ar!=null){mMainThread.sendActivityResult(mToken,mEmbeddedID,-1,ar.getResultCode(),ar.getResultData());}cancelInputsAndStartExitTransition(options);}

The removeSignature() method is:

private static String removeSignature(String string) {
    string = string.replaceFirst("^.*(?=(\\{))", "");
    return string;
}

However, removeSignature() incorrectly converts it into:

mMainThread.sendActivityResult(mToken,mEmbeddedID,-1,ar.getResultCode(),ar.getResultData());}cancelInputsAndStartExitTransition(options);

which is part of the method body (inside if block).
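
The behaviour reported above is consistent with the greedy `.*` in the pattern, which backtracks only as far as the last `{` on the line. The sketch below reproduces this on a small, made-up method and contrasts it with a reluctant `.*?`, which stops at the first `{`. The class and method names are hypothetical, and the reluctant variant is only one possible fix: it would still cut too early if an annotation in the signature contains a brace.

```java
// Hypothetical demo class; the input string is made up for illustration.
final class RemoveSignatureDemo {

    /** Same pattern as in the issue: greedy, so it removes up to the LAST '{'. */
    static String greedy(String method) {
        return method.replaceFirst("^.*(?=(\\{))", "");
    }

    /** Reluctant variant: removes only up to the FIRST '{'. */
    static String reluctant(String method) {
        return method.replaceFirst("^.*?(?=\\{)", "");
    }

    public static void main(String[] args) {
        String method = "publicvoidm(inta){if(a>0){a--;}done();}";
        System.out.println(greedy(method));     // prints {a--;}done();}
        System.out.println(reluctant(method));  // prints {if(a>0){a--;}done();}
    }
}
```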

Indentation and Comments inconsistent with the original code

Since the code is parsed and pretty-printed with JavaCC, though indented with JavaParser, the indentation of the original code and some comments are lost (strangely, some of the lost comments are orphan comments and some are not). When the result is compared with git's merge output using textual tools (like git diff or the VSCode diff), there are too many differences. And in some cases, comments are as important as code.

I am trying to preserve the indentation and comments of the original files, but I'm running into some strange problems. As authors or contributors, how would you approach implementing this? @guilhermejccavalcanti @pauloborba @gaabs

Indentation using spaces and tabs are different for renaming matches

Found during #42's tests.

Base:

	void m() {
		int a;
	}

Left:

    void n1() {
        int a;
    }

int a is indented with one tab and 4 spaces respectively. For this reason, their bodies are considered different and no match is made, even though they are visually the same in some editors.

A simple solution is to trim whitespace from the bodies before checking.
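
A minimal sketch of that idea, assuming bodies are available as plain strings: collapse every run of whitespace into a single space before comparing. The class and method names are hypothetical. Collapsing (rather than deleting) whitespace keeps tokens such as "int a" distinct from "inta", but it also alters whitespace inside string literals, which may or may not be acceptable.

```java
// Hypothetical helper; not part of the tool's code base.
final class BodyComparison {

    /** Trims the body and collapses each run of whitespace into one space. */
    static String normalize(String body) {
        return body.trim().replaceAll("\\s+", " ");
    }

    static boolean sameBodyIgnoringIndentation(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        String withTabs = "{\n\t\tint a;\n\t}";
        String withSpaces = "{\n        int a;\n    }";
        System.out.println(sameBodyIgnoringIndentation(withTabs, withSpaces)); // prints true
    }
}
```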

Show MergeConflict in diff3 style

When resolving merge conflicts manually, the base content is crucial for the developer to understand both changes (see http://www.drdobbs.com/tools/three-way-merging-a-look-under-the-hood/240164902?pgno=2). Conventionally, the diff3 style is used to show merge conflicts, which looks like this:

<<<<<<< HEAD
       hello(everyone);
||||||| merged common ancestors
       hello();
=======
       hi();
>>>>>>> theirs

However, jFSTMerge does not follow this convention in MergeConflict.java; conflicts in diff3 style may be more useful in practice.
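
As an illustration of the requested output, the sketch below renders a conflict with the base section included. It is a standalone, hypothetical helper and does not reflect how MergeConflict.java is actually structured; the MINE/BASE/YOURS labels are chosen to match the markers used elsewhere in the tool's output.

```java
// Hypothetical helper; labels are an assumption.
final class Diff3Conflict {

    /** Renders a conflict in diff3 style: mine, then base, then yours. */
    static String render(String mine, String base, String yours) {
        return String.join("\n",
                "<<<<<<< MINE",
                mine,
                "||||||| BASE",
                base,
                "=======",
                yours,
                ">>>>>>> YOURS");
    }

    public static void main(String[] args) {
        System.out.println(render("hello(everyone);", "hello();", "hi();"));
    }
}
```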

Crash with ClassNotFoundException

Files involved: java_ogplatform.zip

Stacktrace:

Exception in thread "main" java.lang.NoClassDefFoundError: org/eclipse/text/edits/MalformedTreeException
	at org.eclipse.jdt.core.dom.ASTConverter.convert(ASTConverter.java:1428)
	at org.eclipse.jdt.core.dom.CompilationUnitResolver.convert(CompilationUnitResolver.java:292)
	at org.eclipse.jdt.core.dom.ASTParser.internalCreateAST(ASTParser.java:1209)
	at org.eclipse.jdt.core.dom.ASTParser.createAST(ASTParser.java:809)
	at br.ufpe.cin.mergers.util.JavaCompiler.compile(JavaCompiler.java:83)
	at br.ufpe.cin.mergers.util.JavaCompiler.compile(JavaCompiler.java:40)
	at br.ufpe.cin.mergers.handlers.DuplicatedDeclarationHandler.handle(DuplicatedDeclarationHandler.java:29)
	at br.ufpe.cin.mergers.handlers.ConflictsHandler.findAndAccountDuplicatedDeclarationErrors(ConflictsHandler.java:81)
	at br.ufpe.cin.mergers.handlers.ConflictsHandler.handle(ConflictsHandler.java:28)
	at br.ufpe.cin.mergers.SemistructuredMerge.merge(SemistructuredMerge.java:57)
	at br.ufpe.cin.app.JFSTMerge.mergeFiles(JFSTMerge.java:172)
	at br.ufpe.cin.app.JFSTMerge.mergeDirectories(JFSTMerge.java:131)
	at br.ufpe.cin.app.JFSTMerge.mergeRevisions(JFSTMerge.java:95)
	at br.ufpe.cin.app.JFSTMerge.main(JFSTMerge.java:222)
Caused by: java.lang.ClassNotFoundException: org.eclipse.text.edits.MalformedTreeException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 14 more

Order of revisions currently not respected on renamings

Currently, the tool gives the same results for renamings regardless of which revision is on the left and which is on the right, swapping the "MINE" and "YOURS" annotations in some cases.

In other words, merging LEFT BASE RIGHT gives the same result as merging RIGHT BASE LEFT.

Observed scenario:

Base File

public class Test {	
	public void m()	{
	}
}

Left File

public class Test {	
	public void m()	{
		int a;
	}
}

Right File

public class Test {
	public void n()	{
	}
}

Merge output for both merges:

public  class  Test {
	
<<<<<<< MINE
public void m()	{
		int a;
	}

=======
public void n()	{

	}
>>>>>>> YOURS

}

Renaming handlers for other kinds of declarations

Field declarations would only be relevant in the case of multi-line field declarations; otherwise unstructured and semistructured merge report the same conflicts (we talked about this a long time ago). Entire files are handled by reporting the same result as unstructured merge, also discussed a long time ago. Finally, inner classes are handled by another handler (DeletionsHandler).

We should confirm that empirically. Enums might be a target.

Suggestion: extend FST classes to S3M classes

Instead of having to manipulate classes from FeatureHouse dependency to achieve our goals, we could extend those we need.

For example, instead of manipulating FSTTerminal or AbstractFSTPrinter (FeatureHouse's pretty-printer), we could have an S3MTerminal or S3MPrettyPrinter, extending the previous ones respectively. Then, we could put there whatever we want or need. It would be easier to manage and edit.
