sergi / go-diff
Diff, match and patch text in Go
License: MIT License
I'm seeing non-deterministic behavior when I run dmp in multiple goroutines. The "diffs" generated by DiffMain() should be identical no matter how many goroutines are run, but they differ. I'll try my best to find the cause, but you might have a deeper understanding of what's going on. :-)
Here is the code (also attached as a file):
package main
import (
	"fmt"
	"os"
	"sync"
	"sync/atomic"
	"github.com/sergi/go-diff/diffmatchpatch"
)
const (
expect = "[{1 licensed } {0 under the apache license, version 2.0 (the} {-1 #} {0 'license'); you may not use this file except in compliance } {-1 # } {0 with the license. you may obtain a copy of the license at } {-1 # # } {0 http://www.apache.org/licenses/license-2.0 } {-1 # # } {0 unless required by applicable law or agreed to in writing, } {-1 # } {0 software distributed under the license is distributed on an} {-1 #} {0 'as is'basis, without warranties or conditions of any} {-1 #} {0 kind, either express or implied. see the license for the } {-1 # } {0 specific language governing permissions and limitations} {-1 #} {0 under the license.}]"
unknown = "under the apache license, version 2.0 (the #'license'); you may not use this file except in compliance # with the license. you may obtain a copy of the license at # # http://www.apache.org/licenses/license-2.0 # # unless required by applicable law or agreed to in writing, # software distributed under the license is distributed on an #'as is'basis, without warranties or conditions of any # kind, either express or implied. see the license for the # specific language governing permissions and limitations # under the license."
known = "licensed under the apache license, version 2.0 (the'license'); you may not use this file except in compliance with the license. you may obtain a copy of the license at http://www.apache.org/licenses/license-2.0 unless required by applicable law or agreed to in writing, software distributed under the license is distributed on an'as is'basis, without warranties or conditions of any kind, either express or implied. see the license for the specific language governing permissions and limitations under the license."
)
var dmp = diffmatchpatch.New()
const num = 50
func main() {
	var matched, missed int32
	var wg sync.WaitGroup
	wg.Add(num)
	for i := 0; i < num; i++ {
		go func(i int) {
			defer wg.Done()
			diffs := dmp.DiffMain(unknown, known, false)
			s := fmt.Sprintf("%v", diffs)
			if s != expect {
				fmt.Fprintf(os.Stderr, "MISMATCH(%d):\n%s\n", i, s)
				atomic.AddInt32(&missed, 1)
			} else {
				atomic.AddInt32(&matched, 1)
			}
		}(i)
	}
	wg.Wait()
	fmt.Fprintf(os.Stderr, "NUMBER MATCHING: %d\n", matched)
	fmt.Fprintf(os.Stderr, "NUMBER MISMATCHING: %d\n", missed)
}
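One way to narrow this down is to factor the reproduction into a reusable harness and compare runs that share one `dmp` against runs that call `diffmatchpatch.New()` inside each goroutine. The sketch below is stdlib-only; `runConcurrently` is a hypothetical helper, and in the real test the closure would return `fmt.Sprintf("%v", dmp.DiffMain(unknown, known, false))`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runConcurrently calls f from n goroutines and counts how many results
// equal want. It is a hypothetical generalization of the reproduction:
// f would wrap the DiffMain call and format its result.
func runConcurrently(n int, want string, f func() string) (matched, missed int32) {
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			if f() == want {
				atomic.AddInt32(&matched, 1)
			} else {
				atomic.AddInt32(&missed, 1)
			}
		}()
	}
	wg.Wait() // all increments are complete before the counters are read
	return matched, missed
}

func main() {
	// A deterministic stand-in for the diff call: every goroutine must agree.
	matched, missed := runConcurrently(50, "ok", func() string { return "ok" })
	fmt.Println("matched:", matched, "missed:", missed)
}
```

If the shared-instance run reports mismatches while the per-goroutine run does not, shared mutable state is the likely culprit; if both mismatch, a timing-dependent factor is more plausible.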
go-diff currently lacks examples for each exported function and method. Adding them would not only help other users but also make the whole project better and more complete.
Have a look at https://github.com/pmezard/go-difflib and open an issue for everything that is implemented in go-difflib but missing in go-diff.
At the moment it is only easy to produce inline diff output, but something like https://github.com/sergi/go-diff/pull/67/files?diff=unified would be a nice addition to the output functions.
Thoughts on determining how different two files are? Say we compare file1.txt to file2.txt and I would like to see that they are 90% similar. Is that something I can determine with this current library or something that would need to be added? Thanks!
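go-diff exports `DiffLevenshtein`, which returns an edit count for a `[]Diff`. One common way to turn an edit distance into a percentage is `1 - distance / max(len1, len2)`. The stdlib-only sketch below computes the distance directly over runes instead; `levenshtein` and `similarity` are hypothetical helpers, and for large files a line-level distance would be far cheaper than this O(n*m) character-level one.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the minimum number of single-rune insertions,
// deletions and substitutions that turn a into b (two-row DP table).
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	cur := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev
	}
	return prev[len(rb)]
}

// similarity maps the distance into [0, 1], where 1 means identical.
func similarity(a, b string) float64 {
	longest := len([]rune(a))
	if n := len([]rune(b)); n > longest {
		longest = n
	}
	if longest == 0 {
		return 1
	}
	return 1 - float64(levenshtein(a, b))/float64(longest)
}

func main() {
	fmt.Printf("%.0f%% similar\n", 100*similarity("kitten", "sitting")) // prints "57% similar"
}
```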
I do not know of any free Windows CIs. So this is a very open question for me.
Is it possible to expose the Patch struct fields? It would be useful to have the start1 and start2 fields to feed to Match functions.
Rework the repository code without breaking the API
https://github.com/akovaski/go-diff/commit/a0c8f96cfc5c7ee619b3d18e7fbca09d21ce3000 had the idea of removing "concat", which reduces the code, but keeping the functionality might be faster.
So, benchmark it. If the current concat is faster than just appending runes, document it. If not, remove it.
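A benchmark along these lines could settle the question. The sketch does not use go-diff's actual `concat`; `concatCopy` is a hypothetical stand-in that preallocates the exact result, compared against plain `append`. It calls `testing.Benchmark`, which runs a benchmark outside `go test`; in the repository, the same bodies would normally live in `BenchmarkXxx` functions in a `_test.go` file.

```go
package main

import (
	"fmt"
	"testing"
)

// concatCopy allocates the result once with its exact final length.
func concatCopy(a, b []rune) []rune {
	out := make([]rune, len(a)+len(b))
	copy(out, a)
	copy(out[len(a):], b)
	return out
}

// concatAppend uses the built-in append. The full slice expression
// a[:len(a):len(a)] caps the capacity, so append must copy and never
// writes into the caller's backing array.
func concatAppend(a, b []rune) []rune {
	return append(a[:len(a):len(a)], b...)
}

func main() {
	a := []rune("the quick brown fox ")
	b := []rune("jumps over the lazy dog")
	cases := []struct {
		name string
		f    func(a, b []rune) []rune
	}{
		{"copy", concatCopy},
		{"append", concatAppend},
	}
	for _, c := range cases {
		res := testing.Benchmark(func(tb *testing.B) {
			tb.ReportAllocs()
			for i := 0; i < tb.N; i++ {
				_ = c.f(a, b)
			}
		})
		fmt.Printf("%-6s %s %s\n", c.name, res.String(), res.MemString())
	}
}
```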
Here is the code:
package main
import (
	// Older import path; the package itself is named diffmatchpatch.
	"github.com/sergi/go-diff/diff"
	"log"
)
func main() {
sOld := "1\n2\n3\n4\n5\n6\n7\n3\n8\n9\n3\n10\n3\n11\n3\n12\n13\n14\n15\n12\n13\n16\n13\n13\n17\n18\n19\n20\n21\n22\n23\n24\n25\n26\n27\n28\n29\n30\n31\n32\n33\n34\n35\n12\n36\n37\n38\n39\n40\n41\n42\n13\n43\n44\n13\n45\n46\n47\n13\n13\n48\n49\n50\n51\n52\n13\n53\n54\n55\n56\n57\n58\n59\n60\n61\n62\n63\n64\n65\n66\n67\n68\n69\n13\n70\n71\n72\n73\n74\n13\n75\n13\n76\n77\n78\n79\n80\n81\n82\n83\n84\n85\n86\n87\n88\n89\n90\n67\n91\n92\n93\n81\n68\n13\n94\n71\n95\n96\n97\n98\n99\n100\n101\n102\n63\n103\n67\n104\n105\n13\n106\n107\n108\n109\n110\n111\n112\n113\n114\n115\n90\n116\n67\n13\n117\n72\n73\n74\n13\n75\n13\n76\n118\n119\n120\n78\n68\n121\n13\n122\n123\n124\n125\n93\n126\n68\n127\n13\n128\n129\n130\n131\n132\n133\n134\n135\n13\n136\n137\n138\n13\n78\n68\n13\n139\n140\n141\n142\n68\n13\n143\n144\n145\n146\n13\n147\n148\n13\n149\n150\n151\n152\n153\n150\n154\n13\n155\n156\n"
sNew := "1\n2\n3\n4\n5\n6\n7\n3\n157\n9\n3\n10\n3\n11\n3\n12\n13\n14\n15\n12\n13\n16\n13\n13\n17\n18\n19\n20\n21\n22\n23\n24\n25\n26\n27\n28\n29\n30\n31\n32\n33\n34\n35\n12\n36\n37\n38\n39\n40\n41\n42\n13\n158\n159\n13\n45\n46\n47\n13\n13\n48\n49\n50\n51\n13\n53\n54\n55\n56\n57\n160\n59\n60\n61\n62\n63\n64\n161\n66\n67\n68\n69\n13\n70\n71\n72\n73\n74\n13\n75\n13\n162\n77\n78\n79\n80\n81\n82\n83\n84\n85\n86\n88\n89\n90\n67\n91\n92\n93\n81\n68\n13\n94\n71\n95\n96\n97\n98\n99\n100\n101\n102\n63\n103\n67\n104\n105\n13\n106\n107\n108\n109\n110\n111\n112\n113\n114\n115\n90\n116\n67\n13\n117\n72\n73\n74\n13\n75\n13\n163\n119\n120\n78\n68\n121\n13\n122\n123\n124\n125\n93\n126\n68\n127\n13\n128\n164\n130\n131\n132\n133\n134\n135\n13\n136\n137\n138\n13\n78\n68\n13\n139\n140\n165\n68\n13\n143\n144\n145\n146\n13\n147\n148\n13\n149\n150\n151\n166\n153\n150\n154\n13\n155\n156\n"
	dmp := diffmatchpatch.New()
	t1, t2, t := dmp.DiffLinesToChars(sOld, sNew)
	diffs := dmp.DiffMain(t1, t2, false)
	diffs = dmp.DiffCharsToLines(diffs, t)
	for _, diff := range diffs {
		log.Println(diff.Type, diff.Text)
	}
}
DiffLinesToChars seems to work, but the program panics with "index out of range" at runtime.
I want to diff the two texts line by line: only at the line level, not at the word or character level.
Is this a bug, or am I using the library incorrectly?
Thanks
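The usual pattern for a purely line-level diff is the one in the snippet above: DiffLinesToChars, then DiffMain, then DiffCharsToLines. As in the original diff-match-patch, the idea is to map every distinct line to a single character so that a character diff can only report whole lines. The stdlib sketch below illustrates just that encoding step; `linesToRunes` is hypothetical and is not go-diff's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// linesToRunes is a hypothetical, simplified version of the line-mode
// encoding: each distinct line (with its trailing "\n") is assigned one
// rune, so a character diff over the runes behaves like a line diff.
// Index 0 is reserved so no line is encoded as the NUL character.
// Note: rune(idx) only round-trips through a Go string while idx stays
// below 0xD800 (the start of the UTF-16 surrogate range).
func linesToRunes(text1, text2 string) ([]rune, []rune, []string) {
	lineArray := []string{""}
	lineHash := map[string]int{}
	encode := func(text string) []rune {
		var out []rune
		for len(text) > 0 {
			end := strings.IndexByte(text, '\n')
			if end < 0 {
				end = len(text) - 1 // last line without a newline
			}
			line := text[:end+1]
			text = text[end+1:]
			idx, ok := lineHash[line]
			if !ok {
				lineArray = append(lineArray, line)
				idx = len(lineArray) - 1
				lineHash[line] = idx
			}
			out = append(out, rune(idx))
		}
		return out
	}
	r1 := encode(text1)
	r2 := encode(text2)
	return r1, r2, lineArray
}

func main() {
	r1, r2, lines := linesToRunes("a\nb\nc\n", "a\nc\nd\n")
	fmt.Println(r1, r2, len(lines)) // [1 2 3] [1 3 4] 5
}
```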
I've been following the contributors' guide and couldn't get past the make lint command, as it gave an error on unmodified source code. One of the issues is in how the output of tools such as golint is tested: it is expected to be empty. However, echo -n "$OUT" printed -n instead of nothing, which was then treated as an error condition.
I've tested a substitute command, echo "$OUT\c", which does the same thing and works on both macOS and Linux. I will send a PR.
package main
import (
	"fmt"
	"github.com/sergi/go-diff/diffmatchpatch"
)
const (
	text1 = "package casec"
	text2 = "PackageCasec"
)
func main() {
	dmp := diffmatchpatch.New()
	diffs := dmp.DiffMain(text1, text2, false)
	fmt.Println(dmp.DiffPrettyText(diffs))
}
I expected the diff output for the code above to be either pPackagecCasec or pPackage[x]cCasec, but it printed pPackage cCasec instead ([x] indicates a space character with a red background).
This made it hard to tell whether the space had been deleted or not. I think the output should also show the status of added or deleted spaces, which could be achieved by rendering them as a space with a red or green background.
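The sketch below illustrates the background-color idea from the report, so that a deleted or inserted space still shows up as a colored cell. `Operation`, `Diff` and `prettyVisible` are local stand-ins shaped like go-diff's types (assumption: -1 delete, 0 equal, 1 insert, matching the printed diffs on this page), and the example diffs are written by hand rather than produced by DiffMain.

```go
package main

import (
	"fmt"
	"strings"
)

// Operation and Diff are local stand-ins for go-diff's types.
type Operation int8

const (
	DiffDelete Operation = -1
	DiffEqual  Operation = 0
	DiffInsert Operation = 1
)

type Diff struct {
	Type Operation
	Text string
}

// ANSI escape sequences: red background, green background, reset.
const (
	bgRed   = "\x1b[41m"
	bgGreen = "\x1b[42m"
	reset   = "\x1b[0m"
)

// prettyVisible renders diffs for a terminal using background colors,
// so inserted or deleted whitespace remains visible.
func prettyVisible(diffs []Diff) string {
	var sb strings.Builder
	for _, d := range diffs {
		switch d.Type {
		case DiffDelete:
			sb.WriteString(bgRed + d.Text + reset)
		case DiffInsert:
			sb.WriteString(bgGreen + d.Text + reset)
		default:
			sb.WriteString(d.Text)
		}
	}
	return sb.String()
}

func main() {
	// Hand-written diffs for "package casec" -> "PackageCasec".
	diffs := []Diff{
		{DiffDelete, "p"}, {DiffInsert, "P"}, {DiffEqual, "ackage"},
		{DiffDelete, " c"}, {DiffInsert, "C"}, {DiffEqual, "asec"},
	}
	fmt.Println(prettyVisible(diffs))
}
```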
I've encountered this issue while using https://github.com/src-d/go-git, but the bug is easily reproducible with the code snippet below and the attached JSON file.
package main
import (
	"fmt"
	"io/ioutil"
	"os"
	"github.com/sergi/go-diff/diffmatchpatch"
)
func main() {
	f, err := os.Open("data.txt")
	checkErr(err)
	// Defer Close only after the error check, so f is known to be valid.
	defer f.Close()
	data, err := ioutil.ReadAll(f)
	checkErr(err)
	// from https://github.com/src-d/go-git/blob/v4.0.0/utils/diff/diff.go#L17
	dmp := diffmatchpatch.New()
	wSrc, wDst, warray := dmp.DiffLinesToChars(string(data), "")
	diffs := dmp.DiffMain(wSrc, wDst, false)
	diffs = dmp.DiffCharsToLines(diffs, warray)
	fmt.Println(diffs)
}
func checkErr(err error) {
	if err != nil {
		panic(err)
	}
}
Output:
$ go run main.go
panic: runtime error: index out of range
goroutine 1 [running]:
github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).DiffCharsToLines(0xc420044ee8, 0xc420078390, 0x1, 0x2, 0xc4202ce000, 0xd802, 0xec00, 0x1, 0x2, 0xc4202ce000)
/Users/krylovsk/src/github.com/sergi/go-diff/diffmatchpatch/diff.go:414 +0x394
main.main()
/tmp/go-diff-debug/main.go:24 +0x29e
exit status 2
I'm the maintainer of DMP and just stumbled across this project. After hosting us for ten years, the original repo at Google Code has shut down and the project has moved to https://github.com/google/diff-match-patch. You probably want to update the corresponding links on your project.
However, a bigger question is whether go-diff should be a separate project, or whether it should be incorporated into the main DMP project. It would be good to keep all versions in lock-step so that when bugs are found in one they are fixed across the board. What are your feelings regarding this?
Which is described in #21 (comment)
Go 1.15 rc 1 on Fedora Rawhide
Testing in: /builddir/build/BUILD/go-diff-1.1.0/_build/src
PATH: /builddir/build/BUILD/go-diff-1.1.0/_build/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/sbin
GOPATH: /builddir/build/BUILD/go-diff-1.1.0/_build:/usr/share/gocode
GO111MODULE: off
command: go test -buildmode pie -compiler gc -ldflags "-X github.com/sergi/go-diff/version=1.1.0 -extldflags '-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld '"
testing: github.com/sergi/go-diff
github.com/sergi/go-diff/diffmatchpatch
# github.com/sergi/go-diff/diffmatchpatch
./patch.go:327:18: conversion from int to string yields a string of one rune, not a string of digits (did you mean fmt.Sprint(x)?)
FAIL github.com/sergi/go-diff/diffmatchpatch [build failed]
See golang/go#32479
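Since Go 1.15, `go vet` (which `go test` runs automatically) reports `string(x)` for an integer `x`, because the conversion yields the character with that code point rather than its decimal digits. Which of the two the code at patch.go:327 intends depends on its context; the sketch below only contrasts the two explicit spellings (`runeString` and `digitString` are illustrative names).

```go
package main

import (
	"fmt"
	"strconv"
)

// runeString makes the old string(n) behavior explicit: one character
// whose code point is n.
func runeString(n int) string { return string(rune(n)) }

// digitString returns the decimal representation of n.
func digitString(n int) string { return strconv.Itoa(n) }

func main() {
	fmt.Println(runeString(65), digitString(65)) // A 65
}
```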
Hi,
I am not using this library directly, but through another package (go-git). I have found that in certain cases there is a panic on this line:
go-diff/diffmatchpatch/diff.go, line 452 (at commit da64554)
At the time of the panic, the values are:
one iteration earlier: i=179592 r=55295 len(chars)=190439 len(lineArray)=58902
next iteration: i=179595 r=65533 len(chars)=190439 len(lineArray)=58902
so lineArray[r] is clearly out of range. I still have to read the code to understand why this happens, but any ideas would be appreciated. Interestingly, in both cases the diff being processed when the panic occurred contained Unicode.
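The reported numbers are consistent with a surrogate problem, though this is a hypothesis, not a confirmed diagnosis: 55295 is 0xD7FF, the last code point before the UTF-16 surrogate block (0xD800 to 0xDFFF). Surrogates cannot be encoded in UTF-8, so converting such a rune to a Go string yields U+FFFD (65533) instead. If line indices are stored as runes inside a string, any index in that block would come back as 65533, and with only 58902 entries, lineArray[65533] is out of range. The stdlib sketch below demonstrates the round-trip failure and one possible encoding that skips the surrogate block (`indexToRune` and `runeToIndex` are hypothetical).

```go
package main

import "fmt"

const (
	surrogateMin = 0xD800
	surrogateMax = 0xDFFF
	surrogateLen = surrogateMax - surrogateMin + 1
)

// roundTrips reports whether r survives rune -> string -> rune.
func roundTrips(r rune) bool {
	back := []rune(string(r))
	return len(back) == 1 && back[0] == r
}

// indexToRune maps a line index to a rune, jumping over the surrogate
// block so that every result is encodable in UTF-8 (valid for indices
// up to 0x10FFFF - surrogateLen).
func indexToRune(idx int) rune {
	if idx >= surrogateMin {
		idx += surrogateLen
	}
	return rune(idx)
}

// runeToIndex is the inverse of indexToRune.
func runeToIndex(r rune) int {
	if r > surrogateMax {
		return int(r) - surrogateLen
	}
	return int(r)
}

func main() {
	fmt.Println(roundTrips(55295))              // true: 0xD7FF is a valid code point
	fmt.Println(roundTrips(55296))              // false: 0xD800 is a surrogate
	fmt.Println([]rune(string(rune(55296)))[0]) // 65533
	fmt.Println(roundTrips(indexToRune(55296))) // true
}
```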
There are currently many open PRs with little or no discussion from @sergi.
Do we need to fork this to get all the excellent updates merged?
Can you add some examples? What is the input, what is the output, and so on? I am trying to parse diff --git output, but I am getting some "unexpected chunk line:" errors.
When a patch contains special characters such as %, > and <, PatchApply() returns an inconsistent patch status.
The current type of Text is string:
type Diff struct {
	Type Operation
	Text string
}
When I call DiffMainRunes(), though, it looks like the Text in each diff needs to be interpreted as a []rune. Is there a reason not to define it as []rune (or to use multiple types if it's used differently in other APIs)? It was hard to interpret, since Text is not printable in this case, and the documentation doesn't mention this.
Here's my code:
package main
import (
	"fmt"
	"github.com/sergi/go-diff/diffmatchpatch"
)
func main() {
	a := "foo\nbar\nbaz"
	b := "foo\nbaz\nfooz\nbarrington"
	dmp := diffmatchpatch.New()
	r1, r2, f := dmp.DiffLinesToRunes(a, b)
	fmt.Println(f)
	fmt.Println(r1)
	fmt.Println(r2)
	s := dmp.DiffMainRunes(r1, r2, false)
	for _, d := range s {
		fmt.Println("d.Type:", d.Type)
		// Printing d.Text here without converting it produces an empty string
		fmt.Println("d.Text:", []rune(d.Text))
	}
}
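In the snippet above, each rune of `d.Text` is an index into `f`, the line array returned by DiffLinesToRunes, which is why the text looks empty when printed (runes such as 1, 2 and 3 are control characters). go-diff's DiffCharsToLines maps such diffs back to lines, as in the other line-mode snippets on this page; the stdlib sketch below shows what that mapping amounts to. `decodeLines` is a hypothetical helper with an explicit bounds check.

```go
package main

import (
	"fmt"
	"strings"
)

// decodeLines treats every rune of an encoded diff text as an index into
// lineArray and concatenates the corresponding lines.
func decodeLines(encoded string, lineArray []string) (string, error) {
	var sb strings.Builder
	for _, r := range encoded {
		if r < 0 || int(r) >= len(lineArray) {
			return "", fmt.Errorf("rune %d is not a valid line index", r)
		}
		sb.WriteString(lineArray[r])
	}
	return sb.String(), nil
}

func main() {
	lineArray := []string{"", "foo\n", "bar\n", "baz"}
	text, err := decodeLines(string([]rune{1, 3}), lineArray)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", text) // "foo\nbaz"
}
```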
dbcb93d started to refactor the code even more. There is a lot to do, but I do not have the open-source hours for it right now, so this needs to be done in small iterations. This issue keeps track of what is already done.
Purpose:
Unfinished functions:
With some texts, DiffText1 returns a wrong result. An example of such data is dmp.go at revisions b94bf7 and b94bf7^. To see this, get text1.go (which just prints DiffText1) from https://gist.github.com/tkf/12bde871bf794e59bea88b659ed5b95b and run it as:
cd PATH/TO/go-diff/diffmatchpatch
diff <(go run PATH/TO/text1.go <(git show 'b94bf7:./dmp.go') <(git show 'b94bf7^:./dmp.go')) <(git show 'b94bf7:./dmp.go')
which prints
471c471
<
---
> break
658c658
<
---
>
i.e., dmp.DiffText1(diffs) != text1.
The gist above also includes text1.bash, which automates finding such examples. For example, I found many such examples by running:
git clone https://go.googlesource.com/go
cd go/src
PATH/TO/text1.bash **/*.go > examples
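For any correct diff, concatenating the equal and deleted segments must reproduce the first text, and concatenating the equal and inserted segments must reproduce the second; a DiffText1 that violates this points to a bug either in the diff or in DiffText1 itself. The stdlib sketch below states that invariant; `sourceText`, `destText` and the local types are hypothetical stand-ins, not go-diff's code. It also makes explicit that a delete refers to content present only in the first text.

```go
package main

import (
	"fmt"
	"strings"
)

// Local stand-ins shaped like go-diff's types (assumption: the numeric
// values match the {-1 ...}/{0 ...}/{1 ...} outputs shown on this page).
type Operation int8

const (
	DiffDelete Operation = -1
	DiffEqual  Operation = 0
	DiffInsert Operation = 1
)

type Diff struct {
	Type Operation
	Text string
}

// sourceText rebuilds the first (old) text: equalities plus deletions.
func sourceText(diffs []Diff) string {
	var sb strings.Builder
	for _, d := range diffs {
		if d.Type != DiffInsert {
			sb.WriteString(d.Text)
		}
	}
	return sb.String()
}

// destText rebuilds the second (new) text: equalities plus insertions.
func destText(diffs []Diff) string {
	var sb strings.Builder
	for _, d := range diffs {
		if d.Type != DiffDelete {
			sb.WriteString(d.Text)
		}
	}
	return sb.String()
}

func main() {
	diffs := []Diff{{DiffDelete, "hell"}, {DiffInsert, "g"}, {DiffEqual, "o"}, {DiffInsert, "odbye"}}
	fmt.Println(sourceText(diffs), destText(diffs)) // hello goodbye
}
```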
It is confusing and non-idiomatic to have the package name different from the last part of the import path. People will import "github.com/[whoever]/go-diff/diff" and will then try to use the package as "diff", but the name is secretly "diffmatchpatch". And Go package names are usually shorter.
Hitting a slice bounds out of range panic
panic: runtime error: slice bounds out of range
goroutine 47882 [running]:
panic(0xea1500, 0xc820034020)
/usr/local/go1.6.3.src/src/runtime/panic.go:481 +0x3e6
..gojsondiff/vendor/github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).patchMake2(0xc8215e5c30, 0xc820f708c0, 0x1e, 0xc821594240, 0xa, 0x18, 0x0, 0x0, 0x0)
..gojsondiff/vendor/github.com/sergi/go-diff/diffmatchpatch/dmp.go:1810 +0x647
github.com/apcera/cntm-deps/gojsondiff/vendor/github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).PatchMake(0xc8215e5c30, 0xc8215e5bb8, 0x2, 0x2, 0x0, 0x0, 0x0)
..gojsondiff/vendor/github.com/sergi/go-diff/diffmatchpatch/dmp.go:1768 +0x486
.
.
I was able to reproduce this by feeding (left=2016-09-01T03:07:14.807830741Z, right=2016-09-01T03:07:15.154800781Z) to PatchMake.
The following diff is applied to the left string: [{0 2016-09-01T03:07:1} {1 5.15} *{0 .154} {-1 .} {0 80} {1 0} {0 78} {-1 3074} {0 1Z}]
whereas the diff applied should have been [{0 2016-09-01T03:07:1} {1 5.15} *{0 4} {-1 .} {0 80} {1 0} {0 78} {-1 3074} {0 1Z}]
Thank you for your time.
Consider the program below. It runs slowly, partly because of encoding and decoding between strings and runes: roughly 1860ms of the 4050ms is spent doing this:
(pprof) top
4050ms of 9270ms total (43.69%)
Dropped 87 nodes (cum <= 46.35ms)
Showing top 10 nodes out of 138 (cum >= 580ms)
flat flat% sum% cum cum%
630ms 6.80% 6.80% 630ms 6.80% runtime.encoderune
600ms 6.47% 13.27% 2250ms 24.27% github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).diffBisect
560ms 6.04% 19.31% 1230ms 13.27% runtime.slicerunetostring
400ms 4.31% 23.62% 500ms 5.39% runtime.semrelease
360ms 3.88% 27.51% 1120ms 12.08% runtime.pcvalue
310ms 3.34% 30.85% 310ms 3.34% runtime.readvarint
310ms 3.34% 34.20% 620ms 6.69% runtime.step
300ms 3.24% 37.43% 300ms 3.24% github.com/sergi/go-diff/diffmatchpatch.runesEqual
300ms 3.24% 40.67% 410ms 4.42% runtime.lock
280ms 3.02% 43.69% 580ms 6.26% github.com/sergi/go-diff/diffmatchpatch.runesIndex
It should be possible to work on one representation during the algorithm to avoid this overhead.
(Note: This has non-deterministic behavior that was reported in #75.)
package main
import (
	"flag"
	"fmt"
	"log"
	"os"
	"runtime/pprof"
	"sync"
	"sync/atomic"
	"github.com/sergi/go-diff/diffmatchpatch"
)
var (
	cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
	dmp        = diffmatchpatch.New()
)
const (
	num = 50000
expect = "[{1 licensed } {0 under the apache license, version 2.0 (the} {-1 #} {0 'license'); you may not use this file except in compliance } {-1 # } {0 with the license. you may obtain a copy of the license at } {-1 # # } {0 http://www.apache.org/licenses/license-2.0 } {-1 # # } {0 unless required by applicable law or agreed to in writing, } {-1 # } {0 software distributed under the license is distributed on an} {-1 #} {0 'as is'basis, without warranties or conditions of any} {-1 #} {0 kind, either express or implied. see the license for the } {-1 # } {0 specific language governing permissions and limitations} {-1 #} {0 under the license.}]"
unknown = "under the apache license, version 2.0 (the #'license'); you may not use this file except in compliance # with the license. you may obtain a copy of the license at # # http://www.apache.org/licenses/license-2.0 # # unless required by applicable law or agreed to in writing, # software distributed under the license is distributed on an #'as is'basis, without warranties or conditions of any # kind, either express or implied. see the license for the # specific language governing permissions and limitations # under the license."
known = "licensed under the apache license, version 2.0 (the'license'); you may not use this file except in compliance with the license. you may obtain a copy of the license at http://www.apache.org/licenses/license-2.0 unless required by applicable law or agreed to in writing, software distributed under the license is distributed on an'as is'basis, without warranties or conditions of any kind, either express or implied. see the license for the specific language governing permissions and limitations under the license."
)
func main() {
	flag.Parse()
	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal(err)
		}
		// StartCPUProfile fails if profiling is already enabled.
		if err := pprof.StartCPUProfile(f); err != nil {
			log.Fatal(err)
		}
		defer pprof.StopCPUProfile()
	}
	var matched, missed int32
	var wg sync.WaitGroup
	wg.Add(num)
	for i := 0; i < num; i++ {
		go func(i int) {
			defer wg.Done()
			diffs := dmp.DiffMain(unknown, known, false)
			s := fmt.Sprintf("%v", diffs)
			if s != expect {
				//fmt.Fprintf(os.Stderr, "MISMATCH(%d):\n%s\n", i, s)
				atomic.AddInt32(&missed, 1)
			} else {
				atomic.AddInt32(&matched, 1)
			}
		}(i)
	}
	wg.Wait()
	fmt.Fprintf(os.Stderr, "NUMBER MATCHING: %d\n", matched)
	fmt.Fprintf(os.Stderr, "NUMBER MISMATCHING: %d\n", missed)
}
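A sketch of the "one representation" idea: decode each input to `[]rune` once at the entry point and let every helper take rune slices, since slicing a `[]rune` involves no UTF-8 work while every `string(runes)` conversion has to encode. `commonPrefixLen` below is a hypothetical helper written in that style, not go-diff's code.

```go
package main

import "fmt"

// commonPrefixLen returns the number of leading runes shared by a and b.
// It works purely on []rune, so no UTF-8 encoding or decoding happens here.
func commonPrefixLen(a, b []rune) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	return n
}

func main() {
	// Decode each input once at the entry point...
	a, b := []rune("héllo wörld"), []rune("héllo world")
	// ...then trim with cheap slice operations instead of string round trips.
	k := commonPrefixLen(a, b)
	fmt.Println(k, string(a[k:]), string(b[k:])) // 7 örld orld
}
```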
Some lines still have no test cases (see https://coveralls.io/github/sergi/go-diff). It would be nice to bring coverage up to a real 100% and then keep it there by making any decrease in coverage fail the PR.
When diffing JSON, we have some JSON like:
{
  "del": "^2.2.0",
  "es6-symbol": "^3.1.1",
  "eslint": "^4.11.0",
  "eslint-config-enough": "0.2.5",
}
If we compare it to
{
  "del": "^2.2.0",
  "eslint": "^4.11.0",
  "eslint-config-enough": "0.2.5",
}
We end up with:
It would be nice to be able to prefer full-line or full word changes instead of single-character changes.
If you change the word "hello" to "goodbye", you see the diff as:
[{Delete hell} {Insert g} {Equal o} {Insert odbye}]
Which is much harder for a human to read than if the lib had the ability to prefer word/line boundaries and showed:
[{Delete hello} {Insert goodbye}]
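One approach, similar to the word-mode variant suggested for the original diff-match-patch, is to map each distinct word (and each run of whitespace) to a single rune, diff the rune slices with DiffMainRunes, and map the result back, so that a change can never split a word. The stdlib sketch below covers only the tokenizing and encoding step; `tokenize` and `wordsToRunes` are hypothetical helpers. For this particular example, DiffCleanupSemantic (used in another snippet on this page) may also be enough to merge the fragments.

```go
package main

import (
	"fmt"
	"unicode"
)

// tokenize splits s into alternating runs of non-space and space runes,
// so that concatenating the tokens reproduces s exactly.
func tokenize(s string) []string {
	var out []string
	start := 0
	prevSpace := false
	for i, r := range s {
		space := unicode.IsSpace(r)
		if i > 0 && space != prevSpace {
			out = append(out, s[start:i])
			start = i
		}
		prevSpace = space
	}
	if start < len(s) {
		out = append(out, s[start:])
	}
	return out
}

// wordsToRunes assigns one rune per distinct token (index 0 reserved).
func wordsToRunes(text1, text2 string) ([]rune, []rune, []string) {
	tokens := []string{""}
	index := map[string]int{}
	encode := func(text string) []rune {
		var out []rune
		for _, t := range tokenize(text) {
			idx, ok := index[t]
			if !ok {
				tokens = append(tokens, t)
				idx = len(tokens) - 1
				index[t] = idx
			}
			out = append(out, rune(idx))
		}
		return out
	}
	r1 := encode(text1)
	r2 := encode(text2)
	return r1, r2, tokens
}

func main() {
	r1, r2, tokens := wordsToRunes("hello world", "goodbye world")
	fmt.Println(r1, r2) // [1 2 3] [4 2 3]
	fmt.Printf("%q\n", tokens[1:])
}
```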
Implement reading and writing http://www.gnu.org/software/diffutils/manual/html_node/Context-Format.html
I think the patch generated by the following code is wrong:
package main
import (
	"io"
	"os"
	"github.com/sergi/go-diff/diffmatchpatch"
)
const (
	text1 = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus ut risus et enim consectetur convallis a non ipsum. Sed nec nibh cursus, interdum libero vel."
	text2 = "Lorem a ipsum dolor sit amet, consectetur adipiscing elit. Vivamus ut risus et enim consectetur convallis a non ipsum. Sed nec nibh cursus, interdum liberovel."
)
func main() {
	dmp := diffmatchpatch.New()
	dmp.DiffTimeout = 0
	// dmp.DiffTimeout = time.Hour // it works
	diffs := dmp.DiffMain(text1, text2, true)
	diffs = dmp.DiffCleanupSemantic(diffs)
	patches := dmp.PatchMake(text1, diffs)
	io.WriteString(os.Stdout, dmp.PatchToText(patches))
}
It generates:
@@ -1,17 +1,18 @@
Lorem
+a
ro
-
um dolor
However, the correct patch would be:
@@ -1,14 +1,16 @@
Lorem
+a
ipsum do
@@ -148,13 +148,12 @@
m libero
-
vel.
which is generated by setting dmp.DiffTimeout = time.Hour. Note that the C++ implementation (with timeout=0) also generates the latter patch.
The docs for DiffMain and DiffMainRunes don't explain what the checklines parameter is used for.
Digging through the source, it looks like it affects how the diff is calculated if the input is large enough, but I'm unclear on the pros and cons of setting it.
I'm happy to make a PR to update the documentation if someone can advise on its use.
The current repository title just says "Port of Google's diff-match-patch library to Go" which does not say anything about the functionality of the repository, if you do not know the original. I therefore propose changing the text to something like "Diff, match and patch text in Go".
Would it make sense to make the colors for inserting/deleting configurable?
Code used:
package main
import (
	"fmt"
	"github.com/sergi/go-diff/diffmatchpatch"
)
func main() {
	var oldtext string = `foo
bar
`
	var patchtxt string = `@@ -1,2 +1,2 @@
-foo
+foobaz
bar
`
	dmp := diffmatchpatch.New()
	patch, err := dmp.PatchFromText(patchtxt)
	if err != nil {
		panic(err)
	}
	newtext, _ := dmp.PatchApply(patch, oldtext)
	fmt.Println("new text:", newtext)
}
Output:
new text: foobazbar
Expected Output:
new text: foobaz
bar
Is there a way to apply the patch so that the newline (\n) characters are preserved?
Is this part of the GNU diff specification?
I want to get simple diffs like the one below.
@@ -24,21 +24,22 @@
ve.
-[Not] eat to eat.
+[Not] live to eat.
But instead, I get the following:
@@ -24,21 +24,22 @@
ve.%0A
-%5BNot%5D eat to eat.
+%5BNot%5D live to eat.
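The percent-encoding is not GNU diff behavior: the patch text format of diff-match-patch (written by PatchToText, read back by PatchFromText) encodes special characters, which is why [ and ] appear as %5B and %5D and a newline inside a hunk appears as %0A. The hunk offsets in that format also count characters rather than lines. If you only need human-readable output, each body line can be decoded separately, as in the stdlib sketch below (`decodePatchLine` is a hypothetical helper).

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// decodePatchLine splits one body line of patch text into its operation
// character (' ', '-' or '+') and its percent-decoded text.
// url.PathUnescape is used instead of url.QueryUnescape because the
// latter would also turn a literal '+' in the text into a space.
func decodePatchLine(line string) (byte, string, error) {
	if line == "" {
		return 0, "", errors.New("empty patch line")
	}
	text, err := url.PathUnescape(line[1:])
	if err != nil {
		return 0, "", err
	}
	return line[0], text, nil
}

func main() {
	for _, line := range []string{" ve.%0A", "-%5BNot%5D eat to eat.", "+%5BNot%5D live to eat."} {
		op, text, err := decodePatchLine(line)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%c %q\n", op, text)
	}
}
```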
There is a TODO marker in the code for a binary search version of commonSuffixLength.
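The original diff-match-patch (JavaScript) finds the common prefix and suffix by binary search, comparing whole substrings at each step, which pays off where substring comparison is a fast built-in. A Go port over `[]rune` might look like the hypothetical `commonSuffixBinary` below. Since each comparison still walks the runes, whether it beats a plain backwards loop in Go is an empirical question best settled with a benchmark.

```go
package main

import "fmt"

// runesEqual reports whether two rune slices hold the same runes.
func runesEqual(a, b []rune) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// commonSuffixLinear scans backwards one rune at a time.
func commonSuffixLinear(a, b []rune) int {
	n := 0
	for n < len(a) && n < len(b) && a[len(a)-1-n] == b[len(b)-1-n] {
		n++
	}
	return n
}

// commonSuffixBinary narrows the suffix length by binary search. lo is a
// length known to match; only the block between the last confirmed
// length (start) and the candidate length (mid) is compared each step.
func commonSuffixBinary(a, b []rune) int {
	lo, hi := 0, len(a)
	if len(b) < hi {
		hi = len(b)
	}
	mid, start := hi, 0
	for lo < mid {
		if runesEqual(a[len(a)-mid:len(a)-start], b[len(b)-mid:len(b)-start]) {
			lo = mid
			start = lo
		} else {
			hi = mid
		}
		mid = (hi-lo)/2 + lo
	}
	return mid
}

func main() {
	a, b := []rune("abcxyz"), []rune("123xyz")
	fmt.Println(commonSuffixBinary(a, b), commonSuffixLinear(a, b)) // 3 3
}
```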
Hi, would it be possible to get a new tagged release? There are quite a lot of changes in master which aren't in a release:
Thank you for making this library!
Checks like the following must be refactored:
_, err = dmp.DiffFromDelta("", "+%c3%xy")
if err == nil {
	assert.Fail(t, "expected Invalid URL escape.")
}
The problem is that we only check that the error is not nil; we do not check the error's class. This means that any error lets the test pass, even when it is the wrong error.
One way to test the error class is to check with assert.Contains whether part of the expected message appears in the error string. Another is to add dedicated error types, and a third is to use one of the many error-wrapping packages.
I know that TravisCI has MacOSX support, but I did not look into that yet.
@sergi I am wondering if this project is dead?
The GitHub URL for testify has changed:
-github.com/stretchrcom/testify/assert
+github.com/stretchr/testify/assert
Could you update diffmatchpatch/dmp_test.go accordingly?
https://github.com/sergi/go-diff/blob/master/diffmatchpatch/diff.go#L49
// DiffMain finds the differences between two texts.
// If an invalid UTF-8 sequence is encountered, it will be replaced by the Unicode replacement character.
func (dmp *DiffMatchPatch) DiffMain(text1, text2 string, checklines bool) []Diff {
https://github.com/sergi/go-diff/blob/master/diffmatchpatch/diff.go#L30
// DiffDelete item represents a delete diff.
DiffDelete Operation = -1
Does DiffDelete mean that text1 does not have this content and text2 does, or that text2 does not have this content and text1 does?
By experimenting, a user can find out that text1 is the old version and text2 is the newer version, so DiffDelete means that the content is in text1 but not in text2.
But that information is not in the documentation.
Most people do not need their own instance of DiffMatchPatch since they do not change the default values. So let's make all the exported methods available by adding a default DiffMatchPatch instance and exporting functions using that instance.
@sergi What do you think?
Implement reading and writing http://www.gnu.org/software/diffutils/manual/html_node/Unified-Format.html
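The hunk header is the least obvious part of the unified format. Per the GNU diffutils manual, each side is written as `start,count`, except that a range of exactly one line is written as just its line number. A hypothetical formatter (not go-diff API) could look like this; the caller supplies 1-based line numbers, and for an empty range GNU diff uses the line before the hunk, e.g. `-0,0` for an empty old file.

```go
package main

import (
	"fmt"
	"strconv"
)

// hunkRange formats one side of a unified-diff hunk header.
func hunkRange(start, count int) string {
	if count == 1 {
		return strconv.Itoa(start)
	}
	return strconv.Itoa(start) + "," + strconv.Itoa(count)
}

// hunkHeader builds the "@@ -a,b +c,d @@" line for one hunk.
func hunkHeader(oldStart, oldCount, newStart, newCount int) string {
	return "@@ -" + hunkRange(oldStart, oldCount) + " +" + hunkRange(newStart, newCount) + " @@"
}

func main() {
	fmt.Println(hunkHeader(24, 3, 24, 4)) // @@ -24,3 +24,4 @@
	fmt.Println(hunkHeader(0, 0, 1, 1))   // @@ -0,0 +1 @@
}
```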
OS X switched from GNU grep to BSD grep in version 10.8, so grep -P is no longer supported.
One workaround is to install GNU grep (which supports PCRE) via Homebrew, after which -P works again:
brew install grep --with-default-names
Another solution is to replace grep -P with an equivalent perl command; details can be found here.
There are a lot of forks https://github.com/sergi/go-diff/network which could have additional interesting changes.
Look through all known forks and all their branches (not just master) for changes and try to contact the authors to incorporate them back.
Known forks:
I tried looking at the docs for this package, but the documentation of the DiffMain method simply says:
DiffMain finds the differences between two texts.
So I'm not sure how it's supposed to handle input that contains invalid UTF-8 sequences.
Here's how it handles it right now:
package main
import (
	"fmt"
	"unicode/utf8"
	"github.com/sergi/go-diff/diffmatchpatch"
)
func main() {
	var inputs = []string{
		"a1234567890z",
		"Hello 世界",
		"a\xe0\xe5\xf0\xe9\xe1\xf8\xf1\xe9\xe8\xe4Z",
	}
	for _, in := range inputs {
		fmt.Printf("input: %q\n(length %v bytes)\nutf8.Valid: %v\n", in, len(in), utf8.ValidString(in))
		dmp := diffmatchpatch.New()
		diffs := dmp.DiffMain(in, "", true)
		fmt.Printf("diff text: %q\n(length %v bytes)\n\n", diffs[0].Text, len(diffs[0].Text))
	}
}
Output:
input: "a1234567890z"
(length 12 bytes)
utf8.Valid: true
diff text: "a1234567890z"
(length 12 bytes)
input: "Hello 世界"
(length 12 bytes)
utf8.Valid: true
diff text: "Hello 世界"
(length 12 bytes)
input: "a\xe0\xe5\xf0\xe9\xe1\xf8\xf1\xe9\xe8\xe4Z"
(length 12 bytes)
utf8.Valid: false
diff text: "a����������Z"
(length 32 bytes)
When the input is not valid UTF-8, the length of the output in bytes differs from that of the input (12 bytes vs. 32 bytes).
Is that expected behavior?
If so, is there a way to use diffmatchpatch to get a byte-level diff, meaning that the output length in bytes matches the input (other than pre-processing the input to remove invalid UTF-8 sequences)?
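The 32 bytes are consistent with each of the 10 invalid bytes being replaced by U+FFFD, which takes 3 bytes in UTF-8 (10 * 3 + 2 = 32). One workaround for byte-level diffs, assuming it is acceptable to treat each byte as one symbol, is to lift every byte to a rune in the range 0 to 255, diff the rune slices with DiffMainRunes, and convert each resulting `Text` back with `[]rune(d.Text)` before lowering it to bytes. The stdlib sketch below shows only the lifting and lowering; `bytesToRunes` and `runesToBytes` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// bytesToRunes lifts every byte to the rune with the same value (0 to 255).
// All of these are valid code points, so nothing is replaced by U+FFFD.
func bytesToRunes(b []byte) []rune {
	r := make([]rune, len(b))
	for i, c := range b {
		r[i] = rune(c)
	}
	return r
}

// runesToBytes reverses bytesToRunes and rejects runes above 255.
func runesToBytes(r []rune) ([]byte, error) {
	b := make([]byte, len(r))
	for i, c := range r {
		if c < 0 || c > 0xFF {
			return nil, fmt.Errorf("rune %U at %d is not a lifted byte", c, i)
		}
		b[i] = byte(c)
	}
	return b, nil
}

func main() {
	in := []byte("a\xe0\xe5\xf0\xe9\xe1\xf8\xf1\xe9\xe8\xe4Z")
	lifted := bytesToRunes(in)
	// A diff's Text is a string, so simulate the string round trip as well.
	back, err := runesToBytes([]rune(string(lifted)))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(in), len(back), utf8.Valid(in)) // 12 12 false
}
```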
Hi there,
thank you for an awesome project.
Is there a recommended set of functions to replace the command-line tools diff and diffstat?
Something that gives output like this:
diff -u test1 test2 | diffstat -s
1 file changed, 3 insertions(+), 3 deletions(-)
When I play with dmp.DiffMain(test1, test2, true) and pass entire files as strings, I get very different output than I expect:
[{-1 very} {1 magic,} {0 } {-1 awesome} {1 in} {0 } {-1 tes} {1 fac} {0 t
with} {1 I} {0
} {-1 o} {1 be} {0 l} {-1 d and n} {1 iev} {0 e} {-1 w data} {0
}]
The test files look like this:
test1:
very awesome test
with
old and new data
test2:
magic, in fact
with I
believe
Thanks in advance!
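For diffstat-style numbers, the diff has to be computed per line rather than per character, which is why DiffMain on whole files produces fragments like the ones above. With go-diff, that means the line-mode pattern (DiffLinesToChars, DiffMain, DiffCharsToLines) followed by counting the lines in each inserted or deleted segment. The stdlib sketch below shows the counting idea independently of go-diff: every line outside a longest common subsequence is an insertion or a deletion. `diffstat` and its helpers are hypothetical; GNU diff's default heuristics can occasionally give a non-minimal result on large inputs, so counts may differ there. For the two test files above (assuming each ends with a newline) it yields 3 insertions and 3 deletions, matching the diffstat output.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLines splits text into lines, ignoring the empty element that a
// trailing newline would otherwise produce.
func splitLines(s string) []string {
	if s == "" {
		return nil
	}
	lines := strings.Split(s, "\n")
	if lines[len(lines)-1] == "" {
		lines = lines[:len(lines)-1]
	}
	return lines
}

// lcsLen returns the length of the longest common subsequence of lines,
// using two rows of the classic dynamic-programming table.
func lcsLen(a, b []string) int {
	prev := make([]int, len(b)+1)
	cur := make([]int, len(b)+1)
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			switch {
			case a[i-1] == b[j-1]:
				cur[j] = prev[j-1] + 1
			case prev[j] >= cur[j-1]:
				cur[j] = prev[j]
			default:
				cur[j] = cur[j-1]
			}
		}
		prev, cur = cur, prev
	}
	return prev[len(b)]
}

// diffstat counts changed lines: lines of the new text outside the LCS
// are insertions, lines of the old text outside it are deletions.
func diffstat(oldText, newText string) (insertions, deletions int) {
	a, b := splitLines(oldText), splitLines(newText)
	common := lcsLen(a, b)
	return len(b) - common, len(a) - common
}

func main() {
	test1 := "very awesome test\nwith\nold and new data\n"
	test2 := "magic, in fact\nwith I\nbelieve\n"
	ins, del := diffstat(test1, test2)
	fmt.Printf("1 file changed, %d insertions(+), %d deletions(-)\n", ins, del)
}
```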
See: http://semver.org/
Helps a lot when using go package management tools like https://github.com/golang/dep
Thanks