basepi / libgit2
This project forked from libgit2/libgit2
The Library
Home Page: http://libgit2.github.com
License: Other
We just need to decide what we need to do to prepare for this. I suppose we can give each person material to talk about, decide on an ordering, and wing it. I'm not too worried about the public speaking part of it, once we have all the content.
This is obviously low-priority until we have the actual code done, but thought I'd throw up an issue.
EDIT: So when are we going to do it? I suppose we'll talk more at the meeting on Saturday.
We're currently just calling free() and malloc(). This isn't extensible. We should define some sort of macro that specifies which malloc we're using, and it should probably default to libgit's malloc. I propose we call it ld_malloc().
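A minimal sketch of what such a macro could look like. The name ld_malloc follows the proposal above; ld_free is an assumed counterpart. The fallback to the C library's malloc() is only for illustration, since in-tree the default would presumably point at the library's own allocator instead.

```c
#include <stdlib.h>

/*
 * Hypothetical allocation hooks. Define ld_malloc/ld_free before
 * this point to swap in a different allocator; otherwise we fall
 * back to the C library (an assumption made for this sketch).
 */
#ifndef ld_malloc
#define ld_malloc(size) malloc(size)
#endif

#ifndef ld_free
#define ld_free(ptr) free(ptr)
#endif

/* Example caller: allocate a zeroed array of `count` ints. */
int *ld_alloc_ints(size_t count)
{
	size_t i;
	int *values = ld_malloc(count * sizeof(*values));

	if (!values)
		return NULL;
	for (i = 0; i < count; i++)
		values[i] = 0;
	return values;
}
```

Call sites then only ever mention ld_malloc() and ld_free(), so changing the allocator is a one-line change.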
Do we want patience diff in a separate file? I don't think it would hurt to do so. We could model it after JGit, where the two algorithms are in different files.
I've merged core into patience, but I think it is at a point where we can merge them into dev-diff-algo.
My branch has all of core and patience in it without merge conflicts, and compiles.
PROBLEM:
Calls to diff_no_index() fail even when provided with correct paths for files.
RESOURCES:
The code that was run is here:
int main() {
	git_diffresults_conf *conf;
	git_repository *repo;
	git_repository_open(&repo, "");
	git_index *index;
	git_commit *commit1;
	git_commit *commit2;
	//git_diff(diffdata, commit1, repo);
	printf("MAIN\n");
	printf("%d\n", git_diff_no_index(&conf, "difftest_before", "difftest_after"));
	//git_diff_cached(diffdata, commit1, index);
	//git_diff_commits(diffdata, commit1, commit2);
}
ls gives us this:
a.out difftest_after difftest_before main.c tests.c
tests.c is unrelated.
We are passing in data1 and data2, but also the env variable that already has pointers to data1/2 (though I'm not sure if those pointers point to data yet in the pipeline)
static int fill_hashmap(diff_mem_data *data1, diff_mem_data *data2,
                        git_diffresults_conf const *results_conf,
                        diff_environment *env, struct hashmap *result,
                        int line1, int count1, int line2, int count2)
{
	/*
	 * If env already has data1/2, then there is no reason to pass
	 * in two data structs
	 */
	result->file1 = data1; /* maybe? result->file1 = env->data1 */
	result->file2 = data2; /* "" */
	result->results_conf = results_conf;
	result->env = env;
Particularly, we need to:
This needs to be done before we send it out.
EDIT:
We also need to
PROBLEM:
The current implementation of src/diff.c uses a function called load_file(). This opens a file given a directory. Arguably, though, this is a function that belongs in a file whose job it is to supply OS-agnostic IO handlers. As it turns out, this instinct is correct, as there is such a file, which you will see in the "resources" section.
OBJECTIVES:
Move diff.c to the OS-agnostic IO handlers, to the exclusion of diff.c's load_file().
RESOURCES:
fileops.h and fileops.c supply the OS-agnostic IO handlers required to complete this bug report. They're pretty easy to use.
There are some places where spaces should be tabs in libdiff.c. Anytime there are 4 spaces, that can be a tab, even if those 4 spaces are for lining things up. If there are 6 spaces, four of them should be 1 tab and the other 2 just spaces. I can fix it if you want.
We must add error codes to common.h specific to diff results.
Discussion of which error codes are needed goes here.
/*
 * Could we use fewer comparisons by making this a while loop?
 * entry = map->first
 * while (entry->next) {...
 * ?
 */
for (entry = map->first; entry; entry = entry->next) {
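To make the question concrete, here is a small sketch comparing the three loop shapes. The demo_entry and demo_map types are hypothetical stand-ins for the real hashmap structures. As far as I can tell, the plain while form performs exactly the same comparisons as the for loop, while the while (entry->next) form from the comment stops one entry early and needs an extra guard for an empty map.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real hashmap entry and map types. */
struct demo_entry {
	struct demo_entry *next;
};

struct demo_map {
	struct demo_entry *first;
};

/* The existing form: one NULL test per entry, plus one to stop. */
size_t count_with_for(const struct demo_map *map)
{
	const struct demo_entry *entry;
	size_t n = 0;

	for (entry = map->first; entry; entry = entry->next)
		n++;
	return n;
}

/* Same loop written as a while: identical number of comparisons. */
size_t count_with_while(const struct demo_map *map)
{
	const struct demo_entry *entry = map->first;
	size_t n = 0;

	while (entry) {
		n++;
		entry = entry->next;
	}
	return n;
}

/* The variant from the comment: never visits the last entry. */
size_t count_with_while_next(const struct demo_map *map)
{
	const struct demo_entry *entry = map->first;
	size_t n = 0;

	if (!entry) /* without this guard, an empty map dereferences NULL */
		return 0;
	while (entry->next) {
		n++;
		entry = entry->next;
	}
	return n;
}
```

On a three-entry list the first two functions count 3 and the last counts 2, so the rewrite would change behavior rather than save comparisons.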
Project creation, backlog item creation, and task creation within each backlog item
Some of these will come from, or at least be clarified by, input from Vicent when we get that. However, due to the necessity of showing this setup to the TA tomorrow, we need to get this done tonight.
We are declaring some functions at the top of .c files, like so: https://github.com/crakdmirror/libgit2/blob/development/src/diff.c#L13
I understand why we would do this, but glancing around it doesn't look like the rest of the project is doing this. Is it ok if we remove these to keep the same feel as the rest of the project?
Right now, libdiff.c is a hodgepodge of functions that deal with records seamlessly alongside the diffing algorithm functions. This should change to provide us with better abstraction. Specifically, in this new file should go:
(record_classifier and classd_record structs)
(memstore struct)
(diff_record, etc.)
(prepare_data_ctx())
while (map->entries[index].line1) {
	/*
	 * Set other to the record corresponding to the line we are on
	 * This seems to be comparing file1 to file1 at times
	 * If we are on pass = 1, then diff_record will be equal to
	 * data_ctx1->recs[line-1], which other gets set to here
	 * TODO: see if this can be bypassed once
	 */
	other = map->env->data_ctx1.recs[map->entries[index].line1 - 1];
This is critical because this is a production library.
Seems like something for @kyeana
Recently we changed some of the places where we divide or multiply by a power of 2 to use bitshifts. My impression was that this was implemented automatically by the compiler.
We need to decide which to use.
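A small sketch that may help the decision. For unsigned operands, dividing by a power of 2 and shifting right give the same result by definition, and optimizing compilers typically emit the same instruction for both. For signed operands the two are not interchangeable: C99 division truncates toward zero, while right-shifting a negative value is implementation-defined. The function names are made up for this example.

```c
/* Division by a power of 2 on an unsigned value. */
unsigned int ld_div8(unsigned int x)
{
	return x / 8;
}

/* The hand-written shift; same result as ld_div8 for every input. */
unsigned int ld_shr3(unsigned int x)
{
	return x >> 3;
}

/*
 * Signed division truncates toward zero in C99, so -9 / 8 is -1.
 * Writing x >> 3 here would be implementation-defined for x < 0.
 */
int ld_sdiv8(int x)
{
	return x / 8;
}
```

If that holds for our compilers, the readable form (x / 8) costs nothing, and the shift only matters where we want to be explicit that a value is a bit field.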
Try to re-use as many use cases as possible from git diff unit tests. Use the libgit2 test framework.
Previously, I thought it was the case that typedef'ing structs was mainly for readability. Recent readings indicate that this is incorrect, and libgit in general seems to agree.
We need to:
We often return -1 after an error; we should return error codes instead. Per @crakdmirror's request, these error codes seem to be defined in include/common.h. Not sure if that's all of them.
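As a starting point for discussion, a sketch of what diff-specific error codes might look like. Every name and value below is hypothetical; the real codes would need to fit whatever include/common.h already defines.

```c
#include <stddef.h>

/* Hypothetical diff error codes; names and values are placeholders. */
enum ld_error {
	LD_SUCCESS = 0,
	LD_ERROR = -1,       /* generic failure, what we return today */
	LD_ENOMEM = -2,      /* allocation failed */
	LD_EINVALIDARG = -3  /* caller passed a bad argument */
};

/* Example of returning a specific code instead of a bare -1. */
int ld_validate_paths(const char *path1, const char *path2)
{
	if (!path1 || !path2)
		return LD_EINVALIDARG;
	return LD_SUCCESS;
}
```

Callers can then tell a bad argument from an out-of-memory condition instead of seeing -1 for both.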
Should we declare functions? Where? What sort (e.g., static)?
Probably 80 columns.
libgit2 uses tabs. End of story. Spaces are unacceptable. Additionally, it is not true that any instance of 4 spaces should become a tab. Consider the following:
	int some_func(int param,
	              int param2)
Note that the initial indentation for lines 1 and 2 is the same -- we use 1 tab. BUT, the second line must use spaces after this 1-tab indentation so that the params line up.
Another example is here; we indent both the first and the second line with 1 tab, and then use spaces to line up the xdl_change_compact() calls. There are more than 4 spaces.
This is C99, so either the // Comment or the /* Comment */ format is acceptable.
for(i=0; i<len; i++)
if(...)
Should be...
for (i = 0; i < len; i++)
if (...)
Notice: spaces after for, if, else, and else if, and around the following symbols:
= + - < > * / % | & ^ <= >= == != ? :
Braces are placed on the same line in control-flow statements, e.g., conditionals, loops, and switches. They are NOT placed on the same line as a function signature. See style guide for more details.
YES:
int has_cow(struct farmer *bob)
{
	if (bob->cow) {
		return 0;
	}
	else {
		return NO_COW;
	}
}
NO:
int has_cow(struct farmer *bob) {
	if(bob->cow)
	{
		return 0;
	}
	else
	{
		return NO_COW;
	}
}
We use square roots for a number of things. Our current implementation is here -- just the standard approximation method.
One of the team members suggested that we look into using Quake 3's fast inverse square root hack. The initial drawback seems to be that it's for floats, not longs. Perhaps we can adapt it to fit longs.
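For comparison, here is a sketch of a pure-integer alternative that avoids floats entirely: the classic digit-by-digit (binary) square root. This is neither our current implementation nor the Quake 3 trick, just a third option to profile; the name ld_isqrt is made up.

```c
#include <limits.h>

/*
 * Returns floor(sqrt(n)) using only shifts, adds, and compares.
 * `bit` walks down through the powers of 4 that fit in n.
 */
unsigned long ld_isqrt(unsigned long n)
{
	unsigned long result = 0;
	unsigned long bit = 1UL << (sizeof(unsigned long) * CHAR_BIT - 2);

	/* Start at the largest power of 4 that is <= n. */
	while (bit > n)
		bit >>= 2;

	while (bit != 0) {
		if (n >= result + bit) {
			n -= result + bit;
			result = (result >> 1) + bit;
		} else {
			result >>= 1;
		}
		bit >>= 2;
	}
	return result;
}
```

The loop runs at most once per pair of bits in an unsigned long, so the cost is bounded regardless of the input.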
Often we'll write something like "FIXME/TODO: change this function name to something that makes more sense than int if_i_see_one_more_uncommented_function_in_libxdiff_i_am_sending_davide_libenzi_a_bomb_in_the_mail" in the code to remind ourselves that we have some shit to do.
We should:
Should be a 10-second job, but still on the list of things to-do.
I think you got this, @crakdmirror.
A lot of preparations happen in the diff pipeline. One of these preparations is to build xdlclassifiers and xdlclasses for every record. They're niftily constructed and the code is nicely written. The only problem is that I can't find a place where they're actually used. In our code, we build it in algo_environment() and in their code, they build it in xdl_prepare_env(). In both cases, the classifier and its classes are created and then simply thrown away. What does this code do?
If we didn't have to actually build these things, we could cut out probably 1/8th of the diff running time. We need to find out:
We need to know when to use size_t versus, say, int and long. libgit2 uses size_t all over the place, so the question is likely really a question about when to use it.
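One common C convention, offered here as a sketch rather than a statement of libgit2's rule: use size_t for object sizes, buffer lengths, and indices into in-memory buffers, and keep int for error codes and small signed quantities. The function below is a made-up example of the first case.

```c
#include <stddef.h>

/*
 * Count newline characters in a buffer. Both the length and the
 * index are size_t, since they describe a position in memory and
 * can never be negative.
 */
size_t ld_count_lines(const char *buf, size_t len)
{
	size_t i;
	size_t lines = 0;

	for (i = 0; i < len; i++) {
		if (buf[i] == '\n')
			lines++;
	}
	return lines;
}
```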
NOTES:
We need to settle on, and implement, the functions that will face outwards, directly to the users. This includes:
Tentatively leaning towards Patience. We will need to look at both that and the classical diff algorithm.
This step will mostly be composed of writing the core diff function, possibly in both algorithms, and profiling each. The next step will be integrating it with the protocols (e.g., the internal diff protocol used to merge etc.).
The goals are:
We will need to make sure that we're deleting everything and not causing memory leaks. This is a production library, and that would be bad.
@trane seems good at this sort of thing.
git produces both raw diff output (as in git diff) and a raw diff internal representation used to apply merges, patches, etc. This task will be composed of:
The next step is to implement it inside and around the core diff function.
In git, if you use git diff and there is a newly added file in your file system but not in the repository, then the contents of that file are not part of the diff. If you delete a file from the file system that was in the repository, then that file is still included in the diff.
Should we follow this behavior? If so it would make the git_diff() function virtually done, minus actually doing the diffs on the files that have changed.
Everyone please go into ScrumWorks and take a look. I figure once we actually start Sprint 2, I'll move the unfinished tasks over to it from Sprint 1. ScrumWorks doesn't allow me to add pieces of backlog items; it requires the whole backlog item, so I just added all of the main merge implementation to Sprint 2, even though I doubt we'll get it done. When you look at how things have worked out, this is really only a 3-week project, not 5, since we have all the beginning and end administrative stuff. We're doing the final presentation 2 weeks from tomorrow. Crazy stuff.
PROBLEM: A call to git_diff_no_index() will load the contents of the files at params filepath1 and filepath2 into a char *buffer1 and char *buffer2. Running this function will produce the following error:
a.out(5679) malloc: *** error for object 0x100800000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
[1] 5679 abort ./a.out
What's going on here is that we're freeing here without actually having malloc'd the memory to begin with.
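One way to avoid this class of bug, sketched under assumptions: initialize every owned pointer to NULL, only ever assign it the result of an allocation, and free everything in a single cleanup path. free(NULL) is a no-op in standard C, so the cleanup is safe even when an earlier step failed. The helper names below are hypothetical and the string copy stands in for the real file loading.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `src` into a freshly malloc'd buffer. */
static char *ld_dup(const char *src)
{
	size_t len = strlen(src) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, src, len);
	return copy;
}

/*
 * Returns 0 if the two inputs are equal, 1 if they differ,
 * and -1 if an allocation failed.
 */
int ld_compare_copies(const char *a, const char *b)
{
	char *buffer1 = NULL;
	char *buffer2 = NULL;
	int result = -1;

	buffer1 = ld_dup(a);
	if (!buffer1)
		goto cleanup;
	buffer2 = ld_dup(b);
	if (!buffer2)
		goto cleanup;

	result = strcmp(buffer1, buffer2) != 0;

cleanup:
	/* Each pointer is either NULL or came from malloc(). */
	free(buffer1);
	free(buffer2);
	return result;
}
```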
NOTES:
I would've just fixed this, but I wasn't sure if that would conflict with resolution of #14.
According to the wiki:
Linux style for comments is the C89 "/* ... */" style. Don't use C99-style "// ..." comments.
Are we following the same? If so, we need to change a bunch of stuff.
PROBLEM: We're sending in a flags parameter as part of the diff query. Normally, it's the job of a set of helper functions (e.g. the hypothetical function print_std_out()) to create a git_diffresults_conf struct that configures the result of the diff -- whether we are printing it or merging it, etc. This does not exist yet, and so the flags param is not set, and thus it tends to be full of garbage.
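Until those helpers exist, a stopgap could be to zero the configuration before use so that flags never holds garbage. The struct below is a hypothetical stand-in, since the real git_diffresults_conf layout is not shown here, and the flag name is made up.

```c
#include <string.h>

/* Hypothetical stand-in for git_diffresults_conf. */
struct ld_results_conf {
	unsigned int flags;
};

/* Made-up flag bit, for illustration only. */
#define LD_RESULTS_PRINT (1u << 0)

/* Put the config into a known state: no flags set. */
void ld_results_conf_init(struct ld_results_conf *conf)
{
	memset(conf, 0, sizeof(*conf));
}
```

A caller would run ld_results_conf_init(&conf) first and then OR in whichever flags it actually wants.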
The merge introduced in 833e409f527993bb8726 breaks the build. If you need help building, take a look at this.