basepi / libgit2 Goto Github PK

View Code? Open in Web Editor NEW

This project forked from libgit2/libgit2

0.0 0.0 0.0 3.14 MB

The Library

Home Page: http://libgit2.github.com

License: Other

C 100.00%

libgit2's People

Contributors

Watchers

libgit2's Issues

We need to rehearse the final presentation and what-not.

Just need to decide what all we need to do to prepare for this. I suppose we can just give each person stuff to talk about decide ordering, and wing it. I'm not too worried about the public speaking part of it, once we have all the content.

This is obviously low-priority until we have the actual code done, but thought I'd throw up an issue.

EDIT: So when are we going to do it? I suppose we'll talk more at the meeting on Saturday.

Use or define the libdiff malloc

We're currently just calling free() and malloc(). This isn't extensible. We should define some sort of macro that specifies which malloc we're using, and it should probably default to libgit's malloc. I propose we call it ld_malloc()

Patience diff

Do we want patience diff in a separate file? I don't think it would hurt to do so. We could model it after JGit, where the both algos are in different files.

merge dev-algo-[patience core]

I've merged core into patience, but I think it is at a point where we can merge them into dev-diff-algo.
My branch has all of core and patience in it without merge conflicts, and compiles.

diff_no_index() fails to read files

PROBLEM:

Calls to diff_no_index() fail even when provided with correct paths for files.

RESOURCES:

Code ran is here:

int main() {
    git_diffresults_conf *conf;
    git_repository *repo;
    git_repository_open(&repo, "");
    git_index *index;
    git_commit *commit1;
    git_commit *commit2;

    //git_diff(diffdata, commit1, repo);

    printf("MAIN\n");
    printf("%d\n", git_diff_no_index(&conf, "difftest_before", "difftest_after"));

    //git_diff_cached(diffdata, commit1, index);
    //git_diff_commits(diffdata, commit1, commit2);
}

ls gives us this:

a.out           difftest_after  difftest_before main.c          tests.c

tests.c is unrelated.

patience: fillhashmap possible improvements

We are passing in data1 and data2, but also the env variable that already has pointers to data1/2 (though I'm not sure if those pointers point to data yet in the pipeline)

static int fill_hashmap(diff_mem_data *data1, diff_mem_data *data2,
        git_diffresults_conf const *results_conf,
        diff_environment *env, struct hashmap *result,
        int line1, int count1, int line2, int count2)
{
    /*
     * If env already has data1/2, then there is no reason to pass
     * in two data structs
     */
    result->file1 = data1; /* maybe? result->file1 = env->data1 */
    result->file2 = data2; /* "" */
    result->results_conf = results_conf;
    result->env = env;

Check off documentation for diff

Particularly, we need to:

Check off comments in public-facing API functions before deployment
Check off the main comment that explicitly says where we got the ideas for the diff code, and who to contact for problems with it.

This needs to be done before we send it out.

EDIT:

We also need to

Make sure the comments match up with the things they describe. Right now there's some dissonance between the two.

Use OS-agnostic IO handlers

PROBLEM:

Current implementation of src/diff.c uses a method called load_file(). This opens a file given a directory. Arguably, though, this is a function that belongs in a file whose job it is to supply OS-agnostic IO handlers. As it turns out, this instinct is correct, as there is such a file, which you will see in the "resources" section.

OBJECTIVES:

Transition diff.c to OS-agnostic IO handlers, to the exclusion of diff.c's load_file().
Transition any other file that does this which I overlooked in this bug report.

RESOURCES:

fileops.h and fileops.c supplies the OS-agnostic IO handlers required to complete this bug report. They're pretty easy to use.

extraneous spaces in libdiff.c

There are some places where spaces should be tabs in libdiff.c. Anytime there are 4 spaces, that can be a tab, even if those 4 spaces are for lining things up. If there are 6 spaces, four of them should be 1 tab and the other 2 just spaces. I can fix it if you want.

Create diff error codes

We must add error codes to common.h specific to diff results.
Discussion here for what error codes are needed go here.

patience: find_longest_common_sequence possible improvements

    /*
     * Could we use fewer comparisons by making this a while loop?
     * entry = map->first
     * while (entry->next) {...
     * ?
     */
    for (entry = map->first; entry; entry = entry->next) {

Prepare backlogs and tasks in ScrumWorks

Project creation, backlog item creation, and task creation within each backlog item

Some of these will come from, or at least be clarified by, input from Vicent when we get that. However, due to the necessity of showing this setup to the TA tomorrow, we need to get this done tonight.

Function Declarations

We are declaring some function at the top of .c files like such: https://github.com/crakdmirror/libgit2/blob/development/src/diff.c#L13

I understand why we would do this, but glancing around it doesn't look like the rest of the project is doing this. Is it ok if we remove these to keep the same feel as the rest of the project?

Consolidate records-centric functionality to a record-handling file

Right now, libdiff.c is a hodgepodge of functions that deal with records seamlessly alongside the the diffing algorithm functions. This should change to provide us with better abstraction. Specifically, in this new file should go:

Functions that deal with classification (e.g., those that deal with record_classifier and classd_record structs)
Functions that build and administrate the memstore struct
Functions that deal with actual records (e.g., diff_record, etc.)
Possibly functions that prepare the metadata and context of records and diff metadata (e.g., prepare_data_ctx())

patience: insert_record possible improvements

    while (map->entries[index].line1) {
        /*
         * Set other to the record corresponding to the line we are on
         * This seems to be comparing file1 to file1 at times
         * If we are on pass = 1, then diff_record will be equal to
         * data_ctx1->recs[line-1], which other gets set to here
         * TODO: see if this can be bypassed once
         */
        other = map->env->data_ctx1.recs[map->entries[index].line1 - 1];

Memory-profile diff interface

This is critical because this is a production library.

Seems like something for @kyeana

Prepare final presentation slides

@trane mentioned he might be able to start on these since he's waiting on @DrSleep for something. Thus, he gets assigned the issue! =P

Decide on bitshift vs division/mutiplication by 2

Recently we changed some of the places where we divide or multiply by a power of 2 to use bitshifts. My impression was that this was implemented automatically by the compiler.

We need to decide which to use.

Implement diff unit tests for libgit2

Try to re-use as many use cases as possible from git diff unit tests. Use the libgit2 test framework.

Style compliancy problems

Structs

Previously, I thought it was the case that typedef'ing structs was mainly for readability. Recent readings indicate that this is incorrect, and libgit in general seems to agree.

We need to:

Remove typedefs in structs that are not opaque.
For structs that are opaque, we want to access them only via functions that are designed to handle them.

Error Codes

We often return -1 after an error problem; we should return error codes instead. Per @crakdmirror's request, these error codes seem to be defined in include/common.h. Not sure if that's all of them.

Function declarations

Should we declare functions? Where? What sort (e.g., static)?

Line wrap

Probably 80 columns.

Tabs vs Spaces

libgit2 uses tabs. End of story. Spaces are unacceptable. Additionally, it is not true that any instance of 4 spaces should become a tab. Consider the following:

   int some_func(int param,
                 int param2)

Note that the initial indentation for line 1 and 2 are the same -- we use 1 tab. BUT, the second line must use spaces after this 1-tab indentation so that the params line up.

Another example is here; we indent both the first and the second line with 1 tab, and then use spaces to line up the xdl_change_compact() calls. There are more than 4 spaces.

Comments

This is C99, so either a // Comment or `/* Comment */ format are acceptable.

Spaces

for(i=0; i<len; i++)
if(...)

Should be...

for (i = 0; i < len; i++)
if (...)

Notice: spaces after for, if, else, else if and between the following symbols:

=  +  -  <  >  *  /  %  |  &  ^  <=  >=  ==  !=  ?  :

Braces

Braces are placed on the same lines in control-flow statements, e.g., conditionals, loops, and switches. The are NOT placed on the same lines of functions. See style guide for more details.

YES:

int has_cow(struct farmer *bob)
{
    if(bob->cow) {
        return 0;
    }
    else {
        return NO_COW;
    }
}

NO:

int has_cow(struct farmer *bob) {
    if(bob->cow)
    {
        return 0;
    }
    else
    {
        return NO_COW;
    }
}

Explore bogosqrt vs q3sqrt

We use square roots for a number of things. Our current implementation is here -- just the standard approximation method.

One of the team suggested that we look into using Quake 3's sqrt hack. Initial drawbacks seem to be that it's for floats, not longs. Perhaps we can adapt it to fit longs.

Review and complete TODOs and FIXMEs before ship of diff

Often we'll write something like "FIXME/TODO: change this function name to something that makes more sense then int if_i_see_one_more_uncommented_function_in_libxdiff_i_am_sending_davide_libenzi_a_bomb_in_the_mail" in the code to remind ourselves that we have some shit to do.

We should:

Complete as many of these as possible
Before shipping
Unless we can't do it for some reason

Set up scrumworks first sprint

Should be a 10-second job, but still on the list of things to-do.

C doesn't allow function overloading, and this is annoying

I think you got this, @crakdmirror.

xdlclassifier and xdlclass do not appear to have any function whatsoever

A lot of preparations happen in the diff pipeline. One of these preparations is to build xdlclassifiers and xdlclasses for every record. They're niftily constructed and the code is nicely written. The only problem is that I can't find a place where they're actually used. In our code, we build it in algo_environment() and in their code, they build it in xdl_prepare_env(). In both cases, the classifier and its classes are created and then simply thrown away. What does this code do?

If we didn't have to actually build these things, we could cut out probably 1/8th of the diff running time. We need to find out:

If it's the case that these are not used.
And un-implement them if it is

Decide between size_t and long

We need to know whether to use size_t and, say, int and long. libgit2 uses size_t all over the place, so the question is likely really a question about when to use it.

NOTES:

We will need to pay special attention to converting the stuff in all structs when doing this.

Build public-facing diff API function interface

We need to settle on, and implement, the functions that will face outwards, directly to the users. This includes:

Designing the functions and function signatures to be used directly by the users
Define, implement, and typedef all structs associated with these functions.

Figure out Git's diff unit tests, implement for libgit2

Get Alex's private branch merged

Code freeze is tonight and I know that @DrSleep has been working on his own private branch. Can we have this integrated into dev-testing or something? @crakdmirror @kyeana

Implement core diff function

Tentatively leaning towards Pa tie nce. We will need to look at both that and the classical diff algorithm.

This step will mostly be composed of writing the core diff function, possibly in both algorithms, and profiling each. The next step will be integrating it with the protocols (e.g., the internal diff protocol used to merge etc.).

The goals are:

Fast. Particularly, to make it parallel if we can.

Memory-profile diff algorithm

We will need to make sure that we're deleting everything and not causing memory leaks. This is a production library, and that would be bad.

@trane seems good at this sort of thing.

Implement core diff protocols

git produces both raw diff output (as in git diff) and a raw diff internal representation used to apply merges, patches, etc. This task will be composed of:

Nailing down what that protocol is.
Seeing if we can implement it faster.

The next step is to implement it inside and around the core diff function.

git diff with newly added files.

In git, if you use git diff and there is a newly added file in your file system but not in the repository, then the contents of that file are not part of the diff. If you delete a file from the file system that was in the repository, then that file is still included in the diff.

Should we follow this behavior? If so it would make the git_diff() function virtually done, minus actually doing the diffs on the files that have changed.

Scrumworks Sprint 2 update

Everyone please go into scrumworks and take a look. I figure once we actually start sprint 2, I'll move the unfinished tasks over to it from Sprint 1. Doesn't allow me to add pieces of backlog items, requires whole backlock item, so I just added all of the main merge implementation to sprint 2, even though I doubt we'll get it done. When you look at how things have worked out, this is really only a 3 week project, not 5. (Since we have all the beginning and end administrative stuff) We're doing the final presentation 2 weeks from tomorrow. Crazy stuff.

We free a pointer that wasn't malloc'd

PROBLEM: A call to git_diff_no_index() will allocate the contents of file at params filepath1 and filepath2 to a char *buffer1 and char *buffer2. Running this method will produce the following error:
a.out(5679) malloc: *** error for object 0x100800000: pointer being freed was not allocated *** set a breakpoint in malloc_error_break to debug [1] 5679 abort ./a.out
What's going on here is that we're free'ing here without actually having malloc'd the memory to begin with.

NOTES:

I would've just fixed this, but I wasn't sure if that would conflict with resolution of #14.

use c89 style comments

According to the wiki:

Linux style for comments is the C89 "/* ... */" style. Don't use C99-style "// ..." comments.

Are we following the same? If so, we need to change a bunch of stuff.

Flags variable is getting passed in uninitialized (full 'o garbage)

PROBLEM: We're sending in a flags parameter as part of the diff query. Normally, it's the job of a set of helper function (e.g. the hypothetical function print_std_out()) to create a git_diffresults_conf struct that configures the result of the diff -- whether we are printing it or merging it, etc. This does not exist yet, and so the flags param is not set, and thus it tends to be full of garbage.

Core protocol team's commits break the build

Merge introduced in 833e409f527993bb8726 breaks build. If you need help building, take a look at this.