Giter Club home page Giter Club logo

libgit2's People

Contributors

ageric avatar alexbudgh avatar arhibot avatar basepi avatar bnoordhuis avatar carlosmn avatar chobie avatar ctimmerm avatar dborowitz avatar drahosp avatar enzbang avatar hausdorff avatar icefox avatar ingmarv avatar joychenhuanhui avatar julioes avatar justinlove avatar jwiegley avatar madcoder avatar marvil07 avatar neopallium avatar nud avatar nulltoken avatar przemoc avatar sakari avatar schacon avatar spearce avatar tclem avatar vimalloc avatar vmg avatar

Watchers

 avatar  avatar  avatar

libgit2's Issues

We need to rehearse the final presentation and what-not.

Just need to decide what all we need to do to prepare for this. I suppose we can just give each person stuff to talk about decide ordering, and wing it. I'm not too worried about the public speaking part of it, once we have all the content.

This is obviously low-priority until we have the actual code done, but thought I'd throw up an issue.

EDIT: So when are we going to do it? I suppose we'll talk more at the meeting on Saturday.

Use or define the libdiff malloc

We're currently just calling free() and malloc(). This isn't extensible. We should define some sort of macro that specifies which malloc we're using, and it should probably default to libgit's malloc. I propose we call it ld_malloc()

Patience diff

Do we want patience diff in a separate file? I don't think it would hurt to do so. We could model it after JGit, where the both algos are in different files.

merge dev-algo-[patience core]

I've merged core into patience, but I think it is at a point where we can merge them into dev-diff-algo.
My branch has all of core and patience in it without merge conflicts, and compiles.

diff_no_index() fails to read files

PROBLEM:

Calls to diff_no_index() fail even when provided with correct paths for files.

RESOURCES:

Code ran is here:

int main() {
    git_diffresults_conf *conf;
    git_repository *repo;
    git_repository_open(&repo, "");
    git_index *index;
    git_commit *commit1;
    git_commit *commit2;

    //git_diff(diffdata, commit1, repo);

    printf("MAIN\n");
    printf("%d\n", git_diff_no_index(&conf, "difftest_before", "difftest_after"));

    //git_diff_cached(diffdata, commit1, index);
    //git_diff_commits(diffdata, commit1, commit2);
}

ls gives us this:

a.out           difftest_after  difftest_before main.c          tests.c

tests.c is unrelated.

patience: fillhashmap possible improvements

We are passing in data1 and data2, but also the env variable that already has pointers to data1/2 (though I'm not sure if those pointers point to data yet in the pipeline)

static int fill_hashmap(diff_mem_data *data1, diff_mem_data *data2,
        git_diffresults_conf const *results_conf,
        diff_environment *env, struct hashmap *result,
        int line1, int count1, int line2, int count2)
{
    /*
     * If env already has data1/2, then there is no reason to pass
     * in two data structs
     */
    result->file1 = data1; /* maybe? result->file1 = env->data1 */
    result->file2 = data2; /* "" */
    result->results_conf = results_conf;
    result->env = env;

Check off documentation for diff

Particularly, we need to:

  • Check off comments in public-facing API functions before deployment
  • Check off the main comment that explicitly says where we got the ideas for the diff code, and who to contact for problems with it.

This needs to be done before we send it out.

EDIT:

We also need to

  • Make sure the comments match up with the things they describe. Right now there's some dissonance between the two.

Use OS-agnostic IO handlers

PROBLEM:

Current implementation of src/diff.c uses a method called load_file(). This opens a file given a directory. Arguably, though, this is a function that belongs in a file whose job it is to supply OS-agnostic IO handlers. As it turns out, this instinct is correct, as there is such a file, which you will see in the "resources" section.

OBJECTIVES:

  • Transition diff.c to OS-agnostic IO handlers, to the exclusion of diff.c's load_file().
  • Transition any other file that does this which I overlooked in this bug report.

RESOURCES:

  • fileops.h and fileops.c supplies the OS-agnostic IO handlers required to complete this bug report. They're pretty easy to use.

extraneous spaces in libdiff.c

There are some places where spaces should be tabs in libdiff.c. Anytime there are 4 spaces, that can be a tab, even if those 4 spaces are for lining things up. If there are 6 spaces, four of them should be 1 tab and the other 2 just spaces. I can fix it if you want.

Create diff error codes

We must add error codes to common.h specific to diff results.
Discussion here for what error codes are needed go here.

Prepare backlogs and tasks in ScrumWorks

Project creation, backlog item creation, and task creation within each backlog item

Some of these will come from, or at least be clarified by, input from Vicent when we get that. However, due to the necessity of showing this setup to the TA tomorrow, we need to get this done tonight.

Consolidate records-centric functionality to a record-handling file

Right now, libdiff.c is a hodgepodge of functions that deal with records seamlessly alongside the the diffing algorithm functions. This should change to provide us with better abstraction. Specifically, in this new file should go:

  • Functions that deal with classification (e.g., those that deal with record_classifier and classd_record structs)
  • Functions that build and administrate the memstore struct
  • Functions that deal with actual records (e.g., diff_record, etc.)
  • Possibly functions that prepare the metadata and context of records and diff metadata (e.g., prepare_data_ctx())

patience: insert_record possible improvements

    while (map->entries[index].line1) {
        /*
         * Set other to the record corresponding to the line we are on
         * This seems to be comparing file1 to file1 at times
         * If we are on pass = 1, then diff_record will be equal to
         * data_ctx1->recs[line-1], which other gets set to here
         * TODO: see if this can be bypassed once
         */
        other = map->env->data_ctx1.recs[map->entries[index].line1 - 1];

Decide on bitshift vs division/mutiplication by 2

Recently we changed some of the places where we divide or multiply by a power of 2 to use bitshifts. My impression was that this was implemented automatically by the compiler.

We need to decide which to use.

Style compliancy problems

Structs

Previously, I thought it was the case that typedef'ing structs was mainly for readability. Recent readings indicate that this is incorrect, and libgit in general seems to agree.

We need to:

  • Remove typedefs in structs that are not opaque.
  • For structs that are opaque, we want to access them only via functions that are designed to handle them.

Error Codes

We often return -1 after an error problem; we should return error codes instead. Per @crakdmirror's request, these error codes seem to be defined in include/common.h. Not sure if that's all of them.

Function declarations

Should we declare functions? Where? What sort (e.g., static)?

Line wrap

Probably 80 columns.

Tabs vs Spaces

libgit2 uses tabs. End of story. Spaces are unacceptable. Additionally, it is not true that any instance of 4 spaces should become a tab. Consider the following:

   int some_func(int param,
                 int param2)

Note that the initial indentation for line 1 and 2 are the same -- we use 1 tab. BUT, the second line must use spaces after this 1-tab indentation so that the params line up.

Another example is here; we indent both the first and the second line with 1 tab, and then use spaces to line up the xdl_change_compact() calls. There are more than 4 spaces.

Comments

This is C99, so either a // Comment or `/* Comment */ format are acceptable.

Spaces

for(i=0; i<len; i++)
if(...)

Should be...

for (i = 0; i < len; i++)
if (...)

Notice: spaces after for, if, else, else if and between the following symbols:

=  +  -  <  >  *  /  %  |  &  ^  <=  >=  ==  !=  ?  :

Braces

Braces are placed on the same lines in control-flow statements, e.g., conditionals, loops, and switches. The are NOT placed on the same lines of functions. See style guide for more details.

YES:

int has_cow(struct farmer *bob)
{
    if(bob->cow) {
        return 0;
    }
    else {
        return NO_COW;
    }
}

NO:

int has_cow(struct farmer *bob) {
    if(bob->cow)
    {
        return 0;
    }
    else
    {
        return NO_COW;
    }
}

Explore bogosqrt vs q3sqrt

We use square roots for a number of things. Our current implementation is here -- just the standard approximation method.

One of the team suggested that we look into using Quake 3's sqrt hack. Initial drawbacks seem to be that it's for floats, not longs. Perhaps we can adapt it to fit longs.

Review and complete TODOs and FIXMEs before ship of diff

Often we'll write something like "FIXME/TODO: change this function name to something that makes more sense then int if_i_see_one_more_uncommented_function_in_libxdiff_i_am_sending_davide_libenzi_a_bomb_in_the_mail" in the code to remind ourselves that we have some shit to do.

We should:

  • Complete as many of these as possible
  • Before shipping
  • Unless we can't do it for some reason

xdlclassifier and xdlclass do not appear to have any function whatsoever

A lot of preparations happen in the diff pipeline. One of these preparations is to build xdlclassifiers and xdlclasses for every record. They're niftily constructed and the code is nicely written. The only problem is that I can't find a place where they're actually used. In our code, we build it in algo_environment() and in their code, they build it in xdl_prepare_env(). In both cases, the classifier and its classes are created and then simply thrown away. What does this code do?

If we didn't have to actually build these things, we could cut out probably 1/8th of the diff running time. We need to find out:

  • If it's the case that these are not used.
  • And un-implement them if it is

Decide between size_t and long

We need to know whether to use size_t and, say, int and long. libgit2 uses size_t all over the place, so the question is likely really a question about when to use it.

NOTES:

  • We will need to pay special attention to converting the stuff in all structs when doing this.

Build public-facing diff API function interface

We need to settle on, and implement, the functions that will face outwards, directly to the users. This includes:

  1. Designing the functions and function signatures to be used directly by the users
  2. Define, implement, and typedef all structs associated with these functions.

Implement core diff function

Tentatively leaning towards Patience. We will need to look at both that and the classical diff algorithm.

This step will mostly be composed of writing the core diff function, possibly in both algorithms, and profiling each. The next step will be integrating it with the protocols (e.g., the internal diff protocol used to merge etc.).

The goals are:

  • Fast. Particularly, to make it parallel if we can.

Memory-profile diff algorithm

We will need to make sure that we're deleting everything and not causing memory leaks. This is a production library, and that would be bad.

@trane seems good at this sort of thing.

Implement core diff protocols

git produces both raw diff output (as in git diff) and a raw diff internal representation used to apply merges, patches, etc. This task will be composed of:

  1. Nailing down what that protocol is.
  2. Seeing if we can implement it faster.

The next step is to implement it inside and around the core diff function.

git diff with newly added files.

In git, if you use git diff and there is a newly added file in your file system but not in the repository, then the contents of that file are not part of the diff. If you delete a file from the file system that was in the repository, then that file is still included in the diff.

Should we follow this behavior? If so it would make the git_diff() function virtually done, minus actually doing the diffs on the files that have changed.

Scrumworks Sprint 2 update

Everyone please go into scrumworks and take a look. I figure once we actually start sprint 2, I'll move the unfinished tasks over to it from Sprint 1. Doesn't allow me to add pieces of backlog items, requires whole backlock item, so I just added all of the main merge implementation to sprint 2, even though I doubt we'll get it done. When you look at how things have worked out, this is really only a 3 week project, not 5. (Since we have all the beginning and end administrative stuff) We're doing the final presentation 2 weeks from tomorrow. Crazy stuff.

We free a pointer that wasn't malloc'd

PROBLEM: A call to git_diff_no_index() will allocate the contents of file at params filepath1 and filepath2 to a char *buffer1 and char *buffer2. Running this method will produce the following error:
a.out(5679) malloc: *** error for object 0x100800000: pointer being freed was not allocated *** set a breakpoint in malloc_error_break to debug [1] 5679 abort ./a.out
What's going on here is that we're free'ing here without actually having malloc'd the memory to begin with.

NOTES:

I would've just fixed this, but I wasn't sure if that would conflict with resolution of #14.

use c89 style comments

According to the wiki:

Linux style for comments is the C89 "/* ... */" style. Don't use C99-style "// ..." comments.

Are we following the same? If so, we need to change a bunch of stuff.

Flags variable is getting passed in uninitialized (full 'o garbage)

PROBLEM: We're sending in a flags parameter as part of the diff query. Normally, it's the job of a set of helper function (e.g. the hypothetical function print_std_out()) to create a git_diffresults_conf struct that configures the result of the diff -- whether we are printing it or merging it, etc. This does not exist yet, and so the flags param is not set, and thus it tends to be full of garbage.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.