Giter Club home page Giter Club logo

Comments (13)

johanatan avatar johanatan commented on August 29, 2024

Another way to look at this issue is to consider that local is a premature, unnecessary & [inherently] destructive coercion to a "bottom" value. If one takes this viewpoint, he might be inclined to argue that mbt describe local itself should not be special and should just return the module hashes for the modules which have changed on disk (per their [and their dependencies] current contents).

from mbt.

buddhike avatar buddhike commented on August 29, 2024

Thanks for reporting. Current behaviour is intentional. mbt build|describe local commands are there to find what's changed in your workspace. They should just build/describe workspace changes.
Although, I'm keen to understand where this improvement could be useful. Could you please elaborate the optimisation you are trying to do in your pipeline?

from mbt.

johanatan avatar johanatan commented on August 29, 2024

I want to use the hash of the content to see if the build artifact for a particular locally modified module has been built and, if so, skip it.

from mbt.

buddhike avatar buddhike commented on August 29, 2024

Thanks for explaining what you are trying to do. After making a change in your repo, if you run mbt describe|build local, it should just build/describe what's changed. So this behaviour achieves what you need. Although, I think a potential problem is if you now repeat this command, same changes get rebuilt. Which is probably what you are trying to avoid.

Traditional build systems like make, maven, msbuild already optimise the build process this way so that if you run the same command multiple times, it does not do much.
From the start, we did not want to replace these tools with mbt. Its sole purpose was to execute build tools of your choice in a subtree of a repo, when it has been modified.

Version calculation was required because, if we are building a sub-set of our repository, we may want a reliable way to associate those build artifacts to a change in the repo (commit SHA is not good enough now because it represents every module in our commit, not just the ones got built as a result of a change).
Currently the version of a module is calculated by combining:

  • SHA of git tree object for its dir (same thing you get when you run git cat-file -p master^{tree} | grep tree)
  • SHA of git blob objects for of its file dependencies (same as git cat-file -p master^{tree} | grep blob)
  • Version numbers of dependencies in .mbt.yml

mbt build|describe local was introduced because we found ourselves doing a lot of git commit... and mbt build|describe head while we are developing the build related scripts. When thinking about the problem we figured out that, we could just interrogate the workspace (file system) for modules and reduce that down to the ones that are not added to the index. However, since these are the items we don't have in git yet, we cannot get the SHAs for them. This is why we have a fixed version called local to denote a module that's being built using this command.

Sorry about the length of the response. Thought giving a bit of background to how local was born and the problem it's trying to solve will clear things up.

If you still think the version number based on the local content is essential, feel free to have a stab. I think you'd be able to generate the modules set as per head and change our discover workspace algorithm to construct the modified modules based on the workspace diff :-).

//cc @wenisman

from mbt.

johanatan avatar johanatan commented on August 29, 2024

Ah, thanks for the background.

Hmm, would that even be possible though since as you mentioned there isn’t any hash of local changes until they’re committed?

Or were you thinking of a different algorithm that looks at the content of the files on disk?

Also, can you please point me to where in the codebase your current “discover workspace algorithm” is implemented?

Thanks!

from mbt.

buddhike avatar buddhike commented on August 29, 2024

No problem at all. Thank you for feedback like this.

Current workspace discovery code is in ModulesInWorkspace function in discover component.

func (d *stdDiscover) ModulesInWorkspace() (Modules, error) {

That works out all the modules based on the presence of .mbt.yml file in directory tree. ModulesInWorkspace is used by a component called ManifestBuilder which produces various views of our modules set. In this particular case, it reduces the modules down to what's modified in the workspace.

mods, err = b.Reducer.Reduce(mods, deltas)

I was thinking that, ModulesInWorkspace may not be necessary to achieve our goal here. ByWorkspaceChanges in ManifestBuilder could generate the modules based on the current head (i.e. call ManifestBuilder.ByCurrentBranch()) then have a another function in Discover module to interrogate the workspace diff for modules. Finally we union the two sets.

New function in Discover could generate the Version based on the content of those modified files and their paths (to track mv x y scenarios).

Just food for thought anyway. There could be a simpler approach... 😄

from mbt.

buddhike avatar buddhike commented on August 29, 2024

Closing this for now due to inactivity.

from mbt.

johanatan avatar johanatan commented on August 29, 2024

How about:

-. commit local changes
-. grab the new content hashes
-. restore the working dir, staging area, etc to original state.

??

Not the most elegant solution but should be straightforward. The most difficult part would be step 3, restoration of all the associated states (new files, edited files, staged files, unstaged files etc etc).

Thoughts?

from mbt.

buddhike avatar buddhike commented on August 29, 2024

That's definitely workable. However, there's a bit more elegant story to this now.

Couple of weeks ago, @sergey-ostapenko did some awesome work to remove the file system walking for working out local modules. If you look at the discussion here #86, you can see a method I suggested around creating a union of modules as per head commit and currently modified items in workspace. Current implementation is much simpler than that but it does not have a way to differentiate modules that have been changed. So it is still using the fixed version string local.

If we change the implementation to the union based one, you could technically retain the versions from HEAD tree for unmodified modules. As for modified ones, you could simply create a hash based on the content of modified files and their path.

A bit more work than what you proposed I suppose, but it would be cleaner in my opinion (You might even realise an easier approach than the union if you look at @sergey-ostapenko implementation 😉 ).

Let me know what you think.

from mbt.

johanatan avatar johanatan commented on August 29, 2024

Can you elaborate on "simply create a hash based on the content of modified files and their path"? Keep in mind that I'd like the hash we compute here to ultimately match the one that git itself would compute if we were to add and commit all of the local changes (so that build optimization can occur--i.e., so that already-built modules would not be rebuilt later on).

from mbt.

johanatan avatar johanatan commented on August 29, 2024

I am using the following script for now as a workaround. Feel free to integrate the equivalent into mbt itself:

#!/usr/bin/env bash
# Argument to the script is a valid project name

pushd $CODEBUILD_SRC_DIR > /dev/null
tmp_file=`mktemp`

function cleanup {
    rm -f $tmp_file
    rm -r .git/logs
    rm -r .git/refs
    mv .git/logs2 .git/logs
    mv .git/refs2 .git/refs
    [ -f .git/COMMIT_EDITMSG2 ] && mv .git/COMMIT_EDITMSG2 .git/COMMIT_EDITMSG
    popd > /dev/null
}

trap cleanup EXIT

cp .git/index $tmp_file
cp -R .git/logs .git/logs2
cp -R .git/refs .git/refs2
[ -f .git/COMMIT_EDITMSG ] && cp .git/COMMIT_EDITMSG .git/COMMIT_EDITMSG2

(
export GIT_INDEX_FILE=$tmp_file
git add .
git commit -m "commit for latest version" > /dev/null
mbt describe head | grep "^$1 " | tr -s ' ' | cut -d ' ' -f3
)

from mbt.

buddhike avatar buddhike commented on August 29, 2024

Can you elaborate on "simply create a hash based on the content of modified files and their path"?
We could compute a SHA1 of the content of the modified file and its path. However, this does not guarantee that you would get the same version after it's committed.

Thanks for sharing your script.
Is this a requirement because your CI does some automated work before building your artifacts but you don't want them in the commit tree?

from mbt.

johanatan avatar johanatan commented on August 29, 2024

Yes, one should not have to commit in order to build.

I think this sums it up (from a systems consistency perspective):

local is a premature, unnecessary & [inherently] destructive coercion to a "bottom" value. It breaks consistency with the rest of an otherwise clean design.

from mbt.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.