Comments (3)

mih commented on September 3, 2024

All of the runtime is coming from _eval_submodule():

Test patch

diff --git a/datalad_next/iter_collections/gitstatus.py b/datalad_next/iter_collections/gitstatus.py
index d3f4dd5..6fe6e71 100644
--- a/datalad_next/iter_collections/gitstatus.py
+++ b/datalad_next/iter_collections/gitstatus.py
@@ -289,7 +289,7 @@ def _yield_repo_items(
             # TODO others?
         )
         # TODO possibly trim eval_submodule_state
-        _eval_submodule(path, item, eval_submodule_state)
+        #_eval_submodule(path, item, eval_submodule_state)
         if item.status:
             yield item

# with patch
❯ time datalad next-status
nothing to save, working tree clean
datalad next-status  0.84s user 0.13s system 101% cpu 0.962 total

# without the patch
❯ time datalad next-status
nothing to save, working tree clean
datalad next-status  95.98s user 26.99s system 110% cpu 1:51.23 total


mih commented on September 3, 2024

The culprit is the time it takes to detect that a submodule is absent.

The following patch swaps iter_git_subproc() for call_git_lines(), replacing the iter_subproc() machinery with a simpler subprocess.run(). The difference is marginal, which is a big compliment to iter_subproc()!

diff --git a/datalad_next/iter_collections/gitstatus.py b/datalad_next/iter_collections/gitstatus.py
index d3f4dd5..9f0f7a8 100644
--- a/datalad_next/iter_collections/gitstatus.py
+++ b/datalad_next/iter_collections/gitstatus.py
@@ -13,6 +13,7 @@ from typing import Generator
 
 from datalad_next.runners import (
     CommandError,
+    call_git_lines,
     iter_git_subproc,
 )
 from datalad_next.itertools import (
@@ -414,18 +415,16 @@ def _get_submod_worktree_head(path: Path) -> tuple[bool, str | None, bool]:
         # its basis. it is not meaningful to track the managed branch in
         # a superdataset
         HEAD = corresponding_head
-    with iter_git_subproc(
-        ['rev-parse', '--path-format=relative',
-         '--show-toplevel', HEAD],
+    res = call_git_lines(
+        ['rev-parse', '--path-format=relative', '--show-toplevel', HEAD],
         cwd=path,
-    ) as r:
-        res = tuple(decode_bytes(itemize(r, sep=None, keep_ends=False)))
-        assert len(res) == 2
-        if res[0].startswith('..'):
-            # this is not a report on a submodule at this location
-            return False, None, adjusted
-        else:
-            return True, res[1], adjusted
+    )
+    assert len(res) == 2
+    if res[0].startswith('..'):
+        # this is not a report on a submodule at this location
+        return False, None, adjusted
+    else:
+        return True, res[1], adjusted
 
 
 def _eval_submodule(basepath, item, eval_mode) -> None:
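
For readers who want to try the same query outside of datalad-next, here is a minimal sketch of the check that the patched function performs, written with plain subprocess.run(). The function names interpret_rev_parse() and query_worktree_head() are hypothetical and not part of datalad-next; the sketch also leaves out the adjusted-branch handling of the original, and it assumes a Git version that supports --path-format (to my knowledge 2.31 or later).

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple


def interpret_rev_parse(lines: List[str]) -> Tuple[bool, Optional[str]]:
    """Interpret the two output lines of the rev-parse call below.

    Hypothetical helper that mirrors the logic in the diff: a relative
    toplevel that starts with '..' means the repository found belongs to
    a parent directory, so no repository is present at the queried path.
    """
    if len(lines) != 2:
        raise ValueError(f'expected 2 lines, got {len(lines)}')
    toplevel, head = lines
    if toplevel.startswith('..'):
        return False, None
    return True, head


def query_worktree_head(path: Path, head: str = 'HEAD') -> Tuple[bool, Optional[str]]:
    """Run `git rev-parse` in `path` and interpret its output.

    Hypothetical helper; raises subprocess.CalledProcessError when Git
    exits with a non-zero status.
    """
    proc = subprocess.run(
        ['git', 'rev-parse', '--path-format=relative',
         '--show-toplevel', head],
        cwd=path,
        capture_output=True,
        text=True,
        check=True,
    )
    return interpret_rev_parse(proc.stdout.splitlines())
```

Keeping the interpretation separate from the subprocess call makes the decision logic testable without a Git repository at hand.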


mih commented on September 3, 2024

Here is the patch. It checks for a .git entry at the submodule path before any call to git rev-parse is made:

diff --git a/datalad_next/iter_collections/gitstatus.py b/datalad_next/iter_collections/gitstatus.py
index d3f4dd5..5e4a980 100644
--- a/datalad_next/iter_collections/gitstatus.py
+++ b/datalad_next/iter_collections/gitstatus.py
@@ -437,6 +436,14 @@ def _eval_submodule(basepath, item, eval_mode) -> None:
         return
 
     item_path = basepath / item.path
+
+    # this is the cheapest test for the theoretical chance that a submodule
+    # is present at `item_path`. This is beneficial even when we would only
+    # run a single call to `git rev-parse`
+    # https://github.com/datalad/datalad-next/issues/606
+    if not (item_path / '.git').exists():
+        return
+
     # get head commit, and whether a submodule is actually present,
     # and/or in adjusted mode
     subds_present, head_commit, adjusted = _get_submod_worktree_head(item_path)

❯ time datalad next-status
nothing to save, working tree clean
datalad next-status  1.17s user 0.21s system 100% cpu 1.372 total

A roughly 80x speedup (1:51.23 down to 1.372 s total) for this extreme use case.
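
The idea behind the patch can be tried in isolation. The following is a minimal sketch, not the datalad-next implementation: may_contain_repo() is a hypothetical name, and it only reproduces the filesystem test from the diff. The check is cheap because it needs no child process, and it works for submodules because .git may be either a directory or a plain "gitlink" file, both of which exists() reports as present.

```python
from pathlib import Path


def may_contain_repo(item_path: Path) -> bool:
    """Return True if a repository could be present at `item_path`.

    Hypothetical helper mirroring the test added in the patch. A True
    result is only a hint that a (more expensive) Git call is worth
    making; a False result means the Git call can be skipped.
    """
    return (item_path / '.git').exists()


def filter_candidates(basepath: Path, relpaths: list) -> list:
    """Keep only the relative paths that pass the cheap pre-check."""
    return [p for p in relpaths if may_contain_repo(basepath / p)]
```

With many absent submodules, almost every path is rejected by this pre-check, so the per-submodule subprocess cost is only paid for the few that are actually installed.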
