Comments (3)
All of the runtime is coming from `_eval_submodule()`:
Test patch
diff --git a/datalad_next/iter_collections/gitstatus.py b/datalad_next/iter_collections/gitstatus.py
index d3f4dd5..6fe6e71 100644
--- a/datalad_next/iter_collections/gitstatus.py
+++ b/datalad_next/iter_collections/gitstatus.py
@@ -289,7 +289,7 @@ def _yield_repo_items(
# TODO others?
)
# TODO possibly trim eval_submodule_state
- _eval_submodule(path, item, eval_submodule_state)
+ #_eval_submodule(path, item, eval_submodule_state)
if item.status:
yield item
# with patch
❯ time datalad next-status
nothing to save, working tree clean
datalad next-status 0.84s user 0.13s system 101% cpu 0.962 total
# without the patch
❯ time datalad next-status
nothing to save, working tree clean
datalad next-status 95.98s user 26.99s system 110% cpu 1:51.23 total
from datalad-next.
The culprit is how late we detect that a submodule is absent: only after a `git rev-parse` call has already been made for it.
The following patch tried swapping out the `iter_subproc()` method for a simpler `subprocess.run()`. The difference is marginal, which is a big compliment to `iter_subproc()`!
diff --git a/datalad_next/iter_collections/gitstatus.py b/datalad_next/iter_collections/gitstatus.py
index d3f4dd5..9f0f7a8 100644
--- a/datalad_next/iter_collections/gitstatus.py
+++ b/datalad_next/iter_collections/gitstatus.py
@@ -13,6 +13,7 @@ from typing import Generator
from datalad_next.runners import (
CommandError,
+ call_git_lines,
iter_git_subproc,
)
from datalad_next.itertools import (
@@ -414,18 +415,16 @@ def _get_submod_worktree_head(path: Path) -> tuple[bool, str | None, bool]:
# its basis. it is not meaningful to track the managed branch in
# a superdataset
HEAD = corresponding_head
- with iter_git_subproc(
- ['rev-parse', '--path-format=relative',
- '--show-toplevel', HEAD],
+ res = call_git_lines(
+ ['rev-parse', '--path-format=relative', '--show-toplevel', HEAD],
cwd=path,
- ) as r:
- res = tuple(decode_bytes(itemize(r, sep=None, keep_ends=False)))
- assert len(res) == 2
- if res[0].startswith('..'):
- # this is not a report on a submodule at this location
- return False, None, adjusted
- else:
- return True, res[1], adjusted
+ )
+ assert len(res) == 2
+ if res[0].startswith('..'):
+ # this is not a report on a submodule at this location
+ return False, None, adjusted
+ else:
+ return True, res[1], adjusted
def _eval_submodule(basepath, item, eval_mode) -> None:
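The marginal difference is plausible if the cost is dominated by launching one process per candidate path, whichever wrapper does the launching. A rough sketch of how one could compare that cost with a plain filesystem check follows. It is not datalad-next code: `spawn_noop()` and `compare()` are hypothetical helpers, and a no-op Python child process stands in for `git rev-parse`, so absolute numbers will differ from the timings reported here.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
import time
from pathlib import Path


def time_calls(func, n: int) -> float:
    """Return the total wall-clock seconds needed for n calls of func()."""
    start = time.perf_counter()
    for _ in range(n):
        func()
    return time.perf_counter() - start


def spawn_noop() -> None:
    # Assumption for illustration: a no-op child process stands in for
    # one `git rev-parse` call per candidate submodule path.
    subprocess.run([sys.executable, '-c', 'pass'], check=True)


def compare(n: int = 5) -> tuple[float, float]:
    """Time n process launches against n `.git` existence checks."""
    with tempfile.TemporaryDirectory() as tmp:
        candidate = Path(tmp) / 'absent-submodule'
        candidate.mkdir()
        t_spawn = time_calls(spawn_noop, n)
        t_stat = time_calls(lambda: (candidate / '.git').exists(), n)
    return t_spawn, t_stat


if __name__ == '__main__':
    t_spawn, t_stat = compare()
    print(f'spawn: {t_spawn:.4f}s, exists(): {t_stat:.6f}s')
```

With many absent submodules, the per-path process launch adds up, while the existence check stays cheap.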
Here is the patch that adds a cheap check for `.git` before any `git rev-parse` call is made:
diff --git a/datalad_next/iter_collections/gitstatus.py b/datalad_next/iter_collections/gitstatus.py
index d3f4dd5..5e4a980 100644
--- a/datalad_next/iter_collections/gitstatus.py
+++ b/datalad_next/iter_collections/gitstatus.py
@@ -437,6 +436,14 @@ def _eval_submodule(basepath, item, eval_mode) -> None:
         return
 
     item_path = basepath / item.path
+
+    # this is the cheapest test for the theoretical chance that a submodule
+    # is present at `item_path`. This is beneficial even when we would only
+    # run a single call to `git rev-parse`
+    # https://github.com/datalad/datalad-next/issues/606
+    if not (item_path / '.git').exists():
+        return
+
     # get head commit, and whether a submodule is actually present,
     # and/or in adjusted mode
     subds_present, head_commit, adjusted = _get_submod_worktree_head(item_path)
❯ time datalad next-status
nothing to save, working tree clean
datalad next-status 1.17s user 0.21s system 100% cpu 1.372 total
An 80x speedup (from 1:51.23 to 1.372 s total) for this extreme use case.
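The idea behind the patch can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the datalad-next implementation: `may_contain_submodule()` and `submodule_head()` are hypothetical names, and a plain `git rev-parse HEAD` via `subprocess.run()` replaces the project's own runner helpers.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def may_contain_submodule(item_path: Path) -> bool:
    """Cheap pre-check: a checked-out submodule has a `.git` entry
    (a directory, or a file pointing to the actual git directory)."""
    return (item_path / '.git').exists()


def submodule_head(item_path: Path) -> str | None:
    """Return the HEAD commit of a repository at item_path, or None.

    Hypothetical helper: it only launches git when the pre-check
    says a repository could be present at this location.
    """
    if not may_contain_submodule(item_path):
        # no `.git` entry: skip the process launch entirely
        return None
    res = subprocess.run(
        ['git', 'rev-parse', 'HEAD'],
        cwd=item_path,
        capture_output=True,
        text=True,
        check=False,
    )
    # a non-zero exit (e.g. no commit yet) is reported as "no HEAD"
    return res.stdout.strip() if res.returncode == 0 else None
```

For a path without a `.git` entry, `submodule_head()` returns `None` without ever starting a git process, which is where the saving comes from when most registered submodules are not installed.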