Comments (11)

mattip avatar mattip commented on June 30, 2024 1

Thanks, exactly what I was looking for.

from codespeed.

diegorusso avatar diegorusso commented on June 30, 2024

The aarch64 results are currently being backfilled. I will update the issue once that is done.

diegorusso avatar diegorusso commented on June 30, 2024

They are now backfilled. The issue can be closed.

mattip avatar mattip commented on June 30, 2024

We also need to do some of this for PyPy (speed.pypy.org) since we moved to git last month. How did you do the backfill?

diegorusso avatar diegorusso commented on June 30, 2024

I can share the scripts I used to do the migration and the backfill. Bear in mind this was very tailored to the case of speed.python.org. I'm not sure what the situation for pypy is.

mattip avatar mattip commented on June 30, 2024

Ahh. I assume these are DB manipulation scripts? Sure, that might help give me a hint about how to do this for us.

diegorusso avatar diegorusso commented on June 30, 2024

I had commented on the issue with information that wasn't relevant, so I deleted those comments.
@mattip before giving you the right answer, can you tell me what exactly you need to do?

Questions I have:

  • what SCM technology are you migrating from?
  • how do you map commits from the old SCM to git commits?
  • how far back do you want to go with the backfill?

mattip avatar mattip commented on June 30, 2024

We have results for pypy-64 and pypy-jit-64 that relate to python2.7. When we used mercurial (until Jan 6) the branch was named "default". Since then we use git, and the branch is named "main". So for this benchmark for instance, you see it has no results before Jan 6, since the branch for those results is "default". So my answers are

  • mercurial
  • I am not sure what you are asking, there is no overlap of commits. Each project can only have one SCM and one default branch, but the timeline does not know about the SCM. Since the py3.9 branch is consistent between the two repos, the history can go back as far as we have data. But on the pypy-64 or pypy-jit-64 projects, the branch name changed on Jan 6, so there are no results for the PyPy project before that date (click on the "benchmarker" environment radio button, note how there is no previous result. Now click on one of the PyPy3.9 executables, note how there are results before Jan 6).
  • I guess it would be nice to go back a year or two.

diegorusso avatar diegorusso commented on June 30, 2024

I was thinking along the lines of mapping commits from mercurial to git. For instance, these two commits are the same:

The above commits are the last ones that the two repos have in common. For commits before those, one can try to map each mercurial commit to its git counterpart, so you don't need to backfill anything: the data is already there. I'm not a mercurial expert, though, so I don't know whether this is feasible.

If you cannot convert the mercurial revisions to git revisions in the database (I think it is possible, though), you can resort to an actual backfill.
The process could be easy, depending on your machine. If you want to run the benchmarks sequentially, I guess you can append all the commits you want to the end of benchmark.conf and run pyperformance with compile_all.
The format is:
sha1=main
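
As a rough illustration of preparing those lines, here is a minimal sketch that turns a list of commit hashes into "sha=branch" entries. The helper name format_revision_lines and the example hashes are hypothetical; only the "sha=branch" line format comes from the comment above. The same lines could also feed a text file of revisions for a parallel run.

```python
def format_revision_lines(shas, branch="main"):
    """Return one "sha=branch" line per commit, skipping blanks and duplicates."""
    seen = set()
    lines = []
    for sha in shas:
        sha = sha.strip()
        if not sha or sha in seen:
            continue  # ignore empty entries and repeated hashes
        seen.add(sha)
        lines.append(f"{sha}={branch}")
    return lines


if __name__ == "__main__":
    # Placeholder hashes for illustration only; these are not real commits.
    example = ["aaaa111", "bbbb222", "", "aaaa111"]
    print("\n".join(format_revision_lines(example)))
```

Where exactly these lines belong inside benchmark.conf depends on your configuration, so check that before appending them.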

If you want to run things in parallel (we did, because we have CPU isolation on the AArch64 machine), collect all the revisions you want to test in a text file and write a simple Python script that calls pyperformance for every revision.

Our script was something like this:

from multiprocessing import Pool
from pathlib import Path
import subprocess


def get_revisions():
    # Each line of backfill_shas.txt has the form "sha=branch".
    revisions = []
    with open("backfill_shas.txt", "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            sha, branch = line.split("=", 1)
            revisions.append((sha.strip(), branch.strip()))
    return revisions


def run_pyperformance(revision):
    sha, branch = revision
    print(f"Running run-pyperformance.sh with sha: {sha}, branch: {branch}")
    output_dir = Path("output")
    output_dir.mkdir(parents=True, exist_ok=True)
    out_file = output_dir / f"{branch}-{sha}.out"
    err_file = output_dir / f"{branch}-{sha}.err"
    # Keep stdout and stderr of every run in separate per-revision files.
    with open(out_file, "w") as output, open(err_file, "w") as error:
        subprocess.run([
            "./run-pyperformance.sh",
            "-x",
            "--",
            "compile",
            "benchmark.conf",
            sha,
            branch,
        ],
        stdout=output,
        stderr=error,
        )

if __name__ == '__main__':
    pool = Pool(6)
    try:
        # map_async() returns immediately; get() blocks until every worker
        # has finished, so a Ctrl-C is raised here and can be handled.
        pool.map_async(run_pyperformance, get_revisions()).get()
    except KeyboardInterrupt:
        print("Caught KeyboardInterrupt, terminating workers")
        pool.terminate()
    else:
        print("Normal termination")
        pool.close()
    pool.join()

./run-pyperformance.sh is a wrapper script that has some logic around running pyperformance in parallel (lock files, etc.).

Depending on how many revisions you need to backfill, the process could be lengthy. I strongly suggest the first approach: modify the data you already have so that it maps to the new git revisions.

mattip avatar mattip commented on June 30, 2024

Maybe we have drifted off the original issue far enough that the title needs expanding, or I can open a new issue?

We can generate a bi-directional mapping of commits on PyPy between mercurial and git using the methodology we used to migrate the repo. From the GUI, I can drill down to a particular benchmark result for a particular revision, and I see there I can add another revision. Any idea how I would do that in a SQL query? Then I could, for each interesting hg hash, find all the results and add the corresponding git hash.
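
To make the idea of a bi-directional mapping concrete, here is a minimal sketch that builds both lookup directions from (hg, git) hash pairs and rejects inconsistent input. The function name build_commit_maps and the placeholder hashes are hypothetical; how the pairs are actually produced depends on the migration tooling.

```python
def build_commit_maps(pairs):
    """Build hg->git and git->hg dicts from (hg_sha, git_sha) pairs.

    Raises ValueError if a hash is paired with two different partners,
    since the mapping is expected to be one-to-one.
    """
    hg_to_git = {}
    git_to_hg = {}
    for hg_sha, git_sha in pairs:
        if hg_to_git.get(hg_sha, git_sha) != git_sha:
            raise ValueError(f"hg commit {hg_sha} maps to more than one git commit")
        if git_to_hg.get(git_sha, hg_sha) != hg_sha:
            raise ValueError(f"git commit {git_sha} maps to more than one hg commit")
        hg_to_git[hg_sha] = git_sha
        git_to_hg[git_sha] = hg_sha
    return hg_to_git, git_to_hg


if __name__ == "__main__":
    # Placeholder hashes for illustration only.
    forward, backward = build_commit_maps([("hg-aaa", "git-111"), ("hg-bbb", "git-222")])
    print(forward["hg-aaa"], backward["git-222"])
```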

diegorusso avatar diegorusso commented on June 30, 2024

Maybe we have drifted off the original issue far enough that the title needs expanding, or I can open a new issue?

I don't mind either way.

We can generate a bi-directional mapping of commits on PyPy between mercurial and git using the methodology we used to migrate the repo. From the GUI, I can drill down to a particular benchmark result for a particular revision, and I see there I can add another revision. Any idea how I would do that in a SQL query? Then I could, for each interesting hg hash, find all the results and add the corresponding git hash.

OK, if there is a way to map mercurial revisions to git revisions, I would strongly suggest adding these revisions to codespeed and then associating the results with the new git revisions. It will be much easier than re-running two years' worth of benchmarks.

To do so, I strongly suggest not using SQL directly but the Django ORM, as it is less error-prone and easier to read.
As an example, here is the script that we used to migrate data from the master branch to the main branch:

from codespeed.models import Result, Revision, Branch, Report
from django.core.exceptions import ObjectDoesNotExist

# Get the branches
master_branch = Branch.objects.get(name="master")
main_branch = Branch.objects.get(name="main")

# Get all master Results
master_results = Result.objects.filter(revision__branch__name=master_branch.name)

# We need to iterate over the master results and change
# the branch of the revision from master to main
for result in master_results:
    revision = result.revision
    # We have 2 cases
    try:
        # This is when we have two revisions with the same commit id but different branches.
        # We need to get the new revision and point the result at it.
        new_revision = Revision.objects.get(commitid=revision.commitid, branch=main_branch)
        result.revision = new_revision
        result.save()
    except ObjectDoesNotExist:
        # If the revision doesn't exist with the main branch we need to update the current one.
        revision.branch = main_branch
        revision.save()
    print(result)

# We also need to update the reports as well.
master_reports = Report.objects.filter(revision__branch__name=master_branch.name)

for report in master_reports:
    revision = report.revision
    new_revision = Revision.objects.get(commitid=revision.commitid, branch=main_branch)
    report.revision = new_revision
    report.save()
    print(report)

I hope this helps you figure out the logic for mapping mercurial commits to git commits and then changing the results to point to the new git commits.
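
Before touching the database, it may help to compute a dry-run plan of which results would move to which git commit. The sketch below is pure Python and only an illustration: plan_remap and the placeholder hashes are hypothetical, and it assumes the (result id, mercurial commit id) pairs have already been read out of codespeed, for example via the ORM.

```python
def plan_remap(result_commits, hg_to_git):
    """Decide which result should point at which git commit.

    result_commits: iterable of (result_id, hg_commitid) pairs.
    hg_to_git: dict mapping mercurial commit ids to git commit ids.
    Returns (plan, unmapped): plan is a list of (result_id, git_sha),
    unmapped is a sorted list of hg commit ids with no git counterpart.
    """
    plan = []
    unmapped = set()
    for result_id, hg_commitid in result_commits:
        git_sha = hg_to_git.get(hg_commitid)
        if git_sha is None:
            unmapped.add(hg_commitid)  # report instead of guessing
        else:
            plan.append((result_id, git_sha))
    return plan, sorted(unmapped)


if __name__ == "__main__":
    # Placeholder ids and hashes for illustration only.
    mapping = {"hg-aaa": "git-111", "hg-bbb": "git-222"}
    results = [(1, "hg-aaa"), (2, "hg-bbb"), (3, "hg-ccc")]
    plan, unmapped = plan_remap(results, mapping)
    print(plan)
    print(unmapped)
```

Once the plan looks right and the unmapped list is empty or understood, the actual writes could follow the same get-or-update pattern as the master-to-main script above.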
