
grimoirelab-perceval's Introduction

Perceval

Send Sir Perceval on a quest to retrieve and gather data from software repositories.

Usage

usage: perceval [-g] <backend> [<args>] | --help | --version | --list

Send Sir Perceval on a quest to retrieve and gather data from software
repositories.

Repositories are reached using specific backends. The most common backends
are:

    askbot           Fetch questions and answers from Askbot site
    bugzilla         Fetch bugs from a Bugzilla server
    bugzillarest     Fetch bugs from a Bugzilla server (>=5.0) using its REST API
    confluence       Fetch contents from a Confluence server
    discourse        Fetch posts from Discourse site
    dockerhub        Fetch repository data from Docker Hub site
    gerrit           Fetch reviews from a Gerrit server
    git              Fetch commits from Git
    github           Fetch issues, pull requests and repository information from GitHub
    gitlab           Fetch issues, merge requests from GitLab
    gitter           Fetch messages from a Gitter room
    googlehits       Fetch hits from Google API
    groupsio         Fetch messages from Groups.io
    hyperkitty       Fetch messages from a HyperKitty archiver
    jenkins          Fetch builds from a Jenkins server
    jira             Fetch issues from JIRA issue tracker
    launchpad        Fetch issues from Launchpad issue tracker
    mattermost       Fetch posts from a Mattermost server
    mbox             Fetch messages from MBox files
    mediawiki        Fetch pages and revisions from a MediaWiki site
    meetup           Fetch events from a Meetup group
    nntp             Fetch articles from a NNTP news group
    pagure           Fetch issues from Pagure
    phabricator      Fetch tasks from a Phabricator site
    pipermail        Fetch messages from a Pipermail archiver
    redmine          Fetch issues from a Redmine server
    rocketchat       Fetch messages from a Rocket.Chat channel
    rss              Fetch entries from a RSS feed server
    slack            Fetch messages from a Slack channel
    stackexchange    Fetch questions from StackExchange sites
    supybot          Fetch messages from Supybot log files
    telegram         Fetch messages from the Telegram server
    twitter          Fetch tweets from the Twitter Search API

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show version
  -g, --debug           set debug mode on
  -l, --list            show available backends

Run 'perceval <backend> --help' to get information about a specific backend.

Requirements

  • Python >= 3.8
  • Poetry >= 1.2
  • git
  • build-essential

You will also need some other libraries to run the tool. The whole list of dependencies is in the pyproject.toml file.

How to install

  • build-essential

build-essential is a package that contains a set of tools to compile and build software. It is required to work with Debian packages.

$ sudo apt-get install build-essential
  • git

Git is a version control system that allows you to keep track of changes in your code. It is required to work with Git repositories.

$ sudo apt-get install git

Installation

There are several ways to install Perceval on your system: from packages, from source code using Poetry or pip, or using Docker.

PyPI

Perceval can be installed using pip, a tool for installing Python packages. To do it, run the next command:

$ pip install perceval

Source code

To install from the source code you will need to clone the repository first:

$ git clone https://github.com/chaoss/grimoirelab-perceval
$ cd grimoirelab-perceval

Then use pip or Poetry to install the package along with its dependencies.

Pip

To install the package from local directory run the following command:

$ pip install .

In case you are a developer, you should install perceval in editable mode:

$ pip install -e .

Poetry

We use poetry for dependency management and packaging. You can install it following its documentation. Once you have installed it, you can install perceval and the dependencies in a project isolated environment using:

$ poetry install

To spawn a new shell within the virtual environment use:

$ poetry shell

Docker

A Perceval Docker image is available at DockerHub.

Detailed information on how to run and/or build this image can be found here.

Documentation

Documentation is generated automatically in the ReadTheDocs Perceval site.

References

If you use Perceval in your research papers, please refer to Perceval: software project data at your will -- Pre-print:

APA style

Dueñas, S., Cosentino, V., Robles, G., & Gonzalez-Barahona, J. M. (2018, May). Perceval: software project data at your will. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings (pp. 1-4). ACM.

BibTeX

@inproceedings{duenas2018perceval,
  title={Perceval: software project data at your will},
  author={Due{\~n}as, Santiago and Cosentino, Valerio and Robles, Gregorio and Gonzalez-Barahona, Jesus M},
  booktitle={Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings},
  pages={1--4},
  year={2018},
  organization={ACM}
}

Examples

Askbot

$ perceval askbot 'http://askbot.org/' --from-date '2016-01-01'

Bugzilla

To fetch bugs from Bugzilla, you have two options:

a) Use the traditional backend

$ perceval bugzilla 'https://bugzilla.redhat.com/' --backend-user user --backend-password pass --from-date '2016-01-01'

b) Use the REST API backend for Bugzilla 5.0 (or higher) servers. We strongly recommend this backend when fetching data from servers running version 5.0 or higher because the retrieval process is much faster.

$ perceval bugzillarest 'https://bugzilla.mozilla.org/' --backend-user user --backend-password pass --from-date '2016-01-01'

Confluence

$ perceval confluence 'https://wiki.opnfv.org/' --from-date '2016-01-01'

Discourse

$ perceval discourse 'https://foro.mozilla-hispano.org/' --from-date '2016-01-01'

Docker Hub

$ perceval dockerhub grimoirelab perceval

Gerrit

To run gerrit, you will need an authorized SSH private key:

$ eval `ssh-agent -s`
$ ssh-add ~/.ssh/id_rsa
Identity added: /home/user/.ssh/id_rsa (/home/user/.ssh/id_rsa)

To run the backend, execute the next command:

$ perceval gerrit --user user 'review.openstack.org' --from-date '2016-01-01'

Git

To run this backend, execute the next command. Note that the Git program has to be installed on your system.

$ perceval git 'https://github.com/chaoss/grimoirelab-perceval.git' --from-date '2016-01-01'

To run the backend against a private git repository, you must pass the credentials directly in the URL:

$ perceval git https://<username>:<password>@repository-url

For example, for private GitHub repositories:

$ perceval git https://<username>:<api-token>@github.com/chaoss/grimoirelab-perceval

The Git backend can also work with a Git log file as input. We recommend using the next command to get the most complete log file.

$ git log --raw --numstat --pretty=fuller --decorate=full --parents --reverse --topo-order -M -C -c --remotes=origin --all > /tmp/gitlog.log

Then, to run the backend, just execute any of the next commands:

$ perceval git --git-log '/tmp/gitlog.log' 'file:///myrepo.git'

or

$ perceval git '/tmp/gitlog.log'

GitHub

$ perceval github elastic logstash --from-date '2016-01-01'

The GitHub backend accepts the categories issue, pull_request and repository, which select the kind of data to fetch.

$ perceval github --category issue elastic logstash

GitLab

$ perceval gitlab fdroid fdroiddata -t $GITLAB_TOKEN --from-date '2016-01-01'

Gitter

$ perceval gitter -t 'abcdefghi' --from-date '2020-03-18' 'jenkinsci' 'jenkins'

GoogleHits

$ perceval googlehits "bitergia grimoirelab"

Groups.io

$ perceval groupsio 'updates' -e '<[email protected]>' -p 'my-password' --from-date '2016-01-01'

In order to fetch the data from a group, you should first subscribe to it via the Groups.io website. In case you want to know the group names where you are subscribed, you can use the following script: https://gist.github.com/valeriocos/ad33a0b9b2d13a8336230c8c59df3c55

HyperKitty

$ perceval hyperkitty 'https://lists.mailman3.org/archives/list/[email protected]' --from-date 2017-01-01

Jenkins

$ perceval jenkins 'https://build.opnfv.org/ci/'

JIRA

$ perceval jira 'https://tickets.puppetlabs.com' --project PUP --from-date '2016-01-01'

Launchpad

$ perceval launchpad ubuntu --from-date '2016-01-01'

Mattermost

$ perceval mattermost 'http://mattermost.example.com' jgw7jdmjkjf19ffkwnw59i5f9e --from-date '2016-01-01' -t 'abcdefghijk'

MBox

$ perceval mbox 'http://example.com' /tmp/mboxes/

MediaWiki

$ perceval mediawiki 'https://wiki.mozilla.org' --from-date '2016-06-30'

Meetup

$ perceval meetup 'Software-Development-Analytics' --from-date '2016-06-01' -t abcdefghijk

NNTP

$ perceval nntp 'news.mozilla.org' 'mozilla.dev.project-link' --offset 10

Pagure

$ perceval pagure '389-ds-base' --from-date '2020-03-06'

Phabricator

$ perceval phabricator 'https://secure.phabricator.com/' -t 123456789abcefe

Pipermail

$ perceval pipermail 'https://mail.gnome.org/archives/libart-hackers/'

Pipermail is also able to fetch data from Apache's mod_mbox interface:

$ perceval pipermail 'http://mail-archives.apache.org/mod_mbox/httpd-dev/'

Redmine

$ perceval redmine 'https://www.redmine.org/' --from-date '2016-01-01' -t abcdefghijk

Rocket.Chat

Rocket.Chat backend needs an API token and a User Id to authenticate to the server.

$ perceval rocketchat -t 'abchdefghij' -u '1234abcd' --from-date '2020-05-02' https://open.rocket.chat general

RSS

$ perceval rss 'https://blog.bitergia.com/feed/'

Slack

The Slack backend requires an API token for authentication. Slack apps can be used to generate and configure this API token. The scopes required by a Slack app for the backend are channels:history, channels:read and users:read. To know more about Slack apps and their integration, please refer to the Slack apps documentation. For more information about the scopes required by a Slack app, please refer to the Scopes and permissions documentation.

The following script can also be used to generate an OAuth2 token to access the Slack API.

$ perceval slack C0001 --from-date 2016-01-12 -t abcedefghijk

StackExchange

$ perceval stackexchange --site stackoverflow --tagged python --from-date '2016-01-01' -t abcdabcdabcdabcd

Supybot

$ perceval supybot 'http://channel.example.com' /tmp/supybot/

Telegram

Telegram backend needs an API token to authenticate the bot. In addition and in order to fetch messages from a group or channel, privacy settings must be disabled. To know how to create a bot, to obtain its token and to configure it please read the Telegram Bots docs pages.

Note that the messages are available on the Telegram server until the bot fetches them, but they will not be kept longer than 24 hours.

$ perceval telegram mybot -t 12345678abcdefgh --chats 1 2 -10

Twitter

The Twitter backend needs a bearer token to authenticate the requests. It can be obtained using the code available as a GitHub Gist: https://gist.github.com/valeriocos/7d4d28f72f53fbce49f1512ba77ef5f6

$ perceval twitter grimoirelab -t 12345678abcdefgh

Community Backends

Some backends are implemented in a separate repository and are not merged into chaoss/grimoirelab-perceval for long-term maintenance reasons. Please feel free to check the backends and contact the maintainers for any issues or questions related to them.

Running tests

Perceval comes with a comprehensive list of unit tests. To run them, in addition to the dependencies installed with Perceval, you need httpretty.

License

Licensed under GNU General Public License (GPL), version 3 or later.

grimoirelab-perceval's Issues

Error handlers for Perceval iterators

Request:

Add a new option, when instantiating a Perceval iterator, to specify an error handler. This could be done similar to the errors parameter in codecs.

I would implement at least "strict" (which would always raise the exception, as it does now), "ignore" (which would ignore the item causing the exception, and go on with the next one) and maybe "logging" (which would use the logging module to log the problem, and then go on as with ignore).

Rationale:

  • Now, when some exception happens while a Perceval iterator is working, in some cases it is difficult to catch it. That can happen if for some reason the exception thrown cannot be caught in the calling environment.
  • In some other cases, having some error handling policy easily specified is just convenient.
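The requested policies could be sketched as a wrapper around any iterator. This is only an illustration of the idea, not Perceval code: `iterate_with_policy` and `FlakySource` are hypothetical names, and the sketch assumes the wrapped iterator can keep producing items after one call to `next()` fails (a plain generator cannot, since an exception terminates it).

```python
import logging

logger = logging.getLogger(__name__)


def iterate_with_policy(items, errors="strict"):
    """Yield items from `items`, handling failures according to `errors`.

    errors: "strict" re-raises, "ignore" skips the failing item,
    "logging" logs a warning and then skips it.
    """
    if errors not in ("strict", "ignore", "logging"):
        raise ValueError("unknown error policy: %s" % errors)

    iterator = iter(items)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            return
        except Exception as exc:
            if errors == "strict":
                raise
            if errors == "logging":
                logger.warning("Skipping item: %s", exc)
            continue
        yield item


class FlakySource:
    """Hypothetical iterator whose second item cannot be produced."""

    def __init__(self):
        self._pending = ["a", None, "c"]

    def __iter__(self):
        return self

    def __next__(self):
        if not self._pending:
            raise StopIteration
        item = self._pending.pop(0)
        if item is None:
            raise ValueError("broken item")
        return item


recovered = list(iterate_with_policy(FlakySource(), errors="ignore"))
```

With "ignore" or "logging" the failing item is skipped and iteration continues; with "strict" the `ValueError` propagates to the caller as it does today.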

^M) in git logs causing trouble

In some messages from the Linux kernel git log, the following line appears:

    ^M)

That is, four spaces followed by ASCII 13, Carriage Return, a parenthesis and the end of line. See an example of a one-commit git log with one of these commits in
linux-gitlog.txt (line 9 in that file).

The regex in perceval/backends/git.py for parsing the message is looking for lines starting with four spaces:

    MESSAGE_LINE_PATTERN = r"[\s]{4}(?P<msg>.*)$"

But this fails in this case: the CR is treated as the end of the line ($), and the remaining parenthesis as the beginning of the next line. Since that line does not start with four spaces (in fact, it starts with ")"), it is not considered a message line, and therefore the message ends. The code then treats the commit as complete up to this line and expects a new commit to start on line 10, which is not the case, so an exception is raised. You can reproduce this (using the attached file, saved as /tmp/linux-gitlog.txt) as follows:

$ perceval -g git --git-log /tmp/linux-gitlog.txt http://github.com/torvalds/linux.git
[2016-03-17 23:51:36,617 - root - INFO] - Sir Perceval is on his quest.
[2016-03-17 23:51:36,618 - perceval.backends.git - INFO] - Fetching commits: 'http://github.com/torvalds/linux.git' git repository from 1970-01-01 00:00:00+00:00
[2016-03-17 23:51:36,618 - perceval.backends.git - DEBUG] - Invalid message format on line 10. Skipping.
[2016-03-17 23:51:36,619 - perceval.backends.git - DEBUG] - Invalid action format on line 10. Skipping.
[2016-03-17 23:51:36,619 - perceval.backends.git - DEBUG] - Commit 2d137c24e9f433e37ffd10b3d5f418157589a8d2 parsed
{
    "Author": "[email protected] <[email protected]>",
    "AuthorDate": "Sat Apr 16 15:23:55 2005 -0700",
    "Commit": "Linus Torvalds <[email protected]>",
    "CommitDate": "Sat Apr 16 15:23:55 2005 -0700",
    "__metadata__": {
        "backend_name": "Git",
        "backend_version": "0.1.0",
        "origin": "http://github.com/torvalds/linux.git",
        "timestamp": 1458255096.619283,
        "updated_on": 1113690235.0
    },
    "commit": "2d137c24e9f433e37ffd10b3d5f418157589a8d2",
    "files": [],
    "message": "[PATCH] arm: fix SIGBUS handling\n\n",
    "parents": [
        "baaa2c512dc1c47e3afeb9d558c5323c9240bd21"
    ],
    "refs": []
}
Traceback (most recent call last):
  File "/home/jgb/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/git.py", line 204, in run
    for commit in commits:
  File "/home/jgb/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backend.py", line 161, in decorator
    for item in func(self, *args, **kwargs):
  File "/home/jgb/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/git.py", line 107, in fetch
    for commit in commits:
  File "/home/jgb/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/git.py", line 150, in parse_git_log_from_file
    for commit in parser.parse():
  File "/home/jgb/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/git.py", line 371, in parse
    parsed = self.handlers[self.state](line)
  File "/home/jgb/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/git.py", line 413, in _handle_commit
    raise ParseError(cause=msg)
perceval.errors.ParseError: commit expected on line 10

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jgb/venvs/grimoirelab/bin/perceval", line 163, in <module>
    main()
  File "/home/jgb/venvs/grimoirelab/bin/perceval", line 84, in main
    cmd.run()
  File "/home/jgb/venvs/grimoirelab/lib/python3.5/site-packages/perceval/backends/git.py", line 211, in run
    raise RuntimeError(str(e))
RuntimeError: commit expected on line 10

I guess the regex should be fixed to consider this case, or maybe CR should be filtered out somehow before using the regex.
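The second suggestion (filtering CR out before applying the regex) might look like the sketch below. It assumes the log text is being split into lines in a way that treats a lone CR as a line break, as `str.splitlines()` does; `split_log_lines` is a hypothetical helper and the sample text is made up to mimic the problematic line.

```python
import re

# Pattern quoted in the issue
MESSAGE_LINE_PATTERN = r"[\s]{4}(?P<msg>.*)$"

# Three message lines; the second one contains a stray CR before ")"
raw_log = "    first line\n    \r)\n    last line\n"


def split_log_lines(text):
    """Split on LF only and drop stray CR characters (hypothetical helper)."""
    return [line.replace("\r", "") for line in text.split("\n") if line]


naive_lines = raw_log.splitlines()      # also breaks the line at the lone CR
clean_lines = split_log_lines(raw_log)  # keeps ")" on its indented line

naive_ok = [bool(re.match(MESSAGE_LINE_PATTERN, line)) for line in naive_lines]
clean_ok = [bool(re.match(MESSAGE_LINE_PATTERN, line)) for line in clean_lines]
```

In the naive split the ")" ends up on a line of its own without indentation and no longer matches the message pattern; after filtering, every line matches.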

Undeclared dependency on install

While installing perceval, some dependencies were installed (requests, beautifulsoup, etc).

After installing and launching the software I had the following issue:
Import error: no module named configparser

After installing python-configparser the program works smoothly. Should python-configparser be added as a dependency during installation?

[git] Option for backends.git.Git allowing for no automatic update of repo

When working off-line, or when trying to reproduce exactly something that happened in a given repository, "as it is", it would be useful to have an option for backends.git.Git (or more likely for fetch, of that class) that doesn't retrieve new commits from the original repository before parsing the log.

I will produce a patch request about this when I have some time...

Query on tool functionality

Is there a feature in this tool that links the mailing list data with the contributors working on a piece of code? For instance, a contributor is working on file A and asks a question about file A on a mailing list. The feature would link the mailing list message with the file.

Thanks in advance!

[supybot] crash when text message begins with a single "*"

supybot backend fails when text message begins with a single "*"

(e.g.: '* dpose message like this')

I got this issue running 'perceval supybot --origin my_irc_channel #my_irc_channel/#my_irc_channel.2016-07-08.log'

Log format:

2016-07-08T11:58:38+0000  <dpose> message1
2016-07-08T11:59:19+0000  <dpose> message2
2016-07-08T11:59:28+0000  * dpose message like this
2016-07-08T11:59:46+0000  *** dpose has left #my_irc_channel
2016-07-08T12:00:40+0000  *** dpose has quit IRC
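To make the three kinds of lines in this log format concrete, here is a small classification sketch. The regular expressions and the `classify` function are assumptions written for this example from the sample above; they are not the patterns used by the supybot backend.

```python
import re

# Timestamp prefix as it appears in the sample log, followed by spaces
TIMESTAMP = r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{4})\s+"

# "<nick> text", "*** server notice" and "* nick action text"
COMMENT = re.compile(TIMESTAMP + r"<(?P<nick>[^>]+)>\s(?P<body>.*)$")
SERVER = re.compile(TIMESTAMP + r"\*\*\*\s(?P<body>.*)$")
ACTION = re.compile(TIMESTAMP + r"\*\s(?P<nick>\S+)\s(?P<body>.*)$")


def classify(line):
    """Return (kind, fields) for a log line, or ("unknown", {})."""
    for kind, pattern in (("comment", COMMENT),
                          ("server", SERVER),
                          ("action", ACTION)):
        match = pattern.match(line)
        if match:
            return kind, match.groupdict()
    return "unknown", {}


kind, fields = classify("2016-07-08T11:59:28+0000  * dpose message like this")
```

Handling the single "*" case as its own line type, rather than letting it fall through to an error, is one way the crash could be avoided.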

ImportError: No module named 'dateutil'

On a new Perceval installation, I get the following error when running the script:

Traceback (most recent call last):
  File "/usr/local/bin/perceval", line 30, in <module>
    from perceval.backends import PERCEVAL_CMDS
  File "/usr/local/lib/python3.4/dist-packages/perceval/backends/__init__.py", line 23, in <module>
    from .bugzilla import Bugzilla, BugzillaCommand
  File "/usr/local/lib/python3.4/dist-packages/perceval/backends/bugzilla.py", line 32, in <module>
    import dateutil.tz
ImportError: No module named 'dateutil'

When I attempt to install python3-dateutil, I get the following messages:

$ sudo pip3 install python-dateutil
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /usr/local/lib/python3.4/dist-packages
sudo pip3 install python-dateutil --upgrade
Requirement already up-to-date: python-dateutil in /usr/local/lib/python3.4/dist-packages

Solution

I was able to fix this message by using the APT package:

sudo apt-get install python3-dateutil

Unicode error due to using surrogates

When running the git backend to analyze the git repo of the torvalds/linux GitHub repository, and uploading it to ElasticSearch (using the Python elasticsearch module), I get an error which apparently is due to trying to UTF-8 encode a "surrogated" string. The relevant part of the debugging messages and the exception I get is:

...
'message': "[PATCH] intel8x0: AC'97 audio patch for Intel ESB2\n\nThis
patch adds the Intel ESB2 DID's to the intel8x0.c file for AC'97
audio\nsupport.\n\nSigned-off-by: \udca0Jason Gaston
<[email protected]>\nSigned-off-by: Andrew Morton
<[email protected]>\nSigned-off-by: Linus Torvalds <[email protected]>",
'commit': 'c4c8ea948aa21527d502e87227b2f1d951bc506d'}
...
Traceback (most recent call last):
...
  File "/home/jgb/venvs/grimoirelab/lib/python3.5/site-packages/elasticsearch/transport.py", line 312, in perform_request
     body = body.encode('utf-8')
 UnicodeEncodeError: 'utf-8' codec can't encode character '\udca0' in position 984: surrogates not allowed

The code causing the error is:

res = self.es.index(index = self.index, doc_type = self.type,
                            id = self._id(item), body = item)

The problem seems to be that there is a character, decoded from the git log by Perceval into a Unicode string, which is using surrogates (the character is '\udca0'), and it cannot be properly encoded in UTF-8 by the elasticsearch code, thus raising the exception.
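One possible workaround on the consumer side is to clean the string before handing it to the indexing code. The sketch below assumes the surrogate came from decoding raw bytes with the `surrogateescape` error handler (so it lies in the U+DC80 to U+DCFF range); `drop_surrogates` is a hypothetical helper, not part of Perceval or elasticsearch.

```python
def drop_surrogates(text):
    """Replace escaped surrogates with U+FFFD so the text encodes as UTF-8.

    Assumes the surrogates were produced by the "surrogateescape" handler;
    other lone surrogates would still raise UnicodeEncodeError here.
    """
    # Recover the original undecodable bytes...
    raw = text.encode("utf-8", errors="surrogateescape")
    # ...and decode again, substituting the replacement character for them.
    return raw.decode("utf-8", errors="replace")


message = "Signed-off-by: \udca0Jason Gaston"
clean_message = drop_surrogates(message)
encoded = clean_message.encode("utf-8")  # no longer raises
```

This loses the original byte, so it is a trade-off: the item can be indexed, but the message is no longer byte-for-byte faithful to the git log.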

[mbox] Error when parsing a specific date

I found the following error when parsing a mailing list file in mbox format.

Traceback (most recent call last):
  File "/usr/local/bin/perceval", line 176, in <module>
    main()
  File "/usr/local/bin/perceval", line 97, in main
    cmd.run()
  File "/usr/local/lib/python3.4/dist-packages/perceval/backends/mbox.py", line 330, in run
    raise RuntimeError(str(e))
RuntimeError: Thu, 14 Aug 2008 02:07:59 +0200 CEST is not a valid date
~/$ perceval mbox --no-cache --origin=evince mboxes ./gmane.comp.emulators.qemu
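The rejected value has a timezone abbreviation ("CEST") after the numeric offset. A tolerant parser could drop that trailing name when a numeric offset is already present. The sketch below uses only the standard library; `parse_mail_date` is a hypothetical helper and is not how the mbox backend parses dates.

```python
import re
from email.utils import parsedate_to_datetime

# A numeric offset followed by a redundant timezone name, e.g. "+0200 CEST"
# or "+0200 (CEST)"
TRAILING_TZ_NAME = re.compile(r"([+-]\d{4})\s+\(?[A-Za-z]+\)?\s*$")


def parse_mail_date(value):
    """Parse an RFC 2822 style date, ignoring a trailing timezone name."""
    cleaned = TRAILING_TZ_NAME.sub(r"\1", value.strip())
    return parsedate_to_datetime(cleaned)


parsed = parse_mail_date("Thu, 14 Aug 2008 02:07:59 +0200 CEST")
```

Only the name is discarded; the numeric offset is kept, so the resulting datetime is still timezone-aware.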

pip3

If Perceval uses Python 3.4 or later, shouldn't it use pip3 instead of pip?
I think we must change the README.md file.

Git Backend: Error when no commits are found

perceval -g git git://gerrit.libreoffice.org/benchmark --from-date "2016-03-09" 

...
File "/usr/local/lib/python3.4/dist-packages/perceval/backends/git.py", line 396, in _handle_commit
raise ParseError(cause=msg)
perceval.errors.ParseError: commit expected on line 1

Warning related to git rename detection

It seems the way Perceval runs git to get the git log could miss some renaming in some cases (this is with the Linux kernel git repo).

DEBUG:Git https://github.com/torvalds/linux.git repository cloned into /tmp/tmplwaiec1i/torvalds/linux
DEBUG:Running command git fetch origin (cwd: /tmp/tmplwaiec1i/torvalds/linux, env: {'LANG': 'C'})
DEBUG:
DEBUG:Running command git reset --hard origin (cwd: /tmp/tmplwaiec1i/torvalds/linux, env: {'LANG': 'C'})
DEBUG:
DEBUG:Git https://github.com/torvalds/linux.git repository pulled into /tmp/tmplwaiec1i/torvalds/linux
DEBUG:Running command git log --raw --numstat --pretty=fuller --decorate=full --all --reverse --topo-order --parents -M -C -c --remotes=origin (cwd: /tmp/tmplwaiec1i/torvalds/linux, env: {'PAGER': '', 'LANG': 'C'})
DEBUG:warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your diff.renameLimit variable to at least 1569 and retry the command.

DEBUG:Git log fetched from https://github.com/torvalds/linux.git repository (/tmp/tmplwaiec1i/torvalds/linux)

So, git log is not able to track renames to the extent it could, because of that limit. I reproduced that by running git log from the command line:

$ git log --raw --numstat --pretty=fuller --decorate=full --all --reverse --topo-order --parents -M -C -c --remotes=origin > linux.log
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your diff.renameLimit variable to at least 1569 and retry the command.

Maybe this is a corner case, but probably trying to consider it wouldn't harm. I wonder if we could pass some option or whatever to git log when running it from Perceval, to avoid these cases...
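One option that could be explored is git's `-c name=value` switch, which overrides a configuration variable for a single invocation. The sketch below only builds the argument list; `git_log_command` is a hypothetical helper, and whether raising `diff.renameLimit` is acceptable in terms of run time on large repositories is an open question.

```python
def git_log_command(rename_limit=None):
    """Build the git log command from the issue, optionally raising the
    rename detection limit for this invocation only."""
    cmd = ["git"]
    if rename_limit is not None:
        # "git -c <name>=<value>" sets a config variable for one command
        cmd += ["-c", "diff.renameLimit=%d" % rename_limit]
    cmd += ["log", "--raw", "--numstat", "--pretty=fuller",
            "--decorate=full", "--all", "--reverse", "--topo-order",
            "--parents", "-M", "-C", "-c", "--remotes=origin"]
    return cmd


# 1569 is the value suggested by the warning shown above
command = git_log_command(rename_limit=1569)
```

The resulting list can be passed to `subprocess.run(command, cwd=repo_dir)`, where `repo_dir` stands for the path of the cloned repository.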

How to parse perceval output into valid JSON?

The output of the perceval git command does not appear to be valid JSON. Rather, it is a series of objects that are not comma-delimited:

{
    "Author": "Ville Jyrkk\u00e4 <[email protected]>",
    "AuthorDate": "Mon Feb 22 10:54:58 2016 +0200",
    "Commit": "Ville Jyrkk\u00e4 <[email protected]>",
    "CommitDate": "Mon Feb 22 10:54:58 2016 +0200",
    "__metadata__": {
        "backend_name": "Git",
        "backend_version": "0.1.0",
        "origin": "https://github.com/Digipalvelutehdas/dipor-dashboard.git",
        "timestamp": 1458733054.514615,
        "updated_on": 1456131298.0,
        "uuid": "60208b4c67b862b7f5d34b913d179fc827e8814c"
    },
    "commit": "752192d16eee70cdbaf6e6fe2feba20e4ed8cb4f",
    "files": [
        {
            "action": "A",
            "added": "2",
            "file": "README.md",
            "indexes": [
                "0000000...",
                "1fe1d4e..."
            ],
            "modes": [
                "000000",
                "100644"
            ],
            "removed": "0"
        }
    ],
    "message": "Initial commit",
    "parents": [],
    "refs": []
}
{
    "Author": "Brylie Christopher Oxley <[email protected]>",
    "AuthorDate": "Mon Feb 22 11:40:37 2016 +0200",
    "Commit": "Brylie Christopher Oxley <[email protected]>",
    "CommitDate": "Mon Feb 22 11:40:37 2016 +0200",
    "__metadata__": {
        "backend_name": "Git",
        "backend_version": "0.1.0",
        "origin": "https://github.com/Digipalvelutehdas/dipor-dashboard.git",
        "timestamp": 1458733054.515244,
        "updated_on": 1456134037.0,
        "uuid": "0cd185cadcbcd8d588a79cc4fef7cab54579ce6c"
    },
    "commit": "709723c75e11069626496007a442d119a168c560",
    "files": [
        {
            "action": "A",
            "added": "116",
            "file": "docs/LICENSE",
            "indexes": [
                "0000000...",
                "670154e..."
            ],
            "modes": [
                "000000",
                "100644"
            ],
            "removed": "0"
        }
    ],
    "message": "Initial migration",
    "parents": [
        "752192d16eee70cdbaf6e6fe2feba20e4ed8cb4f"
    ],
    "refs": []
}
...

How can we convert this output into valid JSON?
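One way to consume such a stream with the standard library is `json.JSONDecoder.raw_decode`, which parses one object and reports where it stopped. The sketch below assumes the captured output contains only the JSON objects; `iter_json_objects` is a name chosen for this example.

```python
import json


def iter_json_objects(text):
    """Yield each JSON object from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode() does not skip leading whitespace, so do it here
        if text[pos].isspace():
            pos += 1
            continue
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Shortened stand-in for two consecutive items of perceval output
sample = '{\n  "commit": "752192d"\n}\n{\n  "commit": "709723c"\n}\n'
items = list(iter_json_objects(sample))

# Serializing the list produces a single valid JSON document (an array)
as_json_array = json.dumps(items, indent=4)
```

Collecting the objects in a list and dumping it gives a regular JSON array; for large outputs, processing the objects one by one from the generator avoids holding them all in memory twice.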

[gerrit] Bad query when filtering several reviews using blacklist option

When executing gerrit backend with --blacklist option to filter reviews, these reviews are added to the ssh call as:

blacklist_reviews = " AND NOT (%s)" % (','.join(self.blacklist_reviews))

and it must be

blacklist_reviews = " AND NOT (%s)" % (' OR '.join(self.blacklist_reviews))

so the result is

ssh -p 29418 [email protected] gerrit  query limit:1 '(status:open OR status:closed) AND NOT (75970 OR 67778 OR 74821)'
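The difference between the two joins can be checked in isolation. The snippet below is a standalone illustration using the review numbers from the example; the variable names `base_query`, `wrong` and `fixed` are made up for it.

```python
blacklist_reviews = ["75970", "67778", "74821"]
base_query = "(status:open OR status:closed)"

# Current behaviour: review numbers joined with commas
wrong = base_query + " AND NOT (%s)" % ",".join(blacklist_reviews)

# Proposed behaviour: review numbers joined with OR
fixed = base_query + " AND NOT (%s)" % " OR ".join(blacklist_reviews)
```

`fixed` is the query string that appears in the ssh call above.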

Supported URLs for bugzilla

Please consider this as a question/feature request.

From what I understand, the bugzilla backend expects a URL of the form https://bugzilla.organization.org/. It doesn't support URLs like the following: https://bugzilla.organization.org/buglist.cgi?product=project. Is my understanding correct?

If this is the case, it would be nice to support also the mentioned URL schema. Often one may need only the Bugzilla of a single project (say gedit) and not the bug repository for the entire organization (gnome in this case). IIRC in bicho there was the possibility to ask only for a single project, so it is a pity it's not the case in perceval.

Would it be complicated to implement this?

git backend is not working after empty repositories change

(acs@dellx) ~ $ git --version
git version 2.7.4
(acs@dellx) (master $% u=) ~/devel/perceval $ git log -1
commit d350d0923c7054807e4aeba91df0c66e29fa7b3e
Author: Santiago Dueñas <[email protected]>
Date:   Mon Jan 16 17:31:26 2017 +0100

    [backend] Set cache object during command initialization
(acs@dellx) (master $% u=) ~/devel/perceval $ sudo python3 setup.py install
(acs@dellx) (master $% u=) ~/devel/perceval $ perceval -g git https://github.com/grimoirelab/perceval
[2017-01-16 23:18:08,928 - perceval.backends.core.git - DEBUG] - Running command git count-objects (cwd: /home/acs/.perceval/repositories/https://github.com/grimoirelab/perceval, env: {'LANG': 'C'})
[2017-01-16 23:18:08,930 - perceval.backends.core.git - DEBUG] - 
[2017-01-16 23:18:08,931 - perceval.backends.core.git - DEBUG] - Git https://github.com/grimoirelab/perceval repository has 0 objects
[2017-01-16 23:18:08,931 - perceval.backends.core.git - WARNING] - Git https://github.com/grimoirelab/perceval repository is empty; unable to pull
...
[2017-01-16 23:18:08,935 - perceval.backends.core.git - INFO] - Fetch process completed: 0 commits fetched

The problem seems to be related to the is_empty check:

https://github.com/grimoirelab/perceval/blob/master/perceval/backends/core/git.py#L709

(acs@dellx) (master $% u=) ~/devel/perceval $ cd /home/acs/.perceval/repositories/https://github.com/grimoirelab/perceval
(acs@dellx) (master u=) ~/.perceval/repositories/https:/github.com/grimoirelab/perceval $ git count-objects
0 objects, 0 kilobytes
(acs@dellx) (master u=) ~/.perceval/repositories/https:/github.com/grimoirelab/perceval $ git log | grep '^commit ' | wc -l
565

The repository is not empty but has 0 objects.
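A likely explanation is that plain `git count-objects` reports only loose (unpacked) objects, while a freshly cloned repository keeps its objects in packs. `git count-objects -v` also reports an `in-pack` figure. The sketch below parses that verbose output; `total_objects` is a hypothetical helper and the sample numbers are invented for illustration.

```python
def total_objects(count_objects_verbose):
    """Sum loose and packed objects from `git count-objects -v` output."""
    fields = {}
    for line in count_objects_verbose.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return int(fields.get("count", 0)) + int(fields.get("in-pack", 0))


# Illustrative output for a repository whose objects are all packed
sample_output = """count: 0
size: 0
in-pack: 4213
packs: 1
size-pack: 1890
prune-packable: 0
garbage: 0
size-garbage: 0
"""

objects = total_objects(sample_output)
```

An emptiness check based on this total would not report a fully packed repository as empty.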

failed Tests

I've checked the tests and often a lot of them fail. Is that normal?

[jira] Not clear error when running wrong URLs in Jira

Hi,

I've tried to run Perceval on Jira. After a while I realized that the error was related to a wrong URL I was using.

However, the error is not handled, and it's not clear what it means until you use the debug mode in Perceval. I would expect a clearer error message without having to use the debug option.

Error and how to reproduce this:

$ perceval -g jira 'https://issues.apache.org' --project PIG

[2016-10-28 12:11:39,324 - root - INFO] - Sir Perceval is on his quest.
[2016-10-28 12:11:39,328 - perceval.backends.jira - INFO] - Looking for issues at site 'https://issues.apache.org', in project 'PIG' and updated from '1970-01-01 00:00:00+00:00'
[2016-10-28 12:11:39,331 - requests.packages.urllib3.connectionpool - INFO] - Starting new HTTPS connection (1): issues.apache.org
[2016-10-28 12:11:40,152 - requests.packages.urllib3.connectionpool - DEBUG] - "GET /rest/api/2/field HTTP/1.1" 404 214
Traceback (most recent call last):
  File "/usr/local/bin/perceval", line 4, in <module>
    __import__('pkg_resources').run_script('perceval==0.4.0.dev1', 'perceval')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 735, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1652, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.4.0.dev1-py3.4.egg/EGG-INFO/scripts/perceval", line 176, in <module>
    main()
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.4.0.dev1-py3.4.egg/EGG-INFO/scripts/perceval", line 97, in main
    cmd.run()
  File "/usr/local/lib/python3.4/dist-packages/perceval-0.4.0.dev1-py3.4.egg/perceval/backends/jira.py", line 408, in run
    raise requests.exceptions.HTTPError(str(e.response.json()))
  File "/usr/lib/python3/dist-packages/requests/models.py", line 819, in json
    return json.loads(self.text, **kwargs)
  File "/usr/lib/python3.4/json/__init__.py", line 318, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.4/json/decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.4/json/decoder.py", line 361, in raw_decode
    raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)
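
The trace above shows the second failure: the 404 response body is not JSON, so calling `json()` on it raises a `ValueError` that hides the real HTTP error. A minimal sketch of a message builder that falls back to the raw body, assuming only the status code, URL and body text are available. The function name and the sample HTML body are hypothetical:

```python
import json


def describe_http_error(status_code, url, body):
    """Build a readable message even when the error body is not JSON."""
    try:
        detail = json.loads(body)
    except ValueError:
        # HTML or empty bodies end up here instead of raising a second error
        detail = body.strip()[:200] or '<empty body>'
    return "HTTP {} for {}: {}".format(status_code, url, detail)


print(describe_http_error(404, 'https://issues.apache.org/rest/api/2/field',
                          '<html><body>Not Found</body></html>'))
```

A message like this would point at the wrong URL without needing the debug option.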

[git] Error when trying to fetch a repo in 'detached HEAD' state

When trying to fetch a git repo in 'detached HEAD' state, it returns the following error:

perceval git https://gerrit.automotivelinux.org/gerrit/zzz_acl/staging_acl
[2017-01-04 12:04:28,989] - Sir Perceval is on his quest.
[2017-01-04 12:04:28,990] - Fetching commits: 'https://gerrit.automotivelinux.org/gerrit/zzz_acl/staging_acl' git repository from 1970-01-01 00:00:00+00:00; all branches
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/git.py", line 261, in run
    for commit in commits:
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backend.py", line 192, in decorator
    for data in func(self, *args, **kwargs):
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/git.py", line 109, in fetch
    commits = self.__fetch_and_parse_log(from_date, branches)
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/git.py", line 122, in __fetch_and_parse_log
    repo = self.__create_and_update_git_repository()
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/git.py", line 131, in __create_and_update_git_repository
    repo.pull()
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/git.py", line 660, in pull
    self._exec(cmd_reset, cwd=self.dirpath, env={'LANG' : 'C'})
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/git.py", line 790, in _exec
    raise RepositoryError(cause=cause)
perceval.errors.RepositoryError: git command - fatal: ambiguous argument 'origin': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/perceval", line 4, in <module>
    __import__('pkg_resources').run_script('perceval==0.5.0.dev2', 'perceval')
  File "/usr/local/lib/python3.5/dist-packages/pkg_resources/__init__.py", line 746, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python3.5/dist-packages/pkg_resources/__init__.py", line 1501, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/EGG-INFO/scripts/perceval", line 179, in <module>
    main()
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/EGG-INFO/scripts/perceval", line 100, in main
    cmd.run()
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/git.py", line 268, in run
    raise RuntimeError(str(e))
RuntimeError: git command - fatal: ambiguous argument 'origin': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

Problem with pip in installation

If I follow the steps in README.md, running "pip install -r requirements.txt" doesn't work because I need pip3 instead of pip.
The correct command would be:

pip3 install -r requirements.txt

Python3 requests issue with Perceval

When running a command with Perceval, I get an error from Python requests: RuntimeError: [SSL: CERTIFICATE_VERIFY_FAILED]. My details are as follows.

My setup: A Windows 10 machine with the latest VirtualBox and Suse Leap 42. Python3 is installed with all the required extensions. So far, I have only installed Perceval, to see how far I get with this.

I run the following command, which does provide feedback that Perceval is on a quest:

$ perceval github -u your_github_user -p your_github_passwd --owner elastic --repository filebeat --from-date '2016-01-01'

However, this throws the following output/errors:

osboxes:/home/osboxes/grimoire # perceval github -u 'robinmuilwijk' -p '********' --owner elastic --repository filebeat --from-date '2016-01-01'
[2016-06-11 11:01:44,037] - Sir Perceval is on his quest.
Traceback (most recent call last):
File "/usr/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 544, in urlopen
body=body, headers=headers)
File "/usr/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 341, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 761, in _validate_conn
conn.connect()
File "/usr/lib/python3.4/site-packages/requests/packages/urllib3/connection.py", line 238, in connect
ssl_version=resolved_ssl_version)
File "/usr/lib/python3.4/site-packages/requests/packages/urllib3/util/ssl_.py", line 279, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "/usr/lib64/python3.4/ssl.py", line 364, in wrap_socket
_context=self)
File "/usr/lib64/python3.4/ssl.py", line 578, in __init__
self.do_handshake()
File "/usr/lib64/python3.4/ssl.py", line 805, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:598)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3.4/site-packages/requests/adapters.py", line 367, in send
timeout=timeout
File "/usr/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 574, in urlopen
raise SSLError(e)
requests.packages.urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:598)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3.4/site-packages/perceval/backends/github.py", line 409, in run
for issue in issues:
File "/usr/lib/python3.4/site-packages/perceval/backend.py", line 171, in decorator
for data in func(self, *args, **kwargs):
File "/usr/lib/python3.4/site-packages/perceval/backends/github.py", line 135, in fetch
for raw_issues in issues_groups:
File "/usr/lib/python3.4/site-packages/perceval/backends/github.py", line 307, in get_issues
self.__get_headers())
File "/usr/lib/python3.4/site-packages/perceval/backends/github.py", line 291, in __send_request
r = requests.get(url, params=params, headers=headers)
File "/usr/lib/python3.4/site-packages/requests/api.py", line 69, in get
return request('get', url, params=params, **kwargs)
File "/usr/lib/python3.4/site-packages/requests/api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3.4/site-packages/requests/sessions.py", line 465, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3.4/site-packages/requests/sessions.py", line 573, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3.4/site-packages/requests/adapters.py", line 428, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:598)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/bin/perceval", line 163, in <module>
main()
File "/usr/bin/perceval", line 84, in main
cmd.run()
File "/usr/lib/python3.4/site-packages/perceval/backends/github.py", line 417, in run
raise RuntimeError(str(e))
RuntimeError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:598)
osboxes:/home/osboxes/grimoire #

I tried what you suggested, running the commands below in the Python console:

>>> import requests
>>> requests.get('https://github.com', verify=True)

I did the same with the URL api.github.com, but both give me the same errors as I listed above.

So I ended up changing /perceval/backends/github.py at line 291, to:

r = requests.get(url, params=params, headers=headers, verify=False)

That does not solve the problem either, same errors again. Did I change that line incorrectly?

Thanks, Robin

Problem parsing dates in pipermail

(acs@dellx) (master *$% u=) ~/devel/GrimoireELK/utils $ ./p2o.py -g -e http://localhost:9200 pipermail https://mail.python.org/pipermail/cplusplus-sig/
....
2016-10-10 20:11:03,411 Message [email protected]  Tue Oct 14 00:29:20 2008 parsed
2016-10-10 20:11:03,412 Error feeding ocean from pipermail (https://mail.python.org/pipermail/cplusplus-sig/): Fri, 24 May 2002 00.24.50 +0200 is not a valid date
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.3.2.dev0-py3.5.egg/perceval/utils.py", line 119, in str_to_datetime
    dt = dateutil.parser.parse(ts)
  File "/usr/lib/python3/dist-packages/dateutil/parser.py", line 1008, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/lib/python3/dist-packages/dateutil/parser.py", line 395, in parse
    raise ValueError("Unknown string format")
ValueError: Unknown string format
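
The date that fails uses dots instead of colons in the time field (`00.24.50`). A minimal sketch of a lenient parser that normalizes such times before parsing. It uses the standard library's `email.utils.parsedate_to_datetime` rather than `dateutil`, which is what the traceback shows Perceval calling, and the function name is hypothetical:

```python
import re
from email.utils import parsedate_to_datetime

# Matches a time written as HH.MM.SS (dots instead of colons)
DOTTED_TIME = re.compile(r'\b(\d{1,2})\.(\d{2})\.(\d{2})\b')


def parse_lenient_date(ts):
    """Parse an RFC 2822 style date, tolerating dots in the time field."""
    normalized = DOTTED_TIME.sub(r'\1:\2:\3', ts)
    return parsedate_to_datetime(normalized)


print(parse_lenient_date('Fri, 24 May 2002 00.24.50 +0200').isoformat())
```

The same normalization step could be applied before handing the string to any other date parser.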

Error while running perceval git backend (unexpected end of log stream)

I run the following commands, after installing the current version of Perceval:

$ git clone https://github.com/MetricsGrimoire/CVSAnalY.git
$ cd CVSAnalY
$ git log --raw --numstat --pretty=fuller --decorate=full --parents -M -C -c --remotes=origin --all \
  > /tmp/gitlog.log
$ perceval git /tmp/cvsanaly-gitlog.log 
[2016-01-25 00:34:26,568] - Sir Perceval is on his quest.
[2016-01-25 00:34:26,570] - Fetching commits: '/tmp/cvsanaly-gitlog.log' git log
Traceback (most recent call last):
  File "/home/jgb/venvs/perceval/lib/python3.4/site-packages/perceval/backends/git.py", line 129, in run
    for commit in commits:
  File "/home/jgb/venvs/perceval/lib/python3.4/site-packages/perceval/backend.py", line 161, in decorator
    for item in func(self, *args, **kwargs):
  File "/home/jgb/venvs/perceval/lib/python3.4/site-packages/perceval/backends/git.py", line 73, in fetch
    commits = [commit for commit in self.parse_git_log(self.gitlog)]
  File "/home/jgb/venvs/perceval/lib/python3.4/site-packages/perceval/backends/git.py", line 73, in <listcomp>
    commits = [commit for commit in self.parse_git_log(self.gitlog)]
  File "/home/jgb/venvs/perceval/lib/python3.4/site-packages/perceval/backends/git.py", line 102, in parse_git_log
    for commit in parser.parse():
  File "/home/jgb/venvs/perceval/lib/python3.4/site-packages/perceval/backends/git.py", line 294, in parse
    raise ParseError(cause=msg)
perceval.errors.ParseError: unexpected end of log stream

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jgb/venvs/perceval/bin/perceval", line 158, in <module>
    main()
  File "/home/jgb/venvs/perceval/bin/perceval", line 79, in main
    cmd.run()
  File "/home/jgb/venvs/perceval/lib/python3.4/site-packages/perceval/backends/git.py", line 136, in run
    raise RuntimeError(str(e))
RuntimeError: unexpected end of log stream

[git] Error when trying to fetch an empty repository

I've found that the git backend, when trying to fetch an empty repository, returns the following error:

$ perceval git https://github.com/ethereum/branding.git
[2016-12-23 11:49:31,513] - Sir Perceval is on his quest.
[2016-12-23 11:49:31,517] - Fetching commits: 'https://github.com/ethereum/branding.git' git repository from 1970-01-01 00:00:00+00:00; all branches
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/git.py", line 261, in run
    for commit in commits:
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backend.py", line 192, in decorator
    for data in func(self, *args, **kwargs):
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/git.py", line 109, in fetch
    commits = self.__fetch_and_parse_log(from_date, branches)
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/git.py", line 122, in __fetch_and_parse_log
    repo = self.__create_and_update_git_repository()
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/git.py", line 131, in __create_and_update_git_repository
    repo.pull()
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/git.py", line 660, in pull
    self._exec(cmd_reset, cwd=self.dirpath, env={'LANG' : 'C'})
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/git.py", line 790, in _exec
    raise RepositoryError(cause=cause)
perceval.errors.RepositoryError: git command - fatal: ambiguous argument 'origin': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/perceval", line 4, in <module>
    __import__('pkg_resources').run_script('perceval==0.5.0.dev2', 'perceval')
  File "/usr/local/lib/python3.5/dist-packages/pkg_resources/__init__.py", line 746, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python3.5/dist-packages/pkg_resources/__init__.py", line 1501, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/EGG-INFO/scripts/perceval", line 179, in <module>
    main()
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/EGG-INFO/scripts/perceval", line 100, in main
    cmd.run()
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/git.py", line 268, in run
    raise RuntimeError(str(e))
RuntimeError: git command - fatal: ambiguous argument 'origin': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

I know that it doesn't make sense to fetch an empty repository, but we don't know when it's going to have information, so in my opinion we still need to track it. Maybe a warning instead of an error?

In task transactions use tags names

In the Phabricator Maniphest task transactions, the values that appear for tags (projects) are:

"oldValue": {
    "PHID-PROJ-uciao3ovue2wd7jxlscl": {
        "type": "41",
        "src": "PHID-TASK-eby44siilz55yf4ngxmw",
        "dst": "PHID-PROJ-uciao3ovue2wd7jxlscl",

It would be great to have the human-readable names for these tags, as is done in the projects array data.

Structure of table extracted in Git

I have used git to retrieve commit data. Are there any documents that explain the structure of the table? What information is recorded in the "Action" field?

Parsing error with Askbot backend

(acs@dellx) (askbot *$%) ~/devel/GrimoireELK/utils $ ./p2o.py -g --index askbot-dev askbot 'http://askbot.org/' --from-date '2016-01-01'
...
2016-11-28 00:20:24,592 Fetching questions from 'http://askbot.org/': page 77/81
2016-11-28 00:20:25,974 Fetching HTML question 12583
2016-11-28 00:20:27,405 Fetching HTML question 10350
2016-11-28 00:20:28,833 Fetching HTML question 14382
2016-11-28 00:20:30,920 Fetching HTML question 7893
2016-11-28 00:20:31,048 Error feeding ocean from askbot (http://askbot.org/): 'data-comment-id'
Traceback (most recent call last):
  File "/home/acs/devel/GrimoireELK/utils/grimoire/arthur.py", line 98, in feed_backend
    ocean_backend.feed(backend_cmd.from_date)
  File "/home/acs/devel/GrimoireELK/utils/grimoire/ocean/elastic.py", line 186, in feed
    for item in items:
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev1-py3.5.egg/perceval/backend.py", line 192, in decorator
    for data in func(self, *args, **kwargs):
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev1-py3.5.egg/perceval/backends/core/askbot.py", line 90, in fetch
    question_obj = self.__build_question(html_question)
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev1-py3.5.egg/perceval/backends/core/askbot.py", line 188, in __build_question
    if AskbotParser.parse_question_comments(html_question[0]):
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev1-py3.5.egg/perceval/backends/core/askbot.py", line 315, in parse_question_comments
    question_comments = AskbotParser.parse_comments(comments)
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev1-py3.5.egg/perceval/backends/core/askbot.py", line 433, in parse_comments
    'id': comment.attrs["data-comment-id"],
KeyError: 'data-comment-id'
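
The `KeyError` comes from indexing the element attributes directly. A minimal sketch of a tolerant lookup, assuming each comment element exposes its attributes as a dict (as `comment.attrs` does in the traceback). The function name and the sample attribute values are hypothetical:

```python
def parse_comment_ids(comment_attrs):
    """Return comment ids, skipping elements that lack 'data-comment-id'."""
    ids = []
    for attrs in comment_attrs:
        comment_id = attrs.get('data-comment-id')
        if comment_id is None:
            # Element without the expected attribute: skip it instead of raising KeyError
            continue
        ids.append(comment_id)
    return ids


# Invented sample: the second element has no 'data-comment-id'
print(parse_comment_ids([{'data-comment-id': '101'}, {'class': 'comment'}]))
```

Whether such elements should be skipped silently or logged is a design choice; skipping at least keeps the fetch going.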

[Mediawiki] Using --from-date on recent mediawiki versions results in a exception (UTC Format related)

Hey there,

I want to report a bug:

I was testing Perceval on our wiki project, which uses MediaWiki 1.27.

Passing the --from-date parameter with a timestamp crashes the script.

For example:

[2016-09-01 18:26:41,164 - requests.packages.urllib3.connectionpool - DEBUG] - "GET /api.php?list=allrevisions&format=json&arvnamespace=0&action=query&arvstart=2015-06-01T00%3A00%3A00%2B00%3A00&arvprop=ids&arvlimit=max&arvdir=newer HTTP/1.1" 200 None
Traceback (most recent call last):
  File "/home/crisbal/WikiToLearn/perceval/venv/lib/python3.5/site-packages/perceval/backends/mediawiki.py", line 494, in run
    for build in pages:
  File "/home/crisbal/WikiToLearn/perceval/venv/lib/python3.5/site-packages/perceval/backend.py", line 171, in decorator
    for data in func(self, *args, **kwargs):
  File "/home/crisbal/WikiToLearn/perceval/venv/lib/python3.5/site-packages/perceval/backends/mediawiki.py", line 110, in fetch
    for page_reviews in fetcher:
  File "/home/crisbal/WikiToLearn/perceval/venv/lib/python3.5/site-packages/perceval/backends/mediawiki.py", line 203, in __fetch_1_27
    pages_json = data_json['query']['allrevisions']
KeyError: 'query'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/crisbal/WikiToLearn/perceval/venv/bin/perceval", line 176, in <module>
    main()
  File "/home/crisbal/WikiToLearn/perceval/venv/bin/perceval", line 97, in main
    cmd.run()
  File "/home/crisbal/WikiToLearn/perceval/venv/lib/python3.5/site-packages/perceval/backends/mediawiki.py", line 505, in run
    raise RuntimeError(str(e))
RuntimeError: 'query'

The problem is that newer MediaWiki versions refuse to work with the formatted timestamp produced by Python (https://phabricator.wikimedia.org/T144482)

It's an issue that the MediaWiki devs have to fix, but we could hack something to make it work.
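
One possible workaround is to send the timestamp in UTC with a trailing `Z` instead of the `+00:00` offset seen in the failing request. A minimal sketch, assuming the server accepts the `Z` form (the MediaWiki API documentation lists it among its timestamp formats). The function name is hypothetical:

```python
from datetime import datetime, timezone


def to_mediawiki_timestamp(dt):
    """Format an aware datetime as UTC with a trailing 'Z' instead of '+00:00'."""
    return dt.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')


from_date = datetime(2015, 6, 1, tzinfo=timezone.utc)
print(from_date.isoformat())              # offset form, as in the failing request
print(to_mediawiki_timestamp(from_date))  # 'Z' form
```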

setup.py having UTF-8 characters causing trouble in readthedocs

I see readthedocs is failing when processing the setup.py file:

 python setup.py install --force
Traceback (most recent call last):
  File "setup.py", line 31, in <module>
    fd.read(), re.MULTILINE).group(1)
  File "/home/docs/checkouts/readthedocs.org/user_builds/perceval/envs/stable/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 794: ordinal not in range(128)

The problem seems to be:

  • For some reason, the default codec for readthedocs seems to be ascii instead of utf8
  • There is a utf8 char in the perceval/_version.py file:
#     Santiago Dueñas <[email protected]>

That seems to make the parsing fail when reading the file, if the default encoding is ASCII, which seems to be the case in readthedocs.

If you're having utf8 by default, you can still reproduce the error by using LC_ALL=C:

(perceval)jgb@expisito:~/src/grimoirelab/perceval$ LC_ALL=C python setup.py install --force
Traceback (most recent call last):
  File "setup.py", line 31, in <module>
    fd.read(), re.MULTILINE).group(1)
  File "/home/jgb/venvs/perceval/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 794: ordinal not in range(128)

I see two fixes:

  • Removing the header from perceval/_version.py. In the end, that file probably doesn't need a header... That would avoid the bug by removing the utf8 char.
  • Teaching setup.py to use UTF-8 when reading perceval/_version.py. For example:
with open('perceval/_version.py', 'r', encoding="utf-8") as fd:
    version = re.search(r'^__version__\s*=\s*[\'"]([^\'"]*)[\'"]',
                        fd.read(), re.MULTILINE).group(1)

I guess either of these would solve the problem. Please fix it so that we can have updated readthedocs documentation and, in addition, a setup.py that works in non-UTF-8 environments.
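
The failure and the second fix can be reproduced without touching the locale. A minimal sketch that writes a file with the same kind of header to a temporary directory. The file content and the version value are placeholders, not the real perceval/_version.py:

```python
import os
import re
import tempfile

source = '# Santiago Dueñas\n__version__ = "0.0.0"\n'  # placeholder content

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, '_version.py')
    with open(path, 'w', encoding='utf-8') as fd:
        fd.write(source)

    # Reading with the ASCII codec fails on byte 0xc3, the first byte of 'ñ' in UTF-8
    try:
        with open(path, 'r', encoding='ascii') as fd:
            fd.read()
        ascii_ok = True
    except UnicodeDecodeError:
        ascii_ok = False

    # An explicit UTF-8 encoding works regardless of the locale
    with open(path, 'r', encoding='utf-8') as fd:
        version = re.search(r'^__version__\s*=\s*[\'"]([^\'"]*)[\'"]',
                            fd.read(), re.MULTILINE).group(1)

print(ascii_ok, version)
```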

[mbox] not fetching the messages.

I'm trying to get all the mails at http://mail-archives.apache.org/mod_mbox/httpd-dev/
But the output is Done. 0/0 messages fetched; 0 ignored

perceval mbox 'http://mail-archives.apache.org/mod_mbox/httpd-dev/' /home/mrx/src/ccm/CodeComMerg/http-dev
[2016-12-26 23:59:53,965] - Sir Perceval is on his quest.
[2016-12-26 23:59:53,966] - Looking for messages from 'http://mail-archives.apache.org/mod_mbox/httpd-dev/' on '/home/mrx/src/ccm/CodeComMerg/http-dev' since 1970-01-01 00:00:00+00:00
[2016-12-26 23:59:53,966] - Done. 0/0 messages fetched; 0 ignored
[2016-12-26 23:59:53,966] - Fetch process completed
[2016-12-26 23:59:53,967] - Sir Perceval completed his quest.

Please guide me.

mbox takes any line starting with `^From ` as a new message

The mbox backend parses anything starting with From as a new message. Therefore, the following message from OpenStack will be taken as 2 messages:

From eric at cloudscaling.com  Thu Aug  9 18:39:17 2012
From: eric at cloudscaling.com (Eric Windisch)
Date: Thu, 9 Aug 2012 14:39:17 -0400
Subject: [openstack-dev] [Openstack] Making the RPC backend a required
 configuration parameter
In-Reply-To: <[email protected]>
References: <[email protected]>
 <[email protected]>
Message-ID: <[email protected]>


>  
> I also don't understand why having a default that doesn't work for
> anyone makes any sense.
>  
I would hope that a localhost only installation with a username and password of 'guest' include a very small number of anyones. Who is really using a completely stock, default configuration successfully, and do they really care?  Everyone else is using configuration management of sorts, at which point this discussion is moot. Even devstack changes this configuration.

If someone really is installing Nova from scratch and using a default configuration? Does setuptools also install RabbitMQ for you?  No?  Right, you need to read documentation and recognize that RabbitMQ needs to be installed.  Sure, once it is installed, no configuration is required; Unless you're /actually/ going to use it, of course.

From everything I've seen, the general recommendation on the mailing list for those installing Nova on a single node is to use devstack. In that case, the configuration is prompt-driven, and whatever changes need to be made, can be made.

Regards,
Eric Windisch

The main issue is that this leaves the message truncated. In this example, the last paragraph and the signature will be lost.

If the purpose is only to parse metadata, the approach is OK, although then there would be no reason to store the body of the message.
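
A stricter separator check would avoid splitting on body lines such as "From everything I've seen". A minimal sketch, assuming separators always end with an asctime-style date as in the example above. The pattern and the function name are hypothetical, and a body line that happens to end with such a date would still be misdetected:

```python
import re

# 'From <sender> <weekday> <month> <day> <HH:MM:SS> <year>'
SEPARATOR = re.compile(
    r'^From .+?\s+'
    r'(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) '
    r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+'
    r'\d{1,2} \d{2}:\d{2}:\d{2} \d{4}\s*$'
)


def is_separator(line):
    """True only for 'From <sender> <asctime date>' lines."""
    return SEPARATOR.match(line) is not None


print(is_separator('From eric at cloudscaling.com  Thu Aug  9 18:39:17 2012'))
print(is_separator("From everything I've seen, the general recommendation on the mailing list"))
```

The first line is accepted as a separator and the second is rejected, so the message would no longer be cut at the last paragraph.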

[jira] The Jira backend starts from the newest issues instead of starting from the oldest ones

Hi!

I've been playing with the Jira backend of Perceval. I noticed that it starts from the newest issues instead of retrieving the oldest ones first.

As the incremental support relies on the last date, this means that there is no real incremental support for Jira if the connection drops (although it may work if everything goes fine).
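
One way to get oldest-first retrieval is to ask the server to sort by update date. A minimal sketch that builds the search parameters, assuming the Jira REST search endpoint with its `jql`, `startAt` and `maxResults` parameters. The function name and the default values are hypothetical, and this is not how the backend currently builds its query:

```python
from datetime import datetime


def build_search_params(project, from_date, start_at=0, max_results=100):
    """Parameters for an issue search ordered from oldest to newest update."""
    jql = 'project = {} AND updated > "{}" ORDER BY updated ASC'.format(
        project, from_date.strftime('%Y-%m-%d %H:%M'))
    return {'jql': jql, 'startAt': start_at, 'maxResults': max_results}


print(build_search_params('PIG', datetime(1970, 1, 1))['jql'])
```

With ascending order, the date of the last issue retrieved before a dropped connection is a safe point to resume from.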

How to reproduce the error:

perceval -g jira 'https://issues.apache.org/jira' --project PIG

Thanks!

perceval.errors.RepositoryError: [Errno 2] No such file or directory: 'git'

Hi,

I have a git log file in /tmp/gitlog.log and I'm running sudo docker run --rm -it --name perceval -v ~/.perceval/cache:/root/.perceval/cache grimoirelab/perceval:latest git /tmp/gitlog.log.

It fails with the following trace:

[2016-12-13 16:42:36,117] - Sir Perceval is on his quest.
[2016-12-13 16:42:36,119] - Fetching commits: '/tmp/gitlog.log' git repository from 1970-01-01 00:00:00+00:00; all branches
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/perceval/backends/core/git.py", line 782, in _exec
    cwd=cwd, env=env)
  File "/usr/local/lib/python3.4/subprocess.py", line 856, in __init__
    restore_signals, start_new_session)
  File "/usr/local/lib/python3.4/subprocess.py", line 1460, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'git'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/perceval/backends/core/git.py", line 261, in run
    for commit in commits:
  File "/usr/local/lib/python3.4/site-packages/perceval/backend.py", line 192, in decorator
    for data in func(self, *args, **kwargs):
  File "/usr/local/lib/python3.4/site-packages/perceval/backends/core/git.py", line 109, in fetch
    commits = self.__fetch_and_parse_log(from_date, branches)
  File "/usr/local/lib/python3.4/site-packages/perceval/backends/core/git.py", line 122, in __fetch_and_parse_log
    repo = self.__create_and_update_git_repository()
  File "/usr/local/lib/python3.4/site-packages/perceval/backends/core/git.py", line 128, in __create_and_update_git_repository
    repo = GitRepository.clone(self.uri, self.gitpath)
  File "/usr/local/lib/python3.4/site-packages/perceval/backends/core/git.py", line 639, in clone
    cls._exec(cmd, env={'LANG' : 'C'})
  File "/usr/local/lib/python3.4/site-packages/perceval/backends/core/git.py", line 785, in _exec
    raise RepositoryError(cause=str(e))
perceval.errors.RepositoryError: [Errno 2] No such file or directory: 'git'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/perceval", line 179, in <module>
    main()
  File "/usr/local/bin/perceval", line 100, in main
    cmd.run()
  File "/usr/local/lib/python3.4/site-packages/perceval/backends/core/git.py", line 268, in run
    raise RuntimeError(str(e))
RuntimeError: [Errno 2] No such file or directory: 'git'
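
The `FileNotFoundError` for 'git' means the git executable could not be found when the backend tried to run it. A minimal sketch of an early check that fails with a clearer message, using `shutil.which` from the standard library. The function name and the message text are hypothetical:

```python
import shutil


def require_executable(name):
    """Return the full path of an executable or fail with a clear message."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError("'{}' executable not found in PATH; "
                           "install it before running this backend".format(name))
    return path


try:
    print(require_executable('git'))
except RuntimeError as error:
    print(error)
```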

remo2 backend is not installed

When executing:

(acs@dellx) (master $% u=) ~/devel/perceval $ sudo python3 setup.py install

remo2 backend is not installed

(acs@dellx) (master $% u=) ~/devel/perceval $ perceval | grep remo
    remo             Fetch events and people from a ReMo site

This is probably related to the fact that remo and remo2 both have the class

class ReMo(Backend):

and the automatic detection of backend classes does not support it.
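
The suspected collision can be illustrated with a registry keyed by class name. A minimal sketch, assuming the detection stores one entry per class name. The module paths and the registry layout are assumptions, not Perceval's actual discovery code:

```python
def make_backend(module_name):
    # Stand-in for a backend class named ReMo defined in the given module
    return type('ReMo', (), {'__module__': module_name})


backends = [make_backend('perceval.backends.remo'),
            make_backend('perceval.backends.remo2')]

# Keyed by class name: the second ReMo overwrites the first
by_class = {cls.__name__.lower(): cls for cls in backends}

# Keyed by module name: both backends survive
by_module = {cls.__module__.split('.')[-1]: cls for cls in backends}

print(sorted(by_class), sorted(by_module))
```

Keying the registry by module name keeps both entries, which would let `remo2` show up in the backend list.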

[pipermail] Certificate verification error with some lists

When trying to fetch some pipermail archives, a certificate error is returned during the handshake:

$ perceval pipermail https://lists.projectatomic.io/projectatomic-archives/atomic-devel/
[2016-12-28 15:26:29,747] - Sir Perceval is on his quest.
[2016-12-28 15:26:29,749] - Looking for messages from 'https://lists.projectatomic.io/projectatomic-archives/atomic-devel/' since 1970-01-01 00:00:00+00:00
[2016-12-28 15:26:29,749] - Downloading mboxes from 'https://lists.projectatomic.io/projectatomic-archives/atomic-devel/' to since 1970-01-01 00:00:00+00:00
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 559, in urlopen
    body=body, headers=headers)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 345, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 786, in _validate_conn
    conn.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 252, in connect
    ssl_version=resolved_ssl_version)
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 305, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.5/ssl.py", line 385, in wrap_socket
    _context=self)
  File "/usr/lib/python3.5/ssl.py", line 760, in __init__
    self.do_handshake()
  File "/usr/lib/python3.5/ssl.py", line 996, in do_handshake
    self._sslobj.do_handshake()
  File "/usr/lib/python3.5/ssl.py", line 641, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:720)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 588, in urlopen
    raise SSLError(e)
requests.packages.urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:720)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/pipermail.py", line 149, in run
    for message in messages:
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backend.py", line 192, in decorator
    for data in func(self, *args, **kwargs):
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/pipermail.py", line 91, in fetch
    mailing_list.fetch(from_date=from_date)
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/pipermail.py", line 214, in fetch
    r = requests.get(self.url)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 447, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:720)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/perceval", line 4, in <module>
    __import__('pkg_resources').run_script('perceval==0.5.0.dev2', 'perceval')
  File "/usr/local/lib/python3.5/dist-packages/pkg_resources/__init__.py", line 746, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python3.5/dist-packages/pkg_resources/__init__.py", line 1501, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/EGG-INFO/scripts/perceval", line 179, in <module>
    main()
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/EGG-INFO/scripts/perceval", line 100, in main
    cmd.run()
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.5.0.dev2-py3.5.egg/perceval/backends/core/pipermail.py", line 154, in run
    raise RuntimeError(str(e))
RuntimeError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:720)

Something similar happened with the Jira backend, so there I implemented an option to either enable insecure requests or add a valid PEM key.

As this probably extends to other backends, how about adding an --insecure parameter as a general argument? I don't like the idea of making insecure requests by default when it's not needed.
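A minimal sketch of how such a general argument could look, assuming the backends use requests and pass the result as its `verify` keyword (which accepts True, False, or a path to a CA bundle). The `--ca-bundle` flag, the `add_tls_arguments` and `verify_setting` helpers, and the `perceval-sketch` program name are hypothetical and only illustrate the idea; they are not existing Perceval options.

```python
import argparse


def add_tls_arguments(parser):
    """Attach hypothetical TLS options to an argument parser."""
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--insecure", action="store_true",
                       help="skip certificate verification (off by default)")
    group.add_argument("--ca-bundle", metavar="PEM",
                       help="path to a PEM file used to verify the server")
    return parser


def verify_setting(args):
    """Return the value to pass as `verify=` to requests.get()."""
    if args.insecure:
        return False           # explicit opt-in to unverified requests
    if args.ca_bundle:
        return args.ca_bundle  # verify against the given PEM file
    return True                # default: verify with the system CA store


parser = add_tls_arguments(argparse.ArgumentParser(prog="perceval-sketch"))
args = parser.parse_args(["--insecure"])
print(verify_setting(args))
```

With this shape, verification stays on unless the user asks otherwise, and a backend would call something like `requests.get(url, verify=verify_setting(args))`.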

phabricator backend broken for new phabricator versions

After upgrading our Phabricator site, the Perceval backend fails when I try to use it:

(acs@dellx) ~ $ perceval -g phabricator -t api-6b3df6XXXXXXXXXXXXX https://phabricator.bitergia.net
....
[2016-10-19 18:41:12,465 - root - INFO] - Sir Perceval is on his quest.
[2016-10-19 18:41:12,467 - perceval.backends.phabricator - INFO] - Fetching tasks of 'https://phabricator.bitergia.net' from 1970-01-01 00:00:00+00:00
[2016-10-19 18:41:12,467 - perceval.backends.phabricator - DEBUG] - Phabricator Conduit client requests: maniphest.search params: {'params': '{"constraints": [{"modifiedStart": 0}], "order": "outdated", "__conduit__": {"token": "api-6b3df67XXXXXXXXXX"}, "attachments": {"projects": true}}', '__conduit__': True, 'output': 'json'}
[2016-10-19 18:41:12,473 - requests.packages.urllib3.connectionpool - INFO] - Starting new HTTPS connection (1): phabricator.bitergia.net
/usr/lib/python3/dist-packages/urllib3/connectionpool.py:794: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
[2016-10-19 18:41:12,726 - requests.packages.urllib3.connectionpool - DEBUG] - "POST /api/maniphest.search HTTP/1.1" 200 121
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.4.0.dev1-py3.5.egg/perceval/backends/phabricator.py", line 470, in run
    for task in tasks:
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.4.0.dev1-py3.5.egg/perceval/backend.py", line 181, in decorator
    for data in func(self, *args, **kwargs):
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.4.0.dev1-py3.5.egg/perceval/backends/phabricator.py", line 82, in fetch
    for task in self.__fetch_tasks(from_date):
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.4.0.dev1-py3.5.egg/perceval/backends/phabricator.py", line 123, in __fetch_tasks
    for raw_tasks in self.client.tasks(from_date=from_date):
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.4.0.dev1-py3.5.egg/perceval/backends/phabricator.py", line 560, in tasks
    r = self._call(self.MANIPHEST_TASKS, params)
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.4.0.dev1-py3.5.egg/perceval/backends/phabricator.py", line 638, in _call
    code=result['error_code'])
perceval.backends.phabricator.ConduitError: Constraint "0" is not a valid constraint for this query. (code: ERR-CONDUIT-CORE)
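One plausible reading of this error, offered as an assumption rather than a confirmed diagnosis: the debug log shows `constraints` sent as a list containing one object, and the complaint about constraint "0" is consistent with the server treating the list index as a constraint name. The sketch below builds the same payload with `constraints` as a plain mapping instead. The function name `build_search_params` is hypothetical.

```python
import json


def build_search_params(token, modified_start=0):
    """Build maniphest.search parameters with constraints as a mapping."""
    params = {
        "__conduit__": {"token": token},
        "attachments": {"projects": True},
        # Assumption: a mapping here, not a one-element list as in the log above.
        "constraints": {"modifiedStart": modified_start},
        "order": "outdated",
    }
    return {
        "params": json.dumps(params, sort_keys=True),
        "output": "json",
        "__conduit__": True,
    }


payload = build_search_params("api-XXXXXXXX")
print(payload["params"])
```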

[gerrit] Use native ssh python library

Currently we make requests against Gerrit using sub-processes, which forces the user to add private keys manually on the host.

Using a library like paramiko would allow us to pass the key as a parameter to the backend and automate the process.
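A rough sketch of what this could look like with paramiko, assuming the key is passed as a file path and that Gerrit listens on its default SSH port, 29418. The helper names `build_query_command` and `run_gerrit_query` are hypothetical, and paramiko is imported lazily because it is a third-party dependency.

```python
GERRIT_SSH_PORT = 29418  # Gerrit's default SSH port


def build_query_command(query, limit=500):
    """Build the remote `gerrit query` command line."""
    return "gerrit query --format=JSON limit:{} {}".format(limit, query)


def run_gerrit_query(hostname, username, key_filename, query,
                     port=GERRIT_SSH_PORT):
    """Run a Gerrit query over SSH and return the raw JSON lines as text."""
    import paramiko  # third-party; only needed when a query is actually run

    client = paramiko.SSHClient()
    client.load_system_host_keys()  # unknown hosts are rejected by default
    try:
        client.connect(hostname, port=port, username=username,
                       key_filename=key_filename)
        _stdin, stdout, _stderr = client.exec_command(build_query_command(query))
        return stdout.read().decode("utf-8")
    finally:
        client.close()


print(build_query_command("status:open", limit=10))
```

The private key then becomes an ordinary argument of the backend instead of something the user has to configure on the host beforehand.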

git backend fails to parse date

Analyzing the https://github.com/kennethreitz/requests git repository with the Perceval git backend fails because of a problem with the format of a date.

perceval -g git https://github.com/kennethreitz/requests
....
  File "/usr/local/lib/python3.5/dist-packages/perceval-0.3.2.dev0-py3.5.egg/perceval/utils.py", line 125, in str_to_datetime
    raise InvalidDateError(date=str(ts))
perceval.errors.InvalidDateError: Thu Sep 8 02:38:50 2011 +51800 is not a valid date
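The offset `+51800` in that commit is not a valid `±HHMM` timezone. One possible way to tolerate such dates, shown here only as a sketch and not as Perceval's actual `str_to_datetime`, is to validate the trailing offset and fall back to UTC when it is malformed. The function name `parse_git_date` and the fall-back-to-UTC policy are assumptions.

```python
import re
from datetime import datetime, timedelta, timezone

# Split "Thu Sep 8 02:38:50 2011 +51800" into the date body and the offset.
_DATE_RE = re.compile(r"^(?P<body>.*\S)\s+(?P<sign>[+-])(?P<digits>\d+)$")


def parse_git_date(raw):
    """Parse a git log date, using UTC when the offset is malformed."""
    match = _DATE_RE.match(raw.strip())
    if match is None:
        raise ValueError("%s is not a valid date" % raw)

    tz = timezone.utc  # assumed fallback for broken offsets
    digits = match.group("digits")
    if len(digits) == 4:
        hours, minutes = int(digits[:2]), int(digits[2:])
        if hours < 24 and minutes < 60:
            delta = timedelta(hours=hours, minutes=minutes)
            tz = timezone(-delta if match.group("sign") == "-" else delta)

    naive = datetime.strptime(match.group("body"), "%a %b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=tz)


print(parse_git_date("Thu Sep 8 02:38:50 2011 +51800").isoformat())
```

Falling back to UTC keeps the commit in the output at the cost of a possibly shifted timestamp; raising an error, as the current code does, is the stricter alternative.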

python3-setuptools requirement seems to be missed

$ sudo python3 setup.py install
Traceback (most recent call last):
  File "setup.py", line 30, in <module>
    from setuptools import setup
ImportError: No module named 'setuptools'

I installed it with apt and, after that, Perceval installs perfectly.

FilePath not stored in some cases such as merges

There are commits, such as merges, whose log contains information about the filepath, but Perceval does not display this filepath.

One example is this commit: https://kernel.googlesource.com/pub/scm/linux/kernel/git/davem/net-next/+/8cbdd85bda499d028b8f128191f392d701e8e41d . For such commits, Perceval stores only the number of added and removed lines, not the filepath.

commit_8cbdd85bda499d028b8f128191f392d701e8e41d.txt

To provide some numbers: if the Linux kernel is analyzed at the level of changes to files, 9 M out of 11 M of them do not contain this filepath. As those are mainly merges, the information is still there, spread across other commits, so it's not a big deal :).

ImportError: No module named 'bs4'

On a new Perceval installation, I get the following error when running the script:

Traceback (most recent call last):
  File "/usr/local/bin/perceval", line 30, in <module>
    from perceval.backends import PERCEVAL_CMDS
  File "/usr/local/lib/python3.4/dist-packages/perceval/backends/__init__.py", line 23, in <module>
    from .bugzilla import Bugzilla, BugzillaCommand
  File "/usr/local/lib/python3.4/dist-packages/perceval/backends/bugzilla.py", line 31, in <module>
    import bs4
ImportError: No module named 'bs4'

It would be helpful to document the module requirements, so that users can be sure everything will work as expected.
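Until the requirements are documented, a quick check like the following can tell which modules are missing before running the script. This is only a sketch: the module list contains just the two names that appear in these reports (`setuptools` and `bs4`) and is not a complete list of Perceval's dependencies, and `missing_modules` is a hypothetical helper.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Only the modules mentioned in the reports above; not an exhaustive list.
REPORTED_MODULES = ["setuptools", "bs4"]

missing = missing_modules(REPORTED_MODULES)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All reported modules are available.")
```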

JIRA support

Hi!

I'm a volunteer at OpenMRS, and we need to gather bug info from JIRA (among other sources). Is there anything planned? I could lend a hand.

Cheers
