
catwatch's Introduction

License: Apache 2

CatWatch

CatWatch is a web application that fetches GitHub statistics for your GitHub accounts, processes and saves your GitHub data in a database, then makes the data available via a REST API. The data reveals the popularity of your open source projects, most active contributors, and other interesting points. As an example, you can see the data at work behind the Zalando Open Source page.

In contrast to CoderStats, CatWatch aggregates your statistics across a list of GitHub accounts.

Prerequisites

  • Maven 3.0.5
  • Java 8
  • PostgreSQL 9.4

Getting Started

First, run PostgreSQL and create the databases and a role from a Unix shell:

psql -c "create database catwatch;" -U postgres -h localhost
psql -c "create database catwatch_test;" -U postgres -h localhost
psql -c "create user cat1 with password 'cat1';" -U postgres -h localhost

Build and run the web application with Maven.

cd catwatch-backend

# build
../mvnw package

# run
../mvnw spring-boot:run -Dorganization.list=<listOfGitHubAccounts>

# run with PostgreSQL and auto-create the schema (drops existing contents)
../mvnw spring-boot:run -Dspring.profiles.active=postgresql -Dspring.jpa.hibernate.ddl-auto=create

# run with the H2 in-memory database (default) and auto-create the schema
../mvnw spring-boot:run

# run with GitHub basic authentication
../mvnw spring-boot:run -Dgithub.login=XXX -Dgithub.password=YYY

# run with GitHub OAuth token (supports 2FA)
../mvnw spring-boot:run -Dgithub.oauth.token=XXX
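The GitHub credentials above end up in the Authorization header of GitHub API requests. The following sketch shows one way to derive that header from the three properties. The class name and the choice to prefer the OAuth token over basic authentication are assumptions, not CatWatch's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds an Authorization header value for the GitHub API
// from github.oauth.token, github.login and github.password.
// Assumption: a non-empty token takes precedence over login/password.
class GitHubAuthHeader {

    static String fromProperties(String oauthToken, String login, String password) {
        if (oauthToken != null && !oauthToken.isEmpty()) {
            // GitHub accepts OAuth/personal access tokens as "token <value>".
            return "token " + oauthToken;
        }
        if (login != null && password != null) {
            String credentials = login + ":" + password;
            return "Basic " + Base64.getEncoder()
                    .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        }
        // No credentials: requests would be unauthenticated (and heavily rate-limited).
        return null;
    }

    // Reads the same -D properties used in the commands above.
    static String fromSystemProperties() {
        return fromProperties(System.getProperty("github.oauth.token"),
                System.getProperty("github.login"),
                System.getProperty("github.password"));
    }
}
```

Under these assumptions, running with `-Dgithub.login=user -Dgithub.password=pass` would yield `Basic dXNlcjpwYXNz`.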

The web application is available at http://localhost:8080

It provides the CatWatch REST API.
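A minimal Java 8 compatible client for that REST API could use only `java.net.HttpURLConnection`, as in the sketch below. The base URL comes from this README. The class name is hypothetical. The content type for POST requests is left to the caller because the expected format is not documented here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical minimal client for a running CatWatch instance,
// e.g. new CatWatchClient("http://localhost:8080").
class CatWatchClient {

    private final String baseUrl;

    CatWatchClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    String get(String path) throws IOException {
        return send("GET", path, null, null);
    }

    // contentType is caller-supplied because the expected format is an assumption.
    String post(String path, String body, String contentType) throws IOException {
        return send("POST", path, body, contentType);
    }

    private String send(String method, String path, String body, String contentType)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(baseUrl + path).openConnection();
        conn.setRequestMethod(method);
        if (body != null) {
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", contentType);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
        }
        int status = conn.getResponseCode();
        // Error responses are read from the error stream, which may be null.
        InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        String text = in == null ? "" : readAll(in);
        conn.disconnect();
        if (status >= 400) {
            throw new IOException(method + " " + path + " failed with HTTP " + status + ": " + text);
        }
        return text;
    }

    private static String readAll(InputStream in) throws IOException {
        try (InputStream input = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = input.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

For example, `new CatWatchClient("http://localhost:8080").get("/config")` would return the body of the `GET /config` admin endpoint as a string.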

Details

General

Travis CI is used for continuous integration, and Coveralls is used for tracking test coverage.

Database

By default, the web application uses an H2 in-memory database. The file application-postgresql.properties demonstrates how a PostgreSQL database can be configured.

After the application is started, some test data are added to the database.
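The contents of application-postgresql.properties are not reproduced here. As a rough sketch, a PostgreSQL profile needs at least the standard Spring Boot datasource keys below. This assumes the `catwatch` database and `cat1` role from Getting Started and PostgreSQL's default port 5432. The helper class is hypothetical, and the real file may contain more settings or different values.

```java
import java.util.Properties;

// Hypothetical sketch of the datasource settings a "postgresql" profile needs.
// Keys are standard Spring Boot keys; values are assumptions based on the
// database and role created in Getting Started.
class PostgresProfileSettings {

    static Properties forLocalDatabase(String host, int port, String database) {
        Properties props = new Properties();
        props.setProperty("spring.datasource.url",
                "jdbc:postgresql://" + host + ":" + port + "/" + database);
        props.setProperty("spring.datasource.username", "cat1");
        props.setProperty("spring.datasource.password", "cat1");
        props.setProperty("spring.datasource.driver-class-name", "org.postgresql.Driver");
        return props;
    }
}
```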

Admin Console

Currently the scheduler runs every morning at 8:00 AM. In addition, the following endpoints are available:

Initialise the database with test data (for the virtual organization 'galanto'):

GET /init

Drop the database:

GET /delete

Import the data (see catwatch-dump/export.txt):

POST /import

Export the data:

GET /export

Fetch the data. Please note that the properties github.login and github.password must be set:

GET /fetch

Get the config:

GET /config

Temporarily update the scoring function for projects (see catwatch-score/scoring.project.sh):

POST /config/scoring.project
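The daily 8:00 AM run mentioned at the start of this section can be modelled with `java.time`. The sketch below computes the next run. It uses a hypothetical class name and assumes the local time zone. CatWatch itself configures its scheduler through Spring, so treat this only as an illustration of the timing rule.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper: next occurrence of a daily 08:00 run (local time).
class DailySchedule {

    static final LocalTime RUN_AT = LocalTime.of(8, 0);

    static LocalDateTime nextRun(LocalDateTime now) {
        LocalDateTime today = LocalDateTime.of(now.toLocalDate(), RUN_AT);
        // If 08:00 today has already passed (or is exactly now), run tomorrow.
        return now.isBefore(today) ? today : today.plusDays(1);
    }
}
```

You could pass `Duration.between(now, DailySchedule.nextRun(now))` as the initial delay to a `ScheduledExecutorService` to start such a job at the next 08:00. With a fixed 24-hour period, however, the job would drift across daylight-saving changes, which a cron-based scheduler avoids.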

TODO

Here are open tasks regarding the infrastructure:

  • Deployment (Database migration, GitHub account credentials management)
  • Monitoring
  • Robustness (DB fails, CatWatch backend fails)
  • Cleaning up the code base

Potential and confirmed bugs:

  • not all Zalando projects are listed (confirmed)
  • the number of contributors is not correct (potential)
  • the time series graphs should be hidden for the first version as they break the responsive layout (confirmed)

catwatch's People

Contributors

alexanderyastrebov, alexkops, fsczalando, hjacobs, hyandell, janloeffler, jbellmann, jhorstmann, jmcs, kubusgol, lappleapple, linki, mariadodevska, mikiobraun, mikkeloscar, mkunz, mrandi, nan-wang, olleolleolle, priyamaji, rbobin, rwitzel, semonte


catwatch's Issues

Investigate large startup time

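This is possibly due to Hibernate schema discovery: in the log below, about 26 seconds pass between loading Hibernate Commons Annotations and selecting the dialect.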

17:06:05.214 [main] INFO  org.hibernate.Version - HHH000412: Hibernate Core {4.3.10.Final}
17:06:05.219 [main] INFO  org.hibernate.cfg.Environment - HHH000206: hibernate.properties not found
17:06:05.221 [main] INFO  org.hibernate.cfg.Environment - HHH000021: Bytecode provider name : javassist
17:06:05.658 [main] INFO  o.h.annotations.common.Version - HCANN000001: Hibernate Commons Annotations {4.0.5.Final}
17:06:31.766 [main] INFO  org.hibernate.dialect.Dialect - HHH000400: Using dialect: org.hibernate.dialect.PostgreSQLDialect
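A small helper can locate such pauses in longer startup logs by reporting the largest gap between consecutive timestamped lines. This is a hypothetical sketch. It assumes the `HH:mm:ss.SSS` prefix shown above and that all lines come from a single day.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeParseException;
import java.util.List;

// Hypothetical helper: finds the longest pause between consecutive log lines
// that start with an "HH:mm:ss.SSS" timestamp. Other lines are skipped.
class LogGaps {

    static final class Gap {
        final int lineIndex;      // index of the line that ends the gap
        final Duration duration;

        Gap(int lineIndex, Duration duration) {
            this.lineIndex = lineIndex;
            this.duration = duration;
        }
    }

    // Returns null if fewer than two lines carry a timestamp.
    static Gap longestGap(List<String> lines) {
        LocalTime previous = null;
        Gap longest = null;
        for (int i = 0; i < lines.size(); i++) {
            LocalTime current = timestampOf(lines.get(i));
            if (current == null) {
                continue;
            }
            if (previous != null) {
                Duration gap = Duration.between(previous, current);
                if (longest == null || gap.compareTo(longest.duration) > 0) {
                    longest = new Gap(i, gap);
                }
            }
            previous = current;
        }
        return longest;
    }

    static LocalTime timestampOf(String line) {
        if (line.length() < 12) {
            return null;
        }
        try {
            return LocalTime.parse(line.substring(0, 12)); // e.g. "17:06:05.214"
        } catch (DateTimeParseException e) {
            return null;
        }
    }
}
```

Applied to the five lines above, it reports the `Using dialect` line, which follows a pause of 26.108 seconds.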

Internal Fetcher runs even when k8s profile is set

#66 introduced the ability to disable the regular in-process fetcher in order to run multiple replicas of catwatch. This was done by scoping the fetcher to a profile, so that users can decide whether it runs by setting or omitting that profile.

See https://github.bus.zalan.do/stups/stups-deploy/pull/285/files#diff-bfcd49cb9b1efcabc1006c3528eb334e for a possible configuration.

However, looking at the logs of the Kubernetes deployment, I can see that with the above configuration the in-process task fetcher is still running.

$ kubectl logs -f catwatch-master-11-14-232796100-8zpw1
08:01:00.001 [pool-3-thread-1] INFO  o.z.c.backend.scheduler.Fetcher - Starting fetching data. Snapshot date: Mon Jun 26 08:01:00 UTC 2017 1498464060000, IP and MAC Address: 10.2.62.159#0A-58-0A-02-3E-9F.
08:01:00.001 [pool-3-thread-1] INFO  o.z.c.backend.scheduler.Fetcher - Enqueued task TakeSnapshotTask for organization 'zalando'.
08:01:00.001 [pool-2-thread-7] INFO  o.z.c.b.github.TakeSnapshotTask - Taking snapshot of organization 'zalando'.
08:01:00.001 [pool-3-thread-1] INFO  o.z.c.backend.scheduler.Fetcher - Enqueued task TakeSnapshotTask for organization 'zalando-stups'.
08:01:00.002 [pool-3-thread-1] INFO  o.z.c.backend.scheduler.Fetcher - Enqueued task TakeSnapshotTask for organization 'zalando-incubator'.
08:01:00.002 [pool-2-thread-8] INFO  o.z.c.b.github.TakeSnapshotTask - Taking snapshot of organization 'zalando-stups'.
08:01:00.002 [pool-3-thread-1] INFO  o.z.c.backend.scheduler.Fetcher - Submitted 3 TakeSnapshotTasks.
08:01:00.002 [pool-2-thread-9] INFO  o.z.c.b.github.TakeSnapshotTask - Taking snapshot of organization 'zalando-incubator'.
08:01:01.688 [pool-2-thread-8] INFO  o.z.c.b.github.TakeSnapshotTask - Started collecting statistics for organization 'zalando-stups'.
08:01:01.905 [pool-2-thread-7] INFO  o.z.c.b.github.TakeSnapshotTask - Started collecting statistics for organization 'zalando'.
08:01:03.309 [pool-2-thread-9] INFO  o.z.c.b.github.TakeSnapshotTask - Started collecting statistics for organization 'zalando-incubator'.
08:01:03.774 [pool-2-thread-9] WARN  o.z.c.b.github.OrganizationWrapper - No teams found for organization 'zalando-incubator'.
08:01:18.903 [pool-2-thread-8] INFO  o.z.c.b.github.TakeSnapshotTask - Finished collecting statistics for organization 'zalando-stups'.
08:01:18.903 [pool-2-thread-8] INFO  o.z.c.b.github.TakeSnapshotTask - Started collecting projects for organization 'zalando-stups'.
08:01:28.288 [pool-2-thread-7] INFO  o.z.c.b.github.TakeSnapshotTask - Finished collecting statistics for organization 'zalando'.
08:01:28.288 [pool-2-thread-7] INFO  o.z.c.b.github.TakeSnapshotTask - Started collecting projects for organization 'zalando'.
08:02:02.044 [pool-2-thread-9] INFO  o.z.c.b.github.TakeSnapshotTask - Finished collecting statistics for organization 'zalando-incubator'.
08:02:02.044 [pool-2-thread-9] INFO  o.z.c.b.github.TakeSnapshotTask - Started collecting projects for organization 'zalando-incubator'.
08:02:46.878 [pool-2-thread-8] INFO  o.z.c.b.github.TakeSnapshotTask - Finished collecting projects for organization 'zalando-stups'.
08:02:46.878 [pool-2-thread-8] INFO  o.z.c.b.github.TakeSnapshotTask - Started collecting contributors for organization 'zalando-stups'.
09:01:06.548 [pool-2-thread-8] INFO  o.z.c.b.github.TakeSnapshotTask - Finished collecting contributors for organization 'zalando-stups'.
09:01:06.549 [pool-2-thread-8] INFO  o.z.c.b.github.TakeSnapshotTask - Started collecting languages for organization 'zalando-stups'.
09:01:12.755 [pool-2-thread-8] INFO  o.z.c.b.github.TakeSnapshotTask - Finished collecting languages for organization 'zalando-stups'.
09:01:12.755 [pool-2-thread-8] INFO  o.z.c.b.github.TakeSnapshotTask - Successfully taken snapshot of organization 'zalando-stups'.
09:02:16.736 [pool-2-thread-7] INFO  o.z.c.b.github.TakeSnapshotTask - Finished collecting projects for organization 'zalando'.
09:02:16.736 [pool-2-thread-7] INFO  o.z.c.b.github.TakeSnapshotTask - Started collecting contributors for organization 'zalando'.
09:03:11.951 [pool-2-thread-7] INFO  o.z.c.b.github.TakeSnapshotTask - Finished collecting contributors for organization 'zalando'.
09:03:11.951 [pool-2-thread-7] INFO  o.z.c.b.github.TakeSnapshotTask - Started collecting languages for organization 'zalando'.
09:03:21.399 [pool-2-thread-7] INFO  o.z.c.b.github.TakeSnapshotTask - Finished collecting languages for organization 'zalando'.
09:03:21.399 [pool-2-thread-7] INFO  o.z.c.b.github.TakeSnapshotTask - Successfully taken snapshot of organization 'zalando'.
09:03:23.011 [pool-3-thread-1] INFO  o.z.c.backend.scheduler.Fetcher - Successfully saved data for organization 'zalando'.
09:03:23.612 [pool-3-thread-1] INFO  o.z.c.backend.scheduler.Fetcher - Successfully saved data for organization 'zalando-stups'.
$ kubectl get pods catwatch-master-11-14-232796100-8zpw1 -o json | jq '.spec.containers[].env[] | select(.name == "SPRING_PROFILES_ACTIVE")'
{
  "name": "SPRING_PROFILES_ACTIVE",
  "value": "postgresql,k8s"
}

/cc @jbellmann
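The intended behaviour from #66 reduces to a simple rule: the in-process fetcher should run only when `k8s` is not among the active profiles. In Spring this is commonly expressed with `@Profile("!k8s")` on the scheduled bean, but whether CatWatch uses exactly that is an assumption. Here is the rule itself as plain Java, with a hypothetical class name:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the #66 rule: run the internal fetcher
// only if the "k8s" profile is not active.
class FetcherProfileRule {

    // Parses a comma-separated SPRING_PROFILES_ACTIVE value.
    static Set<String> activeProfiles(String springProfilesActive) {
        Set<String> profiles = new HashSet<>();
        if (springProfilesActive == null) {
            return profiles;
        }
        for (String p : springProfilesActive.split(",")) {
            String name = p.trim();
            if (!name.isEmpty()) {
                profiles.add(name);
            }
        }
        return profiles;
    }

    static boolean internalFetcherShouldRun(String springProfilesActive) {
        return !activeProfiles(springProfilesActive).contains("k8s");
    }
}
```

For the pod above, `SPRING_PROFILES_ACTIVE` is `postgresql,k8s`, so by this rule the fetcher should stay off. The log shows it running anyway.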

Project box visuals

I think the boxes have borders that are too thin, and the top border is almost invisible with dark colors. We could use a wider border and perhaps remove the top border, for example:

[image: proposed box design]

For reference, the current version:

[image: current box design]

What say you?

Screenshots?

Could I request some screenshots or a live demo? The project sounds interesting and I want to learn more before I have the time to run through the setup. :)

A different approach for the Catwatch formula

Does the Catwatch formula used to calculate rankings factor in the number of commits? I would advise dropping it from the formula, because a high commit count doesn't necessarily indicate better project quality. Really useful projects like Zappr aren't showing up near the top of the rankings, and I would prefer that they did.

Add new spring profile for easy LIVE configuration

Benefit: the Dockerfile does not need to be adjusted to pass configuration parameters, i.e. something like the following can be avoided:

CMD java -jar /catwatch-backend.jar -Dspring.database.driverClassName=${SPRING_DATASOURCE_DRIVERCLASSNAME} -Dspring.jpa.hibernate.ddl-auto=${SPRING_JPA_HIBERNATE_DDL_AUTO}
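Two remarks on that CMD line. First, `-D` options placed after `-jar` are passed to the application as program arguments, not set as JVM system properties. Second, Spring Boot can read configuration from OS environment variables directly. The Spring Boot 2.x reference documentation gives a rule for deriving the variable name from a canonical property: replace dots with underscores, remove dashes, and convert to upper case. Here is a sketch of that rule (hypothetical class; list indices are not handled):

```java
import java.util.Locale;

// Hypothetical helper implementing the documented mapping from a canonical
// (lower-case, dash-separated) Spring Boot property name to an environment
// variable name. Indexed properties such as "[0]" are not covered.
class PropertyEnvName {

    static String environmentVariableFor(String canonicalProperty) {
        return canonicalProperty
                .replace('.', '_')
                .replace("-", "")
                .toUpperCase(Locale.ROOT);
    }
}
```

For `spring.datasource.driver-class-name` this yields `SPRING_DATASOURCE_DRIVERCLASSNAME`, the same variable name the CMD line already uses.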

Consider applying a penalty for projects with fewer than 2 maintainers

We should apply score penalties to projects with fewer than 2 maintainers. This involves changing the "score" function (https://github.com/zalando/catwatch/blob/master/catwatch-backend/src/main/resources/application.properties#L22).

Proposal for the score function:

function(project) {
    var penalty = 0;
    if (project.maintainers.length < 2) {
        penalty = 100;
    }
    return project.forksCount > 0 ? ( project.starsCount + project.forksCount + project.contributorsCount + project.commitsCount / 100 - penalty) : 0;
}
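To try out candidate formulas in unit tests, the proposal can be ported to Java. The sketch below mirrors the JavaScript terms one to one. The `Project` type with a `maintainersCount` field is hypothetical (the proposal reads `project.maintainers.length`).

```java
// Hypothetical Java port of the proposed score function.
class ProjectScore {

    static final class Project {
        final int starsCount;
        final int forksCount;
        final int contributorsCount;
        final int commitsCount;
        final int maintainersCount;

        Project(int starsCount, int forksCount, int contributorsCount,
                int commitsCount, int maintainersCount) {
            this.starsCount = starsCount;
            this.forksCount = forksCount;
            this.contributorsCount = contributorsCount;
            this.commitsCount = commitsCount;
            this.maintainersCount = maintainersCount;
        }
    }

    static double score(Project p) {
        // Projects without forks score 0, as in the proposal.
        if (p.forksCount <= 0) {
            return 0;
        }
        double penalty = p.maintainersCount < 2 ? 100 : 0;
        // Floating-point division, matching JavaScript's commitsCount / 100.
        return p.starsCount + p.forksCount + p.contributorsCount
                + p.commitsCount / 100.0 - penalty;
    }
}
```

Note that the penalty can make scores negative: a project with 5 stars, 1 fork, 1 contributor and no commits would score -93. Dropping the commits term, as suggested in "A different approach for the Catwatch formula", would amount to removing `p.commitsCount / 100.0`.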

Does catwatch support multiple concurrent fetcher tasks?

I'm currently in the process of migrating https://zalando.github.io/ to our Kubernetes setup.

I intend to run at least two replicas (containers) of Catwatch in order to ensure it's available during cluster updates. However, as far as I can see, that would also schedule two concurrent fetcher tasks [1].

My question is: Do I have to fear any undefined behaviour, corrupt data, etc. from running multiple fetchers, or is it safe?

[1] https://github.com/zalando-incubator/catwatch/blob/master/catwatch-backend/src/main/java/org/zalando/catwatch/backend/scheduler/Fetcher.java#L82-L84
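The issue leaves the question open. One way to sidestep it is to let only one fetcher work on an organization at a time. The sketch below (all names hypothetical) shows the shape of such a guard. Its in-memory implementation only coordinates threads within a single JVM. Separate replicas would need a shared backend, for example PostgreSQL advisory locks via `pg_try_advisory_lock`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical guard: at most one holder per organization at a time.
interface FetchLock {
    boolean tryAcquire(String organization);

    void release(String organization);
}

// In-memory version: coordinates threads in one JVM only, not across replicas.
class InMemoryFetchLock implements FetchLock {

    private final ConcurrentMap<String, Boolean> held = new ConcurrentHashMap<>();

    @Override
    public boolean tryAcquire(String organization) {
        // putIfAbsent is atomic: only the first caller gets null back.
        return held.putIfAbsent(organization, Boolean.TRUE) == null;
    }

    @Override
    public void release(String organization) {
        held.remove(organization);
    }
}
```

A fetcher would wrap each organization's snapshot in `if (lock.tryAcquire(org)) { try { ... } finally { lock.release(org); } }` and skip organizations that another fetcher is already processing.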

Optimize repository scanning (GitHub API calls)

18:33:25.092 [pool-3-thread-1] INFO  o.z.c.b.github.TakeSnapshotTask - Taking snapshot of organization 'zalando-stups'.
18:33:25.093 [http-nio-8080-exec-1] INFO  o.z.c.backend.scheduler.Fetcher - Submitted 1 TakeSnapshotTasks.
18:33:27.189 [pool-3-thread-1] INFO  o.z.c.b.github.TakeSnapshotTask - Started collecting statistics for organization 'zalando-stups'.
18:33:35.223 [pool-3-thread-1] WARN  o.z.c.b.github.RepositoryWrapper - No contributors found for project 'stups-feedback' of organization 'zalando-stups'.
18:33:36.829 [pool-3-thread-1] WARN  o.z.c.b.github.RepositoryWrapper - No contributors found for project 'costreport' of organization 'zalando-stups'.
18:33:46.474 [pool-3-thread-1] INFO  o.z.c.b.github.TakeSnapshotTask - Finished collecting statistics for organization 'zalando-stups'.
18:33:46.475 [pool-3-thread-1] INFO  o.z.c.b.github.TakeSnapshotTask - Started collecting projects for organization 'zalando-stups'.
18:35:01.580 [pool-3-thread-1] WARN  o.z.c.b.github.RepositoryWrapper - No commits found for project 'stups-feedback' of organization 'zalando-stups'.
18:35:01.833 [pool-3-thread-1] WARN  o.z.c.b.github.RepositoryWrapper - No contributors found for project 'stups-feedback' of organization 'zalando-stups'.
18:35:12.587 [pool-3-thread-1] WARN  o.z.c.b.github.RepositoryWrapper - No commits found for project 'costreport' of organization 'zalando-stups'.
18:35:12.721 [pool-3-thread-1] WARN  o.z.c.b.github.RepositoryWrapper - No contributors found for project 'costreport' of organization 'zalando-stups'.
18:35:12.881 [pool-3-thread-1] INFO  o.z.c.b.github.TakeSnapshotTask - Finished collecting projects for organization 'zalando-stups'.
18:35:12.881 [pool-3-thread-1] INFO  o.z.c.b.github.TakeSnapshotTask - Started collecting contributors for organization 'zalando-stups'.
18:35:16.024 [pool-3-thread-1] WARN  o.z.c.b.github.RepositoryWrapper - No contributors found for project 'stups-feedback' of organization 'zalando-stups'.
18:35:16.160 [pool-3-thread-1] WARN  o.z.c.b.github.RepositoryWrapper - No contributors found for project 'costreport' of organization 'zalando-stups'.
18:35:25.062 [pool-3-thread-1] INFO  o.z.c.b.github.TakeSnapshotTask - Finished collecting contributors for organization 'zalando-stups'.
18:35:25.062 [pool-3-thread-1] INFO  o.z.c.b.github.TakeSnapshotTask - Started collecting languages for organization 'zalando-stups'.
18:35:27.504 [pool-3-thread-1] INFO  o.z.c.b.github.TakeSnapshotTask - Finished collecting languages for organization 'zalando-stups'.
18:35:27.504 [pool-3-thread-1] INFO  o.z.c.b.github.TakeSnapshotTask - Successfully taken snapshot of organization 'zalando-stups'.
18:35:27.853 [http-nio-8080-exec-1] INFO  o.z.c.backend.scheduler.Fetcher - Successfully saved data for organization 'zalando-stups'.
18:35:27.853 [http-nio-8080-exec-1] INFO  o.z.c.backend.scheduler.Fetcher - Finished fetching data.
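In the log above, the warning "No contributors found" appears three times each for 'stups-feedback' and 'costreport', once in each of the statistics, projects and contributors phases. This suggests the same repository data is requested from GitHub several times per snapshot. A per-snapshot cache is one way to share those results. This sketch uses hypothetical names, and the loader stands in for the real API call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-snapshot cache for repository lookups, so later phases
// reuse results instead of calling the GitHub API again.
class SnapshotCache<V> {

    private final Map<String, V> cache = new ConcurrentHashMap<>();
    private final Function<String, V> loader;

    SnapshotCache(Function<String, V> loader) {
        this.loader = loader;
    }

    V get(String repository) {
        // computeIfAbsent calls the loader at most once per key, but it does
        // not cache null, so loaders should return e.g. an empty list instead.
        return cache.computeIfAbsent(repository, loader);
    }
}
```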

Read ".catwatch.yaml" meta information file

Proposal: Catwatch should automatically read a .catwatch.yaml file in the root of the repository if one exists. This YAML file allows defining a human-readable title and a project image (logo).

Example:

title: ZMON Controller
image: https://demo.zmon.io/logo.png
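For the flat two-key layout in this example, reading the file needs little more than splitting each line at its first colon. The sketch below is hypothetical and intentionally not a YAML parser (no nesting, lists, quoting or multi-line values). A real implementation would use a YAML library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reader for flat "key: value" lines only.
class CatwatchYaml {

    static Map<String, String> parse(String text) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            // Split on the first colon only, so URL values keep their own colons.
            int colon = trimmed.indexOf(':');
            if (colon <= 0) {
                continue;
            }
            values.put(trimmed.substring(0, colon).trim(),
                    trimmed.substring(colon + 1).trim());
        }
        return values;
    }
}
```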

Follow REST API guidelines: snake_case JSON properties

The Catwatch API should follow the Zalando REST API guidelines, including:

  • Snake-case JSON properties (i.e. use "organization_name" instead of "organizationName")

Currently the "/projects" endpoint (to name one) returns camelCase JSON properties.
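In a Spring Boot application the renaming is usually configured on Jackson rather than written by hand, for example with a snake_case property naming strategy (settable through `spring.jackson.property-naming-strategy`). The hypothetical helper below only illustrates the expected mapping.

```java
// Hypothetical helper: converts a camelCase JSON property name to snake_case.
class SnakeCase {

    static String fromCamelCase(String name) {
        StringBuilder out = new StringBuilder(name.length() + 4);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isUpperCase(c)) {
                // Each capital starts a new word (except at the very start).
                if (i > 0) {
                    out.append('_');
                }
                out.append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

`organizationName` becomes `organization_name`. Consecutive capitals are split letter by letter (`gitHubURL` becomes `git_hub_u_r_l`), so acronyms need a decision before the switch.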

Inconsistent project-naming issue

On the front/home page, the name of the TechMonkeys project "Gin-OAuth2" is shown with additional running text in its card, while other projects are listed by name only.
