Comments (12)
The "pull_requests" table might not be populated for all the repos added to the project after a successful pipeline run because the CollectPrs function relies on the GithubApiParams struct, which includes the ConnectionId and Name parameters. If these parameters are not set correctly, or if the new repositories are not included in the scope defined by these parameters, the pull requests for those repositories will not be extracted and populated into the table.
Additionally, the CollectPrs function uses a GraphQL query to collect pull requests, orders the results by CREATED_AT, and uses cursor pagination. This approach may miss some pull requests that are updated during the collection process. These missed pull requests will be collected in the next run, but the data may be incomplete until then.
Here is the relevant code snippet from the CollectPrs function:
func CollectPrs(taskCtx plugin.SubTaskContext) errors.Error {
    data := taskCtx.GetData().(*tasks.GithubTaskData)
    var err errors.Error
    apiCollector, err := api.NewStatefulApiCollector(api.RawDataSubTaskArgs{
        Ctx: taskCtx,
        Params: tasks.GithubApiParams{
            ConnectionId: data.Options.ConnectionId,
            Name:         data.Options.Name,
        },
        Table: RAW_PRS_TABLE,
    })
    if err != nil {
        return err
    }
    err = apiCollector.InitGraphQLCollector(api.GraphqlCollectorArgs{
        GraphqlClient: data.GraphqlClient,
        PageSize:      10,
        BuildQuery: func(reqData *api.GraphqlRequestData) (interface{}, map[string]interface{}, error) {
            query := &GraphqlQueryPrWrapper{}
            if reqData == nil {
                return query, map[string]interface{}{}, nil
            }
            ownerName := strings.Split(data.Options.Name, "/")
            variables := map[string]interface{}{
                "pageSize":   graphql.Int(reqData.Pager.Size),
                "skipCursor": (*graphql.String)(reqData.Pager.SkipCursor),
                "owner":      graphql.String(ownerName[0]),
                "name":       graphql.String(ownerName[1]),
            }
            return query, variables, nil
        },
        GetPageInfo: func(iQuery interface{}, args *api.GraphqlCollectorArgs) (*api.GraphqlQueryPageInfo, error) {
            query := iQuery.(*GraphqlQueryPrWrapper)
            return query.Repository.PullRequests.PageInfo, nil
        },
        ResponseParser: func(iQuery interface{}, variables map[string]interface{}) ([]interface{}, error) {
            query := iQuery.(*GraphqlQueryPrWrapper)
            prs := query.Repository.PullRequests.Prs
            for _, rawL := range prs {
                if apiCollector.GetSince() != nil && !apiCollector.GetSince().Before(rawL.CreatedAt) {
                    return nil, api.ErrFinishCollect
                }
            }
            return nil, nil
        },
    })
    if err != nil {
        return err
    }
    return apiCollector.Execute()
}
Ensure that the ConnectionId and Name parameters are correctly set to include all the repositories you want to track. Additionally, consider the limitations of the current pagination and ordering strategy, and plan for potential updates to the collection logic to ensure all relevant pull requests are captured [1][2][3].
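One concrete way a badly formed Name can break collection: the snippet above splits Name on "/" and indexes the result, so a value without a slash would make ownerName[1] go out of range. A minimal sketch of a defensive check follows. splitRepoName is a hypothetical helper, not part of DevLake, and it assumes names always take the form "owner/repo".

```go
package main

import (
    "fmt"
    "strings"
)

// splitRepoName is a hypothetical helper, not part of DevLake. It splits a
// scope name of the form "owner/repo" and returns an error, rather than
// panicking, when the name is malformed.
func splitRepoName(name string) (owner, repo string, err error) {
    parts := strings.Split(name, "/")
    if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
        return "", "", fmt.Errorf("expected \"owner/repo\", got %q", name)
    }
    return parts[0], parts[1], nil
}

func main() {
    for _, name := range []string{"apache/incubator-devlake", "incubator-devlake"} {
        owner, repo, err := splitRepoName(name)
        if err != nil {
            fmt.Println("invalid:", err)
            continue
        }
        fmt.Println("owner:", owner, "repo:", repo)
    }
}
```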
Details
About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
To continue the conversation, mention @dosu.
from incubator-devlake.
- Since you are using the GitHub Server option with the GraphQL option toggled off, the data collection process will rely on the REST API instead of GraphQL. This means the CollectApiPullRequests function will be used to collect pull request data.
- The ConnectionId and Name parameters are captured by the archived.GithubConnection model, which is stored in the _tool_github_connections table. Here is the relevant code snippet:
err = db.AutoMigrate(archived.GithubConnection{})
if err != nil {
    return err
}
encodeKey := basicRes.GetConfig(plugin.EncodeKeyEnvStr)
connection := &archived.GithubConnection{}
connection.Endpoint = basicRes.GetConfig(`GITHUB_ENDPOINT`)
connection.Proxy = basicRes.GetConfig(`GITHUB_PROXY`)
connection.Token = basicRes.GetConfig(`GITHUB_AUTH`)
connection.Name = `GitHub`
if connection.Endpoint != `` && connection.Token != `` && encodeKey != `` {
    // update from .env and save to db
    err = db.Create(connection)
    if err != nil {
        return err
    }
}
This code creates a GithubConnection entry with ConnectionId and Name parameters [1].
The pull_requests table might not be populating with all the repositories added to your project due to the filtering conditions in the EnrichPullRequestIssues function. Specifically, the dal.Where clause filters the data based on repo_id and connection_id. If these IDs do not match the repositories you expect to be included, the data will not be processed.
Here is the relevant code snippet:
cursor, err := db.Cursor(
    dal.From(&models.GithubPullRequest{}),
    dal.Where("repo_id = ? and connection_id = ?", repoId, data.Options.ConnectionId),
)
Ensure that the repoId and data.Options.ConnectionId are correctly set for all the repositories you have added to your project. If these values are incorrect or not set for some repositories, the corresponding pull requests will not be included in the pull_requests table.
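To see how a filter on both repo_id and connection_id can silently exclude a repository, here is a minimal in-memory sketch. The prRow type, the reposWithoutRows function, and the sample IDs are hypothetical and only model the two columns used in the filter; this is not DevLake code and does not touch a database.

```go
package main

import "fmt"

// prRow is a hypothetical stand-in for a row of the tool-layer pull request
// table, reduced to the two columns used in the filter above.
type prRow struct {
    ConnectionID uint64
    RepoID       int
}

// reposWithoutRows returns the expected repo IDs that have no row matching
// both the repo ID and the given connection ID.
func reposWithoutRows(rows []prRow, connectionID uint64, expected []int) []int {
    seen := make(map[int]bool)
    for _, r := range rows {
        if r.ConnectionID == connectionID {
            seen[r.RepoID] = true
        }
    }
    var missing []int
    for _, id := range expected {
        if !seen[id] {
            missing = append(missing, id)
        }
    }
    return missing
}

func main() {
    rows := []prRow{
        {ConnectionID: 1, RepoID: 100},
        {ConnectionID: 2, RepoID: 200}, // stored under a different connection
    }
    // Repo 200 has rows, but none under connection 1, so it is reported.
    fmt.Println(reposWithoutRows(rows, 1, []int{100, 200}))
}
```

The same comparison can be made against the real tables by listing the repo IDs present for your connection and comparing them with the repositories in the project scope.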
Additionally, verify that the PullRequest struct in your codebase is correctly defined and matches the expected schema. Here is the definition for reference:
type PullRequest struct {
    DomainEntity
    BaseRepoId     string `gorm:"index"`
    HeadRepoId     string `gorm:"index"`
    Status         string `gorm:"type:varchar(100);comment:open/closed or other"`
    Title          string
    Description    string
    Url            string `gorm:"type:varchar(255)"`
    AuthorName     string `gorm:"type:varchar(100)"`
    AuthorId       string `gorm:"type:varchar(100)"`
    ParentPrId     string `gorm:"index;type:varchar(100)"`
    PullRequestKey int
    CreatedDate    time.Time
    MergedDate     *time.Time
    ClosedDate     *time.Time
    Type           string `gorm:"type:varchar(100)"`
    Component      string `gorm:"type:varchar(100)"`
    MergeCommitSha string `gorm:"type:varchar(40)"`
    HeadRef        string `gorm:"type:varchar(255)"`
    BaseRef        string `gorm:"type:varchar(255)"`
    BaseCommitSha  string `gorm:"type:varchar(40)"`
    HeadCommitSha  string `gorm:"type:varchar(40)"`
}

func (PullRequest) TableName() string {
    return "pull_requests"
}
Make sure that the table name and field types are correctly set to avoid any schema mismatches.
Additionally, ensure that the EnrichPullRequestIssues function in your code is correctly extracting and processing the pull request data. Here is the relevant part of the EnrichPullRequestIssues function:
func EnrichPullRequestIssues(taskCtx plugin.SubTaskContext) (err errors.Error) {
    db := taskCtx.GetDal()
    data := taskCtx.GetData().(*GithubTaskData)
    repoId := data.Options.GithubId
    var prBodyCloseRegex *regexp.Regexp
    prBodyClosePattern := data.Options.ScopeConfig.PrBodyClosePattern
    prBodyClosePattern = strings.Replace(prBodyClosePattern, "%s", data.Options.Name, 1)
    if len(prBodyClosePattern) > 0 {
        prBodyCloseRegex, err = errors.Convert01(regexp.Compile(prBodyClosePattern))
        if err != nil {
            return errors.Default.Wrap(err, "regexp Compile prBodyClosePattern failed")
        }
    }
    charPattern := regexp.MustCompile(`[\/a-zA-Z\s,]+`)
    cursor, err := db.Cursor(dal.From(&models.GithubPullRequest{}),
        dal.Where("repo_id = ? and connection_id = ?", repoId, data.Options.ConnectionId))
    if err != nil {
        return err
    }
    defer cursor.Close()
    converter, err := api.NewDataConverter(api.DataConverterArgs{
        InputRowType: reflect.TypeOf(models.GithubPullRequest{}),
        Input:        cursor,
        RawDataSubTaskArgs: api.RawDataSubTaskArgs{
            Ctx: taskCtx,
            Params: GithubApiParams{
                ConnectionId: data.Options.ConnectionId,
                Name:         data.Options.Name,
            },
            Table: RAW_PULL_REQUEST_TABLE,
        },
        Convert: func(inputRow interface{}) ([]interface{}, errors.Error) {
            githubPullRequst := inputRow.(*models.GithubPullRequest)
            results := make([]interface{}, 0, 1)
            issueNumberListStr := ""
            if prBodyCloseRegex != nil {
                issueNumberListStr = prBodyCloseRegex.FindString(githubPullRequst.Body)
            }
            if issueNumberListStr == "" {
                return nil, nil
            }
            issueNumberListStr = charPattern.ReplaceAllString(issueNumberListStr, "#")
            issueNumberList := strings.Split(issueNumberListStr, "#")
            for _, issueNumberStr := range issueNumberList {
                issue := &models.GithubIssue{}
                issueNumberStr = strings.TrimSpace(issueNumberStr)
                issueNumber, numFormatErr := strconv.Atoi(issueNumberStr)
                if numFormatErr != nil {
                    continue
                }
                err = db.All(
                    issue,
                    dal.Where("number = ? and repo_id = ? and connection_id = ?",
                        issueNumber, repoId, data.Options.ConnectionId),
                    dal.Limit(1),
                )
                if err != nil {
                    return nil, err
                }
                if issue.Number == 0 {
                    continue
                }
                githubPullRequstIssue := &models.GithubPrIssue{
                    ConnectionId:      data.Options.ConnectionId,
                    PullRequestId:     githubPullRequst.GithubId,
                    IssueId:           issue.GithubId,
                    PullRequestNumber: githubPullRequst.Number,
                    IssueNumber:       issue.Number,
                }
                results = append(results, githubPullRequstIssue)
            }
            return results, nil
        },
    })
    if err != nil {
        return err
    }
    return converter.Execute()
}
Ensure that the EnrichPullRequestIssues function is correctly configured and that the ConnectionId and repoId are properly set for all repositories. As the code above shows, this function reads the pull requests already stored for the given repo_id and connection_id and emits GithubPrIssue rows linking them to issues, so a mismatch in either value leaves that repository's pull requests unprocessed at this step [1][2][3][4].
The absence of the labels field in the devlake._raw_github_api_issues table for the missing repository could indeed be a contributing factor to why the data is not being loaded into the pull_requests table. The labels field is processed and transformed during the data ingestion process, and its absence might affect the overall data integrity and the subsequent transformations.
The CollectApiIssues function handles the labels field by using the convertGithubLabels function, which processes each label associated with an issue. If the labels field is missing or malformed, it could lead to incomplete or incorrect data being ingested, which might prevent the data from being correctly loaded into the pull_requests table [1].
To address this issue, ensure that the labels field is correctly populated in the devlake._raw_github_api_issues table. You might need to verify the data collection process and ensure that the GitHub API responses include the labels field for all issues. If the labels field is missing in the API response, it could indicate a problem with the data source configuration or the API request parameters.
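When inspecting raw payloads, it helps to distinguish a missing labels key from an empty list, since only the former points at the API response itself. The following is a minimal sketch using only the Go standard library. The labelsState function and the sample payloads are hypothetical; how you export rows from the raw table is up to you.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// labelsState reports whether a raw issue payload has no "labels" key
// ("absent"), an empty or null list ("empty"), or at least one label
// ("present").
func labelsState(raw []byte) (string, error) {
    var fields map[string]json.RawMessage
    if err := json.Unmarshal(raw, &fields); err != nil {
        return "", err
    }
    rawLabels, ok := fields["labels"]
    if !ok {
        return "absent", nil
    }
    var labels []struct {
        Name string `json:"name"`
    }
    if err := json.Unmarshal(rawLabels, &labels); err != nil {
        return "", err
    }
    if len(labels) == 0 {
        return "empty", nil
    }
    return "present", nil
}

func main() {
    // Hypothetical payloads, reduced to the fields relevant here.
    payloads := []string{
        `{"number": 1}`,
        `{"number": 2, "labels": []}`,
        `{"number": 3, "labels": [{"name": "bug"}]}`,
    }
    for _, p := range payloads {
        state, err := labelsState([]byte(p))
        if err != nil {
            fmt.Println("error:", err)
            continue
        }
        fmt.Println(state)
    }
}
```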
The missing labels in the devlake._raw_github_api_issues table could be due to several factors:
- Regex Configuration: The IssueRegexes might not be correctly configured to match the labels in the issues. If the regex patterns for severity, component, priority, or type do not match any labels, those labels will not be processed.
- Label Extraction Logic: The logic in the convertGithubLabels function might not be correctly extracting and storing the labels. Ensure that the labels are being appended to the results slice and that the joinedLabels are being correctly joined and assigned to githubIssue.Type.
- Database Schema: The schema for the GithubIssueLabel table might not be correctly defined or might be missing. Ensure that the table _tool_github_issue_labels exists and is correctly defined to store the labels.
To resolve this issue, you can:
- Verify Regex Patterns: Ensure that the regex patterns in the GithubScopeConfig are correctly defined to match the labels you expect.
- Debug Label Extraction: Add logging or debugging statements in the convertGithubLabels function to verify that labels are being correctly processed and appended to the results slice.
- Check Database Schema: Verify that the GithubIssueLabel table is correctly defined and exists in the database. The table should have the following schema:
package archived

import (
    "github.com/apache/incubator-devlake/core/models/migrationscripts/archived"
)

// Please note that Issue Labels can also apply to Pull Requests.
// Pull Requests are considered Issues in GitHub.
type GithubIssueLabel struct {
    ConnectionId uint64 `gorm:"primaryKey"`
    IssueId      int    `gorm:"primaryKey;autoIncrement:false"`
    LabelName    string `gorm:"primaryKey;type:varchar(255)"`
    archived.NoPKModel
}

func (GithubIssueLabel) TableName() string {
    return "_tool_github_issue_labels"
}
Here is the relevant code for reference:
func convertGithubLabels(issueRegexes *IssueRegexes, issue *IssuesResponse, githubIssue *models.GithubIssue) ([]interface{}, errors.Error) {
    var results []interface{}
    var joinedLabels []string
    for _, label := range issue.Labels {
        results = append(results, &models.GithubIssueLabel{
            ConnectionId: githubIssue.ConnectionId,
            IssueId:      githubIssue.GithubId,
            LabelName:    label.Name,
        })
        if issueRegexes.SeverityRegex != nil && issueRegexes.SeverityRegex.MatchString(label.Name) {
            githubIssue.Severity = label.Name
        }
        if issueRegexes.ComponentRegex != nil && issueRegexes.ComponentRegex.MatchString(label.Name) {
            githubIssue.Component = label.Name
        }
        if issueRegexes.PriorityRegex != nil && issueRegexes.PriorityRegex.MatchString(label.Name) {
            githubIssue.Priority = label.Name
        }
        if issueRegexes.TypeRequirementRegex != nil && issueRegexes.TypeRequirementRegex.MatchString(label.Name) {
            githubIssue.StdType = ticket.REQUIREMENT
        } else if issueRegexes.TypeBugRegex != nil && issueRegexes.TypeBugRegex.MatchString(label.Name) {
            githubIssue.StdType = ticket.BUG
        } else if issueRegexes.TypeIncidentRegex != nil && issueRegexes.TypeIncidentRegex.MatchString(label.Name) {
            githubIssue.StdType = ticket.INCIDENT
        }
        joinedLabels = append(joinedLabels, label.Name)
    }
    if len(joinedLabels) > 0 {
        githubIssue.Type = strings.Join(joinedLabels, ",")
    }
    return results, nil
}
Additionally, ensure that the ConvertIssueLabels subtask is correctly converting the labels from the tool layer to the domain layer. Here is the relevant code for the ConvertIssueLabels subtask:
func ConvertIssueLabels(taskCtx plugin.SubTaskContext) errors.Error {
    db := taskCtx.GetDal()
    data := taskCtx.GetData().(*GithubTaskData)
    repoId := data.Options.GithubId
    cursor, err := db.Cursor(
        dal.From(&models.GithubIssueLabel{}),
        dal.Join(`left join _tool_github_issues on _tool_github_issues.github_id = _tool_github_issue_labels.issue_id`),
        dal.Where("_tool_github_issues.repo_id = ? and _tool_github_issues.connection_id = ?", repoId, data.Options.ConnectionId),
        dal.Orderby("issue_id ASC"),
    )
    if err != nil {
        return err
    }
    defer cursor.Close()
    issueIdGen := didgen.NewDomainIdGenerator(&models.GithubIssue{})
    converter, err := api.NewDataConverter(api.DataConverterArgs{
        RawDataSubTaskArgs: api.RawDataSubTaskArgs{
            Ctx: taskCtx,
            Params: GithubApiParams{
                ConnectionId: data.Options.ConnectionId,
                Name:         data.Options.Name,
            },
            Table: RAW_ISSUE_TABLE,
        },
        InputRowType: reflect.TypeOf(models.GithubIssueLabel{}),
        Input:        cursor,
        Convert: func(inputRow interface{}) ([]interface{}, errors.Error) {
            issueLabel := inputRow.(*models.GithubIssueLabel)
            domainIssueLabel := &ticket.IssueLabel{
                IssueId:   issueIdGen.Generate(data.Options.ConnectionId, issueLabel.IssueId),
                LabelName: issueLabel.LabelName,
            }
            return []interface{}{
                domainIssueLabel,
            }, nil
        },
    })
    if err != nil {
        return err
    }
    return converter.Execute()
}
Ensure that the issue.Labels are being correctly iterated over and that each label is being processed and appended to the results slice. Additionally, verify that the joinedLabels are correctly joined and assigned to githubIssue.Type. Also, make sure that the GithubIssueLabel table is correctly defined and exists in the database.
Finally, ensure that the issue_labels table in the domain layer is correctly defined to store the labels. The schema should be as follows:
package ticket

import (
    "github.com/apache/incubator-devlake/core/models/common"
)

// Please note that Issue Labels can also apply to Pull Requests.
// Pull Requests are considered Issues in GitHub.
type IssueLabel struct {
    IssueId   string `json:"id" gorm:"primaryKey;type:varchar(255);comment:This key is generated based on details from the original plugin"` // format: <Plugin>:<Entity>:<PK0>:<PK1>
    LabelName string `gorm:"primaryKey;type:varchar(255)"`
    common.NoPKModel
}

func (IssueLabel) TableName() string {
    return "issue_labels"
}
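The IssueId comment above describes keys of the form <Plugin>:<Entity>:<PK0>:<PK1>. As a rough illustration of that format only, here is a sketch that joins the parts with colons. The domainID function is hypothetical and is not DevLake's didgen package; the plugin and entity names passed in are assumptions chosen to match the snippets above.

```go
package main

import (
    "fmt"
    "strings"
)

// domainID is a hypothetical illustration of the
// "<Plugin>:<Entity>:<PK0>:<PK1>" format named in the struct comment above.
// It is not DevLake's didgen implementation.
func domainID(plugin, entity string, pks ...interface{}) string {
    parts := []string{plugin, entity}
    for _, pk := range pks {
        parts = append(parts, fmt.Sprint(pk))
    }
    return strings.Join(parts, ":")
}

func main() {
    // Connection ID 1 and issue ID 42 are made-up sample values.
    fmt.Println(domainID("github", "GithubIssue", uint64(1), 42))
}
```

When checking the issue_labels table by hand, keys of this shape let you map a domain-layer row back to the connection and tool-layer issue it came from.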
By verifying these configurations and ensuring the correct processing of labels, you can address the issue of missing labels in the devlake._raw_github_api_issues table [1][2][3][4].
Thanks for your reply.
I have a few questions:
- Since we are using the GitHub Server option, the GraphQL option is toggled off at the time of establishing the connection. Screenshot attached for reference.
- Which table captures the ConnectionId and Name parameters?
I am able to view the corresponding connection ID and name in the devlake._tool_github_connections table. However, I could not see the data in the pull_requests table.
I see that the missing repo does not have the snippet below in devlake._raw_github_api_issues. Could this be the reason why the data was not loaded into the pull_requests table?
"labels": [
    {
        "id":
        "node_id":
        "url":
        "name": "",
        "color": "",
        "default":,
        "description":
    }
]
I am just a user who uses the DevLake tool to get metrics. How can I find out why the labels field is missing?
What could be the contributing factor for the missing labels in my connection?
I am sure that I established the connection and configured the data source (GitHub), and hence the project, the same way as before.
Is this something that needs to be fixed on the DevLake side?
@JoshuaPoddoku: Even with the latest upgrade to v1.0.0 Beta8, I still face the same issue.
I would appreciate it if anyone from the DevLake community could join a quick meeting.
There is no GitHub-related data in pull_requests.
This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in the next 7 days if no further activity occurs.
This issue has been closed because it has been inactive for a long time. You can reopen it if you encounter a similar problem in the future.