reportbooru's Introduction

Reportbooru is a collection of services for reporting on Danbooru. It includes the following functionality:

  • Search hits
  • Missed search hits
  • Search trends
  • Common searches
  • User similarity reports
  • Exporting data to Google BigQuery
  • Calculating related tags
  • User performance reports

The web frontend is a standard Rails application. Reportbooru also runs daemon processes that listen on Amazon SQS for jobs.

You can deploy using Capistrano. It's recommended you fork this project and modify the following files:

  • config/deploy/production.rb
  • .env

A sample .env file called .env-SAMPLE is included in the project. The .env file itself is symlinked during deployment, so you should create a version on the server at /var/www/reportbooru/shared/.env.

reportbooru's People

Contributors

albertc5, r888888888

reportbooru's Issues

Inactive Report incorrectly selects unlimited uploaders

It lists many users that don't even have the Approve Posts permission. Instead, it looks like it's selecting those with the Unlimited Uploads permission.

According to the User model on Danbooru, the can_approve_posts bit is in the 14th position.

https://github.com/r888888888/danbooru/blob/master/app/models/user.rb#L37-L59

Therefore, the value 1 should be bit-shifted to the left 13 places and not 14 to get it into the right position.

https://github.com/r888888888/reportbooru/blob/master/app/models/reports/inactive_approvers.rb#L49
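
The off-by-one can be illustrated with a minimal sketch. It assumes flag positions are counted from 1, as described above, so a flag in position n corresponds to the mask 1 << (n - 1). The constant and method names are hypothetical and are not taken from either codebase.

```ruby
# Hypothetical 1-based position, as counted in the issue text.
CAN_APPROVE_POSTS_POSITION = 14

# Mask for a flag stored at a 1-based position: position 1 is bit 0.
def flag_mask(position)
  1 << (position - 1)
end

# True when the given preference bit field has the flag set.
def flag_set?(bit_prefs, position)
  (bit_prefs & flag_mask(position)) != 0
end

approver_only  = flag_mask(CAN_APPROVE_POSTS_POSITION) # same as 1 << 13
next_flag_only = 1 << 14                               # the flag in position 15

puts flag_set?(approver_only, CAN_APPROVE_POSTS_POSITION)  # true
puts flag_set?(next_flag_only, CAN_APPROVE_POSTS_POSITION) # false
```

Shifting by 14 instead of 13 tests the flag in position 15, which would explain why users holding a different permission are selected.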

Don't include Meta tags in missing tags report

From the Danbooru forum:

Chiera said:

Can we PLEASE kick out the mention from the DanbooruBot that I should consider using [[bad_id]] in the future?

Thanks :3.

Basically, some users are being told to add bad_id, and I've also heard mention of md5_mismatch amongst others. This should not be. In fact, none of the Meta tags should be part of the missing tags report. They should be filtered out of the equation.
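
A minimal sketch of the requested filter, assuming the report has each candidate tag's category available. The category code 5 for Meta tags and the method name are assumptions for illustration; the real value should be checked against the Danbooru Tag model.

```ruby
# Assumed category code for Meta tags; verify against the Danbooru Tag model.
META_CATEGORY = 5

# candidates: array of tag names the report would suggest as missing
# categories: hash mapping tag name => category code
def without_meta_tags(candidates, categories)
  candidates.reject { |name| categories[name] == META_CATEGORY }
end

categories = { "bad_id" => 5, "md5_mismatch" => 5, "long_hair" => 0 }
p without_meta_tags(%w[bad_id long_hair md5_mismatch], categories) # ["long_hair"]
```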

Incorrect URL links for user accounts on Inactive Approvers report

With the switch to version 3 on the Inactive Approvers report, it seems like a variable isn't being properly dereferenced and so it spits out the object reference instead.

Example:

https://danbooru.donmai.us/users/#DanbooruRo::User:0x0055c1a88e6790

Version 2 (without issue): http://isshiki.donmai.us/user-reports/inactive_approvers/2017-10-22_v2.html
Version 3 (with issue): http://isshiki.donmai.us/user-reports/inactive_approvers/2017-10-29_v3.html
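
The symptom matches what Ruby produces when an object, rather than one of its attributes, is interpolated into a string. The sketch below reproduces it with a hypothetical stand-in class; it is not the actual report code.

```ruby
# Hypothetical stand-in for a user record; not the real DanbooruRo::User class.
class ExampleUser
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

user = ExampleUser.new(42)

# Interpolating the object calls Object#to_s, which yields "#<ExampleUser:0x...>".
broken_url = "https://danbooru.donmai.us/users/#{user}"

# Interpolating the id attribute yields the intended link.
fixed_url = "https://danbooru.donmai.us/users/#{user.id}"

puts broken_url
puts fixed_url
```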

As an additional issue, there seems to be a text encoding error on the Inactive Approvers report: one of the users is listed under a garbled name that matches no existing user. It looks like mis-decoded Kanji, and sure enough, when I manually switched the encoding to UTF-8, the name read as 一声, a user who does exist. Checking the other reports, this encoding issue seems to exist only on the Inactive Approvers report.
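
This kind of garbling can be reproduced by decoding UTF-8 bytes with a single-byte charset. The sketch below assumes Windows-1252 as the wrong charset, which is only a guess at what the browser fell back to.

```ruby
name = "一声"

# Reinterpret the UTF-8 bytes as Windows-1252 (an assumed wrong charset),
# then transcode to UTF-8 so the garbled form is a valid string.
garbled = name.b.force_encoding("Windows-1252").encode("UTF-8")

# Reverse the mistake: recover the original bytes and label them as UTF-8.
restored = garbled.encode("Windows-1252").force_encoding("UTF-8")

puts garbled.length   # 6 characters, one per original byte
puts restored == name # true
```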

Change Sorting on Reports

There are enough reports on Isshiki that it now requires scrolling down to get to the latest reports. Instead, the latest reports should always be at the top of each directory listing.
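
If the listing is generated by the application rather than by the web server, newest-first ordering could look like the sketch below. The file names follow the pattern seen in the report URLs; the method name is hypothetical.

```ruby
require "date"

# Hypothetical file names following the pattern seen in the report URLs.
files = ["2017-10-22_v2.html", "2017-10-29_v3.html", "2017-10-15_v2.html"]

# Sort by the leading ISO date, newest first.
def newest_first(names)
  names.sort_by { |name| Date.iso8601(name[0, 10]) }.reverse
end

p newest_first(files)
# ["2017-10-29_v3.html", "2017-10-22_v2.html", "2017-10-15_v2.html"]
```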

Change Naming of Vandalism Report

As mentioned on the Danbooru Forums.

Basically, the name of the report carries a negative connotation, even when the activity that occurs may not necessarily be malicious. User Jarlath had the best suggestion for a name change.

I'd be calling it "mass tag changes" as that describes the activity, and most of those changes look like valid gardening activities to me.

Also, this might be a separate issue, but he also brought up the idea of including Gold and Platinum level users in the report in addition to Member-level users.

Would it be worth including the gold and platinum members, in case there's some sort of edit war?

It might be worth just to try that out... if it clutters up the report, it could always be reverted back.

Numbers appear to be off for latest post changes report

skylightcrystal said (forum #152797):

https://isshiki.donmai.us/user-reports/post_changes/2018-11-28_v2.html
https://isshiki.donmai.us/user-reports/post_changes/2018-12-02_v2.html

Errm... I think something's a bit wrong with some of those figures - compare it to the one 2 weeks ago for example: https://isshiki.donmai.us/user-reports/post_changes/2018-11-18_v2.html

The numbers have increased by 5 to 10 times from November 18th to November 28th. That itself is an anomaly. I even checked the numbers back to October 28th, and they all are normal until November 28th, the report which was right after the failed November 25th report. I'm not sure, but could that failed report have anything to do with the massively skewed numbers?

The minimum number of items for each report is off by 1

skylightcrystal said (forum #1234):

It's a very minor issue, but I think the minimum number of uploads for the top tagger report on isshiki (https://isshiki.donmai.us/user-reports/taggers/) is actually 51, not 50. I have uploaded 50 images in the last 30 days but am not on the list, whereas those who have uploaded 51 images are on there. Looking back I couldn't see any times in the past when people have been included in the list with precisely 50 uploads.

The comments one also seems to have a minimum count of 11, not the stated 10.

I looked through all of the models in the reports directory, and the greater than operator ">" is being used for almost all of them, which would of course not include the number it's comparing against.

DanbooruRo::Post.where("posts.created_at > ?", date_window).group("posts.uploader_id").having("count(*) > ?", min_uploads).pluck("posts.uploader_id")

DanbooruRo::Comment.where("created_at > ?", date_window).group("creator_id").having("count(*) > ?", min_changes).pluck(:creator_id)

Instead the greater than or equal to operator ">=" should be used. The only report that is currently using that operator is the approvers report. All of the other reports need to be fixed.
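
The effect of the two operators can be shown without a database. The counts below are made up for illustration.

```ruby
# Hypothetical upload counts per user id over the report window.
upload_counts = { 1 => 49, 2 => 50, 3 => 51 }
min_uploads = 50

# Strict comparison, as in the quoted queries: a user with exactly 50 is dropped.
strict = upload_counts.select { |_id, count| count > min_uploads }.keys

# Inclusive comparison: a user with exactly 50 is kept.
inclusive = upload_counts.select { |_id, count| count >= min_uploads }.keys

p strict    # [3]
p inclusive # [2, 3]
```

In the quoted queries, the equivalent change would be replacing "count(*) > ?" with "count(*) >= ?" in the having clause.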

Split post changes report by user level

For uploads there are two reports, one for Builders and one for Members. The post changes report should also be split between Builders and non-Builders. This would be useful both for detecting vandalism, and for identifying users to promote.

Related issue: the tag vandalism report is currently limited to Members. It should instead be non-Builders, so that it includes Gold and Platinum.

Ref: https://danbooru.donmai.us/forum_topics/14945.
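
A sketch of the proposed split, assuming each user carries a numeric level and that Builder ranks above Member, Gold, and Platinum. The numeric values and names below are assumptions; the real ones live in the Danbooru User model.

```ruby
# Assumed numeric levels; check the Danbooru User model for the real values.
LEVELS = { member: 20, gold: 30, platinum: 31, builder: 32 }.freeze

# Each entry is [user name, level]. Returns [builders_and_above, non_builders].
def split_by_builder(users)
  users.partition { |_name, level| level >= LEVELS[:builder] }
end

users = [["alice", 20], ["bob", 30], ["carol", 31], ["dave", 32]]
builders, non_builders = split_by_builder(users)

p builders.map(&:first)     # ["dave"]
p non_builders.map(&:first) # ["alice", "bob", "carol"]
```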

Discrepancy for rating percentages in uploads report

It was reported by Chiara that the percentages for the ratings on the upload reports don't add up to 100%. This appears to be because the numbers are being truncated to integers.

https://github.com/r888888888/reportbooru/blob/master/app/models/reports/uploads.rb#L22-L24

Switching the format string to "%.1f%%" would alleviate this discrepancy, although to eliminate it the numbers would need to be rounded down to whichever decimal point is being used.
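
The discrepancy is easy to reproduce. The sketch below uses made-up counts for three ratings and compares an integer format with the one-decimal format suggested above.

```ruby
# Hypothetical counts for three ratings.
counts = { "s" => 1, "q" => 1, "e" => 1 }
total = counts.values.sum

percentages = counts.transform_values { |c| 100.0 * c / total }

# "%d" drops the fractional part; "%.1f" keeps one decimal place.
as_integers = percentages.values.map { |pct| format("%d%%", pct) }
one_decimal = percentages.values.map { |pct| format("%.1f%%", pct) }

p as_integers # ["33%", "33%", "33%"], which sums to 99%
p one_decimal # ["33.3%", "33.3%", "33.3%"], which sums to 99.9%
```

One decimal place shrinks the gap but does not close it, since each value is still shortened independently.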

Missing names on inactive approvers report

Comparing the list of all approvers against the active approvers and inactive approvers reports shows that a lot of names are missing. The following list is what I came up with after I did a comparison.

['Arantheus', 'TheStupidOne', 'buehbueh', 'Bloodletter', 'jxh2154', 'bigrich', 'CaptainLoony', 'alicemaiwaifu', 'Xz', 'memento_mori', 'MagicalAsparagus', 'Toks', 'wareya', 'ePlus', 'Altered']

All of these approvers seem to have no approvals during the report period, so maybe it's not including approvers with zero uploads?
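
The comparison described above amounts to a set difference. In the sketch below the lists are shortened and the two placeholder names are hypothetical; the real inputs would be the full approver list and the two reports.

```ruby
# Shortened lists; "example_active" and "example_inactive" are placeholders.
all_approvers      = %w[Toks wareya ePlus example_active example_inactive]
active_approvers   = %w[example_active]
inactive_approvers = %w[example_inactive]

# Approvers that appear in neither report.
missing = all_approvers - (active_approvers | inactive_approvers)

p missing # ["Toks", "wareya", "ePlus"]
```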

Add Post Replacement Report

Since this is now a function, and it now has its own controller, it would be nice to give credit to the approvers doing the heavy lifting with this kind of work.

As for stats, there's not much interesting to collect beyond overall quantity, so it should only have User and Total columns.

Incorrect numbers on Top Taggers report

User Chiera reported to me on Discord that the Top Taggers report appeared to be very off for user Nova_Genesis. Chiera referred to the Upload Tags report for that user where it indeed looks like the user is tagging on average much beyond the reported 7 tags.

I checked for myself using the API on the post_versions controller, and the following numbers are what I came up with for that user between 2017-08-25 08:40:54 UTC and 2017-09-24 08:40:54 UTC.

Total Uploads: 296
Tag Mean: 24.3
Tag Median: 29
Tag Q1: 7
Tag Q3: 37
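
For anyone wanting to reproduce figures like these, here is a sketch of one way to compute them from a list of per-upload tag counts. It uses the median-of-halves convention for quartiles, which is an assumption; other quartile definitions give slightly different values, and this is not necessarily how either set of numbers above was produced.

```ruby
# Median of an already sorted, non-empty array.
def median(sorted)
  n = sorted.length
  mid = n / 2
  n.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Summary statistics for an array of tag counts (needs at least 2 values).
def tag_count_summary(counts)
  raise ArgumentError, "need at least 2 values" if counts.length < 2

  sorted = counts.sort
  n = sorted.length
  lower = sorted[0, n / 2]           # values below the median position
  upper = sorted[(n + 1) / 2, n / 2] # values above the median position

  {
    total: n,
    mean: (sorted.sum / n.to_f).round(1),
    median: median(sorted),
    q1: median(lower),
    q3: median(upper)
  }
end

summary = tag_count_summary([8, 1, 7, 2, 6, 3, 5, 4])
puts summary[:median] # 4.5
puts summary[:q1]     # 2.5
puts summary[:q3]     # 6.5
```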

Unicode usernames don't render correctly

Usernames containing Unicode characters don't render correctly:

image

The problem is that the page doesn't declare that it's in UTF-8. The fix is to either add <meta charset="utf-8"> to the page or configure the web server to send Content-Type: text/html; charset=utf-8.
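
For the first option, a rough sketch of a helper that adds the declaration to generated HTML when it is missing. The method name is hypothetical, and the regex approach is only suitable for simple, machine-generated pages; editing the report template directly would be cleaner.

```ruby
# Adds <meta charset="utf-8"> right after the opening <head> tag when the
# page has no meta charset declaration yet.
def ensure_utf8_meta(html)
  return html if html =~ /<meta\s+charset=/i

  html.sub(/<head(\s[^>]*)?>/i) { |head| "#{head}\n<meta charset=\"utf-8\">" }
end

page = "<html><head><title>Report</title></head><body></body></html>"
puts ensure_utf8_meta(page)
```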

Change Relative Links to New Location

With the switch of directories that happened with Danbooru issue #2798, the HTML no longer pulls from the correct locations for the additional resources such as the JS and CSS files.

<script src='/reports/assets/jquery-3.1.1.slim.min.js'></script>
<script src='/reports/assets/jquery.tablesorter.min.js'></script>
<link href='/reports/assets/pure.css' rel='stylesheet'>

The links for all of the above should change "reports" to "user-reports".
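
If already generated pages need fixing as well, the rewrite could be done with a plain string substitution. This is a sketch over the three tags quoted above, not a description of how the templates are actually built.

```ruby
html = <<~HTML
  <script src='/reports/assets/jquery-3.1.1.slim.min.js'></script>
  <script src='/reports/assets/jquery.tablesorter.min.js'></script>
  <link href='/reports/assets/pure.css' rel='stylesheet'>
HTML

# Replace only the asset prefix, leaving other occurrences of "reports" alone.
updated = html.gsub("'/reports/assets/", "'/user-reports/assets/")

puts updated
```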

Add extraneous tags report

Similar to the missed tags report, but it would instead detail tags removed by other users that exceed a certain threshold. I don't know what the threshold should be, but it should probably be lower since users are less likely to remove tags except for blatant mistagging. It could always be fine-tuned so that it produces roughly the same number of reports as the missed tags report.
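
A rough sketch of the idea, assuming the report can obtain, for each upload, the tags that other users later removed. The input shape, names, and threshold are all hypothetical.

```ruby
# Hypothetical input: one [uploader, tags_removed_by_other_users] pair per upload.
removals = [
  ["alice", %w[solo]],
  ["alice", %w[1girl smile]],
  ["bob", %w[hat]]
]

# Returns uploader => removed tag count for uploaders at or above the threshold.
def extraneous_tag_report(removals, threshold)
  totals = Hash.new(0)
  removals.each { |uploader, tags| totals[uploader] += tags.length }
  totals.select { |_uploader, count| count >= threshold }
end

extraneous_tag_report(removals, 2).each do |uploader, count|
  puts "#{uploader}: #{count}"
end
```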

Specify Collection Range in JSON/HTML

I've been trying to validate the data in your reports with my own, and in many cases the data is similar, and in others it's far apart. (See http://danbooru.donmai.us/forum_topics/13112 for my attempt to compare it against the Nov 2 collection on Isshiki).

Theoretically, if we process the same data set the same way we should get the same results.

Adding the timestamp range (from and to) in Zulu UTC to the HTML/JSON would help with this. For the HTML, it doesn't even need to be a visible item, as long as it's in the page somewhere.

Adding the ID range would be even better, in case there are some discrepancies with the timestamps (e.g. incorrect timezone used).

Adding both would be the most preferred option, and would precisely delineate the data being analyzed for that particular report.

This should hopefully eliminate the data range as a source of variability between your report and mine, and any differences at that point should be due to inconsistencies in the data processing. (Either on your end or on mine)
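
As a sketch of what the requested metadata might look like in the JSON output, assuming hypothetical field names and made-up ID values:

```ruby
require "json"
require "time"

# Hypothetical field names; the report format defines no such keys today.
def collection_range(from_time, to_time, min_id, max_id)
  {
    "from" => from_time.getutc.iso8601, # Zulu UTC, e.g. 2017-08-25T08:40:54Z
    "to" => to_time.getutc.iso8601,
    "min_id" => min_id,
    "max_id" => max_id
  }
end

range = collection_range(
  Time.utc(2017, 8, 25, 8, 40, 54),
  Time.utc(2017, 9, 24, 8, 40, 54),
  1000, # made-up ID bounds
  2000
)

puts JSON.generate(range)
```

For the HTML reports, the same values could be carried in a hidden element or a data attribute.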
