horenbergerb / osrs_optical_recognition

1.0 1.0 0.0 729.06 MB

Uses various tools to parse a video feed of Old School RuneScape into actionable intelligence

Python 100.00%

haar-cascade haar-cascade-classifier object-detection opencv opencv-python osrs pytesseract runescape

osrs_optical_recognition's Issues

Create tooltip parser that uses binary hypothesis test on color

The current method, naive thresholding, does not give very good results.

The new method would let you take a color sample from a tooltip and use it as the mean of a Gaussian distribution over colors. Each pixel would then be checked against this distribution with a binary hypothesis test. You could run a similar test for each color of text. I think this model would get substantially better results than the naive thresholding I am currently using, and it would be easy to configure.
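
A minimal sketch of the per-pixel test, not the repo's code. It assumes the text color follows an isotropic Gaussian around the sampled color and that background pixels are uniform over RGB, in which case the likelihood-ratio test reduces to thresholding the normalized squared distance to the mean. The function name, `sigma`, and `threshold` are hypothetical and would need tuning.

```python
import numpy as np


def color_hypothesis_mask(image, sample_color, sigma=12.0, threshold=9.0):
    """Return a boolean mask of pixels accepted as the sampled text color.

    H1: pixel ~ N(sample_color, sigma^2 * I)   (assumed model)
    H0: pixel is background, uniform over RGB  (assumed model)
    Under these assumptions, accepting H1 is equivalent to checking that
    the squared distance to the mean, in units of sigma, is small enough.
    """
    pixels = np.asarray(image, dtype=np.float64)  # avoid uint8 wraparound
    mean = np.asarray(sample_color, dtype=np.float64)
    squared_distance = np.sum(((pixels - mean) / sigma) ** 2, axis=-1)
    return squared_distance <= threshold
```

Running it once per text color (yellow, cyan, white, and so on) with a different `sample_color` would give one mask per color, which matches the "similar test for each color of text" idea.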

Write calibration script

I want to write a script that lets you easily generate a config file for screen ROIs.

One way to do it would be a script that prompts you
"Click the top left corner of the window"
"Click the bottom right corner of the window"
And then you can add more named ROIs on top of that.

Another way would be some kind of search over the screen to find elements of the OSRS client.

Yet another way would be to use window names?
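
A rough sketch of the bookkeeping behind the click-based approach, under some assumptions: the clicks have already been captured as `(x, y)` screen coordinates by some other means, and named ROIs are stored relative to the window's top-left corner so the config still works if the window moves. All function names and the JSON layout are hypothetical.

```python
import json


def roi_from_corners(first, second):
    """Turn two clicked corners into an x/y/width/height rectangle."""
    (x1, y1), (x2, y2) = first, second
    return {
        "x": min(x1, x2),
        "y": min(y1, y2),
        "width": abs(x2 - x1),
        "height": abs(y2 - y1),
    }


def build_config(window_corners, named_corners):
    """Build a config dict from the window clicks and named ROI clicks."""
    window = roi_from_corners(*window_corners)
    rois = {}
    for name, corners in named_corners.items():
        roi = roi_from_corners(*corners)
        # Store each ROI relative to the window's top-left corner.
        roi["x"] -= window["x"]
        roi["y"] -= window["y"]
        rois[name] = roi
    return {"window": window, "rois": rois}


def save_config(config, path):
    """Write the config as indented JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
```

The prompting loop ("Click the top left corner of the window") would sit on top of this and only needs to collect the corner pairs.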

Update documentation

I made a PowerPoint and a paper on this project, so I should really have a better README.

Not sure how to organize everything. Maybe an overview in the README and then more detailed breakdown in the docs folder?

Set up testing via GitHub Actions

It would be super convenient to have a testing system in place so I could automatically verify all the features work as intended after each push. I don't think it would take too long to set up.

I should start implementing unit tests and look into how to streamline the setup process for this repo.

Use LSTM or RNN to predict future frames

Currently future frame prediction uses a 3D convolutional network. It's super janky. It has lots of problems and doesn't easily allow you to vary the quantity of input frames.

I found a survey of video prediction methodologies.

It seems like I might want to try one of the methods used on M-MNIST, since that dataset has some similarities to our segmentation masks.

Folded recurrent neural networks seem to perform well, but they're also somewhat complicated and don't seem to be discussed often.

CrevNet is interesting and puts an emphasis on efficiency, but it also seems pretty complex.

This paper was one of the most well-cited with pretty decent results on M-MNIST. This one looks pretty interesting and fairly achievable. Surprised it doesn't have convolution. Might be good to try implementing this one next.

Integrate Kalman filter into object detector

I'd like to integrate a Kalman filter into the object detection algorithm.

I think a simple "model" for objects like chickens would simply assume that object positions are stable and tend to displace locally according to a Gaussian distribution. You could do something similar for trees, but you'd expect much smaller variance for the Gaussian.

If you wanted to get really fancy, you could use a different model when the camera is being moved. In this case, you'd need to model the camera movement as object displacements. I will try to post some resources later for a simple way to do this.
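
A minimal sketch of the stationary-object model described above, written as a random-walk Kalman filter on a 2D position. It assumes the detector reports one `(x, y)` measurement per object per frame and ignores camera movement entirely. The class name and the variance defaults are hypothetical; a tree would get a much smaller `process_var` than a chicken.

```python
import numpy as np


class RandomWalkKalman:
    """Kalman filter for a position that drifts by Gaussian steps."""

    def __init__(self, initial_position, initial_var=100.0,
                 process_var=25.0, measurement_var=16.0):
        self.x = np.asarray(initial_position, dtype=np.float64)
        self.P = np.eye(2) * initial_var      # state covariance
        self.Q = np.eye(2) * process_var      # per-frame displacement noise
        self.R = np.eye(2) * measurement_var  # detector noise

    def predict(self):
        # Transition matrix is the identity: the position estimate is
        # unchanged and only the uncertainty grows.
        self.P = self.P + self.Q
        return self.x

    def update(self, measurement):
        # Measurement matrix is the identity: the detector observes the
        # position directly.
        z = np.asarray(measurement, dtype=np.float64)
        S = self.P + self.R                   # innovation covariance
        K = self.P @ np.linalg.inv(S)         # Kalman gain
        self.x = self.x + K @ (z - self.x)
        self.P = (np.eye(2) - K) @ self.P
        return self.x
```

Per frame you would call `predict()` once and then `update()` with the matched detection, skipping `update()` when the object was not detected.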

Make data collection more modular and expand capabilities

Right now data collection is a (sloppy) pipeline for generating 32x32 images paired with text labels.

Ideally data would be collected passively in the most general possible form, such as screenshots of the play screen paired with mouse coordinates. Then, these could later be processed into collections of more useful data.

Data that would be interesting to extract includes:

Tooltip text, complete or split by color
Approximate player location (template matching on world map using minimap)
Chat logs
Inventory contents
Relative NPC/item locations via minimap?
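
One way the "collect raw, process later" idea might look, as a sketch only. It assumes screenshots are stored as NumPy arrays of shape height x width x 3 and that mouse coordinates are in screenshot pixel space. `RawSample` and `crop_patch` are hypothetical names; `crop_patch` shows how the current 32x32 training images could be derived afterwards from the raw samples.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class RawSample:
    """One passively collected frame, kept in the most general form."""
    timestamp: float
    screenshot: np.ndarray  # H x W x 3 image of the play screen
    mouse_xy: tuple         # (x, y) in screenshot pixel coordinates


def crop_patch(sample, size=32):
    """Cut a size x size patch around the mouse, clamped to the image."""
    height, width = sample.screenshot.shape[:2]
    x, y = sample.mouse_xy
    half = size // 2
    # Clamp the top-left corner so the patch stays inside the screenshot.
    left = min(max(x - half, 0), max(width - size, 0))
    top = min(max(y - half, 0), max(height - size, 0))
    return sample.screenshot[top:top + size, left:left + size]
```

Other extractors (tooltip text, inventory contents, chat logs) could follow the same pattern: a function that takes a `RawSample` and returns the derived data.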

Reorganize demos

Currently the demos are just a very messy script of commented-out lines.

Ideally the demos would be organized in src/Demos. The demos would be run via a demos.py script in the root directory which takes command line arguments that specify which demo to run.
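
A possible shape for that demos.py script, sketched with `argparse` and a small registry. The demo names and placeholder functions are hypothetical; in the real layout each function would import and run something from src/Demos.

```python
import argparse

# Maps a command-line name to the function that runs that demo.
DEMOS = {}


def demo(name):
    """Decorator that registers a function under a demo name."""
    def register(func):
        DEMOS[name] = func
        return func
    return register


@demo("tooltip")
def run_tooltip_demo():
    # Placeholder body; a real demo would call into src/Demos.
    return "tooltip demo placeholder"


@demo("detector")
def run_detector_demo():
    # Placeholder body; a real demo would call into src/Demos.
    return "detector demo placeholder"


def main(argv=None):
    """Parse the demo name from the command line and run that demo."""
    parser = argparse.ArgumentParser(description="Run a demo by name.")
    parser.add_argument("demo", choices=sorted(DEMOS))
    args = parser.parse_args(argv)
    return DEMOS[args.demo]()
```

With the usual `if __name__ == "__main__": main()` guard added at the bottom, `python demos.py tooltip` would run the tooltip demo, and an unknown name would be rejected with the list of valid choices.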

Additionally, some of the machine learning scripts aren't really "demos" at all, such as collecting training data, training the neural networks, and running the actual segmentation. Maybe these belong somewhere else.
