osrs_optical_recognition's People

Contributors

horenbergerb

osrs_optical_recognition's Issues

Create tooltip parser that uses binary hypothesis test on color

The current tooltip parser relies on naive thresholding and is not very good.

This new method would let you take a color sample from a tooltip and use it as the mean of a Gaussian distribution of colors. Each pixel would then be tested against this distribution with a binary hypothesis test. You could have a similar test for each color of text. I think this model would get substantially better results than the naive thresholding I am currently using, and it would be easily configurable.
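A minimal sketch of the per-pixel test, assuming an isotropic Gaussian around the sampled color and a uniform alternative hypothesis, in which case the likelihood ratio test reduces to a threshold on the normalized distance from the mean. The function name `tooltip_mask` and the default `sigma` and `threshold` values are hypothetical and would need tuning.

```python
import numpy as np

def tooltip_mask(image, sample_color, sigma=12.0, threshold=3.0):
    """Return a boolean mask of pixels consistent with the sampled color.

    image: (H, W, 3) array of RGB values.
    sample_color: length-3 RGB sample used as the Gaussian mean.
    sigma: assumed per-channel standard deviation (hypothetical default).
    threshold: largest accepted distance from the mean, in units of sigma.
    """
    # Cast to float first so uint8 subtraction cannot wrap around.
    diff = image.astype(np.float64) - np.asarray(sample_color, dtype=np.float64)
    # Squared Mahalanobis distance under an isotropic Gaussian.
    d2 = np.sum(diff ** 2, axis=-1) / sigma ** 2
    # Accept the "this pixel is tooltip color" hypothesis when close enough.
    return d2 <= threshold ** 2

# Tiny example: two near-matching pixels and two black background pixels.
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[0, 0] = (250, 240, 200)
image[0, 1] = (255, 245, 205)
mask = tooltip_mask(image, (250, 240, 200))
```

Running one such test per text color, each with its own sample, would give one mask per color.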

Write calibration script

I want to write a script that lets you easily generate a config file for screen ROIs.

One way to do it would be a script that prompts you
"Click the top left corner of the window"
"Click the bottom right corner of the window"
And then you can add more named ROIs on top of that.

Another way would be some kind of search over the screen to find elements of the OSRS client.

Yet another way would be to use window names?
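The click-based approach could be sketched as follows, assuming the script has already captured the click coordinates somehow (the capture step is left out). The function names, the JSON layout, and the example coordinates are all hypothetical.

```python
import json

def roi_from_clicks(top_left, bottom_right):
    """Turn two clicked corners into a left/top/width/height rectangle."""
    x0, y0 = top_left
    x1, y1 = bottom_right
    # Use min/abs so the result is valid even if the corners are swapped.
    return {
        "left": min(x0, x1),
        "top": min(y0, y1),
        "width": abs(x1 - x0),
        "height": abs(y1 - y0),
    }

def build_config(window_clicks, named_clicks):
    """Build a config with the window ROI plus named ROIs relative to it."""
    window = roi_from_clicks(*window_clicks)
    rois = {}
    for name, clicks in named_clicks.items():
        roi = roi_from_clicks(*clicks)
        # Store named ROIs relative to the window's top-left corner,
        # so they stay valid if the window is moved.
        roi["left"] -= window["left"]
        roi["top"] -= window["top"]
        rois[name] = roi
    return {"window": window, "rois": rois}

# Example clicks in screen coordinates (made-up values).
config = build_config(
    ((100, 50), (865, 553)),
    {"minimap": ((670, 60), (820, 210))},
)
config_text = json.dumps(config, indent=2)
```

Writing `config_text` to a file would give the config that the other scripts load.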

Update documentation

I made a PowerPoint and a paper on this project, so I should really have a better README.

Not sure how to organize everything. Maybe an overview in the README and then a more detailed breakdown in the docs folder?

Set up testing via GitHub Actions

It would be super convenient to have a testing system in place so I could automatically verify all the features work as intended after each push. I don't think it would take too long to set up.

I should start implementing unit tests and look into how to streamline the setup process for this repo.

Use LSTM or RNN to predict future frames

Currently future frame prediction uses a 3D convolutional network. It's super janky. It has lots of problems and doesn't easily allow you to vary the quantity of input frames.

I found a survey of video prediction methodologies.

It seems like I might want to try one of the methods used on M-MNIST, since that dataset has some similarities to our segmentation masks.

Folded recurrent neural networks seem to perform well, but they're also somewhat complicated and don't seem to be discussed often.

CrevNet is interesting and puts an emphasis on efficiency, but it also seems pretty complex.

This paper was one of the most well-cited with pretty decent results on M-MNIST. This one looks pretty interesting and fairly achievable. Surprised it doesn't have convolution. Might be good to try implementing this one next.
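To illustrate why a recurrent model avoids the fixed-frame-count problem, here is a forward-pass-only sketch of a fully connected LSTM in plain NumPy. It is untrained (random weights) and is not the method from any of the papers mentioned above; the class name `TinyLSTM` and all sizes are hypothetical. The point is only that the same weights are applied once per frame, so any number of input frames works.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Untrained fully connected LSTM that maps T frames to one frame."""

    def __init__(self, frame_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_size = hidden_size
        # One stacked weight matrix for the input, forget, output,
        # and candidate gates, acting on [flattened frame, hidden state].
        self.W = rng.normal(0.0, 0.1, (4 * hidden_size, frame_size + hidden_size))
        self.b = np.zeros(4 * hidden_size)
        # Linear readout from the hidden state back to frame pixels.
        self.W_out = rng.normal(0.0, 0.1, (frame_size, hidden_size))
        self.b_out = np.zeros(frame_size)

    def predict_next(self, frames):
        """frames: (T, H, W) masks with values in [0, 1], any T >= 1."""
        frames = np.asarray(frames, dtype=np.float64)
        h = np.zeros(self.hidden_size)
        c = np.zeros(self.hidden_size)
        for frame in frames:  # the loop length is whatever T happens to be
            z = self.W @ np.concatenate([frame.ravel(), h]) + self.b
            i, f, o, g = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        # Sigmoid keeps predicted mask values in (0, 1).
        out = sigmoid(self.W_out @ h + self.b_out)
        return out.reshape(frames.shape[1:])

model = TinyLSTM(frame_size=8 * 8, hidden_size=16)
pred_from_3 = model.predict_next(np.zeros((3, 8, 8)))
pred_from_7 = model.predict_next(np.ones((7, 8, 8)))
```

A real implementation would use a deep learning framework and train the weights; this only shows the data flow.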

Integrate Kalman filter into object detector

I'd like to integrate a Kalman filter into the object detection algorithm.

I think a simple "model" for objects like chickens would simply assume that object positions are stable and tend to displace locally according to a Gaussian distribution. You could do something similar for trees, but you'd expect much smaller variance for the Gaussian.
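A minimal sketch of that random-walk model as a Kalman filter, assuming the state is just the 2D screen position, the detector reports that position directly, and both noise sources are isotropic. The class name and all the standard deviations below are hypothetical values for illustration.

```python
import numpy as np

class RandomWalkKalman:
    """Kalman filter for a 2D position that drifts by Gaussian steps."""

    def __init__(self, initial_position, process_std, measurement_std,
                 initial_std=1.0):
        self.x = np.asarray(initial_position, dtype=np.float64)
        self.P = np.eye(2) * initial_std ** 2      # state covariance
        self.Q = np.eye(2) * process_std ** 2      # per-frame displacement noise
        self.R = np.eye(2) * measurement_std ** 2  # detector noise

    def predict(self):
        # The transition matrix is the identity: the object is expected
        # to stay where it is, and only the uncertainty grows.
        self.P = self.P + self.Q
        return self.x

    def update(self, measurement):
        z = np.asarray(measurement, dtype=np.float64)
        S = self.P + self.R                  # innovation covariance
        K = self.P @ np.linalg.inv(S)        # Kalman gain
        self.x = self.x + K @ (z - self.x)
        self.P = (np.eye(2) - K) @ self.P
        return self.x

# Chickens wander, so they get a large displacement variance;
# trees barely move, so they get a much smaller one.
chicken = RandomWalkKalman((100.0, 100.0), process_std=5.0, measurement_std=2.0)
tree = RandomWalkKalman((300.0, 80.0), process_std=0.1, measurement_std=2.0)

# Feed both a detection that is 10 pixels to the right of the estimate.
chicken.predict()
tree.predict()
chicken_shift = chicken.update((110.0, 100.0))[0] - 100.0
tree_shift = tree.update((310.0, 80.0))[0] - 300.0
```

With these settings the chicken estimate follows the new detection closely, while the tree estimate treats most of the jump as detector noise.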

If you wanted to get really fancy, you could use a different model when the camera is being moved. In this case, you'd need to model the camera movement as object displacements. I will try to post some resources later for a simple way to do this.

Make data collection more modular and expand capabilities

Right now data collection is a (sloppy) pipeline for generating 32x32 images paired with text labels.

Ideally data would be collected passively in the most general possible form, such as screenshots of the play screen paired with mouse coordinates. Then, these could later be processed into collections of more useful data.

Data that would be interesting to extract includes:

  • Tooltip text, complete or split by color
  • Approximate player location (template matching on world map using minimap)
  • Chat logs
  • Inventory contents
  • Relative NPC/item locations via minimap?
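One possible shape for the raw, general-purpose records described above is a log with one JSON line per sample, pairing a saved screenshot with the mouse coordinates at that moment. The field names, the `RawSample` class, and the JSON Lines format are assumptions for illustration; later processing steps would read this log to derive the more specific datasets.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class RawSample:
    timestamp: float        # seconds since the epoch when captured
    screenshot_path: str    # where the play-screen image was saved
    mouse_x: int
    mouse_y: int

def append_sample(log_path, sample):
    """Append one sample to the log as a single JSON line."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(sample)) + "\n")

def load_samples(log_path):
    """Read every sample back from the log, skipping blank lines."""
    with open(log_path, encoding="utf-8") as f:
        return [RawSample(**json.loads(line)) for line in f if line.strip()]
```

Appending line by line means a crash during collection loses at most the sample being written.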

Reorganize demos

Currently the demos are just a very messy script of commented-out lines.

Ideally the demos would be organized in src/Demos. The demos would be run via a demos.py script in the root directory which takes command line arguments that specify which demo to run.
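A sketch of what such a demos.py could look like, using `argparse` from the standard library. The two demo functions are hypothetical placeholders; in the layout described above they would be imported from src/Demos instead of being defined inline.

```python
import argparse

def tooltip_demo():
    # Placeholder for a real demo imported from src/Demos.
    return "running tooltip demo"

def segmentation_demo():
    # Placeholder for a real demo imported from src/Demos.
    return "running segmentation demo"

# Maps the command line name of each demo to the function that runs it.
DEMOS = {
    "tooltip": tooltip_demo,
    "segmentation": segmentation_demo,
}

def main(argv=None):
    parser = argparse.ArgumentParser(description="Run one of the project demos.")
    parser.add_argument("demo", choices=sorted(DEMOS),
                        help="name of the demo to run")
    args = parser.parse_args(argv)
    result = DEMOS[args.demo]()
    print(result)
    return result
```

The actual script would call `main()` under the usual `if __name__ == "__main__":` guard, so that `python demos.py tooltip` runs the tooltip demo and an unknown name prints the list of valid choices.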

Additionally, some of the machine learning scripts aren't really "demos" at all, such as collecting training data, training the neural networks, and running the actual segmentation. Maybe these belong somewhere else.
