FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts 🌊🤔💡

Welcome to the FlowVQA Project Repository

FlowVQA introduces a novel benchmark for visual question answering, emphasizing the use of flowcharts for complex reasoning and evaluation. This benchmark aims to challenge and enhance the capabilities of multimodal language models through spatial reasoning, decision-making, and logical progression tasks.

Abstract

Existing visual question answering benchmarks fall short in visual grounding and complexity, particularly when evaluating spatial reasoning. FlowVQA addresses this gap with 2,272 human-verified flowchart images and 22,413 question-answer pairs. The benchmark is designed for thorough evaluation of the visual and logical reasoning capabilities of AI models.

Citing Our Work

Please consider citing our paper if you use FlowVQA in your research:

@article{SinghEtAlXXXX,
  title={Your Paper Title Here},
  author={Singh, Shubhankar and others},
  journal={Journal Name},
  volume={XX},
  number={XX},
  pages={XX--XX},
  year={XXXX},
  publisher={Publisher}
}

Repository Contents

  • Data Repository: Contains the train and test JSON files, which include flowchart images, Mermaid scripts, tags, questions, and question types.
  • Code Repository: Features code snippets, example prompts, and additional resources.
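
As a rough illustration of how records like those described above might be consumed, the sketch below parses a small inline JSON sample and counts questions per type. The schema is an assumption made for this example only: the field names (`image`, `mermaid`, `tag`, `questions`, `question`, `answer`, `type`), the file name, and the question-type labels are hypothetical and may not match the released files. It runs on made-up inline data, not on the repository's data.

```python
import json
from collections import Counter

# Hypothetical record layout: every field name and value below is an
# assumption for illustration, not the dataset's confirmed schema.
SAMPLE_JSON = """
[
  {
    "image": "flowchart_0001.png",
    "mermaid": "flowchart TD; A[Start] --> B{Is it raining?}; B -->|Yes| C[Take umbrella]; B -->|No| D[Go outside]",
    "tag": "instructions",
    "questions": [
      {"question": "What should you do if it is raining?",
       "answer": "Take umbrella",
       "type": "fact_retrieval"},
      {"question": "How many decision nodes does the flowchart have?",
       "answer": "1",
       "type": "topological"}
    ]
  }
]
"""


def count_question_types(records):
    """Return a Counter mapping each question type to its number of questions."""
    counts = Counter()
    for record in records:
        for qa in record["questions"]:
            counts[qa["type"]] += 1
    return counts


records = json.loads(SAMPLE_JSON)
type_counts = count_question_types(records)

for question_type, count in sorted(type_counts.items()):
    print(f"{question_type}: {count}")
```

Keeping the Mermaid script next to the image reference in each record would let a text-only baseline be compared against an image-based one on the same questions, assuming the real files pair them in a similar way.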

Contributors

The project was developed by Shubhankar Singh, Purvi Chaurasia, Yerram Varun, Vatsal Gupta, and Pranshu Pandya, under the mentorship of Dr. Vivek Gupta and Dr. Dan Roth.

Future Directions and Use Cases

FlowVQA facilitates research in several key areas, such as:

  • Flowchart Reasoning: Enhancing visual logic and reasoning capabilities of models.
  • Graph-Encoder Models: Improving structural reasoning with flowchart-based models.
  • Adversarial and Counterfactual Probes: Testing models with challenging questions.
  • Complex Subtasks: Developing additional tasks for comprehensive training and evaluation.
  • NeuroSymbolic AI: Applying neurosymbolic methods for better performance and understanding.

Repository Status

🚧 Under Construction. Please refrain from using the data or code until further notice.

Stay Informed

For project updates and more details on contributing, please follow this repository.

Thank you for your interest in FlowVQA! 🎉
