docqai / docq

47.0 2.0 9.0 5.68 MB

Private ChatGPT alternative. Securely unlocks knowledge from confidential business information.

Home Page: https://docqai.github.io/docq/

License: GNU Affero General Public License v3.0

Dockerfile 0.21% Python 99.51% Shell 0.29%

ai secure-by-default turnkey privacy rag chatbot chatgpt genai

docq's People

docqai dxv2k osala-eng dzlabsch fernandezjose shootmir snekkenull visioninhope tmyhres

docq's Issues

BUG: Shared Ask - Spaces with an empty or corrupt index crash chat

Recreate

Create a new Space with a data source. Don't index it.
In Shared Ask, select the space created above and ask any question.
An error is thrown when the following line is called on the space's index:

graph = ComposableGraph.from_indices(
    GPTListIndex, indices, index_summaries=summaries, service_context=_get_service_context()
)
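One possible guard, sketched under assumptions: load each selected space's index defensively and pass only non-empty indices to ComposableGraph.from_indices. The helper name and the load_index / is_empty callables are hypothetical stand-ins for however Docq loads a space's index; they are not part of Docq or LlamaIndex.

```python
import logging
from typing import Callable, Iterable, List, Tuple, TypeVar

logger = logging.getLogger(__name__)

IndexT = TypeVar("IndexT")


def load_usable_indices(
    space_ids: Iterable[str],
    load_index: Callable[[str], IndexT],
    is_empty: Callable[[IndexT], bool],
) -> Tuple[List[IndexT], List[str]]:
    """Load one index per space, skipping spaces whose index is missing, corrupt or empty.

    Returns the usable indices plus the IDs of skipped spaces, so the UI can
    tell the user which spaces were left out of the answer.
    """
    usable: List[IndexT] = []
    skipped: List[str] = []
    for space_id in space_ids:
        try:
            index = load_index(space_id)
        except Exception:  # e.g. missing or corrupt persisted index files
            logger.exception("Could not load index for space %s", space_id)
            skipped.append(space_id)
            continue
        if is_empty(index):
            skipped.append(space_id)
            continue
        usable.append(index)
    return usable, skipped
```

The caller would then only build the graph when `usable` is non-empty, and otherwise show a message such as "None of the selected spaces have been indexed yet" instead of crashing.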

INFRA: Azure app and model hosting

Is your feature request related to a problem? Please describe.

One-click deployment to Azure for both the app and model hosting.

Describe the solution you'd like

Use an ARM template so we can offer a one-click launch.
The template will be hosted in the repo itself, with a raw GitHub (gitraw) link as the launch URL.
Use Azure App Service for app hosting.
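A "Deploy to Azure" launch link is the Azure portal's template-deployment URL followed by the URL-encoded address of the template. The sketch below builds that link from a raw GitHub URL; the template path used in the test is hypothetical, since the repo location of the ARM template is not decided here.

```python
from urllib.parse import quote

# Portal entry point used by "Deploy to Azure" buttons; the template URI is appended URL-encoded.
AZURE_DEPLOY_PREFIX = "https://portal.azure.com/#create/Microsoft.Template/uri/"


def azure_deploy_url(raw_template_url: str) -> str:
    """Build a one-click launch link for an ARM template hosted at a public raw URL."""
    # safe="" makes quote() percent-encode ":" and "/" as well.
    return AZURE_DEPLOY_PREFIX + quote(raw_template_url, safe="")
```

The resulting URL can be placed behind a "Deploy to Azure" badge in the README.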

Additional context

CORE: Types of spaces

Is your feature request related to a problem? Please describe.

Right now documents can only be loaded into spaces by manual upload. With a type property we can support more source locations and loading strategies, and in the future open the door for plugins to operate at this stage of the end-to-end flow.

Describe the solution you'd like

An additional type property (the name may vary) saved with each space.

Describe alternatives you've considered

N/A

Additional context

First additional type could be a web scraper.
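A minimal sketch of the type property, assuming an enum whose values are stored with each space. SpaceType, Space and space_from_row are hypothetical names, not Docq's actual model; defaulting to manual upload keeps spaces saved before the property existed valid.

```python
from dataclasses import dataclass
from enum import Enum


class SpaceType(Enum):
    """Hypothetical space types; the values are what would be stored with each space."""

    MANUAL_UPLOAD = "manual_upload"  # today's only loading strategy
    WEB_SCRAPER = "web_scraper"  # first additional type suggested in this issue


@dataclass
class Space:
    id: int
    name: str
    # Defaulting to MANUAL_UPLOAD keeps existing spaces valid without a data migration.
    space_type: SpaceType = SpaceType.MANUAL_UPLOAD


def space_from_row(row: tuple) -> Space:
    """Rebuild a Space from an (id, name[, type]) row, tolerating rows saved before the type existed."""
    space_id, name, *rest = row
    stored_type = rest[0] if rest and rest[0] is not None else SpaceType.MANUAL_UPLOAD.value
    return Space(space_id, name, SpaceType(stored_type))
```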

INFRA: AWS App and model hosting

Is your feature request related to a problem? Please describe.
One-click deployment to AWS for both the app and model hosting.

Describe the solution you'd like

Generate the CloudFormation (Cfn) template from CDK.
Check the Cfn template into the repo so it can be referenced directly and publicly, which the Launch Stack URL needs in order to work.
Use Elastic Beanstalk.

Describe alternatives you've considered

Additional context

CORE: add datasource for Azure blob storage

Is your feature request related to a problem? Please describe.
Add a data source for Azure Blob Storage.

Describe the solution you'd like
Use the data loader from LlamaHub (the LlamaIndex hub) to load documents into an indexing-compatible format.

Add a doc with guidance on how to obtain the credentials needed for this data source.

Describe alternatives you've considered

A self-contained loader. We should probably do this in the future so we can control things such as not downloading the docs into a temp folder.
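A self-contained loader could read blobs straight into memory instead of a temp folder. This sketch assumes the client behaves like azure-storage-blob's ContainerClient (list_blobs(name_starts_with=...) and download_blob(name).readall()); the client is passed in so the logic can be exercised without Azure, and load_container_in_memory is a hypothetical name.

```python
from typing import Any, Dict, Optional


def load_container_in_memory(container_client: Any, name_prefix: Optional[str] = None) -> Dict[str, bytes]:
    """Return {blob name: blob bytes} for every blob in the container, without touching local disk.

    container_client is expected to behave like azure.storage.blob.ContainerClient.
    """
    contents: Dict[str, bytes] = {}
    for blob in container_client.list_blobs(name_starts_with=name_prefix):
        # download_blob() returns a downloader; readall() pulls the whole blob into memory.
        contents[blob.name] = container_client.download_blob(blob.name).readall()
    return contents
```

In the app the client would come from ContainerClient.from_connection_string(conn_str, container_name=...). Holding whole blobs in memory suits documents of modest size; very large files would need streaming instead.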

Additional context

CORE: Supporting public shared spaces

Is your feature request related to a problem? Please describe.

For #23 we need a different type of shared space where all data can be accessed by the general public.

Describe the solution you'd like

Shared spaces should be internal by default, with sharing controlled by multi-user access as before. What's new is an additional type of shared space for public use.
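A minimal sketch of the access rule, assuming a visibility field on shared spaces where an anonymous visitor is represented by a user ID of None. All names are hypothetical; the point is that a space only becomes readable by the public through an explicit PUBLIC setting.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Visibility(Enum):
    INTERNAL = "internal"  # default: access via multi-user sharing, as today
    PUBLIC = "public"  # new: readable by anyone, including anonymous visitors


@dataclass
class SharedSpace:
    name: str
    visibility: Visibility = Visibility.INTERNAL
    member_user_ids: Set[int] = field(default_factory=set)


def can_read(space: SharedSpace, user_id: Optional[int]) -> bool:
    """user_id is None for an anonymous (unauthenticated) visitor."""
    if space.visibility is Visibility.PUBLIC:
        return True
    return user_id is not None and user_id in space.member_user_ids
```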

Describe alternatives you've considered

N/A

Additional context

In the UI/UX, make sure this type of shared space is clearly distinguished from the conventional internal shared space.

UX: Make feature flags admin section intuitive

Is your feature request related to a problem? Please describe.
#18 implemented feature flags so an admin can control which features are available to users. Because the flag names don't correspond well to the feature names in the UI (menu), this could be confusing. As the number of flags increases, we will need a better way to map flags to features so admins know with more certainty what they are controlling.

Describe the solution you'd like
TBD

Describe alternatives you've considered
TBD

Additional context
N/A

CORE: Feature selections by admin

Is your feature request related to a problem? Please describe.

Right now we have 3 key user features:

General Chat
Ask Your Documents
Ask Shared Documents

Admin should be able to choose which user features are available for use.

Describe the solution you'd like

Part of the system settings.
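A sketch of how the admin's selection could live in the system settings, assuming a list of enabled feature keys is stored under one settings key. The enum values and the "enabled_features" key are hypothetical; keeping a label per feature that matches the UI menu also helps admins see exactly what they are toggling.

```python
from enum import Enum
from typing import Dict, List


class UserFeature(Enum):
    """The three user features listed above; the values are hypothetical settings keys."""

    GENERAL_CHAT = "general_chat"
    ASK_PERSONAL_DOCS = "ask_personal_docs"
    ASK_SHARED_DOCS = "ask_shared_docs"


# Labels match the UI menu so the settings screen uses the same names users see.
FEATURE_LABELS: Dict[UserFeature, str] = {
    UserFeature.GENERAL_CHAT: "General Chat",
    UserFeature.ASK_PERSONAL_DOCS: "Ask Your Documents",
    UserFeature.ASK_SHARED_DOCS: "Ask Shared Documents",
}


def enabled_features(system_settings: Dict[str, List[str]]) -> List[UserFeature]:
    """Read the admin's selection; every feature is enabled when nothing has been saved yet."""
    saved = system_settings.get("enabled_features")
    if saved is None:
        return list(UserFeature)
    return [feature for feature in UserFeature if feature.value in saved]
```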

Describe alternatives you've considered

N/A

Additional context

N/A

UX: shared ask section - make the Space selector stick to the top of the page

Current

In the Shared Ask page, the Space selection UI currently scrolls with the chat history. Once the chat history becomes longer than the user's screen height, the selector disappears off the top. The longer the chat history, the more users have to scroll to get back to the Space selection UI.

Solution

Make the Space selection UI stick to the top of the screen while only the chat history scrolls. For smaller screen heights we might also need a show/hide option for the space selector.

INFRA: GCP app and model hosting support

Is your feature request related to a problem? Please describe.
There's no GCP infra hosting support at all at the moment.

Describe the solution you'd like

We want to offer one-click deployment, just like Azure and AWS.

PaLM is Google's LLM.

Describe alternatives you've considered
n/a

Additional context
If possible, synthesise a template using IaC.

UX/UI: Present the Web Scraper extractor template field as a DDL

At present the user needs to look up the template name in the docs and copy-paste it. This is obviously error-prone and a poor experience.

Instead, present the available templates as a drop-down list (DDL) in the Space config UI.

Add an input field type to ConfigKey and use it to drive the UI rendering logic dynamically.

It might be a good idea to define input field type as an enum.
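A sketch of the enum-driven rendering, assuming a simplified, hypothetical ConfigKey with input_type and options fields (Docq's real ConfigKey may look different). The `ui` argument would be the streamlit module in the app, whose st.selectbox and st.text_input accept these arguments; a stub can stand in for tests.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List


class InputFieldType(Enum):
    TEXT = "text"
    PASSWORD = "password"
    DROPDOWN = "dropdown"


@dataclass
class ConfigKey:
    """Hypothetical, simplified stand-in for ConfigKey with the proposed input_type field."""

    key: str
    name: str
    input_type: InputFieldType = InputFieldType.TEXT
    options: List[str] = field(default_factory=list)  # only used by DROPDOWN


def render_config_input(config_key: ConfigKey, ui: Any) -> Any:
    """Render one config field according to its declared input type."""
    if config_key.input_type is InputFieldType.DROPDOWN:
        return ui.selectbox(config_key.name, config_key.options, key=config_key.key)
    if config_key.input_type is InputFieldType.PASSWORD:
        return ui.text_input(config_key.name, type="password", key=config_key.key)
    return ui.text_input(config_key.name, key=config_key.key)
```

For the web scraper, the extractor template key would be declared as DROPDOWN with the available template names as options, so users pick from a list instead of copying names from the docs.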

UI: Supporting avatar for each user

Is your feature request related to a problem? Please describe.

Given the predominantly chat-based interface, it'd be nice to show individual users' avatars in the chat windows.

Describe the solution you'd like

Pretty straightforward, as st-chat already supports avatars. The only missing piece is a user profile screen where each user can update their avatar, among other things.

Describe alternatives you've considered

N/A

Additional context

If we're going to create a user profile screen, think about what else would be great to have on that screen too.

CORE: Error when logging in

Describe the bug
An error occurs when trying to retrieve user settings during login.

To Reproduce
Steps to reproduce the behavior:

Start a new instance of the server or start with an empty .persisted folder
Select any page that requires authentication to view, e.g. General Chat.
Enter login credentials and then press the login button.

INTG: Slack / Teams integrations

Is your feature request related to a problem? Please describe.

As an alternative to the built-in chat interface, integrations with Slack and Teams would allow more convenient access.

Describe the solution you'd like

Could be piggy-backed on #22, or we could build something before the API support is in place.

Describe alternatives you've considered

See the suggested solution above for a plugin-based alternative. If time is of the essence then we should probably go for it without waiting for the plugin system to be ready.

Additional context

Customer validation first!!

CORE: Adding data source for web scraping

Is your feature request related to a problem? Please describe.

To support #23 we also need to be able to scrape websites as a data source for a space.

Describe the solution you'd like

Use something like https://llama-hub-ui.vercel.app/l/web-beautiful_soup_web to scrape designated (part of a) website.

Describe alternatives you've considered

We could also hand-build something using Beautiful Soup (BS).

There are also a few other scrapers available for LlamaIndex such as https://llama-hub-ui.vercel.app/l/web-simple_web or https://llama-hub-ui.vercel.app/l/web-async_web
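If we do hand-build a scraper, the core step is turning one page into indexable text plus the links to follow. This sketch swaps Beautiful Soup for Python's standard-library html.parser so it has no dependencies; the class name is hypothetical, and it does no fetching, robots.txt handling or JavaScript rendering.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class PageTextExtractor(HTMLParser):
    """Collect visible text and absolute link URLs from one HTML page."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.text_parts: List[str] = []
        self.links: List[str] = []
        self._skip_depth = 0  # > 0 while inside <script>, <style> or <noscript>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links so a crawler can restrict itself to the designated site.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())

    def text(self) -> str:
        return "\n".join(self.text_parts)
```

Usage: create an extractor with the page URL, call feed(html), then index text() and queue the links that stay within the designated part of the site.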

Additional context

Utilising the existing space data source framework internally.

RFC: Handling public sessions in the backend

Is your change proposal related to a problem? Please describe.
In issue #23 we are building public-facing functionality where a widget is embedded on a third-party (client's) website. There are several things the platform doesn't handle for this use case:

Unauthenticated access - at present users need an account and must authenticate
Anonymous users - at present users have an identity that we can attach state to, partition by, track over time, and authorise with. We need a way to accept anonymous interactions while tracking usage so we can provide customers with insight, power the context, etc.

Propose the solution you'd like

Phase 1:
Ephemeral sessions - generate a unique ID (UUID) for each new user session and partition all interactions using this ID. Don't persist anything in the backend. Use a sliding time window for session expiration (i.e. time of last user interaction + X minutes, where X = 15 minutes). Use local storage to track history.

This is an MVP and should be good enough to start demoing the functionality, and even to trial it for real.
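A minimal sketch of Phase 1, assuming sessions are tracked only in backend memory (nothing persisted) and chat history stays in the browser's local storage. The class name is hypothetical; the clock is injectable so the 15-minute sliding window can be tested without waiting.

```python
import time
import uuid
from typing import Callable, Dict

SESSION_TTL_SECONDS = 15 * 60  # X = 15 minutes, as proposed above


class EphemeralSessionStore:
    """In-memory, UUID-keyed anonymous sessions with a sliding expiry window. Nothing is persisted."""

    def __init__(self, ttl_seconds: float = SESSION_TTL_SECONDS, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def start(self) -> str:
        """Create a session for a new anonymous visitor and return its ID."""
        session_id = str(uuid.uuid4())
        self._last_seen[session_id] = self._clock()
        return session_id

    def touch(self, session_id: str) -> bool:
        """Record an interaction. Returns False (and forgets the session) if it is unknown or expired."""
        now = self._clock()
        last_seen = self._last_seen.get(session_id)
        if last_seen is None or now - last_seen > self._ttl:
            self._last_seen.pop(session_id, None)
            return False
        self._last_seen[session_id] = now  # slide the window: expiry = last interaction + TTL
        return True

    def purge_expired(self) -> int:
        """Drop sessions idle for longer than the TTL; returns how many were removed."""
        now = self._clock()
        expired = [sid for sid, seen in self._last_seen.items() if now - seen > self._ttl]
        for sid in expired:
            del self._last_seen[sid]
        return len(expired)
```

Every chat request would carry the session ID and call touch() first; a False result means the widget should start a fresh session. Calling purge_expired() periodically stops visitors who never return from accumulating in memory.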

Phase 2:
Persist the session on the client side and the server side

User - A user arriving at a client's marketing website will click on a familiar chatbot icon at the bottom right of the website to launch the chat window, and hence start a session. When the same user returns on the same device they will see their chat history. Initially, we'll support a single chat thread to keep it simple.

Customer - The customer configures the chatbox with a specific public Space in their instance. They can see some basic stats on the users interacting with the bot, such as the number of users in the last 1 day and 7 days. They can also see the chat history, i.e. the questions and responses per user. Later we may use GenAI to give insights on this data.

Technical:

Adapt the backend to use UUIDs for users; this should let the data layer handle anonymous and authenticated (authN) users the same way.
Secure the chat/response web API calls (TBD) - maybe CORS is sufficient. We are trying to prevent usage abuse and prompt-injection-type issues, not to protect data privacy.

Describe alternatives you've considered
We could skip converting the backend to use UUIDs for user IDs, but then persisting anything by user would require a code branch to handle the two different types of users (anonymous vs authN), which is probably more complicated.

Additional context
Given we have little input from real customers and usage at this point, we should be cautious about over-engineering prematurely, but structure things as a hedge so we can adapt quickly if we need to.

RFC: Support unlimited number of personal spaces and pass-through chat sessions

Is your change proposal related to a problem? Please describe.

If you use ChatGPT or other public chatbots, you can save and restore past chat sessions by topic. I think we should:

Support unlimited chat sessions that can be saved and reactivated, like ChatGPT
Support unlimited personal spaces, each of which can host file uploads or ingest from a data source

This would empower individual users without needing setup from group admins.

Propose the solution you'd like

This needs both database and UI/UX changes.

I believe we're using a separate location for any personal data. Maybe it's a good time to align the storage centrally?
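To make the central-storage question concrete, here is a minimal sketch of a schema for unlimited saved chat threads, using SQLite since Docq already uses it. Table, column and function names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS chat_threads (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    topic TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS chat_messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    -- the cascade only applies when PRAGMA foreign_keys = ON
    thread_id INTEGER NOT NULL REFERENCES chat_threads(id) ON DELETE CASCADE,
    role TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
    content TEXT NOT NULL
);
"""


def create_thread(conn: sqlite3.Connection, user_id: int, topic: str) -> int:
    cur = conn.execute("INSERT INTO chat_threads (user_id, topic) VALUES (?, ?)", (user_id, topic))
    return cur.lastrowid


def add_message(conn: sqlite3.Connection, thread_id: int, role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO chat_messages (thread_id, role, content) VALUES (?, ?, ?)",
        (thread_id, role, content),
    )


def load_thread(conn: sqlite3.Connection, thread_id: int) -> list:
    """Reactivate a saved thread by returning its messages in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM chat_messages WHERE thread_id = ? ORDER BY id", (thread_id,)
    )
    return rows.fetchall()
```

Personal spaces could follow the same pattern, with an owner user ID column rather than a separate storage location.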

Describe alternatives you've considered

N/A

Additional context

It is still debatable whether we should allow individual users to do so.

CORE: Settings for model selection

Is your feature request related to a problem? Please describe.

At the system level, enable model selection/switch among the settings.

Needs consideration: should we allow user-level model selection? More specifically, what are the reasons for it?

Describe the solution you'd like

Save it to the same database table as the rest of the settings.
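A sketch of storing the model choice alongside other settings, assuming a simple key/value table with JSON-encoded values. The table layout, the "system.model" key and the model identifiers are hypothetical; Docq's real settings schema may differ. The upsert syntax needs SQLite 3.24 or newer.

```python
import json
import sqlite3
from typing import Any


def init_settings(conn: sqlite3.Connection) -> None:
    # Hypothetical key/value settings table, one row per setting.
    conn.execute("CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT NOT NULL)")


def save_setting(conn: sqlite3.Connection, key: str, value: Any) -> None:
    # Upsert: insert the key, or overwrite its value if it already exists.
    conn.execute(
        "INSERT INTO settings (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (key, json.dumps(value)),
    )
    conn.commit()


def get_setting(conn: sqlite3.Connection, key: str, default: Any = None) -> Any:
    row = conn.execute("SELECT value FROM settings WHERE key = ?", (key,)).fetchone()
    return json.loads(row[0]) if row is not None else default
```

If user-level selection is added later, a user ID column (or a user-scoped key prefix) with a fallback to the system value would keep the same table.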

Describe alternatives you've considered

No alternative.

Additional context
N/A

UI: Large prompt crashed chat UI

Describe the bug
In "Ask Your Documents" I pasted in the following prompt, hit return and got an unhandled error. The prompt later appeared in the chat history, but the app crashed before responding.

To Reproduce

Log in to https://docq-ai.streamlit.app/Ask_Your_Documents
Do not upload any docs
Paste the question as follows

create a summary of "<long block of text copied from a PDF export of a slide deck>"

Expected behavior
Similar prompts return a result. Any errors are handled gracefully in the UI, and sufficient details are logged in the backend for troubleshooting.
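A sketch of the expected behaviour: reject oversized prompts up front and never let an exception reach the chat UI, while logging the full traceback. The character limit is a hypothetical, rough proxy for the model's token limit, and answer_fn stands in for whatever function calls the LLM.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

MAX_PROMPT_CHARS = 8_000  # hypothetical; the real limit depends on the model's context window
FRIENDLY_ERROR = "Sorry, something went wrong while answering. The error has been logged."


def answer_safely(prompt: str, answer_fn: Callable[[str], str]) -> str:
    """Call the LLM-backed answer function without letting exceptions reach the chat UI."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return f"Your message is too long ({len(prompt)} characters). Please shorten it to under {MAX_PROMPT_CHARS} characters."
    try:
        return answer_fn(prompt)
    except Exception:
        # logger.exception records the full traceback for backend troubleshooting.
        logger.exception("Chat request failed (prompt length %d)", len(prompt))
        return FRIENDLY_ERROR
```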

Environment:

I used https://docq-ai.streamlit.app/Ask_Your_Documents today (11 June 2023)

Additional context
I will send the long block of text separately.

UI: Each chat entry supporting markdown/html formatting

Is your feature request related to a problem? Please describe.

Immediate problem: very soon we'll be showing the sources of each response to a question, and they need to be formatted properly to avoid clutter.

Future challenge: we're going to support plugins that use these chat responses, and being able to format the eventual results properly is a must-have.
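Whichever chat component we end up with, the sources can be rendered as a compact markdown block. This sketch assumes each source arrives as a (document name, page number or None) pair, which is a hypothetical shape; it drops duplicates so the same page is not listed twice.

```python
from typing import List, Optional, Tuple


def format_sources_markdown(sources: List[Tuple[str, Optional[int]]]) -> str:
    """Render (document name, page) pairs as a de-duplicated markdown list; empty string if none."""
    seen = set()
    lines = []
    for name, page in sources:
        if (name, page) in seen:
            continue
        seen.add((name, page))
        lines.append(f"- {name}" + (f" (p. {page})" if page is not None else ""))
    return ("**Sources**\n\n" + "\n".join(lines)) if lines else ""
```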

Describe the solution you'd like

Right now we're using https://github.com/AI-Yash/st-chat, which may or may not support markdown/HTML formatting. Great if it does; otherwise we need to think carefully about ripping it out and putting in our own implementation: short-term pain vs long-term gain.

Describe alternatives you've considered

See above for self-build vs customising st-chat.

Additional context

Let's avoid reinventing the wheel and look for existing UI/UX patterns for displaying chatbot responses.

UI + UX + CORE: recorded convo and playback i.e. repeatable human scripts

Is your feature request related to a problem? Please describe.

There are many mundane data-related tasks in businesses, which can be summarised as "Do X at time T, on every date D or every interval I".

Describe the solution you'd like

Record a certain part of a chat conversation and save it for future playback. Think of it as Zapier, but with human language and AI assistance. This becomes super powerful with #9, where action-oriented plugins would come into play to take action.

It's also a precursor to autonomous actions, as this is purely human-driven. However, it has the benefit of being predictable and closely mapped to current workflows.

Describe alternatives you've considered

N/A

Additional context

The feature needs to be:

Independent of the space feature so that it could run against any permissible spaces
Dependent on the system => user hierarchy as designed, plus possible sharing with permissions

UX: Consolidate Space management into a single view

Is your feature request related to a problem? Please describe.
Currently, Space creation, editing and archiving are handled in the Admin Overview section, while document management for a space is in the Admin Docs section. This is clunky, there's some overlap in functionality, and it should be improved.

As a side effect, this will eliminate the problem where navigating between the two views by clicking Manage Documents forces the user to log in again.

Describe the solution you'd like
Consolidate on a single screen based on the current Admin Docs view. Bring space creation and editing here.

Create Space: add a Create Space tab, same pattern as create new user.

Edit Space: add an Edit button in the Space Details tab, and bring over the same Space edit form UI from Admin Overview.

Describe alternatives you've considered
Other versions of the UI would involve a lot more work. The above seems low effort.

Additional context

Make sure navigating directly with a query string parameter works
Make sure switching spaces using the DDL works
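For the direct-navigation check, the selected space has to be recovered from the URL and validated. This sketch assumes a hypothetical space_id query parameter and works on a full URL using only the standard library; how the app obtains the query parameters depends on the Streamlit version in use.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def space_id_from_url(url: str, param: str = "space_id") -> Optional[int]:
    """Extract the selected space from a deep link such as ...?space_id=12; None if absent or invalid."""
    values = parse_qs(urlparse(url).query).get(param)
    if not values:
        return None
    try:
        return int(values[0])
    except ValueError:
        return None
```

A None result would fall back to the DDL's default selection rather than raising an error.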

CORE: Add database schema control, likely via an ORM

Is your feature request related to a problem? Please describe.

Right now, there's no db schema change control; all the tables are created on the fly.
This is not fit for purpose as production-ready software.

Describe the solution you'd like
See https://github.com/ozcanyarimdunya/python-orm for three mainstream ORMs as candidates.
The safe choice is SQLAlchemy, for sure.
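The core idea of schema change control, independent of which ORM is chosen (with SQLAlchemy, migrations are typically managed by Alembic), is numbered, append-only migrations applied once each. The sketch below illustrates that idea with a swapped-in, stdlib-only technique, SQLite's built-in PRAGMA user_version; it is not a proposal to replace the ORM, and the migration statements are hypothetical.

```python
import sqlite3
from typing import List

# Ordered migrations: never edit an applied one, only append new ones.
MIGRATIONS: List[str] = [
    "CREATE TABLE spaces (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE spaces ADD COLUMN space_type TEXT NOT NULL DEFAULT 'manual_upload'",
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending migrations in order and return the resulting schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(statement)
        # Record the version right after its migration so a re-run resumes at the first unapplied one.
        # PRAGMA does not accept bound parameters, so the integer is formatted in directly.
        conn.execute(f"PRAGMA user_version = {version}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```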

Describe alternatives you've considered
See above for 3 ORM tools.

Additional context
Ideally, pick one with good integration with FastAPI.

RFC: Address concerns with supply chain risks from Docq

Is your change proposal related to a problem? Please describe.

As self-hosted OSS, Docq is well positioned to answer most supply chain risk questions from businesses adopting it.
This is more about our own internal processes to prevent any risk in our software supply chain from propagating downstream to Docq's customers.

Propose the solution you'd like

GitHub is well placed to help us in an OSS capacity, for example with the security tools it offers. We should be clear about our additional policies and processes.

Describe alternatives you've considered

N/A

Additional context

Some accreditations such as SOC2 or ISO27001 may move into view at some point.

UI/UX: Each user should be able to reset chat history

Is your feature request related to a problem? Please describe.

Two problems/intentions:

Sometimes chat history is undesirable, maybe as a result of LLM hallucinations or wrong questions being asked
Sometimes a new context is needed

Describe the solution you'd like

A simple reset button would do the trick for now.
A decision needs to be made about implementing something similar to how ChatGPT works with multiple chats.

Describe alternatives you've considered

As mentioned above, the multiple chats feature is another solution, which we may or may not implement in the future.

Additional context
N/A

INTG: Public-facing no-auth chat interface backed by an admin-designated space

Is your feature request related to a problem? Please describe.

There are cases where a business would offer external people (partners, customers etc.) access to a certain portion of their data, usually data that is already publicly accessible, such as product literature and knowledge bases.

This feature would extend the organisational data's use beyond internal applications.

However, see the additional context below for the potential pushback.

Describe the solution you'd like

Just like #14, this can be piggy-backed on #22, or we can build something quick before the API support is available.

See RFC for the backend in #57

Describe alternatives you've considered

N/A

Additional context

The pushback against implementing this feature is mainly about the absolute guarantee of data security, as it opens the door to misconfigured private spaces being used as the data source for the public-facing chat interface. Although that would be a user error, it is Docq's feature that allows it to happen. This is the reason to be cautious about this feature.

RFC: Adding 'organisation' support for users and spaces

Is your change proposal related to a problem? Please describe.

Right now Docq has a flat structure for both users and spaces. Adding an org level should address some real-world segregation and grouping use cases.

Propose the solution you'd like

Adding an org level means for any given user,

They belong to their personal org by default, in which they are an admin user.
They can create more orgs and invite others to join the org.

With this shift, users will be scoped by org; the same for spaces.
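A minimal in-memory sketch of the proposed rules: each new user gets a personal org in which they are the admin, and org admins can invite others. All names are hypothetical, and scoping spaces by org would add an org ID to each space in the same way.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Set


@dataclass
class Org:
    id: int
    name: str
    admin_ids: Set[int] = field(default_factory=set)
    member_ids: Set[int] = field(default_factory=set)


class OrgDirectory:
    """Hypothetical in-memory model of the proposed org level."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.orgs: Dict[int, Org] = {}

    def create_org(self, name: str, creator_id: int) -> Org:
        # The creator becomes both an admin and a member of the new org.
        org = Org(next(self._ids), name, admin_ids={creator_id}, member_ids={creator_id})
        self.orgs[org.id] = org
        return org

    def on_user_created(self, user_id: int, username: str) -> Org:
        # Every user gets a personal org by default, in which they are the admin.
        return self.create_org(f"{username} (personal)", user_id)

    def invite(self, org_id: int, inviter_id: int, invitee_id: int) -> None:
        org = self.orgs[org_id]
        if inviter_id not in org.admin_ids:
            raise PermissionError("only org admins can invite members")
        org.member_ids.add(invitee_id)

    def orgs_for_user(self, user_id: int) -> List[Org]:
        return [org for org in self.orgs.values() if user_id in org.member_ids]
```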

Describe alternatives you've considered

N/A

Additional context

Docq is designed to be a single-tenant system but it may change in the future. This feature may have implications when Docq switches to a multi-tenant architecture.

RFC: Pinning Document(s) when using chat UI

Is your change proposal related to a problem? Please describe.

Right now we operate based on spaces, which contain documents. Another approach, which people find useful, is to start from a set of documents and run questions against them.

The assumption here is that users know which documents to start with. This offers a more targeted use which is helpful for summarisation etc.

Propose the solution you'd like

Step by step:

Allow users to select a set of documents from the document list of a space
They click "Query" (or something similar) in an in-context drop-down menu
It switches back to the Chat UI with an indication that this particular question is within the pre-selected context
Get the answer as always

A further enhancement is to allow selections from documents that live in different spaces.
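A simple way to honour the pre-selected context is to filter retrieved chunks by (space, document) pairs, which also covers documents from different spaces. This post-retrieval filter is a hypothetical sketch; filtering inside retrieval would be more efficient and avoids the case where few pinned chunks survive a top-k cut.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class RetrievedChunk:
    space_id: int
    document_name: str
    text: str
    score: float


def restrict_to_pinned(chunks: Iterable[RetrievedChunk], pinned: Set[Tuple[int, str]]) -> List[RetrievedChunk]:
    """Keep only chunks whose (space, document) pair was pinned, best score first."""
    kept = [c for c in chunks if (c.space_id, c.document_name) in pinned]
    return sorted(kept, key=lambda c: c.score, reverse=True)
```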

Describe alternatives you've considered
N/A

Additional context
Users may be able to do this today by creating spaces just for a particular set of documents.

DOC: Polish doc site

Is your feature request related to a problem? Please describe.

Add Google Analytics (GA) tracking
Add social card support
Add more relevant links
Page structure
Styling in general, including markdown styling and anchor-link headings
Build process, especially versioned API docs
Define and target user personas/roles explicitly (business champion, app admin, infra engineer, project contributor etc.)

Describe the solution you'd like

Given the application nature of the project, a comprehensive user guide is expected, so we should probably spend more time on that part of the doc site, which is slightly different from a typical OSS project.

Describe alternatives you've considered

Happy with gh-pages and the current setup.

Additional context

N/A

INFRA: Azure ARM template - switch Azure OpenAI service to private endpoint

Is your feature request related to a problem? Please describe.
Currently the ARM template deploys Azure OpenAI with public access switched on.

Describe the solution you'd like

Switch the config to use Azure OpenAI over a private endpoint / VNET so that traffic remains within the customer's virtual network.

https://learn.microsoft.com/en-us/azure/cognitive-services/cognitive-services-virtual-networks?context=/azure/cognitive-services/openai/context/context

Describe alternatives you've considered
n/a

INFRA: Self-hosted Models

Is your feature request related to a problem? Please describe.

This would solve one problem:

Dependency on cloud-vendor-hosted models (Azure OpenAI and GCP PaLM)
And open up one opportunity:
Fine-tuning models under the hood in the future

Describe the solution you'd like

Very much an infra challenge. Incorporating it into the existing provisioning (ARM for Azure, CloudFormation for AWS) would be the first step.

Describe alternatives you've considered

No alternative.

Additional context

Likely to start with models that are available and popular on Hugging Face and have commercial-friendly licenses such as ASL2 (Falcon, OpenAssistant and Dolly).
See https://github.com/eugeneyan/open-llms

RFC: Ready-made, plugged-in mobile apps

Is your change proposal related to a problem? Please describe.

In certain scenarios, web-based access to Docq may be less desirable than mobile access - think frontline workers as an example.

Propose the solution you'd like

The target here is NOT about simply having a mobile app, but a way to deliver the new capabilities by extensions and plugins (see #9) with minimal effort to the mobile client/channel. This requires a rethink about what we should build as well as how we should build it. A possible direction is on-demand or hot reloading in Docq mobile, as a technical solution.

Cross-platform solutions such as React Native (RN) and Flutter could be the candidates here.

Another consideration is customizability within mobile apps. If Docq is turnkey then Docq mobile should be turnkey too.

Describe alternatives you've considered

We could use responsive (web) design for mobile web access, but unless we're satisfied with the UX it's unlikely to be the solution here.

Additional context

Multi-tenancy support in mobile apps is one to consider, i.e. like Slack where you can load multiple organisations from one Slack mobile app.

RFC: Streamlit hosting - data persistence support

Situation/Problem

Each time Docq is deployed to Streamlit Cloud, all data is wiped because it uses ephemeral storage. So this hosting option can only be used in a throwaway demo mode; it cannot be used for any real customer scenarios. Streamlit hosting is at the low-cost, easy end of the hosting options, and such an option has a place in a customer's journey to adopting Docq.

If we can persist data we can use it for hosting a real usable version for customers that can be used for serious trials/pilots.

Requirements

Components with disk persistence that need to be altered:

SQLite - uses standard disk-based persistence, requiring a mount point
Datasource document list tracking - uses the Python standard library json module, which requires a disk mount point
LlamaIndex index - uses standard disk-based persistence, requiring a mount point
Manual file upload - st.file_uploader returns the uploaded file's bytes, which are written to disk using a standard file handler.

Have the ability to configure the deployment to be S3-backed or filesystem-mount-backed for persistence.

Solution

The high-level approach is to use an S3 bucket as the backing store. This is not a drop-in approach; it is proposed because there doesn't seem to be a drop-in solution, as Streamlit Cloud doesn't appear to support persistent filesystem mounts.

Each of the components we use that persists data will need to have some sort of support for S3 as a backing store. Below is the S3 backing solution for each component.

SQLite - https://github.com/uktrade/sqlite-s3vfs. Does the concurrency model change? sqlite-s3vfs points out that it doesn't handle concurrent writes, so these need to be handled in the app.
LlamaIndex - StorageContext can take an fsspec filesystem instance for persistence via the fs argument (see fsspec, s3fs).
Document list - does the json module support a byte array/stream interface? If so, use it together with an S3 interface module like s3fs.
Manual file uploads - switch to using fsspec / [s3fs](https://s3fs.readthedocs.io/en/latest/) rather than a standard file handler.

fsspec supports several backing stores, such as S3, local files and GCS.
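On the document list question: json.dump and json.load work with any file-like object that has write() or read(), so the same code can target local disk or S3 by changing only how the file is opened. In this sketch open_fn defaults to the built-in open; passing fsspec.open with an s3:// path (and s3fs installed) is the assumed S3 variant. The function names and paths are hypothetical.

```python
import json
from typing import Any, Callable, List


def save_document_list(path: str, documents: List[dict], open_fn: Callable[..., Any] = open) -> None:
    # json.dump only needs an object with write(); open_fn decides whether that is local disk or S3.
    with open_fn(path, "w") as f:
        json.dump(documents, f)


def load_document_list(path: str, open_fn: Callable[..., Any] = open) -> List[dict]:
    with open_fn(path, "r") as f:
        return json.load(f)
```

For example, save_document_list("s3://docq-bucket/space-1/documents.json", docs, open_fn=fsspec.open) would write the same JSON to S3 instead of the local filesystem.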

Alternatives

Simple persistent filesystem mount

This would be the simplest solution and therefore ideal. It would require no code changes. However, there doesn't seem to be an option for this in Streamlit Cloud.

Streamlit file connections

This is unlikely to work, given that none of the components we use to persist data support this interface out of the box.

The Streamlit data connections feature abstracts over s3fs, and hence fsspec. See specifically: S3, Streamlit file connection, and s3fs.

See KB article

CORE: Data source for HubSpot

Is your feature request related to a problem? Please describe.
Ability for an admin to attach communication data related to one or more customer accounts in HubSpot.

Describe the solution you'd like
This needs to take access authorisation into consideration. We need a way to make sure only a user with access to the attached HubSpot (HS) account can chat against the Space.

If a customer grants their employees broad access in principle (even if not all users have an HS license), then multiple accounts can be attached without authz policy enforcement.
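The authorisation rule above could be expressed as a subset check, sketched here under assumptions: hubspot_access maps a user's email to the HubSpot account IDs they can see, and how that map is populated (for example, synced from HubSpot) is left open. All names are hypothetical.

```python
from typing import Dict, Set


def can_chat_with_space(
    user_email: str,
    attached_account_ids: Set[str],
    hubspot_access: Dict[str, Set[str]],
    broad_access: bool = False,
) -> bool:
    """Allow chat only if the user can access every HubSpot account attached to the Space.

    broad_access models the case above where the customer grants employees access
    in principle, so no per-account policy is enforced.
    """
    if broad_access:
        return True
    return attached_account_ids <= hubspot_access.get(user_email, set())
```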

Describe alternatives you've considered

UI/UX/CORE: Handle long list of files

Is your feature request related to a problem? Please describe.
The UI doesn't handle long lists explicitly: with many files it will render one very long list and is likely to hit some performance limits. The indexer will likely also suffer a noticeable performance impact. Basically, the current design is for a small number of files, and the limits are unknown.
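On the UI side, one common mitigation is to render the file list one page at a time. This hypothetical helper clamps out-of-range page numbers and reports the page count for the navigation controls; it does not address indexer performance, which needs separate measurement.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], page: int, page_size: int = 50) -> Tuple[List[T], int]:
    """Return the items for a 1-based page and the total page count; out-of-range pages are clamped."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    total_pages = max(1, math.ceil(len(items) / page_size))
    page = min(max(page, 1), total_pages)
    start = (page - 1) * page_size
    return list(items[start:start + page_size]), total_pages
```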

UX: Hidden menu on home screen is confusing

Currently, when a user navigates to the root page the left menu is hidden. The user has to click one of the links in the middle of the page, which prompts for login and shows the left menu. These links in the middle of the home page open in a new tab. All this is unnecessarily confusing for users and makes the experience feel clunky.

[Screenshots: home page; after clicking on a link]

BUG/UI: space selection box not sizing to fit column

Problem:
The space selection UI component should occupy the width of the chat window column. It should also have a solid background when expanded. Likely this is a cross-browser compatibility issue.

Google Chrome: Version 115.0.5790.170 (Official Build) (x86_64)
OS: macOS 12.3.1

CORE: Azure data loader - Switch to our own implementation

Is your feature request related to a problem? Please describe.
Three key reasons:

Remove the external dependency - security (#25) and the network dependency for downloading from LlamaHub
Support default metadata like file name
Support custom metadata - the module we are using doesn't support passing in custom metadata like space ID and data source type
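With our own loader, each loaded document can carry both default and custom metadata before indexing. This sketch uses a hypothetical LoadedDocument type rather than any specific LlamaIndex class, and the metadata key names are assumptions.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Any, Dict


@dataclass
class LoadedDocument:
    """Hypothetical stand-in for the document object handed to the indexer."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def make_document(blob_name: str, text: str, space_id: int, data_source_type: str, **extra: Any) -> LoadedDocument:
    metadata: Dict[str, Any] = {
        "file_name": PurePosixPath(blob_name).name,  # default metadata; blob names use "/" separators
        "space_id": space_id,  # custom metadata the current module cannot pass through
        "data_source_type": data_source_type,
    }
    metadata.update(extra)  # any further fields a data source wants to attach
    return LoadedDocument(text=text, metadata=metadata)
```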

Describe the solution you'd like

Fork the code.

Describe alternatives you've considered

We could also switch to an Apache OpenDAL-based loader, which supports S3 and GCS.

RFC: Offering API-based external integrations

Is your change proposal related to a problem? Please describe.

Beyond the internal, plugin-based method of integration, Docq should offer external-facing, API-based integration mostly for interfacing with the segregated, indexed private data for other organisational use.

Propose the solution you'd like

Modularising the core part of the application and creating an API front alongside the existing Streamlit-based UI front.

Describe alternatives you've considered

We could hack Streamlit to retrofit an API interface;
Or rewrite the UI completely to be API-based, therefore rearchitecting the whole application into backend -> API -> frontend.

Additional context

It's a medium-term goal, so priority-wise it should give way to other big features.

UX: Reloading pages forces the users to login

Describe the bug
After logging in, if you hit refresh in the browser you have to log in again.

Clicking on the Manage Documents links also opens a new window and forces the user to log in again.

To Reproduce
Steps to reproduce the behavior:

Click on General Chat
Log in
Hit refresh in your browser

Expected behavior
The user remains logged in until the session expires or they explicitly log out. Refreshing any page should not force the user to log in; it should reload the page as expected.
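Streamlit's session state does not survive a browser refresh, so one common approach is to keep a signed, expiring session token on the client (for example in a cookie) and validate it on each page load. This sketch uses only hmac, hashlib and base64 from the standard library; the token format, lifetime, secret handling and browser storage mechanism are all assumptions, not how Docq currently works.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

SESSION_LIFETIME_SECONDS = 8 * 60 * 60  # hypothetical: sessions last 8 hours


def issue_token(user_id: int, secret: bytes, now: Optional[float] = None) -> str:
    expires = int((now if now is not None else time.time()) + SESSION_LIFETIME_SECONDS)
    payload = f"{user_id}:{expires}"
    signature = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(f"{payload}:{signature}".encode()).decode()


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[int]:
    """Return the user ID if the token is authentic and unexpired, else None."""
    try:
        user_id, expires, signature = base64.urlsafe_b64decode(token.encode()).decode().split(":")
    except ValueError:  # covers bad base64 (binascii.Error), bad UTF-8 and a wrong field count
        return None
    expected = hmac.new(secret, f"{user_id}:{expires}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        return None
    if int(expires) < (now if now is not None else time.time()):
        return None
    return int(user_id)
```

Because such tokens are stateless, an explicit logout would also need a server-side denylist (or a short lifetime) to invalidate tokens before they expire.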

Environment:

Code version: main branch latest (27 June 2023), commit c4314d2

CORE: secure secrets - data source config and model config secrets

RFC: Plugin platform & ecosystem for custom development by 3rd-parties

Is your feature request related to a problem? Please describe.

Docq is designed to be a general-purpose application, similar to how WordPress began as a blogging engine.
Many use cases only need features added to one particular part of the application to make it more useful for certain roles, sectors or industries, for example:

Data ingestion: more SaaS data connectors and network storage locations
Query interface: more integrations such as Slack/Teams and other custom-made ones to offer a Copilot-like experience
Custom prompting to support different LLM use cases such as coding and writing
Using responses differently to offer custom content or even agent-based follow-up actions

Describe the solution you'd like

This needs a convenient plugin library and hub-like infrastructure to facilitate delivery.

The assumption here is that every plugin will be free (in cost and use). Let's work on non-free if there's enough demand.

Describe alternatives you've considered

The key here is allowing 3rd-party custom development without getting a deep understanding of how Docq core works.

Additional context

Keep it simple, stupid: one function/class to lower the barrier to contribution.

There may be security considerations down the line but we should be able to manage them when they arise.

UX: Move logout button to left nav

Is your feature request related to a problem? Please describe.
The auth_required() function currently renders the logout button on every page where it is called.

Describe the solution you'd like
Set the logout render option to false and add a logout option elsewhere, such as the left nav.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

CORE: Links to sources in shared spaces need to be made space aware

Describe the bug

I (@cwang) added error-handling code to work around the problem in the function

docq/source/docq/manage_documents.py, line 65 in b5c7185:

def format_document_sources(source_nodes: list[NodeWithScore], space: SpaceKey) -> str:

The root cause, as far as I know, is that the code assumes all documents live in the user's personal space, so documents from shared spaces are not taken into account.

To fix this, the system needs to be able to locate documents in any shared space as well as in the personal space.
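A minimal sketch of space-aware source links, assuming each indexed node records its space type, space id and file name in its metadata (an assumption; this may need to be added at indexing time). SpaceType, SpaceKey, download_url and format_sources_by_space here are illustrative stand-ins, not Docq's actual types or URL scheme.

```python
from dataclasses import dataclass
from enum import Enum
from urllib.parse import quote


class SpaceType(Enum):
    PERSONAL = "personal"
    SHARED = "shared"


@dataclass(frozen=True)
class SpaceKey:
    """Illustrative stand-in for Docq's SpaceKey; the field names are assumptions."""
    type_: SpaceType
    id_: int


def download_url(space: SpaceKey, file_name: str) -> str:
    # Hypothetical URL scheme: the space is part of the link, so shared docs resolve too.
    return f"/download/{space.type_.value}/{space.id_}/{quote(file_name)}"


def format_sources_by_space(sources: list[dict]) -> str:
    """Build one link per source from the space recorded in each source's metadata."""
    lines = []
    for meta in sources:
        space = SpaceKey(SpaceType(meta["space_type"]), int(meta["space_id"]))
        lines.append(f"- {meta['file_name']}: {download_url(space, meta['file_name'])}")
    return "\n".join(lines)
```

Because the space comes from each source rather than from the current user, a shared ask spanning several spaces still produces a correct link per document.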

To Reproduce
Steps to reproduce the behaviour:

Click "Ask Shared Documents"
Ask a question
See that the sources part of the response now says "Unable to list sources", which is the error-handling code in action

Expected behaviour

It should work the same way as for personal documents, with correct, downloadable links for all source references.

Screenshots
If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

OS: [e.g. Ubuntu 22.04, macOS 11.4]
Code Version [e.g. 1.1.0]
Other Runtime Versions (please list below)

Additional context

N/A

RFC: Browser extensions/plugins as frontend for Docq

Is your change proposal related to a problem? Please describe.

Integrating with organisations' existing software stacks is the challenge. Generally, one can either go deep and integrate with specific software, or stay lightweight with browser extensions/plugins.

Propose the solution you'd like

Lightweight solutions such as browser extensions/plugins would be easiest, but we need to be aware of the browser restrictions imposed by enterprise browser management platforms.

Describe alternatives you've considered

Otherwise, go deep on specific software in organisations' data stacks, although it is unclear whether we can justify the ROI.

Additional context

Will likely rely on #22

BUG: forgetting recent chat history

Problem

It appears that only a limited amount of shared chat history is persisted. Once that limit is reached, no further history is persisted, so the most recent chat history goes missing.
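The root cause is not confirmed yet. Whatever the limit is, it should keep the newest N messages rather than the oldest. A minimal sketch of that pattern with sqlite3, assuming a hypothetical history table with an auto-incrementing id: every message is stored, and only the read is limited, newest first and then reversed for display.

```python
import sqlite3


def create_history_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; the id gives a stable insertion order.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, thread_id TEXT NOT NULL, message TEXT NOT NULL)"
    )


def append_message(conn: sqlite3.Connection, thread_id: str, message: str) -> None:
    # Every message is stored; no cap is applied at write time.
    conn.execute("INSERT INTO history (thread_id, message) VALUES (?, ?)", (thread_id, message))
    conn.commit()


def load_recent_history(conn: sqlite3.Connection, thread_id: str, limit: int) -> list[str]:
    # Take the newest `limit` rows (DESC), then reverse so they display oldest-first.
    rows = conn.execute(
        "SELECT message FROM history WHERE thread_id = ? ORDER BY id DESC LIMIT ?",
        (thread_id, limit),
    ).fetchall()
    return [message for (message,) in reversed(rows)]
```

An ascending ORDER BY with the same LIMIT would return the oldest messages instead, which would match the symptom of the most recent history going missing.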

RFC: Adding support for image (and later audio/video) generation models

Is your change proposal related to a problem? Please describe.

Multi-modal is the likely future. Before single multi-modal models mature for real use, we should consider adding support for image generation models, which are already used in particular sectors.

What Docq could offer in this case is synergy between text and image generation from the same set of private docs.

Propose the solution you'd like

Probably built on top of model selection; see #12.

Describe alternatives you've considered

Consider supporting open source self-hosted image models as the natural next step.

Additional context

As mentioned, single multi-modal models could appear on the horizon soon; they could easily be supported in the current codebase. This RFC is about having different models for specific use cases (text and image generation).

CORE: Having a demo mode

Is your feature request related to a problem? Please describe.

A few problems with a shared demo app like Docq:

Data needs to be wiped from time to time
Some demo data can be helpful, if pre-populated

Describe the solution you'd like

Implement a demo mode turned on by a flag (env var).
Add a permanent banner in red, visible on every screen, that clearly indicates this is a shared environment for demo purposes and that sensitive data must not be uploaded.
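A minimal sketch of the flag and banner, assuming a hypothetical DOCQ_DEMO_MODE environment variable; the variable name, the accepted values and the banner markup are all illustrative. A Streamlit page could render the markup with st.markdown(..., unsafe_allow_html=True).

```python
import html
import os
from collections.abc import Mapping
from typing import Optional

DEMO_MODE_ENV_VAR = "DOCQ_DEMO_MODE"  # hypothetical variable name
_TRUTHY = {"1", "true", "yes", "on"}

DEMO_BANNER = (
    "This is a shared demo environment. Do not upload sensitive data; "
    "data is wiped from time to time."
)


def is_demo_mode(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when the demo flag is set to a truthy value (defaults to the process env)."""
    env = os.environ if env is None else env
    return env.get(DEMO_MODE_ENV_VAR, "").strip().lower() in _TRUTHY


def banner_html(message: str = DEMO_BANNER) -> str:
    """Red, full-width banner markup with the message HTML-escaped."""
    return (
        '<div style="background-color:#d32f2f;color:#fff;padding:0.5rem;text-align:center;">'
        f"{html.escape(message)}</div>"
    )
```

Checking the flag in one shared place (for example the layout code every page already calls) keeps the banner on every screen without per-page changes.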

Describe alternatives you've considered

N/A

Additional context

This is mostly about data preparation and deletion in demo mode. Other ideas to build in are welcome.

UI: Revisit the chat UI

Describe the bug

It breaks at times: the chat history disappears, leaving the chat blank.

To Reproduce
Steps to reproduce the behaviour:

Go to "Ask Your Documents" and then "Manage Documents" tab
Upload a doc or delete a doc
Switch back to "Ask" tab
You'll notice that the chat history has gone.
However, if you go to another page and come back, the chat history reappears.

Expected behaviour

The chat UI should work reliably at all times.

Screenshots

Environment (please complete the following information):

OS: macOS + Chrome
Code Version: 0.0.1
Other Runtime Versions: Python 3.11

Additional context

We're using streamlit-chat which can be swapped out if needed.

INFRA: Cloud-vendor-hosted models

Is your feature request related to a problem? Please describe.

Currently Docq only supports OpenAI.com for demo/prototype purposes. This is obviously not private.

Describe the solution you'd like
We want to drop that and add support for the cloud vendors' serverless models below, which can be deployed in a customer's VPC.

Azure OpenAI #33
AWS TitanML/Bedrock #32
GCP PaLM #35

The app needs support added for each of these, as their high-level APIs (in LlamaIndex and LangChain) differ.

Finally, the settings section needs to be adjusted to indicate which models are provisioned on the backend and hence used by the app code. This is handled in #12.
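A minimal sketch of hiding the per-vendor differences behind one interface, assuming the settings record a vendor name and a model name. ModelVendor, ChatModel, _StubModel and get_model are hypothetical; a real factory would construct the corresponding LangChain or LlamaIndex wrapper instead of the stub.

```python
from enum import Enum
from typing import Callable, Protocol


class ModelVendor(Enum):
    AZURE_OPENAI = "azure_openai"  # #33
    AWS_BEDROCK = "aws_bedrock"    # #32
    GCP_PALM = "gcp_palm"          # #35


class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...


class _StubModel:
    """Placeholder for a vendor-specific wrapper (e.g. a LangChain or LlamaIndex LLM class)."""

    def __init__(self, vendor: ModelVendor, model_name: str) -> None:
        self.vendor = vendor
        self.model_name = model_name

    def complete(self, prompt: str) -> str:
        return f"[{self.vendor.value}/{self.model_name}] {prompt}"


# One factory per vendor; the default argument binds each vendor at definition time.
FACTORIES: dict[ModelVendor, Callable[[str], ChatModel]] = {
    vendor: (lambda name, v=vendor: _StubModel(v, name)) for vendor in ModelVendor
}


def get_model(vendor_name: str, model_name: str) -> ChatModel:
    """Build the model the backend has provisioned, as recorded in settings."""
    try:
        vendor = ModelVendor(vendor_name)
    except ValueError:
        raise ValueError(f"unsupported model vendor: {vendor_name}") from None
    return FACTORIES[vendor](model_name)
```

The rest of the app would depend only on ChatModel.complete, so adding a vendor means adding one enum member and one factory.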

Describe alternatives you've considered

No alternative.

Additional context

Likely via LangChain or a similar library; it is unlikely we will do it manually by dropping down to the API level.


