danswer-ai / danswer Goto Github PK

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.

License: MIT License

Dockerfile 0.25% Python 59.24% Shell 0.23% JavaScript 0.35% TypeScript 39.75% CSS 0.06% Mako 0.02% HTML 0.03% Smarty 0.08%

enterprise-search rag ai-chat chatgpt gen-ai nextjs python information-retrieval

danswer's Introduction

Open Source Gen-AI Chat + Unified Search.

Danswer is the AI Assistant connected to your company's docs, apps, and people. Danswer provides a Chat interface and plugs into any LLM of your choice. Danswer can be deployed anywhere and for any scale - on a laptop, on-premise, or to cloud. Since you own the deployment, your user data and chats are fully in your own control. Danswer is MIT licensed and designed to be modular and easily extensible. The system also comes fully ready for production usage with user authentication, role management (admin/basic users), chat persistence, and a UI for configuring Personas (AI Assistants) and their Prompts.

Danswer also serves as a Unified Search across all common workplace tools such as Slack, Google Drive, Confluence, etc. By combining LLMs and team specific knowledge, Danswer becomes a subject matter expert for the team. Imagine ChatGPT if it had access to your team's unique knowledge! It enables questions such as "A customer wants feature X, is this already supported?" or "Where's the pull request for feature Y?"

Usage

Danswer Web App:

DanswerShortChatDemo.mp4

Or, plug Danswer into your existing Slack workflows (more integrations to come 😁):

danswer-slack.mp4

For more details on the Admin UI to manage connectors and users, check out our Full Video Demo!

Deployment

Danswer can easily be run locally (even on a laptop) or deployed on a virtual machine with a single docker compose command. Checkout our docs to learn more.

We also have built-in support for deployment on Kubernetes. Files for that can be found here.

💃 Main Features

Chat UI with the ability to select documents to chat with.
Create custom AI Assistants with different prompts and backing knowledge sets.
Connect Danswer with LLM of your choice (self-host for a fully airgapped solution).
Document Search + AI Answers for natural language queries.
Connectors to all common workplace tools like Google Drive, Confluence, Slack, etc.
Slack integration to get answers and search results directly in Slack.

🚧 Roadmap

Chat/Prompt sharing with specific teammates and user groups.
Multi-Model model support, chat with images, video etc.
Choosing between LLMs and parameters during chat session.
Tool calling and agent configurations options.
Organizational understanding and ability to locate and suggest experts from your team.

Other Noteable Benefits of Danswer

User Authentication with document level access management.
Best in class Hybrid Search across all sources (BM-25 + prefix aware embedding models).
Admin Dashboard to configure connectors, document-sets, access, etc.
Custom deep learning models + learn from user feedback.
Easy deployment and ability to host Danswer anywhere of your choosing.

🔌 Connectors

Efficiently pulls the latest changes from:

Slack
GitHub
Google Drive
Confluence
Jira
Zendesk
Gmail
Notion
Gong
Slab
Linear
Productboard
Guru
Bookstack
Document360
Sharepoint
Hubspot
Local Files
Websites
And more ...

💡 Contributing

Looking to contribute? Please check out the Contribution Guide for more details.

danswer's People

Contributors

Stargazers

Watchers

Forkers

munchbakerydev canferman vgees fujinyu bigbrother666sh jnhu76 ai-jie01 macbie yanniszhou gupta-anubhav12 thesteve0 iwillcodeu mtcto sinjoroking jackleehal sd5884703 kaviarasusakthivadivel sarkarda ksivam pipboyguy ankitmaloo floriank talglobus marksweissma intermag0 overad pkabra sikkgit ivanskodje pablotoribio-beta ssddanbrown singularity9 northrift jesusoctavioas devdoshi millerhooks stephenxxxx send2cloud jamesthesnake radekg longjohncoder dosycorps suryatmodulus riacycle developerisnow c0debrain jjhw abhilashgundlapally nicholasbulka ceifa tonywhite11 admariner dinooojay josegron dorucioclea bmoe872 touristshaun hongping-zh dst1213 techventurebuilder hhy5277 benatsb jiangrongbo kyrolabs petrevane georgeraaa bronwin87 rock2585783isabel bozh123 phaveryan eltociear jakeunsted arapahoe yett5527asamuel omarma04912cpherson iongpt dinodefend samueldoherty9958908 valerakop reneerom5793eo admondguo bartonbet448091ty 9527meredithbrowne beaushae augu029432swarner lynn8035377barnard vergessen123 jaytoday susannat66864uttle aubreyreyn3869olds douglasbo4011yle pushpeshkarki iamiskender mevakab furkanenes19 enesalp19 frodoggg davidcsteves kcwhite jmeng1

danswer's Issues

Installation issue

Things seem to be progressing with the massive download and then I get this error.
https://share.getcloudapp.com/v1uWgO19

Feature Request: Bitbucket/Stash Connector

For companies working with the Atlassian tool suite we need a Bitbucket/Stash connector similar to the github connector.

[FEATURE REQUEST]Document ingestion/chunking settings

Please add an "advanced settings" section on the document upload/ingestion page exposing the chunking options before tokenizing.

More important among them

Chunk size (in characters) New models with better indexing capabilities are appearing and it's very possible we'll get some upgrade on ada 002 witha higher token limit.

Chunk overlap: Having a small overlapping text between the end of one chunk and the start of the other improves vector DB search results.

Metadata edit: columns per chunk: chunk number, document page, document name and, if possible, an optional field to add a some additional info (alternative document address, for example).

Dropbox Paper connector

We use Dropbox Paper as our main store of internal documentation, so this would be super great. Thanks for your awesome work so far!

Feature Request: rocket.chat bot

Issue: Not all enterprises use Slack.

Proposal: Write a bot which talks to the rocket.chat API and answers questions. Mostly a copy of the slack bot with some changed API calls. Refactoring out common code would be a bonus.

How to ingest more than 1Mb of files at once? API ingestion available?

I have thousands of files to be ingested, but currently it will not process any batch bigger than 1Mb.
How can I increase this limit? Or is there an API way to ingest files?

How to use proxy to valid OenAI api eky

I need to use proxy to valid OpenAI api key.
I modify docker_compose.dev.yml as follow:
environment: - http_proxy=http://XXXX - https_proxy=http://XXXX
above operation is operated on every image.
However, I can open the web, but the backend still report error.
How can i set internet proxy to valid openai api key correctly.

Development Enviroment File not Respected

I'd like to disable telemetry for Qdrant:

# Very basic .env file with options that are easy to change. Allows you to deploy everything on a single machine.
# .env is not required unless you wish to change defaults

# Choose between "openai-chat-completion" and "openai-completion"
INTERNAL_MODEL_VERSION=openai-chat-completion

# Use a valid model for the choice above, consult https://platform.openai.com/docs/models/model-endpoint-compatibility
OPENAPI_MODEL_VERSION=gpt-3.5-turbo

# Enable or disable telemetry
QDRANT__TELEMETRY_DISABLED=true

However docker-compose logs still show telemetry being sent:

danswer-stack-vector_db-1      | [2023-06-29T08:46:31.439Z INFO  storage::content_manager::consensus::persistent] Loading raft state from ./storage/raft_state
danswer-stack-vector_db-1      | [2023-06-29T08:46:31.444Z INFO  qdrant] Distributed mode disabled
danswer-stack-vector_db-1      | [2023-06-29T08:46:31.444Z INFO  qdrant] Telemetry reporting enabled, id: 8f118170-18b1-4ed0-b5ae-d4d4d5438b29
danswer-stack-vector_db-1      | [2023-06-29T08:46:31.444Z INFO  qdrant::tonic] Qdrant gRPC listening on 6334
danswer-stack-vector_db-1      | [2023-06-29T08:46:31.444Z INFO  actix_server::builder] Starting 23 workers
danswer-stack-vector_db-1      | [2023-06-29T08:46:31.444Z INFO  actix_server::server] Actix runtime found;

Docker Compose version v2.18.1

Client: Docker Engine - Community
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:52:14 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:52:14 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0```

Hide admin panel

As I'm testing the stack, saw that everyone seems to be an admin. As I have played around, there's something called user connectors. But there seems to be no way to add users (or connectors). Am i missing something on the documentation as well?

Notion Connector

Notion gained tons of popularity as a team wiki tool. Would be great to have a Notion connecter a la Notion Loader

Gmail support

I'm doing a lot of tech support by email and I have a lot of recurrent questions. Is is possible to integrate gmail as well?
Thanks!

Microsoft Teams Connector

For companies that are on Microsoft, you really need a connector for Teams like for Slack.

Feature Request: Fine grain controls over slack integration

Some controls that could be added to the admin panel

Channel to index
- Would allow for only indexing messages from a specific channel.
Index only channels that the bot has been invited to.
- If the bot has been invited to a specific channel, it can only index that channel
Filtering for slack bot so it only searches through specific indexes vs all indexes when asked a question.
- Use-case example: If we index a github repo, we don't necessarily want github responses showing up to a marketing question.

Provide a demo instance / dogfood danswer website search

I'd love to give danswer a spin, and maybe adopt it at work. But I'd prefer to test with some neutral data before committing and connecting work stuff. It would be awesome if you had your own public instance fed with this GitHub repo and your public Slack (and whatever else compatible sources you might already have). This way, potential users and customers could try danswer out right on your homepage, cross-check with the sources (e.g. by joining Slack and comparing chats to search results), and be more easily convinced to jump right in.

Is this something you'd consider doing?

Setup GitHub Actions to build Docker images

Please setup GitHub Actions to build docker images so we don't have to build them for each update.
Thanks

EDIT: Would love a AIO linuxserver.io-like image as well.

Question: what data is shared with external dependencies (OpenAI)

I can't find this information in readme's. OpenAI API key is required for the Danswer to be able to generate answers, which begs the question: which data from connected services are shared with OpenAI?

How to modify the deployment on the server to a port that I can use

Where should I modify the port? I see many places that need to be used

GitLab Connector

Developer Guide for contributing connectors

Hi,

Perhaps you could put together a Developer Guide for contributing connectors and add it to https://docs.danswer.dev/connectors/overview?

I just opened a couple of connector issues myself and I see there were a couple of other ones before me. If it's not too difficult this is where the community and open source could really shine.

I think this approach has worked really well for projects like Ruff for example.

All the best!

无法部署，求大佬看看

我执行命令，但是告诉我没有 -f 这个参数，这是啥意思

我的docker版本是：Docker version 20.10.21, build 20.10.21-0ubuntu1~22.04.3

Local File PDF Support

.txt文件现在使用太局限了，可以支持PDF吗

GPT4ALL 1.0.5 not available on Macbook Pro M1

Looks like there is no Arm64 build of GPT4ALL after 0.1.7.

When I downgrade to 0.1.7 it builds, but it's probably going to generate a few issues here - so here's an issue for tracking.

Solution for now:

Downgrade gpt4all in backend/requirements/default.txt to 0.1.7

Further issue causes the stack not to start when using 0.1.7 - perhaps we could switch off GPT4ALL in backend/danswer/configs/app_configs.py ?

danswer-stack-api_server-1     | Traceback (most recent call last):
danswer-stack-api_server-1     |   File "/usr/local/bin/uvicorn", line 8, in <module>
danswer-stack-api_server-1     |     sys.exit(main())
danswer-stack-api_server-1     |              ^^^^^^
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
danswer-stack-api_server-1     |     return self.main(*args, **kwargs)
danswer-stack-api_server-1     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
danswer-stack-api_server-1     |     rv = self.invoke(ctx)
danswer-stack-api_server-1     |          ^^^^^^^^^^^^^^^^
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
danswer-stack-api_server-1     |     return ctx.invoke(self.callback, **ctx.params)
danswer-stack-api_server-1     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
danswer-stack-api_server-1     |     return __callback(*args, **kwargs)
danswer-stack-api_server-1     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/site-packages/uvicorn/main.py", line 403, in main
danswer-stack-api_server-1     |     run(
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/site-packages/uvicorn/main.py", line 568, in run
danswer-stack-api_server-1     |     server.run()
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/site-packages/uvicorn/server.py", line 59, in run
danswer-stack-api_server-1     |     return asyncio.run(self.serve(sockets=sockets))
danswer-stack-api_server-1     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
danswer-stack-api_server-1     |     return runner.run(main)
danswer-stack-api_server-1     |            ^^^^^^^^^^^^^^^^
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
danswer-stack-api_server-1     |     return self._loop.run_until_complete(task)
danswer-stack-api_server-1     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
danswer-stack-api_server-1     |     return future.result()
danswer-stack-api_server-1     |            ^^^^^^^^^^^^^^^
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/site-packages/uvicorn/server.py", line 66, in serve
danswer-stack-api_server-1     |     config.load()
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/site-packages/uvicorn/config.py", line 471, in load
danswer-stack-api_server-1     |     self.loaded_app = import_from_string(self.app)
danswer-stack-api_server-1     |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/site-packages/uvicorn/importer.py", line 21, in import_from_string
danswer-stack-api_server-1     |     module = importlib.import_module(module_str)
danswer-stack-api_server-1     |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
danswer-stack-api_server-1     |     return _bootstrap._gcd_import(name[level:], package, level)
danswer-stack-api_server-1     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1     |   File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
danswer-stack-api_server-1     |   File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
danswer-stack-api_server-1     |   File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
danswer-stack-api_server-1     |   File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
danswer-stack-api_server-1     |   File "<frozen importlib._bootstrap_external>", line 940, in exec_module
danswer-stack-api_server-1     |   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
danswer-stack-api_server-1     |   File "/app/danswer/main.py", line 26, in <module>
danswer-stack-api_server-1     |     from danswer.direct_qa import get_default_backend_qa_model
danswer-stack-api_server-1     |   File "/app/danswer/direct_qa/__init__.py", line 6, in <module>
danswer-stack-api_server-1     |     from danswer.direct_qa.gpt_4_all import GPT4AllChatCompletionQA
danswer-stack-api_server-1     |   File "/app/danswer/direct_qa/gpt_4_all.py", line 18, in <module>
danswer-stack-api_server-1     |     from gpt4all import GPT4All  # type:ignore
danswer-stack-api_server-1     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/site-packages/gpt4all/__init__.py", line 1, in <module>
danswer-stack-api_server-1     |     from . import gpt4all # noqa
danswer-stack-api_server-1     |     ^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/site-packages/gpt4all/gpt4all.py", line 6, in <module>
danswer-stack-api_server-1     |     from . import pyllmodel
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/site-packages/gpt4all/pyllmodel.py", line 39, in <module>
danswer-stack-api_server-1     |     llmodel, llama = load_llmodel_library()
danswer-stack-api_server-1     |                      ^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/site-packages/gpt4all/pyllmodel.py", line 32, in load_llmodel_library
danswer-stack-api_server-1     |     llama_lib = ctypes.CDLL(llama_dir, mode=ctypes.RTLD_GLOBAL)
danswer-stack-api_server-1     |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1     |   File "/usr/local/lib/python3.11/ctypes/__init__.py", line 376, in __init__
danswer-stack-api_server-1     |     self._handle = _dlopen(self._name, mode)
danswer-stack-api_server-1     |                    ^^^^^^^^^^^^^^^^^^^^^^^^^
danswer-stack-api_server-1     | OSError: /usr/local/lib/python3.11/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libllama.so: cannot open shared object file: No such file or directory
danswer-stack-api_server-1 exited with code 0

Add Paperless-ngx Connector

I'd love to see the addition of a Connector for the Paperless-ngx open-source document management system.

Google Drive PDF Parsing Issue

I started to watch the /var/log/update.log on danswer/danswer-background and noticed the following exceptions raised:

07/15/2023 04:23:59 PM            update.py  95 : Starting new indexing attempt for connector: 'GoogleDriveConnector', with config: '{}', and with credentials: '[6]'
07/15/2023 04:24:00 PM    connector_auth.py  49 : Refreshed Google Drive tokens.
07/15/2023 04:24:01 PM         connector.py  75 : Parseable Documents in batch: ['2023 Consolidated SS + Flick - Financial Model', 'XX Cash Flow 2023', '351618_UASD3PGZ (3).pdf', '351618_UASD3PGZ (2).pdf', '351618_UASD3PGZ (1).pdf', '351618_UASD3PGZ.pdf', 'XX_SPL001_waybill_UASD3PGZ_A5 (7).pdf', 'XX_SPL001_waybill_UASD3PGZ_A5 (6).pdf', 'XX_SPL001_waybill_UASD3PGZ_A5 (5).pdf', 'XX_SPL001_waybill_UASD3PGZ_A5 (4).pdf', 'XX_SPL001_waybill_UASD3PGZ_A5 (3).pdf']
07/15/2023 04:24:17 PM             store.py 159 : Indexed 13 chunks into Typesense collection 'danswer_index', number failed: 0
07/15/2023 04:24:22 PM            timing.py  29 : encode_chunks took 5.2996666431427 seconds
07/15/2023 04:24:22 PM          indexing.py 167 : Indexed 13 chunks into Qdrant collection 'danswer_index', status: UpdateStatus.COMPLETED
07/15/2023 04:24:22 PM indexing_pipeline.py  44 : Indexed 0 new documents
07/15/2023 04:24:23 PM         connector.py  75 : Parseable Documents in batch: ['XX_SPL001_waybill_UASD3PGZ_A5 (2).pdf', 'XX_SPL001_waybill_UASD3PGZ_A5 (1).pdf', 'XX_SPL001_waybill_UASD3PGZ_A5.pdf', 'Order_351619_waybill (16).pdf', 'Order_351619_waybill (15).pdf', 'Order_351619_waybill (14).pdf', 'Order_351619_waybill (13).pdf', 'Order_351619_waybill (12).pdf', 'Order_351619_waybill (11).pdf', 'Order_351619_waybill (10).pdf', 'Order_351619_waybill (9).pdf', 'Order_351619_waybill (8).pdf']
07/15/2023 04:24:28 PM            update.py 176 : Indexing job with id 96 failed due to EOF marker not found
Traceback (most recent call last):
  File "/app/danswer/background/update.py", line 155, in run_indexing_jobs
    for doc_batch in doc_batch_generator:
  File "/app/danswer/connectors/google_drive/connector.py", line 165, in poll_source
    yield from self._fetch_docs_from_drive(start, end)
  File "/app/danswer/connectors/google_drive/connector.py", line 144, in _fetch_docs_from_drive
    text_contents = extract_text(file, service)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/connectors/google_drive/connector.py", line 101, in extract_text
    pdf_reader = PdfReader(pdf_stream)
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/PyPDF2/_reader.py", line 319, in __init__
    self.read(stream)
  File "/usr/local/lib/python3.11/site-packages/PyPDF2/_reader.py", line 1415, in read
    self._find_eof_marker(stream)
  File "/usr/local/lib/python3.11/site-packages/PyPDF2/_reader.py", line 1471, in _find_eof_marker
    raise PdfReadError("EOF marker not found")
PyPDF2.errors.PdfReadError: EOF marker not found

答非所问？？为啥我用出来和演示的不一样

Feature Request: Delete Indexed Files

Issue:
There's currently no way to delete specific indexed files without removing the entire database. As it stands, if there's an error or if a file becomes irrelevant, the only solution is to clear the whole database, which isn't ideal.

Proposed Solution:
The ability to delete individual indexed files. This would mean adding a delete option in the interface that would remove the file from the system's database, Vector DB (Qdrant), and the search engine (Typesense).

install fail

root@VM-8-13-centos docker_compose]# docker compose -f docker-compose.dev.yml -p danswer-stack up -d --pull always --force-recreate
unknown shorthand flag: 'f' in -f
See 'docker --help'.

Usage: docker [OPTIONS] COMMAND

A self-sufficient runtime for containers

Options:
--config string Location of client config files (default "/root/.docker")
-c, --context string Name of the context to use to connect to the daemon (overrides DOCKER_HOST env var and default context set with "docker context use")
-D, --debug Enable debug mode
-H, --host list Daemon socket(s) to connect to
-l, --log-level string Set the logging level ("debug"|"info"|"warn"|"error"|"fatal") (default "info")
--tls Use TLS; implied by --tlsverify
--tlscacert string Trust certs signed only by this CA (default "/root/.docker/ca.pem")
--tlscert string Path to TLS certificate file (default "/root/.docker/cert.pem")
--tlskey string Path to TLS key file (default "/root/.docker/key.pem")
--tlsverify Use TLS and verify the remote
-v, --version Print version information and quit

Management Commands:
app* Docker App (Docker Inc., v0.9.1-beta3)
builder Manage builds
buildx* Build with BuildKit (Docker Inc., v0.5.1-docker)
config Manage Docker configs
container Manage containers
context Manage contexts
image Manage images
manifest Manage Docker image manifests and manifest lists
network Manage networks
node Manage Swarm nodes
plugin Manage plugins
secret Manage Docker secrets
service Manage services
stack Manage Docker stacks
swarm Manage Swarm
system Manage Docker
trust Manage trust on Docker images
volume Manage volumes

Commands:
attach Attach local standard input, output, and error streams to a running container
build Build an image from a Dockerfile
commit Create a new image from a container's changes
cp Copy files/folders between a container and the local filesystem
create Create a new container
diff Inspect changes to files or directories on a container's filesystem
events Get real time events from the server
exec Run a command in a running container
export Export a container's filesystem as a tar archive
history Show the history of an image
images List images
import Import the contents from a tarball to create a filesystem image
info Display system-wide information
inspect Return low-level information on Docker objects
kill Kill one or more running containers
load Load an image from a tar archive or STDIN
login Log in to a Docker registry
logout Log out from a Docker registry
logs Fetch the logs of a container
pause Pause all processes within one or more containers
port List port mappings or a specific mapping for the container
ps List containers
pull Pull an image or a repository from a registry
push Push an image or a repository to a registry
rename Rename a container
restart Restart one or more containers
rm Remove one or more containers
rmi Remove one or more images
run Run a command in a new container
save Save one or more images to a tar archive (streamed to STDOUT by default)
search Search the Docker Hub for images
start Start one or more stopped containers
stats Display a live stream of container(s) resource usage statistics
stop Stop one or more running containers
tag Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE
top Display the running processes of a container
unpause Unpause all processes within one or more containers
update Update configuration of one or more containers
version Show the Docker version information
wait Block until one or more containers stop, then print their exit codes

Run 'docker COMMAND --help' for more information on a command.

To get more help with docker, check out our guides at https://docs.docker.com/go/guides/

Google Drive: Scope files to a specific root folder in My Drive / Team Drive

It would be nice to be able to give the Google Drive connector a specific folder (whether under My Drive or a Shared Drive), as I'm sure there are certain folders in most cases that should not be indexed.

images can't be displayed properly

Google Drive: Can't manage connection in error state

I'm getting this in the UI. Seems due to the folder scoping being added? After removing volumes and starting again, it worked fine.

Chatwoot and Whatsapp connector

Chatwoot already connects to various channels, like whatsapp, and has integrations with Chatbots like Dialogflow, Rasa and could be useful for automating responses on such channels, by integrating with the Chatwoot API.

Docker: Can't build the project

Latest main code isn’t building so healthy for me, even after removing all volumes & containers. Not sure if it's just a me problem.

=> CACHED [web_server builder 4/4] RUN npm run build                                                                                                                                                        0.0s
 => CANCELED [web_server runner 4/6] COPY --from=builder /app/public ./public                                                                                                                                0.0s
 => ERROR [web_server runner 5/6] COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./                                                                                                         0.0s
 => ERROR [web_server runner 6/6] COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static                                                                                                 0.0s
------
 > [web_server runner 5/6] COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./:
------
------
 > [web_server runner 6/6] COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static:
------
failed to solve: failed to compute cache key: failed to calculate checksum of ref 48a78c11-0415-4a0b-84fe-5522179bfa68::vvvkeyv4m03y3pjgwjywf5yn8: "/app/.next/static": not found

Fixed it by adding the below line to web/Dockerfile

COPY --from=builder --chown=nextjs:nodejs /app/.next ./.next

Slack Connector issue with Conversations API and 'is_archived'

The Slack API doesn't like 'is_archived' conversations and we should filter them out.

https://api.slack.com/methods/conversations.join

Slack API doc:

Common error response
Typical error response if the conversation is archived and cannot be joined

{
    "ok": false,
    "error": "is_archived"
}

07/18/2023 03:52:12 PM         connector.py 173 : Pulled 22300 documents from slack channel alerts-trials
07/18/2023 03:52:14 PM            update.py 176 : Indexing job with id 240 failed due to The request to the Slack API failed. (url: https://www.slack.com/api/conversations.join)
The server responded with: {'ok': False, 'error': 'is_archived'}
Traceback (most recent call last):
  File "/app/danswer/background/update.py", line 155, in run_indexing_jobs
    for doc_batch in doc_batch_generator:
  File "/app/danswer/connectors/slack/connector.py", line 293, in poll_source
    for document in get_all_docs(
  File "/app/danswer/connectors/slack/connector.py", line 150, in get_all_docs
    for message_batch in channel_message_batches:
  File "/app/danswer/connectors/slack/connector.py", line 64, in get_channel_messages
    client.conversations_join(
  File "/usr/local/lib/python3.11/site-packages/slack_sdk/web/client.py", line 2453, in conversations_join
    return self.api_call("conversations.join", params=kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/slack_sdk/web/base_client.py", line 156, in api_call
    return self._sync_send(api_url=api_url, req_args=req_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/slack_sdk/web/base_client.py", line 187, in _sync_send
    return self._urllib_api_call(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/slack_sdk/web/base_client.py", line 317, in _urllib_api_call
    ).validate()
      ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/slack_sdk/web/slack_response.py", line 199, in validate
    raise e.SlackApiError(message=msg, response=self)
slack_sdk.errors.SlackApiError: The request to the Slack API failed. (url: https://www.slack.com/api/conversations.join)
The server responded with: {'ok': False, 'error': 'is_archived'}

confluence space cannot be imported

error `GPT hurt itself in its confusion :(` - AI search doesn't work

I've installed on localhost
Added

github repo https://github.com/danswer-ai/danswer,
2 websites https://docs.danswer.dev/introduction + https://glarity.app/

and try to search "what is danswer" with AI search

Expected output: it use vector search of github repo readme.md and answer but unfortunately no(

error is GPT hurt itself in its confusion :(

of course I've added workable openai api key.

logs:

INFO:     100.43.95.255:57824 - "POST /stream-direct-qa HTTP/1.1" 200 OK
07/14/2023 04:55:05 AM    search_backend.py 107 : Received QA query: What is danswer ?
07/14/2023 04:55:05 AM            timing.py  29 : query_intent took 0.09182906150817871 seconds
07/14/2023 04:55:05 AM            timing.py  29 : semantic_retrieval took 0.03994250297546387 seconds
07/14/2023 04:55:06 AM            timing.py  29 : semantic_reranking took 0.40392088890075684 seconds
07/14/2023 04:55:06 AM   semantic_search.py  86 : Top links from semantic search: https://docs.danswer.dev/introduction, https://docs.danswer.dev/introduction, https://glarity.app/, https://glarity.app/
07/14/2023 04:55:06 AM            timing.py  29 : retrieve_ranked_documents took 0.4441695213317871 seconds
INFO:     100.43.95.255:58934 - "GET /users/me HTTP/1.1" 401 Unauthorized
INFO:     100.43.95.255:58940 - "GET /manage/connector HTTP/1.1" 401 Unauthorized
INFO:     100.43.95.255:58940 - "GET /auth/google/authorize HTTP/1.1" 200 OK
INFO:     100.43.95.255:58934 - "GET /users/me HTTP/1.1" 401 Unauthorized
INFO:     100.43.95.255:51906 - "POST /stream-direct-qa HTTP/1.1" 200 OK
07/14/2023 04:55:43 AM    search_backend.py 107 : Received QA query: danswer
07/14/2023 04:55:43 AM            timing.py  29 : query_intent took 0.12076759338378906 seconds
07/14/2023 04:55:44 AM            timing.py  29 : retrieve_keyword_documents took 1.2382943630218506 seconds
INFO:     100.43.95.255:35708 - "GET /health HTTP/1.1" 200 OK
INFO:     100.43.95.255:35720 - "POST /stream-direct-qa HTTP/1.1" 200 OK
07/14/2023 04:57:04 AM    search_backend.py 107 : Received QA query: What is danswer?
07/14/2023 04:57:04 AM            timing.py  29 : query_intent took 0.11964130401611328 seconds
07/14/2023 04:57:04 AM            timing.py  29 : semantic_retrieval took 0.04374074935913086 seconds
07/14/2023 04:57:04 AM            timing.py  29 : semantic_reranking took 0.1724071502685547 seconds
07/14/2023 04:57:04 AM   semantic_search.py  86 : Top links from semantic search: https://docs.danswer.dev/introduction, https://docs.danswer.dev/introduction, https://glarity.app/
07/14/2023 04:57:04 AM            timing.py  29 : retrieve_ranked_documents took 0.21661090850830078 seconds
INFO:     100.43.95.255:56106 - "GET /health HTTP/1.1" 200 OK
INFO:     100.43.95.255:56122 - "GET /manage/admin/connector/indexing-status HTTP/1.1" 200 OK
INFO:     100.43.95.255:56144 - "GET /health HTTP/1.1" 200 OK
INFO:     100.43.95.255:56138 - "GET /manage/credential HTTP/1.1" 200 OK
INFO:     100.43.95.255:58996 - "GET /health HTTP/1.1" 200 OK
INFO:     100.43.95.255:59006 - "GET /health HTTP/1.1" 200 OK
INFO:     100.43.95.255:59032 - "GET /manage/credential HTTP/1.1" 200 OK
INFO:     100.43.95.255:59020 - "GET /manage/admin/connector/indexing-status HTTP/1.1" 200 OK
INFO:     100.43.95.255:33780 - "GET /health HTTP/1.1" 200 OK
INFO:     100.43.95.255:35170 - "GET /health HTTP/1.1" 200 OK

Allow contribution of new Connector implementations

Are there any plans to allow contributions of new Connectors, or some other way of contributing to Connector availability?

Outlook 365 Connector

Similar to #137

Not able to install using docker compose

Stack Trace:

1223.9 Collecting nvidia-nccl-cu11==2.14.3 (from torch>=1.6.0->sentence-transformers==2.2.2->-r /tmp/requirements.txt (line 32))
1223.9 Downloading nvidia_nccl_cu11-2.14.3-py3-none-manylinux1_x86_64.whl (177.1 MB)
1249.8 ━━━━━━━━━━━━━╸ 60.3/177.1 MB 4.8 MB/s eta 0:00:25
1249.8 ERROR: Exception:
1249.8 Traceback (most recent call last):
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_vendor/urllib3/response.py", line 438, in _error_catcher
1249.8 yield
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_vendor/urllib3/response.py", line 561, in read
1249.8 data = self._fp_read(amt) if not fp_closed else b""
1249.8 ^^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_vendor/urllib3/response.py", line 527, in _fp_read
1249.8 return self._fp.read(amt) if amt is not None else self._fp.read()
1249.8 ^^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/http/client.py", line 466, in read
1249.8 s = self.fp.read(amt)
1249.8 ^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/socket.py", line 706, in readinto
1249.8 return self._sock.recv_into(b)
1249.8 ^^^^^^^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/ssl.py", line 1278, in recv_into
1249.8 return self.read(nbytes, buffer)
1249.8 ^^^^^^^^^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/ssl.py", line 1134, in read
1249.8 return self._sslobj.read(len, buffer)
1249.8 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1249.8 TimeoutError: The read operation timed out
1249.8
1249.8 During handling of the above exception, another exception occurred:
1249.8
1249.8 Traceback (most recent call last):
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/cli/base_command.py", line 169, in exc_logging_wrapper
1249.8 status = run_func(*args)
1249.8 ^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/cli/req_command.py", line 248, in wrapper
1249.8 return func(self, options, args)
1249.8 ^^^^^^^^^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/commands/install.py", line 377, in run
1249.8 requirement_set = resolver.resolve(
1249.8 ^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 92, in resolve
1249.8 result = self._result = resolver.resolve(
1249.8 ^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
1249.8 state = resolution.resolve(requirements, max_rounds=max_rounds)
1249.8 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_vendor/resolvelib/resolvers.py", line 427, in resolve
1249.8 failure_causes = self._attempt_to_pin_criterion(name)
1249.8 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_vendor/resolvelib/resolvers.py", line 239, in _attempt_to_pin_criterion
1249.8 criteria = self._get_updated_criteria(candidate)
1249.8 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_vendor/resolvelib/resolvers.py", line 230, in _get_updated_criteria
1249.8 self._add_to_criteria(criteria, requirement, parent=candidate)
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
1249.8 if not criterion.candidates:
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_vendor/resolvelib/structs.py", line 156, in bool
1249.8 return bool(self._sequence)
1249.8 ^^^^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in bool
1249.8 return any(self)
1249.8 ^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in
1249.8 return (c for c in iterator if id(c) not in self._incompatible_ids)
1249.8 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
1249.8 candidate = func()
1249.8 ^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 206, in _make_candidate_from_link
1249.8 self._link_candidate_cache[link] = LinkCandidate(
1249.8 ^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 293, in init
1249.8 super().init(
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 156, in init
1249.8 self.dist = self._prepare()
1249.8 ^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 225, in _prepare
1249.8 dist = self._prepare_distribution()
1249.8 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 304, in _prepare_distribution
1249.8 return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
1249.8 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/operations/prepare.py", line 516, in prepare_linked_requirement
1249.8 return self._prepare_linked_requirement(req, parallel_builds)
1249.8 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/operations/prepare.py", line 587, in _prepare_linked_requirement
1249.8 local_file = unpack_url(
1249.8 ^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/operations/prepare.py", line 166, in unpack_url
1249.8 file = get_http_url(
1249.8 ^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/operations/prepare.py", line 107, in get_http_url
1249.8 from_path, content_type = download(link, temp_dir.path)
1249.8 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/network/download.py", line 147, in call
1249.8 for chunk in chunks:
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/cli/progress_bars.py", line 53, in _rich_progress_bar
1249.8 for chunk in iterable:
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_internal/network/utils.py", line 63, in response_chunks
1249.8 for chunk in response.raw.stream(
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_vendor/urllib3/response.py", line 622, in stream
1249.8 data = self.read(amt=amt, decode_content=decode_content)
1249.8 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_vendor/urllib3/response.py", line 560, in read
1249.8 with self._error_catcher():
1249.8 File "/usr/local/lib/python3.11/contextlib.py", line 155, in exit
1249.8 self.gen.throw(typ, value, traceback)
1249.8 File "/usr/local/lib/python3.11/site-packages/pip/_vendor/urllib3/response.py", line 443, in _error_catcher
1249.8 raise ReadTimeoutError(self._pool, None, "Read timed out.")
1249.8 pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.

failed to solve: process "/bin/sh -c pip install --no-cache-dir --upgrade -r /tmp/requirements.txt" did not complete successfully: exit code: 2

Zulip connector

I could really use a Zulip connector for this. I wonder if anyone else would like one?

Airtable Connector

Feature Request: Skip Login Button

To enable to SSO (single sign on) we would like an option to direct the user to the login provider (i.e. google or another OIDC provider) instead of showing a login button. First that button provides little to no value and second it breaks single sign on. Combined with #225 this would enable a relogin without a user noticing (in most cases).

Adding metrics to the backend

It might be a good idea to add some metrics about the internals of Danswer to make the system observable. Since this is already Dockerized and has a foot in Kubernetes, prometheus might be a good choice here

Support for local/opensource Models

Dose it support private deployment with offline environment?

Quickstart docker compose hangs on relational db

The quickstart instructions recommend two different approaches:
docker compose -f docker-compose.dev.yml -p danswer-stack up -d --pull always --force-recreate
docker compose -f docker-compose.dev.yml -p danswer-stack up -d --build --force-recreate

These work almost all the way, but get stuck while trying to start danswer-stack-relational_db-1

 => CACHED [web_server runner 5/6] COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./                                                                                                                                                                              0.0s
 => CACHED [web_server runner 6/6] COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static                                                                                                                                                                      0.0s
 => [web_server] exporting to image                                                                                                                                                                                                                                                0.0s
 => => exporting layers                                                                                                                                                                                                                                                            0.0s
 => => writing image sha256:1bf5ef8a1d68f13ee6d7431887c9e38634e0db6ed2a0b0a7df69f9f47c9c63bc                                                                                                                                                                                       0.0s
 => => naming to docker.io/danswer/danswer-web-server:latest                                                                                                                                                                                                                       0.0s
[+] Running 6/7
 ⠿ Container danswer-stack-relational_db-1  Starting                                                                                                                                                                                                                              20.6s
 ✔ Container danswer-stack-search_engine-1  Started                                                                                                                                                                                                                               10.8s
 ✔ Container danswer-stack-vector_db-1      Started                                                                                                                                                                                                                               10.7s
 ✔ Container danswer-stack-background-1     Recreated                                                                                                                                                                                                                              0.1s
 ✔ Container danswer-stack-api_server-1     Recreated                                                                                                                                                                                                                              0.2s
 ✔ Container danswer-stack-web_server-1     Recreated                                                                                                                                                                                                                              0.1s
 ✔ Container danswer-stack-nginx-1          Recreated                                                                                                                                                                                                                              0.1s

There is no printed error, but the container never starts correctly.

Docker version:
Docker version 24.0.2, build cb74dfc

Azure OpenAI integration

From what I can tell all this tool does is pull relevant chunks of text from, more often than not, proprietary data and sends it to openai's completion api to get a coherent reply--

IMO it needs to integrate something like Azure OpenAI which lets you host your own version of their models in a cloud environment, so you essentially get your own private server which all the requests will be routed to.
either that or like others have pointed out, you allow for either locally hosted models or custom apis.