Giter Club home page Giter Club logo

azure-sql-db-openai's Introduction

page_type languages products urlFragment name description
sample
sql
azure-openai
azure-sql-database
azure-sql-db-openai
Vector similarity search with Azure SQL & Azure OpenAI
Use Azure OpenAI from Azure SQL database to get the vector embeddings of any choosen text, and then calculate the cosine similarity to find related topics

Vector similarity search with Azure SQL & Azure OpenAI

This example shows how to use Azure OpenAI from Azure SQL database to get the vector embeddings of any choosen text, and then calculate the cosine similarity against the Wikipedia articles (for which vector embeddings have been already calculated,) to find the articles that covers topics that are close - or similar - to the provided text.

For an introduction on text and code embeddings, check out this OpenAI article: Introducing text and code embeddings.

Azure SQL database can be used to easily and quickly perform vector similarity search. There are two options for this: a native option and a classic option.

The native option is to use the new Vector Functions, recently introduced in Azure SQL database. Vector Functions are a set of functions that can be used to perform vector operations directly in the database.

Note

Vector Functions are in Early Adopter Preview. Get access to the preview via https://aka.ms/azuresql-vector-eap-announcement

The classic option is to use the classic T-SQL to perform vector operations, with the support for columnstore indexes for getting good performances.

Important

This branch (the main branch) uses the native vector support in Azure SQL. If you want to use the classic T-SQL, switch to the classic branch.

Download and import the Wikipedia Article with Vector Embeddings

Download the wikipedia embeddings from here, unzip it and upload it (using Azure Storage Explorer for example) to an Azure Blob Storage container.

In the example the unzipped csv file vector_database_wikipedia_articles_embedded.csv is assumed to be uploaded to a blob container name playground and in a folder named wikipedia.

Once the file is uploaded, get the SAS token to allow Azure SQL database to access it. (From Azure storage Explorer, right click on the playground container and than select Get Shared Access Signature. Set the expiration date to some time in future and then click on "Create". Copy the generated query string somewhere, for example into the Notepad, as it will be needed later)

Use a client tool like Azure Data Studio to connect to an Azure SQL database and then use the ./vector-embeddings/01-import-wikipedia.sql to create the wikipedia_articles_embeddings where the uploaded CSV file will be imported.

Make sure to replace the <account> and <sas-token> placeholders with the value correct for your environment:

  • <account> is the name of the storage account where the CSV file has been uploaded
  • <sas-token> is the Share Access Signature obtained before

Run each section (each section starts with a comment) separately. At the end of the process (will take up to a couple of minutes) you will have all the CSV data imported in the wikipedia_articles_embeddings table.

Add embeddings columns to table

In the imported data, vectors are stored as JSON arrays. To take advtange of vector processing, the arrays must be saved into more compact and optimzed binary format index. Thanks to JSON_ARRAY_TO_VECTOR, turning a vector into a set of values that can be saved into a column is very easy:

alter table wikipedia_articles_embeddings
add title_vector_native varbinary(8000);

update 
    wikipedia_articles_embeddings
set 
    title_vector_native = json_array_to_vector(title_vector),

The script ./vector-embeddings/02-use-native-vectors.sql does exactly that. It takes the existing columns with vectors stored in JSON arrays and turns them into vectors saved in binary format.

Find similar articles by calculating cosine distance

Make sure to have an Azure OpenAI embeddings model deployed and make sure it is using the text-embedding-ada-002 model.

Once the Azure OpenAI model is deployed, it can be called from Azure SQL database using sp_invoke_external_rest_endpoint, to get the embedding vector for the "the foundation series by isaac asimov", text, for example, using the following code (make sure to replace the <your-api-name> and <api-key> with yout Azure OpenAI deployment):

declare @inputText nvarchar(max) = 'the foundation series by isaac asimov';
declare @retval int, @response nvarchar(max);
declare @payload nvarchar(max) = json_object('input': @inputText);
exec @retval = sp_invoke_external_rest_endpoint
    @url = 'https://<your-api-name>.openai.azure.com/openai/deployments/<deployment-id>/embeddings?api-version=2023-03-15-preview',
    @method = 'POST',
    @headers = '{"api-key":"<api-key>"}',
    @payload = @payload,
    @response = @response output;
select @response;

The vector returned in the response can extrated using json_query:

set @re = json_query(@response, '$.result.data[0].embedding')

Now is just a matter of taking the vector of the sample text and the vectors of all wikipedia articles and calculate the cosine similarity. The math can be easily expressed in T-SQL:

vector_distance('cosine', @embedding, title_vector) 

Encapsulating logic to retrieve embeddings

The described process can be wrapped into stored procedures to make it easy to re-use it. The scripts in the ./vector-embeddings/ show how to create a stored procedure to retrieve the embeddings from OpenAI:

  • 03-store-openai-credentials.sql: stores the Azure OpenAI credentials in the Azure SQL database
  • 04-create-get-embeddings-procedure.sql: create a stored procedure to encapsulate the call to OpenAI using the script.

Finding similar articles

The script 05-find-similar-articles.sql uses the created stored procedure and the process explained above to find similar articles to the provided text.

Encapsulating logic to do similarity search

To make it even easier to use, the script 06-sample-function.sql shows a sample function that can be used to find similar articles by just providing the text, as demonstrated in script 07-sample-function-usage with the following example:

declare @embedding varbinary(8000);
declare @text nvarchar(max) = N'the foundation series by isaac asimov';

exec dbo.get_embedding 'embeddings', @text, embedding output;

select top(10)
    a.id,
    a.title,
    a.url,
    vector_distance('cosine', @embedding, title_vector) cosine_distance
from
    dbo.wikipedia_articles_embeddings a
order by
    cosine_distance;

Alternative sample with Python and a local embedding model

If you don't want or can't use OpenAI to generate embeddings, you can use a local model like https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1 to generate embeddings. The Python script ./python/hybrid_search.py shows how to

Make sure to setup the database for this sample using the ./python/00-setup-database.sql script. Database can be either an Azure SQL DB or a SQL Server database.

Conclusions

Azure SQL database, has now support to perform vector operations directly in the database, making it easy to perform vector similarity search. Using vector search along with fulltext search and BM25 ranking, it is possible to build powerful search engines that can be used in a variety of scenarios.

Note

Vector Functions are in Early Adopter Preview. Get access to the preview via https://aka.ms/azuresql-vector-eap-announcement

More resources

azure-sql-db-openai's People

Contributors

7effrey89 avatar adefwebserver avatar joncoelloaccess avatar microsoft-github-operations[bot] avatar microsoftopensource avatar mwiemer-microsoft avatar yorek avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

azure-sql-db-openai's Issues

Missed Embeddings in the 03-find-similar-articles.sql (line 19) Rest API URL

Please provide us with the following information:

This issue is for a: (mark with an x)

- [X] bug report -> please search issues before submitting
- [ ] feature request
- [X] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Any log messages given by the failure

Expected/desired behavior

OS and Version?

Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)

Versions

Mention any other details that might be useful

The following line of code misses the keyword embeddings after . It is however present in the subsequent functions code etc. Its minor, I just wanted to acknowledge this good stuff and correct the error in the script
Replace
'https://.openai.azure.com/openai/deployments/?api-version=2023-03-15-preview'
with
'https://.openai.azure.com/openai/deployments//embeddings?api-version=2023-03-15-preview'


Thanks! We'll be in touch soon.

calling sp_invoke_external_rest_endpoint for a private endpoint

Not sure if this is available and not documented, or an unsupported configuration, but I'd like to be able to use sp_invoke_external_rest_endpoint against a private endpoint rather than the public endpoint for my Azure openai service.

With public endpoints disabled, and a private endpoint available, a sql server in the same vnet hits the services public endpoint and there doesn't seem to be any well documented mechanism to specify outbound routing - all network config options relate to inbound

Is there any way to achieve this?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.