Jul 27, 2025

Vector Search and Vector Index - working steps (demo)

 -- Setup Oracle 23ai 

Click Here : Oracle 23ai Database setup

Small Introduction about vector Database and Vector Search:

- Vector Database 

A vector database is a specialized type of database designed to store, index, and query data represented as high-dimensional vectors, also known as vector embeddings. These embeddings are numerical representations of various data types, such as text, images, audio, or other complex objects, generated by machine learning models like large language models (LLMs)

- Vector Search:

Vector search is a method of information retrieval that leverages these vector embeddings to find data points that are semantically similar to a given query. Instead of relying on exact keyword matches, vector search calculates the "distance" or "similarity" between the query vector and the vectors stored in the database. This allows for more intelligent and context-aware searches, enabling use cases like:

1) Semantic Search
2) Image and Audio Recognition
3) Retrieval-Augmented Generation (RAG)
4) Recommendation Systems

Vector databases and vector search are crucial components in modern AI applications, facilitating efficient and effective handling of unstructured and semi-structured data for various machine learning tasks.

Most Important part in AI is Retrieval-augmented generation (RAG). RAG is a technique that enhances the performance of large language models (LLMs) by combining them with an external knowledge base. Instead of relying solely on their pre-trained knowledge, RAG systems retrieve relevant information from external sources and incorporate it into the generation process, resulting in more accurate, informed, and up-to-date responses. 

Note: Create your own file system and directories to test the scenarios

Let us start the POC on loading LLM, Creating vector and generate test case on vector search.

Step#01 - create a directory to hold the model, download it, and unzip the model into that directory.

$ mkdir -p /u04/23ai_activity/models
$ cd /u04/23ai_activity/models
$ wget https://adwc4pm.objectstorage.us-ashburn-1.oci.customer-oci.com/p/VBRD9P8ZFWkKvnfhrWxkpPe8K03-JIoM5h_8EJyJcpE80c108fuUjg7R5L5O7mMZ/n/adwc4pm/b/OML-Resources/o/all_MiniLM_L12_v2_augmented.zip

unzip -oq all_MiniLM_L12_v2_augmented.zip

Step#02 - Connect to the database, create a test user, create a directory object pointing to the physical directory, and grant access to it for our test user.

SQL> alter session set container=FREEPDB1;
Session altered.

- create user to test scenarios ( use strong password for the user)

create user if not exists testuser1 identified by testuser1 quota unlimited on users;
grant create session, db_developer_role, create mining model to testuser1;
create or replace directory model_dir as '/u04/23ai_activity/models';
grant read, write on directory model_dir to testuser1;



Step#3 - Now load the model into the database using the DBMS_VECTOR package.

sqlplus /nolog
conn testuser1/testuser1@//localhost:1521/freepdb1

SQL>
begin
  dbms_vector.drop_onnx_model (
    model_name => 'ALL_MINILM_L12_V2',
    force => true);
  dbms_vector.load_onnx_model (
    directory  => 'model_dir',
    file_name  => 'all_MiniLM_L12_v2.onnx',
    model_name => 'ALL_MINILM_L12_V2');
end;
/
PL/SQL procedure successfully completed.


- Now  see the model information in the USER_MINING_MODELS view.

column model_name format a30
column algorithm format a10
column mining_function format a15
select model_name, algorithm, mining_function
from   user_mining_models
where  model_name = 'ALL_MINILM_L12_V2';
MODEL_NAME                     ALGORITHM  MINING_FUNCTION
------------------------------ ---------- ---------------
ALL_MINILM_L12_V2              ONNX       EMBEDDING


Step#4 - Generating Vectors (VECTOR Data Type)

Generate vectors using the VECTOR_EMBEDDING function. In the following example we generate a vector for the text "Quick test". As you can see, the resulting vector is really big considering the size of the text.

set lines 160
set long 1000000
select vector_embedding(all_minilm_l12_v2 using 'Quick test' as data) AS my_vector;
MY_VECTOR
----------------------------------------------------------------------------------------------------------------------------------------------------------------
[-3.86444256E-002,7.27762803E-002,-6.99377712E-003,-7.29618035E-003,8.81512091E-003,-6.36086613E-002,4.39666817E-003,-4.20215651E-002,-1.32307202E-001,-5.837616
05E-003,-1.3236966E-002,-1.62914731E-002,6.54898351E-003,-4.983522E-002,-1.98450536E-002,-4.69920225E-002,1.03937663E-001,-8.96753445E-002,-2.77861813E-003,4.13
947664E-002,-6.51626661E-002,-1.09901905E-001,-8.73053819E-003,2.533352E-002,-1.42030632E-002,-2.42071245E-002,1.91591978E-002,4.93748812E-003,6.30869251E-003,-
1.24127813E-001,-7.17297941E-003,3.7317384E-002,4.97635901E-002,4.52162437E-002,1.49683114E-002,-2.21795831E-002,-3.67936082E-002,-6.20233943E-004,7.16803819E-0
02,5.33913262E-003,1.92087106E-002,-9.91346017E-002,3.90679464E-002,2.22725421E-002,5.04363105E-002,1.81943253E-002,5.34031466E-002,1.44161871E-002,-1.99907795E
-002,-1.20323608E-002,-2.63888389E-002,-4.14667316E-002,6.2473774E-002,-4.68838662E-002,1.16748568E-002,-2.43180972E-002,-3.11982706E-002,-7.5750039E-003,2.2546
6359E-002,-4.17359956E-002,1.23237111E-002,4.31706831E-002,-7.83750787E-002,1.24918511E-002,5.42060807E-002,4.33742851E-002,2.52278382E-003,-1.15482137E-002,-9.
98658361E-004,-2.12613102E-002,1.00960173E-002,3.17986645E-002,-1.13146752E-002,-1.26893371E-002,2.66182758E-002,-7.50683714E-003,-3.70341949E-002,1.94851588E-0
02,-2.9213747E-002,-2.61210538E-002,2.86212545E-002,-9.15900841E-002,1.50552345E-002,-4.98168021E-002,2.29324233E-002,7.82517716E-003,4.22972552E-002,3.37974802
E-002,-4.2345725E-002,-6.32970557E-002,3.84949856E-002,-1.93851739E-002,1.96233811E-003,-3.91593436E-004,7.80334743E-003,5.63595518E-002,4.45814878E-002,-4.9701
6348E-002,1.36384079E-002,2.76547611E-001,6.3580215E-002,-1.69337653E-002,-3.25948671E-002,2.74621621E-002,-1.84809547E-002,-3.58916223E-002,3.18280957E-003,-3.


 
===========
EXAMPLE-2
===========
Step#1 - Download below csv file from hugging face

https://huggingface.co/datasets/ygorgeurts/movie-quotes/resolve/main/movie_quotes.csv?download=true -O movie_quotes.csv

Step#2 - Connect to the database and create a new table from the CSV file of movie quotes.

sqlplus /nolog
conn testuser1/testuser1@//localhost:1521/freepdb1
drop table if exists movie_quotes purge;
create table movie_quotes as
select movie_quote, movie, movie_type, movie_year
from   external (
         (
           movie_quote  varchar2(400),
           movie        varchar2(200),
           movie_type   varchar2(50),
           movie_year   number(4)
         )
         type oracle_loader
         default directory model_dir
         access parameters (
           records delimited by newline
           skip 1
           badfile model_dir
           logfile model_dir:'moview_quotes_ext_tab_%a_%p.log'
           discardfile model_dir
           fields csv with embedded terminated by ',' optionally enclosed by '"'
           missing field values are null
           (
             movie_quote char(400),
             movie,
             movie_type,
             movie_year
           )
        )
        location ('movie_quotes.csv')
        reject limit unlimited
      );
  
  
Table created.
SQL>

SQL> desc movie_quotes
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 MOVIE_QUOTE                                        VARCHAR2(400)
 MOVIE                                              VARCHAR2(200)
 MOVIE_TYPE                                         VARCHAR2(50)
 MOVIE_YEAR                                         NUMBER(4)
SQL>

Step#4 - add a new column to hold the vector data for each movie quote. we are using the new VECTOR data type.

SQL> alter table movie_quotes add (
  movie_quote_vector vector
);  
Table altered.
SQL> desc movie_quotes
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 MOVIE_QUOTE                                        VARCHAR2(400)
 MOVIE                                              VARCHAR2(200)
 MOVIE_TYPE                                         VARCHAR2(50)
 MOVIE_YEAR                                         NUMBER(4)
 MOVIE_QUOTE_VECTOR                                 VECTOR(*, *)
SQL>


Step#5 - populate the new column by generating vectors from the movie quotes.


SQL> update movie_quotes
set    movie_quote_vector = vector_embedding(ALL_MINILM_L12_V2 using movie_quote as data);
732 rows updated.
SQL> commit;
Commit complete.


Step#5 - Vector Search using VECTOR_DISTANCE

Now perform a search using the VECTOR_DISTANCE function. 
This accepts two vectors and as the name suggests, returns a distance between them. 
Since the model we used generates vectors for text data, 
we would expect the vector distance to be smaller for two vectors that are similar. 
In the following examples we create a vector from our search text and order the output 
of the query by the vector distance between our search text and the quote text.
First we ask for "Films with motivational speaking in them".


SQL> variable search_text varchar2(100);
exec :search_text := 'Films with motivational speaking in them';SQL>
PL/SQL procedure successfully completed.
SQL>
set pages 100
set linesize 200
column movie format a50
column movie_quote format a100
SELECT vector_distance(movie_quote_vector, (vector_embedding(all_minilm_l12_v2 using :search_text as data))) as distance,
       movie,
       movie_quote
FROM   movie_quotes
order by 1
fetch approximate first 5 rows only;


OUT-PUT:
  
DISTANCE MOVIE                                              MOVIE_QUOTE
---------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------
6.786E-001 Once Upon a Time in Hollywood                      That was the best acting i've ever seen in my whole life.
6.979E-001 Dead Poets Society                                 You must strive to find your own voice because the longer you wait to begin, the less likely you are
                                                               going to find it at all.
7.169E-001 The Pursuit of Happyness                           Walk that walk and go forward all the time. Don't just talk that talk, walk it and go forward. Also,
                                                               the walk didn't have to be long strides; baby steps counted too. Go forward.
7.186E-001 Joker                                              My mother always tells me to smile and put on a happy face. She told me I had a purpose to bring lau
                                                              ghter and joy to the world.
7.234E-001 Blazing Saddles                                    Men, you are about to embark on a great crusade to stamp out runaway decency in the west. Now you me
                                                              n will only be risking your lives, whilst I will be risking an almost certain Academy Award nominati
                                                              on for Best Supporting Actor.




Next we ask for "Films about war".
-------------------------------

variable search_text varchar2(100);
exec :search_text := 'Films about war';
set linesize 200
column movie format a50
column movie_quote format a100
SELECT vector_distance(movie_quote_vector, (vector_embedding(all_minilm_l12_v2 using :search_text as data))) as distance,
       movie,
       movie_quote
FROM   movie_quotes
order by 1
fetch approximate first 5 rows only;

  DISTANCE MOVIE                                              MOVIE_QUOTE
---------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------
5.682E-001 Dr. Strangelove                                    Gentlemen, you can't fight in here! This is the War Room!
6.346E-001 Blazing Saddles                                    Men, you are about to embark on a great crusade to stamp out runaway decency in the west. Now you me
                                                              n will only be risking your lives, whilst I will be risking an almost certain Academy Award nominati
                                                              on for Best Supporting Actor.
6.587E-001 Fury                                               Ideals are peaceful; history is violent.
7.243E-001 The Kill Team                                      You give me your loyalty, and I?ll guarantee that each and every one of you will have a chance to be
                                                               a warrior, to actually be a part of history.
7.253E-001 Dr. No                                             Bond. James Bond



Step#6 - (Optional) Create a Vector Index 

If we want to create a vector index, we must set the VECTOR_MEMORY_SIZE parameter for the root container. The amount of memory we need depends on the size and complexity of the data being indexed.

conn / as sysdba
SQL> show parameter vector_memory_size;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
vector_memory_size                   big integer 0

alter system set vector_memory_size = 1G scope=spfile;
shutdown immediate;
startup;


-- There are two types of vector indexes.
-- Neighbor Partition Vector Indexes
-- In-Memory Neighbor Graph Vector Index

conn testuser1/testuser1@//localhost:1521/freepdb1
drop index if exists movie_quotes_vector_idx;

-- Neighbor Partition Vector Index

create vector index movie_quotes_vector_idx on movie_quotes(movie_quote_vector) organization neighbor partitions
distance cosine
with target accuracy 95;

drop index if exists movie_quotes_vector_idx;

-- In-Memory Neighbor Graph Vector Index

create vector index movie_quotes_vector_idx on movie_quotes(movie_quote_vector) organization inmemory neighbor graph
distance cosine
with target accuracy 95;

Note:  Vectors are big. This has to be taken into account from both a storage perspective, and a memory perspective, especially when creating vector indexes.

-----
-- Search the query again
variable search_text varchar2(100);
exec :search_text := 'Films about war';
set linesize 200
column movie format a50
column movie_quote format a100
SELECT vector_distance(movie_quote_vector, (vector_embedding(all_minilm_l12_v2 using :search_text as data))) as distance,
       movie,
       movie_quote
FROM   movie_quotes
order by 1
fetch approximate first 5 rows only;

Now see the difference when you have large amount of data.

Follow my blog and share with others.


No comments:

Post a Comment

Translate >>