-- Setup Oracle 23ai
Click Here : Oracle 23ai Database setup
Small Introduction about vector Database and Vector Search:
- Vector Database
A vector database is a specialized type of database designed to store, index, and query data represented as high-dimensional vectors, also known as vector embeddings. These embeddings are numerical representations of various data types, such as text, images, audio, or other complex objects, generated by machine learning models like large language models (LLMs)
- Vector Search:
Vector search is a method of information retrieval that leverages these vector embeddings to find data points that are semantically similar to a given query. Instead of relying on exact keyword matches, vector search calculates the "distance" or "similarity" between the query vector and the vectors stored in the database. This allows for more intelligent and context-aware searches, enabling use cases like:
2) Image and Audio Recognition
3) Retrieval-Augmented Generation (RAG)
4) Recommendation Systems
Vector databases and vector search are crucial components in modern AI applications, facilitating efficient and effective handling of unstructured and semi-structured data for various machine learning tasks.
Most Important part in AI is Retrieval-augmented generation (RAG). RAG is a technique that enhances the performance of large language models (LLMs) by combining them with an external knowledge base. Instead of relying solely on their pre-trained knowledge, RAG systems retrieve relevant information from external sources and incorporate it into the generation process, resulting in more accurate, informed, and up-to-date responses.
Note: Create your own file system and directories to test the scenarios
Let us start the POC on loading LLM, Creating vector and generate test case on vector search.
$ mkdir -p /u04/23ai_activity/models
$ cd /u04/23ai_activity/models
$ wget https://adwc4pm.objectstorage.us-ashburn-1.oci.customer-oci.com/p/VBRD9P8ZFWkKvnfhrWxkpPe8K03-JIoM5h_8EJyJcpE80c108fuUjg7R5L5O7mMZ/n/adwc4pm/b/OML-Resources/o/all_MiniLM_L12_v2_augmented.zip
unzip -oq all_MiniLM_L12_v2_augmented.zip
Step#02 - Connect to the database, create a test user, create a directory object pointing to the physical directory, and grant access to it for our test user.
SQL> alter session set container=FREEPDB1;
Session altered.
create user if not exists testuser1 identified by testuser1 quota unlimited on users;
grant create session, db_developer_role, create mining model to testuser1;
create or replace directory model_dir as '/u04/23ai_activity/models';
grant read, write on directory model_dir to testuser1;
Step#3 - Now load the model into the database using the DBMS_VECTOR package.
sqlplus /nolog
conn testuser1/testuser1@//localhost:1521/freepdb1
SQL>
begin
dbms_vector.drop_onnx_model (
model_name => 'ALL_MINILM_L12_V2',
force => true);
dbms_vector.load_onnx_model (
directory => 'model_dir',
file_name => 'all_MiniLM_L12_v2.onnx',
model_name => 'ALL_MINILM_L12_V2');
end;
/
PL/SQL procedure successfully completed.
- Now see the model information in the USER_MINING_MODELS view.
column model_name format a30
column algorithm format a10
column mining_function format a15
select model_name, algorithm, mining_function
from user_mining_models
where model_name = 'ALL_MINILM_L12_V2';
MODEL_NAME ALGORITHM MINING_FUNCTION
------------------------------ ---------- ---------------
ALL_MINILM_L12_V2 ONNX EMBEDDING
Step#4 - Generating Vectors (VECTOR Data Type)
Generate vectors using the VECTOR_EMBEDDING function. In the following example we generate a vector for the text "Quick test". As you can see, the resulting vector is really big considering the size of the text.
set lines 160
set long 1000000
select vector_embedding(all_minilm_l12_v2 using 'Quick test' as data) AS my_vector;
MY_VECTOR
----------------------------------------------------------------------------------------------------------------------------------------------------------------
[-3.86444256E-002,7.27762803E-002,-6.99377712E-003,-7.29618035E-003,8.81512091E-003,-6.36086613E-002,4.39666817E-003,-4.20215651E-002,-1.32307202E-001,-5.837616
05E-003,-1.3236966E-002,-1.62914731E-002,6.54898351E-003,-4.983522E-002,-1.98450536E-002,-4.69920225E-002,1.03937663E-001,-8.96753445E-002,-2.77861813E-003,4.13
947664E-002,-6.51626661E-002,-1.09901905E-001,-8.73053819E-003,2.533352E-002,-1.42030632E-002,-2.42071245E-002,1.91591978E-002,4.93748812E-003,6.30869251E-003,-
1.24127813E-001,-7.17297941E-003,3.7317384E-002,4.97635901E-002,4.52162437E-002,1.49683114E-002,-2.21795831E-002,-3.67936082E-002,-6.20233943E-004,7.16803819E-0
02,5.33913262E-003,1.92087106E-002,-9.91346017E-002,3.90679464E-002,2.22725421E-002,5.04363105E-002,1.81943253E-002,5.34031466E-002,1.44161871E-002,-1.99907795E
-002,-1.20323608E-002,-2.63888389E-002,-4.14667316E-002,6.2473774E-002,-4.68838662E-002,1.16748568E-002,-2.43180972E-002,-3.11982706E-002,-7.5750039E-003,2.2546
6359E-002,-4.17359956E-002,1.23237111E-002,4.31706831E-002,-7.83750787E-002,1.24918511E-002,5.42060807E-002,4.33742851E-002,2.52278382E-003,-1.15482137E-002,-9.
98658361E-004,-2.12613102E-002,1.00960173E-002,3.17986645E-002,-1.13146752E-002,-1.26893371E-002,2.66182758E-002,-7.50683714E-003,-3.70341949E-002,1.94851588E-0
02,-2.9213747E-002,-2.61210538E-002,2.86212545E-002,-9.15900841E-002,1.50552345E-002,-4.98168021E-002,2.29324233E-002,7.82517716E-003,4.22972552E-002,3.37974802
E-002,-4.2345725E-002,-6.32970557E-002,3.84949856E-002,-1.93851739E-002,1.96233811E-003,-3.91593436E-004,7.80334743E-003,5.63595518E-002,4.45814878E-002,-4.9701
6348E-002,1.36384079E-002,2.76547611E-001,6.3580215E-002,-1.69337653E-002,-3.25948671E-002,2.74621621E-002,-1.84809547E-002,-3.58916223E-002,3.18280957E-003,-3.
===========
EXAMPLE-2
===========
Step#1 - Download below csv file from hugging face
https://huggingface.co/datasets/ygorgeurts/movie-quotes/resolve/main/movie_quotes.csv?download=true -O movie_quotes.csv
Step#2 - Connect to the database and create a new table from the CSV file of movie quotes.
sqlplus /nolog
conn testuser1/testuser1@//localhost:1521/freepdb1
drop table if exists movie_quotes purge;
create table movie_quotes as
select movie_quote, movie, movie_type, movie_year
from external (
(
movie_quote varchar2(400),
movie varchar2(200),
movie_type varchar2(50),
movie_year number(4)
)
type oracle_loader
default directory model_dir
access parameters (
records delimited by newline
skip 1
badfile model_dir
logfile model_dir:'moview_quotes_ext_tab_%a_%p.log'
discardfile model_dir
fields csv with embedded terminated by ',' optionally enclosed by '"'
missing field values are null
(
movie_quote char(400),
movie,
movie_type,
movie_year
)
)
location ('movie_quotes.csv')
reject limit unlimited
);
Table created.
SQL>
SQL> desc movie_quotes
Name Null? Type
----------------------------------------- -------- ----------------------------
MOVIE_QUOTE VARCHAR2(400)
MOVIE VARCHAR2(200)
MOVIE_TYPE VARCHAR2(50)
MOVIE_YEAR NUMBER(4)
SQL>
Step#4 - add a new column to hold the vector data for each movie quote. we are using the new VECTOR data type.
SQL> alter table movie_quotes add (
movie_quote_vector vector
);
Table altered.
SQL> desc movie_quotes
Name Null? Type
----------------------------------------- -------- ----------------------------
MOVIE_QUOTE VARCHAR2(400)
MOVIE VARCHAR2(200)
MOVIE_TYPE VARCHAR2(50)
MOVIE_YEAR NUMBER(4)
MOVIE_QUOTE_VECTOR VECTOR(*, *)
SQL>
Step#5 - populate the new column by generating vectors from the movie quotes.
SQL> update movie_quotes
set movie_quote_vector = vector_embedding(ALL_MINILM_L12_V2 using movie_quote as data);
732 rows updated.
SQL> commit;
Commit complete.
Step#5 - Vector Search using VECTOR_DISTANCE
Now perform a search using the VECTOR_DISTANCE function.
This accepts two vectors and as the name suggests, returns a distance between them.
Since the model we used generates vectors for text data,
we would expect the vector distance to be smaller for two vectors that are similar.
In the following examples we create a vector from our search text and order the output
of the query by the vector distance between our search text and the quote text.
First we ask for "Films with motivational speaking in them".
SQL> variable search_text varchar2(100);
exec :search_text := 'Films with motivational speaking in them';SQL>
PL/SQL procedure successfully completed.
SQL>
set pages 100
set linesize 200
column movie format a50
column movie_quote format a100
SELECT vector_distance(movie_quote_vector, (vector_embedding(all_minilm_l12_v2 using :search_text as data))) as distance,
movie,
movie_quote
FROM movie_quotes
order by 1
fetch approximate first 5 rows only;
OUT-PUT:
---------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------
6.786E-001 Once Upon a Time in Hollywood That was the best acting i've ever seen in my whole life.
6.979E-001 Dead Poets Society You must strive to find your own voice because the longer you wait to begin, the less likely you are
going to find it at all.
7.169E-001 The Pursuit of Happyness Walk that walk and go forward all the time. Don't just talk that talk, walk it and go forward. Also,
the walk didn't have to be long strides; baby steps counted too. Go forward.
7.186E-001 Joker My mother always tells me to smile and put on a happy face. She told me I had a purpose to bring lau
ghter and joy to the world.
7.234E-001 Blazing Saddles Men, you are about to embark on a great crusade to stamp out runaway decency in the west. Now you me
n will only be risking your lives, whilst I will be risking an almost certain Academy Award nominati
on for Best Supporting Actor.
Next we ask for "Films about war".
-------------------------------
variable search_text varchar2(100);
exec :search_text := 'Films about war';
set linesize 200
column movie format a50
column movie_quote format a100
SELECT vector_distance(movie_quote_vector, (vector_embedding(all_minilm_l12_v2 using :search_text as data))) as distance,
movie,
movie_quote
FROM movie_quotes
order by 1
fetch approximate first 5 rows only;
DISTANCE MOVIE MOVIE_QUOTE
---------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------
5.682E-001 Dr. Strangelove Gentlemen, you can't fight in here! This is the War Room!
6.346E-001 Blazing Saddles Men, you are about to embark on a great crusade to stamp out runaway decency in the west. Now you me
n will only be risking your lives, whilst I will be risking an almost certain Academy Award nominati
on for Best Supporting Actor.
6.587E-001 Fury Ideals are peaceful; history is violent.
7.243E-001 The Kill Team You give me your loyalty, and I?ll guarantee that each and every one of you will have a chance to be
a warrior, to actually be a part of history.
7.253E-001 Dr. No Bond. James Bond
Step#6 - (Optional) Create a Vector Index
If we want to create a vector index, we must set the VECTOR_MEMORY_SIZE parameter for the root container. The amount of memory we need depends on the size and complexity of the data being indexed.
conn / as sysdba
SQL> show parameter vector_memory_size;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
vector_memory_size big integer 0
alter system set vector_memory_size = 1G scope=spfile;
shutdown immediate;
startup;
-- There are two types of vector indexes.
-- Neighbor Partition Vector Indexes
-- In-Memory Neighbor Graph Vector Index
conn testuser1/testuser1@//localhost:1521/freepdb1
drop index if exists movie_quotes_vector_idx;
-- Neighbor Partition Vector Index
create vector index movie_quotes_vector_idx on movie_quotes(movie_quote_vector) organization neighbor partitions
distance cosine
with target accuracy 95;
drop index if exists movie_quotes_vector_idx;
-- In-Memory Neighbor Graph Vector Index
create vector index movie_quotes_vector_idx on movie_quotes(movie_quote_vector) organization inmemory neighbor graph
distance cosine
with target accuracy 95;
Note: Vectors are big. This has to be taken into account from both a storage perspective, and a memory perspective, especially when creating vector indexes.
-----
-- Search the query again
variable search_text varchar2(100);
exec :search_text := 'Films about war';
set linesize 200
column movie format a50
column movie_quote format a100
SELECT vector_distance(movie_quote_vector, (vector_embedding(all_minilm_l12_v2 using :search_text as data))) as distance,
movie,
movie_quote
FROM movie_quotes
order by 1
fetch approximate first 5 rows only;
No comments:
Post a Comment