• Pavol Drotován
  • DB expert
  • 10-05-2024

PL/SQL BULK COLLECT - Oracle

In PL/SQL, the BULK COLLECT function is used to increase the performance of SQL queries by fetching multiple rows at once and storing them in collections (arrays or nested tables) instead of processing one row at a time. This reduces the number of context switches between SQL and PL/SQL, resulting in greater efficiency. Bulk Collect is often used in conjunction with FORALL statements, allowing you to efficiently perform bulk DML operations. Basic command syntax BULK COLLECT:

DECLARE
   TYPE collection_type IS TABLE OF table_name%ROWTYPE;
   collection_name collection_type;
BEGIN
   SELECT column1, column2, ...
   BULK COLLECT INTO collection_name
   FROM table_name
   WHERE condition;

   -- Process the data in the collection
END;
                        

The syntax is PL/SQL collection_type (either array or nested table), which should be compatible with the structure of the selected columns.Improved performance Limited content switching: By fetching multiple rows at once, BULK COLLECT minimizes the number of context switches between SQL and PL/SQL, improving performance. Reduced SQL statement overhead: Executing a single BULK COLLECT statement is often more efficient than executing multiple individual queries. Usage Considerations Memory usage: since BULK COLLECT loads multiple rows into the collection, you should potentially increase memory usage, especially when working with large result sets. Selectivity: It is most advantageous when the KDE clause is sufficiently selected, i.e. there are more parameters in the condition to reduce the number of rows loaded. An example for BULK COLLECT looks like this:


   DECLARE
   TYPE employee_collection IS TABLE OF employees%ROWTYPE;
   emp_data employee_collection;
BEGIN
   SELECT * 
   BULK COLLECT INTO emp_data
   FROM employees
   WHERE department_id = 10;

   -- Process the data in the emp_data collection
   FOR i IN 1..emp_data.COUNT LOOP
      DBMS_OUTPUT.PUT_LINE('Employee ID: ' || emp_data(i).employee_id || ', Name: ' || emp_data(i).first_name);
   END LOOP;
END;
  

In this example, multiple rows from the employee tables are loaded into the emp_data collection, and the data is then processed in a loop. That is, first all the data are connected in the collection and the following ones are called through the cycle. Another option, but more complicated than the torch, is to use the cursor to read the data and then BULK COLLECT:


DECLARE
--definition of cursor
  TYPE empcurtyp IS REF CURSOR;
  --table all names
  TYPE namelist IS TABLE OF employees.last_name%TYPE;
  ---table all salaries
  TYPE sallist IS TABLE OF employees.salary%TYPE;
  emp_cv  empcurtyp;
  names   namelist;
  sals    sallist;
BEGIN
  --opening of cursor
  OPEN emp_cv FOR
    SELECT last_name, salary FROM employees
    WHERE job_id = 'SA_REP'
    ORDER BY salary DESC;
--Fetch collection
  FETCH emp_cv 
  BULK COLLECT INTO names, sals;
--closing cursor
  CLOSE emp_cv;
  -- loop through the names and sals collections
  FOR i IN names.FIRST .. names.LAST
  LOOP
    DBMS_OUTPUT.PUT_LINE
      ('Name = ' || names(i) || ', salary = ' || sals(i));
  END LOOP;
END;                    

A very important tool for working with BULK collect is to make changes in the tables based on the results and use it as e.g. merging only with much higher performance. An example looks like this:


DECLARE
P_MESSAGE_TEXT VARCHAR2(1500);

    TYPE recData IS RECORD (
      row_id                  ROWID,
      employee_id             hr.employees.employee_id%TYPE,
      first_name              hr.employees.first_name%TYPE,
      last_name               hr.employees.last_name%TYPE,
      email                   hr.employees.email%TYPE,
      phone_number            hr.employees.phone_number%TYPE,
      salary                  hr.employees.salary%TYPE,
      hire_date               hr.employees.hire_date%TYPE,
      job_id                  hr.employees.job_id%TYPE,
      manager_id              hr.employees.manager_id%TYPE,
      department_id           hr.employees.department_id%TYPE
      );
  
    TYPE tabDataType    IS TABLE OF recData INDEX BY PLS_INTEGER;
    TYPE tabPointerType IS TABLE OF PLS_INTEGER;
    
    TYPE curDataType    IS REF CURSOR;
    curData             curDataType;
    tabData             tabDataType;
    tabInsPointer       tabPointerType    := tabPointerType();
    tabUpdPointer       tabPointerType    := tabPointerType();
    vSqlStatm           varchar(3000);
    nRecProcessed       number;
      
  BEGIN
    
    vSqlStatm := 'SELECT e.ROWID                row_id,
                         a.employee_id,
                         a.first_name            first_name,
                         a.last_name          last_name,
                         a.email             email,
                         a.phone_number        phone_number,
                         a.salary    salary,
                         a.hire_date hire_date,
                         d.job_id job_id,
                         a.manager_id           manager_id,
                         b.department_id        department_id
                    FROM hr.emptst a,
                         hr.departments b,
                         hr.jobs d,
                         hr.employees e
                      WHERE b.department_id = a.department_id
                      and a.job_id=d.job_id
                      and a.employee_id=e.employee_id
                    union all 
                    SELECT NULL                row_id,
                         a.employee_id,
                         a.first_name            first_name,
                         a.last_name          last_name,
                         a.email             email,
                         a.phone_number        phone_number,
                         a.salary    salary,
                         a.hire_date hire_date,
                         d.job_id job_id,
                         a.manager_id           manager_id,
                         b.department_id        department_id
                    FROM hr.emp_temp a,
                         hr.departments b,
                         hr.jobs d
                      WHERE b.department_id = a.department_id
                      and a.job_id=d.job_id
                      and NOT EXISTS (select 1 from employees where employee_id=a.employee_id)';
    
    
    OPEN curData FOR vSqlStatm;
    
    LOOP
      tabData.DELETE;
      tabInsPointer.DELETE;
      tabUpdPointer.DELETE;
      
      FETCH curData
        BULK COLLECT INTO tabData;
      
      FOR i IN tabData.FIRST .. tabData.LAST LOOP
        CASE
          WHEN tabData(i).row_id IS NULL
            THEN
              
              tabInsPointer.EXTEND;
              tabInsPointer(tabInsPointer.LAST) := i;
          ELSE
            
            tabUpdPointer.EXTEND;
            tabUpdPointer(tabUpdPointer.LAST) := i;
        END CASE;
      END LOOP;
      
      FORALL i IN VALUES OF tabInsPointer
        INSERT INTO hr.employees (
                         employee_id,
                         first_name,
                         last_name,
                         email,
                         phone_number,
                         salary,
                         hire_date,
                         job_id,
                         manager_id,
                         department_id

            )
          VALUES (
            tabData(i).employee_id,
            tabData(i).first_name,
            tabData(i).last_name,
            tabData(i).email,
            tabData(i).phone_number,
            tabData(i).salary,
            tabData(i).hire_date,
            tabData(i).job_id,
            tabData(i).manager_id,
            tabData(i).department_id
            );
        
      nRecProcessed := nRecProcessed + SQL%ROWCOUNT;
      
      FORALL i IN VALUES OF tabUpdPointer
        UPDATE hr.employees SET
            employee_id          = tabData(i).employee_id,
            first_name       = tabData(i).first_name,
            last_name          = tabData(i).last_name,
            email     = tabData(i).email,
            phone_number = tabData(i).phone_number,
            salary     = tabData(i).salary,
            hire_date       =tabData(i).hire_date,
            --job_id     = tabData(i).job_id,
            manager_id     = tabData(i).manager_id
            --department_id     = tabData(i).department_id
          WHERE ROWID = tabData(i).row_id;
          
      nRecProcessed := nRecProcessed + SQL%ROWCOUNT;
      
      EXIT WHEN curData%NOTFOUND;
    END LOOP;
    
    CLOSE curData;
    exception when others
    then DBMS_OUTPUT.PUT_LINE(P_MESSAGE_TEXT);
    END;
                    

In the example, the records are added only if there is a match between the new records listed in the emp_temp table, and only those in the emp_test table are modified, i.e. it only makes a slice of the data that they process in bulk. This is how the logic of processing a large amount of data via DWH is built, where BULK fits.

BULK COLLECT is a powerful function in PL/SQL that can significantly increase the performance of data retrieval operations, especially when working with large data sets. However, it should be used judiciously, taking into account factors such as memory consumption and selectivity. The ideal memory database.