Performing Bulk Insert Using pgx and CopyFrom

Kiptoo Korir · Published in Dev Genius · Nov 16, 2022 · 4 min read


PostgreSQL and Go, what a combination. One of the ways to get the two to work together is using the pgx library. pgx is a pure Go database driver and toolkit for PostgreSQL.

The pgx driver is a low-level, high performance interface that exposes PostgreSQL-specific features such as LISTEN / NOTIFY and COPY. It also includes an adapter for the standard database/sql interface.

Introduction

In the course of developing an application, you will inevitably reach a point where you need to perform bulk inserts into the database, be it during an onboarding process or where there's a many-to-many relationship between entities.

In this tutorial, the example considered will be that of students who are picking up specific learning areas/subjects to learn during a specific semester.

Scenario Breakdown

For the purposes of demonstration, there are three tables: one containing the students' data, a second containing the available subjects, and a third storing the learning areas each student has selected. The seed data in the students and subjects tables is shown below.

The structure of the subject selection table is likewise a simplified one and is shown below. It contains the primary keys from the two main entities and the academic year and semester in which the subjects have been selected.

Pivot table structure

With the data matters handled and out of the way, let's code away.

The CopyFrom Functionality

The CopyFrom function is available on a pgx connection, and it utilizes PostgreSQL's native COPY functionality.

CopyFrom uses the PostgreSQL copy protocol to perform bulk data insertion. It returns the number of rows copied and an error.

PostgreSQL’s native copy functionality allows data to be copied from a table to an external file or a command and the reverse as well, enabling data to be copied from a command or an external file into an existing table in the database.

The CopyFrom function signature is as follows:

func (c *Conn) CopyFrom(ctx context.Context, tableName Identifier, columnNames []string, rowSrc CopyFromSource) (int64, error)

As seen above, it accepts four parameters, the first of which is the context. Next is the name of the table into which the bulk insert is being performed; the name has to be of the Identifier type, so some conversion has to occur. The third parameter is a slice of the table column names, in the order in which the input data is organized. The fourth and final parameter is the input data itself, wrapped in the CopyFromSource type. The function returns the number of rows copied, as an int64, and an error.

Putting CopyFromSource To Use

The endpoint receives a JSON payload from a client, which it unpacks into the studentSubjects struct. One variable of particular note is the linkedSubjects variable, which is a slice of slices: [][]interface{}.

The inner slice holds elements of type interface{}, which means each element can be a value of any type. [][]interface{} is the type expected by the CopyFromRows() function, which returns an interface usable by the CopyFrom() function. The CopyFromRows() signature is as follows:

func CopyFromRows(rows [][]interface{}) CopyFromSource

CopyFromRows returns a CopyFromSource interface over the provided rows slice making it usable by *Conn.CopyFrom.

In the handler, the code iterates over the ids of the subjects selected by a specific student and combines each subject id with the student's id, the academic year and the semester into a single []interface{} slice. The slice is then appended to the linkedSubjects variable.

CopyFrom is then called with the parameters described in the sub-section above: context.Background() is used as the context, the table name is converted to the Identifier type, and the column names are passed as a plain slice of strings.

CopyFrom is relatively easy to use, and it returns two values: the number of rows copied and an error.

The JSON payload submitted to the server is shown below.

Using Postman, we can simulate the process; once the endpoint is called, the student_subject table has 4 new rows:

One Step Further

While converting the data into [][]interface{} and using CopyFromRows() to wrap it in a CopyFromSource interface works well, we can take it a step further to utilize CopyFromSlice() to replace the logic in the loop.

CopyFromSlice returns a CopyFromSource interface over a dynamic func making it usable by *Conn.CopyFrom.

func CopyFromSlice(length int, next func(int) ([]interface{}, error)) CopyFromSource

The length parameter, as shown above, is the number of rows being bulk inserted. The second parameter is an iterator function that is called with each row index and returns that row as a []interface{}.

This will produce the same result as the first piece of code.

Conclusion

Just like that, you are up and running with bulk insert operations in PostgreSQL and Go using pgx. Since CopyFrom relies on the native Copy protocol, it can be pretty powerful.

CopyFrom can be faster than an insert with as few as 5 rows.

To learn more about pgx’s CopyFrom functionality and all the things that it can offer, please read through pgx’s documentation here.

You can likewise directly access the documentation related to CopyFrom, CopyFromRows and CopyFromSlice.

