I’m pretty sure that everybody at least once in his professional coding life had the need to convert a string of character separated values into tokens or rows.
Let’s be honest, we all ended writing something like

Problem is that among all those “bla” it’s quite easy to forget something (is there always a separator at the very end of the string? Any space between values and separators? Did I clean the buffer?) and in such code bugs might find a comfortable home.
Why should we write such risky code to do something that can be achieved differently in a completely reliable, fast and elegant way? And all of this just using one powerful feature that Oracle kindly introduced with version 10g (drum roll): regular expressions!
Yes folks, with regular expressions it’s very easy to extract values from a string and without the need to be worried about all code traps mentioned before.
For our purpose we’ll use the function regexp_substr

You can find a complete description of this function in the official Oracle documentation so I will only explain how to get to our goal.
This function returns from string and starting from position the occurrence of substring that matches the pattern according to standard of regular expressions rules (you need at least a basic understanding of those standards to take maximum advantage of this feature). Last parameter match_options is not mandatory and not important for this demo; it’s used to alter the behavior by enabling, for instance, case-sensitive, multi-row searches and so on.
Let’s go back to our problem and write the query

Translated into “human language” it means “read the string ‘Val1,Val2,Val3, split it using ‘,’ as a separator, start from position 1 and return me the occurrence number 1 of the resulting set of substrings“.
In fact when executing it we get

as a result. The pattern ‘[^,]+’ specifies that we want the string to be split at every comma. But we don’t want only the first, we want all the tokens, right? To do that we can use a neat trick by introducing a “connect by” inside our query and replacing the occurrence parameter with the level pseudo-column

Once executed this will return

That’s exactly what we wanted! But wait, in our query we had to explicitly specify the number of values 3 writing that “connect by level <= 3” but normally we don’t know in advance how many values are in the string.
Don’t worry, here comes a nice solution for that as well. There is another regex function regexp_count that returns the number of values we are looking for.
Cool, so let’s rewrite our query

and if we run it we get all our values

Obviously we can use this in Pl/Sql code, a generic example could be

that has the following output

Last thing we can do in case there might be leading/trailing spaces between values is to add a simple TRIM function.
This way the code

will extract all 3 values without any space. We can actually achieve the same by using only regular expressions simply adding a space to the pattern in the list of separators (no TRIM is needed in this case anymore)

Regexp functions are a really powerful tool and can also be used in check constraints to ensure that data adheres to specific formats. Imagine how easy it would be to validate an email column by using regular expressions, no need to write any custom (and error-prone) function!
But this is another story…

Share this post on Share on Facebook
Facebook
Tweet about this on Twitter
Twitter
Share on LinkedIn
Linkedin
Email this to someone
email

2 thoughts on “Oracle regexp: a smart and reliable way to convert character separated values into rows

  1. Avatar
    Rajnish kumar

    Hey Rob,
    The content was very helpful though I cannot use them in my field, and the examples are really great with simple language. Must admit it was happy reading.

Leave a Reply

Your email address will not be published. Required fields are marked *