Parsing and Joining on a Long Text Field Using SQL

Here’s a little SQL gymnastics used to parse and query against a predictably-formatted long text field.

The challenge: We have a single text field that contains a list person_id-to-area mappings, and we need the query the database for a list of those people with their names.

For example, this it the query used to fetch the mapping field:

select value  -- a "text" field (e.g. not a varchar) 
from setting		
where setting_name = 'InstructorAreaMap'

And that query returns “42|Accounting\r\n51|Marketing\r\n69|Strategy“. But what we want is a set of records that contains not only the Person ID and Area, but also the name of the person from the Person table, where Person ID is the key.

We can do this in a single query like so:

  1. Convert the text field into a varchar that we can use for replace operations.
  2. Replace the delimiters with xml tags.
  3. Convert the result into an XML data type.
  4. Select against that XML.
  5. Join in the Person table and select the result.

Step 2 builds an XML string that looks like “<r><i>42</i><a>Marketing</a></r><r><i>51</i><a>Accounting</a></r><r><i>69</i><a>Strategy</a></r>“, and Step 4 essentially converts that into a joinable table that looks like this:

42	Marketing
51	Accounting
70	Strategy

After this it’s easy to get the result we need:

Accounting     Fred Flintstone  51
Marketing      George Jetson    42
Communication  Barney Fife      70

Here’s the SQL query:

select assignments.area, p.display_name name, p.person_id
from (

	-- STEP 4: 
	-- Create a table of person_id (i), area (a) pairs from an xml string 
	-- in the form "<r><i>42</i><a>Area</a></r><r><i>..." called "assignments" 
	select c.value('(i)[1]', 'int') as person_id,
		c.value('(a)[1]', 'varchar(50)') as area
	from (

		-- STEP 3:
		-- select an XML type called "data" in the above form using 
		-- information in the "value" field of the "InstructorAreaMap" setting
		select cast (

			-- STEP 2: 
			replace( '<r><i>' + 
				replace( 

					-- STEP 1: 
					-- Convert the text field into something we can use 
					SUBSTRING( convert( varchar(max), value ), 1, DATALENGTH(value) ), -- search
					char(13)+char(10), -- find
					'</a></r><r><i>' ) -- replace with
				+ '</a></r>', -- outer serach
				'|', -- outer find
				'</i><a>' ) -- outer replace with
			as xml ) data

		from setting
		where setting_name = 'InstructorAreaMap'
	) as t

	cross apply data.nodes('r') as t2(c)
) as assignments

-- STEP 5:
-- Join against the Person table to get the name. 
inner join person p
	on p.person_id = assignments.person_id
order by area, name

Leave a Reply

Your email address will not be published. Required fields are marked *