T-SQL function to replace HTML tags with a delimiter

Introduction

It is often necessary to strip all HTML tags from one or more columns of a database table. One way to do this is regular expression pattern matching; another is a custom function that hunts down HTML tags and removes them. In this short post I will demonstrate the second method.

T-SQL Function

Here is a very simple function that finds HTML tags and returns clean text without using any patterns. It takes a one-character delimiter that replaces each HTML tag, in case you do not want everything to run together in one big chunk of text.

CREATE FUNCTION [dbo].[CleanHTMLTags] (@HTMLText VARCHAR(MAX), @ReplaceChar CHAR(1))
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @Start INT
    DECLARE @End INT
    DECLARE @Length INT

    -- Locate the first tag: the first '<' and the first '>' that follows it
    SET @Start = CHARINDEX('<', @HTMLText)
    SET @End = CHARINDEX('>', @HTMLText, @Start)
    SET @Length = (@End - @Start) + 1

    WHILE @Start > 0 AND @End > 0 AND @Length > 0
    BEGIN
        -- Keep <BR> and </BR>; replace every other tag with the delimiter
        IF (UPPER(SUBSTRING(@HTMLText, @Start, 4)) <> '<BR>') AND (UPPER(SUBSTRING(@HTMLText, @Start, 5)) <> '</BR>')
        BEGIN
            -- No trimming here: removing leading spaces mid-loop would shift
            -- the positions used below and could skip the next tag
            SET @HTMLText = STUFF(@HTMLText, @Start, @Length, @ReplaceChar);
        END
        ELSE
            SET @Length = 0;

        -- Resume just after the delimiter (or just after the tag that was kept)
        SET @Start = CHARINDEX('<', @HTMLText, @End - @Length)
        SET @End = CHARINDEX('>', @HTMLText, @Start)
        SET @Length = (@End - @Start) + 1
    END

    -- Trim once at the end; a NULL input returns an empty string
    RETURN ISNULL(RTRIM(LTRIM(@HTMLText)), '')
END
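
The step the loop repeats can be tried on its own, outside the function. This is only a minimal sketch: the variable names and the sample string are made up for illustration, and it uses nothing beyond the built-in CHARINDEX and STUFF functions.

```sql
-- Hypothetical sample text; any string with a tag in it will do
DECLARE @Text VARCHAR(MAX) = 'Hello <b>world</b>';

-- Position of the first '<' and of the first '>' at or after it
DECLARE @Start INT = CHARINDEX('<', @Text);        -- 7
DECLARE @End INT = CHARINDEX('>', @Text, @Start);  -- 9

-- Overwrite the three characters '<b>' with a single comma
SELECT STUFF(@Text, @Start, (@End - @Start) + 1, ',') AS OneStep;
-- Hello ,world</b>
```

The function simply repeats this step, each time searching from where the previous replacement ended, until no further tag is found.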

And that's it. Say you have a table named "PRODUCTS" with a column "ProductDetails" that contains HTML tags, and in our scenario we have to replace every HTML tag with a comma (,). Using the function above, the call looks like this:

SELECT [dbo].[CleanHTMLTags](ProductDetails, ',') FROM PRODUCTS

The result should be one block of text for each record in the "PRODUCTS" table, with the HTML tags replaced by commas.
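
To check the behaviour without touching a real table, the function can also be called on a literal string. The sketch below assumes dbo.CleanHTMLTags has already been created in the current database; the sample markup and the column alias are invented for the example.

```sql
-- Assumes dbo.CleanHTMLTags from this post exists in the current database
SELECT dbo.CleanHTMLTags('<p>Red</p><p>Blue</p>', ',') AS Cleaned;

-- Each tag becomes exactly one delimiter, so adjacent tags and tags at
-- either end of the text leave repeated or dangling commas:
-- ,Red,,Blue,
```

If the delimiter is a space instead of a comma, the final trim in the function removes the leading and trailing ones, but repeated spaces between words remain.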
