Introduction
Its often needed to remove all html tags for a column(s) from a database table. One way to do is regular expression pattern matching or using a custom function that will hunt down html tags and remove them. In this small post I will demonstrate this second method.
T-Sql Function
Here is a very simple function that replace/remove html tag’s and returns clean texts without using any patterns. Additionally you can pass a delimiter that will replace the html tag’s in-case you do not want to mix everything in one big chunk of text.
CREATE FUNCTION [dbo].[CleanHTMLTags] (@HTMLText VARCHAR(MAX),@ReplaceChar char(1))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX(‘<‘,@HTMLText) SET @End = CHARINDEX(‘>’,@HTMLText,CHARINDEX(‘<‘,@HTMLText)) SET @Length = (@End – @Start) + 1 WHILE @Start > 0 AND @End > 0 AND @Length > 0
BEGIN
IF (UPPER(SUBSTRING(@HTMLText, @Start, 4)) <> ‘
‘) AND (UPPER(SUBSTRING(@HTMLText, @Start, 5)) <> ”)
begin
SET @HTMLText = RTRIM(LTRIM(STUFF(@HTMLText,@Start,@Length,@ReplaceChar)));
end
ELSE
SET @Length = 0;
SET @Start = CHARINDEX(‘<‘,@HTMLText, @End-@Length) SET @End = CHARINDEX(‘>’,@HTMLText,CHARINDEX(‘<‘,@HTMLText, @Start))
SET @Length = (@End – @Start) + 1
END
RETURN isnull(RTRIM(LTRIM(@HTMLText)) ,”)
END
And thats it. Say you have a table named “PRODUCTS” and that contains a column “ProductDetails” that contains html tags, in our scenario we have to remove all html tags and replace them by comma(,). Utilizing above function we can make the call like this
SELECT [dbo].[CleanHTMLTags](ProductDetails,’,’) FROM PRODUCTS
Resultant data should be a big text respect to each records in “PRODUCTS” table with html tags removed.