Welcome to dbFreaks.com!
FAQFAQ    SearchSearch      ProfileProfile    Private MessagesPrivate Messages   Log inLog in

Remove characters in an ASCII decimal range: How To?

 
   Database Help (Home) -> Programming RSS
Next:  Problems with mysql_connect and/or mysql_select_d..  
Author Message
bill

External


Since: Oct 17, 2008
Posts: 17



(Msg. 1) Posted: Fri May 14, 2010 9:20 pm
Post subject: Remove characters in an ASCII decimal range: How To?
Archived from groups: microsoft>public>sqlserver>programming (more info?)

I have a table that has some prettty nasty strings in it. I would
like to remove all characters outside the range of 33 - 127 decimal
from the string

Here is what I mean:

CREATE TABLE test_row
(
statement_nr INT NOT NULL
,statement_tx VARCHAR(200) NOT NULL
)
ALTER TABLE test_row ADD PRIMARY KEY (statement_nr)

INSERT INTO test_row SELECT 1, 'This string has an embedded
'+CHAR(0)+'null character';
INSERT INTO test_row SELECT 2, 'This string has an embedded
'+CHAR(7)+'bell character';
INSERT INTO test_row SELECT 3, 'This string has an embedded
'+CHAR(128)+'ASCII decimal 128 character';

If you query the table (SELECT statement_tx FROM test_row) you'll get
these results:
This string has an embedded
This string has an embedded bell character
This string has an embedded €ASCII decimal 128 character

Notice the first statement is terminated, because SQL stops when it
hits the null. The other statements (don't know if it comes through
on the newsgroup), show the actual ASCII character immediately after
the space that follows the word 'embedded'

I would like some SQL that strips out all the charcters other than 32
-127. My result set would thus look like:
This string has an embedded null character
This string has an embedded bell character
This string has an embedded ASCII decimal 128 character

I do not want to write a CLR function, and I would prefer to avoid
even a tsql scalar function (I don't much like functions in general).
I am hoping some set-oriented SQL would allow this.

I searched the archives and found a solution (http://
groups.google.com/group/microsoft.public.sqlserver.programming/
browse_thread/thread/f9786bbe4a3ca1e3/e785868b90b50ed1?lnk=gst&q=remove
+extended+ascii#e785868b90b50ed1) from Plamen that used XML path, but
I couldn't get it to work, and don't actually understand XML Path.
Perhaps someone could explain it to me, and tell me how to get rid of
the characters that XML path seems to add without resorting to
REPLACE?

Thanks,

Bill

 >> Stay informed about: Remove characters in an ASCII decimal range: How To? 
Back to top
Login to vote
Erland Sommarskog2

External


Since: May 30, 2004
Posts: 2061



(Msg. 2) Posted: Sat May 15, 2010 6:25 am
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

bill ( ) writes:
> I have a table that has some prettty nasty strings in it. I would
> like to remove all characters outside the range of 33 - 127 decimal
> from the string

You don't want to that. You may want to remove non-printable characters,
but believe me, there are plenty of printable characters beyond 127.
For instance, you cannot properly send goods to my address without such a
character. And I can tell you that I get pretty p*ssed when I enter my
address on a web form, and I am told that it contains an illegal character.

I don't know what's in your data, but I think you pretty naïve if you
believe all those characters are junk you want to get rid of. For instance:

This string has an embedded €ASCII decimal 128 character

What if you have a string that goes:

"The price for the service is €200,000."

You stripping venture will delete important information from the string.

> Notice the first statement is terminated, because SQL stops when it
> hits the null.

Not really, but the tools typically takes the NUL as a string terminator.

> I do not want to write a CLR function,

Why not? The CLR is the best tool for the purpose.

> I am hoping some set-oriented SQL would allow this.

Well, you can define a table that includes all characters you want
to remove. Then you can run a cursor over the table and run

replace(targettbl.col, badchars.ch, '')

on it. Not the most efficient, but it could be OK for a one-off.

....and it will not always work. If you have a Windows collation (and
you should), replace on char(0) does not work, because char(0) is
not defined in Unicode. But if you force an SQL collation, replace
will operate on char(0) as well.

> I searched the archives and found a solution (http://
> groups.google.com/group/microsoft.public.sqlserver.programming/
> browse_thread/thread/f9786bbe4a3ca1e3/e785868b90b50ed1?lnk=gst&q=remove
> +extended+ascii#e785868b90b50ed1) from Plamen that used XML path, but
> I couldn't get it to work, and don't actually understand XML Path.
> Perhaps someone could explain it to me, and tell me how to get rid of
> the characters that XML path seems to add without resorting to
> REPLACE?

Plamen's update statement first split the strings into tabular format,
with one row per character, then the strings are reconcatenated with XML
PATH. I think I see two problems with query:

1) There is no ORDER BY, so strings could be reassembled in the wrong
order.
2) The way the query is written, characters like & will be entisised into
& etc. This can be address by using FOR XML PATH(''), TYPE and
then convert the resulting XML to nvarchar.

On the other hand, the way his query is written, there is no risk for
issues like REPLACE.



--
Erland Sommarskog, SQL Server MVP, esquel.RemoveThis@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

 >> Stay informed about: Remove characters in an ASCII decimal range: How To? 
Back to top
Login to vote
Plamen Ratchev

External


Since: Aug 20, 2008
Posts: 672



(Msg. 3) Posted: Sat May 15, 2010 10:08 pm
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Sat, 15 May 2010 12:12:04 +0200, Erland Sommarskog
wrote:

>Plamen's update statement first split the strings into tabular format,
>with one row per character, then the strings are reconcatenated with XML
>PATH. I think I see two problems with query:
>
>1) There is no ORDER BY, so strings could be reassembled in the wrong
> order.

Good catch Erland! This can be solved by adding "ORDER BY n" to the
subquery that does the concatenation.

--
Plamen Ratchev
http://www.SQLStudio.com
 >> Stay informed about: Remove characters in an ASCII decimal range: How To? 
Back to top
Login to vote
bill

External


Since: Oct 17, 2008
Posts: 17



(Msg. 4) Posted: Sat May 15, 2010 10:26 pm
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Hi Erland,

I should have explained more thoroughly. The source is a commercial
application that only allows users to enter ASCII 33 - 127 into the
screen field that gets populated into the database column. However,
the source application adds formatting characters for its own display
format purposes that aren't documented.

I don't want those formatting characters, which are outside the range
of 33 - 127. I have found a number of the characters, but I don't
know if I have found them all. I figured it would be most
straightforward to simply exclude characters that are not in the range
33 - 127, instead of hunting samples for formatting characters.

Thanks,

Bill
 >> Stay informed about: Remove characters in an ASCII decimal range: How To? 
Back to top
Login to vote
bill

External


Since: Oct 17, 2008
Posts: 17



(Msg. 5) Posted: Sat May 15, 2010 10:32 pm
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Hi Plamen:

I'm interested in using your query, but as I am only vaguely familiar
with FOR XML PATH, I am not sure how to process the results.

If I use your query on my sample above, I get a result like this:

This is an interrupted string


I was hoping to simply strip the bad characters but have a normal
resulting string with real spaces, etc. I could do a REPLACE, but I
don't know what other character patterns your query will return, so I
don't know how many REPLACE functions to nest.

Is there some type of built-in SQL Server function that understands
the delimiters created by FOR XML PATH and can render the string for
me without the need for a REPLACE?

The query looks great, so a little guidance would be a big help.

Thanks,

Bill
 >> Stay informed about: Remove characters in an ASCII decimal range: How To? 
Back to top
Login to vote
Erland Sommarskog2

External


Since: May 30, 2004
Posts: 2061



(Msg. 6) Posted: Sun May 16, 2010 5:27 am
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

bill ( ) writes:
> I should have explained more thoroughly. The source is a commercial
> application that only allows users to enter ASCII 33 - 127 into the
> screen field that gets populated into the database column.

In that case, the application needs fixing. Smile

> I don't want those formatting characters, which are outside the range
> of 33 - 127. I have found a number of the characters, but I don't
> know if I have found them all. I figured it would be most
> straightforward to simply exclude characters that are not in the range
> 33 - 127, instead of hunting samples for formatting characters.

Here is a corrected version of Plamen's query, adapted for your test
table. It assumes the presence of the table Numbers. See
http://www.sommarskog.se/arrays-in-sql-2005.html#tblnum for details
on such a table. It will not clear the euro character from your test
data - that is left as an exercise to the reader.

CREATE TABLE test_row
(
statement_nr INT NOT NULL
,statement_tx VARCHAR(200) NOT NULL
)
ALTER TABLE test_row ADD PRIMARY KEY (statement_nr)

INSERT INTO test_row SELECT 1, 'This string has an embedded
'+CHAR(0)+'null character';
INSERT INTO test_row SELECT 2, 'This string has an embedded
'+CHAR(7)+'bell character';
INSERT INTO test_row SELECT 3, 'This string has an embedded
'+CHAR(128)+'ASCII decimal 128 character';
go
WITH UpdateCTE AS
(SELECT t.statement_nr, t.statement_tx,
(SELECT CASE WHEN ascii(substring(t1.statement_tx, n.Number, 1))
BETWEEN 0x00 AND 0x1F
THEN ''
ELSE substring(t1.statement_tx, n.Number, 1)
END + ''
FROM test_row t1
JOIN listtest..Numbers n
ON n.Number <= len(t1.statement_tx)
WHERE t.statement_nr = t1.statement_nr
FOR XML PATH(''), TYPE) AS cleantex
FROM test_row AS t)
UPDATE UpdateCTE
SET statement_tx = convert(nvarchar(200), cleantex)
go
select * from test_row
go
drop table test_row





--
Erland Sommarskog, SQL Server MVP, esquel RemoveThis @sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
 >> Stay informed about: Remove characters in an ASCII decimal range: How To? 
Back to top
Login to vote
bill

External


Since: Oct 17, 2008
Posts: 17



(Msg. 7) Posted: Sun May 16, 2010 10:55 am
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Thanks Plamen and Erland. This query cleans up the strings nicely.

Thanks,

Bill
 >> Stay informed about: Remove characters in an ASCII decimal range: How To? 
Back to top
Login to vote
Tom Cooper

External


Since: Mar 12, 2004
Posts: 8



(Msg. 8) Posted: Sun May 16, 2010 1:17 pm
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Hi Bill,
My experience is that you only get the html escapes like &#x20; (that's for
a space - and there are other characters that will do the same thing, like
&) when you don't concatenate anything to the

(SELECT CASE WHEN ASCII(SUBSTRING(statement_tx, n, 1))
BETWEEN 32 AND 127
THEN SUBSTRING(statement_tx, n, 1)
ELSE ' '
END -- one way to fix is to concatenate something here
FROM test_row AS B
JOIN Nums
ON n <= LEN(statement_tx)
WHERE B.statement_nr = A.statement_nr
FOR XML PATH('')) AS statement_tx_clean

part of the query. And it doesn't work to concatenate a null string (''),
it has to be a string of one or more characters. The way I would fix this
here is to concatenate a CHAR(128). This leaves extra CHAR(128)'s in your
string, but you can use one replace to get rid of them (since you know they
don't belong). For example:

CREATE TABLE test_row
(
statement_nr INT NOT NULL
,statement_tx VARCHAR(200) NOT NULL
);
ALTER TABLE test_row ADD PRIMARY KEY (statement_nr);

INSERT INTO test_row SELECT 1, 'This string has an embedded'+CHAR(0)+'null
character';
INSERT INTO test_row SELECT 2, 'This string has an embedded'+CHAR(7)+'bell
character';
INSERT INTO test_row SELECT 3, 'This string has an
embedded'+CHAR(128)+'ASCII decimal 128 character';

SELECT statement_tx FROM test_row;

WITH Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),
Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),
Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),
Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num4),
UpdateCTE
AS
(SELECT statement_nr, statement_tx,
(SELECT CASE WHEN ASCII(SUBSTRING(statement_tx, n, 1))
BETWEEN 32 AND 127
THEN SUBSTRING(statement_tx, n, 1)
ELSE ' '
END + Char(128)
FROM test_row AS B
JOIN Nums
ON n <= LEN(statement_tx)
WHERE B.statement_nr = A.statement_nr
FOR XML PATH('')) AS statement_tx_clean
FROM test_row AS A)
UPDATE UpdateCTE
SET statement_tx = Replace(statement_tx_clean,Char(128),'');

SELECT statement_tx FROM test_row;
go
Drop Table test_row;

Tom

"bill" wrote in message

> Hi Plamen:
>
> I'm interested in using your query, but as I am only vaguely familiar
> with FOR XML PATH, I am not sure how to process the results.
>
> If I use your query on my sample above, I get a result like this:
>
> This&#x20;is&#x20;an&#x20;interrupted&#x20;string
>
>
> I was hoping to simply strip the bad characters but have a normal
> resulting string with real spaces, etc. I could do a REPLACE, but I
> don't know what other character patterns your query will return, so I
> don't know how many REPLACE functions to nest.
>
> Is there some type of built-in SQL Server function that understands
> the delimiters created by FOR XML PATH and can render the string for
> me without the need for a REPLACE?
>
> The query looks great, so a little guidance would be a big help.
>
> Thanks,
>
> Bill
 >> Stay informed about: Remove characters in an ASCII decimal range: How To? 
Back to top
Login to vote
Plamen Ratchev

External


Since: Aug 20, 2008
Posts: 672



(Msg. 9) Posted: Sun May 16, 2010 10:38 pm
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Perhaps still needs "ORDER BY n.Number" to guarantee correct
concatenation order. Smile

--
Plamen Ratchev
http://www.SQLStudio.com
 >> Stay informed about: Remove characters in an ASCII decimal range: How To? 
Back to top
Login to vote
Sylvain Lafontaine

External


Since: Mar 21, 2009
Posts: 6



(Msg. 10) Posted: Mon May 17, 2010 4:20 pm
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

While investigating this &#x20 stuff with XML Path, I've found that
returning the result of the subquery as an XML object using the TYPE option
and then converting this object into a string get rid of these entities; so
the following query seems to work correctly on SQL-Server 2008:

WITH Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),
Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),
Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),
Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num4),

UpdateCTE
AS
(SELECT statement_nr, statement_tx,
Cast ((SELECT CASE WHEN ASCII(SUBSTRING(statement_tx, n, 1))
BETWEEN 0x00 AND 0x1F
THEN ''
ELSE SUBSTRING(statement_tx, n, 1)
END + ''
FROM test_row AS B
JOIN Nums
ON n <= LEN(statement_tx)
WHERE B.statement_nr = A.statement_nr
Order by Nums.N
FOR XML PATH(''), TYPE) as varchar(256)) AS DesNM_clean
FROM test_row AS A)

UPDATE UpdateCTE
SET statement_tx = DesNM_clean;

--
Sylvain Lafontaine, ing.
MVP - Windows Live Platform
Blog/web site: http://coding-paparazzi.sylvainlafontaine.com
Independent consultant and remote programming for Access and SQL-Server
(French)


"Tom Cooper" wrote in message

> Hi Bill,
> My experience is that you only get the html escapes like &#x20; (that's
> for a space - and there are other characters that will do the same thing,
> like &) when you don't concatenate anything to the
>
> (SELECT CASE WHEN ASCII(SUBSTRING(statement_tx, n, 1))
> BETWEEN 32 AND 127
> THEN SUBSTRING(statement_tx, n, 1)
> ELSE ' '
> END -- one way to fix is to concatenate something here
> FROM test_row AS B
> JOIN Nums
> ON n <= LEN(statement_tx)
> WHERE B.statement_nr = A.statement_nr
> FOR XML PATH('')) AS statement_tx_clean
>
> part of the query. And it doesn't work to concatenate a null string (''),
> it has to be a string of one or more characters. The way I would fix this
> here is to concatenate a CHAR(128). This leaves extra CHAR(128)'s in your
> string, but you can use one replace to get rid of them (since you know
> they don't belong). For example:
>
> CREATE TABLE test_row
> (
> statement_nr INT NOT NULL
> ,statement_tx VARCHAR(200) NOT NULL
> );
> ALTER TABLE test_row ADD PRIMARY KEY (statement_nr);
>
> INSERT INTO test_row SELECT 1, 'This string has an embedded'+CHAR(0)+'null
> character';
> INSERT INTO test_row SELECT 2, 'This string has an embedded'+CHAR(7)+'bell
> character';
> INSERT INTO test_row SELECT 3, 'This string has an
> embedded'+CHAR(128)+'ASCII decimal 128 character';
>
> SELECT statement_tx FROM test_row;
>
> WITH Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),
> Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
> Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),
> Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),
> Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num4),
> UpdateCTE
> AS
> (SELECT statement_nr, statement_tx,
> (SELECT CASE WHEN ASCII(SUBSTRING(statement_tx, n, 1))
> BETWEEN 32 AND 127
> THEN SUBSTRING(statement_tx, n, 1)
> ELSE ' '
> END + Char(128)
> FROM test_row AS B
> JOIN Nums
> ON n <= LEN(statement_tx)
> WHERE B.statement_nr = A.statement_nr
> FOR XML PATH('')) AS statement_tx_clean
> FROM test_row AS A)
> UPDATE UpdateCTE
> SET statement_tx = Replace(statement_tx_clean,Char(128),'');
>
> SELECT statement_tx FROM test_row;
> go
> Drop Table test_row;
>
> Tom
>
> "bill" wrote in message
>
>> Hi Plamen:
>>
>> I'm interested in using your query, but as I am only vaguely familiar
>> with FOR XML PATH, I am not sure how to process the results.
>>
>> If I use your query on my sample above, I get a result like this:
>>
>> This&#x20;is&#x20;an&#x20;interrupted&#x20;string
>>
>>
>> I was hoping to simply strip the bad characters but have a normal
>> resulting string with real spaces, etc. I could do a REPLACE, but I
>> don't know what other character patterns your query will return, so I
>> don't know how many REPLACE functions to nest.
>>
>> Is there some type of built-in SQL Server function that understands
>> the delimiters created by FOR XML PATH and can render the string for
>> me without the need for a REPLACE?
>>
>> The query looks great, so a little guidance would be a big help.
>>
>> Thanks,
>>
>> Bill
>
 >> Stay informed about: Remove characters in an ASCII decimal range: How To? 
Back to top
Login to vote
Sylvain Lafontaine

External


Since: Mar 21, 2009
Posts: 6



(Msg. 11) Posted: Mon May 17, 2010 4:22 pm
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Just out of curiosity, do you really intend to use this XML Path solution
instead of using an UDF in a production database?

--
Sylvain Lafontaine, ing.
MVP - Windows Live Platform
Blog/web site: http://coding-paparazzi.sylvainlafontaine.com
Independent consultant and remote programming for Access and SQL-Server
(French)


"bill" wrote in message

> Hi Plamen:
>
> I'm interested in using your query, but as I am only vaguely familiar
> with FOR XML PATH, I am not sure how to process the results.
>
> If I use your query on my sample above, I get a result like this:
>
> This&#x20;is&#x20;an&#x20;interrupted&#x20;string
>
>
> I was hoping to simply strip the bad characters but have a normal
> resulting string with real spaces, etc. I could do a REPLACE, but I
> don't know what other character patterns your query will return, so I
> don't know how many REPLACE functions to nest.
>
> Is there some type of built-in SQL Server function that understands
> the delimiters created by FOR XML PATH and can render the string for
> me without the need for a REPLACE?
>
> The query looks great, so a little guidance would be a big help.
>
> Thanks,
>
> Bill
 >> Stay informed about: Remove characters in an ASCII decimal range: How To? 
Back to top
Login to vote
Display posts from previous:   
Related Topics:
working with ascii decimal values - i have a bunch of values stored in an nvarchar column which are actually ascii dec values, but due to the column datatype, they are now in string format (i realize this is a backwards storage scenario- at best, but its what i must work with right now)...

Decimal Comma vs Decimal Point - Hello, I want to insert a Single/Float type value into a table using the following SQL statement. I'm executing this from a VB6 app using ADO 2.7. myValue = "0.5074655" (Entered by the user into a textbox) strSQL = "INSERT INTO myTable ...

Export query result to a comma delimited ascii file (with .. -

Decimal conversion - DECLARE @space_in_MB INT SET @space_in_MB = 10081216 SELECT @space_in_MB/1024 -- Gives the output as 9844 But in real division, 10081216 / 1024 gives 9844.9375. How can I get that decimal value (with only max 4 digits after decimal point) from the..

Decimal formatting - Hi, when I cast some number to decimal, I get dot as deciomal point istead of comma. Example: I get 1.233 instead of 1,233 even If I have all settings - windows and database collations to slovenian, which has comma for decimal point. How can I change...
   Database Help (Home) -> Programming All times are: Pacific Time (US & Canada)
Page 1 of 1

 
You can post new topics in this forum
You can reply to topics in this forum
You can edit your posts in this forum
You can delete your posts in this forum
You can vote in polls in this forum



[ Contact us | Terms of Service/Privacy Policy ]