 |
|
 |
|
Next: Problems with mysql_connect and/or mysql_select_d..
|
| Author |
Message |
External

Since: Oct 17, 2008 Posts: 17
|
(Msg. 1) Posted: Fri May 14, 2010 9:20 pm
Post subject: Remove characters in an ASCII decimal range: How To? Archived from groups: microsoft>public>sqlserver>programming (more info?)
|
|
|
I have a table that has some prettty nasty strings in it. I would
like to remove all characters outside the range of 33 - 127 decimal
from the string
Here is what I mean:
CREATE TABLE test_row
(
statement_nr INT NOT NULL
,statement_tx VARCHAR(200) NOT NULL
)
ALTER TABLE test_row ADD PRIMARY KEY (statement_nr)
INSERT INTO test_row SELECT 1, 'This string has an embedded
'+CHAR(0)+'null character';
INSERT INTO test_row SELECT 2, 'This string has an embedded
'+CHAR(7)+'bell character';
INSERT INTO test_row SELECT 3, 'This string has an embedded
'+CHAR(128)+'ASCII decimal 128 character';
If you query the table (SELECT statement_tx FROM test_row) you'll get
these results:
This string has an embedded
This string has an embedded bell character
This string has an embedded €ASCII decimal 128 character
Notice the first statement is terminated, because SQL stops when it
hits the null. The other statements (don't know if it comes through
on the newsgroup), show the actual ASCII character immediately after
the space that follows the word 'embedded'
I would like some SQL that strips out all the charcters other than 32
-127. My result set would thus look like:
This string has an embedded null character
This string has an embedded bell character
This string has an embedded ASCII decimal 128 character
I do not want to write a CLR function, and I would prefer to avoid
even a tsql scalar function (I don't much like functions in general).
I am hoping some set-oriented SQL would allow this.
I searched the archives and found a solution (http://
groups.google.com/group/microsoft.public.sqlserver.programming/
browse_thread/thread/f9786bbe4a3ca1e3/e785868b90b50ed1?lnk=gst&q=remove
+extended+ascii#e785868b90b50ed1) from Plamen that used XML path, but
I couldn't get it to work, and don't actually understand XML Path.
Perhaps someone could explain it to me, and tell me how to get rid of
the characters that XML path seems to add without resorting to
REPLACE?
Thanks,
Bill >> Stay informed about: Remove characters in an ASCII decimal range: How To? |
|
| Back to top |
|
 |  |
External

Since: May 30, 2004 Posts: 2061
|
(Msg. 2) Posted: Sat May 15, 2010 6:25 am
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.] Archived from groups: per prev. post (more info?)
|
|
|
bill ( ) writes:
> I have a table that has some prettty nasty strings in it. I would
> like to remove all characters outside the range of 33 - 127 decimal
> from the string
You don't want to that. You may want to remove non-printable characters,
but believe me, there are plenty of printable characters beyond 127.
For instance, you cannot properly send goods to my address without such a
character. And I can tell you that I get pretty p*ssed when I enter my
address on a web form, and I am told that it contains an illegal character.
I don't know what's in your data, but I think you pretty naïve if you
believe all those characters are junk you want to get rid of. For instance:
This string has an embedded €ASCII decimal 128 character
What if you have a string that goes:
"The price for the service is €200,000."
You stripping venture will delete important information from the string.
> Notice the first statement is terminated, because SQL stops when it
> hits the null.
Not really, but the tools typically takes the NUL as a string terminator.
> I do not want to write a CLR function,
Why not? The CLR is the best tool for the purpose.
> I am hoping some set-oriented SQL would allow this.
Well, you can define a table that includes all characters you want
to remove. Then you can run a cursor over the table and run
replace(targettbl.col, badchars.ch, '')
on it. Not the most efficient, but it could be OK for a one-off.
....and it will not always work. If you have a Windows collation (and
you should), replace on char(0) does not work, because char(0) is
not defined in Unicode. But if you force an SQL collation, replace
will operate on char(0) as well.
> I searched the archives and found a solution (http://
> groups.google.com/group/microsoft.public.sqlserver.programming/
> browse_thread/thread/f9786bbe4a3ca1e3/e785868b90b50ed1?lnk=gst&q=remove
> +extended+ascii#e785868b90b50ed1) from Plamen that used XML path, but
> I couldn't get it to work, and don't actually understand XML Path.
> Perhaps someone could explain it to me, and tell me how to get rid of
> the characters that XML path seems to add without resorting to
> REPLACE?
Plamen's update statement first split the strings into tabular format,
with one row per character, then the strings are reconcatenated with XML
PATH. I think I see two problems with query:
1) There is no ORDER BY, so strings could be reassembled in the wrong
order.
2) The way the query is written, characters like & will be entisised into
& etc. This can be address by using FOR XML PATH(''), TYPE and
then convert the resulting XML to nvarchar.
On the other hand, the way his query is written, there is no risk for
issues like REPLACE.
--
Erland Sommarskog, SQL Server MVP, esquel.RemoveThis@sommarskog.se
Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx >> Stay informed about: Remove characters in an ASCII decimal range: How To? |
|
| Back to top |
|
 |  |
External

Since: Aug 20, 2008 Posts: 672
|
(Msg. 3) Posted: Sat May 15, 2010 10:08 pm
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.] Archived from groups: per prev. post (more info?)
|
|
|
On Sat, 15 May 2010 12:12:04 +0200, Erland Sommarskog
wrote:
>Plamen's update statement first split the strings into tabular format,
>with one row per character, then the strings are reconcatenated with XML
>PATH. I think I see two problems with query:
>
>1) There is no ORDER BY, so strings could be reassembled in the wrong
> order.
Good catch Erland! This can be solved by adding "ORDER BY n" to the
subquery that does the concatenation.
--
Plamen Ratchev
http://www.SQLStudio.com >> Stay informed about: Remove characters in an ASCII decimal range: How To? |
|
| Back to top |
|
 |  |
External

Since: Oct 17, 2008 Posts: 17
|
(Msg. 4) Posted: Sat May 15, 2010 10:26 pm
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.] Archived from groups: per prev. post (more info?)
|
|
|
Hi Erland,
I should have explained more thoroughly. The source is a commercial
application that only allows users to enter ASCII 33 - 127 into the
screen field that gets populated into the database column. However,
the source application adds formatting characters for its own display
format purposes that aren't documented.
I don't want those formatting characters, which are outside the range
of 33 - 127. I have found a number of the characters, but I don't
know if I have found them all. I figured it would be most
straightforward to simply exclude characters that are not in the range
33 - 127, instead of hunting samples for formatting characters.
Thanks,
Bill >> Stay informed about: Remove characters in an ASCII decimal range: How To? |
|
| Back to top |
|
 |  |
External

Since: Oct 17, 2008 Posts: 17
|
(Msg. 5) Posted: Sat May 15, 2010 10:32 pm
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.] Archived from groups: per prev. post (more info?)
|
|
|
Hi Plamen:
I'm interested in using your query, but as I am only vaguely familiar
with FOR XML PATH, I am not sure how to process the results.
If I use your query on my sample above, I get a result like this:
This is an interrupted string
I was hoping to simply strip the bad characters but have a normal
resulting string with real spaces, etc. I could do a REPLACE, but I
don't know what other character patterns your query will return, so I
don't know how many REPLACE functions to nest.
Is there some type of built-in SQL Server function that understands
the delimiters created by FOR XML PATH and can render the string for
me without the need for a REPLACE?
The query looks great, so a little guidance would be a big help.
Thanks,
Bill >> Stay informed about: Remove characters in an ASCII decimal range: How To? |
|
| Back to top |
|
 |  |
External

Since: May 30, 2004 Posts: 2061
|
(Msg. 6) Posted: Sun May 16, 2010 5:27 am
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.] Archived from groups: per prev. post (more info?)
|
|
|
bill ( ) writes:
> I should have explained more thoroughly. The source is a commercial
> application that only allows users to enter ASCII 33 - 127 into the
> screen field that gets populated into the database column.
In that case, the application needs fixing.
> I don't want those formatting characters, which are outside the range
> of 33 - 127. I have found a number of the characters, but I don't
> know if I have found them all. I figured it would be most
> straightforward to simply exclude characters that are not in the range
> 33 - 127, instead of hunting samples for formatting characters.
Here is a corrected version of Plamen's query, adapted for your test
table. It assumes the presence of the table Numbers. See
http://www.sommarskog.se/arrays-in-sql-2005.html#tblnum for details
on such a table. It will not clear the euro character from your test
data - that is left as an exercise to the reader.
CREATE TABLE test_row
(
statement_nr INT NOT NULL
,statement_tx VARCHAR(200) NOT NULL
)
ALTER TABLE test_row ADD PRIMARY KEY (statement_nr)
INSERT INTO test_row SELECT 1, 'This string has an embedded
'+CHAR(0)+'null character';
INSERT INTO test_row SELECT 2, 'This string has an embedded
'+CHAR(7)+'bell character';
INSERT INTO test_row SELECT 3, 'This string has an embedded
'+CHAR(128)+'ASCII decimal 128 character';
go
WITH UpdateCTE AS
(SELECT t.statement_nr, t.statement_tx,
(SELECT CASE WHEN ascii(substring(t1.statement_tx, n.Number, 1))
BETWEEN 0x00 AND 0x1F
THEN ''
ELSE substring(t1.statement_tx, n.Number, 1)
END + ''
FROM test_row t1
JOIN listtest..Numbers n
ON n.Number <= len(t1.statement_tx)
WHERE t.statement_nr = t1.statement_nr
FOR XML PATH(''), TYPE) AS cleantex
FROM test_row AS t)
UPDATE UpdateCTE
SET statement_tx = convert(nvarchar(200), cleantex)
go
select * from test_row
go
drop table test_row
--
Erland Sommarskog, SQL Server MVP, esquel RemoveThis @sommarskog.se
Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx >> Stay informed about: Remove characters in an ASCII decimal range: How To? |
|
| Back to top |
|
 |  |
External

Since: Oct 17, 2008 Posts: 17
|
(Msg. 7) Posted: Sun May 16, 2010 10:55 am
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.] Archived from groups: per prev. post (more info?)
|
|
|
|
|
| Back to top |
|
 |  |
External

Since: Mar 12, 2004 Posts: 8
|
(Msg. 8) Posted: Sun May 16, 2010 1:17 pm
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.] Archived from groups: per prev. post (more info?)
|
|
|
Hi Bill,
My experience is that you only get the html escapes like   (that's for
a space - and there are other characters that will do the same thing, like
&) when you don't concatenate anything to the
(SELECT CASE WHEN ASCII(SUBSTRING(statement_tx, n, 1))
BETWEEN 32 AND 127
THEN SUBSTRING(statement_tx, n, 1)
ELSE ' '
END -- one way to fix is to concatenate something here
FROM test_row AS B
JOIN Nums
ON n <= LEN(statement_tx)
WHERE B.statement_nr = A.statement_nr
FOR XML PATH('')) AS statement_tx_clean
part of the query. And it doesn't work to concatenate a null string (''),
it has to be a string of one or more characters. The way I would fix this
here is to concatenate a CHAR(128). This leaves extra CHAR(128)'s in your
string, but you can use one replace to get rid of them (since you know they
don't belong). For example:
CREATE TABLE test_row
(
statement_nr INT NOT NULL
,statement_tx VARCHAR(200) NOT NULL
);
ALTER TABLE test_row ADD PRIMARY KEY (statement_nr);
INSERT INTO test_row SELECT 1, 'This string has an embedded'+CHAR(0)+'null
character';
INSERT INTO test_row SELECT 2, 'This string has an embedded'+CHAR(7)+'bell
character';
INSERT INTO test_row SELECT 3, 'This string has an
embedded'+CHAR(128)+'ASCII decimal 128 character';
SELECT statement_tx FROM test_row;
WITH Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),
Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),
Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),
Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num4),
UpdateCTE
AS
(SELECT statement_nr, statement_tx,
(SELECT CASE WHEN ASCII(SUBSTRING(statement_tx, n, 1))
BETWEEN 32 AND 127
THEN SUBSTRING(statement_tx, n, 1)
ELSE ' '
END + Char(128)
FROM test_row AS B
JOIN Nums
ON n <= LEN(statement_tx)
WHERE B.statement_nr = A.statement_nr
FOR XML PATH('')) AS statement_tx_clean
FROM test_row AS A)
UPDATE UpdateCTE
SET statement_tx = Replace(statement_tx_clean,Char(128),'');
SELECT statement_tx FROM test_row;
go
Drop Table test_row;
Tom
"bill" wrote in message
> Hi Plamen:
>
> I'm interested in using your query, but as I am only vaguely familiar
> with FOR XML PATH, I am not sure how to process the results.
>
> If I use your query on my sample above, I get a result like this:
>
> This is an interrupted string
>
>
> I was hoping to simply strip the bad characters but have a normal
> resulting string with real spaces, etc. I could do a REPLACE, but I
> don't know what other character patterns your query will return, so I
> don't know how many REPLACE functions to nest.
>
> Is there some type of built-in SQL Server function that understands
> the delimiters created by FOR XML PATH and can render the string for
> me without the need for a REPLACE?
>
> The query looks great, so a little guidance would be a big help.
>
> Thanks,
>
> Bill >> Stay informed about: Remove characters in an ASCII decimal range: How To? |
|
| Back to top |
|
 |  |
External

Since: Aug 20, 2008 Posts: 672
|
(Msg. 9) Posted: Sun May 16, 2010 10:38 pm
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.] Archived from groups: per prev. post (more info?)
|
|
|
|
|
| Back to top |
|
 |  |
External

Since: Mar 21, 2009 Posts: 6
|
(Msg. 10) Posted: Mon May 17, 2010 4:20 pm
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.] Archived from groups: per prev. post (more info?)
|
|
|
While investigating this   stuff with XML Path, I've found that
returning the result of the subquery as an XML object using the TYPE option
and then converting this object into a string get rid of these entities; so
the following query seems to work correctly on SQL-Server 2008:
WITH Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),
Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),
Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),
Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num4),
UpdateCTE
AS
(SELECT statement_nr, statement_tx,
Cast ((SELECT CASE WHEN ASCII(SUBSTRING(statement_tx, n, 1))
BETWEEN 0x00 AND 0x1F
THEN ''
ELSE SUBSTRING(statement_tx, n, 1)
END + ''
FROM test_row AS B
JOIN Nums
ON n <= LEN(statement_tx)
WHERE B.statement_nr = A.statement_nr
Order by Nums.N
FOR XML PATH(''), TYPE) as varchar(256)) AS DesNM_clean
FROM test_row AS A)
UPDATE UpdateCTE
SET statement_tx = DesNM_clean;
--
Sylvain Lafontaine, ing.
MVP - Windows Live Platform
Blog/web site: http://coding-paparazzi.sylvainlafontaine.com
Independent consultant and remote programming for Access and SQL-Server
(French)
"Tom Cooper" wrote in message
> Hi Bill,
> My experience is that you only get the html escapes like   (that's
> for a space - and there are other characters that will do the same thing,
> like &) when you don't concatenate anything to the
>
> (SELECT CASE WHEN ASCII(SUBSTRING(statement_tx, n, 1))
> BETWEEN 32 AND 127
> THEN SUBSTRING(statement_tx, n, 1)
> ELSE ' '
> END -- one way to fix is to concatenate something here
> FROM test_row AS B
> JOIN Nums
> ON n <= LEN(statement_tx)
> WHERE B.statement_nr = A.statement_nr
> FOR XML PATH('')) AS statement_tx_clean
>
> part of the query. And it doesn't work to concatenate a null string (''),
> it has to be a string of one or more characters. The way I would fix this
> here is to concatenate a CHAR(128). This leaves extra CHAR(128)'s in your
> string, but you can use one replace to get rid of them (since you know
> they don't belong). For example:
>
> CREATE TABLE test_row
> (
> statement_nr INT NOT NULL
> ,statement_tx VARCHAR(200) NOT NULL
> );
> ALTER TABLE test_row ADD PRIMARY KEY (statement_nr);
>
> INSERT INTO test_row SELECT 1, 'This string has an embedded'+CHAR(0)+'null
> character';
> INSERT INTO test_row SELECT 2, 'This string has an embedded'+CHAR(7)+'bell
> character';
> INSERT INTO test_row SELECT 3, 'This string has an
> embedded'+CHAR(128)+'ASCII decimal 128 character';
>
> SELECT statement_tx FROM test_row;
>
> WITH Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),
> Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
> Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),
> Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),
> Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num4),
> UpdateCTE
> AS
> (SELECT statement_nr, statement_tx,
> (SELECT CASE WHEN ASCII(SUBSTRING(statement_tx, n, 1))
> BETWEEN 32 AND 127
> THEN SUBSTRING(statement_tx, n, 1)
> ELSE ' '
> END + Char(128)
> FROM test_row AS B
> JOIN Nums
> ON n <= LEN(statement_tx)
> WHERE B.statement_nr = A.statement_nr
> FOR XML PATH('')) AS statement_tx_clean
> FROM test_row AS A)
> UPDATE UpdateCTE
> SET statement_tx = Replace(statement_tx_clean,Char(128),'');
>
> SELECT statement_tx FROM test_row;
> go
> Drop Table test_row;
>
> Tom
>
> "bill" wrote in message
>
>> Hi Plamen:
>>
>> I'm interested in using your query, but as I am only vaguely familiar
>> with FOR XML PATH, I am not sure how to process the results.
>>
>> If I use your query on my sample above, I get a result like this:
>>
>> This is an interrupted string
>>
>>
>> I was hoping to simply strip the bad characters but have a normal
>> resulting string with real spaces, etc. I could do a REPLACE, but I
>> don't know what other character patterns your query will return, so I
>> don't know how many REPLACE functions to nest.
>>
>> Is there some type of built-in SQL Server function that understands
>> the delimiters created by FOR XML PATH and can render the string for
>> me without the need for a REPLACE?
>>
>> The query looks great, so a little guidance would be a big help.
>>
>> Thanks,
>>
>> Bill
> >> Stay informed about: Remove characters in an ASCII decimal range: How To? |
|
| Back to top |
|
 |  |
External

Since: Mar 21, 2009 Posts: 6
|
(Msg. 11) Posted: Mon May 17, 2010 4:22 pm
Post subject: Re: Remove characters in an ASCII decimal range: How To? [Login to view extended thread Info.] Archived from groups: per prev. post (more info?)
|
|
|
Just out of curiosity, do you really intend to use this XML Path solution
instead of using an UDF in a production database?
--
Sylvain Lafontaine, ing.
MVP - Windows Live Platform
Blog/web site: http://coding-paparazzi.sylvainlafontaine.com
Independent consultant and remote programming for Access and SQL-Server
(French)
"bill" wrote in message
> Hi Plamen:
>
> I'm interested in using your query, but as I am only vaguely familiar
> with FOR XML PATH, I am not sure how to process the results.
>
> If I use your query on my sample above, I get a result like this:
>
> This is an interrupted string
>
>
> I was hoping to simply strip the bad characters but have a normal
> resulting string with real spaces, etc. I could do a REPLACE, but I
> don't know what other character patterns your query will return, so I
> don't know how many REPLACE functions to nest.
>
> Is there some type of built-in SQL Server function that understands
> the delimiters created by FOR XML PATH and can render the string for
> me without the need for a REPLACE?
>
> The query looks great, so a little guidance would be a big help.
>
> Thanks,
>
> Bill >> Stay informed about: Remove characters in an ASCII decimal range: How To? |
|
| Back to top |
|
 |  |
| Related Topics: | working with ascii decimal values - i have a bunch of values stored in an nvarchar column which are actually ascii dec values, but due to the column datatype, they are now in string format (i realize this is a backwards storage scenario- at best, but its what i must work with right now)...
Decimal Comma vs Decimal Point - Hello, I want to insert a Single/Float type value into a table using the following SQL statement. I'm executing this from a VB6 app using ADO 2.7. myValue = "0.5074655" (Entered by the user into a textbox) strSQL = "INSERT INTO myTable ...
Export query result to a comma delimited ascii file (with .. -
Decimal conversion - DECLARE @space_in_MB INT SET @space_in_MB = 10081216 SELECT @space_in_MB/1024 -- Gives the output as 9844 But in real division, 10081216 / 1024 gives 9844.9375. How can I get that decimal value (with only max 4 digits after decimal point) from the..
Decimal formatting - Hi, when I cast some number to decimal, I get dot as deciomal point istead of comma. Example: I get 1.233 instead of 1,233 even If I have all settings - windows and database collations to slovenian, which has comma for decimal point. How can I change... |
|
You can post new topics in this forum You can reply to topics in this forum You can edit your posts in this forum You can delete your posts in this forum You can vote in polls in this forum
|
|
|
|
 |
|
|