Performance issues SQL Server 2005 Data Mining

 
Absolute

(Msg. 1) Posted: Sun Nov 18, 2007 1:48 pm
Archived from groups: microsoft>public>sqlserver>datamining

Hello

I am trying to run a clustering model (building mode) against a table of
15,000,000 records with 45 fields (categorical and numeric). I have to
kill the process after 5 hours because it never finishes. I have no problem
with other in-database data mining servers.
I am using a 4-way box running Windows 2003 Server RC2, with 4 GB RAM.

Any suggestions on where to begin the analysis?

Thanks in advance.

Gustavo Frederico

(Msg. 2) Posted: Tue Nov 20, 2007 10:20 pm

You can try a few things.
You can try to discretize your numeric fields. Begin with a few bins, to exaggerate the difference from the numeric version, and see if it makes much difference in performance.
You can also check which fields are relevant and which are not by doing some feature quality/relevancy assessment. There are statistical ways to do that: you can rank your fields using Information Gain and keep the top two-thirds of the attributes. You can also try Gain Ratio, Chi-Squared (Chi2), RELIEF, 1R, Gini, etc. Then try using only a few of the top-ranked attributes and see if it makes a difference. I remember a presentation by either Donald Farmer or Rafal Lukawiecki at Tech Ed 07 that showed feature ranking inside Excel, if I'm not mistaken (it looked like information gain).
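
To make the two suggestions concrete, here is a minimal Python sketch of equal-width binning followed by an Information Gain ranking. It is only an illustration, not anything built into SQL Server: the column names and sample values are made up, and it assumes you have some reference (target) column to rank against, which a pure clustering problem may not have.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def information_gain(feature, target):
    """Entropy of target minus its expected entropy after splitting on feature."""
    total = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(target) - conditional


def rank_features(columns, target):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(col, target)
             for name, col in columns.items()}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample data, for illustration only.
age = [22, 25, 31, 47, 52, 58, 63, 70]
region = ["N", "S", "N", "S", "N", "S", "N", "S"]
target = ["low", "low", "low", "low", "high", "high", "high", "high"]

columns = {"age_binned": equal_width_bins(age, 2), "region": region}
ranking = rank_features(columns, target)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

On a table of this size you would probably compute the ranking on a random sample of the rows and then keep only the top-ranked columns for the mining model.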

cheers,
Gustavo Frederico


Absolute

(Msg. 3) Posted: Thu Nov 22, 2007 8:33 am

Thanks for your answer.

However, I am afraid this is not what I expect from a mining tool. I
can understand and manage the kind of transformations you are suggesting,
but the reason to implement them should never be a lack of scalability
in the mining middleware.

Also, the kind of preprocessing you are suggesting is based on several
assumptions about the data distribution that I would rather not make.

In other words, can SQL Server manage this amount of data with any
configuration?

Again, thanks for your input.

Best regards.

Victor


Bogdan Crivat MSFT

(Msg. 4) Posted: Thu Nov 29, 2007 9:11 am

Can you try setting the SAMPLE_SIZE algorithm parameter to 5,000,000 (if you
are using a 64-bit machine with 64-bit SQL Server) or, say, 2,000,000 for a
32-bit machine or 32-bit SQL Server? By default, the algorithm is rather
conservative in its use of system resources.

> I am using a 4-way box running Windows 2003 Server RC2, with 4 GB RAM.
Is it a 64-bit machine?
Just to make sure you are taking full advantage of the hardware: you should
be running the Enterprise edition of SQL Server 2005 (for parallel
processing) and, if you are using a 64-bit machine, the 64-bit version of
SQL Server (to take advantage of the 4 GB of RAM).
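
For reference, one way to pass SAMPLE_SIZE is in the USING clause of a DMX CREATE MINING MODEL statement. The small Python helper below only renders such a statement as text. The model name, key column, and attribute columns are hypothetical placeholders, so treat it as a sketch and substitute your own schema.

```python
def clustering_model_dmx(model_name, key_column, columns, sample_size):
    """Render a DMX CREATE MINING MODEL statement for Microsoft_Clustering.

    columns is a list of (name, data_type, content_type) tuples,
    for example ("Age", "LONG", "CONTINUOUS").
    """
    lines = [f"    [{key_column}] LONG KEY"]
    lines += [f"    [{name}] {dtype} {content}"
              for name, dtype, content in columns]
    body = ",\n".join(lines)
    return (
        f"CREATE MINING MODEL [{model_name}] (\n{body}\n) "
        f"USING Microsoft_Clustering (SAMPLE_SIZE = {sample_size})"
    )


# Hypothetical model and columns, for illustration only.
statement = clustering_model_dmx(
    "CustomerClusters",
    "CustomerId",
    [("Age", "LONG", "CONTINUOUS"), ("Region", "TEXT", "DISCRETE")],
    sample_size=5_000_000,
)
print(statement)
```

The same parameter can also be set on the model's algorithm parameters in the design tools, which avoids writing DMX by hand.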



--
This posting is provided "AS IS" with no warranties, and confers no rights.
Please do not send email directly to this alias. It is for newsgroup
purposes only.

thanks,
bogdan

Alex Taylor

(Msg. 5) Posted: Thu Jan 31, 2008 12:00 am

Hi!

We had similar, irreducible problems with Yukon. It is very
surprising from an RDBMS like this. In the end I returned to my favorite,
FireBird, which handles these situations very well. On a single
PC (AMD Athlon64 2800+, 1 GByte RAM) with 40 million records in a
table, complicated queries take a few seconds. A dual aggregate
with a date filter, and of course grouping and ordering, takes about 0.3
sec. It was a dream after Yukon, considering the size of the server (20
MByte), the hardware, and the memory usage of the process. I
already knew that FB is faster than almost any other engine, but I never
thought it would be with this amount of data. Since then I have been happy
with FireBird. Installation takes 10 sec, with no dependencies on any
framework, etc.

Regards,
Alex