
sequential disk read speed

 
Darren

External


Since: Aug 20, 2008
Posts: 4



(Msg. 1) Posted: Wed Aug 20, 2008 5:36 pm
Post subject: sequential disk read speed
Archived from groups: comp.databases.theory

I am learning about database systems, and I am reading a book called
"Physical Database Design".

It gets to a bit about a large sequential access (e.g. for a full
table scan), and does the following:

It says "Since most disk systems use prefetch buffers to speed up
table scans, we
assume a 64 KB prefetch block"

So to calculate the time for a full table scan, it multiplies the
number of 64KB blocks by the time it takes to seek and read one block
(2.02ms). In other words, it is seeking for each 64KB block.
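
A minimal sketch of the calculation as described here, not taken from the book itself. The 64 KB prefetch block and the 2.02 ms per-block cost are the figures quoted above; the 1 GiB table size and the function name are made up purely for illustration.

```python
import math

def full_scan_time_ms(table_bytes, prefetch_bytes=64 * 1024, ms_per_block=2.02):
    """Estimate scan time as (number of prefetch blocks) x (time per block)."""
    blocks = math.ceil(table_bytes / prefetch_bytes)
    return blocks * ms_per_block

# Hypothetical 1 GiB table, chosen only to have a concrete number.
table_bytes = 1024 ** 3
print(f"{full_scan_time_ms(table_bytes) / 1000:.1f} s")
```

Under this model the per-block cost is charged once per 64 KB regardless of where the next block sits on disk, which is exactly the assumption being questioned.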

Why can a disk only read 64KB at a time? Is this a valid assumption?
Is this a disk limitation or a file system limitation?

Thanks

Bob Badour

External


Since: Jan 15, 2008
Posts: 1017



(Msg. 2) Posted: Wed Aug 20, 2008 10:48 pm
Post subject: Re: sequential disk read speed
Archived from groups: per prev. post

Darren wrote:

> I am learning about database systems, and I am reading a book called
> "Physical Database Design".
>
> It gets to a bit about a large sequential access (e.g. for a full
> table scan), and does the following:
>
> It says "Since most disk systems use prefetch buffers to speed up
> table scans, we
> assume a 64 KB prefetch block"
>
> So to calculate the time for a full table scan, it multiples the
> number of 64KB blocks by the time it takes to seek and read (2.02ms).
> In other words, it is seeking each 64KB block.
>
> Why can a disk only read 64KB at a time? Is this a valid assumption?
> Is this a disk limitation or a file system limitation?
>
> Thanks

It's arbitrary. Some dbmses have a fixed block size; others allow one to
configure it as a parameter. Other systems try to read an entire track
or cylinder at a time. Whether the latter is feasible can change when
available cylinder sizes grow at different rates from available memory.

I suspect the book is giving a hypothetical just to demonstrate the
calculations involved.

David BL

External


Since: Jan 22, 2008
Posts: 177



(Msg. 3) Posted: Thu Aug 21, 2008 12:08 am
Post subject: Re: sequential disk read speed
Archived from groups: per prev. post

On Aug 21, 8:36 am, Darren wrote:
> I am learning about database systems, and I am reading a book called
> "Physical Database Design".
>
> It gets to a bit about a large sequential access (e.g. for a full
> table scan), and does the following:
>
> It says "Since most disk systems use prefetch buffers to speed up
> table scans, we
> assume a 64 KB prefetch block"
>
> So to calculate the time for a full table scan, it multiples the
> number of 64KB blocks by the time it takes to seek and read (2.02ms).
> In other words, it is seeking each 64KB block.
>
> Why can a disk only read 64KB at a time? Is this a valid assumption?
> Is this a disk limitation or a file system limitation?

A high-end modern HD with a 4ms average seek will take about 7ms on
average to access a randomly located 64 KB buffer (seek plus rotational
delay) and an additional 0.5ms to read it. This mismatch shows that
64 KB blocks are too small for optimal read performance; 512 KB or
1 MB blocks would be more suitable.
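
A rough sketch of why the larger blocks come out ahead, using only the two figures above (about 7 ms to reach a random block, 0.5 ms to transfer 64 KB). The function name and the assumption that transfer time scales linearly with block size are mine.

```python
ACCESS_MS = 7.0                  # seek + rotational delay, from the figures above
TRANSFER_KB_PER_MS = 64 / 0.5    # 64 KB read in 0.5 ms -> 128 KB per ms

def transfer_fraction(block_kb):
    """Fraction of the total request time spent actually transferring data."""
    transfer_ms = block_kb / TRANSFER_KB_PER_MS
    return transfer_ms / (ACCESS_MS + transfer_ms)

for block_kb in (64, 512, 1024):
    print(f"{block_kb:5d} KB: {transfer_fraction(block_kb):.0%} of the time is transfer")
```

With 64 KB blocks only about 7% of the elapsed time moves data; at 1 MB it is a little over half.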
Tim X

External


Since: Jul 17, 2008
Posts: 39



(Msg. 4) Posted: Thu Aug 21, 2008 7:29 pm
Post subject: Re: sequential disk read speed
Archived from groups: per prev. post

Darren writes:

> I am learning about database systems, and I am reading a book called
> "Physical Database Design".
>
> It gets to a bit about a large sequential access (e.g. for a full
> table scan), and does the following:
>
> It says "Since most disk systems use prefetch buffers to speed up
> table scans, we
> assume a 64 KB prefetch block"
>
> So to calculate the time for a full table scan, it multiples the
> number of 64KB blocks by the time it takes to seek and read (2.02ms).
> In other words, it is seeking each 64KB block.
>
> Why can a disk only read 64KB at a time? Is this a valid assumption?
> Is this a disk limitation or a file system limitation?
>

This is a valid 'assumption'. It does not mean that this is all any disk
can do. To make their point, the authors have to pick some value and 64k
is as good as any other. In reality, it will depend on the hardware, the
way data is written to the disk, the speed of the host, bus, type of
data transfer technology, what the system is optimised for etc etc.
Things actually get even more complex because most DBMSs do a certain
amount of their own caching as well. What the authors are attempting to
do is provide a clear abstract explanation that doesn't get overly
complex. This is also the reason why, when it comes to working out
performance and tuning the system (at all levels), it is essential to use
available tools, and why most large enterprise-level databases
have tools to assist in calculating this type of thing.

I'm not familiar with the book in question, but I suspect they are about
to explain how things like record sizes, available indexes etc. can
impact performance, and possibly show why the often-held belief that
indexes always make things faster can be misleading or just plain
wrong. If it's a good book, it will emphasise the importance of gathering
hard figures and stats in order to optimise performance, and how
dangerous 'rules' regarding optimisation and performance can be. It is
partially due to the complexities and variations involved that you don't
get many databases that can successfully tune for performance
automatically.

Tim


--
tcross (at) rapttech dot com dot au
-CELKO-

External


Since: Jan 31, 2008
Posts: 63



(Msg. 5) Posted: Fri Aug 22, 2008 7:13 am
Post subject: Re: sequential disk read speed
Archived from groups: per prev. post

>> Why can a disk only read 64KB at a time? Is this a valid assumption? Is this a disk limitation or a file system limitation? <<

The author has to provide some numbers to show how to calculate an
estimation for disk access. Frankly, 64KB seems a little small for a
modern computer other than a desktop machine.

What you might consider is the rise of solid state storage, which will
start replacing moving disk hardware in the next few years. This with
multi-core processors will change database design radically. We work
in a trade where everything you know is wrong in five years :)
David BL

External


Since: Jan 22, 2008
Posts: 177



(Msg. 6) Posted: Fri Aug 22, 2008 11:04 pm
Post subject: Re: sequential disk read speed
Archived from groups: per prev. post

On Aug 22, 10:13 pm, -CELKO- wrote:
> >> Why can a disk only read 64KB at a time? Is this a valid assumption? Is this a disk limitation or a file system limitation? <<
>
> The author has to provide some numbers to show how to calculate an
> estimation for disk access. Frankly, 64KB seems a little small for a
> modern computer other than a desktop machine.

Even on a desktop, 64k is too small. A desktop HD has a higher
seek+rotational delay and a lower transfer rate, giving about the same
product as for an enterprise HD.

> What you might consider is the rise of solid state storage, which will
> start replacing moving disk hardware in the next few years. This with
> multi-core processors will change database design radically. We work
> in a trade where everything you know is wrong in five years :)

I wonder whether it will be less radical than might at first be
imagined. CPU caches lead to significant variation in memory access
times.

A few years ago I wrote a transient B+Tree and compared the
performance to the STL map (a red black tree) that ships with MS
Visual C++. I ran tests involving inserting a million randomly
generated keys on a map keyed by 32 bit integers. The B+Tree was
twice as fast at insertions and deletions, 35% faster at look up, and
10 times faster at iteration through the elements.
Darren

External


Since: Aug 20, 2008
Posts: 4



(Msg. 7) Posted: Sat Aug 23, 2008 10:26 am
Post subject: Re: sequential disk read speed
Archived from groups: per prev. post

On Aug 21, 12:08 am, David BL wrote:
> On Aug 21, 8:36 am, Darren wrote:
>
> > I am learning about database systems, and I am reading a book called
> > "Physical Database Design".
>
> > It gets to a bit about a large sequential access (e.g. for a full
> > table scan), and does the following:
>
> > It says "Since most disk systems use prefetch buffers to speed up
> > table scans, we
> > assume a 64 KB prefetch block"
>
> > So to calculate the time for a full table scan, it multiples the
> > number of 64KB blocks by the time it takes to seek and read (2.02ms).
> > In other words, it is seeking each 64KB block.
>
> > Why can a disk only read 64KB at a time? Is this a valid assumption?
> > Is this a disk limitation or a file system limitation?
>
> A high end modern HD with 4ms average seek will on average take about
> 7ms to access and an additional 0.5ms to read a randomly located 64k
> buffer.   This mismatch shows that 64k blocks are too small for
> optimal read performance.   512k or 1Mb blocks would be more suitable.

But what dictates the block size? Is this defined by the physical
disk, the file system, or the database code?
Brian Selzer

External


Since: Jan 15, 2008
Posts: 527



(Msg. 8) Posted: Sun Aug 24, 2008 12:39 am
Post subject: Re: sequential disk read speed
Archived from groups: per prev. post

"Darren" wrote in message

> On Aug 21, 12:08 am, David BL wrote:
> > On Aug 21, 8:36 am, Darren wrote:
> >
> > > I am learning about database systems, and I am reading a book called
> > > "Physical Database Design".
> >
> > > It gets to a bit about a large sequential access (e.g. for a full
> > > table scan), and does the following:
> >
> > > It says "Since most disk systems use prefetch buffers to speed up
> > > table scans, we
> > > assume a 64 KB prefetch block"
> >
> > > So to calculate the time for a full table scan, it multiples the
> > > number of 64KB blocks by the time it takes to seek and read (2.02ms).
> > > In other words, it is seeking each 64KB block.
> >
> > > Why can a disk only read 64KB at a time? Is this a valid assumption?
> > > Is this a disk limitation or a file system limitation?
> >
> > A high end modern HD with 4ms average seek will on average take about
> > 7ms to access and an additional 0.5ms to read a randomly located 64k
> > buffer. This mismatch shows that 64k blocks are too small for
> > optimal read performance. 512k or 1Mb blocks would be more suitable.
>
> But what dictates the block size? Is this defined by the physical
> disk, the file system, or the database code?
>

There can be a marginal reduction in CPU cycles, and in some cases I/O, by
matching the block size of the disk subsystem to the block size specified in
the database engine. But in any system where performance is critical, the
disk subsystem will involve one or more caching disk arrays, and the block
size becomes moot because the caching disk array controller reads an entire
track at a time. Memory-to-memory transfers are negligible when compared to
disk-to-memory transfers, let alone seeks.

Moreover, caching controllers employ a number of technologies, such as
elevator seeking, to minimize the impact of seek times, and by reading an
entire track at a time, latency (that is, the time it takes for the data
requested to arrive at the head for reading) is effectively eliminated as a
factor. (There is still a negligible latency, which is roughly half the time
it takes for the head to pass over a sector, but as there are hundreds of
sectors per track, the time involved is not worth considering.)

Not to mention that a savvy DBA will spread data access across as many heads
as can be budgeted for, to maximize throughput and minimize seek time. If you
have a 100GB database and you put it on a single 100GB disk drive, your best
average seek time is the average seek time of the disk drive, but if you put
the database on four 100GB disk drives, the best average seek time will only
be a fraction of the seek time of the single disk. Suppose that the
full-stroke seek time on the 100GB disk is 7ms and the track-to-track seek
time is 1ms. With four disks, instead of an average 4ms seek time, the
individual seek time of each disk is reduced to roughly 2.5ms, and since
there are four disks, the average seek time for the disk subsystem is
reduced to a quarter of that, or roughly 0.625ms. Add a mirror (RAID 0+1
using 8 drives), and you introduce fault tolerance while at the same time
halving seek time for reads again.

Note that, except in rare cases, the ratio of reads to writes in a database
is better than 10:1, so any strategy that improves read time without
significantly impeding write time, as RAID 0+1 does, should be vigorously
pursued. Incidentally, RAID 0+1 is also good for transaction logs and
temporary tables, which involve mostly writes or a roughly equal number of
reads and writes. It usually makes sense, though, to segregate the logs from
the database (for recovery if nothing else), and it also often makes sense
to segregate the location where temporary tables are housed from both the
database and the logs.
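
The seek-time arithmetic above can be reproduced with a short sketch. The function name is made up, and two readings are assumptions on my part: the 4ms single-disk average is taken as the midpoint of the full-stroke and track-to-track figures, and the per-disk figure is the track-to-track time plus the remaining stroke divided by the number of disks. The final division by the number of disks treats the drives as seeking independently and in parallel.

```python
def seek_estimates(full_stroke_ms, track_to_track_ms, disks, mirrored=False):
    """Return (single-disk average, per-disk seek, subsystem seek) in ms."""
    # Assumed reading of the "average 4ms" figure: midpoint of the two extremes.
    single = (full_stroke_ms + track_to_track_ms) / 2
    # Each disk holds 1/disks of the data, so the head covers a shorter stroke.
    per_disk = track_to_track_ms + (full_stroke_ms - track_to_track_ms) / disks
    # Spread over independently seeking disks.
    subsystem = per_disk / disks
    if mirrored:
        # A mirror lets either copy serve a read, halving the figure again.
        subsystem /= 2
    return single, per_disk, subsystem

print(seek_estimates(7.0, 1.0, 4))                 # (4.0, 2.5, 0.625)
print(seek_estimates(7.0, 1.0, 4, mirrored=True))  # (4.0, 2.5, 0.3125)
```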
David BL

External


Since: Jan 22, 2008
Posts: 177



(Msg. 9) Posted: Sun Aug 24, 2008 7:29 pm
Post subject: Re: sequential disk read speed
Archived from groups: per prev. post

On Aug 24, 1:26 am, Darren wrote:
> On Aug 21, 12:08 am, David BL wrote:
>
> > On Aug 21, 8:36 am, Darren wrote:
>
> > > I am learning about database systems, and I am reading a book called
> > > "Physical Database Design".
>
> > > It gets to a bit about a large sequential access (e.g. for a full
> > > table scan), and does the following:
>
> > > It says "Since most disk systems use prefetch buffers to speed up
> > > table scans, we
> > > assume a 64 KB prefetch block"
>
> > > So to calculate the time for a full table scan, it multiples the
> > > number of 64KB blocks by the time it takes to seek and read (2.02ms).
> > > In other words, it is seeking each 64KB block.
>
> > > Why can a disk only read 64KB at a time? Is this a valid assumption?
> > > Is this a disk limitation or a file system limitation?
>
> > A high end modern HD with 4ms average seek will on average take about
> > 7ms to access and an additional 0.5ms to read a randomly located 64k
> > buffer. This mismatch shows that 64k blocks are too small for
> > optimal read performance. 512k or 1Mb blocks would be more suitable
>
> But what dictates the block size? Is this defined by the physical
> disk, the file system, or the database code?

As far as a physical disk is concerned, the term “block” was commonly
used to refer to the intersection of a sector and a track, but nowadays
“sector” tends to be used instead. It is the smallest unit of
reading/writing and is often 512 bytes. Some disks use 1024 byte
sectors.

A file system provides buffering, and that allows an application to
seek within a file and read/write a single byte. However behind the
scenes an entire sector must be read/written to disk.
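
A small sketch of what "an entire sector must be read" means for an arbitrary byte range. The 512-byte sector size is the common value mentioned above; the function name is hypothetical and no real file-system API is involved.

```python
SECTOR_BYTES = 512  # common sector size; some disks differ

def sector_aligned_range(offset, length, sector_bytes=SECTOR_BYTES):
    """Return (start, size) of the whole-sector span covering a byte range."""
    if length <= 0:
        raise ValueError("length must be positive")
    start = (offset // sector_bytes) * sector_bytes               # round down
    end = -(-(offset + length) // sector_bytes) * sector_bytes    # round up
    return start, end - start

print(sector_aligned_range(1000, 1))   # (512, 512): one byte costs a full sector
print(sector_aligned_range(500, 24))   # (0, 1024): crossing a boundary costs two
```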

Typically a DBMS will read/write the disk without file buffering
provided by the OS. For example on Win32, the function CreateFile can
take the parameter FILE_FLAG_NO_BUFFERING. This forces the DBMS to
work at the granularity of sectors, and it’s fairly low level. E.g.
there is a requirement that memory buffers be aligned on 512 byte
boundaries to comply with the DMA constraints.

To avoid excessive seeking the DBMS will tend to organise the store
into much coarser units that are typically called “blocks”. The block
size is up to the DBMS, but in practice will always be a multiple of the
sector size. In some cases it may relate back to track or cylinder
boundaries, but that constraint is not imposed by the disk controller
(which will happily allow for random access to any sector).
David BL

External


Since: Jan 22, 2008
Posts: 177



(Msg. 10) Posted: Wed Aug 27, 2008 6:46 pm
Post subject: Re: sequential disk read speed
Archived from groups: per prev. post

On Aug 24, 12:39 pm, "Brian Selzer" wrote:
>
> If you have a 100GB database and you put it on single
> 100GB disk drive, your best average seek time is the average seek time of
> the disk drive, but if you put the database on four 100GB disk drives, the
> the best average seek time will only be a fraction of the seek time of the
> single disk. Suppose that the full-stroke seek time on the 100GB disk is
> 7ms and the track-to-track seek time is 1ms. Well, with four disks, instead
> of an average 4ms seek time, the individual seek time of each disk is
> reduced to roughly 2.5ms

Is this because less of the disk is actually being used so on a given
platter the head doesn't have such a large range of tracks to move
over?

> , and since there are four disks, the average seek
> time for the disk subsystem is reduced to a quarter of that or roughly
> .625ms.

In order for the effective seek time to be reduced to a quarter, the
seeking must be independent. To achieve that I think the striping
would need to be very coarse (e.g. 512 KB or 1 MB).
David BL

External


Since: Jan 22, 2008
Posts: 177



(Msg. 11) Posted: Wed Aug 27, 2008 9:32 pm
Post subject: Re: sequential disk read speed
Archived from groups: per prev. post

On Aug 28, 10:47 am, "Brian Selzer" wrote:
> "David BL" wrote in message
>
> > On Aug 24, 12:39 pm, "Brian Selzer" wrote:
>
> >> If you have a 100GB database and you put it on single
> >> 100GB disk drive, your best average seek time is the average seek time of
> >> the disk drive, but if you put the database on four 100GB disk drives,
> >> the
> >> the best average seek time will only be a fraction of the seek time of
> >> the
> >> single disk. Suppose that the full-stroke seek time on the 100GB disk is
> >> 7ms and the track-to-track seek time is 1ms. Well, with four disks,
> >> instead
> >> of an average 4ms seek time, the individual seek time of each disk is
> >> reduced to roughly 2.5ms
>
> > Is this because less of the disk is actually being used so on a given
> > platter the head doesn't have such a large range of tracks to move
> > over?
>
> Yes. And the bit density is generally greater at the outside of the
> platter, so it generally takes fewer tracks to store the same information
> there as opposed to near the center; consequently, simply dividing the
> difference of the full-stroke seek and the track-to-track seek by four is a
> perhaps overly conservative method of estimation. I want to stress that
> this is not just a hair-brained theory of mine: I've had significant success
> using this mechanism to boost performance. In one application, by
> installing a disk that was seven times the size required and creating a
> partition on the outer edge of the disk, performance improved by over 6000%:
> batch processes that had been taking over 25 hours to complete were
> finishing in under 25 minutes.

How do you explain a 60 fold increase?

> >> , and since there are four disks, the average seek
> >> time for the disk subsystem is reduced to a quarter of that or roughly
> >> .625ms.
>
> > In order for the effective seek time to be reduced to a quarter the
> > seeking must be independent. To achieve that I think the striping
> > would need to be very coarse (eg 512kb or 1Mb).
>
> Drives that support disconnection or some other command queueing mechanism
> are all that is needed for seeking to be independent.

If stripes are somewhat smaller than the DBMS block size, then every
drive (in the RAID 0) will be involved in the reading of each and
every DBMS block. No matter how you order those reads, each drive
needs to read a large amount of scattered data and the head will seek
around a lot. If that is the case then the only advantage arises
from your previously mentioned reduction in the overall range of
tracks over which the data resides on a given platter.

Alternatively if the stripe size is larger then each drive will read a
somewhat independent set of the DBMS blocks, and the effective seek
time can be reduced assuming the DBMS is able to issue overlapping
read requests for the DBMS blocks.
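
To make the two cases concrete, here is a minimal sketch that lists which RAID 0 member drives hold part of a given DBMS block. It assumes a plain round-robin layout with blocks stored back to back from offset zero, and "stripe" here means the per-drive stripe unit; the function name and the sizes are illustrative only.

```python
def drives_touched(block_index, block_bytes, stripe_bytes, drives):
    """Set of RAID 0 member drives holding any part of one DBMS block."""
    first_stripe = (block_index * block_bytes) // stripe_bytes
    last_stripe = ((block_index + 1) * block_bytes - 1) // stripe_bytes
    # Stripe units are dealt out to the drives in round-robin order.
    return {s % drives for s in range(first_stripe, last_stripe + 1)}

KB = 1024
# 64 KB DBMS blocks on a 4-drive array (illustrative sizes).
print(drives_touched(0, 64 * KB, 16 * KB, 4))     # fine stripes: all four drives
print(drives_touched(0, 64 * KB, 1024 * KB, 4))   # coarse stripes: a single drive
print(drives_touched(16, 64 * KB, 1024 * KB, 4))  # a later block lands on another drive
```

With 16 KB stripes every block read occupies all four heads; with 1 MB stripes different blocks can be serviced by different drives at the same time.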

> I think using a coarse stripe is counterproductive. There would be a bigger
> chance that a seek in the middle of the read would be required. Consider:
> if 3.5 stripes fit on a track in one zone of the disk, then on average every
> fourth read would require an additional seek to get the remaining half
> stripe. If on the other hand, 28 stripes fit on a track, then no additional
> seeks would be necessary. Even if it were 28.5 stripes instead of 28, one
> additional seek for every 29 reads is a whole lot better than one for every
> 4.

Firstly, hard-disks are quite good at stepping onto the next track in
the manner normally used for very large "contiguous" reads or writes.

Secondly your analysis misses the point that coarser granularity
stripes lead to fewer overall seeks, not more! Seeks per read is
not a very useful stat.

The following

http://www.dba-oracle.com/oracle_tips_raid_usage.htm

discusses the ideal stripe size, and the formulae indicate that ~1 MB
would be appropriate for a modern disk.
Brian Selzer

External


Since: Jan 15, 2008
Posts: 527



(Msg. 12) Posted: Wed Aug 27, 2008 10:47 pm
Post subject: Re: sequential disk read speed
Archived from groups: per prev. post

"David BL" wrote in message

> On Aug 24, 12:39 pm, "Brian Selzer" wrote:
>>
>> If you have a 100GB database and you put it on single
>> 100GB disk drive, your best average seek time is the average seek time of
>> the disk drive, but if you put the database on four 100GB disk drives,
>> the
>> the best average seek time will only be a fraction of the seek time of
>> the
>> single disk. Suppose that the full-stroke seek time on the 100GB disk is
>> 7ms and the track-to-track seek time is 1ms. Well, with four disks,
>> instead
>> of an average 4ms seek time, the individual seek time of each disk is
>> reduced to roughly 2.5ms
>
> Is this because less of the disk is actually being used so on a given
> platter the head doesn't have such a large range of tracks to move
> over?
>

Yes. And the bit density is generally greater at the outside of the
platter, so it generally takes fewer tracks to store the same information
there as opposed to near the center; consequently, simply dividing the
difference of the full-stroke seek and the track-to-track seek by four is a
perhaps overly conservative method of estimation. I want to stress that
this is not just a hare-brained theory of mine: I've had significant success
using this mechanism to boost performance. In one application, by
installing a disk that was seven times the size required and creating a
partition on the outer edge of the disk, performance improved by over 6000%:
batch processes that had been taking over 25 hours to complete were
finishing in under 25 minutes.

>> , and since there are four disks, the average seek
>> time for the disk subsystem is reduced to a quarter of that or roughly
>> .625ms.
>
> In order for the effective seek time to be reduced to a quarter the
> seeking must be independent. To achieve that I think the striping
> would need to be very coarse (eg 512kb or 1Mb).
>

Drives that support disconnection or some other command queueing mechanism
are all that is needed for seeking to be independent.

I think using a coarse stripe is counterproductive. There would be a bigger
chance that a seek in the middle of the read would be required. Consider:
if 3.5 stripes fit on a track in one zone of the disk, then on average every
fourth read would require an additional seek to get the remaining half
stripe. If on the other hand, 28 stripes fit on a track, then no additional
seeks would be necessary. Even if it were 28.5 stripes instead of 28, one
additional seek for every 29 reads is a whole lot better than one for every
4.
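
The stripes-per-track comparison can be counted exactly with a short sketch. It assumes stripes are packed back to back from the start of a track with no gaps or zone changes, which is a simplification; the function name is made up.

```python
from fractions import Fraction

def straddling_stripes(stripes_per_track, total_stripes):
    """Count stripes that cross a track boundary when packed back to back."""
    spt = Fraction(stripes_per_track)
    count = 0
    for i in range(total_stripes):
        # First track boundary lying strictly after the start of stripe i,
        # measured in stripe units (stripe i occupies [i, i + 1)).
        boundary = (i // spt + 1) * spt
        if boundary < i + 1:
            count += 1
    return count

print(straddling_stripes(Fraction(7, 2), 28))    # 3.5 stripes per track
print(straddling_stripes(28, 56))                # 28 stripes per track
print(straddling_stripes(Fraction(57, 2), 57))   # 28.5 stripes per track
```

Under this simplified layout the counts are 4 in 28 stripes, none, and 1 in 57 respectively, so the exact ratios differ from the round figures quoted above while the direction of the comparison is the same.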
Brian Selzer

External


Since: Jan 15, 2008
Posts: 527



(Msg. 13) Posted: Thu Aug 28, 2008 9:34 am
Post subject: Re: sequential disk read speed
Archived from groups: per prev. post

"David BL" wrote in message

> On Aug 28, 10:47 am, "Brian Selzer" wrote:
>> "David BL" wrote in message
>>
>> > On Aug 24, 12:39 pm, "Brian Selzer" wrote:
>>
>> >> If you have a 100GB database and you put it on single
>> >> 100GB disk drive, your best average seek time is the average seek time
>> >> of
>> >> the disk drive, but if you put the database on four 100GB disk drives,
>> >> the
>> >> the best average seek time will only be a fraction of the seek time of
>> >> the
>> >> single disk. Suppose that the full-stroke seek time on the 100GB disk
>> >> is
>> >> 7ms and the track-to-track seek time is 1ms. Well, with four disks,
>> >> instead
>> >> of an average 4ms seek time, the individual seek time of each disk is
>> >> reduced to roughly 2.5ms
>>
>> > Is this because less of the disk is actually being used so on a given
>> > platter the head doesn't have such a large range of tracks to move
>> > over?
>>
>> Yes. And the bit density is generally greater at the outside of the
>> platter, so it generally takes fewer tracks to store the same information
>> there as opposed to near the center; consequently, simply dividing the
>> difference of the full-stroke seek and the track-to-track seek by four is
>> a
>> perhaps overly conservative method of estimation. I want to stress that
>> this is not just a hair-brained theory of mine: I've had significant
>> success
>> using this mechanism to boost performance. In one application, by
>> installing a disk that was seven times the size required and creating a
>> partition on the outer edge of the disk, performance improved by over
>> 6000%:
>> batch processes that had been taking over 25 hours to complete were
>> finishing in under 25 minutes.
>
> How do you explain a 60 fold increase?
>

Fewer and shorter seeks is my guess.

>> >> , and since there are four disks, the average seek
>> >> time for the disk subsystem is reduced to a quarter of that or roughly
>> >> .625ms.
>>
>> > In order for the effective seek time to be reduced to a quarter the
>> > seeking must be independent. To achieve that I think the striping
>> > would need to be very coarse (eg 512kb or 1Mb).
>>
>> Drives that support disconnection or some other command queueing
>> mechanism
>> are all that is needed for seeking to be independent.
>
> If stripes are somewhat smaller than the DBMS block size, then every
> drive (in the RAID 0) will be involved in the reading of each and
> every DBMS block. No matter how you order those reads, each drive
> needs to read a large amount of scattered data and the head will seek
> around a lot. If that is the case then the only advantage arises
> from your previously mentioned reduction in the overall range of
> tracks over which the data resides on a given platter.
>
> Alternatively if the stripe size is larger then each drive will read a
> somewhat independent set of the DBMS blocks, and the effective seek
> time can be reduced assuming the DBMS is able to issue overlapping
> read requests for the DBMS blocks.
>

Your argument rests on the assumption that data is randomly distributed in
the stripes on the disk and doesn't take into account the fact that a
high-end caching controller eliminates latency by reading an entire track at
once. Isn't it true that there is a physical affinity between related data?
Isn't it more likely that an index will occupy contiguous stripes than some
random set--regardless of stripe size? Can you show that the number of
tracks accessed by say, 128 coarse stripe reads is any less than the number
of tracks accessed by 1024 fine stripe reads?

>> I think using a coarse stripe is counterproductive. There would be a
>> bigger
>> chance that a seek in the middle of the read would be required.
>> Consider:
>> if 3.5 stripes fit on a track in one zone of the disk, then on average
>> every
>> fourth read would require an additional seek to get the remaining half
>> stripe. If on the other hand, 28 stripes fit on a track, then no
>> additional
>> seeks would be necessary. Even if it were 28.5 stripes instead of 28,
>> one
>> additional seek for every 29 reads is a whole lot better than one for
>> every
>> 4.
>
> Firstly, hard-disks are quite good at stepping onto the next track in
> the manner normally used for very large "contiguous" reads or writes.
>

The best track-to-track seek time I've seen is 0.2ms for reads, 0.4ms for
writes. That's phenomenal but can still add up.

> Secondly your analysis misses the point that coarser granularity
> stripes lead to fewer overall seeks, not more! Seeks per read is
> not a very useful stat.

Coarser granularity stripes lead to fewer overall reads, not necessarily
fewer overall seeks--and not necessarily reduced overall seek time.

A finer granularity means more commands to be processed.
More commands to be processed increases the likelihood that the read of one
track will satisfy more than one command.
More commands to be processed also increases the likelihood that elevator
seeking can be used to reduce overall seek time.
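
For readers unfamiliar with elevator seeking, a minimal sketch of the idea: with several commands queued, servicing them in one sweep up and then back down moves the head less than servicing them in arrival order. The track numbers and starting position are made up, and this simple version turns around at the last request rather than travelling to the disk edge.

```python
def head_travel(start, tracks):
    """Total tracks crossed when servicing requests in the given order."""
    travel, pos = 0, start
    for t in tracks:
        travel += abs(t - pos)
        pos = t
    return travel

def elevator_order(start, tracks):
    """Sweep upward from the start position, then back down."""
    up = sorted(t for t in tracks if t >= start)
    down = sorted((t for t in tracks if t < start), reverse=True)
    return up + down

queue = [98, 183, 37, 122, 14, 124, 65, 67]  # made-up track numbers
start = 53
print(head_travel(start, queue))                         # arrival order: 640
print(head_travel(start, elevator_order(start, queue)))  # elevator order: 299
```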

In fact, coarser granularity stripes could lead to more overall seeks.
Suppose, for example, that many of the stripes on disk are less than half
populated with data, in much the same way that a FAT16 file system with a
huge number of tiny files can fill up the disk even though the sum of the
actual file sizes can be less than a quarter of the formatted capacity. Any
seek that is needed in order to read the rest of a stripe when the rest of
the stripe isn't populated with data would be unnecessary if a smaller
stripe size were used. In much the same way, with a high-end processor, it
is often possible to improve performance by setting the compressed attribute
on a file. A compressed file typically occupies half the space of an
uncompressed file, and with a high-end CPU, it can actually take less time
to read and uncompress data than to read uncompressed data.

>
> The following
>
> http://www.dba-oracle.com/oracle_tips_raid_usage.htm
>
> discusses the ideal stripe size and the formulae indicate that ~1Mb
> would be appropriate for a modern disk.
>

I am not convinced, knowing what I know and have had experience with when it
comes to storage subsystems. I would have to read the papers Mike Ault
vaguely refers to.
David BL

External


Since: Jan 22, 2008
Posts: 177



(Msg. 14) Posted: Thu Aug 28, 2008 8:43 pm
Post subject: Re: sequential disk read speed

On Aug 28, 9:34 pm, "Brian Selzer" wrote:
> "David BL" wrote in message
>
> > On Aug 28, 10:47 am, "Brian Selzer" wrote:
> >> "David BL" wrote in message
>
> >> >> , and since there are four disks, the average seek
> >> >> time for the disk subsystem is reduced to a quarter of that or roughly
> >> >> .625ms.
>
> >> > In order for the effective seek time to be reduced to a quarter the
> >> > seeking must be independent. To achieve that I think the striping
> >> > would need to be very coarse (eg 512kb or 1Mb).
>
> >> Drives that support disconnection or some other command queueing
> >> mechanism are all that is needed for seeking to be independent.
>
> > If stripes are somewhat smaller than the DBMS block size, then every
> > drive (in the RAID 0) will be involved in the reading of each and
> > every DBMS block. No matter how you order those reads, each drive
> > needs to read a large amount of scattered data and the head will seek
> > around a lot. If that is the case then the only advantage arises
> > from your previously mentioned reduction in the overall range of
> > tracks over which the data resides on a given platter.
>
> > Alternatively if the stripe size is larger then each drive will read a
> > somewhat independent set of the DBMS blocks, and the effective seek
> > time can be reduced assuming the DBMS is able to issue overlapping
> > read requests for the DBMS blocks.
>
> Your argument rests on the assumption that data is randomly distributed in
> the stripes on the disk and doesn't take into account the fact that a
> high-end caching controller eliminates latency by reading an entire track at
> once. Isn't it true that there is a physical affinity between related data?
> Isn't it more likely that an index will occupy contiguous stripes than some
> random set--regardless of stripe size? Can you show that the number of
> tracks accessed by say, 128 coarse stripe reads is any less than the number
> of tracks accessed by 1024 fine stripe reads?

Yes, sometimes the DBMS manages to cluster all the necessary data so
there is very little seeking required, and in that case it won’t
matter what stripe size is used.

However, that is not always possible. For example, consider a B+Tree
on 1 billion records, and suppose that in a short period of time the DBMS
needs to read 100 records for index values that are effectively random
with respect to the ordering on that data type. To keep it simple,
ignore the reading of the internal nodes of the B+Tree. Typically
those 100 records will appear in roughly 100 different leaf nodes of
the B+Tree. Furthermore due to the sheer size of the overall data
those leaf nodes will tend to reside on different tracks. The
unfortunate reality is that it isn’t possible to read these records
without a lot of head seeking, even if the reads are ordered according
to track position (i.e. elevator seeking). Now if RAID0 is used and the
stripes are smaller than the B+Tree leaf nodes, then every drive will
need to contribute to the reading of every leaf node. Each drive can
read the stripes in any order it likes but it won’t avoid the fact
that each drive performs ~100 seeks. If instead, each B+Tree leaf
node resides in a single stripe (and therefore on a single drive) then
with four drives in the RAID0, each drive will only need to perform
~25 seeks.
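
The counting argument above can be sketched roughly as follows. The 64 KB leaf size, the 16 KB and 64 KB stripe sizes, and the total of 10 million leaves are assumptions picked for illustration. The 100 reads and four drives come from the example. Each leaf that a drive has to touch is counted as one seek on that drive, which ignores any luck with neighbouring tracks.

```python
import random

def seeks_per_drive(num_reads, leaf_kb, stripe_kb, num_drives, total_leaves, seed=0):
    """Count, per drive, how many of the randomly chosen leaves touch that drive.

    Leaves are laid out back to back and striped round-robin (RAID 0).
    """
    rng = random.Random(seed)
    seeks = [0] * num_drives
    for leaf in rng.sample(range(total_leaves), num_reads):
        first = (leaf * leaf_kb) // stripe_kb            # first stripe of the leaf
        last = ((leaf + 1) * leaf_kb - 1) // stripe_kb   # last stripe of the leaf
        drives = {s % num_drives for s in range(first, last + 1)}
        for d in drives:
            seeks[d] += 1
    return seeks

# Stripe smaller than the leaf: every drive takes part in every leaf read.
print(seeks_per_drive(100, leaf_kb=64, stripe_kb=16, num_drives=4,
                      total_leaves=10_000_000))
# Stripe as large as the leaf: each leaf read lands on exactly one drive.
print(seeks_per_drive(100, leaf_kb=64, stripe_kb=64, num_drives=4,
                      total_leaves=10_000_000))
```

In the first case every drive is charged 100 seeks. In the second the 100 seeks are shared out among the four drives, about 25 each on average.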


> >> I think using a coarse stripe is counterproductive. There would be a
> >> bigger chance that a seek in the middle of the read would be required.
> >> Consider: if 3.5 stripes fit on a track in one zone of the disk, then on
> >> average every fourth read would require an additional seek to get the
> >> remaining half stripe. If on the other hand, 28 stripes fit on a track,
> >> then no additional seeks would be necessary. Even if it were 28.5 stripes
> >> instead of 28, one additional seek for every 29 reads is a whole lot
> >> better than one for every 4.
>
> > Firstly, hard-disks are quite good at stepping onto the next track in
> > the manner normally used for very large "contiguous" reads or writes.
>
> The best track-to-track seek time I've seen is 0.2ms for reads, 0.4ms for
> writes. That's phenomenal but can still add up.

It’s insignificant when reading or writing 1 MB at a time.
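
As a rough check on that, the sketch below compares the 0.2 ms track-to-track figure quoted above with the time needed to transfer one read. The 80 MB/s sustained transfer rate is an assumption, not a number from this thread, and the model charges exactly one track switch per read.

```python
TRACK_SWITCH_MS = 0.2        # track-to-track read figure quoted above
TRANSFER_MB_PER_S = 80.0     # assumed sustained rate, not from the thread

def transfer_ms(size_kb, rate_mb_per_s=TRANSFER_MB_PER_S):
    """Time to transfer size_kb at the given sustained rate, in milliseconds."""
    return size_kb / 1024 / rate_mb_per_s * 1000

def switch_overhead(size_kb):
    """Fraction of one read's time spent on a single track switch."""
    t = transfer_ms(size_kb)
    return TRACK_SWITCH_MS / (t + TRACK_SWITCH_MS)

for size_kb in (64, 1024):
    print(f"{size_kb:5d} KB: transfer {transfer_ms(size_kb):.2f} ms, "
          f"track switch is {switch_overhead(size_kb):.1%} of the read")
```

Under those assumptions the switch is a small slice of a 1 MB read and a much larger slice of a 64 KB one.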
David BL

External


Since: Jan 22, 2008
Posts: 177



(Msg. 15) Posted: Fri Aug 29, 2008 7:23 am
Post subject: Re: sequential disk read speed

On Aug 29, 7:47 pm, "Brian Selzer" wrote:
> "David BL" wrote in message
>
> >On Aug 28, 9:34 pm, "Brian Selzer" wrote:
> >> "David BL" wrote in message
>
> >> > On Aug 28, 10:47 am, "Brian Selzer" wrote:
> >> >> "David BL" wrote in message
>
> >> >> >> , and since there are four disks, the average seek
> >> >> >> time for the disk subsystem is reduced to a quarter of that or
> >> >> >> roughly .625ms.
>
> >> >> > In order for the effective seek time to be reduced to a quarter the
> >> >> > seeking must be independent. To achieve that I think the striping
> >> >> > would need to be very coarse (eg 512kb or 1Mb).
>
> >> >> Drives that support disconnection or some other command queueing
> >> >> mechanism are all that is needed for seeking to be independent.
>
> >> > If stripes are somewhat smaller than the DBMS block size, then every
> >> > drive (in the RAID 0) will be involved in the reading of each and
> >> > every DBMS block. No matter how you order those reads, each drive
> >> > needs to read a large amount of scattered data and the head will seek
> >> > around a lot. If that is the case then the only advantage arises
> >> > from your previously mentioned reduction in the overall range of
> >> > tracks over which the data resides on a given platter.
>
> >> > Alternatively if the stripe size is larger then each drive will read a
> >> > somewhat independent set of the DBMS blocks, and the effective seek
> >> > time can be reduced assuming the DBMS is able to issue overlapping
> >> > read requests for the DBMS blocks.
>
> >> Your argument rests on the assumption that data is randomly distributed
> >> in the stripes on the disk and doesn't take into account the fact that a
> >> high-end caching controller eliminates latency by reading an entire track
> >> at once. Isn't it true that there is a physical affinity between related
> >> data? Isn't it more likely that an index will occupy contiguous stripes
> >> than some random set--regardless of stripe size? Can you show that the
> >> number of tracks accessed by say, 128 coarse stripe reads is any less
> >> than the number of tracks accessed by 1024 fine stripe reads?
>
> >Yes, sometimes the DBMS manages to cluster all the necessary data so
> >there is very little seeking required, and in that case it won’t
> >matter what stripe size is used.
>
> >However, that is not always possible. For example consider a B+Tree
> >on 1 billion records and in a short period of time the DBMS needs to
> >read 100 records for given index values that are effectively at random
> >with respect to the ordering on that data type. To keep it simple
> >ignore the reading of the internal nodes of the B+Tree. Typically
> >those 100 records will appear in roughly 100 different leaf nodes of
> >the B+Tree. Furthermore due to the sheer size of the overall data
> >those leaf nodes will tend to reside on different tracks. The
> >unfortunate reality is that it isn’t possible to read these records
> >without a lot of head seeking, even if the reads are ordered according
> >to track position (i.e. elevator seeking). Now if RAID0 is used and the
> >stripes are smaller than the B+Tree leaf nodes, then every drive will
> >need to contribute to the reading of every leaf node. Each drive can
> >read the stripes in any order it likes but it won’t avoid the fact
> >that each drive performs ~100 seeks. If instead, each B+Tree leaf
> >node resides in a single stripe (and therefore on a single drive) then
> >with four drives in the RAID0, each drive will only need to perform
> >~25 seeks.
>
> You're oversimplifying. With a stripe size of 64K, it is highly unlikely
> that a leaf node will span more than one stripe; therefore, it is highly
> unlikely for every drive to contribute to the reading of every leaf node.

I don't see how I'm oversimplifying.

My point is that stripes need to be at least as coarse as the DBMS
block size. Do you agree?

The choice of DBMS block size is another question entirely.

> Also, you appear to be discounting concurrency, and environments where
> concurrency is important such as typical OLTP environments are where
> technologies such as elevator seeking are most effective.

Concurrency has nothing to do with the fact that if the stripe size is
too small the seeking of the drives won't be independent.

> By the way, Oracle documentation states that an 8K block size is optimal for
> most systems and defaults DB_FILE_MULTIBLOCK_READ_COUNT to 8. 8K * 8 = 64K.
> Interestingly, Sql Server uses 8K pages organized into 64K extents, which
> happens to be the unit of physical storage allocation. Do you know
> something they don't?

SQL Server 6.5 used 2k pages, and this changed to 8k pages in SQL
Server 7.0, released in 1998. Do you expect that 64k extents are still
optimal a decade later given that the product of transfer rate and
seek time has been steadily increasing?

64k blocks are generally too small on modern disks. A 64k block can
be transferred in a tenth of the time it takes to seek to it.
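
Here is a minimal sketch of that arithmetic. The 8 ms positioning time and 80 MB/s transfer rate are assumed round numbers, not measurements from this thread. With them a 64k block transfers in roughly a tenth of the seek time. The break-even size, where transfer time equals seek time, is just the product of seek time and transfer rate.

```python
SEEK_MS = 8.0               # assumed average seek plus rotational delay
TRANSFER_MB_PER_S = 80.0    # assumed sustained transfer rate

def transfer_ms(block_kb):
    """Time to transfer one block at the assumed rate, in milliseconds."""
    return block_kb / 1024 / TRANSFER_MB_PER_S * 1000

def transfer_fraction(block_kb):
    """Share of each random read spent moving data rather than seeking."""
    t = transfer_ms(block_kb)
    return t / (SEEK_MS + t)

def break_even_kb():
    """Block size at which transfer time equals seek time."""
    return SEEK_MS / 1000 * TRANSFER_MB_PER_S * 1024

for kb in (8, 64, 512, 1024):
    print(f"{kb:5d} KB: transfer {transfer_ms(kb):6.2f} ms, "
          f"{transfer_fraction(kb):.0%} of the time spent transferring")
print(f"break-even block size: {break_even_kb():.0f} KB")
```

As transfer rates climb faster than seek times fall, the break-even size grows, which is why an extent size fixed a decade ago may no longer be a good fit.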