Sam M (Guest)
Posted: Tue Oct 12, 2004 10:55 am
Post subject: Trying to pinpoint cause of performance issues
Hello All,
I have a dual Xeon 3.0GHz server running Windows 2003 Server connected to an FC
SAN that is presenting the server with a 14x147GB RAID10 LUN. This server is
used as a mail-store (thousands of folders each containing thousands of
small files - not the best with an NTFS file system) and another Windows 2003
Server running our mail application accesses the mail-store via an NTFS
share. What I am seeing is a sharp drop in the mail-store server's available
work items (connections to the mail server are paused for less than 1-2
seconds) and a corresponding rise in the Active POP3 Connections on the mail
server (when connections can once again be made). Network activity on both
servers will also drop with the available work items but this is expected.
Now I'm not exactly sure where the problem lies and I'm hoping someone will
be able to guide me in the right direction. I'm rather reluctant to believe
it is the SAN that is causing the issues and am leaning to either of the two
Windows 2003 Servers being at fault. I initially had many problems with the
SAN and mail-store but eventually worked out this was due to an I/O
bottleneck and moving towards a 14x147 RAID10 LUN looks to have resolved all
these issues. I've also recently disabled the creation of short names and
disabled the last access update on the mail-store server (yet to be
rebooted); however, I doubt this will resolve the issue I'm seeing. Are there
any performance counters I should be looking at to determine where the
problem lies?
Regards,
Sam

Pat [MSFT] (Guest)
Posted: Tue Oct 12, 2004 8:32 pm
Post subject: Re: Trying to pinpoint cause of performance issues
Have you checked the disk queuing? That will tell you how many commands are
queued up against the LUN (items go to the queue after the OS and file system
layers have processed them). If the queue is large, then that will imply that the
problem is in the IO subsystem (driver, HBA, switch, SAN) and you could
troubleshoot from there. If the queue is low, but latency is high, then you
could look at the CPU and possibly check for blocking filter drivers (AV
drivers for example).
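
A minimal sketch of that triage in Python, for anyone logging the counters to a file. The counter names in the comments are the standard PhysicalDisk ones; the function name, the thresholds (2 outstanding commands per spindle, 20 ms latency) and the 14-spindle default are assumptions for illustration only, so tune them to your own baseline.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    # e.g. PhysicalDisk\Current Disk Queue Length
    queue_length: float
    # e.g. PhysicalDisk\Avg. Disk sec/Transfer, converted to milliseconds
    latency_ms: float


def triage(sample, spindles=14, queue_per_spindle=2.0, latency_limit_ms=20.0):
    """Classify one sample; all thresholds are illustrative assumptions."""
    # Large queue: commands are piling up below the file system,
    # so look at the driver, HBA, switch or SAN.
    if sample.queue_length > spindles * queue_per_spindle:
        return "io_subsystem"
    # Small queue but slow requests: look at CPU and filter drivers (AV etc.).
    if sample.latency_ms > latency_limit_ms:
        return "cpu_or_filter_driver"
    return "no_disk_bottleneck"


if __name__ == "__main__":
    print(triage(DiskSample(queue_length=40, latency_ms=5)))
    print(triage(DiskSample(queue_length=1, latency_ms=50)))
```

Running it over samples taken during one of the short stalls, rather than over a long average, is more likely to show which branch applies.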
Pat

Sam M (Guest)
Posted: Wed Oct 13, 2004 4:26 am
Post subject: Re: Trying to pinpoint cause of performance issues
Hi Pat,
Thanks for the reply. The current disk queue length is being monitored and
it stays relatively low. One thing I noticed when monitoring the Server
Work Queue - Bytes Transferred/sec counter on the mail-store server is that
only one CPU (0) looks to be doing the work. Is this normal?
Regards,
Sam

Pat [MSFT] (Guest)
Posted: Wed Oct 13, 2004 9:12 pm
Post subject: Re: Trying to pinpoint cause of performance issues
If the same thread is writing to the wire (and under relatively light load
this may be the case) that would be normal. The reason is that the thread
will have a 'preferred' CPU so that when it runs the OS will try to schedule
it for the same CPU to try to maximize the possibility that it will have a
good L1/L2 cache hit rate (better perf). If that CPU is unavailable, the OS
will go to a different CPU.
Another way that can happen is if the NIC has the processor affinity set.
This is a technique where you basically map a NIC to a CPU to accomplish
much the same behavior as described above (i.e. if the same CPU always
handles a particular NIC, the CPU cache is more likely to be valid than a
random selection).
Finally some chipsets will map a particular slot to a particular CPU, which
can also cause the behavior.
Overall, it's not a big deal unless the CPU in question is blocking on other
work.
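
The 'preferred CPU' behavior can be pictured with a toy model in Python. This is only a sketch of the idea, not how the Windows scheduler is implemented; the function name and the busy-set representation are made up for illustration.

```python
def pick_cpu(preferred, busy, cpu_count=2):
    """Toy soft-affinity choice: stay on the preferred CPU when possible."""
    # Preferred CPU is free: reuse it so the L1/L2 cache is likely still warm.
    if preferred not in busy:
        return preferred
    # Preferred CPU is unavailable: fall back to any other free CPU.
    for cpu in range(cpu_count):
        if cpu not in busy:
            return cpu
    # Everything is busy: wait for the preferred CPU.
    return preferred


if __name__ == "__main__":
    # Under light load the same thread keeps landing on CPU 0.
    print([pick_cpu(0, set()) for _ in range(3)])
    # When CPU 0 is busy the thread moves to CPU 1.
    print(pick_cpu(0, {0}))
```

Under light load the busy set is usually empty, which is why a single thread can appear pinned to CPU 0 in the counters even though nothing forces it there.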
Pat