Wednesday, January 11, 2017

How to measure InfiniBand latency and bandwidth

If you need to test the InfiniBand (IB) network between two nodes, the
ib_read_lat and ib_read_bw tools do the job. Below are my tests on an X4-2 Exadata.

I. Measuring Latency

On the 2nd node, start the listener (ib_read_lat without a host argument), then switch to the 1st node:
[root@ed04dbadm02 ~]# ib_read_lat

On the 1st node, run the test against the 2nd node and read the report:

[root@ed04dbadm01 ~]# ib_read_lat -d mlx4_0 ed04dbadm02
------------------------------------------------------------------
                    RDMA_Read Latency Test
 Number of qps   : 1
 Connection type : RC
 TX depth        : 50
 Mtu             : 2048B
 Link type       : IB
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
------------------------------------------------------------------
 local address: LID 0x0c QPN 0x0088 PSN 0x41d82e OUT 0x10 RKey 0x1f31e00 VAddr 0x000000021b5000
 remote address: LID 0x0d QPN 0x006e PSN 0x3b5a1e OUT 0x10 RKey 0x2661800 VAddr 0x00000001431000
------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]
 2       1000          1.77           14.12        1.82  
------------------------------------------------------------------
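
If you want to script these runs, here is a minimal Python sketch that only builds the command lines. The helper name perftest_cmd and its parameters are my own invention for illustration; the flags themselves (-a, -b, -d, -s, -n, -p) are the ones listed in the help output at the end of this post.

```python
import shlex

def perftest_cmd(tool, server=None, device=None, all_sizes=False,
                 bidirectional=False, size=None, iters=None, port=None):
    """Build an argv list for ib_read_lat or ib_read_bw (hypothetical helper).

    With server=None the command starts a listener; otherwise it connects
    to the listener running on `server`.
    """
    cmd = [tool]
    if all_sizes:
        cmd.append("-a")            # sweep message sizes
    if bidirectional:
        cmd.append("-b")            # only ib_read_bw lists this flag
    if device is not None:
        cmd += ["-d", device]       # IB device, e.g. mlx4_0
    if size is not None:
        cmd += ["-s", str(size)]    # single message size in bytes
    if iters is not None:
        cmd += ["-n", str(iters)]   # number of exchanges
    if port is not None:
        cmd += ["-p", str(port)]    # TCP port used to exchange connection data
    if server is not None:
        cmd.append(server)
    return cmd

listener = perftest_cmd("ib_read_lat")
client = perftest_cmd("ib_read_lat", server="ed04dbadm02", device="mlx4_0")
print(shlex.join(listener))
print(shlex.join(client))
```

The lists can be passed to subprocess.run on each node; how you reach the second node (ssh, dcli, something else) is left out on purpose.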



 

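A second run, this time with -a, which sweeps the message size upward from 2 bytes. In this run the sweep stopped with a completion error after the 4096-byte step: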
[root@ed04dbadm01 ~]# ib_read_lat -a -d mlx4_0 ed04dbadm02
------------------------------------------------------------------
                    RDMA_Read Latency Test
 Number of qps   : 1
 Connection type : RC
 TX depth        : 50
 Mtu             : 2048B
 Link type       : IB
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
------------------------------------------------------------------
 local address: LID 0x0c QPN 0x008a PSN 0x266b05 OUT 0x10 RKey 0x1f5f800 VAddr 0x007f7a60362000
 remote address: LID 0x0d QPN 0x0071 PSN 0xc9915e OUT 0x10 RKey 0x268c100 VAddr 0x000000011ce000
------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]
 2       1000          2.34           20.83        2.88  
 4       1000          2.52           37.00        2.87  
 8       1000          2.53           34.74        2.88  
 16      1000          2.33           20.61        2.78  
 32      1000          2.36           16.59        2.87  
 64      1000          2.54           17.71        2.89  
 128     1000          2.59           30.65        2.95  
 256     1000          2.56           20.28        3.16  
 512     1000          2.46           25.53        3.45  
 1024    1000          3.23           30.31        3.75  
 2048    1000          4.21           28.89        4.56  
 4096    1000          4.82           20.08        5.40  
 Completion with error at client
 Failed status 10: wr_id 1 syndrom 0x88
scnt=1, ccnt=1
[root@ed04dbadm01 ~]#
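
To compare several runs it helps to pull the numeric rows out of the report. Below is a small Python sketch, assuming the report keeps the column layout shown above; the function name parse_perftest_rows is mine, not part of the tool.

```python
def parse_perftest_rows(report):
    """Return the numeric rows of a perftest report as lists of floats.

    Header lines, separators, address lines and error messages are skipped
    because their first field is not a plain integer.
    """
    rows = []
    for line in report.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue
        try:
            rows.append([float(f) for f in fields])
        except ValueError:
            continue  # a line that starts with a number but is not a data row
    return rows

# Three rows copied from the -a run above, plus lines the parser must skip.
sample = """\
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]
 2       1000          2.34           20.83        2.88
 1024    1000          3.23           30.31        3.75
 4096    1000          4.82           20.08        5.40
 Completion with error at client
 Failed status 10: wr_id 1 syndrom 0x88
scnt=1, ccnt=1
"""

for size, iters, t_min, t_max, t_typical in parse_perftest_rows(sample):
    print(f"{int(size):>6} bytes: typical {t_typical:.2f} usec, worst {t_max:.2f} usec")
```

The same parser works for the bandwidth tables further down, since they also start each data row with the message size.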


The listener on the 2nd node prints its own report (this one belongs to the first run, as the matching QPN and PSN values show):

[root@ed04dbadm02 ~]# ib_read_lat
------------------------------------------------------------------
                    RDMA_Read Latency Test
 Number of qps   : 1
 Connection type : RC
 Mtu             : 2048B
 Link type       : IB
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
------------------------------------------------------------------
 local address: LID 0x0d QPN 0x006e PSN 0x3b5a1e OUT 0x10 RKey 0x2661800 VAddr 0x00000001431000
 remote address: LID 0x0c QPN 0x0088 PSN 0x41d82e OUT 0x10 RKey 0x1f31e00 VAddr 0x000000021b5000
-----------------------------------------------------------------

 

II. Measuring Bandwidth


The next stage is the bandwidth test.

On the 2nd node we run the listener:

[root@ed04dbadm02 ~]# ib_read_bw

And on the 1st node we run the test itself:

[root@ed04dbadm01 ~]# ib_read_bw -a -d mlx4_0 ed04dbadm02
------------------------------------------------------------------
                    RDMA_Read BW Test
 Number of qps   : 1
 Connection type : RC
 TX depth        : 300
 CQ Moderation   : 50
 Mtu             : 2048B
 Link type       : IB
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
------------------------------------------------------------------
 local address: LID 0x0c QPN 0x0089 PSN 0x4f763c OUT 0x10 RKey 0x1f50900 VAddr 0x007fb36137f000
 remote address: LID 0x0d QPN 0x0070 PSN 0x418495 OUT 0x10 RKey 0x267f600 VAddr 0x007fe31d2ab000
------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
 2          1000           9.51               9.41  
 4          1000           20.29              20.21 
 8          1000           40.42              40.25 
 16         1000           80.85              78.35 
 32         1000           161.99             161.52
 64         1000           309.76             308.93
 128        1000           650.33             648.20
 256        1000           1291.16            1286.98
 512        1000           2610.95            2523.90
 1024       1000           3074.05            3070.13
 2048       1000           3192.15            3159.41
 4096       1000           3220.35            3219.55
 8192       1000           3230.01            3229.81
 16384      1000           3241.36            3241.14
 32768      1000           3245.32            3245.29
 65536      1000           3248.53            3248.47
 Completion with error at client
 Failed status 10: wr_id 1 syndrom 0x88
scnt=300, ccnt=0


The listener on the 2nd node reports the failure from its side:

[root@ed04dbadm02 ~]# ib_read_bw
------------------------------------------------------------------
                    RDMA_Read BW Test
 Number of qps   : 1
 Connection type : RC
 CQ Moderation   : 50
 Mtu             : 2048B
 Link type       : IB
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
------------------------------------------------------------------
 local address: LID 0x0d QPN 0x0070 PSN 0x418495 OUT 0x10 RKey 0x267f600 VAddr 0x007fe31d2ab000
 remote address: LID 0x0c QPN 0x0089 PSN 0x4f763c OUT 0x10 RKey 0x1f50900 VAddr 0x007fb36137f000
pp_read_keys: Success
Couldn't read remote address
 Unable to write to socket/rdam_cm
Failed to close connection between server and client
[root@ed04dbadm02 ~]#
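
Link speeds are usually quoted in Gbit/s, while the report is in MB/sec, so a quick conversion is handy. A rough Python sketch follows. One assumption to flag: I am not sure whether this version of the tool counts a MB as 2^20 or 10^6 bytes, so the function takes that as a parameter.

```python
def mb_per_sec_to_gbit(mb_per_sec, bytes_per_mb=2**20):
    """Convert a bandwidth in MB/sec to Gbit/s (10^9 bits per second).

    bytes_per_mb is an assumption: pass 2**20 if the tool reports MiB,
    or 10**6 if it reports decimal megabytes.
    """
    return mb_per_sec * bytes_per_mb * 8 / 1e9

# Average bandwidth of the 65536-byte row from the report above.
bw_avg = 3248.47

print(f"MB = 2^20 bytes: {mb_per_sec_to_gbit(bw_avg):.2f} Gbit/s")
print(f"MB = 10^6 bytes: {mb_per_sec_to_gbit(bw_avg, 10**6):.2f} Gbit/s")
```

Either way the result can be compared with the data rate of your own link.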


And the bidirectional test (-b):
[root@ed04dbadm01 ~]# ib_read_bw -a -b -d mlx4_0 ed04dbadm02
------------------------------------------------------------------
                    RDMA_Read Bidirectional BW Test
 Number of qps   : 1
 Connection type : RC
 TX depth        : 300
 CQ Moderation   : 50
 Mtu             : 2048B
 Link type       : IB
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
------------------------------------------------------------------
 local address: LID 0x0c QPN 0x0091 PSN 0x71081f OUT 0x10 RKey 0x1f94b00 VAddr 0x007f851c57d000
 remote address: LID 0x0d QPN 0x0076 PSN 0xae8412 OUT 0x10 RKey 0x26bf600 VAddr 0x007fc7ed0af000
------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
 2          1000           20.47              20.09 
 4          1000           40.95              40.82 
 8          1000           81.74              81.41 
 16         1000           164.40             163.81
 32         1000           328.80             327.14
 64         1000           658.83             656.19
 128        1000           1320.12            1315.04
 256        1000           2482.47            2459.42
 512        1000           5260.80            5244.75
 1024       1000           6250.11            6232.63
 2048       1000           6395.13            6386.76
 4096       1000           6466.50            6466.02
 8192       1000           6459.10            6449.20
 16384      1000           6490.17            6489.88
 32768      1000           6503.01            6502.80
 65536      1000           6499.86            6499.85
 Completion with error at client
 Failed status 10: wr_id 1 syndrom 0x88
scnt=300, ccnt=0
[root@ed04dbadm01 ~]#
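
The bidirectional numbers come out at roughly twice the unidirectional ones. A short Python sketch to check that, using a few average-bandwidth values copied from the two tables above (the function name bidir_ratio is just for illustration):

```python
# Average bandwidth in MB/sec, copied from the two reports above
# (message size in bytes -> BW average).
unidirectional = {2: 9.41, 512: 2523.90, 4096: 3219.55, 65536: 3248.47}
bidirectional = {2: 20.09, 512: 5244.75, 4096: 6466.02, 65536: 6499.85}

def bidir_ratio(uni, bi):
    """Ratio of bidirectional to unidirectional bandwidth per message size."""
    return {size: bi[size] / uni[size] for size in uni if size in bi}

for size, ratio in sorted(bidir_ratio(unidirectional, bidirectional).items()):
    print(f"{size:>6} bytes: x{ratio:.2f}")
```

A ratio well below 2 at large message sizes would suggest that the two directions are competing for something on the path.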


III. Help


[root@ed04dbadm01 ~]# ib_read_lat -h
Usage:
  ib_read_lat            start a server and wait for connection
  ib_read_lat <host>     connect to server at <host>

Options:
  -p, --port=<port>  Listen on/connect to port <port> (default 18515)
  -d, --ib-dev=<dev>  Use IB device <dev> (default first device found)
  -R, --rdma_cm  Connect QPs with rdma_cm and run test on those QPs
  -z, --com_rdma_cm  Communicate with rdma_cm module to exchange data - use regular QPs
  -i, --ib-port=<port>  Use port <port> of IB device (default 1)
  -c, --connection=<RC/UC/UD>  Connection type RC/UC/UD (default RC)
  -m, --mtu=<mtu>  Mtu size : 256 - 4096 (default port mtu)
  -s, --size=<size>  Size of message to exchange (default 2)
  -a, --all  Run sizes from 2 till 2^23
  -n, --iters=<iters>  Number of exchanges (at least 2, default 1000)
  -t, --tx-depth=<dep>  Size of tx queue (default 50)
  -u, --qp-timeout=<timeout>  QP timeout, timeout value is 4 usec * 2 ^(timeout), default 14
  -S, --sl=<sl>  SL (default 0)
  -x, --gid-index=<index>  Test uses GID with GID index (Default : IB - no gid . ETH - 0)
  -F, --CPU-freq  Do not fail even if cpufreq_ondemand module is loaded
  -V, --version  Display version number
  -I, --inline_size=<size>  Max size of message to be sent in inline (default 400)
  -e, --events  Sleep on CQ events (default poll)
  -o, --outs=<num>  num of outstanding read/atom(default max of device)
  -C, --report-cycles  report times in cpu cycle units (default microseconds)
  -H, --report-histogram  Print out all results (default print summary only)
  -U, --report-unsorted  (implies -H) print out unsorted results (default sorted)



[root@ed04dbadm01 ~]# ib_read_bw -h
Usage:
  ib_read_bw            start a server and wait for connection
  ib_read_bw <host>     connect to server at <host>

Options:
  -p, --port=<port>  Listen on/connect to port <port> (default 18515)
  -d, --ib-dev=<dev>  Use IB device <dev> (default first device found)
  -R, --rdma_cm  Connect QPs with rdma_cm and run test on those QPs
  -z, --com_rdma_cm  Communicate with rdma_cm module to exchange data - use regular QPs
  -i, --ib-port=<port>  Use port <port> of IB device (default 1)
  -c, --connection=<RC/UC/UD>  Connection type RC/UC/UD (default RC)
  -m, --mtu=<mtu>  Mtu size : 256 - 4096 (default port mtu)
  -s, --size=<size>  Size of message to exchange (default 65536)
  -a, --all  Run sizes from 2 till 2^23
  -n, --iters=<iters>  Number of exchanges (at least 2, default 1000)
  -t, --tx-depth=<dep>  Size of tx queue (default 300)
  -u, --qp-timeout=<timeout>  QP timeout, timeout value is 4 usec * 2 ^(timeout), default 14
  -S, --sl=<sl>  SL (default 0)
  -x, --gid-index=<index>  Test uses GID with GID index (Default : IB - no gid . ETH - 0)
  -F, --CPU-freq  Do not fail even if cpufreq_ondemand module is loaded
  -V, --version  Display version number
  -I, --inline_size=<size>  Max size of message to be sent in inline (default 0)
  -b, --bidirectional  Measure bidirectional bandwidth (default unidirectional)
  -Q, --cq-mod  Generate Cqe only after <--cq-mod> completion
  -e, --events  Sleep on CQ events (default poll)
  -N, --no peak-bw  Cancel peak-bw calculation (default with peak)
  -o, --outs=<num>  num of outstanding read/atom(default max of device)
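
Two of the options above are easier to read with the arithmetic spelled out. The Python sketch below takes the formulas literally from the help text (4 usec * 2^timeout for -u, and sizes from 2 up to 2^23 for -a); the function names are my own, and I am assuming the -a sweep goes up in powers of two, as the reports above suggest.

```python
def qp_timeout_usec(timeout=14):
    """QP timeout in microseconds, as described for -u: 4 usec * 2^timeout."""
    return 4 * 2 ** timeout

def all_sizes(max_exp=23):
    """Message sizes for an -a sweep, assuming powers of two from 2 to 2^max_exp."""
    return [2 ** e for e in range(1, max_exp + 1)]

print(f"default QP timeout: {qp_timeout_usec() / 1000:.3f} ms")
sizes = all_sizes()
print(f"-a covers {len(sizes)} sizes, from {sizes[0]} to {sizes[-1]} bytes")
```

So the default timeout of 14 corresponds to about 65.5 ms, and a complete -a sweep would have 23 rows, which is more than the runs above managed before the completion error.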
