[Performance Analysis] DPM/ACA gRPC Performance Report #384
haboy52581 wants to merge 11 commits into futurewei-cloud:master from
update table for test result
xieus
left a comment
@haboy52581 Some initial comments. Thanks for setting up tests and collecting the data points.
| @@ -0,0 +1,215 @@ | |||
| = ALCOR CONTROL AGENT-ALCOR DATAPLANE MANAGER Test Report | |||
Suggest changing the title to "Alcor gRPC Performance Test Report".
| |*cpu MHz* |2231.772 |2599.079 | ||
| |*Memory* |192GB |386GB | ||
| |*Network* |NetXtreme BCM5719 Gigabit Ethernet PCIe (GB network) |82599ES 10-Gigabit SFI/SFP+ Network Connection | ||
| |*Storage* |LSI raid (no ssd) |AVAGO (no ssd) |
I think the DPM machine (.188) has 6 x 1600GB SSDs. Could you confirm?
| |*Model Name* |Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz |Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz | ||
| |*cpu MHz* |2231.772 |2599.079 | ||
| |*Memory* |192GB |386GB | ||
| |*Network* |NetXtreme BCM5719 Gigabit Ethernet PCIe (GB network) |82599ES 10-Gigabit SFI/SFP+ Network Connection |
Check the network bandwidth. As the results show the DPM client is network-bound, we need to revisit this configuration.
| [arabic, start=2] | ||
| . *Test step:* | ||
| F send goal state message to A-E at the same time concurrently after first warming up then wait for the response, goal state message is different in each payload |
Can you upload the test scripts or code that generates the payload to https://github.com/futurewei-cloud/alcor-int/tree/master/tools? This can be done in a separate PR.
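For reference while the real scripts are pending, the test step in the hunk above (one warm-up pass, then concurrent sends to A-E with a different payload per target) could be sketched roughly as follows. This is only an illustration: `make_payload`, `send_goal_state`, and the host names are hypothetical stand-ins, and the real test would issue a gRPC call where the stub does its placeholder work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Placeholder names for the five ACA boxes (A-E in the report).
HOSTS = ["A", "B", "C", "D", "E"]


def make_payload(host, neighbors):
    # Hypothetical payload builder: every host gets a distinct message.
    return {"host": host, "neighbors": [f"{host}-n{i}" for i in range(neighbors)]}


def send_goal_state(host, payload):
    # Stand-in for the real gRPC call; returns elapsed seconds.
    start = time.perf_counter()
    _ = len(payload["neighbors"])  # placeholder for the network round trip
    return time.perf_counter() - start


def run_round(neighbors, threads):
    # Send one payload to every host concurrently and wait for all responses.
    payloads = {h: make_payload(h, neighbors) for h in HOSTS}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {h: pool.submit(send_goal_state, h, payloads[h]) for h in HOSTS}
        return {h: f.result() for h, f in futures.items()}


def run_test(neighbors, threads, warmup_rounds=1):
    # Warm-up rounds are executed but their timings are discarded.
    for _ in range(warmup_rounds):
        run_round(neighbors, threads)
    return run_round(neighbors, threads)
```

Calling `run_test(neighbors=10, threads=5)` returns one elapsed time per host; the neighbor and thread counts here are arbitrary.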
| F send goal state message to A-E at the same time concurrently after first warming up then wait for the response, goal state message is different in each payload | ||
| On A-E there are 2600 ACA running on each box, ACA code has been revised to cut off the ovsdb and mq operations |
2,600 or 2,000? I thought 2,000 is the stable setup. Need to update the image accordingly.
| image::128-2.png["128 thread 2nd time",width=262,height=156] | ||
| ____ | ||
| for 256 threads and below, the success rate is 100% |
Can we add one more data point for 256 threads? People will be interested in seeing the limit.
Also, can we add resource utilization diagrams covering CPU, RAM, disk I/O, and network I/O for this extreme case? This would help.
| ____ | ||
| ____ | ||
| * 10k neighbor, every connection time cost for different concurrent thread number* |
Please explain the x-axis: what do those numbers represent? For example: "the first number is the number of threads and the second is the number of successful runs out of a total of 10K runs."
| ____ | ||
| ____ | ||
| * 10k neighbor, every connection time cost for different concurrent thread number* |
Also, as discussed, we need to verify the extremely large value (5,594,098) and rerun the test.
| ____ | ||
| * when neighbor number changed, every connection time cost and overall time cost for different concurrent thread number* |
This image is important. Let us collect more data along two dimensions (number of concurrent threads and number of neighbors), fixing one and varying the other.
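One way to enumerate the runs for such a sweep is sketched below. The function name `sweep_plan` and the sample values are illustrative only, not taken from the test scripts.

```python
def sweep_plan(thread_counts, neighbor_counts, fixed_threads, fixed_neighbors):
    """Return (threads, neighbors) pairs that vary one dimension at a time."""
    # Vary the thread count while the neighbor count stays fixed.
    vary_threads = [(t, fixed_neighbors) for t in thread_counts]
    # Vary the neighbor count while the thread count stays fixed.
    vary_neighbors = [(fixed_threads, n) for n in neighbor_counts]

    # Drop duplicate points (the two sweeps share one point) and keep order.
    seen = set()
    plan = []
    for point in vary_threads + vary_neighbors:
        if point not in seen:
            seen.add(point)
            plan.append(point)
    return plan


# Illustrative values only.
for threads, neighbors in sweep_plan([1, 64, 128], [1, 10000], 128, 10000):
    print(f"threads={threads} neighbors={neighbors}")
```

Each pair is one test run, so every chart can hold one dimension constant and plot the other on the x-axis.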
| ____ | ||
| ____ | ||
| * when neighbor number changed, overall time cost for different concurrent thread number* |
Same comment as for image other-ov-jc.png:
"Let us work to collect more data based on two dimensions (concurrent thread # and neighbor numbers), fix one and adjust the other."
We can take out the data point for "1t-1w" and explain it in the text.
| different payload sizes vary from 1 neighbor to 10000 neighbor(2MB) each | ||
| *1WR+other OV-MAX+average* |
Could you elaborate on what this means?
| @@ -65,6 +65,28 @@ image::p1.png["Test Deployment",width=488,height=302] | |||
| |*90% TILE* |12 |11 |32 |28 |78 |84 |292 |262 | |||
| |=== | |||
The rows and columns of this table are transposed relative to the next one. Could we make them consistent?
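It would also help to state how the "90% TILE" row is computed. Assuming it is a 90th percentile over the per-connection times, a nearest-rank calculation would look like the sketch below; the report may use a different definition (for example, interpolation), so treat this as one possible reading.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Multiply before dividing so integer inputs stay exact.
    rank = math.ceil(pct * len(ordered) / 100)
    # Clamp to 1 so pct=0 maps to the smallest sample.
    return ordered[max(rank, 1) - 1]
```

With this definition, `percentile_nearest_rank(times, 90)` always returns a value that was actually observed rather than an interpolated one.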