19-08-2017

AMD Threadripper 1950X @4.0GHz 16cores/32threads compared with Intel Core i9 7960X @4.0GHz and @5.2GHz 16cores/32threads!!

https://docs.google.com/spreadsheets/d/156Iztrz4erBxTntb6A9-4hCNAtLyazdBBh8zGy0RApE/edit#gid=0

18-12-2016

Made a bench run with the new Andscacs 0.89b & Andscacs 0.89zb, to test whether NUMA works!
Also with the latest asmFish 2016-12-17
and the latest Stockfish dev. 171216 BMI2.

Same as before, just scroll down to the date 18-12-2016 for the new data: https://docs.google.com/spreadsheets/d/156Iztrz4erBxTntb6A9-4hCNAtLyazdBBh8zGy0RApE/edit#gid=0


21-11-2016

Good news.. Stockfish can finally use NUMA.. thanks to Marco for fixing this!! Stockfish 211116 64 bmi2 Numa

Also tested the latest Cfish 211116 x64 bmi2 numa & BrainFish 161119 x64 BMI2 numa.

Data is in the spreadsheet.. scroll down to the date 21-11-2016: https://docs.google.com/spreadsheets/d/156Iztrz4erBxTntb6A9-4hCNAtLyazdBBh8zGy0RApE/edit#gid=0


02-11-2016

New benches done on a Xeon E5-2699 v4 2x22cores =44cores/88threads
OS: Windows Server 2012 R2
HT OFF

Engines used :

Stockfish 8, Cfish 8, BrainFish 161030, Stockfish 011116 Numa, pedantFish 2016-10-17

Same link for the spreadsheet, scroll down to the date 02-11-2016.


31-10-2016

Same tests done today with HT ON!

With the same engines + the Stockfish builds prepared for the TCEC9 Final.. I got two compiles from Kiran, one normal, one with NUMA.
Marco had put the sources on Fishcooking, so I also compiled these two myself.. also to compare speed.

Same thing.. all data is updated in the spreadsheet, scroll down to the date 31-10-2016:
https://docs.google.com/spreadsheets/d/156Iztrz4erBxTntb6A9-4hCNAtLyazdBBh8zGy0RApE/edit#gid=0

Stockfish still has work to do! NUMA doesn't work..
asmFish, Cfish & BrainFish are NUMA-aware and can use all 88 threads @100%

Ipman.

30-10-2016

New benches done on a Xeon E5-2699 v4 2x22cores =44cores/88threads
OS: Windows Server 2012 R2
HT OFF

Engines used :

asmFish 2016-10-17 bmi2 Numa

Cfish 301016 bmi2 Numa

BrainFish_161025_x64_bmi2 Numa

Stockfish 16103013 64 BMI2, the latest dev. version from Abrok.eu

Andscacs 0.88bx bmi2 version with 128cores

All data is in the spreadsheet: https://docs.google.com/spreadsheets/d/156Iztrz4erBxTntb6A9-4hCNAtLyazdBBh8zGy0RApE/edit#gid=0

Scroll down to the date: 30-10-2016



02-09-2016

Here are benches from 9 engines!
6 of them were Stockfish versions without CounterMovesHistory (CMH): 3 from Marco, which are/were being tested in the testing framework,
and 3 versions from mstembera, who also wanted me to check them out..
Plus the 2 latest versions of asmFish & pedantFish, and I made a compile of Cfish from the latest source.

Each engine got 10 runs.. so that's 90 times copy & pasting these 3 command lines :)
I have again put all the data below into the spreadsheet for an easy comparison.. just scroll down to the date 01-09-2016 for this new data:
https://docs.google.com/spreadsheets/d/156Iztrz4erBxTntb6A9-4hCNAtLyazdBBh8zGy0RApE/edit#gid=0
Some info is in the spreadsheet..
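Each run below means typing the same three commands (setoption name threads value 72, setoption name hash value 1024, go depth 26) into the engine console. A minimal Python sketch of automating that over the UCI protocol, assuming a UCI engine binary at a path you supply (the path in the comment is hypothetical): it sends the commands through a pipe and keeps the last info line that carries nps. It does not restrict CPU affinity, so limiting the number of cores used still has to be done separately.

```python
import subprocess
from typing import List, Optional


def uci_commands(threads: int, hash_mb: int, depth: int,
                 fen: Optional[str] = None) -> List[str]:
    """Build the UCI command sequence for one fixed-depth run."""
    return [
        "uci",                                    # handshake; engine answers "uciok"
        f"setoption name Threads value {threads}",
        f"setoption name Hash value {hash_mb}",   # hash size in MB
        "isready",                                # engine answers "readyok"
        f"position fen {fen}" if fen else "position startpos",
        f"go depth {depth}",
    ]


def run_search(engine_path: str, commands: List[str]) -> str:
    """Send the commands and return the last 'info ... nps ...' line before 'bestmove'."""
    proc = subprocess.Popen(
        [engine_path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered pipes in text mode
    )
    assert proc.stdin is not None and proc.stdout is not None
    for cmd in commands:
        proc.stdin.write(cmd + "\n")
    proc.stdin.flush()

    last_info = ""
    for line in proc.stdout:
        line = line.strip()
        if line.startswith("info depth") and " nps " in line:
            last_info = line          # keep the most recent summary line
        elif line.startswith("bestmove"):
            break                     # search finished
    proc.stdin.write("quit\n")
    proc.stdin.flush()
    proc.wait()
    return last_info


# Hypothetical usage (the engine path is an example, not from the original):
# print(run_search("./stockfish", uci_commands(72, 1024, 26)))
```

Looping this over the 10 core counts would replace the 90 manual copy & paste rounds; the sketch only collects the final info line per run.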

With the first version from Marco, I compiled Stockfish 300816 64 BMI2_CMH and let it play for my rating list, and it's doing very well..
http://www.ipmanchess.yolasite.com/i7-5960x.php

Ipman.

Stockfish 300816 64 BMI2_CMH

setoption name threads value 72
setoption name hash value 1024
go depth 26


8cores
info depth 26 seldepth 36 multipv 1 score cp 25 nodes 214055533 nps 9495011

16cores
info depth 26 seldepth 36 multipv 1 score cp 22 nodes 358385571 nps 17831015

18cores=1cpu
info depth 26 seldepth 32 multipv 1 score cp 19 nodes 360810155 nps 20196482

22cores
info depth 26 seldepth 35 multipv 1 score cp 27 nodes 369745039 nps 24947374

32cores
info depth 26 seldepth 36 multipv 1 score cp 9 nodes 1560728534 nps 34864928

36cores =2cpu's
info depth 26 seldepth 34 multipv 1 score cp 12 nodes 710489804 nps 39462886

44cores
info depth 26 seldepth 41 multipv 1 score cp 10 nodes 1388361035 nps 39583766

54cores=3cpu's
info depth 26 seldepth 37 multipv 1 score cp 14 nodes 1048836479 nps 39761789

64cores
info depth 26 seldepth 40 multipv 1 score cp 12 nodes 1503213792 nps 39707683

72cores=4cpu's
info depth 26 seldepth 35 multipv 1 score cp 24 nodes 865096200 nps 39825807

------------------------------------------------------------------
Stockfish 300816 64 BMI2CMH2

setoption name threads value 72
setoption name hash value 1024
go depth 26


8cores
info depth 26 seldepth 38 multipv 1 score cp 21 nodes 401806145 nps 9518540

16cores
info depth 26 seldepth 37 multipv 1 score cp 25 nodes 473307550 nps 18439595

18cores=1cpu
info depth 26 seldepth 38 multipv 1 score cp 13 nodes 583243311 nps 20010406

22cores
info depth 26 seldepth 37 multipv 1 score cp 19 nodes 477733138 nps 24681397

32cores
info depth 26 seldepth 33 multipv 1 score cp 13 nodes 603749626 nps 34838408

36cores =2cpu's
info depth 26 seldepth 38 multipv 1 score cp 10 nodes 1331144638 nps 39798625

44cores
info depth 26 seldepth 32 multipv 1 score cp 14 nodes 1177764580 nps 39794721

54cores=3cpu's
info depth 26 seldepth 38 multipv 1 score cp 15 nodes 1126361002 nps 39984416

64cores -> takes a very long time
info depth 26 seldepth 38 multipv 1 score cp 6 nodes 5523698121 nps 40199832

72cores=4cpu's
info depth 26 seldepth 34 multipv 1 score cp 12 nodes 1585321510 nps 40403739

------------------------------------------------------------------
Stockfish 300816 64 BMI2mst

setoption name threads value 72
setoption name hash value 1024
go depth 26


8cores
info depth 26 seldepth 35 multipv 1 score cp 13 nodes 358909485 nps 9511567

16cores
info depth 26 seldepth 37 multipv 1 score cp 12 nodes 761497853 nps 17981059

18cores=1cpu
info depth 26 seldepth 35 multipv 1 score cp 16 nodes 567991898 nps 19257879

22cores
info depth 26 seldepth 36 multipv 1 score cp 22 nodes 511505916 nps 24389944

32cores
info depth 26 seldepth 34 multipv 1 score cp 9 nodes 1133493016 nps 33885175

36cores =2cpu's
info depth 26 seldepth 39 multipv 1 score cp 12 nodes 916328149 nps 37933770

44cores
info depth 26 seldepth 36 multipv 1 score cp 10 nodes 1004299723 nps 38079158

54cores=3cpu's
info depth 26 seldepth 36 multipv 1 score cp 16 nodes 1078875527 nps 38253927

64cores
info depth 26 seldepth 34 multipv 1 score cp 9 nodes 1595882215 nps 37774148

72cores=4cpu's
info depth 26 seldepth 40 multipv 1 score cp 13 nodes 1628443910 nps 37853182

------------------------------------------------------------------
Stockfish 300816 64 BMI2mst2

setoption name threads value 72
setoption name hash value 1024
go depth 26

8cores

info depth 26 seldepth 35 multipv 1 score cp 19 nodes 293358077 nps 9181211

16cores
info depth 26 seldepth 35 multipv 1 score cp 8 nodes 925633066 nps 17225887

18cores=1cpu
info depth 26 seldepth 34 multipv 1 score cp 19 nodes 365213352 nps 20088743

22cores
info depth 26 seldepth 37 multipv 1 score cp 17 nodes 729413787 nps 23431217

32cores
info depth 26 seldepth 37 multipv 1 score cp 11 nodes 911256954 nps 34033873

36cores =2cpu's
info depth 26 seldepth 39 multipv 1 score cp 15 nodes 889570054 nps 37915354

44cores
info depth 26 seldepth 36 multipv 1 score cp 22 nodes 710521220 nps 37991723

54cores=3cpu's
info depth 26 seldepth 39 multipv 1 score cp 20 nodes 962220043 nps 38388990

64cores
info depth 26 seldepth 34 multipv 1 score cp 12 nodes 1096920071 nps 38261539

72cores=4cpu's -> very very slow
info depth 26 seldepth 37 multipv 1 score cp 15 nodes 9634407155 nps 38629412
info depth 17 seldepth 25 multipv 1 score cp 12 nodes 6555393053 nps 38540051

------------------------------------------------------------------
Stockfish 020916 64 BMI2mst3

setoption name threads value 72
setoption name hash value 1024
go depth 26

8cores

info depth 26 seldepth 35 multipv 1 score cp 22 nodes 221049847 nps 9395980

16cores
info depth 26 seldepth 37 multipv 1 score cp 26 nodes 401911504 nps 18269535

18cores=1cpu
info depth 26 seldepth 36 multipv 1 score cp 11 nodes 486618272 nps 20863414

22cores
info depth 26 seldepth 37 multipv 1 score cp 9 nodes 747748105 nps 24085167

32cores
info depth 26 seldepth 39 multipv 1 score cp 15 nodes 696410851 nps 35596547

36cores =2cpu's
info depth 26 seldepth 36 multipv 1 score cp 25 nodes 527866108 nps 39354813

44cores
info depth 26 seldepth 32 multipv 1 score cp 21 nodes 406935399 nps 39298445

54cores=3cpu's
info depth 26 seldepth 38 multipv 1 score cp 8 nodes 1715176705 nps 39772213

64cores
info depth 26 seldepth 38 multipv 1 score cp 16 nodes 833563471 nps 39881511

72cores=4cpu's
info depth 26 seldepth 36 multipv 1 score cp 29 nodes 906649524 nps 40032211

------------------------------------------------------------------
Stockfish 020916 64 BMI2cmh3

setoption name threads value 72
setoption name hash value 1024
go depth 26

8cores

info depth 26 seldepth 37 multipv 1 score cp 29 nodes 301100912 nps 9403526

16cores
info depth 26 seldepth 37 multipv 1 score cp 28 nodes 323755501 nps 18288171

18cores=1cpu
info depth 26 seldepth 32 multipv 1 score cp 29 nodes 339921221 nps 21013923

22cores
info depth 26 seldepth 34 multipv 1 score cp 22 nodes 452383861 nps 24043787

32cores
info depth 26 seldepth 36 multipv 1 score cp 12 nodes 771180084 nps 35590736

36cores =2cpu's
info depth 26 seldepth 34 multipv 1 score cp 27 nodes 555000419 nps 39434447

44cores
info depth 26 seldepth 37 multipv 1 score cp 12 nodes 956124931 nps 39665004

54cores=3cpu's
info depth 26 seldepth 34 multipv 1 score cp 22 nodes 729261250 nps 39590730

64cores
info depth 26 seldepth 35 multipv 1 score cp 13 nodes 1207781603 nps 39917427

72cores=4cpu's
info depth 26 seldepth 34 multipv 1 score cp 10 nodes 1292179539 nps 39814498

------------------------------------------------------------------
asmFishW_2016-08-30_bmi2

setoption name threads value 72
setoption name hash value 1024
go depth 26

8cores

info depth 26 multipv 1 time 31753 nps 10972497 score cp 29 nodes 348409702

16cores
info depth 26 multipv 1 time 36902 nps 21711162 score cp 15 nodes 801185320

18cores=1cpu
info depth 26 multipv 1 time 26279 nps 24423214 score cp 18 nodes 641817658

22cores
info depth 26 multipv 1 time 22431 nps 28761248 score cp 21 nodes 645143573

32cores
info depth 26 multipv 1 time 25245 nps 40873769 score cp 8 nodes 1031858299

36cores =2cpu's
info depth 26 multipv 1 time 15610 nps 45551445 score cp 13 nodes 711058069

44cores
info depth 26 multipv 1 time 36508 nps 55228839 score cp 17 nodes 2016294484

54cores=3cpu's
info depth 26 multipv 1 time 16629 nps 65340175 score cp 15 nodes 1086541784

64cores
info depth 26 multipv 1 time 17468 nps 77270204 score cp 13 nodes 1349755940

72cores=4cpu's
info depth 26 multipv 1 time 16162 nps 86486534 score cp 15 nodes 1397795377
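Stockfish and Cfish print the info fields in a different order from asmFish and pedantFish (nodes before nps, versus nps before score). A minimal Python sketch of parsing such lines by keyword, so the field order does not matter; it only handles the fields that appear in these results and skips any other keywords:

```python
from typing import Dict, Tuple, Union

# Integer-valued fields that appear in the info lines in these results.
INT_FIELDS = {"depth", "seldepth", "multipv", "time", "nodes", "nps"}

InfoValue = Union[int, Tuple[str, int]]


def parse_info(line: str) -> Dict[str, InfoValue]:
    """Parse a UCI 'info' line by keyword, so field order does not matter."""
    tokens = line.split()
    if not tokens or tokens[0] != "info":
        raise ValueError("not a UCI info line: " + line)
    result: Dict[str, InfoValue] = {}
    i = 1
    while i < len(tokens):
        key = tokens[i]
        if key == "score" and i + 2 < len(tokens):
            # "score cp 25" or "score mate 24"
            result["score"] = (tokens[i + 1], int(tokens[i + 2]))
            i += 3
        elif key in INT_FIELDS and i + 1 < len(tokens):
            result[key] = int(tokens[i + 1])
            i += 2
        else:
            i += 1  # skip keywords this sketch does not handle (pv, hashfull, ...)
    return result
```

For the asmFish lines, which also print time in ms, nodes x 1000 / time gives back the printed nps; for the 8-core asmFish run above, 348409702 x 1000 / 31753, rounded down, is exactly 10972497.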

------------------------------------------------------------------
pedantFishW_2016-08-30_bmi2

setoption name threads value 72
setoption name hash value 1024
go depth 26

8cores

info depth 26 multipv 1 time 21027 nps 10878231 score cp 24 nodes 228736582

16cores
info depth 26 multipv 1 time 20295 nps 21646435 score cp 22 nodes 439314417

18cores=1cpu
info depth 26 multipv 1 time 28576 nps 24313067 score cp 11 nodes 694770222

22cores
info depth 26 multipv 1 time 11635 nps 27857828 score cp 24 nodes 324125832

32cores
info depth 26 multipv 1 time 17154 nps 40372184 score cp 12 nodes 692544449

36cores =2cpu's
info depth 26 multipv 1 time 23145 nps 45632643 score cp 9 nodes 1056167535

44cores
info depth 26 multipv 1 time 19434 nps 54445430 score cp 17 nodes 1058092506

54cores=3cpu's
info depth 26 multipv 1 time 27473 nps 66605592 score cp 12 nodes 1829855442

64cores
info depth 26 multipv 1 time 15605 nps 76666220 score cp 15 nodes 1196376371

72cores=4cpu's
info depth 26 multipv 1 time 15577 nps 86257272 score cp 13 nodes 1343629537

------------------------------------------------------------------
Cfish 010916 64 BMI2

setoption name threads value 72
setoption name Hash value 1024
go depth 26


8cores
info depth 26 seldepth 34 multipv 1 score cp 20 nodes 278090066 nps 9884835

16cores
info depth 26 seldepth 37 multipv 1 score cp 24 nodes 628402635 nps 18807692

18cores=1cpu
info depth 26 seldepth 36 multipv 1 score cp 15 nodes 568309036 nps 20325061

22cores
info depth 26 seldepth 37 multipv 1 score cp 13 nodes 769875461 nps 25172490

32cores
info depth 26 seldepth 35 multipv 1 score cp 15 nodes 514903226 nps 34453210

36cores =2cpu's
info depth 26 seldepth 36 multipv 1 score cp 11 nodes 1009104946 nps 39106531

44cores
info depth 26 seldepth 35 multipv 1 score cp 9 nodes 1048757566 nps 39135665

54cores=3cpu's
info depth 26 seldepth 36 multipv 1 score cp 18 nodes 1178637942 nps 39269605

64cores
info depth 26 seldepth 40 multipv 1 score cp 16 nodes 2335439489 nps 39310545

72cores=4cpu's
info depth 26 seldepth 37 multipv 1 score cp 16 nodes 1415523931 nps 39449415

------------------------------------------------------------------
28-08-2016

Last evening I did another run with benches of the latest asmFish 2016-08-25, the latest dev. Stockfish 270816, Hannibal 1.7 & Andscacs 0.872b.

Just scroll down to the date 28-08-2016 for the latest data I put in this spreadsheet:
https://docs.google.com/spreadsheets/d/156Iztrz4erBxTntb6A9-4hCNAtLyazdBBh8zGy0RApE/edit#gid=0

Scaling of asmFish is just great: going from 22 to 44 cores gives a 103,79% gain in nodes/sec.!! while Stockfish hangs around only 60%..

Some info: later I will have the chance to run on this system ;)
https://www.supermicro.com/products/system/7U/7088/SYS-7088B-TR4FT.cfm


07-08-2016

System: Xeon E7-8870 v3 4x18cores=72cores/144threads -> HT Off
OS: Windows Server 2012 R2

Last evening I ran some benches with the latest asmFish 2016-07-26, Stockfish 020816, Texel 1.06 & Hannibal 1.6.58.

I have put the data in the spreadsheet + some explanations (scroll down to the date 07-08-2016):
https://docs.google.com/spreadsheets/d/156Iztrz4erBxTntb6A9-4hCNAtLyazdBBh8zGy0RApE/edit#gid=0

Conclusion: the engine with working NUMA-aware code will win the TCEC9 final if it is played on a 44-core system!
When you see Stockfish & Komodo so close in strength in these LTC games.. look out for Houdini 5 dev., it is NUMA-aware.. Robert only has
to update his NUMA code so that it works well with more cores.. it will be dangerous if it gets into the Final! Is it possible?! ;)
Or will Stockfish & Komodo wake up and add NUMA awareness..

04-07-2016

System: Xeon E7-8870 v3 4x18cores=72cores/144threads -> HT Off
OS: Windows Server 2012 R2

asmFish 2016.07.02 is working great on 72cores!!

I just copy the data here.. I have also put it in the spreadsheet..

1- asmFish 2016.07.02 bmi2 Numa-aware 256cores

setoption name threads value 72
go depth 26


8cores
info depth 26 multipv 1 time 22556 nps 12331147 score cp 16 nodes 278141354

16cores
info depth 26 multipv 1 time 21316 nps 24409902 score cp 21 nodes 520321486

18cores=1cpu
info depth 26 multipv 1 time 27139 nps 27220856

22cores
info depth 26 multipv 1 time 28209 nps 31327220 score cp 20 nodes 883709550

32cores
info depth 26 multipv 1 time 44069 nps 42207807 score cp 16 nodes 1860055863

36cores =2cpu's
info depth 26 multipv 1 time 21664 nps 47434619 score cp 18 nodes 1027623602

44cores
info depth 26 multipv 1 time 22554 nps 58173721 score cp 14 nodes 1312050121

54cores=3cpu's
info depth 26 multipv 1 time 21809 nps 66881689 score cp 10 nodes 1458622771

64cores
info depth 26 multipv 1 time 25701 nps 80426269 score cp 17 nodes 2067035554

72cores=4cpu's
info depth 26 multipv 1 time 32731 nps 89286816 score cp 12 nodes 2922446805

---------------------------------------------------------------------------------------

setoption name threads value 72
go depth 27


54cores=3cpu's
info depth 27 multipv 1 time 28940 nps 69954328 score cp 15 nodes 2024478253

64cores
info depth 27 multipv 1 time 37896 nps 79536394 score cp 11 nodes 3014111220

72cores=4cpu's
info depth 27 multipv 1 time 37211 nps 89921467 score cp 10 nodes 3346067732

------------------------------------------------------------------------------------
2- asmFish 2016.07.02 bmi2 Numa-aware 256cores

Data I got for 2 positions:
setoption name threads value 72
setoption name hash value 1024
position fen 2b5/1r6/2kBp1p1/p2pP1P1/2pP4/1pP3K1/1R3P2/8 b - -


64cores
go depth 34
info depth 34 multipv 1 time 13402 nps 118459898 score cp 287 nodes 1587599559

72cores
go depth 34
info depth 34 multipv 1 time 23059 nps 138338434 score cp 139 nodes 3189945968
---------------------------------------------------------------------------------
setoption name threads value 72
setoption name hash value 1024
position fen 8/k1b5/P4p2/1Pp2p1p/K1P2P1P/8/3B4/8 w - -

72cores

go depth 60
info depth 60 multipv 1 time 4134 nps 141039045 score mate 24 nodes 583055416

go depth 80
info depth 80 multipv 1 time 46997 nps 224021783 score mate 24 nodes 10528351740

go depth 100
info depth 100 multipv 1 time 213240 nps 216222910 score mate 24 nodes 46107373489

And yes.. with a bmi2 compile and 72 cores I pass 200 million nodes/s with this position!
At the moment it's playing on my 3 systems.. and it starts as the top engine on my 3 computers.. with 1 core it was already first, now it's an even bigger jump!!

Ipman.

03-07-2016

Yesterday evening I ran some tests with asmFish NumaTest_base.
Only after testing did I find out it was a 64-core version, so it was normal that 72 cores gave nothing more!
Also, this was a base version.. not a popcnt or even a bmi2 compile, which would be even faster.. but the gain is clearly higher thanks to NUMA awareness!
So we already have two engines where NUMA awareness works as it should.. Texel & asmFish.. who is next ;)
It takes only 1 sec. to see all these cores wake up and run @100%!! very fast..

The first bench was just selecting cores and using go depth 25:
2- Numatest_Base.exe

setoption name threads value 72
go depth 25

8cores

info depth 25 multipv 1 time 20115 nps 11155655 score cp 19 nodes 224396020

16cores
info depth 25 multipv 1 time 20874 nps 22091255 score cp 24 nodes 461132876

18cores=1cpu
info depth 25 multipv 1 time 27752 nps 24705698 score cp 21 nodes 685632535

22cores
info depth 25 multipv 1 time 29275 nps 27782844 score cp 15 nodes 813342773

32cores
info depth 25 multipv 1 time 8600 nps 40916985 score cp 21 nodes 351886074

36cores =2cpu's
info depth 25 multipv 1 time 14514 nps 45452953 score cp 14 nodes 659704165

44cores
info depth 25 multipv 1 time 7000 nps 51543263 score cp 21 nodes 360802847

54cores=3cpu's
info depth 25 multipv 1 time 16290 nps 61816104 score cp 14 nodes 1006984341

64cores
info depth 25 multipv 1 time 15957 nps 74512417 score cp 17 nodes 1188994639

72cores=4cpu's -> 8 cores not used? Same result as 64cores
info depth 25 multipv 1 time 17371 nps 74202550 score cp 17 nodes 1288972511

The next test was with 2 positions given as FEN:

setoption name threads value 64
setoption name hash value 1024
position fen 2b5/1r6/2kBp1p1/p2pP1P1/2pP4/1pP3K1/1R3P2/8 b - -


64cores
go depth 34
info depth 34 multipv 1 time 28189 nps 117960186

72cores - 8cores not used
go depth 34
info depth 34 multipv 1 time 12625 nps 112852554
---------------------------------------------------------------------------------
setoption name threads value 64
setoption name hash value 1024
position fen 8/k1b5/P4p2/1Pp2p1p/K1P2P1P/8/3B4/8 w - -

64cores

go depth 34 was what I used for Texel Numa; it was so fast for asmFish that I had to start with go depth 60!

go depth 60
info depth 60 multipv 1 time 5114 nps 125342789

go depth 80
info depth 80 multipv 1 time 25335 nps 170001086

go depth 100
info depth 100 multipv 1 time 211734 nps 194384966

And this was done with only the base 64-core version.. so I think with a bmi2 version and 72 cores it will easily go above 200 million nodes/sec.
with this position!
This morning I see that Mohammed Li has updated asmFish 2016-07-02 to support 256 cores!
https://github.com/tthsqe12/asm/find/master

The spreadsheet is also updated with this data:
https://docs.google.com/spreadsheets/d/156Iztrz4erBxTntb6A9-4hCNAtLyazdBBh8zGy0RApE/edit#gid=0



01-07-2016


Sometimes programmers listen to testers, and it's nice to see that I got a response and that they bit into this NUMA-aware thing!
Thanks to Peter, Mikael & Moh, who tried things out with tools or changed code in their engines, we now have a working NUMA-aware engine!

Below this report you see a screenshot of the system; again I could run on William's system, a Xeon E7-8870 v3 4x18cores=72cores/144threads
OS: Windows Server 2012 R2 with HT Off

Texel 1.06a48 512cores Numatest 3 from Peter Osterlund

He was so kind as to make a new release, Texel 1.06a49, with source, and I can put the link here:

https://dl.dropboxusercontent.com/u/89684995/texel106a49.7z


setoption name hash value 1024
setoption name threads value 72
go depth 22


8cores
info nodes 523751942 nps 8781890 time 59640

16cores
info nodes 465912687 nps 16656990 time 27971

18cores=1cpu
info nodes 700218378 nps 18361567 time 38135

22cores
info nodes 474150640 nps 20808858 time 22786

32cores
info nodes 915012019 nps 31049985 time 29469

36cores =2cpu's
info nodes 789579580 nps 35225499 time 22415

44cores
info nodes 1054602231 nps 42282183 time 24942

54cores=3cpu's
info nodes 960015462 nps 48781273 time 19680

64cores
info nodes 1113647063 nps 56077700 time 19859

72cores
info nodes 987339559 nps 62092922 time 15901

I also used go depth 24 for the last two, because they finished too quickly with go depth 22.

64cores - go depth 24
info nodes 4634818942 nps 63032176 time 73531

72cores - go depth 24
info nodes 6291455339 nps 69499644 time 90525


26-06-2016

Here are some benches on a Xeon E7-8870 v3 4x18cores=72cores/144threads!
And like other engines, Andscacs scales well up to 2 cpu's; above that you get nothing more!!

Spreadsheet updated: https://docs.google.com/spreadsheets/d/156Iztrz4erBxTntb6A9-4hCNAtLyazdBBh8zGy0RApE/edit#gid=0

Andscacs 0.86189

setoption name threads value x (x=8,16,18,22,36,44,54,64,72)
go depth 25

8cores
info nodes 255460822 nps 6482131 time 39410

16cores
info nodes 350094373 nps 13520811 time 25893

18cores = 1cpu
info nodes 318964178 nps 14368402 time 22199

22cores
info nodes 405473123 nps 17569682 time 23078

36cores = 2cpu's
info nodes 291845317 nps 24808340 time 11764

44cores
info nodes 441290180 nps 25184920 time 17522

54cores = 3cpu's
info nodes 229471568 nps 24573952 time 9338

64cores
info nodes 305918688 nps 25332783 time 12076

72cores = 4cpu's
info nodes 501804459 nps 24806192 time 20229


go depth 26

18cores
info nodes 259833042 nps 14963031 time 17365

36cores
info nodes 712096249 nps 25144641 time 28320

go depth 27

18cores
info nodes 789678361 nps 14579126 time 54165

36cores
info nodes 632270469 nps 25137979 time 25152

--------------------------------------------------------------------------------------

Texel had a first try with NUMA.. but here too there is no gain after 2 cpu's.. even a slowdown.. the SMP could also be better!

Texel 1.06a48 512cores Numa

setoption name threads value 8
go depth 22

8cores
info nodes 535900101 nps 7208965 time 74338

16cores
info nodes 666496072 nps 13589203 time 49046

18cores
info nodes 467481214 nps 15624894 time 29919

22cores
info nodes 719367492 nps 17628099 time 40808

32cores
info nodes 906991000 nps 21920702 time 41376

36cores
info nodes 924095974 nps 22428969 time 41201

44cores
info nodes 958960416 nps 15240947 time 62920

54cores
info nodes 1357458581 nps 12538295 time 108265

64cores
info nodes 1145138765 nps 14235759 time 80441

72cores
info nodes 1056645969 nps 14224983 time 74281

------------------------------------------------------------


18-06-2016

I read that Crafty 25.0.1 is NUMA-aware.. time to test it out..

System : Xeon E7-8870 v3 4x18cores=72cores/144threads HT Off

OS: Windows server 2012 R2

When I open it in the console I get this (in blue):

EPD Kit revision date: 1996.04.21
unable to open book file [./book.bin].
book is disabled
unable to open book file [./books.bin].

Initializing multiple threads.
System is NUMA. 4 nodes reported by Windows  -> It shows that the system is NUMA and has 4 cpu's! ..but it only uses two of them at most?
Node 0 CPUs:
Node 1 CPUs: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Node 2 CPUs:
Node 3 CPUs: 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
Current ideal CPU is 20
Exchanging nodes 0 and 3

Crafty v25.0.1 JA (1 cpu)

White(1):
bench  -> first bench uses only 1core
Total nodes: 185182294
Raw nodes per second: 4314592
Total elapsed time: 42.92
time used =  42.94
White(1):
---------------------------------------------------
White(1): mt=8
max threads set to 8.
White(1): bench
Running benchmark. . .
......
Total nodes: 355860844
Raw nodes per second: 27671916
Total elapsed time: 12.86
time used =  12.94
------------------------------------------------
White(1): mt=16
max threads set to 16.
White(1): bench
Running benchmark. . .
......
Total nodes: 503496086
Raw nodes per second: 45116140
Total elapsed time: 11.16
time used =  11.23
----------------------------------------------
White(1): mt=18
max threads set to 18.
White(1): bench
Running benchmark. . .
......
Total nodes: 296057861
Raw nodes per second: 38004860
Total elapsed time: 7.79
time used =   7.83
White(1):
-----------------------------------------------------------
White(1): mt=36
max threads set to 36.
White(1): bench
Running benchmark. . .
......
Total nodes: 360114781
Raw nodes per second: 40191380
Total elapsed time: 8.96
time used =   9.03
White(1):
------------------------------------------------------------
White(1): mt=54
max threads set to 54.
White(1): bench
Running benchmark. . .
......
Total nodes: 441466484
Raw nodes per second: 1794798
Total elapsed time: 245.97
time used =   4:06
White(1):
--------------------------------------------------
White(1): mt=72
ERROR - Crafty was compiled with CPUS=64.  mt can not exceed this value.
max threads set to 64.
White(1): mt=64
max threads set to 64.
White(1): bench
Running benchmark. . .
......
Total nodes: 474695747
Raw nodes per second: 4151616
Total elapsed time: 114.34
time used =   1:54
White(1):

So the NUMA support in Crafty 25.0.1 also doesn't work as it should on a 4-cpu system?
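The Crafty bench figures can be checked by hand: dividing Total nodes by Total elapsed time reproduces the printed Raw nodes per second to within the rounding of the printed time. A small Python sketch with the numbers copied from the runs above, computing the speedup relative to the 1-thread bench: 8 threads give about 6.4x, while 54 and 64 threads fall below the single-thread rate.

```python
from typing import Dict, Tuple

# threads -> (Total nodes, Total elapsed time in s, reported Raw nodes per second),
# copied from the Crafty 25.0.1 bench output above.
CRAFTY_BENCH: Dict[int, Tuple[int, float, int]] = {
    1: (185182294, 42.92, 4314592),
    8: (355860844, 12.86, 27671916),
    16: (503496086, 11.16, 45116140),
    18: (296057861, 7.79, 38004860),
    36: (360114781, 8.96, 40191380),
    54: (441466484, 245.97, 1794798),
    64: (474695747, 114.34, 4151616),
}


def raw_nps(nodes: int, seconds: float) -> float:
    """Nodes per second from total nodes and elapsed wall-clock time."""
    return nodes / seconds


def speedup_vs_one_thread(bench: Dict[int, Tuple[int, float, int]]) -> Dict[int, float]:
    """Ratio of each run's nodes/sec. to the 1-thread run's nodes/sec."""
    base_nodes, base_secs, _ = bench[1]
    base = raw_nps(base_nodes, base_secs)
    return {t: raw_nps(n, s) / base for t, (n, s, _) in sorted(bench.items())}


for threads, ratio in speedup_vs_one_thread(CRAFTY_BENCH).items():
    print(f"{threads:3d} threads: {ratio:5.2f}x")
```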

William showed me Cinebench.. when you start it, it immediately shows 72 cores, and the bench uses them all, 4 x 18 cores.. 4 cpu's at 100%!!
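The NUMA listing Crafty printed at startup (Node 0 to Node 3 with their CPU numbers) can be checked mechanically. A small Python sketch that parses those lines: it shows CPUs listed only for nodes 1 and 3, 36 of the 72 cores, which fits the observation that Crafty uses at most two of the four cpu's. One explanation worth checking, an assumption rather than something these tests establish, is Windows processor groups: Windows puts at most 64 logical processors in one group, so a 72-core machine has more than one group, and a program that is not group-aware by default runs only on the processors of a single group.

```python
import re
from typing import Dict, List

# Matches lines such as "Node 1 CPUs: 0 1 2 ..." or "Node 0 CPUs:" (empty list).
NODE_LINE = re.compile(r"^Node (\d+) CPUs:(.*)$")


def parse_numa_listing(text: str) -> Dict[int, List[int]]:
    """Map each 'Node N CPUs: ...' line of Crafty's startup output to its CPU list."""
    nodes: Dict[int, List[int]] = {}
    for line in text.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            nodes[int(match.group(1))] = [int(cpu) for cpu in match.group(2).split()]
    return nodes


def empty_nodes(nodes: Dict[int, List[int]]) -> List[int]:
    """NUMA nodes for which no CPU was listed."""
    return sorted(n for n, cpus in nodes.items() if not cpus)
```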

--------------------------------------------------------------------
A request from Peter to use a depth limit instead of a time limit ("default depth" in the bench command)

Stockfish 130616 without CMH

bench 1024 18 24 default depth
===========================
Total time (ms) : 290343
Nodes searched  : 7010082938
Nodes/second    : 24144143

bench 1024 36 24 default depth
===========================
Total time (ms) : 179340
Nodes searched  : 8645517290
Nodes/second    : 48207412

bench 1024 54 24 default depth
===========================
Total time (ms) : 356931
Nodes searched  : 19942128779
Nodes/second    : 55871103

bench 1024 22 24 default depth
===========================
Total time (ms) : 233990
Nodes searched  : 7124988406
Nodes/second    : 30449969

bench 1024 44 24 default depth

===========================
Total time (ms) : 480633
Nodes searched  : 23304937874
Nodes/second    : 48488010

Between 18 and 36 cores I get almost perfect scaling in nodes/sec.: 99,67%!!
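The scaling percentages in these notes appear to be the relative increase in nodes/sec. when the core count is doubled, (nps_after / nps_before - 1) x 100, so 100% means the nodes/sec. exactly doubled. With the 18- and 36-core depth-24 bench figures above this reproduces 99,67%, and with the 16-06-2016 time-based figures on the same Xeon E7-8870 v3 it reproduces 89,30%. A short Python sketch; the efficiency helper is an added, more general measure (speedup divided by the increase in cores), not a figure used in the original:

```python
def scaling_gain(nps_before: float, nps_after: float) -> float:
    """Percent increase in nodes/sec.; 100.0 means nps exactly doubled."""
    return (nps_after / nps_before - 1.0) * 100.0


def parallel_efficiency(nps_before: float, cores_before: int,
                        nps_after: float, cores_after: int) -> float:
    """nps speedup divided by the increase in core count, as a percentage."""
    return (nps_after / nps_before) / (cores_after / cores_before) * 100.0


# 18 -> 36 cores, bench 1024 N 24 default depth (Stockfish 130616 without CMH)
print(f"{scaling_gain(24144143, 48207412):.2f}%")  # prints 99.67%
```

For a doubling of cores the two measures agree on what "perfect" means (100%), but the efficiency form also works for steps such as 18 to 54 cores.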

The spreadsheet is updated..


16-06-2016

Today I got another system for testing:

system: Xeon E7-8870 v3 4x18cores=72cores/144threads

This gives me a chance to compare these results with my findings from the other system..

Used the same Stockfish 130616 without CounterMovesHistory (CMH).

The data is also updated in the spreadsheet: https://docs.google.com/spreadsheets/d/156Iztrz4erBxTntb6A9-4hCNAtLyazdBBh8zGy0RApE/edit?usp=sharing

Intel Xeon E7-8870 v3 @2.1GHz
4x18cores=72cores/144threads system

Stockfish 130616 without CMH

bench 1024 8 2000 default time
===========================
Total time (ms) : 74123
Nodes searched  : 1240250555
Nodes/second    : 16732330

bench 1024 16 2000 default time
===========================
Total time (ms) : 74148
Nodes searched  : 2286158365
Nodes/second    : 30832367

bench 1024 18 2000 default time  -> 1cpu
===========================
Total time (ms) : 74132
Nodes searched  : 2851865262
Nodes/second    : 38470097

bench 1024 22 2000 default time
===========================
Total time (ms) : 74125
Nodes searched  : 3135009861
Nodes/second    : 42293556

bench 1024 24 2000 default time
===========================
Total time (ms) : 74158
Nodes searched  : 3383158382
Nodes/second    : 45620949

bench 1024 32 2000 default time
===========================
Total time (ms) : 74128
Nodes searched  : 4661922731
Nodes/second    : 62890172

bench 1024 36 2000 default time  -> 2cpu's
===========================
Total time (ms) : 74117
Nodes searched  : 5397609260
Nodes/second    : 72825522

bench 1024 44 2000 default time
===========================
Total time (ms) : 74103
Nodes searched  : 5393135858
Nodes/second    : 72778913

bench 1024 54 2000 default time  -> 3cpu's
===========================
Total time (ms) : 74136
Nodes searched  : 5442621036
Nodes/second    : 73414009

bench 1024 72 2000 default time  -> 4cpu's
===========================
Total time (ms) : 74127
Nodes searched  : 5499511667
Nodes/second    : 74190398


Again the same happens: up to 2 cpu's everything goes well.. more cores using cpu3 & cpu4 don't give more nodes/sec. anymore, and the system slows down!?

Engines need to be NUMA-aware if we want chess engines to use these 4-socket systems optimally!

A quick check from 18 to 36 cores (1 cpu to 2 cpu's) gives me an 89,30% speed gain in nodes/sec., about the same as the other system, which gave me 90%!!
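Dividing nodes/sec. by the number of cores makes the plateau after 2 cpu's easy to see. A small Python sketch with the Stockfish 130616 (without CMH) results on the Xeon E7-8870 v3 copied from above; the 85% threshold is an arbitrary illustrative value, not something from the original tests. With it, the first core count whose per-core rate drops below 85% of the best one is 44 cores, the first configuration that has to use the third cpu, and at 72 cores each core delivers only about half of what it did at 36 cores.

```python
from typing import Dict, Optional

# cores -> Nodes/second, Stockfish 130616 without CMH, bench 1024 N 2000 default time,
# Xeon E7-8870 v3 (4 x 18 cores), copied from the 16-06-2016 results above.
E7_8870_NPS: Dict[int, int] = {
    8: 16732330, 16: 30832367, 18: 38470097, 22: 42293556, 24: 45620949,
    32: 62890172, 36: 72825522, 44: 72778913, 54: 73414009, 72: 74190398,
}


def nps_per_core(results: Dict[int, int]) -> Dict[int, float]:
    """Nodes/sec. divided by core count, ordered by core count."""
    return {cores: nps / cores for cores, nps in sorted(results.items())}


def first_drop(results: Dict[int, int], fraction: float) -> Optional[int]:
    """Smallest core count whose per-core rate is below fraction x the best per-core rate."""
    per_core = nps_per_core(results)
    best = max(per_core.values())
    for cores, value in per_core.items():
        if value < fraction * best:
            return cores
    return None


for cores, value in nps_per_core(E7_8870_NPS).items():
    print(f"{cores:2d} cores: {value / 1e6:.2f} M nodes/s per core")
```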


Ipman.

-------------------------------------------

15-06-2016

I made a little spreadsheet with the data I got:

https://docs.google.com/spreadsheets/d/156Iztrz4erBxTntb6A9-4hCNAtLyazdBBh8zGy0RApE/edit?usp=sharing

And here is the Fishcooking link:

https://groups.google.com/forum/#!topic/fishcooking/YZ16ksLBHUc


Next test done on 14-06-2016

system: Xeon E7-8890 v4 4x24cores=96cores/192threads

After the first test I contacted a few people to ask whether they had more ideas I could test out
on this extreme system.. I don't think we often get the chance to run some benches
on a 96cores/192threads system.

I got an e-mail from Mikael, and later on a Stockfish source without CounterMoveHistory!!
I made a compile from it and was ready to run some benches..

Another person asked me whether I could also run with 44 cores, as the TCEC9 final will run on a system
with 44 cores.. so I did.. assuming it will be 2x22 cores, I also tested with 22 cores,
knowing that each cpu in this system has 24 cores.. so the 2-core difference could show some
difference in nodes/s, because the earlier results were not going well above 24 cores.. but!

WoW what a difference.. that is what I want to see, with great scaling up to 48 cores now and
with much higher nodes/sec.

Just check this:

Stockfish 130616 bmi2 256cores without CounterMoveHistory!!

Bench 1024 8 2000 default time
===========================
Total time (ms) : 74179
Nodes searched  : 1111563212
Nodes/second    : 14984877

Bench 1024 16 2000 default time
===========================
Total time (ms) : 74171
Nodes searched  : 2473072243
Nodes/second    : 33342846

Bench 1024 20 2000 default time
===========================
Total time (ms) : 74175
Nodes searched  : 3163890488
Nodes/second    : 42654404

Bench 1024 22 2000 default time -> test run with 1 cpu, to compare with the TCEC9 final system
===========================
Total time (ms) : 74171
Nodes searched  : 3439099279
Nodes/second    : 46367168

Bench 1024 24 2000 default time
===========================
Total time (ms) : 74135
Nodes searched  : 3714919446
Nodes/second    : 50110196

Bench 1024 32 2000 default time
===========================
Total time (ms) : 74127
Nodes searched  : 4856036008
Nodes/second    : 65509679

Bench 1024 44 2000 default time -> test run with 2 cpu's, to compare with the TCEC9 final system
===========================
Total time (ms) : 74131
Nodes searched  : 6532558230
Nodes/second    : 88121814

Bench 1024 48 2000 default time
===========================
Total time (ms) : 74128
Nodes searched  : 7113106606
Nodes/second    : 95957082

Bench 1024 64 2000 default time
===========================
Total time (ms) : 74140
Nodes searched  : 7094556559
Nodes/second    : 95691348

Bench 1024 80 2000 default time
===========================
Total time (ms) : 74148
Nodes searched  : 7282006695
Nodes/second    : 98209077

Bench 1024 96 2000 default time
===========================
Total time (ms) : 74136
Nodes searched  : 7235127846
Nodes/second    : 97592638


Compare the results with the first test: here all numbers are much higher and scale nicely up to 48 cores!
Just enough for the TCEC9 final system: from 22 cores to 44 cores, the nodes/sec. almost double!!
The test before gave the highest nodes/sec. with 64 cores -> 72 million nodes/s; now with 48 cores -> 96 million nodes/s!!
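To put numbers on "almost double", here is a small Python sketch that computes the ratios from the figures reported above. It only does arithmetic on the printed Nodes/second values (the names are my own, not from any engine); a 22 -> 44 ratio of 2.0 would be perfect scaling.

```python
# Nodes/second from the Stockfish 130616 runs above (Hash=1024, 2000 ms per position).
new_nps = {22: 46367168, 44: 88121814, 48: 95957082}

# Best figure from the earlier 11-06-2016 test (Hash=1024, 64 cores).
old_best_nps = 72208319


def speedup(fast: int, slow: int) -> float:
    """Ratio of two nodes/second figures."""
    return fast / slow


# One 22-core CPU versus two (44 cores); the ideal ratio is 2.0.
ratio_22_to_44 = speedup(new_nps[44], new_nps[22])
# New 48-core peak versus the old best result.
peak_gain = speedup(new_nps[48], old_best_nps)

print(f"22 -> 44 cores: x{ratio_22_to_44:.2f} (ideal x2.00)")
print(f"new peak vs old peak: x{peak_gain:.2f}")
```

This prints roughly x1.90 for 22 -> 44 cores and about x1.33 for the new peak over the old one.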

And all this thanks to Mikael, with one single change in the source. Who can do better? ;)

Then we can go for the next problem: how to handle CPU3 & CPU4, an extra 48 cores that do nothing now?!
Imagine if it continued scaling like these first 48 cores!

Do I only have to count on Mikael? ;)
This compile should surely go to TCEC9. Maybe I will try to contact Anton, so that he can just try it and see
the difference in speed on their final system.

-----------------------------------------------------------------------

Again I had some free time left on the system, so I did some more tests.
Here I used different times per position (3000 and 4000 ms), but otherwise they are the same as above. Every time it shows the same big
difference in nodes/sec. (for me this proves it works very well)

Bench 1024 22 3000 default time
===========================
Total time (ms) : 111134
Nodes searched  : 5084586687
Nodes/second    : 45751855

Bench 1024 44 3000 default time
===========================
Total time (ms) : 111139
Nodes searched  : 9988630925
Nodes/second    : 89875119


Bench 1024 22 4000 default time
===========================
Total time (ms) : 148161
Nodes searched  : 6939900820
Nodes/second    : 46840267

Bench 1024 44 4000 default time
===========================
Total time (ms) : 148149
Nodes searched  : 13151531526
Nodes/second    : 88772327

Bench 1024 48 4000 default time
===========================
Total time (ms) : 148227
Nodes searched  : 14372201115
Nodes/second    : 96960750
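One way to check that the 22 -> 44 core difference holds at every time setting is to compute the 44/22 nodes/second ratio per setting. A minimal Python sketch over the numbers above (the dictionary layout is my own):

```python
# (22-core nps, 44-core nps) per time per position in ms, Hash=1024, from the runs above.
runs = {
    2000: (46367168, 88121814),
    3000: (45751855, 89875119),
    4000: (46840267, 88772327),
}

# 44-core nodes/second divided by 22-core nodes/second; 2.0 would be perfect scaling.
ratios = {ms: nps44 / nps22 for ms, (nps22, nps44) in runs.items()}

for ms, r in sorted(ratios.items()):
    print(f"{ms} ms per position: 44/22 = {r:.2f}")
```

All three ratios land between about 1.90 and 1.96, which is the "same big difference every time" in numbers.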

------------------------------------------------------------

Then with different Hash sizes the results were mostly a little lower:

Bench 8192 48 2000 default time
===========================
Total time (ms) : 74131
Nodes searched  : 6457869967
Nodes/second    : 87114297

Bench 8192 48 3000 default time
===========================
Total time (ms) : 111131
Nodes searched  : 10000089574
Nodes/second    : 89984698


Bench 8192 48 4000 default time
===========================
Total time (ms) : 148176
Nodes searched  : 12904412990
Nodes/second    : 87088415

Bench 2048 48 4000 default time
===========================
Total time (ms) : 148155
Nodes searched  : 13997014132
Nodes/second    : 94475475

Bench 512 48 4000 default time
===========================
Total time (ms) : 148160
Nodes searched  : 13651252173
Nodes/second    : 92138581


Bench 512 22 2000 default time
===========================
Total time (ms) : 74130
Nodes searched  : 3446013987
Nodes/second    : 46486091

Bench 512 44 2000 default time
===========================
Total time (ms) : 74142
Nodes searched  : 6366179189
Nodes/second    : 85864681
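For the 48-core runs at 4000 ms per position, the Hash effect can be expressed as a percentage change against Hash=1024. A short Python sketch; the 1024 MB value is the 48-core, 4000 ms run from the previous group of tests, and the names are my own:

```python
# Nodes/second for 48 cores at 4000 ms per position, by Hash size in MB.
nps_by_hash = {
    512: 92138581,
    1024: 96960750,  # the 48-core, 4000 ms run with the default Hash=1024
    2048: 94475475,
    8192: 87088415,
}

baseline = nps_by_hash[1024]
changes = {}
for hash_mb, nps in sorted(nps_by_hash.items()):
    # Percentage difference relative to the Hash=1024 run.
    changes[hash_mb] = 100.0 * (nps - baseline) / baseline
    print(f"Hash {hash_mb:>5} MB: {nps:>10} nps ({changes[hash_mb]:+.1f}% vs 1024 MB)")
```

With these numbers 8192 MB comes out about 10% below 1024 MB, and 512 and 2048 MB a few percent below.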


----------------------------------------------------------

I also ran the new asmFish bmi2 130616, but the scaling is not good at all:

asmFish bmi2

go depth 26 8c
info depth 26 multipv 1 time 37001 nps 9548522 score cp 21 nodes

go depth 26 16c
info depth 26 multipv 1 time 23123 nps 19538363 score cp 16 node

go depth 26 24c
info depth 26 multipv 1 time 20535 nps 24635035 score cp 12 nodes

go depth 26 32c
info depth 26 multipv 1 time 29463 nps 21899786 score cp 15 nodes

go depth 26 48c
info depth 26 multipv 1 time 18359 nps 41925656 score cp 17 nodes

go depth 26 64c
info depth 26 multipv 1 time 27842 nps 33100524 score cp 26 nodes

go depth 26 80c
info depth 26 multipv 1 time 21907 nps 52996489 score cp 20 nodes

go depth 27 80c
info depth 27 multipv 1 time 25354 nps 57872847 score cp 17 nodes 1467308167

go depth 28 80c
info depth 28 multipv 1 time 43913 nps 49413512 score cp 15 nodes 2169895595

go depth 27 96c
info depth 27 multipv 1 time 41803 nps 44760056 score cp 10 nodes 1871104649
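Because these are fixed-depth runs with different times, the nps field is easiest to compare per core. A minimal Python sketch that reads the numeric fields of a UCI `info` line (the parser is my own, hypothetical helper, not part of asmFish) and prints nps per core for the depth-26 lines above; it tolerates the cut-off `nodes` keyword at the end of those lines:

```python
def parse_info(line: str) -> dict:
    """Collect the numeric fields of a UCI 'info' line into a dict.

    'score cp N' is stored as 'score_cp'. A keyword left without a value
    (like the cut-off 'nodes' at the end of the lines above) is skipped.
    """
    tokens = line.split()
    fields = {}
    i = 1  # tokens[0] is "info"
    while i < len(tokens):
        key = tokens[i]
        if key == "score" and i + 2 < len(tokens):
            fields["score_" + tokens[i + 1]] = int(tokens[i + 2])
            i += 3
        elif i + 1 < len(tokens) and tokens[i + 1].lstrip("-").isdigit():
            fields[key] = int(tokens[i + 1])
            i += 2
        else:
            i += 1  # keyword without a numeric value
    return fields


# The depth-26 asmFish lines from above, keyed by core count.
asmfish_d26 = {
    8: "info depth 26 multipv 1 time 37001 nps 9548522 score cp 21 nodes",
    16: "info depth 26 multipv 1 time 23123 nps 19538363 score cp 16 node",
    24: "info depth 26 multipv 1 time 20535 nps 24635035 score cp 12 nodes",
    32: "info depth 26 multipv 1 time 29463 nps 21899786 score cp 15 nodes",
    48: "info depth 26 multipv 1 time 18359 nps 41925656 score cp 17 nodes",
    64: "info depth 26 multipv 1 time 27842 nps 33100524 score cp 26 nodes",
    80: "info depth 26 multipv 1 time 21907 nps 52996489 score cp 20 nodes",
}

for cores, line in asmfish_d26.items():
    info = parse_info(line)
    per_core = info["nps"] // cores
    print(f"{cores:>2} cores: {info['nps']:>9} nps, {per_core:>8} nps per core")
```

The per-core figure drops from about 1.2 million at 8 and 16 cores to roughly half that at 32 cores and above, and it jumps around instead of falling smoothly.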


Ipman.
-------------------------------------------------------------------------------------------

11-06-2016

Today I got a chance to run some chess benches on a Xeon E7-8890 v4 system with 4 sockets!!

That means 4 CPUs x 24 cores = 96 cores, or 192 threads!

Operating system : Windows Server 2012 R2

1cpu = $7174  http://ark.intel.com/nl/products/family/93797/Intel-Xeon-Processor-E7-v4-Family

Thanks to William H., who was so kind as to let me use this monster system via TeamViewer!
It was my first experience with it, and after a little searching on how to transfer files and copy & paste data, I was ready to run some benches.

Engines that I used:
I compiled Stockfish from the latest source and set it to 256 cores
I compiled DON 100616, also from the latest source, with 256 cores
I got Komodo 1656 with 200 cores, thanks to Mark & Larry!
Houdini 4 Pro 4 B, only 32 cores, but it has NUMA (more on this later)

I had put a request on Fishcooking and got some good info about how best to bench, so thanks for this information!
DON programmer Ehsan was also kind enough to help me out.
I will come back to this during these tests.

I had prepared all these bench commands in advance, and I wanted to do as many benches as possible in the time I got on this
great system. So I chose to use a fixed time for every bench, or with some engines I used go depth 24 or go depth 26, depending on
how long it takes.

During my first contact, William had run some tests, and I saw it didn't work the way I would have liked with these 192 threads.
Programmers would also like to see results with HyperThreading OFF, so I decided to run the benches with HT Off!
Luckily I told William the day before, so that he could reboot with HT Off this morning (it takes a long time).

To keep it clear, I put some explanation between these results, so you will need to scroll a lot for these few tests I did ;)

Let me start with Stockfish:
Some people proposed that I use 8 GB hash, and I did this in every test. At the end I had some time left and wanted to see
what the engines do with different Hash sizes.

So, first you see bench 8192 8 2000 default time = 8192 MB Hash, using 8 cores and 2 sec. for each position, x 37 positions.
Stockfish has 37 positions in its bench test.
You can see the total time is almost always the same. The core counts I used were: 8, 16, 24, 32, 48, 64, 80 and 96 cores.
I put the results with different Hash sizes in between afterwards, next to the runs with the same number of cores.
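A minimal Python sketch of how a bench line and its output fit together, assuming the argument order described above (Hash in MB, cores, milliseconds per position, position file, limit type). The class and function names are hypothetical labels of my own. The reported Nodes/second figures are consistent with nodes x 1000 / total time, rounded down, and 37 positions x 2000 ms explains why every total time sits just above 74000 ms:

```python
from dataclasses import dataclass

BENCH_POSITIONS = 37  # positions in Stockfish's default bench, as stated above


@dataclass
class BenchArgs:
    hash_mb: int
    threads: int
    limit: int        # milliseconds per position when limit_type == "time"
    fen_file: str
    limit_type: str


def parse_bench(cmd: str) -> BenchArgs:
    """Split 'bench <hash> <threads> <limit> <fenfile> <limittype>' (assumed order)."""
    _, hash_mb, threads, limit, fen_file, limit_type = cmd.split()
    return BenchArgs(int(hash_mb), int(threads), int(limit), fen_file, limit_type)


def nodes_per_second(nodes: int, total_ms: int) -> int:
    """Integer nodes/second, matching how the bench output reports it."""
    return nodes * 1000 // total_ms


args = parse_bench("bench 8192 8 2000 default time")
expected_ms = BENCH_POSITIONS * args.limit  # 37 x 2000 ms
print(args)
print("expected total time:", expected_ms, "ms (measured: 74198 ms)")
print("nodes/second:", nodes_per_second(941792202, 74198), "(reported: 12692959)")
```

The roughly 200 ms difference between 74000 and the measured total time is overhead; it stays small at every core count, which is why the total time alone cannot show the slowdown described further down.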

Stockfish bench 8192 8 2000 default time
===========================
Total time (ms) : 74198
Nodes searched  : 941792202
Nodes/second    : 12692959

Stockfish bench 8192 16 2000 default time
===========================
Total time (ms) : 74135
Nodes searched  : 2150714471
Nodes/second    : 29010783
-----------------------------------------------------------------
Stockfish bench 8192 24 2000 default time
===========================
Total time (ms) : 74169
Nodes searched  : 2996733841
Nodes/second    : 40404128

Stockfish bench 512 24 2000 default time -> tried some other Hash values, to see if I get something better
===========================
Total time (ms) : 74157
Nodes searched  : 3106997530
Nodes/second    : 41897562

Stockfish bench 1024 24 2000 default time -> you will see later that with other engines I use 1024 Hash, because it gives the highest nodes/sec.!
===========================
Total time (ms) : 74148
Nodes searched  : 3321030231
Nodes/second    : 44789208

Stockfish bench 2048 24 2000 default time
===========================
Total time (ms) : 74141
Nodes searched  : 3246180633
Nodes/second    : 43783879

Stockfish bench 4096 24 2000 default time
===========================
Total time (ms) : 74124
Nodes searched  : 3174197239
Nodes/second    : 42822800
------------------------------------------------------------------
Stockfish bench 8192 32 2000 default time
===========================
Total time (ms) : 74136
Nodes searched  : 3718498964
Nodes/second    : 50157804

Stockfish bench 8192 48 2000 default time
===========================
Total time (ms) : 74165
Nodes searched  : 4872528188
Nodes/second    : 65698485

Stockfish bench 8192 64 2000 default time
===========================
Total time (ms) : 74145
Nodes searched  : 5165559495
Nodes/second    : 69668345

Stockfish bench 1024 64 2000 default time
===========================
Total time (ms) : 74154
Nodes searched  : 5354535743
Nodes/second    : 72208319 -> Peter told me you should see 70+ million ;)

Stockfish bench 8192 80 2000 default time
===========================
Total time (ms) : 74166
Nodes searched  : 5211511508
Nodes/second    : 70268202

Stockfish bench 8192 96 2000 default time
===========================
Total time (ms) : 74151
Nodes searched  : 5264618852
Nodes/second    : 70998622

Stockfish bench 1024 96 2000 default time
===========================
Total time (ms) : 74186
Nodes searched  : 5379118902
Nodes/second    : 72508544

Stockfish bench 16384 96 2000 default time
===========================
Total time (ms) : 74163
Nodes searched  : 5040791490
Nodes/second    : 67969088
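The Hash comparison can be read off mechanically: for each core count, pick the Hash size with the highest nodes/second. A short Python sketch over the Stockfish figures above at 2000 ms per position (the dictionary layout is my own):

```python
# Stockfish nodes/second, keyed by cores, then by Hash size in MB.
results = {
    24: {512: 41897562, 1024: 44789208, 2048: 43783879, 4096: 42822800, 8192: 40404128},
    64: {1024: 72208319, 8192: 69668345},
    96: {1024: 72508544, 8192: 70998622, 16384: 67969088},
}

# For each core count, the Hash size whose run reached the highest nodes/second.
best_hash = {cores: max(by_hash, key=by_hash.get) for cores, by_hash in results.items()}
print(best_hash)
```

Every core count tested with more than one Hash size picks 1024 MB.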

It was clear that, with 2000 default time, Hash=1024 was better.
Now something important!!
While these nodes/sec. look great, I saw a big problem during these tests.
But there is also good news!
First problem: up to 24 cores everything goes fast. Above 24 cores, the more cores I add, the slower the test begins to run.
At the end you see almost the same total time used, but that's not the whole story: the max number of cores I saw running was 48?
You can also see that the nodes/sec. barely change anymore at 64, 80 and 96 cores, compared with the gains from 8 to 24 cores. So when I selected 64, 80 or 96 cores, I saw only 48 cores running?
With 32 cores you might say they go nicely higher, but it's already lower than it should be!
So the good news is: it's not an SMP problem, but NUMA support is needed when running a system with multiple Intel CPUs!!
Why do I say Intel? Because I don't know about AMD and haven't tested it yet.
1 CPU = 24 cores; everything up to there goes great.
2 CPUs or more need NUMA!!
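The flattening after 48 cores shows up clearly as scaling efficiency: speedup over the 8-core run divided by the ideal speedup. A minimal Python sketch over the Stockfish Hash=8192 runs above; using 8 cores as the baseline is my own choice, since no 1-core run was done:

```python
# Stockfish nodes/second with Hash=8192, 2000 ms per position, from the runs above.
nps = {8: 12692959, 16: 29010783, 24: 40404128, 32: 50157804,
       48: 65698485, 64: 69668345, 80: 70268202, 96: 70998622}

base_cores = 8
efficiency = {}
for cores, value in nps.items():
    speedup = value / nps[base_cores]    # measured gain over the 8-core run
    ideal = cores / base_cores           # gain if every added core helped fully
    efficiency[cores] = speedup / ideal
    print(f"{cores:>2} cores: x{speedup:4.2f} (ideal x{ideal:4.1f}), efficiency {efficiency[cores]:.0%}")
```

Efficiency stays around 100% up to 32 cores, falls to about 86% at 48 cores, and then drops to under 50% at 96 cores, because going from 48 to 96 cores adds only about 8% nodes/sec.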

Next, the DON engine from Ehsan:

With DON I had to use 2000 movetime instead of 2000 default time to get a fixed-time bench!
Same thing here: up to 48 cores, nodes/sec. go up nicely, and then there is almost no change anymore.
The total time is the same, but with more cores the test takes longer and longer to finish.

DON bench 8192 8 2000 movetime
=================================
Total time (ms) :           74195
Nodes searched  :       712564065
Nodes/second    :         9603936
---------------------------------
DON bench 8192 16 2000 movetime
=================================
Total time (ms) :           74175
Nodes searched  :      1500633293
Nodes/second    :        20230984
---------------------------------
DON bench 8192 24 2000 movetime
=================================
Total time (ms) :           74221
Nodes searched  :      2092625949
Nodes/second    :        28194526
---------------------------------
DON bench 8192 32 2000 movetime
=================================
Total time (ms) :           74242
Nodes searched  :      2513426090
Nodes/second    :        33854504
---------------------------------
DON bench 8192 48 2000 movetime
=================================
Total time (ms) :           74246
Nodes searched  :      3574674169
Nodes/second    :        48146353
---------------------------------
DON bench 8192 64 2000 movetime
=================================
Total time (ms) :           74248
Nodes searched  :      3609990121
Nodes/second    :        48620705
---------------------------------
DON bench 8192 80 2000 movetime
=================================
Total time (ms) :           74212
Nodes searched  :      3634574998
Nodes/second    :        48975569
---------------------------------
DON bench 8192 96 2000 movetime
=================================
Total time (ms) :           74219
Nodes searched  :      3647528726
Nodes/second    :        49145484
---------------------------------
With Hash=1024 I got higher nodes/sec., except with 48 cores, where it was lower:

DON bench 1024 24 2000 movetime
=================================
Total time (ms) :           74204
Nodes searched  :      2292225565
Nodes/second    :        30890862
---------------------------------
DON bench 1024 32 2000 movetime
=================================
Total time (ms) :           74252
Nodes searched  :      2594433993
Nodes/second    :        34940930
---------------------------------
DON bench 1024 48 2000 movetime
=================================
Total time (ms) :           74197
Nodes searched  :      3463283303
Nodes/second    :        46676864
---------------------------------

Next, Komodo from Mark & Larry:

Komodo bench commands:
----------------------
-setoption name hash value 1024 (to set Hash; values: 128, 256, 512, 1024, 2048, 4096, 8192)
-setoption name threads value 8 (to set cores; values: 8, 16, 24, 32, 48, 64, 80, 96) -> HT Off
-go depth 24
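A minimal Python sketch that builds the command sequence for every Hash x cores combination listed above. It only generates the strings (sending them to the engine, for example over its standard input, is not shown), the option names are copied exactly as written above, and the function names are hypothetical. The full grid is more runs than were actually done here:

```python
from itertools import product

HASH_SIZES_MB = [128, 256, 512, 1024, 2048, 4096, 8192]
THREAD_COUNTS = [8, 16, 24, 32, 48, 64, 80, 96]  # HT off, so cores == threads


def bench_commands(hash_mb: int, threads: int, depth: int) -> list[str]:
    """The three UCI commands used for one fixed-depth run."""
    return [
        f"setoption name hash value {hash_mb}",
        f"setoption name threads value {threads}",
        f"go depth {depth}",
    ]


def sweep(depth: int) -> list[list[str]]:
    """One command list per (Hash, cores) combination."""
    return [bench_commands(h, t, depth) for h, t in product(HASH_SIZES_MB, THREAD_COUNTS)]


runs = sweep(24)
print(len(runs), "runs")  # 7 hash sizes x 8 core counts
for cmd in runs[0]:
    print(cmd)
```

Calling `sweep(26)` gives the same grid for the depth-26 runs.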

With the time left I ran the tests again using go depth 26.
H=Hash (MB), C=cores

Komodo go depth 24 H=8192 C=8
info time 23895 nodes 176383325 nps 7381324 hashfull 124

Komodo go depth 24 H=8192 C=16
info time 10523 nodes 138581276 nps 13169325 hashfull 79

Komodo go depth 24 H=8192 C=24
info time 9112 nodes 149030645 nps 16355127 hashfull 52

Komodo go depth 24 H=8192 C=48
info time 23736 nodes 237036210 nps 9986052 hashfull 39

Komodo go depth 24 H=8192 C=64
info time 31560 nodes 393609855 nps 12471693 hashfull 51

-----------------------------------------------------------
Komodo go depth 26 H=8192 C=8
info time 25355 nodes 185176070 nps 7303248 hashfull 135

Komodo go depth 26 H=8192 C=16
info time 23215 nodes 287714964 nps 12393381 hashfull 158

Komodo go depth 26 H=8192 C=24 -> ran it twice, because I saw it was lower than with 16c
info time 71465 nodes 767250121 nps 10735952 hashfull 271
info time 86053 nodes 896317258 nps 10415819 hashfull 302

Komodo go depth 26 H=8192 C=20 -> so I tried 20 cores
info time 91149 nodes 1063403519 nps 11666604 hashfull 441
---------------------------------------------------------
Hash=1024 again gives higher nodes/sec.

Komodo go depth 26 H=1024 C=8
info time 71923 nodes 571277810 nps 7942805 hashfull 924

Komodo go depth 26 H=2048 C=8
info time 72453 nodes 538932168 nps 7438362 hashfull 889

Komodo go depth 26 H=1024 C=16
info time 48976 nodes 827701203 nps 16899827 hashfull 799

Komodo go depth 26 H=1024 C=24
info time 36725 nodes 663240265 nps 18059567 hashfull 602

Komodo go depth 26 H=1024 C=32
info time 52107 nodes 1131195820 nps 21708717 hashfull 451

Komodo go depth 26 H=1024 C=48
info time 33298 nodes 708926657 nps 21289807 hashfull 350

Komodo already shows a slowdown in nodes/sec. gain after 16 cores. Some work to do!
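The slowdown after 16 cores can be shown as the nodes/second gained per added core between consecutive runs. A minimal Python sketch over the Komodo depth-26, Hash=1024 figures above; note that fixed-depth runs last different amounts of time, so these numbers are noisier than the fixed-time Stockfish benches:

```python
# Komodo nodes/second at depth 26 with Hash=1024, from the runs above.
nps = {8: 7942805, 16: 16899827, 24: 18059567, 32: 21708717, 48: 21289807}

cores = sorted(nps)
gain_per_core = {}
for lo, hi in zip(cores, cores[1:]):
    # Extra nodes/second divided by the number of cores added in this step.
    gain_per_core[hi] = (nps[hi] - nps[lo]) / (hi - lo)
    print(f"{lo:>2} -> {hi:>2} cores: {gain_per_core[hi]:>+12,.0f} nps per added core")
```

Each core added from 8 to 16 brings over a million nodes/sec., each one from 16 to 24 only about 145 thousand, and the step from 32 to 48 cores is slightly negative.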

And last, Houdini:

When I saw the problems with the slowdown and not all cores being used, I said: yes, Houdini has NUMA!
But when I checked, it handles only 32 cores. Still, I said OK, maybe I can see a little difference going from
24 cores to 32 cores.

I used the same bench commands as for Komodo.

Houdini go depth 24 H=8192 C=8
info multipv 1 depth 24 seldepth 51 score cp 10 time 46315 nodes 542050599 nps 11703000 tbhits 0 hashfull 474

Houdini go depth 24 H=8192 C=16
info multipv 1 depth 24 seldepth 47 score cp 12 time 22326 nodes 418218428 nps 18732000 tbhits 0 hashfull 386

Houdini go depth 24 H=8192 C=24
info multipv 1 depth 24 seldepth 53 score cp 6 time 27876 nodes 714840424 nps 25643000 tbhits 0 hashfull 614

Houdini go depth 24 H=8192 C=32
info multipv 1 depth 24 seldepth 49 score cp 11 time 34623 nodes 1086256131 nps 31373000 tbhits 0 hashfull 816

-----------------------------------------
Houdini go depth 24 H=1024 C=24
info multipv 1 depth 24 seldepth 54 score cp 7 time 27100 nodes 710782395 nps 26228000 tbhits 0 hashfull 1000

Hash=1024 would again give higher nodes/sec., but I had a problem when I wanted to set 32 cores,
until I checked and saw this:

Houdini 4 Pro x64
(c) 2013 Robert Houdart

info string 48 processor(s) found, POPCNT available
info string NUMA configuration with 4 node(s), offset 0
info string 128 MB Hash
info string No valid license found
setoption name threads value 32
setoption name threads value 24
info string 24 threads used
setoption name threads value 32

I have a license for all the Houdini versions I own, but I had not thought that when I copy the engine to another computer,
it would no longer be valid.
Also, the "info string 32 threads used" message did not appear when I pressed Enter, so I restarted Houdini every time, and whenever it
did show "info string 32 threads used", I did the run.
But only afterwards did I realize something was not right, and I saw later on that my Houdini version was not licensed anymore.
So these results may not be valid, as I know Houdini will then not play at full strength.

Now I'm hoping that Robert, with his upcoming Houdini 5, will support enough cores, since it has NUMA!

It will be interesting to test this further, and I think more people will have a system with more than 1 CPU.
So chess programmers have to think about including NUMA support in their chess engines if we want to profit and gain more nodes/sec.
on our systems!!

Again, a big thanks to William H. for letting me run these tests! It was a pleasure to meet you ;)
Thanks to the programmers for the info and compiles that gave me the chance to run these tests, even though I couldn't really use all these cores.
But now we know what is needed: NUMA!

Ipman.

PS: I made this little report after testing, so it's possible I will have to adjust some of my explanations later on.


A short video of the system I was using: https://onedrive.live.com/?authkey=%21AOlWofWK7RCBrYg&cid=2B3864A4D682E0B9&id=2B3864A4D682E0B9%2130873&parId=2B3864A4D682E0B9%21164&o=OneUp
