This ticket is just to document what I am currently coding.
The first step is to port the current core code to asm. The decode stage is done; the execution stage will follow. This will be the scalar version, with all optimizations applied, in particular the removal of all branches, which are replaced by logical computation.
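As an illustration of replacing a branch with logical computation, here is a minimal C++ sketch. The names `select_u32` and `next_pc` and the program-counter scenario are hypothetical and not taken from the actual core code; the asm version would express the same idea with masks or conditional moves.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns a when cond == 1 and b when cond == 0,
// using a bit mask instead of an if/else.
inline std::uint32_t select_u32(std::uint32_t cond, std::uint32_t a, std::uint32_t b) {
    assert(cond <= 1u);                    // cond must already be normalized to 0 or 1
    const std::uint32_t mask = 0u - cond;  // 0x00000000 when cond == 0, 0xFFFFFFFF when cond == 1
    return (a & mask) | (b & ~mask);
}

// Hypothetical use: pick the next program counter without a conditional jump.
// The jump target is taken when flag is non-zero, otherwise execution falls through.
inline std::uint32_t next_pc(std::uint32_t pc, std::uint32_t target, std::uint32_t flag) {
    const std::uint32_t take = static_cast<std::uint32_t>(flag != 0u);  // comparison result is 0 or 1
    return select_u32(take, target, pc + 1u);
}
```

For example, `next_pc(10, 42, 0)` yields 11 and `next_pc(10, 42, 5)` yields 42. Whether the compiler emits fully branch-free machine code for this depends on the target and optimization level, which is one reason to write the hot path in asm.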
The second step will be to create a hypervisor in C++ that can manage several hundred CPUs and share the different stages among them. The output of the decode stage will be stored and used as the input of the execution stage. Each operation will have N queues, N being the number of cycles the operation takes. At each step, we fill the Nth queue and execute the first queue, then rotate the queues for all operations. The constraint is that all CPUs have the same clock.
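The queue rotation described above could be sketched as follows. This is only one possible reading of the design, not the planned implementation: `PendingOp`, `OperationPipeline`, `enqueue` and `step` are hypothetical names, and the fields of the decoded record are placeholders.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical record produced by the decode stage for one CPU.
struct PendingOp {
    int cpu_id;
    int operand;
};

// One instance per operation type; latency is the number of cycles (N) the operation takes.
class OperationPipeline {
public:
    explicit OperationPipeline(std::size_t latency) : queues_(latency) {
        assert(latency >= 1);
    }

    // Fill the Nth (last) queue with a freshly decoded operation.
    void enqueue(const PendingOp& op) { queues_.back().push_back(op); }

    // One tick of the shared clock: take the first queue for execution,
    // then rotate so every remaining queue moves one slot closer to the front.
    std::vector<PendingOp> step() {
        std::vector<PendingOp> ready = std::move(queues_.front());
        queues_.front().clear();  // leave a well-defined empty queue behind
        std::rotate(queues_.begin(), queues_.begin() + 1, queues_.end());
        return ready;
    }

private:
    std::vector<std::vector<PendingOp>> queues_;
};
```

With a latency of 3, an operation enqueued before a call to `step()` is returned by the third call, counting the one in the same tick. Because every CPU shares the same clock, a single rotation per tick serves all CPUs at once.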
The third step will be to port the C++ loops used by the hypervisor to asm, keeping the hypervisor management in C++.
The last step will be to vectorize the loops of the decode stage, and later those of the execution stage if it is worth the effort.
The challenge is to get 10k VMs running comfortably on my PC, which can currently run about 1200 VMs.
The asm implementation will support both 32-bit and 64-bit platforms, but the 32-bit implementation must be statically linked.
Help with the C++ hypervisor would be appreciated, since I am not yet ready to start it.