Friday, October 10, 2014

Using Assembly For Scalable Optimizations

I've been learning Assembly for Intel x86 processors. My end goal is to use inline Assembly in C/C++ to speed up certain computations. I find it critical to save clock cycles in those computations so that a server can respond to its clients faster.
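For readers who have not seen inline Assembly in C before, here is a minimal sketch of what it can look like. It assumes GCC or Clang extended asm in the default AT&T syntax on an x86 target, and it falls back to plain C elsewhere. The function name add_u32 is made up for illustration. A single add will not beat what the compiler already generates, so the point here is only the syntax.

```c
#include <stdint.h>

/* Adds two 32-bit unsigned integers (wrapping on overflow).
   Hypothetical helper, shown only to illustrate the inline asm syntax. */
static inline uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* AT&T syntax: "addl src, dst" computes dst += src.
       %0 is a (read and written, in a register), %1 is b (read only).
       "cc" tells the compiler that the flags register is modified. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    /* Portable fallback for other compilers and architectures. */
    return a + b;
#endif
}
```

Calling add_u32(2, 3) returns 5 on either path.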

Let's assume this (without regard to any particular hardware): one clock cycle takes 1/8,000,000 of a second, i.e. an 8 MHz clock. There are 1,000 milliseconds in a second, so one clock cycle takes 1/8,000 of a millisecond.

Suppose a million clients on one server each request a computation that requires around 150 clock cycles. That means a total of 150,000,000 clock cycles have to be performed. Let's also assume that the server can only handle these requests one at a time because of necessary synchronization.

Each request requires a total of 0.00001875 seconds (150 / 8,000,000).
Serving every client its request would therefore take 18.75 seconds (150,000,000 / 8,000,000).
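The arithmetic above can be sketched in a few lines of C. This is only an illustration under the same assumptions (an 8,000,000 cycles-per-second clock and strictly serialized requests); the function names are made up for this example.

```c
#include <stdint.h>

/* Seconds needed to execute `cycles` clock cycles at `hz` cycles per second. */
static double cycles_to_seconds(uint64_t cycles, uint64_t hz)
{
    return (double)cycles / (double)hz;
}

/* Total seconds to serve `clients` requests strictly one after another,
   each costing `cycles_per_request` clock cycles. */
static double serial_total_seconds(uint64_t clients,
                                   uint64_t cycles_per_request,
                                   uint64_t hz)
{
    return cycles_to_seconds(clients * cycles_per_request, hz);
}
```

With 1,000,000 clients, 150 cycles per request and 8,000,000 cycles per second, serial_total_seconds gives the 18.75 seconds from above.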

Of course, we can always scale our hardware vertically to increase the number of clock cycles per second.

When players play an online game, they expect a latency of around 150 ms or lower. A wait of up to 18.75 seconds for the last client in line would not be acceptable.

If we could cut the computation down from 150 clock cycles to 50, serving every client would take only 6.25 seconds. If we could cut it down to 25, it would take only 3.125 seconds.
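Another way to look at the same numbers is to ask how many serialized requests fit inside the 150 ms latency players expect. The sketch below assumes the same 8,000,000 cycles-per-second clock; requests_within_budget is a hypothetical helper name, not part of any library.

```c
#include <stdint.h>

/* Number of back-to-back requests that complete within `budget_ms`
   milliseconds, given `cycles_per_request` cycles each and a clock
   running at `hz` cycles per second. Uses integer (floor) division. */
static uint64_t requests_within_budget(uint64_t budget_ms,
                                       uint64_t cycles_per_request,
                                       uint64_t hz)
{
    uint64_t budget_cycles = budget_ms * hz / 1000; /* cycles available */
    return budget_cycles / cycles_per_request;
}
```

Under these assumptions a 150 ms budget covers 8,000 requests at 150 cycles each, 24,000 at 50 cycles and 48,000 at 25 cycles.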

With careful multithreading of certain I/O tasks and well-placed inline Assembly, we can see a large difference in performance. Reducing an O(n) operation to O(1) is also beneficial, although it often comes at the cost of extra memory.
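As one illustration of trading memory for an O(n) to O(1) reduction, consider summing a range of an array. The sketch below is my own example rather than anything specific to a game server: a loop costs O(n) per query, while a precomputed prefix-sum table answers each query in O(1) at the price of n + 1 extra entries. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* O(n) per query: walks values[lo..hi-1] every time, no extra memory. */
static int64_t range_sum_loop(const int32_t *values, size_t lo, size_t hi)
{
    int64_t sum = 0;
    for (size_t i = lo; i < hi; i++)
        sum += values[i];
    return sum;
}

/* One-time O(n) setup: prefix[i] holds the sum of values[0..i-1].
   `prefix` must have room for n + 1 entries; this is the memory cost. */
static void build_prefix(const int32_t *values, size_t n, int64_t *prefix)
{
    prefix[0] = 0;
    for (size_t i = 0; i < n; i++)
        prefix[i + 1] = prefix[i] + values[i];
}

/* O(1) per query: the sum of values[lo..hi-1] is one subtraction. */
static int64_t range_sum_prefix(const int64_t *prefix, size_t lo, size_t hi)
{
    return prefix[hi] - prefix[lo];
}
```

Both query functions return the same result for the same half-open range; the table only pays off when the same data is queried many times.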
