After ProgPoW received some mainstream media attention, its development team, IfDefElse, was asked a lot of questions about the algorithm. Below are their answers to some of the most common ones. This translation is published by Ore Horizon with the agreement of the original authors.
Q: What is your position on Ethereum governance?
Answer: We have no position. We feel that many questions, such as whether or when to adopt ProgPoW, should be left to the community to answer. We are proposing a new algorithm and are willing to answer technical questions about it.
Q: Where did ProgPoW come from?
Answer: IfDefElse is a small team that analyzes and optimizes PoW algorithms. We saw the ETH community repeatedly ask for a new PoW algorithm under which specialized ASICs would have little advantage over commodity hardware. It was disheartening to watch one algorithm after another prove defenseless against ASICs, and to see each new ASIC throw the whole ETH community into gloom.
So one day in the spring of 2018 we had the idea of modifying Ethash to get the GPU mining behavior the community wanted. Once we had a first draft of the algorithm, we put it on GitHub as a public forum for further development and fine-tuning.
Q: Who has reviewed ProgPoW?
Answer: While collecting feedback on the algorithm, we were lucky enough to hear by email from an Ethereum Foundation engineer, core Ethereum developers, NVIDIA engineers, and AMD engineers. The engineers from both NVIDIA and AMD gave the algorithm a positive overall review.
It is worth mentioning that two updates to the algorithm were based on reviews from community members mbevand and Schemykh.
Q: What was AMD's response?
Answer: AMD’s response addressed two main concerns:
If ProgPoW replaces Ethash as the PoW algorithm, could ASIC manufacturers study the source code and quickly build specialized ASICs?
Will ProgPoW let GPU miners keep mining Ethereum?
An AMD engineer answered yes: in theory it is possible to build a ProgPoW ASIC, but doing so would require specialized GPU design expertise, especially in memory controller technology.
They also expressed concern about the size of the cache (the local data share on AMD chips).
In their email they noted that AMD and NVIDIA show no big performance difference whether the cache is 8KB or 16KB. A 32KB or 64KB cache, however, could significantly affect both GPU architectures, and would no longer be compatible with Polaris and Vega.
Based on their feedback, we set PROGPOW_CACHE_BYTES to 16KB.
Q: What was NVIDIA's response?
Answer: NVIDIA's engineers generally agreed with our approach. They said the algorithm fills the gaps between memory accesses with compute, rather than leaving an expensive GPU sitting idle as little more than a memory controller.
Their main concern was that adding too many random math operations would make the algorithm compute bound rather than memory bound. An ASIC built for a compute-bound algorithm could then gain a larger efficiency advantage.
Based on their feedback, we tuned PROGPOW_CNT_CACHE and PROGPOW_CNT_MATH so that the algorithm remains memory bound on most current GPUs.
Q: If ProgPoW's main loop calls modulo and kiss99() to select random instructions, couldn't an ASIC designed for the algorithm be more efficient?
Answer: This is a common misunderstanding among people seeing the algorithm for the first time. The modulo and kiss99() calls in the main loop, as written, are actually evaluated on the CPU to generate a random program, which the CPU then compiles. The GPU only executes the optimized code, in which the questions of which instructions to execute and which mix state to use have already been resolved.
As Alexey said, ProgPoW generates new source code every 50 blocks. For an example of a generated program, see kernel.cu.
We will also explain this further in the specification.
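To illustrate the point that the random program is fixed on the CPU before the GPU runs anything, here is a minimal Python sketch. It uses the well-known KISS99 generator to choose a sequence of operations once per 50-block period. The seed values, the operation names in OPS, and the helpers period_of and generate_program are illustrative assumptions, not the ProgPoW specification; only kiss99, the 50-block period, and the 32 registers come from the text.

```python
MASK32 = 0xFFFFFFFF
PERIOD_LENGTH = 50   # blocks per generated program, as stated in the text
NUM_REGS = 32        # mix registers per lane, as stated in the text
OPS = ("add", "mul", "xor", "rotl")  # hypothetical operation set


class Kiss99:
    """Marsaglia's KISS99 pseudo-random generator on 32-bit words."""

    def __init__(self, z, w, jsr, jcong):
        self.z, self.w, self.jsr, self.jcong = z, w, jsr, jcong

    def next(self):
        # Two multiply-with-carry generators combined into one 32-bit word.
        self.z = (36969 * (self.z & 0xFFFF) + (self.z >> 16)) & MASK32
        self.w = (18000 * (self.w & 0xFFFF) + (self.w >> 16)) & MASK32
        mwc = ((self.z << 16) + self.w) & MASK32
        # 3-shift xorshift register.
        self.jsr ^= (self.jsr << 17) & MASK32
        self.jsr ^= self.jsr >> 13
        self.jsr ^= (self.jsr << 5) & MASK32
        # Linear congruential generator.
        self.jcong = (69069 * self.jcong + 1234567) & MASK32
        return ((mwc ^ self.jcong) + self.jsr) & MASK32


def period_of(block_number):
    """Blocks in the same 50-block window share one program."""
    return block_number // PERIOD_LENGTH


def generate_program(block_number, length=8):
    """Host-side step: pick every op and register once per period."""
    period = period_of(block_number)
    # Hypothetical seeding: Marsaglia's default constants mixed with the period.
    rng = Kiss99(362436069, 521288629, 123456789, (380116160 ^ period) & MASK32)
    program = []
    for _ in range(length):
        op = OPS[rng.next() % len(OPS)]
        dst = rng.next() % NUM_REGS
        src = rng.next() % NUM_REGS
        program.append((op, dst, src))
    return program


if __name__ == "__main__":
    # The resulting list is what would be turned into source and compiled;
    # the device never calls the generator itself.
    for op, dst, src in generate_program(block_number=123):
        print(f"mix[{dst}] = {op}(mix[{dst}], mix[{src}])")
```

Because the choices are made before compilation, the compiled kernel contains a fixed instruction sequence, and an ASIC gains nothing by hard-wiring the generator.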
Q: Do miners need to install the AMD or NVIDIA SDK to compile the source code?
Answer: No. The AMD and NVIDIA drivers already include compilers for OpenCL, DirectX, and Vulkan. For CUDA, the miner binary is distributed with the small part of the SDK needed to compile the kernel.
Q: Does ProgPoW favor one GPU vendor?
Answer: No. ProgPoW was designed from the start to be as fair as possible. The OpenCL and CUDA implementations do not differ, and the 16KB cache runs well on both architectures.
We avoided operations that are efficient on only one architecture, whether AMD's indexed register file or NVIDIA's LOP3. All the operations used are well supported across architecture generations.
A GPU's performance on the ProgPoW mining workload should also reflect its average gaming performance.
Q: Why do GPUs with heavily modified VBIOSes run more than 2 times slower on ProgPoW than on Ethash, a bigger drop than expected?
Answer: ProgPoW reads twice as much memory per hash as Ethash, so the expected hash rate is 1/2. All the tuning and sample hash rates we have reported (see “Results:Hashrate”) were obtained on GPUs running at stock clocks. VBIOS modifications that heavily reduce the core frequency make the algorithm compute bound rather than memory bound.
Users who want to switch to the new algorithm will need to redo their VBIOS modifications and tuning.
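The expected halving follows from a simple memory-bound model: hash rate is roughly the usable memory bandwidth divided by the bytes read per hash. The sketch below assumes an illustrative 256 GB/s card and the commonly cited 8 KiB per Ethash hash (64 accesses of 128 bytes). Neither figure comes from this article; only the factor of two does.

```python
# Assumed Ethash figure: 64 DAG accesses of 128 bytes each.
ETHASH_BYTES_PER_HASH = 64 * 128
# "Twice Ethash", per the text.
PROGPOW_BYTES_PER_HASH = 2 * ETHASH_BYTES_PER_HASH


def memory_bound_hashrate(bandwidth_bytes_per_s, bytes_per_hash):
    """Upper bound on hashes per second when memory is the only limit."""
    return bandwidth_bytes_per_s / bytes_per_hash


bandwidth = 256e9  # hypothetical card with 256 GB/s of usable bandwidth
ethash_rate = memory_bound_hashrate(bandwidth, ETHASH_BYTES_PER_HASH)
progpow_rate = memory_bound_hashrate(bandwidth, PROGPOW_BYTES_PER_HASH)

print(f"Ethash:  {ethash_rate / 1e6:.2f} MH/s")
print(f"ProgPoW: {progpow_rate / 1e6:.2f} MH/s")
print(f"ratio:   {progpow_rate / ethash_rate}")
```

A card that falls well below this ratio is no longer limited by memory, which is what happens when a VBIOS modification lowers the core clock.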
Q: Can you explain how an Ethash ASIC can be two times more efficient than a GPU miner?
Answer: The Ethash algorithm needs only 3 components:
High bandwidth memory (for DAG access)
Keccak F1600 engine (for the initial / final hash)
Small compute core (for the inner loop, FNV, and modulo)
FPGA data shows that the power cost of computing Keccak is almost negligible. We estimate that a GPU running Ethash spends only about 1/2 of its power on memory access. In an Ethash ASIC the power of the Keccak engine and compute core is negligible and consumption is dominated by memory access, so there is room for about a two times efficiency gain over a GPU.
A quick summary of current Ethash mining hardware:
Except for the Titan V, all data comes from whattomine.com and asicminervalue.com.
The first-generation Ethash ASIC, the Antminer E3, has no efficiency advantage over GPU miners. This is because its DDR3 memory consumes more power than the GDDR memory on GPUs.
As far as we know, the not-yet-released Innosilicon A10 ETHMaster will be more efficient. Innosilicon uses its GDDR6 IP in this series, which lets it reach about two times the efficiency of the most efficient mining GPU, the RTX 2070.
Q: How practical is HBM?
Answer: Our initial evaluation of the algorithm compared hardware using the same type of memory. HBM draws little power but is expensive, which makes it impractical. For example, the NVIDIA Titan V with HBM is only slightly less efficient than the A10 ETHMaster, but at $3000 it is not practical.
AMD's Vega cards with HBM are reasonably priced, but for some reason they only reach 175 KH/s/W. We are still not sure what limits Vega's efficiency. Increasing the access size helps noticeably (bandwidth utilization rises from 61% to 75%, see “Results:Hashrate”), but the power consumption of Vega cards is still too high. We hope the newly announced AMD Radeon VII, which doubles the bandwidth, will be significantly more efficient.
We estimate that HBM uses about half the power of GDDR6. If an expensive Ethash ASIC were built with HBM, its calculated efficiency would exceed 1 MH/s/W, about 4 times that of conventional GPUs on the market.
Q: Can a ProgPoW ASIC be highly efficient?
Answer: ProgPoW is designed to significantly reduce the efficiency gain of specialized ASICs. An implementation of the algorithm must provide the following components:
High bandwidth memory (for DAG access)
Keccak F800 engine (for the initial / final hash)
Large register file (for the mix state)
High throughput SIMD integer math (for random math operations)
High throughput SIMD cache (for random cache accesses)
Keccak is reduced in size, so its share of GPU power consumption is negligible. As a result, the ASIC's advantage of spending less power on Keccak disappears.
To execute the random sequence, the compute core of a ProgPoW ASIC has to do much the same things as a GPU's. Register file access, SIMD integer math, and cache access all need to work much as they do on a GPU.
Yes, a ProgPoW ASIC's ISA could be designed to match the ProgPoW algorithm exactly, for example by removing floating-point operations and adding an explicit merge operation. However, this kind of specialization yields only a small marginal benefit, not an order-of-magnitude gain.
To be optimistic, assume a well-designed ProgPoW ASIC ISA can eliminate 1/4 of the compute core's power. Because the GPU's compute cores are much more active when running ProgPoW, we estimate that the memory interface consumes about 1/3 of GPU power. The relative power consumption of a ProgPoW ASIC using GDDR is then:
1/3 (memory) * 1 + 2/3 (compute) * 3/4 = 5/6
An advantage of 1.2 times
If it uses HBM, the relative power consumption of a ProgPoW ASIC is:
1/3 (memory) * 1/2 + 2/3 (compute) * 3/4 = 2/3
An advantage of 1.5 times
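The two estimates above, and the earlier Ethash ones, all use the same weighted sum: the GPU's power is split into a memory share and a compute share, and each share is scaled by how much an ASIC could save. The following Python sketch reproduces the arithmetic with exact fractions. The function names are illustrative, and treating Ethash compute power as exactly zero is an approximation of "negligible".

```python
from fractions import Fraction as F


def relative_power(memory_share, memory_scale, compute_scale):
    """ASIC power relative to a GPU doing the same work (GPU = 1)."""
    compute_share = 1 - memory_share
    return memory_share * memory_scale + compute_share * compute_scale


def advantage(rel_power):
    """Efficiency advantage is the inverse of relative power."""
    return 1 / rel_power


cases = {
    # memory is 1/3 of GPU power; the ASIC removes 1/4 of compute power
    "ProgPoW ASIC, GDDR": relative_power(F(1, 3), 1, F(3, 4)),
    # HBM assumed to use half the power of GDDR
    "ProgPoW ASIC, HBM": relative_power(F(1, 3), F(1, 2), F(3, 4)),
    # Ethash: memory is 1/2 of GPU power; ASIC compute treated as zero
    "Ethash ASIC, GDDR": relative_power(F(1, 2), 1, 0),
    "Ethash ASIC, HBM": relative_power(F(1, 2), F(1, 2), 0),
}

for name, power in cases.items():
    print(f"{name}: relative power {power}, advantage {float(advantage(power)):.1f}x")
```

The model makes the design goal visible: the larger the compute share that an ASIC cannot remove, the smaller the advantage that cheaper memory access can buy.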
Q: Can ProgPoW run on an FPGA?
Answer: First of all, there are practical problems with running ProgPoW on an FPGA. Because the random program changes every 12.5 minutes, a new bitstream has to be compiled and loaded just as often. The tools and infrastructure for doing this basically do not exist.
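The 12.5-minute figure is consistent with a program change every 50 blocks if blocks arrive about every 15 seconds. The block time is an assumption here and is not stated in this article. A short Python sketch of the resulting recompilation cadence:

```python
PERIOD_BLOCKS = 50         # program changes every 50 blocks, per the text
ASSUMED_BLOCK_TIME_S = 15  # assumption: approximate seconds per block
SECONDS_PER_DAY = 24 * 60 * 60


def period_seconds(blocks=PERIOD_BLOCKS, block_time_s=ASSUMED_BLOCK_TIME_S):
    """How long one generated program stays valid."""
    return blocks * block_time_s


def bitstreams_per_day():
    """How many times per day an FPGA would need a fresh bitstream."""
    return SECONDS_PER_DAY / period_seconds()


print(f"program lifetime: {period_seconds() / 60} minutes")
print(f"bitstreams per day: {bitstreams_per_day():.1f}")
```

Under these assumptions an FPGA miner would have to synthesize, place, route, and load more than a hundred bitstreams a day, each within the lifetime of the previous one.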
Even ignoring this problem, ProgPoW does not map well onto FPGAs. FPGAs are effective for compute-intensive algorithms such as Keccak or Lyra. By packing multiple operations into a single clock cycle and pipelining many operations at once, they can significantly improve performance and reduce power consumption.
ProgPoW interleaves many cache accesses into its sequence of operations, which greatly reduces how many operations can be packed into a single clock cycle or run in parallel. Being unable to pack operations not only lowers the FPGA's mining performance but also lengthens the pipeline. With a large mix state (16 lanes * 32 regs * 4 bytes = 2 kilobytes), a longer pipeline becomes a problem.
If the mix state is copied along every stage of the pipeline, a lot of power is wasted. The mix state could instead be stored in a register file, making the FPGA's compute core look like that of an ASIC or GPU, but then the FPGA would be significantly less efficient than an ASIC.
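The size argument can be made concrete. The sketch below computes the mix state size from the figures in the text, and the storage needed if every pipeline stage keeps its own copy. The stage counts are hypothetical, chosen only for illustration.

```python
LANES = 16          # from the text
REGS_PER_LANE = 32  # from the text
BYTES_PER_REG = 4   # 32-bit registers, from the text


def mix_state_bytes():
    """Size of one full copy of the mix state."""
    return LANES * REGS_PER_LANE * BYTES_PER_REG


def pipelined_state_bytes(stages):
    """Storage if each pipeline stage carries its own copy of the state."""
    return stages * mix_state_bytes()


print(f"one copy: {mix_state_bytes()} bytes")
for stages in (1, 10, 100):  # hypothetical pipeline depths
    print(f"{stages:>3} stages: {pipelined_state_bytes(stages)} bytes of state")
```

Storage, and the power spent moving it every clock, grows linearly with pipeline depth, which is why a deep FPGA pipeline suits a small-state algorithm like Keccak far better than ProgPoW.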
Q: The answers above are quite long. Can you give a short summary?
Answer: Of course.
Relative efficiency of mining hardware
Our estimates of 2 times for Ethash and 1.2 times for ProgPoW compare hardware using the same type of memory. We recognize that most GPUs use GDDR at the time of writing, so we also compare across memory types, for example against an ASIC that uses HBM.
The original link:
Original author: IfDefElse. Translation & proofreading: fish
Translated and edited by Ore Horizon. Please credit the source when reprinting.