A high-concurrency, high-frequency crawler has to account for many factors, including machine configuration, bandwidth, the target server's response speed, and code optimization. Which factors are worth optimizing should be determined by the actual situation.
On the hardware side, the main considerations are CPU, memory, and disk. If the crawler does heavy page parsing, a machine with a powerful CPU is recommended; if it needs to store large volumes of data, choose a machine with large disk capacity; if it needs to handle many requests at the same time, more memory helps.
For bandwidth, you need to consider both the website server's response speed and your own bandwidth limit. If the server responds slowly, proxies or a distributed crawler can speed up crawling; if your own bandwidth is small, moving the crawler to a cloud server with higher bandwidth is an option.
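One common way to use proxies is to rotate through a pool of them so that requests are spread across several exit points. The sketch below shows a minimal round-robin rotation; the proxy addresses are hypothetical placeholders, and the `requests.get` line in the comment assumes you would pair this with an HTTP library that accepts a `proxies` mapping.

```python
import itertools

# Hypothetical proxy pool -- replace with real proxy addresses.
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]

_proxy_cycle = itertools.cycle(PROXIES)

def next_proxy() -> dict:
    """Return a proxies mapping in the shape used by libraries such as
    requests, rotating round-robin so load is spread across the pool."""
    proxy = next(_proxy_cycle)
    return {"http": proxy, "https": proxy}

# Each call hands back the next proxy in order, wrapping around:
#   requests.get(url, proxies=next_proxy(), timeout=10)
```

Rotation alone does not hide a crawler, but combined with reasonable request rates it keeps any single proxy (and the target server) from seeing the full request volume.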
Beyond hardware, code optimization matters just as much. Setting appropriate request headers, caching responses, and issuing requests asynchronously can all improve the crawler's efficiency and stability while reducing the load placed on the website server.
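The three techniques above can be sketched together with nothing but the standard library. This is a minimal, simulated example: the HTTP call is replaced by `asyncio.sleep` so it runs without a network, and in a real crawler you would swap that line for an async HTTP client (such as aiohttp) and send the headers with each request. The header value and URLs are illustrative assumptions.

```python
import asyncio

# Hypothetical header set; real crawlers send a realistic User-Agent.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; demo-crawler/1.0)"}

async def fetch(url, sem, cache):
    """Fetch one URL with an in-memory cache and a concurrency cap.
    The HTTP call is simulated with asyncio.sleep; a real crawler would
    use an async client here and pass HEADERS along with the request."""
    if url in cache:                # cache hit: no network traffic at all
        return cache[url]
    async with sem:                 # at most 5 requests in flight at once
        await asyncio.sleep(0.01)   # stands in for the real HTTP request
        body = f"<html>content of {url}</html>"
        cache[url] = body
        return body

async def crawl(urls, cache):
    sem = asyncio.Semaphore(5)      # caps load on the target server
    # Issue the requests concurrently instead of one after another.
    return await asyncio.gather(*(fetch(u, sem, cache) for u in urls))

cache = {}
first = asyncio.run(crawl(["https://example.com/a",
                           "https://example.com/b"], cache))
# A second pass over the same URLs is served entirely from the cache,
# so the target server sees no repeat traffic.
second = asyncio.run(crawl(["https://example.com/a",
                            "https://example.com/b"], cache))
```

The semaphore is the piece that makes "high concurrency" safe: without it, `asyncio.gather` would open every connection at once, which can overwhelm both your bandwidth and the remote server.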
In short, building a high-concurrency, high-frequency crawler means weighing hardware configuration, network bandwidth, and code optimization together.