For a high-concurrency, high-frequency crawler scraping web pages, which matters more: computer configuration or bandwidth?

doudou19861221 Registered member
2023-02-28 06:26

1. Bandwidth comes first: the more requests you send, the more bandwidth you need.
2. An ordinarily configured computer can generally handle highly concurrent requests.
3. You also need to look at the response speed and bandwidth of the target server. If the target server's bandwidth is relatively low, then beyond a certain level of concurrency, simply increasing your own bandwidth and hardware will not noticeably improve speed, and may even crash the target server, which is very dangerous.

wangjz38257372 Registered member
2023-02-28 06:26

A decent CPU will do, or simply rent a cloud server and run the crawler there for higher computing speed. Of course, weigh the cost against the benefit; crawling throughput is tied to the CPU's processing speed.

csddjj1199 Registered member
2023-02-28 06:26

A high-concurrency, high-frequency crawler needs to take many factors into account, including computer configuration, bandwidth, server response speed, and crawler code optimization. Which factors to optimize should be decided based on the actual situation.

As for computer configuration, the main concerns are CPU, memory, and disk. If the crawler does a lot of page parsing, a computer with a powerful CPU is recommended; if it needs to store large amounts of data, choose a machine with ample disk capacity; if it must handle many requests at the same time, more memory is recommended.
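The "lots of page parsing needs a strong CPU" point is because parsing is CPU-bound and can be spread across cores. A minimal sketch, assuming a hypothetical `parse_page()` that stands in for real HTML parsing (here it just counts link tags):

```python
import re
from concurrent.futures import ProcessPoolExecutor

# Hypothetical stand-in for CPU-heavy HTML parsing:
# count opening <a> tags as a toy proxy for real parsing work.
def parse_page(html: str) -> int:
    return len(re.findall(r"<a\b", html))

def parse_all(pages: list[str]) -> list[int]:
    # processes (not threads) sidestep the GIL for CPU-bound work
    with ProcessPoolExecutor() as pool:
        return list(pool.map(parse_page, pages))

if __name__ == "__main__":
    pages = ['<a href="/1"></a><a href="/2"></a>', "<p>no links</p>"]
    print(parse_all(pages))  # [2, 0]
```

With more cores, `ProcessPoolExecutor` spreads the parsing across them; for I/O-bound downloading, threads or async are the better fit.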

As for bandwidth, you need to consider both the response speed of the target website's server and your own bandwidth limits. If the website's server responds slowly, you can consider using proxies or a distributed crawler to speed things up; if your own bandwidth is limited, consider using a cloud server to get more.
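Using proxies, as suggested above, usually means rotating through a pool so no single route carries all the traffic. A minimal sketch with the standard library; the proxy addresses are placeholders, not real endpoints, and nothing here touches the network:

```python
import urllib.request

# Placeholder proxy endpoints -- substitute real ones in practice.
PROXIES = [
    "http://proxy-a.example.com:8080",
    "http://proxy-b.example.com:8080",
]

def next_proxy(i: int) -> str:
    # simple round-robin over the pool
    return PROXIES[i % len(PROXIES)]

def opener_for(proxy_url: str) -> urllib.request.OpenerDirector:
    # route both http and https traffic through the given proxy
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)

for i in range(4):
    print(next_proxy(i))  # alternates proxy-a, proxy-b, proxy-a, proxy-b
```

Each request would then go through `opener_for(next_proxy(i)).open(url)`; more elaborate schemes also drop proxies that start failing.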

Beyond hardware, code optimization is also very important. Setting appropriate request headers, using caching, and issuing asynchronous requests can improve the crawler's efficiency and stability while reducing the load on the website's server.
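Two of those techniques, request headers and caching, can be sketched together. This is an offline illustration: the `User-Agent` string is an assumed example, and the default `fetcher` returns a canned string instead of downloading anything, so repeated URLs demonstrably hit the cache:

```python
import urllib.request

# Assumed example header; real crawlers should identify themselves honestly.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; demo-crawler/0.1)"}
_cache: dict[str, str] = {}

def build_request(url: str) -> urllib.request.Request:
    # attach the headers every outgoing request should carry
    return urllib.request.Request(url, headers=HEADERS)

def fetch_cached(url: str, fetcher=lambda u: f"body of {u}") -> str:
    # only the first access for a URL invokes the fetcher;
    # later accesses are served from the in-memory cache
    if url not in _cache:
        _cache[url] = fetcher(build_request(url).full_url)
    return _cache[url]

print(fetch_cached("https://example.com/a"))  # fetched
print(fetch_cached("https://example.com/a"))  # served from cache
```

In a real crawler the `fetcher` would be an actual HTTP call, and the cache would honor the server's cache-control semantics rather than storing responses forever.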

In short, building a high-concurrency, high-frequency crawler requires balancing hardware configuration, network bandwidth, and code optimization.

ds1989126 Registered member
2023-02-28 06:26

The resources a high-concurrency, high-frequency crawler requires include both computer configuration and bandwidth, because both affect the crawler's performance.
Computer configuration: consider the CPU's processing power, the amount of memory, and the disk's read/write speed. If the crawler frequently processes, stores, or analyzes data, these factors affect its performance, so the hardware must reach a certain level for the crawler to run smoothly.
Bandwidth: consider the size and stability of the network bandwidth. A high-concurrency, high-frequency crawler may need to download large amounts of data from websites frequently. If bandwidth is insufficient, downloads slow down and network failures may even occur, so adequate, stable bandwidth is needed for the crawler's downloading and analysis to proceed smoothly.

To sum up, a high-concurrency, high-frequency crawler needs both adequate computer configuration and sufficient bandwidth to run quickly and stably.
