How do I remove the indentation spaces from text strings scraped from HTML?

dlzthzzh Registered member
2023-02-27 18:35

Reference answer (GPT, OpenAI): In HTML source, spaces and line breaks are whitespace characters that can appear almost anywhere, and browsers collapse runs of them when rendering the page. The raw text still contains them, however, so a string scraped directly from the HTML can carry many unwanted spaces and newlines (typically the indentation of the source), which spoils how the string is displayed.

To solve this problem, you can use string handling functions to remove unnecessary whitespace characters. Here are some common functions:

  1. strip() : Removes whitespace characters such as spaces and newlines from both ends of a string.

  2. replace() : Replaces every occurrence of one substring with another, for example a newline with a space. Note that a single replace('  ', ' ') call only shortens longer runs of spaces; it does not collapse them to one space in a single pass.

  3. split() : Splits a string into a list. Called with no arguments, it splits on any run of whitespace and discards empty strings.

  4. join() : Joins the elements of a list into one string, using the string it is called on (a space, for example) as the separator.

Using these functions, you can remove the unnecessary spaces and newlines from scraped HTML text, making the strings cleaner and easier to handle.
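The functions listed above can be combined as in the following minimal sketch. It assumes the scraped text is already a Python str; the helper names normalize_whitespace and normalize_lines and the sample string are made up for illustration and do not come from the original answer.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace into one space and trim both ends."""
    # split() with no arguments splits on any whitespace run
    # (spaces, tabs, newlines) and drops empty pieces.
    return " ".join(text.split())


def normalize_lines(text: str) -> str:
    """Keep the line structure, but clean each line and drop empty lines."""
    cleaned = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in cleaned if line)


# Hypothetical scraped text with source indentation and blank lines.
raw = "\n    Hello,\n        world!   \n\n    Second   line\n"

print(normalize_whitespace(raw))  # Hello, world! Second line
print(normalize_lines(raw))
```

The first helper suits text that should end up on a single line; the second is probably the better choice when the line breaks themselves carry meaning and only the indentation is unwanted.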

ddl891106 Registered member
2023-02-27 18:35

Use replace(' ', '').replace('\n', '') to remove all the spaces and line breaks.
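A chained replace() approach like the one in this reply can be sketched as follows. The function name remove_all_whitespace and the sample string are hypothetical. Be aware that this deletes the spaces between words as well, and that it leaves tabs and carriage returns untouched.

```python
def remove_all_whitespace(text: str) -> str:
    """Delete every space and newline, including the spaces between words."""
    return text.replace(" ", "").replace("\n", "")


# Hypothetical scraped text.
raw = "  Hello world\n  again  "

print(remove_all_whitespace(raw))  # Helloworldagain
```

This is usually only what you want for values that should not contain spaces at all, such as IDs or numbers; for ordinary sentences, collapsing whitespace to a single space tends to be safer.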

Question Info

Publish Time
2023-02-27 18:35
Update Time
2023-02-27 18:35
