Metadata-Version: 2.1
Name: PrSpiders
Version: 0.5.2.8
Summary: Inherit the requests module, add xpath functionality to expand the API, and handle request failures and retries
Home-page: https://github.com/peng0928/prequests
Author: penr
Author-email: 1944542244@qq.com
License: MIT
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: requests
Requires-Dist: urllib3
Requires-Dist: lxml
Requires-Dist: xpinyin
Requires-Dist: PrSpiders
Requires-Dist: loguru
Requires-Dist: pymysql


# *[PrSpiders线程池爬虫框架](https://github.com/peng0928/PrSpiders)*



## *PrSpiders安装*



```python
 pip install PrSpiders
```




## *开始  Go start!*



**1.Demo**

   



```python
from PrSpider import PrSpiders  

class Spider(PrSpiders):  

    start_urls = 'https://www.runoob.com'  

    def parse(self, response):  
        # print(response.text)  
        print(response, response.code, response.url)  

  #<Response Code=200 Len=323273> 200 https://www.runoob.com/

if __name__ == '__main__':  

    Spider()
```
       

**2.重写入口函数-start_requests**



> start_requests是框架的启动入口，PrSpiders.Requests是发送请求的发送，参数下面会列举。



```python
from PrSpider import PrSpiders  
 

class Spider(PrSpiders):  

    def start_requests(self, **kwargs):  
        start_urls = 'https://www.runoob.com'  
        PrSpiders.Requests(url=start_urls, callback=self.parse)  

    def parse(self, response):  
        # print(response.text)  
        print(response, response.code, response.url)  


if __name__ == '__main__':  

    Spider()
```



**3.PrSpiders基本配置**



> 底层使用ThreadPoolExecutor



    workers: 线程池
    retry: 是否开启请求失败重试，默认开启
    download_delay: 请求周期，默认0s
    download_num: 每次线程请求数量，默认1秒5个请求
    logger: 日志保存本地，默认False，开启Ture OR str（文件名），如 logger='test'
    log_level: 日志等级,默认Info,等级(debug, info, warn, error)
    log_stdout: 日志存储是否重定向，默认关闭



> 使用方法如下



```python
from PrSpider import PrSpiders  
 

class Spider(PrSpiders):  

  retry = False  
  download_delay = 3  
  download_num = 10  

  def start_requests(self, **kwargs):  
        start_urls = ['https://www.runoob.com' for i in range(100)]  
        PrSpiders.Requests(url=start_urls, callback=self.parse)  

  def parse(self, response):  
        # print(response.text)  
        print(response, response.code, response.url)  


if __name__ == '__main__':  

    Spider()
```


**4.PrSpiders.Requests基本配置**
> 基本参数：
> url：请求网址
> callback：回调函数
> headers：请求头
> retry_time：请求失败重试次数
> method：请求方式（默认Get方法），
> meta：回调参数传递
> encoding：编码格式（默认utf-8）
> retry_interval：重试间隔
> timeout：请求超时时间（默认10s）
> data or params: 请求参数
> **kwargs：继承requests的参数如（data, params, proxies）



```python
   PrSpiders.Requests(url=start_urls, headers={}, method='post', encoding='gbk', callback=self.parse,  retry_time=10, retry_interval=0.5, meta={'hhh': 'ggg'})
```


## *Api*



**GET Status Code**



    response.code



**GET Text**


    response.text



**GET Content**



    response.content

**GET Url**



    response.url



**GET History**



    response.history



**GET Headers**



    response.headers



**GET Text Length**



    response.len



**GET Lxml Xpath**



    response.xpath



## *Xpath Api*

 1. text()方法:将xpath结果转成text
 2. date()方法:将xpath结果转成date
 3. get()方法:将xpath结果提取
 4. getall()方法:将xpath结果全部提取，拥有text()方法和date()方法

```python
from PrSpider import PrSpiders


class Spider(PrSpiders):
    log_level = 'info'

    def start_requests(self, **kwargs):
        start_urls = "https://blog.csdn.net/nav/python"
        PrSpiders.Requests(url=start_urls, callback=self.parse)

    def parse(self, response):
        lisqueryall = response.xpath("//div[@class='content']").getall()
        for query in lisqueryall:
            title = query.xpath(".//span[@class='blog-text']").text(lists=True)
            lishref = query.xpath(".//a[@class='blog']/@href").get()
            print({
                '写法': '第一种',
                '列表标题': title,
                '列表链接': lishref
            })
        title = response.xpath("//span[@class='blog-text']").text()
        lisquery = response.xpath("//div[@class='content']/a[@class='blog']/@href").get()
        print({
            '写法': '第二种',
            '列表标题': title,
            '列表链接': lisquery
        })
        PrSpiders.Requests(url=lisquery, callback=self.cparse)

    def cparse(self, response):
        title = response.xpath("//h1[@id='articleContentId']").text()
        pudate = response.xpath("//span[@class='time']").date()
        content = response.xpath("//div[@id='content_views']").text()
        print({
            '标题': title,
            '时间': str(pudate),
            'href': response.url,
        })


if __name__ == "__main__":
    Spider()

```



       

## *Please contact me if there are any bugs*




> email ->

> 1944542244@qq.com


