# novel

**Repository Path**: billy_git/novel

## Basic Information

- **Project Name**: novel
- **Description**: Crawler that scrapes content from biquge web pages
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2018-09-05
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# novel

#### Project Introduction
A learning project for the Python Scrapy framework that scrapes content from the Biquge (笔趣阁) website.

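#### Software Architecture

    Built on the Scrapy framework. Data is stored either in local JSON files or in a MongoDB database. The JSON storage is unfinished; only one way of doing it is provided.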
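Since the JSON storage is described as unfinished, here is a minimal sketch of what a file-based pipeline could look like. It is not the project's actual `JsondPipelines` class: the class name `JsonLinesPipeline`, the one-file-per-spider layout, and the JSON Lines format are all assumptions made for illustration. It relies only on the standard Scrapy pipeline hooks (`from_crawler`, `open_spider`, `close_spider`, `process_item`) and the `DATA_URL` setting from this project's `settings.py`.

```python
import json
import os


class JsonLinesPipeline:
    """Hypothetical pipeline: appends each item to <data_dir>/<spider.name>.jsonl."""

    def __init__(self, data_dir="./JsonData/"):
        self.data_dir = data_dir
        self.file = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook when it builds the pipeline.
        # DATA_URL is the output directory setting from settings.py.
        return cls(data_dir=crawler.settings.get("DATA_URL", "./JsonData/"))

    def open_spider(self, spider):
        # Create the output directory on first use and open the file in append mode.
        os.makedirs(self.data_dir, exist_ok=True)
        path = os.path.join(self.data_dir, spider.name + ".jsonl")
        self.file = open(path, "a", encoding="utf-8")

    def close_spider(self, spider):
        if self.file is not None:
            self.file.close()
            self.file = None

    def process_item(self, item, spider):
        # ensure_ascii=False keeps Chinese chapter text readable in the file.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item
```

Writing one JSON object per line means a crawl can be interrupted and resumed without leaving a half-written JSON array behind. To try it, the class would have to be registered in `ITEM_PIPELINES` under whatever module path it is saved in.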

#### Installation

1. Install MongoDB
2. Python 3.5
3. pip install Scrapy
4. pip install pymongo

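#### Usage

# Start

Install the MongoDB database and configure the connection.

Set the MongoDB connection in ./novel/settings.py.

Note: I changed the path used in the Scrapy pipeline configuration; the pipelines were originally located in the project directory.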

    BOT_NAME = 'novel'

    SPIDER_MODULES = ['novel.spiders']
    NEWSPIDER_MODULE = 'novel.spiders'

    ROBOTSTXT_OBEY = True

    # Settings for saving data to txt files: Start

    # ITEM_PIPELINES = {
    #     'novel.Pipelines.Txt.TxtPipelines': 100,
    # }

    # DATA_URL = "./TxtData/"

    # Settings for saving data to txt files: End


    # Settings for saving data to JSON files: Start

    # ITEM_PIPELINES = {
    #     'novel.Pipelines.Jsond.JsondPipelines': 100,
    # }

    # DATA_URL = "./JsonData/"

    # Settings for saving data to JSON files: End


    # Settings for saving data to MongoDB: Start

    ITEM_PIPELINES = {
        'novel.Pipelines.Mongo.MongoPipelines': 100,
    }

    MONGODB_SERVER = "xxx.xxx.xxx.xxx"
    MONGODB_PORT = 27017
    MONGODB_USER = "xxxxx"
    MONGODB_PWD = "xxxxx"
    MONGODB_BIQUGE_DB = 'novel'

    # Settings for saving data to MongoDB: End

Start the crawler: scrapy crawl biquge

#### Contributing

1. Fork this project
2. Create a Feat_xxx branch
3. Commit your code
4. Create a Pull Request