Prerequisites

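Besides the libraries used in the 500px crawler, I also need a scheduling framework so that the automatic check-in program can run by itself every day. After some searching, the most commonly used one seems to be APScheduler; Python's standard library also ships its own scheduling module, sched.

In addition, both of these scheduling tools use the time module, so it goes on the study list too.

I'm an eager learner, so let's get started.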

time

https://docs.python.org/3/library/time.html

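epoch: the point where time starts; on Unix the epoch is 1970-01-01 00:00:00 (UTC)
time.time(): returns the number of seconds since the epoch, as a float
time.monotonic(): returns the value of a monotonic clock, in fractional seconds. The clock cannot go backwards and is not affected by changes to the system clock, but its reference point is undefined, so only the difference between two calls is meaningful. time.time(), by contrast, changes whenever the system time is changed
time.sleep(secs): suspends the calling thread for secs seconds
time.struct_time: the type used to represent a time. It is a named tuple, so its fields can be accessed both by index and by attribute name: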

Index	Attribute	Values
0	tm_year	e.g. 1993
1	tm_mon	[1, 12]
2	tm_mday	[1, 31]
3	tm_hour	[0, 23]
4	tm_min	[0, 59]
5	tm_sec	[0, 61]
6	tm_wday	[0, 6], Monday is 0
7	tm_yday	[1, 366]
8	tm_isdst	0, 1 or -1
N/A	tm_zone	abbreviation of the time zone name
N/A	tm_gmtoff	offset east of UTC, in seconds

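time.strftime(format[, t]): formats a time as a string. t must be a time.struct_time or a 9-element tuple such as the one returned by gmtime() or localtime(); if t is omitted, the current local time is used. Example:

>>> import time
>>> time.strftime('%Y-%m-%d %H:%M:%S', time.localtime())
'2016-09-29 10:21:01'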
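A small sketch exercising the functions above, using only the standard library. Apart from the epoch line, the printed values depend on when and where it runs, so only that line carries an expected output.

```python
import time

# struct_time is a named tuple: index and attribute access give the same field
t = time.localtime()
print(t[0] == t.tm_year, t[1] == t.tm_mon)   # True True

# time.gmtime(0) is the epoch itself, expressed in UTC
epoch = time.gmtime(0)
print(time.strftime('%Y-%m-%d %H:%M:%S', epoch))   # 1970-01-01 00:00:00

# Without a struct_time argument, strftime formats the current local time
print(time.strftime('%Y-%m-%d %H:%M:%S'))

# Measure elapsed time with monotonic(): only the difference is meaningful,
# and it is unaffected if the system clock is changed in the meantime
start = time.monotonic()
time.sleep(0.1)
elapsed = time.monotonic() - start
print(f"slept about {elapsed:.2f}s")
```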

sched

It has just one class, scheduler:

class sched.scheduler(timefunc=time.monotonic, delayfunc=time.sleep)

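Parameters

timefunc: a function taking no arguments that returns a number representing the current time, in any unit. The default is time.monotonic; time.time can be passed instead where time.monotonic is unavailable (since Python 3.5, time.monotonic is always available)
delayfunc: a function taking one argument, in the same unit as timefunc, that waits for that long. It is also called with the argument 0 after each event runs, to give other threads in a multi-threaded application a chance to run

Methods

scheduler.enterabs(time, priority, action, argument=(), kwargs={})
- time: an absolute time, as a number in the same unit as timefunc
- priority: when several events are scheduled for the same time, they run in order of priority, lower numbers first
- the scheduled task is the call action(*argument, **kwargs); the method returns an event that can be passed to cancel()
scheduler.enter(delay, priority, action, argument=(), kwargs={})
- delay: schedule the event delay time units from now; otherwise the same as enterabs()
scheduler.cancel(event): removes event from the queue; raises ValueError if event is not in the queue
scheduler.empty(): returns True if the event queue is empty
scheduler.run(blocking=True): runs all scheduled events. It waits (using delayfunc) until the next event is due, runs it, and repeats until there are no more scheduled events
- blocking: if False, run only the events that are already due, then return immediately with the time remaining until the next scheduled event (None if the queue is empty)
- if an action raises an exception, the exception propagates out of run() and the remaining events are not run in that call; the scheduler stays in a consistent state, and the failed event is not attempted again
scheduler.queue: a read-only attribute returning the list of upcoming events in the order they will run. Each event is a named tuple with the fields time, priority, action, argument, kwargs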
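A minimal sketch of these methods, standard library only. The task function and event names are made up for illustration, and the delays are kept below a second so it finishes quickly.

```python
import sched
import time

# Pass the defaults explicitly: time.monotonic as the clock, time.sleep to wait
s = sched.scheduler(time.monotonic, time.sleep)
log = []

def task(name, suffix=""):
    # Record the order in which events actually run
    log.append(name + suffix)

# enter(): relative delays, in seconds (the unit of time.monotonic)
s.enter(0.2, 1, task, argument=("second",))
s.enter(0.1, 1, task, argument=("first",))

# enterabs(): absolute times; same time, so the lower priority number runs first
now = time.monotonic()
s.enterabs(now + 0.3, 2, task, argument=("low",))
s.enterabs(now + 0.3, 1, task, argument=("high",), kwargs={"suffix": "!"})

# enter() returns an event that cancel() can remove before it runs
doomed = s.enter(0.05, 1, task, argument=("never",))
s.cancel(doomed)
print(len(s.queue))   # 4

s.run()               # blocks until the queue is empty
print(log)            # ['first', 'second', 'high!', 'low']
print(s.empty())      # True

# run(blocking=False) runs only events that are already due and returns
# the time remaining until the next one instead of sleeping
s2 = sched.scheduler(time.monotonic, time.sleep)
later = s2.enter(10, 1, task, argument=("later",))
remaining = s2.run(blocking=False)
print(f"next event in about {remaining:.1f}s")
s2.cancel(later)
```

Note that nothing here repeats: each event runs once, which is exactly the limitation discussed next.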

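Looking it over from every angle, this library doesn't meet my needs: it has no built-in way to run a task periodically and doesn't even support cron expressions, so I'll have to give it up.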

pip

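So let's try APScheduler instead. It is a third-party Python module and has to be installed; Python installs third-party modules with pip, so let's get to know pip first.

First, try the pip install command: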

# /data/python/bin/pip3 install APScheduler
Collecting APScheduler
  Downloading APScheduler-3.2.0-py2.py3-none-any.whl (52kB)
    100% |████████████████████████████████| 61kB 45kB/s
Requirement already satisfied (use --upgrade to upgrade): setuptools>=0.7 in /data/python/lib/python3.5/site-packages (from APScheduler)
Collecting tzlocal>=1.2 (from APScheduler)
  Downloading tzlocal-1.2.2.tar.gz
Collecting six>=1.4.0 (from APScheduler)
  Downloading six-1.10.0-py2.py3-none-any.whl
Collecting pytz (from APScheduler)
  Downloading pytz-2016.6.1-py2.py3-none-any.whl (481kB)
    100% |████████████████████████████████| 481kB 209kB/s
Installing collected packages: pytz, tzlocal, six, APScheduler
  Running setup.py install for tzlocal ... done
Successfully installed APScheduler-3.2.0 pytz-2016.6.1 six-1.10.0 tzlocal-1.2.2
You are using pip version 8.1.1, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

OK, it installed successfully. According to the message, pip itself can be upgraded, so let's do that too:

# /data/python/bin/pip3 install --upgrade pip
Collecting pip
  Downloading pip-8.1.2-py2.py3-none-any.whl (1.2MB)
    100% |████████████████████████████████| 1.2MB 116kB/s
Installing collected packages: pip
  Found existing installation: pip 8.1.1
    Uninstalling pip-8.1.1:
      Successfully uninstalled pip-8.1.1
Successfully installed pip-8.1.2

Since pip isn't the focus here, I won't go any deeper. Next, let's learn how to use APScheduler.

APScheduler

http://apscheduler.readthedocs.io/en/latest/

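Four concepts

triggers: contain the scheduling logic. Every job has its own trigger, which determines when the job runs next
job stores: hold the scheduled jobs. The default job store keeps jobs in memory; other job stores save them in a database
executors: run the jobs, typically by submitting the job's function to a thread or process pool
schedulers: bind the other three together. An application typically needs only one scheduler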

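Choosing a scheduler

BlockingScheduler: use when the scheduler is the only thing running in your program
BackgroundScheduler: use when you are not using any of the frameworks below and want the scheduler to run in the background inside your application
AsyncIOScheduler: use if your application uses the asyncio module
GeventScheduler: use if your application uses gevent
TornadoScheduler: use if you are building a Tornado application
TwistedScheduler: use if you are building a Twisted application
QtScheduler: use if you are building a Qt application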

Configuring the scheduler

Example: a BackgroundScheduler with the default job store and executor

from apscheduler.schedulers.background import BackgroundScheduler


scheduler = BackgroundScheduler()

# Initialize the rest of the application here, or before the scheduler initialization

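The code above creates a BackgroundScheduler with a MemoryJobStore named "default" and a ThreadPoolExecutor named "default" with a maximum of 10 threads

Suppose that is not enough: you want two job stores using two executors, you want to tweak the default values for new jobs, and you want a different time zone. The three examples below are equivalent, and each gives you:

a MongoDBJobStore named "mongo"
an SQLAlchemyJobStore named "default", using SQLite
a ThreadPoolExecutor named "default", with 20 worker threads
a ProcessPoolExecutor named "processpool", with 5 worker processes
UTC as the scheduler's time zone
coalescing turned off for new jobs by default
a default maximum of 3 concurrently running instances for new jobs

Option 1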

from pytz import utc

from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.jobstores.mongodb import MongoDBJobStore
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
from apscheduler.executors.pool import ThreadPoolExecutor, ProcessPoolExecutor


jobstores = {
    'mongo': MongoDBJobStore(),
    'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite')
}
executors = {
    'default': ThreadPoolExecutor(20),
    'processpool': ProcessPoolExecutor(5)
}
job_defaults = {
    'coalesce': False,
    'max_instances': 3
}
scheduler = BackgroundScheduler(jobstores=jobstores, executors=executors, job_defaults=job_defaults, timezone=utc)

Option 2

from apscheduler.schedulers.background import BackgroundScheduler


# The "apscheduler." prefix is hard coded
scheduler = BackgroundScheduler({
    'apscheduler.jobstores.mongo': {
         'type': 'mongodb'
    },
    'apscheduler.jobstores.default': {
        'type': 'sqlalchemy',
        'url': 'sqlite:///jobs.sqlite'
    },
    'apscheduler.executors.default': {
        'class': 'apscheduler.executors.pool:ThreadPoolExecutor',
        'max_workers': '20'
    },
    'apscheduler.executors.processpool': {
        'type': 'processpool',
        'max_workers': '5'
    },
    'apscheduler.job_defaults.coalesce': 'false',
    'apscheduler.job_defaults.max_instances': '3',
    'apscheduler.timezone': 'UTC',
})

Option 3

from pytz import utc

from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
from apscheduler.executors.pool import ProcessPoolExecutor


jobstores = {
    'mongo': {'type': 'mongodb'},
    'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite')
}
executors = {
    'default': {'type': 'threadpool', 'max_workers': 20},
    'processpool': ProcessPoolExecutor(max_workers=5)
}
job_defaults = {
    'coalesce': False,
    'max_instances': 3
}
scheduler = BackgroundScheduler()

# .. do something else here, maybe add jobs etc.

scheduler.configure(jobstores=jobstores, executors=executors, job_defaults=job_defaults, timezone=utc)

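Starting the scheduler

Call start() to start the scheduler. For every scheduler except BlockingScheduler, start() returns immediately and the application can carry on initializing. BlockingScheduler's start() does not return, so call it only after all other initialization is done

Note

Once a scheduler has been started, its settings can no longer be changed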

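Adding jobs

Call add_job(). This is the most common way; it returns an apscheduler.job.Job instance that can be used to modify or remove the job later
Decorate a function with scheduled_job(). This is the most convenient way to declare jobs that will not change while the application runs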

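Removing jobs

Call remove_job() with the job's id and, optionally, the alias of its job store
Call remove() on the Job instance

A job is also removed automatically when its schedule ends, i.e. when its trigger produces no further run times

Example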

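# Remove via the Job instance returned by add_job()
job = scheduler.add_job(myfunc, 'interval', minutes=2)
job.remove()

# Or give the job an explicit id and remove it through the scheduler
scheduler.add_job(myfunc, 'interval', minutes=2, id='my_job_id')
scheduler.remove_job('my_job_id')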

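Scheduler events

Event listeners can be attached to a scheduler; they are called when events such as a job finishing or failing occur. Example:

from apscheduler.events import EVENT_JOB_EXECUTED, EVENT_JOB_ERROR


def my_listener(event):
    # event.exception is set only when the job raised an exception
    if event.exception:
        print('The job crashed :(')
    else:
        print('The job worked :)')

# scheduler is a scheduler created as in the earlier examples
scheduler.add_listener(my_listener, EVENT_JOB_EXECUTED | EVENT_JOB_ERROR)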

random

Random number module

https://docs.python.org/3/library/random.html
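The notes only link the documentation, so here is a minimal sketch of a few common calls. The random_delay helper, and the idea of using it to vary the check-in time, are hypothetical illustrations rather than part of the original plan.

```python
import random

def random_delay(min_s=0.0, max_s=300.0):
    """Hypothetical helper: a random delay in seconds, e.g. so a daily job
    does not fire at exactly the same second every day."""
    return random.uniform(min_s, max_s)

random.seed(42)                          # same seed -> same sequence, handy for tests
print(random.randint(1, 6))              # integer N with 1 <= N <= 6 (both ends inclusive)
print(random.choice(["a", "b", "c"]))    # one element of a non-empty sequence
print(f"would wait {random_delay(0, 2):.2f}s")
```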

logging

Logging module

https://docs.python.org/3/library/logging.html
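A minimal logging setup sketch. The "checkin" logger name is hypothetical; quieting the "apscheduler" logger is an optional choice, not something the notes prescribe.

```python
import logging

# Configure the root logger once, at program start-up
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

# A named logger for the (hypothetical) check-in program
logger = logging.getLogger("checkin")

# APScheduler logs through loggers under the "apscheduler" name; raising its
# level keeps routine scheduler messages out of the output
logging.getLogger("apscheduler").setLevel(logging.WARNING)

logger.info("check-in started")      # emitted: INFO >= INFO
logger.debug("request details")      # suppressed: DEBUG < INFO
```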
