
[System Operations] Linux Series: A Brief Analysis of the Soft Lockup Mechanism

1. Background

This section is reproduced from a blog post by Song Baohua.
Soft lockups are familiar to anyone who works with the kernel:

BUG: soft lockup - CPU#3 stuck for 23s! [kworker/3:0:32]

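This message ranks alongside panic and oops, and it is very hard to debug, arguably harder than a panic. After a panic you can at least examine a static corpse. A soft lockup is a dynamic process that may be gone in a moment and can even heal itself.

So what causes a soft lockup?

Very little has been written on this, and what can be found is mostly individual case studies, so with the weekend approaching I want to write a general explanation of soft lockups.

First, two misconceptions about soft lockups need to be cleared up:

A soft lockup is not caused only by infinite loops.

A soft lockup does not mean that one piece of code ran for 22 or 23 seconds.

A brief explanation of both points follows.

An infinite loop does not necessarily cause a soft lockup. Process 0 is an infinite loop for the entire lifetime of the Linux kernel, and many kernel threads are infinite loops as well.

Nor should you expect a single piece of code to run for 20-odd seconds; keep in mind how fast modern computers are.

What a soft lockup actually describes is this:

A soft lockup concerns an individual CPU, not the whole system.

A soft lockup means that no scheduling switch has happened on the affected CPU for 20 seconds (the default).

The first point needs no explanation, so the rest of this section concentrates on the second.

Clearly, keeping a CPU from switching tasks for roughly 20 seconds is enough to trigger a soft lockup. This "no switch within 20 seconds" is the root cause of a soft lockup.

Now consider the scenarios in which no switch happens for 20 seconds.

The infinite-loop case. This is the simplest scenario, but the details are often less simple than they look. For example, if you write an infinite loop that runs in the kernel, will it necessarily cause a soft lockup?

Consider this infinite loop in the kernel: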

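#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/err.h>

/* Thread body: spin until kthread_stop() is called; never yields voluntarily. */
static int loop_func(void *arg)
{
    int i = 0;

    while (!kthread_should_stop())
        i++;
    return 0;
}

static struct task_struct *kt;

static int __init init_loop(void)
{
    kt = kthread_run(loop_func, NULL, "loop_thread");
    if (IS_ERR(kt))
        return PTR_ERR(kt);    /* propagate the real error code */
    return 0;
}

/* Must carry the same name that is passed to module_exit() below. */
static void __exit exit_loop(void)
{
    kthread_stop(kt);
}

module_init(init_loop);
module_exit(exit_loop);
MODULE_LICENSE("GPL");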

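Will loading this module cause a soft lockup?

Although loop_thread is an infinite loop, it looks just like an ordinary user-space process: while it runs the i++ loop it can be preempted by other tasks. That is basic process scheduling.

Yet if you actually load the module, you will find that it does cause a soft lockup on some machines and not on others. Why?

The key is kernel preemption. Check your kernel's configuration file. If the following option is enabled, the infinite loop in the module above will not cause a soft lockup: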

CONFIG_PREEMPT=y

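If this option is not enabled, the loop is out of the scheduler's reach: it runs in kernel mode, nothing can preempt it, and a soft lockup follows.

That settles whether the infinite loop above triggers a soft lockup. Now consider another case.

What if the infinite loop runs in softirq context instead of kernel-thread context?

A softirq cannot be preempted by a process, so a soft lockup is certain.

Of course, when an infinite loop really does cause a soft lockup, the looping code has indeed run for more than 20 seconds. Left alone, it would happily run for 200000 seconds.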

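2. What is a lockup?

A Linux kernel lockup means that kernel code holds a CPU and does not let go. There are two kinds: soft lockup and hard lockup.

A soft lockup means the CPU is occupied by kernel code to the point that no other task can run. It is detected by giving every CPU a periodically running kernel thread, [watchdog/x]. If that thread does not get to run within the configured period, a soft lockup has occurred. [watchdog/x] is a SCHED_FIFO real-time task with the highest priority, 99, so it is entitled to run ahead of everything else.

A hard lockup is more serious: the CPU cannot run other tasks and no longer responds to interrupts either. Detection relies on the PMU's NMI perf event. Because an NMI cannot be masked, its handler still runs when the CPU has stopped responding to interrupts. The handler checks whether the timer-interrupt counter hrtimer_interrupts is still increasing; if it has stalled, timer interrupts are not being serviced, which means a hard lockup has occurred.

The implementation lives in kernel/watchdog.c and involves three things: a kernel thread, the timer interrupt, and the NMI (non-maskable interrupt). They have different priorities, in the order kernel thread < timer interrupt < NMI.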

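2.1 The lockup detection mechanism

The Linux kernel has a lockup detection mechanism called the NMI Watchdog, built on the NMI. An NMI is used because a lockup may happen while interrupts are masked, and in that state the only way to take the CPU back is an NMI, which cannot be masked. The NMI Watchdog contains a soft lockup detector and a hard lockup detector. In kernels after 2.6 it is implemented as follows.

The NMI Watchdog has two triggering parts:

1. A high-resolution timer (hrtimer), whose handler is kernel/watchdog.c: watchdog_timer_fn(). In this handler:
the counter hrtimer_interrupts is incremented; the hard lockup detector uses this counter to decide whether the CPU is responding to interrupts;
the [watchdog/x] kernel thread is woken; its job is to update a timestamp;
the soft lockup detector checks the timestamp. If it has not been updated for longer than the soft lockup threshold, [watchdog/x] has not had a chance to run, the CPU is being monopolized, and a soft lockup has occurred.

2. A PMU-based NMI perf event. When the PMU counter overflows, an NMI is raised; its handler is kernel/watchdog.c: watchdog_overflow_callback().

The hard lockup detector lives in that handler. It checks whether the hrtimer interrupt count (hrtimer_interrupts) is still increasing; if it has stalled, the hrtimer interrupt is not being serviced, which means a hard lockup has occurred.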
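The counter comparison behind the hard lockup detector can be modeled in user space. This is a minimal sketch, not kernel code: `struct cpu_state`, `timer_tick()` and `nmi_check()` are hypothetical names introduced for illustration, and the saved-counter field is an assumption about how "has the counter moved since last time" is remembered. Only the idea of checking whether `hrtimer_interrupts` keeps increasing comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state for this model (not a kernel structure). */
struct cpu_state {
    unsigned long hrtimer_interrupts;       /* bumped by the timer interrupt */
    unsigned long hrtimer_interrupts_saved; /* value seen at the previous NMI */
};

/* Models the timer interrupt handler: it only records that it ran. */
static void timer_tick(struct cpu_state *cpu)
{
    assert(cpu != NULL);
    cpu->hrtimer_interrupts++;
}

/*
 * Models the NMI-side check. If the counter has not moved since the
 * previous NMI, timer interrupts are not being serviced.
 * Returns true when a hard lockup would be reported.
 */
static bool nmi_check(struct cpu_state *cpu)
{
    assert(cpu != NULL);
    if (cpu->hrtimer_interrupts == cpu->hrtimer_interrupts_saved)
        return true;
    cpu->hrtimer_interrupts_saved = cpu->hrtimer_interrupts;
    return false;
}
```

Calling `timer_tick()` between two `nmi_check()` calls keeps the check quiet; two consecutive `nmi_check()` calls with no tick in between make the second one report a lockup.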

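2.2 How softlockup works

softlockup mainly checks whether kernel task scheduling is working. When a softlockup occurs the kernel cannot schedule, but interrupts are still serviced. The hrtimer callback runs in interrupt context, so it still fires in this situation.

The system creates one kernel thread on every CPU. Each time the hrtimer callback runs, it tries to wake this thread. If the thread is scheduled normally after the wakeup, it updates the time variable watchdog_touch_ts; otherwise the variable is left unchanged. The hrtimer callback compares watchdog_touch_ts with the current time, and if the difference exceeds the configured value, kernel scheduling has failed and a warning is logged.

The code flow, roughly (simplified):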

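lockup_detector_init ->
    cpu_callback ->    // action CPU_UP_PREPARE
        watchdog_prepare_cpu ->    // the hrtimer's callback function is watchdog_timer_fn
    cpu_callback ->
        watchdog_enable ->
            kthread_create ->    // create a thread named watchdog/x for each CPU (x is the CPU number); its function is watchdog()
                watchdog ->
                    sched_priority = MAX_RT_PRIO-1    // give the thread the highest priority
                    __touch_watchdog    // initialize watchdog_touch_ts to the current time
                    hrtimer_start    // start the timer; the period comes from get_sample_period() and is determined by watchdog_thresh.
                                    // Dividing by 5 gives the hrtimer five chances to fire before a hard lockup is triggered (hard lockup is covered in a later article).
                    __touch_watchdog    // refresh the timestamp
                    schedule    // sleep until the hrtimer fires after the configured period
                    watchdog_timer_fn ->    // the hrtimer has fired
                        wake_up_process    // wake the thread that went to sleep above
                        hrtimer_forward_now    // re-arm the hrtimer
                        is_softlockup    // if the time difference exceeds the maximum, no scheduling has taken place
                        print_modules    // print module information
                        dump_stack    // print the stack trace
            Meanwhile, once the watchdog thread is woken it runs __touch_watchdog, which refreshes watchdog_touch_ts.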
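The interplay between the timer callback and the watchdog thread can be simulated with plain integers. The sketch below is a user-space model under stated assumptions: time is counted in whole seconds, the timer fires every 4 seconds, the threshold is 20 seconds, and `first_report()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

#define SAMPLE_PERIOD_S     4   /* timer period, as described in the text */
#define SOFTLOCKUP_THRESH_S 20  /* default threshold, as described in the text */

/*
 * Returns the simulated time (seconds) at which the timer callback first
 * reports a soft lockup, or -1 if nothing is reported up to end_s.
 * The watchdog thread stops getting CPU time from starve_from_s onward;
 * pass a negative starve_from_s to model a healthy CPU.
 */
static long first_report(long starve_from_s, long end_s)
{
    long touch_ts = 0; /* timestamp last written by the watchdog thread */

    assert(end_s >= 0);
    for (long now = SAMPLE_PERIOD_S; now <= end_s; now += SAMPLE_PERIOD_S) {
        /* Timer callback: compare the thread's timestamp with "now". */
        if (now - touch_ts > SOFTLOCKUP_THRESH_S)
            return now;

        /* The callback wakes the thread; it only runs if not starved. */
        bool thread_runs = starve_from_s < 0 || now < starve_from_s;
        if (thread_runs)
            touch_ts = now;
    }
    return -1;
}
```

With the thread starved from second 10, the last successful update happens at second 8 and the first report comes at the tick at second 32, the first tick at which the gap exceeds the threshold. The check only runs on timer ticks, so in this model the reported gap is always somewhat larger than the threshold itself.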

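3. Analysis of the soft lockup mechanism

lockup_detector_init() first obtains sample_period and watchdog_cpumask, then, depending on the configuration, creates the threads that feed the dog and creates the hrtimer that acts as the watchdog.

Two things deserve attention here: the API used to create the kernel threads, and struct smp_hotplug_thread.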

void __init lockup_detector_init(void)
{
    set_sample_period();----------------------------------------Computes sample_period as watchdog_thresh*2/5, i.e. the dog is fed every 4 seconds with the default watchdog_thresh of 10.
...
    cpumask_copy(&watchdog_cpumask, cpu_possible_mask);

    if (watchdog_enabled)
        watchdog_enable_all_cpus();
}

static int watchdog_enable_all_cpus(void)
{
    int err = 0;

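    if (!watchdog_running) {----------------------------------If the watchdog is not running yet, create one watchdog/x thread per CPU; each thread feeds the dog every sample_period. watchdog_threads is the main input describing the watchdog/x threads, and watchdog_cpumask selects the CPUs the threads are created for.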
        err = smpboot_register_percpu_thread_cpumask(&watchdog_threads,
                                 &watchdog_cpumask);
        if (err)
            pr_err("Failed to create watchdog threads, disabled\n");
        else
            watchdog_running = 1;
    } else {
        err = update_watchdog_all_cpus();

        if (err) {
            watchdog_disable_all_cpus();
            pr_err("Failed to update lockup detectors, disabled\n");
        }
    }

    if (err)
        watchdog_enabled = 0;

    return err;
}

static void watchdog_disable_all_cpus(void)
{
    if (watchdog_running) {
        watchdog_running = 0;
        smpboot_unregister_percpu_thread(&watchdog_threads);
    }
}

static int update_watchdog_all_cpus(void)
{
    int ret;

    ret = watchdog_park_threads();
    if (ret)
        return ret;

    watchdog_unpark_threads();

    return 0;
}

static int watchdog_park_threads(void)
{
    int cpu, ret = 0;

    atomic_set(&watchdog_park_in_progress, 1);

    for_each_watchdog_cpu(cpu) {
        ret = kthread_park(per_cpu(softlockup_watchdog, cpu));---------------------------Sets the KTHREAD_SHOULD_PARK bit in struct kthread->flags; the watchdog/x thread then calls its park member function to handle it.
        if (ret)
            break;
    }

    atomic_set(&watchdog_park_in_progress, 0);

    return ret;
}

static void watchdog_unpark_threads(void)
{
    int cpu;

    for_each_watchdog_cpu(cpu)
        kthread_unpark(per_cpu(softlockup_watchdog, cpu));-------------------------------Clears the KTHREAD_SHOULD_PARK bit in struct kthread->flags; the watchdog/x thread then calls its unpark member function.
}
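The relationship between watchdog_thresh, the soft lockup threshold and sample_period noted in lockup_detector_init() can be checked with a few lines of arithmetic. This is a user-space sketch: `softlockup_thresh_s()` and `sample_period_ns()` are hypothetical helpers that mirror the "watchdog_thresh*2" and "watchdog_thresh*2/5" relations stated in the text, with the period expressed in nanoseconds as an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Soft lockup threshold in seconds: twice watchdog_thresh. */
static unsigned int softlockup_thresh_s(unsigned int watchdog_thresh)
{
    return watchdog_thresh * 2;
}

/*
 * Timer period in nanoseconds: one fifth of the soft lockup threshold,
 * so the timer fires five times within one threshold interval.
 */
static uint64_t sample_period_ns(unsigned int watchdog_thresh)
{
    assert(watchdog_thresh > 0);
    return (uint64_t)softlockup_thresh_s(watchdog_thresh) * (NSEC_PER_SEC / 5);
}
```

For watchdog_thresh = 10 this gives a 20-second threshold and a 4-second period, matching the numbers used throughout this article. Raising watchdog_thresh to 30 stretches them to 60 seconds and 12 seconds.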

3.1 The watchdog_threads structure

Before looking at how the watchdog/x threads are created, it is worth introducing struct smp_hotplug_thread.

struct smp_hotplug_thread {
    struct task_struct __percpu    **store;--------------------------Pointer to the per-CPU struct task_struct pointers.
    struct list_head        list;
    int                (*thread_should_run)(unsigned int cpu);-------Checks whether the watchdog/x thread should run.
    void                (*thread_fn)(unsigned int cpu);--------------Main function of the watchdog/x thread.
    void                (*create)(unsigned int cpu);
    void                (*setup)(unsigned int cpu);------------------Preparation before the watchdog/x thread starts running.
    void                (*cleanup)(unsigned int cpu, bool online);---Cleanup after the watchdog/x thread exits.
    void                (*park)(unsigned int cpu);-------------------Called when the CPU goes offline and the thread has to stop temporarily.
    void                (*unpark)(unsigned int cpu);-----------------Called when the CPU comes online, to get the thread ready again.
    cpumask_var_t            cpumask;--------------------------------The CPUs on which the thread is allowed to run.
    bool                selfparking;
    const char            *thread_comm;------------------------------Name of the watchdog/x thread.
};

watchdog_threads is the instance that describes the soft lockup monitoring threads; the watchdog/x threads are created from it.

static struct smp_hotplug_thread watchdog_threads = {
    .store            = &softlockup_watchdog,
    .thread_should_run    = watchdog_should_run,
    .thread_fn        = watchdog,
    .thread_comm        = "watchdog/%u",
    .setup            = watchdog_enable,
    .cleanup        = watchdog_cleanup,
    .park            = watchdog_disable,
    .unpark            = watchdog_enable,
};

static void watchdog_enable(unsigned int cpu)
{
    struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);

    /* kick off the timer for the hardlockup detector */
    hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
    hrtimer->function = watchdog_timer_fn;------------------------------------------Sets up an hrtimer whose callback is watchdog_timer_fn. The callback checks whether watchdog_touch_ts has gone more than 20 seconds without an update; if so, a soft lockup has occurred.

    /* Enable the perf event */
    watchdog_nmi_enable(cpu);

    /* done here because hrtimer_start can only pin to smp_processor_id() */
    hrtimer_start(hrtimer, ns_to_ktime(sample_period),
              HRTIMER_MODE_REL_PINNED);---------------------------------------------Starts the hrtimer with a timeout of sample_period (4 seconds). HRTIMER_MODE_REL_PINNED pins the hrtimer to the current CPU.

    /* initialize timestamp */
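    watchdog_set_prio(SCHED_FIFO, MAX_RT_PRIO - 1);---------------------------------Makes the current thread a real-time SCHED_FIFO task with priority 99, the highest real-time priority, so it runs ahead of all non-real-time threads and of lower-priority real-time threads.
    __touch_watchdog();-------------------------------------------------------------Updates watchdog_touch_ts, which is the "feed the dog" operation.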
}

static void watchdog_set_prio(unsigned int policy, unsigned int prio)
{
    struct sched_param param = { .sched_priority = prio };

    sched_setscheduler(current, policy, &param);
}

/* Commands for resetting the watchdog */
static void __touch_watchdog(void)
{
    __this_cpu_write(watchdog_touch_ts, get_timestamp());----------------------------Feeding the dog simply means writing the current timestamp into watchdog_touch_ts.
}


static void watchdog_disable(unsigned int cpu)-------------------------------------The reverse of watchdog_enable(): turns the thread back into a normal thread and cancels the hrtimer.
{
    struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);

    watchdog_set_prio(SCHED_NORMAL, 0);
    hrtimer_cancel(hrtimer);
    /* disable the perf event */
    watchdog_nmi_disable(cpu);
}

static void watchdog_cleanup(unsigned int cpu, bool online)
{
    watchdog_disable(cpu);
}

static int watchdog_should_run(unsigned int cpu)
{
    return __this_cpu_read(hrtimer_interrupts) !=
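        __this_cpu_read(soft_lockup_hrtimer_cnt);------------------------------------hrtimer_interrupts counts how many times the hrtimer has fired; watchdog() copies hrtimer_interrupts into soft_lockup_hrtimer_cnt. If the two are equal, the hrtimer has not fired since then and the watchdog/x thread does not need to run; if they differ, the thread needs to run.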
}
static void watchdog(unsigned int cpu)
{
    __this_cpu_write(soft_lockup_hrtimer_cnt,
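             __this_cpu_read(hrtimer_interrupts));-----------------------------------Updates soft_lockup_hrtimer_cnt, so that watchdog_should_run() returns false: the thread does not need to run, i.e. no feeding is needed until the next timer tick.
    __touch_watchdog();--------------------------------------------------------------Only one line, but this is the all-important feeding operation.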

    if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
        watchdog_nmi_disable(cpu);
}

3.2 Creating the feeding thread watchdog/x

With watchdog_threads covered, the next step is to look at how the watchdog/x threads are created.

int smpboot_register_percpu_thread_cpumask(struct smp_hotplug_thread *plug_thread,
                       const struct cpumask *cpumask)
{
    unsigned int cpu;
    int ret = 0;

    if (!alloc_cpumask_var(&plug_thread->cpumask, GFP_KERNEL))
        return -ENOMEM;
    cpumask_copy(plug_thread->cpumask, cpumask);

    get_online_cpus();
    mutex_lock(&smpboot_threads_lock);
    for_each_online_cpu(cpu) {------------------------------------------------Walks all online CPUs and creates a per-CPU watchdog/x thread for each of them.
        ret = __smpboot_create_thread(plug_thread, cpu);
        if (ret) {
            smpboot_destroy_threads(plug_thread);-----------------------------On failure, release the resources created so far.
            free_cpumask_var(plug_thread->cpumask);
            goto out;
        }
        if (cpumask_test_cpu(cpu, cpumask))
            smpboot_unpark_thread(plug_thread, cpu);--------------------------If the current CPU is in cpumask, clear KTHREAD_SHOULD_PARK, which leads to the unpark member function of watchdog_threads being called.
    }
    list_add(&plug_thread->list, &hotplug_threads);
out:
    mutex_unlock(&smpboot_threads_lock);
    put_online_cpus();
    return ret;
}

static int
__smpboot_create_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
{
    struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
    struct smpboot_thread_data *td;

    if (tsk)
        return 0;

    td = kzalloc_node(sizeof(*td), GFP_KERNEL, cpu_to_node(cpu));
    if (!td)
        return -ENOMEM;
    td->cpu = cpu;
    td->ht = ht;

    tsk = kthread_create_on_cpu(smpboot_thread_fn, td, cpu,
                    ht->thread_comm);-----------------------------------------Creates the watchdog/x thread on the given CPU; its thread function is smpboot_thread_fn().
    if (IS_ERR(tsk)) {
        kfree(td);
        return PTR_ERR(tsk);
    }
    /*
     * Park the thread so that it could start right on the CPU
     * when it is available.
     */
    kthread_park(tsk);--------------------------------------------------------Parks the new thread for now; it starts running on its CPU once it is unparked.
    get_task_struct(tsk);-----------------------------------------------------Takes a reference on the thread.
    *per_cpu_ptr(ht->store, cpu) = tsk;---------------------------------------Saves the task pointer in this CPU's slot of store.
    if (ht->create) {
        if (!wait_task_inactive(tsk, TASK_PARKED))
            WARN_ON(1);
        else
            ht->create(cpu);
    }
    return 0;
}

static int smpboot_thread_fn(void *data)
{
    struct smpboot_thread_data *td = data;
    struct smp_hotplug_thread *ht = td->ht;

    while (1) {
        set_current_state(TASK_INTERRUPTIBLE);
        preempt_disable();
        if (kthread_should_stop()) {----------------------------------------If the thread has been asked to stop, call cleanup and exit the thread.
            __set_current_state(TASK_RUNNING);
            preempt_enable();
            /* cleanup must mirror setup */
            if (ht->cleanup && td->status != HP_THREAD_NONE)
                ht->cleanup(td->cpu, cpu_online(td->cpu));
            kfree(td);
            return 0;
        }

        if (kthread_should_park()) {----------------------------------------If KTHREAD_SHOULD_PARK is set, call park() and suspend the thread.
            __set_current_state(TASK_RUNNING);
            preempt_enable();
            if (ht->park && td->status == HP_THREAD_ACTIVE) {
                BUG_ON(td->cpu != smp_processor_id());
                ht->park(td->cpu);
                td->status = HP_THREAD_PARKED;
            }
            kthread_parkme();
            /* We might have been woken for stop */
            continue;
        }

        BUG_ON(td->cpu != smp_processor_id());

        /* Check for state change setup */
        switch (td->status) {
        case HP_THREAD_NONE:-----------------------------------------------First run: call setup() to initialize.
            __set_current_state(TASK_RUNNING);
            preempt_enable();
            if (ht->setup)
                ht->setup(td->cpu);
            td->status = HP_THREAD_ACTIVE;
            continue;

        case HP_THREAD_PARKED:---------------------------------------------Resuming from the parked state.
            __set_current_state(TASK_RUNNING);
            preempt_enable();
            if (ht->unpark)
                ht->unpark(td->cpu);
            td->status = HP_THREAD_ACTIVE;
            continue;
        }

        if (!ht->thread_should_run(td->cpu)) {-----------------------------If the thread has nothing to do, call schedule() to give the CPU to other threads.
            preempt_enable_no_resched();
            schedule();
        } else {
            __set_current_state(TASK_RUNNING);
            preempt_enable();
            ht->thread_fn(td->cpu);----------------------------------------Calls struct smp_hotplug_thread->thread_fn, i.e. watchdog(), to feed the dog.
        }
    }
}

void smpboot_unregister_percpu_thread(struct smp_hotplug_thread *plug_thread)----Removes the kernel threads created earlier.
{
    get_online_cpus();
    mutex_lock(&smpboot_threads_lock);
    list_del(&plug_thread->list);
    smpboot_destroy_threads(plug_thread);
    mutex_unlock(&smpboot_threads_lock);
    put_online_cpus();
    free_cpumask_var(plug_thread->cpumask);
}

static void smpboot_destroy_threads(struct smp_hotplug_thread *ht)
{
    unsigned int cpu;

    /* We need to destroy also the parked threads of offline cpus */
    for_each_possible_cpu(cpu) {
        struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);

        if (tsk) {
            kthread_stop(tsk);
            put_task_struct(tsk);
            *per_cpu_ptr(ht->store, cpu) = NULL;
        }
    }
}

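3.3 The hrtimer watchdog

With the feeding thread watchdog/x covered, how is the watchdog itself implemented?

The watchdog is an hrtimer with a 4-second period. The hrtimer is pinned to its CPU and all the variables it uses are per-CPU, so CPUs do not interfere with one another.

Each time the hrtimer fires, it wakes the watchdog/x thread, which then feeds the dog once.

Because the hrtimer callback runs in interrupt context, it gets to run ahead of any thread once the interrupt fires.

So even when the watchdog/x thread does not get to run, the callback can use is_softlockup() to tell whether the dog has gone more than 20 seconds without being fed.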

static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
    unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts);
    struct pt_regs *regs = get_irq_regs();
    int duration;
    int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;

    if (atomic_read(&watchdog_park_in_progress) != 0)
        return HRTIMER_NORESTART;

    /* kick the hardlockup detector */
    watchdog_interrupt_count();------------------------------------------------------------------Each time the interrupt fires, hrtimer_interrupts is incremented by 1. hrtimer_interrupts counts how many times the hrtimer has fired.

    /* kick the softlockup detector */
    wake_up_process(__this_cpu_read(softlockup_watchdog));---------------------------------------Wakes the watchdog/x thread so that it feeds the dog.

    /* .. and repeat */
    hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));------------------------------------Re-arms the expiry time, which makes the timer periodic.
...
    duration = is_softlockup(touch_ts);----------------------------------------------------------A non-zero return value means the watchdog has timed out.
    if (unlikely(duration)) {--------------------------------------------------------------------Handling of the watchdog timeout.
        if (kvm_check_and_clear_guest_paused())
            return HRTIMER_RESTART;

        /* only warn once */
        if (__this_cpu_read(soft_watchdog_warn) == true) {
            if (__this_cpu_read(softlockup_task_ptr_saved) !=
                current) {
                __this_cpu_write(soft_watchdog_warn, false);
                __touch_watchdog();
            }
            return HRTIMER_RESTART;
        }

        if (softlockup_all_cpu_backtrace) {
            if (test_and_set_bit(0, &soft_lockup_nmi_warn)) {
                /* Someone else will report us. Let's give up */
                __this_cpu_write(soft_watchdog_warn, true);
                return HRTIMER_RESTART;
            }
        }

        pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
            smp_processor_id(), duration,
            current->comm, task_pid_nr(current));-------------------------------------------------Prints which CPU has been stuck for duration seconds and which task it is stuck in.
        __this_cpu_write(softlockup_task_ptr_saved, current);
        print_modules();
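        print_irqtrace_events(current);-----------------------------------------------------------Shows where interrupts and softirqs were last enabled and disabled; leaving interrupts or softirqs disabled is another cause of soft lockups.
        if (regs)---------------------------------------------------------------------------------If registers are available, show them along with the stack; otherwise just dump the stack.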
            show_regs(regs);
        else
            dump_stack();

        if (softlockup_all_cpu_backtrace) {
            trigger_allbutself_cpu_backtrace();

            clear_bit(0, &soft_lockup_nmi_warn);
            /* Barrier to sync with other cpus */
            smp_mb__after_atomic();
        }

        add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
        if (softlockup_panic)---------------------------------------------------------------------If softlockup_panic is set, call panic().
            panic("softlockup: hung tasks");
        __this_cpu_write(soft_watchdog_warn, true);
    } else
        __this_cpu_write(soft_watchdog_warn, false);

    return HRTIMER_RESTART;
}

static void watchdog_interrupt_count(void)
{
    __this_cpu_inc(hrtimer_interrupts);
}

static int is_softlockup(unsigned long touch_ts)
{
    unsigned long now = get_timestamp();

    if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh){
        /* Warn about unreasonable delays. */
        if (time_after(now, touch_ts + get_softlockup_thresh()))
            return now - touch_ts;
    }
    return 0;
}
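The comparison in is_softlockup() goes through time_after() rather than a plain `>` so that it stays correct when the timestamp counter wraps around. The user-space sketch below reproduces that idea; `time_after_ul()` and `softlockup_duration()` are hypothetical names, and the signed-difference trick is the commonly used form of this comparison, assumed here rather than taken from the listing above.

```c
#include <assert.h>
#include <limits.h>

/*
 * True if timestamp a is later than timestamp b. The comparison is done
 * on the signed difference, so it still works after the counter wraps.
 */
static int time_after_ul(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Returns how long the dog has gone unfed if that exceeds thresh,
 * or 0 if everything is fine. All values use the same time unit.
 */
static unsigned long softlockup_duration(unsigned long now,
                                         unsigned long touch_ts,
                                         unsigned long thresh)
{
    assert(thresh > 0);
    if (time_after_ul(now, touch_ts + thresh))
        return now - touch_ts;
    return 0;
}
```

A gap of exactly the threshold does not count as a lockup; only a strictly larger gap does. The last test case shows a timestamp taken just before the counter wraps and a "now" taken just after it, which a naive `now > touch_ts + thresh` on the raw values would also have to get right.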

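4. Configuring the watchdog

The watchdog's behavior can be configured in two ways: through kernel command-line parameters and through proc.

4.1 Configuring through the command line

Command-line parameters can switch soft lockup detection on or off, choose whether to panic after a timeout, and so on.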

static int __init softlockup_panic_setup(char *str)
{
    softlockup_panic = simple_strtoul(str, NULL, 0);

    return 1;
}
__setup("softlockup_panic=", softlockup_panic_setup);

static int __init nowatchdog_setup(char *str)
{
    watchdog_enabled = 0;
    return 1;
}
__setup("nowatchdog", nowatchdog_setup);

static int __init nosoftlockup_setup(char *str)
{
    watchdog_enabled &= ~SOFT_WATCHDOG_ENABLED;
    return 1;
}
__setup("nosoftlockup", nosoftlockup_setup);

#ifdef CONFIG_SMP
static int __init softlockup_all_cpu_backtrace_setup(char *str)
{
    sysctl_softlockup_all_cpu_backtrace =
        !!simple_strtol(str, NULL, 0);
    return 1;
}
__setup("softlockup_all_cpu_backtrace=", softlockup_all_cpu_backtrace_setup);
static int __init hardlockup_all_cpu_backtrace_setup(char *str)
{
    sysctl_hardlockup_all_cpu_backtrace =
        !!simple_strtol(str, NULL, 0);
    return 1;
}
__setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
#endif

4.2 Tuning the watchdog through /proc/sys

The watchdog can also be configured through the proc filesystem.

/proc/sys/kernel/nmi_watchdog-------------------------hard lockup switch, proc_nmi_watchdog().
/proc/sys/kernel/soft_watchdog------------------------soft lockup switch, proc_soft_watchdog().
/proc/sys/kernel/watchdog-----------------------------master watchdog switch, proc_watchdog().
/proc/sys/kernel/watchdog_cpumask---------------------watchdog cpumask, proc_watchdog_cpumask().
/proc/sys/kernel/watchdog_thresh----------------------watchdog timeout threshold, proc_watchdog_thresh().
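These entries are ordinary text files, so a program can read them with standard file I/O. The sketch below is a small user-space helper; `read_sysctl_long()` is a hypothetical name, and it assumes the file holds a single decimal integer, which is the case for switches and thresholds but not for watchdog_cpumask.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Reads one decimal integer from a /proc/sys-style file.
 * Returns 0 on success and stores the value in *out; returns -1 if the
 * file cannot be opened or does not start with an integer.
 */
static int read_sysctl_long(const char *path, long *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    int ok = fscanf(f, "%ld", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

A caller would pass, for example, "/proc/sys/kernel/watchdog_thresh" and double the result to obtain the soft lockup threshold in seconds. Writing these files normally requires root, so the sketch only reads.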

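4.3 Locating a soft lockup

A soft lockup is usually caused by an infinite loop or a deadlock. An infinite loop can be found from the stack backtrace; a deadlock requires enabling the kernel's lockdep feature.

For how to enable lockdep, see 《Linux死锁检测-Lockdep》 (Linux Deadlock Detection: Lockdep).

Here is the analysis of a soft lockup caused by a while(1) loop: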

[ 5656.032325] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [cat:157]-----------------------Rough localization: the CPU, the process, and so on.
[ 5656.039314] Modules linked in:
[ 5656.042386] 
[ 5656.042386] CURRENT PROCESS:
[ 5656.042386] 
[ 5656.048229] COMM=cat PID=157
[ 5656.051117] TEXT=00008000-000c5a68 DATA=000c6f1c-000c7175 BSS=000c7175-000c8000
[ 5656.058432] USER-STACK=7fc1ee50  KERNEL-STACK=bd0b7080
[ 5656.058432] 
[ 5656.065069] PC: 0x8032a1b2 (clk_summary_show+0x62/0xb4)--------------------------------------------The PC points at the problem spot, which localizes it more precisely.
[ 5656.070302] LR: 0x8032a186 (clk_summary_show+0x36/0xb4)
[ 5656.075531] SP: 0xbd8b1b74...
[ 5656.217622] 
Call Trace:-----------------------------------------------------------------------------------------The call trace shows how execution reached the spot the PC points at, so the whole path is clear.
[<80155c5e>] seq_read+0xc2/0x46c
[<802826ac>] full_proxy_read+0x58/0x98
[<8013239c>] do_readv_writev+0x31c/0x384
[<80132458>] vfs_readv+0x54/0x8c
[<80160b52>] default_file_splice_read+0x166/0x2b0
[<801606ee>] do_splice_to+0x76/0xb0
[<801607de>] splice_direct_to_actor+0xb6/0x21c
[<801609c2>] do_splice_direct+0x7e/0xa8
[<80132a5a>] do_sendfile+0x21a/0x45c
[<80133776>] SyS_sendfile64+0xf6/0xfc
[<80046186>] csky_systemcall+0x96/0xe0
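When many logs have to be triaged, the first line of each report can be pulled apart mechanically. The sketch below is a user-space helper written against the message format printed by the pr_emerg() call shown in section 3.3; `struct softlockup_report` and `parse_softlockup()` are hypothetical names. It splits the bracketed part at the last colon because a task name such as kworker/3:0 contains a colon itself.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Fields of a "soft lockup - CPU#N stuck for Ts! [comm:pid]" line. */
struct softlockup_report {
    int cpu;
    unsigned int seconds;
    char comm[64];
    int pid;
};

/* Returns 0 on success, -1 if the line is not a soft lockup report. */
static int parse_softlockup(const char *line, struct softlockup_report *r)
{
    assert(line != NULL && r != NULL);

    const char *p = strstr(line, "soft lockup - CPU#");
    if (!p || sscanf(p, "soft lockup - CPU#%d stuck for %us!",
                     &r->cpu, &r->seconds) != 2)
        return -1;

    const char *open = strchr(p, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    if (!close)
        return -1;

    /* Walk back from ']' to the last ':'; the pid follows it. */
    const char *colon = close;
    while (colon > open && *colon != ':')
        colon--;
    if (colon == open)
        return -1;

    size_t len = (size_t)(colon - open - 1);
    if (len == 0 || len >= sizeof(r->comm))
        return -1;
    memcpy(r->comm, open + 1, len);
    r->comm[len] = '\0';
    r->pid = atoi(colon + 1);
    return 0;
}
```

The CPU number and task name from this line tell you where to start; the PC and call trace that follow in the log are still what pins down the offending code.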
