Bug #111349

Test Test-IT #110867: V1.0 test feature summary

Test Test-IT #110868: BSP-BVT test - power on/off test

[BSP][EVT][power][Low probability] After running adb reboot, the board occasionally fails to boot

Added by SZTS_TEST 邹涛 almost 3 years ago. Updated over 2 years ago.

Status: CLOSED
Start date: 2022-09-15
Priority: High
Due date:
Assignee: SZTS_TEST 邹涛
% Done: 100%
Category: SYSTEM
Target version: VX1_MCE_FSE_V3.0_20221030
Need_Info: --
Found Version: 0.0.0.20220818_alpha_004
Resolution: --
Degrated: --
Severity: Critical
Verified Version: cdiot /Pre_figure/VerifyBuild/Pre_figure_turbox-c2130c-la1.1-qssi12-dev/20220907/202209072155-1168
Reproducibility: Rarely
Fixed Version: 2022-10-19
Test Type: Bring Up Test
Root cause: https://dev.thundercomm.com/gerrit/c/general/kernel/msm-4.19/+/147583 drm_atomic_state_alloc() did not check for null pointers

Description

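Preconditions:
1. All DUT modules function normally and the device is powered on.

Steps:
1. Run the adb reboot command from cmd and observe the device state during the reboot.

Actual result:
1. The board occasionally (1 in 12) fails to boot.

Expected result: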

Serial-COM119_20220818144212_reboot起不来.log (202 KB) SZTS_TEST 邹涛, 2022-08-18 16:16

Serial-COM9_20220901230244.rar (655 KB) SZTS_TEST 邹涛, 2022-09-02 00:43

Port_COM10.rar (40.9 MB) SZTS_TEST 邹涛, 2022-09-02 11:03

2022-09-05-22-23-29.png (31.7 KB) CD SYSTEM-赵正军, 2022-09-05 22:35

572119275.jpg (3.07 MB) CD SYSTEM-赵正军, 2022-09-13 12:08

minicom.zip (15.4 MB) CD SYSTEM-赵正军, 2022-09-13 12:12

Serial-COM119_20220927111935.rar (4.68 MB) SZTS_TEST 邹涛, 2022-09-27 17:31

重启285次未见异常_串口日志.rar (5.13 MB) SZTS_TEST 邹涛, 2022-10-12 17:17

Serial-COM78_20221015000000.rar (25.2 MB) SZTS_TEST 邹涛, 2022-10-17 11:30


Subtasks

Bug #111975: [BSP][EVT][power][Low probability] Running adb reboot occasionally triggers a DRM null pointer    CLOSED    SZTS_TEST 邹涛


Related issues

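Related to Figure - Bug #111881: [BSP][EVT][power][Probability: 1/15] Ran the reboot command in a loop 15 times; once the board did not come up and the screen stayed black, ... CLOSED 2022-09-09
Related to Figure - Bug #111703: [BSP][EVT][power][Probability: 1/30] Plugged and unplugged DC power 30 times; the device entered 900E once CLOSED 2022-09-02
Related to Figure - Bug #111971: [VC1][BSP][EVT][power][Low probability] Ran reboot in a loop via script; after 2 h the board entered 900e and could not properly... CLOSED 2022-09-15 2023-03-10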

History

#1 Updated by CD SYSTEM-赵正军 almost 3 years ago

  • Assignee changed from SZ 系统-张丽果 to CD SYSTEM-赵正军
  • % Done changed from 0 to 50

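The existing logs show a NULL pointer dereference in the DRM display path. We also reproduced it locally under stress testing (once in 50 reboot cycles) and captured a ramdump; further analysis is pending.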
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030

[14:42:38][ 32.178041] OF: graph: no port node found in /soc/qcom,dsi-display-primary
[14:42:38][ 32.185230] [drm] [msm-dsi-warn]: [hx82101+hx8692 video mode dsi truly panel] fallback to default te-pin-select
[14:42:38][ 32.195660] NPU_INFO: npu_reboot_handler: 760 Device is rebooting with code 1
[14:42:38][ 32.197156] msm_drm ae00000.qcom,mdss_mdp: Linked as a consumer to regulator.31
[14:42:38][ 32.210942] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
[14:42:38][ 32.211016] register_client_adhoc:find path.src 1 dest 590
[14:42:38][ 32.212332] register_client_adhoc:Client handle 16 mdss_reg
[14:42:38][ 32.220007] Mem abort info:
[14:42:38][ 32.220014] ESR = 0x96000006
[14:42:38][ 32.220018] Exception class = DABT (current EL), IL = 32 bits
[14:42:38][ 32.220021] SET = 0, FnV = 0
[14:42:38][ 32.225816] register_client_adhoc:find path.src 22 dest 512
[14:42:38][ 32.227070] register_client_adhoc:find path.src 23 dest 512
[14:42:38][ 32.231438] EA = 0, S1PTW = 0
[14:42:38][ 32.235499] register_client_adhoc:Client handle 66 mdss_sde
[14:42:38][ 32.235925] [drm:sde_dbg_init:4676] evtlog_status: enable:11, panic:1, dump:2
[14:42:38][ 32.237484] Data abort info:
[14:42:38][ 32.237487] ISV = 0, ISS = 0x00000006
[14:42:38][ 32.237490] CM = 0, WnR = 0
[14:42:38][ 32.237495] user pgtable: 4k pages, 39-bit VAs, pgdp = ffffffdebef1d000
[14:42:38][ 32.237498] [0000000000000030] pgd=00000000ffc2e003, pud=00000000ffc2e003, pmd=0000000000000000
[14:42:38][ 32.243668] msm_drm ae00000.qcom,mdss_mdp: bound soc:qcom,wb-display@0 (ops sde_wb_comp_ops)
[14:42:38][ 32.246933] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[14:42:42][ 32.253011] msm-dsi-display soc:qcom,dsi-display-primary: Linked as a consumer to regulator.25
[14:42:42][ 32.258227] Modules linked in:
[14:42:42][ 32.258229] Process init (pid: 1, stack limit = 0xffffff8008058000)
[14:42:42][ 32.258231] CPU: 4 PID: 1 Comm: init Tainted: G S W 4.19.157+ #1
[14:42:42][ 32.258232] Hardware name: Qualcomm Technologies, Inc. kona MTP-RmPM8150b (DT)
[14:42:42][ 32.258233] pstate: 60400005 (nZCv daif +PAN -UAO)
[14:42:42][ 32.258239] pc : drm_atomic_state_alloc+0x18/0x80
[14:42:42][ 32.258245] lr : msm_lastclose+0x28c/0x458
[14:42:42][ 32.262005] [drm:dsi_display_bind] [msm-dsi-info]: Successfully bind display panel 'qcom,mdss_dsi_panel_hx82101_hx8692_truly_v2_video'

#2 Updated by CD SYSTEM-赵正军 almost 3 years ago

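1. After adding debug code and flashing only the boot/dtbo partitions, stress testing occasionally hit image verification errors and the system failed to boot.
After discussing with the security team, their advice is not to flash individual images when testing stability issues; we started a Jenkins VB (verify build) for validation instead.
2. Built a full image locally and replaced the xbl file of the DB build. In this configuration the reboot-fails-to-boot issue reproduces consistently, with the same logs.
3. In parallel, opened a Qualcomm case to parse the dump file. An earlier local parse attempt failed even though the build itself was fine, so we need to confirm that the symbol files are packaged correctly.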

#3 Updated by SZTS_TEST 邹涛 almost 3 years ago

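Verified on 0.0.0.20220901_alpha_private_userdebug: after 23 adb reboots the board failed to come up and recovered only after the DC power was unplugged and replugged. Serial log uploaded.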

#4 Updated by SZTS_TEST 邹涛 almost 3 years ago

#5 Updated by CD SYSTEM-赵正军 almost 3 years ago

Qualcomm currently attributes the issue to PCIe and provided a proposed change.

After verification, the issue still occurs.

#6 Updated by CD SYSTEM-赵正军 almost 3 years ago

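Disabled continuous display using "fastboot oem select-display-panel none".
To isolate the DRM-related code changes, reverted commits #14061 and #140603; the build still shows the issue.
Next, we will work back from the errors logged before the failure. The existing logs also show a problem with partition (slot) selection: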
[ 35.197102] init: [libfs_mgr]Error updating for slotselect
[ 35.202770] init: [libfs_mgr]ReadFstabFromFile(): failed to load fstab from : '/fstab.qcom'

#7 Updated by CD SYSTEM-赵正军 almost 3 years ago

It is now confirmed that, in drm_atomic_state_alloc() in drivers/gpu/drm/drm_atomic.c, the struct pointer config and its member funcs can be NULL. The following change handles the NULL case:

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index cb3cc5a2d2ef..07332b7fda41 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -112,6 +112,10 @@ struct drm_atomic_state *
 drm_atomic_state_alloc(struct drm_device *dev)
 {
        struct drm_mode_config *config = &dev->mode_config;
+        if (!config || !config->funcs) {
+               printk("drmtest %d %s\n",__LINE__, __func__);
+               return NULL;
+        }

        if (!config->funcs->atomic_state_alloc) {
                struct drm_atomic_state *state;

A VB build is now being started for verification.

#8 Updated by CD SYSTEM-赵正军 almost 3 years ago

  • Status changed from New to RESOLVED
  • % Done changed from 50 to 100
  • Verified Version set to /Pre_figure/VerifyBuild/Pre_figure_turbox-c2130c-la1.1-qssi12-dev/20220907/202209072155-1168

Tested the VB build for 4 hours, with more than 200 reboot cycles and no null pointer errors. Change submitted:
https://dev.thundercomm.com/gerrit/c/147583

#9 Updated by CD SYSTEM-赵正军 almost 3 years ago

  • Assignee changed from CD SYSTEM-赵正军 to SZTS_TEST 邹涛
  • Verified Version changed from /Pre_figure/VerifyBuild/Pre_figure_turbox-c2130c-la1.1-qssi12-dev/20220907/202209072155-1168 to cdiot /Pre_figure/VerifyBuild/Pre_figure_turbox-c2130c-la1.1-qssi12-dev/20220907/202209072155-1168

Could the test team please retest and confirm? Thanks.

#10 Updated by CD SYSTEM-赵正军 almost 3 years ago

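Tested 345 cycles with the following build and saw no failures; further confirmation on other boards is still needed.
ftp://dvbuild:thundercomm@10.0.76.28/home/scm/VerifyBuild/Pre_figure_turbox-c2130c-la1.1-qssi12-dev/20220909/202209092232-1206
Test logs are in the attachment minicom.zip.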

#12 Updated by CD TPM-申艳艳 almost 3 years ago

  • Category set to SYSTEM

#13 Updated by SZTS_TEST 邹涛 almost 3 years ago

ftp://dvbuild:thundercomm@10.0.76.28/home/scm/VerifyBuild/Pre_figure_turbox-c2130c-la1.1-qssi12-dev/20220909/202209092232-1206
Using the GPIO-modified hardware with the software build above, the device entered 900e after the reboot script had run for 2 h.

#14 Updated by CD SYSTEM-赵正军 almost 3 years ago

The log Serial-COM119_20220914175848.log posted in the test team's Feishu group does not show the original DRM null pointer or slotselect errors. Instead it shows the following error:
[19:27:44][ 80.889803] kernel BUG at drivers/net/wireless/cnss2/pci.c:2433!

[19:27:36][ 72.681214] cnss: fatal: Timeout waiting for FW ready indication
[19:27:44][ 80.876331] cnss: Timeout (40000ms) waiting for calibration to complete
[19:27:44][ 80.885980] cnss: ASSERT at line 2433
[19:27:44][ 80.889803] kernel BUG at drivers/net/wireless/cnss2/pci.c:2433!
[19:27:44][ 80.896066] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[19:27:44][ 80.901703] Modules linked in: wlan(O+) machine_dlkm(O) wcd938x_slave_dlkm(O) wcd938x_dlkm(O) wcd9xxx_dlkm(O) mbhc_dlkm(O) tx_macro_dlkm(O) rx_macro_dlkm(O) va_macro_dlkm(O) wsa_macro_dlkm(O) swr_ctrl_dlkm(O) bolero_cdc_dlkm(O) wsa881x_dlkm(O) wcd_core_dlkm(O) stub_dlkm(O) hdmi_dlkm(O) swr_dlkm(O) pinctrl_lpi_dlkm(O) pinctrl_wcd_dlkm(O) usf_dlkm(O) native_dlkm(O) platform_dlkm(O) q6_dlkm(O) adsp_loader_dlkm(O) apr_dlkm(O) snd_event_dlkm(O) q6_notifier_dlkm(O) q6_pdr_dlkm(O) msm_11ad_proxy
[19:27:44][ 80.946055] Process modprobe (pid: 1229, stack limit = 0xffffff8017768000)
[19:27:44][ 80.953111] CPU: 6 PID: 1229 Comm: modprobe Tainted: G S W O L 4.19.157+ #1
[19:27:44][ 80.961059] Hardware name: Qualcomm Technologies, Inc. kona MTP-RmPM8150b (DT)
[19:27:44][ 80.968469] pstate: 60400005 (nZCv daif +PAN -UAO)
[19:27:44][ 80.973394] pc : cnss_wlan_register_driver+0x2f0/0x2f8
[19:27:44][ 80.978670] lr : cnss_wlan_register_driver+0x2f0/0x2f8
[19:27:44][ 80.983945] sp : ffffff801776b9e0

Our change did not touch any wireless code. Could the test team please open a separate bug and assign it to the LC team?

#15 Updated by CD SYSTEM-胡兵 almost 3 years ago

  • Assignee changed from SZTS_TEST 邹涛 to CD SYSTEM-赵正军
  • Target version set to 619

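The issue was triggered in the latest 200-cycle VB retest; the workaround took effect and the device rebooted. The related issue already has a ticket: https://share.thundersoft.com/redmine/issues/111881

V2.0: address the root cause.

Change:
https://dev.thundercomm.com/gerrit/c/general/platform/system/core/+/150066

Failure log: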
[14:51:00][ 23.029657] Run /init as init process
[14:51:00][ 23.038763] init: init first stage started!
[14:51:00][ 23.043355] init: Unable to open /lib/modules, skipping module loading.
[14:51:00][ 23.051362] init: [libfs_mgr]dt_fstab: Skip disabled entry for partition vendor
[14:51:00][ 23.058968] init: [libfs_mgr]ReadFstabFromDt(): failed to read fstab from dt
[14:51:00][ 23.066637] init: [libfs_mgr]dt_fstab: Skip disabled entry for partition vendor
[14:51:00][ 23.074445] init: [libfs_mgr]GetFstabPath()fstab_path:/odm/etc/fstab.default
[14:51:00][ 23.081808] init: [libfs_mgr]GetFstabPath()fstab_path:/vendor/etc/fstab.default
[14:51:00][ 23.089378] init: [libfs_mgr]GetFstabPath()fstab_path:/fstab.default
[14:51:00][ 23.096201] init: [libfs_mgr]GetFstabPath()fstab_path:/odm/etc/fstab.qcom
[14:51:00][ 23.103771] init: [libfs_mgr]GetFstabPath()fstab_path:/vendor/etc/fstab.qcom

[14:51:00][ 23.123476] OF: graph: no port node found in /soc/qcom,dsi-display-primary
[14:51:00][ 23.130639] [drm] [msm-dsi-warn]: [hx82101+hx8692 video mode dsi truly panel] fallback to default te-pin-select
[14:51:00][ 23.141065] NPU_INFO: npu_reboot_handler: 760 Device is rebooting with code 1
[14:51:00][ 23.142548] msm_drm ae00000.qcom,mdss_mdp: Linked as a consumer to regulator.31
[14:51:00][ 23.156369] drmtest 116 drm_atomic_state_alloc
[14:51:00][ 23.156492] register_client_adhoc:find path.src 1 dest 590
[14:51:00][ 23.157769] register_client_adhoc:Client handle 16 mdss_reg
[14:51:00][ 23.160997] last close failed: -12
[14:51:00][ 23.177464] register_client_adhoc:find path.src 22 dest 512
[14:51:00][ 23.178714] register_client_adhoc:find path.src 23 dest 512
[14:51:00][ 23.185645] register_client_adhoc:Client handle 66 mdss_sde
[14:51:00][ 23.192179] [drm:sde_dbg_init:4676] evtlog_status: enable:11, panic:1, dump:2
[14:51:00][ 23.200303] qcom_rpmh DRV:apps_rsc TCS Busy, retrying RPMH message send: addr=0x41b08
[14:51:00][ 23.205788] msm_drm ae00000.qcom,mdss_mdp: bound soc:qcom,wb-display@0 (ops sde_wb_comp_ops)
[14:51:00][ 23.217353] reboot: Restarting system with command ''

#16 Updated by 物联网项目组-RD3_CDTS 周飞 almost 3 years ago

  • Target version changed from 619 to VC1_FSE_B sample_20221015

#19 Updated by CD SYSTEM-赵正军 almost 3 years ago

  • Assignee changed from CD SYSTEM-赵正军 to SZTS_TEST 邹涛

Could the test team please spot-check a few DB builds to see whether the issue recurs?

#20 Updated by CDTS_TEST 王成 over 2 years ago

  • Target version changed from VC1_FSE_B sample_20221015 to VX1_MCE_FSE_V2.0_update_20221012

The build delivered on 10-12 is the same as the 10-15 build.

#21 Updated by CD SYSTEM-赵正军 over 2 years ago

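Hi 邹涛,

Please run a stress test on the latest DB build and save the serial log. We need the log to confirm whether the originally reported issue still occurs.

Thanks!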

#22 Updated by CDTS_TEST 刘勇 over 2 years ago

[Not reproduced, serial log] Serial log link for 1000 stress-test cycles on the 10-11 UD build: https://thundersoft.feishu.cn/file/boxcnj5Oko6DqubzgWqFmNFQnve

#23 Updated by SZTS_TEST 邹涛 over 2 years ago

Ran 285 reboots on the 10-11 userdebug build from the release branch; the board showed no anomalies.

#24 Updated by CDTS_TEST 刘勇 over 2 years ago

[Not reproduced] 655 stress-test cycles on the 20221012 daily UD build did not reproduce the issue. Serial log: https://thundersoft.feishu.cn/file/boxcnGCvCR7kUs20EIg4Po97qhd

#25 Updated by CDTS_TEST 王成 over 2 years ago

  • Target version changed from VX1_MCE_FSE_V2.0_update_20221012 to VX1_MCE_FSE_V3.0_20221030

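Continuing to track. Based on the more than one thousand test cycles so far, the issue occurs with extremely low probability.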

#26 Updated by SZTS_TEST 邹涛 over 2 years ago

On an EVT2 board with the 10-13 userdebug build, the board got stuck in 900e after more than 2400 reboots.
Dump file: https://thundersoft.feishu.cn/file/boxcnKx4yzOYPN8ioaAjRY5FYVb?from=from_copylink

#27 Updated by CD SYSTEM-赵正军 over 2 years ago

[22:04:08][   78.828557] cnss: ASSERT at line 2433
[22:04:08][   78.832364] kernel BUG at drivers/net/wireless/cnss2/pci.c:2433!
[22:04:08][   78.838610] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[22:04:08][   78.844241] Modules linked in: wlan(O+) tsnv(O) machine_dlkm(O) wcd938x_slave_dlkm(O) wcd938x_dlkm(O) wcd9xxx_dlkm(O) mbhc_dlkm(O) tx_macro_dlkm(O) rx_macro_dlkm(O) va_macro_dlkm(O) wsa_macro_dlkm(O) swr_ctrl_dlkm(O) bolero_cdc_dlkm(O) wsa881x_dlkm(O) wcd_core_dlkm(O) stub_dlkm(O) hdmi_dlkm(O) swr_dlkm(O) pinctrl_lpi_dlkm(O) pinctrl_wcd_dlkm(O) usf_dlkm(O) native_dlkm(O) platform_dlkm(O) q6_dlkm(O) adsp_loader_dlkm(O) apr_dlkm(O) snd_event_dlkm(O) q6_notifier_dlkm(O) q6_pdr_dlkm(O) msm_11ad_proxy
[22:04:08][   78.889300] Process modprobe (pid: 1206, stack limit = 0xffffff8016f50000)
[22:04:08][   78.896360] CPU: 7 PID: 1206 Comm: modprobe Tainted: G S      W  O L    4.19.157+ #1
[22:04:08][   78.904309] Hardware name: Qualcomm Technologies, Inc. kona MTP-RmPM8150b. VX1(MCE) EVT2 (DT)
[22:04:08][   78.913052] pstate: 60400005 (nZCv daif +PAN -UAO)
[22:04:08][   78.917984] pc : cnss_wlan_register_driver+0x2f0/0x2f8
[22:04:08][   78.923255] lr : cnss_wlan_register_driver+0x2f0/0x2f8

Same bug as #111971.

#28 Updated by CD SYSTEM-赵正军 over 2 years ago

  • Fixed Version set to 2022-10-19
  • Root cause set to https://dev.thundercomm.com/gerrit/c/general/kernel/msm-4.19/+/147583 drm_atomic_state_alloc() did not check for null pointers

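Hi 邹涛,
A board failing to come up after reboot can have several causes. We have tracked multiple builds so far, and none of the logs provided since then show the cause that originally sent this ticket's board into 900E.
I am requesting that this bug be closed. Please open separate bugs for any other 900E cases; otherwise one problem ends up tracked in two tickets.

Thanks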

#29 Updated by SZTS_TEST 邹涛 over 2 years ago

  • Status changed from RESOLVED to VERIFIED

As R&D described, closing this bug for now.

#30 Updated by SZTS_TEST 邹涛 over 2 years ago

  • Status changed from VERIFIED to CLOSED

#31 Updated by CD FW-王伟 over 2 years ago

Gerrit Merge Information
ID: 146958
Project: general/turbox/tools/build_script/platform
Branch: Pre_figure_turbox-c2130c-la1.1-vendor-dev
SYSTEM:Add ELF file to meta package for ramdump analysis
1. Add ELF file to meta package
TC-RID: 1200-0800301
IssueID: TS-R-BUG-111349
Change-Id: Icb1cf21989f56098cdff417200a284de7aab4f3b

#32 Updated by CD FW-王伟 over 2 years ago

Gerrit Merge Information
ID: 147583
Project: general/kernel/msm-4.19
Branch: Pre_figure_turbox-c2130c-la1.1-vendor-dev
SYSTEM:Prevent function parameters from passing in null pointers
1. if (!config || !config->funcs) return NULL;
TC-RID: 1200-0800301
IssueID: TS-R-BUG-111349
Change-Id: Ib2b53b7538dc0ae1b38f51a93300ae966acbbde9
