Bug #113610
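Test Test-ST #113421: V4.0 functional and special-topic testing
Test Test-ST #113425: V4.0 special topic -- BSP special topic -- stress test
[BSP][EVT1][MEM][Always reproducible][Stress test] After the device storage is filled and the device is rebooted manually, a "BT has stopped" (or other service) dialog appears on the home screen after reboot, then the device crashes into dump mode, and finally it loops boot -> dump indefinitely and cannot be used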
Field | Value | Field | Value |
---|---|---|---|
Status | CLOSED | Start date | 2022-11-02 |
Priority | High | Due date | 2023-03-10 |
Assignee | 移动测试一组_CDTS 刘强 | % Done | 50% |
Category | LC | | |
Target version | VC1_FSE_0082_20230314 | | |
Need_Info | -- | Found Version | FlatBuild_HH_VX1_MCE_FSE.M.R.userdebug.01.00.0041.X101 |
Resolution | FIXED | Degrated | -- |
Severity | Critical | Verified Version | |
Reproducibility | Every time | Fixed Version | |
Test Type | IT | Root cause | |
Description
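[Preconditions]
1. The device is powered on
[Test steps]
1. Use the storage-fill APK to fill the device storage to 100%
2. Go to Settings > Storage and check the remaining space
3. Reboot the device with adb reboot
4. Go to Settings > Storage and check the remaining space
5. Push another file that is larger than the remaining space
6. Reboot again with adb reboot
[Expected result]
4. The device reboots normally the first time; the remaining storage afterwards is about 3-20 MB
5 & 6. The file push fails; even if the push succeeds, the device should still reboot normally
[Actual result]
4. The device reboots normally the first time; the remaining storage afterwards is about 3-20 MB
5. The file push succeeds
6. The device reboots to the home screen and a "BT has stopped" dialog pops up immediately; the device then crashes, shuts down, and enters dump mode. Once the dump logs are captured it reboots, crashes again, and re-enters dump mode, looping indefinitely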
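Steps 2 to 6 above could be scripted roughly as follows. This is only a sketch: the helper name `build_repro_commands`, the file name `big.bin`, and the use of `df -h /data` in place of the Settings > Storage screen are assumptions, not part of the original report. Step 1 (filling storage with the fill APK) stays manual.

```python
def build_repro_commands(big_file, device_dir="/sdcard/"):
    """Return the adb invocations for steps 2-6, in order.

    Hypothetical helper: it only builds the command lists so they can be
    reviewed (or passed to subprocess.run) without a device attached.
    """
    return [
        ["adb", "shell", "df", "-h", "/data"],  # step 2: remaining space (assumed stand-in for Settings > Storage)
        ["adb", "reboot"],                      # step 3: first reboot
        ["adb", "wait-for-device"],             # wait until adb sees the device again
        ["adb", "shell", "df", "-h", "/data"],  # step 4: remaining space after reboot
        ["adb", "push", big_file, device_dir],  # step 5: push a file larger than the free space
        ["adb", "reboot"],                      # step 6: second reboot, where the crash loop was seen
    ]
```

Each entry can be run with `subprocess.run(cmd, check=False)`; the push in step 5 is expected to fail on a healthy build, so its return code should be recorded rather than treated as fatal.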
Related issues
History
#1 Updated by 移动测试一组_CDTS 刘强 over 2 years ago
- File full.txt added
- File full_men.log added
- File 20221102-164218.mp4 added
#2 Updated by 移动测试一组_CDTS 刘强 over 2 years ago
- Due date set to 2022-11-30
- Category set to BSP
#3 Updated by CD BSP-唐金泽 over 2 years ago
- Status changed from New to ASSIGNED
- Target version set to VX1_MCE_FSE_V2.0_update_20221012
Believed to be related to bug #113511.
A local build will be provided for the test team to verify; in progress
#4 Updated by CD SYSTEM-胡兵 over 2 years ago
Hi
The error is shown below and matches the one in #111971. Please verify with the build that passed retest for #111971.
[ 78.835510] kernel BUG at drivers/net/wireless/cnss2/pci.c:2433!
[ 78.841811] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 78.847453] Modules linked in: wlan(O+) tsnv(O) machine_dlkm(O) wcd938x_slave_dlkm(O) wcd938x_dlkm(O) wcd9xxx_dlkm(O) mbhc_dlkm(O) tx_macro_dlkm(O) rx_macro_dlkm(O) va_macro_dlkm(O) wsa_macro_dlkm(O) swr_ctrl_dlkm(O) bolero_cdc_dlkm(O) wsa881x_dlkm(O) wcd_core_dlkm(O) stub_dlkm(O) hdmi_dlkm(O) swr_dlkm(O) pinctrl_lpi_dlkm(O) pinctrl_wcd_dlkm(O) usf_dlkm(O) native_dlkm(O) platform_dlkm(O) q6_dlkm(O) adsp_loader_dlkm(O) apr_dlkm(O) snd_event_dlkm(O) q6_notifier_dlkm(O) q6_pdr_dlkm(O) msm_11ad_proxy
[ 78.892538] Process modprobe (pid: 1199, stack limit = 0xffffff80173e8000)
[ 78.899603] CPU: 4 PID: 1199 Comm: modprobe Tainted: G S W O 4.19.157+ #1
[ 78.907550] Hardware name: Qualcomm Technologies, Inc. kona MTP-RmPM8150b. VX1(MCE) EVT1 (DT)
[ 78.916298] pstate: 60400005 (nZCv daif +PAN -UAO)
[ 78.921226] pc : cnss_wlan_register_driver+0x2f0/0x2f8
[ 78.926506] lr : cnss_wlan_register_driver+0x2f0/0x2f8
Thanks.
#5 Updated by CD BSP-唐金泽 over 2 years ago
- Status changed from ASSIGNED to RESOLVED
- Assignee changed from CD BSP-唐金泽 to 移动测试一组_CDTS 刘强
#6 Updated by CD BSP-唐金泽 over 2 years ago
- Resolution changed from -- to DUPLICATE
#7 Updated by CDTS_TEST 王成 over 2 years ago
- Target version changed from VX1_MCE_FSE_V2.0_update_20221012 to VX1_MCE_FSE_V3.0_update_20221130
#8 Updated by 移动测试一组_CDTS 刘强 over 2 years ago
- Parent task set to #113425
#9 Updated by CDTS_TEST 王成 over 2 years ago
- Severity changed from Normal to Critical
#10 Updated by 移动测试一组_CDTS 刘强 over 2 years ago
- Status changed from RESOLVED to ASSIGNED
- Assignee changed from 移动测试一组_CDTS 刘强 to CD BSP-唐金泽
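Per the latest note on #111971, #111971 will not be fixed and verification should use the user build.
This issue was tested on the latest user build and still reproduces.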
#11 Updated by CD BSP-唐金泽 over 2 years ago
- Assignee changed from CD BSP-唐金泽 to CD BSP-杜磊
#12 Updated by CDTS-TEST 周婷 over 2 years ago
- Priority changed from Normal to High
#13 Updated by CD BSP-杜磊 over 2 years ago
- File minicom.cap added
- % Done changed from 0 to 10
Hi 洪普,
From the log, a WiFi crash occurred: waiting for the firmware timed out.
[ 62.432060] Timed-out!!
[ 66.784328] DTC_ETH,link_delay_work_func:3003, SQI=0, link=0
[ 69.718335] bt_ioctl: BT_CMD_PWR_CTRL pwr_cntrl:0
[ 69.824365] bt_vreg_disable: vreg_disable successful for : qca,bt-vdd-asd
[ 69.831376] bt_vreg_disable: vreg_disable successful for : qca,bt-vdd-rfa2
[ 69.838444] bt_vreg_disable: vreg_disable successful for : qca,bt-vdd-rfa1
[ 69.845504] bt_vreg_disable: vreg_disable successful for : qca,bt-vdd-dig
[ 69.852484] bt_vreg_disable: vreg_disable successful for : qca,bt-vdd-aon
[ 71.904269] DTC_ETH,link_delay_work_func:3003, SQI=0, link=0
[ 77.024273] DTC_ETH,link_delay_work_func:3003, SQI=0, link=0
[ 78.816037] cnss: fatal: Timeout waiting for FW ready indication
[ 78.924151] Kernel panic - not syncing: subsys-restart: Resetting the SoC - wlan crashed.
[ 78.932560] CPU: 3 PID: 89 Comm: kworker/3:1 Tainted: G S W O 4.19.157-perf+ #43
[ 78.941137] Hardware name: Qualcomm Technologies, Inc. kona MTP-RmPM8150b. VX1 EVT3 (DT)
[ 78.949896] Workqueue: events device_restart_work_hdlr
[ 78.955173] Call trace:
[ 78.957694] dump_backtrace+0x0/0x208
[ 78.961457] show_stack+0x14/0x20
[ 78.964872] dump_stack+0xb8/0xf4
[ 78.968284] panic+0x158/0x2d8
[ 78.971431] device_restart_work_hdlr+0x44/0x48
[ 78.986522] process_one_work+0x278/0x440
[ 78.990643] worker_thread+0x260/0x4a8
[ 78.994501] kthread+0x140/0x150
[ 78.997824] ret_from_fork+0x10/0x18
[ 79.001507] SMP: stopping secondary CPUs
[ 79.005541] CPU1: stopping
[ 79.008333] CPU: 1 PID: 1208 Comm: logd.reader.per Tainted: G S W O 4.19.157-perf+ #43
[ 79.017435] Hardware name: Qualcomm Technologies, Inc. kona MTP-RmPM8150b. VX1 EVT3 (DT)
[ 79.026191] pstate: 60400005 (nZCv daif +PAN -UAO)
[ 79.031116] pc : _raw_spin_unlock_irqrestore+0x14/0x48
[ 79.036398] lr : __wake_up_common_lock+0x84/0xc8
[ 79.041140] sp : ffffff80145f3a60
[ 79.044552] x29: ffffff80145f3a60 x28: 00000000ffffffff
#14 Updated by CD BSP-杜磊 over 2 years ago
- Assignee changed from CD BSP-杜磊 to CD LC 陶洪普
Hi 洪普,
With the WiFi driver commented out, the system boots normally and no crash occurs. Please help analyze.
#15 Updated by CD TPM-王祥林 over 2 years ago
- Target version changed from VX1_MCE_FSE_V3.0_update_20221130 to VC1_FSE_0078_20230228
#16 Updated by CD BSP-杜磊 over 2 years ago
- Category changed from BSP to LC
#17 Updated by CD LC 陶洪普 over 2 years ago
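Reproduced the issue on build FlatBuild_HH_VX1_MCE_FSE.M.R.userdebug.01.00.0075.X101.
Captured the serial and dump logs; the next step is to analyze the dump.
(The logs are very large; still in progress.)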
#18 Updated by CD LC 陶洪普 over 2 years ago
- % Done changed from 10 to 20
Analysis of full.txt
----------------
01-01 08:25:20.993 0 0 I cnss-daemon: interop issues ap: read_iot_ap_file: No such file /data/vendor/wifi/iotap_ps.bin
01-01 08:25:20.993 0 0 I cnss-daemon: interop issues ap: read_iot_ap_ps_from_file, read file error
01-01 08:25:20.995 0 0 I cnss-daemon: Fail to bind user socket No space left on device
01-01 08:25:20.995 0 0 I cnss-daemon: Failed to init user interface
01-01 08:25:20.995 0 0 I cnss-daemon: Failed to get sock name Bad file descriptor
01-01 08:25:20.996 0 0 I cnss-daemon: interop issues ap: save_iot_ap_file: Failed to open file /data/vendor/wifi/iotap_ps.bin
01-01 08:25:20.996 0 0 I cnss-daemon: interop issues ap: save_iot_ap_ps_to_file, save file error
During boot, cnss reported "Fail to bind user socket No space left on device" multiple times.
===================================================
CNSS platform driver and CNSS daemon components have a communication channel to synchronize with
each other before interacting with WLAN firmware.
QMI is proposed as a desired method of communication between
CNSS platform driver and daemon
Please confirm: to guarantee that this socket communication can proceed, can a minimum amount of space be reserved?
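The full.txt check above can be repeated on other captures with a small scan. The following is a minimal sketch that assumes the same line layout as the excerpt in this comment (date, time, then a `cnss-daemon:` tag); the function name `find_enospc` is hypothetical.

```python
ENOSPC_TEXT = "No space left on device"

def find_enospc(lines, tag="cnss-daemon"):
    """Return the 'MM-DD HH:MM:SS.mmm' timestamps of lines where `tag`
    reports an out-of-space error.

    Assumes the first two whitespace-separated fields are date and time,
    as in the full.txt excerpt above.
    """
    hits = []
    for line in lines:
        if f" {tag}: " in line and ENOSPC_TEXT in line:
            fields = line.split()
            hits.append(" ".join(fields[:2]))
    return hits
```

Running it over the whole log file (for example `find_enospc(open("full.txt", errors="replace"))`) gives a quick count of how often the daemon hit the out-of-space condition during one boot.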
#19 Updated by CD LC 陶洪普 over 2 years ago
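The System and BSP teams built a VB (VerifyBuild) intended to guarantee 300M reserved, but verification still failed; the storage display shows only 128M reserved.
Need to confirm whether the 300M user-space protection actually took effect.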
#20 Updated by CD LC 陶洪普 over 2 years ago
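With 300M reserved, no WiFi driver load failure was observed.
Stress testing is still needed to confirm whether other problems remain, and to determine how large the reserved space should be.
-------------
Once the appropriate reserved size is settled, re-check this symptom.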
#21 Updated by CD SYSTEM-夏旭 over 2 years ago
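Local userdebug verification is OK: the device boots normally after being filled several times.
Test team, please verify the user build.
VB user build with 300M reserved: ftp://cdiot@192.168.87.46/Figure/VerifyBuild/Pre_figure_turbox-c2130c-la1.1-qssi12-dev/20230227/202302271446-163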
#22 Updated by CD SYSTEM-夏旭 over 2 years ago
- Assignee changed from CD LC 陶洪普 to 移动测试一组_CDTS 刘强
#23 Updated by CD LC 陶洪普 over 2 years ago
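2023/02/28 (王祥林, 杜磊, 胡兵, 刘强, 夏旭, 陶洪普)
Meeting notes:
1. Provide the test team with a build that keeps at least 300M of user space unoccupied. 夏旭 Done
刘强
2. Test whether the system in the build from item 1 boots normally (record the number of test runs); after boot, simulate user scenarios and check for serious problems.
3. After testing (2/28), @刘强 please set up a meeting to sync the results and confirm next steps.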
#24 Updated by CD LC 陶洪普 over 2 years ago
- Status changed from ASSIGNED to NEED_INFO
#25 Updated by CD TPM-王祥林 over 2 years ago
- Target version changed from VC1_FSE_0078_20230228 to VC1_FSE_0082_20230314
#26 Updated by CD LC 陶洪普 over 2 years ago
- File 20230228-teraterm.log added
Update:
Tester 刘强 reports: after being left idle for a while with nothing done, the device rebooted by itself.
----------------
[ 56.948825] [MAX975X:max96755x_init] ERROR read:752 0x13:00
[ 56.954936] [MAX975X:max96755x_init] ERROR read:755 0x13:02
[ 56.960841] [MAX975X:max96755x_init] ERROR done
[ 58.336150] cnss: Timeout (40000ms) waiting for calibration to complete
[ 58.343772] cnss: ASSERT at line 2433
[ 58.354242] cnss: Mode request failed, mode: OFF, result: 1, err: 90
[ 58.848883] [MAX975X:mcu_keep_wakeup_work_func] ERROR ERR gpio detected
[ 80.960360] i2c_geni 980000.i2c: i2c error :-107
[ 80.965219] [MAX975X:max96755x_init] ERROR Failed to read dev:0x4c addr:0x0013
[ 80.973422] i2c_geni 980000.i2c: i2c error :-107
[ 80.976095] Kernel panic - not syncing: subsys-restart: Resetting the SoC - wlan crashed.
[ 80.978277] [MAX975X:max96755x_init] ERROR Failed to write dev:0x4c addr:0x0010 value:0x00
[ 80.986568] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G S W O 4.19.157-perf+ #1
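At 58.336150 the failure had already occurred and cnss: ASSERT was invoked, but nothing failed immediately; the panic was only triggered at 80.976095.
Capture the dump and run adb pull /data/vendor/wifi/wlan_logs . to collect logs for confirmation.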
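The gap between the cnss ASSERT and the eventual panic can be measured directly from the kernel timestamps. A minimal sketch, assuming the `[ seconds.micros]` prefix seen in the log above; the function names are hypothetical.

```python
import re

# Kernel log prefix such as "[ 58.343772]".
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")

def first_timestamp(lines, needle):
    """Return the timestamp (seconds) of the first line containing `needle`, or None."""
    for line in lines:
        if needle in line:
            m = TS_RE.match(line.strip())
            if m:
                return float(m.group(1))
    return None

def assert_to_panic_delay(lines):
    """Seconds between the first 'cnss: ASSERT' and the first 'Kernel panic' line."""
    t_assert = first_timestamp(lines, "cnss: ASSERT")
    t_panic = first_timestamp(lines, "Kernel panic")
    if t_assert is None or t_panic is None:
        return None
    return t_panic - t_assert
```

On the excerpt above this yields roughly 22.6 seconds, which matters for stress scripts: a loop that reboots the device before that window elapses would never observe the panic.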
#27 Updated by 移动测试一组_CDTS 刘强 over 2 years ago
- File wlan_logs.zip added
- Status changed from NEED_INFO to ASSIGNED
- Assignee changed from 移动测试一组_CDTS 刘强 to CD LC 陶洪普
Uploaded wlan_logs
#28 Updated by CD LC 陶洪普 over 2 years ago
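The parsed dump shows the cause is still a socket communication error triggered by "no space".
A Qualcomm case was opened; Qualcomm's feedback also points to the "no space" error.
Given the current space-filling test method, 600M has been reserved and a repeated-reboot stress test will be run first.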
#29 Updated by CD LC 陶洪普 over 2 years ago
1. Using the 1G build ftp://dvbuild@10.0.76.28/home/scm/VerifyBuild/Pre_figure_turbox-c2130c-la1.1-qssi12-dev/20230301/202303011636-2768
With 600M of /sdcard space reserved, 1000 reboot stress cycles showed no anomaly
2. Using the 20230304 user build FlatBuild_HH_MCE_FSE.M.D.user.01.00.C101.202303040019.zip
With 300M of /sdcard space reserved, the repeated-reboot stress test is in progress
#30 Updated by CD LC 陶洪普 over 2 years ago
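1. Confirmed with 胡兵, 蒋富雄, and 夏旭 from System: the 1G build reserves 1G in the /data partition.
This 1G cannot be used by apps (a copy operation has no effect but reports no failure; downloading software through the 应用宝 app store reports that storage is full).
2. With this build and storage fully occupied (except the reserved 1G), tested 9 times; the device entered 900E.
3. Added adb root and retested.
4. The reserve_boot space is usable by resgid=1065; need to confirm whether cnss_daemon is among the processes allowed to use it.
Qualcomm replied as follows. Reserving 10% of the space may be too much; System team, please confirm.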
Please enable the quota feature introduced by Google in Android Q, which prevents apps from using more than 90% of disk space or 50% of inodes and is intended to enhance system stability.
You can check the link below for a detailed description:
https://source.android.com/docs/core/storage/faster-stats
To enable this feature:
Enable the CONFIG_QUOTA, CONFIG_QFMT_V2, and CONFIG_QUOTACTL kernel options.
Add the quota option to your userdata partition in your fstab file.
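Whether a given build meets these two prerequisites can be checked offline against its kernel config and fstab entry. The sketch below is an illustration only: the function names are hypothetical, and it assumes the usual `NAME=y` kernel config format and a five-field Android fstab line in which `quota` appears among the comma-separated flags of the fourth or fifth field.

```python
REQUIRED_OPTIONS = ("CONFIG_QUOTA", "CONFIG_QFMT_V2", "CONFIG_QUOTACTL")

def missing_quota_options(config_text):
    """Return the required kernel options that are not set to y or m."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FOO is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return [opt for opt in REQUIRED_OPTIONS if opt not in enabled]

def fstab_has_quota(fstab_line):
    """True if 'quota' is listed in the mount-flag or fs_mgr-flag field."""
    fields = fstab_line.split()
    if len(fields) < 5:
        return False
    flags = fields[3].split(",") + fields[4].split(",")
    return "quota" in flags
```

The config text can come from the build's `.config` or, where the kernel exposes it, from `/proc/config.gz` on the device.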
#31 Updated by CD LC 陶洪普 over 2 years ago
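Build with 1G reserved: reboot stress test after filling storage
2. With this build and storage fully occupied (except the reserved 1G), tested 9 times; the device entered 900E
(Already recorded yesterday.)
With adb root added at the start, more than 1000 boot cycles ran without failure. However, the logs show occurrences of WiFi not receiving QMI messages; the wait was too short to reach the failure before the script rebooted the device.
-----
Proposals from System:
1. Increase the reboot delay from 30s to 100s to confirm that "no space" causes the wlan crash (to be tested tonight, 20230307)
2. The storage reserved for root stays at 0.9G and is not used by cnss-daemon. cnss-daemon is a vendor-side service; if it is changed to user root, the extent of the SELinux impact is uncertain.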
service cnss-daemon /system/vendor/bin/cnss-daemon -n -l
    class late_start
    user system
    group system inet net_admin wifi
    capabilities NET_ADMIN
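3. Can the subsystem panic trigger level be changed so that, when a wlan malfunction caused by full internal storage is detected, a dialog after boot prompts the user to clear storage?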
#32 Updated by CD LC 陶洪普 over 2 years ago
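Morning meeting with 胡兵, 祥林, 富雄, 刘强, 洪普:
Ensure the system can boot. When storage is completely full, some apps are temporarily allowed to malfunction. If the WiFi driver fails to load in this situation, it must not bring down the system.
Later, add a prompt asking the user to free up space; after the user frees space and reboots, the WiFi driver must load normally.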
-------------------------------------
https://dev.thundercomm.com/gerrit/c/general/kernel/msm-4.19/+/186449
Jenkins
http://jenkins.thundercomm.com/job/VerifyBuild_for_IOT_6490/2864/
#33 Updated by CD LC 陶洪普 over 2 years ago
When the WiFi driver hits an SSR, do not call subsystem_restart
https://dev.thundercomm.com/gerrit/c/general/kernel/msm-4.19/+/186449
ftp://cdiot@192.168.87.46/Pre_figure/VerifyBuild/Pre_figure_turbox-c2130c-la1.1-qssi12-dev/20230308/202303082014-2864
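This build contains only the 186449 patch above (default reserve_boot=128M).
Developer self-test results:
1. After user data is filled completely, the system boots and WiFi cannot be turned on, but the system no longer reboots into 900E.
2. After freeing some space (I freed more than 1G here) and rebooting, the WiFi driver loads, and WiFi turns on and connects normally.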
#34 Updated by CD LC 陶洪普 over 2 years ago
- Due date changed from 2022-11-30 to 2023-03-10
- Assignee changed from CD LC 陶洪普 to 移动测试一组_CDTS 刘强
Hi 刘强,
Please see the previous comment. We have self-tested; please help verify the patched build.
Thanks!
#35 Updated by CD LC 陶洪普 over 2 years ago
- Status changed from ASSIGNED to NEED_INFO
#36 Updated by IoT scm over 2 years ago
ID | Project | Branch | Uploader |
186449 | general/kernel/msm-4.19 | Pre_figure_turbox-c2130c-la1.1-vendor-dev | taohp0107@thundersoft.com |
LC:WLAN: Skip calling subsystem_restart_dev if wlan ssr occurs If all user space is used, wlan ssr may cause. The system cannot start up, and calls subsystem_restart_dev to restart or enter 900E. At this situation, to ensure the system can start up, skip calling subsystem_restart_dev and ignore wlan ssr. After the user clears the user space and restarts the system, wlan can work normally. TC-RID: 1200-0400206 IssueID: TS-R-BUG-113610 Change-Id: I0fb8bbccde4b504168030fb732e0f8879f9ad079 |
#37 Updated by CD LC 陶洪普 over 2 years ago
- Status changed from NEED_INFO to RESOLVED
- % Done changed from 20 to 50
- Resolution changed from DUPLICATE to FIXED
The patch has been merged.
Please use builds from 20230311 onward. Thanks!
#38 Updated by IoT scm over 2 years ago
ID | Project | Branch | Uploader |
184466 | general/platform/vendor/qcom/kona | Pre_figure_turbox-c2130c-la1.1-vendor-dev | xu.xia@thundersoft.com |
SYSTEM:Stability: Forcibly reserve 300M storage space in UFS In order to prevent the userdata partition from being overwritten by users, causing the system to freeze or errors. Change the default reserved 128M to 300M. TC-RID: 1200-0800302 IssueID: TS-R-DF-113610 Change-Id: I251bb851a32a85abce2aa8422188399366e5b989 |
#39 Updated by IoT scm over 2 years ago
ID | Project | Branch | Uploader |
185097 | general/kernel/msm-4.19 | Pre_figure_turbox-c2130c-la1.1-vendor-dev | xu.xia@thundersoft.com |
SYSTEM:Stability: Forcibly reserve 300M storage space in UFS In order to prevent the userdata partition from being overwritten by users, causing the system to freeze or errors. Change the default reserved 128M to 300M. TC-RID: 1200-0800302 IssueID: TS-R-DF-113610 Change-Id: Ic2898649e0b0adbff313d486e77cadfcc1058f50 |
#40 Updated by 移动测试一组_CDTS 刘强 over 2 years ago
- Status changed from RESOLVED to VERIFIED
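Based on the verification results below: verification passed; closing for now
1. Filled 100G of storage, then downloaded apps via 应用宝 until the remaining space showed 0; 50 stress cycles, crash not reproduced
2. Freed storage, refilled until the remaining space showed 0, adb reboot; 42 stress cycles, crash not reproduced
3. Pushed video files and downloaded apps to fill storage, adb reboot; 40 stress cycles, crash not reproduced
4. With storage full, let the screen turn off into the screensaver and then woke it, adb reboot; 52 stress cycles, crash not reproduced
5. The stress script ran for 14 hours (19:00 on the 14th to 9:42 on the 15th) with no crash so far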
#41 Updated by 移动测试一组_CDTS 刘强 over 2 years ago
- Status changed from VERIFIED to CLOSED