Bug #118091
【IOVDEV-29251】【内部】【VC1】【0072】【台架】【EE环境】【副屏】副屏死机重启
Status: | CLOSED | Start date: | 2023-05-18 | |
---|---|---|---|---|
Priority: | Urgent | Due date: | 2023-05-26 | |
Assignee: | 物联网测试组_TSCD 王维 | % Done: | 100% | |
Category: | CD-FW | |||
Target version: | - | |||
Need_Info: | -- | Found Version: | 副屏:01.00.0095.C103 | |
Resolution: | FIXED | Degrated: | -- | |
Severity: | Critical | Verified Version: | ||
Reproducibility: | Occasionally | Fixed Version: | 2023-06-16 | |
Test Type: | ST | Root cause: | 问题原因:
系统卡顿、应用cpu占比高、内存不足 |
Description
详情
类型: 故障
状态:开放 (查看工作流)
优先级: A
解决结果:未解决
影响版本:
01.00.0095.C103
修复的版本:无
模块:
副屏
标签:无
Repetition Rate:
偶发
测试环境:台架手动测试
项目名:
VC1
描述
【环境信息】EE环境
【问题出现时间】:20230518 16:22
【开机次数】:第148次
【前提条件】:开启了小窗并处于亮屏
【问题操作步骤】:/
【实际结果】:手势无响应、点击所有类型应用无响应、已打开小窗无响应,刚开始点击操作和手势操作屏幕上显示点击动画,后续点击屏幕无任何响应,两分钟左右屏幕出现动画大概持续30s左右副屏重启成功(自动重启)
【期望结果】:/
【备注】:不知什么原因导致的崩溃死机,未人为重启,副屏自动重启
台架Log地址:https://ofs.human-horizons.com/#/download/index/Cn34jwuHzC4%3D
台架视频地址:副屏死机 https://ofs.human-horizons.com/#/download/index/TPuggNkDxb8%3D
副屏重启 https://ofs.human-horizons.com/#/download/index/Ewj0hj9VKZo%3D
【影响版本】:副屏:01.00.0095.C103
【VIN】:HRYTTESTVINMGM400
【硬件版本号】:/
【是否能恢复】:
日志请通过jira查看
History
#1 Updated by CD FW 曹覃刚 about 2 years ago
- Status changed from New to NEED_INFO
- Assignee changed from CD FW 曹覃刚 to 物联网测试组_TSCD 王维
请帮忙下载下日志,研发这边bug数量太多,只有一个账号可以下载日志,速度太慢,经常断连
#2 Updated by CDTS_TEST 王成 about 2 years ago
- Severity changed from Normal to Critical
#4 Updated by CD FW 曹覃刚 about 2 years ago
- Assignee changed from 物联网测试组_TSCD 王维 to CDTS_TEST 王成
Hi 王成
如飞书沟通
此日志是实时日志,只有18时的日志,问题发生时间为16时
需要客户抓取历史日志,谢谢
#5 Updated by CD FW 曹覃刚 about 2 years ago
Hi 王成
05/22在飞书上提供的最新日志,和3天前抓的日志一样
是实时日志,只有18时的日志,问题发生时间为16时
#6 Updated by 物联网项目组-RD3_CDTS 周飞 about 2 years ago
- File 1684756574817.zip added
- File 1684756551834.zip added
- File 1684756533779.zip added
#7 Updated by CDTS_TEST 王成 about 2 years ago
- Status changed from NEED_INFO to ASSIGNED
- Assignee changed from CDTS_TEST 王成 to CD FW 曹覃刚
请用新添加的附件日志再次分析
#8 Updated by CD FW 曹覃刚 about 2 years ago
- Status changed from ASSIGNED to NEED_INFO
- Assignee changed from CD FW 曹覃刚 to CDTS_TEST 王成
Hi 王成
新提供的三份日志,时间是是0518/04 ~ 0518/15的
问题出现时间:20230518 16:22
已飞书联系PM
#9 Updated by 物联网项目组-RD3_CDTS 周飞 about 2 years ago
- File 29251.zip added
#10 Updated by CDTS_TEST 王成 about 2 years ago
- Fixed Version set to 2023-05-26
#11 Updated by CDTS_TEST 王成 about 2 years ago
- Due date set to 2023-05-26
#12 Updated by CDTS_TEST 王成 about 2 years ago
- Status changed from NEED_INFO to ASSIGNED
- Assignee changed from CDTS_TEST 王成 to CD FW 曹覃刚
HI,覃刚,
麻烦用周飞新加的日志分析
#13 Updated by CD FW 曹覃刚 about 2 years ago
■ 我的分析
05-18 16:24:37.470 1532 1620 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (android.fg), Blocked in handler on main thread (main), Blocked in handler on ActivityManager (ActivityManager)
05-18 16:24:37.470 1532 1620 W Watchdog: android.fg annotated stack trace:
05-18 16:24:37.470 1532 1620 W Watchdog: at com.android.server.am.ActivityManagerService.monitor(ActivityManagerService.java:15167)
05-18 16:24:37.471 1532 1620 W Watchdog: - waiting to lock <0x0429d37b> (a com.android.server.am.ActivityManagerService)
05-18 16:24:37.471 1532 1620 W Watchdog: at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:275)
05-18 16:24:37.471 1532 1620 W Watchdog: at android.os.Handler.handleCallback(Handler.java:938)
05-18 16:24:37.471 1532 1620 W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:99)
05-18 16:24:37.471 1532 1620 W Watchdog: at android.os.Looper.loopOnce(Looper.java:201)
05-18 16:24:37.471 1532 1620 W Watchdog: at android.os.Looper.loop(Looper.java:288)
05-18 16:24:37.471 1532 1620 W Watchdog: at android.os.HandlerThread.run(HandlerThread.java:67)
05-18 16:24:37.471 1532 1620 W Watchdog: at com.android.server.ServiceThread.run(ServiceThread.java:44)
1.05-18 16:24:37 WATCHDOG KILLING SYSTEM PROCESS 系统进程被杀了
2.一两分钟前system_server发生anr导致系统进程被杀, anr_2023-05-18-16-23-02-594和anr_2023-05-18-16-24-02-848
3.两份anr发生原因相同,都是因为下面的线程拿了锁一直没释放导致,这个从堆栈信息看,是在做socket通信
"Binder:1532_15" prio=5 tid=154 Native
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x133811b8 self=0xb400007429b47910
| sysTid=7030 nice=-2 cgrp=default sched=0/0 handle=0x71b75a2cb0
| state=S schedstat=( 485048913 363499895 2341 ) utm=36 stm=11 core=0 HZ=100
| stack=0x71b74ab000-0x71b74ad000 stackSize=991KB
| held mutexes=
native: #00 pc 00000000000a28a8 /apex/com.android.runtime/lib64/bionic/libc.so (_sendmsg+8)
native: #01 pc 000000000001269c /system/lib64/libbase.so (android::base::SendFileDescriptorVector(android::base::borrowed_fd, void const*, unsigned long, std::_1::vector<int, std::__1::allocator<int> > const&)+412)
native: #02 pc 000000000015c00c /system/lib64/libandroid_runtime.so (android::socket_write_all(_JNIEnv*, _jobject*, int, void*, unsigned long)+220)
native: #03 pc 000000000015b8d8 /system/lib64/libandroid_runtime.so (android::socket_writeba(_JNIEnv*, _jobject*, _jbyteArray*, int, int, _jobject*)+440)
at android.net.LocalSocketImpl.writeba_native(Native method)
at android.net.LocalSocketImpl.access$500(LocalSocketImpl.java:37)
at android.net.LocalSocketImpl$SocketOutputStream.write(LocalSocketImpl.java:144)
- locked <0x01c79050> (a java.lang.Object)
at com.android.server.am.LmkdConnection.write(LmkdConnection.java:267)
- locked <0x0dd10249> (a java.lang.Object)
at com.android.server.am.LmkdConnection.exchange(LmkdConnection.java:293)
at com.android.server.am.ProcessList.writeLmkd(ProcessList.java:1539)
at com.android.server.am.ProcessList.setOomAdj(ProcessList.java:1454)
at com.android.server.am.OomAdjuster.applyOomAdjLSP(OomAdjuster.java:2760)
at com.android.server.am.OomAdjuster.updateAndTrimProcessLSP(OomAdjuster.java:1262)
at com.android.server.am.OomAdjuster.updateOomAdjInnerLSP(OomAdjuster.java:1036)
at com.android.server.am.OomAdjuster.performUpdateOomAdjPendingTargetsLocked(OomAdjuster.java:907)
- locked <0x0df01a9e> (a com.android.server.am.ActivityManagerProcLock)
at com.android.server.am.OomAdjuster.updateOomAdjPendingTargetsLocked(OomAdjuster.java:887)
at com.android.server.am.ActivityManagerService.updateOomAdjPendingTargetsLocked(ActivityManagerService.java:14673)
at com.android.server.am.ActiveServices.sendServiceArgsLocked(ActiveServices.java:4228)
at com.android.server.am.ActiveServices.bringUpServiceLocked(ActiveServices.java:3873)
at com.android.server.am.ActiveServices.startServiceInnerLocked(ActiveServices.java:1136)
at com.android.server.am.ActiveServices.startServiceInnerLocked(ActiveServices.java:925)
at com.android.server.am.ActiveServices.startServiceLocked(ActiveServices.java:823)
at com.android.server.am.ActiveServices.startServiceLocked(ActiveServices.java:651)
at com.android.server.am.ActivityManagerService.startService(ActivityManagerService.java:11981)
- locked <0x0429d37b> (a com.android.server.am.ActivityManagerService)
at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:2519)
at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2528)
at android.os.Binder.execTransactInternal(Binder.java:1179)
at android.os.Binder.execTransact(Binder.java:1143)
■ 下一步计划
socket通信一直没响应,暂时对这个socket没有想法,拉下组内的开发一起讨论下这个下一步计划
#14 Updated by 物联网项目组-RD3_CDTS 周飞 about 2 years ago
起服务的线程一直没有响应,导致system_server发生anr导致系统进程被杀。下一步:起服务的线程一直没有响应的原因
#15 Updated by CD FW 曹覃刚 about 2 years ago
- Assignee changed from CD FW 曹覃刚 to CD FW周平
#16 Updated by CD FW周平 about 2 years ago
从两份ANR的log看,一直在等待锁(0x0df01a9e),这个锁一直没有被释放,导致了AMS卡死,触发了看门狗。
at com.android.server.am.OomAdjuster.updateAndTrimProcessLSP(OomAdjuster.java:1262)
at com.android.server.am.OomAdjuster.updateOomAdjInnerLSP(OomAdjuster.java:1036)
at com.android.server.am.OomAdjuster.performUpdateOomAdjPendingTargetsLocked(OomAdjuster.java:907)
- locked <0x0df01a9e> (a com.android.server.am.ActivityManagerProcLock)
at com.android.server.am.OomAdjuster.updateOomAdjPendingTargetsLocked(OomAdjuster.java:887)
at com.android.server.am.ActivityManagerService.updateOomAdjPendingTargetsLocked(ActivityManagerService.java:14673)
at com.android.server.am.ActiveServices.sendServiceArgsLocked(ActiveServices.java:4228)
Next action:
调查为什么锁一直没有被释放。
#17 Updated by CD FW周平 about 2 years ago
结论: 系统memory被耗尽,所以导致了socket一直通信不成功,从而导致了AMS卡死后被看门狗杀死,系统重启。
分析:
1, 在anr_2023-05-18-16-23-02-594 中,我们看到:
Total number of allocations 5898276 ---》 分配的对象有5898276
Total bytes allocated 357MB
Total bytes freed 321MB
Free memory 21MB ---》 剩余的memory
Free memory until GC 21MB
Free memory until OOME 476MB
Total memory 57MB
Max memory 512MB
2, 在anr_2023-05-18-16-24-02-848 中,我们看到:
Total number of allocations 6019231 ---》 分配的对象有6019231
Total bytes allocated 364MB
Total bytes freed 321MB
Free memory 14MB ---》 剩余的memory
Free memory until GC 14MB
Free memory until OOME 469MB
Total memory 57MB
Max memory 512MB
3, 在1分钟内,分配的对象从 5898276 增加到了 6019231, 剩余内存从21MB 减少到了14MB.
说明系统有大量的内存被占用而没有释放。
4, 从两份ANR trace我们都看到 所有的线程都在等 tid=154 这个线程。
这个线程拿了多个锁没有释放,而这个线程是想和lmkd进行socket 通信。
但是一直没有响应。
0x0429d37b 被 android.fg 线程等待。
0x0df01a9e 被 main 线程等待。
"Binder:1532_15" prio=5 tid=154 Native
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x133811b8 self=0xb400007429b47910
| sysTid=7030 nice=-2 cgrp=default sched=0/0 handle=0x71b75a2cb0
| state=S schedstat=( 485150267 363499895 2343 ) utm=36 stm=11 core=0 HZ=100
| stack=0x71b74ab000-0x71b74ad000 stackSize=991KB
| held mutexes=
native: #00 pc 00000000000a28a8 /apex/com.android.runtime/lib64/bionic/libc.so (_sendmsg+8)
native: #01 pc 000000000001269c /system/lib64/libbase.so (android::base::SendFileDescriptorVector(android::base::borrowed_fd, void const*, unsigned long, std::_1::vector<int, std::__1::allocator<int> > const&)+412)
native: #02 pc 000000000015c00c /system/lib64/libandroid_runtime.so (android::socket_write_all(_JNIEnv*, _jobject*, int, void*, unsigned long)+220)
native: #03 pc 000000000015b8d8 /system/lib64/libandroid_runtime.so (android::socket_writeba(_JNIEnv*, _jobject*, _jbyteArray*, int, int, _jobject*)+440)
at android.net.LocalSocketImpl.writeba_native(Native method)
at android.net.LocalSocketImpl.access$500(LocalSocketImpl.java:37)
at android.net.LocalSocketImpl$SocketOutputStream.write(LocalSocketImpl.java:144)
- locked <0x01c79050> (a java.lang.Object)
at com.android.server.am.LmkdConnection.write(LmkdConnection.java:267) --------------------》 LmkdConnection想和 lmkd 通信,但是一直没有返回
- locked <0x0dd10249> (a java.lang.Object)
at com.android.server.am.LmkdConnection.exchange(LmkdConnection.java:293)
at com.android.server.am.ProcessList.writeLmkd(ProcessList.java:1539)
at com.android.server.am.ProcessList.setOomAdj(ProcessList.java:1454)
at com.android.server.am.OomAdjuster.applyOomAdjLSP(OomAdjuster.java:2760)
at com.android.server.am.OomAdjuster.updateAndTrimProcessLSP(OomAdjuster.java:1262)
at com.android.server.am.OomAdjuster.updateOomAdjInnerLSP(OomAdjuster.java:1036)
at com.android.server.am.OomAdjuster.performUpdateOomAdjPendingTargetsLocked(OomAdjuster.java:907)
- locked <0x0df01a9e> (a com.android.server.am.ActivityManagerProcLock) -------------------》 被 main 线程等待。
at com.android.server.am.OomAdjuster.updateOomAdjPendingTargetsLocked(OomAdjuster.java:887)
at com.android.server.am.ActivityManagerService.updateOomAdjPendingTargetsLocked(ActivityManagerService.java:14673)
at com.android.server.am.ActiveServices.sendServiceArgsLocked(ActiveServices.java:4228)
at com.android.server.am.ActiveServices.bringUpServiceLocked(ActiveServices.java:3873)
at com.android.server.am.ActiveServices.startServiceInnerLocked(ActiveServices.java:1136)
at com.android.server.am.ActiveServices.startServiceInnerLocked(ActiveServices.java:925)
at com.android.server.am.ActiveServices.startServiceLocked(ActiveServices.java:823)
at com.android.server.am.ActiveServices.startServiceLocked(ActiveServices.java:651)
at com.android.server.am.ActivityManagerService.startService(ActivityManagerService.java:11981)
- locked <0x0429d37b> (a com.android.server.am.ActivityManagerService) ------》 android.fg 线程等待。
at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:2519)
at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2528)
at android.os.Binder.execTransactInternal(Binder.java:1179)
at android.os.Binder.execTransact(Binder.java:1143)
5,从上面的日志看,AMS是想去触发lmk执行一些操作,但是没有得到响应,导致了死锁。
Next action:
调查AMS 想调用lmk执行的具体操作。
#18 Updated by CD FW周平 about 2 years ago
从代码上看,当系统内存不足的时候,AMS去根据内存使用情况和应用程序的类型来调整进程的内存使用权重和占用比例。
然后通过updateOomAdjPendingTargetsLocked来更新进程的adj,这个时候会通过socket把以下内容发送到lmkd
buf.putInt(LMK_PROCPRIO);
buf.putInt(pid);
buf.putInt(uid);
buf.putInt(amt);
但是这个时候socket通信失败,导致了该方法被block了。
next action:
打开AMS OOM相关的log开关,编译VB,重新跑monkey, 获取log
#19 Updated by CD FW 曹覃刚 about 2 years ago
- Assignee changed from CD FW周平 to CD-DEV 唐军
#20 Updated by CD FW 曹覃刚 about 2 years ago
唐军的分析进展
当前状态: 不好精确定位到根源.
1, 系统有点慢, 启动进程超时:
./bk/logcat/000148_persist_00235_230518_162041.log:513:05-18 16:20:42.669 1532 1637 I ActivityManager: Killing 8186:com.kwai.m2u/u0a97 (adj -10000): start timeout
./bk/logcat/000148_persist_00235_230518_162041.log:514:05-18 16:20:42.669 1532 1637 I am_kill : [0,8186,com.kwai.m2u,-10000,start timeout]
./bk/logcat/000148_persist_00235_230518_162041.log:1014:05-18 16:20:43.739 1532 1637 I ActivityManager: Killing 8199:com.taobao.taobao/u0a96 (adj -10000): start timeout
./bk/logcat/000148_persist_00235_230518_162041.log:1015:05-18 16:20:43.739 1532 1637 I am_kill : [0,8199,com.taobao.taobao,-10000,start timeout]
./bk/logcat/000148_persist_00235_230518_162041.log:1020:05-18 16:20:43.743 1532 1637 I ActivityManager: Killing 8200:com.achievo.vipshop/u0a146 (adj -10000): start timeout
./bk/logcat/000148_persist_00235_230518_162041.log:1021:05-18 16:20:43.743 1532 1637 I am_kill : [0,8200,com.achievo.vipshop,-10000,start timeout]
./bk/logcat/000148_persist_00235_230518_162041.log:1029:05-18 16:20:43.749 1532 1637 I ActivityManager: Killing 8201:cn.kuwo.player/u0a151 (adj -10000): start timeout
./bk/logcat/000148_persist_00235_230518_162041.log:1030:05-18 16:20:43.749 1532 1637 I am_kill : [0,8201,cn.kuwo.player,-10000,start timeout]
./bk/logcat/000148_persist_00237_230518_162050.log:1090:05-18 16:20:52.920 1532 1637 I ActivityManager: Killing 8248:com.kwai.m2u/u0a97 (adj -10000): start timeout
./bk/logcat/000148_persist_00237_230518_162050.log:1091:05-18 16:20:52.920 1532 1637 I am_kill : [0,8248,com.kwai.m2u,-10000,start timeout]
./bk/logcat/000148_persist_00238_230518_162056.log:500:05-18 16:20:59.385 1532 1637 I ActivityManager: Killing 8282:com.taobao.taobao/u0a96 (adj -10000): start timeout
./bk/logcat/000148_persist_00238_230518_162056.log:501:05-18 16:20:59.385 1532 1637 I am_kill : [0,8282,com.taobao.taobao,-10000,start timeout]
./bk/logcat/000148_persist_00238_230518_162056.log:506:05-18 16:20:59.391 1532 1637 I ActivityManager: Killing 8283:com.achievo.vipshop/u0a146 (adj -10000): start timeout
./bk/logcat/000148_persist_00238_230518_162056.log:507:05-18 16:20:59.391 1532 1637 I am_kill : [0,8283,com.achievo.vipshop,-10000,start timeout]
./bk/logcat/000148_persist_00238_230518_162056.log:512:05-18 16:20:59.397 1532 1637 I ActivityManager: Killing 8284:cn.kuwo.player/u0a151 (adj -10000): start timeout
./bk/logcat/000148_persist_00238_230518_162056.log:513:05-18 16:20:59.397 1532 1637 I am_kill : [0,8284,cn.kuwo.player,-10000,start timeout]
./bk/logcat/000148_persist_00238_230518_162056.log:548:05-18 16:20:59.768 1532 1637 I ActivityManager: Killing 8296:com.baidu.netdisk/u0a98 (adj -10000): start timeout
./bk/logcat/000148_persist_00238_230518_162056.log:549:05-18 16:20:59.768 1532 1637 I am_kill : [0,8296,com.baidu.netdisk,-10000,start timeout]
./bk/logcat/000148_persist_00239_230518_162108.log:1120:05-18 16:21:16.763 1532 1637 I ActivityManager: Killing 84
2, updateAndTrimProcessLSP里在遍历LRU list, 历史app太多会导致ActivityManagerProcLock锁长时间占用, 进而导致WatchDog.
下一步: WatchDog前没有整个系统的dump或者bugreport, 这导致无法精准了解WatchDog前系统资源的状态或者趋势.
我们可以修改代码在WatchDog前做系统dump.
#21 Updated by CD FW王武军 about 2 years ago
1、5月31日最新进展
(1)归纳总结之前的分析
根据日志信息和anr文件,之前的同事们的分析结论大致是:
1)出现问题时系统很慢;
2)出现anr时,系统内存很低;
3)system_server线程被阻塞,引起发了watchdog超时,并重启
之前的策略是想在oom和出现watchdog超时时加日志和dump系统,这个应该是继续分析和定位问题的一个方向,
这个需要复现才能进行下一步,那么怎么复现和还原问题的发生是一个问题?
(2)结合之前大佬的分析,我自己做了些分析
我认为触发watchdog引起重启,是系统变慢和低内存的最后表现,我想:是从什么操作或者状态引起了系统当前的低内存和变慢的角度,继续分析这个问题。
下面一段日志:
11-11 00:01:43.447 1532 1532 I am_kill : [0,3110,com.android.keychain,999,empty #31]
11-11 00:01:48.716 1532 1532 I am_kill : [0,3337,android.process.acore,999,empty #31]
11-11 00:01:56.443 1532 1638 I am_proc_start: [0,5995,10102,com.kwai.videoeditor,pre-top-activity,{com.kwai.videoeditor/com.kwai.videoeditor.activity.MainActivity}]
11-11 00:01:57.240 1532 1638 I am_proc_start: [0,6312,10102,com.kwai.videoeditor:messagesdk,service,{com.kwai.videoeditor/com.kwai.chat.kwailink.service.NewKwaiLinkService}]
11-11 00:01:57.538 1532 1638 I am_proc_start: [0,6403,10102,com.kwai.videoeditor:push_v3,service,{com.kwai.videoeditor/com.kwai.video.push.PushService}]
11-11 00:02:19.143 1532 1638 I am_proc_start: [0,6878,10107,com.UCMobile,pre-top-activity,{com.UCMobile/com.UCMobile.main.UCMobile}]
11-11 00:02:31.805 1532 1638 I am_proc_start: [0,7117,10035,com.microsoft.emmx,top-activity,{com.microsoft.emmx/com.microsoft.ruby.Main}]
11-11 00:02:32.394 1532 1638 I am_proc_start: [0,7217,90000,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:0,,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService0:0}]
11-11 00:02:32.707 1532 1638 I am_proc_start: [0,7246,10035,com.microsoft.emmx:privileged_process0,service,{com.microsoft.emmx/org.chromium.content.app.PrivilegedProcessService0}]
11-11 00:02:32.912 1532 1638 I am_proc_start: [0,7316,90001,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:1,,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService0:1}]
11-11 00:02:33.127 1532 1638 I am_proc_start: [0,7347,90002,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:2,,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService0:2}]
11-11 00:02:35.365 1532 6847 I am_kill : [0,7347,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:2,0,isolated not needed]
11-11 00:02:35.377 1532 6847 I am_kill : [0,7217,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:0,0,isolated not needed]
11-11 00:02:35.406 1532 2192 I am_kill : [0,7316,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:1,0,isolated not needed]
11-11 00:05:10.171 1532 1638 I am_proc_start: [0,7659,90000,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:3,,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService0:3}]
11-11 00:05:10.176 1532 1638 I am_proc_start: [0,7660,90001,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:4,,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService0:4}]
11-11 00:05:10.182 1532 1638 I am_proc_start: [0,7666,90002,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:5,,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService0:5}]
05-18 16:19:38.210 1532 2192 I am_kill : [0,7660,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:4,0,isolated not needed]
05-18 16:20:32.684 1532 1638 I am_proc_start: [0,8186,10097,com.kwai.m2u,pre-top-activity,{com.kwai.m2u/com.kwai.m2u.launch.M2uRouterActivity}]
05-18 16:20:33.803 1532 1638 I am_proc_start: [0,8199,10096,com.taobao.taobao,NULL,]
05-18 16:20:33.810 1532 1638 I am_proc_start: [0,8200,10146,com.achievo.vipshop,NULL,]
05-18 16:20:33.816 1532 1638 I am_proc_start: [0,8201,10151,cn.kuwo.player,NULL,]
05-18 16:20:42.669 1532 1637 I am_kill : [0,8186,com.kwai.m2u,-10000,start timeout]
05-18 16:20:42.997 1532 1638 I am_proc_start: [0,8248,10097,com.kwai.m2u,pre-top-activity,{com.kwai.m2u/com.kwai.m2u.launch.M2uRouterActivity}]
05-18 16:20:43.739 1532 1637 I am_kill : [0,8199,com.taobao.taobao,-10000,start timeout]
05-18 16:20:43.743 1532 1637 I am_kill : [0,8200,com.achievo.vipshop,-10000,start timeout]
05-18 16:20:43.749 1532 1637 I am_kill : [0,8201,cn.kuwo.player,-10000,start timeout]
05-18 16:20:49.446 1532 1638 I am_proc_start: [0,8282,10096,com.taobao.taobao,NULL,]
05-18 16:20:49.452 1532 1638 I am_proc_start: [0,8283,10146,com.achievo.vipshop,NULL,]
05-18 16:20:49.459 1532 1638 I am_proc_start: [0,8284,10151,cn.kuwo.player,NULL,]
05-18 16:20:49.830 1532 1638 I am_proc_start: [0,8296,10098,com.baidu.netdisk,pre-top-activity,{com.baidu.netdisk/com.baidu.netdisk.ui.DefaultMainActivity}]
05-18 16:20:50.320 1532 6847 I am_mem_factor: [3,0]
05-18 16:20:52.920 1532 1637 I am_kill : [0,8248,com.kwai.m2u,-10000,start timeout]
05-18 16:20:59.385 1532 1637 I am_kill : [0,8282,com.taobao.taobao,-10000,start timeout]
05-18 16:20:59.391 1532 1637 I am_kill : [0,8283,com.achievo.vipshop,-10000,start timeout]
05-18 16:20:59.397 1532 1637 I am_kill : [0,8284,cn.kuwo.player,-10000,start timeout]
05-18 16:20:59.768 1532 1637 I am_kill : [0,8296,com.baidu.netdisk,-10000,start timeout]
05-18 16:21:03.495 1532 1617 I am_mem_factor: [1,3]
05-18 16:21:03.610 1532 3185 I am_kill : [0,7666,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:5,0,isolated not needed]
05-18 16:21:03.654 1532 2960 I am_kill : [0,7659,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:3,0,isolated not needed]
05-18 16:21:07.859 1532 1638 I am_proc_start: [0,8428,90003,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:6,,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService0:6}]
05-18 16:21:07.937 1532 1638 I am_proc_start: [0,8432,90004,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:7,,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService0:7}]
05-18 16:21:08.018 1532 1638 I am_proc_start: [0,8448,90005,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:8,,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService0:8}]
05-18 16:21:11.834 1532 1638 I am_proc_start: [0,8499,90006,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:9,,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService0:9}]
05-18 16:21:11.909 1532 1638 I am_proc_start: [0,8501,90007,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:10,,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService0:10}]
05-18 16:21:11.983 1532 1638 I am_proc_start: [0,8504,90008,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:11,,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService0:11}]
05-18 16:21:16.763 1532 1637 I am_kill : [0,8428,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:6,-10000,start timeout]
05-18 16:21:16.780 1532 1638 I am_proc_start: [0,8537,99001,com.microsoft.emmx:sandboxed_process1:org.chromium.content.app.SandboxedProcessService1:6,service,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService1:6}]
05-18 16:21:16.840 1532 1637 I am_kill : [0,8432,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:7,-10000,start timeout]
05-18 16:21:16.848 1532 1638 I am_proc_start: [0,8538,99002,com.microsoft.emmx:sandboxed_process1:org.chromium.content.app.SandboxedProcessService1:7,service,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService1:7}]
05-18 16:21:16.920 1532 1637 I am_kill : [0,8448,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:8,-10000,start timeout]
05-18 16:21:16.926 1532 1638 I am_proc_start: [0,8539,99003,com.microsoft.emmx:sandboxed_process1:org.chromium.content.app.SandboxedProcessService1:8,service,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService1:8}]
05-18 16:21:21.743 1532 1637 I am_kill : [0,8499,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:9,-10000,start timeout]
05-18 16:21:21.756 1532 1638 I am_proc_start: [0,8551,99004,com.microsoft.emmx:sandboxed_process1:org.chromium.content.app.SandboxedProcessService1:9,service,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService1:9}]
05-18 16:21:21.818 1532 1637 I am_kill : [0,8501,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:10,-10000,start timeout]
05-18 16:21:21.823 1532 1638 I am_proc_start: [0,8552,99005,com.microsoft.emmx:sandboxed_process1:org.chromium.content.app.SandboxedProcessService1:10,service,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService1:10}]
05-18 16:21:21.886 1532 1638 I am_proc_start: [0,8553,99006,com.microsoft.emmx:sandboxed_process1:org.chromium.content.app.SandboxedProcessService1:11,service,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService1:11}]
05-18 16:21:21.893 1532 1637 I am_kill : [0,8504,com.microsoft.emmx:sandboxed_process0:org.chromium.content.app.SandboxedProcessService0:11,-10000,start timeout]
05-18 16:21:27.715 1532 1637 I am_kill : [0,8537,com.microsoft.emmx:sandboxed_process1:org.chromium.content.app.SandboxedProcessService1:6,-10000,start timeout]
05-18 16:21:27.781 1532 1637 I am_kill : [0,8538,com.microsoft.emmx:sandboxed_process1:org.chromium.content.app.SandboxedProcessService1:7,-10000,start timeout]
05-18 16:21:27.859 1532 1637 I am_kill : [0,8539,com.microsoft.emmx:sandboxed_process1:org.chromium.content.app.SandboxedProcessService1:8,-10000,start timeout]
05-18 16:21:30.846 1532 1638 I am_proc_start: [0,8578,99007,com.microsoft.emmx:sandboxed_process1:org.chromium.content.app.SandboxedProcessService1:12,service,{com.microsoft.emmx/org.chromium.content.app.SandboxedProcessService1:12}]
05-18 16:21:31.708 1532 1637 I am_kill : [0,8551,com.microsoft.emmx:sandboxed_process1:org.chromium.content.app.SandboxedProcessService1:9,-10000,start timeout]
05-18 16:21:31.773 1532 1637 I am_kill : [0,8552,com.microsoft.emmx:sandboxed_process1:org.chromium.content.app.SandboxedProcessService1:10,-10000,start timeout]
05-18 16:21:31.836 1532 1637 I am_kill : [0,8553,com.microsoft.emmx:sandboxed_process1:org.chromium.content.app.SandboxedProcessService1:11,-10000,start timeout]
05-18 16:21:40.785 1532 1637 I am_kill : [0,8578,com.microsoft.emmx:sandboxed_process1:org.chromium.content.app.SandboxedProcessService1:12,-10000,start timeout]
这部分日志是发生anr前一段时间的信息,从日志中可以发现:
1)系统当时不停的启动应用,但是又被系统杀除了,应用又重启启动,如此反复了几次,特别是几个应用:
com.kwai.m2u、com.microsoft.emmx、com.microsoft.emmx、com.microsoft.emmx
2)在日志中可以明显看到系统在启动com.kwai.m2u和com.baidu.netdisk时有明显的内存变化
05-18 16:20:50.320 1532 6847 I am_mem_factor: [3,0]
05-18 16:21:03.495 1532 1617 I am_mem_factor: [1,3]
系统内存从正常变为非常严重的紧缺,当进程被杀时又回复了相对正常,如果应用如此反复重启,那么系统的内存就会反复的经历:
从正常到紧缺的过程,必然会引起系统内存的剧烈抖动,这个信息从侧面也反映了发生anr时,内存很低的原因(个别应用启动使用了大量的内存,使系统内存告急)
小结:票中问题的根本原因,我怀疑是因为启动了某些应用,这些应用启动又占用大量的内存,导致系统不能接受,应用就被杀,但是应用还在尝试启动,如此反复使系统
内存告急、系统变慢。
2、下一步
1)因为获取到的日志信息,与发生anr时比较近的信息只到了:05-18 16:19:13.798,更早一点的信息没有看到,比如:
05-18 16:10:13 到 05-18 16:19:13.798 系统的状态是怎样,没有日志可以跟踪,如果能拿到那段时间的日志,可能对anr前10-20分钟的系统情况有一个相对准确的了解。
@测试 王成 是否可以再帮忙找找。
2)按照我上面的怀疑点,我自己做下针对性的测试:将票中涉及的这几个应用,在本地进行测试,看能否复现问题的现象。
#22 Updated by CD FW王武军 about 2 years ago
1、6月1日最新进展
1)找王成帮忙,重新从服务器上获取了 05-18 15:58:25.260 to 05-18 16:19:13.798
这个时间段的日志包。很遗憾日志中没有包含这个时间段的日志信息。
2)通过测试王维联系了与华人运通的测试确认了一些当时的测试信息
1)在发现问题之前,测试是在进行小窗和手势相关的测试,并持续了一段时间;
2)在发生问题之前的20分钟内,他没有做任何操作,16点19的样子开始操作副屏,副屏就开始卡顿,慢慢的没了响应
2、下一步
1、结合日志和测试的情景还原,准备在本地做一个测试,以确认问题是否与开启应用有关系
#24 Updated by CD FW王武军 about 2 years ago
1、6月5日最新进展
1)自己本地测试
com.microsoft.emmx 微软浏览器
com.UCMobile UC浏览器
com.kwai.videoeditor 快影
com.baidu.input 百度输入法
com.tencent.mobileqq qq软件
com.tencent.android.qqdownloader 应用宝
com.taobao.taobao 淘宝
com.kwai.m2u 一甜相机
com.achievo.vipshop 唯品会
cn.kuwo.player 酷我音乐
com.baidu.netdisk 百度网盘
使用这些应用进行了测试,经过反复测试发现:
a)发现存在内存的变化,在某个时候启动其中的一些应用会存在内存验证紧缺的发生,这个信息与票中的问题可以对应上
05-28 07:10:17.234 1079 23939 I am_mem_factor: [3,0]
05-28 07:10:18.018 1079 23939 I am_mem_factor: [2,3]
05-28 07:10:18.580 1079 23939 I am_mem_factor: [1,2]
b)随着测试的反复进行,会明显感觉到一些应用启动变慢,并且存在anr的情况
2)集合本地的测试情况和之前与测试的沟通信息
重新分析15:00-16:20之前的这段日志信息,看能否找到除打开特定应用而出现的其他影响系统性能的日志
小结:本地自测怀疑的应用,发现这些应用的运行确实会引起内存紧缺的情况
2、下一步
1)继续分析15:00-16:20之前的这段日志信息,看能否找到除开特定应用的其他引起系统性能问题的日志信息。
#25 Updated by CD FW王武军 about 2 years ago
6月6月最新进展
目前分析小结:
1、继续分析了15:00-16:20的日志
1) 15:00-15:58
这个时间段测试在做关于小窗的和手势的功能验证,从日志信息中未发现有明显异常的情况(未出现内存紧缺、系统卡顿),
虽然存在一些进程kill的日志,应该都是正常的系统调度;
2) 15:58-16:19
这个时间段的日志是缺失的,无法准备定位该阶段系统具体的状态(这个阶段应该是分析问题很重要的阶段)
3)16:20:16:25
出现问题的阶段,这个阶段在启动应用时已经出现启动超时,导致进程被杀除的问题,
启动部分应用时又出现内存验证紧缺,进而触发lmkd机制去处理进程的逻辑,这个时候因为系统卡顿,导致lmkd的socket请求超时,
进而触发Watchdog超时杀除了system_server,系统重启。
4)自己测试的情况,测试票中涉及的部分应用,来回启动和切换,会发生内存紧缺的情况
2、综合来看
1)部分应用在启动时它本身就存在消耗很大内存的情况,如果系统这个时候因为某些原因已经卡顿,那么内存的紧缺将会是压死
系统的最后一根稻草,但是系统是什么原因出现卡顿(日志的缺失,没办法判断15:58-16:19这个时间段系统的情况)。
2)为了应对上面的情况,只有通过添加日志复现问题的方式来还原问题和追踪问题
3)日志patch的策略如下
https://dev.thundercomm.com/gerrit/c/general/platform/frameworks/base/+/205160
a)我打开了process、pss、oom、service、freeze、lru等日志开关,这些日志都是涉及内存、进程创建和杀除、服务等信息;
b)涉及到ActivityManager: Killing xxx start timeout 启动耗时进程被杀时,主动dump meminfo信息
3、下一步
1)用日志patch编译Jenkins版本,让测试同事帮忙测试和复现
http://10.0.76.24:8080/job/VerifyBuild_for_IOT_6490/4225/
测试要求:
为了还原场景测试过程大致分为3个阶段:
阶段一:小窗功能测试和手势滑动(上滑返回home等)
在这个阶段测试的应用大致是:QQ、本地文件、Edge、设置、酷我音乐
阶段二:小窗停留在Edge,不做任何操作等待大概20-30分钟
阶段三:等待回来后回来后点击启动应用,并且来回切换应用
启动的应用大致有:微软浏览器、UC浏览器、快影、qq、应用宝、 淘宝、一甜相机、唯品会、 酷我音乐、百度网盘
日志获取:
1)为了确保日志完整,需要关闭SELinux权限,操作如下(这个操作在开始测试,抓取日志前完成)
adb root;
adb shell;
然后执行 setenforce 0
关闭掉SELinux权限检查
如果不做这个操作部分日志是无法获取的会出现:Exception=Cannot run program "dumpsys": error=13, Permission denied
2)如果在测试过程复现了问题,在获取了日志后,再做下bugreport,拉取出zip文件
adb bugreport
#26 Updated by CD FW王武军 about 2 years ago
Hi,
@唐霞,麻烦你这边帮忙使用vb版本根据上一个调查状态的信息做下复现测试,谢谢。
测试要求:
为了还原场景测试过程大致分为3个阶段:
阶段一:小窗功能测试和手势滑动(上滑返回home等)
在这个阶段测试的应用大致是:QQ、本地文件、Edge、设置、酷我音乐
阶段二:小窗停留在Edge,不做任何操作等待大概20-30分钟
阶段三:等待回来后回来后点击启动应用,并且来回切换应用
启动的应用大致有:微软浏览器、UC浏览器、快影、qq、应用宝、 淘宝、一甜相机、唯品会、 酷我音乐、百度网盘
日志获取:
1)为了确保日志完整,需要关闭SELinux权限,操作如下(这个操作在开始测试,抓取日志前完成)
adb root;
adb shell;
然后执行 setenforce 0
关闭掉SELinux权限检查
如果不做这个操作部分日志是无法获取的会出现:Exception=Cannot run program "dumpsys": error=13, Permission denied
2)如果在测试过程复现了问题,在获取了日志后,再做下bugreport,拉取出zip文件
adb bugreport
vb版本:/Pre_figure/VerifyBuild/Pre_figure_turbox-c2130c-la1.1-qssi12-dev/20230606/202306061838-4225
#27 Updated by CD FW王武军 about 2 years ago
- Status changed from ASSIGNED to NEED_INFO
- Assignee changed from CD FW王武军 to 物联网测试组_CDTS 唐霞
#28 Updated by CD FW王武军 about 2 years ago
6月8日最新进展
1、分析了测试提供的日志
测试在添加了日志的vb版本做复现测试,目前测出的问题是system发生了anr,但是系统没有出现重启,现象与票上的问题描述不一样
针对测试提供的anr日志信息:
可以获取如下信息:
06-06 19:50:03.417 1143 8294 I intercept_power: [ACTION_DOWN,0,0]
06-06 19:50:03.418 1143 22925 E ProcessRecord: ANR in com.android.systemui
06-06 19:50:03.418 1143 22925 E ProcessRecord: CPU usage from 0ms to 33632ms later (2023-06-06 19:49:29.616 to 2023-06-06 19:50:03.247):
06-06 19:50:03.418 1143 22925 E ProcessRecord: 85% 21755/com.kwai.m2u: 15% user + 70% kernel / faults: 34423 minor 20 major
06-06 19:50:03.418 1143 22925 E ProcessRecord: 71% 20854/com.UCMobile: 25% user + 45% kernel / faults: 70935 minor 29 major
06-06 19:50:03.418 1143 22925 E ProcessRecord: 43% 16413/com.tmall.wireless: 23% user + 20% kernel / faults: 36874 minor 252 major
06-06 19:50:03.418 1143 22925 E ProcessRecord: 41% 22577/com.miHoYo.Yuanshen: 17% user + 24% kernel / faults: 147676 minor 28 major
06-06 19:50:03.418 1143 22925 E ProcessRecord: 32% 95/kswapd-1:0: 0% user + 32% kernel
06-06 19:50:03.418 1143 22925 E ProcessRecord: 20% 1143/system_server: 5.6% user + 15% kernel / faults: 27318 minor 2 major
06-06 19:50:03.418 1143 22925 E ProcessRecord: 82% TOTAL: 18% user + 57% kernel + 1% iowait + 2.6% irq + 2.3% softirq
06-06 19:50:03.418 1143 22925 E ProcessRecord: CPU usage from 470ms to 2492ms later (2023-06-06 19:49:30.086 to 2023-06-06 19:49:32.108):
06-06 19:50:03.418 1143 22925 E ProcessRecord: 88% 1143/system_server: 13% user + 75% kernel / faults: 2041 minor 1 major
06-06 19:50:03.418 1143 22925 E ProcessRecord: 76% 22925/AnrConsumer: 8.1% user + 68% kernel
06-06 19:50:03.418 1143 22925 E ProcessRecord: 6.8% 1730/android.bg: 3.7% user + 3.1% kernel
06-06 19:50:03.418 1143 22925 E ProcessRecord: 3.1% 3900/Binder:1143_D: 1.2% user + 1.8% kernel
06-06 19:50:03.418 1143 22925 E ProcessRecord: 1.8% 8294/Binder:1143_15: 1.2% user + 0.6% kernel
06-06 19:50:03.418 1143 22925 E ProcessRecord: 0.6% 1167/Jit thread pool: 0.6% user + 0% kernel
06-06 19:50:03.418 1143 22925 E ProcessRecord: 45% 95/kswapd-1:0: 0% user + 45% kernel
06-06 19:50:03.418 1143 22925 E ProcessRecord: 106% 21755/com.kwai.m2u: 32% user + 74% kernel / faults: 535 minor 2 major
06-06 19:50:03.418 1143 22925 E ProcessRecord: 40% 22521/ClynRsMonitor: 0% user + 40% kernel
06-06 19:50:03.418 1143 22925 E ProcessRecord: 91% 20854/com.UCMobile: 38% user + 53% kernel / faults: 2536 minor 2 major
06-06 19:50:03.418 1143 22925 E ProcessRecord: 17% 20854/com.UCMobile: 11% user + 6.6% kernel
06-06 19:50:03.418 1143 22925 E ProcessRecord: 9.9% 21137/WorkHandler: 4.4% user + 5.5% kernel
06-06 19:50:03.418 1143 22925 E ProcessRecord: 5.5% 20865/HeapTaskDaemon: 2.2% user + 3.3% kernel
06-06 19:50:03.418 1143 22925 E ProcessRecord: 52% 22577/com.miHoYo.Yuanshen: 35% user + 16% kernel / faults: 619 minor
06-06 19:50:03.418 1143 22925 E ProcessRecord: 17% 22788/UnityMain: 13% user + 4.1% kernel
06-06 19:50:03.418 1143 22925 E ProcessRecord: 11% 22927/UnityGfxDeviceW: 4.1% user + 6.8% kernel
06-06 19:50:03.418 1143 22925 E ProcessRecord: 1.3% 22597/Binder:22577_2: 0% user + 1.3% kernel
06-06 19:50:03.418 1143 22925 E ProcessRecord: 93% TOTAL: 22% user + 65% kernel + 0.6% iowait + 2.2% irq + 2.3% softirq
06-06 19:50:03.420 1143 8294 I intercept_power: [ACTION_UP,0,0]
06-06 19:50:03.421 1143 22925 D AnrHelper: Completed ANR of com.android.systemui in 33805ms, latency 9ms
06-06 19:50:28.972 1143 23561 E ProcessRecord: CPU usage from 2969ms to 22214ms ago (2023-06-06 19:50:03.247 to 2023-06-06 19:50:28.430):
06-06 19:50:28.972 1143 23561 E ProcessRecord: 103% 20854/com.UCMobile: 42% user + 61% kernel / faults: 102019 minor 14 major
06-06 19:50:28.972 1143 23561 E ProcessRecord: 88% 21755/com.kwai.m2u: 10% user + 78% kernel / faults: 22526 minor 32 major
06-06 19:50:28.972 1143 23561 E ProcessRecord: 50% 1143/system_server: 8.6% user + 41% kernel / faults: 47997 minor
06-06 19:50:28.972 1143 23561 E ProcessRecord: 36% 16413/com.tmall.wireless: 20% user + 15% kernel / faults: 11736 minor 6 major
06-06 19:50:28.972 1143 23561 E ProcessRecord: 26% 95/kswapd-1:0: 0% user + 26% kernel
06-06 19:50:28.972 1143 23561 E ProcessRecord: 16% 514/logd: 1.5% user + 15% kernel / faults: 100 minor
06-06 19:50:28.972 1143 23561 E ProcessRecord: 13% 7259/com.tencent.android.qqdownloader: 6.7% user + 6.5% kernel / faults: 800 minor
06-06 19:50:28.972 1143 23561 E ProcessRecord: 81% TOTAL: 17% user + 56% kernel + 2.2% iowait + 2.6% irq + 2.1% softirq
06-06 19:50:28.972 1143 23561 E ProcessRecord: CPU usage from 78ms to 1746ms later (2023-06-06 19:50:06.294 to 2023-06-06 19:50:07.962):
06-06 19:50:28.972 1143 23561 E ProcessRecord: 58% 1143/system_server: 5.4% user + 52% kernel / faults: 1210 minor
06-06 19:50:28.972 1143 23561 E ProcessRecord: 57% 23561/AnrConsumer: 5.4% user + 52% kernel
06-06 19:50:28.972 1143 23561 E ProcessRecord: 0.6% 8658/Binder:1143_1E: 0.6% user + 0% kernel
06-06 19:50:28.972 1143 23561 E ProcessRecord: 30% 95/kswapd-1:0: 0% user + 30% kernel
06-06 19:50:28.972 1143 23561 E ProcessRecord: 80% 21755/com.kwai.m2u: 14% user + 65% kernel / faults: 231 minor
06-06 19:50:28.972 1143 23561 E ProcessRecord: 27% 22984/pool-28-thread-: 12% user + 14% kernel
06-06 19:50:28.972 1143 23561 E ProcessRecord: 25% 22521/ClynRsMonitor: 0% user + 25% kernel
06-06 19:50:28.972 1143 23561 E ProcessRecord: 19% 22985/pool-28-thread-: 0% user + 19% kernel
06-06 19:50:28.972 1143 23561 E ProcessRecord: 4.8% 21963/worker-handler: 0% user + 4.8% kernel
06-06 19:50:28.972 1143 23561 E ProcessRecord: 61% 20854/com.UCMobile: 18% user + 43% kernel / faults: 461 minor 5 major
06-06 19:50:28.972 1143 23561 E ProcessRecord: 18% 23143/sSharedPreferen: 1.5% user + 16% kernel
06-06 19:50:28.972 1143 23561 E ProcessRecord: 7.6% 20854/com.UCMobile: 4.6% user + 3% kernel
06-06 19:50:28.972 1143 23561 E ProcessRecord: 6.1% 21623/PLWorker:worker: 4.6% user + 1.5% kernel
06-06 19:50:28.972 1143 23561 E ProcessRecord: 4.6% 20865/HeapTaskDaemon: 1.5% user + 3% kernel
06-06 19:50:28.972 1143 23561 E ProcessRecord: 41% 16413/com.tmall.wireless: 28% user + 13% kernel / faults: 546 minor
06-06 19:50:28.972 1143 23561 E ProcessRecord: 80% TOTAL: 11% user + 59% kernel + 5% iowait + 2.3% irq + 1.8% softirq
06-06 19:50:28.978 1143 23561 D AnrHelper: Completed ANR of system in 22762ms, latency 2ms
从上面日志可以看出,针对这些特定的应用进行测试时,出现了cpu占用很高的问题,应用的cpu占用很高,导致了系统资源紧缺,
进而触发system的anr。因为日志中没能抓取到anr的trace文件,不能看到system是在具体什么方法逻辑中发生anr。
分析小结:
测试的这些应用它们本身的启动和运行会占用大量的系统资源,包括cpu、内存等,这种情况就会引起系统卡顿和资源的调度慢。
2、测试的复现测试还在继续
看后续的测试会出现什么问题,然后再继续分析。
#29 Updated by 物联网测试组_CDTS 唐霞 about 2 years ago
- Status changed from NEED_INFO to ASSIGNED
- Assignee changed from 物联网测试组_CDTS 唐霞 to CD FW王武军
使用研发提供的VB版本按测试要求未复现 重启的现场,但出现anr和冻屏问题 日志已私发研发
#30 Updated by CD FW王武军 about 2 years ago
6月9日最新进展
1、分析测试在测试过程中发现的新问题-冻屏
a)冻屏时cpu的dump信息
Load: 9.39 / 10.53 / 15.29
CPU usage from 555967ms to 255850ms ago (2023-06-07 00:10:32.929 to 2023-06-07 00:15:33.046):
103% 16537/com.tencent.qqlive:cache: 51% user + 51% kernel / faults: 581937 minor
43% 12213/com.smile.gifmaker: 29% user + 14% kernel / faults: 17951 minor
39% 972/vendor.qti.hardware.display.composer-service: 12% user + 26% kernel / faults: 831 minor
35% 26917/com.tmall.wireless: 24% user + 10% kernel / faults: 51376 minor 4 major
32% 1068/surfaceflinger: 13% user + 19% kernel / faults: 259 minor
28% 18734/com.tencent.mtt: 20% user + 7.7% kernel / faults: 224191 minor
18% 1170/system_server: 4.2% user + 14% kernel / faults: 39102 minor 2 major
14% 1280/cnss_diag: 1% user + 13% kernel / faults: 1 minor
10% 1618/llkd: 0.6% user + 9.7% kernel
b)冻屏时Activity的dump信息
ACTIVITY MANAGER ACTIVITIES (dumpsys activity activities)
home com.android.launcher3
快手 com.smile.gifmaker
快影 com.kwai.videoeditor
王者荣耀 com.tencent.tmgp.sgame
c)冻屏时操作屏幕,获取到的input日志
06-07 01:28:36.033 1170 5343 I InputDispatcher: Monitor swipe-up (server) is stealing touch from [9fef45e com.android.settings/com.android.settings.homepage.SettingsHomepageActivity (server), ]
06-07 01:22:07.652 1170 1875 W InputDispatcher: Untrusted touch due to occlusion by com.smile.gifmaker/10111 (obscuring opacity = 1.00, maximum allowed = 0.80)
06-07 01:22:07.652 1170 1875 D InputDispatcher: Stack of obscuring windows during untrusted touch (977, 751):
06-07 01:22:07.652 1170 1875 D InputDispatcher: * type=FIRST_APPLICATION_WINDOW, package=com.smile.gifmaker/10111, id=3182, mode=USE_OPACITY, alpha=1.00, frame=[0,0][2560,1080], touchableRegion=<empty>, window={com.smile.gifmaker/com.yxcorp.gifshow.HomeActivity#0}, flags={NOT_TOUCH_MODAL}, inputFeatures={NO_INPUT_CHANNEL}, hasToken=false, applicationInfo.name=ActivityRecord{279f76e u0 com.smile.gifmaker/com.yxcorp.gifshow.HomeActivity taskAffinity=10111:com.smile.gifmaker t214}, applicationInfo.token=0xb40000730b169750
06-07 01:22:07.652 1170 1875 D InputDispatcher: * [TOUCHED] type=FIRST_APPLICATION_WINDOW, package=com.android.launcher3/10077, id=3194, mode=USE_OPACITY, alpha=1.00, frame=[0,0][2560,1080], touchableRegion=[0,0][2560,1080], window={b9e9366 com.android.launcher3/com.android.launcher3.uioverrides.QuickstepLauncher}, flags={NOT_TOUCH_MODAL | LAYOUT_IN_SCREEN | LAYOUT_INSET_DECOR | SHOW_WALLPAPER | SPLIT_TOUCH | HARDWARE_ACCELERATED | DRAWS_SYSTEM_BAR_BACKGROUNDS}, inputFeatures={0x0}, hasToken=true, applicationInfo.name=ActivityRecord{ee5e924 u0 com.android.launcher3/.uioverrides.QuickstepLauncher taskAffinity=null t143}, applicationInfo.token=0xb40000730b08c470
06-07 01:22:07.652 1170 1875 W InputDispatcher: Dropping untrusted touch event due to com.smile.gifmaker/10111
06-07 01:22:07.972 1170 1875 W InputDispatcher: Untrusted touch due to occlusion by com.smile.gifmaker/10111 (obscuring opacity = 1.00, maximum allowed = 0.80)
06-07 01:22:07.972 1170 1875 D InputDispatcher: Stack of obscuring windows during untrusted touch (936, 1078):
06-07 01:22:07.972 1170 1875 D InputDispatcher: * type=FIRST_APPLICATION_WINDOW, package=com.smile.gifmaker/10111, id=3182, mode=USE_OPACITY, alpha=1.00, frame=[0,0][2560,1080], touchableRegion=<empty>, window={com.smile.gifmaker/com.yxcorp.gifshow.HomeActivity#0}, flags={NOT_TOUCH_MODAL}, inputFeatures={NO_INPUT_CHANNEL}, hasToken=false, applicationInfo.name=ActivityRecord{279f76e u0 com.smile.gifmaker/com.yxcorp.gifshow.HomeActivity taskAffinity=10111:com.smile.gifmaker t214}, applicationInfo.token=0xb40000730b169750
06-07 01:22:07.972 1170 1875 D InputDispatcher: * [TOUCHED] type=FIRST_APPLICATION_WINDOW, package=com.android.launcher3/10077, id=3194, mode=USE_OPACITY, alpha=1.00, frame=[0,0][2560,1080], touchableRegion=[0,0][2560,1080], window={b9e9366 com.android.launcher3/com.android.launcher3.uioverrides.QuickstepLauncher}, flags={NOT_TOUCH_MODAL | LAYOUT_IN_SCREEN | LAYOUT_INSET_DECOR | SHOW_WALLPAPER | SPLIT_TOUCH | HARDWARE_ACCELERATED | DRAWS_SYSTEM_BAR_BACKGROUNDS}, inputFeatures={0x0}, hasToken=true, applicationInfo.name=ActivityRecord{ee5e924 u0 com.android.launcher3/.uioverrides.QuickstepLauncher taskAffinity=null t143}, applicationInfo.token=0xb40000730b08c470
06-07 01:22:07.972 1170 1875 W InputDispatcher: Dropping untrusted touch event due to com.smile.gifmaker/10111
综合上面3个部分的日志,分析大致如下:
发生冻屏时从cpu的dump信息可以看出,快手和腾讯视屏应用当前对cpu的暂用率很高;
出现冻屏时的应用是:快手,但是从activity的dump信息可以看出,task顺序是:home->快手。
出异常时是快手的界面显示在home的上面,但是输入事件却被input丢弃了,丢弃的原因是:触发了Android 12 中不受信任的触摸事件的处理策略,
事件丢弃的表现就是点击屏幕没有效果,屏幕出现类冻屏。
在发生冻屏,尝试通过adb shell input keyevent 发送的事件系统是能够收到的,说明input的事件接收和发送是正常的,只是因为触发了Android 12 中不受信任的触摸事件的处理策略,
导致事件被丢弃了,理论上这个时候快手的界面应该是消失的,但是不知道什么原因它并没有消失,反而显示在了应该在前台显示的home上面了,这个原因并未找到。
分析小结:
针对票中问题添加的日志,在vb中并未复现出来,但是在vb中发生了anr问题和冻屏问题,
通过日志分别对anr和冻屏分别进行了分析:
1)anr问题
从日志来看是因为打开的这些应用它们保持很高的cpu占用并且内存消耗也很大,导致anr的发生,从系统层面能够想的办法有限,
可以通过lmkd策略或者主动处理后天进程的方式,主动杀除一些cpu占用高,内存占用高的后台应用,这个策略的类似方案,在recent模块有在对应,但是具体的效果还需要进一步观察;
2)冻屏问题
这次发生的冻屏是,上层不知道什么原因触发了事件不信任的机制,导致事件被丢弃,这个的原因未找到,但是有个处理的想法:在发生了这种情况和通过某种手段将盖在上面的应用直接杀除掉,
这个处理和调查还需要再观察下。
#31 Updated by CD FW王武军 about 2 years ago
- Status changed from ASSIGNED to RESOLVED
- Resolution changed from -- to FIXED
【】当前状态
通过添加相关的日志进行vb复现测试并未复现问题,虽然出现了anr和冻屏卡顿的问题,但是都不是
该票发生问题时的同样状态。
从整个分析来看,问题的发生与系统当时cpu的占比、系统内存有很大关系,从后续测试来看有很多类似的问题都与系统的卡顿
有很大的关系,针对应用进程数量的控制、内存大小的控制都有相关的优化投入,可以在后续的版本中尝试复现和跟踪。
#32 Updated by CD FW王武军 about 2 years ago
- % Done changed from 0 to 100
- Fixed Version changed from 2023-05-26 to 2023-06-16
- Root cause set to 问题原因: 系统卡顿、应用cpu占比高、内存不足
#33 Updated by CD FW王武军 about 2 years ago
- Assignee changed from CD FW王武军 to 物联网测试组_TSCD 王维
#34 Updated by 物联网测试组_TSCD 王维 about 2 years ago
目前已在以下版本验证未复现该现象:
FlatBuild_HH_MCE_FSE.M.D.user.01.00.C105.202306250045
FlatBuild_HH_MCE_FSE.M.D.user.01.00.C105.202306260100
FlatBuild_HH_MCE_FSE.M.D.user.01.00.C106.202306270027
FlatBuild_HH_MCE_FSE.M.D.user.01.00.C106.202306280111
故关闭该bug
#35 Updated by 物联网测试组_TSCD 王维 about 2 years ago
- Status changed from RESOLVED to VERIFIED
#36 Updated by 物联网测试组_TSCD 王维 about 2 years ago
- Status changed from VERIFIED to CLOSED