Saturday, July 5, 2014

Fixing random reboots on Samsung KitKat firmwares like I9505XXUGNF1

After updating my SGS 4 to I9505XXUGNF1, I, like many others, experienced random crashes which at first glance seemed to be reboots of the device. Looking at the logcat, I figured out that the system server process actually crashes due to an ArrayIndexOutOfBoundsException. After reading the source code I figured out what the problem is and wrote this small module to work around the crash. My S4 has been running without a single crash since.

Here are some technical details:


The logcat shows:


06-27 16:29:22.471 E/AndroidRuntime(28244): !@*** FATAL EXCEPTION IN SYSTEM PROCESS: ActivityManager
06-27 16:29:22.471 E/AndroidRuntime(28244): java.lang.ArrayIndexOutOfBoundsException: length=14; index=-1
06-27 16:29:22.471 E/AndroidRuntime(28244): at com.android.server.am.ProcessList.computeNextPssTime(ProcessList.java:580)
06-27 16:29:22.471 E/AndroidRuntime(28244): at com.android.server.am.ActivityManagerService.requestPssAllProcsLocked(ActivityManagerService.java:16951)
06-27 16:29:22.471 E/AndroidRuntime(28244): at com.android.server.am.ActivityManagerService.updateOomAdjLocked(ActivityManagerService.java:17766)
06-27 16:29:22.471 E/AndroidRuntime(28244): at com.android.server.am.ActivityManagerService.trimApplications(ActivityManagerService.java:17831)
06-27 16:29:22.471 E/AndroidRuntime(28244): at com.android.server.am.ActivityStackSupervisor.activityIdleInternalLocked(ActivityStackSupervisor.java:2992)
06-27 16:29:22.471 E/AndroidRuntime(28244): at com.android.server.am.ActivityStackSupervisor$ActivityStackSupervisorHandler.activityIdleInternal(ActivityStackSupervisor.java:3961)
06-27 16:29:22.471 E/AndroidRuntime(28244): at com.android.server.am.ActivityStackSupervisor$ActivityStackSupervisorHandler.handleMessage(ActivityStackSupervisor.java:3983)
06-27 16:29:22.471 E/AndroidRuntime(28244): at android.os.Handler.dispatchMessage(Handler.java:102)
06-27 16:29:22.471 E/AndroidRuntime(28244): at android.os.Looper.loop(Looper.java:157)
06-27 16:29:22.471 E/AndroidRuntime(28244): at com.android.server.am.ActivityManagerService$AThread.run(ActivityManagerService.java:2376)

The problem is in the method computeNextPssTime.

The last statement in the method accesses either the array named sFirstAwakePssTimes or sSameAwakePssTimes. An ArrayIndexOutOfBoundsException is thrown when the parameter procState lies outside the bounds of that array. My first guess was that -1 is used as the index into the array, which matches the index=-1 reported in the log.
Looking at the call stack, one can see that the parameter is a value taken from an instance of the class ProcessRecord. The value is initialized with -1 when that instance is created.

Somehow this invalid procState value is never replaced with a valid one and ends up being used as the index into the array.
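
To make the failure mode concrete, here is a minimal sketch in plain Java. It is not the AOSP code: the class name, the simplified method signature and the table contents are assumptions made for illustration. Only the table length of 14 and the index -1 are taken from the log above.

```java
public class PssTimeDemo {
    // Hypothetical stand-in for sFirstAwakePssTimes / sSameAwakePssTimes.
    // 14 entries, matching "length=14" in the logcat output; values are dummies.
    public static final long[] PSS_TIMES = new long[14];

    // Simplified stand-in for the lookup: the state is used directly as an index.
    public static long computeNextPssTime(int procState, long now) {
        return now + PSS_TIMES[procState];
    }

    public static void main(String[] args) {
        try {
            // -1 is the value a freshly created record is assumed to carry.
            computeNextPssTime(-1, System.currentTimeMillis());
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Reproduced: " + e);
        }
    }
}
```

In this sketch the exception is caught; in the system server nothing catches it, so the whole process goes down.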

I don't know what causes this (could be timing issues or some changes that Samsung made). Nevertheless, it is possible to prevent the system server from crashing by changing what the computeNextPssTime method does.

Using the amazing Xposed Framework, I created an Xposed module that checks the procState parameter value and, in case of an invalid value (like -1), returns a default value as the result of the method. This works pretty well to work around the problem. Of course, it does not fix the actual problem, but my phone has been running well since then.

Whenever -1 or any other invalid value is passed to computeNextPssTime, a message is logged to Android's log with the tag SSCF. Using logcat you can see how often this happens and how often the system server would have crashed without the module.
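
The check and the logging described above can be sketched in plain Java as follows. This is not the module's actual source: the class name, the fallback interval and the counter are hypothetical, and System.err stands in for Android's log. In a real Xposed module, a check like this would presumably live in a method hook that sets the result before the original method runs.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PssTimeGuard {
    // Hypothetical stand-in table; 14 entries as reported in the logcat output.
    public static final long[] PSS_TIMES = new long[14];

    // Hypothetical fallback interval; the value the module really uses is not stated.
    public static final long DEFAULT_INTERVAL_MS = 60_000L;

    // Counts how many times an invalid state was intercepted.
    public static final AtomicInteger avoidedCrashes = new AtomicInteger();

    public static boolean isValidProcState(int procState) {
        return procState >= 0 && procState < PSS_TIMES.length;
    }

    // Guarded lookup: invalid states get a default instead of an exception.
    public static long guardedComputeNextPssTime(int procState, long now) {
        if (!isValidProcState(procState)) {
            avoidedCrashes.incrementAndGet();
            // Stand-in for a log call with the tag SSCF.
            System.err.println("SSCF: invalid procState " + procState + ", using default");
            return now + DEFAULT_INTERVAL_MS;
        }
        return now + PSS_TIMES[procState];
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        guardedComputeNextPssTime(-1, now);  // intercepted and logged
        guardedComputeNextPssTime(3, now);   // normal table lookup
        System.out.println("Avoided crashes: " + avoidedCrashes.get());
    }
}
```

Checking against both bounds rather than only against -1 covers "any other invalid value" as well, at no extra cost.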

Since the problematic code is in AOSP, I wonder why this seems to happen on newer Samsung KitKat ROMs only. To me, the problem seems more likely to occur the more processes are running. Maybe this is why users start to see random reboots only after a few days (when more apps might have been installed). At least this is what happened to me.

Discussion can be found on this XDA forum thread.