..
记录一次JAVA死锁分析过程
从昨天开始,研发和测试的同学就在群里一直反馈服务器经常卡死,服务不可用,服务日志也看不到任何异常信息,让研发同学一顿查找原因,找不到原因,最后让我给解决一下。
我首先登录到服务器,容器是没有退出的
docker logs snc-fw-gateway
查看标准输出的内容,没有异常,通过浏览器发送请求,也没有新的日志产生,所以我怀疑是不是JAVA进程退出了,容器没有退出,但是好像不可能,因为进程退出以后,容器一定也会退出。
-bash-4.2$ docker exec -it snc-fw-gateway sh
/app $ jps
5 app.jar
7340 Jps
/app $
通过jsp命令,发现进程是存在的。
我怀疑是不是进程在哪里Hang住了
-bash-4.2$ docker exec -it snc-fw-gateway jstack 5 > jstack.log
-bash-4.2$ ls
jstack.log
-bash-4.2$
看线程状态
hellojukay@local ~ $ grep Thread.State jstack.log
java.lang.Thread.State: RUNNABLE
java.lang.Thread.State: WAITING (parking)
java.lang.Thread.State: TIMED_WAITING (parking)
java.lang.Thread.State: TIMED_WAITING (parking)
java.lang.Thread.State: WAITING (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: RUNNABLE
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: RUNNABLE
java.lang.Thread.State: RUNNABLE
java.lang.Thread.State: RUNNABLE
java.lang.Thread.State: RUNNABLE
java.lang.Thread.State: TIMED_WAITING (parking)
java.lang.Thread.State: TIMED_WAITING (sleeping)
java.lang.Thread.State: WAITING (parking)
java.lang.Thread.State: TIMED_WAITING (on object monitor)
java.lang.Thread.State: WAITING (parking)
java.lang.Thread.State: TIMED_WAITING (parking)
java.lang.Thread.State: TIMED_WAITING (parking)
java.lang.Thread.State: TIMED_WAITING (parking)
java.lang.Thread.State: TIMED_WAITING (parking)
java.lang.Thread.State: TIMED_WAITING (on object monitor)
java.lang.Thread.State: TIMED_WAITING (parking)
java.lang.Thread.State: RUNNABLE
java.lang.Thread.State: WAITING (on object monitor)
java.lang.Thread.State: TIMED_WAITING (parking)
java.lang.Thread.State: TIMED_WAITING (parking)
java.lang.Thread.State: WAITING (parking)
java.lang.Thread.State: RUNNABLE
java.lang.Thread.State: RUNNABLE
java.lang.Thread.State: RUNNABLE
java.lang.Thread.State: RUNNABLE
java.lang.Thread.State: RUNNABLE
java.lang.Thread.State: RUNNABLE
java.lang.Thread.State: WAITING (on object monitor)
java.lang.Thread.State: WAITING (on object monitor)
hellojukay@local ~ $ grep Thread.State jstack.log | wc -l
49
hellojukay@local ~ $ ^C
我们看一下被阻塞的线程正在干什么
hellojukay@local ~ $ grep BLOCKED jstack.log -C 3
得到的结果如下:
"XNIO-1 task-4" #38 prio=5 os_prio=0 tid=0x000056181f8fa800 nid=0x68 waiting for monitor entry [0x00002b8e60c99000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:543)
- waiting to lock <0x0000000083fd1800> (a java.lang.Object)
at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:331)
--
at java.lang.Thread.run(Thread.java:748)
"XNIO-1 task-3" #37 prio=5 os_prio=0 tid=0x000056181e3f7800 nid=0x61 waiting for monitor entry [0x00002b8e606f9000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:544)
- waiting to lock <0x0000000083fd1800> (a java.lang.Object)
at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:331)
--
at java.lang.Thread.run(Thread.java:748)
"XNIO-1 task-2" #36 prio=5 os_prio=0 tid=0x0000561820216000 nid=0x5a waiting for monitor entry [0x00002b8e603ac000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:543)
- waiting to lock <0x0000000083fd1800> (a java.lang.Object)
at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:331)
--
at java.lang.Thread.run(Thread.java:748)
都是在等待lock 一个对象0x0000000083fd1800
,看下这个兑现现在被谁lock了,最有发现有一个线程已经lock了这个兑现,一直没有释放
"XNIO-1 task-15" #49 prio=5 os_prio=0 tid=0x000056181ff12800 nid=0xb6 runnable [0x00002b8e617a4000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:255)
at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:424)
at sun.security.provider.NativePRNG$RandomIO.ensureBufferValid(NativePRNG.java:525)
at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:544)
- locked <0x0000000083fd1800> (a java.lang.Object)
at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:331)
at sun.security.provider.NativePRNG$Blocking.engineNextBytes(NativePRNG.java:268)
at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
at java.security.SecureRandom.next(SecureRandom.java:491)
at java.util.Random.nextInt(Random.java:390)
涉及公司的隐私的源码我没有放上来,最后通过上述堆栈信息定位了问题代码:
vo.setName(vo.getName() + this.random.nextInt(100) + 1);
这里nextInt
方法是线程安全的,他会锁住当前对象中的random
对象。也就是这个属性
private Random random = SecureRandom.getInstanceStrong();
SecureRandom这个类在某些时候存在很严重的性能问题,会导致nextInt
卡死不返回。解决办法参考:https://blog.csdn.net/iteye_11910/article/details/82655175。