..

记录一次JAVA死锁分析过程

从昨天开始,研发和测试的同学就在群里一直反馈服务器经常卡死,服务不可用,服务日志也看不到任何异常信息,让研发同学一顿查找原因,找不到原因,最后让我给解决一下。

我首先登录到服务器,容器是没有退出的

docker logs snc-fw-gateway

查看标准输出的内容,没有异常,通过浏览器发送请求,也没有新的日志产生,所以我怀疑是不是JAVA进程退出了,容器没有退出,但是好像不可能,因为进程退出以后,容器一定也会退出。

-bash-4.2$ docker exec -it snc-fw-gateway sh
/app $ jps
5 app.jar
7340 Jps
/app $ 

通过jsp命令,发现进程是存在的。

我怀疑是不是进程在哪里Hang住了

-bash-4.2$ docker exec -it snc-fw-gateway jstack 5 > jstack.log
-bash-4.2$ ls
jstack.log
-bash-4.2$ 

看线程状态

  hellojukay@local ~ $ grep Thread.State jstack.log 
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: WAITING (parking)
   java.lang.Thread.State: TIMED_WAITING (parking)
   java.lang.Thread.State: TIMED_WAITING (parking)
   java.lang.Thread.State: WAITING (on object monitor)
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: TIMED_WAITING (parking)
   java.lang.Thread.State: TIMED_WAITING (sleeping)
   java.lang.Thread.State: WAITING (parking)
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
   java.lang.Thread.State: WAITING (parking)
   java.lang.Thread.State: TIMED_WAITING (parking)
   java.lang.Thread.State: TIMED_WAITING (parking)
   java.lang.Thread.State: TIMED_WAITING (parking)
   java.lang.Thread.State: TIMED_WAITING (parking)
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
   java.lang.Thread.State: TIMED_WAITING (parking)
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: WAITING (on object monitor)
   java.lang.Thread.State: TIMED_WAITING (parking)
   java.lang.Thread.State: TIMED_WAITING (parking)
   java.lang.Thread.State: WAITING (parking)
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: WAITING (on object monitor)
   java.lang.Thread.State: WAITING (on object monitor)
hellojukay@local ~ $ grep Thread.State jstack.log  | wc -l
49
hellojukay@local ~ $ ^C

我们看一下被阻塞的线程正在干什么

hellojukay@local ~ $ grep BLOCKED jstack.log  -C 3

得到的结果如下:

"XNIO-1 task-4" #38 prio=5 os_prio=0 tid=0x000056181f8fa800 nid=0x68 waiting for monitor entry [0x00002b8e60c99000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:543)
	- waiting to lock <0x0000000083fd1800> (a java.lang.Object)
	at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:331)
--
	at java.lang.Thread.run(Thread.java:748)

"XNIO-1 task-3" #37 prio=5 os_prio=0 tid=0x000056181e3f7800 nid=0x61 waiting for monitor entry [0x00002b8e606f9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:544)
	- waiting to lock <0x0000000083fd1800> (a java.lang.Object)
	at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:331)
--
	at java.lang.Thread.run(Thread.java:748)

"XNIO-1 task-2" #36 prio=5 os_prio=0 tid=0x0000561820216000 nid=0x5a waiting for monitor entry [0x00002b8e603ac000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:543)
	- waiting to lock <0x0000000083fd1800> (a java.lang.Object)
	at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:331)
--
	at java.lang.Thread.run(Thread.java:748)

都是在等待lock 一个对象0x0000000083fd1800,看下这个兑现现在被谁lock了,最有发现有一个线程已经lock了这个兑现,一直没有释放

"XNIO-1 task-15" #49 prio=5 os_prio=0 tid=0x000056181ff12800 nid=0xb6 runnable [0x00002b8e617a4000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:255)
	at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:424)
	at sun.security.provider.NativePRNG$RandomIO.ensureBufferValid(NativePRNG.java:525)
	at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:544)
	- locked <0x0000000083fd1800> (a java.lang.Object)
	at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:331)
	at sun.security.provider.NativePRNG$Blocking.engineNextBytes(NativePRNG.java:268)
	at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
	at java.security.SecureRandom.next(SecureRandom.java:491)
	at java.util.Random.nextInt(Random.java:390)

涉及公司的隐私的源码我没有放上来,最后通过上述堆栈信息定位了问题代码:

vo.setName(vo.getName() + this.random.nextInt(100) + 1);

这里nextInt方法是线程安全的,他会锁住当前对象中的random对象。也就是这个属性

    private Random random = SecureRandom.getInstanceStrong();

SecureRandom这个类在某些时候存在很严重的性能问题,会导致nextInt卡死不返回。解决办法参考:https://blog.csdn.net/iteye_11910/article/details/82655175