Client stuck in socket read
#1
Folks,

We've been running into an intermittent issue where it appears the client side of ULC is getting stuck in a socket read operation. It does not happen for all our clients, but for the clients affected it does occur several times during the day. 

When we dump out the threads on the client, we find the ULC Communication Thread always sitting in a read as shown below. This occurs under a variety of Windows operating systems. When this happens, the client appears to be connected but frozen. We can dump out the threads over several time intervals, and the ULC thread is always in the same place. This is not the case on a normal ULC client session. 

Once frozen, the client will remain in this condition for hours (we left a workstation locked up to see whether it would time out or recover). At this point, we're trying to think of what we can do to either pin down or handle this condition. One thing that never changes once a client is locked is the locked object in the dump (- locked <0x00000000e1790650> (a java.io.BufferedInputStream)).
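The manual check described above (dump the threads at intervals and see whether the ULC thread has moved) can also be approximated from inside the client JVM. The following is only a sketch under assumptions: `StuckThreadProbe`, `stackOf`, and `looksStuck` are hypothetical names, not part of ULC, and the thread name is the one shown in the dump in this post. It uses only the standard `Thread.getAllStackTraces()` call.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical helper, not part of ULC: samples one thread's stack repeatedly.
class StuckThreadProbe {

    /** Returns the current stack of the first live thread with this name, or null if none. */
    static StackTraceElement[] stackOf(String threadName) {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (e.getKey().getName().equals(threadName)) {
                return e.getValue();
            }
        }
        return null;
    }

    /**
     * Samples the named thread's stack several times and reports whether
     * every sample was identical, i.e. the thread never moved in between.
     */
    static boolean looksStuck(String threadName, int samples, long intervalMillis)
            throws InterruptedException {
        StackTraceElement[] first = stackOf(threadName);
        if (first == null || first.length == 0) {
            return false; // thread not found, or no frames to compare
        }
        for (int i = 1; i < samples; i++) {
            Thread.sleep(intervalMillis);
            if (!Arrays.equals(first, stackOf(threadName))) {
                return false; // the thread moved (or ended) between samples
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Thread name taken from the dump; the sample count and interval are arbitrary.
        String name = "ULC Communication Controller Thread";
        System.out.println("looks stuck: " + looksStuck(name, 5, 1000));
    }
}
```

A thread that is legitimately idle will also produce identical samples, so a result of `true` is only meaningful while a request is known to be in flight, and it would still need to be confirmed with a full thread dump.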


We've run jconsole and jvisualvm on both the client and server. We know there are no deadlocked threads. We also know that neither the server nor the client is sitting hung up in a section of our application code. 

We are not sure how we could modify the configuration of the client socket connection to force it to time out or retry, or what other options we might adjust to narrow down the cause. The only thing we can say for sure is that the lockups reliably happen during the course of the day.

Any thoughts or help you can offer us would be greatly appreciated.

James

"ULC Communication Controller Thread" #44 prio=6 os_prio=0 tid=0x00000000562a7000 nid=0xbc runnable [0x000000005a53d000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
- locked <0x00000000e1790650> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
- locked <0x00000000e17aa508> (a sun.net.www.protocol.http.HttpURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.access$200(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection$9.run(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection$9.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessController.doPrivilegedWithCombiner(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
- locked <0x00000000e17aa508> (a sun.net.www.protocol.http.HttpURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(Unknown Source)
at com.ulcjava.container.servlet.client.ResponseInfo.fromConnection(ResponseInfo.java:3)
at com.ulcjava.container.servlet.client.ConnectorCommand.execute(ConnectorCommand.java:64)
at com.ulcjava.container.servlet.client.ServletConnector.executeCommand(ServletConnector.java:90)
at com.ulcjava.container.servlet.client.ServletConnector.a(ServletConnector.java:74)
at com.ulcjava.container.servlet.client.ServletConnector.sendRequests(ServletConnector.java:23)
at com.ulcjava.base.client.UISession$k_.run(Redefined)
at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
- None
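
On the question of forcing a timeout: the trace above blocks inside the JDK's `HttpURLConnection`, whose read timeout defaults to 0, meaning it waits indefinitely. The sketch below is not ULC code; it only shows the standard `setConnectTimeout` and `setReadTimeout` calls turning a silent server into a `SocketTimeoutException`. `ReadTimeoutDemo`, `startSilentServer`, and `timesOut` are hypothetical names, and the local "silent server" is a stand-in for whatever is swallowing the response.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URL;

// Hypothetical demo, not ULC code: shows a read timeout ending a blocked header read.
class ReadTimeoutDemo {

    /**
     * Opens a listening socket on a free loopback port and never calls accept().
     * Clients can still connect (the OS queues them in the backlog) and send a
     * request, but no response is ever written back.
     */
    static ServerSocket startSilentServer() throws IOException {
        return new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"));
    }

    /** Returns true if the request was aborted by a socket timeout. */
    static boolean timesOut(URL url, int connectMillis, int readMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
        conn.setConnectTimeout(connectMillis); // limit for establishing the connection
        conn.setReadTimeout(readMillis);       // limit for each blocking read
        try {
            conn.getResponseCode(); // sends the request, then waits for the response headers
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = startSilentServer()) {
            URL url = new URL("http://127.0.0.1:" + server.getLocalPort() + "/");
            System.out.println("timed out: " + timesOut(url, 2000, 1000));
        }
    }
}
```

Whether ULC exposes its connection object for this is not something this thread establishes. If it does not, the JDK networking properties `sun.net.client.defaultConnectTimeout` and `sun.net.client.defaultReadTimeout` (values in milliseconds, passed as `-D` options at client startup) set defaults for the JDK's own HTTP handler and may be worth testing.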
#2
Hi James,

Haven't seen this kind of a problem before.

If the ULC client cannot connect to the server, it throws a Connector exception. It tries 3 times before throwing the exception.

ULC uses HttpURLConnection to send requests. In your dump, the last ULC frame is the call to

Code:
at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(Unknown Source)
at com.ulcjava.container.servlet.client.ResponseInfo.fromConnection(ResponseInfo.java:3)

And after that it is all Java/Sun code.

Questions:
1. Which Java version (on the client) and which ULC version are you using?
2. When does this error occur? During which use case?
3. Is there a problem with the network link or the server when this occurs?
4. Switch on server-side and client-side logging to find out which request the server is not able to respond to.

Thanks

Janak
#3
(06-12-2018, 04:51 PM)janak.mulani Wrote:

Questions:
1. Which Java version (on the client) and which ULC version are you using?
The client side version varies. Right now we're using 8.151 on the servers, but we will shortly upgrade to 8.172.

2. When does this error occur? During which use case?
Typically the clients report that they go away from their workstations for a while. Not all clients report this; it is more like about 10% of them. When they return, they attempt some basic operation like clicking a toolbar button or a menu pull-down. The operation never completes and the client side of the application is frozen. In this situation, we've dumped out all the threads from both the server and the client. We couldn't see any place in our code on the server that appeared to be hung up, though it's a bit hard to be sure given how many threads are running on the server.

3. Is there a problem with network link or server when this occurs?
We cannot see one. Usually while the user is hung, their internet is working correctly and they are able to launch a new instance of our app. I've observed the lockup one time personally, and I was using desktop streaming to access the client's machine. I never noticed any loss of connectivity with my session, but the client locked up and stayed that way until I killed the session 3 hours later.

4. Switch on the server and client side logging to find out for which request the server is not able to respond.
I will give that a shot.

Thanks

Janak
#4
Janak,

I sent you a couple of emails with the log files you suggested we capture. If you did not receive them, please let me know and I'll resend. If you did, did they give you any insight? It looks like ULC on the client is blocked, but in the past when we've dumped out the server threads, we don't see it stuck in any of our code. It does not time out...just sits there locked up indefinitely.

James
#5
Hi James,

I just want to confirm the use case:
1. The user starts the application.
2. Interacts with it for some time.
3. Then leaves the application idle and goes off.
4. When she returns the app is frozen.

Is this correct?

When the application is left idle there are two possibilities:
1. The ULC client is constantly sending keep-alive requests at the specified interval to prevent the server-side HTTP session from timing out. What is your keep-alive interval, and what is the session timeout on the server?
2. Are you using polling timers in your app? If yes, then while the application is idling, the polling timer requests should be going to the server too.

So as you see these are the only two possibilities for an idling ULC client to send http requests to the server side app.

I will go through the logs and get back to you.

Thanks

Janak
#6
Janak,

Close. The application appears to freeze as soon as they attempt any kind of UI operation, e.g., selecting an option from a menu pull-down. The menu and application appear to be alive, and they can click on the pull-down option. Once they do this, the operation never completes and the application no longer accepts any further user interaction. They see the Windows spinning wheel. It doesn't seem to matter what the operation actually is, and users have reported this freezing while attempting a variety of operations. What is odd is that only about 10% of our users report this, and similarly configured users do not report any issue.

James
#7
Hi James,

So the application idles for some time, and when the users come back and interact with it, it freezes. It seems the freeze happens on the first roundtrip to the server that is initiated after idling. The roundtrip is synchronous, meaning the UI is blocked until the response arrives. Since the server does not come back with a response, the UI is never unblocked.

Now the question is what is happening to this request from the client. If the client is unable to send the request 3 times, it throws a Connection Exception. Since you are not getting a connection exception, the request has gone through. However, the response from the server is not coming back, and hence the client remains blocked. We have to figure out why the response is not coming back, or whether a corrupted response is being received. Maybe the network is down, or maybe the server went down after it received the request. I will check the logs and come back.
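
To illustrate why a synchronous roundtrip with no deadline can block its caller forever, and one generic way to bound it, here is a minimal sketch. It is not how ULC works internally: `RoundTripDeadline` and `callWithDeadline` are hypothetical names, and the `Callable` stands in for any blocking request. It uses only `java.util.concurrent`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not ULC code: bounds a blocking call with a deadline.
class RoundTripDeadline {

    /**
     * Runs the blocking call on the executor and waits at most timeoutMillis
     * for its result. On timeout the caller is released with a TimeoutException.
     */
    static <T> T callWithDeadline(ExecutorService executor, Callable<T> roundTrip, long timeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = executor.submit(roundTrip);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) interrupts the worker thread. A thread blocked in a
            // classic java.net socket read does not react to interruption, so
            // the worker itself may stay blocked until its socket is closed
            // or a read timeout fires.
            future.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            // Stand-in for a request whose response never arrives.
            Callable<String> silent = () -> {
                Thread.sleep(60_000);
                return "late";
            };
            try {
                callWithDeadline(executor, silent, 500);
            } catch (TimeoutException e) {
                System.out.println("round trip abandoned after deadline");
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

This only releases the waiting caller. It does not free the underlying connection, so it complements a socket read timeout and does not replace one.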

Thanks

Janak
#8
Janak,

Yes, we do use the ULCPollingTimer class in our application. Today we are actually testing with a customer for whom we've disabled the timers that fire periodically. We suspected the timers might be the cause, but a dump of the server threads taken while the client was frozen did not show us hung up in our server-side timer code.

James
#9
Janak,

I have some additional info. It appears this issue is limited to a single, albeit HUGE, internet provider. What appears to be happening is that they are either slowing or blocking the traffic from our timer events on the client side. We're testing some fixes and workarounds for this. So far, disabling our timers entirely has fixed the issue. What we're doing now is looking at one timer that frequently does really lightweight operations (it fires about once every three seconds to check a queue). It appears this may be the one triggering whatever is happening. In addition, one client at one of the failing locations was able to switch her workstation to a different provider, and her lockups disappeared. We've contacted the provider, but it's like talking to a brick wall. I'll update you once we are certain this is the issue.

James