Ethernet data transfer stalling or failing

int length = 0;
    while (client.connected()) {
         while (client.available()) {
               int c = client.read();
               // happy now? :)
               if (c<= 0) {  
                   client.stop();
               } else {
                     length++;
               }
         }
   }
   client.stop();

I've applied the patch from Google Code Archive - Long-term storage for Google Code Project Hosting. to w5100.h (I'm pretty confident I edited the right file, because I missed out a semicolon the first time and nothing would compile ;))

SurferTim:
Because it could be sending a zero byte. Or a -1 byte (not int). When the server is finished, it will close the connection. This has not failed for me:

Alas, no joy. Here's the full 'SurferTim' sketch:

#include <SPI.h>
#include <Ethernet.h>

byte mac[] = { 0x90, 0xA2, 0xDA, 0x00, 0x86, 0x67 }; //physical mac address
EthernetClient client;

#define debug(a) Serial.print(millis()); Serial.print(": "); Serial.println(a);
#define debug2(a, b) Serial.print(millis()); Serial.print(": "); Serial.print(a); Serial.println(b);

void setup(){
  Serial.begin(9600);

  // start the Ethernet connection:
  if (Ethernet.begin(mac) == 0) {
    debug("DHCP Failed");
    // no point in carrying on, so do nothing forevermore:
    while(true);
  }
  // print your local IP address:
  debug2("IP address: ", Ethernet.localIP());
}

char* host = "192.168.1.22";
int port = 4567;
char *path = "/";

int totalRequestSize = 8346;
int failures = 0;
int requests = 0;

void loop() {
  if (client.connect(host, port)) {
    debug("Connected.");
    client.print("GET "); client.print(path); client.println(" HTTP/1.0\n"); 
    int length = 0;

    while(client.connected()) {
      debug("Still connected");
      // the server will close the connection when it is finished sending packets
      while(client.available()) {
        // ...but not until this buffer is empty
        Serial.write(client.read());
        length++;
      }
      debug("Done reading for now...");
    }
    // close your end after the server closes its end
    client.stop();
    
    requests++;
    if (length != totalRequestSize) {
      failures++;
    }
    debug2("Total read: ", length);
    Serial.print("Total failures: "); Serial.print(failures); Serial.print("/"); Serial.println(requests);
  } else {
    debug("Couldn't connect.");
  }
  delay(1000);
}

... and here's the output:

4501: IP address: 192.168.1.17
4705: Connected.
4707: Still connected
HTTP/1.1 200 OK 
X-Frame-Options: sameorigin
X-Xss-Protection: 1; mode=block
Content-Type: text/html;charset=utf-8
Content-Length: 8100
Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30)
Date: Wed, 07 Mar 2012 18:26:43 GMT
Connection: close

################################################################################
################################################################################
<snip>
################################################################################
################################################################################
################################################################################
13361: Done reading for now...
13394: Total read: 8346
Total failures: 0/1
14443: Connected.
14445: Still connected
HTTP/1.1 200 OK 
X-Frame-Options: sameorigin
X-Xss-Protection: 1; mode=block
Content-Type: text/html;charset=utf-8
Content-Length: 8100
Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30)
Date: Wed, 07 Mar 2012 18:26:52 GMT
Connection: close

################################################################################
################################################################################
<snip>
################################################################################
################################################################################
################################################################################
23100: Done reading for now...
23135: Total read: 8346
Total failures: 0/2
24182: Connected.
24184: Still connected
HTTP/1.1 200 OK 
X-Frame-Options: sameorigin
X-Xss-Protection: 1; mode=block
Content-Type: text/html;charset=utf-8
Content-Length: 8100
Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30)
Date: Wed, 07 Mar 2012 18:27:02 GMT
Connection: close

################################################################################
################################################################################
################################################################################
<snip>
################################################################################
################################################################################
################################################################################
32840: Done reading for now...
32874: Total read: 8346
Total failures: 0/3
33923: Connected.
33924: Still connected
33925: Done reading for now...
33933: Still connected
HTTP/1.1 200 OK 
X-Frame-Options: sameorigin
X-Xss-Protection: 1; mode=block
Content-Type: text/html;charset=utf-8
Content-Length: 8100
Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30)
Date: Wed, 07 Mar 2012 18:27:12 GMT
Connection: close

################################################################################
################################################################################
################################################################################
<snip>
################################################################################
################################################################################
################################################################################
42638: Done reading for now...
42672: Total read: 8346
Total failures: 0/4
43720: Connected.
43722: Still connected
HTTP/1.1 200 OK 
X-Frame-Options: sameorigin
X-Xss-Protection: 1; mode=block
Content-Type: text/html;charset=utf-8
Content-Length: 8100
Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30)
Date: Wed, 07 Mar 2012 18:27:22 GMT
Connection: close

################################################################################
################################################################################
################################################################################
<snip>
################################################################################
################################################################################
################################################################################
############################46257: Done reading for now...
46290: Still connected
46315: Done reading for now...
46349: Still connected
46373: Done reading for now...
46407: Still connected
46432: Done reading for now...
46466: Still connected
46490: Done reading for now...
46523: Still connected
46548: Done reading for now...
46581: Still connected
46607: Done reading for now...
46640: Still connected
46665: Done reading for now...
46698: Still connected
46723: Done reading for now...
46756: Still connected
46781: Done reading for now...
46815: Still connected
46839: Done reading for now...
46873: Still connected
<snip>
72265: Still connected
72291: Done reading for now...
72324: Still connected
72348: Done reading for now...
72382: Still connected
72407: Done reading for now...
72440: Still connected
72465: Done reading for now...
72499: Still connected
72523: Done reading for now...
72557: Still connected
72582: Done reading for now...
72614: Still connected
72640: Done reading for now...

As you can see, at some point it just stalls; I left it for around 25 seconds before pulling the plug. I'll paste the results of the 'doughboy' test implementation next (but it's the same story, basically)

Here's the corresponding 'doughboy' implementation, using int not char, and with Serial output to make the logging comparable:

#include <SPI.h>
#include <Ethernet.h>

byte mac[] = { 0x90, 0xA2, 0xDA, 0x00, 0x86, 0x67 }; //physical mac address
EthernetClient client;

#define debug(a) Serial.print(millis()); Serial.print(": "); Serial.println(a);
#define debug2(a, b) Serial.print(millis()); Serial.print(": "); Serial.print(a); Serial.println(b);

void setup(){
  Serial.begin(9600);

  // start the Ethernet connection:
  if (Ethernet.begin(mac) == 0) {
    debug("DHCP Failed");
    // no point in carrying on, so do nothing forevermore:
    while(true);
  }
  // print your local IP address:
  debug2("IP address: ", Ethernet.localIP());
}

char* host = "192.168.1.22";
int port = 4567;
char *path = "/";

int totalRequestSize = 8346;
int failures = 0;
int requests = 0;

void loop() {
  if (client.connect(host, port)) {
    debug("Connected.");
    client.print("GET "); client.print(path); client.println(" HTTP/1.0\n"); 

    int length = 0;
    while (client.connected()) {
      debug("Still connected");
         while (client.available()) {
               int c = client.read(); // Yes, int! :)
               if (c<= 0) {  
                   client.stop();
               } else {
                 Serial.write(c);
                 length++;
               }
         }
      debug("Done reading for now...");
   }
   client.stop();
       
    requests++;
    if (length != totalRequestSize) {
      failures++;
    }
    debug2("Total read: ", length);
    Serial.print("Total failures: "); Serial.print(failures); Serial.print("/"); Serial.println(requests);
  } else {
    debug("Couldn't connect.");
  }
  delay(1000);
}

... and the serial monitor output:

4501: IP address: 192.168.1.17
4504: Connected.
4505: Still connected
HTTP/1.1 200 OK 
X-Frame-Options: sameorigin
X-Xss-Protection: 1; mode=block
Content-Type: text/html;charset=utf-8
Content-Length: 8100
Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30)
Date: Wed, 07 Mar 2012 18:41:16 GMT
Connection: close

################################################################################
################################################################################
################################################################################
<snip>
################################################################################
################################################################################
################################################################################
##############################################################################13102: Done reading for now...
13135: Still connected
13160: Done reading for now...
13194: Still connected
13218: Done reading for now...
13252: Still connected
13277: Done reading for now...
13310: Still connected
13335: Done reading for now...
13368: Still connected
13393: Done reading for now...
13426: Still connected
13452: Done reading for now...
<snip>
30840: Still connected
30865: Done reading for now...
30899: Still connected
30923: Done reading for now...
30957: Still connected
30982: Done reading for now...
31015: Still connected

FWIW, via Wireshark I'm still seeing the same TCP Retransmission behaviour as always from the server; in this most recent case, the Ardunio ACKs at 8069 bytes (so achingly close the full payload!), then the server sends another packet back with 194 further bytes, and ends up retransmitting the same packet ten times before I pull the plug (...in other words, I'm still relatively confident that it's not the server).

I should also say again, I really do appreciate you all taking the time to help me with this headscratcher.

@doughboy: Did you go to U.C. Berkeley? -1 int means you have not received the next packet yet. NOT the last packet. The server will close the connection when it is finished.

You can think anything you want. You can try to close the connection any way you want. My stuff works, and I have plenty of spare time to wait while you figure that out. :smiley:

@lazyatom: Normally, that corrects the problem. I'll look at your code.

SurferTim:
@lazyatom: Normally, that corrects the problem. I'll look at your code.

Thanks; if you want to replicate the server, you can find it in the gist I posted earlier: Arduino Ethernet flaky transfer example · GitHub

You'll need ruby and the 'sinatra' rubygem; running 'ruby server.rb' (if that's what you called it) will bring up a server process on port 4567.

I don't think the patch is related to your issue, because you are getting available==0 and not > 1024.

I suggest just try a different server.

I'll go with doughboy on that one. Try a standard http server that returns something you can read. It may be your server.

But if you are seeing this on your serial display, that is not printing all this because "client.available() == 0". It is returning a non-zero number when it shouldn't be. If it were returning zero, it would not have printed any of this.

################################################################################
################################################################################
################################################################################

################################################################################

ok, you can skip the test for 0 and -1 then. This is really a non standard implementation of stream read.

this is from the library source

int EthernetClient::read() {
  uint8_t b;
  if ( recv(_sock, &b, 1) > 0 )
  {
    // recv worked
    return b;
  }
  else
  {
    // No data available
    return -1;
  }
}

doughboy:
ok, you can skip the test for 0 and -1 then. This is really a non standard implementation of stream read.

this is from the library source

int EthernetClient::read() {

uint8_t b;
 if ( recv(_sock, &b, 1) > 0 )
 {
   // recv worked
   return b;
 }
 else
 {
   // No data available
   return -1;
 }
}

Stick around a while, doughboy. I could use the help! :slight_smile:

SurferTim:
I'll go with doughboy on that one. Try a standard http server that returns something you can read. It may be your server.

I've just tried Apache, with the same results after a few requests (I've removed the extremely verbose output for the moment too):

4501: IP address: 192.168.1.17
4504: Connected.
4505: Still connected
4509: Done reading for now...
4542: Still connected
4566: Done reading for now...
4598: Still connected
4622: Done reading for now...
4654: Still connected
4678: Done reading for now...
4710: Still connected
6519: Done reading for now...
6521: Total read: 8393
Total failures: 0/1
7532: Connected.
7534: Still connected
9320: Done reading for now...
9322: Total read: 8393
Total failures: 0/2
10333: Connected.
10335: Still connected
12120: Done reading for now...
12122: Total read: 8393
Total failures: 0/3
13135: Connected.
13137: Still connected
14923: Done reading for now...
14925: Total read: 8393
Total failures: 0/4
15938: Connected.
15940: Still connected
17184: Done reading for now...
17185: Still connected
17185: Done reading for now...
17208: Still connected
17233: Done reading for now...
17266: Still connected
17292: Done reading for now...
<snip>
30545: Still connected
30570: Done reading for now...
30603: Still connected
30628: Done reading for now...
30661: Still connected
30687: Done reading for now...
30720: Still connected
30745: Done reading for now...
30778: Still connected
30803: Done reading for now...
30836: Still connected
30861: Done reading for now...

... and you disabled the SD SPI interface on digital pin 4? Just checking. There have been more than a few posts here.

You are either getting interference on the ethernet connection, or the SPI interface. That is the only two interfaces you are using.

"Still connected" and "Done reading for now" back-to-back means the server is not finished sending packets. It just did not receive a new packet during that check.

regardless if the socket recv returned 0 or -1, EthernetClient read forces it to -1, so you really don't know if the -1 means server disconnected or there is just no data in the buffer.

I cannot determine if read() is blocking or not, because the source trail ends in w5100.cpp getRXReceivedSize function, and I cannot find the implementation of readSnRX_RSR if it is blocking or not. I would still assume read() is blocking, otherwise, there is no point calling available() first before calling read().

So if the code below does not work, I'd have to say the problem is in the server.

    int length = 0;
    while (client.connected()) {
      debug("Still connected");
         while (client.available()) {
                 Serial.write(client.read());
                 length++;
         }
      debug("Done reading for now...");
   }
   client.stop();

to disable SD, set pin 4 to output and set it to high.

I cannot determine if read() is blocking or not, because the source trail ends in w5100.cpp getRXReceivedSize function, and I cannot find the implementation of readSnRX_RSR if it is blocking or not.

That function is not blocking with the condition that the "605 Bug" patch has been applied.

SurferTim:
... and you disabled the SD SPI interface on digital pin 4? Just checking. There have been more than a few posts here.

In the original sketch (i.e. before I started paring it down to debug), I was, and I was also writing to the SD card in order to cache the data. I've added the following code back in to setup():

pinMode(4, OUTPUT);
digitalWrite(4, HIGH);

The good news is that the Arduino made around 50 successful requests to Apache, and around 25 requests to the ruby server, each downloading the full payload. The bad news is that both runs still eventually hit a loop where the Arduino was still connected, but no data was available to be read. Subsequent runs saw the error reappear after as little as 10 requests, which is probably few enough to suggest that it's still not the root cause.

I'd be really interested in anyone who can run these examples on their own Arduinos to see if they definitely don't see this behaviour at all.

Additionally, would usage of the SD card while the ethernet is active cause interference? Everything I've read implies that they are intended to be used at the same time (e.g. serving files from the SD card, which is a classic example), so that seems unlikely...

SurferTim:
"Still connected" and "Done reading for now" back-to-back means the server is not finished sending packets. It just did not receive a new packet during that check.

Yeah, I can see what you mean, but I am reasonably confident my reading of the wireshark logs - the Arduino is definitely due to send the server an ACK packet, but there's nothing making it onto the IP stack on the server machine. So in a sense, you are describing the situation; it's just that the server hasn't sent any new packets, because it's still waiting for an acknowledgement of the previous one.

ok, I have determined read() to be a non blocking call, since available(), which is known to be non blocking calls getRXReceivedSize as well.
So calling read() essentially calls available() for you. So it is redundant to call available(). Really a non standard implementation.

int EthernetClient::available() {
  if (_sock != MAX_SOCK_NUM)
    return W5100.getRXReceivedSize(_sock);
  return 0;
}

you can use both ethernet and SD in the same sketch. You just need to disable both at setup(), and the library will take care of enabling/disabling.

doughboy:
ok, I have determined read() to be a non blocking call, since available(), which is known to be non blocking calls getRXReceivedSize as well.
So calling read() essentially calls available() for you. So it is redundant to call available(). Really a non standard implementation.

That is correct. client.available() returns 0 if there are no characters in the rx buffer, and client.read() returns -1 (int value 65535) if there are no characters in the rx buffer. But neither are blocking.

ok, one more try.

    int length = 0;
    while (client.connected()) {
      debug("Still connected");
         while (client.available()) {
                 byte buffer[64];
                 int count = 0;
                 while ((count=client.read(buffer,64))>0) {
                         Serial.write(buffer,count);
                 }
         }
      debug("Done reading for now...");
   }
   client.stop();

I am counting on the fact that if buffered read returns 0, we are guaranteed it is the eof based on this code in socket class. and that if buffered read returns zero, we are guaranteed available() and connected() will return false. Had it returned -1, then available() would be false but connected() would be true, and you will stay in the loop.

    if ( status == SnSR::LISTEN || status == SnSR::CLOSED || status == SnSR::CLOSE_WAIT )
    {
      // The remote end has closed its side of the connection, so this is the eof state
      ret = 0;
    }

minor edit. actually, never mind this post, I just realized with available(), you are guaranteed at least 1 byte is available to read.

I like that code. That looks good to me.

Actually, that will be good when you start going the other way (like SD to ethernet). Sending one character at a time using client.write(byte) is a waste of internet resources. This works much better:

client.write(outBuffer,strlen(outBuffer));

I brought the Arduino home to see if the stalling behaviour occured on a different network, and the code from earlier (http://arduino.cc/forum/index.php/topic,95531.msg717698.html#msg717698) has already run ~350 requests without a single failure, and that's even without setting the SD pin to HIGH.

All fingers now point at the actual network at my office, which is both great to have isolated as the problem, and infuriating for obvious reasons.

Thanks again for all your help folks, I really appreciate it, and I definitely learned a lot!