Linux/Windows V0022/1.0 ethernet problem SOLVED

Hopefully the basic bugs are out of the Arduino and Atmel code to deal with this now.
http://arduino.cc/forum/index.php/topic,68248.0.html

This is a Mega2560 with the Arduino ethernet shield.

Atmel's compiler avr-gcc v4.5.1 and Arduino-V0022 still have an ethernet problem, so if you need ethernet client.read(), stay clear for a while.

void loop()
{
  int thisByte;
  int thisCount;

  if(client.connected())
  {
    thisCount = client.available();

    while(thisCount > 0)
    {
      thisByte = client.read();
      Serial.print("Available ");
      Serial.println(thisCount, DEC);
      Serial.println(thisByte, DEC);
      thisCount = client.available();
    }
    // other send and processing here, but the code only gets here once.

  }
  else
  {
    // try to reconnect
  }
  delay(100);

}

The code stays in the "while(thisCount > 0)" loop after receiving one response from the server.

The serial prints the available as 1279, decrementing after each read down to 1024, then starting again at 1279, and returns the same string of characters over and over. If you did not notice, the range is 1279 - 1024 = 255. Anyone recognize what might cause that? Looks like it is adding/comparing an int value in "available", and decrementing a byte value in "read".

The server only sent 12 characters and I see no traffic on the ethernet router like I do with the Windows compiled version.

The same server works fine with Windows compiled code. ??

EDIT: If I disable the server (terminate the connection), the program stops working. I have a heartbeat program on pin 13, and until I reconnect, it is out.

I though it was trying to connect, but that was apparently because I reconnected with the serial IDE, and it restarts the Arduino. Then it tries to connect correctly.

I am going to start debugging in client.cpp in the ethernet library. Somewhere around here:

int Client::available() {
  if (_sock != MAX_SOCK_NUM)
    return W5100.getRXReceivedSize(_sock);
  return 0;
}

int Client::read() {
  uint8_t b;
  if (!available())
    return -1;
  recv(_sock, &b, 1);
  return b;
}

If I bitwise AND the available with 1023 (drop bit 10), it almost works.

thisCount = client.available() & 1023;

I confess. I cheated. :frowning:

I went to my Windows box, where I knew all the libraries functioned, and copied them to a memory stick.
Backed up my Linux libraries just in case, and copied the Windows libraries into their place.

Same result. Cheaters never prosper. :frowning:

Knowing that those libraries worked with WinAVR, and they are just text files to be used by the compiler, something in the avr-gcc v4.5.1 compiler is still malfunctioning.

It does not even require the server to send anything. I removed the request from the loop. Now it only reads the response, which there should not be one, since a request was never sent to the server. ??

So I guess Arduino is once again out of the woods, and Atmel is back in the firing line...unless there is some other surprise lurking for me in the "/cores/arduino/" files. Does anything there change a data type (like a #define) that the ethernet client routines use?

ADD: Client::available() returns 1024 by default, rather than 0. The code below demonstrates this. I remarked out the client.read() so you can see it is always 1024. If you unremark the read(), the available goes to 1279, then decrements with every read() until it reaches 1024, then it starts over again at 1279. I have the routine slowed up so you can see it happening. Pin 13 is set to blink once a second.

It appears that the highByte of the ethernet RX buffer pointer/counter is not being initialized and decremented properly. If someone would be kind enough to point me to where in the code this is accomplished, that would be a big help. Could this be related to the increased size of the receive buffer in the Mega2560? It appears there is a difference in buffer sizes among the different models.

It should not have anything in the ethernet buffer. You can see in the code below that no request was made.

#include <SPI.h>
#include <Client.h>
#include <Ethernet.h>
#include <Server.h>
#include <Udp.h>

byte mac[] = { 0x90, 0xA2, 0xDA, 0x00, 0x59, 0x67 };  
byte ip[] = { 192, 168, 2, 2 };    
byte gateway[] = { 192, 168, 2, 1 };
byte subnet[] = { 255, 255, 255, 0 };
// www.google.com ip
byte server[] = { 74, 125, 113, 103 };

Client client(server,80);

int loopCount = 0;

void setup()
{
  
  Ethernet.begin(mac, ip, gateway, subnet); 
  Serial.begin(9600);
  pinMode(13,OUTPUT);
  
  if (client.connect()) {
    Serial.println("connected");
  } else {
    Serial.println("connection failed");
  }
}

void loop()
{
  int thisByte;
  int newCount = 0;    

  if(client.connected())
  {
    newCount = client.available();

//    if(newCount > 0) thisByte = client.read();    

    Serial.print("Available ");
    Serial.println(newCount, DEC);
  }
  else if(loopCount == 0)
  {
    // if not connected, attempt to reconnect
    client.stop();

    Serial.print("Connecting...");
    delay(10);
    
    if (client.connect()) {
      Serial.println("ok");
    } else {
      Serial.println("failed.");
      delay(10);
    }
  }

  // heartbeat
  if(loopCount > 8)
  {
    digitalWrite(13,HIGH); 
    loopCount = 0;
  }
  else
  {
    loopCount = loopCount++;
  }

  if(loopCount == 4)
  {
    digitalWrite(13,LOW); 
  }

  delay(100);
}

EDIT: This is from w5100.cpp. Do you see the pointer (ptr)? Why convert a 16 bit int to an 8 bit pointer, then back to a 16 bit pointer? That is what it is doing, right?

void W5100Class::recv_data_processing(SOCKET s, uint8_t *data, uint16_t len, uint8_t peek)
{
  uint16_t ptr;
  ptr = readSnRX_RD(s);
  // this type conversion
  read_data(s, (uint8_t *)ptr, data, len);
  if (!peek)
  {
    ptr += len;
    writeSnRX_RD(s, ptr);
  }
}

void W5100Class::read_data(SOCKET s, volatile uint8_t *src, volatile uint8_t *dst, uint16_t len)
{
  uint16_t size;
  uint16_t src_mask;
  uint16_t src_ptr;

  // this type conversion  
  src_mask = (uint16_t)src & RMASK;
  src_ptr = RBASE[s] + src_mask;

  if( (src_mask + len) > RSIZE ) 
  {
    size = RSIZE - src_mask;
    read(src_ptr, (uint8_t *)dst, size);
    dst += size;
    read(RBASE[s], (uint8_t *) dst, len - size);
  } 
  else
    read(src_ptr, (uint8_t *) dst, len);
}

I would really appreciate some help help with the debugging of the latest releases of Atmel's avr-gcc (v4.5.1) and Arduino V0022. As far as I can tell, I am down to this one single bug. Then I will have a latest release of both the Arduino and Atmel code operational. Wouldn't that be nice? Or do you prefer avr-gcc v4.3.4 all patched up?

The pay sucks. But hopefully it will not take an experienced coder very long to help debug this, or at least direct me to the code that handles the ethernet shield RX buffer and pointer/counters.

Here is the post in the development section:
http://arduino.cc/forum/index.php/topic,68624.0.html

Please keep your effort under a single topic.

Could this be related to the increased size of the receive buffer in the Mega2560?

I vaguely recall from the Developer's List that there is just such an issue.

The archive is available here...
http://arduino.cc/pipermail/developers_arduino.cc/
...and is somewhat Google searchable.

I believe Limor Fried (aka Lady Ada) volunteered to correct the problem.

Edit: Nope. The issue was with the SD card library not the Ethernet library. The symptoms are certainly similar so the issue (truncated integer value) could be the same.

Thanks for the info. However, it was a very depressing trip through history. According to a Google search that you recommended, this problem has been around since 2009, and no solution reported. It was a relief to see they were looking into the same code sections I am checking now. At least I'm not way off base.

One user claimed to have found the solution, but would not disclose it until somebody else showed interest. I think that was B.S. What do you think? It is reply#3 here:

I was hoping to tell people that had problems with a Linux compiler version to "update to the latest version", but that version will not compile "blink".

I tried. I got the two bugs out that allows the latest version to compile "blink"!
And even at that, nobody said "thanks", or "it doesn't work for me", or "we already knew about that". Just a little weird.

EDIT: I did not want anyone to think I have given up. I see reports from users of specific versions that the ethernet shield works in that version. I was hoping someone remembered what caused this, and how it was corrected.

The versions reported to be working are:
v4.3.5
v4.4.3

Not working are:
v4.4.2 (avrlibc-1.6.9)
v4.5.1 (avrlibc-1.7.1)

SurferTim:
One user claimed to have found the solution, but would not disclose it until somebody else showed interest. I think that was B.S. What do you think?

In that case, I agree with Thomas33. Because it is not his code and because he is not working in a professional capacity...

  1. There is a possibility that the problem is not in with the library but with his system. Having a second person confirm his observations dramatically increases the likelihood that the problem really is in the library.

  2. He may not have correctly identified the actual bug. By working with a second person, they are much more likely to identify the root cause. This is one of the pillars of Agile Software Development; which works very well for some people.

  3. If he had incorrectly labeled the library at fault or failed to identify the root cause of the problem and then released a "fix" he would have made the situation much worse.

Have you tried to contact Thomas33?

And even at that, nobody said "thanks", or "it doesn't work for me", or "we already knew about that". Just a little weird.

The bleeding edge can be a lonely place.

Try to bear in mind that the target audience for the Arduino are folks who have very little interest in compilers and libraries are little more than a means to an end. The typical user wants to build something that works and then move on. If the something is buggy, they will simply use a different version or work around the problem.

Also try to bear in mind that for the vast majority of users the Arduino is a hobby. Even the folks who are interested in your work. It may take weeks or even months before they have time to absorb and understand what you are doing simply because life gets in the way.

EDIT: I did not want anyone to think I have given up.

Excellent! That's the attitude that makes the Arduino such a great concept.

I didn't mean to make it sound like I thought others were ungrateful. It was only the lack of interest. I thought there would be a bigger Linux audience here than this, being open source and all. It appears Windows is the O.S. of choice. Not mine tho.

I don't want to take up a lot of your time, but I'm not really sure I understand how all this software got here.
Who wrote and maintains the Atmel Linux version compiler?
Who wrote and maintains the Arduino IDE for Linux?

The new Linux version compiler works pretty good once it is patched. If I don't need the ethernet client, I use it rather than the Windows version. It compiles and uploads faster.

Knowing what I know now, I think there would be a much larger Linux audience here if we could get a current version working. Maybe a lot of Linux people wandered off because the compiler in their repository (like mine) didn't work.

I do not agree with your evaluation of Thomas33. He should have presented the fault, and his solution. Then it would be up to the others on the forum to check it and respond.

I could have said "the new Atmel compiler won't compile blink, and I know how to fix it. Would you like to know how? Not gonna tell ya unless you grovel a little". And then I disappeared.

But that isn't me.

ADD: Do you think there would be this lack of response if the Windows version would not even compile blink? What would we do then?

Who wrote and maintains the Atmel Linux version compiler?

I have no idea but I've given you contact information for someone who can answer that question.

Who wrote and maintains the Arduino IDE for Linux?

David Mellis is responsible for the Arduino IDE, the Arduino Core, and the Standard Libraries. I suspect he participated in choosing the GCC AVR toolset provided with the Windows package.

Thanks for the info. I found the Savannah site for the Atmel compiler and posted the delay.h bug and fix.
I'll look into submitting the ethernet bug soon.

I am hoping that someone will come along, recognize this problem, and have a solution.

Is there a debugging program for the Arduino/Atmel compiler? I was thinking maybe being able to single step through that section of code might help.

I have not tried this new avr-gcc version on anything but a Mega2560. I was also hoping that someone would try this on a Uno or other board to see if the ethernet bug is a global problem, or limited to the Mega2560.

EDIT: I do not want anyone to think I am unhappy with Arduino and Atmel. I am very pleased with both. My concept-to-prototype time has been greatly reduced with Arduino. That is why my effort to get the Linux version working.

You should be able to use AVR Studio (from Atmel) and JTAGICE-like hardware (ie an Atmel "Dragon") to debug Arduino sketches.
Easier to do it the old fashioned way (disassemble and read the assembler code looking for bits that are wrong.)

You said you ran the same sketch with the 4.3.x compiler (as provided with the windows Arduino IDE), and it worked OK there, right?

Have you looked at all at the gcc buglist? How about 46779 – [4.4/4.5/4.6 Regression][avr] wrong code generation for values held in R28/R29 ?
It mentions another bug that specifically refers to Arduino issues, having to do with bytewide shifts of the sort I would expect to exist in the network code...

Thanks for the tips on the debugging.
Edit: What disassembler would you recommend?
I was going to ask how you get to the assembled code to disassemble it, but I found it in the /tmp directory with a buildxxxx directory name.

Yes, the Windows IDE downloaded from Arduino works fine with the same sketch. And that is v4.3.x?

How does GCC fit into this? Isn't the Atmel compiler avr-gcc? Is there some "sharing" of the compiler duties here? I'm still new with the Atmel stuff, so I may still ask newbie questions.

ADD: This code is from w5100.cpp. It uses a "pointer" that isn't really a pointer. The code type casts a 16 bit value as a pointer, then passes that value to the read_data routine, at which time it is converted back to a 16 bit value. Does anyone know why this is coded this way? Why not just pass the value as a 16 bit value? I added remarks at the questionable points.

void W5100Class::recv_data_processing(SOCKET s, uint8_t *data, uint16_t len, uint8_t peek)
{
  uint16_t ptr;
  ptr = readSnRX_RD(s);

// first type cast of 16 bit value to "pointer".
// what I expected was: read_data(s, &ptr, data, len);
  read_data(s, (uint8_t *)ptr, data, len);

  if (!peek)
  {
    ptr += len;
    writeSnRX_RD(s, ptr);
  }
}

// Value passed to this routine as a "pointer".
// whai I expected was: void W5100Class::read_data(SOCKET s, volatile uint16_t *src, volatile uint8_t *dst, uint16_t len)

void W5100Class::read_data(SOCKET s, volatile uint8_t *src, volatile uint8_t *dst, uint16_t len)
{
  uint16_t size;
  uint16_t src_mask;
  uint16_t src_ptr;

// second type cast from "pointer" to 16 bit value
// what I expected was:  src_mask = *src & RMASK;
  src_mask = (uint16_t)src & RMASK;
  src_ptr = RBASE[s] + src_mask;

  if( (src_mask + len) > RSIZE ) 
  {
    size = RSIZE - src_mask;
    read(src_ptr, (uint8_t *)dst, size);
    dst += size;
    read(RBASE[s], (uint8_t *) dst, len - size);
  } 
  else
    read(src_ptr, (uint8_t *) dst, len);
}

I could see if sizeof(uint8_t *) changed, this could be a problem. Maybe not now, but...

What disassembler would you recommend?

"avr-objdump -S" (part of the gcc binutils) works fine; mixed C source/asm output and everything. Not interactive at all, of course.

How does GCC fit into this? Isn't the Atmel compiler avr-gcc? Is there some "sharing" of the compiler duties here?

Atmel doesn't HAVE a C compiler. They have an IDE (AVR Studio) that supports a number of third party C Compilers, including gcc. I actually don't recall whether the avr studio includes gcc as part of the install or whether you have to install it separately. I am not sure what level of involvement Atmel has in gcc development and bugfixing; they sure don't have much "control" over the direction and philosophy of gcc in general (avr being a minor and annoying 8-bit CPU that is at the very edge of the architectural landscape of things that gcc supports. (perhaps the ONLY 8bit cpu supported by gcc?)) Compiler geeks can be really annoying :frowning:

// second type cast from "pointer" to 16 bit value
// what I expected was: src_mask = *src & RMASK;
src_mask = (uint16_t)src & RMASK;
src_ptr = RBASE + src_mask;[/quote]
(I have not looked at the ethernet library very carefully yet.) This sort of ugliness is pretty common in dealing with memory areas (usually on peripheral devices) with "odd" hardware limitations. I believe (after a quick look) that this is managing the 2k circular windows of buffer memory for each "socket" on the WizNet chip. RMASK is 0x7FF, and this sort of operation on pointers (AND with mask) is a common way of implementing a circular buffer whose size is a power of two, though of course the C language doesn't permit such "logic" operations on actual pointer data types. You could probably implement the whole thing using array indicies as the passed parameters rather than pointers. but that would have other disadvantages...
> I could see if sizeof(uint8_t *) changed, this could be a problem. Maybe not now, but...
Well, yes. Things would be more complicated (but perhaps clearer?) if the size of a pointer on the AVR and the size of a pointer within the WizNet architecture were different, and all hell will break loose nearly everywhere if pointers to different sized objects become different sizes themselves. Even as we speak, the x86 world is going through hell because of the assumptions that have been made for the last 20 years about sizeof(int) == sizeof(long) == sizeof(ptr) == sizeof(int32_t)

Thanks, westfw. That helped clear things up a bit. If the Atmel processors use GCC to compile the code, does that mean they are somewhat x86 compatible? Or does GCC do more than I thought?

That section of code handles the ethernet buffers. That is where I expect the problem to be. I have found the locations of the 4 two-byte values that hold the buffer counts. I see an array of "structures", one "structure" for each of the 4 possible socket connections. When I get some spare time, I will try to access those memory locations directly to see if there is a memory or addressing problem in the current code. The buffer pointers (array indices) are right there also.

I changed my version to pointers in that section of code to what I had expected. I don't care for weird coding. The "integer to pointer and back" was too much of a coding error looking for a time to happen. It did not fix this problem tho.

ADD: This was not as easy as it seemed. The socket.cpp file needed some changes. Mostly this

W5100.read_data(s, (uint8_t *)ptr, buf, data_len);

to

W5100.read_data(s, &ptr, buf, data_len);

does that mean they are somewhat x86 compatible? Or does GCC do more than I thought?

GCC does more than you thought. Originally targetting the Motorola 68000 CPU (Sun, HP, etc), GCC supports a wide variety of CPUs and architectures. x86 is one of the most visible and important, due to the linux projects, but GCC itself is a general purpose compiler system with a back-end (parsing and pseudo-code generation) and a bunch of front ends for different CPUs: Host/Target specific installation notes for GCC - GNU Project It's supposed to be moderately easy to provide support for new CPUs as long as they meet the basic model of a CPU that gcc supports (32bit registers and pointers.)
This is particularly wonderful for chip makers and providers of embedded equipment. With a relatively small effort, your chip can be supported by gcc, given your potential customers quick access to a suite of compilers and related tools. As a chip user, you can use the same compile environment for a large variety of chips. cisco (for example) started using gcc in the 68000 days, and was able to continue using it for MIPS, PPC, and ARM based products.
OTOH, this means that gcc is a massively huge piece of development effort, subject to many of the same problems as any large piece of software coming from a large commercial vendor, especially when it comes to getting timely updates to problems that affect a small subset of the actual users. And that "relatively small effort" tends to be bigger than you expect once you include the libraries that people want to use as well, especially if you want those to be optimized for your CPU (thus avr-libc being a separate project from avr-gcc, and Arduino libraries being separate again.)

I found it. It was in the w5100.h file starting at line 253 in my version. The new compiler version did not like the old way.

The old code:

#define __SOCKET_REGISTER16(name, address)                   \
  static void write##name(SOCKET _s, uint16_t _data) {       \
    writeSn(_s, address,   _data >> 8);                      \
    writeSn(_s, address+1, _data & 0xFF);                    \
  }                                                          \
  static uint16_t read##name(SOCKET _s) {                    \
    uint16_t res = readSn(_s, address);                      \
    res = (res << 8) + readSn(_s, address + 1);              \
    return res;                                              \
  }

The new code:

#define __SOCKET_REGISTER16(name, address)                   \
  static void write##name(SOCKET _s, uint16_t _data) {       \
    writeSn(_s, address,   _data >> 8);                      \
    writeSn(_s, address+1, _data & 0xFF);                    \
  }                                                          \
  static uint16_t read##name(SOCKET _s) {                    \
    uint16_t res = readSn(_s, address);                      \
    uint16_t res2 = readSn(_s,address + 1); 			\
    res = res << 8;						\
    res2 = res2 & 0xFF;						\
    res = res | res2;						\
    return res;                                              \
  }

Works great now! :slight_smile:

This means I have a current version running!
One bug/fix in avr-libc, one in wiring.h, and one in the ethernet library.

so it WAS 46779 – [4.4/4.5/4.6 Regression][avr] wrong code generation for values held in R28/R29 (or related)?? Wow!

westfw:
so it WAS 46779 – [4.4/4.5/4.6 Regression][avr] wrong code generation for values held in R28/R29 (or related)?? Wow!

I don't know. I did not need to get that deep. As soon as I determined how that code worked, it was easy to troubleshoot. First try did it.
I presumed by the write function above that, it also does the &0xFF with the low byte to insure a clear high byte. If you look at that new code, the read function is pretty much a reverse of the write function.

Edit: The closer I look at that, the more it looks like a case of partial uninitialized variable. A 16 bit variable was assigned an address, but never fully initialized (assigned a value). The read returns only a byte. So the high byte (edit: after the second read in the old code) would still be uninitialized after a read.

Hi SurferTim,

I am experiencing the same issues with Mega 2560R2 + Ethernet Shield. I was about to give up until I saw your post. Could you help me understand what steps are exactly necessary to get this fixed? Are any of the code changes checked into githib? If not, could you share with me/us the changed libraries?

Thanks so much in advance,
Sounder030

If you are talking about just the ethenet shield fix, that is in
/arduino-0022/libraries/Ethernet/utility/w5100.h
Open that file with a text editor. At about line 253, you will see the "old code" I posted above.
Edit to the "new code" and save.

Add: That is all one line of code to the compiler. Insure the backslashes stay intact at the end of each line. The only line that does not have a backslash is after the last curly brace.

1000 thanks to SurferTim, CodingBadly and westfw. Now it works for me too.

@CodingBadly: Thanks for putting the arguments, why I didn't posted any assumed bugfix without getting any confirmation even on the bug itself. They exactly matches my reasons. Plus my assumed bugfix next day turned out to not really do it :slight_smile:

SurferTim:
I found it. It was in the w5100.h file starting at line 253 in my version. The new compiler version did not like the old way.

I had (and still have) gcc 4.3.5, and experienced the bug
(I mean the one described in http://arduino.cc/forum/index.php/topic,57972.0.html to avoid confusion). Now I only took your patch from the previous post (w5100.h) and now it works. I did not take earlier mentioned change in socket.cpp.

Even my keep-alive socket connect (for very fast sensor polling) is working now! :)))

If anyone want to see the code (arduino code for the server and java for the client side) just ask for it. I just dont want to flood the board with code nobody wants to see.

Thanks again to you guys!