
Topic: dhcp request (ethernet.begin(mac)) hangs in parseDHCPResponse of Dhcp.cpp

DanH

Hi,

First, as a first-time poster, I'd just like to say that the Arduino hardware, software, libraries, and user communities are fantastic. I'm truly amazed at the quality.

I am planning to use an Arduino to monitor the power buses in our computer room at a Fortune 500 company. I bought an Ethernet shield a while back, ran a couple of the examples, and everything worked out of the box (the main test being UdpNtpClient).

I finally brought the hardware into the office and started trying to make it work here. Ethernet.begin(mac) failed (hung) right off the bat. I temporarily worked around the problem by using static IP addressing, but it troubled me that DHCP wasn't working, so I came back to figure it out.

I found that the call to parseDHCPResponse never returned. Digging deeper, I found that this function kept trying to process DHCP options after the end option ($FF) had been reached.

When endOption is parsed, the code just drops out of the switch statement:
                case endOption :
                    break;

There *should* be nothing left in the buffer at this point, so the while loop should exit:
        while (_dhcpUdpSocket.available() > 0)

But for some reason it doesn't. I examined the packet with a protocol analyzer, and there is nothing in the packet after the FF option.
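The break in "case endOption :" leaves only the switch; the enclosing while keeps looping for as long as the socket reports bytes. Below is a minimal host-side C++ model of that control flow, not the library's code: optionCodes() and its stopAtEnd flag are hypothetical names, and the len argument stands in for what _dhcpUdpSocket.available() claims is left. It only illustrates why bytes after $FF get parsed; it does not reproduce why the real loop never terminates.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// DHCP option codes (RFC 2132) that need special handling.
const uint8_t padOption = 0;   // single byte, no length field
const uint8_t endOption = 255; // marks the end of the options area

// Walks code/length/value options in opts[0..len) and returns the codes seen.
// len models what the socket *claims* is left. With stopAtEnd == false the
// endOption case only breaks out of the switch, as in the original loop, so
// any bytes after $FF are parsed as if they were more options.
std::vector<uint8_t> optionCodes(const uint8_t* opts, size_t len, bool stopAtEnd) {
    std::vector<uint8_t> codes;
    size_t i = 0;
    while (i < len) {
        uint8_t code = opts[i++];
        switch (code) {
            case padOption:
                break;                        // nothing to skip
            case endOption:
                codes.push_back(code);
                if (stopAtEnd) return codes;  // leaves the while loop too
                break;                        // leaves only the switch
            default: {
                codes.push_back(code);
                if (i >= len) return codes;   // truncated option
                uint8_t optLen = opts[i++];
                i += optLen;                  // skip the option's value bytes
                break;
            }
        }
    }
    return codes;
}
```

Fed the options area from the hex dump posted later in the thread (the bytes after the 63 82 53 63 magic cookie, up to and including FF, plus the twelve bytes that follow it), the stopAtEnd version returns 53, 1, 58, 59, 51, 54, 3, 6, 15, 255, while the switch-only version goes on to report three more "options" made from the following bytes.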

I don't understand what is happening well enough to figure out why the buffer isn't empty, but I can make it empty by modifying the endOption case like this:

                case endOption :
                    _dhcpUdpSocket.flush();
                    break;

This works and DHCP no longer hangs when I use our work DHCP server.

I can only guess that the reason the original Dhcp.cpp code works at home and not at the office has something to do with the options the office DHCP server transmits. It sends these options: 53, 1, 58, 59, 51, 54, 3, 16, 14, $FF.

My 'fix' works, but it doesn't address the underlying reason why the buffer isn't empty, or why the function at least thinks it isn't empty.

Dan



SurferTim

It probably thinks there is data in the RX buffer when it is empty. The "605 Bug" is the most likely suspect. It affects almost all Ethernet functions, including UDP and DHCP.
http://code.google.com/p/arduino/issues/detail?id=605

Here is a thread where another user had about the same problem with dhcp:
http://arduino.cc/forum/index.php/topic,93623.0.html

DanH

Thanks for the response.

It sounded like this should fix the problem, yet somehow doesn't. I spent the afternoon trying to figure out the problem myself. Even with your change, when I first get back the DHCPOFFER packet, it reports 614 bytes in the UDP part of the packet. Looking at Wireshark, the correct size should have been 311.

Here is how I examine the size of the UDP packet (see the Serial.println call):

Code:
uint8_t DhcpClass::parseDHCPResponse(unsigned long responseTimeout, uint32_t& transactionId)
{
    uint16_t avail;
    uint16_t cc = 0;
    uint8_t type = 0;
    uint8_t opt_len = 0;

    unsigned long startTime = millis();

    // Poll until a UDP packet arrives or the response timeout expires
    while((avail = _dhcpUdpSocket.parsePacket()) <= 0)
    {
        if((millis() - startTime) > responseTimeout)
        {
            return 255;
        }
        delay(50);
    }
    // start reading in the packet
    RIP_MSG_FIXED fixedMsg;
    // Debug: how many bytes the socket reports before the fixed header is read
    Serial.print("~bytes at top of parseDHCPResponse=");
    Serial.println(_dhcpUdpSocket.available());
    _dhcpUdpSocket.read((uint8_t*)&fixedMsg, sizeof(RIP_MSG_FIXED));

    // ... (rest of parseDHCPResponse omitted)
}


I checked this all the way to the code you suggest modifying, and it returns what I see here.

The other puzzling part is that I used this same Arduino/Ethernet shield and the same compiler version at home, and it worked fine.



SurferTim

Quote
It sounded like this should fix the problem, yet somehow doesn't.


Somehow doesn't? Does it somehow still lock up in the Ethernet.begin(mac) DHCP routine? Did your shield somehow get an IP assigned or somehow not?

DanH

Correct, it never returns from Ethernet.begin because it never finds the end of the packet before it goes loopy.

One difference between my home and office environments: at the office there is a redundant DHCP server, so the DHCP request gets two offers from two servers. I wonder if those two packets are somehow in the buffer together, and that is why the code thinks there is more data when there should not be. I wouldn't think that would be possible, but I've never worked this close to the hardware before.

Unfortunately it would be difficult for me to filter out the second server's offer, but when I get into the office tomorrow I'll dump the entire buffer and look at it to see whether it contains two offers.
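A buffer dump like the one described here comes down to reading every byte the socket reports and printing it as hex. The function below is a hypothetical host-side stand-in (hexDump() is not part of the Ethernet library) that formats bytes as two uppercase hex digits, 16 per line; on the board the bytes would come from _dhcpUdpSocket.read() and be sent to Serial instead of a string.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats bytes as two-digit uppercase hex, 16 per line,
// with every line (including a short last one) ending in '\n'.
std::string hexDump(const uint8_t* data, size_t len) {
    std::string out;
    char cell[4];
    for (size_t i = 0; i < len; ++i) {
        std::snprintf(cell, sizeof cell, "%02X", static_cast<unsigned>(data[i]));
        out += cell;
        // newline after every 16th byte and after the final byte, else a space
        out += ((i + 1) % 16 == 0 || i + 1 == len) ? '\n' : ' ';
    }
    return out;
}
```

For example, the four bytes 0x02, 0x01, 0x06, 0x00 format as "02 01 06 00" followed by a newline.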

SurferTim

Two DHCP servers on the same localnet, both sending an offer, may confuse the shield.


DanH

Sure enough, both packets are in the hardware buffer, which explains why the code doesn't stop when you would expect it to.

Code:
02
01 06 00 00 00 03 31 00 00 80 00 00 00 00 00 0A
02 EF 13 0A FE 01 1F 0A 02 E1 05 90 A2 DA 0D 02
6B 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 63 82 53 63 35
01 02 01 04 FF FF E0 00 3A 04 00 13 C6 80 3B 04
00 22 9B 60 33 04 00 27 8D 00 36 04 0A FE 01 1F
03 04 0A 02 E1 01 06 0C 0A 02 C1 10 0A 01 0A 1E
0A FE 01 1F 0F 07 64 72 31 2E 65 69 00 FF 0A 02
E1 04 00 43 01 2F 02 01 06 00 00 00 03 31 00 00
80 00 00 00 00 00 0A 02 EF 13 0A FE 01 1F 0A 02
E1 04 90 A2 DA 0D 02 6B 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 63 82 53 63 35 01 02 01 04 FF FF E0 00 3A
04 00 13 C6 80 3B 04 00 22 9B 60 33 04 00 27 8D
00 36 04 0A FE 01 1F 03 04 0A 02 E1 01 06 0C 0A
02 C1 10 0A 01 0A 1E 0A FE 01 1F 0F 07 64 72 31
2E 65 69 00 FF
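The dump also accounts for the byte counts. Right after the first reply's FF come the eight bytes 0A 02 E1 04 00 43 01 2F, which read as the per-datagram header the W5100 on the Ethernet shield stores in its UDP receive buffer: source IP 10.2.225.4, source port 0x0043 (67, the DHCP server port), and payload length 0x012F (303). A 303-byte DHCP payload plus the 8-byte UDP header gives the 311 bytes Wireshark showed, and 303 + 8 + 303 = 614 matches what available() reported at the top of parseDHCPResponse: the rest of the first reply, the second header, and the second reply. The sketch below assumes that header layout (4-byte IP, 2-byte port, 2-byte big-endian length) and shows how the length field lets a reader find datagram boundaries; RxDatagram and splitRxBuffer() are hypothetical names, not library API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One datagram as stored in a W5100-style UDP receive buffer (assumed layout).
struct RxDatagram {
    uint8_t srcIp[4];
    uint16_t srcPort;
    uint16_t length;       // payload length taken from the 8-byte header
    size_t payloadOffset;  // where the payload starts in the buffer
};

// Splits a raw receive buffer into datagrams using each 8-byte header
// (4-byte source IP, 2-byte source port, 2-byte big-endian length).
// Stops if a header or its payload would run past the end of the buffer.
std::vector<RxDatagram> splitRxBuffer(const uint8_t* buf, size_t len) {
    std::vector<RxDatagram> out;
    size_t pos = 0;
    while (pos + 8 <= len) {
        RxDatagram d;
        for (int k = 0; k < 4; ++k) d.srcIp[k] = buf[pos + k];
        d.srcPort = static_cast<uint16_t>((buf[pos + 4] << 8) | buf[pos + 5]);
        d.length = static_cast<uint16_t>((buf[pos + 6] << 8) | buf[pos + 7]);
        d.payloadOffset = pos + 8;
        if (d.payloadOffset + d.length > len) break;  // truncated datagram
        out.push_back(d);
        pos = d.payloadOffset + d.length;             // next header starts here
    }
    return out;
}
```

With two 303-byte replies queued, the whole buffer is 2 × (8 + 303) = 622 bytes; once the first header has been consumed, 614 remain, which is the number that made the parser think the first reply was much longer than it was.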


My fix in Dhcp.cpp solves the problem, though I don't know if it is the best solution (perhaps it will also flush something it shouldn't):

Code:
case endOption :
    Serial.println("end option hit");  // debug output
    _dhcpUdpSocket.flush(); // dwh: discard whatever is still buffered (here, the duplicate offer)
    break;
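The worry that flush() might discard something it shouldn't can be made concrete with a toy model. Assuming flush() throws away every byte that available() reports, it drops the duplicate offer along with the rest of the current one; for this DHCP exchange that is probably harmless, since the client only needs one offer. A narrower alternative, sketched below with hypothetical names (RxModel, flushAll(), skipRestOfDatagram()), skips only the unread part of the current datagram, whose size would have to come from the length field in the receive header. This models the trade-off; it is not a tested patch for Dhcp.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy receive buffer: raw bytes plus a read position (not the library's class).
struct RxModel {
    std::vector<uint8_t> bytes;
    size_t readPos = 0;
    size_t available() const { return bytes.size() - readPos; }
};

// Models the flush() fix: everything still buffered is discarded,
// including any datagrams queued behind the current one.
void flushAll(RxModel& rx) {
    rx.readPos = rx.bytes.size();
}

// Narrower alternative: discard only the unread part of the current datagram,
// leaving later datagrams in place. Never skips past the end of the buffer.
void skipRestOfDatagram(RxModel& rx, size_t remainingInDatagram) {
    size_t n = remainingInDatagram < rx.available() ? remainingInDatagram
                                                    : rx.available();
    rx.readPos += n;
}
```

In the 614-byte case, skipping the current reply's remaining bytes leaves the 8-byte header and 303-byte second reply readable, whereas flushAll() leaves nothing.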


Granted, this problem is going to be rare, since not many individuals or companies run redundant DHCP servers, but should I try to notify someone about the problem and the possible fix? It has definitely burned up several hours of mine!

Dan

SurferTim

Very rare.

In your application, the best you can hope for is to find the solution yourself. I do not recommend two DHCP servers on the same localnet.

My routers have a routine that checks for what is referred to as a "rogue DHCP server" on each localnet. If it finds one, it will attempt to "ARP poison" any contact with that MAC address.


pepik

I had this exact same issue - worked great at home, hung at work. DanH's fix worked great for me, thanks!

DanH

Cool, I'm glad I was able to help someone!

I went back and looked carefully at our network and found that the problem wasn't due to a backup DHCP server, but to a routing problem that caused the same DHCP reply to come through both the primary connection back to HQ and the backup connection.

wadevcamp

I am experiencing this symptom (a hang in Ethernet.begin(mac)), so I want to try DanH's fix, but it's not obvious to me where to add his line of code. Did he have a "case endOption" in his sketch, or did he somehow manage to add it to the Ethernet library code? If the latter, how?

DanH

The fix was made in the file Dhcp.cpp, in the function parseDHCPResponse. If you look at

https://code.google.com/p/arduino/issues/attachmentText?id=716&aid=7160017002&name=Dhcp.cpp&token=dc0FWLnNWk097X5Gg0mwu8iETSE%3A1330153164746

the fix would be inserted after line 295.

DanH
