Your best code fix.

In response to Jeremy1998's thread here: http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1279398364, how about a "Most heroic fix" thread?

At an employer I worked for in the past, I was the lead developer (OK, the only developer) of an in-house CRM and trouble-ticketing system, which also handled software version control, packaging, and distribution. It was written in VB6 with an Access 2003 database backend (boo, hiss, moan).

This system was used concurrently by about 100 people in the company every day. Needless to say, the Access DB backend couldn't always keep up, and errors would crop up on certain records. This happened randomly: often enough to be a pain for me to fix, but not often enough for anybody to tell me "look, take the time and figure out why".

The fix itself was simple enough. A record in a table would become corrupted in one of its fields, and the field would read something like "##ERROR##". You couldn't do anything with that field (IIRC); updates wouldn't work, nothing. What I found, though, was that you could copy the entire record to a new record, and then that field would become editable. You could do this from within Access fairly easily, but quite often the problem cropped up when I wasn't at work (usually early in the morning), so I would get a lot of "please help us" emails from various staff whenever it broke. That got old quickly.

So I did what any programmer would do - I replaced myself with a small shell script.

Actually, what I did was write a piece of code into the application that would do the fix for me, on whichever table was having the issue. It would find the record, copy it, delete the old one, update any pointers/keys elsewhere in the system as needed, and reset the affected field to its default value, with a note somewhere on the record that indicated the record was "auto-fixed", and that some data was lost. Generally, this lost data was not a big deal, and could be easily handled by the rest of the staff (most of the time, the issue was on certain status fields, never on a memo or other free-form text field).
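The copy-and-repoint repair described above can be sketched in Python with the standard `sqlite3` module. This is a minimal sketch under stated assumptions, not the original VB6/Access code: the `tickets` and `ticket_events` tables, the `status` column, and the `##ERROR##` marker used to simulate a damaged field are all hypothetical. In Access the damaged field simply couldn't be read or updated; SQLite has no equivalent failure, so the marker stands in for it.

```python
import sqlite3

# Hypothetical schema for illustration only; the real system was VB6 + Access 2003.
SCHEMA = """
CREATE TABLE tickets (
    id      INTEGER PRIMARY KEY,
    status  TEXT NOT NULL DEFAULT 'Open',
    notes   TEXT NOT NULL DEFAULT ''
);
CREATE TABLE ticket_events (
    id        INTEGER PRIMARY KEY,
    ticket_id INTEGER NOT NULL REFERENCES tickets(id),
    text      TEXT
);
"""

CORRUPT_MARKER = "##ERROR##"  # stands in for the unreadable Access field


def repair_ticket(conn: sqlite3.Connection, old_id: int) -> int:
    """Copy a damaged ticket to a fresh row, repoint children, drop the old row.

    The damaged status field is reset to its column default and the notes
    are tagged so staff know some data was lost. Returns the new id.
    """
    with conn:  # one transaction: the whole repair happens, or none of it
        (notes,) = conn.execute(
            "SELECT notes FROM tickets WHERE id = ?", (old_id,)
        ).fetchone()
        # Insert the copy; status is omitted so it takes the column default.
        cur = conn.execute(
            "INSERT INTO tickets (notes) VALUES (?)",
            (notes + " [auto-fixed: status reset, some data lost]",),
        )
        new_id = cur.lastrowid
        # Update every pointer to the old record before deleting it.
        conn.execute(
            "UPDATE ticket_events SET ticket_id = ? WHERE ticket_id = ?",
            (new_id, old_id),
        )
        conn.execute("DELETE FROM tickets WHERE id = ?", (old_id,))
    return new_id


def repair_all(conn: sqlite3.Connection) -> list[int]:
    """Repair every ticket whose status field shows the corruption marker."""
    bad = [row[0] for row in conn.execute(
        "SELECT id FROM tickets WHERE status = ?", (CORRUPT_MARKER,))]
    return [repair_ticket(conn, tid) for tid in bad]
```

Doing the copy, repoint and delete inside one transaction is the important design choice: if the process dies halfway, no child rows are left pointing at a record that no longer exists.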

After I had that in place, had run it a few times on actual errored records, and knew it was working OK, I added it to a randomised routine: whenever someone logged in, the system would randomly decide whether to "repair" the tables. A flag marked a repair as in progress, so other "random lucky winners" who logged in meanwhile would see it and know the system was in repair mode. The auto-repair didn't take long to finish, even with the tons of data we had (the DB was a few gigabytes, which is huge and not recommended for an Access DB, btw). So only one user's system would initiate the repair, and the others would only allow a login once the repair had completed (a few minutes at most).
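One way to get the "only one client repairs, everyone else waits" behaviour is a shared flag row that each client tries to claim with a single conditional `UPDATE`. The sketch below is hypothetical: the `maintenance` table, `REPAIR_PROBABILITY`, and the function names are invented, and it uses SQLite so it runs standalone, whereas the original used Access.

```python
import random
import sqlite3

REPAIR_PROBABILITY = 0.05  # assumed chance that a given login starts a repair


def setup_flag(conn: sqlite3.Connection) -> None:
    """Create the single-row table that acts as the 'repair in progress' flag."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS maintenance "
        "(id INTEGER PRIMARY KEY CHECK (id = 1), repairing INTEGER NOT NULL)"
    )
    conn.execute("INSERT OR IGNORE INTO maintenance (id, repairing) VALUES (1, 0)")
    conn.commit()


def try_claim_repair(conn: sqlite3.Connection) -> bool:
    """Flip the flag from 0 to 1 in one statement; only one client can win."""
    with conn:
        cur = conn.execute(
            "UPDATE maintenance SET repairing = 1 WHERE id = 1 AND repairing = 0"
        )
    return cur.rowcount == 1


def release_repair(conn: sqlite3.Connection) -> None:
    with conn:
        conn.execute("UPDATE maintenance SET repairing = 0 WHERE id = 1")


def on_login(conn, run_repair, rng=random.random) -> str:
    """Decide what this client does at login: 'wait', 'repaired' or 'login'."""
    (repairing,) = conn.execute(
        "SELECT repairing FROM maintenance WHERE id = 1"
    ).fetchone()
    if repairing:
        return "wait"  # someone else is repairing; hold the login until done
    if rng() < REPAIR_PROBABILITY and try_claim_repair(conn):
        try:
            run_repair(conn)
        finally:
            release_repair(conn)  # always clear the flag, even if repair fails
        return "repaired"
    return "login"
```

A weakness of any flag like this is a client that crashes mid-repair without clearing it; a fuller version would store a timestamp with the claim so a stale one can be taken over.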

The users loved it. It worked great, but the defining moment came about a week after it had been in place. A repair was needed, and the system took care of it. The way I found out was priceless: it was from a user who (for some reason) was out of the loop on the new process and didn't know I had made the fix automatic. She emailed me saying "please fix this issue, I need it to work"; about 15 minutes later, she sent another email saying "thank you!".

At the time both of those emails were sent, I was at home, still in bed, sleeping.

;D

I knew we ultimately had to get off that DB system; Access just wasn't going to cut it. I ended up coding a new version that used an ODBC connection to a PostgreSQL DB (I chose this route because I wanted to convert the entire system to something akin to a LAMP setup, but MySQL wasn't as advanced then as it is now, so PostgreSQL it was). I was testing it when the company let me go after eight years. I later found out it was because they were selling the company, and I looked bad on their balance sheet (my job was to work on this non-funded internal application).

They told me when they let me go that they were going to look into an alternate system to replace the homegrown one. I told them good luck.

About four years later I had the opportunity to visit my old stomping grounds (I was picking up a Parallax USB OScope I had won in a Nuts and Volts magazine contest; it had been accidentally sent to the wrong address). They were still using the same system, still running Access 2003 (they never migrated to the PostgreSQL version), and nobody had been hired to update it since the day I left (except for my boss, who told me he had made some minor changes, but nothing extensive). They never managed to find a system that did everything mine did, because it was so tightly integrated with their business processes, had a ton of features customized just for that business, and also handled part of their billing and reporting.

I was told, though, that the company that bought them thought the application was well designed and worked well. I don't know whether they ever did anything further with it; I tend to think not (at least, I hope they didn't, as I wouldn't want to inflict that kind of pain on any future programmer)...

:slight_smile:

Mowcius doesn't feel that he can post in this thread ;D

I am better at breaking code :wink:

Not really programming, but at my last job we were being assessed. The assessment had about £1.5m riding on it.

We'd been given a spreadsheet to populate, and we'd used a SharePoint folder to house it so everyone had access. This included subfolders of documentary evidence, which were hyperlinked from the Excel spreadsheet.

Just before the assessors arrived I copied the folder to a USB stick for the assessors to run on their laptop. I gave it a final check and nearly died when I noticed all the hyperlinks were absolute, not relative.

I had to go through and change about 1500 absolute hyperlinks to relative ones as quickly as possible in order to save my employer £1.5m.
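A rewrite like this can be scripted rather than done by hand. Below is a minimal Python sketch of just the path arithmetic, assuming the links were SharePoint URLs on the same host as the workbook (the URLs in the example are invented); reading and writing the hyperlinks inside the workbook itself is not shown.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def to_relative(link: str, workbook_url: str) -> str:
    """Rewrite an absolute link as a path relative to the workbook's folder.

    Links that point at a different scheme or host are returned unchanged.
    """
    target = urlsplit(link)
    base = urlsplit(workbook_url)
    if (target.scheme, target.netloc) != (base.scheme, base.netloc):
        return link
    # Decode %20 etc. so the result works as a plain file path on a USB stick.
    base_dir = posixpath.dirname(unquote(base.path))
    return posixpath.relpath(unquote(target.path), start=base_dir)
```

For example, with the workbook at `https://sharepoint.example.com/sites/assess/Shared%20Documents/Assessment.xlsx`, a link to `.../Shared%20Documents/Evidence/1.2/policy.pdf` becomes `Evidence/1.2/policy.pdf`, which keeps working once the folder is copied anywhere else.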


Ouch!

Not really a code-fix, but...
I used to work for a manufacturer of automatic test equipment (ATE).
The machines were the size of a large desk, packed with microprocessor racks and custom hardware, and had a bed-of-nails vacuum fixture that tested whole assembled boards (think 286-based PC motherboard), by tri-stating the processor and emulating its bus cycles to test all the connected peripherals and memory.

I came in one Thursday morning after a day off to find a major panic on.
One of our customers was a large French manufacturer of PCs and minicomputers, and they were hopping mad because an upgrade to their ATE was destroying the boards they were testing.

I had worked on the upgrade programme from a firmware POV, so was told to drop everything and sort out their problem.
It was thought that a recently shipped upgrade was to blame, over-driving the power supplies, but the problem could not be replicated.

I gathered all the test gear I could carry, plus the listings (the software release ran to a dozen or so EPROMs), and set off for the airport with a colleague from production engineering.
In the days before affordable laptops (or mobile phones), I could take no development equipment with me; our assembler ran on a PDP-11.

Caught late Thursday flight to Paris, and because of customs and the amount of gear we were carrying, missed our connecting internal flight.
Bought basic road map from an airport shop just as it was closing, hired a car, drove over 350 miles through the night across France to near the Swiss border, arriving in the small hours.
Couldn't find our hotel, so slept in the car in a side-street.
Woken a couple of hours later by the dustmen doing their rounds, so washed and breakfasted in the railway station.
Arrived to frosty reception at customer site on Friday morning.

We diagnosed the fault within minutes.
We had shipped an upgrade kit to our Paris office, intending to send a senior service engineer from our head office in the UK to fit it, but the package had been intercepted by an over-keen junior engineer in Paris who had carried out the upgrade without authority.

Because he didn't have instructions for the kit, he had simply added all the bits in the kit to the existing hardware.
One of these bits was a replacement sense wire for the unit-under-test (UUT) power supply.
This is designed to sense the voltage at the test fixture, and feed it back to the supply, so that a consistent voltage is supplied to the UUT.
However, because of the additional length and a poor connection, the voltage drop on the sense wire itself meant that the supply was over-compensating, jacking up the voltage to the test fixture and blowing the boards.
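A back-of-envelope version of that failure, assuming an ideal remote-sense regulator (one that adjusts its output until the voltage it senses equals its setpoint) and purely hypothetical figures of a 5 V setpoint and a 0.6 V drop in the sense path:

```latex
V_{\text{sensed}} = V_{\text{fixture}} - V_{\text{drop}} = V_{\text{set}}
\quad\Longrightarrow\quad
V_{\text{fixture}} = V_{\text{set}} + V_{\text{drop}}
= 5\ \text{V} + 0.6\ \text{V} = 5.6\ \text{V}
```

Under those assumed numbers the boards would see 12% above nominal, continuously, for as long as they sat on the fixture.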

He'd also changed the firmware (taking the old EPROMs with him back to Paris), which introduced a new feature that the hardware didn't support - that part of the upgrade was due to be shipped with the proper service engineer.
I couldn't leave the customer over the weekend with non-functioning ATE and several days' worth of untested production, so I trawled through the listings (every single line written in assembler) to find the three bytes of the JSR to the feature I needed to disable.
Fortunately, the customer had the same EPROM programmer as the one we used (a good old Data IO), so after a bit of hex offset calculation I located the correct device, copied the contents of the EPROM into the Data IO's memory, erased the EPROM, NOPed out the offending call and blew the firmware back into the EPROM.
Worked first time!
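The "hex offset calculation" is integer division, and the patch is a check-then-overwrite of three bytes. A minimal Python sketch with loudly hypothetical parameters: the ATE's CPU isn't named in the story, so this assumes a 6502-style 3-byte `JSR` (opcode `0x20`, little-endian address) and the 1-byte `NOP` `0xEA`, with the firmware laid out across consecutive 8 KB EPROMs from an invented base address.

```python
# All constants are assumptions for illustration, not the real ATE's values.
EPROM_SIZE = 0x2000   # 8 KB per device, e.g. a 2764
ROM_BASE = 0x8000     # hypothetical address of the first EPROM
JSR_OPCODE = 0x20     # 6502 JSR absolute
NOP_OPCODE = 0xEA     # 6502 NOP (one byte)


def locate(address: int) -> tuple[int, int]:
    """Map a CPU address from the listing to (EPROM index, offset in that EPROM)."""
    if address < ROM_BASE:
        raise ValueError("address is below the ROM area")
    return divmod(address - ROM_BASE, EPROM_SIZE)


def nop_out_jsr(image: bytearray, offset: int, target: int) -> None:
    """Replace a 3-byte 'JSR target' at offset in one EPROM image with three NOPs."""
    if offset + 3 > len(image):
        raise ValueError("instruction straddles two EPROMs; patch both devices")
    # Verify the bytes really are the expected call before touching anything.
    expected = bytes([JSR_OPCODE, target & 0xFF, (target >> 8) & 0xFF])
    if image[offset:offset + 3] != expected:
        raise ValueError("bytes at offset are not the expected JSR; check the listing")
    image[offset:offset + 3] = bytes([NOP_OPCODE] * 3)
```

Checking the existing bytes first is the cheap insurance: an off-by-one in the offset arithmetic shows up as an error instead of a freshly blown EPROM that crashes.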

An added twist: on the way home on Saturday, our head office could only book us one seat on the return flight from the local airport, so I had to drive all the way back to Paris to get a flight from there.

(The junior French service engineer wasn't with the company much longer).

While porting a fuzzy-logic name-matching database engine to a Burroughs mainframe (I think it was Burroughs; I ported to all sorts of platforms in those days and my memory's not so great), our hashing algorithm totally shat the bed. We had to write the code for this product in standard K&R C so that it was portable to any platform, but it failed on the Burroughs.

I can't remember exactly how the code was written, but basically the hashing fiddled with bits a lot and expected a byte to have 8 of them. The Burroughs had 9-bit bytes.

Working in a crappy mainframe environment I was totally unfamiliar with, in a city hundreds of miles from home, with no email or mobile phones back then and a large project in the balance, was not one of my favourite experiences at the best of times. I could have done without 9-bit bytes.
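Since the original hashing code can't be recalled, here is a hypothetical reconstruction of the failure mode rather than the real thing. It simulates, in Python, a C hash that rotates an `unsigned char` with `(h << 1) | (h >> 7)` and relies on storing back into the `unsigned char` to drop the overflow bit. That truncation happens at 2^CHAR_BIT, so with 9-bit bytes the "rotate" silently stops being one. The portable C fix is to mask explicitly with `& 0xFF` instead of relying on the type's width.

```python
def hash_as_written(data: bytes, char_bit: int) -> int:
    """Simulate C that rotates an unsigned char h with (h << 1) | (h >> 7).

    The store back into unsigned char truncates modulo 2**char_bit, so this
    is only a true 8-bit rotate when char_bit == 8.
    """
    char_mask = (1 << char_bit) - 1
    h = 0
    for c in data:
        h = ((h << 1) | (h >> 7)) & char_mask  # implicit wrap at the machine's byte width
        h ^= c
    return h


def hash_portable(data: bytes) -> int:
    """Same hash with an explicit & 0xFF, so the result no longer depends on CHAR_BIT."""
    h = 0
    for c in data:
        h = ((h << 1) | (h >> 7)) & 0xFF
        h ^= c
    return h
```

For the input `b"\xff\xff"` the 8-bit simulation gives 0 while the 9-bit one gives 256, so every hash bucket lookup built on the 8-bit results would miss.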

YAY! I inspired something! Like Mowcius and bld, I can't post here... I always give up when I fail (a bad habit that I need to break), and sometimes I revisit it a week or so later...

Now... If you had said best code hack... I might find something in my collection of clunkers.

My motto: why write 40 new lines of code that can break easily when you can reuse 150 lines of tested code instead? So a lot of my code is just a hack of the last program I wrote, with some big shoehorn code to make it work right.

One of my early achievements (when I realized that code was not something only other people do...) dates back to 1982.

I managed a large installation of T1 multiplexers at the time, and I needed field service crews to modify the VAX communications boards so that each could be uniquely identified to the massive serial-port switching device. The settings were tricky, but there was an algorithm.

I was so tired of answering the same question about how to set the DIP switches (these were pre-web days) that I wrote some VaxBasic code that graphically drew the multiple DIP switch settings, so the crews would know which ones to change. This was the first time I sat down to learn how to write code, because I had a reason.

That's when I realized that it's a lot of fun to write code to SOLVE a real world problem.
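The actual switch algorithm isn't given, so this Python sketch assumes the simplest hypothetical scheme: the board's ID written in binary across an 8-position DIP switch, with switch 1 as the least significant bit and ON meaning 1. It then draws the bank as ASCII art, in the spirit of what the VaxBasic program did for the field crews.

```python
def dip_settings(board_id: int, positions: int = 8) -> list[bool]:
    """Hypothetical encoding: switch 1 is the least significant bit, ON means 1."""
    if not 0 <= board_id < (1 << positions):
        raise ValueError(f"board_id must fit in {positions} switches")
    return [bool((board_id >> i) & 1) for i in range(positions)]


def draw_dip(board_id: int, positions: int = 8) -> str:
    """Draw the switch bank as ASCII art; '#' marks where each lever sits.

    Single-digit switch labels keep the columns aligned, so this assumes
    at most 9 positions.
    """
    settings = dip_settings(board_id, positions)
    top = " ".join("#" if on else "." for on in settings)
    bottom = " ".join("." if on else "#" for on in settings)
    labels = " ".join(str(i + 1) for i in range(positions))
    return f"ON  {top}\nOFF {bottom}\n    {labels}"
```

Calling `print(draw_dip(5))` shows switches 1 and 3 up and the rest down, which is exactly the kind of picture that ends the "which switches again?" phone calls.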


Indeed. I am always much more motivated for a project that I know has an application rather than just something I want to do

Mowcius


I usually get bored of 'real' stuff. At the moment I'm enjoying getting individual things to work with the arduino - like components, or small bits that I can use in bigger projects like the soft power switch I did this evening.

I've got a 'real' project going on that could well make me some dosh*, but I'm struggling with motivation to get it done. It'll involve doing a proper website backend. That's completely different from hacking a working solution together; it needs to be properly secure, and that's an arse.

*I work as an administrator in the NHS (national health service). There's a problem with different Trusts (local NHS organisations) sharing information such as alerts pertaining to patients. These might be that the patient has been violent or aggressive to staff, or that there are safeguarding concerns... anything really, from a few different types of Trust in one area.
My idea is an Arduino-based (of course!) device: an Arduino, an Ethernet shield, a keypad and an LCD screen. You tap in a couple of identifiers, it polls a website on the intranet, and it brings back contact details if there is an alert.
Each trust has its own website for the arduinos to poll, these sites share information with each other securely.
For example, say I've been aggressive to a district nurse when she visited. If a social worker (so from a different Trust) is due to visit me she taps my details into an arduino and it says "One Alert. Contact Community Health Risk Office, tel 3456".

That way, very limited information is shared over the network and actual details are only shared once a human has verified the appropriateness of sharing them.

The current system relies on staff having time to use a slow network connection to check their local system, which requires suitable access and training and more time than tapping in an ID number and date of birth on a made-for-purpose device. Plus that local database is unlikely to have details from other Trusts, sharing isn't that good at the moment.

I've got the arduino working, I just need to do the web stuff for it to talk to.
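The key design point (the device only ever learns whether there is an alert and whom to call) can be sketched as a small lookup on the web side. Everything here is hypothetical: the `Alert` record, the identifier/date-of-birth key, and the reply wording, which is modelled on the example above. A real service would also need authentication, transport security and audit logging, none of which is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    kind: str      # e.g. "aggression"; kept server-side, never sent to the device
    contact: str   # who the visiting worker should phone


# Hypothetical store keyed by (patient identifier, date of birth).
ALERTS: dict[tuple[str, str], list[Alert]] = {}


def device_reply(patient_id: str, dob: str, alerts=ALERTS) -> str:
    """Build the only text the keypad/LCD device receives: a count and who to call."""
    found = alerts.get((patient_id, dob), [])
    if not found:
        return "No alerts."
    contacts = sorted({a.contact for a in found})  # de-duplicate, stable order
    count = "One Alert" if len(found) == 1 else f"{len(found)} Alerts"
    return f"{count}. Contact {'; '.join(contacts)}"
```

Keeping the alert details out of the reply entirely, rather than filtering them on the device, means a lost or tampered-with box can't leak more than a phone number.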

Sounds like a great project but you've hit the product wall :frowning:

There's a world of difference between having a working widget and a marketable product, and I suspect most of us here get our kicks from the research and design and getting something working, once that's achieved we want to move to another challenge, not cross every i and dot every t and produce user manuals for the next 6 months.