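I think you could do it in a way similar to how the old LORAN-C navigation system worked. Only instead of an RF-only solution, you could use an ultrasonic-only or a combined RF/ultrasonic method.
Basically, you have one "master" station and two "slave" stations. I would probably choose the RF/ultrasonic method, simply because of the huge difference between the speed of sound and the speed of light (which is, of course, also the speed of RF). The RF pulse arrives essentially instantly and acts as a timing reference, while the much slower sound produces delays that are easy to measure. Here's how: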
The master transmits an RF pulse.
The two slaves and the mobile receiver see the pulse virtually simultaneously, or close enough to not be a factor. There might well be RF reflections, but we don't care, because the pulse following the direct path arrives first at every receiver.
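To put a number on "virtually simultaneously", take a 20 m path as an arbitrary example, with RF at about 3×10^8 m/s and sound at roughly 343 m/s (air near 20 °C, an assumed condition):

```latex
t_{\mathrm{RF}} = \frac{20\ \mathrm{m}}{3\times10^{8}\ \mathrm{m/s}} \approx 67\ \mathrm{ns},
\qquad
t_{\mathrm{sound}} = \frac{20\ \mathrm{m}}{343\ \mathrm{m/s}} \approx 58\ \mathrm{ms}
```

The RF delay is about one millionth of the acoustic one, equivalent to roughly 23 µm of acoustic range, which is why it can be ignored. The same scale argument applies to RF reflections: an extra path of a few metres adds only nanoseconds.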
At the same time as the RF pulse (or a very short, known time later), the master sends out an ultrasonic pulse.
Measuring the time between the RF and the ultrasonic pulses gives the mobile receiver one piece of information: its distance from the master, which places it somewhere on a circle centered on the master.
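A minimal sketch of that first range measurement, in Python. It assumes the receiver timestamps both pulses on the same clock, treats RF travel as instantaneous, and uses the common dry-air approximation c ≈ 331.3 + 0.606·T m/s (T in °C). The function names and the `master_offset` parameter (for the "very short time later" case) are hypothetical.

```python
def speed_of_sound(temp_c: float) -> float:
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius."""
    return 331.3 + 0.606 * temp_c


def range_from_master(t_rf: float, t_sound: float,
                      temp_c: float = 20.0, master_offset: float = 0.0) -> float:
    """Distance (m) from the master, given RF and ultrasonic arrival times (s).

    RF travel time is treated as zero, so the interval between the two
    arrivals (minus any known offset between the master's RF and ultrasonic
    transmissions) is the ultrasonic time of flight.
    """
    tof = t_sound - t_rf - master_offset
    if tof < 0:
        raise ValueError("ultrasonic pulse arrived before it could have been sent")
    return speed_of_sound(temp_c) * tof
```

For example, `range_from_master(0.0, 0.01)` gives about 3.43 m at the default 20 °C. Air temperature matters: it shifts the speed of sound by roughly 0.6 m/s per degree, so a receiver that ignores it will see its ranges drift with the weather.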
The two slaves also see the RF pulse. The first slave waits a fixed delay after the RF pulse arrives, then sends its own ultrasonic pulse.
The mobile receiver knows how long the fixed delay is, so it can calculate its distance from the first slave from the time between the end of that delay and the reception of the sound. The mobile unit now has two pieces of information: it is located at one of the two points where the two circles intersect.
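The two-circle step can be sketched the same way. The receiver turns the first slave's arrival time into a range by subtracting the known delay, then intersects that circle with the master's. This is plain circle-circle intersection geometry; the function names, the (x, y) station coordinates in metres, and the fixed 343 m/s speed of sound are assumptions for illustration. With noisy measurements the circles may just miss each other, in which case this sketch returns no points.

```python
import math


def range_from_slave(t_rf, t_sound, delay, c=343.0):
    """Distance (m) from a slave that fires `delay` seconds after the RF pulse."""
    return c * (t_sound - t_rf - delay)


def circle_intersections(p0, r0, p1, r1):
    """Return the 0, 1 or 2 points where two circles in the plane intersect."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    d = math.hypot(dx, dy)
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []  # concentric, too far apart, or one circle inside the other
    a = (r0**2 - r1**2 + d**2) / (2 * d)  # distance from p0 to the chord, along p0->p1
    h = math.sqrt(max(r0**2 - a**2, 0.0))  # half-length of the common chord
    xm, ym = x0 + a * dx / d, y0 + a * dy / d  # midpoint of the chord
    if h == 0:
        return [(xm, ym)]  # circles touch at a single point
    return [(xm - h * dy / d, ym + h * dx / d),
            (xm + h * dy / d, ym - h * dx / d)]
```

With the master at (0, 0), the first slave at (4, 0) and ranges of √5 and √13 m, this returns approximately (1, 2) and (1, -2): exactly the two-way ambiguity that the third station resolves.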
The second slave also waits a fixed delay after receiving the RF pulse, but its delay is longer than the first slave's.
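For bench-testing the receiver math, a forward model of the sequence is handy: given a mobile position, it predicts when each ultrasonic pulse should arrive, measured from the RF pulse. This is a hypothetical simulation helper, not part of the scheme itself. It assumes instantaneous RF, a fixed 343 m/s speed of sound and station coordinates in metres, and it needs Python 3.8+ for `math.dist`.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air near 20 °C


def arrival_times(mobile, master, slaves, delays, c=SPEED_OF_SOUND):
    """Predicted ultrasonic arrival times (s) at `mobile`, relative to the RF pulse.

    RF propagation is treated as instantaneous, so the master fires at t = 0
    and each slave fires exactly `delay` seconds later.
    """
    times = [math.dist(mobile, master) / c]
    for position, delay in zip(slaves, delays):
        times.append(delay + math.dist(mobile, position) / c)
    return times
```

For example, `arrival_times((1, 2), (0, 0), [(4, 0), (0, 4)], [0.010, 0.015])` gives the three timestamps a receiver at (1, 2) should see. Adding a little random jitter to them is an easy way to check how the position calculation copes with timing noise.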
The mobile unit calculates its distance from the second slave in the same way as it did from the first slave. Because every measurement has some error, the three circles won't meet at exactly one point; their intersections form a small triangle with curved sides, and that gives all the information needed to determine its location.
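One common way to turn three ranges into a single position estimate is to subtract the first circle's equation from the other two. The squared terms cancel, leaving two linear equations whose solution is the radical center of the three circles; with small timing errors it lands near that curved triangle. This is a standard trilateration approach chosen for illustration, and the function name and coordinate convention are assumptions. It also exposes a practical constraint: if the three stations lie on a straight line, the equations become singular, so the stations should be placed in a triangle.

```python
def trilaterate(stations, ranges):
    """Estimate (x, y) from three (x, y) station positions and measured ranges.

    Circle i: (x - xi)^2 + (y - yi)^2 = ri^2. Subtracting circle i from
    circle 0 gives 2(xi - x0)x + 2(yi - y0)y = r0^2 - ri^2 + xi^2 - x0^2
    + yi^2 - y0^2, a linear equation. Two of them are solved by Cramer's rule.
    """
    (x0, y0), (x1, y1), (x2, y2) = stations
    r0, r1, r2 = ranges
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = r0**2 - r1**2 + x1**2 - x0**2 + y1**2 - y0**2
    b2 = r0**2 - r2**2 + x2**2 - x0**2 + y2**2 - y0**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("stations are collinear; position is ambiguous")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y
```

With stations at (0, 0), (4, 0) and (0, 4) and exact ranges of √5, √13 and √5 m, this returns approximately (1, 2). With more than three stations, the same subtraction gives an overdetermined linear system that can be solved by least squares.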
It would also be wise to make the delays in the sequence distinct from one another. For example, master to slave 1 might be 10 ms, master to slave 2 might be 15 ms, and the delay to the next master pulse might be 25 ms, so that the mobile receiver can unambiguously identify the beginning of each sequence.
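In an ultrasonic-only variant, or if the receiver misses an RF pulse, the distinct intervals are what let it find its place. A minimal sketch, assuming the receiver keeps a list of ultrasonic pulse timestamps and that the idle gap before each master pulse is clearly longer than any gap inside a sequence. The function name and threshold are hypothetical, and the threshold needs some margin because propagation delays stretch or shrink the received gaps.

```python
def find_sequence_starts(pulse_times, long_gap):
    """Indices of pulses that follow a silence of at least `long_gap` seconds.

    Each such pulse is taken to be a master pulse. The very first pulse in
    the list is skipped because there is no preceding gap to measure.
    """
    return [i for i in range(1, len(pulse_times))
            if pulse_times[i] - pulse_times[i - 1] >= long_gap]
```

Reading the example numbers as emissions at 0, 10 and 15 ms with the next master 25 ms after the second slave (a 40 ms cycle), a threshold of about 17.5 ms, midway between the 10 ms and 25 ms gaps, separates the long gap from the short ones.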
Some experimentation with the lengths of the RF and ultrasonic pulses, the time between pulse events, and the idle time between ranging sequences will show you what sort of values work for any given coverage area. You will have to take acoustic reflections (echoes) into account, of course, and that may be problematic if the system is to be used in areas of different sizes.
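Part of that experimentation can be done on paper first. A rough sketch under a deliberately conservative rule (an assumption, not part of the scheme above): a pulse may take anywhere from zero to R/c to reach the receiver, where R is the largest station-to-receiver distance in the coverage area. Consecutive emissions therefore need to be at least R/c plus the pulse length apart, plus some guard time for echoes. All names and parameter values are hypothetical.

```python
def min_slot_spacing(max_range_m, pulse_len_s, c=343.0, guard_s=0.0):
    """Conservative minimum time (s) between consecutive ultrasonic emissions.

    A pulse can arrive anywhere from 0 to max_range_m / c after it is sent,
    so the next station must not fire until the previous pulse (plus an
    optional guard time for echoes) has cleared the whole coverage area.
    """
    return max_range_m / c + pulse_len_s + guard_s


def schedule_is_safe(offsets_s, period_s, max_range_m, pulse_len_s,
                     c=343.0, guard_s=0.0):
    """True if every gap in the repeating emission schedule meets the minimum."""
    need = min_slot_spacing(max_range_m, pulse_len_s, c, guard_s)
    times = sorted(offsets_s)
    times.append(times[0] + period_s)  # include the wrap-around gap to the next cycle
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return all(gap >= need for gap in gaps)
```

Reading the example numbers as emissions at 0, 10 and 15 ms in a 40 ms cycle, the 5 ms gap covers only about 1.7 m of sound travel (0.005 s × 343 m/s) even before pulse length and echoes are added. Under this rule those values suit only a very small area, and larger areas need proportionally longer gaps.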
The neatest thing about this is that the technology for LORAN was developed during WWII.