All I need is a way to get that address...
You can compile the addresses offline, put them in a device-specific .h file, write your macros on the assumption that such a header is available, and build your applications on top of those macros. When you port your code to a different chip, you include that chip's device-specific .h file with its addresses coded in, and your code will work. It is the same principle that I talked about earlier.
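To make that concrete, here is a minimal sketch of the idea, not a finished library. Every name in it (fake_io, LED_PORT_REG, LED_DDR_REG, LED_BIT, PIN_OUTPUT, PIN_SET, PIN_CLR, led_init and friends) is made up for illustration, and a small RAM array stands in for the real register addresses so that the code also runs on a PC. On a real chip, the first part would sit in the device-specific .h file and hold the addresses from the datasheet.

```c
#include <stdint.h>

/* ---- Part 1: would live in a hypothetical device-specific header ---- */
/* ASSUMPTION: a RAM array stands in for the real I/O registers so the
   sketch runs on a host. On hardware these two macros would expand to
   the fixed register addresses of the chip being targeted. */
static volatile uint8_t fake_io[2];
#define LED_PORT_REG (&fake_io[0])   /* output data register      */
#define LED_DDR_REG  (&fake_io[1])   /* data direction register   */
#define LED_BIT      5u              /* bit position of the pin   */

/* ---- Part 2: portable macros, written once, chip independent ---- */
#define PIN_OUTPUT(ddr, bit) (*(ddr)  |= (uint8_t)(1u << (bit)))
#define PIN_SET(port, bit)   (*(port) |= (uint8_t)(1u << (bit)))
#define PIN_CLR(port, bit)   (*(port) &= (uint8_t)~(1u << (bit)))

/* ---- Part 3: application code, sees only the macros ---- */
static void led_init(void) { PIN_OUTPUT(LED_DDR_REG, LED_BIT); }
static void led_on(void)   { PIN_SET(LED_PORT_REG, LED_BIT); }
static void led_off(void)  { PIN_CLR(LED_PORT_REG, LED_BIT); }
```

Porting then means swapping part 1 for another chip's header. Parts 2 and 3 stay untouched, as long as the new chip's registers behave the same way.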
This is no different from the arduino approach, in that it is a trade-off between convenience / portability and speed / performance, especially if you want to resolve the pins at run time rather than at compile time.
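A rough sketch of the two ends of that trade, under the same assumption that plain variables stand in for the port registers. The names (sim_portb, sim_portd, pin_desc_t, pin_table, pin_write, FAST_SET) and the pin-to-port mapping are hypothetical, and this is not the actual arduino implementation, only the same general idea of a lookup table indexed by pin number.

```c
#include <stdint.h>

/* ASSUMPTION: two variables stand in for two real port registers. */
static volatile uint8_t sim_portb, sim_portd;

/* Run-time resolution: a table maps a logical pin number to its
   register and bit mask. Convenient, but each access pays for the
   lookup and the pointer dereference. */
typedef struct {
    volatile uint8_t *port;
    uint8_t           mask;
} pin_desc_t;

static const pin_desc_t pin_table[] = {
    { &sim_portd, 1u << 0 },   /* logical pin 0 (made-up mapping) */
    { &sim_portd, 1u << 1 },   /* logical pin 1 */
    { &sim_portb, 1u << 5 },   /* logical pin 2 */
};

static void pin_write(uint8_t pin, int level)
{
    const pin_desc_t *p = &pin_table[pin];   /* no bounds check in this sketch */
    if (level)
        *p->port |= p->mask;
    else
        *p->port &= (uint8_t)~p->mask;
}

/* Compile-time resolution: register and bit are known to the compiler,
   so this can reduce to a single read-modify-write on the register. */
#define FAST_SET(reg, bit) ((reg) |= (uint8_t)(1u << (bit)))
```

pin_write(2, 1) and FAST_SET(sim_portb, 5) end up setting the same bit. The difference is how much work happens at run time to get there.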
As to your read-modify-write comment, that is largely alleviated by the "slow" IO that such an approach necessitates. For that reason, the PINx registers are ignored. If you are speed conscious and are OK with compile-time resolution of pins, you may need to reinstate the PINx registers.
On some older ARM chips (Cortex-M3 / M4 parts, for example), bit-banding is available, and it is quite helpful in situations like this.
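For reference, bit-banding maps every bit of a register to its own word in an alias region, so a single write to that word sets or clears one bit with no read-modify-write in software. Below is a small sketch of the address calculation for the peripheral region as laid out on Cortex-M3 / M4 (bit-band region at 0x40000000, alias region at 0x42000000). The names PERIPH_BASE, PERIPH_BB_BASE and bitband_alias are my own, and you should check your chip's reference manual to confirm that it implements bit-banding at all.

```c
#include <stdint.h>

#define PERIPH_BASE    0x40000000u  /* start of the peripheral bit-band region */
#define PERIPH_BB_BASE 0x42000000u  /* start of the matching alias region      */

/* Return the alias address for one bit of a peripheral register.
   Each byte of the bit-band region takes 32 bytes of alias space
   (8 bits x one 4-byte word per bit), hence the factors 32 and 4. */
static uint32_t bitband_alias(uint32_t reg_addr, uint32_t bit)
{
    return PERIPH_BB_BASE + (reg_addr - PERIPH_BASE) * 32u + bit * 4u;
}
```

On real hardware you would cast the result to a volatile uint32_t pointer and write 1 or 0 through it. The function above only computes the address, so it is safe to try on a PC.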