How to Get XYZ Coordinates of a Controller with IR Markers Using a Moving Camera

I want to make a shoddy VR system that uses your phone instead of an actual headset. Originally, I was just going to send the rotation and acceleration of two controllers over WiFi using an ESP32, but dead reckoning from an IMU alone drifts, so it's much better to use sensor fusion. I was debating between optical, acoustic, and magnetic tracking, and went with optical because I knew it could be low cost and could work as a standalone system, like the Oculus Quest. Those were the two things I prioritized: I wanted it to be low cost, just... because, and I wanted no lighthouses, because I like the simplicity and ease of use of the Quest. For all I know, these may be unreasonable requirements to begin with, so I might have to revisit my choice of tracking.

The setup I have in mind is that the headset you place your phone in would have an IR camera, and the two controllers would each have IR markers, which the camera would track individually so as not to mix the controllers up. I would fuse the data from the IMU and the optics, and if that wasn't enough, I might have to implement visual SLAM. That is the topic I know the least about, and I know it requires more power than an ESP32, so I would have to use a more expensive board, the Raspberry Pi.

I have run into a few problems, though. Most tutorials I can find for IR tracking (and there are few) differ from my project in one key way: they use stationary cameras, while mine would be on the headset, so it would be moving. The number of tutorials with this difference would lead me to believe my approach is impossible, if not for the existence of the Oculus Quest. There might be something intrinsic to the Quest's design that I simply cannot replicate, and that would put me out of commission for good.
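For reference, here is a minimal sketch of the kind of IMU-plus-optics fusion described above, for a single axis only. It assumes a simple complementary filter, which is just one possible approach and not necessarily what the Quest does. The names `AxisState`, `imu_predict`, `optical_correct`, and `simulate` are made up for illustration, and the bias, time step, and `alpha` values are arbitrary. It runs on a PC in plain Python, not on the ESP32.

```python
from dataclasses import dataclass


@dataclass
class AxisState:
    position: float = 0.0  # metres
    velocity: float = 0.0  # metres per second


def imu_predict(state, accel, dt):
    """Dead reckoning: integrate acceleration into velocity, then into position."""
    velocity = state.velocity + accel * dt
    position = state.position + velocity * dt
    return AxisState(position, velocity)


def optical_correct(state, measured_position, alpha=0.1):
    """Pull the predicted position a fraction `alpha` toward the camera reading.

    Only the position is corrected here; the velocity error keeps growing,
    which a fuller filter (for example a Kalman filter) would also handle.
    """
    position = (1.0 - alpha) * state.position + alpha * measured_position
    return AxisState(position, state.velocity)


def simulate(steps=200, dt=0.01, accel_bias=0.5, use_camera=True):
    """Controller held still at 0 m while the IMU reports a constant false acceleration."""
    state = AxisState()
    for _ in range(steps):
        state = imu_predict(state, accel_bias, dt)
        if use_camera:
            # Assumed: the camera sees the true position (0 m) every step.
            state = optical_correct(state, measured_position=0.0)
    return state.position


if __name__ == "__main__":
    print("IMU only    :", simulate(use_camera=False))
    print("IMU + camera:", simulate(use_camera=True))
```

With the camera correction switched off, the position estimate wanders away from zero even though the controller never moved, which is the dead reckoning drift mentioned earlier. With the correction on, the estimate stays much closer to the true position.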

So I would like to know:

• Is this possible?

• Would I be able to keep my preferences for the project (standalone, inexpensive)?

• Is there a better approach I could take instead?

I would also appreciate some tutorials to get me started, or modules with libraries I can access and play around with. My end goal is to send the positions and rotations over WiFi to the Unity game engine and plot them there, in order to play VR games I made myself. I can give more information if requested, and I have found some seemingly good college abstracts that I'll link here. Aside from that, I would like to thank you all for reading this! Also, please note that I would like any explanation you give to be, sort of, 'dumbed down' for me, for I am only 14. I know I probably shouldn't reveal my age on the internet, but what even is privacy at this point? lol. =)