So it's a configurator of sorts, then. That leaves you with roughly two approaches:
You render a separate video (or GIF) for every possible configuration and swap in the one that matches the user's selection, or
You build a WebGL model (or models) and show or hide individual parts depending on what the user selected.
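To make the trade-off between the two approaches concrete, here is a minimal TypeScript sketch under some assumptions. The option names, the file-naming scheme for the videos, and the `Part` type are hypothetical and only for illustration. The `visible` flag mirrors the one that scene-graph libraries such as three.js put on their objects, but this code does not use any library. The `assetCounts` helper shows how many assets each approach needs.

```typescript
// Hypothetical selection shape: option name -> chosen value, e.g. { color: "red" }.
type Selection = Record<string, string>;

// Approach 1: one pre-rendered video (or GIF) per full configuration.
// The file name is derived from the whole selection, so every combination
// must exist as a separate asset. The naming scheme here is an assumption.
function videoFor(selection: Selection): string {
  const key = Object.keys(selection)
    .sort() // stable order so the same selection always maps to the same file
    .map((option) => `${option}-${selection[option]}`)
    .join("_");
  return `videos/${key}.mp4`;
}

// Approach 2: one model whose parts are shown or hidden.
// Hypothetical part type; in three.js you would set `visible` on an Object3D instead.
interface Part {
  name: string; // e.g. "wheels-alloy", matching "<option>-<choice>"
  visible: boolean;
}

// Show exactly the parts that match the current selection and hide the rest.
function applySelection(parts: Part[], selection: Selection): void {
  const wanted = new Set(
    Object.entries(selection).map(([option, choice]) => `${option}-${choice}`),
  );
  for (const part of parts) {
    part.visible = wanted.has(part.name);
  }
}

// Number of assets each approach needs, given how many choices each option has:
// videos grow as the product of the choices, model parts only as their sum.
function assetCounts(choicesPerOption: number[]): { videos: number; parts: number } {
  return {
    videos: choicesPerOption.reduce((acc, n) => acc * n, 1),
    parts: choicesPerOption.reduce((acc, n) => acc + n, 0),
  };
}
```

For example, three options with four choices each need 4 × 4 × 4 = 64 videos but only 4 + 4 + 4 = 12 model parts, which is why the video approach tends to become impractical as options are added.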
In both cases, most of the work will probably happen before any code is written: setting up the models and other assets. A program like Blender is probably the best choice for creating the models, and if you haven't done something like this before, it might be worth hiring someone to help you with them.
In any case, it doesn't seem like your requirements are clear enough yet to hire a software developer.