Interview with AMD's Richard Huddy – 25:00
— TechteamGB

What I should point out is some IMHO interesting stuff that was mentioned:

The Radeon R9 300 chips are made on a different manufacturing process than their predecessors, which is why the chips get new names. It was already known that Grenada (R9 390/X) differs from Hawaii (R9 290/X) by having a memory controller that supports fast GDDR5 RAM and by different power management. Huddy claims that the new process is more efficient per watt; sadly nobody asked whether it is still TSMC or GlobalFoundries (the latter made the Kaveri/Godavari and Carrizo APUs). Other changes give different speed at the same clocks, but no specifics were given.
Probably the compute cores were updated? Or some cache latency tweaks? All in all this should give 15% better efficiency.

The interesting part is about HBM RAM. Huddy admits that using it for Fiji was a bit of overkill, because the Fiji GPU was designed for a bus with 350GB/s, while HBM gives 512GB/s. At least they can, thanks to that excess bandwidth, use the RAM as a buffer, transferring data between HBM and main memory w/o speed impact, which is not possible with GDDR. According to Huddy, that also means that 4G of VRAM is not a problem under any circumstances.
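
To put the two bandwidth figures side by side, here is a quick back-of-the-envelope calculation in Python. Only the 350GB/s and 512GB/s numbers come from the interview; the function name and the idea of expressing the spare part as a fraction are my own illustration, not anything Huddy said.

```python
# Figures quoted in the interview, in GB/s.
FIJI_DESIGN_BANDWIDTH = 350
HBM_BANDWIDTH = 512


def bandwidth_headroom(available, needed):
    """Return (spare GB/s, spare as a fraction of the available bandwidth).

    Hypothetical helper, just for this illustration.
    """
    spare = available - needed
    return spare, spare / available


spare, fraction = bandwidth_headroom(HBM_BANDWIDTH, FIJI_DESIGN_BANDWIDTH)
print(f"spare: {spare} GB/s ({fraction:.0%} of HBM bandwidth)")
# prints: spare: 162 GB/s (32% of HBM bandwidth)
```

So roughly a third of the HBM bandwidth is left over on paper, and that is the part that could be spent on shuffling data to and from main memory in the background.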

DirectX 12 is claimed to bring more performance on AMD FX processors, which do not excel in single-thread performance (which is what DX11 needs), so now (a bit late) they should prove worthy for gaming. Dunno if that can change the perception of AMD FX CPUs as gaming CPUs; IMHO it is a bit late and DX12 games are still nonexistent ATM. So sure, AMD FX CPUs can now run faster, but who cares now?
The next bit was the DX12 async shaders. GPU cores compatible with them can run more threads at once, w/o waiting for one function to be done. This already works well on Xbox (its API is better than DX11, which is a testament that even M$ can do things well) and allegedly developers are pretty excited about them. To make it simple, it is like hyperthreading for the GPU. It allows the GPU to do more at once, and if these jobs have different requirements (one, for example, needs RAM transfers, another computing performance), then they will be finished much sooner than with traditional serial execution.

For example, it is possible to add a low-priority visibility test of the scene into the engine; this task will be done faster than the rendering and it does not even prolong the rendering time…!
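
A toy model of that idea in Python, just to show why overlapping helps. Everything here is an assumption on my part: the task names, the millisecond numbers, and the idealized rule that memory transfers and shader math can overlap perfectly. A real GPU scheduler is far more complicated, so the overlapped figure is a best-case lower bound and not a prediction.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    memory_ms: float   # time spent on RAM transfers (made-up number)
    compute_ms: float  # time spent on shader math (made-up number)


def serial_time(tasks):
    """Traditional execution: each task runs start to finish, one after another."""
    return sum(t.memory_ms + t.compute_ms for t in tasks)


def overlapped_time(tasks):
    """Best case with async execution: memory and compute units are both
    kept busy, so the total is bounded by whichever resource has more work."""
    total_memory = sum(t.memory_ms for t in tasks)
    total_compute = sum(t.compute_ms for t in tasks)
    return max(total_memory, total_compute)


# Hypothetical frame: compute-heavy rendering plus a memory-heavy visibility test.
tasks = [
    Task("render", memory_ms=2.0, compute_ms=8.0),
    Task("visibility_test", memory_ms=3.0, compute_ms=1.0),
]

print(f"serial:     {serial_time(tasks)} ms")      # serial:     14.0 ms
print(f"overlapped: {overlapped_time(tasks)} ms")  # overlapped: 9.0 ms
```

The gain only shows up because the two tasks stress different resources; two tasks that both hammer the compute units would get next to nothing out of this.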

All in all, about a 25% performance gain can be expected when games start using asynchronous shaders. But OTOH too many older nVidia cards are seriously limited in this field: