meditation

RL

Today I read a paper about loss function choice.

But one of its explanations of the DQN loss is novel to me. It goes something like this:

The update process minimizes the difference between two outcomes computed by the same model architecture, using weights taken at different points in time.

There are the weights theta and the less frequently updated/synced target weights theta^-, which act as the previous "brain" to be improved upon.
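A minimal sketch of that idea, assuming a linear Q-function for simplicity (names like `q_values` and `theta_target` are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
gamma = 0.99

# Online weights theta, updated every training step.
theta = rng.normal(size=(n_actions, n_states))
# Target weights theta^-, a stale copy synced only periodically.
theta_target = theta.copy()

def q_values(weights, state):
    # Q(s, .) for a linear model: one value per action.
    return weights @ state

# One transition (s, a, r, s').
s = rng.normal(size=n_states)
a = 0
r = 1.0
s_next = rng.normal(size=n_states)

# The TD target bootstraps from the stale weights theta^-,
# while the prediction comes from the current weights theta.
td_target = r + gamma * q_values(theta_target, s_next).max()
prediction = q_values(theta, s)[a]
loss = (td_target - prediction) ** 2  # squared TD error to minimize

# Right after a sync, both sets of weights agree, so the two
# "outcomes" differ only by the reward-and-bootstrap shift in time.
assert np.isclose(q_values(theta, s)[a], q_values(theta_target, s)[a])
```

In full DQN the linear model would be a neural network and the transition would come from a replay buffer, but the structure of the loss is the same: one network, two snapshots of its weights.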

FrameLord

I tried a clearer frames-timeline approach, and it's rewarding: refactoring accelerates the improvement of existing functions.