2. Gradient estimators, 20 points. All else being equal, it's useful for a gradient estimator to be unbiased. The unbiasedness of a gradient estimator guarantees that, if we decay the step size and run stochastic gradient descent for long enough (see Robbins & Monro), it will converge to a local optimum.

The standard REINFORCE, or score-function, estimator is defined as:

$$\hat{g}_{\mathrm{SF}}[f] = f(b)\,\frac{\partial}{\partial\theta}\log p(b|\theta), \qquad b \sim p(b|\theta) \tag{2.1}$$

(a) [5 points] First, let's warm up with the score function. Prove that the score function has zero expectation, i.e. $\mathbb{E}_{p(x|\theta)}\left[\nabla_\theta \log p(x|\theta)\right] = 0$. Assume that you can swap the derivative and integral operators.

(b) [5 points] Show that REINFORCE is unbiased: $\mathbb{E}_{p(b|\theta)}\left[f(b)\frac{\partial}{\partial\theta}\log p(b|\theta)\right] = \frac{\partial}{\partial\theta}\mathbb{E}_{p(b|\theta)}[f(b)]$.

(c) [5 points] Show that REINFORCE with a fixed baseline is still unbiased, i.e. show that $\mathbb{E}_{p(b|\theta)}\left[[f(b) - c]\frac{\partial}{\partial\theta}\log p(b|\theta)\right] = \frac{\partial}{\partial\theta}\mathbb{E}_{p(b|\theta)}[f(b)]$ for any fixed $c$.

(d) [5 points] If the baseline depends on $b$, then REINFORCE will in general give biased gradient estimates. Give an example where $\mathbb{E}_{p(b|\theta)}\left[[f(b) - c(b)]\frac{\partial}{\partial\theta}\log p(b|\theta)\right] \neq \frac{\partial}{\partial\theta}\mathbb{E}_{p(b|\theta)}[f(b)]$ for some function $c(b)$, and show that it is biased.

The takeaway is that you can use a baseline to reduce the variance of REINFORCE, but not one that depends on the current action.
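The claims in parts (b) and (c) are easy to check numerically before attempting the proofs. The sketch below is not part of the assignment: the Bernoulli model $p(b{=}1|\theta)=\theta$ and the function $f$ are arbitrary illustrative choices, picked so the exact gradient of $\mathbb{E}[f(b)]$ is known in closed form ($f(1)-f(0)$). It compares the plain estimator from eq. (2.1) against one with a fixed baseline:

```python
import numpy as np

# Illustrative setup (assumed, not from the assignment):
# p(b=1|theta) = theta, so E[f(b)] = theta*f(1) + (1-theta)*f(0)
# and the exact gradient w.r.t. theta is f(1) - f(0).
rng = np.random.default_rng(0)
theta = 0.3
f = lambda b: np.where(b, 3.0, 1.0)        # arbitrary choice of f
true_grad = 3.0 - 1.0                      # f(1) - f(0) = 2.0

b = rng.random(200_000) < theta            # samples b ~ p(b|theta)
score = b / theta - (~b) / (1 - theta)     # d/dtheta log p(b|theta)

g_plain = f(b) * score                     # REINFORCE, eq. (2.1)
c = 1.6                                    # a fixed baseline (here, E[f(b)])
g_base = (f(b) - c) * score                # REINFORCE with fixed baseline

print(g_plain.mean(), g_base.mean())       # both close to true_grad = 2.0
print(g_plain.var(), g_base.var())         # the baseline shrinks the variance
```

Both sample means agree with the true gradient up to Monte Carlo error, consistent with parts (b) and (c), while the baselined estimator has markedly lower variance, which is the "takeaway" stated above. Replacing `c` with a function of `b` breaks the first print but not necessarily the second, which is the point of part (d).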