Skip-gram Exercises (English)
● Calculations (fill in the blanks):
o $e^{u_j}$ for $j = 0$ to $6$: [___, ___, ___, ___, ___, ___, ___].
o Sum: $\sum_{j=0}^{6} e^{u_j} = $ ___.
o $\hat{y}_j = e^{u_j} / \sum_{k} e^{u_k}$: [___, ___, ___, ___, ___, ___, ___].

Exercise 1.2: Loss Function with Softmax
● Formulas and Explanation:
o Loss: $L = -\sum_j y_j \log \hat{y}_j = -\log \hat{y}_{target}$, where $\hat{y}_{target}$ is the predicted probability for the target word (index = 1), and $y$ is the one-hot vector for the target word.
o Cross-entropy measures the difference between the predicted and true distributions.
● Calculations (fill in the blanks):
o $\hat{y}_{target}$ (for index = 1): ___.
o $L = -\log$ ___ $=$ ___.

Exercise 1.3: Backward Pass and Parameter Update
● Formulas and Explanation:
o Gradient for the scores: $e = \hat{y} - y$ ($y$ is the one-hot vector for the target word).
o Gradient for $W'$: $\partial L / \partial W' = h \, e^{T}$ (3x7 matrix).
o Gradient for $h$: $\partial L / \partial h = W' e$ (3x1 vector).
o Gradient for $W$: $\partial L / \partial W[x, :] = \partial L / \partial h$.
o Update: $\theta \leftarrow \theta - \eta \, \nabla_{\theta} L$, with learning rate $\eta = 0.01$.
● Calculations (fill in the blanks):
o $y = $ [___, ___, ___, ___, ___, ___, ___] (one-hot for index = 1).
o $e = \hat{y} - y = $ [___, ___, ___, ___, ___, ___, ___].
o $\partial L / \partial W' = h \, e^{T} = $ ___ (3x7 matrix).
o $\partial L / \partial h = W' e$: [___, ___, ___].
o Gradient for $W[2, :]$: [___, ___, ___].
o Update $W' \leftarrow W' - 0.01 \cdot \partial L / \partial W'$.
o Update $W[2, :] \leftarrow$ [___, ___, ___] $- \, 0.01 \cdot$ [___, ___, ___] $=$ [___, ___, ___].
o Check: recompute $\hat{y}_{target}$ with the new parameters and compare with the previous value (the loss should decrease).

Exercise 1.4: Inference and Embedding Application
● Formulas and Explanation:
o Final embeddings: rows of $W$ (or the average of $W$ and $W'^{T}$).
o Cosine similarity: $\cos(u, v) = \dfrac{u \cdot v}{\|u\| \, \|v\|}$ (measures semantic similarity).
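To check your fill-in-the-blank answers for Exercises 1.1–1.3, the full training step can be sketched in NumPy. The shapes and constants come from the exercise (vocabulary size 7, embedding dimension 3, input word index 2, target word index 1, learning rate 0.01); the random weights are illustrative stand-ins for the worksheet's actual numbers, so substitute your own values to verify.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 7, 3                               # vocabulary size, embedding dimension
W = rng.normal(scale=0.5, size=(V, d))    # input embeddings, V x d
Wp = rng.normal(scale=0.5, size=(d, V))   # output embeddings W', d x V
x, target = 2, 1                          # input word index, target word index
eta = 0.01                                # learning rate from the exercise

def forward(W, Wp, x):
    h = W[x, :]                   # hidden layer = input embedding, shape (3,)
    u = h @ Wp                    # scores u_j, shape (7,)
    e_u = np.exp(u - u.max())     # exponentiate (shifted for numerical stability)
    y_hat = e_u / e_u.sum()       # softmax probabilities (Exercise 1.1)
    return h, y_hat

h, y_hat = forward(W, Wp, x)
L = -np.log(y_hat[target])        # cross-entropy loss (Exercise 1.2)

# Backward pass (Exercise 1.3)
y = np.zeros(V)
y[target] = 1.0                   # one-hot vector for the target word
e = y_hat - y                     # dL/du, shape (7,)
dWp = np.outer(h, e)              # dL/dW' = h e^T, shape (3, 7)
dh = Wp @ e                       # dL/dh = W' e, shape (3,)

# Updates: theta <- theta - eta * gradient
Wp -= eta * dWp
W[x, :] -= eta * dh

# Check: the loss on the same pair should decrease after the update
_, y_hat_new = forward(W, Wp, x)
L_new = -np.log(y_hat_new[target])
print(L, "->", L_new)
```

The final check mirrors the last bullet of Exercise 1.3: after one gradient step the predicted probability of the target word rises, so the loss drops.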
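The cosine-similarity formula from Exercise 1.4 can be verified with a small sketch; the three embedding rows below are made-up illustrative values, not data from the worksheet.

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (||u|| * ||v||)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical final embedding matrix (rows of W)
E = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0],    # points in nearly the same direction as row 0
              [0.0, 0.0, 1.0]])   # orthogonal to row 0

print(cosine_similarity(E[0], E[1]))  # close to 1: semantically similar
print(cosine_similarity(E[0], E[2]))  # 0: unrelated directions
```

Note that cosine similarity depends only on direction, not magnitude, which is why it is the usual choice for comparing word embeddings.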