[Exercise] Skip-gram [Eng]
▪ ("fresh", index=5): Context (t=4,5,7): "to" (3), "eat" (4), "fish" (6) → pairs: (5,3), (5,4), (5,6). ▪ ("fish", index=6): Context (t=5,6): "eat" (4), "fresh" (5) → pairs: (6,4), (6,5). o Total Pairs: 2 + 3 + 4 + 4 + 4 + 3 + 2 = 22 pairs. o Sample Pair for Exercises: Center word = "likes" (index=2), context word = "cat" (index=1). o Parameters: Embedding size . Learning rate . Negative sampling distribution (Part 2): Uniform (). Number of negative samples . ● Initial Parameter Setup: o (7x3): Input embedding matrix (each row is an embedding for a word). o (3x7): Output embedding matrix (each column is an embedding for a word). Part 1: Skip-gram without Negative Sampling (Full Softmax) This part utilises a full softmax to compute probabilities over the entire vocabulary, employing cross-entropy loss. Exercise 1.1: Vectorization and Forward Pass ● Formulas and Explanation: o Vectorization: Convert to a one-hot vector to access its embedding. o Center embedding: (row of for ). o Scores: (vector V x 1, scores for all words). o Softmax: (probability for word j). ● Calculations (fill in the blanks): o One-hot for ("likes", index=2): . o Embedding . o Compute : ▪ . ▪ . ▪ ... (similarly for to ).