[Exercises] Skip-gram Model (Word2Vec)

The following exercises focus on manually implementing the Skip-gram model for learning word embeddings. Skip-gram predicts context words given a center word. The exercises are divided into two parts: Part 1 uses the full softmax for the loss function, and Part 2 uses negative sampling for efficiency. Students must show detailed calculations with specific numerical values. To assist with the calculations, a tool such as Microsoft Excel is recommended for handling the matrix and vector operations.

Sample Data and Training Pair Generation

● Sample Data: Text: "The cat likes to eat fresh fish."
o Tokenized into words: ["The", "cat", "likes", "to", "eat", "fresh", "fish"].
o Vocabulary: {"The":0, "cat":1, "likes":2, "to":3, "eat":4, "fresh":5, "fish":6} (V=7 words, indexed from 0 to 6).
● Generating Training Pairs for Skip-gram:
o Use a context window of size 2: for each center word $w_t$ at position $t$ ($t=1$ to $7$), create pairs $(w_t, w_c)$ where $w_c$ is a context word at a position from $t-2$ to $t+2$ (excluding $w_t$ itself and staying within the sentence boundaries). A short code sketch that generates these pairs follows this list.
o List of word pairs:
▪ ("The", index=0): Context (t=2,3): "cat" (1), "likes" (2) → pairs: (0,1), (0,2).
▪ ("cat", index=1): Context (t=1,3,4): "The" (0), "likes" (2), "to" (3) → pairs: (1,0), (1,2), (1,3).
▪ ("likes", index=2): Context (t=1,2,4,5): "The" (0), "cat" (1), "to" (3), "eat" (4) → pairs: (2,0), (2,1), (2,3), (2,4).
▪ ("to", index=3): Context (t=2,3,5,6): "cat" (1), "likes" (2), "eat" (4), "fresh" (5) → pairs: (3,1), (3,2), (3,4), (3,5).
▪ ("eat", index=4): Context (t=3,4,6,7): "likes" (2), "to" (3), "fresh" (5), "fish" (6) → pairs: (4,2), (4,3), (4,5), (4,6).
▪ ("fresh", index=5): Context (t=4,5,7): "to" (3), "eat" (4), "fish" (6) → pairs: (5,3), (5,4), (5,6).
▪ ("fish", index=6): Context (t=5,6): "eat" (4), "fresh" (5) → pairs: (6,4), (6,5).
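As a cross-check on the list above, here is a minimal Python sketch that generates the (center, context) index pairs for a window of size 2. The function name make_pairs and the variable names are illustrative, not part of the exercise:

```python
# Minimal sketch: generate Skip-gram (center, context) index pairs.
tokens = ["The", "cat", "likes", "to", "eat", "fresh", "fish"]
vocab = {w: i for i, w in enumerate(tokens)}  # {"The": 0, ..., "fish": 6}

def make_pairs(words, window=2):
    pairs = []
    for t, center in enumerate(words):
        # Context positions t-window .. t+window, excluding t itself
        # and clipped to the sentence boundaries.
        for c in range(max(0, t - window), min(len(words), t + window + 1)):
            if c != t:
                pairs.append((vocab[center], vocab[words[c]]))
    return pairs

print(make_pairs(tokens))
# First pairs: (0, 1), (0, 2), (1, 0), (1, 2), (1, 3), ...
```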
● Calculations (fill in the blanks):
o Embedding for "cat" (row 1 of the updated $W$): [___, ___, ___].
o Embedding for "fish" (row 6): [___, ___, ___].
o Dot product: $___ \cdot ___ + ___ \cdot ___ + ___ \cdot ___ = ___$.
o Norm: $\|v_{cat}\| = ___$.
o Norm: $\|v_{fish}\| = ___$.
o Cosine similarity: $\text{cosine} = \frac{v_{cat} \cdot v_{fish}}{\|v_{cat}\| \, \|v_{fish}\|} = ___$.
o Interpretation: ___ (e.g., a high cosine indicates semantic similarity). A NumPy sketch of these computations appears at the end of this section.

Exercise 1.5: Counting Parameters

● Formulas and Explanation:
o Total parameters: $\#\Theta = V \cdot D + D \cdot V = 2VD$ (the sum of the parameters in $W$ and $W'$).
● Calculations (fill in the blanks):
o For V=7, D=2: $\#\Theta = ___$.
o For D=3: $\#\Theta = ___$.
o For D=5: $\#\Theta = ___$.

Part 2: Skip-gram with Negative Sampling

This part uses negative sampling with $K$ negative samples to approximate the softmax, making the computations more efficient.

Exercise 2.1: Vectorization and Forward Pass for the Positive Word

● Formulas and Explanation:
o Vectorization: one-hot vector $x \in \mathbb{R}^V$ with $x_k = 1$ for $k$ equal to the center word's index and $x_k = 0$ otherwise.
o Center embedding: $v_c = W^\top x$ (the row of $W$ for the center word).
o Positive score: $s_o = u_o \cdot v_c$ (a dot product measuring similarity), where $u_o$ is the output vector of the positive context word (a column of $W'$).
● Calculations (fill in the blanks):
o One-hot for ("likes", index=2): $x = ___$.
o Embedding $v_c = ___$.
o $u_o = ___$ (for "cat").
o $s_o = ___ \cdot ___ + ___ \cdot ___ + ___ \cdot ___ = ___$. A sketch of this forward pass appears at the end of this section.

Exercise 2.2: Negative Sampling and Scores for Negative Words

● Formulas and Explanation:
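The cosine-similarity blanks and the parameter counts from Exercise 1.5 can be checked with a short NumPy sketch. The two embedding vectors below are placeholder values (the exercise leaves the actual updated rows of $W$ blank), so only the procedure, not the numbers, carries over:

```python
import numpy as np

# Placeholder embeddings: hypothetical rows 1 ("cat") and 6 ("fish")
# of the updated W; substitute the values from your own calculation.
v_cat = np.array([0.5, -0.1, 0.3])
v_fish = np.array([0.4, 0.0, 0.2])

dot = float(v_cat @ v_fish)                   # sum of elementwise products
cos = dot / (np.linalg.norm(v_cat) * np.linalg.norm(v_fish))
print(f"dot={dot:.4f}, cosine={cos:.4f}")     # high cosine -> similar usage

# Exercise 1.5: W is V x D and W' is D x V, so #Theta = V*D + D*V = 2*V*D.
V = 7
for D in (2, 3, 5):
    print(f"D={D}: parameters = {2 * V * D}")  # 28, 42, 70
```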
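For Exercise 2.1's forward pass, and the scores for negative words that Exercise 2.2 builds on, here is a minimal sketch assuming randomly initialized $W$ and $W'$ and hypothetical negative-sample indices. The loss form in the comments is the standard negative-sampling objective for Word2Vec, not a value taken from the exercise:

```python
import numpy as np

V, D = 7, 3
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, D))       # input embedding matrix (V x D)
W_out = rng.normal(scale=0.1, size=(D, V))   # output matrix W' (D x V)

# Exercise 2.1: one-hot vectorization for the center word "likes" (index 2).
x = np.zeros(V)
x[2] = 1.0
v_c = W.T @ x            # selects W[2], the center embedding
u_o = W_out[:, 1]        # output vector for the positive word "cat" (index 1)
s_o = float(u_o @ v_c)   # positive score: dot product

# Toward Exercise 2.2: scores for K sampled negative words.
neg_idx = [5, 6]                                   # hypothetical negatives
s_neg = [float(W_out[:, k] @ v_c) for k in neg_idx]

# Standard negative-sampling objective:
#   L = -log(sigmoid(s_o)) - sum_k log(sigmoid(-s_k))
sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))
loss = -np.log(sigmoid(s_o)) - sum(np.log(sigmoid(-s)) for s in s_neg)
print(s_o, s_neg, loss)
```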