Methodology - Wandering Votes (קולות נודדים)

🎯 The Basic Idea

Imagine you want to know where the people who voted for party X in the previous election went this time. Clearly some stayed loyal, while others moved to party Y or Z. But how can you determine this breakdown without asking every citizen how they voted?

The solution: look at individual ballot boxes. Each box has a few hundred voters, and we have their voting results in two consecutive elections. If a certain box had high support for party X in the previous election, and high support for party Y in the current election — that's a hint that X voters moved to Y.

Of course, a single ballot box is a small, noisy sample. But when analyzing thousands of boxes together, the noise cancels out and the true picture emerges.

📊 The Mathematical Model

We seek a transfer matrix $M$ where each cell $M_{ij}$ represents the probability that a voter who voted for party $i$ in the previous election will vote for party $j$ in the current one.

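The model assumes that each ballot box's voting distributions in the previous and current elections, written as row vectors $\mathbf{V}_{\text{before}}$ and $\mathbf{V}_{\text{after}}$, satisfy: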

$$\mathbf{V}_{\text{before}} \cdot M \approx \mathbf{V}_{\text{after}}$$
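To make the per-box equation concrete, here is a minimal sketch with made-up numbers. The three-party matrix `M`, the shares `v_before`, and the helper `predict_shares` are all hypothetical and are not taken from the real data or code.

```python
# Hypothetical 3-party example; all numbers are illustrative, not real results.
# Rows = party voted for previously, columns = party voted for now.
M = [
    [0.80, 0.15, 0.05],  # 80% of party 0's voters stay, 15% move to party 1, 5% to party 2
    [0.10, 0.85, 0.05],
    [0.20, 0.10, 0.70],
]

# Vote shares of one ballot box in the previous election (they sum to 1).
v_before = [0.5, 0.3, 0.2]

def predict_shares(v_before, M):
    """Row vector times matrix: predicted current-election shares for one box."""
    n_from, n_to = len(M), len(M[0])
    return [sum(v_before[i] * M[i][j] for i in range(n_from)) for j in range(n_to)]

v_after_pred = predict_shares(v_before, M)
print([round(s, 3) for s in v_after_pred])
```

Because every row of `M` sums to 1, the predicted shares also sum to 1 whenever the input shares do.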

⚙️ The Optimization

Matrix $M$ is computed via convex optimization, finding $M$ that minimizes the sum of squared errors across all ballot boxes:

$$\min_M \sum_{\text{boxes}} \big\| \mathbf{V}_{\text{before}} \cdot M - \mathbf{V}_{\text{after}} \big\|_F^2$$

Subject to:

Non-negativity: $M_{ij} \geq 0$ (no negative transfers)
Row stochasticity: $\sum_j M_{ij} = 1$ (every voter goes somewhere)

Solved using CVXPY with the SCS solver.

📈 Quality Metric: R²

$R^2$ (R-squared) indicates how well the model explains variance in the data. A value of 1.0 means perfect fit; 0 means the model explains nothing.

In practice we obtain $R^2$ values of 0.7–0.9, indicating good but imperfect fit — reasonable, since the model is a simplification of reality.
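The text does not say exactly how $R^2$ is aggregated across parties and ballot boxes. One plausible reading, shown here purely as an assumption, pools all (ballot box, party) cells and uses each party's mean share as the baseline. The function name and the numbers are illustrative.

```python
def r_squared(Y, Y_pred):
    """Pooled R^2 over all (ballot box, party) cells.

    Assumption: the baseline is each party's mean share across boxes.
    Y, Y_pred: lists of rows, one row of party shares per ballot box.
    """
    n_boxes, n_parties = len(Y), len(Y[0])
    means = [sum(row[j] for row in Y) / n_boxes for j in range(n_parties)]
    ss_res = sum((Y[k][j] - Y_pred[k][j]) ** 2
                 for k in range(n_boxes) for j in range(n_parties))
    ss_tot = sum((Y[k][j] - means[j]) ** 2
                 for k in range(n_boxes) for j in range(n_parties))
    return 1.0 - ss_res / ss_tot

# Illustrative numbers only, not real election data.
Y = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]
Y_pred = [[0.55, 0.45], [0.35, 0.65], [0.5, 0.5]]
print(round(r_squared(Y, Y_pred), 3))
```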

Residual Dispersion: A Dirichlet Model

Our convex-optimization solver produces a transfer matrix $M$ that maps party shares in the previous election to party shares in the current one on average. But individual ballot boxes vary around this average: some swing harder than the matrix predicts, others softer. We model this residual variation with a Dirichlet distribution: for each ballot box $k$, with previous-election shares $\mathbf{x}_k$, the current-election vote proportions $\mathbf{y}_k$ are drawn from

$$\mathbf{y}_k \sim \mathrm{Dirichlet}(\alpha \cdot \hat{\mathbf{p}}_k), \quad \hat{\mathbf{p}}_k = \frac{\mathbf{x}_k^\top M}{\|\mathbf{x}_k^\top M\|_1}$$

The scalar $\alpha$ (the "concentration parameter") controls how tightly each ballot box clusters around the matrix-predicted proportions. The variance of each component is

$$\mathrm{Var}(y_{kj}) = \frac{\hat{p}_{kj}(1 - \hat{p}_{kj})}{\alpha + 1}$$

Large $\alpha$ → low noise (every ballot box looks like the matrix prediction); small $\alpha$ → high noise (ballot boxes deviate considerably).
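As a worked illustration with made-up values (not taken from the data), consider a party whose predicted share in some box is $\hat{p}_{kj} = 0.3$:

$$\alpha = 50:\quad \mathrm{Var}(y_{kj}) = \frac{0.3 \cdot 0.7}{51} \approx 0.0041,\quad \text{s.d.} \approx 0.064$$

$$\alpha = 5:\quad \mathrm{Var}(y_{kj}) = \frac{0.3 \cdot 0.7}{6} = 0.035,\quad \text{s.d.} \approx 0.19$$

So with $\alpha = 50$ the observed share typically lands within about 6 percentage points of the prediction, while with $\alpha = 5$ the typical deviation is about 19 points.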

Estimating α empirically

Given the recovered matrix $M$ and ballot-box-level data, we use the method of moments:

$$\hat{\alpha} + 1 = \frac{\sum_{k,j} \hat{p}_{kj}(1 - \hat{p}_{kj})}{\sum_{k,j} (y_{kj} - \hat{p}_{kj})^2}$$

We compute this both pooled across all (ballot box, party) cells and per destination party (column of $M$), revealing which parties cluster tightly around the matrix prediction and which scatter unpredictably across ballot boxes.
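The estimator above, in both its pooled and per-destination-party forms, can be sketched in a few lines. The function name `estimate_alpha` and the toy inputs are hypothetical; `Y` holds observed shares and `P` holds the matrix-predicted shares $\hat{p}_{kj}$.

```python
def estimate_alpha(Y, P):
    """Method-of-moments alpha from observed shares Y and predicted shares P.

    Y, P: lists of rows (one per ballot box), each a list of party shares.
    Returns (pooled_alpha, per_party_alphas).
    """
    n_parties = len(Y[0])
    num = [0.0] * n_parties  # per party: sum over boxes of p * (1 - p)
    den = [0.0] * n_parties  # per party: sum over boxes of squared residuals
    for y_row, p_row in zip(Y, P):
        for j in range(n_parties):
            num[j] += p_row[j] * (1.0 - p_row[j])
            den[j] += (y_row[j] - p_row[j]) ** 2
    pooled = sum(num) / sum(den) - 1.0
    # A party with zero residuals has no measurable dispersion: report infinity.
    per_party = [n / d - 1.0 if d > 0 else float("inf") for n, d in zip(num, den)]
    return pooled, per_party

# Toy inputs (illustrative only): two ballot boxes, three parties.
P = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]]
Y = [[0.6, 0.25, 0.15], [0.15, 0.5, 0.35]]
pooled, per_party = estimate_alpha(Y, P)
print(round(pooled, 1), [round(a, 1) for a in per_party])
```

With only a handful of boxes the estimate is very noisy; the point of the sketch is the bookkeeping, not the numbers.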

Why the "effective sample size" analogy?

If each ballot box's voters drew independently from $\hat{\mathbf{p}}_k$, the variance would be $p(1-p)/N$ for $N$ voters. Comparing this with the Dirichlet variance $p(1-p)/(\alpha+1)$ shows that $\alpha$ plays the role of an effective sample size. Real Israeli ballot boxes have roughly 300–400 voters, but the data show variance equivalent to only $\alpha \approx 30$–$120$ independent draws, i.e., a 3–14× over-dispersion factor relative to pure multinomial sampling. This excess reflects unmodeled local factors: candidate visits, neighborhood-specific events, bloc voting, etc.
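Taking the quoted ranges at face value, the bounds of the over-dispersion factor $N/\alpha$ come from pairing the smallest box with the largest $\alpha$ and the largest box with the smallest $\alpha$:

$$\frac{N_{\min}}{\alpha_{\max}} = \frac{300}{120} = 2.5, \qquad \frac{N_{\max}}{\alpha_{\min}} = \frac{400}{30} \approx 13.3$$

which rounds to the 3–14× range.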

Pooled α per transition

Each row of the table solves the convex optimization for one transition, then computes $\hat\alpha$ over all (ballot box, party) residuals. $N/\alpha$ is the over-dispersion relative to pure independent sampling.

Transition	R²	α (pooled)	Avg ballot box size	N/α (over-dispersion)	# ballot boxes

Pattern: $\alpha$ peaks during back-to-back repeat elections (K21→K22, K22→K23), when voter behavior is most stable, and bottoms out during realignments (K20→K21).

Per-destination α heatmap

One cell per (transition, party) pair. Color encodes $\alpha$: deeper teal = tighter clustering, red = high dispersion. Empty cells = party did not run in that election. Hover for the exact value and mean share.

Legend: low α (volatile) to high α (cohesive)

Three regimes emerge:

Ultra-Orthodox (Shas, United Torah Judaism): $\alpha = 30$–$225$. The tightest communal voting in the country; rabbinic endorsement effectively dictates every ballot box's mix.

Major centrist/right Jewish parties (Likud, Yesh Atid, Blue and White, Kadima): $\alpha = 50$–$160$. Moderately consistent; mainstream voters scatter somewhat but the matrix captures most of the signal.

Arab parties (Hadash, Ra'am, Balad, Joint List): $\alpha = 4$–$25$. The lowest in the country, not because individual Arab voters are volatile, but because their support is geographically bimodal (≈80% in Arab-majority ballot boxes, near 0% elsewhere). The matrix's single transfer fraction can't capture both regimes, so the residuals are huge.

Implications

$\alpha$ is a useful summary statistic for several purposes:

Forward simulation (e.g., K26 scenario): plug $\alpha$ back into a Dirichlet noise model to generate realistic synthetic ballot boxes given a hypothesized matrix.
Bootstrap-free CIs: closed-form variance for transfer-matrix coefficients, given $\alpha$ and the ballot-box size distribution.
Diagnostic: low $\alpha$ for a party indicates that the linear-stochastic transfer model is a poor fit for that party, whether due to geographic bimodality, structural change, or genuine over-dispersion.
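The forward-simulation use case could look roughly like the sketch below, which uses only the Python standard library. Everything here is an assumption for illustration: the function name `simulate_boxes`, the two-party matrix, the input shares, and the value $\alpha = 60$ are made up and do not come from the project's code or estimates.

```python
import random

def simulate_boxes(X, M, alpha, seed=0):
    """Draw synthetic current-election shares y_k ~ Dirichlet(alpha * p_hat_k).

    X: previous-election shares per ballot box (each row sums to 1).
    M: hypothesized row-stochastic transfer matrix.
    alpha: concentration parameter (a hypothetical value chosen by the caller).
    Note: every predicted share must be strictly positive for the Gamma draws.
    """
    rng = random.Random(seed)
    n_prev, n_curr = len(M), len(M[0])
    Y = []
    for x in X:
        p_hat = [sum(x[i] * M[i][j] for i in range(n_prev)) for j in range(n_curr)]
        total = sum(p_hat)
        p_hat = [p / total for p in p_hat]  # L1 normalization, as in the model
        # A Dirichlet draw is a vector of independent Gamma draws, normalized.
        g = [rng.gammavariate(alpha * p, 1.0) for p in p_hat]
        s = sum(g)
        Y.append([v / s for v in g])
    return Y

M = [[0.8, 0.2], [0.3, 0.7]]          # illustrative matrix, not an estimate
X = [[0.6, 0.4], [0.1, 0.9]] * 1000   # 2000 synthetic ballot boxes
Y = simulate_boxes(X, M, alpha=60.0)
print(Y[0])
```

Feeding the simulated shares back into the method-of-moments estimator is one way to check that a chosen $\alpha$ reproduces the dispersion seen in real data.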

The full per-(transition, party) table is also available as JSON: data/alpha_estimates.json.

⚠️ Limitations and Caveats

Uniformity assumption: The model assumes the transfer pattern is identical nationwide. In reality, Likud voters in Tel Aviv may behave differently from those in Jerusalem. The $\alpha$ parameter described above captures the size of these deviations, but the matrix coefficients themselves remain a single nationwide estimate.
New and deceased voters: The model ignores newly eligible voters (18+) and voters who have died since the previous election. Their effect is forced to appear as "transfers" from some party.
Ballot box changes: Residents move, boxes split or merge. We only compare boxes with the same ID, missing part of the picture.
Correlation ≠ causation: The model finds statistical associations, not proof that voters actually switched. Hidden factors may explain the correlations.
Uncertainty: Results are statistical estimates with margins of error. Small transfers (<5%) may be noise rather than genuine trends.

📚 Further Reading

Full source code and data are available on GitHub.