Untitled

a guest
Oct 15th, 2019
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Compute Co-occurrence Matrix\n",
    "I always forget this trick, which makes it easy to compute a co-occurrence matrix in any language.\n",
    "\n",
    "If we have a 0/1 indicator matrix of shape `n_classes * m_examples`, we can compute the co-occurrence matrix of the classes by multiplying it by its transpose."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example\n",
    "We have a matrix `w` with 3 classes and 5 examples (each example either belongs to a class or not). The classes can be words and the examples can be texts: each row corresponds to a word and each column to a text, so entry (i, j) is 1 if text j contains word i and 0 otherwise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "# rows = words (classes), columns = texts (examples)\n",
    "w = np.array([[0, 1, 0, 0, 0], [0, 1, 1, 1, 0], [1, 0, 0, 0, 0]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0, 1, 0, 0, 0],\n",
       "       [0, 1, 1, 1, 0],\n",
       "       [1, 0, 0, 0, 0]])"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "w"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Taking the dot product of `w` with its transpose gives the word co-occurrence matrix: the entry in the first row, second column is the number of texts that contain both `word 1` and `word 2`. The diagonal is simply the number of texts containing each word (here `word 2` appears in 3 texts), and the matrix is symmetric about the diagonal."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 1, 0],\n",
       "       [1, 3, 0],\n",
       "       [0, 0, 1]])"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.dot(w, w.T)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also compute this for the examples; in that case, each entry is the number of words two texts have in common."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 0, 0, 0, 0],\n",
       "       [0, 2, 1, 1, 0],\n",
       "       [0, 1, 1, 1, 0],\n",
       "       [0, 1, 1, 1, 0],\n",
       "       [0, 0, 0, 0, 0]])"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.dot(w.T, w)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
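
As a sanity check on the trick used in the notebook, here is a minimal sketch that compares the matrix product against a plain pairwise count. It assumes the same 0/1 matrix `w` as above; the helper `cooccurrence_by_counting` is a hypothetical name introduced only for this illustration and is not part of the original notebook or of NumPy.

```python
import numpy as np

# Same indicator matrix as in the notebook: rows = words, columns = texts.
w = np.array([[0, 1, 0, 0, 0],
              [0, 1, 1, 1, 0],
              [1, 0, 0, 0, 0]])


def cooccurrence_by_counting(m):
    """Hypothetical helper: for every pair of rows, count the columns
    where both rows hold a 1, using explicit loops."""
    n = m.shape[0]
    out = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            out[i, j] = int(np.sum((m[i] == 1) & (m[j] == 1)))
    return out


# Word co-occurrence (3 x 3) and text co-occurrence (5 x 5) via the transpose trick.
word_cooc = w @ w.T
text_cooc = w.T @ w

# The matrix product agrees with the explicit pairwise count in both directions.
assert np.array_equal(word_cooc, cooccurrence_by_counting(w))
assert np.array_equal(text_cooc, cooccurrence_by_counting(w.T))

# Symmetry follows from (w @ w.T).T == w @ w.T.
assert np.array_equal(word_cooc, word_cooc.T)
```

For 2-D arrays, `w @ w.T` is equivalent to the `np.dot(w, w.T)` call used in the notebook. The loop version is only there to make the meaning of each entry explicit; the product is the practical way to compute it. Note that the "count of shared items" reading relies on the entries being 0 or 1.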