- {
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Compute a Co-Occurrence Matrix\n",
- "I always forget this trick, which makes it easy to compute a co-occurrence matrix in any language.\n",
- "\n",
- "If we have a binary matrix of shape `n_classes * m_examples`, we can compute the co-occurrence matrix for the classes by multiplying it by its transpose."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Example\n",
- "We have a matrix `w` with 3 classes and 5 examples (each example either belongs to a class or not). The classes can be words and the examples can be texts: each row corresponds to a word, and each column corresponds to one of the 5 texts and indicates which of the words that text contains."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 16,
- "metadata": {},
- "outputs": [],
- "source": [
- "import numpy as np\n",
- "w = np.array([[0,1,0,0,0],[0,1,1,1,0],[1,0,0,0,0]])"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 17,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([[0, 1, 0, 0, 0],\n",
- " [0, 1, 1, 1, 0],\n",
- " [1, 0, 0, 0, 0]])"
- ]
- },
- "execution_count": 17,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "w"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Taking the dot product with the transpose gives us the co-occurrence matrix for the words: the number in the first row, second column is the number of texts that contain both `word 1` and `word 2`. The diagonal simply counts the texts containing each word (here `word 2` appears in 3 texts), and the matrix is symmetric about the diagonal."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 19,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([[1, 1, 0],\n",
- " [1, 3, 0],\n",
- " [0, 0, 1]])"
- ]
- },
- "execution_count": 19,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "np.dot(w,w.T)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We can also compute this for the examples; in this case we see how many words two texts have in common."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 20,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "array([[1, 0, 0, 0, 0],\n",
- " [0, 2, 1, 1, 0],\n",
- " [0, 1, 1, 1, 0],\n",
- " [0, 1, 1, 1, 0],\n",
- " [0, 0, 0, 0, 0]])"
- ]
- },
- "execution_count": 20,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "np.dot(w.T,w)"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.6.8"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
- }