- {
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "1. Analytic Approach\n",
- "Descriptive analytics\n",
- "Descriptive analytics is a preliminary stage of data processing that summarizes historical data to yield useful information and, where needed, prepare the data for further analysis."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "2. Data Requirements\n",
- "My emails from the last 5 years, including sender, recipient, subject, sent time, etc."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "' 3. Data Collection\n",
- "' Using Excel + VBA (run from the Excel VBA editor with a reference to the Outlook object library)\n",
- "\n",
- "Sub GetSender()\n",
- "\n",
- "Dim myOlApp As Outlook.Application\n",
- "Dim mpfInbox As Outlook.MAPIFolder\n",
- "Dim obj As Outlook.MailItem\n",
- "Dim i As Long  ' Long avoids overflow on inboxes with more than 32,767 items\n",
- "Set myOlApp = CreateObject(\"Outlook.Application\")\n",
- "Set mpfInbox = myOlApp.GetNamespace(\"MAPI\").GetDefaultFolder(olFolderInbox)\n",
- "Workbooks(\"Book1.xls\").Worksheets(\"my personal email\").Select\n",
- "For i = mpfInbox.Items.Count To 1 Step -1\n",
- "    ' Skip meeting requests, receipts, and other non-mail items\n",
- "    If mpfInbox.Items(i).Class = olMail Then\n",
- "        Set obj = mpfInbox.Items.Item(i)\n",
- "        Cells(i, 1) = obj.SenderEmailAddress\n",
- "        Cells(i, 2) = obj.SenderName\n",
- "        ' Recipient details: address of the first recipient and the To display names\n",
- "        If obj.Recipients.Count > 0 Then Cells(i, 3) = obj.Recipients(1).Address\n",
- "        Cells(i, 4) = obj.To\n",
- "        Cells(i, 5) = obj.Subject\n",
- "    End If\n",
- "Next i\n",
- "End Sub\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# 4. Data Understanding and Preparation\n",
- "\n",
- "import pandas as pd\n",
- "from pandasql import sqldf\n",
- "\n",
- "# Load the sheet exported from Outlook\n",
- "mails = pd.read_excel('data.xlsx')\n",
- "print(mails)\n",
- "\n",
- "# Helper that runs a SQL query against DataFrames in the global namespace\n",
- "pysqldf = lambda sql: sqldf(sql, globals())\n",
- "\n",
- "# Number of emails per sender\n",
- "sql1 = \"select sendername,count(*) from mails group by sendername\"\n",
- "print(pysqldf(sql1))\n",
- "\n",
- "# Number of emails per receiver\n",
- "sql2 = \"select receivername,count(*) from mails group by receivername\"\n",
- "print(pysqldf(sql2))\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "5. Modeling and Evaluation\n",
- "The top 5 receivers: tangzh, sunzc, zhaobx, tianliang, yinyue\n",
- "The top 5 senders: tianliang, yinyue, hanhongyi, tangzh, guomeng\n",
- "The answer to the question:\n",
- " From the point of view of email communication, with whom do I have the closest relationship?\n",
- " tangzh, tianliang, yinyue (the three contacts who appear in both top-5 lists)"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python",
- "language": "python",
- "name": "conda-env-python-py"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.6.7"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 4
- }
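
The per-sender and per-receiver counts from step 4 can also be computed in plain pandas, without SQL. The sketch below is only an illustration under stated assumptions: the column names `sendername` and `receivername` come from the notebook's queries, while the sample rows, the `top_contacts` helper, and the mailbox-owner label `me` are hypothetical and not taken from the original data. Summing sent and received counts is one possible way to rank contacts; the notebook itself reads its answer off the two top-5 lists.

```python
import pandas as pd

# Hypothetical sample standing in for the exported sheet.
# Column names follow the SQL queries in the notebook.
mails = pd.DataFrame({
    "sendername":   ["alice", "bob", "alice", "carol", "me", "me", "me"],
    "receivername": ["me", "me", "me", "me", "alice", "alice", "bob"],
})

# pandas equivalents of the two GROUP BY queries
sender_counts = mails.groupby("sendername").size()
receiver_counts = mails.groupby("receivername").size()


def top_contacts(df, n=5, exclude=()):
    """Rank contacts by (emails sent) + (emails received).

    `exclude` lists names to leave out, e.g. the mailbox owner.
    """
    sent = df["sendername"].value_counts()
    received = df["receivername"].value_counts()
    # fill_value=0 keeps contacts that appear on only one side
    total = sent.add(received, fill_value=0).astype(int)
    total = total.drop(labels=list(exclude), errors="ignore")
    return total.sort_values(ascending=False).head(n)


closest = top_contacts(mails, exclude=["me"])
print(closest)
```

On real data, `mails` would be the DataFrame loaded with `pd.read_excel`, and `exclude` would hold whatever name the mailbox owner appears under.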