shinemic

利用公共表表达式(CTE)及自连接查找重复元素

Feb 20th, 2024
108
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
JSON 4.73 KB | None | 0 0
  1. {
  2.  "cells": [
  3.   {
  4.    "cell_type": "markdown",
  5.    "id": "779c90b0",
  6.    "metadata": {},
  7.    "source": [
  8.     "# 利用公共表表达式(CTE)及自连接查找重复元素"
  9.    ]
  10.   },
  11.   {
  12.    "cell_type": "markdown",
  13.    "id": "3c7e4a21",
  14.    "metadata": {},
  15.    "source": [
  16.     "> 不同数据库实现略有不同,这里仅以 Oracle 及 PostgreSQL 为例"
  17.    ]
  18.   },
  19.   {
  20.    "cell_type": "markdown",
  21.    "id": "b2d31279",
  22.    "metadata": {},
  23.    "source": [
  24.     "## Oracle"
  25.    ]
  26.   },
  27.   {
  28.    "cell_type": "markdown",
  29.    "id": "c7a42d1f",
  30.    "metadata": {},
  31.    "source": [
  32.     "```sql\n",
  33.     "DROP TABLE T_CONTACTS;\n",
  34.     "\n",
  35.     "CREATE TABLE T_CONTACTS(\n",
  36.     "    EMAIL VARCHAR2(255)\n",
  37.     ");\n",
  38.     "\n",
  39.     "INSERT ALL\n",
  40.     "    -- x1\n",
  41.     "    INTO T_CONTACTS VALUES('uncautiously@phenocryst.org')\n",
  42.     "    INTO T_CONTACTS VALUES('innocuous@grosz.net')\n",
  43.     "    -- dup x2\n",
  44.     "    INTO T_CONTACTS VALUES('hopvine@dilligrout.edu')\n",
  45.     "    INTO T_CONTACTS VALUES('hopvine@dilligrout.edu')\n",
  46.     "    -- dup x3\n",
  47.     "    INTO T_CONTACTS VALUES('chamaesiphon@provisor.org')\n",
  48.     "    INTO T_CONTACTS VALUES('chamaesiphon@provisor.org')\n",
  49.     "    INTO T_CONTACTS VALUES('chamaesiphon@provisor.org')\n",
  50.     "    -- dup x4\n",
  51.     "    INTO T_CONTACTS VALUES('miltonian@soarability.edu')\n",
  52.     "    INTO T_CONTACTS VALUES('miltonian@soarability.edu')\n",
  53.     "    INTO T_CONTACTS VALUES('miltonian@soarability.edu')\n",
  54.     "    INTO T_CONTACTS VALUES('miltonian@soarability.edu')\n",
  55.     "SELECT 1 FROM DUAL;\n",
  56.     "\n",
  57.     "WITH Y AS (SELECT EMAIL, SYS_GUID() RN FROM T_CONTACTS),\n",
  58.     "     T AS (SELECT * FROM Y)\n",
  59.     "    SELECT DISTINCT T1.EMAIL\n",
  60.     "      FROM T T1\n",
  61.     "INNER JOIN T T2\n",
  62.     "        ON T1.EMAIL = T2.EMAIL\n",
  63.     "       AND T1.RN <> T2.RN;\n",
  64.     "```"
  65.    ]
  66.   },
  67.   {
  68.    "cell_type": "markdown",
  69.    "id": "f53269d2",
  70.    "metadata": {},
  71.    "source": [
  72.     "注意最后的查询中临时表 `Y` 增加了一列字段 `rn`,但在 Oracle 中 CTE 在后续的查询中引用中会重复计算(暂不清楚是否与易失性函数如 `SYS_GUID` 或 `DBMS_RANDOM.VALUE` 有关),所以加了一个临时表 `T` 用于固化 `Y` 的结果,最后自关联采用 `EMAIL` 相等且存在 `RN` 不等判断出重复值"
  73.    ]
  74.   },
  75.   {
  76.    "cell_type": "markdown",
  77.    "id": "688a7072",
  78.    "metadata": {},
  79.    "source": [
  80.     "---"
  81.    ]
  82.   },
  83.   {
  84.    "cell_type": "markdown",
  85.    "id": "1055780b",
  86.    "metadata": {},
  87.    "source": [
  88.     "## PostgreSQL"
  89.    ]
  90.   },
  91.   {
  92.    "cell_type": "markdown",
  93.    "id": "b5818644",
  94.    "metadata": {},
  95.    "source": [
  96.     "```sql\n",
  97.     "DROP TABLE IF EXISTS DATALAB.I_CONTACTS;\n",
  98.     "\n",
  99.     "CREATE TABLE DATALAB.I_CONTACTS(\n",
  100.     "    EMAIL VARCHAR(255)\n",
  101.     ");\n",
  102.     "\n",
  103.     "INSERT INTO DATALAB.I_CONTACTS VALUES\n",
  104.     "    ('uncautiously@phenocryst.org'),\n",
  105.     "    ('innocuous@grosz.net'),\n",
  106.     "    ('hopvine@dilligrout.edu'),\n",
  107.     "    ('hopvine@dilligrout.edu'),\n",
  108.     "    ('chamaesiphon@provisor.org'),\n",
  109.     "    ('chamaesiphon@provisor.org'),\n",
  110.     "    ('chamaesiphon@provisor.org'),\n",
  111.     "    ('miltonian@soarability.edu'),\n",
  112.     "    ('miltonian@soarability.edu'),\n",
  113.     "    ('miltonian@soarability.edu'),\n",
  114.     "    ('miltonian@soarability.edu');\n",
  115.     "\n",
  116.     "WITH T AS (SELECT EMAIL, RANDOM() RN FROM DATALAB.I_CONTACTS)\n",
  117.     "    SELECT DISTINCT T1.EMAIL\n",
  118.     "      FROM T T1\n",
  119.     "INNER JOIN T T2\n",
  120.     "        ON T1.EMAIL = T2.EMAIL\n",
  121.     "       AND T1.RN <> T2.RN;\n",
  122.     "```"
  123.    ]
  124.   },
  125.   {
  126.    "cell_type": "markdown",
  127.    "id": "ca5a4705",
  128.    "metadata": {},
  129.    "source": [
  130.     "注意 PostgreSQL 就没有 Oracle 中的问题了,可能与 CTE 的执行与实现原理有关"
  131.    ]
  132.   },
  133.   {
  134.    "cell_type": "markdown",
  135.    "id": "d678657a",
  136.    "metadata": {},
  137.    "source": [
  138.     "---"
  139.    ]
  140.   },
  141.   {
  142.    "cell_type": "markdown",
  143.    "id": "12db6b55",
  144.    "metadata": {},
  145.    "source": [
  146.     "以上结果均为:\n",
  147.     "\n",
  148.     "| Email |\n",
  149.     "| ----- |\n",
  150.     "| hopvine@dilligrout.edu |\n",
  151.     "| chamaesiphon@provisor.org |\n",
  152.     "| miltonian@soarability.edu |\n",
  153.     "\n",
  154.     "(顺序可能有不同)"
  155.    ]
  156.   }
  157.  ],
  158.  "metadata": {
  159.   "kernelspec": {
  160.    "display_name": "Python 3 (ipykernel)",
  161.    "language": "python",
  162.    "name": "python3"
  163.   },
  164.   "language_info": {
  165.    "codemirror_mode": {
  166.     "name": "ipython",
  167.     "version": 3
  168.    },
  169.    "file_extension": ".py",
  170.    "mimetype": "text/x-python",
  171.    "name": "python",
  172.    "nbconvert_exporter": "python",
  173.    "pygments_lexer": "ipython3",
  174.    "version": "3.9.16"
  175.   }
  176.  },
  177.  "nbformat": 4,
  178.  "nbformat_minor": 5
  179. }
  180.  
Add Comment
Please, Sign In to add comment