Guest User

Untitled

a guest
Dec 25th, 2017
127
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 37.23 KB | None | 0 0
  1. ISM 6218 Advance Database Management
  2. Project Report
  3. on
  4. BullFlix, a Movie App (with additional functionality of messaging)
  5.  
  6.  
  7.  
  8.  
  9.  
  10.  
  11.  
  12.  
  13.  
  14.  
  15.  
  16.  
  17.  
  18. Table of Contents
  19.  
  20. 1. Executive Summary………………………………………………………………………………3
  21. 2. Assumptions…………………………………………………………………………………………4
  22. 3. Logical Design……………………………………………………………………………………….6
  23. 4. Capacity Planning………………………………………………………………………………….9
  24. 5. Data Generation…………………………………………………………………………………..10
  25. 6. DDL Queries………………………………………………………………………………..……….16
  26. 7. Data Integrity………………………………………………………………………………..……..23
  27. 8. Performance Tuning………………………………………………………………..…………..26
  28. 9. Stored Procedure…………………………………………………………………..…………….30
  29. 10. DBA Scripts……………………………………………………………………………..…………..31
  30. 11. Interface Design………………………………………………………………………….……...36
  31. 12. Data Visualization………………………………………………………………………….…...39
  32. 13. Queries………………………………………………………………………………………….…...45
  33.  
  34.  
  35.  
  36.  
  37.  
  38.  
  39.  
  40. Executive Summary:
  41. The BullFlix database has been developed to provide an application to users through which they can search movie names and released year, awards and genres, cast and directors, theaters and their location. We have also come up with a design for this database where users would be able to follow other users, named as “fans”.
  42. They can like the description given by another fan or can send the friend request to another fan. Once, a fan accepts the friend request send by another user, these both users would be able to send instant message to each other. This facility is only available to “friends”.
  43. For this project, we have shown how we were able to gather data, the logical design. We have performed the integrity checks, optimization and have shown the DBA scripts and Visualization.
  44. We have given following weights to these activities:
  45.  
  46. Topic Weights:
  47. Topic Area Description Weight
  48. Database Design: Logical Design, Data Generation, Data Integrity, DDL Statements 30
  49. Query Writing: Queries, Stored Procedure 25
  50. Performance Tuning: Indexing, Btree, Bitmap 15
  51. Other Topics: Capacity Planning, DBA Scripts, Interface Design, Data Visualization 30
  52.  
  53.  
  54.  
  55.  
  56. Assumptions:
  57. We are assuming following things in designing our BullFllix app.
  58. 1. [MESSAGE] table will contain information about sender (MES_SENDER_FAN_ID), message body, subject (optional), date when the message was created (automatically time stamped), message expiry date (optional) to allow several attempts to deliver message and drop it if unsuccessful. This attribute can be uniformly set automatically (e.g. to 1 year or never) or manually.
  59.  
  60. 2. [MESSAGE] also allows creation of “threads” where a flow of messages (both peer-to-peer or chat groups) may be organized in threads. A special attribute (MES_PARENT_MESSAGE_ID) represents unary relation that may anchor a group of messages to a “parent” message.
  61.  
  62. 3. [MESSAGE] table has no information about who a message was sent to. [MESSAGE_RECIPIENT] table was created to accommodate one to many nature of group chats (please see p. 4 and 5 below). When a sender sends a direct (peer-to-peer) message to recipient, information on such recipient is stored in [MESSAGE_RECIPIENT] table under (MR_Message_Rec_Fan_ID) attribute along with time stamp of when a message was created. Although we assume that only “friends” can exchange direct messages it is not explicitly imbedded in a DB design.
  63.  
  64. 4. Information on chat groups is stored in [GROUP_CHAT] table. When a fan creates a chat group this information is inserted into [GROUP_CHAT] table (one tuple per every group chat existing in the system), where title, creation date and “active” flag are stored. List of participants of every group is stored in [GROUP_CHAT_LIST] table.
  65.  
  66. 5. When a group message is sent a row is inserted into [MESSAGE_RECIPIENT] with (GCL_Group_ID) attribute. MR_Message_Rec_Fan_ID is null in that case, and vice versa GCL_Group_ID is null for direct chat. [MESSAGE_RECIPIENT] may be divided into 2 tables: one for direct chat and another for group chat, but since no additional options will be introduced (other than direct or group) and to have a more compact and clear design we assume that a column of “wasted” values for each message is not a big cost.
  67.  
  68. 6. [FRIEND_REQUEST] table keeps track of all interactions among fans with respect to establishing friends’ relationships. There are only 4 types of requests allowed. Initially a fan (REQ_SENDER_FAN_ID) can send a “friend request” to any other fan (REQ_RECEPIENT_FAN_ID). That recipient fan can either “accept request”, “reject request” or simply ignore. Later if request is accepted and when relationship is established either “friend” can “unfriend” the other one. That type of request is defined in (RET_REQUEST_TYPE_TITLE) attribute which is a foreign key from a separate [REQUEST_TYPES] table.
  69.  
  70. 7. Running SQL queries on [FRIEND_REQUEST] we may determine if a friend status exist for any pair of fans at any particular time. For example, to determine if Alice and Bob are friends we can select rows from [FRIEND_REQUEST] table where (REQ_SENDER_FAN_ID) is Alice or Bob and (REQ_RECEPIENT_FAN_ID) is Bob or Alice, look at the last row’s (RET_REQUEST_TYPE_TITLE) attribute. If it says “accept request” then a friend relation exists. If it is either one of “friend request”, “reject request” and “unfriend” or the tuple does not exist than there is no friend relation.
  71.  
  72. 8. We have assumed that many movies can be shown in a single theater and many theaters may show a particular movie. We store date and time of respective shows along with Movie and Theater IDs. However, at this point we do not plan to utilize a possibility that a movie may be shown in a particular theater at 2 or more screens at the same time and date.
  73.  
  74. 9. We assume that our customer is interested in schedule first. We think it is a viable compromise for our app since otherwise we would be dealing with unstructured data from many theaters and would have to account for physical peculiarities of the venues and normalize it to be used in a single search output.
  75.  
  76. 10. We will store theaters' physical location as GPS coordinates in our Theaters table. Additional fields concerning usual address format (street, city, state, zip) will be stored in a separate table.
  77.  
  78. 11. We assume that only one theater may be located at one set of GPS coordinates.
  79.  
  80. 12. We further assume that we might be able to use third party services to populate and update (as needed) our Theaters Locations table. GPS coordinates will not change, but once our DB grows older our street address may become outdated because of changes in street names, expansion overseas (with different address structure) and other reasons.
  81.  
  82. 13. Our DB can store Theaters' ratings by fans registered in the system (existing Fans table).
  83.  
  84. 14. We assume that any fan may rate any number of Theaters and add a written review (many to many relation).
  85.  
  86.  
  87.  
  88.  
  89.  
  90.  
  91.  
  92. Logical Design:
  93. 1. The logical design for BullFlix, with an additional capability of “Theatres”:
  94.  
  95.  
  96.  
  97.  
  98.  
  99.  
  100. 2. Whereas the logical design for BullFlix with functionality of “Friends and followers”:
  101.  
  102.  
  103.  
  104. 3. The logical Design of the whole BullFlix consolidated into one:
  105.  
  106.  
  107.  
  108. In the above ERD diagram, we have displayed the original design in Green, the theatres in Red and in Blue the functionality of messaging.
  109.  
  110.  
  111.  
  112. Capacity Planning:
  113. Capacity planning is done in order to utilize physical resources like disk space, memory, CPU, storage space etc. effectively. We plan this by keeping business requirements in mind. For instance, what would be the growth rate of tables in future, how much extra spce we should keep etc.
  114. Below we are going to show the available and allocated spaces in our database.
  115. Select * FROM DBA_TEMP_FREE_SPACE
  116.  
  117. From the above screen shot we can see how much free space we have. Now, we are going to see how much space is consumed by each table.
  118. SELECT table_name, num_rows, avg_space, avg_row_len
  119. from all_tables
  120. where owner = 'RELMDB'
  121.  
  122.  
  123. From the above table we can see that the space utilized by each table in relmdb database. Here, we are going to calculate the space requirement for fans and movies table.
  124. We can see that average row length for fans is 254. At present we have 7655 fans. So, together all these fans will take 254 * 7655 = 1944370 bytes. Similarly, movies table will require 271 * 283 = 76,693 bytes.
  125. On an average if we say that 500 movies are added every year and 1000 fans, based on this our database tables will grow very fast. For this we should look for range portioning or distributed database.
  126.  
  127.  
  128.  
  129. Data Generation:
  130. To load data into our tables, we have generated data sets using Excel. Using this data, we were able to simulate what an application’s social media elements could produce.
  131. Message and message recipient tables
  132. One of the most important tables that tied together communications among fans is the MESSAGE table. Core of MESSAGE table is MES_MESSAGE_BODY attribute that is to store texts of the messages sent among fans. We further intended those messages to be ready for text sentiment analysis (see p. Sentiment analysis to gauge users’ evaluation of movies) so it could not be a mere random combination of characters. To satisfy that condition we came up with the following requirements to the strings making MES_MESSAGE_BODY attribute:
  133. 1. Messages must be compiled of English words.
  134. 2. Total number of messages to be generated was arbitrarily chosen to be 5000.
  135. 3. Messages must be of a variable length uniformly distributed in a range from 1 to 20 words.
  136. 4. Some portion of all messages (arbitrarily chosen to be 30%) shall contain movies’ titles as part of the message body. Such messages containing titles shall be randomly distributed among all messages. Title to be mentioned in such messages shall be randomly chosen from a list contained in MOVIE table (283 movies). Position of the title (among all words making such message’s body) must be randomly chosen from 1 to message’s length.
  137. To satisfy the abovementioned requirements, and specifically p. 1, a population of 20 words was created to be used in messages:
  138. good cool not a fan bad worst
  139. disgusting hello what when I
  140. you like want grab food
  141. address car pick up director know
  142. Words were chosen to include typical vocabulary that fans discussing movies and their plans would use. Though 20 words is obviously a “poor man’s version” of what actual vocabulary would be, it was used as an example and can be extended.
  143. MS Excel, including its random(), norm.inv() and other functions, was used to put together the strings for MESSAGE table. In the picture below a screenshot of our working spreadsheet is presented.
  144.  
  145.  
  146.  
  147. Legend for the screenshot above:
  148.  
  149. Message size is determined. It is randomly chosen from 1 to 20 using Excel’s randbetween() function.
  150.  
  151. Based on the size of the message (determined at step 1) a position of the movie title is determined randomly from 1 to message size using Excel’s randbetween() function.
  152.  
  153. Movie title is selected from the list of 283 movies contained in MOVIE table using Excel’s randbetween() and lookup functions.
  154.  
  155. Flag is set to TRUE (1) if a message shall contain a movie title, selected at step 3, and put to position selected at step 2 (about 30% of the time).
  156.  
  157. Set of words that can be used in constructing a message.
  158.  
  159. Based on size of the message, determined at step 1, words are randomly selected from the set (see step 5) using Excel’s randbetween() and lookup functions.
  160.  
  161. All words selected at step 6 are concatenated to form a string using Excel’s concat() function.
  162.  
  163. Several base inputs like maximum message size are outlined in that control form.
  164.  
  165. After creation of messages’ bodies, the rest of the attributes of MESSAGE table were generated. Screenshot bellow shows a part of generated table.
  166.  
  167.  
  168. Legend for the screenshot above:
  169.  
  170.  
  171. Message identifier, determined as a positive integer, keeping track of number of messages.
  172.  
  173. Message subject is decided to be a movie title in case such title is present as one of the words in a message body; null if no movie title is mentioned in a message.
  174.  
  175. Sender is selected randomly from a list of fan’s IDs of the FAN table.
  176.  
  177. First 20 messages were arbitrarily chosen as the only messages that could be “parent” messages to other messages to simulate possible “threads” of messages. It was further assumed that each time a sender (see step 3) who sent one of the first 20 messages shows up as a sender in other message, such message is considered to be a part of “thread” and a MES_PARENT_MESSAGE_ID is set accordingly.
  178.  
  179. Message body, generated at previous steps as shown above.
  180.  
  181. On a separate sheet distribution of messages by time was determined. Such distribution was set to obey 3 assumptions: a) distribution within a week: Mon - 7%, Tue - 11%, Wed – 10%, Thu - 29%, Fri - 21%, Sat - 18%, Sun ¬5%; b) weeks 15 through 30 shall have 2 times more messages than weeks 1 through 15 and 50% less messages than weeks 30+; c) actual number of messages for a specific date shall be randomly determined assuming normal distribution with mean equals to calculated based on previous 2 assumptions and standard deviation of 1 using Excel’s random() and norm.inv() functions.
  182.  
  183. Messages are set to never expire by setting expiration date to Jan 1, 2099.
  184.  
  185. MESSAGE_RECIPIENT table was generated the following way: 4900 messages out of 5000 were set aside to be peer-to-peer messages between friends. Random friends pair were selected to attributeв ещ those messages. 100 messages out of 5000 were supposed to illustrate group messaging. Groups IDs were selected manually to evenly distribute 100 messages across 5 existing groups (see p. Groups and Friends tables).
  186.  
  187. Groups and Friends Tables
  188. GROUP_CHAT was manually populated with 5 groups (“Tarantino movies”, “Matrix trilogy”, “I'm gonna make him an offer he can't refuse”, “Paulie from Goodfellas” and “Not a fan of Scarface”) all created in 2016.
  189. GROUP_CHAT_LIST was manually populated with 5 groups’ members. Each group got 5 members, that were comprised of fans with FAN_Fan_ID equal to 1 through 25.
  190. FRIEND_REQUEST table was populated to showcase a typical activity of fans willing to establish connections among each other. First illustrated case: fans with ID = 3, 8 and 16 sent requests (RET_REQUEST_TYPE_TITLE is “Send friend request”) to randomly chosen fans (41, 43 and 94 fans respectively). Each of those requests is answered with acceptance (RET_REQUEST_TYPE_TITLE is “Accept friend request”). Second case illustrated is fan with ID = 22 sent friend requests to 76 other fans but is answered with rejection (RET_REQUEST_TYPE_TITLE is “Reject friend request”). Finally, fan with ID = 3 “unfriends” 2 of his or her friends. All requests take place during January 2016.
  191.  
  192.  
  193. Followers, Movie Ratings and Likes Table
  194. ‘FOLLOWER’ table was generated using similar techniques employed for MESSAGE table. Below a screenshot of our working model is presented.
  195.  
  196.  
  197. Legend for the screenshot above:
  198.  
  199. On step 1 a random list of fans (followees) was generated with a number of items arbitrarily set to 364 or about 5% of the total number of registered fans using Excel’s randbetween() function.
  200.  
  201. Random number of followers in a range of 1 to 20 was assigned to each followee using Excel’s randbetween() function.
  202.  
  203. Random fans’ IDs were assigned to each followee accounting for total number of followers such followee has (see step 2).
  204.  
  205.  
  206. After a matrix of follower-followee relations was built, it was dynamically transposed into 2 columns for FOL_FOLLOWER_FAN_ID and FOL_FOLLOWEE_FAN_ID attributes of the FOLLOWER table. Date of statuses was randomly chosen sometime in January of 2016.
  207. MOVIE_RATING table was generated to illustrate functionality that allows followers to “like” ratings and descriptions made by their followees.
  208.  
  209.  
  210. Legend for the screenshot above:
  211.  
  212.  
  213. Rating identifier, determined as a positive integer, keeping track of number of ratings.
  214.  
  215. Random fan ID selected from the FAN table using Excel’s randbetween() function.
  216.  
  217. Random movie ID selected from the MOVIE table using Excel’s randbetween() function.
  218.  
  219. Random numerical rating normally distributed with mean equals to movie’s IMDB rating and standard deviation of 4 using Excel’s random() and norm.inv() functions. Further adjusted to remain within 0 to 10 range using Excel’s min() and max() functions.
  220.  
  221. Rating description randomly generated by the same algorithm employed for MES_MESSAGE_BODY attribute of the MESSAGE table. Word bank was somewhat changed and included the following: good, cool, not a fan, bad, worst, disgusting, best, never, script, action, recommend, fake, real, president, gang, guns, car, happy end, director, know.
  222.  
  223. Date was randomly selected within 2016.
  224.  
  225. No favorite flag was set.
  226.  
  227. After generating data, the next step was to create tables and load this data into those tables. For this step, we have given DDL queries below.
  228. DDL Queries:
  229. 1. Message Table
  230. CREATE TABLE "DB204"."MESSAGE"
  231. ( "MES_MESSAGE_ID" NUMBER(4,0),
  232. "MES_SUBJECT" VARCHAR2(128 BYTE),
  233. "MES_SENDER_FAN_ID" NUMBER(6,0),
  234. "MES_PARENT_MESSAGE_ID" NUMBER(4,0),
  235. "MES_MESSAGE_BODY" VARCHAR2(256 BYTE),
  236. "MES_CREATE_DATE" VARCHAR2(26 BYTE),
  237. "MES_EXPIRE_DATE" VARCHAR2(26 BYTE)
  238. ) SEGMENT CREATION IMMEDIATE
  239. PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  240. NOCOMPRESS LOGGING
  241. STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  242. PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  243. BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  244. TABLESPACE "STUDENTS" ;
  245.  
  246.  
  247. 2. Follower Table
  248. CREATE TABLE "DB204"."FOLLOWER"
  249. ( "FOL_FOLLOWER_FOLLOWEE_ID" NUMBER(4,0),
  250. "FOL_FOLLOWER_FAN_ID" NUMBER(6,0),
  251. "FOL_FOLLOWEE_FAN_ID" NUMBER(5,0),
  252. "FOL_FOLLOW_STATUS" VARCHAR2(26 BYTE),
  253. "FOL_FOLLOW_START_DATE" VARCHAR2(26 BYTE),
  254. "FOL_FOLLOW_STATUS_UPDATE_DATE" VARCHAR2(26 BYTE)
  255. ) SEGMENT CREATION IMMEDIATE
  256. PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  257. NOCOMPRESS LOGGING
  258. STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  259. PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  260. BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  261. TABLESPACE "STUDENTS" ;
  262.  
  263.  
  264. 3. Friend Request Table
  265. CREATE TABLE "DB204"."FRIEND_REQUEST"
  266. ( "REQ_REQUEST_ID" NUMBER(4,0),
  267. "REQ_SENDER_FAN_ID" NUMBER(6,0),
  268. "REQ_RECEPIENT_FAN_ID" NUMBER(6,0),
  269. "RET_REQUEST_TYPE_TITLE" VARCHAR2(26 BYTE),
  270. "REQ_REQUEST_DATE" VARCHAR2(26 BYTE)
  271. )
  272. SEGMENT CREATION IMMEDIATE
  273. PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  274. NOCOMPRESS LOGGING
  275. STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  276. PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  277. BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  278. TABLESPACE "STUDENTS" ;
  279.  
  280.  
  281.  
  282. 4. Group Chat Table
  283. CREATE TABLE "DB204"."GROUP_CHAT"
  284. ( "GC_GROUP_ID" NUMBER(3,0),
  285. "GC_NAME" VARCHAR2(128 BYTE),
  286. "GC_CREATE_DATE" VARCHAR2(26 BYTE),
  287. "GC_IS_ACTIVE" NUMBER(3,0)
  288. ) SEGMENT CREATION IMMEDIATE
  289. PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  290. NOCOMPRESS LOGGING
  291. STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  292. PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  293. BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  294. TABLESPACE "STUDENTS" ;
  295.  
  296.  
  297. 5. Likes Table
  298. CREATE TABLE "DB204"."LIKES"
  299. ( "LIK_LIKE_ID" NUMBER(4,0),
  300. "FOL_FOLLOWER_FOLLOWEE_ID" NUMBER(6,0),
  301. "MR_MOVIE_RATING_ID" NUMBER(6,0),
  302. "LIK_LIKE_DATE" VARCHAR2(26 BYTE)
  303. ) SEGMENT CREATION IMMEDIATE
  304. PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  305. NOCOMPRESS LOGGING
  306. STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  307. PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  308. BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  309. TABLESPACE "STUDENTS" ;
  310.  
  311.  
  312. 6. Group Chat List Table
  313. CREATE TABLE "DB204"."GROUP_CHAT_LIST"
  314. ( "GCL_GROUP_ID" NUMBER(4,0),
  315. "GC_GROUP_ID" NUMBER(3,0),
  316. "FAN_FAN_ID" NUMBER(4,0),
  317. "GCL_CREATE_DATE" VARCHAR2(26 BYTE),
  318. "GCL_IS_ACTIVE" NUMBER(3,0),
  319. "GCL_CHAT_ADMIN" NUMBER(3,0)
  320. ) SEGMENT CREATION IMMEDIATE
  321. PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  322. NOCOMPRESS LOGGING
  323. STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  324. PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  325. BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  326. TABLESPACE "STUDENTS" ;
  327.  
  328.  
  329.  
  330. 7. Message Recipient Table
  331. CREATE TABLE "DB204"."MESSAGE_RECIPIENT"
  332. ( "MR_MESSAGE_REC_ID" NUMBER(4,0),
  333. "MES_MESSAGE_ID" NUMBER(4,0),
  334. "GCL_GROUP_ID" VARCHAR2(26 BYTE),
  335. "MR_MESSAGE_REC_FAN_ID" NUMBER(6,0),
  336. "MR_IS_READ" NUMBER(3,0)
  337. ) SEGMENT CREATION IMMEDIATE
  338. PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  339. NOCOMPRESS LOGGING
  340. STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  341. PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  342. BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  343. TABLESPACE "STUDENTS" ;
  344.  
  345.  
  346.  
  347.  
  348. 8. Request Types Table
  349. CREATE TABLE "DB204"."REQUEST_TYPES"
  350. ( "REQUEST_TYPES" VARCHAR2(26 BYTE)
  351. ) SEGMENT CREATION IMMEDIATE
  352. PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  353. NOCOMPRESS LOGGING
  354. STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  355. PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  356. BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  357. TABLESPACE "STUDENTS" ;
  358.  
  359.  
  360. Data Integrity:
  361. We have applied integrity checks on our tables so that data remains in a consistent state. To begin this, first we have assigned primary keys on the tables and then we have made appropriate foreign keys where ever it was required.
  362. 1. Primary Key Constraint on Message Table
  363. ALTER TABLE message
  364. ADD PRIMARY KEY (MES_MESSAGE_ID);
  365.  
  366.  
  367. 2. Primary Key Constraint on Group_Chat_List
  368. ALTER TABLE GROUP_CHAT_LIST
  369. ADD PRIMARY KEY (GCL_GROUP_ID);
  370.  
  371.  
  372.  
  373.  
  374. 3. Foreign Key Constraint on Message Recipient
  375. ALTER TABLE message_recipient
  376. ADD FOREIGN KEY (mes_message_id) REFERENCES message(mes_message_id);
  377.  
  378.  
  379.  
  380. 4. Foreign Key Constraint on Group Chat List
  381. ALTER TABLE GROUP_CHAT_LIST
  382. ADD FOREIGN KEY (GC_GROUP_id) REFERENCES GROUP_CHAT(GC_GROUP_ID);
  383.  
  384.  
  385.  
  386.  
  387.  
  388. 5. Foreign Key Constraint on Friend Request
  389. ALTER TABLE FRIEND_REQUEST
  390. ADD FOREIGN KEY (REQ_SENDER_FAN_ID) REFERENCES FAN(FAN_ID);
  391.  
  392.  
  393.  
  394.  
  395.  
  396. Performance Tuning:
  397. To perform optimization on our database, we have done indexing on our tables. The below examples will show performance without an index and after indexing. We have also performed B-tree indexing and Bitmap indexing.
  398. Case 1: Query without an Index
  399. We have written a query to fetch film title results that contains the word ‘Lord’ in them. Without an index, we see the cost as 4 and session logical reads 956.
  400. SELECT film_title
  401. FROM movies
  402. WHERE film_title like '%Lord%';
  403.  
  404.  
  405.  
  406. Case 2: Querying with a B-Tree Index
  407. To demonstrate that indexes reduce cost or session logical reads, we created a B-tree index and ran the same query. The cost has come down to 2 and session logical reads have reduced to 497.
  408.  
  409.  
  410. Case 3: Querying using Bitmap Index
  411. To demonstrate that bitmap indexing lower downs the cost, we have ran the same query with Bitmap Index. We see in the result that cost is 2 and session logical reads have further come down to 10.
  412.  
  413.  
  414.  
  415.  
  416.  
  417. Case 4: Querying using Complex Queries
  418. To study this, we are going to query first without an index and then after creating an index.
  419.  
  420. 4.1 Query without Index
  421. CREATE TABLE casts AS SELECT *
  422. FROM RELMDB.casts;
  423. Select distinct movies.film_title, casts.cast_member
  424. from relmdb.movies, relmdb.casts
  425. where movies.Film_ID = casts.Film_ID and (cast_member = 'Tom Hanks' or cast_member = 'Tim Allen')
  426. order by film_title;
  427.  
  428.  
  429.  
  430. From the above query, we have got cost for running the statement as 8 and session logical reads as 143. In the next step we will create an index and will compare the result.
  431.  
  432.  
  433. 4.2 Query with Index:
  434. We are creating an index on cast_member column and will execute the same query from the above step.
  435. CREATE INDEX cast_member_btree ON casts (CAST_MEMBER);
  436. We can easily conclude from the below result that session logical reads has reduced significantly ( it has come down to 15 from 143 ) even if the cost of statement is same as the previous step.
  437.  
  438.  
  439.  
  440.  
  441.  
  442.  
  443.  
  444.  
  445.  
  446. Stored Procedure:
  447. 1. To validate username and password, following procedure is required:
  448. CREATE proc log_in(username IN VARCHAR2, password IN VARCHAR2)
  449. RETURN VARCHAR2
  450. AS count_match NUMBER;
  451. BEGIN SELECT COUNT(*)
  452. INTO count_match
  453. FROM ACCOUNT
  454. WHERE fan_id=username AND password=abc;
  455. IF count_match = 0 THEN dbms_output.put_line('Wrong username or password');
  456. ELSIF count_match = 1
  457. THEN dbms_output.put_line('Login successful');
  458. ELSE dbms_output.put_line('Try Again');
  459. END IF;
  460. END;
  461.  
  462. 2. To find out movie rating by a fan:
  463. create proc MovieFanRating (
  464. FILM_TITLE VARCHAR2(80),
  465. FAN_ID VARCHAR2(10,0))
  466. As
  467. BEGIN
  468. SELECT FILM_ID, FILM_TITLE, FILM_YEAR,
  469. (SELECT IMDB_RATING FROM FAN_RATINGS fanR
  470. WHERE FanR.FAN_ID = FAN_ID AND FanR.FILM_ID = move.FILM_ID
  471. ) USER_RATING,
  472. (SELECT COUNT(*) FROM FAN_RATINGS fr
  473. WHERE fr.FAN_ID = FAN_ID AND fr.FILM_ID = move.FILM_ID
  474. ) IS_RATED
  475. FROM MOVIES move
  476. WHERE FILM_TITLE LIKE '%' + FILM_TITLE + '%'
  477. ORDER BY FILM_TITLE
  478. END;
  479.  
  480.  
  481.  
  482. DBA Scripts:
  483. DBA scripts are run by database administrators to for monitoring the database. The scripts can be run at server side known as shell scripts or can be run at sql developer. In our project, we are focusing on sql developer scripts.
  484.  
  485. 1. To query information on sessions, following script is required:
  486. SELECT statsname.name, sstatistics.value FROM v$sesstat sstatistics, v$statname statsname, v$session sen
  487. WHERE sstatistics.statistic# = statsname.statistic# AND sen.audsid = SYS_CONTEXT ('USERENV','SESSIONID') AND sen.sid = sstatistics.sid ;
  488.  
  489.  
  490.  
  491.  
  492. 2. To check default passwords, the query written below is used:
  493. SELECT a.username, b.account_status
  494. FROM dba_users_with_defpwd a
  495. JOIN dba_users b ON a.username = b.username
  496. ORDER BY 1;
  497.  
  498.  
  499. 3. For backing up purposes, a DBA can record path of the directories in a database with this script:
  500. SELECT directory_name, ORIGIN_CON_ID, directory_path
  501. FROM dba_directories;
  502.  
  503.  
  504.  
  505.  
  506. 4. To identify a locked session, the following query is used by a DBA. This display’s session id, the user name and so on:
  507. SELECT l.session_id||','||v.serial# sid_serial, l.oracle_username DB508,
  508. o.object_name, o.object_type,
  509. DECODE( l.locked_mode, 0, 'None',1, 'Null', 2, 'Row-S (SS)', 3, 'Row-X (SX)', 4, 'Share', 5, 'S/Row-X (SSX)', 6, 'Exclusive', TO_CHAR(l.locked_mode) ) lock_mode,
  510. o.status, to_char(o.last_ddl_time,'dd.mm.yy') last_ddl FROM dba_objects o,
  511. gv$locked_object l, v$session v
  512. WHERE o.object_id = l.object_id AND l.session_id=v.sid ORDER BY 2, 3;
  513.  
  514. Note: As there is no lock present, we get a blank output screen.
  515.  
  516. 5. To identify session statistics, this query is used:
  517. SELECT job_name, owner, log_date, actual_start_date, run_duration, status
  518. FROM dba_scheduler_job_run_details
  519. ORDER BY log_date;
  520.  
  521. 6. To identify the disk free space, the DBA can run following query:
  522. Select df.file_name, nvl(fr.bytes/(1024*1024),0) free_space_MB, df.bytes/(1024*1024) total_size, df.tablespace_name, ((df.bytes-nvl(fr.bytes,0))/df.bytes)*100 pct_used
  523. from (select sum(bytes) bytes, file_id
  524. from dba_free_space
  525. group by file_id) fr, dba_data_files df
  526. where df.file_id = fr.file_id(+)
  527. order by 1, df.file_id;
  528.  
  529.  
  530.  
  531.  
  532.  
  533.  
  534.  
  535.  
  536. Interface Design:
  537. In this section, we are going to show the interface design for the BullFlix application. Here, we are concentrating on BullFlix Message aspect.
  538.  
  539. Login Page:
  540.  
  541. The login page is the gateway of our application. All the users must login to access the system. If a user has login name and password, he can enter the credentials and use the system otherwise a user can sign up for becoming registered member.
  542.  
  543.  
  544.  
  545.  
  546. Chat Page:
  547. This is the page where users will be able to send message to each other, in the below screenshot, we have given the group chat functionality.
  548.  
  549.  
  550. Dashboard:
  551. The dashboard gives us the option of various functionalities present, user can add a group, delete a group or add friends or delete friends etc.
  552.  
  553.  
  554. Friends:
  555. On friends page, user is able to see the list of his friends, other users who has sent him reuest and add them to his friend list or return to the home page.
  556.  
  557. Data Visualization:
  558. In this section we present a model analysis of our data. All data used was generated by us to mimic what a messaging application could present in terms of user engagement statistics.
  559.  
  560. 1. Users Engagement and General Trend Analysis
  561.  
  562. Every manager needs to know how its product is doing in terms of basic metrics. For our messaging application a basic measurement of performance is number of messages users exchange within Bullflix app. To perform such analysis, we exported data from MESSAGE table to Excel and made several plots described below.
  563. First daily use of messaging application was analyzed by plotting daily number of messages exchanged in BullFlix over the period of its operations (Jan 1, 2017 till Nov 13, 2017).
  564.  
  565.  
  566.  
  567. Raw data plot demonstrates a rather noisy behavior which mirrors highly volatile engagement of users within period of observation. Visually one may spot that consistently increasing high points suggest that messaging application gains popularity as measured by number of messages. To get a better idea about overall trend a linear trendline was depicted (dotted line on a graph above). It’s positive slope also confirm visual assessment that number of messages indeed grows within a period in question. Frequent and repeating spikes and troughs also suggest that users’ behavior is affected by seasonality.
  568.  
  569. 2. Pattern of Growth Analysis
  570.  
  571. Deciphering growth pattern required us to produce aggregated data to see through noisy raw statistics. A natural time frame to analyze messaging activity is one week. Consequently, number of messages was combined on a weekly basis starting January 1, 2017 till November 11, 2017, excluding two days of observations which do not make a full week.
  572.  
  573. Plotting weekly data, we can see growth that we could witness on daily graph as well. However, with reduced noise we can more clearly see growth pattern. Specifically, we can determine that a steady growth was observed since the inception of messaging services till the end of observations in the middle of November 2017. Taking a closer look, one can determine two “hops” that user’s engagement had. First – in the end of April; second – in the end of July.
  574. Weekly data was further divided into 15 weekly periods (or roughly 4 months each). For each period an average number of messages per week was calculated. At the graph above, one might see that messaging service started in January at around 58 messages per week. In late April weekly exchange almost doubled and reached an average of 107 messages per week. In August weekly exchange jumped roughly 50% and reached almost 170 messages per week.
  575.  
  576. 3. Seasonality Analysis
  577.  
  578. To study seasonal effect on users’ engagement we need to analyze general product’s life cycle as well as understand patterns of how clients use the application.
  579. Since we review only first 11 months of application’s life it is hard to expect that we could spot any variability stemming from differences of application use in different seasons of the year. Any upward trend would smooth the seasonal effect that span over a month. Therefore, a more granular time periods may be needed to make any judgement about usage seasonality. A natural time frame to analyze messaging activity is one week. Therefore, distribution of messages within a week (i.e. on each weekday) was analyzed. For each day of observations starting January 1, 2017 day of week was defined. After that total number of messages that were exchanged on every day of week (e.g. on all Mondays, on all Tuesdays, etc.) during 11 months of observations were summarized in the table, shown below.
  580.  
  581. Day of week Total number of messages
  582. Monday 350
  583. Tuesday 551
  584. Wednesday 465
  585. Thursday 1488
  586. Friday 1040
  587. Saturday 869
  588. Sunday 237
  589. Total 5000
  590.  
  591. Summary data shown above was plotted to visually analyze distribution of message throughout a week. Such relative (share of total number) adjusted data is shown in the graph below.
  592.  
  593. As one may see users’ activity (based on all observations) peak on Thursdays, remains high on Fridays and Saturdays. On Sundays activity bottoms and on Monday through Wednesday stays moderate. One hypothesis to explain such behavior could be the following: users are getting ready for weekends when they want to go to watch movies, primarily on Friday night and on Saturday. When working week start on Monday, users’ activity is relatively low, then as people do away with their business tasks and as weekends approach they slightly increase engagement with the app. On Thursday people feel the urgency to make plans and their activity peaks. On Friday messaging remains high as people complete their last-minute planning. Same activities continue Saturday. However, on Sunday people tend to get busy with any other plans they had and keep communication at a minimum level.
  594.  
  595.  
  596. 4. Dynamics of Seasonality of Service Usage
  597.  
  598. Despite quite conclusive pattern of distribution of messages within a week shown in previous paragraph, managers shall be interested in dynamics of such seasonality. Did it change in the first 11 months of application’s life?
  599. To answer that question seasonality analysis has been repeated for each of 45 full weeks of observations. Data points were divided into full weeks (Monday through Sunday) starting January 2, 2017 till November 12, 2017, excluding 2 days of observations which did not make full weeks. Distribution of messages within a week throughout 11 months of 2017 was depicted in a graph below.
  600.  
  601.  
  602. Though distribution of messages on weekdays did not remain intact, no clear trend could be determined based on 11 months of observations. Shares of weekdays varied, but remained close to its average values. No weekday gained or lost attention over time relative to other weekday. For planning purposes and timing of marketing activities, Bullflix’s managers can rely on general use pattern within a week that was determined in the previous section.
  603.  
  604. 5. Sentiment Analysis to Gauge Users’ Evaluation of Movies
  605.  
  606. In this section we describe our attempt to use messages text data to derive users’ attitude towards movies they discuss. Since all data used in this experiment was generated by us, the result is predetermined to a great extent. However, techniques illustrated here could be used on other data sets. First, a dictionary of terms was assembled along with sentiment scores, attributed to those terms. Dictionary that was used for that analysis is provided below.
  607. Positive sentiment Negative sentiment
  608. Keywords Good, cool Not a fan, bad, worst, disgusting
  609. Sentiment score 1 -1
  610.  
  611. Second, we determined which messages contain movies titles. Out of 5000 messages 1521 messages contained a movie title. Third, within those 1521 messages containing a movie title, each message was evaluated individually to determine its net positive sentiment score, which is sum of sentiment scores of all identified terms in the message. Fourth, based on total net positive sentiment score each message was evaluated to be either “positive”, or “negative”, or “neutral”. Summary of “positive” and “negative” sentiments of users expressed in their messages is depicted in a graph below.
  612.  
  613. As we can see users tend to express their negative sentiment much more frequently, which outweighs the positive sentiment. For most movies it makes their net score negative. After such observation we made a conclusion that most of negative sentiment is self-sustaining “hate speech” that skews our observations. Until better methodology is developed we decided to ignore any negative sentiment. Having done so, our top liked movies chart looks the following:
  614. Position Title Total number of positive messages
  615. 1 The General 9
  616. 2 Shutter Island 9
  617. 3 Howl's Moving Castle 9
  618. 4 Gladiator 8
  619. 5 The Lord of the Rings: The Two Towers 8
  620. 6 The Big Lebowski 7
  621. 7 The Man Who Shot Liberty Valance 7
  622. 8 The Matrix 7
  623. 9 Notorious 6
  624. 10 My Neighbor Totoro 6
  625.  
  626.  
  627.  
  628. Queries:
  629. 1. For a Fan with ID say ‘3’ list the types of requests he has sent to users on the app.
  630. Select REQUEST_TYPES.REQUEST_TYPES, FRIEND_REQUEST.REQ_REQUEST_ID, FAN.FAN_ID
  631. from REQUEST_TYPES
  632. inner join FRIEND_REQUEST on REQUEST_TYPES.REQUEST_TYPES=FRIEND_REQUEST.RET_REQUEST_TYPE_TITLE
  633. inner join FAN on FRIEND_REQUEST.REQ_SENDER_FAN_ID=FAN.FAN_ID
  634. where FAN.FAN_ID='3';
  635.  
  636. 2. Display the ID of the fan who has received a message and also the status if he/she has read the message or not when a fan sends a message to the recipient.
  637. Select MESSAGE_RECIPIENT.MR_IS_READ, MESSAGE_RECIPIENT.MR_MESSAGE_REC_FAN_ID
  638. from MESSAGE_RECIPIENT
  639. inner join MESSAGE on MESSAGE_RECIPIENT.MES_MESSAGE_ID=MESSAGE.MES_MESSAGE_ID
  640. join FAN on MESSAGE.MES_SENDER_FAN_ID=FAN.FAN_ID;
  641.  
  642. 3. Query to display if the group chat window of a particular fan is active or not.
  643. select GROUP_CHAT.GC_IS_ACTIVE
  644. from GROUP_CHAT
  645. join GROUP_CHAT_LIST on GROUP_CHAT_LIST.GC_GROUP_ID=GROUP_CHAT.GC_GROUP_ID
  646. join FAN on GROUP_CHAT_LIST.FAN_FAN_ID=FAN.FAN_ID;
Add Comment
Please, Sign In to add comment