cis4930 study guide

Python:
- data types
    1. Numbers - integers, floating-point numbers, and complex numbers.
    2. Strings - a sequence of characters enclosed in single or double quotes.
    3. Boolean - a type that can only take the values True or False.
    4. Lists - a collection of elements of any data type, enclosed in square brackets and separated by commas.
    5. Tuples - similar to lists, but they are immutable and enclosed in parentheses.
    6. Sets - an unordered collection of unique elements, enclosed in curly braces.
    7. Dictionaries - a collection of key-value pairs, enclosed in curly braces, where the keys must be unique.
- loops & conditionals
    1. for loops
    2. while loops
    3. if, elif, else statements (these are conditionals, not loops)
- functions
    def function(x, y):
        z = y + x
        return z
- modules
    - import ____
    - import ____ as ____
    - from ____ import ____
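
A short sketch that ties the Python topics above together. The names `describe` and `samples` are made up for illustration; only the standard library is used.

```python
import math as m                  # import ____ as ____
from collections import Counter   # from ____ import ____


def describe(value):
    """Return a label for a value using if / elif / else."""
    if isinstance(value, bool):    # check bool first: bool is a subclass of int
        return "boolean"
    elif isinstance(value, (int, float, complex)):
        return "number"
    elif isinstance(value, str):
        return "string"
    else:
        return type(value).__name__   # list, tuple, set, dict, ...


# One sample per data type from the list above
samples = [3, 2.5, 1 + 2j, "hi", True, [1, 2], (1, 2), {1, 2}, {"k": "v"}]

# for loop: visit every element of a list
labels = []
for s in samples:
    labels.append(describe(s))

# while loop: repeat until the condition becomes false
n = 3
while n > 0:
    n -= 1

counts = Counter(labels)
print(labels)
print(counts["number"], m.sqrt(16))
```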

Key Value Databases:
- Arrays To Key Value DB (?)
- Essential Features of Key-Value Databases
    - scalable
    - high performance
    - high availability
    - flexibility
    - ease of use
    - low latency
- Keys
    - a key is a unique identifier for a particular piece of data.
    - The key is used to locate and retrieve the corresponding value in the database.
- Characteristics of Values
    - a value is the data that is associated with a particular key.
    - In other words, it is the information that is stored in the database and can be retrieved using the corresponding key.
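
As a rough illustration of keys and values, a plain Python dict can stand in for a key-value store. This is only a sketch: the key names are hypothetical, and a real key-value database adds persistence, distribution, and so on.

```python
# A plain dict used as a stand-in for a key-value store (illustrative only).
store = {}

# set: each key is unique, so writing to an existing key overwrites its value
store["user:1:name"] = "John"
store["user:1:name"] = "Johnny"

# a value can be any data: a number, a string, a list, a nested dict, ...
store["user:1:cart"] = ["apple", "pear"]

# get: the key is the only way to locate the value
name = store.get("user:1:name")
missing = store.get("user:2:name")   # no such key, so None is returned

print(name, missing, len(store))
```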

Key-Value Database Terminology:
- Key-Value Database Data Modeling Terms
    - relational schema: formal description of a table, blueprint of information needed to make a table
    - attribute domain: set of values an attribute may take
- Key-Value Architecture Terms
    - Tables: structures that store information
    - candidate keys: attributes (or sets of attributes) that could each serve as the primary key
    - primary key: main identifier for a row in a table
    - foreign key: a key used to link multiple tables to one another
- Key-Value Implementation Term (?)

Designing for Key-Value Databases:
- Key Design and Partitioning
    - following a naming convention
    - range-based components (date/int counter)
    - use a common delimiter
    - partitioning can be done by range or by hash
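
The key design points above can be sketched as follows. The delimiter, the number of partitions, the range boundaries, and the helper names (`make_key`, `hash_partition`, `range_partition`) are all assumptions chosen for illustration.

```python
import hashlib

DELIMITER = ":"        # assumed common delimiter
NUM_PARTITIONS = 4     # assumed number of partitions (servers)


def make_key(entity, entity_id, attribute):
    """Build a key that follows one naming convention: entity:id:attribute."""
    return DELIMITER.join([entity, str(entity_id), attribute])


def hash_partition(key, num_partitions=NUM_PARTITIONS):
    """Partition by hash: the same key always maps to the same partition."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def range_partition(entity_id, boundaries=(1000, 2000, 3000)):
    """Partition by range: IDs below 1000 go to partition 0, below 2000 to 1, ..."""
    for index, upper in enumerate(boundaries):
        if entity_id < upper:
            return index
    return len(boundaries)   # everything at or above the last boundary


key = make_key("cust", 1234, "name")
print(key, hash_partition(key), range_partition(1234))
```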
- Designing Structured Values
    - common cases: attributes that are used together
    - store commonly used values in RAM, store logically linked info together
    - duplication of data can improve performance (denormalization)
- Limitations of Key-Value Databases
    - lookups are only possible by key
    - range queries are not supported by default
    - no standard query language
- Design Patterns for Key-Value Databases
    - TTL keys
        - keys that expire after some amount of time
    - Emulating Tables
        - implement get and set operations so attributes can be assigned/retrieved
    - Aggregates
        - using a common table to store attributes of subtypes
    - Atomic Aggregates
        - all properties must be updated at the same time or not at all
    - Enumerable Keys
        - using counters/sequences to create keys
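
Three of the patterns above (TTL keys, enumerable keys, and emulating tables) can be sketched with a small in-memory class. `TinyStore` and its methods are hypothetical and not taken from any real library; the `now` parameter exists only so that expiry can be shown without waiting.

```python
import time


class TinyStore:
    """Hypothetical in-memory store illustrating three design patterns."""

    def __init__(self):
        self._data = {}       # key -> (value, expires_at or None)
        self._counters = {}   # prefix -> last number handed out

    # TTL keys: a key expires after ttl seconds
    def set(self, key, value, ttl=None, now=None):
        now = time.time() if now is None else now
        expires_at = None if ttl is None else now + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        if key not in self._data:
            return None
        value, expires_at = self._data[key]
        if expires_at is not None and now >= expires_at:
            del self._data[key]   # expired: drop the key
            return None
        return value

    # Enumerable keys: a counter generates prefix:1, prefix:2, ...
    def next_key(self, prefix):
        self._counters[prefix] = self._counters.get(prefix, 0) + 1
        return f"{prefix}:{self._counters[prefix]}"

    # Emulating tables: table, row id, and attribute are folded into the key
    def set_attr(self, table, row_id, attr, value):
        self.set(f"{table}:{row_id}:{attr}", value)

    def get_attr(self, table, row_id, attr):
        return self.get(f"{table}:{row_id}:{attr}")


db = TinyStore()
db.set("session:abc", "token", ttl=30, now=100.0)
alive = db.get("session:abc", now=120.0)     # still within 30 seconds
expired = db.get("session:abc", now=130.0)   # 30 seconds have passed
ticket = db.next_key("ticket")
db.set_attr("customer", 7, "city", "Tampa")
print(alive, expired, ticket, db.get_attr("customer", 7, "city"))
```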

PickleDB:
import pickledb

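# Load example.db (it is created if it does not exist yet)
# True turns on auto-dump, so every change is written to the file
db = pickledb.load('example.db', True)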

# Add key-value attributes to the PickleDB
db.set('name', 'John')
db.set('age', 25)
db.set('city', 'New York')

# Update a key-value attribute in the PickleDB
db.set('age', 26)

# Delete a key-value attribute from the PickleDB
db.rem('city')

# Locate and display key-value attributes from the PickleDB
name = db.get('name')
age = db.get('age')
city = db.get('city')  # 'city' was removed above, so this lookup finds no value

print('Name:', name)
print('Age:', age)
print('City:', city)

Document Databases:
- What Is a Document?
    - A document is a self-contained data structure that contains all the information related to a specific object or entity.
    - The document can be thought of as a unit of storage, and it typically contains multiple fields or attributes that describe the properties of the object.
- Avoid Explicit Schema Definitions
    - Documents in the same collection do not have to share the same fields, so the database stays flexible and can store new kinds of data without a schema change
- Basic Operations on Document Databases
    1. Create: A new document can be created by inserting a new JSON or BSON object into the database.
    2. Read: Documents can be retrieved from the database using various query operators that filter, sort, and limit the results.
    3. Update: Documents can be updated by modifying one or more fields or attributes of the document.
    4. Delete: Documents can be deleted from the database using a delete operation that specifies the criteria for selecting the documents to be deleted.
    5. Query: Documents can be queried using a query language that supports filtering, sorting, and aggregation.
    6. Indexing: Documents can be indexed for fast retrieval of data.
    7. Transaction: Documents can be updated or deleted as part of a transaction, which ensures that all changes are either committed or rolled back together.
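
The create, read, update, and delete operations above can be sketched in plain Python, with a list of dicts standing in for a collection. The function names are hypothetical and this is not the API of any real document database.

```python
import copy

collection = []   # a "collection" here is just a list of dict "documents"


def insert(doc):
    """Create: store a copy of the document."""
    collection.append(copy.deepcopy(doc))


def find(criteria):
    """Read: return documents whose fields equal every value in criteria."""
    return [d for d in collection
            if all(d.get(k) == v for k, v in criteria.items())]


def update(criteria, changes):
    """Update: modify fields of every matching document; return how many."""
    matched = find(criteria)
    for d in matched:
        d.update(changes)
    return len(matched)


def delete(criteria):
    """Delete: remove every matching document; return how many."""
    matched = find(criteria)
    for d in matched:
        collection.remove(d)
    return len(matched)


insert({"name": "John", "age": 25})
insert({"name": "Jane", "age": 31})
updated = update({"name": "John"}, {"age": 26})
deleted = delete({"name": "Jane"})
print(find({"name": "John"}), updated, deleted)
```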

Document Database Terminology:
- Document and Collection Terms
    - document: a set of ordered key-value pairs
    - collection: group of related documents
    - embedded document: a document being stored within another document
    - polymorphic schema: documents within a collection have multiple different forms
    - schemaless: do not require specification step before adding document to a collection
- Types of Partitions
    - vertical partitioning: within one server, breaking down columns in a relational table into multiple tables
    - horizontal partitioning (sharding): splitting rows or documents across multiple servers
    - partitioning algorithm: ranges, lists, or hash values
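
A rough sketch of the difference between vertical and horizontal partitioning, using a list of dicts as the table. The sample rows, the server names, and the helper functions are all hypothetical; the horizontal split uses a list-based rule (an explicit value-to-server mapping).

```python
rows = [
    {"id": 1, "name": "John", "state": "FL", "bio": "likes hiking"},
    {"id": 2, "name": "Jane", "state": "GA", "bio": "plays chess"},
    {"id": 3, "name": "Ana", "state": "FL", "bio": "reads a lot"},
]


def vertical_split(rows, hot_columns):
    """Vertical: split the columns into two tables that share the id."""
    hot, cold = [], []
    for row in rows:
        hot.append({k: v for k, v in row.items()
                    if k == "id" or k in hot_columns})
        cold.append({k: v for k, v in row.items()
                     if k == "id" or k not in hot_columns})
    return hot, cold


def horizontal_split(rows, server_for_state):
    """Horizontal: whole rows go to different servers (list partitioning)."""
    servers = {}
    for row in rows:
        server = server_for_state[row["state"]]
        servers.setdefault(server, []).append(row)
    return servers


hot, cold = vertical_split(rows, hot_columns={"name"})
servers = horizontal_split(rows, {"FL": "server_a", "GA": "server_b"})
print(hot[0], cold[0])
print({name: [r["id"] for r in part] for name, part in servers.items()})
```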
- Data Modeling and Query Processing
    - deletion anomaly: when removing an entry removes a piece of data that was only found there
    - insertion anomaly: cannot insert partial information into table
    - update anomaly: when one fact changes and must be updated in multiple places
    - normalization: organizing tables so that there are no modification anomalies
        - this is done by splitting data into separate tables; joins recombine the data at query time
    - query processor: takes input queries and data about document collections and creates operations to retrieve that data
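
A small sketch of an update anomaly and how normalization avoids it. The order and customer records are made-up sample data.

```python
# Denormalized: the customer's city is repeated in every order
orders = [
    {"order_id": 1, "customer": "John", "city": "New York"},
    {"order_id": 2, "customer": "John", "city": "New York"},
]

# Update anomaly: John moves, but only one copy of the fact gets changed
orders[0]["city"] = "Boston"
cities = {o["city"] for o in orders if o["customer"] == "John"}
print(cities)   # two conflicting cities for the same customer

# Normalized: the city is stored once and orders refer to the customer by id
customers = {"c1": {"name": "John", "city": "New York"}}
norm_orders = [
    {"order_id": 1, "customer_id": "c1"},
    {"order_id": 2, "customer_id": "c1"},
]

customers["c1"]["city"] = "Boston"   # one update, one place

# A "join": combine each order with the customer it refers to
joined = [{**o, **customers[o["customer_id"]]} for o in norm_orders]
print(joined)
```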

Designing Document Databases:
- Normalization, Denormalization, and the Search for Proper Balance
    - In a document database, normalization involves breaking down the data into separate collections or documents to avoid redundancy.
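    - Instead of storing all the data in a single document, the data is split across multiple documents, and relationships between them are established using references. Embedding related documents inside one another is the denormalized alternative: it duplicates data but avoids extra lookups.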
- Planning for Mutable Documents
    - allocate extra space for a document ahead of time, so that when it grows the database is less likely to have to move it to a new location and free the old one
- The Goldilocks Zone of Indexes
    - create enough indexes to keep reads fast, but not so many that write overhead grows (every index must be updated on each write)
- Modeling Common Relations
    - one-to-many relationship (embed the "many" documents within the "one" document)
    - many-to-many relationship (use two collections; each document holds a list of IDs referencing documents in the other)
    - hierarchies (contain a reference to the parent object within the child)
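
The three relationship types above can be sketched with plain dicts. All IDs and field names are hypothetical; the many-to-many case is shown with two collections whose documents hold lists of each other's IDs, which is one common way to model it.

```python
# One-to-many: the "many" side is embedded in the "one" side
customer = {
    "_id": "c1",
    "name": "John",
    "orders": [
        {"order_id": 1, "total": 20.0},
        {"order_id": 2, "total": 35.5},
    ],
}
order_total = sum(o["total"] for o in customer["orders"])

# Many-to-many: two collections that reference each other by ID
students = {
    "s1": {"name": "Ana", "courses": ["course_a", "course_b"]},
    "s2": {"name": "Ben", "courses": ["course_a"]},
}
courses = {
    "course_a": {"students": ["s1", "s2"]},
    "course_b": {"students": ["s1"]},
}
names_in_a = [students[s]["name"] for s in courses["course_a"]["students"]]

# Hierarchy: each child stores a reference to its parent
categories = {
    "root": {"parent": None},
    "databases": {"parent": "root"},
    "nosql": {"parent": "databases"},
}


def ancestors(name):
    """Follow parent references from a node up to the root."""
    path = []
    parent = categories[name]["parent"]
    while parent is not None:
        path.append(parent)
        parent = categories[parent]["parent"]
    return path


print(order_total, names_in_a, ancestors("nosql"))
```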
- JSON formatting: be able to create JSON files, for example:
{
  "name": "John Smith",
  "age": 35,
  "email": "john.smith@example.com",
  "address": {
    "street": "123 Main St",
    "city": "Anytown",
    "state": "CA",
    "zip": "12345"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "555-1234"
    },
    {
      "type": "work",
      "number": "555-5678"
    }
  ]
}
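
One way to create and read back a JSON file like the one above is Python's standard `json` module. This is a minimal sketch: the file name `person.json` is an assumption, the document is shortened, and a temporary folder is used so that nothing is left on disk.

```python
import json
import os
import tempfile

# A Python dict with the same shape as the JSON document (shortened)
person = {
    "name": "John Smith",
    "age": 35,
    "address": {"city": "Anytown", "state": "CA"},
    "phoneNumbers": [
        {"type": "home", "number": "555-1234"},
        {"type": "work", "number": "555-5678"},
    ],
}

# dumps: dict -> JSON text; loads: JSON text -> dict
text = json.dumps(person, indent=2)
parsed = json.loads(text)

# dump / load do the same with a file
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "person.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(person, f, indent=2)
    with open(path, encoding="utf-8") as f:
        restored = json.load(f)

print(text)
print(restored == person)
```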

MongoDB and Python:
import pymongo

# Connect to your local MongoDB
client = pymongo.MongoClient("mongodb://localhost:27017/")

# Drop your document database (if it exists)
client.drop_database("mydb")

# Create your document database
mydb = client["mydb"]

# Create a collection in your document database
mycol = mydb["mycollection"]

# Insert items into your collection
mydict1 = { "name": "John", "address": "Highway 37" }
mydict2 = { "name": "Jane", "address": "Baker Street 221B" }
mycol.insert_many([mydict1, mydict2])

# Using find, display all items in your collection to the screen
for x in mycol.find():
  print(x)

Locating items in a document DB from Python and displaying the results, with limited attributes (projection), limited results, and sorting:
import pymongo

# Connect to your local MongoDB
client = pymongo.MongoClient("mongodb://localhost:27017/")

# Retrieve a collection named "mycollection"
mycol = client["mydb"]["mycollection"]

# Define the query object with limiting attributes
query = { "name": "John" }
projection = { "name": 1, "age": 1 }

# Perform the find query with limiting attributes, limiting results, and sorting
result = mycol.find(query, projection).sort("age", pymongo.ASCENDING).limit(10)

# Print the results to the screen
for x in result:
  print(x)
In this example, the query object { "name": "John" } filters the results to documents whose name is "John".
The projection object { "name": 1, "age": 1 } limits the returned fields to "name" and "age" (the "_id" field is also returned unless it is excluded with "_id": 0).
The results are then sorted by the "age" field in ascending order using sort, and capped at a maximum of 10 documents using limit.

In a single Python script:
import pymongo

# Connect to your local MongoDB
client = pymongo.MongoClient("mongodb://localhost:27017/")

# Using find, display all items in your collection to the screen
mycol = client["mydb"]["mycollection"]
for x in mycol.find():
  print(x)

# count_documents(query) returns the number of matching documents
# (Cursor.count() was removed in PyMongo 4)

# Find using $lt (less than)
query = { "age": { "$lt": 30 } }
result = mycol.find(query)
print(f"Documents where age < 30: {mycol.count_documents(query)}")

# Find using $gte (greater than or equal to)
query = { "age": { "$gte": 30 } }
result = mycol.find(query)
print(f"Documents where age >= 30: {mycol.count_documents(query)}")

# Find using $eq (equal to)
query = { "name": { "$eq": "John" } }
result = mycol.find(query)
print(f"Documents where name = 'John': {mycol.count_documents(query)}")

# Find using $ne (not equal to)
query = { "name": { "$ne": "John" } }
result = mycol.find(query)
print(f"Documents where name != 'John': {mycol.count_documents(query)}")

# Find using $or (either condition matches)
query = { "$or": [ { "name": "John" }, { "age": { "$lt": 30 } } ] }
result = mycol.find(query)
print(f"Documents where name = 'John' or age < 30: {mycol.count_documents(query)}")

# Find using $and (both conditions match)
query = { "$and": [ { "name": "John" }, { "age": { "$lt": 30 } } ] }
result = mycol.find(query)
print(f"Documents where name = 'John' and age < 30: {mycol.count_documents(query)}")

# Find using $not (negates an operator expression)
query = { "name": { "$not": { "$eq": "John" } } }
result = mycol.find(query)
print(f"Documents where not (name = 'John'): {mycol.count_documents(query)}")

# Find using $exists (the field is present)
query = { "age": { "$exists": True } }
result = mycol.find(query)
print(f"Documents where age exists: {mycol.count_documents(query)}")

# Null search with {item: null}: matches documents where item is null or missing
query = { "item": None }
result = mycol.find(query)
print(f"Documents where item is null or missing: {mycol.count_documents(query)}")

# Null search with {item: {$exists: false}}: matches only documents without the field
query = { "item": { "$exists": False } }
result = mycol.find(query)
print(f"Documents where item does not exist: {mycol.count_documents(query)}")

# Null search with {item: {$type: 10}}: matches only documents where item is BSON null (type 10)
query = { "item": { "$type": 10 } }
result = mycol.find(query)
print(f"Documents where item is null: {mycol.count_documents(query)}")