- Python:
- - data types
- 1. Numbers - integers, floating-point numbers, and complex numbers.
- 2. Strings - a sequence of characters enclosed in single or double quotes.
- 3. Boolean - a type that can only take the values True or False.
- 4. Lists - a collection of elements of any data type, enclosed in square brackets and separated by commas.
- 5. Tuples - similar to lists, but they are immutable and enclosed in parentheses.
- 6. Sets - an unordered collection of unique elements, enclosed in curly braces.
- 7. Dictionaries - a collection of key-value pairs, enclosed in curly braces, where the keys must be unique.
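A quick sketch with one example value per type from the list above. The variable names are made up for illustration.

```python
# One example value for each built-in type in the list above.
count = 42                             # int
price = 19.99                          # float
impedance = 3 + 4j                     # complex
greeting = "hello"                     # str (single or double quotes both work)
is_ready = True                        # bool
scores = [90, "A", 3.5]                # list: mutable, any mix of types
point = (2, 3)                         # tuple: immutable
tags = {"db", "nosql", "db"}           # set: the duplicate "db" collapses
person = {"name": "John", "age": 25}   # dict: unique keys mapped to values

scores.append(100)                     # lists can grow in place

try:
    point[0] = 5                       # tuples cannot be modified
except TypeError as err:
    tuple_error = str(err)

print(type(impedance).__name__, len(tags), person["name"])
```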
- - loops & conditionals
- 1. for loops
- 2. while loops
- 3. if, elif, else conditionals (these branch, they do not loop)
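A minimal sketch of the three constructs above. The function name classify and the age cut-offs are invented for the example.

```python
# for loop: visit every element of a sequence
total = 0
for n in [1, 2, 3, 4]:
    total += n

# while loop: repeat as long as the condition holds
countdown = 3
steps = []
while countdown > 0:
    steps.append(countdown)
    countdown -= 1

# if / elif / else: exactly one branch runs
def classify(age):
    if age < 13:
        return "child"
    elif age < 20:
        return "teen"
    else:
        return "adult"

print(total, steps, classify(25))
```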
- - functions
- def function(x, y):
-     z = y + x
-     return z
- - modules
- - import ____
- - import ____ as ____
- - from ____ import ____
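The three import forms above, filled in with standard-library modules as an illustration. Any installed module would work the same way.

```python
import math                      # import ____
import json as js                # import ____ as ____
from collections import Counter  # from ____ import ____

root = math.sqrt(16)             # access through the module name
encoded = js.dumps({"a": 1})     # access through the alias
letter_counts = Counter("banana")  # imported name is used directly

print(root, encoded, letter_counts["a"])
```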
- Key Value Databases:
- - Arrays To Key Value DB (?)
- - Essential Features of Key-Value Databases
- - scalable
- - high performance
- - high availability
- - flexibility
- - ease of use
- - low latency
- - Keys
- - a key is a unique identifier for a particular piece of data.
- - The key is used to locate and retrieve the corresponding value in the database.
- - Characteristics of Values
- - a value is the data that is associated with a particular key.
- - In other words, it is the information that is stored in the database and can be retrieved using the corresponding key.
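As a rough mental model, a Python dict behaves like a tiny in-memory key-value store: each unique key locates exactly one value. The key names below are hypothetical.

```python
# A dict used as a toy key-value store.
store = {}

# The key is the unique identifier; the value can be any data.
store["user:1001:name"] = "John"
store["user:1001:cart"] = {"items": ["book", "pen"], "total": 17.5}

# Retrieve a value by its key.
name = store["user:1001:name"]

# A key that was never stored has no value (get returns None here).
missing = store.get("user:9999:name")

# Setting an existing key replaces its value; keys stay unique.
store["user:1001:name"] = "Johnny"

print(name, missing, len(store))
```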
- Key-Value Database Terminology:
- - Key-Value Database Data Modeling Terms
- - relational schema: formal description of a table, blueprint of information needed to make a table
- - attribute domain: set of values an attribute may take
- - Key-Value Architecture Terms
- - Tables: structures that store information
- - candidate keys: the attributes (or sets of attributes) that could each serve as the primary key
- - primary key: main identifier for a row in a table
- - foreign key: a key used to link multiple tables to one another
- - Key-Value Implementation Term (?)
- Designing for Key-Value Databases:
- - Key Design and Partitioning
- - following a naming convention
- - range-based components (date/int counter)
- - use a common delimiter
- - partitioning can be done by range or by hash
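A sketch of the key-design points above: a naming convention with a common delimiter, plus partitioning by hash and by range. The helper names, the ":" delimiter, the partition count, and the range boundaries are all assumptions chosen for the example.

```python
import hashlib

DELIMITER = ":"        # common delimiter (assumed)
NUM_PARTITIONS = 4     # hypothetical cluster size

def make_key(entity, entity_id, attribute):
    """Build a key such as 'cust:1234:name' from its components."""
    return DELIMITER.join([entity, str(entity_id), attribute])

def hash_partition(key, num_partitions=NUM_PARTITIONS):
    """Pick a partition from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

def range_partition(entity_id, boundaries=(1000, 2000, 3000)):
    """Pick a partition from the range the numeric id falls into."""
    for index, upper in enumerate(boundaries):
        if entity_id < upper:
            return index
    return len(boundaries)  # ids at or above the last boundary

key = make_key("cust", 1234, "name")
print(key, hash_partition(key), range_partition(1234))
```

sha256 is used instead of the built-in hash() because Python randomizes string hashes per process, which would send the same key to different partitions on different runs.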
- - Designing Structured Values
- - common cases: attributes that are used together
- - store commonly used values in RAM, store logically linked info together
- - duplication of data can improve performance (denormalization)
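One way to picture a structured value: attributes that are read together are stored under a single key as JSON, so one lookup replaces several. The key names and the dict standing in for the database are assumptions.

```python
import json

store = {}  # a dict stands in for the key-value database

# One key per attribute: three lookups to show one customer.
store["cust:1:name"] = "John"
store["cust:1:city"] = "New York"
store["cust:1:age"] = 25

# Structured value: attributes used together live under one key.
store["cust:1:profile"] = json.dumps(
    {"name": "John", "city": "New York", "age": 25}
)

# A single lookup now returns everything needed.
profile = json.loads(store["cust:1:profile"])
print(profile["name"], profile["city"], profile["age"])
```

Keeping both layouts, as here, duplicates data; that is the denormalization trade-off noted above.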
- - Limitations of Key-Value Databases
- - lookups are only possible by key
- - range queries are not supported by default
- - no standard query language
- - Design Patterns for Key-Value Databases
- - TTL keys
- - keys that expire after some amount of time
- - Emulating Tables
- - implement get and set operations so attributes can be assigned/retrieved
- - Aggregates
- - using a common table to store attributes of subtypes
- - Atomic Aggregates
- - all properties must be updated at the same time or not at all
- - Enumerable Keys
- - using counters/sequences to create keys
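Two of the patterns above sketched in plain Python: TTL keys and enumerable keys. TTLStore and next_ticket_key are hypothetical names, and the clock is injectable only so the expiry can be demonstrated without waiting.

```python
import itertools
import time

class TTLStore:
    """Toy key-value store whose keys can expire (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}       # key -> (value, expires_at or None)
        self._clock = clock

    def set(self, key, value, ttl_seconds=None):
        expires_at = None
        if ttl_seconds is not None:
            expires_at = self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]   # evict lazily on read
            return default
        return value

# TTL keys: the session disappears once 30 "seconds" have passed.
fake_now = [0.0]
sessions = TTLStore(clock=lambda: fake_now[0])
sessions.set("session:42", "John", ttl_seconds=30)
before = sessions.get("session:42")
fake_now[0] = 31.0
after = sessions.get("session:42")

# Enumerable keys: a counter generates ticket:1, ticket:2, ...
ticket_ids = itertools.count(1)

def next_ticket_key():
    return f"ticket:{next(ticket_ids)}"

first_key = next_ticket_key()
second_key = next_ticket_key()
print(before, after, first_key, second_key)
```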
- PickleDB:
- import pickledb
- # Load example.db (created if it does not exist yet); True turns on auto-dump, so every change is written to disk
- db = pickledb.load('example.db', True)
- # Add key-value attributes to the PickleDB
- db.set('name', 'John')
- db.set('age', 25)
- db.set('city', 'New York')
- # Update a key-value attribute in the PickleDB
- db.set('age', 26)
- # Delete a key-value attribute from the PickleDB
- db.rem('city')
- # Locate and display key-value attributes from the PickleDB
- name = db.get('name')
- age = db.get('age')
- # 'city' was removed above, so this lookup no longer finds a stored value
- city = db.get('city')
- print('Name:', name)
- print('Age:', age)
- print('City:', city)
- Document Databases:
- - What Is a Document?
- - A document is a self-contained data structure that contains all the information related to a specific object or entity.
- - The document can be thought of as a unit of storage, and it typically contains multiple fields or attributes that describe the properties of the object.
- - Avoid Explicit Schema Definitions
- - Without a fixed schema, documents in the same collection can carry different fields, which keeps the database flexible and makes loosely structured (blob-like) data easier to store
- - Basic Operations on Document Databases
- 1. Create: A new document can be created by inserting a new JSON or BSON object into the database.
- 2. Read: Documents can be retrieved from the database using various query operators that filter, sort, and limit the results.
- 3. Update: Documents can be updated by modifying one or more fields or attributes of the document.
- 4. Delete: Documents can be deleted from the database using a delete operation that specifies the criteria for selecting the documents to be deleted.
- 5. Query: Documents can be queried using a query language that supports filtering, sorting, and aggregation.
- 6. Indexing: Documents can be indexed for fast retrieval of data.
- 7. Transaction: Documents can be updated or deleted as part of a transaction, which ensures that all changes are either committed or rolled back together.
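To make the create/read/update/delete steps concrete without a database server, the sketch below models a collection as a plain list of dicts. This is an analogy only, not how a document database is implemented.

```python
# A collection modelled as a list of dict "documents".
collection = []

# Create: insert new documents.
collection.append({"_id": 1, "name": "John", "age": 25})
collection.append({"_id": 2, "name": "Jane", "age": 31})

# Read: filter documents by a condition.
over_30 = [doc for doc in collection if doc["age"] > 30]

# Update: modify a field of the matching documents.
for doc in collection:
    if doc["name"] == "John":
        doc["age"] = 26

# Delete: drop the documents that match the criteria.
collection = [doc for doc in collection if doc["name"] != "Jane"]

print(over_30, collection)
```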
- Document Database Terminology:
- - Document and Collection Terms
- - document: a set of ordered key-value pairs
- - collection: group of related documents
- - embedded document: a document being stored within another document
- - polymorphic schema: documents within a collection have multiple different forms
- - schemaless: no schema specification step is required before adding a document to a collection
- - Types of Partitions
- - vertical partitioning: within one server, breaking down columns in a relational table into multiple tables
- - horizontal partitioning (sharding): splitting rows or documents across multiple servers
- - partitioning algorithm: ranges, lists, or hash values
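A small illustration of the two partition types, using made-up rows. Vertical partitioning splits columns; horizontal partitioning splits rows, here with a list-style rule on the hypothetical region field.

```python
rows = [
    {"id": 1, "name": "John", "bio": "likes hiking", "region": "east"},
    {"id": 2, "name": "Jane", "bio": "plays chess", "region": "west"},
    {"id": 3, "name": "Omar", "bio": "bakes bread", "region": "east"},
]

# Vertical partitioning: the same rows, split by column.
core = [{"id": r["id"], "name": r["name"]} for r in rows]
extra = [{"id": r["id"], "bio": r["bio"]} for r in rows]

# Horizontal partitioning: whole rows, split by a partition key.
shards = {}
for r in rows:
    shards.setdefault(r["region"], []).append(r)

print(len(core), len(extra), {k: len(v) for k, v in shards.items()})
```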
- - Data Modeling and Query Processing
- - deletion anomaly: when removing an entry removes a piece of data that was only found there
- - insertion anomaly: cannot insert partial information into table
- - update anomaly: when one fact changes and must be updated in multiple places
- - normalization removes modification anomalies
- - this is done by splitting data into separate tables, which are joined back together at query time
- - query processor: takes input queries and data about document collections and creates the operations that retrieve that data
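The update anomaly above can be shown with a few dicts. The supplier and order data are invented. In the denormalized layout one fact lives in two places and can drift apart; in the normalized layout it is stored once and joined in when read.

```python
# Denormalized: the supplier's city is repeated on every order.
orders = [
    {"order_id": 1, "supplier": "Acme", "supplier_city": "Boston"},
    {"order_id": 2, "supplier": "Acme", "supplier_city": "Boston"},
]
orders[0]["supplier_city"] = "Chicago"   # second copy was missed
cities = {o["supplier_city"] for o in orders if o["supplier"] == "Acme"}
inconsistent = len(cities) > 1           # update anomaly

# Normalized: the city is stored once, orders only reference it.
suppliers = {"Acme": {"city": "Boston"}}
norm_orders = [
    {"order_id": 1, "supplier": "Acme"},
    {"order_id": 2, "supplier": "Acme"},
]
suppliers["Acme"]["city"] = "Chicago"    # one change, one place

# "Join" the two structures when reading.
joined = [
    {**o, "supplier_city": suppliers[o["supplier"]]["city"]}
    for o in norm_orders
]
print(inconsistent, joined)
```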
- Designing Document Databases:
- - Normalization, Denormalization, and the Search for Proper Balance
- - In a document database, normalization involves breaking down the data into separate collections or documents to avoid redundancy.
- - Instead of storing all the data in a single document, the data is split across multiple documents, and relationships between them are established using references (embedding one document inside another is the denormalized alternative).
- - Planning for Mutable Documents
- - allocate extra space for a document ahead of time, so that when it grows it is less likely to be moved to a new location (with its old location freed)
- - The Goldilocks Zone of Indexes
- - create enough indexes to keep reads fast, but not so many that write overhead becomes a problem
- - Modeling Common Relations
- - one-to-many relationship (embed the "many" documents within the "one" document)
- - many-to-many relationship (two collections, each document holding a list of references to its related documents in the other)
- - hierarchies (contain a reference to the parent object within the child)
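The three relation shapes above, sketched as Python dicts. All names and ids are made up; the many-to-many part assumes the common approach of keeping lists of ids on both sides.

```python
# One-to-many: orders are embedded inside their customer.
customer = {
    "_id": 1,
    "name": "John",
    "orders": [
        {"order_id": 10, "total": 20.0},
        {"order_id": 11, "total": 35.5},
    ],
}

# Many-to-many: each side keeps a list of ids from the other side.
students = {
    1: {"name": "Ann", "course_ids": [101, 102]},
    2: {"name": "Bo", "course_ids": [101]},
}
courses = {
    101: {"title": "Databases", "student_ids": [1, 2]},
    102: {"title": "Networks", "student_ids": [1]},
}
ann_courses = [courses[c]["title"] for c in students[1]["course_ids"]]

# Hierarchy: every child stores a reference to its parent.
categories = {
    "electronics": {"parent": None},
    "phones": {"parent": "electronics"},
    "smartphones": {"parent": "phones"},
}

def ancestors(name):
    """Walk parent references up to the root."""
    path = []
    parent = categories[name]["parent"]
    while parent is not None:
        path.append(parent)
        parent = categories[parent]["parent"]
    return path

print(len(customer["orders"]), ann_courses, ancestors("smartphones"))
```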
- - Creating JSON files / JSON formatting
- {
-   "name": "John Smith",
-   "age": 35,
-   "email": "[email protected]",
-   "address": {
-     "street": "123 Main St",
-     "city": "Anytown",
-     "state": "CA",
-     "zip": "12345"
-   },
-   "phoneNumbers": [
-     {
-       "type": "home",
-       "number": "555-1234"
-     },
-     {
-       "type": "work",
-       "number": "555-5678"
-     }
-   ]
- }
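A document like the one above can be written to and read back from a JSON file with the standard json module. The sketch uses a trimmed copy of the document and a temporary directory; the file name person.json is an assumption.

```python
import json
import os
import tempfile

# A trimmed version of the document above, as a Python dict.
person = {
    "name": "John Smith",
    "age": 35,
    "address": {"street": "123 Main St", "city": "Anytown"},
    "phoneNumbers": [
        {"type": "home", "number": "555-1234"},
        {"type": "work", "number": "555-5678"},
    ],
}

path = os.path.join(tempfile.mkdtemp(), "person.json")

# Write the dict as indented JSON text.
with open(path, "w", encoding="utf-8") as f:
    json.dump(person, f, indent=2)

# Read the file back into Python objects.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded["phoneNumbers"][1]["type"])
```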
- MongoDB and Python:
- import pymongo
- # Connect to your local MongoDB
- client = pymongo.MongoClient("mongodb://localhost:27017/")
- # Drop your document database (if it exists)
- client.drop_database("mydb")
- # Create your document database
- mydb = client["mydb"]
- # Create a collection in your document database
- mycol = mydb["mycollection"]
- # Insert items into your collection
- mydict1 = { "name": "John", "address": "Highway 37" }
- mydict2 = { "name": "Jane", "address": "Baker Street 221B" }
- mycol.insert_many([mydict1, mydict2])
- # Using find, display all items in your collection to the screen
- for x in mycol.find():
-     print(x)
- Locating items in a document DB from Python and displaying the results with limited attributes (a projection), limited results, and sorting:
- import pymongo
- # Connect to your local MongoDB
- client = pymongo.MongoClient("mongodb://localhost:27017/")
- # Retrieve a collection named "mycollection"
- mycol = client["mydb"]["mycollection"]
- # Define the query filter and the projection (the fields to return)
- query = { "name": "John" }
- projection = { "name": 1, "age": 1 }
- # Perform the find query with limiting attributes, limiting results, and sorting
- result = mycol.find(query, projection).sort("age", pymongo.ASCENDING).limit(10)
- # Print the results to the screen
- for x in result:
-     print(x)
- In this example, the query object { "name": "John" } is a filter: only documents whose name is "John" match.
- The projection object { "name": 1, "age": 1 } limits the returned fields to "name" and "age" (MongoDB also returns "_id" unless the projection sets "_id": 0).
- We then run find with the filter and the projection, sort the results by the "age" field in ascending order using sort, and cap the output at 10 documents using limit.
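To see what each step of that find call does to the data, the same pipeline can be imitated on a plain list of dicts. This is only an analogy in pure Python with invented documents; it does not use or reproduce PyMongo.

```python
docs = [
    {"_id": 1, "name": "John", "age": 40, "city": "Boston"},
    {"_id": 2, "name": "Jane", "age": 28, "city": "Austin"},
    {"_id": 3, "name": "John", "age": 25, "city": "Denver"},
]

# Query filter: keep only documents where name is "John".
matched = [d for d in docs if d["name"] == "John"]

# Sort: ascending by age.
matched.sort(key=lambda d: d["age"])

# Limit: at most 10 documents.
limited = matched[:10]

# Projection: keep name and age; _id is included by default.
projected = [
    {field: d[field] for field in ("_id", "name", "age")}
    for d in limited
]

for doc in projected:
    print(doc)
```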
- The query operators in a single Python script:
- import pymongo
- # Connect to your local MongoDB
- client = pymongo.MongoClient("mongodb://localhost:27017/")
- # Using find, display all items in your collection to the screen
- mycol = client["mydb"]["mycollection"]
- for x in mycol.find():
-     print(x)
- # Cursor.count() was removed in PyMongo 4.0, so each cursor is turned into a list and counted with len()
- # Find using $lt (less than)
- query = { "age": { "$lt": 30 } }
- result = list(mycol.find(query))
- print(f"Documents where age < 30: {len(result)}")
- # Find using $gte (greater than or equal)
- query = { "age": { "$gte": 30 } }
- result = list(mycol.find(query))
- print(f"Documents where age >= 30: {len(result)}")
- # Find using $eq (equal)
- query = { "name": { "$eq": "John" } }
- result = list(mycol.find(query))
- print(f"Documents where name = 'John': {len(result)}")
- # Find using $ne (not equal)
- query = { "name": { "$ne": "John" } }
- result = list(mycol.find(query))
- print(f"Documents where name != 'John': {len(result)}")
- # Find using $or
- query = { "$or": [ { "name": "John" }, { "age": { "$lt": 30 } } ] }
- result = list(mycol.find(query))
- print(f"Documents where name = 'John' or age < 30: {len(result)}")
- # Find using $and
- query = { "$and": [ { "name": "John" }, { "age": { "$lt": 30 } } ] }
- result = list(mycol.find(query))
- print(f"Documents where name = 'John' and age < 30: {len(result)}")
- # Find using $not
- query = { "name": { "$not": { "$eq": "John" } } }
- result = list(mycol.find(query))
- print(f"Documents where not (name = 'John'): {len(result)}")
- # Find using $exists
- query = { "age": { "$exists": True } }
- result = list(mycol.find(query))
- print(f"Documents where age exists: {len(result)}")
- # Null search using {item: null}: matches documents where item is null or the field is missing
- query = { "item": None }
- result = list(mycol.find(query))
- print(f"Documents where item is null or missing: {len(result)}")
- # Null search using {item: {$exists: false}}: matches only documents without the field
- query = { "item": { "$exists": False } }
- result = list(mycol.find(query))
- print(f"Documents where item does not exist: {len(result)}")
- # Null search using {item: {$type: 10}}: matches only documents where item is stored as null (BSON type 10)
- query = { "item": { "$type": 10 } }
- result = list(mycol.find(query))
- print(f"Documents where item is explicitly null: {len(result)}")
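The three null searches differ only in how they treat a missing field versus a stored null. The pure-Python sketch below mirrors that distinction on three made-up documents; it is an analogy and does not call MongoDB.

```python
docs = [
    {"_id": 1, "item": None},    # field present, value null
    {"_id": 2},                  # field missing
    {"_id": 3, "item": "pen"},   # field present, real value
]

# Like {item: None}: null OR missing.
null_or_missing = [d["_id"] for d in docs if d.get("item") is None]

# Like {item: {"$exists": False}}: missing only.
missing_only = [d["_id"] for d in docs if "item" not in d]

# Like {item: {"$type": 10}}: present and null only.
null_only = [
    d["_id"] for d in docs if "item" in d and d["item"] is None
]

print(null_or_missing, missing_only, null_only)
```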