
(ii)	Database choice:
One good database solution for the NASA Exoplanet Archive would be Cassandra. It handles data with many repeated or empty values well. In addition, the exoplanet records are semi-structured, which suits Cassandra's flexible wide-column model.

Detailed explanation of how the data will be stored:
Because the discovery methods repeat heavily across planets, the keyspace will be divided into families (column families) according to the method of discovery. Each family will then be split into sub-families according to the number of planets in the system. Finally, every exoplanet will be stored as a single row, with its attributes as columns.
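The nesting described above can be sketched in plain Python as a dictionary keyed first by discovery method and then by number of planets. This is only an in-memory model of the layout, not Cassandra code, and the field names (discovery_method, num_planets, orbital_period) and sample planets are hypothetical. In CQL, a similar grouping could be approximated by making the discovery method the partition key and the number of planets a clustering column, so that all planets found by one method are stored together and sorted by planet count.

```python
from collections import defaultdict

# Hypothetical sample rows; the field names and values are illustrative and are
# not taken from the archive's real schema.
planets = [
    {"name": "Planet A", "discovery_method": "Transit", "num_planets": 1, "orbital_period": 3.5},
    {"name": "Planet B", "discovery_method": "Transit", "num_planets": 2, "orbital_period": 12.1},
    {"name": "Planet C", "discovery_method": "Radial Velocity", "num_planets": 2, "orbital_period": 400.0},
    {"name": "Planet D", "discovery_method": "Transit", "num_planets": 2, "orbital_period": 7.9},
]

GROUP_KEYS = ("name", "discovery_method", "num_planets")

def build_layout(records):
    """Nest rows as family (discovery method) -> sub-family (number of planets) -> planet row."""
    layout = defaultdict(lambda: defaultdict(dict))
    for rec in records:
        family = rec["discovery_method"]
        sub_family = rec["num_planets"]
        # The grouping values live in the keys, so each row keeps only its own attributes.
        row = {k: v for k, v in rec.items() if k not in GROUP_KEYS}
        layout[family][sub_family][rec["name"]] = row
    return layout

layout = build_layout(planets)
print(sorted(layout["Transit"][2]))  # ['Planet B', 'Planet D']
```

Because the discovery method and planet count are held once per group rather than once per row, they are not repeated inside every planet's record.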

Benefits of this solution
A few columns (for example Planet Radius and Planet Density) are empty for almost every planet. Cassandra stores each row as a set of columns and only stores the columns that actually have a value, so these empty fields take up no space. In addition, grouping rows into families reduces the number of repeated values (discovery method, number of planets in the system, etc.). In terms of scalability, Cassandra is also a good fit: adding information about new planets is simply a matter of inserting new rows. The size of the data would not be a problem for Cassandra either.
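The space saving from empty columns can be illustrated with a small Python model that keeps only non-empty values per row, roughly as a wide-column store does, and compares the number of stored cells against a fixed-schema table. The column names and values are hypothetical. This assumes missing values are simply left out of each insert; in Cassandra, explicitly writing a null creates a tombstone instead.

```python
# Hypothetical column list; in a fixed-schema table every row reserves a slot
# for every column, even when the value is missing.
COLUMNS = ["orbital_period", "planet_mass", "planet_radius", "planet_density"]

raw_rows = {
    "Planet A": {"orbital_period": 3.5, "planet_mass": 1.2, "planet_radius": None, "planet_density": None},
    "Planet B": {"orbital_period": 12.1, "planet_mass": None, "planet_radius": None, "planet_density": None},
    "Planet C": {"orbital_period": 400.0, "planet_mass": 2.9, "planet_radius": 1.1, "planet_density": None},
}

def to_sparse(row):
    """Keep only the columns that actually have a value."""
    return {col: val for col, val in row.items() if val is not None}

sparse_rows = {name: to_sparse(row) for name, row in raw_rows.items()}

dense_cells = len(raw_rows) * len(COLUMNS)                    # every slot counted
sparse_cells = sum(len(row) for row in sparse_rows.values())  # only present values
print(dense_cells, sparse_cells)  # 12 6
```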

Quality of Service (QoS) [8]
Availability: Three key points are related to Cassandra's high availability. The first is its architecture: instead of a master-slave model, Cassandra nodes form a peer-to-peer ring, so there is no single point of failure. The second is multi-data-center support: a Cassandra cluster can span multiple data centers, and it supports both virtual and physical data centers. The third is replication: Cassandra provides built-in, configurable replication that stores a chosen number of copies of each row (the replication factor) on different nodes in the ring.
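The ring and replication points can be sketched in Python: each node owns a position (token) on a hash ring, a row goes to the first node at or after its token, and further copies go to the next nodes clockwise, in the style of Cassandra's SimpleStrategy. This is a simplified model under stated assumptions: MD5 stands in for Cassandra's partitioner, each node has a single token (real clusters usually use virtual nodes), and the node names are hypothetical.

```python
import bisect
import hashlib

def token(key: str) -> int:
    """Place a key on the ring. MD5 is only a stand-in for Cassandra's partitioner."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # One token per node for simplicity; real clusters usually assign many virtual nodes.
        self.tokens = sorted((token(n), n) for n in nodes)
        self.positions = [t for t, _ in self.tokens]

    def replicas(self, key: str, rf: int):
        """First replica: first node at or after the key's token (wrapping around).
        Further replicas: the next nodes clockwise, as in SimpleStrategy."""
        start = bisect.bisect_left(self.positions, token(key)) % len(self.tokens)
        count = min(rf, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]

NODES = ["node1", "node2", "node3", "node4", "node5"]
ring = Ring(NODES)
owners = ring.replicas("Kepler-22 b", rf=3)

# Simulate the first replica failing: the row is still readable from the other two.
failed = owners[0]
survivors = [n for n in owners if n != failed]
print(owners, survivors)
```

Since no node plays a special role, losing any single node only removes one of the copies; with a replication factor of 3, two copies of the row remain available.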

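Scalability: Cassandra's scalability is linear, which means that throughput grows roughly in proportion to the number of nodes, so capacity can be increased simply by adding new nodes. Cassandra scales horizontally, either by adding nodes to an existing data center or by adding new data centers to the cluster; it can also be scaled vertically by giving existing nodes more CPU, memory or storage.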
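One reason adding nodes is cheap is that, on a hash ring, a new node takes over only the keys that fall into its own slice of the ring; every other key keeps its owner. The following Python sketch checks this for 1,000 hypothetical row keys. It assumes one token per node and uses MD5 as a stand-in hash (Cassandra's default partitioner is Murmur3).

```python
import bisect
import hashlib

def token(key: str) -> int:
    """MD5 stands in for Cassandra's partitioner (Murmur3 by default)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(nodes):
    # One token per node; sorted so ownership can be found with a binary search.
    return sorted((token(n), n) for n in nodes)

def owner(ring, key: str) -> str:
    """Node whose token is the first at or after the key's token, wrapping around."""
    positions = [t for t, _ in ring]
    i = bisect.bisect_left(positions, token(key)) % len(ring)
    return ring[i][1]

keys = [f"planet-{i}" for i in range(1000)]
before = build_ring(["node1", "node2", "node3", "node4"])
after = build_ring(["node1", "node2", "node3", "node4", "node5"])

moved = [k for k in keys if owner(before, k) != owner(after, k)]
print(f"{len(moved)} of {len(keys)} keys moved, all to node5")
```

With a single token per node, the new node takes its keys from just one neighbour; virtual nodes spread that transfer across many existing nodes, which is one reason they are commonly used.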

Performance: NoSQL databases usually rely on a few well-known design practices to achieve high performance: a fully distributed architecture, asynchronous operations and eventual consistency. Cassandra follows all three of these practices, so it shares the performance advantages they bring. Its consistency is also tunable: each read or write can specify how many replicas must respond, trading stronger consistency for lower latency.
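The trade-off behind tunable consistency comes down to simple arithmetic: if a read waits for R replicas and a write for W replicas out of a replication factor RF, the read is guaranteed to reach at least one replica holding the latest write whenever R + W > RF. The Python sketch below applies this rule to three Cassandra consistency levels, assuming a single data center and ignoring failure cases.

```python
def replicas_required(level: str, rf: int) -> int:
    """Acknowledgements needed for a few Cassandra consistency levels,
    assuming a single data center with replication factor rf."""
    required = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return required[level]

def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """Read and write replica sets must overlap, i.e. R + W > RF."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf

for r, w in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(r, w, read_sees_latest_write(r, w, rf=3))
# ONE ONE False
# QUORUM QUORUM True
# ONE ALL True
```

Reading and writing at ONE gives the lowest latency but only eventual consistency, while QUORUM on both sides costs two acknowledgements per request and guarantees the overlap.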

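Maintainability: Cassandra's eventual consistency makes it easy to recover failed nodes: a node that comes back online can catch up on the writes it missed through mechanisms such as hinted handoff and repair. The same property allows smooth rolling upgrades, in which nodes are upgraded one at a time while the cluster stays available. With the help of tools such as nodetool snapshot, Cassandra data can also be backed up and restored easily.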
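When replicas disagree, for example after a node was down and missed some writes, Cassandra reconciles the copies cell by cell using write timestamps, keeping the newest value (last write wins). A minimal Python sketch of that merge rule follows; the column names, values and timestamps are hypothetical, and Cassandra's tie-breaking for equal timestamps is not modelled.

```python
# Each replica maps column name -> (value, write_timestamp).
def reconcile(*replicas):
    """Merge replica copies cell by cell, keeping the value with the newest timestamp."""
    merged = {}
    for replica in replicas:
        for column, (value, ts) in replica.items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged

healthy = {"orbital_period": (3.52, 200), "planet_mass": (1.3, 210)}
recovered = {"orbital_period": (3.50, 100)}  # node was down and missed later writes

repaired = reconcile(healthy, recovered)
print(repaired)  # {'orbital_period': (3.52, 200), 'planet_mass': (1.3, 210)}
```

Because the merge depends only on timestamps, the order in which replicas are compared does not matter, which is what allows a recovered node to be brought up to date without manual intervention.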