Module Code: | DATA C9001 |
Full Title: | Data Architecture |
Valid From: | Semester 1 - 2019/20 (June 2019) |
Language of Instruction: | English |
Module Author: | Peadar Grant |
Module Description: |
Students are familiarised with data and its storage within varied IT environments including cloud, onsite and legacy systems. A practical problem-based approach to relational, non-relational and allied data storage technologies is followed. Student analysts will interact with a wide variety of contemporary technologies and will specify suitable data storage systems for varied application domains.
|
Learning Outcomes |
On successful completion of this module the learner will be able to: |
# | Learning Outcome Description |
LO1 | Utilise industry-standard database systems for analytics workloads. |
LO2 | Design data storage components based on industry-standard relational and non-relational databases. |
LO3 | Optimise storage and query performance for various database types. |
LO4 | Construct appropriate interfacing for near-realtime heterogeneous data stores. |
LO5 | Develop data architecture to store and process unstructured data in varied formats. |
LO6 | Design suitable hardware and software solutions for data storage requirements in analytics-centric projects. |
Pre-requisite learning |
Module Recommendations
This is prior learning (or a practical skill) that is strongly recommended before enrolment in this module. You may enrol in this module if you have not acquired the recommended learning but you will have considerable difficulty in passing (i.e. achieving the learning outcomes of) the module. While the prior learning is expressed as named DkIT module(s) it also allows for learning (in another module or modules) which is equivalent to the learning specified in the named module(s).
|
No recommendations listed |
Incompatible Modules
These are modules which have learning outcomes that are too similar to the learning outcomes of this module. You may not earn additional credit for the same learning and therefore you may not enrol in this module if you have successfully completed any modules in the incompatible list.
|
No incompatible modules listed |
Co-requisite Modules
Students must also take, or have successfully taken, modules listed as co-requisites in order to enrol for this module.
|
No Co-requisite modules listed |
Indicative Content |
Data
Types of data: structured, semi-structured & unstructured data; files, streams and databases; four Vs of data; contemporary global data trends; modelling considerations; acquisition; storage and retrieval patterns; distributing; scaling; common file and stream data formats; compression.
|
IT environment
Analytics and transaction processing requirements; client/server data access patterns; analyst-client environment trends; shared file systems; server-centric database storage; mainframe data integration; storage devices; storage concepts (DAS/NAS/SAN); data centre, cloud and hybrid-cloud environments; object storage systems.
|
Relational databases
RDBMS overview [PostgreSQL]; application domains; tabular data (1NF); data types; data manipulation and querying using SQL; views; application query API; multi-table JOINs; foreign-key relationships; E-R modelling; geospatial data handling; user-defined functions; aggregate queries; transactions; ACID properties; replication; sharding; CAP theorem; RDBMS limitations.
|
Performance optimisation
Goals of optimisation; query planner and explanation; use of indices; materialised views; caching systems [Redis].
|
Non-relational databases
NoSQL characteristics; concept of BASE; implicit/explicit schema; problem-based practical application of a range of non-relational database solutions to domain-specific data: document stores [MongoDB], key/value stores [Riak], column stores [Cassandra], graph databases [Neo4j], LDAP directories [Active Directory]; design considerations; ad-hoc and programmatic querying; non-relational facilities within RDBMSs; RDBMS integration; clustering.
|
Unstructured data
Challenges of unstructured data; key application areas; large-file storage solutions; role of full-text searching; ETL of file-based data; rich-format data challenges [PDF, DOCX]; RDBMS-based full-text search capabilities and limitations; full-text search engines; integration with RDBMS and document store systems.
|
Module Content & Assessment
|
Assessment Breakdown | % |
Course Work | 100.00% |
Assessments
Full Time
No End of Module Formal Examination |
Part Time
No End of Module Formal Examination |
Reassessment Requirement |
No repeat examination
Reassessment of this module will be offered solely on the basis of coursework and a repeat examination will not be offered.
|
Reassessment Description: Reassessment will consist of one design & implementation project and one class test, covering both semesters' work.
|
DkIT reserves the right to alter the nature and timings of assessment.
Module Workload & Resources
Workload: Full Time |
Workload Type | Contact Type | Workload Description | Frequency | Average Weekly Learner Workload | Hours |
Practical | Contact | Practical lab session | Every Week | 3.00 | 3 |
Independent Study | Non Contact | Practice with technologies studied in class | Every Week | 4.00 | 4 |
Directed Reading | Non Contact | Lecturer-recommended supporting texts | Every Week | 1.00 | 1 |
Total Weekly Learner Workload | 8.00 |
Total Weekly Contact Hours | 3.00 |
Workload: Part Time |
Workload Type | Contact Type | Workload Description | Frequency | Average Weekly Learner Workload | Hours |
Practical | Contact | Practical lab session | Every Week | 3.00 | 3 |
Independent Study | Non Contact | Practice with technologies studied in class | Every Week | 4.00 | 4 |
Directed Reading | Non Contact | Lecturer-recommended supporting texts | Every Week | 1.00 | 1 |
Total Weekly Learner Workload | 8.00 |
Total Weekly Contact Hours | 3.00 |
Resources
|
Recommended Book Resources |
- Connolly, Thomas & Begg, Carolyn. (2015), Database Systems, 6th ed., Addison Wesley.
- Sadalage, Pramod J. & Fowler, Martin. (2012), NoSQL Distilled, Addison Wesley.
- Perkins, Luc, Redmond, Eric & Wilson, Jim. (2018), Seven Databases in Seven Weeks, 2nd ed.
This module does not have any article/paper resources |
Other Resources |
- [Online manual], PostgreSQL 11 reference manual.
- [Online manual], Riak database manual.
- [Online manual], Redis documentation.
- [Online manual], MongoDB manual.
- [Online manual], Neo4j documentation.
- [Online manual], Cassandra documentation.