Data Storage & Databases: Review Questions

Self-Study & Discussion

These questions cover the 9 sections of the Data Storage & Databases overview and are organized into 5 topic clusters. No answers are provided — the goal is to test your understanding of storage trade-offs, SQL fundamentals, database connectivity, SQL injection prevention, and choosing the right tool.

The questions mix conceptual understanding, technical reasoning, and practical decision-making.

Cluster 1: Storage Landscape & Flat Files (Sections 1–2)

The overview presents six storage options (in-memory, flat files, SQLite, PostgreSQL/MySQL, MongoDB, Redis). For each, name its primary strength and its most significant limitation. Why does the overview say "these storage options aren't rungs on a ladder — they're parallel choices"?
The overview divides storage into "no server required" (embedded/file-based) and "requires a server" (separate process). Explain the practical difference: what does it mean for SQLite to be "embedded" vs PostgreSQL requiring a server process? How does this affect deployment?
Compare flat-file storage (JSON) and SQLite across setup complexity, query capability, and concurrency. Both require no server — when would you choose one over the other? Why is SQLite described as a bridge between flat files and full databases?
Walk through the flat-file read-modify-write cycle: read entire file, parse JSON, modify array, write entire file back. What is the race condition problem, and how does it manifest when two HTTP requests arrive simultaneously?
The overview says "don't reach for PostgreSQL when a JSON file will do." Give two specific scenarios where a flat file is genuinely the right choice, and two where it would be a serious mistake. What are the warning signs that you've outgrown flat-file storage?

Cluster 2: Relational Databases & SQL (Section 3)

Explain what makes a database "relational" — not just "tables are related" but the mathematical concept of a relation. What are the five key features (tables, rows, foreign keys, constraints, ACID transactions) and what does each guarantee?
Using the stories table schema from the overview, explain each column's type and constraints. What does SERIAL PRIMARY KEY do? What is the difference between VARCHAR(200) NOT NULL and TEXT (nullable)?
Map the five CRUD operations to their SQL statement, HTTP method, and REST endpoint (from the overview's table). Why does this three-layer mapping (CRUD → SQL → REST) exist, and what does it tell you about the consistency of web application architecture?
Explain the RETURNING * clause in PostgreSQL. Why is it "extremely useful in REST APIs"? What would you need to do without it (in MySQL, for example) to achieve the same result after an INSERT?
Write the SQL to: (a) find all stories with priority 'high' and status 'todo', (b) update story #3 to status 'done', (c) delete all stories with status 'done'. For each, identify which CRUD operation and HTTP method it maps to.

Cluster 3: Connecting from Code (Section 4)

Compare the Node.js pg library and PHP's PDO across installation, connection setup, simple queries, parameterized queries, and result access. What does it mean that PDO is "built-in" while pg requires npm install?
Explain what a connection pool is (Node.js Pool from pg) and why it's needed. What happens if every HTTP request opens a new database connection and closes it when done? How does pooling solve this?
The overview shows Node.js using $1, $2, $3 placeholders and PHP using :id, :title named placeholders. What is the practical difference? Why might named placeholders be more readable for complex queries?
Both code examples use environment variables for database credentials (process.env.DB_PASSWORD / getenv('DB_PASSWORD')). Why is this critical? What happens if you hardcode a password and push to a public GitHub repository?
The overview shows try/catch error handling around every database call. Why is this important in a web application? What should the server return to the client when a database error occurs, and what should it NOT reveal?

Cluster 4: SQL Injection & Security (Section 5)

Explain the SQL injection attack step by step: what does the input ' OR '1'='1' -- do when concatenated into a WHERE clause? Why does '1'='1' always evaluate to true, and what does -- do?
The overview shows a more destructive injection: '; DROP TABLE stories; --. Walk through how the database processes this as multiple statements. Why is this possible with string concatenation but impossible with parameterized queries?
Explain the two-step PREPARE/EXECUTE mechanism of parameterized queries. Why can't malicious input change the query structure in Step 2 when the structure was locked in Step 1?
The overview says SQL injection is "the #1 web vulnerability" (OWASP Top 10). Beyond data theft, list four other consequences of SQL injection mentioned or implied in the overview. Why is this vulnerability so common despite having a simple, known fix?
Compare the "WRONG" and "RIGHT" code examples for both Node.js and PHP from the overview. In the parameterized version, when an attacker sends 1; DROP TABLE stories as the id parameter, what does the database see and why does no damage occur?

Cluster 5: NoSQL, ORMs & Choosing Storage (Sections 6–8)

Map SQL concepts to MongoDB equivalents: Database, Table, Row, Column, SELECT, INSERT, UPDATE, DELETE, JOIN. Why does MongoDB use $lookup for joins instead of a native JOIN keyword? What does this tell you about MongoDB's design philosophy?
The overview gives five criteria for choosing SQL vs NoSQL. Apply these to two concrete scenarios: (a) a banking application, and (b) a content management system where article structures vary wildly. Which would you choose for each and why?
Explain what an ORM does: mapping tables to classes, generating SQL, handling relationships, managing migrations, abstracting database differences. Using the overview's comparison table, show how Story.findAll() replaces SELECT * FROM stories. What is lost in this abstraction?
Compare the Active Record pattern (Sequelize, Eloquent) and Data Mapper pattern (Prisma, Doctrine) from the overview. In Active Record, story.save() writes directly to the database — why does this make testing harder? How does Data Mapper solve this?
Using the overview's decision framework table, recommend storage for: (a) a course tutorial project, (b) a production e-commerce site, (c) a user session store, (d) an analytics ingestion pipeline. Explain each choice and why the overview's advice against "over-engineering for tutorials" matters.