Data flow, validation, and tests

Status: under discussion.

Overview

While refactoring the code that loads a Book and its Page objects, I started writing a roundtripping test from Page to a markdown string and back. The idea is to verify that none of the Page's data is lost in the process of writing it out to disk and reading it back.

The program can probably make some assumptions about the validity of the markdown data on disk:

  • That data is only created by the program; it's not random text from the outside world.
  • We can control exactly what we write, and how we read it back.

Still, it would be nice to have some resilience in the face of malformed markdown+frontmatter, or invalid data in the frontmatter's fields. This will come about gradually as we remove .unwrap() and integrate error propagation; there is a rough sketch of that direction just below.
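
As an illustration of that direction, here is a minimal sketch of what the loading path could look like once .unwrap() is gone. The PageError type and the load_page_source function are hypothetical names for this sketch, not the current code:

use std::fs;
use std::path::Path;

// Hypothetical error type for this sketch; the real code may end up using
// thiserror, anyhow, or something else entirely.
#[derive(Debug)]
pub enum PageError {
    Io(std::io::Error),
    MalformedFrontmatter(String),
}

impl From<std::io::Error> for PageError {
    fn from(e: std::io::Error) -> Self {
        PageError::Io(e)
    }
}

// Instead of fs::read_to_string(path).unwrap(), the error is propagated
// with `?`, so the caller decides how to report a malformed page.
pub fn load_page_source(path: &Path) -> Result<String, PageError> {
    let text = fs::read_to_string(path)?;
    if !text.starts_with("---") {
        return Err(PageError::MalformedFrontmatter(format!(
            "{}: missing frontmatter delimiter",
            path.display()
        )));
    }
    Ok(text)
}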

In the process of making the roundtripping test work, I started to have a clearer idea of how data flow should happen in the program.

Data flow and validation

I think it would be beneficial to have this kind of separation for the data flow:

  • First, route handlers obtain the request data. Right now this arrives as mostly stringly-typed structs, for example:
pub struct PageEdit {
    pub id: String,
    pub uuid: String,
    pub current_title: String,
    pub title: String,
    pub content: String,
    pub layout: String,
    pub tags: String,
    pub slug: String,
    pub menu: String,
    pub published: String,
    pub featured: String,
    pub image_list: String,
    pub file_list: String,
}

I'd like to see if we can gradually turn those into actual types, for example by having the front-end send the data so that tags and image_list deserialize as Vec<T> instead of arriving as comma-separated strings (see the first sketch after this list). Maybe this involves the front-end building a JSON array; I don't know.

  • Then, this request data gets validated into "real" types. For example, have a Tag type that is only constructible from a well-formed tag name (not empty, without forbidden characters, with normalized capitalization, etc.). Similarly, a Slug type would have different constraints that pertain to valid file path components. Let's call values of these types the validated "request parameters" (see the second sketch after this list).

  • The actual work gets done internally based on validated "request parameters". This is not to have a bureaucratic organization of the code, but rather to enable edge-free programming. Summary: operate only on pre-validated data, so you need minimal error handling and special-casing during computations.

  • When we have to send back a response, separate the "response data" from the "internal data". For example, right now the front-end only looks at a Page's clean_created date, which is a human-readable version of the value the program actually cares about, Page.raw_created. With this scheme, Page would lose the clean_created field, and maybe a PageResponse struct would have a human_readable_created_date field (see the third sketch after this list).
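
For the first point, here is a minimal sketch of what a less stringly-typed request struct could look like, assuming the front-end posts JSON and we deserialize it with serde. The PageEditRequest name and the exact field types are assumptions, not existing code:

use serde::Deserialize;

// Hypothetical typed counterpart to PageEdit: the front-end would send JSON
// arrays and booleans instead of comma-separated and "true"/"false" strings.
#[derive(Debug, Deserialize)]
pub struct PageEditRequest {
    pub id: String,
    pub uuid: String,
    pub current_title: String,
    pub title: String,
    pub content: String,
    pub layout: String,
    pub tags: Vec<String>,       // was a comma-separated string
    pub slug: String,
    pub menu: String,
    pub published: bool,         // was a string
    pub featured: bool,          // was a string
    pub image_list: Vec<String>, // was a comma-separated string
    pub file_list: Vec<String>,  // was a comma-separated string
}

With this in place, serde_json::from_str::<PageEditRequest>(&body) fails with a descriptive error if, say, tags is not a JSON array, which surfaces front-end bugs right at the boundary.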
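
For the validation step, here is a sketch of what a Tag newtype could look like. The concrete rules (forbidden characters, lowercasing) are placeholders, and a Slug type would follow the same pattern with path-oriented constraints:

// A tag name that is known to be well-formed: non-empty, no forbidden
// characters, normalized to lowercase. The concrete rules are placeholders.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Tag(String);

#[derive(Debug, PartialEq, Eq)]
pub enum TagError {
    Empty,
    ForbiddenCharacter(char),
}

impl Tag {
    pub fn new(raw: &str) -> Result<Tag, TagError> {
        let trimmed = raw.trim();
        if trimmed.is_empty() {
            return Err(TagError::Empty);
        }
        if let Some(bad) = trimmed.chars().find(|&c| matches!(c, ',' | '/' | '\\')) {
            return Err(TagError::ForbiddenCharacter(bad));
        }
        Ok(Tag(trimmed.to_lowercase()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

// Because a Tag can only exist if it is valid, code further down the
// pipeline can take &[Tag] and skip validation and error handling entirely;
// that is the edge-free part.
pub fn format_tag_line(tags: &[Tag]) -> String {
    tags.iter().map(Tag::as_str).collect::<Vec<_>>().join(", ")
}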
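
For the response side, here is a sketch of keeping internal data and response data apart. The reduced Page field set, the PageResponse::from_page function, and the date formatting are stand-ins for this sketch:

// Internal representation: only the data the program computes with.
// (Field set reduced for the sketch.)
pub struct Page {
    pub title: String,
    pub raw_created: String, // e.g. an RFC 3339 timestamp
}

// What the front-end receives; derived from Page on demand, never stored.
pub struct PageResponse {
    pub title: String,
    pub human_readable_created_date: String,
}

impl PageResponse {
    pub fn from_page(page: &Page) -> PageResponse {
        PageResponse {
            title: page.title.clone(),
            // Stand-in formatting; a real implementation would parse
            // raw_created and render it nicely for humans.
            human_readable_created_date: page.raw_created.replace('T', " "),
        }
    }
}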

Implications for tests

I think the scheme above would let us have a very clean separation for unit tests:

  • The validation step is obvious: test that stringly-typed data from the request gets validated properly, and that the validation process returns either Ok(valid_data) or Err(some_error); see the first test sketch at the end of this page. An error lets us know whether there is a bug in the front-end (is it sending invalid data?), or whether to present that error nicely to the user (that's not a valid tag name; pick another).

  • We can have mini-integration tests that go from known-valid data to the expected results (second test sketch below).

  • We can test that response data is rendered correctly from the internal data (third test sketch below).
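
A sketch of the validation tests, reusing the hypothetical Tag type from the second sketch in the previous section:

#[cfg(test)]
mod validation_tests {
    use super::*;

    #[test]
    fn well_formed_tag_is_accepted_and_normalized() {
        let tag = Tag::new("  Gnome  ").unwrap();
        assert_eq!(tag.as_str(), "gnome");
    }

    #[test]
    fn empty_and_malformed_tags_are_rejected() {
        assert_eq!(Tag::new("   "), Err(TagError::Empty));
        assert_eq!(Tag::new("foo,bar"), Err(TagError::ForbiddenCharacter(',')));
    }
}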
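
A sketch of the roundtripping mini-integration test, reusing the reduced Page struct from the third sketch in the previous section. The page_to_markdown and page_from_markdown helpers here are deliberately simplistic stand-ins (two frontmatter fields, no escaping) so that the test shape is concrete; the real serialization code will look different:

#[cfg(test)]
mod roundtrip_tests {
    use super::*;

    // Stand-in serializer: writes a minimal frontmatter block.
    fn page_to_markdown(page: &Page) -> String {
        format!("---\ntitle: {}\ncreated: {}\n---\n", page.title, page.raw_created)
    }

    // Stand-in parser: reads the two fields back, or returns None if missing.
    fn page_from_markdown(text: &str) -> Option<Page> {
        let mut title = None;
        let mut raw_created = None;
        for line in text.lines().skip(1).take_while(|line| *line != "---") {
            if let Some(rest) = line.strip_prefix("title: ") {
                title = Some(rest.to_string());
            } else if let Some(rest) = line.strip_prefix("created: ") {
                raw_created = Some(rest.to_string());
            }
        }
        Some(Page { title: title?, raw_created: raw_created? })
    }

    #[test]
    fn page_survives_markdown_roundtrip() {
        let original = Page {
            title: "First post".to_string(),
            raw_created: "2025-10-10T01:58:02+02:00".to_string(),
        };

        let markdown = page_to_markdown(&original);
        let reloaded = page_from_markdown(&markdown).expect("roundtrip should parse");

        assert_eq!(reloaded.title, original.title);
        assert_eq!(reloaded.raw_created, original.raw_created);
    }
}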
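
And a sketch of a response-rendering test, again reusing the hypothetical Page and PageResponse from above:

#[cfg(test)]
mod response_tests {
    use super::*;

    #[test]
    fn response_date_is_derived_from_raw_created() {
        let page = Page {
            title: "First post".to_string(),
            raw_created: "2025-10-10T01:58:02+02:00".to_string(),
        };

        let response = PageResponse::from_page(&page);

        assert_eq!(response.title, "First post");
        // The stand-in formatter just replaces the 'T'; the real assertion
        // would check whatever human-readable format we settle on.
        assert_eq!(response.human_readable_created_date, "2025-10-10 01:58:02+02:00");
    }
}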