TDD experience part 2
I have been using TDD for 3 months now. How are things coming along?
Accidentally testing a private API
In my last article, I talked about using Test Driven Development while writing code for this site.
So I wanted to measure how well this blog is performing (speed and efficiency stuff, not popularity). The idea is simple:
- The server maintains atomic integers representing performance counters.
- I insert code at measurement points to increment the performance counters.
- Every 5 minutes, the server takes a snapshot of the performance counters and resets the counters to 0.
- The server stores the snapshot in the database.
Measurement code (runs every request)
|
| Atomic increment
V
Performance counters
|
| Atomic replace with 0
V
Sampler (runs at every 5-minute mark of the wall-clock hour, starting at :00)
|
| INSERT INTO pm_counters VALUES (...);
V
Database
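The "performance counters" box above is just a struct of atomic integers. The article never shows it, but given the later call to `counters.take()`, it could look something like this minimal sketch (the field names are my guesses, not the real ones):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical sketch: the real Counters type is not shown in the
// article, but later code calls `counters.take()`, so it plausibly
// looks like this. Field names are invented for illustration.
#[derive(Default)]
pub struct Counters {
    pub requests: AtomicU64,
    pub errors: AtomicU64,
}

impl Counters {
    // Atomic increment at a measurement point.
    pub fn hit(&self) {
        self.requests.fetch_add(1, Ordering::Relaxed);
    }

    // Atomic replace with 0: returns the snapshot and resets the
    // counters in one swap per field, so no increments are lost.
    pub fn take(&self) -> (u64, u64) {
        (
            self.requests.swap(0, Ordering::Relaxed),
            self.errors.swap(0, Ordering::Relaxed),
        )
    }
}
```

The swap-with-zero is what makes "snapshot and reset" safe without a lock: each field is read and cleared atomically, though the snapshot across fields is not a single instant.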
One of the stated benefits of TDD is that it helps you improve the design of your code. I did not talk about the code back then, because the problem was trivial: I already had a design in mind, so I would have written the same code regardless of the method used.
This time, it was different. I was going to write the sampler code, and I had no design in mind. The code needs to trigger at multiples of 5 minutes on a wall clock, so a simple 5-minute sleep loop isn't gonna cut it.
Ah yes, a clock. Abstracting one would be interesting (read: difficult). Oh I know, why don't I just tell the implementation what the time is, and have the implementation return the time at which it should be triggered again?
This is the first attempt at the tests:
#[cfg(test)]
mod test {
    ...
    #[test]
    fn next_trigger_from_arbitrary() {
        let mut sampler = init(rfc3339utc("2000-01-01T11:59:17")); //< We tell it the current time
        sampler.trigger();
        assert_eq!(sampler.trig_time, rfc3339utc("2000-01-01T12:00:00")); //< We check the next trigger time
        assert_eq!(sampler.storage.len(), 0);
    }

    #[test]
    fn next_trigger_from_trigger() {
        let mut sampler = init(rfc3339utc("2000-01-01T12:00:00"));
        sampler.trigger();
        assert_eq!(sampler.trig_time, rfc3339utc("2000-01-01T12:05:00"));
        assert_eq!(sampler.storage.len(), 1);
    }
}
This is the code that passes the tests:
pub struct Sampler {
    trig_time: DateTime<Utc>, //< Next wakeup time
    duration_s: u32,          //< Duration between triggers, in seconds (up to 3600)
    counters: Arc<Counters>,
    storage: Vec<Row>,
}

impl Sampler {
    pub fn trigger(&mut self) {
        let time = self.trig_time.time();
        let hour = time.hour();
        let minute = time.minute();
        let second = time.second();
        //Only retain accuracy up to seconds.
        let round_time = self.trig_time.date_naive()
            .and_hms_opt(hour, minute, second)
            .expect("Unable to round off sub-second part");
        self.trig_time = DateTime::from_utc(round_time, Utc);
        let seconds_in_hour = second + (minute * 60);
        let extra_seconds = seconds_in_hour % self.duration_s; //5 min
        if extra_seconds == 0 {
            //Sample statistics
            let stats = self.counters.take();
            self.storage.push(Row {
                end_time: self.trig_time,
                counters: stats,
            });
        }
        //Save the next trigger time.
        self.trig_time = self.trig_time
            .add(Duration::seconds((self.duration_s - extra_seconds) as i64));
    }
}
And this is the untested function I ended up writing to drive it:
pub async fn sampler_task(
    counters: Arc<Counters>,
    duration_s: u32,
    retain_days: u32,
    pool: PgPool,
) {
    let mut sampler = Sampler {
        trig_time: Utc::now(),
        counters,
        duration_s,
        storage: Vec::new(),
    };
    sampler.trigger();
    loop {
        //Cleanup
        let retain_time = Utc::now() - Duration::days(retain_days as i64);
        if let Err(e) = model::cleanup(&pool, retain_time).await {
            log::warn!("Unable to perform cleanup: {}", e);
        }
        //Sleep till the trigger time
        let duration = match (sampler.trig_time - Utc::now()).to_std() {
            Ok(value) => value,
            Err(_) => {
                log::warn!("Lag? trigger time is {}", sampler.trig_time);
                sampler.trig_time = Utc::now();
                sampler.trigger();
                (sampler.trig_time - Utc::now()).to_std()
                    .expect("Unable to figure out sleep time")
            }
        };
        tokio::time::sleep(duration).await;
        sampler.trigger();
        //Insert into database
        for row in sampler.storage.drain(0..) {
            if let Err(e) = model::insert(&pool, &row).await {
                log::warn!("Unable to insert into database: {}", e);
            }
        }
    }
}
What? The code under test is smaller than the untested driver code. I ended up calling only the untested function from outside the module; thus all the code I wrote tests against is effectively a private API.
When I initially read about TDD, I thought TDD made it hard to build bad designs. This was just a quick and easy screw-up.
The large untested function bothers me even more. Ideally, we'd have integration tests to guard code like this, but let's be real: we deal with legacy code where the majority of the code has dodgy unit tests, if they exist at all. We'd be lucky to have a working CI setup that can run integration tests. It would be nicer if our unit tests could cover more, even if ideally they shouldn't have to.
I made a note to myself: sometime later, I'll try again from scratch.
Further progress
For the past few months, I have been writing as much code as I can using TDD.
At my day job, I was urgently asked to write a small service in Go. I did not know Go at the time. Several of the interfaces I had to use were not yet determined, and the solution to the problem the service needed to solve was not known either. I cannot go into the details, but imagine a farmer handing you a buffalo one morning and asking you to take it to a faraway place. As soon as the farmer leaves, the buffalo starts acting erratically. You try everything you know; you ask your friends, but the buffalo just won't do what you want it to do. Eventually the farmer himself arrives, but he cannot control the buffalo either. It is absolute chaos.
Over one month, the service code changed so much that almost nothing remained in common with my initial implementation from the first week. But throughout that time, I was able to drastically change its behavior in less than a day... which I had to do at least a dozen times.
I have made a few notes:
- TDD makes it significantly easier to learn new programming languages. It helped me quite a bit while learning Go and enabled me to start writing production code on short notice. I think continuing to practice TDD is worth it for this benefit alone.
- The hardest part of TDD is starting new code without any reference for what its interface should look like. Conversely, modifying an existing module with existing tests is the easiest.
- When I write a new module, its quality is initially at its poorest. The more I modify the module, the easier it becomes to change. Without TDD, I have to rely on experience to keep code easy to change throughout all those modifications.
- The signs that TDD sends you about your design are not always in the form of tests being hard to write.
- Tests written after the code have 70-80% coverage. With TDD, code has almost 100% coverage, all without trying to chase coverage.
- Code written with TDD is initially more complex than needed. The additional complexity is only justified when adding more functionality, and that is where TDD takes the lead. TDD does not make sense for temporary/disposable code.
- We never call the functions that represent test cases manually, so their names are as good as comments. We can go nuts on naming the tests, and we probably should. Properly named tests can speed up troubleshooting, though I have yet to learn how to take full advantage of this. This leads neatly to the next note.
- A significant reduction in printf-debugging. The better a failing test points toward where I messed up, the less likely I am to reach for log statements, a debugger, etc. Before using TDD, my tests routinely had logging statements; now they don't.
- Stress reduction. Those moments where you put all your brainpower into figuring out how to write a piece of code are gone; now I develop each module incrementally, adding one requirement fragment at a time.
Attempt 2 at performance monitoring
With some experience gained, I set out to write the performance monitoring code again.
pub trait Clock {
    fn now(&self) -> DateTime<Utc>;
    fn sleep(&mut self, duration: Duration) -> BoxFuture<'_, ()>;
}

pub trait Sampler {
    fn reset(&mut self) -> BoxFuture<'_, ()>;
    fn sample(&mut self, end_time: DateTime<Utc>) -> BoxFuture<'_, ()>;
}

pub async fn sampling_task(
    settings: Settings,
    mut clock: impl Clock,     //< Sleeping and getting the current time are handled via a trait.
    mut sampler: impl Sampler, //< It turns out that the code relevant for
                               //< timing does not actually need to handle
                               //< the counters directly.
) {
    let interval = Duration::seconds(settings.interval_s as i64);
    let tolerance = Duration::seconds(settings.tolerance_s as i64);
    let mut target = Utc.timestamp_opt(0, 0).unwrap();
    let mut store_row = false;
    loop {
        let now = clock.now();
        let diff = target - now;
        if diff < -tolerance || diff > (interval + tolerance) {
            store_row = false;
            target = next_target(now, settings.interval_s);
        } else if diff > tolerance {
            clock.sleep(diff).await;
        } else {
            if store_row {
                sampler.sample(target).await;
            } else {
                sampler.reset().await;
            }
            store_row = true;
            target += interval;
        }
    }
}

fn next_target(time: DateTime<Utc>, interval_s: u32) -> DateTime<Utc> {
    let hour = time.hour();
    let minute = time.minute();
    let second = time.second();
    let seconds_in_hour = second + (minute * 60);
    let extra_seconds = seconds_in_hour % interval_s; //5 min
    let sleep_time = interval_s - extra_seconds;
    DateTime::from_utc(
        time.date_naive()
            .and_hms_opt(hour, minute, second)
            .expect("Unable to round off sub-second part"),
        Utc,
    ) + Duration::seconds(sleep_time as i64)
}
It turns out that the sampler does not need to handle the atomic integers directly either.
pub trait Delegate: 'static + Send {
    fn reset(&mut self);
    fn sample(&mut self, stats: &mut Statistics);
}

...

impl<ErrorHandler> Sampler for SamplerImpl<ErrorHandler>
where ErrorHandler: Fn(sqlx::Error) + Send {
    fn reset(&mut self) -> BoxFuture<'_, ()> {
        Box::pin(async move {
            for x in &mut self.delegates {
                x.reset();
            }
        })
    }

    fn sample(&mut self, end_time: DateTime<Utc>) -> BoxFuture<'_, ()> {
        Box::pin(async move {
            let mut stats = Statistics::default();
            for x in &mut self.delegates {
                x.sample(&mut stats);
            }
            let row = Row {
                end_time,
                counters: stats,
            };
            if let Err(e) = model::insert(&self.pool, &row).await {
                (self.error_handler)(e);
            }
        })
    }
}
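A nice property of this split is that delegates can be exercised in isolation, without touching the database or the atomics. A self-contained sketch with a counting fake (not from the article; Statistics is stubbed out here so the snippet stands alone):

```rust
// Stub of the article's Statistics type, so this sketch compiles alone.
#[derive(Default)]
struct Statistics;

// Same shape as the article's Delegate trait, minus the Send bounds.
trait Delegate {
    fn reset(&mut self);
    fn sample(&mut self, stats: &mut Statistics);
}

// Hypothetical counting fake: records how often each method was called,
// which is enough to verify the reset/sample ordering from a unit test.
#[derive(Default)]
struct FakeDelegate {
    resets: u32,
    samples: u32,
}

impl Delegate for FakeDelegate {
    fn reset(&mut self) {
        self.resets += 1;
    }
    fn sample(&mut self, _stats: &mut Statistics) {
        self.samples += 1;
    }
}
```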
Such a design allowed for more sources of performance data: I could use anything that fit the Delegate trait. For example, I added some statistics from /proc:
impl Delegate for Procfs {
    fn sample(&mut self, stats: &mut Statistics) {
        let res: Result<(), String> = (|| {
            let proc_self_stat = procfs::process::Process::myself()
                .map_err(|e| format!("Cannot read /proc/self: {}", e))?
                .stat()
                .map_err(|e| format!("Cannot read /proc/self/stat: {}", e))?;
            let load_average = procfs::LoadAverage::new()
                .map_err(|e| format!("Cannot read /proc/loadavg: {}", e))?;
            stats.rss = Some(((proc_self_stat.rss * procfs::page_size())
                / 1024) as i64);
            stats.load_avg = Some((load_average.five * 1000.0) as i64);
            stats.utime = Some((((proc_self_stat.utime - self.cur_utime) * 1000)
                / procfs::ticks_per_second()) as i64);
            stats.stime = Some((((proc_self_stat.stime - self.cur_stime) * 1000)
                / procfs::ticks_per_second()) as i64);
            self.cur_utime = proc_self_stat.utime;
            self.cur_stime = proc_self_stat.stime;
            Ok(())
        })();
        if let Err(msg) = res {
            log::warn!("{}", msg);
        }
    }

    fn reset(&mut self) {
        self.sample(&mut Statistics::default());
    }
}