Misconceptions your team might have during The Big Rewrite

2020-06-03

Disclaimer: I enjoy the project I am working on, and it is still a work in progress. I just had to rant about the stuff I go through in my job here. It does not reflect the opinions of my employer, and my personal opinion is that despite these troubles we are coming along nicely.

I joined a team that was doing the big rewrite in 2018. I was involved in the project before then and knew its ins and outs, and frankly I think the old version is still a great system. In order to break its "limitations", a grand v2 got started. I think my team has been good. My tech lead is really good at architecture. Where I tend to resist "writing new architecture that is not already there", he can pull up entirely new concepts and abstractions that are all pretty good. Myself, I don't much enjoy writing "new architecture" if there is something already there that I can use, and I'll try to point to an existing thing instead of creating new exotic stuff.

Now, here is what has happened during the big rewrite so far, with 4 people on the team and 2 years in.

#Persistent confusion about sources of slowness in our app

claim: "it's only slow because devtools is open"
cold water: maybe it is! but this is definitely a red herring. the code should work with devtools open. reason that's been stated: devtools adds a "bunch of instrumentation to the promises that slows it down"...stated without any evidence during a 3 hour long planning call...
claim: "it's only slow because we're using a development build of react, try a production build"
cold water: the production build makes some stuff faster, but it is NOT going to save your butt if you are constantly rerendering all your components unnecessarily every millisecond during user scroll, which is something we suffered from. it creeps back in if you are not careful, because you can't write tests against this. so often one day I'll be looking at my devtools and suddenly things are rendering twice per frame (signature of calling an unnecessary setState), or tons of unnecessary components are rendering in every frame (signature of a bad shouldComponentUpdate, bad functional react memoizing, etc)
claim: "it's slow because we are hogging the main thread all the time"
cold water: in our app, we made a very complex webworker framework. now, main thread contention is a concern, but really our app needs to just be performant all around; webworkers just offload that cpu spinning to another core. what we have done in v2 is we went whole hog and made our code rely on OffscreenCanvas, which hardly any browsers support. also, our webworker bundles (worker-loader webpack build) are huge webpack things that pretty much contain all the code that is on the main thread, so they are just massive. that makes it slow at loading time, and makes it harder to think about our worker threads in a lighter-weight way. the worker concept is now very deeply entrenched in a lot of the code (all code has to think of things in terms of rpc calls)
claim: "it's slow because there are processes that haven't been aborted spinning in the background, so we must build out an intensive AbortController thing that touches the entirety of all our code including sending abort signals across the RPC boundary in hopes that a locked up webworker will respond to this"
cold water: our first version of the software had zero aborting and, from my perspective, did not suffer for it. arguments with the team have gotten accusatory: I claim that there is no evidence that the aborting is helping us, point to the fact that our old code works fine, and say that if our new code suffers without aborting, that means something else is wrong. I have not really been given a proper response to this, and so the curse of passing AbortSignals into every function via an extra function parameter drags on
claim: "it's slow because we are not multithreading"
cold water: so we try multi-threading, but this repeatedly downloads the same data into different webworkers and parses it separately in each, which leads to more resources spent, more network IO, more slowness...
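
On that last point, here is a minimal sketch of one common way to avoid downloading and parsing the same data twice: keep one shared promise per URL and hand it to every caller. This is not code from our project. `SharedLoadCache`, `Loader`, `fakeLoader` and the example.com URLs are all hypothetical names made up for illustration, and it assumes every request gets routed through a single place (one worker or the main thread), since separate workers do not share memory.

```typescript
// Hypothetical sketch: none of these names come from a real codebase.
type Loader<T> = (url: string) => Promise<T>;

class SharedLoadCache<T> {
  private pending = new Map<string, Promise<T>>();
  private load: Loader<T>;

  constructor(load: Loader<T>) {
    this.load = load;
  }

  // Return the promise already stored for this URL if there is one, so
  // callers share one download and one parse instead of repeating both.
  // Successful results stay cached; failed ones are dropped (see below).
  get(url: string): Promise<T> {
    const existing = this.pending.get(url);
    if (existing !== undefined) {
      return existing;
    }
    const started = this.load(url).catch((err: unknown) => {
      // Forget failed loads so a later call can retry.
      this.pending.delete(url);
      throw err;
    });
    this.pending.set(url, started);
    return started;
  }
}

// Usage with a fake loader that counts how often it actually runs.
let loadCount = 0;
const fakeLoader: Loader<string> = (url) => {
  loadCount += 1;
  return Promise.resolve(`parsed:${url}`);
};

const cache = new SharedLoadCache<string>(fakeLoader);
const first = cache.get("https://example.com/data.bin");
const second = cache.get("https://example.com/data.bin"); // reuses `first`
const third = cache.get("https://example.com/other.bin"); // new URL, new load
```

The catch is the same one as above: this only helps if the requests actually meet in one place, which is an architectural decision and not something a small helper can fix on its own.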

#Persistent confusion about what our users' needs are

claim: there should be per-track scroll bars
cold water: doing this leads to many scrolls-within-scrolls on the page, which makes it very hard to scroll the page
claim: the old search indexing system is "bad".
cold water: yes it is a bit slow but is it really THE critical problem we face? likely not: bioinformatics people run a data pipeline, it takes a couple days, so what. use elasticsearch if it sucks so bad
claim: our users are "stupid" so they need to have every single thing GUI editable
cold water: interesting endeavor, but our design for this has been difficult and has not yet delivered on simplifying the system for users. instead, the config editing GUI is monstrously complex
claim: our users "do not like modal popups"...so we design everything into a tiny side drawer that can barely contain the relevant data
cold water: now everything is in a tiny side drawer...a major constraint on user interface design
claim: we should cater to obscure or not very clear "user stories", like displaying the same exact region twice on the screen at once, because "someone will want to do this"
cold water: this causes a lot of extra logical weirdness in the app that has unclear benefit in the long run

#Ongoing misdirections

problem: we are not catering to emerging areas of user needs, such as breaking our large app into components that can be re-used. instead we are going whole hog on a large monolith project and treating our monolith as a giant hammer that will solve everyone's problems, when in reality our users are also programmers who could benefit from using smaller componentized versions of our code
problem: there is confusion about "what our competitors have". my team one day claimed "alright so if we just do that then we will have everything product X has?" and I just had to be clear and be like, no! X has a really complex system of its own that we could never hope to replicate. we likely don't have even 20% of X, and we are slower and not as scalable as X. we do have our own strengths that make us compelling still
etc...

but what does all this imply?

there is persistent confusion about what the challenges we face are, what the architectural needs are, what our user stories are, what our new v2 design goals are, and more. It's really crazy