Any absolute statement is false.

Some “Thoughts” on Coding

When I was in school, having been fed enough rules, laws, and problem-solving templates, I never had much interest in such things. So when I first started writing code and ran into the MVC pattern, after a few attempts I concluded that MVC and its kind were not good practice. Why make things so complicated just to build a small website? If adding a small feature requires changes everywhere, isn't that tiring? Later, while teaching myself Python, I was also keen on all kinds of dark magic and tricks, until I read The Zen of Python and various experts' interpretations of it and vaguely began to taste what it meant. I am, for that matter, someone who likes to pay attention to details, so I tried translating PEP 8 – Style Guide for Python Code and applying its detailed rules while writing code. The feedback was positive, so now whenever I learn a new language or tool, I start with the related style guides and documentation.

After reading many style guides and the like (though more often just relying on tooling), I tried to write "efficient and aesthetically pleasing" code. But later, when I became responsible for a larger module at work and started a project from scratch, constantly refactoring, adding features, and reading code, I deeply felt that this was not enough: I had been overemphasizing local details while neglecting the relationships between them.

Next, allow me to swap concepts and ramble a bit. In junior high school, my ideological-and-moral-character (social studies) teacher said something that I still remember vividly, itself rather absolute:

In judgment questions, any absolute statement is wrong.

Splitting

I think splitting, on a small scale, is about clarifying the relationships between data structures; on a large scale, it is software architecture. What matters most is to clarify relationships and define boundaries.

Horizontal Splitting

Many patterns and routines are really about horizontal splitting, MVC for example. The advantage of horizontal layering is that the concrete implementation of each layer can be swapped flexibly, which is why I think horizontal splitting places the greatest demands on one's ability to abstract. Once the software is layered horizontally and responsibilities are separated, each layer inevitably depends on the interfaces of its neighbors, so the quality of the interface abstractions is the key. Done poorly, it leads to heavy coupling and entanglement, and you lose the ability to swap different implementations into different layers. For projects with a fairly narrow application scenario, designing elaborate horizontal layers may seem unnecessary, but in some cases layering makes the program logic and data flow clearer and reduces coupling to a degree.

Take the most common kind of service client SDK as an example. Normally it divides into a user interface layer, a logic layer, a protocol layer, and a transport layer. Of course, in most cases we only need to build the user interface and logic layers; the lower layers are shielded by HTTP, gRPC, and other protocol toolkits, and no further splitting is needed. In more complex or special scenarios, HTTP and gRPC are merely options for the transport layer, and TCP or even UDP may be used instead. In those cases, the encoding and decoding often involve mapping byte streams to in-memory data structures (the protocol layer). Suppose the logic, protocol, and transport layers are all mixed together, as follows:

type Request struct {
    X int32
    Y int32
    Z int32
}

type Response struct {
    R int32
}

type Client struct {
    conn *TCPConn
}

func (c *Client) Call(req Request) (Response, error) {
    // a lot of logic ...

    c.conn.Write([]byte("->"))
    c.conn.Write(itob(req.X)) // itob: int32 -> bytes (big endian)
    // c.conn.Write([]byte(","))
    c.conn.Write(itob(req.Y))
    // c.conn.Write([]byte(","))
    c.conn.Write(itob(req.Z))
    c.conn.Write([]byte("#"))

    resp := Response{}
    buf := [4]byte{}
    c.conn.Read(buf[:2])
    if string(buf[:2]) != "<-" {
        return resp, fmt.Errorf("xxx")
    }
    c.conn.Read(buf[:4])
    resp.R = btoi(buf[:4]) // btoi: bytes -> int32 (big endian)
    c.conn.Read(buf[:1])
    if string(buf[:1]) != "#" {
        return resp, fmt.Errorf("yyy")
    }

    // a lot of logic ...
    return resp, nil
}

If we want to change how Request/Response are encoded and decoded, we have to rewrite many Write/Read calls buried inside a lot of logic. So we split the Write/Read-related code out, isolating those changes from the rest of the logic. But if different Write/Read methods suited to different scenarios (performance, resource utilization) must coexist, we might do this:

func (c *Client) CallV1(req Request) (Response, error) {
    return c.call(req, false)
}

func (c *Client) CallV2(req Request) (Response, error) {
    return c.call(req, true)
}

func (c *Client) call(req Request, useV2 bool) (Response, error) {
    // a lot of common logic ...
    var resp Response
    if useV2 {
        if err := c.writeRequestV2(req); err != nil {
            return resp, err
        }
        if err := c.readResponseV2(&resp); err != nil {
            return resp, err
        }
    } else {
        if err := c.writeRequestV1(req); err != nil {
            return resp, err
        }
        if err := c.readResponseV1(&resp); err != nil {
            return resp, err
        }
    }
    // a lot of common logic ...
    return resp, nil
}

func (c *Client) writeRequestV1(req Request) error {
    // V1...
}

func (c *Client) writeRequestV2(req Request) error {
    // V2...
}

func (c *Client) readResponseV1(resp *Response) error {
    // V1...
}

func (c *Client) readResponseV2(resp *Response) error {
    // V2...
}

Later, we might end up like this (where the writeRequestVn/readResponseVn methods also share some common reusable helpers):

func (c *Client) call(req Request, version Version) (Response, error) {
    // a lot of common logic ...
    var resp Response
    switch version {
    case V1:
    case V2:
    case V3:
    case Vn:
    }
    // a lot of common logic ...
}

func (c *Client) writeRequestVn(req Request) error {
    // Vn...
    c.writeCommonFromV1()
    c.writeCommonFromV5()
    // ...
}

func (c *Client) readResponseVn(resp *Response) error {
    // Vn...
    c.readCommonFromV3()
    // ...
}

In fact, a better way is to abstract WriteRequest/ReadResponse and compose the different versions:

type Protocol interface {
    WriteRequest(Request) error
    ReadResponse(*Response) error
}

type Client struct {
    proto Protocol
}

func NewClient(proto Protocol) *Client { return &Client{proto: proto} }

func (c *Client) Call(req Request) (Response, error) {
    // a lot of logic ...
    var resp Response
    if err := c.proto.WriteRequest(req); err != nil {
        return resp, err
    }
    if err := c.proto.ReadResponse(&resp); err != nil {
        return resp, err
    }
    // a lot of logic ...
    return resp, nil
}

// protocol/v1/protocol.go: Implementation of V1
type protocolV1 struct {}

// protocol/vn/protocol.go: Implementation of Vn
type protocolVn struct {}

In this way, we only need to pass in a different protocol implementation, NewClient(NewProtocolV3()), and the encoding/decoding helpers shared between protocols can be abstracted further without direct interdependencies, which is more cohesive. Then, if we want to replace the TCPConn implementation or add a Buffer layer on top of it, we are happy to find that Go's net.Conn is an abstraction we could use directly. However, although the standard library is stable enough, net.Conn carries methods we may not need, and we may not even use the network stack to send and receive data, so its semantics are slightly wrong for some scenarios. So declare another abstraction:

type Transport interface {
    Write([]byte) (int, error)
    Read([]byte) (int, error)
    Close() error
}

So, in the end, we will have the following usage:

NewClient( NewProtocolV1( NewTransportV5() ) )

Layered abstraction shields the implementation details of the lower layers and provides good replaceability. Most software projects use the idea of layering in some form, the network protocol stack being the classic example.

Vertical Splitting

There seems to be less to say about vertical splitting, probably because it is inherently intuitive and therefore less noticeable. Awkward.

The high-level view of vertical splitting is more tied to the business scenario and requires a deep understanding of it. After splitting, each unit gains the greatest possible independence, which better supports cooperation between multiple people or different teams. Horizontal splitting, discussed above, supports friendly multi-person cooperation only when the interface abstraction of each layer has good norms and definitions.

Components/Modules

I am often obsessed with the idea that every functional module should be usable as a reusable library, composed at the top level into complete software. A reusable library must be highly cohesive, and to me transparency and control matter most.

Number of Source File Changes

Due to my limited attention and brain capacity, when writing code I bluntly take the number of files touched by a change as an inverse measure of a module's cohesion: the fewer files involved (excluding refactoring and newly created files, and counting especially files outside the module), the higher the cohesion. For example, many open source projects like to write code like this:

// handlers.go
func registerHandlers(router Router) {
    router.GET("/users/:id", handleUserGET)
    router.POST("/users", handleUserPOST)
    // Hundreds of lines of similar code
    router.GET("/posts/:id", handlePostGET)
    router.POST("/posts", handlePostPOST)
    // Hundreds of lines of similar code
}

// users.go
func handleUserGET() {}
func handleUserPOST() {}

// posts.go
func handlePostGET() {}
func handlePostPOST() {}

This has a certain advantage: when reading the source, you can see at a glance what APIs exist overall. But you usually read source code with a purpose and rarely go line by line; when there are many APIs, you need extra searching, and when adding new APIs for posts you have to scroll or search for a while to keep them together. So I prefer the following. A change then touches only one file, and when reading the source you can more quickly and clearly see which API corresponds to which method. Generated documentation remains the best way to see what APIs exist overall:

// xxx.go
func registerHandlers(router Router) {
    registerUserHandlers(router)
    registerPostsHandlers(router)
    // ...
}

// users.go
func registerUserHandlers(router Router) {
    router.GET("/users/:id", handleUserGET)
    router.POST("/users", handleUserPOST)
}
func handleUserGET() {}
func handleUserPOST() {}

// posts.go
func registerPostsHandlers(router Router) {
    router.GET("/posts/:id", handlePostGET)
    router.POST("/posts", handlePostPOST)
}
func handlePostGET() {}
func handlePostPOST() {}

This might be called the principle of proximity, and it applies equally to the definitions of constants, errors, and so on.

Global Variables

There has been plenty of discussion of why global variables should be used sparingly; some languages do not even support them directly, which I think is quite good.

Although global variables are global, one of their major drawbacks, in my opinion, is the uncertainty of their life cycle. For other modules, the safest assumption is that an initialized global variable will never be destroyed. Sprinkling them in gradually may feel great while developing, but it is a disaster when reading the code. And as the software grows complicated and new and old features must coexist and stay compatible, scattering condition checks on global variables across the code, and testing all of it, may drive you to collapse.

There is another kind of variable that I consider effectively global: the kind passed all the way down through parameters and shared by multiple modules (both peer and nested), such as a global configuration object. This is at least transparent, but the downside is that every module depends on it. Deleting a field is painful, and changing or renaming one affects multiple components. So don't be lazy: exercise your fingers and define, for each module, a configuration object that concerns only that module, to improve cohesion.
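A sketch of that per-module configuration style (all names hypothetical): the top-level Config merely composes per-component configs, and each component receives only its own.

```go
package main

import "fmt"

// Instead of a single shared global Config that every module reads,
// each component declares only the fields it needs.

type ServerConfig struct {
	Addr string
}

type CacheConfig struct {
	MaxEntries int
}

// Config at the top level merely composes the per-module configs.
type Config struct {
	Server ServerConfig
	Cache  CacheConfig
}

type Server struct{ cfg ServerConfig }

func NewServer(cfg ServerConfig) *Server { return &Server{cfg: cfg} }

type Cache struct{ cfg CacheConfig }

func NewCache(cfg CacheConfig) *Cache { return &Cache{cfg: cfg} }

func main() {
	cfg := Config{
		Server: ServerConfig{Addr: ":8080"},
		Cache:  CacheConfig{MaxEntries: 128},
	}
	// Renaming a Cache field now touches only the cache module.
	s := NewServer(cfg.Server)
	c := NewCache(cfg.Cache)
	fmt.Println(s.cfg.Addr, c.cfg.MaxEntries)
}
```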

The Go standard library offers some cautionary examples:

  • For example, many projects register flags on the global flag.CommandLine FlagSet, and such projects may also be imported as libraries. If we register many flags globally too, our program ends up carrying flags defined by other libraries that we do not need.
  • Another example is the global rand source, which is guarded by a big lock and may become a performance bottleneck in concurrent programs.
  • There are other examples I will not list. Serious programs should not lazily reach for these global convenience helpers.

Logs

For a relatively low-level component module, I judge whether it is transparent and controllable partly by whether it needs to emit extra log output at all.

In a library, logs are generally needed where the internal logic is too complicated to fully expose the necessary state to the upper layer. A log is essentially a form of information feedback; its disadvantage is that it is implicit, so upper-layer callers cannot directly act on that feedback in their own logic. For example:

// lib
func Run() {
    // ...
    go func() {
        if err := work(); err != nil {
            LOG("exit with error: %v", err)
        }
    }()
}

// caller
{
    sigc := make(chan os.Signal, 1)
    signal.Notify(sigc, syscall.SIGTERM, syscall.SIGINT)
    lib.Run()
    LOG("exit with signal: %v", <-sigc)
}

Or like this:

// lib
func Run(ctx context.Context) {
    // ...
    go func() {
        if err := work(ctx); err != nil {
            LOG("exit with error: %v", err)
        }
    }()
}

// caller
{
    sigc := make(chan os.Signal, 1)
    signal.Notify(sigc, syscall.SIGTERM, syscall.SIGINT)
    ctx, cancel := context.WithCancel(context.Background())
    lib.Run(ctx)
    LOG("exit with signal: %v", <-sigc)
    cancel()
}

The problem with both of the above is that Run (that is, its internal goroutine) may have exited abnormally while the caller is still blocked, unaware of that state. The following looks more cumbersome but is more transparent and controllable:

// lib
func Run(ctx context.Context) error {
    // ...
    return work(ctx)
}

// caller
{
    ctx, cancel := context.WithCancel(context.Background())
    errc := make(chan error, 1)
    go func() {
        if err := lib.Run(ctx); err != nil {
            errc <- err
        }
    }()
    sigc := make(chan os.Signal, 1)
    signal.Notify(sigc, syscall.SIGTERM, syscall.SIGINT)
    select {
    case err := <-errc:
        LOG("exit with error: %v", err)
    case sig := <-sigc:
        LOG("exit with signal: %v", sig)
    }
    cancel()
}

But sometimes there is no better way to simplify the logic, some information may be necessary for troubleshooting, and there is no more elegant way to surface it to the upper layer. In that case the log output should at least be controllable: the upper layer should decide whether the relevant logs are printed and in what format.
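As a sketch of that kind of controllability (the Logger interface and SetLogger hook here are hypothetical, not from any particular library), the library stays silent by default and lets the caller inject both the decision to log and the format:

```go
package main

import "fmt"

type Logger interface {
	Printf(format string, args ...interface{})
}

// nopLogger discards everything: the library is silent by default.
type nopLogger struct{}

func (nopLogger) Printf(string, ...interface{}) {}

type Lib struct {
	log Logger
}

func New() *Lib { return &Lib{log: nopLogger{}} }

// SetLogger lets the caller decide whether and how logs are emitted.
func (l *Lib) SetLogger(lg Logger) { l.log = lg }

func (l *Lib) Work() {
	l.log.Printf("doing step %d", 1) // silent unless the caller opted in
}

// stdoutLogger is one possible caller-chosen format.
type stdoutLogger struct{}

func (stdoutLogger) Printf(format string, args ...interface{}) {
	fmt.Printf("[lib] "+format+"\n", args...)
}

func main() {
	l := New()
	l.Work() // prints nothing
	l.SetLogger(stdoutLogger{})
	l.Work() // prints "[lib] doing step 1"
}
```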

Logs are necessary, but they should be used with restraint. I have seen too much code that outputs the same information in different forms at different levels. A more appropriate approach, as above, is for relatively high-level modules to emit the necessary log output while low-level modules stay as transparent and controllable as possible.

Encapsulation

Over-encapsulation can mean worse controllability, and worse controllability can breed even more malformed encapsulation.

For example, some libraries decide whether to enable a feature via switch parameters and the like, which hides the finer-grained controls. Once the feature is turned off, the upper layer cannot elegantly substitute a better implementation and can only modify the source.
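To illustrate the difference (all names here are hypothetical), compare a boolean switch, which only turns a capability on or off, with accepting an interface, which hands the capability itself to the caller:

```go
package main

import "fmt"

// Over-encapsulated: retry behavior is only on or off; the policy
// itself is sealed inside and cannot be replaced.
type ClientA struct {
	retryEnabled bool
}

// More controllable: the retry policy is injectable.
type Retrier interface {
	Retries() int
}

type fixedRetrier struct{ n int }

func (r fixedRetrier) Retries() int { return r.n }

type ClientB struct {
	retry Retrier // nil means no retry at all
}

func (c *ClientB) Do() string {
	n := 0
	if c.retry != nil {
		n = c.retry.Retries()
	}
	return fmt.Sprintf("attempts=%d", 1+n)
}

func main() {
	b := &ClientB{retry: fixedRetrier{n: 2}}
	fmt.Println(b.Do()) // prints "attempts=3"
}
```

ClientB can be handed any backoff strategy the caller invents; ClientA would need a source change for the same effect.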

And yet sometimes not encapsulating burdens the upper-layer caller instead; it's hard.

For example, here is a somewhat contrived case:

type X struct {
    l sync.RWMutex
    // xxx
}

func (x *X) GetA(key string) (string, error) {
    x.l.RLock()
    defer x.l.RUnlock()
    // A...
    return "", nil
}

func (x *X) PutA(key, val string) error {
    x.l.Lock()
    defer x.l.Unlock()
    // A...
    return nil
}

func (x *X) GetB(key string) (string, error) {
    x.l.RLock()
    defer x.l.RUnlock()
    // B...
    return "", nil
}

func (x *X) PutB(key, val string) error {
    x.l.Lock()
    defer x.l.Unlock()
    // B...
    return nil
}

It seems fine, but suppose that when GetA called with the argument "only me" returns an error, we need to do a PutA("only me", "special one"). The problem is that no lock is held across the two calls, so another thread may call PutA in between, and one thread's write will be overwritten and lost. So we might add another method:

func (x *X) GetOrPutA(key, val string) (string, error) {
    x.l.Lock()
    defer x.l.Unlock()
    // A...
    return "", nil
}

Later, suppose we need to GetA and GetB (or PutA and PutB) together and ensure the two happen atomically, so:

func (x *X) GetAandB(keyA, keyB string) (string, string, error) {
    x.l.RLock()
    defer x.l.RUnlock()
    // A,B...
    return "", "", nil
}

func (x *X) PutAandB(keyA, valA, keyB, valB string) error {
    x.l.Lock()
    defer x.l.Unlock()
    // A,B...
    return nil
}

Later, as these combinations multiply, the encapsulated methods grow stranger and stranger. So we throw the lock up to the caller, which in turn burdens the caller's mind:

func (x *X) RLock()   { x.l.RLock() }
func (x *X) RUnlock() { x.l.RUnlock() }
func (x *X) Lock()    { x.l.Lock() }
func (x *X) Unlock()  { x.l.Unlock() }

// The caller must hold the appropriate lock around these calls.
func (x *X) GetX(key string) (string, error) {
    // X...
    return "", nil
}

func (x *X) PutX(key, val string) error {
    // X...
    return nil
}

An ideal standalone library/module offers high-level functional encapsulation while also exposing enough low-level interfaces and control. That largely resolves the question of whether to encapsulate.
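One possible middle ground, a sketch of my own rather than something from the example above, is to expose the critical section as a callback: the high-level convenience methods remain, and the caller can compose operations atomically without the lock ever being exported.

```go
package main

import (
	"fmt"
	"sync"
)

type X struct {
	mu sync.RWMutex
	a  map[string]string
	b  map[string]string
}

func NewX() *X {
	return &X{a: map[string]string{}, b: map[string]string{}}
}

// Update runs f while holding the write lock, so the caller can
// compose any sequence of operations atomically without ever
// touching the lock itself.
func (x *X) Update(f func(a, b map[string]string)) {
	x.mu.Lock()
	defer x.mu.Unlock()
	f(x.a, x.b)
}

// GetA is the high-level convenience wrapper.
func (x *X) GetA(key string) (string, bool) {
	x.mu.RLock()
	defer x.mu.RUnlock()
	v, ok := x.a[key]
	return v, ok
}

func main() {
	x := NewX()
	// Get-or-put A atomically, without a dedicated GetOrPutA method.
	x.Update(func(a, b map[string]string) {
		if _, ok := a["only me"]; !ok {
			a["only me"] = "special one"
		}
	})
	v, _ := x.GetA("only me")
	fmt.Println(v) // prints "special one"
}
```

The tradeoff: the callback must not call back into X's locking methods, or it will deadlock; that contract needs documenting.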

Dependencies

First of all, be clear whether a dependency is stable and controllable and whether you hold the right to interpret it; otherwise, add another layer of abstraction…

Then, regarding interface definitions and dependencies: the dependencies a module introduces should follow the principle of the minimal usable interface. For example, given an implemented interface:

type X interface {
    Query() error
    Create() error
    Delete() error
}

When we depend on this implementation, we should define a minimal usable interface internally, rather than importing the full interface definition from outside to save effort:

type X interface {
    Create() error
}

Even if we depend on all of an implementation's functionality, the interface should still be defined internally as much as possible; this is more cohesive and avoids many problems such as circular dependencies.

In short, depend on the minimal usable abstraction, not on a large and complete abstraction or implementation.
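In Go this falls out naturally from structural interface satisfaction. A minimal sketch (the store and service names are made up for illustration):

```go
package main

import "fmt"

// BigStore is some external implementation with a broad API.
type BigStore struct{}

func (BigStore) Query() error  { return nil }
func (BigStore) Create() error { fmt.Println("created"); return nil }
func (BigStore) Delete() error { return nil }

// Our module declares only what it actually uses.
type creator interface {
	Create() error
}

type Service struct {
	c creator
}

func NewService(c creator) *Service { return &Service{c: c} }

func (s *Service) Provision() error { return s.c.Create() }

func main() {
	// BigStore satisfies the narrow interface implicitly; Service
	// never imports the external package's interface definition.
	s := NewService(BigStore{})
	if err := s.Provision(); err != nil {
		panic(err)
	}
}
```

Service now compiles against three lines it owns, not against the external package's whole API, so renaming Query or Delete upstream cannot break it.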

The Zen of Python

The Zen of Python is written really well!

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!