Absolute statements are always wrong.

Rolling in the Deep (with Python)

(NOTE: this article includes content translated by a machine)

It’s quite nice to occasionally doubt life (⊙﹏⊙)b

Love

Python’s advantages are that it is enjoyable to use and that development with it is fast. Its concise syntax explains the enjoyment, but not the development speed; after all, who still does daily development in Notepad or bare Vi these days? I think the real reasons for its high development efficiency are:

  1. Dynamic language features. You can do pretty much anything you want to cut down on repetitive work (anyone who has used Golang’s reflect will quietly shed a tear), and its ability to manipulate data is very powerful: while others are still writing loops to convert types, your data has already been processed (see the small sketch after this list).

  2. A comprehensive standard library and rich third-party libraries. Parts of the standard library are admittedly of the “tasteless to eat, a pity to throw away” kind, but it is still better to have them than not. As for third-party libraries, one search on PyPI tells you everything.
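
To make point 1 concrete, here is a tiny sketch (the field names and converters are made up for illustration): turning a raw record into typed values, and attaching those values to an object, takes a couple of lines with comprehensions and setattr, where a static language would usually need per-field boilerplate or reflection.

    # Sketch: turn raw string fields into typed values in one pass.
    raw = {"port": "8080", "timeout": "2.5", "debug": "1", "host": "10.0.0.1"}
    converters = {"port": int, "timeout": float, "debug": lambda v: v == "1", "host": str}

    typed = {key: converters[key](value) for key, value in raw.items()}
    print(typed)  # {'port': 8080, 'timeout': 2.5, 'debug': True, 'host': '10.0.0.1'}

    # Dynamic attribute access removes another pile of boilerplate:
    class Config(object):
        pass

    cfg = Config()
    for key, value in typed.items():
        setattr(cfg, key, value)  # no per-field code needed
    print(cfg.port, cfg.timeout, cfg.debug)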

Hate

The joy of use is also a sin. Can you feel the thrill of your fingers flying across a blue-switch keyboard all day, the clatter of keys in your ears, the sweet soreness in your fingertips?

You have to keep test coverage high, otherwise “variable referenced before assignment” (UnboundLocalError), AttributeError, ImportError, and “cannot use str as int”-style TypeErrors will fly all over the place. Of course, pylint can take care of part of that work.
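
A contrived illustration of why the coverage matters: none of the bugs below are caught at import time, they only blow up when the specific branch actually runs, which is exactly the kind of thing a test suite or pylint flushes out.

    # Every bug here survives import and only fails when the branch executes.
    def summarize(items, verbose=False):
        if verbose:
            header = "report"
        print(header)              # UnboundLocalError when verbose is False

        total = 0
        for item in items:
            total += item.value    # AttributeError if an item lacks .value
        return "total: " + total   # TypeError: cannot concatenate str and int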

Facing CPU-bound tasks, you might want to cry. Of course, there are C libraries to help, but dependencies that lean on ctypes/ldconfig and cannot simply be installed from source are a real pain.
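
When the hot path really has to live in C, a ctypes binding is the usual escape hatch. A minimal sketch, assuming a system where find_library can locate libm:

    import ctypes
    import ctypes.util

    # Load the system math library (a typical Linux/macOS setup is assumed).
    libm = ctypes.CDLL(ctypes.util.find_library("m"))

    # Declare the C signature; without this ctypes assumes int and mangles doubles.
    libm.sqrt.argtypes = [ctypes.c_double]
    libm.sqrt.restype = ctypes.c_double

    print(libm.sqrt(2.0))  # 1.4142135623730951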

Multithreading cannot use multiple cores (thanks to the GIL), so you have to fall back on multiprocessing, which can fairly be called a fatal flaw! Communication and resource sharing between processes are troublesome and thankless; perhaps that is the very reason multithreading exists?
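
A minimal side-by-side of the two models on a CPU-bound function: on CPython the thread pool is effectively serialized by the GIL, while the process pool actually uses the cores (the worker counts and workload here are arbitrary).

    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def burn(n):
        # Pure-Python CPU-bound work; it holds the GIL the whole time.
        total = 0
        for i in range(n):
            total += i * i
        return total

    def timed(executor_cls, jobs):
        start = time.time()
        with executor_cls(max_workers=4) as pool:
            list(pool.map(burn, jobs))
        return time.time() - start

    if __name__ == "__main__":
        jobs = [5000000] * 4
        print("threads:   %.2fs" % timed(ThreadPoolExecutor, jobs))   # roughly serial
        print("processes: %.2fs" % timed(ProcessPoolExecutor, jobs))  # roughly parallel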

With the points above in mind, here is how things play out at work:

  1. Connections in a connection pool cannot be shared across processes. In our configuration-distribution and service registration/discovery SDK, the solution is: only one process holds the connection and pulls data from the remote end into a local file cache, and the other processes read the cache file. The process that pulls and writes the data is not a dedicated one, so to keep the interface consistent I borrowed the factory pattern from tornado, except that instead of picking an implementation by platform, each process competes for a file lock and becomes either the writer or a reader accordingly. The service uses gunicorn’s pre-fork model, and gunicorn’s master restarts workers as needed to avoid memory leaks, so failover is unavoidable in some cases: if the process that pulls and writes data dies, another process has to take over. For that, each reader starts a separate thread that keeps checking whether the file lock can be acquired; if it can, the writer is presumed dead and the reader takes over.. Of course, it could also be that the lock file was simply deleted, so the writer also has to keep checking whether it still holds the lock (⊙﹏⊙)b.. (a rough sketch of the lock-based factory follows after this list)

  2. In another SDK developed recently (never mind what the internal service actually does), the data being pulled is “huge”. To cut memory usage and avoid having every process load the cache file into memory, I also built an RPC on top of a unix domain socket for it.. which means even more situations to handle. The server side actually provides an HTTP interface, so why not call it directly? First, I was afraid the service could not take the load; second, I was afraid it would be too slow. On top of that, it is not DNS-based but hands out the backend service list via an HTTP interface, so I had to wrap up some simple load balancing and fault tolerance myself. I did ask why we do not use an SOA approach (service registration/discovery + RPC); a bit of subtraction would not hurt, since a new feature would then not require pushing an SDK update again, nor dealing with the problems caused by old SDK versions. Well.. Back to the situations that have to be handled: because the data file is quite large, queries have to fall back to the remote HTTP path while the data is loading (Python, yes, the whole process blocks) and whenever the unix domain socket file is deleted. Thanks to assorted non-standard behavior on the Java service side, massaging the query results is heartbreaking. And then there is the writer switch-over, after which the RPC client has to rebuild its connection.. (see the unix-socket sketch after this list)
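
Roughly how the lock-based factory in point 1 can look. This is a reconstruction from the description above, not the SDK’s actual code, and the lock path and class names are made up: every process tries a non-blocking flock on a lock file, whoever wins becomes the writer, everyone else becomes a reader, and readers keep retrying the lock in a background thread so they can take over if the writer dies.

    import fcntl

    LOCK_PATH = "/tmp/config_sdk.lock"   # hypothetical lock-file path

    class Writer(object):
        """Holds the lock, pulls from the remote end, writes the cache file."""
        def __init__(self, lock_file):
            self.lock_file = lock_file   # keep the fd open to keep the lock

    class Reader(object):
        """Reads the cache file; a background thread retries make_client()
        so it can take over if the writer goes away."""

    def make_client():
        """Factory: return a Writer if we win the file lock, otherwise a Reader."""
        lock_file = open(LOCK_PATH, "w")
        try:
            # Non-blocking exclusive lock; the kernel drops it automatically
            # when the holder dies, which is what makes take-over possible.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except IOError:
            lock_file.close()
            return Reader()              # someone else is already the writer
        return Writer(lock_file)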
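
And a bare-bones sketch of the unix-domain-socket RPC in point 2, again a reconstruction with a made-up socket path and wire format: readers query the writer over the socket and fall back to the remote HTTP interface whenever the socket file is missing or the call fails (data still loading, socket file deleted, writer switching over, and so on).

    import json
    import os
    import socket

    SOCK_PATH = "/tmp/config_sdk.sock"   # hypothetical socket path

    def query(key):
        """Ask the local writer over the unix socket; fall back to HTTP on failure."""
        if not os.path.exists(SOCK_PATH):
            return query_via_http(key)   # socket file gone, or writer not up yet
        try:
            conn = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            conn.connect(SOCK_PATH)
            conn.sendall(json.dumps({"key": key}).encode("utf-8") + b"\n")
            reply = conn.recv(65536)
            conn.close()
            return json.loads(reply.decode("utf-8"))
        except (OSError, ValueError):
            # Writer switching over, connection reset, truncated or garbled reply...
            return query_via_http(key)

    def query_via_http(key):
        # Placeholder for the remote HTTP query; the simple load balancing and
        # fault tolerance over the backend list would live here.
        raise NotImplementedError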

Given all of the above, the only consolation is that even with so many XXX piled up, development efficiency (progress) is still ahead of Java. Of course, that is not purely a good thing: the faster side keeps running into potential defects and unreasonable designs, and then has to wait for the other side to improve things and revise the integration protocol (ˇˍˇ)

Interweave

Sudden…