Code filesystem trees & dependency digraphs
Consider a collection of source code. It's arranged in a structure in the filesystem. Often, each file is a module named according to its directory-and-file-path. The filesystem structure is a tree (which is a directed graph).
At the same time, the code forms a dependency tree/DAG. A module is a node, and a module importing another module forms an edge. Alternately, items in the code such as classes or functions can be nodes, and an item using another item forms an edge. This is usually a directed acyclic graph. In some cases, cycles are allowed.
In good code, is there a similarity between the filesystem structure and the dependency structure?
Sometimes a library has interface modules that import implementation modules deeper in the filesystem hierarchy. Sometime implementation modules import a "utils" module that's their cousin. In fact, the entire standard library is a cousin to most code, and most code imports something from the standard library. Modules rarely import their direct parents (e.g. "MyLibrary.Types" wouldn't import "MyLibrary" even if "MyLibrary" imports "MyLibrary.Types"). That's not a lot to go on.
What sort of similarity might we look for? Can the nodes of each graph be the same? Both can have files as nodes. However, the filesystem tree has files as leaf nodes and directories as non-leaf nodes. The dependency graph usually does not have directories as nodes at all, and has files as both leaf and non-leaf nodes. We could make the graphs look more similar by modifying the dependency graph. Instead of files being nodes, directories could be nodes that are the combination of all their direct file children's contents. This is tempting, though it throws away a lot of information. Alternatively, we could try to determine a correspondence between the two structures' nodes using graph-theoretic algorithms. This might be interesting (but it might not
What can we do without looking for similarities?- Suppose MyLibrary.A imports YourLibrary.B and YourLibrary.C imports MyLibrary.D. That's often a bad idea even when it doesn't technically involve an import cycle.
- We can detect unused imports. Some languages make them easy to detect; some (like C++) make it quite difficult. But unused imports are a detail to clean up, not a sign of badly organized code.
- We can compute how balanced the filesystem tree or dependency graph is ("balancedness" as in "balanced binary trees"). We can compute how deep it is (tree depth is simple; digraph depth could be longest non-repeating path). Probably there is a level of unbalancedness, and a level of shallowness or depth (for a given amount of code), extreme enough that it would be bad style.
- Etc.?
I continue to contemplate this.