Foreword
CocoaPods cloud analytics is one of a series of cloud infrastructures provided by the Developer Tools department of ByteDance’s Client Infrastructure team. The Developer Tools team is committed to building the next-generation mobile cloud infrastructure, which optimizes the quality, cost, security, efficiency and experience of the development and delivery process of the company’s various businesses by using technologies such as cloud IDE, distributed build, compilation and linking, and so on. The Developer Tools team is dedicated to building the next-generation mobile cloud infrastructure. Through cloud IDE technology, distributed build, compilation and linking, the team optimizes the quality, cost, security, efficiency, and experience of the development and delivery process of the company’s various businesses.
I. Background
CocoaPods has become a standard dependency management tool in the iOS industry under the iOS componentized development model. However, as business capabilities continue to expand and iterate, the number of components continues to increase, resulting in a sharp increase in the complexity of the App project, a serious decline in the efficiency of dependency management, and even the emergence of potential stability problems. In order to manage the component dependencies of large-scale projects faster and more stably, iOS build department has created a set of centralized dependency management service - Cloud Dependency Analysis, which converges the dependency management process at the level of the tool chain, accelerates the speed of resolution, and aggregates the failure problems.
II. What is Cloud Dependency Analysis?
CocoaPods-based iOS project management, every time you execute pod install, you need to synchronize the component index information Spec repository to the local, generally rely on the git repository clone, and then read the Podfile, Lockfile, and other configuration files, and then start to enter the dependency analysis, dependency download, project integration and other steps.
Cloud Analysis is a cloud service that relies on ByteDance’s self-developed product repository platform, uploads local project build materials through the toolchain, quickly returns dependency analysis results, and centrally manages iOS project dependencies. The cloud analysis service will rely on the product library to provide all component index information; and through the cloud analysis local tools in the process of environment preparation to obtain local engineering materials, unified upload to the cloud for dependency resolution tasks, the cloud with a series of optimization means and server performance, rapid return of a resolution result, the local receive the resolution result after the subsequent dependency download and engineering integration process.
The access to the cloud analysis is also extremely easy, no need to increase the configuration file, there is no need to modify the original research and development model, in a non-intrusive, no access costs, does not affect the development process way to access to the project. The only thing you need to do is to add the RubyGem plugin for cloud analytics to the CocoaPods toolchain and add a control switch parameter in the pod install command to enable optimization.
3. How to speed up resolution
3.1 Product Repositories (Full Component Indexing Information)
The iOS development system based on Cocoapods is very sloppy in managing iOS products, directly using different git repositories as index repositories for build products (podspec files), which plays the role of a product repository. With the complexity of iOS project, the increase of git repositories leads to the difficulty of querying the index information of components and the slow synchronization speed of repositories.BitNest product repository is a self-developed product management system of the company’s mobile terminal, which is used to manage the build products generated in the process of continuous integration. The product repository centrally manages podspec sources separated in various git repositories, and through a complete set of CLI commands, it can quickly pull and query podspec information. The cloud analytics service leverages the capabilities of the repository to provide real-time access to a full set of podspec sources in the cloud. Each CocoaPods task does not need to update the podspec source information, nor does it need to update the podspec source information in a timely manner in order to find the podspec information for the latest release of the component.
3.2 Caching
Before describing the caching mechanism, let’s briefly describe the flow of dependency analysis in pod install. At the first execution (ignoring the lockfile), CocoaPods will read the specific plugin, source, target, pod, etc. from the Podfile via DSL, and create the corresponding objects to complete the preparation phase. In each Target object, each pod is created as a Dependency object, and there are specific Requirements objects. All Dependency objects of all Target objects are added to the stack one by one, and a Graph dependency node graph is created. Each Dependency object goes to the corresponding Source repository to find the corresponding pod according to its Requirements. If there is no repository information in the Requirements, it will traverse from the public Source of the podfile to find the corresponding pod. After finding the corresponding pod, it will first create a version list and find out all the pods that meet the Requirements from the version list, and then read the content of the corresponding podspec file. The resolution will create new Dependencies for the implicit pods in the Spec object and add them to the analysis stack and Graph. If a version of the Spec does not meet the Requirements of another dependency with the same name when traversing the Graph dependency graph, it will be withdrawn from the stack and the dependency graph until all Dependencies have been found to be corresponding to the Spec object, and the analysis is complete. As you can see, in CocoaPods dependency management process, there are a lot of repetitive object creation and sorting and searching process, which greatly reduces the development efficiency. Imagine that the objects required by CocoaPods tasks are always kept in a ready state, and whenever a task request is received, the dependency analysis work is executed immediately, and the results can be returned quickly. The cloud analysis service centralizes all CocoaPods dependency management tasks and builds an object caching mechanism for repetitive tasks. The lazy loading model is used to cache new objects and immediately enter the dependency resolution process after the next task comes in.
3.2.1 Sorting Version Cache
When analyzing each pod, in order to get the latest version of the pod dependency, CocoaPods creates a corresponding Version object for all the version numbers in the source repository and sorts them. Currently, most of the internal product versions have reached tens of thousands, and without specifying the source, both binary and source versions will be sorted and read, and finally get a version that meets the requirements and is the latest version. Since component version numbers are separated by “.” and “-“ segments, most component versions have 4 or 5 fields or more. This results in tens of thousands of components being sorted, and each sorting comparison needs to be traversed more than 4 times, increasing the time complexity by several times and greatly increasing the time consumption.
In order to obtain an organized version list faster, the repository service maintains a version file of all pod components sorted from largest to smallest; every time a new pod version is added, the repository inserts a new version into the file; and when it is deleted, it deletes the corresponding version field.
With the ordered version file, the main purpose of adding the Version cache in Cloud Analytics is to maintain the version segmentation information in the Version object all the time, so that you can quickly determine whether the current Version meets the requirements of the dependency. Version Caching can speed up the dependency management process by about 10-12 seconds.
Cloud analysis without version caching will prioritize reading the data in the version file to get an ordered list of versions directly; if the length of the version list does not match the length of the component version directory in the source, it will fall back to the original method (version list error to ensure the correctness of the analysis). In the case of a cache hit, it is also necessary to determine whether the length of the cached version list is equal to the length of the pod version catalog (there is a new version added, and the cache is not added), then the difference version will be looked up from the version list array, and the cache will be corrected.
3.2.2 Spec Object Cache
When CocoaPods looks for a podspec that meets the dependency requirements from a sorted version, it reads in the contents of all the versions of the podspec that meet the dependency requirements and does a dependency resolution traversal. If no version is specified, all versions of the podspec file will be read, and if no source is specified, all pods where the source exists will be read. For 10,000 podspec files to be read, it takes about 30 seconds (depending on the disk).
Cloud Analytics caches the contents of the podspec file for each analysis task IO read. When the next task fetches the Spec object, it can directly get the corresponding Spec object according to the three fields: source, pod_name, and version.
Meanwhile, in order to ensure the correctness of the Spec and prevent the Spec from changing its content without changing its version, the cache of Spec objects exists in the form of a multi-dimensional array, and by judging the modification time of the podspec file, the contents of the podspec in the cache can be updated to the latest submission, so as to make sure that the checksum calculation is the same as that of the local pulling dependency analysis, and to realize the correctness of the cloud dependency analysis. This ensures that the checksum calculation is the same as the value calculated by the local pull-bin dependency analysis. In the future, we will also increase the number of Spec cache hits, Spec object expiration time, etc. to realize the Spec cache cleanup strategy.
3.2.3 Cache Reuse!
The cloud analysis will also cache the analysis results, so that the next time the same analysis task can be reused directly. The cloud will do a global hash calculation and a segmented hash calculation on the material after fetching the material once, and cache the Complete Analysis Result
and Analysis Result Graph
respectively. For the next analysis task, if the materials are exactly the same, we can directly return a complete analysis result; if there is no match, we will calculate the first-level platform information key
by some target, platform and other information to determine the specific app information; then we will calculate the hash value of all the component dependencies under the target one by one, and obtain the second-level hash array Then calculate the hash value of all the component dependencies under the target one by one to get the second level
hash arraykey
, which corresponds to a graph value of the analyzed result; through fuzzy matching to match the key of the hash array, we can match the similar graphs with the same and the highest number of dependencies, and then replace the locked dependencies in the material to speed up the analysis. Of course, the fuzzy matching capability has some limitations, and cannot speed up the analysis of the originally uploaded lockfile material.
3.3 Material Pruning
Cloud analysis transforms CocoaPods objects into byte streams for transmission. The uploaded material and analysis results are as follows:
1. Uploading material
The cloud analysis toolchain will take the Podfile object, the Molinillo Graph object generated by the lockfile, the specified Source object, the plugin adapter, and all the external source Specs objects (specifically, the pre-release objects of the specified git, path, and podspec) as upload materials. In fact, cloud analysis does not need all the information of these local objects, and can prune these objects, for example, the Podfile object only needs the chain list of target_definitions; the Molinillo Graph object only needs the nodes corresponding to all pods, and does not need to record the logs of the operating nodes; the Source object only needs to know the name and the repo_definitions; the Source object only needs to know the name and the repo_definitions of the pods. Molinillo Graph object only needs all the nodes corresponding to the pods and does not need to record the logs of the operating nodes; Source object only needs the name and repo_dir, and so on. Among them, some resolution optimization plug-ins need to transfer some extra configuration Config objects through plug-in adapters.
2. Result Return
The result returned by the cloud analysis is a hash object with Target as the key and the corresponding Specs array as the value. Before the result is returned, the Source of all Specs will be pruned first. Since the Source corresponding to each Spec is only used in the subsequent process to classify the url field and generate the lock file. Therefore, other useless fields of the Source object can be deleted to minimize the transmission content and speed up the response time. After pruning the returned results, the size of transferred content can be reduced by about 10MB or more.
3.4 Resolution Policy Compatibility
To ensure the correctness and uniqueness (single truth) of the resolution results, the cloud analysis is compatible with the toolchain of each CocoaPods resolution strategy optimization within ByteDance. According to the construction configuration parameters in the project, the local plug-in of Cloud Analytics identifies the specific resolution strategy and passes it to the Cloud Analytics server to activate the corresponding resolution strategy algorithm for fast resolution. At the same time, combining the existing resolution optimization strategy and the cloud optimization acceleration mechanism, the dependency management process of CocoaPods reaches second return.
IV.
This article mainly shares a CocoaPods cloud-based optimization scheme within ByteDance, which converges and reuses a large number of repetitive iOS engineering pipeline build tasks, accelerates the dependency management rate under the premise of guaranteeing the correctness of dependency resolution, and improves R&D efficiency. At present, the cloud analysis service has completed the first phase of development and has been used by several core production lines within the company. For example, after the headline accessed the cloud analysis service, the time consumption of pipeline’s dependency analysis phase was accelerated by more than 60%. In the future, the download optimization of CocoaPods and project caching service are also under technical exploration, and the related technical articles will be shared one after another, please look forward to it!