Using Open Source Projects

We’re in a golden age of open source software. Largely due to the popularity of GitHub, there are thousands of great open source projects written in any and every programming language. Such an easily accessible repository of others’ code provides a great deal of value:

  • Reusing others’ code in your own project: Why reinvent the wheel when others have already built something similar to what you need?
  • Learning from others: Reading the code that experienced programmers have written is a tremendous learning opportunity. While it’s hard to just dive into a large, complex project and begin reading random files without knowing how the whole system operates, looking at specific changesets can still give insight into good coding practices and software development techniques.
  • Helping others: Taking the next step up from just reading code, developers can actively contribute to projects. Finding and fixing bugs, fixing bugs that others have reported, adding new features, or cleaning up existing features are all actions that can help improve a project.
  • And more: There are many more benefits to the crowdsourced nature of open source software: transparency, security, the ability to arrive at better solutions, the ability to grow an involved community, etc.

But for all the advantages that open source software provides, there are often some serious challenges when it comes to integrating others’ open source projects into your own code.

Licensing

This isn’t frequently mentioned, but licensing even free software can sometimes pose a problem. Software licensing is a hairy topic, and if you work within a conservative organization that doesn’t want to open itself up to lawsuits, software licenses matter a great deal. The MIT license, one of the most permissive and agreeable licenses, tends to be pretty common on GitHub. But there are still many projects with BSD, GPL, or no license at all that may require red-tape approval, or even the involvement of a legal department, before a third-party project is allowed to be included in shipped software. I look forward to the day when GitHub makes it easy to add (or even requires) proper licenses on all projects.
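As a rough illustration of what a pre-approval check might look like, here is a minimal Python sketch that flags dependencies whose license is missing or not on an allowlist. Everything in it is an assumption for the sake of the example: the allowlist contents, the function name `needs_review`, and the made-up project names. Which licenses count as pre-approved is a decision for your own organization.

```python
# Hypothetical allowlist; which licenses are pre-approved is an
# organizational decision, not something this article prescribes.
APPROVED_LICENSES = {"MIT"}


def needs_review(dependencies):
    """Return the names of dependencies whose license is missing
    (None) or not on the pre-approved list, sorted alphabetically."""
    flagged = []
    for name, license_id in dependencies.items():
        if license_id is None or license_id not in APPROVED_LICENSES:
            flagged.append(name)
    return sorted(flagged)


# Illustrative data only; these project names are made up.
deps = {"left-helper": "MIT", "date-utils": "GPL-3.0", "tiny-parser": None}
print(needs_review(deps))
```

A project with no license at all is flagged just like one with an unapproved license, since the absence of a license doesn’t grant permission to use the code.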

Learning

In order to start using someone else’s code, you have to know how it works. All of the big, popular libraries and frameworks have dedicated API documentation, getting-started guides, and example pages. But many projects are severely lacking in good documentation. A single README file is rarely enough for anything but the simplest project. Sometimes a project’s tests are comprehensive and illustrative, but tests shouldn’t have to serve as documentation.

Investigative costs

So if there isn’t comprehensive documentation, you’re inevitably reading through the source code. But even if a project does have good documentation, you should be reading through the source code anyway. If you don’t know and trust the author, in order to integrate someone else’s code with your own, you need to verify the quality of the code.

  • Security: Are there any blatant bad practices that would open your software up to security exploits? If it’s a large and popular project then this, presumably, isn’t a huge concern, since the hope is that a lot of people contributing to and using the project will catch any obvious security holes.
  • Too many dependencies: This tends to be a concern with browser-based JavaScript projects. If the project requires too many other dependencies (jQuery, Underscore, etc.) that you aren’t already using and don’t plan to use, then including them just to make use of the feature you want is a poor choice, performance-wise. If a browser-based JavaScript framework does too much (provides too many features that you don’t need), that’s another performance problem, since you’re adding potentially thousands of unneeded kilobytes of code to every webpage.
  • Performance: There are many different ways to solve a problem, and many different ways to solve a problem poorly. Even a quick glance through another project’s code can give you an indication as to whether performance was even a consideration for the original author.
  • Other red flags: If the project has poor test coverage, or the code is poorly written (poorly named variables, commented-out code, etc.), poorly structured (overly procedural), or inconsiderate of others (modifying global prototypes in JavaScript, for instance), then either another project or a roll-your-own solution is in order.

Keeping up to date

Once you do decide to pull in a third party’s code, your work isn’t done. In order to take advantage of new features or bug fixes, you must regularly check for updates. Once you pull in newly updated code, you must retest the places where your own application integrates with it. If the third-party package made major changes, then your application may be broken and may require refactoring in order to accommodate those changes.

Once you begin using others’ projects, you have to know when new updates are available. You can subscribe to contributors’ blogs or set up regular, recurring tasks for yourself to manually check for updates, read the changelog, pull the code in, and retest your application.
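To give a sense of how you might gauge the risk of an update before pulling it in, here is a small Python sketch that classifies the jump between two version numbers. It assumes the project follows a plain MAJOR.MINOR.PATCH numbering scheme (as in semantic versioning) with no pre-release tags, which many projects don’t. The function names `parse_version` and `classify_update` are my own, purely for illustration.

```python
def parse_version(text):
    """Parse a 'MAJOR.MINOR.PATCH' string into a tuple of ints.
    Assumes exactly three numeric parts and no pre-release suffix."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def classify_update(current, latest):
    """Describe how big the jump from `current` to `latest` is."""
    cur, new = parse_version(current), parse_version(latest)
    if new <= cur:
        return "up to date"
    if new[0] != cur[0]:
        return "major"  # likely breaking changes; expect refactoring
    if new[1] != cur[1]:
        return "minor"  # new features; retest integration points
    return "patch"      # bug fixes; still worth a retest


print(classify_update("1.4.2", "2.0.0"))
print(classify_update("1.4.2", "1.4.3"))
```

Even a “patch” result doesn’t remove the need to retest; it only hints at how much of the changelog you’ll need to read.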

Finding and fixing bugs

If you find a bug in a project’s code, you have several options.

  1. You can work around the bug in your own code. This is a short-sighted solution. When you later update the external project and the bug has been fixed, then at worst your program may break, and at best your workaround code is now unnecessary.
  2. You can submit an “Issue.” But first you have to see if the issue has already been filed by someone else. Once the issue has been filed, depending on the severity of the bug and the popularity of the project, it may take a while for someone else to triage and fix the problem.
  3. You can investigate the bug, create a test case for it, fix it, and submit a pull request for the change. This is the type of behavior projects try to encourage. But for large, complex projects, it could take days to familiarize yourself with the codebase to feel comfortable enough to find and fix the problem and be sure the code change didn’t break something else. Some organizations also require contributors to sign Individual Contributor License Agreements before their pull requests can be accepted; this is another hurdle.

Time

It takes time to find, investigate, integrate, and stay up to date with third-party source code. In the time it takes to perform all of these tasks, you might’ve been able to just write your own solution. And depending on the problem you’re trying to solve, that’s often true. The time you’re attempting to save by not re-solving a problem that someone else has already solved can get frittered away by the non-programming tasks involved in consuming that solution.

In a corporate environment, rather than waiting for approval to use licensed free software, you can build your own. Rather than trying to integrate with another API, you can write your own API. Rather than having to worry about additional dependencies that a project brings with it, you can determine what dependencies are appropriate.

As you write your own solution, you can decide whether to open source it. Then, as a project owner, you face an entirely different set of challenges from those of an open source project’s consumer.


All of this may make it sound like I’m overly down on open source projects. I’m not. Some portion of virtually everything a modern software developer builds comes from the open source contributions of others. But knowing the challenges involved with integrating others’ code into your own product can help inform your decision regarding whether to use someone else’s solution or to just write your own.

There are some problems I’m content to let others solve for me (and hope to never have to deal with), and there are other problems where I know I can do better than the solutions that are available. Sometimes I know that I’m solving a problem that’s already been addressed, but for the reasons mentioned here I’ve chosen to just roll my own solution.