.NET Hobbyist Programmer

Staying Confused in a Busy World

Google Code Search - Finding Source Code

Google Labs has released a new way to find source code online.  Google Code Search returns hits from within publicly posted source code.  It did not take long for me to skip the main page and bookmark the advanced search page instead.  So far, the results seem heavily weighted toward Java (not surprising), with less available from C-based sites.  The results layout also needs clearer separation between individual results; they blend together too much.  More to follow as I use the system.  Recommended, so far.


The associated FAQ is pretty simple at the moment:

1. What kind of code are you crawling?

We're crawling as much publicly accessible source code as we can find, including archives (.tar.gz, .tar.bz2, .tar, and .zip), CVS repositories and Subversion repositories.

2. What regexp syntax does Code Search support?

Google Code Search supports POSIX extended regular expression syntax, excluding backreferences, collating elements, and collation classes. To search for a space character, escape it with a backslash, as in hello,\ world. You can search for literal strings by enclosing the strings in quotation marks, as in "hello, world". We also support Perl extensions \d, \D, \s, \S, \w, and \W.
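To get a feel for the syntax described in that answer, here is a minimal sketch that tries the escaped-space pattern and the Perl-style classes against a few lines of code. It uses Python's re module, which is not the POSIX engine Code Search uses but accepts these particular constructs; the sample lines and the find_matches helper are made up for illustration.

```python
import re

# The FAQ's own example: a space escaped with a backslash.
escaped_space = re.compile(r"hello,\ world")

# Perl-style classes: \w+ for an identifier, \s* for optional
# whitespace, \d+ for a run of digits.
assignment = re.compile(r"\w+\s*=\s*\d+")


def find_matches(pattern, lines):
    """Return the lines in which the compiled pattern matches anywhere."""
    return [line for line in lines if pattern.search(line)]


# Made-up sample source lines, not real search results.
sample = [
    'printf("hello, world\\n");',
    "int count = 42;",
    "return hello;",
]

print(find_matches(escaped_space, sample))
print(find_matches(assignment, sample))
```

Only the first sample line contains the literal text, and only the second contains an assignment of digits, so each pattern picks out a single line.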

3. How do I restrict by language, license or filename?

You can either use the Advanced Code Search page or use our operators. Our operators include:

  • The lang: operator, which restricts by programming language (e.g., lang:"c++", -lang:java, or lang:^(c|c#|c\+\+)$)
  • The license: operator, which restricts by software license (e.g., license:apache, -license:gpl, or license:bsd|mit)
  • The package: operator, which restricts by package URL (e.g., package:"www.kernel.org" or package:\.tgz$)
  • The file: operator, which restricts by filename (e.g., file:include/linux/$ or -file:\.cc$)

The argument to each of these operators can be either a quoted literal string or a regular expression. As illustrated in some of the examples above, each of the operators can be used as a negative by placing a minus sign ("-") in front of it.
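As a rough illustration of how these operators fit together, the sketch below assembles a query string from the FAQ's own examples and checks which language names the lang: regular expression would select. The helper names (operator, quote_literal, build_query) are hypothetical, and joining the search term and operators with spaces is my assumption about how the search box expects them.

```python
import re


def quote_literal(value):
    """Wrap a value in quotation marks so it is treated as a literal string."""
    return '"' + value + '"'


def operator(name, argument, negate=False):
    """Format one operator, with a leading minus sign when negated."""
    prefix = "-" if negate else ""
    return f"{prefix}{name}:{argument}"


def build_query(terms, *restrictions):
    """Join the search terms and operators with spaces (an assumption)."""
    return " ".join([terms, *restrictions])


query = build_query(
    "memcpy",                                 # made-up search term
    operator("lang", quote_literal("c++")),   # literal argument
    operator("license", "gpl", negate=True),  # negated operator
    operator("file", r"\.cc$"),               # regular-expression argument
)
print(query)

# The FAQ's regular-expression example for lang:, tried against
# a few language names using Python's re module.
lang_pattern = re.compile(r"^(c|c#|c\+\+)$")
selected = [name for name in ["c", "c#", "c++", "java"] if lang_pattern.search(name)]
print(selected)
```

The anchors ^ and $ matter in the lang: example: without them, the bare alternative c would also match any language name containing the letter c.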

4. Can I add Google Code Search results to my website, IDE or application?

Yes. Code Search results are available via a GData/XML feed, and we encourage you to help create IDE plugins and add Google Code Search to your site.
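GData feeds are built on the Atom format, so a plugin would mostly be parsing Atom entries. The sketch below shows that step with Python's standard library. The feed content is entirely made up, the FAQ does not give the feed URL or its exact fields, and only standard Atom elements (entry, title, link) are assumed.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up sample document for illustration; not real Code Search output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example results (made-up data)</title>
  <entry>
    <title>example/hello.c</title>
    <link href="http://example.com/hello.c"/>
  </entry>
  <entry>
    <title>example/util.c</title>
    <link href="http://example.com/util.c"/>
  </entry>
</feed>"""


def parse_entries(feed_xml):
    """Return (title, link) pairs for each Atom entry in the feed text."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href", "") if link is not None else ""
        results.append((title, href))
    return results


for title, href in parse_entries(SAMPLE_FEED):
    print(title, href)
```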

5. How do you decide what software license to list for a piece of code?

We do our best to determine the software license for code packages by looking for a license in the comments or in a separate license file (e.g., LICENSE, LICENCE, COPYRIGHT, COPYING). If we can't find a license, we indicate that the license is "Unknown." Please note that our license detection is not perfect -- we try to list the license as indicated by the code's author, but we can make mistakes and sometimes the author indicates the wrong license. The Code Search results also can't tell you what patents may cover a piece of software. We tell you what we can about the likely license terms, but understanding the legal requirements to reuse a piece of code is your responsibility.
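The separate-license-file part of that answer can be sketched in a few lines. This is only an illustration of the idea, not Google's actual detection logic: it looks for the file names the FAQ lists and falls back to "Unknown". The function names are hypothetical, and the sketch does not inspect comments or file contents.

```python
# File names the FAQ mentions as places a license may live.
LICENSE_FILENAMES = {"LICENSE", "LICENCE", "COPYRIGHT", "COPYING"}


def find_license_file(filenames):
    """Return the first path whose base name (minus extension) is a known license file."""
    for name in filenames:
        base = name.rsplit("/", 1)[-1]         # drop any directory part
        stem = base.split(".", 1)[0].upper()   # drop the extension, ignore case
        if stem in LICENSE_FILENAMES:
            return name
    return None


def describe_license(filenames):
    """Report where a license was found, or "Unknown" as the FAQ describes."""
    found = find_license_file(filenames)
    return f"see {found}" if found else "Unknown"


print(describe_license(["src/main.c", "COPYING"]))
print(describe_license(["src/main.c", "README"]))
```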

6. How do I add my code to Google Code Search results?

You can submit your code using our online form. Please note that we do not add all submitted code to our index, and we cannot make any predictions or guarantees about when or if it will appear.

7. How do I block you from crawling my code?

Google Code Search respects robots.txt, so there are a couple of ways you can block us from crawling your code:

  • If you have access to the robots file for your web server, you can add your code's path to the Disallow: line.
  • Alternatively, you can simply put a robots file in the root directory of your code package. This will work for both archives and source control repositories like CVS and Subversion. For example, to indicate you want none of your code crawled, you could add a file called robots.txt in the root directory with the following:
         User-agent: *
         Disallow: /

Please note that it may take some time for Code Search to update its index and remove your code. If you have an urgent request, please let us know by emailing codesearch-issues@google.com.
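Before relying on a robots.txt file, it can be worth checking that the rules say what you intend. The sketch below uses Python's standard urllib.robotparser to evaluate the two-line file from the answer above, plus a narrower variant. It only shows how Python's parser reads the rules; the paths and the is_crawl_allowed helper are made up, and Code Search's crawler may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_text, path, user_agent="*"):
    """Return True if the given robots.txt text permits fetching the path."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, path)


# The rules from the FAQ: block everything for every crawler.
BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"

# A narrower variant that blocks only one (made-up) directory.
BLOCK_ONE_DIRECTORY = "User-agent: *\nDisallow: /private/\n"

print(is_crawl_allowed(BLOCK_EVERYTHING, "/src/main.c"))
print(is_crawl_allowed(BLOCK_ONE_DIRECTORY, "/private/key.c"))
print(is_crawl_allowed(BLOCK_ONE_DIRECTORY, "/public/main.c"))
```

With Disallow: / nothing is fetchable, while the narrower rule blocks only paths under /private/.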

8. I have suggestions for product improvements. How do I let you know?

To share your thoughts with us, please post them to the discussion group. Google Code Search is part of Google Labs, so we're still in the early stages of development. Your feedback is important and will help us improve the product.

9. How do I let Google know if I see a Code Search result that I think should be removed?

Google Code Search is still in Google Labs, so the search results may not be perfect. If you notice a significant problem with the search results, please let us know by emailing codesearch-issues@google.com. If you're a copyright owner and believe you've found results that infringe your copyright, please follow our DMCA process to request removal.

10. What are the terms of use?

Please refer to the Google Code terms of service.

Posted on Thursday, October 05, 2006 6:09 PM | Filed Under [ Programming & Etc. ]
