All right, all you developers out there... let's see a show of hands. How many of you delight in finding new ways to solve a problem?
You. Yes, you in the back. Get your hand up. You can't call yourself a developer if you don't enjoy finding a new (preferably somewhat convoluted) way to solve a problem.
I've been doing some work that involves converting C# code to VB.NET code. I was sitting in the speaker lounge at VS Live, shortly after getting into San Francisco. I'd played a little bit on the plane with the idea of creating a tool that would walk the directory tree and do some of the preliminary work to convert C# to VB.NET.
Of course, there's a great code translator available online (even if it has problems with LINQ). So I thought it would be kind of cool to leverage that, rather than doing the conversions by hand. I mentioned this to Beth Massi, and she said something about
"XML literals rock my world!"
while whipping up a little code sample to fix the HTML from the translator site. Something like this:
' Remove markup that gets in the way of loading the page as XML
input = input.Replace("<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">", "")
input = input.Replace("<?xml version=""1.0"" encoding=""utf-16""?>", "")
' Drop &nbsp; (not a predefined XML entity) and escape bare ampersands
input = input.Replace("&nbsp;", "")
input = input.Replace("&", "&amp;")
input = input.Replace("/*]]>*/", "")

Dim html As XElement
Using sr As New StringReader(input)
    html = XElement.Load(sr)
End Using

' The converted code lives in <ul id="code-result">
Dim code = (From data In html...<ul> Where data.@id = "code-result").FirstOrDefault()
Turns out that Beth had already blogged about how to use XML literals for screen scraping. There's also tidy.exe that we could have used.
Now, I just needed to figure out how to post a request to the code converter site, and get the returned code. Or, to be honest, I needed to figure out how to borrow code to do this. Thanks to Google, this didn't take long.
One problem that I ran into was that I needed to specify the name of the form field I was passing as a parameter (the "Code=" prefix in the POST body below). Fiddler to the rescue!
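For what it's worth, that "name" is just the left-hand side of a name=value pair in a form-encoded POST body. Here's a minimal sketch of building one. BuildFormBody is a made-up helper for illustration, and "Code" is the field name used by the helper class in this post; the URL-encoding step is there because C# source is full of characters like & and + that mean something else in a form body.

```vbnet
Imports System

Module FormBodyDemo
    ' Hypothetical helper: builds a single name=value pair for an
    ' application/x-www-form-urlencoded request body.
    Function BuildFormBody(ByVal fieldName As String, ByVal value As String) As String
        ' Escape both sides so &, +, = and % in the value are not
        ' mistaken for form delimiters by the server.
        Return Uri.EscapeDataString(fieldName) & "=" & Uri.EscapeDataString(value)
    End Function

    Sub Main()
        ' "Code" is the field name assumed throughout this post.
        Console.WriteLine(BuildFormBody("Code", "if (a && b) { x += 1; }"))
    End Sub
End Module
```

Exactly which punctuation gets percent-encoded varies a little between .NET versions, but the & and + characters are always escaped, which is the part that matters here.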
(Does anyone else find it ironic that I found a C# code sample to demonstrate a concept I needed to use to automate conversion from C# to VB.NET?)
So, I ended up with a nice little class of helper methods that I could leverage while walking through a directory tree and converting all of the .cs files I find there. (And, yes, I'm looking at trying to convert csproj files to vbproj as well.)
The helper class looks something like this:
Imports <xmlns="http://www.w3.org/1999/xhtml">
Imports System
Imports System.IO
Imports System.Linq
Imports System.Net
Imports System.Text
Imports System.Xml.Linq

Public Class ScreenScraper
    ' POSTs postContent to strURL as the form field "Code" and returns the response body
    Public Shared Function GetHtmlPageWithPost(ByVal strURL As String, ByVal postContent As String) As String
        Dim httpRequest As HttpWebRequest = CType(WebRequest.Create(strURL), HttpWebRequest)
        httpRequest.Method = "POST"
        httpRequest.ContentType = "application/x-www-form-urlencoded"

        ' URL-encode the source so characters such as & and + survive form encoding
        ' (Uri.EscapeDataString has a length limit on older .NET Framework versions)
        Dim arrRequest As Byte() = (New UTF8Encoding).GetBytes("Code=" & Uri.EscapeDataString(postContent))
        httpRequest.ContentLength = arrRequest.Length

        Using requestStream As Stream = httpRequest.GetRequestStream
            requestStream.Write(arrRequest, 0, arrRequest.Length)
        End Using

        Using response As WebResponse = httpRequest.GetResponse()
            Using reader As New StreamReader(response.GetResponseStream(), Encoding.UTF8)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    ' Pulls the converted code out of the HTML page returned by the translator
    Public Shared Function GetCodeFromHTML(ByVal input As String) As String
        ' Remove markup that gets in the way of loading the page as XML
        input = input.Replace("<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">", "")
        input = input.Replace("<?xml version=""1.0"" encoding=""utf-16""?>", "")
        ' Drop &nbsp; (not a predefined XML entity) and escape bare ampersands
        input = input.Replace("&nbsp;", "")
        input = input.Replace("&", "&amp;")
        input = input.Replace("/*]]>*/", "")

        Dim html As XElement
        Using sr As New StringReader(input)
            html = XElement.Load(sr)
        End Using

        ' The converted code lives in <ul id="code-result">, one <li> per line
        Dim code = (From data In html...<ul> Where data.@id = "code-result").FirstOrDefault()
        If code Is Nothing Then
            ' No code-result list in the page, so there is nothing to extract
            Return String.Empty
        End If

        Dim codeText As String = code.ToString
        codeText = codeText.Replace(vbCrLf, "")
        codeText = codeText.Replace("<ul id=""code-result"" xmlns=""http://www.w3.org/1999/xhtml"">", "")
        codeText = codeText.Replace("<li>", "")
        ' replace keyword tag with a space to fix parsing issues
        codeText = codeText.Replace("<span class=""keyword"">", " ")
        codeText = codeText.Replace("</span>", "")
        ' each list item becomes one line of code
        codeText = codeText.Replace("</li>", vbCrLf)
        codeText = codeText.Replace("</ul>", "")
        ' minimum effort removal of white space (up to 28 spaces)
        codeText = codeText.Replace(New String(" "c, 16), " ")
        codeText = codeText.Replace(New String(" "c, 12), " ")
        codeText = codeText.Replace(New String(" "c, 8), " ")
        codeText = codeText.Replace(New String(" "c, 6), " ")
        codeText = codeText.Replace(New String(" "c, 4), " ")
        codeText = codeText.Replace(New String(" "c, 2), " ")
        codeText = codeText.Replace(New String(" "c, 2), " ")

        Return codeText.Trim
    End Function
End Class
Isn't that a whole bunch more fun than buying a commercial code translator or using Reflector?
There are a few interesting things to note here.
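First, in VB.NET, you can import an XML namespace (the Imports <xmlns=...> line at the top of the file). This is required for the LINQ to XML query to work properly: the page's elements live in the XHTML namespace, so html...<ul> only matches them when that namespace is the imported default.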
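To make the namespace point concrete, here's a small sketch that runs the same lookup three ways against a made-up XHTML fragment. The fragment and the module name are invented for illustration; only the namespace URI comes from the helper class.

```vbnet
Imports <xmlns="http://www.w3.org/1999/xhtml">
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module NamespaceDemo
    Sub Main()
        ' Made-up page fragment whose elements are in the XHTML namespace.
        Dim html As XElement = XElement.Parse("<html xmlns=""http://www.w3.org/1999/xhtml""><body><ul id=""code-result""><li>Dim x = 1</li></ul></body></html>")

        ' With the Imports above, <ul> means {http://www.w3.org/1999/xhtml}ul.
        Dim viaImport = html...<ul>.FirstOrDefault()

        ' Without the import, the name has to be qualified by hand.
        Dim xhtml As XNamespace = "http://www.w3.org/1999/xhtml"
        Dim viaXName = html.Descendants(xhtml.GetName("ul")).FirstOrDefault()

        ' A plain "ul" has no namespace, so it does not match the XHTML element.
        Dim unqualified = html.Descendants("ul").FirstOrDefault()

        Console.WriteLine("Imported namespace found it: " & (viaImport IsNot Nothing))
        Console.WriteLine("Qualified XName found it: " & (viaXName IsNot Nothing))
        Console.WriteLine("Unqualified name found it: " & (unqualified IsNot Nothing))
    End Sub
End Module
```

The first two lookups find the list and the third comes back empty, which is the "doesn't work properly" case you hit if the Imports line is missing.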
Second, the code returned from this helper class is not pretty -- indentation isn't preserved. The web page returns HTML with lots of <span> tags to provide keyword coloring and other formatting. We dropped these tags, and all of the associated CSS formatting. I don't see this as a big deal, since I'm not editing code in Notepad. The IDE will take care of making the code look good.
Third, I'm sure there's some really cool way to parse the HTML tree that would make the string manipulation much simpler. Maybe a regular expression; that would be sweet. I didn't worry too much about it -- brute force worked well enough.
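One possible "parse the tree" variant, offered as an untested-against-the-real-site sketch rather than a drop-in replacement: since the result is already an XElement, the text of each <li> can be read through its Value property, which skips the nested <span> tags and decodes entities for free. GetCodeLines and CodeExtractor are made-up names, and the sketch assumes the same one-<li>-per-line layout that the string replacements rely on.

```vbnet
Imports <xmlns="http://www.w3.org/1999/xhtml">
Imports System
Imports System.Linq
Imports System.Xml.Linq

Public Module CodeExtractor
    ' Hypothetical alternative to the chain of Replace calls.
    ' codeList is expected to be the <ul id="code-result"> element.
    Public Function GetCodeLines(ByVal codeList As XElement) As String
        ' Value concatenates all text inside each <li>, ignoring the <span> markup.
        Dim lines = From item In codeList.<li> Select item.Value
        Return String.Join(vbCrLf, lines.ToArray())
    End Function
End Module
```

Two caveats. The brute-force version deliberately swaps each keyword <span> for a space "to fix parsing issues", and this sketch does not, so keywords could end up glued to their neighbours. Also, XElement.Load drops whitespace-only text nodes unless it is given LoadOptions.PreserveWhitespace, which may matter for spacing between adjacent spans.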
Most importantly, in my mind, this is a cool little way to use HTTP POST to send C# to a website, then screen scrape the results, and then get VB.NET code out. (And, yes, you could equally well use the VB.NET -> C# version of the translator web page.)
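Putting the pieces together, the directory walk might look roughly like this. It's a sketch, not the finished tool: ConverterUrl is a placeholder (the translator's real address isn't given here), the module name is made up, and there's no error handling or throttling of requests. It relies on the two ScreenScraper methods shown above.

```vbnet
Imports System
Imports System.IO

Module ConvertTree
    ' Placeholder only: substitute the address of the translator page you are using.
    Private Const ConverterUrl As String = "http://example.com/csharp-to-vb"

    Sub Main(ByVal args As String())
        ' Start from the folder given on the command line, or the current one.
        Dim root As String = If(args.Length > 0, args(0), Directory.GetCurrentDirectory())

        ' Visit every .cs file under root, including subfolders.
        For Each csFile As String In Directory.GetFiles(root, "*.cs", SearchOption.AllDirectories)
            Dim source As String = File.ReadAllText(csFile)

            ' POST the C# source, then scrape the VB.NET code out of the response.
            Dim page As String = ScreenScraper.GetHtmlPageWithPost(ConverterUrl, source)
            Dim vbCode As String = ScreenScraper.GetCodeFromHTML(page)

            ' Write Foo.vb next to Foo.cs.
            File.WriteAllText(Path.ChangeExtension(csFile, ".vb"), vbCode)
        Next
    End Sub
End Module
```

Writing the .vb file alongside the original keeps the folder structure intact, which should make a later csproj-to-vbproj pass easier.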