6 months in review


Mother of all Blog Posts

Building expertise in WPF / Silverlight

·         Core Programming Concepts: Declarative programming, Dependency properties, DataBinding, Commands, Triggers, VSM, Control templating, the inheritance model, and XAML extensions

·         Transforms and animations - SVG experience helps

·         Comprehend the core differences between SL and WPF: SL lacks Commands and Triggers (Behaviours and VSM are superior anyway), and it calls different types of services (WCF, Sockets, Duplex, ADO.NET Data Services, RIA Services) → naturally supports good architecture.

·         Silverlight 3 - navigation, out-of-browser experience, popups, etc.

·         MSDN + Adam Nathan’s book are a great place to kick off


·         Petzold’s WPF 3D Book – A great resource for learning 3D in general – it mixes 3D maths well with the core programmatic constructs of WPF 3D. There are 3 major limitations to WPF 3D: 1) Retained mode graphics have performance issues. 2) For a high-level scene graph, the primitives are still primitive – wouldn’t it be great if we had intrinsic data structures for rendering NURBS patches? 3) Point animation of a mesh vertex is on the heap (each interpolation is like writing to the same string object) → requires annoying workarounds.


·         Silverlight Animation book – collision detection, kinematics, particle systems, etc. I like the way that the VSM is leveraged for cartoon-like effects and the technique of programmatically manipulating the storyboard as a declarative timer.


·         Silverlight Toolkit / WPF Toolkit – especially DataGrid and Charts


·         Expression

o   Blend - how else would you template your controls? I feel pretty comfortable using this tool (may polish up the data binding side)

 

o   Encoder - media encoding with markers, leveraging IIS smooth streaming


o   DeepZoom – creating Image pyramids


o   (I have not yet dug deep into Expression Design)

·         RIA Services – a prescriptive model that takes n-tier development to the next level of abstraction.


·         Blend Behaviours – abstract out triggers and actions into declarative components

 

·         Perf tools

·         Automation using White

·         SilverlightSpy

·         SL unit testing

·         MVVM – the view has a view model and the view model has a model (whose change notifications it observes). Leverage XAML databinding and DelegateCommands (as the VM is out of the Logical Tree)

·         Composite Application Library – Constructing a multiview GUI leveraging MVVM, loading isolated modules via IoC and DI, mapping views to regions - Unity DI Container, DelegateCommands, EventAggregator, RegionManager etc.
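The MVVM relationship described above (the view model has a model and observes its change notifications) can be sketched roughly as follows. This is a minimal sketch, not the Composite Application Library's own code: `Counter` and `CounterViewModel` are hypothetical names, and `Increment` stands in for the body a DelegateCommand would invoke.

```csharp
using System;
using System.ComponentModel;

// Hypothetical model: raises change notifications that the view model observes.
public class Counter : INotifyPropertyChanged
{
    private int _value;
    public event PropertyChangedEventHandler PropertyChanged;

    public int Value
    {
        get { return _value; }
        set
        {
            if (_value == value) return;
            _value = value;
            var handler = PropertyChanged;
            if (handler != null) handler(this, new PropertyChangedEventArgs("Value"));
        }
    }
}

// Hypothetical view model: wraps the model and re-exposes it in a bindable shape.
public class CounterViewModel : INotifyPropertyChanged
{
    private readonly Counter _model;
    public event PropertyChangedEventHandler PropertyChanged;

    public CounterViewModel(Counter model)
    {
        _model = model;
        // Observe the model and forward its change as a change to Display.
        _model.PropertyChanged += (sender, e) => Raise("Display");
    }

    // A XAML binding such as {Binding Display} would target this property.
    public string Display
    {
        get { return "Count: " + _model.Value; }
    }

    // Stand-in for the execute delegate of a DelegateCommand.
    public void Increment()
    {
        _model.Value++;
    }

    private void Raise(string propertyName)
    {
        var handler = PropertyChanged;
        if (handler != null) handler(this, new PropertyChangedEventArgs(propertyName));
    }
}
```

The view never touches `Counter` directly; it only binds to the view model, which is why the view model can live outside the Logical Tree.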

Maths

In order to construct AI algorithms, you need 2 core subsets of knowledge: programmatic and mathematical skillsets. Ten years of practising Software Engineering have given me the ability to take a set of algorithms and construct an n-tier solution (with all the –ability adjectives {scalability, maintainability, performability (ha), etc}), while a straight mathematician would flounder on basic programming constructs. By learning the math, I can aggregate knowledge silos (eg in a few hours comprehend the gist of a highly specialized PhD thesis that someone spent 5 years writing). To that effect I’ve gone through the following Schaum’s Guides (plus watched Jason Gibson’s maths training DVD in Linear Algebra - http://www.mathtutordvd.com/ )

·         Advanced Calculus – reading through this felt like the sense of achievement one gets from climbing a mountain. I achieved a decent comprehension.


·         Probability – slightly simplistic. Stochastic trees are a great mathematical visualization technique for understanding Bayes’ Theorem. Useful for reasoning under uncertainty.


·         Linear Algebra – different techniques for solving systems of linear equations.


·         Operations Research – allocation of scarce resources


·         Numerical Recipes – function approximation. The notes were slightly sparse.
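As a worked illustration of the stochastic tree view of Bayes' Theorem mentioned in the Probability item, here is a small example with made-up numbers (the prevalence and test rates are assumptions chosen for easy arithmetic, not taken from the book). The tree first branches on the condition $D$ with $P(D) = 0.01$, then on a positive test with $P(+\mid D) = 0.99$ and $P(+\mid \neg D) = 0.05$. Multiplying along each path and normalising over the paths that end in "+" gives:

```latex
\begin{aligned}
P(D \mid +) &= \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)} \\
            &= \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.05 \times 0.99}
             = \frac{0.0099}{0.0594} = \frac{1}{6} \approx 0.167
\end{aligned}
```

Even with a 99% sensitive test, a positive result only lifts the probability to about 17%, because the "healthy but positive" branch of the tree carries five times more weight than the "ill and positive" branch.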


Putting things into perspective:

·         Larry Gonick’s history books – a great way to learn the tapestry of history in a compact and humorous manner


·         The Ascent of Money – I’ve always enjoyed Niall Ferguson’s geopolitical and socioeconomic analysis. Can we unify behavioral and quantitative finance?


·         Ending Aging – a snapshot of the current state of research into alleviating the effects of the 7 malady categories of aging. I corresponded with the author during my Bioinformatics days.


·         The Trouble with Physics – by Lee Smolin. String “Theory” (more like conjecture / cult of Ed Witten) is actually based on another unproven conjecture – supersymmetry. It is not currently falsifiable and has turned into consensus science, drawing attention away from other potential theories such as Twistor theory or Loop Quantum Gravity. M theory is just a vague concept and there are an infinite number of string theories – in programming, we call this a code smell. While it holds mathematically (and has led to new maths such as Gauge theory), it won’t be the 1st mathematically elegant theory that doesn’t hold in reality.


·         A New Kind of Science – just like cellular automata, Wolfram proposes that complexity evolves out of simple rules. Reads like a pop sci book. Strangely enough, the concepts of quantum or probabilistic states and states that modify rules are not covered!


.NET Parallel Development

·         TPL – a .NET 4 scale-up stack for data & task parallelization. Various levels of abstraction – PLINQ, parallel iterators, Tasks (similar to an automated threadpool), futures (tasks that promise to asynchronously return values), plus many new thread synchronization primitives. SEE http://blogs.msdn.com/pfxteam/default.aspx, http://msdn.microsoft.com/en-us/library/dd460693(VS.100).aspx , http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=348f73fd-593d-4b3c-b055-694c50d2b0f3

·         DryadLINQ – a declarative scale out API of MS Research for scaling processing out to multiple nodes - SEE http://research.microsoft.com/en-us/projects/DryadLINQ/ , http://connect.microsoft.com/site/sitehome.aspx?SiteID=891

 

·         Accelerator – a non-LINQ-enabled GPGPU API from MS Research which is limited to 2D matrices. It leverages a high level shader to map processing onto the GPU - SEE http://research.microsoft.com/en-us/projects/accelerator/default.aspx


·         Axum – a research API based upon the agent paradigm which isolates agent interactions through message channels. Feels clunky to me. SEE http://blogs.msdn.com/maestroteam/default.aspx, http://msdn.microsoft.com/en-us/devlabs/dd795202.aspx

·         Solver Foundation – (horrendous doco) - Optimization API host exposed as a web service, a C# API, or an Excel plugin. It succeeds where Excel Services failed. http://code.msdn.microsoft.com/solverfoundation


·         Windows HPC – a grid computing host management environment – includes a prioritized job scheduler and management utilities  - SEE http://msdn.microsoft.com/en-us/library/cc907080(VS.85).aspx , http://www.microsoft.com/hpc/en/us/default.aspx
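The three TPL abstraction levels named in the first item of this list (parallel iterators, PLINQ, and futures) could be sketched like this. It is a minimal sketch using the .NET 4 `System.Threading.Tasks` and PLINQ APIs; the class and method names (`TplSketch`, `SumOfSquares...`) are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TplSketch
{
    // Parallel iterator: each worker keeps a private subtotal, merged once at the end.
    public static long SumOfSquaresParallelFor(int n)
    {
        long total = 0;
        Parallel.For<long>(1, n + 1,
            () => 0L,                                        // initial per-worker subtotal
            (i, loopState, subtotal) => subtotal + (long)i * i,
            subtotal => Interlocked.Add(ref total, subtotal)); // thread-safe merge
        return total;
    }

    // PLINQ: the same computation expressed declaratively.
    public static long SumOfSquaresPlinq(int n)
    {
        return Enumerable.Range(1, n)
                         .AsParallel()
                         .Select(i => (long)i * i)
                         .Sum();
    }

    // Future: a Task<long> that promises the value; reading Result blocks until it is ready.
    public static Task<long> SumOfSquaresFuture(int n)
    {
        return Task.Factory.StartNew(() => SumOfSquaresPlinq(n));
    }
}
```

Calling `TplSketch.SumOfSquaresFuture(100).Result` waits for the task and yields the same value as the two synchronous variants. The per-worker subtotal in `Parallel.For` is there to avoid taking an interlocked operation on every iteration.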

 

.NET 4

·         Oslo + Quadrant – still half baked.

·         WF 4 – still half baked – currently no state machine or rule engine support evident. Note that persistence and tracking have been abstracted out to Dublin (WAS 2.0), which won’t be released in the .NET 4 timeframe.

·         Entity Framework – pretty easy to learn these changes

·         WCF – discovery, REST templates

·         System.Xaml – still half baked

·         What happened to Software Factories (Blueprints)?

·         Azure – what a mess! SQL Data Services doesn’t work, Cloud Workflows were a fork and have been scrapped and Silverlight SMEWAs require live mesh client! Note: Azure is NOT A GRID COMPUTING PLATFORM!!! I’ll wait for v2 thanks.

Work

·         Constructed Silverlight Timeline for Matach

·         Constructed prototype of Silverlight media playback system for Youniversity

·         Completed Amdocs’ staffing workflow system

·         Constructed Southridge WPF labs for MS

·         Constructed WPF control Library  for Elta

·         Attended the Mix conference

·         Taught the Metro WPF course in Israel, Australia, New Zealand, Turkey

·        App Profiling for IFN - fun with Ants profiler!

·         Constructed the Silverlight 3 Courseware Library for MS

·         Constructed WPF GIS control and architected Composite client for current customer.

 

author: JoshReuben | posted @ Sunday, July 05, 2009 2:13 PM | Feedback (0)

Dev Team Management best practices


  • Ensure your developers build code to meet the spec (the contract of customer requirements) in a timely manner
  • Ensure your developers build code changes that don’t break the spec and that pass functional requirements
  • Ensure your developers build code which is robust (not fragile) and that meets design and code quality guidelines
 
  • Make sure that the technical analyst provides appropriate specs!
  • Keep it deliverable – compile-able. No point building mounds of UML diagrams that don’t stay in sync with the project
  • The spec consists of: Use Case Scenarios, Interfaces, Code Contracts, Unit Tests, pre & post conditions.
  • To plug in Use Case Scenarios, you need a generic baseline architecture.
 
  • Scenarios and Tasks
    • Each use case focuses on describing how an end user or client can achieve a goal.
    • Forces you to think about the spec functional requirements.
    • decompose into Tasks
    • Scenarios and their Tasks map to VSTS.
    • checkin policies force developers to associate each changeset with a task – track progress, avoid unnecessary work.
    • From nouns in scenarios identify entities. Each task maps to a method.
    • Consider Workflows – each scenario maps to a sequential or state machine Workflow and each task maps to an Activity
  • VSTS benefits
    • 20 tools in one
    • Integrated work item list – scenarios, bugs, tasks, tests, QoS
    • Source control with Checkin policies
    • Database schema version control, comparison and test data generation
    • Real Project health tracking via reportage
    • Integrated testing environment – 6 types of test: regression, web, load, custom, manual
    • Build & test servers
 
  • Interfaces for Design By Contract
    • define precise verifiable interface specifications for software components
    • In VS, can generate classes from Interfaces, then automatically construct class diagram
    • Each interface method should map to a task
    • The interfaces force the developer to implement all tasks
  • Interface Code Contracts
    • See previous post!
    • Applies Preconditions and Postconditions -
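using System.Diagnostics.Contracts;

// The interface points at the class that carries its contracts.
[ContractClass(typeof(IFooContract))]
interface IFoo
{
    int Count { get; }
    void Put(int value);
}

// Contract class: never called directly; it only holds the contracts for IFoo.
[ContractClassFor(typeof(IFoo))]
abstract class IFooContract : IFoo
{
    int IFoo.Count
    {
        get
        {
            // Postcondition: Count is never negative.
            Contract.Ensures(0 <= Contract.Result<int>());
            return default(int); // dummy return
        }
    }

    void IFoo.Put(int value)
    {
        // Precondition: callers must pass a non-negative value.
        Contract.Requires(0 <= value);
    }
}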
 
 
  • Ensure code standards in [relevant] rules are met – checks: Correctness, Library design , Localization , Naming conventions , Performance , Security
  • Analyzes the compiled code for conformance to Microsoft's .NET Framework Design Guidelines.
  • One of the easiest “code smells” to deodorise is that of "duplicate code" - Simian is a Similarity Analyser that detects duplications in source code (C#, C, C++, ASP, XML, HTML etc); once detected you can easily locate the duplication and perform an "Extract Method" refactor in Visual Studio.
  • Simian is a command line tool - but you can integrate it into Visual Studio via the extensibility of the External Tools menu. Here is an example of the standard output produced by Simian when run against some source code:
Similarity Analyser 2.2.23 - http://www.redhillconsulting.com.au/products/simian/index.html
Copyright (c) 2003-08 RedHill Consulting Pty. Ltd. All rights reserved.
Simian is not free unless used solely for non-commercial or evaluation purposes.
{failOnDuplication=true, ignoreCharacterCase=true, ignoreCurlyBraces=true, ignoreIdentifierCase=true, ignoreModifiers=true, ignoreStringCase=true, threshold=6}
Found 6 duplicate lines in the following files:
 Between lines 201 and 207 in /Users/haruki_zaemon/Projects/redhill/simian/build/dist/src/java/awt/image/WritableRaster.java
 Between lines 1305 and 1311 in /Users/haruki_zaemon/Projects/redhill/simian/build/dist/src/java/awt/image/Raster.java
Found 6 duplicate lines in the following files:
 Between lines 920 and 926 in /Users/haruki_zaemon/Projects/redhill/simian/build/dist/src/com/sun/imageio/plugins/jpeg/JFIFMarkerSegment.java
 Between lines 908 and 914 in /Users/haruki_zaemon/Projects/redhill/simian/build/dist/src/com/sun/imageio/plugins/jpeg/JFIFMarkerSegment.java
 
  • enforce a common set of best practices for layout, readability, maintainability, and documentation of C# source code.
  • produce elegant, consistent code that your team members and others who view your code will find highly readable.
 
  • Ensure that all methods work – that changes to code don’t break other methods
  • Generate a Test project – each [Task mapped] method should have a postcondition test
  • Regression test – attach the util to the DLL
  • Automatically generate unit tests using Pex:
 
Enterprise Library and Policy Injection
  • Improved Productivity (satisfy common application concerns out of the box) , Configuration Driven Design
  • EntLib application blocks - Caching , Data Access (redundant), Cryptography , Exception Handling , Logging , Security , Validation
  • Policy Injection – Aspect oriented programming - separate cross-cutting logic from domain-specific logic. Inversion of Control - instead of the programmer specifying cross-cutting functionality via function calls, register desired responses to events defined by policy rules.
  • Eg for a function - inject validation, tracing, exception handling, authorization, caching, performance counter updates. Using the PIAB, such functionality would be introduced through the creation of policies that define the implementation details needed at run time. Policies can be applied to objects through configuration or by decorating classes with attributes. Each policy has a set of matching rules to determine where it should be applied, and a collection of handlers that determine which behaviors should be introduced when invoking methods on the object. The PIAB comes with a number of useful handlers for validation, logging, exception handling, managing performance counters, authorization, and caching. Many of these handlers leverage EntLib’s other application blocks. PIAB supports matching rules based on types, assemblies, member names, method signatures, namespaces, parameter types, property types, return types, custom attributes, and tags.
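The idea of a handler collection wrapped around a method call can be sketched without the PIAB at all. The following is a hand-rolled illustration of the pattern, not the Enterprise Library API: `CallHandler`, `PolicySketch`, `Apply`, `NonNegative` and `Log` are all hypothetical names, and the "matching rule" step is left out (the caller picks the target explicitly).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical handler shape: receives the argument and the rest of the pipeline.
public delegate TResult CallHandler<TArg, TResult>(TArg arg, Func<TArg, TResult> next);

public static class PolicySketch
{
    // Wraps the target in the given handlers; the first handler ends up outermost.
    public static Func<TArg, TResult> Apply<TArg, TResult>(
        Func<TArg, TResult> target,
        params CallHandler<TArg, TResult>[] handlers)
    {
        Func<TArg, TResult> pipeline = target;
        for (int i = handlers.Length - 1; i >= 0; i--)
        {
            CallHandler<TArg, TResult> handler = handlers[i];
            Func<TArg, TResult> next = pipeline;
            pipeline = arg => handler(arg, next);
        }
        return pipeline;
    }

    // Validation concern: rejects negative input before the target runs.
    public static CallHandler<int, int> NonNegative()
    {
        return (arg, next) =>
        {
            if (arg < 0) throw new ArgumentOutOfRangeException("arg");
            return next(arg);
        };
    }

    // Logging concern: records entry and exit around the rest of the pipeline.
    public static CallHandler<int, int> Log(List<string> log)
    {
        return (arg, next) =>
        {
            log.Add("enter " + arg);
            int result = next(arg);
            log.Add("exit " + result);
            return result;
        };
    }
}
```

A call such as `PolicySketch.Apply<int, int>(x => x * x, PolicySketch.Log(log), PolicySketch.NonNegative())` returns a delegate that logs, validates, and only then squares. The domain function itself contains no cross-cutting code, which is the point of the policy approach.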
 
 

author: JoshReuben | posted @ Saturday, November 22, 2008 8:32 AM | Feedback (1)

Code Contracts - the dark horse of the PDC


·         Make coding assumptions explicit and tool-discoverable. Code Contracts provide a language-agnostic way to express coding assumptions in .NET programs. The contracts take the form of pre-conditions, post-conditions, and object invariants. Contracts act as checked documentation of your external and internal APIs. The contracts are used to improve testing via runtime checking, enable static contract verification, and generate documentation.
 
 
·         bring the advantages of design-by-contract programming to all .NET programming languages. benefits:
o   Improved testability – A) each contract acts as an oracle, giving a test run a pass/fail indication. B) automatic testing tools, such as Pex, can take advantage of contracts to generate more meaningful unit tests by filtering out meaningless test arguments that don't satisfy the pre-conditions.
o   Static verification - use contracts to reduce false positives and produce more meaningful errors.
o   API documentation - The same contracts used for runtime testing and static verification can also be used to generate better API documentation, such as which parameters need to be non-null, etc.
·         consists of a library (the Contract Language) for writing pre-conditions, post-conditions, and object invariants. The use of a library has the advantage that all .NET languages can immediately take advantage of contracts. There is no need to write a special parser or compiler. Furthermore, the respective language compilers naturally check the contracts for well-formedness (type checking and name resolution) and produce a compiled form of the contracts as MSIL. Authoring contracts in Visual Studio allows programmers to take advantage of the standard intellisense provided by the language services.
·         Previous approaches based on .NET attributes fall far short as they neither provide an expressive enough medium, nor can they take advantage of compile-time checks.
·         tools: 1) cccheck, a static checker that verifies contracts at compile-time. 2) ccrewrite, for generating runtime checking from the contracts. The plan is to add further tools for Automatic API documentation generation & Intellisense integration
·         Contracts are expressed using static method calls at method entries. Tools take care to interpret these declarative contracts in the right places. These methods are found in the System.Diagnostics.Contracts namespace.
o   Contract.Requires takes a boolean condition and expresses a pre-condition of the method. A pre-condition must be true on entry to the method. It is the caller's responsibility to make sure the pre-condition is met.
o   Contract.Ensures takes a boolean condition and expresses a post-condition of the method. A post-condition must be true at all normal exit points of the method. It is the implementation's responsibility that the post-condition is met.
·         Download - At the moment, only a release for VS2008 with an academic license is available here.
 The devlab version for VS2010 is coming soon. 
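To make the Contract.Requires / Contract.Ensures description above concrete, here is a minimal sketch on an ordinary method. `ContractSketch` and `IntegerSqrt` are hypothetical names. Note the assumption about tooling: without the `CONTRACTS_FULL` symbol and the ccrewrite step, the compiler drops these calls, so the method still runs but nothing is checked at run time.

```csharp
using System;
using System.Diagnostics.Contracts;

public static class ContractSketch
{
    // Largest integer whose square does not exceed n.
    public static int IntegerSqrt(int n)
    {
        // Pre-condition: the caller is responsible for passing a non-negative value.
        Contract.Requires(n >= 0);
        // Post-conditions: the implementation is responsible for these on normal exit.
        Contract.Ensures(Contract.Result<int>() >= 0);
        Contract.Ensures(Contract.Result<int>() * Contract.Result<int>() <= n);

        int root = 0;
        // Widen to long so the comparison cannot overflow near int.MaxValue.
        while ((long)(root + 1) * (root + 1) <= n)
        {
            root++;
        }
        return root;
    }
}
```

All contracts sit at the method entry even though the post-conditions describe the exit; the tools relocate them, which is what "declarative" means here.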
 
 
 

author: JoshReuben | posted @ Sunday, November 02, 2008 9:32 PM | Feedback (0)

Oslo Required!


  • Limitations in workflows requiring workarounds:
  • Designer is slow – make a coffee slow!
  • A setStateActivity is not the end of the Workflow’s current flow!
  • In binding, how do you access the properties of dependency property sub-properties?
  • a dependency property custom activity for ReceiveActivity binding must be inside the ReceiveActivity
  • In rule condition, you cannot reference a dependency property in a custom activity
  • a code activity cannot work on an adjacent activity without explicitly specifying its name
  • declarative rules condition editor doesn’t support LINQ eg .contains
  • declarative rules condition editor doesn’t support constants
  • Copying an activity does not copy name for modification
  • No data flow, only control flow → requires a lot of code and tree traversal
  • No support for SendActivity custom WCF bindings
  • Cannot use a DelayActivity from inside an EventDrivenActivity
  • Instance versioning
  • RuleActionTrackingEvent only works with PolicyActivity, not IfElseActivity

author: JoshReuben | posted @ Friday, October 24, 2008 7:07 PM | Feedback (0)

10 posts in 1


  Imagine that every project in codeplex had to be decomposed into codeplex hosted unit tested functions which complied with some xunit standard - that would go a long way towards code reuse, and move away from reinventing the wheel.

  Worked on 2 Silverlight projects recently.

Last 3 months learned the following:

  • .NET Components revise – reread Juval Löwy’s book – revised best mechanism for interface usage, delegates (circa .NET Framework 2.0), threading. The Interface stuff leads me to think upon the design patterns Decorator, Bridge and Proxy, and on the Unity AB IoC. The threading stuff is good – advises not to go low level – forget Monitor, Mutex, AutoResetEvent and just use Synchronization domains – though it is superseded by the PFX TPL and PLINQ Task and Future<T>. The delegate stuff was written before lambda expressions and OOB delegate types.
  • WCF changes since NET Framework 3.5: Web Programming – exposing Atom Syndication feeds, non SOAP endpoints (POST, GET) , JSON objects for AJAX clients
  • ASP.NET MVC – route engine is nice. I like the way URIs map to controller actions (whose return type is unit testable). Controllers then access models and load appropriate views. Seems to abandon the usage of classic ASP.NET server controls – a slight adoption problem.
  • Silverlight 2 – covered everything. Some considerable differences from WPF – no property inheritance, no 3D support, binding syntax is different. Thought about a distributed Silverlight / WCF Grid computing.
  • FxCop Rule creation – looked at the clunky introspection API (as opposed to reflection).
  • LINQ to Entities – ObjectContext instead of DataContext. Differences: M-M support in the EDM, object services, lazy vs eager loading. Some stuff on precompiled queries. IUpdatable as a complement to IQueryable
  • Unity AB – Inversion of control / dependency injection. Basically via config or on demand can register types and instances. Used in Prism
  • C# revise – always relearn something new (an anti-oxymoron!). yield statement, boxing
  • VSTO – nothing useful here. In C# every VBA wrapper method must pass 50 nulls !
  • Silverlight Services – how to host a .xap file on the mesh
  • Excel Services – interesting. web services for accessing Sharepoint hosted spreadsheets – display in WebParts. Why is this coupled with MOSS?
  • VLinq – an access like LINQ designer. I await OOB.
  • Read Book LINQ in Action – interesting section was on predefined delegate types - Func, Action, Predicate, and expression trees – will dig more here.
  • Book:ASP.NET Pageflakes  - Ajax and ASP.NET caching & membership optimizations for a portal like site. However, I am more interested in Silverlight – ASP.NET & AJAX are passé
  • Learn License theory – one day I will drive in Israel – even though everybody else is on the wrong side of the road.
  • Windows HPC – scale out Grid computing for embarrassingly parallel algorithms (in compliance with Amdahl's law) – especially for Financial Services – Distribute your Black-Scholes.
  • .NET Framework 3.5 SP1 changes – ASP.NET Dynamic Data, SQL Server 2008 DATE & FILESTREAM datatype support, WPF optimizations
  • Entity framework – EDM 3 xml files (CSDL, SSDL and MSL)  supports M-M scenarios, MEST. WCF support. ESQL differences from TSQL (notion of association vs relation). ObjectServices – looks like LINQ to Entities but is actually extension methods that call ESQL with ObjectParameter params.
  • Data Services Framework – like Workflow Services, this is an abstraction over WCF. Exposes Entity Framework via RESTful querystrings. Support for batching & concurrency
  • Prism – guidance for Modular WPF. AB utilizes Unity in Bootstrapper, modules use services and place views in regions. Would be nice if it leveraged some of the architecture of ASP.NET MVC.
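The predefined delegate types mentioned in the LINQ in Action item can be shown side by side in a few lines. This is only a sketch with made-up names (`DelegateSketch`, `Doubler`, `IsEven`, `IsPositiveExpr`); the delegate and expression types themselves are the standard `System` and `System.Linq.Expressions` ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class DelegateSketch
{
    // Func<T, TResult>: takes an argument and returns a value.
    public static readonly Func<int, int> Doubler = x => x * 2;

    // Predicate<T>: takes an argument and returns bool.
    public static readonly Predicate<int> IsEven = x => x % 2 == 0;

    // Expression<TDelegate>: the same lambda syntax, but stored as a data
    // structure that can be inspected or compiled later.
    public static readonly Expression<Func<int, bool>> IsPositiveExpr = x => x > 0;

    // Action<T>: takes an argument and returns nothing; used here as a trace callback.
    public static List<int> DoubleTheEvens(List<int> input, Action<string> trace)
    {
        List<int> evens = input.FindAll(IsEven); // List<T>.FindAll expects a Predicate<T>
        var result = new List<int>();
        foreach (int value in evens)
        {
            result.Add(Doubler(value));
        }
        trace("kept " + evens.Count + " of " + input.Count);
        return result;
    }

    public static bool IsPositive(int value)
    {
        // Compile turns the expression tree into an ordinary delegate.
        Func<int, bool> compiled = IsPositiveExpr.Compile();
        return compiled(value);
    }
}
```

The expression tree variant is the one that LINQ providers rely on: because the lambda is kept as data, a provider can translate it into another query language instead of executing it as IL.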

I have been contemplating the concept of ratios and magnitude. the Pi ratio, the Riemann Sum and the definition of rational numbers vs irrational numbers.

Until PDC, .NET is in a plateau - while there are plenty of areas I would like to improve in (IIS 7, performance, debugging), I am taking a hiatus for 4 months from tech knowledge in order to dive down the mathematical rabbit hole - calculus, linear algebra, probability and numerical methods.

author: JoshReuben | posted @ Tuesday, June 03, 2008 12:45 PM | Feedback (0)

Sharepoint Architecture


About 2 months ago, I had to evaluate MOSS / WSS - compare its features and capacity plan. I'm not sure I want to go down the Sharepoint path - it's too restrictive, too meta-programmy, it doesn't give you OOTB AJAX enabled WebParts, and it is a different path than Silverlight (maybe the next version will be SilverPoint?). On the flipside, the concepts of pluggable WebParts, InfoPath, Excel Services and Sharepoint Workflows are appealing.

Basically, to learn sharepoint you have to read the following:

author: JoshReuben | posted @ Monday, March 03, 2008 5:34 AM | Feedback (1)

Knowledge accumulation


Read 3 good books this month:

1) The Emotion Machine - by Marvin Minsky - postulates how emotional states are a mechanism for changing the priority weighting of our cognitive machinery in regards to goals and tasks. A fair bit of conjecture as opposed to experimentally backed up theory, and some rehash of his previous work 'The Society of Mind'.

2) MultiAgent Systems - Wooldridge - design of distributed workflow services that have different goals and tasks, and which often must compete, communicate and coordinate. A lot of different ideas on modal logic, game theory and auctions. A lot of the communication protocols are redundant with WCF. Strangely, no mention of probabilistic models.

3) Programming the Universe - Seth Lloyd - a fascinating introduction to the science of quantum computing. Unlike PLINQ, where parallel threads must run tasks that don't interfere with each other's resources, quantum computing is actually optimized for massively parallel tasks that contain side effects! Eg searching for the 2 prime factors that make up a public encryption key. Postulates the universe may be a quantum computer computing itself - 'it from bit'.

author: JoshReuben | posted @ Friday, February 08, 2008 6:19 AM | Feedback (0)

Workflow services


·         Workflow services, new for 3.5, are services that are authored using workflows. Durable services are services that use a persistence provider to persist state information after an operation has completed.
·          The implementation of the service contract is handled through one or more ReceiveActivity activities, which are sequence activities that support either one-way or request/response message exchanges with a client. The client invokes operations through SendActivity activities, which are basic activities that support the same message exchange scenarios as the workflow service.
Workflow Services Samples
·         Calculator Client Sample - the client application that is used with the calculator state machine service.
·         Durable Service Sample - implement a basic calculator as a durable service.
·         Sequential Workflow Service Sample - create a workflow service by using a sequential workflow & create a service contract in place.
·         State Machine Workflow Service Sample - Demonstrates how to create a workflow service by using a state machine workflow. implement a basic calculator by using a state machine workflow.
·         Workflow First and Security Sample - Demonstrates the "workflow first" method of authoring services as well as security features within workflow services.
·         Conversations Sample - Demonstrates how a workflow service can have parallel conversations with multiple clients over the same contract.
·         Duplex Workflow Service Sample - Demonstrates how to perform asynchronous duplex communication between two communicating services. Also demonstrates how to perform localhost-to-workflow communication by using messages.
·         Workflow Service Utilities - Contains all the utilities that the other samples in this section use to manipulate the context and create the listener infrastructure for local services.

author: JoshReuben | posted @ Monday, January 21, 2008 6:14 AM | Feedback (0)

List of thought processes


Here's an interesting Wikipedia topic - http://en.wikipedia.org/wiki/List_of_thought_processes

This is a list of thinking styles, methods of thinking (thinking skills), and types of thought.

I wonder how many of these have been abstracted into algorithms and computational models?

I've read 2/3 of Norvig's AI book, and I can see that serious attempts have been made at problem solving, reasoning, planning, reasoning under uncertainty and learning. I've also looked into the problem of Attention.

 

author: JoshReuben | posted @ Sunday, January 13, 2008 6:25 AM | Feedback (0)

SSAS DM Algorithms


·         DM algorithms that come with SSAS
o        Decision Trees Algorithm: uses the values, or states, of the designated “input columns” to predict the states of the column that was designated as “predictable”. It identifies the attribute tree that best predicts the result. allows for interplay between attributes and provides a hierarchy of attribute definitions that can be used to take a decision.
o        Clustering Algorithm: grouping of the cases that contain similar characteristics. Identifies how the data forms subgroups and how these subgroups are different from each other. finds patterns without a specific target result.
o        Naive Bayes Algorithm: Identifies the attribute that is most likely to predict the result. less computationally intense than others - useful for quickly generating a DMM to discover relationships between input columns and predictable columns. Use to do initial explorations of data, and then later apply the results to create additional DMMs with other algorithms that are more computationally intense and more sophisticated.
o        Association Algorithm: Association models are built on datasets that contain identifiers both for individual cases and item set that the cases contain. An association model is made up of a series of item sets and the rules that describe how those items are grouped together within the cases. The rules that the algorithm identifies can be used to predict a customer's likely future purchases, based on the items that already exist in the customer's shopping cart. It basically identifies the subgroup of data that participates in a specific transaction.
o        Sequence Clustering Algorithm:  Identifies the event that is likely to happen next. takes a sequence of events as input parameter and is well suited for click stream. similar to the Clustering Algorithm. However, instead of finding clusters of cases that contain similar attributes, this algorithm finds clusters of cases that contain similar paths in a sequence.  
o        Time Series Algorithm:  for predicting continuous columns such as product sales. Unlike the models built by other Microsoft algorithms, a time series model bases its forecast only on the trends that the algorithm derives from the original dataset. It basically identifies the trends in the current data and predicts future values from them.
o        Neural Network Algorithm:  Similar to the Decision Trees algorithm, this algorithm also identifies the attribute tree that best predicts the result, but analyzes more than 2 attributes at a time. It calculates probabilities for each possible state of the input attribute when given each state of the predictable attribute.
o        Logistic Regression Algorithm: a variation of the Neural Network algorithm, where the HIDDEN_NODE_RATIO parameter is set to 0. This setting will create a neural network model that does not contain a hidden layer, and that therefore is equivalent to logistic regression.
o        Linear Regression Algorithm: variation of the Decision Trees algorithm, where the MINIMUM_LEAF_CASES parameter is set to be greater than or equal to the total number of cases in the dataset that the algorithm uses to train.
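The claim in the Logistic Regression item, that a neural network without a hidden layer is equivalent to logistic regression, can be written out directly. This is a sketch under the assumption of a single binary predictable attribute and a sigmoid output unit (the source does not state the activation function); $\mathbf{x}$ is the vector of input attributes, and $\mathbf{w}$ and $b$ are the learned weights and bias. With no hidden layer, the inputs feed the output unit directly:

```latex
P(y = 1 \mid \mathbf{x}) = \sigma\!\left(\mathbf{w}^{\top}\mathbf{x} + b\right)
                         = \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x} + b)}}
\qquad\Longleftrightarrow\qquad
\ln\frac{P(y = 1 \mid \mathbf{x})}{1 - P(y = 1 \mid \mathbf{x})} = \mathbf{w}^{\top}\mathbf{x} + b
```

The right-hand form is the defining property of logistic regression: the log-odds of the predicted state are a linear function of the inputs.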
DM strategies - 2 main kinds of models: predictive & descriptive
  • Predictive Models - classification, regression, time series analysis, prediction. Can be used to forecast explicit values, based on patterns determined from known results.
o        Classification algorithms - predict one or more discrete variables, based on the other attributes in the dataset. E.g. Decision Trees Algorithm.
o        Regression algorithms - predict one or more continuous variables, based on other attributes in the dataset. E.g. Linear Regression Algorithm.
o        Time Series algorithms - forecast patterns based on the current set of continuous predictable attributes. E.g. Time Series Algorithm.
o        Prediction - the estimation of future outcomes. Works on continuous attributes. E.g. Time Series and Decision Trees Algorithms.
  • Descriptive Models - clustering, summarization, association rules, sequence discovery. Describe patterns in existing data, and are generally used to create meaningful subgroups such as demographic clusters.
o        Segmentation algorithms - divide data into groups, or clusters, of items that have similar properties. E.g. Clustering Algorithm.
o        Summarization algorithms - similar to clustering, but instead of only grouping the data they quantify the members of each group, e.g. reporting that group 1 contains the most line items and is the most likely to occur. E.g. Clustering Algorithm.
o        Association algorithms - find correlations between different attributes in a dataset. Typically used to create association rules, which can be used in a market basket analysis. E.g. Association Algorithm.
o        Sequence analysis algorithms - summarize frequent sequences or episodes in data, such as a Web path flow. E.g. Sequence Clustering Algorithm.
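As a rough illustration of sequence analysis on a Web path flow, the sketch below counts which page most often follows each page and uses that to guess the next event. It is only a first-order transition count, not the Sequence Clustering Algorithm itself (which also groups the sequences into clusters); the session data and the names `transition_counts` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream sessions: each list is one visitor's page path.
sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "reviews", "products", "cart"],
    ["home", "search", "products", "cart", "checkout"],
    ["home", "products", "cart"],
]

def transition_counts(paths):
    """Count how often each page is immediately followed by each other page."""
    counts = defaultdict(Counter)
    for path in paths:
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return counts

def predict_next(counts, page):
    """Return the most frequent follower of page, or None if none was seen."""
    followers = counts.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

counts = transition_counts(sessions)
for page in ("home", "products", "cart"):
    print(page, "->", predict_next(counts, page))
```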
  • Choosing the right algorithm for a specific business task - this can be a challenge. While you can use different algorithms to perform the same business task, each algorithm produces a different result, and some algorithms can produce more than one type of result. E.g. you can use the Microsoft Decision Trees algorithm not only for prediction, but also as a way to reduce the number of columns in a dataset, because the decision tree can identify columns that do not affect the final DMM.
  • Combining algorithms - you can apply more than one algorithm to the same business task, analyze the results, and choose the model that fits best, based on accuracy and on the business need. Lift charts check the accuracy of the DMMs once they are built on the input data. You can also use algorithms together: use some algorithms to explore data, and then use other algorithms to predict a specific outcome based on that data. E.g. you can use a clustering algorithm, which recognizes patterns, to break data into groups that are more or less homogeneous, and then use the results to create a better decision tree model. Finally, you can use multiple algorithms within one solution to perform separate tasks. E.g. a regression tree algorithm can be used to obtain financial forecasting information, and a rule-based algorithm to perform a market basket analysis.
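The idea of comparing models by lift can be sketched as follows. This uses the common definition of lift (response rate among the top-ranked cases divided by the overall response rate), which may differ in detail from how the lift chart tool computes it. The scores, the outcomes and the function name `lift_at` are made-up assumptions, and the function assumes at least one responder in the data.

```python
def lift_at(scored_cases, fraction):
    """Lift for the top `fraction` of cases ranked by predicted score.

    scored_cases: list of (predicted_probability, actual_outcome) pairs,
    where actual_outcome is 1 for a responder and 0 otherwise.
    """
    ranked = sorted(scored_cases, key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(actual for _, actual in ranked[:top_n]) / top_n
    overall_rate = sum(actual for _, actual in ranked) / len(ranked)
    return top_rate / overall_rate

# Hypothetical test set: the same ten cases scored by two different models.
actual = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
model_b = [0.5, 0.9, 0.2, 0.8, 0.7, 0.6, 0.1, 0.3, 0.4, 0.05]

for name, scores in (("model A", model_a), ("model B", model_b)):
    lift = lift_at(list(zip(scores, actual)), 0.3)
    print(f"{name}: lift in top 30% = {lift:.2f}")
```

A lift above 1 means the model finds responders faster than random selection would; the model with the higher lift at the fraction of the population you intend to target is the better candidate for that task.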

  • The bottom line - Task & algorithms to use
o        Predicting a discrete attribute. E.g. to predict whether the recipient of a targeted mailing campaign will buy a product. Use: Decision Trees, Naive Bayes, Clustering, Neural Network
o        Predicting a continuous attribute. E.g. to forecast next year's sales. Use: Decision Trees, Time Series
o        Predicting a sequence. E.g. to perform a clickstream analysis of a company's Web site. Use: Sequence Clustering
o        Finding groups of common items in transactions. E.g. to use market basket analysis to suggest additional products to a customer for purchase. Use: Association, Decision Trees
o        Finding groups of similar items. E.g. to segment demographic data into groups to better understand the relationships between attributes. Use: Clustering, Sequence Clustering
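The task list above can also be kept as a small lookup table. The sketch below simply restates that list; the dictionary keys and the helper names `candidate_algorithms` and `tasks_for` are illustrative assumptions, not part of any Microsoft API.

```python
# Task -> candidate algorithms, restating the list above.
ALGORITHMS_BY_TASK = {
    "predict a discrete attribute": [
        "Decision Trees", "Naive Bayes", "Clustering", "Neural Network",
    ],
    "predict a continuous attribute": ["Decision Trees", "Time Series"],
    "predict a sequence": ["Sequence Clustering"],
    "find groups of common items in transactions": [
        "Association", "Decision Trees",
    ],
    "find groups of similar items": ["Clustering", "Sequence Clustering"],
}

def candidate_algorithms(task):
    """Algorithms worth trying for a task (empty list if the task is unknown)."""
    return ALGORITHMS_BY_TASK.get(task, [])

def tasks_for(algorithm):
    """Reverse lookup: every task for which the algorithm is a candidate."""
    return [task for task, algos in ALGORITHMS_BY_TASK.items() if algorithm in algos]

print(candidate_algorithms("predict a continuous attribute"))
print(tasks_for("Decision Trees"))
```

The reverse lookup shows at a glance which algorithms cover several tasks, which is useful when you plan to build more than one model on the same data.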

author: JoshReuben | posted @ Monday, December 31, 2007 6:19 AM