Geeks With Blogs
Josh Reuben
Big Data keeps evolving. Stone Age Hadoop was a lot of Java boilerplate for defining HDFS access, Mapper & Reducer. This was superseded by Bronze Age Spark, which provided a succinct Scala unification of: ML pipelines, in-memory structured DataSets over RDDs via a SparkSession SQL API, and Distributed Streams (Note: You can run such jobs easily in a dynamically ......

The Docker CLI commands actually encapsulate REST calls to the host Docker Daemon (https://docs.docker.c...). The daemon listens on unix:///var/run/docker.sock, which you can curl into:
curl --unix-socket /var/run/docker.sock http:/containers/json
In theory, a Container app could be Cluster-enabled via startup ......
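To make the point concrete, here is a minimal Python sketch that sends the same GET /containers/json request as the curl call, but over a raw Unix socket using only the standard library. It assumes the default socket path above and that the daemon answers an HTTP/1.0 request with a body terminated by closing the connection; the helper names (build_get_request, parse_response, docker_get) are hypothetical, and error statuses and chunked responses are not handled.

```python
import json
import socket

# Default socket path from the post; adjust if your daemon listens elsewhere.
DOCKER_SOCK = "/var/run/docker.sock"


def build_get_request(path: str, host: str = "localhost") -> bytes:
    """Build a raw HTTP/1.0 GET request for the Docker Engine API.

    HTTP/1.0 is used (an assumption of this sketch) so the server closes the
    connection after replying and we can read the body until EOF.
    """
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_response(raw: bytes) -> tuple:
    """Split a raw HTTP response into (status_code, body_bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    status = int(status_line.split()[1])
    return status, body


def docker_get(path: str, sock_path: str = DOCKER_SOCK):
    """GET `path` from the daemon over its Unix socket; returns (status, parsed JSON)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        sock.sendall(build_get_request(path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    status, body = parse_response(b"".join(chunks))
    return status, (json.loads(body) if body else None)
```

On a host where your user may read the socket, docker_get("/containers/json") should return the status code together with the same JSON list that curl prints.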

HashiCorp Consul (https://www.consul.io/) provides an easy-to-use, multi-region Service Discovery / Health-Check + distributed config key-value store. If you have ever hacked away at coding ZooKeeper to support distributed systems, you'll find that Consul's Agent-based architecture requires an order of magnitude less effort to roll out. Here's my Consul QuickRef: CLI options: agent ......
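As a small taste of the key-value side, the sketch below reads a key through a local agent's HTTP KV endpoint (/v1/kv/&lt;key&gt;), which returns a JSON array of entries whose Value field is base64-encoded (or null). It assumes an agent serving its HTTP API on the default port 8500 on localhost; the function names are hypothetical, and ACL tokens, consistency modes and recursive reads are left out.

```python
import base64
import json
import urllib.request

# Assumption: a local Consul agent serving its HTTP API on the default port 8500.
CONSUL_ADDR = "http://127.0.0.1:8500"


def decode_kv_response(payload: str) -> dict:
    """Map each Key to its decoded Value from a /v1/kv response body.

    Each entry's "Value" is base64-encoded, or null when the key holds no value.
    """
    result = {}
    for entry in json.loads(payload):
        raw = entry.get("Value")
        result[entry["Key"]] = base64.b64decode(raw).decode("utf-8") if raw is not None else None
    return result


def consul_kv_get(key: str, addr: str = CONSUL_ADDR) -> dict:
    """Fetch a key via the agent; urllib raises HTTPError (404) if the key is missing."""
    with urllib.request.urlopen(f"{addr}/v1/kv/{key}") as resp:
        return decode_kv_response(resp.read().decode("utf-8"))
```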

Redis is an in-memory NoSQL store that can store 5 types of data: Strings, Lists, Sets, Sorted Sets and HashMaps. It is a good choice for fast reads/writes in distributed systems with a relatively small dataset whose size is bound by memory capacity. v2.8.x is supported by AWS ElastiCache for a DevOps-free deploy, which supports a replication strategy of 1 ......
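To make the five data types concrete, here is a toy, in-process Python model of them, including the WRONGTYPE error Redis raises when a command targets a key holding a different type. It only approximates the semantics of SET/GET, LPUSH, SADD, HSET, ZADD and ZRANGE (return values simplified, and reads of missing keys create empty containers, unlike Redis); the class names are hypothetical and this is not a Redis client library.

```python
class WrongTypeError(Exception):
    """Mirrors Redis's WRONGTYPE error: a command hit a key holding another type."""


class ZSet(dict):
    """Sorted set stand-in: member -> score (a distinct type from a plain hash)."""


class ToyRedis:
    """In-process toy model of the five Redis value types; NOT a Redis client."""

    def __init__(self):
        self._data = {}

    def _typed(self, key, kind):
        # Create an empty container on first use; reject keys holding another type.
        value = self._data.setdefault(key, kind())
        if type(value) is not kind:
            raise WrongTypeError(key)
        return value

    # String: SET overwrites whatever the key held before.
    def set(self, key, value):
        self._data[key] = str(value)

    def get(self, key):
        value = self._data.get(key)
        if value is not None and type(value) is not str:
            raise WrongTypeError(key)
        return value

    # List: LPUSH pushes each value onto the head, returns the new length.
    def lpush(self, key, *values):
        lst = self._typed(key, list)
        for v in values:
            lst.insert(0, v)
        return len(lst)

    # Set: SADD returns how many members were newly added.
    def sadd(self, key, *members):
        s = self._typed(key, set)  # builtin set (method names are not in scope here)
        before = len(s)
        s.update(members)
        return len(s) - before

    # Hash: HSET returns 1 if the field is new, 0 if it was overwritten.
    def hset(self, key, field, value):
        h = self._typed(key, dict)
        is_new = field not in h
        h[field] = value
        return int(is_new)

    # Sorted Set: ZADD stores a score per member.
    def zadd(self, key, score, member):
        z = self._typed(key, ZSet)
        is_new = member not in z
        z[member] = float(score)
        return int(is_new)

    def zrange(self, key, start, stop):
        # Ascending by score, ties broken lexicographically; stop is inclusive
        # and negative indexes count from the end, as in Redis.
        z = self._typed(key, ZSet)
        ordered = sorted(z, key=lambda m: (z[m], m))
        n = len(ordered)
        if start < 0:
            start = max(start + n, 0)
        if stop < 0:
            stop += n
        if stop < start:
            return []
        return ordered[start:stop + 1]
```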

The JVM (Java Virtual Machine) is a virtual "execution engine" instance that executes the bytecodes in Java class files on a CPU. Knowing how to tune its myriad flags matters, because they affect how your application executes. JIT Compiler Tuning. Hotspot Compilation Mechanism Selection: -client: client compiler (C1), begins compiling earlier -> optimizes startup time. ......

When constructing Spark Machine Learning Pipelines, I find it really helpful to maintain a bird's-eye view of the various transformers and estimators available. In a nutshell: fit trainingData (train a model), transform testData (predict with model). Transformer: DataFrame => DataFrame. Estimator: DataFrame => Transformer. Transformers: Toke... sentence ......
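The Transformer / Estimator split can be sketched in plain Python, with a list of row dicts standing in for a DataFrame. This is not Spark's API: Tokenizer here only mimics the idea of Spark's Tokenizer, and MeanCenterer is a hypothetical estimator invented for illustration. The Pipeline fits each Estimator stage on the data produced by the stages before it and returns a fitted PipelineModel, which is itself a Transformer.

```python
from typing import Dict, List

# Stand-in for a Spark DataFrame: a list of rows, each a column -> value dict.
Row = Dict[str, object]
DataFrame = List[Row]


class Transformer:
    """DataFrame => DataFrame"""
    def transform(self, df: DataFrame) -> DataFrame:
        raise NotImplementedError


class Estimator:
    """DataFrame => Transformer (fit learns parameters, returns a model)"""
    def fit(self, df: DataFrame) -> Transformer:
        raise NotImplementedError


class Tokenizer(Transformer):
    """Lower-cases a text column and splits it on whitespace; needs no fitting."""
    def __init__(self, input_col: str, output_col: str):
        self.input_col, self.output_col = input_col, output_col

    def transform(self, df: DataFrame) -> DataFrame:
        return [{**row, self.output_col: str(row[self.input_col]).lower().split()} for row in df]


class MeanCenterModel(Transformer):
    """Fitted model: subtracts the mean learned during fit."""
    def __init__(self, input_col: str, output_col: str, mean: float):
        self.input_col, self.output_col, self.mean = input_col, output_col, mean

    def transform(self, df: DataFrame) -> DataFrame:
        return [{**row, self.output_col: row[self.input_col] - self.mean} for row in df]


class MeanCenterer(Estimator):
    """Hypothetical estimator: learns a column mean from the training data."""
    def __init__(self, input_col: str, output_col: str):
        self.input_col, self.output_col = input_col, output_col

    def fit(self, df: DataFrame) -> Transformer:
        mean = sum(row[self.input_col] for row in df) / len(df)
        return MeanCenterModel(self.input_col, self.output_col, mean)


class PipelineModel(Transformer):
    """Applies each fitted stage in order."""
    def __init__(self, stages: List[Transformer]):
        self.stages = stages

    def transform(self, df: DataFrame) -> DataFrame:
        for stage in self.stages:
            df = stage.transform(df)
        return df


class Pipeline(Estimator):
    """Fits Estimator stages in order, feeding each stage's output to the next."""
    def __init__(self, stages: list):
        self.stages = stages

    def fit(self, df: DataFrame) -> Transformer:
        fitted = []
        for stage in self.stages:
            model = stage.fit(df) if isinstance(stage, Estimator) else stage
            df = model.transform(df)
            fitted.append(model)
        return PipelineModel(fitted)
```

The design point is that fit happens once on trainingData; the returned model only ever transforms, so the mean learned during training is reused unchanged on testData.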

System Architecture patterns: N-Tier, Event-Driven (Mediator / Broker), Microkernel, MicroServi... - MVC / MVP / MVVM, server - RPC / Remoting / WS / SOA / REST, Space-Based. SOA Patterns: Foundational Structural: Service Host (infra), Active Service (worker thread for upstream pre-fetch), Transactional Service, Workflow, Edge Component. QoS Patterns: Decoupled Invocation ......
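To illustrate the Broker flavour of the Event-Driven pattern (no central mediator orchestrating the steps; processors react to events on channels and publish follow-up events themselves), here is a minimal in-process Python sketch. The channel names and the payment and notification processors are hypothetical, and delivery is synchronous within one process, whereas a real deployment would put a message broker between separately deployed processors.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBroker:
    """Broker topology: no central mediator; processors subscribe to channels
    and may publish follow-up events themselves."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, event: dict) -> None:
        # Synchronous, in-process delivery to every subscriber of the channel.
        for handler in list(self._subscribers[channel]):
            handler(event)


# Hypothetical wiring: each processor knows only the channels it uses.
log: List[str] = []
broker = EventBroker()


def payment_processor(event: dict) -> None:
    log.append(f"charge {event['order_id']}")
    broker.publish("payment.completed", event)  # chains the next step itself


def notifier(event: dict) -> None:
    log.append(f"notify {event['order_id']}")


broker.subscribe("order.created", payment_processor)
broker.subscribe("payment.completed", notifier)
broker.publish("order.created", {"order_id": 42})
# log is now ["charge 42", "notify 42"]
```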

Overview: Developed by Facebook, Hive is a data warehousing framework on top of MapReduce over HDFS, with HiveQL as its SQL-like query language. It converts a SQL query into a series of jobs for execution on a Hadoop cluster, and organizes HDFS data into tables, attaching structure. Schema on Read versus Schema on Write: Hive doesn’t verify the data when it is loaded, but rather when a query ......
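The contrast between the two approaches can be sketched in a few lines of Python over comma-separated records. The schema and function names are hypothetical: schema-on-write validates and casts every record at load time, so one bad row aborts the load, while schema-on-read just stores the raw lines and applies the schema when they are queried, turning unparseable or missing fields into None, much like the NULLs Hive returns by default for such fields.

```python
from typing import Callable, List, Tuple

# Hypothetical table schema: (column name, type-casting function).
Schema = List[Tuple[str, Callable[[str], object]]]
SCHEMA: Schema = [("id", int), ("name", str), ("age", int)]


def load_schema_on_write(lines: List[str], schema: Schema) -> List[tuple]:
    """RDBMS style: validate and cast every record at load time.
    A single bad record (ValueError) aborts the whole load."""
    table = []
    for line in lines:
        fields = line.split(",")
        if len(fields) != len(schema):
            raise ValueError(f"wrong field count: {line!r}")
        table.append(tuple(cast(value) for (_, cast), value in zip(schema, fields)))
    return table


def load_schema_on_read(lines: List[str]) -> List[str]:
    """Hive style: loading just stores the raw records, with no checks."""
    return list(lines)


def query_schema_on_read(raw: List[str], schema: Schema) -> List[dict]:
    """Apply the schema at read time; missing or unparseable fields become None."""
    rows = []
    for line in raw:
        fields = line.split(",")
        row = {}
        for i, (name, cast) in enumerate(schema):
            try:
                row[name] = cast(fields[i])
            except (IndexError, ValueError):
                row[name] = None
        rows.append(row)
    return rows
```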

Big Data has a plethora of Data File Formats, and it's important to understand their strengths and weaknesses. Most explorers start out with some NoSQL-exported JSON data. However, specialized data structures are required, because putting each blob of binary data into its own file just doesn't scale across a distributed filesystem. TL;DR: Choose Parquet ......
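The small-files problem is why container formats exist: instead of one file per blob, many (key, blob) records are appended to one large file. The Python sketch below shows that idea with a made-up length-prefixed layout (4-byte key length, 8-byte value length, then the bytes). It is a simplified illustration, not the on-disk format of SequenceFile, Avro or Parquet; real formats add sync markers, compression and schemas so a file can be split and processed in parallel.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, Tuple

# Made-up record header: 4-byte big-endian key length, 8-byte big-endian blob length.
HEADER = struct.Struct(">IQ")


def write_records(out: BinaryIO, records: Iterable[Tuple[str, bytes]]) -> int:
    """Append (key, blob) pairs to one stream instead of one file per blob."""
    count = 0
    for key, blob in records:
        k = key.encode("utf-8")
        out.write(HEADER.pack(len(k), len(blob)))
        out.write(k)
        out.write(blob)
        count += 1
    return count


def read_records(inp: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Stream the (key, blob) pairs back in the order they were written."""
    while True:
        header = inp.read(HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) < HEADER.size:
            raise ValueError("truncated record header")
        key_len, blob_len = HEADER.unpack(header)
        key_bytes = inp.read(key_len)
        blob = inp.read(blob_len)
        if len(key_bytes) != key_len or len(blob) != blob_len:
            raise ValueError("truncated record body")
        yield key_bytes.decode("utf-8"), blob


if __name__ == "__main__":
    # Demo with an in-memory buffer standing in for one large HDFS file.
    buf = io.BytesIO()
    write_records(buf, [("img/1.png", b"\x89PNG"), ("notes.txt", b"hello")])
    buf.seek(0)
    for key, blob in read_records(buf):
        print(key, len(blob))
```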

As chief architect of an Ad-Tech startup, part of my role involves hiring and mentoring software engineering candidates. Our technology stack includes Scala, SBT, Akka, Spray, Spark + MLlib, AWS, ECMAScript 6, BeEF, Linux environment, Git, Docker, Bash, Kafka, ELK, NGINX, and as of today, Mesosphere. I can tell you, it ain't easy to find Scala Devs. ......

Copyright © JoshReuben | Powered by: GeeksWithBlogs.net