Sunday, September 21, 2014

Data Supply Chains at DataWeek and API World 2014

I attended DataWeek and API World this year at the Hotel Kabuki in San Francisco.  The "data sector" is an emerging subset of the enterprise computing sector.  I first noticed it when I attended this conference last year.  The term "digital supply chain" already describes content creation in the media sector, so the data sector needs a special term for its own value-added process.  The best fit so far is "data supply chain."  Accenture defines the data supply chain and InfoWorld describes how to build one.  My best insights from this conference are in bold text.


Coding for apps, APIs, and SDKs is a new form of literacy that requires knowledge of interactive tools, formats, and standards.  It also requires knowledge of visualization concepts for the final presentation of data.  Poor use of visualization inhibits the progression of data up the DIKW pyramid to ultimate wisdom.  Mastering these skill sets is probably beyond the capabilities of those on the left-hand side of the IQ bell curve.  Data visualization is still relevant to less literate audiences because elites use it to channel those audiences' behavior.  One DataWeek speaker on visualization mentioned that "data trumps opinion" at Google and other highly literate enterprises.  Most humans don't think at that level.  Visualization does not have to enable critical thinking to be useful.  It is valuable as an elite tool for social management.  Control systems in a post-literate society will be very pretty to behold.

The keynote on graphing reiterated the importance of link density in assessing the value of data points and network nodes.  Links form valuable patterns, and fraud detection relies upon identifying links to outlier data.  Gartner's five graphs of the consumer web help visualize the value of linked data.  Some verticals are good early adopters of graph solutions.  SaaS providers should seek use cases in segments that must link several master data sets.  Pain points in visualization applications are connected to an enterprise's need to manage workflows that handle critical data volumes.
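To make the link-density idea concrete, here is a minimal Python sketch using networkx (my own toy example, not anything a speaker showed) that computes overall network density and flags weakly linked outlier nodes of the kind fraud analysts scrutinize:

import networkx as nx

# Toy transaction graph: nodes are accounts, edges are observed transfers.
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("carol", "dave"), ("dave", "eve"), ("mallory", "eve"),
])

# Link density of the whole network: actual edges / possible edges.
print("network density:", nx.density(G))

# Degree per node approximates the link density around each data point.
degrees = dict(G.degree())
mean = sum(degrees.values()) / len(degrees)

# Flag weakly linked nodes; in fraud work these sparse outliers
# and their few links into well-connected clusters merit a closer look.
outliers = [n for n, d in degrees.items() if d <= mean - 1]
print("outlier candidates:", outliers)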

Money transfer could use more robust data supply chains.  Your local bank still has physical branches, yet smartphones enable payment and money transfer independent of banks.  Your transaction history should be tied to your personal identity first and your bank's identity second.  I have attended enough fin-tech meetings in San Francisco to see the innovation coming out of unregulated startups, and regulated financial institutions are taking note.
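Here is a minimal sketch of the record layout I have in mind, with hypothetical field names; the point is that the person, not the bank, is the primary key:

from dataclasses import dataclass

# Hypothetical record layout: the person, not the bank, is primary.
@dataclass
class Transaction:
    person_id: str      # stable personal identity (primary)
    bank_id: str        # originating institution (secondary)
    amount: float
    counterparty: str

history = [
    Transaction("person-001", "bank-A", 25.00, "person-002"),
    Transaction("person-001", "bank-B", 40.00, "person-003"),
]

# A person's full history survives a change of banks.
print([t.amount for t in history if t.person_id == "person-001"])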

Visualization best practices exist primarily in the works of Edward Tufte and Stephen Few.  Designers of enterprise dashboards, UIs, knowledge management pages, and data visualization products should read those geniuses.  They should also read my DataWeek 2013 write-up where I mentioned some really good sources for chart display factors.  There must be a market for cheap and simple business intelligence tools priced for SMBs.
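Tufte's data-ink ratio is the easiest of those best practices to demonstrate in code.  Here is a minimal matplotlib sketch with toy numbers: erase the non-data ink and let the data carry the chart.

import matplotlib.pyplot as plt

# Toy numbers standing in for a real series.
quarters = ["Q1", "Q2", "Q3", "Q4"]
revenue = [1.2, 1.5, 1.1, 1.9]

fig, ax = plt.subplots()
ax.plot(quarters, revenue, marker="o", color="black")

# Tufte's advice in practice: strip the chartjunk.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.grid(False)
ax.set_ylabel("Revenue ($M)")
ax.set_title("Quarterly revenue")

plt.savefig("revenue.png")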

John Musser from API Science presented ten reasons why developers hate your API.  Read the ten reasons yourself so I don't have to repeat them here.  These are the kinds of best practices that make attending the conference well worth my time.  ProgrammableWeb is a huge API directory for developers.  API providers should pay attention to these factors because their APIs will only get traction if they get these things right.  Heavily adopted APIs must then migrate from small freemium hosts to major cloud hosts, and they will face all of the growing pains of an SMB that becomes a large enterprise.  Check out the developer pages of Facebook, GitHub, Google, Twitter, Apple, and Microsoft to see how industry leaders manage their API platforms.

My biggest discovery at DataWeek / API World is that multiple freemium platforms enable the creation of an entire data supply chain . . . free of charge.  That's right, folks.  Do some Google searches yourself to find the providers.  I have not seen a standard definition of an API life cycle, but some common stages are emerging around development, deployment, management, and retirement.  All of this can be done at no cost to developers, at least until the data supply chain gains traction with other developers and users.  Converting a Google Doc or Microsoft Office file into an app or API establishes a minimum viable product, as the sketch below shows.  Freemium translation platforms can also automatically wrap an API in an SDK.  They can even add speech recognition.  This opens up transformational possibilities for business domain experts who are not proficient in coding.  I am tempted to create a data supply chain for Alfidi Capital.  A valuable data supply chain reflects unique domain knowledge and deep master data sets.  Remember that Data.gov is a free source, ready for the taking.
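To show how little code that MVP takes, here is a minimal sketch using Flask and a hypothetical CSV export named master_data.csv; the freemium platforms automate exactly this step:

import csv
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical file: a CSV exported from a Google Doc or Office file.
with open("master_data.csv", newline="") as f:
    ROWS = list(csv.DictReader(f))

@app.route("/api/v1/records")
def records():
    # The entire data supply chain MVP: domain data in, JSON out.
    return jsonify(ROWS)

if __name__ == "__main__":
    app.run(port=5000)

A spreadsheet of unique domain knowledge goes in one end, and a queryable API comes out the other; everything after that is lifecycle management.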

Anya Stettler from Avalara had one of the best talks I've ever seen at a tech conference, hands down.  Her tips on documenting APIs walked through examples of technical references, code snippets, tutorials, and live interactive formats that keep developers excited about an API.  Check out her presentation on SlideShare, because it's too good to miss.  More speakers need to focus on action items with just enough of a soft sell to let us know their brand is a go-to source for expert services.  "Do this to be successful" is the kind of talk I like to hear.
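Her point about code snippets is easy to illustrate.  Here is the kind of complete, copy-paste-ready snippet that good API docs lead with; the endpoint and key below are placeholders, not a real service:

import requests

# Placeholder endpoint and key: substitute your own values.
API_URL = "https://api.example.com/v1/widgets"
API_KEY = "YOUR_KEY_HERE"

# Good docs show a complete request, not a fragment...
resp = requests.get(API_URL,
                    headers={"Authorization": f"Bearer {API_KEY}"},
                    timeout=10)
resp.raise_for_status()

# ...and show what comes back, so developers know what to expect.
print(resp.json())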

Open data from governments is a free input to data supply chains.  Bay Area government agencies have held "datathons" encouraging citizens to construct visualization products from government data.  It's understandable: government agencies don't have the bench strength in data science to build those products themselves, so they farm the work out to the public.  The philosophy behind the Science Exchange's reproducibility initiative needs to make its way to government research.  Aspiring data analysts don't have to wait for the creation of a GitHub or Bitbucket for the analytics community; they can get right to work on DataSF.
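DataSF publishes its catalog through Socrata-style JSON endpoints, so getting started takes a few lines.  This sketch uses a hypothetical dataset ID that you would replace with a real one from the data.sfgov.org catalog:

import requests

# Hypothetical dataset ID: browse data.sfgov.org for a real one.
URL = "https://data.sfgov.org/resource/xxxx-xxxx.json"

rows = requests.get(URL, params={"$limit": 5}, timeout=10).json()

# Each row arrives as a plain dict: a free input to a data supply chain.
for row in rows:
    print(row)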

I learned a new term during an excellent talk on API lifecycle management.  That term is "data scraping," the harvesting of proprietary data from a popular API.  Platform managers who implement protective measures against scraping will also deter legitimate developers from using their API.  There's always a tradeoff between usability and security.
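A toy rate limiter makes the tradeoff visible: the same throttle that frustrates a scraper also frustrates a legitimate heavy user.  The limits below are made up:

import time

class TokenBucket:
    """Toy rate limiter: blocks scrapers and eager legitimate users alike."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens based on elapsed time, then spend one if available.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Made-up limits: 2 requests/second with a burst of 5.
bucket = TokenBucket(rate_per_sec=2, burst=5)
print([bucket.allow() for _ in range(10)])  # the tail gets refused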

The IoT talks were not as informative as I had expected because they were mostly disguised product pitches.  The platforms with the largest API ecosystems - Google and a few others - will be the defaults for IoT integration when devices are ready to connect to ERP systems.  That's why Google bought Nest.  Data scientist job descriptions in the future will put a lot more emphasis on machine learning and business rule management systems (BRMS) than they do today.  Mark my words.

Privacy thought leaders are on board with the Privacy by Design approach that allows tailoring for different regulatory regimes.  It won't be enough in an era of persistent surveillance but it's the thought that counts.  SaaS vendors know they need a new industry-wide seal of approval to reassure consumers that someone has their privacy in mind.  The new C-suite position of Chief Privacy Officer is the likely final resting place of a Peter Principle manager who can't perch anywhere else.  Any cloud-based strategy to protect privacy should start with strict internal controls on access to personal data, which would make profitability impossible because customer service reps would never get access.  Masking metadata is another technically feasible solution that will unravel when enterprises need to share opt-in data with third parties.  The US-EU Safe Harbor privacy principles predate any surveillance revelations, which is why they need strengthening.  Forget any illusions about monetizing an open source privacy policy generator; those are already free online.
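Masking is easy to sketch, and the sketch shows exactly where it unravels.  A salted one-way hash (the salt and field names below are hypothetical) keeps records joinable internally, but a third party who needs the real opt-in values forces you back to plaintext:

import hashlib

SALT = b"replace-with-a-secret-salt"  # hypothetical; keep out of source control

def mask(value: str) -> str:
    """One-way mask: stable for internal joins, useless to an outsider."""
    return hashlib.sha256(SALT + value.encode()).hexdigest()[:16]

record = {"email": "user@example.com", "device_id": "abc-123", "spend": 42.50}

# Mask the identifying fields, keep the metrics.
masked = {k: (mask(v) if k in ("email", "device_id") else v)
          for k, v in record.items()}
print(masked)  # metrics survive, identities do not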

IBM had a lot to say about analytics at this conference.  Their offerings, IBM Watson Analytics and IBM InfoSphere BigInsights, look cool.  Big Data requires iterative and exploratory analytics in a whole new layer between the Hadoop back-ends and data processing front-ends.  This sounds like the old term "middleware" made new again to incorporate compilers and optimizers.  Analytics languages must graduate from running on small single systems because Big Data is simply too big for them.  Business domain experts can learn more about this at Big Data University because they need to close their language gaps with data scientists.

The potential end of Moore's Law has implications for data storage.  If data flow volume grows faster than data storage density, SaaS and PaaS cloud providers will have to spend major capex building out data centers.  I am doing some research on data center providers organized as REITs.  I first noticed them as potential hard assets in a hyperinflation-resistant portfolio, because they are really just inputs into a supply chain.  I now think they may be a growth opportunity by themselves as a pick-and-shovel play on the data sector.  Keep watching my blog for future discussions of data centers.

The final panel I attended was appropriately the venture investors' discussion of funding and acquisition for data and infrastructure startups.  If the emerging term "Infrastructure 2.0" for the data / API sector catches on, it must encompass apps, SDKs, and tools for visualization and analytics.  The VCs think they can make money at all levels of the tech stack but I have blogged many times that unaddressed needs in ERP links are probably the most lucrative markets.  I do not share their pessimism that open source business models are too hard to monetize.  After all, IBM seems to be doing just fine selling Hadoop-based solutions because they can address multiple vertical segments as a "horizontal" provider.  I did pay close attention to the mention of several parts of a data science pipeline:  data cleansing, feature engineering, collaboration, and modeling.  I now have some new buzzwords to throw around at the next data sector conference.
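Three of those four stages compress into a few lines of Python with pandas and scikit-learn; the data and the ratio feature below are made up, and collaboration is the one stage you can't import:

import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy data standing in for a real master data set.
df = pd.DataFrame({
    "revenue": [10.0, None, 35.0, 50.0, 5.0, 80.0],
    "employees": [2, 4, 10, 12, 1, 30],
    "funded": [0, 0, 1, 1, 0, 1],
})

# 1. Data cleansing: fill the gaps before they poison the model.
df["revenue"] = df["revenue"].fillna(df["revenue"].median())

# 2. Feature engineering: a made-up ratio feature.
df["revenue_per_head"] = df["revenue"] / df["employees"]

# 3. Modeling: the last (and smallest) step in the pipeline.
features = df[["revenue", "revenue_per_head"]]
model = LogisticRegression().fit(features, df["funded"])
print(model.predict(features))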

I had a blast exploring new developments in the data supply chain.  Rest assured that any Alfidi Capital effort to construct data manipulation products will not in any way collect any user information at all.  I never have to publish a privacy policy if I never ask anyone to hand over anything private.  Hot women are always welcome to send me their hot photos but those won't be going into any hypothetical data products either.  It is safe to say that any Alfidi Capital apps, APIs, or SDKs will be a big hit in the data sector thanks to my extreme genius.  I can't wait until next year's DataWeek / API World to see how the sector reacts to what I plan to launch.