- Nokogiri: History of a Gem (A valid and well-formed talk on an open-source success) - Mike Dalessio @flavorjones mike.daless.io / blog.flavorjon.es
- - Introduction
- - RubyForge, speaker's first open-source contributions. Pat Nakajima and Aaron Patterson
- - [Business Cat] 'climbing corporate ladder, can't get down'
- - FAQ
- - What is nokogirl?
- - a gem which depends on nokogiri
- - What is nokogiri?
- - A Ruby API for XML SAX/DOM parsing
- - What does nokogiri mean?
- *- Speaker Hates XML, but he loves making painful things not painful
- - Nokogiri has been downloaded millions of times. (*amazeballs*) Rails has 27.6 million downloads, Formtastic has 1.7 million for comparison
- - If Rails is KISS (used to be bleeding edge, but now it's something your parents listen to)
- - Formtastic is Come on Eileen (sort of catchy in the middle)
- - Nokogiri is Kelly Clarkson (there all the time, in the background)
- - Nokogiri is Japanese for "saw" (a play on an XML tree which needs cutting)
- - Why can't I install Nokogiri on OS X using MacPorts?
- - long story
- - History
- - Why? History is important because if you don't know it you will repeat the bad bits of it
- - 2006
- - History of Ruby parsers
- - REXML is in the stdlib but it's /slow/ and doesn't fix broken mark-up
- - Hpricot is the de facto standard for XML/HTML parsing. Issues with poor API (easy to shoot self in foot)
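The point about REXML not fixing broken mark-up can be illustrated with a minimal sketch. The helper name `strict_parse` is hypothetical and not from the talk; the sketch assumes the `rexml` library that ships with a standard Ruby install. It only shows that REXML rejects mismatched tags rather than repairing them, which is a problem when scraping real-world HTML.

```ruby
require 'rexml/document'

# Hypothetical helper: returns the parsed document, or nil when REXML
# rejects the input instead of repairing it.
def strict_parse(markup)
  REXML::Document.new(markup)
rescue REXML::ParseException
  nil
end

well_formed = strict_parse('<p><b>ok</b></p>')
broken      = strict_parse('<p><b>oops</p>') # <b> is never closed

puts well_formed.root.name
puts broken.nil? ? 'rejected' : 'accepted'
```

A parser aimed at scraping would instead try to recover a usable tree from the second input.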
- - USPOWERGEN
- - Pull request to add client-side cert to Mechanize. @tenderlove was open and kind and responsive. Eventually speaker got commit privs
- - 2008
- - Pharos
- - Scraping broken HTML with Mechanize
- - Hpricot would crash all the time (if HTML was exactly 16k long *lol*)
- - Talking about the genesis of nokogiri
- - @tenderlove... "I've submitted those test cases as bugs for hpricot as well. I would have patched hpricot, but it is too hard for me to read."
- - Most significant words to an open-source maintainer - "How can I help out?"
- - DL bindings (Ruby/DL, the stdlib interface to the dynamic linker, used to call C libraries from Ruby)
- - DL is slow... really slow. We killed it and started writing a C extension to call libxml2 directly
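To give a feel for what binding to a C library at runtime looks like, here is a minimal sketch using Fiddle, the standard-library successor to DL. This is not Nokogiri's code: it binds libc's `strlen` instead of libxml2, and the names `STRLEN` and `c_strlen` are made up for illustration. It assumes a platform where `strlen` is visible in the running process (typical on Linux and OS X).

```ruby
require 'fiddle'

# Look up strlen among the symbols already loaded into the Ruby process
# and describe its C signature: size_t strlen(const char *).
STRLEN = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT['strlen'],
  [Fiddle::TYPE_VOIDP], # const char *
  Fiddle::TYPE_SIZE_T   # size_t
)

# Hypothetical wrapper: every call marshals the Ruby String to a C pointer
# and converts the C return value back into a Ruby Integer.
def c_strlen(str)
  STRLEN.call(str)
end

puts c_strlen('nokogiri')
```

Each call goes through runtime argument marshalling with no compile-time checking, which is part of why a hand-written C extension can be faster than bindings of this kind.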
- - Discovering and debugging libxml2's memory management
- - matching it to Ruby's GC world was really difficult
- - Management strategy
- - libxml2 frees from top down... but libxml2 has dictionaries and caches and string node merging
- - not a 1:1 C to Ruby match at all... what does my Ruby object point to?
- - huge comment to explain memory management logic (see slides for this -- it's a wall of text)
- - If you want to write and debug C extensions make sure you know:
- - valgrind to find unsafe memory ops and memory leaks
- - perftools.rb
- - API Design
- - The first bits of the Nokogiri API were an Hpricot rip-off... had a shim layer to make it Hpricot compatible (soon discontinued)
- - First official release on 17 Nov 2008
- - Do not release on a DST weekend if you work at a trading company.. speaker spent most of the weekend fixing DST bugs at his job
- - Early Adopters
- - Loofah, née Dryopteris (@brynary)
- - Slidedown (@nakajima)
- - Merb (@ykatz) --- pushed hard for performance
- - Lots of people had opinions on benchmarks
- - Benchmarking was actually largely pointless, other than driving us to fix bottlenecks more quickly.
- *- Personal Tangent
- - Speaker was introduced to Pivotal Labs in 2008 by Pat Nakajima
- - "I can't imagine hiring someone that I didn't know through open source." - DHH
- - Pivotal still regularly receives applications without a link to a GitHub profile or references to the applicant's personal work.
- - 2009
- - programming is rather thankless. u see your works become replaced by superior ones in a year. unable to run at all in a few more. -- Why The Lucky Stiff (twitter)
- - Why The Lucky Stiff vanishes.... reference to his tweet
- - The Dawn of JRuby
- - started getting some street cred as people started using it
- - most people tend to use MRI (room poll 80%)
- - In '09 JRuby didn't fully support the C extension API. This was a problem for nokogiri.
- - Speaker had a dream that he would write nokogiri to run everywhere, his FFI phase
- - FFI is magic. Shout out to Wayne Meissner (@wmeissner) -- implementer of FFI... FFI is a gem which is like DL but actually working.
- - Took 3,049 lines of Ruby/FFI code to reproduce 4,150 lines of C code... kind of a smell
- - not much density, basically a full port of the C code
- - pain, "segfault-driven development" ... no compile time checking
- - crashes on JRuby... portable string handling is hard, JVM GCs strings out from under you
- - Author didn't want to write C code in Ruby
- - Much slower than direct C extension
- - Harassment-driven development
- - speaker got harassed at RubyConf '09 for a few hours about not being able to use nokogiri on Google App Engine because shared library loading isn't available there
- - FFI Lessons (check slides.. couldn't copy down.. speaker moving too quickly and low on time)
- - "Portability is for people who cannot write new programs" -- Linus Torvalds
- - Enter Sergio
- - GSoC (Google Summer of Code) ... ported pieces to Java.
- - Flavour of the Zeitgeist
- - tried everything but native, had to go native and it worked <snipped for time>
- - Many platforms
- - Windows
- - nobody has a build chain
- - nobody has libxml2
- - cross-compiled all these libraries into DLLs -- "fat binary gem"
- - 10 MiB versus 200 KiB
- - Luis Lavena (@luislavena) supports the Windows build system pretty much single-handedly
- - JRuby
- - dependencies
- - same solution, fat binary: 2 MiB versus Windows's 10 MiB
- - OS X
- - dependencies
- - inconsistent build platform
- - libxml2 2.9.x is the Homebrew default... breaks CSS queries due to an XML bug
- - solved by mini_portile
- *- "Mugatu" meme [I invented fat source gems! What have you done! Nothing!]
- - Call to Action
- - Help Nokogiri on Windows
- - Help people with open-source projects
- - Scratch your itch... make the world a better place... (and get to hang out with @tenderlove)