Extracting relevant information from data is crucial for all forms of learning. The information bottleneck (IB) method formalizes this, offering a mathematically precise and conceptually appealing framework for understanding learning phenomena. However, the nonlinearity of the IB problem makes it computationally expensive and analytically intractable in general. Here we explore a few recent approaches to making IB practical. First, we derive a perturbation theory for the IB method and exactly characterize the limit of maximum relevant information per bit extracted from data. We test our results on synthetic probability distributions, finding good agreement with the exact numerical solution near the onset of learning. Next, we discuss earlier work on an alternative formulation of IB that replaces mutual information with entropy, which we call the deterministic information bottleneck. As its name suggests, the solution turns out to be a deterministic encoder, or hard clustering, as opposed to the stochastic encoder that is optimal under IB. We show that the two methods perform similarly in terms of the IB cost function, but that IB significantly underperforms when measured by the modified objective. Finally, we turn to the question of characterizing optimal representations for supervised learning. We propose the Decodable Information Bottleneck (DIB), which considers information retention and compression from the perspective of a desired predictive family. Empirically, DIB can be used to enforce a small generalization gap on downstream classifiers and to predict the generalization performance of neural networks.
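To make the contrast between the two objectives concrete, the following is a minimal sketch, not the implementation used in this work. It assumes the standard IB cost I(X;T) - βI(T;Y) for an encoder q(t|x), and it assumes that the deterministic variant replaces the compression term I(X;T) with the entropy H(T). The function names, the toy joint distribution `p_xy`, and the value of `beta` are all hypothetical choices made for illustration.

```python
import math


def entropy(p):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def mutual_information(joint):
    """Mutual information (in bits) from a joint table joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(
        v * math.log2(v / (pa[a] * pb[b]))
        for a, row in enumerate(joint)
        for b, v in enumerate(row)
        if v > 0
    )


def bottleneck_costs(p_xy, q_t_given_x, beta):
    """Return (IB cost, entropy-based cost) for an encoder q(t|x).

    Assumed objectives (illustrative, see lead-in):
      IB cost            = I(X;T) - beta * I(T;Y)
      entropy-based cost = H(T)   - beta * I(T;Y)
    """
    n_x, n_y, n_t = len(p_xy), len(p_xy[0]), len(q_t_given_x[0])
    p_x = [sum(row) for row in p_xy]
    # Joint p(x,t) = p(x) q(t|x).
    p_xt = [[p_x[x] * q_t_given_x[x][t] for t in range(n_t)] for x in range(n_x)]
    # Joint p(t,y) = sum_x q(t|x) p(x,y), using the Markov chain T - X - Y.
    p_ty = [
        [sum(q_t_given_x[x][t] * p_xy[x][y] for x in range(n_x)) for y in range(n_y)]
        for t in range(n_t)
    ]
    p_t = [sum(row) for row in p_ty]
    relevance = mutual_information(p_ty)
    ib_cost = mutual_information(p_xt) - beta * relevance
    entropy_cost = entropy(p_t) - beta * relevance
    return ib_cost, entropy_cost


# Hypothetical toy problem: four inputs, two labels, two clusters.
p_xy = [[0.20, 0.05], [0.20, 0.05], [0.05, 0.20], [0.05, 0.20]]
hard = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # deterministic encoder
soft = [[0.5, 0.5]] * 4                                   # uninformative stochastic encoder
beta = 2.0

hard_ib, hard_ent = bottleneck_costs(p_xy, hard, beta)
soft_ib, soft_ent = bottleneck_costs(p_xy, soft, beta)
```

Because I(X;T) = H(T) - H(T|X) and H(T|X) = 0 for a deterministic encoder, the two costs coincide for `hard`. For `soft`, the entropy-based cost is larger than the IB cost by exactly H(T|X), which illustrates why a stochastic encoder is penalized under the entropy-based objective.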