Logging is surprisingly difficult to get right in a way that provides real value within an application. It's usually easy to set up, but making it actually useful for the end user, another developer, or for debugging purposes is a completely different story. If you were to ask me why this is the case, I'd say it's because it's up to the developer to accurately describe why something was logged and to provide context around the logged event.
Full disclosure: I'm *extremely* guilty of not-so-useful "logging" in almost all of my public projects on GitHub.
structlog is one of those Python libraries that changed my perspective on what logging should be. It forced me to think of logs as "events with context" as opposed to just lines of text prefixed by "[!]", "[*]", "[+]" like almost every infosec tool in existence (I have no clue where this tradition started; maybe Metasploit?).
Setting up logging with structlog can seem a bit daunting at first, but that's just the price you pay for the amount of flexibility it provides.
A while ago, I decided to enhance an existing tool as an excuse to play with structlog. I wanted the following out of my logging setup:
- "Run-of-the-mill" logging to the console
- Logging to a file (without the fancy colors structlog provides by default with console logging)
- A different log format when "debug mode" is enabled from the command line
The structlog documentation provides examples of almost all of the above, so that wasn't a big deal.
However, I also found myself wanting to query all of the emitted logs from the tool's RESTful API. Obviously, I could just read back the already-emitted log file, but that would be hella ugly and prone to failure.
To do this "right", I needed to essentially duplicate each log entry (i.e. each call to log.msg()), transform it to JSON, and capture it in a variable. The contents would then be returned via the HTTP API to any client calling the /logs endpoint.
After some more reading of the structlog documentation, it turns out there's a JSONRenderer() processor which transforms each log entry to JSON. The problem was finding a way to "duplicate and capture" each log, and more specifically, capturing it without emitting duplicate entries to the console. If you just attached another logging.StreamHandler() with JSONRenderer() in its ProcessorFormatter() chain, you would get duplicate console entries with every call to log.msg().
The answer to this problem was adding structlog.testing.LogCapture() to the ProcessorFormatter() chain alongside JSONRenderer(). As the module name implies, structlog.testing.LogCapture() is meant for testing, but it gave me exactly what I wanted for this particular use case.
The key parts of this pattern are shown in the below snippet:
import logging
import structlog

# Assumes structlog.configure() has already been called with
# ProcessorFormatter.wrap_for_formatter as the last processor
# (see the full example below)
capturer = structlog.testing.LogCapture()
json_capture_formatter = structlog.stdlib.ProcessorFormatter(
    processors=[
        structlog.stdlib.ProcessorFormatter.remove_processors_meta,
        capturer,
        structlog.processors.JSONRenderer(),
    ]
)
jh = logging.StreamHandler()
jh.setFormatter(json_capture_formatter)
logging.getLogger("myapp").addHandler(jh)

log = structlog.get_logger("myapp")
log.info('this is an event', event_data='some good stuff', interface='127.0.0.1', hacked='everything')
Using this setup, whenever something gets logged, the entry automatically gets duplicated and turned into JSON (without being output twice to the console) *AND* becomes available under capturer.entries. All you have to do now is return capturer.entries whenever someone calls your /logs API endpoint. You essentially get logging and queryable events "for free" with this pattern, which I find pretty elegant.
For a full working example of this structlog recipe with an API (using FastAPI), see the below GitHub Gist.
from logger import capturer
from typing import Optional
from fastapi import FastAPI

app = FastAPI()

@app.get("/logs")
async def get_logs(event_name: Optional[str] = None):
    if not event_name:
        return capturer.entries
    filter_expression = lambda l: l['event'] == event_name
    return list(
        filter(
            filter_expression,
            capturer.entries
        )
    )
#! /usr/bin/env python3
import logging
import logging.handlers
from time import sleep
from pprint import pprint

from logger import (
    log,
    plain_formatter,
    json_capture_formatter,
    capturer
)

# Log to a file as well as the console
fh = logging.handlers.WatchedFileHandler("myapp.log")
fh.setFormatter(plain_formatter)
log.addHandler(fh)

# This will effectively duplicate all logs and save a copy of them
# in capturer.entries in JSON format
jh = logging.StreamHandler()
jh.setFormatter(json_capture_formatter)
log.addHandler(jh)

log.info('this is an event', event_data='some good stuff', interface='127.0.0.1', hacked='everything')
pprint(capturer.entries)

while True:
    sleep(5)
import logging
import logging.handlers
import structlog

capturer = structlog.testing.LogCapture()
timestamper = structlog.processors.TimeStamper(fmt="%Y-%m-%d %H:%M:%S")

plain_formatter = structlog.stdlib.ProcessorFormatter(
    processors=[
        structlog.stdlib.ProcessorFormatter.remove_processors_meta,
        #structlog.stdlib.add_log_level,
        structlog.dev.ConsoleRenderer(colors=False),
    ]
)

colored_formatter = structlog.stdlib.ProcessorFormatter(
    processors=[
        structlog.stdlib.ProcessorFormatter.remove_processors_meta,
        #structlog.stdlib.add_log_level,
        structlog.dev.ConsoleRenderer(colors=True, exception_formatter=structlog.dev.rich_traceback),
    ]
)

debug_formatter = structlog.stdlib.ProcessorFormatter(
    processors=[
        structlog.stdlib.ProcessorFormatter.remove_processors_meta,
        structlog.stdlib.add_log_level,
        #structlog.stdlib.add_logger_name,
        structlog.dev.ConsoleRenderer(colors=True, exception_formatter=structlog.dev.rich_traceback),
    ]
)

json_capture_formatter = structlog.stdlib.ProcessorFormatter(
    processors=[
        structlog.stdlib.ProcessorFormatter.remove_processors_meta,
        capturer,
        structlog.processors.JSONRenderer(),
    ]
)

sh = logging.StreamHandler()
sh.setFormatter(colored_formatter)

myapp_logger = logging.getLogger("myapp")
myapp_logger.setLevel(logging.INFO)
myapp_logger.addHandler(sh)

structlog.configure(
    processors=[
        #structlog.stdlib.filter_by_level,
        #structlog.stdlib.add_logger_name,
        #structlog.stdlib.add_log_level,
        structlog.stdlib.PositionalArgumentsFormatter(),
        timestamper,
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        #structlog.processors.UnicodeDecoder(),
        structlog.stdlib.ProcessorFormatter.wrap_for_formatter
    ],
    logger_factory=structlog.stdlib.LoggerFactory(),
    wrapper_class=structlog.stdlib.BoundLogger,
    cache_logger_on_first_use=True,
)

log = structlog.get_logger("myapp")