Exercises
Targeted exercises
Integrating data using scipy.integrate
Exercise 0
The cumulative integral is the integral from the beginning of the data up to a given data point, plotted against the $x$-value of the last data point integrated. Write code that plots the cumulative integral of a Gaussian function with an amplitude of 1, a center of 5, and a width of 1, from 0 to 10 with 100 points on the $x$-axis.
import numpy as np
from scipy.integrate import cumulative_trapezoid
from codechembook.quickPlots import quickScatter
# Define x-axis
x = np.linspace(0, 10, 100)
# Define Gaussian function
amplitude = 1
center = 5
width = 1
gaussian = amplitude * (width * np.sqrt(2*np.pi))**-1 * np.exp(-((x - center)**2) / (2 * width**2))
# Compute cumulative integral
cum_int = cumulative_trapezoid(gaussian, x, initial=0)
# Plot cumulative integral
quickScatter(x, cum_int, xlabel='x-axis', ylabel='cumulative y-values')
Generating test data to evaluate the accuracy of numerical analysis
Exercise 1
Write two functions, one that returns a Lorentzian distribution and one that returns a log-normal distribution. You can either write them yourself or import these functions from LMFIT now that you know how to find them in the library files (see Exercise X of Chapter 7). These functions should take as arguments the $x$ data points, the center, the sigma (width), and the amplitude. Generate four different $x$-data arrays spanning 0 to 10 with 10, 50, 100, and 1000 points, respectively. For each of these $x$-data arrays:
- Generate the corresponding $y$-values for a mean of 5, a width of 0.5, and an amplitude of 1. Because you have two functions, you will have 8 $y$-value arrays.
- Numerically integrate these 8 $y$-value arrays.
- Make a log-log plot of the difference between the numerical solution and the known analytical solution (1, since they are normalized) as a function of the spacing between the points on the $x$-value arrays.
- Increase the width to 1, 2, and 4 and integrate each the same way over the same range. You will notice that the accuracy decreases substantially. Why?
import numpy as np
from scipy.integrate import trapezoid
from plotly.subplots import make_subplots
# Lorentzian distribution (normalized)
def lorentzian(x, center, sigma, amplitude):
    return amplitude * (sigma / np.pi) / ((x - center)**2 + sigma**2)

# Log-normal distribution (normalized by area)
def lognormal(x, center, sigma, amplitude):
    with np.errstate(divide='ignore'):
        return amplitude * (1 / (x * sigma * np.sqrt(2 * np.pi))) * np.exp(-((np.log(x) - np.log(center))**2) / (2 * sigma**2))
# x resolutions
N_points = [10, 50, 100, 1000]
widths = [0.5, 1, 2, 4]
center = 5
amplitude = 1
spacing = []
lorentzian_errors = {w: [] for w in widths}
lognormal_errors = {w: [] for w in widths}
for sigma in widths:
    for N in N_points:
        x = np.linspace(0, 10, N)
        dx = x[1] - x[0]
        if sigma == 0.5:  # the spacings are the same for every width, so only store them once
            spacing.append(dx)
        # Lorentzian
        y_lorentz = lorentzian(x, center, sigma, amplitude)
        integral_lorentz = trapezoid(y_lorentz, x)
        error_lorentz = abs(integral_lorentz - 1)
        lorentzian_errors[sigma].append(error_lorentz)
        # Log-normal (avoid division by zero at x = 0)
        x_ln = x[x > 0]
        y_logn = lognormal(x_ln, center, sigma, amplitude)
        integral_logn = trapezoid(y_logn, x_ln)
        error_logn = abs(integral_logn - 1)
        lognormal_errors[sigma].append(error_logn)
fig = make_subplots()
for sigma in widths:
    if sigma == 0.5:
        style = "solid"
    elif sigma == 1:
        style = "longdash"
    elif sigma == 2:
        style = "dash"
    elif sigma == 4:
        style = "dot"
    fig.add_scatter(x=spacing, y=lorentzian_errors[sigma], mode='lines+markers',
                    name=f'Lorentzian σ={sigma}',
                    showlegend=False,
                    line=dict(color="darkcyan", dash=style),
                    marker=dict(color="darkcyan"))
    fig.add_scatter(x=spacing, y=lognormal_errors[sigma], mode='lines+markers',
                    name=f'Log-normal σ={sigma}',
                    showlegend=False,
                    line=dict(color="darkred", dash=style),
                    marker=dict(color="darkred"))
    fig.add_annotation(text=f'Lorentzian σ={sigma}', showarrow=False,
                       x=np.log10(spacing[0] + 0.05), y=np.log10(lorentzian_errors[sigma][0]),
                       xanchor="left", yanchor="middle",
                       font=dict(color="darkcyan"))
    fig.add_annotation(text=f'Log-normal σ={sigma}', showarrow=False,
                       x=np.log10(spacing[0] + 0.05), y=np.log10(lognormal_errors[sigma][0]),
                       xanchor="left", yanchor="middle",
                       font=dict(color="darkred"))
fig.update_layout(
    title="Log-Log Plot of Integration Error vs Spacing",
    xaxis_title="dx (spacing between points)",
    yaxis_title="|Numerical Integral − 1|",
    xaxis_type='log',
    yaxis_type='log',
)
fig.show("png")
As the width increases, the distribution becomes broader and flatter. Because we integrate over a fixed range (0 to 10), a broader function puts a significant fraction of its area outside that range, especially for heavy-tailed distributions like the Lorentzian and the log-normal.
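You can check this explanation quantitatively. For a unit-area Lorentzian centered at 5, the fraction of the area that falls inside the 0 to 10 window follows directly from its cumulative distribution function, $(2/\pi)\arctan(5/\sigma)$. The sketch below (not part of the exercise solution) compares that analytical fraction with an adaptive-quadrature integral over the window, showing that the "error" at large widths is mostly truncation, not grid spacing:

```python
import numpy as np
from scipy.integrate import quad

center, x_lo, x_hi = 5, 0, 10

def lorentzian(x, center, sigma):
    # unit-area Lorentzian
    return (sigma / np.pi) / ((x - center)**2 + sigma**2)

for sigma in [0.5, 1, 2, 4]:
    # near-exact integral over the window via adaptive quadrature
    area, _ = quad(lorentzian, x_lo, x_hi, args=(center, sigma))
    # area inside the window predicted by the Lorentzian CDF
    analytical = (2 / np.pi) * np.arctan((x_hi - center) / sigma)
    print(f"sigma = {sigma}: area in [0, 10] = {area:.4f} (CDF prediction {analytical:.4f})")
```

Even with a perfect integrator, the area captured in the window drops well below 1 as σ grows, so no amount of refinement of the $x$ grid can recover it.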
Creating customized versions of existing functions using wrappers
Exercise 2
Write a wrapper that combines quickReadCSV() and quickOpenFilename() from codechembook.quickTools so that you can quickly read a csv file using a dialog box to identify the file.
from codechembook.quickTools import quickOpenFilename, quickReadCSV

def quickOpenReadCSV(title="Select file to open", initialpath='.', filetypes='All files, *.*',
                     cols=None, delimiter=",", skip_header=1):
    file = quickOpenFilename(title=title, initialpath=initialpath, filetypes=filetypes)
    return quickReadCSV(file, cols=cols, delimiter=delimiter, skip_header=skip_header)
Simplifying the processing of iterable objects using comprehensions
Exercise 4
Write a list comprehension that accomplishes each of the following tasks, in a single line of code:
- Add the suffix ‘_old’ to each name in a list of names (as strings). You could also use this approach to rename, for example, files in a list of files.
- Create a sine wave with amplitude 1 and 10 points spanning 0 to $2\pi$. Create a list of the points that are positive or zero and a second list of the points that are negative.
- Redo what you just did, but now change the value of the numbers failing the test to zero instead of omitting them from the list.
- Create a new list that contains the three-point moving average (the current point, the point prior, and the point after) of your list of positive values from part c. Your new list will have two fewer items than the original list. You might use this to attempt to smooth data, though we will address better ways to smooth data in the next chapter.
- Calculate the slope between adjacent points for the same list. Your new list will have one fewer point than the original list. This might be used to take the numerical derivative of your data. This idea will be explored much more thoroughly in Chapter 13.
For each of these, invent some test data and use it to verify that your solution works.
names = ["file1", "file2", "data"]
names_old = [name + "_old" for name in names]
import numpy as np
x = np.linspace(0, 2*np.pi, 10)
y = np.sin(x)
positive = [val for val in y if val >= 0]
negative = [val for val in y if val < 0]
positive_with_zeros = [val if val >= 0 else 0 for val in y]
negative_with_zeros = [val if val < 0 else 0 for val in y]
moving_avg = [(positive_with_zeros[i-1] + positive_with_zeros[i] + positive_with_zeros[i+1])/3
              for i in range(1, len(positive_with_zeros)-1)]
slopes = [(positive_with_zeros[i+1] - positive_with_zeros[i]) / (x[i+1] - x[i])
          for i in range(len(positive_with_zeros)-1)]
The answer you get will depend on the test data you invent.
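The exercise also asks you to verify each comprehension against test data. One minimal way to do that, reusing the names above, is to assert the structural properties that must hold for any input:

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 10)
y = np.sin(x)

positive = [val for val in y if val >= 0]
negative = [val for val in y if val < 0]
positive_with_zeros = [val if val >= 0 else 0 for val in y]

# every value lands in exactly one of the two filtered lists
assert len(positive) + len(negative) == len(y)
# zero-filling preserves the length of the original array
assert len(positive_with_zeros) == len(y)

moving_avg = [(positive_with_zeros[i-1] + positive_with_zeros[i] + positive_with_zeros[i+1]) / 3
              for i in range(1, len(positive_with_zeros) - 1)]
slopes = [(positive_with_zeros[i+1] - positive_with_zeros[i]) / (x[i+1] - x[i])
          for i in range(len(positive_with_zeros) - 1)]

# the moving average and slope lists shrink by 2 and 1 points, respectively
assert len(moving_avg) == len(y) - 2
assert len(slopes) == len(y) - 1
```

Checks like these do not prove the values are right, but they catch the most common off-by-one mistakes in the range bounds.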
Comprehensive exercises
Exercise 1
Sometimes what you want to know is a quantity, but what you actually measure is the derivative of that quantity with respect to something else (often time). Suppose you want to know the capacity of a battery. It is easy to measure the current drawn from the battery with respect to time (i.e., $\frac{dq}{dt}$), but much harder to measure the charge stored in or dissipated from a battery. So, instead, you can start with the current data and integrate it to get the total charge gained or lost:
$\Delta q = \int_0^t i(t')\,dt'$
where $t$ is the time up to which you want to integrate and $t'$ is the integration variable, which runs from the start of the measurement to $t$. Suppose you have a new battery that you want to characterize. You want to determine its capacity, which you are defining as the charge that needs to be removed to reduce the battery’s voltage to 90% of its maximum. Your experiment gives you voltage versus time ($V(t)$) and current versus time ($i(t)$).
- Compute the cumulative integral of the data.
- Find the first point at which $V(t)$ drops below 90% of its peak value.
- Use this time to determine the value of the cumulative integral.
- Plot the current, voltage, and cumulative integral versus time. Make two subplots: the top one should have current versus time and voltage versus time, where voltage and current are on different $y$-axes (check the online Plotly documentation for this); on the bottom, plot the cumulative integral of the current. Make a vertical line on both plots at the time at which the voltage falls to 90%.
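The numerical steps above (everything except the two-panel plot) can be sketched as follows. The data here are invented for illustration: a constant 0.5 A discharge and a linearly sagging voltage stand in for your measured $i(t)$ and $V(t)$:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# Synthetic battery data (invented; replace with your measurements)
t = np.linspace(0, 3600, 1000)             # time, s
i = 0.5 * np.ones_like(t)                  # constant 0.5 A discharge current
V = 4.2 - 0.6 * (t / t[-1])                # voltage sagging from 4.2 V

# 1. Cumulative integral of the current: charge removed up to each time
q = cumulative_trapezoid(i, t, initial=0)  # coulombs

# 2. First point at which V drops below 90% of its peak
threshold = 0.9 * V.max()
idx = np.argmax(V < threshold)             # index of the first True

# 3. Charge removed at that time is the capacity, by this definition
capacity = q[idx]
print(f"V falls to 90% at t = {t[idx]:.0f} s; capacity = {capacity:.1f} C")
```

For the plotting step, `make_subplots(rows=2, cols=1, specs=[[{'secondary_y': True}], [{}]])` gives the top panel a second $y$-axis for the voltage, and `fig.add_vline(x=t[idx])` marks the 90% time on both panels.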
Exercise 2
Write code that accepts lists containing an arbitrary number of initial and final integration limits, integrates the data over each pair of limits, and outputs the result for each range.
import numpy as np
from scipy.integrate import trapezoid
# Function to integrate (x, y) data over a list of ranges
def integrate_multiple_ranges(x, y, start_limits, end_limits):
    if len(start_limits) != len(end_limits):
        raise ValueError("Start and end limits must have the same length.")
    results = []
    for a, b in zip(start_limits, end_limits):
        mask = (x >= a) & (x <= b)  # keep only the data points inside this range
        result = trapezoid(y[mask], x[mask])
        results.append({  # record the results as a dictionary, so we also have the ranges.
            'range': (a, b),
            'integral': result,
        })
    return results
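A quick way to check a routine like this is with data whose integrals are known analytically. The sketch below assumes a version of `integrate_multiple_ranges` that takes the $x$ and $y$ arrays along with the two limit lists, and uses $y = 2x$, whose integral from 0 to $b$ is $b^2$:

```python
import numpy as np
from scipy.integrate import trapezoid

def integrate_multiple_ranges(x, y, start_limits, end_limits):
    if len(start_limits) != len(end_limits):
        raise ValueError("Start and end limits must have the same length.")
    results = []
    for a, b in zip(start_limits, end_limits):
        mask = (x >= a) & (x <= b)  # keep only the data points inside this range
        results.append({'range': (a, b), 'integral': trapezoid(y[mask], x[mask])})
    return results

x = np.linspace(0, 2, 201)
y = 2 * x                 # integral of 2x from 0 to b is b**2
for r in integrate_multiple_ranges(x, y, [0, 0], [1, 2]):
    print(r['range'], r['integral'])   # expect ~1 over (0, 1) and ~4 over (0, 2)
```

Because the trapezoid rule is exact for straight lines, any deviation from $b^2$ here would point to a bug in the masking or limit handling rather than to ordinary discretization error.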