Beginner in Python and Web Scraping – Looking for Feedback on My Script [closed]

I’m a software engineering student currently doing an internship in the Business Intelligence area at a university. As part of a project, I decided to write a script that scrapes job postings from a website so the data can be used later for analysis. Here’s my situation: I’m completely new to both Python and web scraping. I’ve been learning through documentation, tutorials, and by asking ChatGPT :'( ...

August 29, 2025

Create unique index for each group PySpark

I am working with a relatively large DataFrame (close to 1 billion rows) in PySpark. This DataFrame is in “long” format, and I would like to have a unique index for each group defined by a groupBy over multiple columns. An example DataFrame:

    +--------------+-------+---------+------+------+
    |classification|   id_1|     id_2|     t|     y|
    +--------------+-------+---------+------+------+
    |             1| person|    Alice|   0.1| 0.247|
    |             1| person|    Alice|   0.2| 0.249|
    |             1| person|    Alice|   0.3| 0.255|
    |             0| animal|   Jaguar|   0.1| 0.298|
    |             0| animal|   Jaguar|   0.2| 0.305|
    |             0| animal|   Jaguar|   0.3| 0.310|
    |             1| person|    Chris|   0.1| 0.267|
    +--------------+-------+---------+------+------+

...

August 29, 2025

Import prints, even when stdout and stderr are suppressed

I have a module that I need to use. However, when it is imported, it helpfully prints out some status information:

    import problematic_module

    Connected to local cache!
    Detected version 1.0.0!

I want to suppress this so that a CLI tool does not have unnecessary cruft surrounding its output. There are many questions about how to suppress console output, and a popular solution is something like this:

    import contextlib
    import os
    import sys

...
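If the status lines still appear after `sys.stdout` and `sys.stderr` have been swapped out, one plausible explanation (an assumption, since the rest of the question is cut off) is that the module writes to the underlying file descriptors directly, for example from a C extension, rather than through the Python-level stream objects. A common workaround is to redirect file descriptors 1 and 2 themselves. `suppress_fd_output` is a hypothetical helper name; the calls it relies on (`os.open`, `os.dup`, `os.dup2`, `os.devnull`) are standard library:

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_fd_output():
    """Silence writes to OS-level file descriptors 1 (stdout) and 2 (stderr).

    Hypothetical helper. Unlike contextlib.redirect_stdout, which only swaps
    the Python-level sys.stdout object, this also catches output written
    straight to the descriptors (e.g. by C extensions, or by code holding a
    reference to the original stream objects).
    """
    # Push anything already buffered out before the descriptors are swapped.
    sys.stdout.flush()
    sys.stderr.flush()
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    saved_stdout_fd = os.dup(1)
    saved_stderr_fd = os.dup(2)
    try:
        # Point descriptors 1 and 2 at the null device.
        os.dup2(devnull_fd, 1)
        os.dup2(devnull_fd, 2)
        yield
    finally:
        # Flush output produced inside the block so it lands in devnull.
        sys.stdout.flush()
        sys.stderr.flush()
        # Restore the original descriptors and release the temporary ones.
        os.dup2(saved_stdout_fd, 1)
        os.dup2(saved_stderr_fd, 2)
        for fd in (saved_stdout_fd, saved_stderr_fd, devnull_fd):
            os.close(fd)
```

Wrapping the import as `with suppress_fd_output(): import problematic_module` would then discard the import-time messages. The trade-off is that anything else written during the block, including genuine error messages, is discarded too; and in environments such as Jupyter, where `sys.stdout` is not backed by descriptor 1, the behavior differs.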

August 29, 2025

CtkButton doesn't load images correctly

I have written a Python application to help me manage my work, and it needs some buttons. I chose CustomTkinter mainly for its visual style, but most of the time the button icons don’t load. Here is my code:

    import customtkinter as ctk
    from PIL import Image
    import os
    from pathlib import Path
    import tkinter

    directory_base = Path.home() / "AppData" / "Local" / "TestApp"

...

August 29, 2025

Why does this Python function append a value to a list on every call, even with an empty argument [duplicate]

I’m encountering a bizarre bug where my function seems to be “remembering” previous calls. I’ve simplified the code to demonstrate the issue:

    def append_to_list(value, my_list=[]):
        my_list.append(value)
        return my_list

    # First call works as expected
    result1 = append_to_list(1)
    print(result1)  # Output: [1]

    # Second call should return [2], but it doesn't!
    result2 = append_to_list(2)
    print(result2)  # Output: [1, 2]
    # Wait, what? Where did the 1 come from?

...
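This is Python’s mutable-default-argument pitfall: a default value is evaluated once, when the `def` statement runs, so every call that omits `my_list` appends to the same list object. The sketch below shows the commonly used `None`-sentinel fix; `buggy_append` is just a renamed copy of the question’s function, included to expose the shared default through its `__defaults__` attribute.

```python
def buggy_append(value, my_list=[]):
    # Same as the question's function: the [] is created once and reused.
    my_list.append(value)
    return my_list


def append_to_list(value, my_list=None):
    """Append value to my_list, creating a fresh list when none is passed."""
    # None is immutable, so it is safe to share; build a new list per call.
    if my_list is None:
        my_list = []
    my_list.append(value)
    return my_list


buggy_append(1)
buggy_append(2)
print(buggy_append.__defaults__)  # ([1, 2],)

result1 = append_to_list(1)
result2 = append_to_list(2)
print(result1)  # [1]
print(result2)  # [2]
```

Passing a list explicitly still works as before, e.g. `append_to_list(3, [1, 2])` returns `[1, 2, 3]`.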

August 29, 2025

Python in a Docker Container refused connection to Postgres in a networked Docker Container

I am trying to connect from a Python script to a Postgres database. Here’s my docker-compose file:

    networks:
      backend:
        name: value_tracker_backend
        external: true

    services:
      db:
        build:
          context: ./sql
          dockerfile: db.dockerfile
        environment:
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: EntitiesAndValues
        ports:
          - "5431:5432"
        healthcheck:
          test: ["CMD-SHELL", "pg_isready -U postgres -d EntitiesAndValues"]
          interval: 10s
          retries: 5
          start_period: 30s
          timeout: 10s
        networks:
          - backend

      dataseeder:
        build:
          context: .
          dockerfile: dataseeder/dataseeder.dockerfile
        depends_on:
          db:
            condition: service_healthy
            restart: true
        networks:
          - backend

...
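A frequent cause of “connection refused” in a setup like this is connecting to `localhost` or to the host-published port 5431 from inside the `dataseeder` container. Inside the `backend` network, the database is reachable under its service name `db` on the container port 5432; the `"5431:5432"` mapping only applies to connections coming from the host. Since the question’s Python code is not shown, the sketch below is an assumption: it builds a libpq-style connection string whose defaults follow that rule, with optional overrides from the standard libpq environment variables (`PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, `PGDATABASE`). `postgres_dsn` is a hypothetical helper, and the `postgres` user is assumed from the healthcheck’s `-U postgres`.

```python
import os


def postgres_dsn() -> str:
    """Build a libpq connection string for use inside the compose network.

    Hypothetical helper; defaults mirror the docker-compose file in the question.
    """
    host = os.environ.get("PGHOST", "db")        # compose service name, not localhost
    port = os.environ.get("PGPORT", "5432")      # container port, not host-published 5431
    user = os.environ.get("PGUSER", "postgres")  # assumed from the healthcheck's -U postgres
    password = os.environ.get("PGPASSWORD", "postgres")
    dbname = os.environ.get("PGDATABASE", "EntitiesAndValues")
    return f"host={host} port={port} user={user} password={password} dbname={dbname}"
```

The resulting string can be passed to `psycopg2.connect(...)` or `psycopg.connect(...)`, both of which accept a DSN. A script running on the host machine instead would use `localhost` and port 5431.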

August 29, 2025

How to retrieve one data value from the result of a pandas DataFrame.groupby().mean()

Using pandas 2.3.2 on Python 3.9.2 via JupyterLab. I’ve collected a bunch of thermal data from a thing. I’ve already collated that data into DataFrame chunks that look like this:

          zone       data  Setpoint
    9    zone1   40.34347        40
    13   zone1   40.07553        40
    17   zone1   39.98359        40
    21   zone1   40.06895        40
    25   zone1   40.04465        40
    ..     ...        ...       ...
    952  zone4  109.91890       110
    956  zone4  109.90520       110
    960  zone4  110.00600       110
    964  zone4  110.02160       110
    968  zone4  109.94940       110

...
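In pandas, `groupby(...).mean()` returns a Series or DataFrame whose index holds the group keys, so a single value can be read back with `.loc`. A minimal sketch using only the rows visible in the excerpt (the elided middle rows are left out, so these means are not the question’s real results); which columns to group by is an assumption, since the question’s own `groupby` call is not shown:

```python
import pandas as pd

# Only the rows visible in the question's excerpt.
df = pd.DataFrame({
    "zone": ["zone1"] * 5 + ["zone4"] * 5,
    "data": [40.34347, 40.07553, 39.98359, 40.06895, 40.04465,
             109.91890, 109.90520, 110.00600, 110.02160, 109.94940],
    "Setpoint": [40] * 5 + [110] * 5,
})

# One grouping column, one selected column: a Series indexed by zone.
means = df.groupby("zone")["data"].mean()
zone1_mean = means.loc["zone1"]

# Several grouping columns: the index is a MultiIndex, so use a tuple key.
means_multi = df.groupby(["zone", "Setpoint"])["data"].mean()
zone4_mean = means_multi.loc[("zone4", 110)]

# No column selected: a DataFrame, so use .loc[row_label, column_label].
frame_means = df.groupby("zone").mean(numeric_only=True)
zone1_setpoint = frame_means.loc["zone1", "Setpoint"]

print(zone1_mean, zone4_mean, zone1_setpoint)
```

If a plain Python float is needed afterwards, `float(zone1_mean)` converts the NumPy scalar.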

August 29, 2025

Relating icons to the texts in PDF files [closed]

I have a large number of PDF files generated by a system. The PDF files are about customer surveys: they document yes/no/maybe-style responses and assign traffic-light icons (red, green, and orange) that indicate criticality based on the response itself. The responses are also grouped into so-called dimensions, and each response has a weighting that is used to calculate traffic lights at the dimension level as well. Unfortunately, these traffic lights and the responses are not stored in any database table; they are only generated by an application whose logic calculates red, green, or orange based on some configurable tables. So the result of the calculation is stored only in the PDF file. These PDF files are shared with the customers as counter-feedback, explaining the implications of their responses and giving recommendations. So now I am confronted with thousands of PDF files containing customer surveys, with dimensions as section headers and Respon ...

August 29, 2025

Why won't VSCode Smart Send my Python code line by line to the terminal as it used to?

I generally run my Python code line by line using the Python Smart Send feature in VS Code, but after I was away from work for a while, it suddenly stopped working. Normally, Shift + Enter or Right Click > Run Python > Run Selection/line in Terminal runs one chunk or line of code at a time directly in the integrated Python terminal. Now it does absolutely nothing: no error message is shown, it’s just inert. As near as I can tell, this is related to the fact that the integrated terminal opens as PowerShell instead of a command prompt that automatically starts a Python instance. I have tried setting terminal.integrated.defaultProfile.Windows to Command Prompt, with no luck; it still opens PowerShell on first load and whenever I open a terminal without specifying that it should be a command prompt. In case it helps anyone understand what is going on, each new PowerShell terminal runs the following command when it opens: (& C:\Users\REDACTED\AppData\Local\mi ...

August 29, 2025

Calling a class object in an iterative way in Python

I made the following class:

    obj = MyClass()
    fds = ['a', 'b', 'c']
    for i in fds:
        attribute_name = f"{i}"
        setattr(obj, attribute_name, [f"{i}"])
        print(obj.i)

I know that obj.i is going to give me an error. What I want is something similar to:

    print(obj.a)
    print(obj.b)
    print(obj.c)

which gives me:

    ['a']
    ['b']
    ['c']

Is there a way to do this iteratively? That’s why my first attempt was to write something like obj.i inside the loop. I want to do this to check whether the data inside each list is what I was expecting. Instead of printing, it would probably be better to build a list from obj.a, obj.b, and obj.c, but I have the same problem: appending the data iteratively. ...
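`setattr` has a reading counterpart, `getattr`, which takes the attribute name as a string; `obj.i` fails because it looks for an attribute literally named `i`. A minimal sketch, where `MyClass` is a hypothetical empty stand-in because the question’s class body is not shown:

```python
class MyClass:
    """Hypothetical stand-in; the question's class definition is not shown."""


obj = MyClass()
fds = ["a", "b", "c"]

# Set one attribute per name; the name is already a string, so no f-string is needed.
for name in fds:
    setattr(obj, name, [name])

# getattr(obj, name) is the dynamic equivalent of obj.a, obj.b, obj.c.
for name in fds:
    print(getattr(obj, name))

# Collect the values into a list instead of printing them.
values = [getattr(obj, name) for name in fds]
```

The loop prints `['a']`, `['b']`, and `['c']`, matching the output the question asks for, and `values` ends up as `[['a'], ['b'], ['c']]`.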

August 29, 2025